Conspiracy Posted December 17, 2008 Okay, I've been testing a theory and it kind of works. You can find almost anything on Google if you type in the right search phrase. I began experimenting with the popular file transfer websites such as [yousendit] [megaupload] [rapidshare], so if you type something like: [ yousendit nina simone ] without the brackets, you'll find a lot of links to Nina Simone albums that people have uploaded, and anyone can download them if they haven't expired (those sites have a 7-day expiry or something) .. so go ahead and try it!! DISCLAIMER: I am not encouraging anyone to actually download those albums, because they are pirated and that's illegal; this is only me researching!
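For example, a few variations of the same idea (the artist and hosts here are just placeholders; putting the phrase in quotes keeps the words together):
code:
yousendit nina simone
megaupload "nina simone"
rapidshare "nina simone" mp3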
wayrax Posted January 20, 2009 Great. Let me try, I'll get back to U.
Urban Posted January 20, 2009 It works saxib.. tried one so far from megaupload
Arsenal4Somalia Posted April 7, 2009 I hate Rapidshare, you have to wait until you can make the next download, but it works fine
Jacaylbaro Posted April 7, 2009 I don't like that rapidshare either, but I guess we don't have a choice sometimes ... Thanks Conspiracy for this .........
Caano Geel Posted April 7, 2009 This is a very general hack, and the concept forms the basis of a script I use for generic media searches (original below, not written by me). You can modify the file types it searches for, and I suggest adding a pause between the search and fetch phases, i.e. show the sites found and ask the user whether to continue. Details are here: http://noflashlight.com/
code:
#!/usr/bin/env python
"""This script is published under the MIT License
Copyright © 2008 Nick Jensen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE."""

import os, socket, sys, re, urllib, urllib2

socket.setdefaulttimeout(10)

class SimpleWGET(urllib.FancyURLopener):
    ''' a simple class to download and save files '''

    def __init__(self):
        urllib.FancyURLopener.__init__(self)
        urllib._urlopener = self

    def fetch(self, url, dst="."):
        # build the local filename from the last path component of the URL
        fn = os.sep.join([dst, urllib.unquote_plus(url.split('/').pop())])
        if os.path.isfile(fn):
            print "Skipping... %s already exists!" % fn
            return
        if sys.stdout.isatty():
            try:
                # show a progress report while downloading to a terminal
                urllib.urlretrieve(url, fn,
                    lambda nb, bs, fs, url=url, fn=fn: self._reporthook(nb, bs, fs, url, fn))
                sys.stdout.write('\n')
            except IOError, e:
                print str(e)
            except Exception, e:
                pass
        else:
            urllib.urlretrieve(url, fn)

    def _reporthook(self, numblocks, blocksize, filesize, url=None, fn=None):
        try:
            percent = min((numblocks * blocksize * 100) / filesize, 100)
        except:
            percent = 100
        if numblocks != 0:
            sys.stdout.write('\r')
        if len(fn) > 65:
            # shorten long filenames for display; int() keeps the slice indices integral
            third = int(round(len(fn) / 3))
            fn = '%s...%s' % (fn[:third], fn[third * 2:])
        sys.stdout.write("%3s%% - Downloading - %s" % (percent, fn))
        sys.stdout.flush()

    def http_error_default(self, *args, **kw):
        print "Oops... %s - %s" % (args[2], args[3])


class Google:
    ''' search google and return a list of result links '''

    @classmethod
    def search(class_, params, num_results='10'):
        valid_num_results = ['5', '10', '25', '50', '100']
        if num_results in valid_num_results:
            # build the query
            url = 'http://www.google.com/search?num=%s&q=%s' % (num_results, params)
            # spoof the user-agent
            headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'}
            req = urllib2.Request(url, None, headers)
            html = urllib2.urlopen(req).read()
            results = []
            # regex the result links out of the page
            atags = re.compile(r'<a.*?>', re.IGNORECASE)
            href = re.compile(r'(?<=href=("|\')).*?(?=\1)', re.IGNORECASE)
            for link in atags.findall(html):
                if "class=l" in link:
                    results.append(href.search(link).group())
            return results
        else:
            raise ValueError("num_results must be one of %s" % ', '.join(valid_num_results))


class MediaFinder(SimpleWGET):

    def __init__(self):
        SimpleWGET.__init__(self)

    def find_music(self, term, results_to_search='10', strict=True):
        ''' do a smart google search for music, crawl the result
        pages and download all audio files found. '''
        audio_ext = ['mp3', 'm4a', 'wma', 'wav', 'ogg', 'flac']
        audio_options = {
            '-inurl': ['(htm|html|php|shtml|asp|jsp|pls|txt|aspx|jspx)'],
            'intitle': ['index.of', '"last modified"', 'description', 'size',
                        '(%s)' % '|'.join(audio_ext)]
        }
        vars = ''
        for opt, val in audio_options.items():
            vars = '%s%s:%s ' % (vars, opt, ' '.join(val))
        query = urllib.quote_plus(' '.join([vars, term]))
        try:
            google_results = Google.search(query, results_to_search)
        except urllib2.URLError, e:
            print "Error fetching google results: %s" % e
            return  # bail out, otherwise google_results is undefined below
        i = 1
        for url in google_results:
            print "Searching %s of %s pages." % (i, len(google_results))
            if strict:
                self.download_all(url, audio_ext, term)
            else:
                self.download_all(url, audio_ext)
            i += 1

    def download_all(self, url,
                     extensions=['avi','divx','mpeg','mpg','wmv','flv','mp3','m4a','wma','wav','ogg','flac'],
                     term=''):
        ''' download all files found on a web page that have an extension
        listed. (optionally match a term found in the URL) '''
        media = re.compile(r'(?<=href=("|\'))[^"\']+?\.(%s)(?=\1)' % '|'.join(extensions), re.IGNORECASE)
        try:
            req = urllib.urlopen(url)
            html = req.readlines()
            req.close()
            found = []
            for line in html:
                match = media.search(line)
                if match:
                    link = match.group()
                    if 'http://' in link:
                        # absolute url
                        found.append(link)
                    elif link[0] == '/':
                        # from web root
                        root_url = url[0:url.find('/', 7)]
                        found.append(''.join([root_url, link]))
                    else:
                        # relative urls: ../ ./ etc.
                        link_dir = url[0:url.rfind('/')].split('/')
                        for piece in link.split('../')[:-1]:
                            link_dir.pop()
                        filename = re.search('[^./].*$', link).group()
                        link_dir.append(filename)
                        found.append('/'.join(link_dir))
            if not found:
                print "No media found..."
            else:
                print "Found %s files!" % len(found)
                for url in found:
                    if not term or strip_and_tolower(term) in strip_and_tolower(url):
                        self.fetch(url)
        except IOError, e:
            print "Error: %s" % e


def strip_and_tolower(url):
    strip = re.compile(r'[^\w]', re.IGNORECASE)
    return strip.sub('', urllib.unquote_plus(url.lower()))


if __name__ == "__main__":
    finder = MediaFinder()
    args = sys.argv
    if len(args) in range(2, 6):
        term = args[1]
        results_to_search = '10'
        strict = True
        if '-r' in args:
            results_to_search = str(args[args.index('-r') + 1])
        if '--non-strict' in args:
            strict = False
        try:
            finder.find_music("""%s""" % term, results_to_search, strict)
        except:
            print "\n"
    else:
        print """usage: %s "search terms" [-r (5|10|25|50|100)] [--non-strict]""" % args[0]
If you need any help understanding the structure etc. .. gimme a shout
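As a quick usage sketch: save it to a file, say mediafinder.py (the filename is just an example), and run it from a shell. -r controls how many google results to crawl, and --non-strict downloads every media file found rather than only URLs containing the search term:
code:
$ python mediafinder.py "nina simone"
$ python mediafinder.py "nina simone" -r 25 --non-strict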
Caano Geel Posted April 7, 2009 p.s. from the "def find_music(self, term, results_to_search='10', strict=True):" function, the query sent to google for "nina simone" is built from:
code:
vars = -inurl:(htm|html|php|shtml|asp|jsp|pls|txt|aspx|jspx) intitle:index.of "last modified" description size (mp3|m4a|wma|wav|ogg|flac)
term = nina simone
These are joined and sanitised for HTTP consumption as below; the joined string is used to generate the actual query that is sent:
code:
join  = -inurl:(htm|html|php|shtml|asp|jsp|pls|txt|aspx|jspx) intitle:index.of "last modified" description size (mp3|m4a|wma|wav|ogg|flac)  nina simone
query = -inurl%3A%28htm%7Chtml%7Cphp%7Cshtml%7Casp%7Cjsp%7Cpls%7Ctxt%7Caspx%7Cjspx%29+intitle%3Aindex.of+%22last+modified%22+description+size+%28mp3%7Cm4a%7Cwma%7Cwav%7Cogg%7Cflac%29++nina+simone
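If you want to see that sanitisation step in isolation, here is a quick interactive Python 2 snippet (the shortened query is just an example). quote_plus leaves letters, digits and ._- alone, turns spaces into +, and percent-encodes everything else:
code:
>>> import urllib
>>> urllib.quote_plus('intitle:index.of "last modified" (mp3|ogg)')
'intitle%3Aindex.of+%22last+modified%22+%28mp3%7Cogg%29'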
Geel_jire Posted April 7, 2009 ^ Looks interesting. This is the first Python source code I've seen; it appears very similar to C in structure, with an OOP twist (C++). The join: is it join as in concatenation, or part of a query language (like SQL inner/left/right joins)? Where do you run this code, and how does it gain access to the socket API and the other APIs necessary to complete its task?
Caano Geel Posted April 7, 2009 ^ You need python (http://www.python.org/) installed.. the libs "os, socket, sys, re, urllib, urllib2" are all part of the basic install, and you run it from the command line. Since we are not creating any raw sockets, this doesn't need any special permissions. If you have a *nix box or a mac it's straightforward; follow the instructions on the site. wrt. python: yes, it looks a lot like C++, they are both C derivatives, but python is much simpler to program with and great for quick hacks/prototyping etc. wrt. join: it is used to concat a list into a string.
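A tiny interactive example of join in action, using the same extension list the script builds its regex pattern from:
code:
>>> audio_ext = ['mp3', 'm4a', 'wma', 'wav', 'ogg', 'flac']
>>> '|'.join(audio_ext)
'mp3|m4a|wma|wav|ogg|flac'
>>> '(%s)' % '|'.join(audio_ext)
'(mp3|m4a|wma|wav|ogg|flac)'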
Nin-Yaaban Posted April 9, 2009 Or just get those songs from Pirate Bay, or the dozens of other sites that offer that peer-2-peer sharing thingy on the internet.
Caano Geel Posted April 9, 2009 ^ I realise this is not interesting for most peepz but, how shall I put it .. the more socially challenged amongst us might like to know. Anyhow, another google term that I use daily is "define", i.e. you can ask google to define terms for you. E.g. typing "define: geek" into the google search box gives a list of dictionary definitions, i.e.:
# a carnival performer who does disgusting acts
# eccentric: a person with an unusual or odd personality wordnet.princeton.edu/perl/webwn
# Geek! is the second EP by My Bloody Valentine, and their first to feature bass player Debbie Googe. It was released in December 1985. ... en.wikipedia.org/wiki/Geek!
...
Anyhow, for those of you that use a shell often, specifically bash, the following function in your .bashrc config file is very useful.
code:
# Define a word
# written by Andrew V. Newman
# last modified Fri Mar 5 07:27:17 MST 2004
# Shell script to bring up google based definitions for any compound or simple word
# - USAGE:
##   define geophysics
##   define shell script
##   define bash
function define () {
    # Checks to see if you have a word in mind or are just typing in the command.
    # If the latter, print a short usage and example before returning error code 1.
    # (return, not exit: this runs as a sourced function, and exit would kill
    # the shell; likewise the usage text names the function directly, since $0
    # would be the shell's own name here.)
    if [ ${#} -lt "1" ]; then
        echo "Usage: define 'TRM' "
        echo "  where TRM is the word that you would like defined"
        echo "  Examples:"
        echo "    define Fedora"
        echo "    define shell script"
        echo "  or:"
        echo "    define google"
        return 1
    fi
    # Use 'resize' to correctly set the environment variables for your terminal
    # (this may not work for all terminals).
    eval `resize -u`
    # Set the lynx program to your current term width (it doesn't do this by
    # default when being piped directly into another file), and turn off the
    # default justification (which can make output UGLY).
    LYNX="/usr/bin/lynx -justify=0 -width=$COLUMNS"
    # Set your 'more' or 'less' favorite pager.
    PAGER=/usr/bin/less
    # Build a URL BASE, treating multiple arguments as a compound word. The
    # way WORD is defined, it replaces all the blank spaces with the URL
    # equivalent of a blank, '%20'.
    WORD=`echo ${@} | sed 's/ /%20/g'`
    #echo $WORD
    # Define the google URL to search for the definition.
    URL="http://www.google.com/search?q=define:$WORD"
    # Call up the google search in lynx, dump the output to a temp file
    # and send all stderr to /dev/null .
    $LYNX $URL -dump 2> /dev/null > define.tmp
    # Displays the definitions after stripping off unnecessary text before and
    # after them: the first sed command only prints lines between the line
    # containing 'Definition' and the line containing '_____', inclusive; the
    # second sed command replaces the 'lynx' URL numbers (e.g. [12]) with
    # 'WEB:'; the third sed command wipes out that last '____' line. Finally
    # the information is piped to a pager which lets you page through the
    # text on-screen with the space bar.
    sed -n '/[dD]efinition/,/_____/p' define.tmp | sed '/\[[0-9][0-9]\]/s// WEB: /' | sed '/____/d' | $PAGER
    # Remove temporary file.
    rm -f define.tmp
}
Obviously you need to install the command line browser "lynx". If you use Ubuntu/Debian, just type "aptitude install lynx" at the command line and that will install it. What does it do, you ask? Let's say you're in the terminal and you want to look up the definition of a term .. etc.. you just type "define <term>" and the definitions are paged to your screen. How does it work? 1. It calls google with "http://www.google.com/search?q=define:$WORD", where $WORD is the term you typed; the results are dumped to a file called "define.tmp". 2. We call "sed" to strip off everything outside the 'Definition' block in the returned page, i.e. all the unrelated stuff that google returns. Anyhow, the point is that those interested can adapt this to do things more relevant to their own interests.
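To try it out, a minimal sketch (the exact output depends on what google returns for the term):
code:
$ source ~/.bashrc      # reload your config so the function is defined
$ define geophysics
$ define shell script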