fravia's how to search ~ Lesson 10 ('light' version)
  
 
  
 
June 1998 Ported to searchlores.org in February 2000
 
Lesson 10 LET THE BOTS SEARCH FOR YOU
 ...and build your own search-bots :-)
  
Based on some original private emailings from +ORC
Searchengines' strings cracked by Master Accmailer G.E. Boyd
  
Preceding lessons:
 lesson_5 about general agora http:// retrieving ~ July 1996
 lesson_6 about ftping files, agora queries and emailing altavista ~ December 1996
 lesson_7 about the W3gate, search spiders, error messages and evaluation of results ~ March 1997
 lesson_8 about advanced searching techniques (combing and klebing) ~ November 1997
 lesson_9 about "effective" searching techniques (infoseek 'finalised' and dejanews filtering) ~ January 1998
 
[Never forget the bots!] 
[The 'pasted stringsearch' method] 
[FTPmail: mailing and re-mailing  :-)] 
[Is gopher dead?]
To know answers is easy, the difficult 
part is knowing how to find any answer.
Few people know how to search, and even fewer know how to find 
what they have searched for.
Never forget the bots!
I have decided to summarize some of the must-know 
techniques for automated searching and data retrieval on the web for all those 
readers who keep writing me that some of the ftpmailers listed in my older lessons 
don't work anymore. Kids: the Web is quicksand! Lotta sites and servers and bots 
DISAPPEAR, but this does not mean anything at all: since you (should) know the sublime art 
of how to search, you'll always be able to catch the same (or analogous) sites and services 
elsewhere!
As you already know (since I assume you have read the preceding lessons and 
have learned the basics of all 'getweb' techniques :-) there are many automated servers 
out there that will send you pages/files/source code and/or answer 
your queries... for free, of course: this is still 'our' web after all, and the evil powers 
of commercialisation and advertisement don't dominate the net (yet).
As usual, since you're going to work with email, first of all check how 
much info you are leaking with your own emails: send an email right now to
echo@tu-berlin.de
write 'test' in both the 'Subject' and the 'Text' fields, and examine carefully what you get back 
as an automated answer, within a couple of seconds, from this German echo 
bot...
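A minimal Python sketch of the same check, using only the standard `email` package: `build_echo_test` composes the 'test' message, and `leaked_headers` picks out the headers that usually give you away once the bot echoes your message back. Both function names, and the choice of which headers count as 'leaky', are illustrative assumptions, not a description of what echo@tu-berlin.de actually returns.

```python
from email.message import EmailMessage
from email.parser import Parser

# Headers that commonly reveal your machine, your mail client or your provider
# (an illustrative selection, not an exhaustive list).
LEAKY = {"received", "x-mailer", "user-agent", "message-id",
         "x-originating-ip", "organization"}

def build_echo_test(sender: str, echo_addr: str = "echo@tu-berlin.de") -> EmailMessage:
    """Compose the test message: 'test' in both the Subject and the text."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = echo_addr
    msg["Subject"] = "test"
    msg.set_content("test")
    return msg

def leaked_headers(echoed_headers: str) -> list[tuple[str, str]]:
    """Return the (name, value) pairs of the echoed headers worth examining."""
    parsed = Parser().parsestr(echoed_headers, headersonly=True)
    return [(name, value) for name, value in parsed.items()
            if name.lower() in LEAKY]
```

Send the result of `build_echo_test` with your usual mailer, then paste the header block quoted in the bot's answer into `leaked_headers` and look hard at every line it returns.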
OK? Everything OK? Are your email traces clean enough? 
Now let's start this lesson 10...
Let's list the main services we'll deal with:
1)      I wanna get pages, files and images from da net!
        AGORA
                agora@dna.affrc.go.jp           [01]
                agora@kamakura.mss.co.jp        [02]             
                agora@www.eng.dmu.ac.uk         [03]
        AGORA-LIKE
                w3mail@gmd.de                   [04]
                w3mail@enigma.gex.gmd.de        [04]
                webmail@www.ucc.ie              [05]
2)      I wanna search da net
        GETWEB
                getweb@unganisha.idrc.ca        [06]
                getweb@lanic.utexas.edu         [07]
                getweb@usa.healthnet.org        [08]   
        ILIAD
                iliad@algol.jsc.nasa.gov        [09] 
                iliad@rosy.tenet.utexas.edu     [09]
3)      I wanna patrol da net
        E-MAIL-QUERY
                Email-Queries@Reference.COM     [10]
4)      Oldies but useful
        GOPHER SERVERS AND VERONICAS
                gophermail@eunet.cz             [11]  
                gopher@dna.affrc.go.jp          [12]    
                http://veronica.psi.net         [13]
        
[01] the one most used by those who know this stuff
[04] a very powerful one for image retrieval
[06] a beautiful one for searches
[08] very fast, but with a 200,000-byte weekly quota
[09] iliad has both a "get url" and an "iliad query" function
[10] a very powerful 'filter' facility to patrol usenet automatically
Each of the preceding services will let us learn 
a different face of searching... we'll now examine them (only three in this version, 
I'll complete the rest soon).
                agora@dna.affrc.go.jp        [01]
Who knows if these nice people from Japan really grasp how IMPORTANT their 
fantastic service is for any Internet user? This is the "mother of all agoras", because it's speedy 
and accepts the three famous commands SEND (your target URL's text), SOURCE (your 
target URL with all its HTML formatting, so that you can browse it offline: pretty important 
if you want to browse a delicate target site 'almost' anonymously :-) and DEEP 
(one URL together with all the URLs linked from it... yet watch it! You can get hundreds of 
emails if your target is a page that links to a lot of pages, like my aca300.htm).
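Before asking for DEEP on a page, it can pay to estimate how many pages hang off it. A minimal sketch with Python's standard `html.parser`, assuming (a guess, not documented agora behaviour) that DEEP fetches each distinct linked URL once, so the number of distinct links gives a rough idea of how many replies to expect. `LinkCollector` and `deep_fanout` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

class LinkCollector(HTMLParser):
    """Collect the distinct URLs that <a href=...> tags point to."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and not value.startswith(("#", "mailto:")):
                # Resolve relative links and drop #fragments: same page, same fetch.
                url, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(url)

def deep_fanout(html: str, base_url: str) -> int:
    """Count the distinct pages a DEEP request on this page would pull in."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return len(collector.links)
```

Save the page with SOURCE first, run `deep_fanout` on it, and decide whether your mailbox can stand the DEEP request.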
Agora allows the retrieval of zipped files as well, btw: if, for instance, you ask for:
send ftp://ftp.crl.com/users/iv/iverham/ua.zip
agora will deliver you Uzi Paz's famous (and invaluable) file on Usenet access, techniques and 
newsgroups.
The 'pasted stringsearch' method
So, how do you do a search with an agora? Well, the trick is to search 
exactly as you would in your own browser... therefore you must first of all learn 
how to search with your own browser, which many readers still don't know: i.e. the 
'pasted stringsearch' method... very useful indeed if until 
now you have only searched through the ready-made search engine forms, like altavista's 
own, or, even more slowly, through the 
advertisement-overloaded front pages of the search engines themselves :-)
1) copy the following line (highlight it and then CTRL+C):
http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=bozo
2) paste it into your browser's small "URL" window (CTRL+V, duh)
3) replace the "bozo" keyword with your search phrase, separating different words with a plus (+) 
sign, not with blanks... [ida+disassembler+regged] for instance... :-)
4) Press ENTER and up you go... much quicker than accessing altavista's real site, isn't it? 
Actually it's even quicker than using a search form, even one like my own.
Try both the form and the 'pasted stringsearch' methods for searching online 
now... which one is quicker? :-)
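Steps 1 to 3 can be sketched in a few lines of Python: take the URL prefix up to `q=` and append your words joined by plus signs. `urllib.parse.quote_plus` turns blanks into `+` and escapes characters such as double quotes. The function name is hypothetical; type your words separated by blanks, because a literal `+` in the phrase would be escaped to `%2B`.

```python
from urllib.parse import quote_plus

# The altavista prefix from step 1, with the "bozo" keyword removed.
ALTAVISTA_TEMPLATE = "http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q="

def pasted_stringsearch(phrase: str, template: str = ALTAVISTA_TEMPLATE) -> str:
    """Build the URL you would paste into the browser for a blank-separated phrase."""
    # Collapse runs of blanks first, then let quote_plus put '+' between the words.
    return template + quote_plus(" ".join(phrase.split()))
```

For instance `pasted_stringsearch("ida disassembler regged")` ends with `q=ida+disassembler+regged`, exactly the string of step 3.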
Now, the same 'stringsearch' method 
can be used per email, with an agora server. The 
advantage in this case is of course NOT speed, it's automation... the following 
pre-prepared email can be your first 'home-made' generic search agent... just cut and 
paste the following block as TEXT into an email to agora@dna.affrc.go.jp and you'll see what 
I mean (send it after having search-replaced [bots+source] with [your+own+searchstring], duh):
send http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web\
&kl=XX&q=bots+source
send http://webcrawler.com/cgi-bin/WebQuery?searchText=bots+source
send http://search.dejanews.com/dnquery.xp?QRY=bots+source\
&defaultOp=AND&svcclass=dncurrent&maxhits=20&ST=QS&format=terse&DBS=2
send http://search.dogpile.com/search?q=bots+source&fs=web&ss=stop\
&to=twenty
send http://www.excite.com/search.gw?trace=a&search=bots+source
send http://www2.infoseek.com/Titles?qt=bots+source&col=WW
send http://www.lycos.com/cgi-bin/pursuit?cat=lycos&query=bots+source\
&mtemp=lite
send http://www.metacrawler.com/cgi-bin/nph-metaquery?general=bots+source\
&method=0&sort=relevance&target=window&useFrames=1&iface=int1
send http://search.opentext.com/omw/simplesearch?SearchFor=bots+source\
&mode=and
send http://guaraldi.cs.colostate.edu:2000/search?KW=bots+source\
&Boolean=AND&Hits=10&Mode=MakePlan&df=normal&AutoStep=on
send http://search.yahoo.com/web/advanced/bin/search?p=bots+source
See? 
Now you can automate the whole process: prepare a batch file that will compose your 
'agora search' email, say every two days, with (some of) the search 
engines selected above and your 
favourite search strings... and you are set for fishing the deep deep web without 
much work...
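The 'every two days' batch job can be sketched in Python. `agora_search_body` rebuilds the email text from engine URL templates (a hypothetical subset of the list above, with `{q}` marking the search string) and splits long `send` lines at `&` with a trailing backslash, copying the continuation layout of the email block above; that agora rejoins such lines exactly this way is an assumption based on that example. `is_due` is the two-day check a scheduled job would run before mailing the body to agora@dna.affrc.go.jp with your usual mailer. All names here are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional
from urllib.parse import quote_plus

# Hypothetical selection of engine URLs from the lesson; {q} marks the search string.
ENGINES = [
    "http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q={q}",
    "http://webcrawler.com/cgi-bin/WebQuery?searchText={q}",
    "http://www2.infoseek.com/Titles?qt={q}&col=WW",
]

def wrap_send_line(url: str, width: int = 72) -> list[str]:
    """Split a 'send URL' command at '&' boundaries, ending every continued
    line with a backslash, as in the lesson's email block (assumed syntax)."""
    parts = url.split("&")
    lines, current = [], "send " + parts[0]
    for part in parts[1:]:
        piece = "&" + part
        if len(current) + len(piece) + 1 > width:  # +1 leaves room for the backslash
            lines.append(current + "\\")
            current = piece
        else:
            current += piece
    lines.append(current)
    return lines

def agora_search_body(query: str, engines: list[str] = ENGINES) -> str:
    """Compose the text of one 'agora search' email for a blank-separated query."""
    q = quote_plus(" ".join(query.split()))
    body: list[str] = []
    for template in engines:
        body.extend(wrap_send_line(template.format(q=q)))
    return "\n".join(body) + "\n"

def is_due(last_sent: Optional[datetime], now: datetime,
           every: timedelta = timedelta(days=2)) -> bool:
    """True when the batch job should mail a new search (every two days by default)."""
    return last_sent is None or now - last_sent >= every
```

Keeping the lines short also keeps the message plain 7-bit text, which an old mail robot is more likely to parse than a re-encoded body.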
                getweb@unganisha.idrc.ca        [06]
OK, admittedly the 'pasted searchstrings' method above has got strong competition 
from the new 'breed' of getweb servers... unganisha, for instance, is 
a beautiful Canadian robot. The getweb servers make it extremely easy to use any 
form-based search engine, and moreover have built-in automated facilities for 
three different search engines: SEARCH ALTAVISTA, SEARCH YAHOO and SEARCH INFOSEEK.
Just email getweb@unganisha.idrc.ca, leave the subject blank and 
compose the following text:

begin

SEARCH YAHOO "automated retrieval" bots

end

Notice the blank lines BEFORE begin, after begin, before end and after end. Since some 
getweb systems require these blank lines, you'd better get used to using them with 
EVERY getweb system, just in case. Of course you can substitute SEARCH ALTAVISTA or SEARCH INFOSEEK for 
the SEARCH YAHOO command above. SEARCH INFOSEEK has two important additional switches 
that will give more power to your search: NN (search usenet) and NW (search 
only the past MONTH of news).
For instance, to search only the past month of news, email getweb@unganisha.idrc.ca, leave the subject blank and 
compose the following text:

begin

SEARCH INFOSEEK NW "automated retrieval" bots

end

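The getweb request format described above can be captured in a small Python helper, so that a script never forgets the blank lines. A minimal sketch: `getweb_search_body` and its checks are hypothetical, encoding only what this lesson says (three built-in engines, NN/NW switches for INFOSEEK only, blank lines around begin and end).

```python
from typing import Optional

GETWEB_ENGINES = {"ALTAVISTA", "YAHOO", "INFOSEEK"}
INFOSEEK_SWITCHES = {"NN", "NW"}  # NN: search usenet, NW: only the past month of news

def getweb_search_body(engine: str, query: str, switch: Optional[str] = None) -> str:
    """Compose the text of a getweb SEARCH request (the subject stays blank)."""
    engine = engine.upper()
    if engine not in GETWEB_ENGINES:
        raise ValueError(f"getweb has no built-in SEARCH {engine}")
    command = f"SEARCH {engine}"
    if switch is not None:
        switch = switch.upper()
        if engine != "INFOSEEK" or switch not in INFOSEEK_SWITCHES:
            raise ValueError("only SEARCH INFOSEEK takes the NN / NW switches")
        command += f" {switch}"
    # Blank line before begin, after begin, before end and after end.
    return "\n".join(["", "begin", "", f"{command} {query}", "", "end", ""]) + "\n"
```

`getweb_search_body("infoseek", '"automated retrieval" bots', "NW")` reproduces the second example above, blank lines included.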
Getweb's limits
There are limits on all these automated servers; they vary and 
currently lie between 10 and 100 document requests 
every week OR between 100,000 and 700,000 bytes every week. Of course 
you can use different email accounts to multiply your allowed quotas. Weekly limits 
regenerate seven days after you exceed them, NOT on Monday morning :-)
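A rolling seven-day window can be modelled with a queue of timestamps, which makes the 'NOT on Monday morning' point concrete. A minimal sketch: `RollingQuota` is a hypothetical helper, and it assumes each used request frees up exactly seven days after it was made; real servers vary, and some count bytes rather than requests.

```python
from collections import deque
from datetime import datetime, timedelta

class RollingQuota:
    """Per-account request limit over a rolling window (not a calendar week)."""

    def __init__(self, max_requests: int, window: timedelta = timedelta(days=7)):
        self.max_requests = max_requests
        self.window = window
        self.sent: deque[datetime] = deque()

    def _expire(self, now: datetime) -> None:
        # Forget requests made a full window ago or earlier.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def can_send(self, now: datetime) -> bool:
        self._expire(now)
        return len(self.sent) < self.max_requests

    def record(self, now: datetime) -> None:
        if not self.can_send(now):
            raise RuntimeError("weekly quota exhausted for this account")
        self.sent.append(now)
```

With a limit of two requests used on a Wednesday, the account is still blocked the following Monday and frees up again only on the next Wednesday; one `RollingQuota` per email account tracks the multiplied quotas.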
                Email-Queries@Reference.COM     [10]
The email query service provides a powerful interface that lets 
you refine queries by author, author's organization, subject, 
newsgroup or e-mail list.
So, how d'you use it? Well, first of all TRY IT right now with an 
"on the fly" query...
FIND 'software reverse engineering' WHERE AGE < 14 DAYS
And then send for HELP and learn how to create your own automated filtering bots... 
here is a very simple example:
DEFINE QUERY botsscri AS 
FIND agents 
AND scripts
AND source
AND NOT fan money jobs sell help buy god
END
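To experiment with filters before mailing them, a Python sketch can render a DEFINE QUERY block in the layout of the example above and preview its effect offline on articles you already have. Both function names are hypothetical, and `local_match` ASSUMES whole-word matching and that AND NOT rejects an article containing any one of the excluded words; the server's real semantics are in its HELP file.

```python
import re

def define_query(name: str, required: list[str], excluded: list[str]) -> str:
    """Render a DEFINE QUERY block in the layout of the lesson's example."""
    lines = [f"DEFINE QUERY {name} AS", f"FIND {required[0]}"]
    lines += [f"AND {term}" for term in required[1:]]
    if excluded:
        lines.append("AND NOT " + " ".join(excluded))
    lines.append("END")
    return "\n".join(lines) + "\n"

def local_match(article: str, required: list[str], excluded: list[str]) -> bool:
    """Offline preview: every required word present, no excluded word present
    (assumed semantics, whole words, case-insensitive)."""
    words = set(re.findall(r"[a-z0-9]+", article.lower()))
    return all(t in words for t in required) and not any(t in words for t in excluded)
```

`define_query("botsscri", ["agents", "scripts", "source"], ["fan", "money", "jobs", "sell", "help", "buy", "god"])` gives back the example query above, ready to paste into your email.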
OK, that should be enough for a start... and I believe that 
if you have never used this service before 
you'll thank me for a long time... more in the 'full' version of 
this lesson...
FTPmail: mailing and re-mailing  :-)
Explained elsewhere on searchlores.org
Is gopher dead?
This is the 'light' version, I'm sure you have had enough info for today... anyway, you 
should at least understand that gopher, of course, is not dead, the www notwithstanding... :-)
   Should you want to retrieve large zip files (say huge MPEG files) 
   that are accessed via
   a web page (and don't refer to any FTP site... else we should
   use ftpmail :-) you should by all means learn what gophers are 
   and how to use them. The idea of downloading huge files online is 
   IMO pretty silly: the vagaries of the web and the number of accesses 
   to *ahem* pretty sensitive files make such downloads a very difficult 
   enterprise at times. Once you have mastered the gopher techniques you'll never 
   download huge files online again (get them sent to you by an automated 
   bot that will retry the connection every time it 
   breaks... isn't it nice?)
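The retry behaviour that makes such a bot worth using can be sketched in Python. This models only the "reconnect and resume" logic, not the gopher protocol itself: `fetch_with_resume` is a hypothetical helper, and `read_from(offset)` is an assumed stand-in for whatever transport the bot uses, returning the next chunk from a byte offset or raising `ConnectionError` when the line drops.

```python
import time
from typing import Callable

def fetch_with_resume(read_from: Callable[[int], bytes], total_size: int,
                      max_retries: int = 10, retry_delay: float = 0.0) -> bytes:
    """Collect total_size bytes, reconnecting after every broken connection
    and resuming from the last byte received instead of starting over."""
    data = bytearray()
    failures = 0
    while len(data) < total_size:
        try:
            chunk = read_from(len(data))
            if not chunk:
                raise ConnectionError("connection closed before the end of the file")
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise  # give up after too many broken connections
            time.sleep(retry_delay)  # a real bot would wait before reconnecting
            continue
        data.extend(chunk)
    return bytes(data)
```

Resuming from the current offset is the whole point: a dropped line costs one chunk, not the whole huge file.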
 
 Go ahead, enjoy!
   
(c) fravia+ 1998, work in progress, all rights reserved nevertheless
                     
  
(c) Fravia 1995, 1996, 1997, 1998, 1999, 2000. All rights reserved