HOW TO SEARCH THE WEB
by fravia+
~
Letter 008 - November 1997
    
   ADVANCED SEARCHING TECHNIQUES
(Combing and klebing)
(Based on some original private emailings from +ORC)
~
This stuff has been gathered and written by fravia+, so if you leech, copy,
use and spread it, have at least the decency to give credit.
 
__Combing__
(Some other specific combing examples are to be found inside my antismut pages)
           
What is combing?
Combing is a very effective search strategy: instead of simply searching, 
you 'milk' (or 'comb') various other net resources:
- The continuously updated "Top 100", "Top 1000", "Top whatever" URL-locations
  ('Real' combing)
- Usenet newsgroups and their various "vigilant filters" and "short range queries"
  (Usenet combing)
- Relevant site links pages.
  (a form of 'crumbs gathering', see my anti-smut pages)
  
Real combing: The WebSideStory example
The best way to learn combing is to try it yourself.
Let's take as an example (there are THOUSANDS of these 'top whatever' sites)
one counter-related site that I have been using myself (it offers quick
text-only stats and awful graphic stats): WEBSIDE STORY.
Here is WebSideStory's self-praise:
Updated every four hours. Last update Mon Nov 3 12:00:01 1997 - PST
WebSideStory, Inc. currently monitors 30,783 sites who have 15,264,666 visitors per day.
There are 22,750 sites listed in 36 categories, averaging 1,996,241 visitors per day.
 
The problem with all these 'top whatever' sites is that you often
have to wade through a lot of pages to get to what interests you,
because the poor sods want you to read their awful ads.
Famous Listing of the Best Sites on the Internet
 
In the case of WebSideStory you'll for instance first land at
http://www.hitbox.com/wc/world.html:
WebSideStory's first page
 
Yet you'll eventually land inside this second page (divided by categories):
Websidestory's second page
And here you'll finally be able to choose among the various categories
that this counter-related database depot offers, for instance the
following ones (I have of course chosen the ones I reckon could yield
some results):
Now you have seen it...
Obviously combing is an important technique whatever your interests may
be: it is quite effective and can spare you an incredible lot of
Internet searching hours.
For combing purposes you may also use:
1) ftp search, looking for "hidden" subdirectories with relevant names
	As anybody who knows how to use ftp search ("This server is located in
	Trondheim, Norway") has already experienced, the ftp search approach
	(which fishes hidden directories) can fish incredible (if tricky to
	interpret) results. Just do a quick search for 'warez' and you'll see
	what I mean.
2) the "big page provider" search engines
	(like the page-specific search engines for geocities at
	http://www.geocities.com/search/
	or for mygale, or for angelfire, or for fortunecity, or for chez, or for
	any other of the thousand existing free page providers that have their
	own search engines)
There are THOUSANDS of 'top whatever' counters and many carry some form
of 'top site listing' within... you may want to examine a list with
MANY counters on this good page:
Web Counters and Trackers (Access Counters for Web Sites; Free Counters;
Web site auditing)
 
Usenet combing
Usenet combing can work "on the fly" or "regularly" through the "Vigilant"
filter at filter@vigilant.bc.ca
I'll show you for instance one of my favourite simple queries:
FIND how-to-search tutorial manual
		NOT spam
		NOT top position
		NOT advertising
		MAX 8
Such a query would give you useful information about "searching techniques"
on the Web. You may of course construct as many queries as you like and
*register* them (for free) with the vigilant filter, in order to get
the results of your usenet queries emailed to you every day or week or
month.
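
To get a feeling for what such a query does, here is a minimal local
sketch in Python. It is NOT vigilant's own code and it does not talk to
the robot: the semantics are my assumption (any FIND word may match, every
NOT line is an excluded phrase, MAX caps the number of messages returned),
and the names parse_query, matches and run_query are made up for this
illustration.

```python
QUERY = """FIND how-to-search tutorial manual
		NOT spam
		NOT top position
		NOT advertising
		MAX 8"""


def parse_query(text):
    """Turn FIND / NOT / MAX lines into a small dictionary."""
    query = {"find": [], "not": [], "max": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()
        if keyword == "FIND":
            # assumption: each word after FIND is an alternative
            query["find"].extend(rest.lower().split())
        elif keyword == "NOT":
            # assumption: the whole rest of the line is one excluded phrase
            query["not"].append(rest.strip().lower())
        elif keyword == "MAX":
            query["max"] = int(rest)
    return query


def matches(query, message):
    """Plain substring test: no excluded phrase, at least one FIND word."""
    body = message.lower()
    if any(phrase in body for phrase in query["not"]):
        return False
    return any(term in body for term in query["find"])


def run_query(query, messages):
    """Return the matching messages, at most MAX of them."""
    hits = [m for m in messages if matches(query, m)]
    return hits if query["max"] is None else hits[: query["max"]]
```

Feed run_query(parse_query(QUERY), messages) a list of message texts and
it will keep only the ones the query lets through.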
  
The vigilant robot
Learn the secrets of usenet FILTERING! Email filter@vigilant.bc.ca with
the word "help" inside BOTH subject and text, and learn how to use it as
soon as you get vigilant's automated answer... this robot is capable of
automatically sending you ALL usenet messages that contain the wording
that you have chosen... vigilant is NOT a usenet depot, like Dejanews or
reference.com... vigilant will send you (obviously for free) "on-the-fly"
all usenet messages in transit that deal with matters that may interest
you, at times inside newsgroups you do not even know the names of...
mastering its filter capabilities is quite tricky though... study it and
use it... you'll never regret it and I'm sure you'll thank me for this tip.
UNFORTUNATELY DOWN SINCE THE BEGINNING OF AUGUST!
Why? Has anybody any clue? Are there other "vigilant" services? This is
another of the "mysteries" of the Web: good services are retired and
awful, bogus and useless "push" services abound :(
Dejanews
Remember that you can gather an INCREDIBLE amount of information through
the following Usenet "depot":
DejaNews
__ONE OF THE *SCARIEST* BIG BROTHER SNOOPERS ON THE WEB__
You'll use it a lot: it allows you to reconstruct a personality profile
as soon as somebody uses newsgroups (like everybody does). As a matter of
fact I tried to understand who the hell hides behind this service... have
a look at my deja.htm page if you too are interested in this kind of
thing... hey, did you know that there is also a nice stalking page of
mine where these matters are explained a little more?
And did you know that you may even "snatch" information from people
browsing your pages?
Reference.com
Finally, you can gather an INCREDIBLE amount of information through the following
Usenet "depot":
reference.com
Here you'll be able to "register" your automated queries... and THAT,
believe me, is really useful to snoop what's going on and where the sites
that you are looking for are...
In fact usenet combing could be translated as 'let other people do the
searches for me...': you'll simply find email snippets of people that
have found the solution to your query inside some usenet group you do
not even know the name of!
The Usenet queries that can be done through the two big Usenet "depots",
Dejanews and email query, are ALSO possible through the major search
engines (if you know how to use them) and using the 'klebing' technique
explained below: many of the main search engines allow such querying
too, and they use (of course) the services of either Dejanews or emailquery.
NOTE THAT THERE ARE MANY MORE 'usenet depots'... I recently found an
'italian' one at http://www.mailgate.org/mailgate/index.htm and who
knows how many more there are around!
 
__klebing__
Fishing query strings and locations
Klebing is a 'reversing search' technique that goes way beyond
"combing" and offers incredible value. We will make clear what klebing
is below, using a ready-made example on a site that you probably already
know (it is an important hacker site and I link to it myself inside my
links page). Here is the 'normal' URL of that site:
L0pht heavy industries.
We can use L0pht for this example because L0pht (publicly) has the 'raw
material' that we need for klebing: the 'remote connexions' list.
It is basically a very simple CGI script that records in its own database
(L0pht updates every day) all the "remote" URL locations (i.e. the sites
the various visitors come from) accessing any of the pages of a given site.
You may easily write an analogous script and add it to your site! In order
to write a quick (and dirty) 'crude' CGI script like this you just need
to list all the document.referrer values (var where = document.referrer)
that any lamer's browser carries inside. Well... not our reversed and
'ameliorated' browsers: in order to learn the relevant techniques you may
want to have a look at Mammon_'s Reversing Netscape's buttons and menus
essay... my copy of Netscape carries for instance a different random
-and of course faked- document.referrer variable every time it accesses
a new site :-)
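
A minimal sketch of such a tally in Python, under stated assumptions: this
is not L0pht's script, only an imitation of what its list shows. A real
CGI script would typically read the referrer from the HTTP Referer header
(the CGI variable HTTP_REFERER); here the (referrer, page) pairs are simply
passed in as a list, and the names tally and report are hypothetical.

```python
from collections import Counter


def tally(hits):
    """Count how often each (referrer, page) pair was seen.

    hits is an iterable of (referrer, page) tuples; empty referrers
    (browsers that send none) are skipped.
    """
    counts = Counter()
    for referrer, page in hits:
        if referrer:
            counts[(referrer, page)] += 1
    return counts


def report(counts):
    """Format the tally as 'count  |  referrer -> page', busiest first."""
    lines = []
    for (referrer, page), n in counts.most_common():
        lines.append(f"{n:5d}  |  {referrer} -> {page}")
    return lines


if __name__ == "__main__":
    sample = [
        ("http://www.excite.com/search.gw?trace=1&search=hackers", "/cdc.html"),
        ("http://www.excite.com/search.gw?trace=1&search=hackers", "/cdc.html"),
        ("", "/index.html"),
        ("http://netfind.aol.com/search.gw", "/index.html"),
    ]
    print("\n".join(report(tally(sample))))
```

Keeping the page next to the referrer is what makes the list useful: you
see not only who sends the hits but also where they land.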
Well, have a look at the next link and you'll understand what I mean.
Here is the real, updated L0pht location that you'll use yourself
for your own klebing endeavours:
http://www.l0pht.com/ref.html
And here is a copy of it that you should examine NOW in order to better
follow what I'm telling you. In order to discuss some of the 'results' of
our klebing activities together with you, I have copied a 'still image'
of this continuously updating database to my site, taken from the
location above on 4 Nov 1997 (today). Here it is:
lophtrev.htm
So, now that you have had a look at them, let's say a couple of things:
1)	The utility of such a script from the Webmaster's point of view
is obvious: he can immediately see WHO is sending hits to him and WHERE
inside his site they link to. He can also 'punish' any 'fastidious'
linking inside his site simply by modifying the name of the linked-to
pages, like I'll do soon with the academy section of my site if you keep
entering my pages from the sides :-(
2)	The utility of such a script for our search purposes (if it is
publicly presented, like this one by L0pht, or else if it is 'somehow'
findable inside a /cgi subdirectory: see my antismut pages for the
relevant CGI-cracking techniques :-) is HUGE! If the site has some
relevance to fields you are interested in (and L0pht for sure has it
for sites that may interest us!) you are in for a surprise... in fact
one wonders what's the point of laboriously browsing the web in search
of possible new interesting sites where you could eventually learn
something! Let those same sites COME TO YOU all by themselves... isn't
it nice?
In fact, what do we have here?
Let's have a look at some interesting little fishes:
Yahoo and Excite, for instance, both find this site through the cdc cult
(the Cult of the Dead Cow):
1409  |  http://www.yahoo.com/Society_and_Culture/Religion/Humor/Parody_Religions/Cult_of_the_Dead_Cow/ -> /cdc.html
 125  |  http://www.excite.com/search.gw?trace=1&search=hackers -> /cdc.html
'our' astalavista is also present:
124  |  http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez -> /lounge.html
Note that there is already something that may be interesting for you
(albeit well known by all search-experts): the FORM that an
excite or astalavista query takes!
Yes, if you have read my previous letters, you'll have seen that it is
possible to query search engines by email using URL addresses like:
  
http://lycos11.lycos.cs.cmu.edu/cgi-bin/flpursuit?first=1\\&maxhits=30\\
&minterms=1\\&minscore=0.01\\&terse=standard\\&query=linguistic+phenomena         
Therefore we have here a simple 'template' that we can immediately use for
OTHER queries... c'mon, try it out: copy the following line, which we have
found through our klebing work,
http://www.excite.com/search.gw?trace=1&search=hackers
and paste it inside the 'location' window of your copy of Navigator...
Have you done it?
Well, now backspace over hackers and type crackers instead.
Now press enter and have a look: your own ready-made excite search string!
And you'll find THOUSANDS of 'query string' possibilities, powerful and
frequent or funny and seldom used, through this klebing method... d'you
understand now how POWERFUL this can be?
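
The same backspace-and-retype trick can be done by a few lines of Python.
This is only a sketch using the standard urllib.parse module; the helper
name swap_query_term is made up for the example, and the parameter name
'search' is simply the one visible in the excite string above (other
engines use other names).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def swap_query_term(url, param, new_term):
    """Return url with the value of one query-string parameter replaced."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    swapped = [(key, new_term if key == param else value)
               for key, value in pairs]
    # urlencode takes care of escaping spaces and odd characters
    return urlunsplit(parts._replace(query=urlencode(swapped)))


template = "http://www.excite.com/search.gw?trace=1&search=hackers"
print(swap_query_term(template, "search", "crackers"))
# prints http://www.excite.com/search.gw?trace=1&search=crackers
```

Every other parameter of the fished string (here trace=1) is left
untouched, so the template keeps whatever options its original user set.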
 
New strings   
Back to our klebing page... as you can see, in order to land
somewhere at L0pht some of these visitors have used Yahoo and have
searched for 'hackers', 'attress', 'spycamera' and more.
Now, some of these are banal, like 'hackers', yet some are quite
interesting, like 'email intercepting'.
This can be quite useful... I have quite a lot of ready-made strings
that I use with the search engines, and some of them I have gathered
klebing sites... otherwise I would probably never have come up with
some of these ideas.
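
Gathering such strings by hand gets boring quickly, so here is a small
Python sketch that fishes the search words out of lines in the
'count | referrer -> page' form shown on the klebing page. The parameter
names in SEARCH_PARAMS are only the ones that appear in the URLs quoted in
this letter (search, srch, query); the function names are hypothetical and
other engines will need other names.

```python
from urllib.parse import parse_qs, urlsplit

# Parameter names seen in the query strings quoted in this letter.
SEARCH_PARAMS = ("search", "srch", "query")


def parse_line(line):
    """Split 'count | referrer -> page' into (count, referrer, page)."""
    count, rest = line.split("|", 1)
    referrer, _, page = rest.partition("->")
    return int(count), referrer.strip(), page.strip()


def search_terms(referrer):
    """Return the search words carried in a referrer's query string."""
    query = parse_qs(urlsplit(referrer).query)
    terms = []
    for name in SEARCH_PARAMS:
        terms.extend(query.get(name, []))
    return terms


def harvest(lines):
    """Collect (count, term) pairs, most frequent first."""
    found = []
    for line in lines:
        count, referrer, _page = parse_line(line)
        for term in search_terms(referrer):
            found.append((count, term))
    return sorted(found, reverse=True)
```

Directory links (like the Yahoo one, which has no query string at all)
simply yield nothing, so only real search strings end up in the harvest.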
 
Watch the watchers 
Some of our enemies have sites somewhere that they use to check on us...
it may be quite interesting to snoop on those sites... through klebing
you'll get them... have a look at what we have here at L0pht:
  
    316  |  http://www.microsoft.com/security/ntprod.htm -> /advisories.html
103  |  http://www.microsoft.com/security/issues.htm -> /advisories.html
 
Unknown mysteries   
 This one links to a cgi-bin page... why?:
 
 
  105  |  http://nowhere/nothing.html -> /cgi-bin/Count.cgi
Well, this tells us
 FIRST
 that there is indeed a cgi-bin directory here with a Count.cgi script and
 SECOND
 that nowhere/nothing is interested in it.
 
Old friends   
 And who the hell is this next one? Our good old friend Bokler from Deja? (See my 
deja.htm page)
 
 14  |  http://spider.bokler.com/bokler/crak_body.html -> /index.htm
 Well... rich fishing, isn't it?  
 And the following ones could be interesting too, don't you believe?
 
106  |  http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez&submit=+search+ -> /lounge.html
114  |  http://netfind.aol.com/search.gw
 
 Yes, once you start klebing, you never finish experimenting! :-)
 
 Go ahead, enjoy!
   
(c) fravia+ 1997, work in progress, all rights reserved nevertheless
                     
              
fravia+ 04 Nov 97