Search engines and user-agents list (2006)

Search engines, web crawlers and user-agents

Published: 18 April 2006
Author: Serban Ghita

I've compiled a list of active web crawlers and web query/crawl tools, captured on one of my web sites. I started logging a few months ago, and the results are pretty amazing: it seems a lot of companies and institutes are doing web research with distributed web crawlers. Please note that:

  • I might be wrong about some of the signatures below being crawlers. They might only be automated download programs (such as DAP or NetAnt) or spoofed browser identities.
  • The crawler IPs are listed only as extra information; distributed crawlers can have thousands of IPs.
  • I will update and correct this list, adding more features, news and information about the crawlers.
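
A list like this one can be gathered by tallying the User-Agent field of a web server access log. The following is only a minimal sketch: it assumes the Apache/NGINX "combined" log format, which the post does not confirm, and `collect_signatures` and the sample log lines are hypothetical (the IPs and signatures come from the list below, the timestamps and request paths are made up for illustration).

```python
import re
from collections import defaultdict

# Assumption: "combined" access log format, i.e.
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) [^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def collect_signatures(lines):
    """Map each user-agent string to the set of client IPs that sent it."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        seen[match.group("agent")].add(match.group("ip"))
    return seen


if __name__ == "__main__":
    # Hypothetical log lines: timestamps and paths are invented.
    sample = [
        '217.212.224.159 - - [18/Apr/2006:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "psbot/0.1 (+http://www.picsearch.com/bot.html)"',
        '217.212.224.165 - - [18/Apr/2006:10:05:00 +0000] "GET / HTTP/1.0" 200 512 "-" "psbot/0.1 (+http://www.picsearch.com/bot.html)"',
        '66.249.65.80 - - [18/Apr/2006:10:07:00 +0000] "GET /logo.png HTTP/1.1" 200 2048 "-" "Googlebot-Image/1.0"',
    ]
    for agent, ips in sorted(collect_signatures(sample).items()):
        print(agent, sorted(ips))
```

Grouping by signature rather than by IP matches the caveats above: one distributed crawler shows up under many IPs, and a signature alone proves nothing, because any client can send any User-Agent string.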

Columns: Crawler signature, IPs, Observations (Obs)

A

Anonymous/0.0 (Anonymous; http://www.anonymous.com; noreply@anonymous.com) 63.133.162.98  
aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com) 24.177.134.6  
Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html) 65.214.44.39  
asked/Nutch-0.8 (web crawler; http://asked.jp; epicurus at gmail dot com) 131.112.125.105  

B

Blogslive (info@blogslive.com) 64.158.138.84  
Baiduspider+(+http://www.baidu.com/search/spider.htm) 202.108.11.234  

C

Cazoodle/Nutch-0.9-dev (Cazoodle Nutch Crawler; http://www.cazoodle.com; mqbot@cazoodle.com) 220.130.191.235 Nutch based
ccubee/4.0 194.213.194.207  
ccubee/3.5 194.213.194.201
ConveraCrawler/0.9d (+http://www.authoritativeweb.com/crawl) 63.241.61.7 About Convera Crawler
Crawler/1.0+http://elibron.com 83.149.215.35  
CPG RSS Module File Reader 82.208.190.46  
csci_b659/0.13 156.56.103.14  

D

Data Searcher/0.1 libwww-perl/5.65 80.97.105.98  
Mozilla/5.0 (compatible; DNS-Digger/1.0; +http://www.dnsdigger.com <a href='http://www.dnsdigger.com/'>DNS-Digger</a>) 83.227.119.189 (HEAD)
Mozilla/5.0 (compatible; DNS-Digger/1.0; +http://www.dnsdigger.com) 212.214.165.218

E

envolk/1.7 (+http://www.envolk.com/envolkspiderinfo.php) 70.169.191.4  
Exabot/2.0 193.47.80.39 About Exabot

F

Feedster Crawler/1.0; Feedster, Inc. 64.95.116.1  
findlinks/1.1.1-a1 (+http://wortschatz.uni-leipzig.de/findlinks/) 139.18.2.216 About FindLinks
findlinks/1.0.9 (+http://wortschatz.uni-leipzig.de/findlinks/) 139.18.2.209
findlinks/1.1.1-a5 (+http://wortschatz.uni-leipzig.de/findlinks/) 139.18.2.81
findlinks/1.1.3-beta2 (+http://wortschatz.uni-leipzig.de/findlinks/) 139.18.13.204
findlinks/1.1.3-beta6 (+http://wortschatz.uni-leipzig.de/findlinks/) 139.18.13.201
findlinks/1.1.3-beta8 (+http://wortschatz.uni-leipzig.de/findlinks/) 139.18.13.203

G

Gigabot/2.0; http://www.gigablast.com/spider.html 66.154.103.99 About Gigabot
Gigabot/2.0/gigablast.com/spider.html 66.154.103.158
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0; Girafabot; girafabot at girafa dot com; http://www.girafa.com) 64.210.196.197 About Girafabot
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.37 About Googlebot
Googlebot/2.1 (+http://www.google.com/bot.html) 66.249.64.54
Feedfetcher-Google; (+http://www.google.com/feedfetcher.html) 72.14.199.69
Googlebot-Image/1.0 66.249.65.80
Generic Mobile Phone (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html) 66.249.65.139
Gregarius/0.5.4 (+http://devlog.gregarius.net/docs/ua) 86.35.0.162  
GOFORITBOT ( http://www.goforit.com/about/ ) 216.69.177.55  

H

HooWWWer/2.1.3 (debugging run) (+http://cosco.hiit.fi/search/hoowwwer/ | mailto:crawler-info<at>hiit.fi) 128.214.112.85  
HouxouCrawler/Nutch-0.9-dev (houxou.com's nutch-based crawler which serves special interest on-line communities; http://www.houxou.com/crawler; crawler at houxou dot com 193.203.240.135 Nutch based

I

ia_archiver 209.237.238.235  
209.237.238.226
ichiro/2.0 (http://help.goo.ne.jp/door/crawler.html) 210.173.180.151  
IRLbot/2.0 (+http://irl.cs.tamu.edu/crawler) 35.9.45.19  
128.194.135.81
imds_monitor/0.1 211.37.79.20 (HEAD)
ilial/Nutch-0.9-dev 164.67.195.67  
IlTrovatore/1.2 (IlTrovatore; http://www.iltrovatore.it/bot.html; bot@iltrovatore.it) 213.215.201.223  

J

Jakarta Commons-HttpClient/3.0-rc2 206.188.0.22  
Jyxobot/1 195.113.214.206  

L

Linkie Winkie Crawler (http://www.linkiewinkie.com/) 212.227.22.5  
LWP::Simple/5.803 64.34.164.36 About LWP
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) 68.142.251.86 About Y! Slurp
Misterbot-Nutch/0.7.1 (Misterbot-Nutch; http://www.misterbot.fr; admin@misterbot.fr) 213.251.133.12  
Mozilla/5.0 (Windows;) NimbleCrawler 1.13 obeys UserAgent NimbleCrawler For problems contact: crawler@healthline.com 72.5.115.44  
sproose/0.1-alpha (sproose crawler; http://www.sproose.com/bot.html; crawler@sproose.com) 38.100.225.6  
sproose/0.1 (sproose bot; http://www.sproose.com/bot.html; crawler@sproose.com) 38.100.225.7
38.100.225.12
My WinHTTP Connection 81.18.79.174  
TurnitinBot/2.0 http://www.turnitin.com/robot/crawlerinfo.html 64.140.49.69  

M

MJ12bot/v1.0.7 (http://majestic12.co.uk/bot.php?+) 81.178.102.15  
Microsoft URL Control - 6.00.8862 194.102.182.71  
Microsoft-WebDAV-MiniRedir/5.1.2600 86.34.227.121 (OPTIONS)
85.166.207.145
msnbot/1.0 (+http://search.msn.com/msnbot.htm) 65.54.188.146  
msnbot-media/1.0 (+http://search.msn.com/msnbot.htm) 65.55.213.86  
MOT-RAZRV3xv/85.83.E1P MIB/BER2.2 Profile/MIDP-2.0 Configuration/CLDC-1.1 193.230.161.122  
Microsoft Data Access Internet Publishing Provider Protocol Discovery 86.105.61.38 (OPTIONS)

N

noyona_0_1 72.9.228.79  
NetResearchServer/4.0(loopimprovements.com/robot.html) 67.180.149.252  
NetSprint -- 2.0 212.77.102.121 (pl, lt)
NutchEC2Test/Nutch-0.9-dev (Testing Nutch on Amazon EC2.; http://lucene.apache.org/nutch/bot.html; ec2test at lucene.com) 216.182.237.22  

O

OmniExplorer_Bot/6.68 (+http://www.omni-explorer.com) WorldIndexer 65.19.150.213  
Oracle Ultra Search 141.146.4.11  
Mozilla/5.0 (compatible; OnetSzukaj/5.0; +http://szukaj.onet.pl) 213.180.128.154 Language: pl, en;q=0.5, *;q=0.2

P

PeerFactor Crawler 88.191.11.81  
Python-urllib/1.16 208.223.208.181  
POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000) 64.239.7.216  
ping.blo.gs/2.0 66.218.65.40 Referer: http://blo.gs/ping.php
psycheclone 208.66.195.11  
psbot/0.1 (+http://www.picsearch.com/bot.html) 217.212.224.159  
217.212.224.165

R

Robozilla/1.0 207.200.81.166 (Referer: http://directory.mozilla.org)
RAMPyBot - www.giveRAMP.com/1.0 (RAMPyBot - www.giveRAMP.com; http://www.giveramp.com/bot.html; support@giveRAMP.com) 64.27.2.18  
REBOL View 1.2.48.3.1 84.163.170.119  

S

Syntryx ANT Scout Chassis Pheromone 64.92.202.124 This crawler has no signature, but sends its name through the referer.
Syntryx ANT Scout Chassis Pheromone; Mozilla/4.0 compatible crawler 216.7.179.20 This time the crawler sends a signature.
Snapbot/1.0 66.234.139.198  
38.98.19.83
Snappy/1.1 ( http://www.urltrends.com/ ) 205.138.199.126  
Sphere Scout&v4.0 (beta) - scout at sphere dot com 64.40.115.54  
64.40.115.55
SuperBot/4.6.0.69 (Windows XP) 194.105.24.56 (has referer information)
Shim-Crawler(Mozilla-compatible; http://www.logos.ic.i.u-tokyo.ac.jp/crawler/; crawl@logos.ic.i.u-tokyo.ac.jp) 157.82.254.2  
Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; info@szukacz.pl) 193.218.115.7 (uses the same IP? not a distributed crawl?)
NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; sycrawl@cs.washington.edu) 128.208.6.227  

T

TulipChain/6.03 (http://ostermiller.org/tulipchain/) Java/1.5.0 (http://java.sun.com/) Linux/2.6.15.5-x1 RPT-HTTPClient/0.3-3 85.186.168.8 Used in DMOZ.org; you can also log the referer
TurnitinBot/2.1 (http://www.turnitin.com/robot/crawlerinfo.html) 64.140.49.69  

U

UP.Browser/6.1.0.1.140 (Google CHTML Proxy/1.0) 64.233.178.136  

V

VSE/1.0 (testcrawler@hotmail.com) 24.3.56.88  

Y

Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html) 202.160.180.124  
Yahoo-Blogs/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html ) 209.191.83.2  
YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; http://publisher.yahoo.com/rssguide; users 0; views 0) 66.218.65.25  

W

WebRankSpider/1.37 (+http://ulm191.server4you.de/crawler/) 62.75.202.126  

Z

Zeusbot/0.8.1 (Ulysseek's web-crawling robot; http://www.zeusbot.com; agent@zeusbot.com) 217.113.244.119
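
Since the three columns above are separated only by spaces, an entry can be split back into signature, IP and observation by looking for the IP address. The sketch below is an assumption about how one might do that, not something the original list provides: `split_entry` is a hypothetical helper, and it takes the last whitespace-delimited dotted quad on the line as the IP, so that version numbers inside a signature (such as `SuperBot/4.6.0.69`) are not mistaken for addresses. It does not check that each octet is in the 0-255 range.

```python
import re

# A dotted quad that stands alone as a whitespace-delimited token.
IPV4 = re.compile(r'(?<!\S)(?:\d{1,3}\.){3}\d{1,3}(?!\S)')


def split_entry(line):
    """Split one list entry into (signature, ip, observation).

    Returns (line, None, "") when no IP is found, e.g. for the
    single-letter section headings. A line holding only an IP yields an
    empty signature, meaning it continues the previous entry.
    """
    matches = list(IPV4.finditer(line))
    if not matches:
        return line.strip(), None, ""
    last = matches[-1]  # the IP column comes after the signature
    signature = line[:last.start()].strip()
    observation = line[last.end():].strip()
    return signature, last.group(0), observation


if __name__ == "__main__":
    for entry in [
        "Exabot/2.0 193.47.80.39 About Exabot",
        "SuperBot/4.6.0.69 (Windows XP) 194.105.24.56 (has referer information)",
        "209.237.238.226",
    ]:
        print(split_entry(entry))
```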