Robot Identification Algorithm

Current Identification Algorithm

The following algorithm is used to determine whether a user is an indexing (or other) robot. If the remote machine's identification (usually the User-Agent field of the HTTP request) or its IP address includes any of the following, it is considered a robot:

ABOUT.ASK.COM
AISEARCHBOT
ATRAXBOT
BINGBOT
CAMONTSPIDER
CAZOODLEBOT
CCBOT
CRAWLER
DISCOBOT
GIGABOT
GOOGLEBOT
LINGUEE+BOT
MJ12BOT
MLBOT
MSNBOT
NEXTGENSEARCHBOT
PICSEARCH.COM
PLONEBOT
SCOUTJET
SEARCHME.COM
SITEBOT
SITESUCKER
SLURP
SOGOU+WEB+SPIDER
XENU+LINK+SLEUTH
YANDEX
WEBVAC
65.55.230.xx (MSN Search Engine)
220.181.51.xx (Baidu Search Engine)
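
The check described above can be sketched in Python as follows. This is a minimal illustration, not the site's actual code: the names `ROBOT_TOKENS`, `ROBOT_IP_PREFIXES` and `is_robot` are hypothetical, and the sketch assumes that matching is a case-insensitive substring test, that spaces in the identification string are treated as "+" (which is how the multi-word entries in the list are written), and that "xx" in an IP entry means any value in the last octet.

```python
# Hypothetical sketch of the identification check; names are illustrative only.
ROBOT_TOKENS = (
    "ABOUT.ASK.COM", "AISEARCHBOT", "ATRAXBOT", "BINGBOT", "CAMONTSPIDER",
    "CAZOODLEBOT", "CCBOT", "CRAWLER", "DISCOBOT", "GIGABOT", "GOOGLEBOT",
    "LINGUEE+BOT", "MJ12BOT", "MLBOT", "MSNBOT", "NEXTGENSEARCHBOT",
    "PICSEARCH.COM", "PLONEBOT", "SCOUTJET", "SEARCHME.COM", "SITEBOT",
    "SITESUCKER", "SLURP", "SOGOU+WEB+SPIDER", "XENU+LINK+SLEUTH",
    "YANDEX", "WEBVAC",
)

# Assumption: "65.55.230.xx" means every address starting with "65.55.230.".
ROBOT_IP_PREFIXES = ("65.55.230.", "220.181.51.")


def is_robot(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request looks like it came from a listed robot."""
    # Assumption: compare in upper case with spaces written as "+",
    # so "Xenu Link Sleuth" matches the list entry "XENU+LINK+SLEUTH".
    normalized = user_agent.upper().replace(" ", "+")
    if any(token in normalized for token in ROBOT_TOKENS):
        return True
    # str.startswith accepts a tuple of prefixes.
    return remote_ip.startswith(ROBOT_IP_PREFIXES)
```

For example, `is_robot("Mozilla/5.0 (compatible; Googlebot/2.1)", "203.0.113.7")` matches on the GOOGLEBOT entry, while an ordinary browser string from the same address does not match anything in the list.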

In addition, several individual IP addresses have been blocked because of past behavior. This method is hardly secure, since a client can send any identification string it likes, but it does allow robot-friendly pages to be served and other behavior to be blocked.

Finally, for diagnostic and demonstration purposes, appending "?robot=yes" to the end of a URL lets you view the page as if it had been requested by a search engine robot. However, this only works while you are logged out of the system.
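
The override can be sketched as a small decision function. This is an illustrative sketch only: the function name `treat_as_robot` and its parameters are hypothetical, and it assumes the login state and the result of the normal identification check are already known to the caller.

```python
from urllib.parse import parse_qs, urlsplit


def treat_as_robot(url: str, logged_in: bool, detected_robot: bool) -> bool:
    """Decide whether to serve the robot view of a page (hypothetical helper)."""
    # A request already identified as a robot always gets the robot view.
    if detected_robot:
        return True
    # The "?robot=yes" override is ignored for logged-in users.
    if logged_in:
        return False
    # parse_qs maps each query parameter name to a list of its values.
    params = parse_qs(urlsplit(url).query)
    return "yes" in params.get("robot", [])
```

With this sketch, `treat_as_robot("https://example.org/page?robot=yes", logged_in=False, detected_robot=False)` selects the robot view, and the same URL requested while logged in does not. The address `example.org` is a placeholder.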