Robot Identification Algorithm

Current Identification Algorithm

The following algorithm is used to determine whether a request comes from an indexing (or other) robot. If the remote machine identification (usually the user agent field of the HTTP request) includes any of the following strings, or the request comes from one of the IP address ranges listed at the end, the requester is considered a robot:

  • ABOUT.ASK.COM
  • AISEARCHBOT
  • ATRAXBOT
  • BINGBOT
  • CAMONTSPIDER
  • CAZOODLEBOT
  • CCBOT
  • CRAWLER
  • DISCOBOT
  • GIGABOT
  • GOOGLEBOT
  • LINGUEE+BOT
  • MJ12BOT
  • MLBOT
  • MSNBOT
  • NEXTGENSEARCHBOT
  • PICSEARCH.COM
  • PLONEBOT
  • SCOUTJET
  • SEARCHME.COM
  • SITEBOT
  • SITESUCKER
  • SLURP
  • SOGOU+WEB+SPIDER
  • XENU+LINK+SLEUTH
  • YANDEX
  • WEBVAC
  • 65.55.230.xx (MSN Search Engine)
  • 220.181.51.xx (Baidu Search Engine)
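The check described above can be sketched in a few lines of Python. This is only an illustration, not the system's actual code: the function name `is_robot`, the shortened token list, and the case-insensitive comparison (suggested by the uppercase entries, but not stated) are all assumptions.

```python
# Hypothetical sketch of the robot check; names are illustrative only.
# A subset of the tokens from the list above, kept short for readability.
ROBOT_TOKENS = ("BINGBOT", "CRAWLER", "GOOGLEBOT", "MSNBOT", "SLURP", "YANDEX")

# The "xx" ranges from the list, expressed as address prefixes.
ROBOT_IP_PREFIXES = ("65.55.230.", "220.181.51.")


def is_robot(user_agent: str, ip_address: str) -> bool:
    """Return True if the request looks like it came from a robot."""
    # Assumption: the user agent is upper-cased before comparison.
    agent = (user_agent or "").upper()
    if any(token in agent for token in ROBOT_TOKENS):
        return True
    # str.startswith accepts a tuple and matches any of the prefixes.
    return (ip_address or "").startswith(ROBOT_IP_PREFIXES)
```

For example, a user agent containing "Googlebot/2.1" would be flagged by the substring test, while an ordinary browser arriving from 65.55.230.17 would be flagged by the address prefix.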

In addition, several individual IP addresses have been blocked due to past behavior. This identification is hardly secure, since a client can send any user agent string it likes, but it does allow robot-friendly pages to be served and unwanted behavior to be blocked.

Finally, for diagnostic and demonstration purposes, appending "?robot=yes" to the end of the URL displays a page as if it were requested by a search engine robot. However, this only works while you are logged out of the system.
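The diagnostic override might be modeled as follows. This is a sketch under stated assumptions: the function name `view_as_robot` and its parameters are hypothetical, and the exact match on the value "yes" simply mirrors the URL suffix given above.

```python
from urllib.parse import parse_qs, urlparse


def view_as_robot(url: str, logged_in: bool, detected_robot: bool = False) -> bool:
    """Decide whether to serve the robot view of a page (hypothetical helper)."""
    # A request already identified as a robot always gets the robot view.
    if detected_robot:
        return True
    # The override is ignored for logged-in users.
    if logged_in:
        return False
    # Look for robot=yes in the query string.
    query = parse_qs(urlparse(url).query)
    return "yes" in query.get("robot", [])
```

Under these assumptions, `view_as_robot("http://example.org/item?robot=yes", logged_in=False)` requests the robot view, while the same URL from a logged-in session does not.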