Blackhole for Bad Bots: block search engine spiders that ignore the robots protocol

I'm sure everyone still remembers how 360 Search, when it first launched, was caught ignoring the robots protocol and taken to task by Baidu. We won't judge who was right or wrong. Today we'll look at how to stop search engines that ignore the robots protocol from crawling content we don't want them to crawl.

Not long ago, a new plugin called Blackhole for Bad Bots was added to the official WordPress plugin directory. It exists to deal with exactly these unruly search engine spiders, and the way it works is quite interesting: a virtual link is declared off-limits in the robots.txt file. Once a spider attempts to access that link, the plugin blocks it from every other page on the site. Spiders that follow the rules never visit the link and can freely crawl the pages the site allows search engines to index.
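The robots.txt side of the trap can be illustrated with Python's standard `urllib.robotparser`. This is only a sketch: the trap path `/blackhole/` and the bot name `ExampleBot` are assumptions made up for the example, not the rule the plugin actually writes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap path, chosen for illustration only.
TRAP_PATH = "/blackhole/"

# A robots.txt that forbids every crawler from visiting the trap link.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {TRAP_PATH}
"""


def may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a rule-abiding crawler is allowed to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant spider checks before crawling and skips the trap link.
trap_allowed = may_fetch("https://example.com/blackhole/")
page_allowed = may_fetch("https://example.com/about/")
```

A spider that performs this check never requests the trap link, so only crawlers that skip the check can trigger the block.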

This amounts to a clever trap. If you follow the rules, you are welcome. If you ignore them and step into the trap, sorry, you are no longer welcome here. Better still, ordinary visitors cannot see the hidden link, and search engines that follow the robots protocol are unaffected.
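The trap logic itself can be sketched in a few lines. This is not the plugin's code (the plugin is written in PHP); the class name `Blackhole`, the trap path, and blocking by IP address are assumptions for the sake of the example.

```python
from dataclasses import dataclass, field

# Hypothetical trap path, chosen for illustration only.
TRAP_PATH = "/blackhole/"


@dataclass
class Blackhole:
    """Remembers which clients stepped into the trap."""

    blocked_ips: set = field(default_factory=set)

    def handle(self, ip: str, path: str) -> int:
        """Return an HTTP-style status code for a request."""
        if ip in self.blocked_ips:
            return 403  # already trapped: every page is refused
        if path.startswith(TRAP_PATH):
            self.blocked_ips.add(ip)  # stepped into the trap
            return 403
        return 200  # well-behaved client, serve the page


trap = Blackhole()
first = trap.handle("203.0.113.7", "/about/")        # normal page, allowed
sprung = trap.handle("203.0.113.7", "/blackhole/")   # visits the trap link
after = trap.handle("203.0.113.7", "/about/")        # now refused everywhere
other = trap.handle("198.51.100.2", "/about/")       # other clients unaffected
```

Because nothing is blocked until the trap link is requested, ordinary visitors and compliant spiders never notice the mechanism.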

Features

  • Easy to set up
  • Clean code
  • Built on the WordPress API
  • Focused functionality, no bloat
  • Lightweight, high performance, high flexibility
  • You can easily reset the blocked spider list
  • Individual blocked spiders can be deleted
  • Configured through a single settings page, without cluttering the admin dashboard
  • Works quietly in the background and does not affect normal visitors
  • Optionally receive an email alert with a WHOIS lookup for each blocked bot
  • All major search engines have been added to the whitelist and will not be blocked
  • Customize the message displayed to blocked search engines
  • Reset plugin settings with one click

If your website is not built on WordPress, that's fine. As long as it runs on PHP, you can use Blackhole's standalone PHP version to achieve the same result.

Whitelist

By default, the plugin never blocks the mainstream search engines listed below, which are on its whitelist out of the box. You can also add other search engines to the whitelist manually in the settings.

  • AOL.com
  • Baidu
  • Bingbot/MSN
  • DuckDuckGo
  • Googlebot
  • Teoma
  • Yahoo!
  • Yandex
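A whitelist check of this kind might look like the sketch below. It assumes matching is done by case-insensitive substring against the User-Agent header; the token list and the function name `is_whitelisted` are illustrative guesses, not the plugin's actual defaults.

```python
# Illustrative User-Agent fragments for the engines listed above (assumed).
WHITELIST_TOKENS = (
    "aolbuild", "baidu", "bingbot", "msnbot", "duckduckgo",
    "googlebot", "teoma", "slurp", "yandex",
)


def is_whitelisted(user_agent: str, extra_tokens: tuple = ()) -> bool:
    """Return True if the User-Agent contains any whitelisted token."""
    ua = user_agent.lower()
    tokens = WHITELIST_TOKENS + tuple(extra_tokens)
    return any(token.lower() in ua for token in tokens)


google = is_whitelisted(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
unknown = is_whitelisted("EvilScraper/1.0")
# Entries added by hand work the same way as the defaults.
custom = is_whitelisted("FriendlyCrawler/2.0", extra_tokens=("friendlycrawler",))
```

Keep in mind that a User-Agent header can be spoofed, so a check like this only protects legitimate crawlers from being blocked by accident.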
