Bad webcrawlers

Websites are often visited by programs that extract information from them. These programs are called webcrawlers or spiders. One example is Googlebot: Google needs to visit your website to learn about your content so that it can add your pages to its index.
Bots can generate a lot of traffic and put strain on your server. In the case of Google that is a fair trade, because you get visitors in return. Other spiders just cost you resources.

Webmasters can tell these spiders not to visit their website. By adding a robots.txt file to the website, they can either block parts of it or disallow the whole website for specific spiders.
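As a rough illustration of both options, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` module and asks which requests it permits. The file content, the choice of MJ12bot as the blocked spider, and the `/private/` path are assumptions made up for this example, not rules taken from any real website.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the rules above let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# MJ12bot is disallowed everywhere; every other spider only loses /private/.
print(allowed("MJ12bot", "/index.html"))
print(allowed("Googlebot", "/index.html"))
print(allowed("Googlebot", "/private/report.html"))
```

Keep in mind that robots.txt is a request rather than an enforcement mechanism: a spider has to choose to read the file and honor it.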
These statistics are collected from the robots.txt files of the websites in our database. We count the crawlers that have been denied all access to a website. Websites that do not have a robots.txt file are excluded.
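A count like this could be sketched as follows. This is not the code behind the statistics, only a minimal approximation under stated assumptions: a crawler counts as banned when its group contains the rule `Disallow: /`, `Allow` overrides are ignored, the wildcard `*` is not counted as a crawler, and the percentage is taken over the websites that have a robots.txt file. The names `fully_blocked_agents` and `ban_percentages` are hypothetical.

```python
from collections import Counter


def fully_blocked_agents(robots_txt):
    """Return lowercase user-agent names whose group contains 'Disallow: /'."""
    blocked = set()
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                # Assumption: the wildcard group is not counted as a crawler.
                blocked.update(a for a in agents if a != "*")
    return blocked


def ban_percentages(robots_files):
    """Map each banned crawler to the share of sites (in %) that ban it.

    `robots_files` holds one entry per website: the robots.txt text, or
    None when the site has no robots.txt (such sites are excluded).
    """
    present = [text for text in robots_files if text is not None]
    if not present:
        return {}
    counts = Counter()
    for text in present:
        counts.update(fully_blocked_agents(text))
    return {agent: 100 * n / len(present) for agent, n in counts.items()}
```

Because one website can ban several crawlers at once, percentages computed this way do not have to add up to 100%.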
Most banned webcrawlers
# Spider/Crawler Percentage
1 Nutch (Webcrawler from Apache) 21.68%
2 Mj12bot (Majestic 12 Search Engine) 18.46%
3 Baiduspider (Chinese Search Engine Baidu) 15.94%
4 Ahrefsbot (SEO website Ahrefs) 13.22%
5 Yandex (Russian Search Engine) 12.99%
6 Nerdybot 10.66%
7 Ia_archiver (Wayback Machine) 7.8%
8 Semrushbot 5.76%
9 Blexbot 5.52%
10 Bingbot 5.31%