
If you’re concerned about AI bots scraping your website content for AI training, Cloudflare has a solution.
The company, which serves as a proxy for approximately 20% of the internet, has launched a new tool that blocks all AI bots from scraping text on a website. This tool is accessible to all Cloudflare customers, including those on the free tier.
As generative AI becomes more prevalent, companies require content to train chatbots. Many are resorting to web scrapers to extract text from websites for analysis (similar to what ChatGPT does with Reddit posts). While some companies are transparent about their web-scraping activities, others are not.
Last September, Cloudflare introduced a feature to block “malicious” AI web crawlers that scrape sites without permission. However, some companies have circumvented it by disguising their scrapers as legitimate traffic. The new tool aims to block all AI crawlers, even those that adhere to proper scraping protocols.
In June 2024, AI bots accessed around 39% of the top one million “internet properties” using Cloudflare, yet fewer than 3% of those properties had taken measures to block AI bots. The top four bots scraping Cloudflare sites were Bytespider, Amazonbot, ClaudeBot, and GPTBot.
Bytespider, owned by ByteDance (the company behind TikTok), is used to collect training data for large language models such as Doubao, a ChatGPT competitor. Amazonbot gathers data for Alexa's question-answering capabilities, ClaudeBot gathers training data for Claude, and GPTBot does the same for ChatGPT.
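To make the idea of identifying these crawlers concrete, here is a minimal Python sketch that checks whether a request's User-Agent header mentions one of the four bots named above. It is an illustration only and not Cloudflare's method: the function name `match_ai_crawler` and the sample User-Agent strings are hypothetical, and a plain substring check like this is exactly what a disguised scraper defeats by sending a different User-Agent.

```python
from typing import Optional

# Crawler names taken from the article. Real User-Agent strings carry more
# detail (versions, contact URLs), so we only look for the name itself.
AI_CRAWLER_TOKENS = ("Bytespider", "Amazonbot", "ClaudeBot", "GPTBot")


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler name found in the User-Agent, or None if there is none.

    Hypothetical helper for illustration: matching is a case-insensitive
    substring test, which only catches bots that identify themselves honestly.
    """
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    # Illustrative strings, not copied from any vendor's documentation.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for ua in samples:
        print(ua, "->", match_ai_crawler(ua))
```

A check like this only works for crawlers that announce themselves, which is why a tool that also recognizes disguised bots has to rely on more than the User-Agent header.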
For Cloudflare users, turning on the tool is straightforward. Go to the settings section of your dashboard, select “Security,” then “Bots.” There you will find a toggle labeled “AI Scrapers and Crawlers.” Switch it on to prevent AI bots from accessing your content.
Cloudflare notes that this feature will adapt automatically to detect the “fingerprints” of offending bots as they evolve.
The new tool is available to all Cloudflare users starting today.