How to block AI crawlers from your site
Artificial intelligence (AI) companies have to get their data from somewhere… and unfortunately, that includes scraping various sites without the site owners’ permission. Nobody creates blog posts or artwork online so they can be fed into AI generators’ slop.
While there isn’t a foolproof way of preventing your sites from becoming AI fodder, there are a few ways to thwart some of it. One is adding directives to your site’s robots.txt file to block AI crawlers from accessing your site. If you’re wondering, robots.txt is a text file that sits in the root (base) directory of a site and tells web crawlers which parts of the site they’re allowed to access. Unfortunately, it has one flaw: robots.txt relies on voluntary compliance, so malicious crawlers might not pay it any attention. That said, it’s better than nothing.
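To give a sense of what that looks like, here’s a minimal sketch of a robots.txt file (the crawler name and path here are placeholders for illustration, not specific recommendations):

User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /

The first rule asks every crawler to stay out of the /private/ directory; the second asks a hypothetical crawler called ExampleBot to stay away from the entire site. Blocking AI crawlers works the same way: you list each one by its user-agent name and disallow everything.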
Neil Clarke’s blog explains what text to add to robots.txt files, and how to do so depending on what type of site you have (self-hosted WordPress, Wix, etc.). It also keeps a regularly updated list of crawlers to block.
I use Yoast SEO, a popular WordPress search engine optimization plugin, on my site. (Never mind the current state of online search.) So I used the plugin to update my site’s robots.txt file. Here’s my take on Yoast’s instructions:
In the WordPress admin area, go to Dashboard > Yoast SEO > Tools, then select “File editor.”
If there isn’t already a robots.txt file, click the “create robots.txt” button.
In the text box, starting on a blank line, enter each AI crawler you want to block, in the following format (a filled-in example follows these steps):
User-agent: name_of_AI_crawler
Disallow: /

User-agent: name_of_another_AI_crawler
Disallow: /
After you’re finished, select “Save changes to robots.txt.”
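For example, a filled-in version of the third step might look like the following. GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), and Google-Extended (Google’s AI training control token) are a few commonly blocked AI crawler user-agents, but this is far from a complete list; the names change over time, so check Neil Clarke’s list for the current set:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /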
If you don't use Yoast (and are comfortable with a bit more technical work), editing your site's robots.txt file directly in a text editor, using the same format as in the third step above, works just as well.
While this isn’t a perfect solution, I hope it helps.