AI bots are crawling websites and copying content to feed their language models. Not just headlines or meta descriptions, but whole blog posts, service pages, FAQs, anything they can reach. In most cases this happens without permission, without a visit that registers in your analytics, and without credit. The AI company gets the value. Your site gets nothing.
Cloudflare, which I use to protect and manage all the sites I host, introduced a significant response to this in 2025. If you have not come across Cloudflare before: it sits between your website and the wider internet, handling DNS, security, caching, and firewalls. Every visitor passes through it before reaching your server, which means you can filter traffic, block attacks, and improve speed before anything hits your hosting. The response came in two parts. First, there is now the option to block AI bots by default, covering crawlers from OpenAI and Anthropic, Google’s AI crawler, and several others. Second, Cloudflare has rolled out a pay-per-crawl feature that lets site owners charge AI companies for accessing their content at all. Pay-per-crawl uses the HTTP 402 status code, which in plain terms means “not without payment.” It is still a relatively new model, but it is an important one: it reasserts that the content on your site has value.
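To make the mechanics concrete, here is a minimal sketch of what a crawler sees when pay-per-crawl turns it away. The URL is a placeholder, and the exact headers and body Cloudflare returns will vary; the firm detail is the 402 status code itself.

```python
# Minimal sketch: how a crawler detects a pay-per-crawl refusal.
# The URL is a placeholder; the 402 status code is the signal that matters.
import requests

response = requests.get("https://example.com/blog/some-post")

if response.status_code == 402:
    # 402 Payment Required: the content exists, but access costs money
    print("Crawl refused: payment required")
else:
    print(f"Got {response.status_code} with {len(response.content)} bytes")
```

In other words, the server still answers, but with a price tag rather than the page.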
Cloudflare is not the only one reacting to this. The BBC sent a legal warning to AI startup Perplexity demanding deletion of all scraped BBC content, a record of what had been taken, and payment to cover the misuse. This is not a future policy debate. It is happening now, to real organisations, over real content.
I have updated every site I manage with a strict firewall rule. Unless a bot is on the allow list (things like Googlebot or Bingbot, which serve a legitimate purpose for your search visibility), it is blocked. AI bots, content scrapers, broken SEO tools, and fake browsers are denied at the Cloudflare edge and never reach the website itself. This is not a robots.txt file, which is essentially a polite request that well-behaved bots honour and bad actors simply ignore. It is a proper firewall. The results across the sites I manage have been consistent: spam traffic down, malicious login attempts almost eliminated, and server load reduced. None of it adds plugins or bloat to your WordPress install. It runs in the background and keeps things clean.
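For readers who run their own Cloudflare account, a simplified version of that kind of rule can be written in Cloudflare’s rule expression language. This is a sketch, not my production rule: the user agent list is illustrative rather than exhaustive, and cf.client.bot is Cloudflare’s built-in flag for verified good bots such as Googlebot and Bingbot.

```
(not cf.client.bot and (
  http.user_agent contains "GPTBot" or
  http.user_agent contains "ClaudeBot" or
  http.user_agent contains "CCBot" or
  http.user_agent contains "Bytespider"
))
```

Deployed as a custom firewall rule with the action set to Block, this denies the named AI crawlers at the edge while leaving verified search bots alone. A genuinely default-deny setup like the one described above needs more signals than a user agent list, but the shape of the logic is the same.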
Whether to block AI bots entirely is a decision worth thinking through rather than defaulting either way. For a professional services business, a financial planner, a local tradesperson, or a consultancy, there is a reasonable argument for allowing certain AI crawlers. If someone asks an AI assistant a relevant question and your content is referenced in the response, that can still lead to an enquiry, even without a direct click. For a content-heavy site, a blog, a portfolio, or anything where the writing itself is the value, there is much less reason to allow it. The AI system serves your content directly to the user, with no visit, no pageview, and no credit to you. That is your work being used for someone else’s product.
If you do not know what bots are visiting your site, it is worth finding out. Client logs I have reviewed have shown thousands of hits per day from AI tools and content scrapers that never generate a single real visit. All they do is consume bandwidth and lift content out of its context. Sorting this out properly is something I include as standard in my digital support service, and it is built into my website design process from day one. If you want a rough first look yourself, the sketch below shows the idea.
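A first pass does not need special tooling if you can reach your server’s access log. This is a rough sketch, assuming the common combined log format, where the user agent is the last quoted field on each line; the log path is a placeholder.

```python
# Rough sketch: count user agents in a web server access log.
# Assumes the combined log format, where the user agent is the
# final quoted field on each line. The path is a placeholder.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # adjust for your server

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        quoted = re.findall(r'"([^"]*)"', line)
        if quoted:
            counts[quoted[-1]] += 1  # last quoted field: the user agent

for agent, hits in counts.most_common(15):
    print(f"{hits:6d}  {agent}")
```

Names like GPTBot, ClaudeBot, CCBot, and Bytespider near the top of that list are AI crawlers at work. If you want to know what is crawling your site and what to do about it, get in touch and I will take a look.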