There’s been a quiet shift happening online. AI bots are crawling websites and feeding the content into their language models. Not just headlines or meta descriptions, but whole blog posts, service pages, FAQs – anything they can reach. In most cases it happens without permission, without a visit, and without credit. The AI company gets the value. Your site gets nothing.
This isn’t some far-off future problem. It’s already happening. And unless you’ve taken specific steps to stop it, your content is probably being scraped right now.
Cloudflare, the company I use to protect and manage all the sites I host, has just introduced a way to stop it. If you’ve not heard of them before, Cloudflare sits between your website and the wider internet. It handles things like DNS, security, caching, and firewalls. Every visitor to your site passes through Cloudflare first, which means you can filter traffic, block attacks, and speed things up before they ever hit your server. It’s fast, lightweight, and properly configurable. That’s why I use it.
As of June 2025, Cloudflare now blocks AI bots by default unless you choose to allow them. That includes OpenAI, Anthropic, Google’s new AI crawler, and several others. They’ve also started rolling out a new tool called pay-per-crawl, which gives you the option to charge AI companies for accessing your site at all. Full details are here: https://blog.cloudflare.com/introducing-pay-per-crawl/
They’re not the only ones reacting to this. The BBC recently sent a legal warning to AI startup Perplexity, telling them to delete all scraped BBC content immediately. They’ve demanded a record of what’s been taken, removal of everything copied, and payment to cover the misuse. This is no longer about future policy or academic debate. It’s happening now. You can read the story at: https://www.bbc.co.uk/news/articles/cvg885p923jo
I’ve already updated every site I manage with a strict firewall rule. Unless a bot is on the allow list – things like Googlebot or Bingbot – it’s blocked. AI bots, content scrapers, broken SEO tools and fake browsers are all denied at the Cloudflare edge. They don’t even reach the website. This isn’t a robots.txt file. This is a proper firewall that stops junk traffic before it ever touches your hosting.
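To give a feel for what a rule like this looks like, here’s a minimal sketch of a Cloudflare custom rule expression (action: Block). It’s illustrative, not my exact rule: `cf.client.bot` is Cloudflare’s flag for verified good bots like Googlebot and Bingbot, and the user-agent strings below are a handful of real AI crawlers, not a complete list – check current crawler names before relying on them.

```
(http.user_agent contains "GPTBot")
or (http.user_agent contains "ClaudeBot")
or (http.user_agent contains "CCBot")
or (http.user_agent contains "PerplexityBot")
```

A stricter allow-list approach pairs a rule like this with a skip rule for verified crawlers (`cf.client.bot`), so anything that isn’t on the known-good list gets challenged or blocked at the edge.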
The results speak for themselves. Spam traffic is down. Malicious login attempts have almost stopped. Server load is lower. And because none of this happens in WordPress, it doesn’t add bloat or plugins to your install. It just works in the background and keeps your site clean.
There’s no real impact for real users: the site loads fast and behaves exactly as expected. If someone’s traffic looks odd or suspicious, they’ll be challenged to prove they’re human, or blocked outright. It’s simple and it works.
Then there’s the question of whether to allow AI bots at all. That depends entirely on what your site is for. If you run a professional service business – financial planning, legal advice, consultancy, local trades – there may be a case for allowing AI bots. If someone asks an AI assistant a relevant question, and it includes a quote or a link to your content, that might lead to an enquiry. There’s a small SEO-style benefit there: visibility without the click, but still useful if it’s credited properly.
But if you run a blog, a personal project, a content-heavy portfolio or a hobby site – where the content itself is the value – letting AI tools scrape it is just giving your work away. The AI system serves your content directly to the user. There’s no visit, no pageview, no credit. You lose the audience and get nothing back. That’s where I strongly recommend blocking AI bots completely. Content like that should be protected.
Cloudflare’s pay-per-crawl feature adds a third option. You can require payment before any AI crawler is allowed to access your content. It’s still in beta and uses the HTTP 402 “Payment Required” status code – in effect, “we don’t serve this unless you pay.” It’s a new model, but an important step. It means your content has value again.
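To make the mechanism concrete, here’s a small Python sketch of how a cooperative crawler might interpret the responses a pay-per-crawl site can send. The status codes are standard HTTP; the decision strings are my own illustrative labels, not part of any spec.

```python
# Sketch: how a cooperative crawler might react to a pay-per-crawl site.
# 402 ("Payment Required") signals the crawler must pay for access;
# 403 is an outright block. The return strings are illustrative only.

def classify_crawl_response(status: int) -> str:
    """Map an HTTP status code to a crawl decision."""
    if status == 200:
        return "content served"
    if status == 402:
        return "payment required - negotiate or skip"
    if status == 403:
        return "blocked outright"
    return "other"

print(classify_crawl_response(402))  # payment required - negotiate or skip
```

The key point is that 402 isn’t an error to retry around – it’s a price signal, and a well-behaved crawler either pays or moves on.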
If you don’t know what bots are crawling your site, you should. I’ve seen client logs with thousands of hits per day from AI tools, analytics scrapers, and marketing crawlers that never generate a single real visit. All they do is waste bandwidth and put your work in someone else’s hands.
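If you want a quick look at this yourself, a few lines of Python over your access logs will show which user agents hit you hardest. This is a self-contained sketch using made-up log lines in the common combined format; point the pattern at your real log file to get actual numbers.

```python
import re
from collections import Counter

# Sketch: tally user agents from access-log lines (combined log format)
# to see which bots hit the site most. These sample lines are made up.
SAMPLE_LOG = '''\
1.2.3.4 - - [01/Jul/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "GPTBot/1.0"
1.2.3.4 - - [01/Jul/2025:10:00:01 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "GPTBot/1.0"
5.6.7.8 - - [01/Jul/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
'''

# In combined log format, the final quoted field is the user agent.
ua_pattern = re.compile(r'"([^"]*)"$')

counts = Counter()
for line in SAMPLE_LOG.splitlines():
    m = ua_pattern.search(line)
    if m:
        counts[m.group(1)] += 1

for ua, n in counts.most_common():
    print(f"{n:>4}  {ua}")
```

Even this crude count usually makes the pattern obvious: a handful of crawler user agents accounting for the bulk of requests, with no corresponding human visits in your analytics.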
That’s why I’ve made this a standard part of my digital support service. Every site I manage gets a proper Cloudflare setup, with firewalls, bot controls, and clear rules. If I’m building your site from scratch, it’s baked in from day one. It links into my website design process and ties into the analytics and performance work I do as well.
If you want to block the bots that are scraping your site – or find out who they are, what they’re taking, and whether they’re worth allowing – drop me a message. I’ll show you what’s going on and help you take back control.




