5 days ago
AI audits and 'pay per crawl': How Cloudflare is trying to fix a 'broken' web model
An announcement from network giant Cloudflare is deepening a divide between the worlds of tech and content publishing, which are at odds over the data used to train AI platforms.
Cloudflare said websites hosted with the cloud company's tools will now block AI crawlers by default, stopping them from accessing and poaching content without permission.
'Upon sign-up with Cloudflare, every new domain will now be asked if they want to allow AI crawlers, giving customers the choice upfront to explicitly allow or deny AI crawlers access,' the company said.
'This significant shift means that every new domain starts with the default of control, and eliminates the need for webpage owners to manually configure their settings to opt out.'
Cloudflare first addressed concerns about AI data-scraping last year when it gave websites the option to block AI companies from poaching content.
'Now by default you have control over who crawls your site and what that information is used for,' said Stephanie Cohen, chief strategy officer at Cloudflare.
'The benefit of that is that it creates the conditions for a new business model of the internet to develop,' she told The National after Cloudflare introduced the new settings and options.
Some of the strongest proponents of AI tools, and many of the tools' creators, have justified data scraping, saying it is akin to the early days of search engines when controversy briefly surfaced over whether or not search companies should be able to index sites.
Others say that comparison isn't appropriate, because search engines didn't poach the contents of entire websites.
Additionally, during the early days of web browsers, search engines and the crawlers they deployed provided a framework that built much of the internet as we know it.
It was a win-win situation for the likes of Google, which delivered web traffic through internet searches, and for media companies, which provided information and sought to attract audiences.
The debut of OpenAI's ChatGPT in 2022 and other AI platforms turned that economic model on its head.
Instead of directing traffic to websites, AI summaries have quickly become a destination unto themselves, siphoning traffic from the same websites from which they scrape data.
Ms Cohen said publishers and content creators using Cloudflare's services soon noticed a major dip in web traffic.
'Not only was it getting more difficult to get web traffic – to the tune of it being 10 times harder – but it was also getting more difficult at a faster and faster rate,' she said.
The search-based economics of the web, built up over the last decade, started to erode over a period of six months, she said.
Ms Cohen said that in 2024 Cloudflare allowed users to see which AI companies were scraping their sites and to turn off that ability. This year the company is taking things further by introducing 'pay per crawl'.
The tool gives publishers and website operators the option of allowing AI scraping for free, charging for it 'at the configured domain-wide price,' or blocking scraping entirely.
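The three options can be pictured as a simple decision made for each crawler request. The sketch below is illustrative only and is not Cloudflare's implementation: the names `CrawlPolicy`, `Decision` and `decide`, the idea that a crawler states a maximum price it is willing to pay, and the mapping to HTTP status codes (200 OK, 402 Payment Required, 403 Forbidden) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CrawlPolicy(Enum):
    """The three choices a site owner has (names are hypothetical)."""
    ALLOW = "allow"    # let the AI crawler in for free
    CHARGE = "charge"  # require payment at the domain-wide price
    BLOCK = "block"    # refuse the AI crawler entirely


@dataclass
class Decision:
    status: int         # HTTP status the site would return (assumed mapping)
    charged_usd: float  # amount billed for this single request


def decide(policy: CrawlPolicy,
           domain_price_usd: float,
           max_price_usd: Optional[float] = None) -> Decision:
    """Resolve one crawler request against the site's policy.

    max_price_usd is the most the crawler says it will pay (an assumption
    for this sketch); None means the crawler offered no payment at all.
    """
    if policy is CrawlPolicy.ALLOW:
        return Decision(status=200, charged_usd=0.0)
    if policy is CrawlPolicy.BLOCK:
        return Decision(status=403, charged_usd=0.0)
    # CHARGE: serve the page only if the offer covers the domain-wide price.
    if max_price_usd is not None and max_price_usd >= domain_price_usd:
        return Decision(status=200, charged_usd=domain_price_usd)
    return Decision(status=402, charged_usd=0.0)
```

In this sketch a crawler that offers at least the configured price is served and billed exactly that price, while one that offers less, or nothing, is turned away with a payment-required response and is not charged.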
As AI development quickens, so too does the bad blood between media organisations and the tech firms driving the AI boom.
Several lawsuits have been filed. The New York Times has sued OpenAI and Microsoft for allegedly using its articles to power increasingly popular chatbots.