Latest news with #webscraping

The Prompt: Can Meta Hire Its Way To Superintelligence?

Forbes

02-07-2025

Welcome back to The Prompt.

Cloudflare, the tech platform that powers millions of websites representing about 20% of the internet, announced yesterday that it will by default block AI companies' crawlers from scraping content without permission. It's a significant move: AI juggernauts like OpenAI, Anthropic and Meta have scoured the corners of the internet for data to power their AI models, often depriving websites of the traffic (and associated revenue) they would otherwise have gotten. Publishers like Condé Nast, TIME, The Associated Press and The Atlantic, as well as tech companies like Pinterest and Quora, have expressed their support for Cloudflare's shift to permission-based crawling. The company also announced a program called Pay Per Crawl that would allow website creators to get paid for their content.

Let's get into the headlines.

PEAK PERFORMANCE

Microsoft claims that its AI system, called 'Microsoft AI Diagnostic Center,' can diagnose diseases four times more accurately than a group of experienced doctors, at a significantly lower cost. The claim is based on an assessment of how accurately the AI tool diagnosed about 300 complex cases previously published in the New England Journal of Medicine, compared with physicians. However, doctors in the study were asked not to use any additional tools to aid their diagnosis, which doesn't reflect real-world scenarios. Healthcare is a burgeoning use case for AI, with Microsoft Copilot and Bing receiving about 50 million health-related queries per month.

ETHICS + LAW

AI-generated videos that portray Black women as primates and perpetuate racist stereotypes are amassing millions of views on platforms like Instagram and TikTok, Wired reported. The videos, part of a social media trend called 'Bigfoot Baddie,' were generated by Google's popular AI video generator, Veo 3.
TALENT RESHUFFLES

Anysphere, the company behind fast-growing AI coding tool Cursor, has hired two engineers from Anthropic who previously worked on Claude Code, The Information reported. Cursor, which is used by developers at top AI companies to create programs and write code, is also a customer of Anthropic's powerful coding models.

AI DEAL OF THE WEEK

Genesis AI, which is building an AI model for robotics, has raised a $105 million seed round led by Eclipse and Khosla Ventures.

DEEP DIVE

Mark Zuckerberg Sparks An AI Talent War With $100 Million Offers

As the race to build powerful AI models and launch impressive products intensifies, companies are shelling out top dollar for the researchers building these systems. On Monday, Alexandr Wang, Meta's newly hired Chief AI Officer and former CEO of data labelling giant Scale AI, announced the creation of a new lab within Meta that aims to build so-called 'superintelligence': an AI system that outperforms humans across a range of cognitive tasks, including creativity and problem solving. (That's different from artificial general intelligence, an AI system that can match human cognitive abilities.) Joining Wang are 11 top researchers whom Meta CEO Mark Zuckerberg has freshly poached from leading AI labs including OpenAI, Anthropic and Google DeepMind with shiny multi-million dollar offers. In the past few weeks, Zuckerberg has personally reached out to dozens of researchers and engineers and offered eye-popping multi-million dollar pay packages, according to multiple reports. The researchers' names can be found on 'The List,' the billionaire's compilation of the brightest minds and hidden geniuses in the field of AI.
The Facebook founder has offered top talent $300 million over four years, with $100 million in total compensation (including equity) in the first year, Wired recently reported. Meta's initiative also raises questions about whether 'The List' actually has the right names on it. Richard Socher, a pioneer in natural language processing and CEO of tells me that while the talent pool for AI engineers is relatively small, companies like Meta are scouting for talent in the most 'obvious places' like OpenAI, which ends up being expensive. 'Not everyone who's joined OpenAI as employee number 500 is more qualified than someone in a smaller startup,' he said. That said, not everyone at OpenAI is a target: Alex Nichol, a deep learning researcher at the startup, posted on X, 'Kinda offended that Meta didn't try to recruit me… (not that I would accept, but it's nice to feel recognized)'. While the hires may seem flashy, they are entirely within reach for a tech giant like Meta, whose near-unlimited financial resources and unfettered access to the powerful chips used to run these AI models let it woo top talent, even from cushy jobs at juggernauts like OpenAI. But not everyone is convinced that monetary incentives are enough to build a superstar AI team and create the best AI products. Two former Meta AI employees told Forbes they are unconvinced that huge sums of cash can motivate researchers to build superintelligence. 'You want to attract people who care,' one of them said. Meta itself has lost a lot of its top AI talent over the years to people who left to start their own companies or to join rivals like OpenAI, the former Meta AI employee told Forbes. 'A lot of people left to go to OpenAI…. This is Mark trying to undo the loss of talent,' they said.

WEEKLY DEMO

In an experiment called Project Vend, Anthropic let its flagship AI model Claude run a small automated physical store in its San Francisco office.
The AI system, which had access to different external programs like email, was responsible for maintaining the inventory, setting prices and carrying out transactions. But it ran into a series of hiccups, including inventing a fake Venmo account, stocking and selling metal cubes after a customer prompted it to, and giving discounts to persuasive customers.

MODEL BEHAVIOR

People are using AI tools like ChatGPT as 'trip sitters' (a term for a sober person who monitors someone under the influence of psychedelics) while they consume psychedelics like magic mushrooms, MIT Technology Review reported. During their trip, people chat with AI tools to share their feelings and calm themselves. But adding AI into the mix could result in a 'dangerous psychological cocktail,' according to multiple psychotherapists.

Cloudflare Sidesteps Copyright Issues, Blocking AI Scrapers By Default

Forbes

02-07-2025

Internet infrastructure company Cloudflare is striking back on behalf of content creators by blocking AI scrapers by default. Web scrapers are bots that crawl the internet, collecting and cataloguing content of all types; AI firms use them to gather material to train their models. Now, though, Cloudflare is letting website owners choose whether AI crawlers can access their content, and decide how AI companies can use it. Owners can allow crawlers for certain purposes, such as search, but block others. AI companies will have to obtain explicit permission from a website before scraping.

"Original content is what makes the internet one of the greatest inventions in the last century, and it's essential that creators continue making it," said Matthew Prince, co-founder and CEO of Cloudflare. "AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate. This is about safeguarding the future of a free and vibrant internet with a new model that works for everyone."

The company is also introducing "pay per crawl," the ability to charge AI companies for access. Currently in beta, it means that every time an AI crawler requests content, it must either present payment intent via request headers to gain access, or receive a "402 Payment Required" response with pricing. Users will be able to waive charges for specific crawlers as needed, which is useful if they want to let a certain crawler through for free or take part in a content partnership. Crucially, the new measures rely not on copyright law, currently a legal minefield when it comes to the use of data for AI, but on standard contract law. The move has been welcomed by content owners and creators, with dozens of media organizations and others saying they plan to sign up.
"Cloudflare's innovative approach to block AI crawlers is a game-changer for publishers and sets a new standard for how content is respected online. When AI companies can no longer take anything they want for free, it opens the door to sustainable innovation built on permission and partnership," said Roger Lynch, CEO of Condé Nast. "This is a critical step toward creating a fair value exchange on the Internet that protects creators, supports quality journalism and holds AI companies accountable."

However, with Cloudflare used by millions of organizations around the world, the move looks like bad news for AI companies. "This long-awaited feature by Cloudflare is a true disaster for many GenAI vendors, which may be fatal to the current business models of GenAI," said Ilia Kolochenko, CEO at ImmuniWeb and an adjunct professor of cybersecurity at Capitol Technology University in Maryland. "Most GenAI vendors will soon face a tough reality: paying a fair price for high-quality training data while staying profitable. In view of the formidable competition emanating from China, many Western GenAI companies may simply quit the business as economically unviable."
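The pay-per-crawl exchange described above amounts to an HTTP negotiation: a crawler either signals up front what it is willing to pay, or gets a "402 Payment Required" response quoting a price. The Python below is a minimal sketch of that decision as a pure function, so it can be tested without a server. It rests on loud assumptions: the header names (`crawler-max-price`, `crawler-price`, `crawler-charged`), the price and the crawler names are hypothetical placeholders rather than Cloudflare's documented interface, and real payment settlement and verification of which crawler is actually asking are left out entirely.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical header names, for illustration only; they are NOT taken
# from Cloudflare's documentation.
MAX_PRICE_HEADER = "crawler-max-price"  # crawler -> site: most it will pay
PRICE_HEADER = "crawler-price"          # site -> crawler: price quoted on a 402
CHARGED_HEADER = "crawler-charged"      # site -> crawler: amount billed on a 200


@dataclass
class Response:
    status: int
    headers: dict[str, str] = field(default_factory=dict)


def parse_price(value: Optional[str]) -> Optional[Decimal]:
    """Parse a price header; return None if missing, malformed or not finite."""
    if value is None:
        return None
    try:
        amount = Decimal(value)
    except InvalidOperation:
        return None
    return amount if amount.is_finite() else None


def handle_crawl(crawler: str, request_headers: dict[str, str], price: Decimal,
                 free_crawlers: frozenset[str] = frozenset()) -> Response:
    """Decide how to answer one AI-crawler request under a pay-per-crawl policy."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}

    # The site owner can waive the charge for chosen crawlers,
    # e.g. a content partner or a crawler they want to let through for free.
    if crawler in free_crawlers:
        return Response(200)

    # Payment intent: the crawler states the most it will pay for this request.
    max_price = parse_price(headers.get(MAX_PRICE_HEADER))
    if max_price is not None and max_price >= price:
        # Bill the site's listed price, not the crawler's ceiling.
        return Response(200, {CHARGED_HEADER: str(price)})

    # No (or insufficient) payment intent: refuse and quote the price.
    return Response(402, {PRICE_HEADER: str(price)})
```

With a price of `Decimal("0.01")`, a bare request gets a 402 quoting `0.01`, while a request carrying `crawler-max-price: 0.05` gets a 200 and is charged `0.01`. Using `Decimal` instead of `float` keeps the price comparison exact, which matters once real money is attached.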

Cloudflare Is Blocking AI Crawlers by Default

WIRED

01-07-2025

Jul 1, 2025 6:00 AM

The age of the AI scraping free-for-all may be coming to an end, at least if Cloudflare gets its way.

Last year, internet infrastructure firm Cloudflare launched tools enabling its customers to block AI scrapers. Today the company has taken its fight against permissionless scraping several steps further. It has switched to blocking AI crawlers by default for its customers and is moving forward with a Pay Per Crawl program that lets customers charge AI companies to scrape their websites.

Web crawlers have trawled the internet for information for decades. Without them, people would lose vitally important online tools, from Google Search to the Internet Archive's invaluable digital preservation work. But the AI boom has produced a corresponding boomlet in AI-focused web crawlers, and these bots scrape web pages so frequently that their traffic can resemble a DDoS attack, straining servers and knocking websites offline. Even when websites can handle the heightened activity, many do not want AI crawlers scraping their content, especially news publications that are demanding that AI companies pay to use their work.

'We've been feverishly trying to protect ourselves,' says Danielle Coffey, president and CEO of the trade group News Media Alliance, which represents several thousand North American outlets.

So far, over 1 million customer websites have activated Cloudflare's older AI-bot-blocking tools, Will Allen, Cloudflare's head of AI control, privacy, and media products, tells WIRED. Now millions more will have the option of keeping bot blocking as their default. Cloudflare also says it can identify even 'shadow' scrapers that AI companies have not publicized. The company noted that it uses a proprietary combination of behavioral analysis, fingerprinting, and machine learning to classify AI bots and separate them from 'good' bots.
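Cloudflare's classifier is proprietary, so it cannot be reproduced here, but the policy that a classification feeds into can be sketched. The Python below is a minimal, hypothetical illustration of default-deny for AI crawlers with site-owner overrides: AI-training crawlers are blocked unless the owner opts that purpose in or switches the default off, while search crawlers and ordinary visitors pass. Assumptions: the purpose labels and the tiny lookup table are invented for the example; `GPTBot`, `ClaudeBot` and `Googlebot` are real crawler user-agent tokens (OpenAI, Anthropic and Google respectively), but matching on the User-Agent header is a naive stand-in for the behavioral analysis, fingerprinting and machine learning Cloudflare describes.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


# Hypothetical, deliberately tiny table mapping user-agent tokens to the
# crawler's declared purpose. Real-world classification is far more involved.
KNOWN_CRAWLERS = {
    "gptbot": "ai-training",
    "claudebot": "ai-training",
    "googlebot": "search",
}


def classify(user_agent: str) -> str:
    """Return the declared purpose for a user agent, or 'unknown'."""
    ua = user_agent.lower()
    for token, purpose in KNOWN_CRAWLERS.items():
        if token in ua:
            return purpose
    return "unknown"


def decide(user_agent: str,
           allowed_purposes: frozenset[str] = frozenset({"search"}),
           block_ai_by_default: bool = True) -> Action:
    """Apply a default-deny policy to AI crawlers, with site-owner overrides."""
    purpose = classify(user_agent)
    if purpose == "ai-training":
        # Blocked unless the owner opted this purpose in or turned the default off.
        if block_ai_by_default and purpose not in allowed_purposes:
            return Action.BLOCK
        return Action.ALLOW
    if purpose == "search" and purpose not in allowed_purposes:
        # The owner chose to exclude search crawlers as well.
        return Action.BLOCK
    # Unrecognised clients, including ordinary browsers, are let through.
    return Action.ALLOW
```

Because any client can put whatever it likes in its User-Agent header, a 'shadow' scraper would pass straight through this sketch's `unknown` branch; that weakness is why detection built on behavior and fingerprints, rather than self-declared names, matters.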
A widely used web standard called the Robots Exclusion Protocol, often implemented through a robots.txt file, helps publishers block bots on a case-by-case basis, but following it is not legally required, and there's plenty of evidence that some AI companies try to evade efforts to block their scrapers. The protocol 'is ignored,' Coffey says. According to a report from the content licensing platform Tollbit, which offers its own marketplace for publishers to negotiate with AI companies over bot access, AI scraping is still on the rise, including scraping that ignores the protocol: Tollbit found that over 26 million scrapes ignored it in March 2025 alone. In this context, Cloudflare's shift to blocking by default could prove a significant roadblock to surreptitious scrapers and could give publishers more leverage to negotiate, whether through the Pay Per Crawl program or otherwise.

'This could dramatically change the power dynamic. Up to this point, AI companies have not needed to pay to license content, because they've known that they can just take it without consequences,' says Atlantic CEO (and former WIRED editor in chief) Nicholas Thompson. 'Now they'll have to negotiate, and it will become a competitive advantage for the AI companies that can strike more and better deals with more and better publishers.'

AI startup ProRata, which operates an AI search engine, has agreed to participate in the Pay Per Crawl program, according to CEO and founder Bill Gross. 'We firmly believe that all content creators and publishers should be compensated when their content is used in AI answers,' Gross says. Of course, it remains to be seen whether the big players in the AI space will participate in a program like Pay Per Crawl, which is in beta. (Cloudflare declined to name current participants.)
Companies like OpenAI have struck licensing deals with a variety of publishing partners, including WIRED parent company Condé Nast, but the specific details of these agreements have not been disclosed, including whether they cover bot access. Meanwhile, there's an entire online ecosystem of tutorials, aimed at web scrapers, on how to evade Cloudflare's bot-blocking tools. As the blocking default rolls out, these efforts are likely to continue. Cloudflare emphasizes that customers who do want to let the robots scrape unimpeded will be able to turn off the blocking setting. 'All blocking is fully optional and at the discretion of each individual user,' Allen says.
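The Robots Exclusion Protocol discussed above only works if a crawler chooses to consult it. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate an illustrative robots.txt that shuts out two AI crawlers while leaving the site open to everyone else. The file contents, the `may_fetch` helper and the URLs are made up for the example; the point it illustrates is that compliance is a check the crawler runs on itself, so a scraper that never calls `can_fetch` is not stopped at all, which is why network-level enforcement such as Cloudflare's changes the picture.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block two AI crawlers by their product tokens,
# allow every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(crawler_token: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Answer the question a *compliant* crawler asks before fetching a URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(crawler_token, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(bot, may_fetch(bot, "https://example.com/news/story"))
```

Running it prints `False` for GPTBot and ClaudeBot and `True` for Googlebot. Note that `can_fetch` matches on the crawler's product token (the part before the first `/`), so a crawler checks using its short name rather than a full browser-style User-Agent string.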
