Latest news with #TechBrief


Washington Post
2 days ago
- Business
AI firms say they can't respect copyright. These researchers tried.
Happy Thursday! I'm Nitasha Tiku, The Washington Post's tech culture reporter, filling in for Will Oremus on today's Tech Brief. Send tips about AI to:

As the policy debate over AI and fair use heats up, a new paper suggests there's a more transparent — if time-consuming — alternative to slurping up web content without permission.

Top artificial intelligence companies argue that it's impossible to build today's powerful large language models — the GPT in ChatGPT — unless they can freely scrape copyrighted materials from the internet to train their AI systems. But few AI developers have tried the more ethical route — until now.

A group of more than two dozen AI researchers has found that it could build a massive eight-terabyte dataset using only text that was openly licensed or in the public domain. They tested the dataset's quality by using it to train a 7-billion-parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7B, which Meta released in 2023. A paper published Thursday detailing their effort also reveals that the process was painstaking, arduous and impossible to fully automate.

The group built an AI model that is significantly smaller than the latest offered by OpenAI's ChatGPT or Google's Gemini, but their findings appear to represent the biggest, most transparent and rigorous effort yet to demonstrate a different way of building popular AI tools. That could have implications for the policy debate swirling around AI and copyright.

The paper itself does not take a position on whether scraping text to train AI is fair use. That debate has reignited in recent weeks with a high-profile lawsuit and dramatic turns around copyright law and enforcement in both the U.S. and U.K.
On Wednesday, Reddit said it was suing Anthropic, alleging that it accessed data from the social media discussion board without a licensing agreement, according to The Wall Street Journal. The same day, the U.K.'s House of Commons offered concessions on a controversial bill that would allow AI companies to train on copyrighted material. These moves follow President Donald Trump's firing last month of the head of the U.S. Copyright Office, Shira Perlmutter. Her ouster brought more attention to the office's recent report on AI, which cast doubt on fair use applying to copyrighted works in generative AI.

AI companies and their investors, meanwhile, have long argued that a better way is not feasible. In April 2023, Sy Damle, a lawyer representing the venture capital firm Andreessen Horowitz, told the U.S. Copyright Office: 'The only practical way for these tools to exist is if they can be trained on massive amounts of data without having to license that data.' Later that year, in comments to the U.K. government, OpenAI said, '[I]t would be impossible to train today's leading AI models without using copyrighted materials.' And in January 2024, Anthropic's expert witness in a copyright trial asserted that 'the hypothetical competitive market for licenses covering data to train cutting-edge LLMs would be impracticable,' court documents show.

While AI policy papers often discuss the need for more open data, and experts argue about whether large language models should be trained on licensed data from publishers, there has been little effort to put theory into action, the paper's co-author Aviya Skowron, head of policy at the nonprofit research institute Eleuther AI, told The Post. 'I would also like those people to get curious about what this task actually entails,' Skowron said.

As it turns out, the task involves a lot of humans.
That's because of the technical challenges of data not being formatted in a machine-readable way, as well as the legal challenges of figuring out which license applies to which website — a daunting prospect when the industry is rife with improperly licensed data.

'This isn't a thing where you can just scale up the resources that you have available,' like access to more computer chips and a fancy web scraper, said Stella Biderman, Eleuther AI's executive director. 'We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people. And that's just really hard.'

Still, the group managed to unearth new datasets that can be used ethically. Those include a set of 130,000 English-language books in the Library of Congress, nearly double the size of the popular books dataset Project Gutenberg. The group's initiative also builds on recent efforts to develop more ethical, but still useful, datasets, such as FineWeb from Hugging Face, the open-source repository for machine learning.

Eleuther AI pioneered an analogous open-source effort in 2020, creating an often-cited dataset called the Pile. A site that hosted the dataset had to take it down in 2023 after a Digital Millennium Copyright Act request from the Danish anti-piracy group Rights Alliance, which targeted the fact that the Pile contained Books3, a dataset of books that Meta is being sued over.

The new dataset is called Common Pile v0.1, and the model is called Comma v0.1 — a deliberate reference to the group's belief that it will be able to find more openly licensed or public-domain text that can then be used to train bigger models. Still, Biderman remained skeptical that this approach could find enough content online to match the size of today's state-of-the-art models.
The group of authors represented 14 institutions, including MIT, CMU and the University of Toronto, as well as nonprofits such as the Vector Institute and the Allen Institute for Artificial Intelligence.

Biderman said she didn't expect companies such as OpenAI and Anthropic to start adopting the same laborious process, but she hoped it would encourage them to at least rewind to 2021 or 2022, when AI companies still shared a few sentences of information about what their models were trained on. 'Even partial transparency has a huge amount of social value and a moderate amount of scientific value,' she said.

- Musk rails against Trump tax bill, calling it a 'disgusting abomination' (Jacob Bogage and Theodoric Meyer)
- Federal judge blocks Florida from enforcing social media ban for kids while lawsuit continues (Associated Press)
- Apple and Alibaba's AI rollout in China delayed by Trump trade war (Financial Times)
- Trump renegotiating Biden-era Chips Act grants, Lutnick says (Reuters)
- US removes 'safety' from AI Safety Institute (The Verge)
- 5 AI bots took our tough reading test. One was smartest — and it wasn't ChatGPT (Geoffrey A. Fowler)
- You are hardwired to blindly trust AI. Here's how to fight it. (Shira Ovide)
- Reddit sues Anthropic, alleges unauthorized use of site's data (Wall Street Journal)
- Amazon to invest $10 billion in North Carolina to expand cloud, AI infrastructure (Reuters)
- Germans are buying more electric cars, but not Teslas (New York Times)
- Google warns hackers stealing Salesforce data from companies (Bloomberg)
- Chinese hacked US telecom a year before known wireless breaches (Bloomberg)
- ChatGPT can now read your Google Drive and Dropbox (The Verge)
- Google DeepMind's CEO thinks AI will make humans less selfish (Wired)
- The creatives and academics rejecting AI — at work and at home (The Guardian)

That's all for today — thank you so much for joining us! Make sure to tell others to subscribe to the Tech Brief.
Get in touch with Will (via email or social media) for tips, feedback or greetings!


Washington Post
29-05-2025
- Business
A new Texas law mandates age checks on phones. It may be just the start.
Happy Thursday from the Tech Brief, where we only cite studies that exist. Send news tips to:

Below: Trump's meme coin dinner draws scrutiny from Democrats.

First: A new Texas law mandates age checks on phones. It may be just the start.

Texas passed a law this week forcing Google and Apple to check the ages of mobile app store users and require parents' permission before a teenager or child can download an app or use it to buy something. It became the second and largest state to do so after Utah passed a similar law in March. Both laws are set to take effect next year, though legal challenges are likely. In the meantime, Texas lawmakers are eyeing an even more aggressive move that has yet to receive much national attention: a bill that would ban minors from using social media at all.

Texas's age-verification law is a big deal in its own right. With more than 30 million people, Texas is increasingly wielding a California-like power to help set the national tech policy agenda. This is the latest in a string of consequential internet laws that Republican Gov. Greg Abbott has signed in recent years, including one that restricted social media content moderation and another requiring adult-oriented websites to verify users' ages. Both have been challenged in cases that reached the U.S. Supreme Court.

The restrictions on content moderation are on hold after the high court's majority signaled last year in NetChoice v. Paxton that they may violate the First Amendment and sent the case back to lower courts. A Supreme Court ruling on the age-verification requirements for adult websites is expected in the coming weeks, and it could have implications for the legality of age-verification laws in general, including the one Texas just passed.

The latest law is another win for Meta in its battle with Apple over who should be held responsible for checking users' ages.
The social media giant has pushed for laws that put the onus on the operators of mobile app stores — i.e., Apple and Google — rather than on individual apps such as Facebook, Snapchat and Roblox. Apple has lobbied heavily on the other side. The Wall Street Journal reported last week that Apple CEO Tim Cook called Abbott personally to ask for changes or a veto. Evidently Abbott was unmoved.

In a statement attributed to an unnamed spokesperson, Apple said it shares the goal of strengthening kids' online safety but is concerned that the Texas law will require the company to collect sensitive personal data on all users. The company argues for what it says is a forthcoming and less invasive approach called 'age assurance,' in which children's ages would be input by parents but then shared only as an inexact 'age range' with app developers, minimizing identifying details such as birthdays.

Heat Initiative, a child advocacy group focused on pressuring Apple to do more to protect kids, sees that as progress but not sufficient. 'Apple had a very hands-off approach to the App Store' for years, Heat Initiative CEO Sarah Gardner said. That allowed developers to publish apps that were rated safe for kids despite providing sexualized content, anonymous chats with adult strangers, and AI 'girlfriend simulators' and appearance-rating tools. Now, Gardner added: 'I think finally we are seeing Apple being dragged into the child safety arena kicking and screaming.'

Google has a foot on both sides of the fence. It runs the Google Play Store on Android devices but also owns the social platform YouTube. Asked for comment on the Texas law, the company said only that it is 'assessing next steps.'

Texas age checks are just one salvo in a larger battle over teens' access to the internet. Earlier this month, Apple lent its support to the federal Kids Online Safety Act, or KOSA, which would hold social media companies, rather than app stores, responsible for keeping young users safe.
That bill, co-sponsored by Sens. Marsha Blackburn (R-Tennessee) and Richard Blumenthal (D-Connecticut), passed the Senate by a vote of 91-3 last year but stalled in the House. Its backers hope Apple's endorsement, along with those of Microsoft, Snap and X, can help it overcome opposition from Meta and the internet trade group NetChoice.

Meanwhile, numerous other states are considering age-verification bills similar to those in Utah and Texas, along with an array of other online safety legislation. It's the rare cause that enjoys bipartisan support these days in statehouses and Congress alike, with Republican and Democratic lawmakers differing mostly on the details. The biggest obstacle so far has proved to be the First Amendment, which sets a high bar for the government to justify laws that curtail otherwise legal expression, though there are exceptions for obscenity and speech that's deemed harmful to children. Free-speech advocates warn that overly restrictive or ill-crafted online safety laws could lead down a slippery slope to censorship and repression of marginalized groups and unpopular ideas.

Texas's next big move could be a law to keep teens off social media altogether. The Texas House of Representatives earlier this month passed House Bill 186, which would require social media platforms to check users' ages and prevent anyone under 18 from creating an account. The next step is the state Senate, where it also enjoys bipartisan support, according to the Texas Tribune. The bill has critics on both the left and right who say it's invasive, draconian and poorly written. If passed, it could face an uphill battle in the courts; a judge has already blocked a similar law in Utah.

Regardless, the wave of state-level kids' online safety bills appears unlikely to crest anytime soon, adding to the pressure on Congress to tackle the issue.
Trump's meme coin dinner draws Democrats' scrutiny

A top House Democrat opened a probe into President Trump's meme coin dinner, our colleague Cat Zakrzewski reports for the Tech Brief. Rep. Jamie Raskin, the ranking opposition member of the House Judiciary Committee, pressed Trump to turn over the names of the guests who attended last week's gala after securing their invitations by pouring millions of dollars into the president's crypto venture.

Democrats are increasingly criticizing Trump for profiting from his office, focusing on the ways his sons are expanding his real estate and crypto businesses. 'Profiting off the memecoin is just the latest in a bewildering gamut of schemes in which you and your family have profited after your return to office,' Raskin wrote in a letter to the White House, also citing the Trump family's business deals in the Middle East. Raskin demanded that the Trump administration share details about the source of the funds used to buy the coins, citing concerns that the president may have violated the emoluments clause of the Constitution. The clause prohibits federal officials from accepting gifts from foreign governments without the consent of Congress.

- Back at SpaceX, Musk says in interview DOGE became D.C.'s 'whipping boy' (Christian Davenport)
- New student visas paused as State Dept. plans tougher social media review (Anumita Kaur and Adam Taylor)
- Trump administration ramps up push as crypto ally (Shannon Najmabadi)
- The U.S. government's new strategic reserve: Billions in seized crypto (Lisa Bonos)
- Labor Department reverses course on crypto in 401(k) plans (The Hill)
- Key market watchdog plagued by departures amid crypto turf war (The Hill)
- Tesla investors ask board to make Musk work full-time (Trisha Thadani)
- SpaceX loses another Starship on test flight as Musk seeks to renew focus (Christian Davenport)
- Civitai ban of real-people content deals major blow to the nonconsensual AI porn ecosystem (404 Media)
- Netflix co-founder Reed Hastings joins Anthropic's board of directors (The Verge)
- Telegram, Musk-owned xAI partner to distribute Grok to messaging app's users (Reuters)
- Elon Musk tried to block Sam Altman's big AI deal in the Middle East (Wall Street Journal)
- DOGE employees may access sensitive Treasury data, judge rules (Kelly Kasulis Cho)
- Data broker giant LexisNexis says breach exposed personal information of over 364,000 people (TechCrunch)
- Anthropic CEO says AI could wipe out half of entry-level white-collar jobs (Axios)
- Taiwan semiconductor boom runs on exploited migrant labor (Rest of World)
- Climate and weather scientists are joining the anti-Trump resistance in the most 'scientist-iest' way (CNN)
- This film was made entirely with Runway AI and Google's Veo. It nearly broke us. (Wall Street Journal)
- Why 'wrench attacks' on wealthy crypto holders are on the rise (Associated Press)

That's all for today — thank you so much for joining us! Make sure to tell others to subscribe to the Tech Brief. Get in touch with Will (via email or social media) for tips, feedback or greetings.


Washington Post
22-05-2025
- Business
A bid to bar states from regulating AI is getting pushback
Happy Thursday from the Tech Brief, where our reading lists are always human-generated. Send news tips and imaginary book titles to:

House Republicans' push to pass a 10-year ban on state AI regulations as part of the party's massive tax and immigration bill is forging ahead, but obstacles are mounting.


Washington Post
08-05-2025
More airports are scanning faces. A new bill would limit the practice.
Happy Thursday from the Tech Brief, where we wouldn't dream of exceeding 2,000 words. Send news tips, but please no death threats, to:

It's becoming standard practice at a growing number of U.S. airports: When you reach the front of the security line, an agent asks you to step up to a machine that scans your face to check whether it matches the face on your identification card. Travelers have the right to opt out of the face scan and have the agent do a visual check instead — but many don't realize that's an option.


Washington Post
01-05-2025
- Business
Google's Pichai says Justice Dept. proposals would cripple its business
Happy Thursday! Julian Mark here, joining Will Oremus on today's Tech Brief after attending the Google trial at D.C.'s federal courthouse yesterday. Send news tips to:

Below: The GOP walks back its bid to strip the FTC of antitrust authority.

But first: Google's Pichai says DOJ's breakup plan would derail its business.