Latest news with #TechBrief


Washington Post
2 days ago
- Business
AI firms say they can't respect copyright. These researchers tried.
Happy Thursday! I'm Nitasha Tiku, The Washington Post's tech culture reporter, filling in for Will Oremus on today's Tech Brief. Send tips about AI to:

As the policy debate over AI and fair use heats up, a new paper suggests there's a more transparent — if time-consuming — alternative to slurping up web content without permission.

Top artificial intelligence companies argue that it's impossible to build today's powerful large language models — the GPT in ChatGPT — unless they can freely scrape copyrighted materials from the internet to train their AI systems. But few AI developers have tried the more ethical route — until now.

A group of more than two dozen AI researchers has found that it could build a massive eight-terabyte dataset using only text that was openly licensed or in the public domain. They tested the dataset's quality by using it to train a 7-billion-parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7B, which Meta released in 2023. A paper published Thursday detailing their effort also reveals that the process was painstaking, arduous and impossible to fully automate.

The group built an AI model that is significantly smaller than the latest offered by OpenAI's ChatGPT or Google's Gemini, but their findings appear to represent the biggest, most transparent and rigorous effort yet to demonstrate a different way of building popular AI tools. That could have implications for the policy debate swirling around AI and copyright.

The paper itself does not take a position on whether scraping text to train AI is fair use. That debate has reignited in recent weeks with a high-profile lawsuit and dramatic turns around copyright law and enforcement in both the U.S. and U.K.
On Wednesday, Reddit said it was suing Anthropic, alleging that it accessed data from the social media discussion board without a licensing agreement, according to The Wall Street Journal. The same day, the U.K.'s House of Commons offered concessions on a controversial bill that would allow AI companies to train on copyrighted material. These moves follow President Donald Trump's firing last month of the head of the U.S. Copyright Office, Shira Perlmutter. Her ouster brought more attention to the office's recent report on AI, which cast doubt on fair use applying to copyrighted works in generative AI.

AI companies and their investors, meanwhile, have long argued that a better way is not feasible. In April 2023, Sy Damle, a lawyer representing the venture capital firm Andreessen Horowitz, told the U.S. Copyright Office: 'The only practical way for these tools to exist is if they can be trained on massive amounts of data without having to license that data.' Later that year, in comments to the U.K. government, OpenAI said, '[I]t would be impossible to train today's leading AI models without using copyrighted materials.' And in January 2024, Anthropic's expert witness in a copyright trial asserted that 'the hypothetical competitive market for licenses covering data to train cutting-edge LLMs would be impracticable,' court documents show.

While AI policy papers often discuss the need for more open data, and experts argue about whether large language models should be trained on licensed data from publishers, there has been little effort to put theory into action, the paper's co-author Aviya Skowron, head of policy at the nonprofit research institute Eleuther AI, told The Post. 'I would also like those people to get curious about what this task actually entails,' Skowron said.

As it turns out, the task involves a lot of humans.
That's because of the technical challenges of data not being formatted in a machine-readable way, as well as the legal challenges of figuring out which license applies to which website — a daunting prospect when the industry is rife with improperly licensed data.

'This isn't a thing where you can just scale up the resources that you have available,' like access to more computer chips and a fancy web scraper, said Stella Biderman, Eleuther AI's executive director. 'We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people. And that's just really hard.'

Still, the group managed to unearth new datasets that can be used ethically. Those include a set of 130,000 English-language books in the Library of Congress, nearly double the size of the popular books dataset Project Gutenberg. The group's initiative also builds on recent efforts to develop more ethical, but still useful, datasets, such as FineWeb from Hugging Face, the open-source repository for machine learning.

Eleuther AI pioneered an analogous open-source effort in 2020, creating an often-cited dataset called the Pile. A site that hosted the dataset had to take it down in 2023 after a Digital Millennium Copyright Act request from the Danish anti-piracy group Rights Alliance, which targeted the fact that the Pile contained Books3, a dataset of books that Meta is being sued over.

The new dataset is called Common Pile v0.1, and the model is called Comma v0.1 — a deliberate reference to the group's belief that it will be able to find more openly licensed or public-domain text that can then be used to train bigger models. Still, Biderman remained skeptical that this approach could find enough content online to match the size of today's state-of-the-art models.
The group of authors represented 14 institutions, including MIT, CMU and the University of Toronto, as well as nonprofits such as the Vector Institute and the Allen Institute for Artificial Intelligence.

Biderman said she didn't expect companies such as OpenAI and Anthropic to start adopting the same laborious process, but she hoped it would encourage them to at least rewind to 2021 or 2022, when AI companies still shared a few sentences of information about what their models were trained on. 'Even partial transparency has a huge amount of social value and a moderate amount of scientific value,' she said.

- Musk rails against Trump tax bill, calling it a 'disgusting abomination' (Jacob Bogage and Theodoric Meyer)
- Federal judge blocks Florida from enforcing social media ban for kids while lawsuit continues (Associated Press)
- Apple and Alibaba's AI rollout in China delayed by Trump trade war (Financial Times)
- Trump renegotiating Biden-era Chips Act grants, Lutnick says (Reuters)
- US removes 'safety' from AI Safety Institute (The Verge)
- 5 AI bots took our tough reading test. One was smartest — and it wasn't ChatGPT (Geoffrey A. Fowler)
- You are hardwired to blindly trust AI. Here's how to fight it. (Shira Ovide)
- Reddit sues Anthropic, alleges unauthorized use of site's data (Wall Street Journal)
- Amazon to invest $10 billion in North Carolina to expand cloud, AI infrastructure (Reuters)
- Germans are buying more electric cars, but not Teslas (New York Times)
- Google warns hackers stealing Salesforce data from companies (Bloomberg)
- Chinese hacked US telecom a year before known wireless breaches (Bloomberg)
- ChatGPT can now read your Google Drive and Dropbox (The Verge)
- Google DeepMind's CEO thinks AI will make humans less selfish (Wired)
- The creatives and academics rejecting AI — at work and at home (The Guardian)

That's all for today — thank you so much for joining us! Make sure to tell others to subscribe to the Tech Brief.
Get in touch with Will (via email or social media) for tips, feedback or greetings!


Washington Post
29-05-2025
- Business
A new Texas law mandates age checks on phones. It may be just the start.
Happy Thursday from the Tech Brief, where we only cite studies that exist. Send news tips to:

Below: Trump's meme coin dinner draws scrutiny from Democrats.

First: A new Texas law mandates age checks on phones. It may be just the start.

Texas passed a law this week forcing Google and Apple to check the ages of mobile app store users and require parents' permission before a teenager or child can download an app or use it to buy something. It became the second and largest state to do so after Utah passed a similar law in March. Both laws are set to take effect next year, though legal challenges are likely. In the meantime, Texas lawmakers are eyeing an even more aggressive move that has yet to receive much national attention: a bill that would ban minors from using social media at all.

Texas's age-verification law is a big deal in its own right. With more than 30 million people, Texas is increasingly wielding a California-like power to help set the national tech policy agenda. This is the latest in a string of consequential internet laws that Republican Gov. Greg Abbott has signed in recent years, including one that restricted social media content moderation and another requiring adult-oriented websites to verify users' ages. Both have been challenged in cases that reached the U.S. Supreme Court.

The restrictions on content moderation are on hold after the high court's majority signaled last year in NetChoice v. Paxton that they may violate the First Amendment and sent the case back to lower courts. A Supreme Court ruling on the age-verification requirements for adult websites is expected in the coming weeks, and it could have implications for the legality of age-verification laws in general, including the one Texas just passed.

The latest law is another win for Meta in its battle with Apple over who should be held responsible for checking users' ages.
The social media giant has pushed for laws that put the onus on the operators of mobile app stores — i.e., Apple and Google — rather than on individual apps such as Facebook, Snapchat and Roblox. Apple has lobbied heavily on the other side. The Wall Street Journal reported last week that Apple CEO Tim Cook called Abbott personally to ask for changes or a veto. Evidently Abbott was unmoved.

In a statement attributed to an unnamed spokesperson, Apple said it shares the goal of strengthening kids' online safety but is concerned that the Texas law will require the company to collect sensitive personal data on all users. The company argues for what it says is a forthcoming and less invasive approach called 'age assurance,' in which children's ages would be input by parents but then shared only as an inexact 'age range' with app developers, minimizing identifying details such as birthdays.

Heat Initiative, a child advocacy group focused on pressuring Apple to do more to protect kids, sees that as progress but not sufficient. 'Apple had a very hands-off approach to the App Store' for years, Heat Initiative CEO Sarah Gardner said. That allowed developers to publish apps that were rated safe for kids despite providing sexualized content, anonymous chats with adult strangers, and AI 'girlfriend simulators' and appearance-rating tools. Now, Gardner added: 'I think finally we are seeing Apple being dragged into the child safety arena kicking and screaming.'

Google has a foot on both sides of the fence. It runs the Google Play Store on Android devices but also owns the social platform YouTube. Asked for comment on the Texas law, the company said only that it is 'assessing next steps.'

Texas age checks are just one salvo in a larger battle over teens' access to the internet. Earlier this month, Apple lent its support to the federal Kids Online Safety Act, or KOSA, which would hold social media companies, rather than app stores, responsible for keeping young users safe.
That bill, co-sponsored by Sens. Marsha Blackburn (R-Tennessee) and Richard Blumenthal (D-Connecticut), passed the Senate by a vote of 91-3 last year but stalled in the House. Its backers hope Apple's endorsement, along with those of Microsoft, Snap and X, can help it overcome opposition from Meta and the internet trade group NetChoice.

Meanwhile, numerous other states are considering age-verification bills similar to those in Utah and Texas, along with an array of other online safety legislation. It's the rare cause that enjoys bipartisan support these days in statehouses and Congress alike, with Republican and Democratic lawmakers differing mostly on the details. The biggest obstacle so far has proved to be the First Amendment, which sets a high bar for the government to justify laws that curtail otherwise legal expression, though there are exceptions for obscenity and speech that's deemed harmful to children. Free-speech advocates warn that overly restrictive or ill-crafted online safety laws could lead down a slippery slope to censorship and repression of marginalized groups and unpopular ideas.

Texas's next big move could be a law to keep teens off social media altogether. The Texas House of Representatives earlier this month passed House Bill 186, which would require social media platforms to check users' ages and prevent anyone under 18 from creating an account. The next step is the state Senate, where it also enjoys bipartisan support, according to the Texas Tribune. The bill has critics on both the left and right who say it's invasive, draconian and poorly written. If passed, it could face an uphill battle in the courts; a judge has already blocked a similar law in Utah.

Regardless, the wave of state-level kids' online safety bills appears unlikely to crest anytime soon, adding to the pressure on Congress to tackle the issue.
Trump's meme coin dinner draws Democrats' scrutiny

A top House Democrat opened a probe into President Trump's meme coin dinner, our colleague Cat Zakrzewski reports for the Tech Brief. Rep. Jamie Raskin, the ranking opposition member of the House Judiciary Committee, pressed Trump to turn over the names of the guests who attended last week's gala after securing their invitations by pouring millions of dollars into the president's crypto venture.

Democrats are increasingly criticizing Trump for profiting from his office, focusing on the ways his sons are expanding his real estate and crypto businesses. 'Profiting off the memecoin is just the latest in a bewildering gamut of schemes in which you and your family have profited after your return to office,' Raskin wrote in a letter to the White House, also citing the Trump family's business deals in the Middle East. Raskin demanded that the Trump administration share details about the source of the funds used to buy the coins, citing concerns that the president may have violated the emoluments clause of the Constitution. The clause prohibits federal officials from accepting gifts from foreign governments without the consent of Congress.

- Back at SpaceX, Musk says in interview DOGE became D.C.'s 'whipping boy' (Christian Davenport)
- New student visas paused as State Dept. plans tougher social media review (Anumita Kaur and Adam Taylor)
- Trump administration ramps up push as crypto ally (Shannon Najmabadi)
- The U.S. government's new strategic reserve: Billions in seized crypto (Lisa Bonos)
- Labor Department reverses course on crypto in 401(k) plans (The Hill)
- Key market watchdog plagued by departures amid crypto turf war (The Hill)
- Tesla investors ask board to make Musk work full-time (Trisha Thadani)
- SpaceX loses another Starship on test flight as Musk seeks to renew focus (Christian Davenport)
- Civitai ban of real-people content deals major blow to the nonconsensual AI porn ecosystem (404 Media)
- Netflix co-founder Reed Hastings joins Anthropic's board of directors (The Verge)
- Telegram, Musk-owned xAI partner to distribute Grok to messaging app's users (Reuters)
- Elon Musk tried to block Sam Altman's big AI deal in the Middle East (Wall Street Journal)
- DOGE employees may access sensitive Treasury data, judge rules (Kelly Kasulis Cho)
- Data broker giant LexisNexis says breach exposed personal information of over 364,000 people (TechCrunch)
- Anthropic CEO says AI could wipe out half of entry-level white-collar jobs (Axios)
- Taiwan semiconductor boom runs on exploited migrant labor (Rest of World)
- Climate and weather scientists are joining the anti-Trump resistance in the most 'scientist-iest' way (CNN)
- This film was made entirely with Runway AI and Google's Veo. It nearly broke us. (Wall Street Journal)
- Why 'wrench attacks' on wealthy crypto holders are on the rise (Associated Press)

That's all for today — thank you so much for joining us! Make sure to tell others to subscribe to the Tech Brief. Get in touch with Will (via email or social media) for tips, feedback or greetings.


Washington Post
22-05-2025
- Business
A bid to bar states from regulating AI is getting pushback
Happy Thursday from the Tech Brief, where our reading lists are always human-generated. Send news tips and imaginary book titles to:

House Republicans' push to pass a 10-year ban on state AI regulations as part of the party's massive tax and immigration bill is forging ahead, but obstacles are mounting.


Washington Post
08-05-2025
More airports are scanning faces. A new bill would limit the practice.
Happy Thursday from the Tech Brief, where we wouldn't dream of exceeding 2,000 words. Send news tips, but please no death threats, to:

It's becoming standard practice at a growing number of U.S. airports: When you reach the front of the security line, an agent asks you to step up to a machine that scans your face to check whether it matches the face on your identification card. Travelers have the right to opt out of the face scan and have the agent do a visual check instead — but many don't realize that's an option.


Washington Post
01-05-2025
- Business
Google's Pichai says Justice Dept. proposals would cripple its business
Happy Thursday! Julian Mark here, joining Will Oremus on today's Tech Brief after attending the Google trial at D.C.'s federal courthouse yesterday. Send news tips to:

Below: The GOP walks back its bid to strip the FTC of antitrust authority.

But first: Google's Pichai says DOJ's breakup plan would derail its business.