Latest news with #LMArena
Yahoo
22-05-2025
- Business
- Yahoo
Chatbot Arena secures $100m to enhance AI platform
Chatbot Arena, a platform designed to compare the performance of various AI models, has raised $100m in seed funding. The round was led by Andreessen Horowitz (a16z) and UC Investments (University of California), with participation from Lightspeed, Laude Ventures, Felicis, Kleiner Perkins, and The House Fund. The round values the company at $600m, according to a Bloomberg report.

The funding coincides with the upcoming relaunch of LMArena, featuring a fully rebuilt platform designed to enhance AI evaluation with greater rigor, transparency, and user focus. The platform, which originated as an academic project at UC Berkeley, enables researchers, developers, and users to assess how AI models perform in real-world scenarios. More than 400 model evaluations have been conducted on LMArena, with more than three million votes cast, influencing both proprietary and open-source models from companies such as Google, OpenAI, Meta, and xAI, the company said.

LMArena co-founder and CEO Anastasios Angelopoulos said: "In a world racing to build ever-bigger models, the hard question is no longer what can AI do. Rather, it's how well can it do it for specific use cases, and for whom. We're building the infrastructure to answer these critical questions."

The relaunched LMArena, set to debut in late May 2025, incorporates community feedback and introduces a rebuilt user interface, mobile-first design, lower latency, and new features like saved chat history and endless chat.

Ion Stoica, co-founder and UC Berkeley professor, said: "AI evaluation has often lagged behind model development. LMArena closes that gap by putting rigorous, community-driven science at the centre. It's refreshing to be part of a team that leads with long-term integrity in a space moving this fast."

The company collaborates with model providers to identify performance trends, gather human preference data, and test updates in real-world conditions, aiming to develop advanced analytics and enterprise services while keeping core participation free.

"We invested in LMArena because the future of AI depends on reliability," said Anjney Midha, general partner at a16z. "And reliability requires transparent, scientific, community-led evaluation. LMArena is building that backbone."

Jagdeep Singh Bachher, chief investment officer at UC Investments, said: "We're excited to see open AI research translated into real-world impact through platforms like LMArena. Supporting innovation from university labs such as those at UC Berkeley is essential for building technologies that responsibly serve the public and advance the field."

"Chatbot Arena secures $100m to enhance AI platform" was originally created and published by Verdict, a GlobalData owned brand.
Yahoo
21-05-2025
- Business
- Yahoo
LMArena Secures $100M in Seed Funding to Bring Scientific Rigor to AI Reliability
SAN FRANCISCO, May 21, 2025 /PRNewswire/ -- LMArena, the open community platform for evaluating the best AI models, has secured $100 million in seed funding led by a16z and UC Investments (University of California), with participation from Lightspeed, Laude Ventures, Felicis, Kleiner Perkins and The House Fund. The funding coincides with the relaunch of LMArena happening next week: a faster, sharper, fully rebuilt platform designed to make AI evaluation more rigorous, transparent, and human-centered.

In a space moving at breakneck speed, LMArena is building something foundational: a neutral, reproducible, community-driven layer of infrastructure that allows researchers, developers, and users to understand how models actually perform in the real world. More than 400 model evaluations have already been conducted on the platform, with more than 3 million votes cast, helping shape both proprietary and open-source models across the industry, including those from Google, OpenAI, Meta, and xAI.

"In a world racing to build ever-bigger models, the hard question is no longer what can AI do. Rather, it's how well can it do it for specific use cases, and for whom," said Anastasios N. Angelopoulos, co-founder and CEO at LMArena. "We're building the infrastructure to answer these critical questions."

The new LMArena, launching next week, reflects months of feedback from the community and includes a rebuilt UI, mobile-first design, lower latency, and new features like saved chat history and endless chat. The legacy site will remain live for a while, but all future innovation is happening on the new platform.

"AI evaluation has often lagged behind model development," said Ion Stoica, co-founder at LMArena and UC Berkeley professor. "LMArena closes that gap by putting rigorous, community-driven science at the center. It's refreshing to be part of a team that leads with long-term integrity in a space moving this fast."

Backers say what makes LMArena different is not just the product, but the principles behind it. Evaluation is open, the leaderboard mechanics are published, and all models are tested with diverse, real-world prompts. This approach makes it possible to explore in depth how AI performs across a range of use cases.

"Our mission has always been to make AI evaluation open, scientific, and grounded in how people actually use these models. As we expand into new modalities and deepen our evaluation tools, we're building infrastructure that doesn't just evaluate AI, it helps shape it," said Wei-Lin Chiang, co-founder and CTO of LMArena. "We're here to ensure AI is reliably measured through real-world use."

LMArena is already working with model providers to help them uncover performance trends, gather human preference data, and test updates in real-world conditions. The company's long-term business model centers on trust, as it looks to develop advanced analytics and enterprise services while keeping core participation free and open to all.

"We invested in LMArena because the future of AI depends on reliability," said Anjney Midha, General Partner at a16z. "And reliability requires transparent, scientific, community-led evaluation. LMArena is building that backbone."

Jagdeep Singh Bachher, chief investment officer at UC Investments, added, "We're excited to see open AI research translated into real-world impact through platforms like LMArena. Supporting innovation from university labs such as those at UC Berkeley is essential for building technologies that responsibly serve the public and advance the field."
The relaunch of LMArena next week is a significant step forward, but it's far from the finish line. The team is actively shipping new features, refining the platform, and working closely with the community to shape what comes next.

About LMArena: LMArena is an open platform where everyone has access to leading AI models and can contribute to their progress through real-world voting and feedback. Built with scientific rigor and transparency at its core, LMArena enables developers, researchers, and users to compare model outputs, uncover performance differences, and advance the reliability of AI systems. With a commitment to open access, reproducible methods, and diverse human judgment, LMArena is shaping the infrastructure layer AI needs to earn long-term trust.

Press Contact: Cherry Park, cherry@

SOURCE LMArena
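The "real-world voting and feedback" described above is, at its core, pairwise comparison: users vote between two model outputs, and many such votes are aggregated into a leaderboard. As a rough illustration only, here is a minimal Elo-style aggregation of pairwise votes in Python; this is not LMArena's published leaderboard methodology, and the model names and K-factor are hypothetical.

```python
from collections import defaultdict

# Toy Elo-style aggregation of pairwise "battle" votes.
# Illustrative only: LMArena publishes its own leaderboard methodology;
# the model names, starting ratings, and K-factor here are made up.

K = 32                                  # update step size (hypothetical)
ratings = defaultdict(lambda: 1000.0)   # every model starts at 1000

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under a logistic (Elo) model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_battle(model_a: str, model_b: str, winner: str) -> None:
    """Update both ratings after a user votes for 'a', 'b', or 'tie'."""
    e_a = expected_score(ratings[model_a], ratings[model_b])
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    ratings[model_a] += K * (score_a - e_a)
    ratings[model_b] += K * ((1.0 - score_a) - (1.0 - e_a))

# Hypothetical votes feeding a toy leaderboard
for a, b, w in [("model-x", "model-y", "a"),
                ("model-y", "model-z", "tie"),
                ("model-x", "model-z", "a")]:
    record_battle(a, b, w)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

In practice, platforms of this kind typically fit all votes jointly (for example with a Bradley-Terry model and confidence intervals) rather than updating sequentially; the sketch only conveys the intuition that many pairwise votes can be distilled into a single ranking.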


Bloomberg
21-05-2025
- Business
- Bloomberg
LMArena Goes From Academic Project to $600 Million Startup
Chatbot Arena started as an academic project, where researchers and students at the University of California at Berkeley worked to evaluate the capabilities of artificial intelligence tools. Now, the group has spun out into a new company, called LMArena, that's raised $100 million in seed funding from a slate of A-list investors. Andreessen Horowitz and UC Investments — which manages an investment portfolio for the University of California — led the fundraising, which the company plans to announce Wednesday. The deal includes backing from Lightspeed Venture Partners, Felicis Ventures and Kleiner Perkins, among others, the company said.
Yahoo
01-05-2025
- Business
- Yahoo
Study accuses LM Arena of helping top AI labs game its benchmark
A new paper from researchers at AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals.

According to the authors, LM Arena allowed some industry-leading AI companies like Meta, OpenAI, Google, and Amazon to privately test several variants of AI models, then not publish the scores of the lowest performers. This made it easier for these companies to achieve a top spot on the platform's leaderboard, though the opportunity was not afforded to every firm, the authors say.

"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others," said Cohere's VP of AI research and co-author of the study, Sara Hooker, in an interview with TechCrunch. "This is gamification."

Created in 2023 as an academic research project out of UC Berkeley, Chatbot Arena has become a go-to benchmark for AI companies. It works by putting answers from two different AI models side-by-side in a "battle," and asking users to choose the best one. It's not uncommon to see unreleased models competing in the arena under a pseudonym. Votes over time contribute to a model's score — and, consequently, its placement on the Chatbot Arena leaderboard. While many commercial actors participate in Chatbot Arena, LM Arena has long maintained that its benchmark is an impartial and fair one. However, that's not what the paper's authors say they uncovered.

One AI company, Meta, was able to privately test 27 model variants on Chatbot Arena between January and March in the lead-up to the tech giant's Llama 4 release, the authors allege. At launch, Meta only publicly revealed the score of a single model — a model that happened to rank near the top of the Chatbot Arena leaderboard.

In an email to TechCrunch, LM Arena co-founder and UC Berkeley professor Ion Stoica said that the study was full of "inaccuracies" and "questionable analysis."

"We are committed to fair, community-driven evaluations, and invite all model providers to submit more models for testing and to improve their performance on human preference," said LM Arena in a statement provided to TechCrunch. "If a model provider chooses to submit more tests than another model provider, this does not mean the second model provider is treated unfairly."

Armand Joulin, a principal researcher at Google DeepMind, also noted in a post on X that some of the study's numbers were inaccurate, claiming Google only sent one Gemma 3 AI model to LM Arena for pre-release testing. Hooker responded to Joulin on X, promising the authors would make a correction.

The paper's authors started conducting their research in November 2024 after learning that some AI companies were possibly being given preferential access to Chatbot Arena. In total, they measured more than 2.8 million Chatbot Arena battles over a five-month stretch. The authors say they found evidence that LM Arena allowed certain AI companies, including Meta, OpenAI, and Google, to collect more data from Chatbot Arena by having their models appear in a higher number of model "battles." This increased sampling rate gave these companies an unfair advantage, the authors allege. Using additional data from LM Arena could improve a model's performance on Arena Hard, another benchmark LM Arena maintains, by 112%.
However, LM Arena said in a post on X that Arena Hard performance does not directly correlate to Chatbot Arena performance. Hooker said it's unclear how certain AI companies might've received priority access, but that it's incumbent on LM Arena to increase its transparency regardless.

In a post on X, LM Arena said that several of the claims in the paper don't reflect reality. The organization pointed to a blog post it published earlier this week indicating that models from non-major labs appear in more Chatbot Arena battles than the study suggests.

One important limitation of the study is that it relied on "self-identification" to determine which AI models were in private testing on Chatbot Arena. The authors prompted AI models several times about their company of origin, and relied on the models' answers to classify them — a method that isn't foolproof. However, Hooker said that when the authors reached out to LM Arena to share their preliminary findings, the organization didn't dispute them.

TechCrunch reached out to Meta, Google, OpenAI, and Amazon — all of which were mentioned in the study — for comment. None immediately responded.

In the paper, the authors call on LM Arena to implement a number of changes aimed at making Chatbot Arena more "fair." For example, the authors say, LM Arena could set a clear and transparent limit on the number of private tests AI labs can conduct, and publicly disclose scores from these tests. In a post on X, LM Arena rejected these suggestions, claiming it has published information on pre-release testing since March 2024. The benchmarking organization also said it "makes no sense to show scores for pre-release models which are not publicly available," because the AI community cannot test the models for themselves.

The researchers also say LM Arena could adjust Chatbot Arena's sampling rate to ensure that all models in the arena appear in the same number of battles. LM Arena has been receptive to this recommendation publicly, and indicated that it'll create a new sampling algorithm.

The paper comes weeks after Meta was caught gaming benchmarks in Chatbot Arena around the launch of its above-mentioned Llama 4 models. Meta optimized one of the Llama 4 models for "conversationality," which helped it achieve an impressive score on Chatbot Arena's leaderboard. But the company never released the optimized model — and the vanilla version ended up performing much worse on Chatbot Arena. At the time, LM Arena said Meta should have been more transparent in its approach to benchmarking.

Earlier this month, LM Arena announced it was launching a company, with plans to raise capital from investors. The study increases scrutiny on private benchmark organizations — and whether they can be trusted to assess AI models without corporate influence clouding the process.

This article originally appeared on TechCrunch.
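The researchers' sampling recommendation above amounts to drawing model pairs so that every model accumulates battles at roughly the same rate, rather than some models appearing far more often than others. A minimal illustrative sketch of such balanced pair selection appears below; it is a toy example, not LM Arena's actual sampler, and the model names are hypothetical.

```python
import random

# Toy uniform battle sampler: repeatedly pair up the models that have
# appeared in the fewest battles so far, keeping battle counts balanced.
# This is a sketch under stated assumptions, not LM Arena's algorithm.

models = ["model-a", "model-b", "model-c", "model-d"]  # hypothetical names
battle_counts = {m: 0 for m in models}

def sample_battle(counts: dict[str, int]) -> tuple[str, str]:
    """Pick two distinct models, favoring those with the fewest battles."""
    # Sort by battle count, breaking ties randomly so pairings vary.
    ordered = sorted(counts, key=lambda m: (counts[m], random.random()))
    return ordered[0], ordered[1]

for _ in range(1000):
    a, b = sample_battle(battle_counts)
    battle_counts[a] += 1
    battle_counts[b] += 1

print(battle_counts)  # counts end up essentially equal (500 each here)
```

Whether strict balancing is the right policy is a separate question; the article only notes that LM Arena has indicated it will create a new sampling algorithm.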