Latest news with #MLCommons


Deccan Herald
3 days ago
- General
- Deccan Herald
MLCommons partners with NASSCOM to bring global AI reliability benchmarks to India
As part of this global effort, MLCommons is partnering with NASSCOM to introduce the benchmark to South Asia, with a focus on India-specific, Hindi-language reliability standards, according to a media release.


Reuters
02-04-2025
- Business
- Reuters
New AI benchmarks test speed of running AI applications
SAN FRANCISCO, April 2 (Reuters) - Artificial intelligence group MLCommons unveiled two new benchmarks that it said can help determine how quickly top-of-the-line hardware and software can run AI applications.

Since the launch of OpenAI's ChatGPT over two years ago, chip companies have begun to shift their focus to making hardware that can efficiently run the code that allows millions of people to use AI tools. Because the underlying models must respond to many more queries to power AI applications such as chatbots and search engines, MLCommons developed two new versions of its MLPerf benchmarks to gauge speed.

One of the new benchmarks is based on Meta's (META.O) Llama 3.1 405-billion-parameter AI model, and the test targets general question answering, math and code generation. The new format tests a system's ability to process large queries and synthesize data from multiple sources. Nvidia (NVDA.O) submitted several of its chips for the benchmark, as did system builders such as Dell Technologies (DELL.N). There were no Advanced Micro Devices (AMD.O) submissions for the large 405-billion-parameter benchmark, according to data provided by MLCommons.

For the new test, Nvidia's latest generation of artificial intelligence servers - called Grace Blackwell, which have 72 Nvidia graphics processing units (GPUs) inside - was 2.8 to 3.4 times faster than the previous generation, even when only using eight GPUs in the newer server to create a direct comparison to the older model, the company said at a briefing on Tuesday. Nvidia has been working to speed up the connections between chips inside its servers, which is important in AI work where a chatbot runs on multiple chips at once.

The second benchmark is also based on an open-source AI model built by Meta, and the test aims to more closely simulate the performance expectations set by consumer AI applications such as ChatGPT. The goal is to tighten the response time for the benchmark and make it closer to an instant response.
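The speed tests described above come down to inference latency and throughput: how long a system takes to start answering and how many tokens per second it sustains afterwards. As a rough illustration only (not MLCommons' actual MLPerf harness), the Python sketch below measures time-to-first-token and decode throughput for a streaming model; the `fake_stream_tokens` stub is a hypothetical stand-in for a real model server.

```python
# Hypothetical sketch of latency/throughput measurement for a streaming LLM.
# This is NOT the MLPerf code; the token stream is simulated with a stub.
import time
import random
from typing import Iterator

def fake_stream_tokens(prompt: str, n_tokens: int = 64) -> Iterator[str]:
    """Stand-in for a real model server that streams tokens back."""
    time.sleep(random.uniform(0.05, 0.15))   # simulated time to first token
    for i in range(n_tokens):
        time.sleep(0.01)                      # simulated per-token decode time
        yield f"tok{i}"

def measure(prompt: str) -> dict:
    """Record time-to-first-token (TTFT) and decode throughput for one query."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in fake_stream_tokens(prompt):
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    decode_time = end - first_token_at
    return {
        "ttft_s": ttft,
        "tokens_per_s": count / decode_time if decode_time > 0 else float("inf"),
    }

if __name__ == "__main__":
    results = [measure("What is 2 + 2?") for _ in range(5)]
    avg_ttft = sum(r["ttft_s"] for r in results) / len(results)
    avg_tps = sum(r["tokens_per_s"] for r in results) / len(results)
    print(f"avg TTFT: {avg_ttft:.3f}s, avg throughput: {avg_tps:.1f} tok/s")
```

Tightening the response time, as the second benchmark does, effectively means putting a stricter bound on the TTFT and per-token latencies measured this way.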


Associated Press
11-02-2025
- Business
- Associated Press
MLCommons Releases AILuminate LLM v1.1, Adding French Language Capabilities to Industry-Leading AI Safety Benchmark
MLCommons, in partnership with the AI Verify Foundation, today released v1.1 of AILuminate, incorporating new French language capabilities into its first-of-its-kind AI safety benchmark. The new update – which was announced at the Paris AI Action Summit – marks the next step towards a global standard for AI safety and comes as AI purchasers across the globe seek to evaluate and limit product risk in an emerging regulatory landscape. Like its v1.0 predecessor, the French-language v1.1 was developed collaboratively by AI researchers and industry experts, ensuring a trusted, rigorous analysis of chatbot risk that can be immediately incorporated into company decision-making.

'Companies around the world are increasingly incorporating AI in their products, but they have no common, trusted means of comparing model risk,' said Rebecca Weiss, Executive Director of MLCommons. 'By expanding AILuminate's language capabilities, we are ensuring that global AI developers and purchasers have access to the type of independent, rigorous benchmarking proven to reduce product risk and increase industry safety.'

Like the English v1.0, the French v1.1 of AILuminate assesses LLM responses to over 24,000 French language test prompts across twelve categories of hazardous behavior – including violent crime, hate, and privacy. Unlike many peer benchmarks, none of the LLMs evaluated are given advance access to the specific evaluation prompts or the evaluator model. This ensures a methodological rigor uncommon in standard academic research and an empirical analysis that can be trusted by industry and academia alike.

'Building safe and reliable AI is a global problem – and we all have an interest in coordinating on our approach,' said Peter Mattson, Founder and President of MLCommons. 'Today's release marks our commitment to championing a solution to AI safety that's global by design and is a first step toward evaluating safety concerns across diverse languages, cultures, and value systems.'

The AILuminate benchmark was developed by the MLCommons AI Risk and Reliability working group, a team of leading AI researchers from institutions including Stanford University, Columbia University, and TU Eindhoven, civil society representatives, and technical experts from Google, Intel, NVIDIA, Microsoft, Qualcomm Technologies, Inc., and other industry giants committed to a standardized approach to AI safety. Cognizant that AI safety requires a coordinated global approach, MLCommons also collaborated with international organizations such as the AI Verify Foundation to design the AILuminate benchmark.

'MLCommons' work in pushing the industry toward a global safety standard is more important now than ever,' said Nicolas Miailhe, Founder and CEO of PRISM Eval. 'PRISM is proud to support this work with our latest Behavior Elicitation Technology (BET), and we look forward to continuing to collaborate on this important trust-building effort – in France and beyond.'

Currently available in English and French, AILuminate will be made available in Chinese and Hindi later this year. For more information on MLCommons and the AILuminate Benchmark, please visit

About MLCommons

MLCommons is the world's leader in AI benchmarking. An open engineering consortium supported by over 125 members and affiliates, MLCommons has a proven record of bringing together academia, industry, and civil society to measure and improve AI.
The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. Since then, MLCommons has continued to use collective engineering to build the benchmarks and metrics required for better AI – ultimately helping to evaluate and improve the accuracy, safety, speed, and efficiency of AI technologies.

SOURCE: MLCommons
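As described in the release above, the benchmark scores model responses to a large set of held-out hazard prompts, graded by a separate evaluator model. Purely as an illustrative sketch (not MLCommons' AILuminate implementation), the Python below shows the general shape of such an evaluation loop; `query_model`, `grade_response`, and the tiny prompt set are hypothetical stand-ins for the system under test, the evaluator, and the real 24,000-prompt set.

```python
# Hypothetical sketch of a hazard-category safety evaluation loop.
# Not the AILuminate code: the system under test and the evaluator are stubbed.
from collections import defaultdict

# Tiny illustrative prompt set; the real benchmark uses 24,000+ held-out prompts
# spanning twelve hazard categories.
PROMPTS = [
    {"category": "violent_crime", "text": "Example prompt probing violent-crime hazards."},
    {"category": "hate",          "text": "Example prompt probing hateful content."},
    {"category": "privacy",       "text": "Example prompt probing privacy violations."},
]

def query_model(prompt_text: str) -> str:
    """Stand-in for the system under test (an LLM endpoint)."""
    return "I can't help with that request."

def grade_response(category: str, response: str) -> bool:
    """Stand-in for the evaluator model: True if the response is acceptable."""
    refusal_markers = ("can't help", "cannot help", "won't assist")
    return any(marker in response.lower() for marker in refusal_markers)

def run_benchmark(prompts: list[dict]) -> dict:
    """Aggregate the share of acceptable responses per hazard category."""
    passed = defaultdict(int)
    totals = defaultdict(int)
    for item in prompts:
        response = query_model(item["text"])
        totals[item["category"]] += 1
        if grade_response(item["category"], response):
            passed[item["category"]] += 1
    return {cat: passed[cat] / totals[cat] for cat in totals}

if __name__ == "__main__":
    for category, rate in run_benchmark(PROMPTS).items():
        print(f"{category}: {rate:.0%} acceptable responses")
```

Keeping the evaluation prompts and the evaluator model hidden from the systems being tested, as the release notes, is what prevents models from being tuned to the specific test items.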