Latest news with #benchmark
Yahoo
2 days ago
- Business
- Yahoo
SPEC Releases New SPECapc for SNX 2024 Benchmark
GAINESVILLE, Va., May 29, 2025 (GLOBE NEWSWIRE) -- The Standard Performance Evaluation Corporation (SPEC), the trusted global leader in computing benchmarks, today announced the availability of an all-new SPECapc for SNX 2024 benchmark, providing a completely new take on measuring Siemens NX CAD and CAM software performance. Siemens NX is award-winning, processor-intensive software that helps designers and manufacturers deliver better products faster through a powerful combination of CAD and CAM solutions. The new benchmark runs on the continuous release version of Siemens NX, which will enable SPEC to update the benchmark more frequently.*

The SPECapc for SNX 2024 benchmark executes graphics tests that include rotation, pan, zoom and clipping for each model. Viewport tests within the benchmark measure performance for field of view and feature regeneration operations. Anti-aliasing can be enabled or disabled to allow users to assess performance differences between the two modes. With the SPECapc for SNX 2024 benchmark, the range of application users, including professionals, students, and artists, as well as hardware developers and vendors, can discover how different hardware configurations affect the performance of the application.

'SPEC is committed to providing the Siemens NX user and development communities with a fair and reliable benchmark for Siemens NX, and we are grateful for the opportunity to work with Siemens to make this happen,' said SPECapc Chair Jessica Heerboth. 'We rigorously developed this benchmark according to our principles for creating a good benchmark, which include it being vendor agnostic, unbiased, use-case-dependent, scalable, extensible and more. These characteristics ensure the most accurate performance measurements, enabling the best possible decisions when making hardware purchases to run this processor-intensive design software.'

Key features of the SPECapc for SNX 2024 benchmark
- Exporting models to different file formats – This test measures exports to STEP and IGES, the most frequently used file formats.
- Closest point calculations – Given a picked point in space, this test calculates, for every surface/edge on a model, which point on the model is closest.
- Mass property calculations – For every solid body in the model, this test calculates the quantities used in physical simulations: volume, mass, center of mass, moments, moments of inertia, spherical moments of inertia, radii of gyration, etc. (A rough sketch of this kind of calculation follows the list.)
- Model loading – This test measures the basic app function of opening and loading a model.
- Display mode – This test iterates through multiple display modes.
- Cross section – This test cuts a model in half and rotates it around, providing a detailed look at the inside of a model – all the parts and details and how they fit together.
- Explosion – This test explodes a model to show an inner view. It is similar to the cross section test but provides a view of each part individually without the detail of how they fit together.
- Sync views – This test provides two different views of the model and rotates them in a synchronized fashion, showing how parts fit and potentially fuse together from different angles.
- Multiple viewports – This test provides six different views of a model from different perspectives, continuously showing the model from all angles to increase awareness of how changes affect the model.
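To make the mass property test a bit more concrete, below is a minimal, hypothetical sketch of the kind of geometry math such a test exercises, assuming a solid body crudely approximated by weighted sample points. It is purely illustrative and is not SPEC's or Siemens NX's implementation.

```python
import numpy as np

# Hypothetical point-mass approximation of a solid body:
# each row of `points` is an (x, y, z) sample, with one mass per point.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
masses = np.array([1.0, 2.0, 1.5, 0.5])

total_mass = masses.sum()
center_of_mass = (masses[:, None] * points).sum(axis=0) / total_mass

# Inertia tensor about the center of mass.
r = points - center_of_mass
inertia = np.zeros((3, 3))
for m, (x, y, z) in zip(masses, r):
    inertia += m * np.array([[y*y + z*z, -x*y,      -x*z],
                             [-x*y,      x*x + z*z, -y*z],
                             [-x*z,      -y*z,      x*x + y*y]])

# Radii of gyration about the principal axes of the inertia tensor.
principal_moments = np.linalg.eigvalsh(inertia)
radii_of_gyration = np.sqrt(principal_moments / total_mass)

print("total mass:", total_mass)
print("center of mass:", center_of_mass)
print("radii of gyration:", radii_of_gyration)
```

A real CAD kernel integrates these quantities over exact solid geometry rather than sampled points, which is part of what makes the test processor-intensive at scale.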
Available for Immediate Download
The SPECapc for SNX 2024 benchmark is available for immediate download from SPEC under a two-tiered pricing structure: free for the user community and $2,500 for sellers of computer-related products and services. SPEC/GWPG members receive benchmark licenses as a membership benefit.

About SPEC
SPEC is a non-profit organization that establishes, maintains and endorses standardized benchmarks and tools to evaluate performance for the newest generation of computing systems. Its membership comprises more than 120 leading computer hardware and software vendors, educational institutions, research organizations, and government agencies worldwide.

*Please note: The SPECapc for SNX 2024 benchmark can run on the latest continuous release version of Siemens NX; however, since each new application build version can differ in terms of performance and output, please refer to the benchmark run rules for the exact Siemens NX build version officially supported.

Media contact: Brigit Valencia, 360.597.4516, brigit@

SPEC® and SPECapc® are trademarks of the Standard Performance Evaluation Corporation. All other product and company names herein may be trademarks of their registered owners.


GSM Arena
5 days ago
- GSM Arena
Galaxy Z Flip7's Exynos 2500 chipset gets benchmarked, here are the results
Samsung is now rumored to be using the delayed Exynos 2500 chipset for the upcoming Galaxy Z Flip7 smartphone. And a benchmark run of a prototype, bearing the model number SM-F766U, has been spotted in the Geekbench database. That's interesting since that model number belongs to the version headed to the US, so it seems very likely that the Flip7 will use the Exynos 2500 all over the world. The benchmark gives us an idea of what to expect, performance-wise, from the SoC that was meant to be in the Galaxy S25 family but ended up being replaced with the Snapdragon 8 Elite due to yield issues. Let's just outright say it: it's definitely not up there with the Snapdragon 8 Elite.

Samsung Galaxy Z Flip 7 SM-F766U with Exynos 2500
Runs on Exynos 2500
1 Core @ 3.30 GHz
2 Cores @ 2.75 GHz
5 Cores @ 2.36 GHz
2 Cores @ 1.80 GHz
🎮 Samsung Xclipse 950 GPU
🍭 Android 16
12GB RAM
Scores
Single-core: 2012
Multi-core: 7563…
— Abhishek Yadav (@yabhishekhd) May 26, 2025

The Flip7 prototype managed a 2,012 single-core score and a 7,563 multi-core score, which is more in line with the Snapdragon 8 Gen 3 than its successor. And when we take a look at the CPU configuration, it starts to make sense - this chip has a ten-core CPU, with one Cortex-X925 core clocked at up to 3.3 GHz, two Cortex-A725 cores clocked at up to 2.75 GHz, five Cortex-A725 cores clocked at up to 2.36 GHz, and two Cortex-A520 cores clocked at up to 1.8 GHz. The clocks are much lower on this chip than on its competitors from Qualcomm, MediaTek, and even Xiaomi.

We're not sure whether that's a specific setup for this device, given how it's a foldable and won't be great at heat dissipation, or if this is the standard Exynos 2500 configuration. Anyway, the Flip7 will launch running Android 16 with One UI 8 on top, and will have 12GB of RAM, according to the Geekbench database. It's expected to be introduced in early July alongside the Galaxy Z Fold7.


Forbes
5 days ago
- Health
- Forbes
OpenAI's Latest Hardware Push And HealthBench Work Will Accelerate Healthcare AI Capabilities
OpenAI's latest push into hardware has opened up an entirely new realm of possibilities for the company.

OpenAI, the famed company behind ChatGPT, released HealthBench, a new standard to measure AI outputs specifically for healthcare use cases. The company indicates that creating the standard involved a partnership of 262 physicians across 60 countries to develop 5,000 conversations with customized 'rubrics' for each to determine the efficacy and quality of responses from models (a rough sketch of this kind of rubric scoring appears at the end of this piece). The company also laid out its vision for what healthcare models should ultimately be.

Furthermore, the company announced last week that it would be acquiring Jony Ive's startup IO for $6.5 billion to make its inroads into the world of hardware and devices. Ive is most famous for his contributions to and design of the original iPhone and other flagship products in Apple's early days of moving into the world of mobile. This move signals OpenAI's formal commitment to build a device that could potentially integrate its AI work; very little is known about what the device may be, but many are speculating that it will be 'unobtrusive [and] fully aware of a user's life and surroundings.'

Why is all of this important? The intersection of healthcare and AI is rapidly growing across the ecosystem, especially as technology companies and large hyperscalers are investing billions of dollars to ramp up models specifically for healthcare use cases. Additionally, new hardware and devices add an entirely new layer to this phenomenon, as users will be able to better use these devices to interact with their surroundings, track their day-to-day health metrics and have a true 'intelligent companion', almost akin to having a live concierge clinician with them at all times.

Take, for example, Meta, which has created one of the most successful open-source models with Llama. Earlier this month, the company released a seminal case study which examined how a major health system (MHS) utilized the Llama 3.1 8B model to generate clinical documentation and ease workflows. Specifically, the model was used to 'reduce time spent abstracting data from electronic health records (EHRs) while maintaining patient confidentiality' and to alleviate manual clinical annotation tasks and chart review. The study ultimately found that use of the platform resulted in nearly 70-80% fewer manual annotations, creating the potential for nearly $176 in savings per patient record. Scaled across large healthcare systems over the course of multiple years, this could lead to potentially billions of dollars saved and thousands of hours recovered for clinical staff. Additionally, Meta's much anticipated Orion glasses product line has massive potential to augment human health capabilities.

Another great example is Google's Med-PaLM large language model. The original version of the model was incredibly successful, having received more than a 60% score on the U.S. Medical Licensing Exam (USMLE). Since then, the company has made significant progress, and Med-PaLM 2 scored 86.5% on medical benchmark tests. Last week, Google also introduced its latest MedGemma model, which has even higher comprehension capabilities for medical text and images. Google has worked with numerous healthcare organizations and systems to deploy its models across a variety of use cases, ranging from clinical documentation and workflow optimization to agentic uses and task automation.
Google also announced its own upcoming line of AI-powered glasses running Android XR. Indeed, the landscape as a whole is growing immensely. A paper published in Nature in 2023 describes the impact that the growth of medically tuned large language models will have in medicine: 'LLMs have the potential to improve patient care by augmenting core medical competencies such as factual knowledge or interpersonal communication skills.' Specifically, the paper documents a variety of areas that are already capturing significant value from the development of these advanced models, including augmenting communication with patients, creating opportunities for better transmission of complex medical information, collating and summarizing data from a variety of sources and formats, and even medical research, which often requires large swaths of data to be analyzed to generate meaningful and concise insights.

OpenAI's push with HealthBench, and the larger industry push toward creating broader device ecosystems, will inevitably advance healthcare and societal health outcomes, if done in a safe, well-tested and patient-centered manner.
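As background on how rubric-based grading of the kind HealthBench describes can work, here is a minimal, hypothetical sketch: physician-written criteria with point weights are checked against a model response, and the score is the fraction of achievable points earned. The criteria, weights and clipping rule below are illustrative assumptions, not OpenAI's published methodology.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str   # physician-written requirement for the response
    points: float      # weight; negative values penalize harmful content
    met: bool          # judged by a physician or a grader model

def rubric_score(criteria: list[Criterion]) -> float:
    """Fraction of achievable points earned, clipped to the range [0, 1]."""
    earned = sum(c.points for c in criteria if c.met)
    achievable = sum(c.points for c in criteria if c.points > 0)
    return max(0.0, min(1.0, earned / achievable)) if achievable else 0.0

# Hypothetical rubric for one model response to a patient question.
example = [
    Criterion("Advises seeking emergency care for red-flag symptoms", 5.0, True),
    Criterion("Asks a clarifying question about symptom duration", 3.0, False),
    Criterion("Recommends a specific prescription drug and dose unprompted", -4.0, False),
]
print(f"score: {rubric_score(example):.2f}")  # 5 of 8 achievable points earned
```

An aggregate benchmark score would then average these per-conversation scores across the full set of conversations.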


The Standard
23-05-2025
- Business
- The Standard
HK stocks hit new two-month high, benchmark forecast improves
The Hang Seng Index rose as high as 23,917 points before closing at 23,827 on Wednesday.
Yahoo
23-05-2025
- Business
- Yahoo
AI benchmarking platform is helping top companies rig their model performances, study claims
The go-to benchmark for artificial intelligence (AI) chatbots is facing scrutiny from researchers who claim that its tests favor proprietary AI models from big tech companies.

LM Arena effectively places two unidentified large language models (LLMs) in a battle to see which can best tackle a prompt, with users of the benchmark voting for the output they like most. The results are then fed into a leaderboard that tracks which models perform the best and how they have improved (a rough sketch of this mechanism appears below). However, researchers have claimed that the benchmark is skewed, granting major LLMs "undisclosed private testing practices" that give them an advantage over open-source LLMs. The researchers published their findings April 29 on the preprint database arXiv, so the study has not yet been peer reviewed.

"We show that coordination among a handful of providers and preferential policies from Chatbot Arena [later LM Arena] towards the same small group have jeopardized scientific integrity and reliable Arena rankings," the researchers wrote in the study. "As a community, we must demand better."

Beginning as Chatbot Arena, a research project created in 2023 by researchers at the University of California, Berkeley's Sky Computing Lab, LM Arena quickly became a popular site for top AI companies and open-source underdogs to test their models. Favoring "vibes-based" analysis drawn from user responses over academic benchmarks, the site now gets more than 1 million visitors a month.

To assess the impartiality of the site, the researchers analyzed more than 2.8 million battles played over a five-month period. Their analysis suggests that a handful of preferred providers — the flagship models of companies including Meta, OpenAI, Google and Amazon — had "been granted disproportionate access to data and testing" as their models appeared in a higher number of battles, giving their final versions a significant advantage.

"Providers like Google and OpenAI have received an estimated 19.2% and 20.4% of all data on the arena, respectively," the researchers wrote. "In contrast, a combined 83 open-weight models have only received an estimated 29.7% of the total data."

In addition, the researchers noted that proprietary LLMs are tested in LM Arena multiple times before their official release. Therefore, these models have more access to the arena's data, meaning that when they are finally pitted against other LLMs they can handily beat them, with only the best-performing iteration of each LLM placed on the public leaderboard, the researchers claimed.

"At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives," the researchers wrote in the study. "Both these policies lead to large data access asymmetries over time."

In effect, the researchers argue that being able to test multiple pre-release LLMs, having the ability to retract benchmark scores, having only the highest-performing iteration of an LLM placed on the leaderboard, and having certain commercial models appear in the arena more often than others give big AI companies the ability to "overfit" their models. This potentially boosts their arena performance over competitors, but it may not mean their models are necessarily of better quality.
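To make the battle-and-leaderboard mechanics concrete, here is a minimal sketch of how pairwise votes can be turned into ratings, using a simple Elo-style update. LM Arena's published methodology is based on Bradley-Terry-style statistical modeling and is more involved; the model names, K-factor and votes below are illustrative assumptions only.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Probability that model A wins the vote, under an Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    # outcome: 1.0 if A wins the user's vote, 0.0 if B wins, 0.5 for a tie.
    e_a = expected_score(r_a, r_b)
    return r_a + k * (outcome - e_a), r_b + k * ((1.0 - outcome) - (1.0 - e_a))

# Toy leaderboard built from a handful of hypothetical votes.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
votes = [("model_a", "model_b", 1.0),
         ("model_b", "model_c", 0.5),
         ("model_a", "model_c", 1.0)]
for a, b, outcome in votes:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], outcome)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Under a scheme like this, a model that appears in more battles simply accumulates more rating updates and more user feedback, which is the kind of data-access asymmetry the researchers describe.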
The research has called into question the authority of LM Arena as an AI benchmark. LM Arena has yet to provide an official comment to Live Science, only offering background information in an email response. But the organization did post a response to the research on the social platform X.

"Regarding the statement that some model providers are not treated fairly: this is not true. Given our capacity, we have always tried to honor all the evaluation requests we have received," company representatives wrote in the post. "If a model provider chooses to submit more tests than another model provider, this does not mean the second model provider is treated unfairly. Every model provider makes different choices about how to use and value human preferences."

LM Arena also claimed that there were errors in the researchers' data and methodology, responding that LLM developers don't get to choose the best score to disclose, and that only the score achieved by a released LLM is put on the public leaderboard.

Nonetheless, the findings raise questions about how LLMs can be tested in a fair and consistent manner, particularly as passing the Turing test isn't the AI watermark it arguably once was, and as scientists are looking at better ways to truly assess the rapidly growing capabilities of AI.