[Inside K-AI] How benchmarks shape AI battlefield -- and where Korea's models stand
The race for sovereign AI is intensifying, with countries rushing to build their own large language models to secure technological independence. Korea is no exception -- the government has tapped five leading companies to spearhead the creation of homegrown models tailored to national priorities. In this high-stakes contest, The Korea Herald launches a special series exploring Korea's AI industry, its standing in the global arena and the rise of Korean-language-focused systems. This first installment looks at benchmarks -- the scorecards of the AI world -- and how Korean models measure up on the tests that are shaping the race. – Ed.
AI has swept across the tech industry, powering chatbots, search engines and productivity tools. OpenAI's ChatGPT -- which first ignited the global buzz in November 2022 -- and other big tech models sit firmly in the top tier, but the surge of large language models shows no sign of slowing.
Each new arrival is touted as the smartest or the first of its kind, outscoring the rest. That raises a key question: how are these models really evaluated, and which is the true leader?
The answer lies in benchmarks -- the standardized tests that have become the AI world's scoreboard, where companies race to climb the rankings and prove their worth.
In July, South Korea's Upstage pulled off an unexpected breakthrough when its 31-billion-parameter Solar Pro 2 became the only Korean model listed as a "frontier model" by UK-based benchmarking platform Artificial Analysis. It ranked just outside the global top 10 for intelligence and placed first in Intelligence vs. Cost to Run, a measure of how much capability a model delivers for its operating cost.
The result prompted swift reaction from Elon Musk, whose AI company xAI is also a relative newcomer battling entrenched leaders. In a post on X, he insisted his Grok 4 model "remains No. 1" and is "rapidly improving" -- a pointed defense that reflects how sensitive and strategic leaderboard positions have become in the global AI race.
When it launched GPT-5 last week, OpenAI likewise promoted the model as "much smarter" than its predecessors and cited its scores on several key benchmarks measuring performance in areas such as math, coding and visual perception.
"For engineers, benchmarks serve as a barometer for how the LLM they developed fares in the global competition, and as a compass for its future development," an official of an LLM startup said.
Constant race to set new records
Much like human IQ tests or university entrance exams, benchmarks offer a structured way to measure various capabilities, from language comprehension and reasoning to code generation, under identical conditions. When an LLM tops a benchmark, it is deemed state of the art (SOTA) for that task, a title that can change hands quickly as new models are released.
MMLU, one of the most widely used benchmarks, poses more than 15,000 multiple-choice questions across 57 subjects. HumanEval and LiveCodeBench test coding ability, while AIME and MATH-500 gauge mathematical reasoning.
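For readers curious how a multiple-choice benchmark turns answers into a percentage, here is a minimal sketch. It assumes each answer is a single option letter and that the score is plain accuracy; the function name and the four toy questions are hypothetical illustrations, not real MMLU items or any official evaluation harness.

```python
from collections import defaultdict


def multiple_choice_accuracy(predictions, answers, subjects):
    """Return overall and per-subject accuracy, in percent.

    Hypothetical helper. predictions/answers are option letters such as "A"-"D";
    subjects gives the subject label of each question (MMLU spans 57 subjects).
    """
    if not (len(predictions) == len(answers) == len(subjects)):
        raise ValueError("inputs must have the same length")
    if not answers:
        raise ValueError("no questions to score")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, gold, subject in zip(predictions, answers, subjects):
        total[subject] += 1
        correct[subject] += int(pred == gold)  # one point per exact match
    # Overall score pools every question together.
    overall = 100 * sum(correct.values()) / len(answers)
    per_subject = {s: 100 * correct[s] / total[s] for s in total}
    return overall, per_subject


# Toy data: four made-up questions across two subjects.
preds = ["A", "C", "B", "D"]
gold = ["A", "B", "B", "D"]
subj = ["history", "history", "physics", "physics"]
overall, per_subject = multiple_choice_accuracy(preds, gold, subj)
print(overall)      # 75.0
print(per_subject)  # {'history': 50.0, 'physics': 100.0}
```

The overall figure here pools all questions; averaging the per-subject scores instead would weight every subject equally, and published results may follow either convention.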
For instance, OpenAI boasted that its new GPT-5 achieved SOTA in math, scoring 94.6 percent on AIME 2025 without tools; in real-world coding, scoring 74.9 percent on SWE-bench Verified; and in multimodal understanding, achieving 84.2 percent on MMMU, among others.
Korean LLM firms are also working fiercely to set new records. When it released its latest model, Exaone 4.0, on July 15, LG AI Research highlighted its strong performance on advanced benchmarks. On MMLU-Pro, the 32-billion-parameter model scored 81.8 percent, ahead of Microsoft's Phi 4 reasoning-plus at 76 percent and Mistral's Magistral Small-2506 at 73.4 percent. On AIME 2025, it also outperformed those rivals with a score of 85.3 percent.
As LLMs advance rapidly, the benchmarks themselves are also evolving. MMLU now offers a Pro edition with more complex reasoning questions. In January, a coalition of 1,000 experts launched Humanity's Last Exam, a 2,500-question test spanning subjects from classical literature to quantum chemistry.
But what often confuses the public is the endless list of scores. Experts note that because LLMs can do so many different things, each has its own strengths -- making it difficult to declare one model "the best" based on a single benchmark.
To make sense of the growing number of benchmark results, platforms like Hugging Face provide leaderboards that compile scores from multiple tests and rank models accordingly. The Artificial Analysis Intelligence Index is another prominent one: it aggregates results from eight advanced benchmarks, including MMLU-Pro, Humanity's Last Exam and AIME, to produce an overall score.
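To illustrate what "aggregating" scores can mean, the sketch below averages each model's benchmark scores with equal weights and ranks the results. The equal weighting, the use of only three of the eight benchmarks and the model names and numbers are all assumptions for illustration; they do not describe the Artificial Analysis methodology or any real model's results.

```python
def composite_index(scores, benchmarks):
    """Equal-weight mean of a model's scores (0-100) on the given benchmarks.

    Assumption: every benchmark counts equally; a real index may weight them differently.
    """
    missing = [b for b in benchmarks if b not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    return sum(scores[b] for b in benchmarks) / len(benchmarks)


def leaderboard(all_scores, benchmarks):
    """Rank models by composite score, highest first."""
    ranked = [(name, composite_index(s, benchmarks)) for name, s in all_scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Three of the benchmarks named in the article; the real index uses eight.
BENCHMARKS = ["MMLU-Pro", "Humanity's Last Exam", "AIME"]

# Illustrative placeholder numbers for hypothetical models, not real results.
toy_scores = {
    "Model A": {"MMLU-Pro": 80.0, "Humanity's Last Exam": 10.0, "AIME": 90.0},
    "Model B": {"MMLU-Pro": 85.0, "Humanity's Last Exam": 20.0, "AIME": 65.0},
}
for rank, (name, index) in enumerate(leaderboard(toy_scores, BENCHMARKS), start=1):
    print(rank, name, round(index, 1))
```

In this toy case Model B wins two of the three tests, yet Model A ranks first overall (60.0 versus about 56.7), which shows why a single composite number can hide the different strengths experts point to.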
With strong scores across multiple benchmarks, LG's Exaone and Upstage's Solar Pro 2 were the only Korean LLMs to make the Artificial Analysis index in July.
At the time of release, Exaone 4.0 ranked 11th globally in the Intelligence Index, standing shoulder to shoulder with big brands such as Google's Gemini, OpenAI's ChatGPT and Alibaba's Qwen.
Upstage's Solar Pro 2 went a step further, becoming the only Korean model recognized in the leaderboard's Frontier Language Model Intelligence category -- reserved for the highest-performing systems at the cutting edge of research and development. It also topped the Intelligence vs. Cost to Run metric.
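One simple way to think about an "intelligence vs. cost" comparison is capability per dollar. The sketch below ranks models by index points divided by running cost; treating the metric as this ratio, measuring cost in US dollars for a full evaluation run, and the model names and figures are all assumptions, not Artificial Analysis's published method or data.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    intelligence: float  # composite index score, e.g. on a 0-100 scale
    cost_to_run: float   # assumed: US dollars to run the full evaluation suite


def value_ranking(models):
    """Rank models by intelligence points per dollar (an assumed metric)."""
    for m in models:
        if m.cost_to_run <= 0:
            raise ValueError(f"{m.name}: cost must be positive")
    return sorted(models, key=lambda m: m.intelligence / m.cost_to_run, reverse=True)


# Placeholder figures for hypothetical models, not measured values.
models = [
    ModelStats("Large model", intelligence=70.0, cost_to_run=1400.0),
    ModelStats("Mid-size model", intelligence=60.0, cost_to_run=200.0),
]
for m in value_ranking(models):
    print(m.name, round(m.intelligence / m.cost_to_run, 3))
```

Under these made-up numbers the mid-size model scores lower on intelligence but comes out ahead on value, the kind of trade-off a cost-adjusted ranking is meant to surface.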
"It is fair to say Korean models are quite competitive, considering their rivals are often several times larger," an LG official said, noting that Grok 4, which held the top spot in the July index, has a staggering 1.7 trillion parameters, meaning far more resources went into training it to reach its intelligence score.
The index has since been updated with more challenging tests and newly released models such as GPT-5, which overtook Grok 4 for the top spot. The changes nudged the Korean models down slightly, though both remain in the global index.
LG AI Research and Upstage have both been named among the government's five consortia tasked with leading the development of South Korea's proprietary AI foundation models, alongside Naver Cloud, SK Telecom and NC AI.
Naver, which became the third company in the world to develop a hyperscale AI model with HyperClova in 2021, has since upgraded its foundation model and in June released HyperClova X Think. The company cites the model's deep understanding of the Korean language as its key strength.
Going beyond benchmarks
A benchmark gains recognition much as a new measurement scale becomes a standard in the social sciences: it is first published in a peer-reviewed paper, then validated at a reputable academic conference and finally adopted by the global AI community, an industry official explained.
As crowded as the AI field is becoming, with one LLM after another touting new benchmark scores, the results still serve an important purpose: they give engineers a yardstick for measuring their progress.
"Global big techs still lead, but players in countries like China, France and Korea are closing in, and the race is intense," an LG official said. "The presence of Korean companies on leaderboards and key benchmarks shows the country is not only catching up but is firmly in the game."
At the same time, the rollout of GPT-5 shows that real-world user experience matters just as much as strong performance on advanced benchmarks. Launched on August 7, the highly anticipated OpenAI model shot to the top of the Artificial Analysis Intelligence Index but has faced backlash from users who say it feels "downgraded," citing a blander personality and surprisingly basic mistakes.
Lee Kyoung-jun, a big data analytics professor at Kyung Hee University, stressed that the true measure of an LLM's competitiveness lies in its practical utility.
"Korean LLMs are making strides in benchmarks, but it's important to note that even major models like Exaone are having little impact on the general public for now," Lee said. "Efforts must continue to ensure these excellent models are applied in real use cases and achieve widespread adoption."
herim@heraldcorp.com