'Complete collapse': Bombshell report into AI accuracy indicates your job is probably safe
The latest form of cutting-edge artificial intelligence technology suffers from 'fundamental limitations' that result in a 'complete accuracy collapse', a bombshell report from Apple has revealed.
Researchers from the tech giant have published a paper with their findings, which cast doubt on the true potential of AI as billions of dollars are poured into developing and rolling out new systems.
The team put large reasoning models – an advanced form of AI used in platforms like DeepSeek and Claude – through a series of puzzle challenges ranging from simple to complex. They also tested large language models, which platforms like ChatGPT are built on.
Large language model AI systems fared better than large reasoning models with fairly standard tasks, but both fell flat when confronting more complex challenges, the paper revealed.
Researchers also found that large reasoning models began 'reducing their reasoning effort' as they struggled to perform, which was 'particularly concerning'.
'Upon approaching a critical threshold – which closely corresponds to their accuracy collapse point – models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty,' the paper read.
The advancement of AI based on current approaches may have reached its limit for now, the findings suggested.
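For the technically minded, the paper's core experiment is simple enough to sketch in code: pose puzzles of rising complexity, then record both whether the model solves them and how much 'reasoning' it emits along the way. The Python below is an illustrative reconstruction only, using Tower of Hanoi (one of the puzzles in Apple's study); `query_model` is a hypothetical stand-in for a real model API that returns a move list and a reasoning-token count.

```python
# Illustrative reconstruction of the paper's setup: puzzles of rising
# complexity, tracking accuracy and reasoning effort at each level.
# query_model() is hypothetical - it stands in for an LLM API call that
# returns (list_of_moves, reasoning_token_count) for an n-disk puzzle.

def is_valid_hanoi_solution(n_disks, moves):
    """Replay a Tower of Hanoi move list and check it solves the puzzle."""
    pegs = {0: list(range(n_disks, 0, -1)), 1: [], 2: []}
    for src, dst in moves:
        if not pegs[src]:
            return False                  # tried to move from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False                  # placed a larger disk on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))

def run_experiment(query_model, max_disks=12, trials=25):
    """Record (complexity, accuracy, mean reasoning tokens) per level."""
    results = []
    for n in range(3, max_disks + 1):
        correct = tokens = 0
        for _ in range(trials):
            moves, reasoning_tokens = query_model(n)   # hypothetical API call
            correct += is_valid_hanoi_solution(n, moves)
            tokens += reasoning_tokens
        results.append((n, correct / trials, tokens / trials))
    return results
```

The collapse Apple describes would show up in such a table as accuracy falling to zero beyond some disk count while the reasoning-token column, counterintuitively, shrinks rather than grows.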
Niusha Shafiabady, an associate professor of computational intelligence at Australian Catholic University and director of the Women in AI for Social Good lab, said 'expecting AI to be a magic wand' is a mistake.
'I have been talking about realistic expectations of AI models since 2024,' Dr Shafiabady said.
'When AI models face countless interactions with the world, it is not possible to investigate and control every single problem that could happen. That is why things could get out of hand or out of control.'
Gary Marcus, a leading voice on AI and the author of six books, delivered a savage analysis of the Apple paper on his popular Substack, describing it as 'pretty devastating'.
'Anybody who thinks [large language models] are a direct route to the [artificial general intelligence] that could fundamentally transform society for the good is kidding themselves,' Dr Marcus wrote.
Dr Marcus then took to X to declare that the hype around AI has become 'a giant game of bait and switch'.
'The bait: we are going to make an AI that can solve any problem an expert human could solve. It's gonna transform the whole world,' Dr Marcus wrote.
'The switch: what we have actually made is fun and kind of amazing in its own way but rarely reliable and often makes mistakes – but ordinary people make mistakes too.'
In the wake of the paper's release, Dr Marcus has re-shared passionate defences of AI posted to X by evangelists playing down the accuracy flaws it exposed.
'Imagine if calculator designers made a calculator that worked 80 per cent correctly and said 'naah, it's fine, people make mistakes too',' Dr Marcus quipped.
Questions about the quality of large language and large reasoning models aren't new.
For example, when OpenAI released its o3 and o4-mini models in April, it described them as its 'smartest and most capable' yet, trained to 'think for longer before responding'.
'The combined power of state-of-the-art reasoning with full tool access translates into significantly stronger performance across academic benchmarks and real-world tasks, setting a new standard in both intelligence and usefulness,' the company's announcement read.
But testing by American university MIT found the o3 model was incorrect 51 per cent of the time, while o4-mini performed even worse, with an error rate of 79 per cent.
Truth and accuracy undermined
Apple recently suspended its AI-powered news alert feature on iPhones after users reported significant accuracy errors.
Among the jaw-dropping mistakes were alerts claiming tennis icon Rafael Nadal had come out as gay, that alleged UnitedHealthcare CEO shooter Luigi Mangione had died by suicide in prison, and that a winner had been crowned at the World Darts Championship hours before competition began.
Research conducted by the BBC found a litany of errors across other AI assistants providing information about news events, including Google's Gemini, OpenAI's ChatGPT and Microsoft's Copilot.
It found 51 per cent of all AI-generated answers to queries about the news had 'significant issues' of some form. When it looked at how its own news coverage was being represented, the BBC found 19 per cent of answers citing its content were factually incorrect.
And in 13 per cent of cases, quotes said to be contained within BBC stories had either been altered or entirely fabricated.
Meanwhile, a Chicago newspaper was left red-faced recently after it published a summer reading list featuring multiple books that don't exist, the result of story copy produced by AI.
And last year, hundreds of people who lined the streets of Dublin were disappointed when it turned out the Halloween parade advertised on an events website had been invented.
Google was among the first of the tech giants to roll out AI in search, using a large language model to summarise results – with some hilarious and possibly dangerous consequences.
Among them were suggestions to add glue to pizza, eat a rock a day to maintain health, take a bath with a toaster to cope with stress, drink two litres of urine to help pass kidney stones and chew tobacco to reduce the risk of cancer.
Jobs might be safe – for now
Ongoing issues with accuracy might have some companies thinking twice about going all-in on AI when it comes to substituting their workforces.
So too might some recent examples of the pitfalls of people being replaced with computers.
Buy now, pay later platform Klarna shed more than 1000 people from its global workforce as part of a dramatic shift to AI resourcing, sparked by its partnership with OpenAI, forged in 2023.
But last month, the Swedish firm conceded its strong reliance on AI customer service chatbots – which saw its employee count almost halved in two years – had created quality issues and led to a slump in customer satisfaction.
Realising most customers prefer interacting with a human, Klarna has begun hiring back actual workers.
Software company Anysphere faced a customer backlash in April when its AI-powered support chatbot went rogue, kicking users out of the code-editing platform Cursor and delivering incorrect information.
It then seemingly 'created' a new user policy out of thin air to justify the logouts – that the platform couldn't be used across multiple computers. Cursor saw a flood of customer cancellations as a result.
AI adviser and former Google chief decision scientist Cassie Kozyrkov took to LinkedIn to share her thoughts on the saga, dubbing it a 'viral hot mess'.
'It failed to tell users that its customer support 'person' Sam is actually a hallucinating bot,' Ms Kozyrkov wrote. 'It's only going to get worse with AI agents.'
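The failure mode Ms Kozyrkov describes – a support bot that neither discloses it is automated nor sticks to real policy – can be blunted with basic guardrails. The Python sketch below is illustrative only: `generate_draft_reply` is a hypothetical LLM call, the policy file name is invented, and the keyword-overlap grounding check is a deliberately crude stand-in for the retrieval and entailment checks a production system would use.

```python
# Illustrative guardrail against a Cursor-style incident: always disclose
# that the agent is automated, and refuse to assert any "policy" that
# cannot be matched against the company's actual policy text.

def load_policy_text(path="support_policies.txt"):    # hypothetical file
    with open(path, encoding="utf-8") as f:
        return f.read().lower()

def policy_claims(draft):
    """Sentences in which the draft asserts that some policy exists."""
    return [s for s in draft.split(".") if "policy" in s.lower()]

def safe_reply(question, generate_draft_reply, policy_text):
    draft = generate_draft_reply(question)             # hypothetical LLM call
    for claim in policy_claims(draft):
        # Crude grounding test: some keyword from the claim must appear
        # in the real policy document, or the bot escalates to a human.
        keywords = [w for w in claim.lower().split() if len(w) > 4]
        if not any(k in policy_text for k in keywords):
            return ("[Automated assistant] I can't verify that policy, "
                    "so I'm escalating you to a human agent.")
    return "[Automated assistant] " + draft
```

The disclosure prefix answers Ms Kozyrkov's specific complaint; the grounding check is the sort of measure that could have stopped the bot from inventing a multiple-computer policy out of thin air.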
Many companies pushing AI insist the technology is improving swiftly, but a host of experts aren't convinced its hype matches its ability.
Earlier this year, the Association for the Advancement of Artificial Intelligence surveyed two dozen AI specialists and some 400 of the group's members and found a surprising level of pessimism about the potential of the technology.
Sixty per cent of those surveyed did not believe problems with factuality and trustworthiness 'would soon be solved', it found.
Issues of accuracy and reliability are important, not just for growing public trust in AI, but for preventing unintended consequences in the future, AAAI president Francesca Rossi wrote in a report about the survey.
'We all need to work together to advance AI in a responsible way, to make sure that technological progress supports the progress of humanity and is aligned to human values,' Ms Rossi said.
Projects stalled or abandoned
Embarrassing and potentially costly issues like these are contributing to a backtrack. Analysis by S&P Global Market Intelligence shows the share of American and European companies abandoning their AI initiatives rose to 42 per cent this year, from 17 per cent in 2024.
And a study released last month by consulting firm Roland Berger found a mammoth investment in AI technology wasn't translating to useful outcomes for many businesses.
Spending on AI by corporates in Europe hit an estimated US$14 billion (AU$21.4 billion) in 2024, but just 27 per cent of the businesses surveyed were able to fully integrate the technology into their operations or workflows, the research revealed.
'Asked about the key challenges involved in implementing AI projects, 28 per cent of respondents cited issues with data, 25 per cent referenced the complexity of integrating AI use cases, and 15 per cent mentioned the difficulty of finding enough AI and data experts,' the study found.
Those findings were mirrored in an IBM survey, which found just one in four AI projects delivered the returns they promised.
Dr Shafiabady said there are a few reasons for problems facing AI, like those identified in Apple's research.
'When dealing with highly complex problems, these types of complex AI models can't give an accurate solution. One of the reasons why is the innate nature of algorithms,' Dr Shafiabady said.
'Models are built on mathematical computational iterative algorithms that are coded into computers to be processed. When tasks get very complicated, these algorithms won't necessarily follow logical reasoning and will lose track of it.
'Sometimes when the problem gets harder, all the computing power and time in the world won't enhance an AI model's performance. Sometimes when it hits very difficult tasks, it fails because it has learnt the example rather than the hidden patterns in the data.
'And sometimes when the problem gets complicated, a lot of computational resource and time is wasted exploring the wrong solutions, and there is not enough 'energy' left to reach the right solution.'
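Her point about models that have 'learnt the example rather than the hidden patterns' is what machine-learning practitioners call overfitting, and it can be shown with a toy example far removed from the models Apple tested. In the Python sketch below (assuming only numpy), a degree-9 polynomial threads all ten noisy training points almost exactly, yet typically predicts fresh points worse than a simple straight line that captures the true relationship.

```python
# Toy illustration of overfitting: memorising noisy examples versus
# learning the hidden pattern (here, the true rule is y = 2x).
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.1, 10)   # noisy samples of y = 2x
x_test = np.linspace(0, 1, 100)                  # fresh, noise-free points
y_test = 2 * x_test

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)        # fit polynomial
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_err:.4f}, "
          f"test error {test_err:.4f}")
```

The flexible model wins on the data it has already seen and tends to lose on the data it hasn't – the same gap between memorisation and understanding that Dr Shafiabady describes.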
Artificial intelligence is becoming a must-have feature in modern smartphones but what the technology does differs from brand to brand. Apple announced several upcoming AI software features at its Worldwide Developers Conference in Cupertino this week, although some are similar to offerings from its competitors. Here are five ways it will be possible to use AI on smartphones this year, depending on which one sits inside your pocket. Screen phone calls: Triggered when a phone call arrives from an unknown number, Apple's Call Screening feature will use an AI-powered voice to request a caller's name and reason for calling and pass them on to the recipient before the phone rings. The feature is similar to Google's Call Screen feature introduced to Pixel smartphones in Australia and could help to reduce spam calls. Search for what you see: Google introduced an AI-powered feature called Circle to Search for Samsung and Android smartphones last year, letting users select an image on their phone's screen to look for other references online. Apple will extend its Visual Intelligence AI feature to do this in its spring software update, offering an option to look up images with Google or in apps below screenshots. Stay on hold: Artificial intelligence can be used to detect on-hold music using Apple's upcoming Hold Assist feature, mute the soundtrack to save your ears and save your place in line. The feature will send users a notification when it detects a human answer the phone. Google offers a similar feature on its Pixel smartphones called Hold For Me. Translate foreign languages: Bridging the language barrier is becoming a task increasingly handed to AI software. Apple's Live Translation feature will work across FaceTime, messaging and phone apps, for example, and promises to convert spoken and written language in real time. Samsung also offers in-call help in its AI-powered Live Translate feature and Google recently announced speech translation in its Meet video app. Guided workouts: An AI voice created using input from Apple Fitness trainers will be used to motivate and guide users through exercise in a forthcoming feature called Workout Buddy. The upbeat narration will announce runners' split times or cyclists' work-out history while they exercise, although the feature requires the use of an Apple Watch, AI-compatible iPhone and Bluetooth headphones. This AAP article was made possible by support from Apple.