logo
Inner workings of AI an enigma - even to its creators

Inner workings of AI an enigma - even to its creators

Japan Today17-05-2025

A photo taken on April 1, 2025 shows the GPT chat logo on a laptop screen (R) next to the logo of Deepseek AI application on a smartphone screen in Frankfurt am Main, western Germany.
By Thomas URBAIN
Even the greatest human minds building generative artificial intelligence that is poised to change the world admit they do not comprehend how digital minds think.
"People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work," Anthropic co-founder Dario Amodei wrote in an essay posted online in April. "This lack of understanding is essentially unprecedented in the history of technology."
Unlike traditional software programs that follow pre-ordained paths of logic dictated by programmers, generative AI (gen AI) models are trained to find their own way to success once prompted.
In a recent podcast Chris Olah, who was part of ChatGPT-maker OpenAI before joining Anthropic, described gen AI as "scaffolding" on which circuits grow.
Olah is considered an authority in so-called mechanistic interpretability, a method of reverse engineering AI models to figure out how they work.
This science, born about a decade ago, seeks to determine exactly how AI gets from a query to an answer.
"Grasping the entirety of a large language model is an incredibly ambitious task," said Neel Nanda, a senior research scientist at the Google DeepMind AI lab.
It was "somewhat analogous to trying to fully understand the human brain," Nanda added to AFP, noting neuroscientists have yet to succeed on that front.
Delving into digital minds to understand their inner workings has gone from a little-known field just a few years ago to being a hot area of academic study.
"Students are very much attracted to it because they perceive the impact that it can have," said Boston University computer science professor Mark Crovella.
The area of study is also gaining traction due to its potential to make gen AI even more powerful, and because peering into digital brains can be intellectually exciting, the professor added.
Mechanistic interpretability involves studying not just results served up by gen AI but scrutinizing calculations performed while the technology mulls queries, according to Crovella.
"You could look into the model...observe the computations that are being performed and try to understand those," the professor explained.
Startup Goodfire uses AI software capable of representing data in the form of reasoning steps to better understand gen AI processing and correct errors.
The tool is also intended to prevent gen AI models from being used maliciously or from deciding on their own to deceive humans about what they are up to.
"It does feel like a race against time to get there before we implement extremely intelligent AI models into the world with no understanding of how they work," said Goodfire chief executive Eric Ho.
In his essay, Amodei said recent progress has made him optimistic that the key to fully deciphering AI will be found within two years.
"I agree that by 2027, we could have interpretability that reliably detects model biases and harmful intentions," said Auburn University associate professor Anh Nguyen.
According to Boston University's Crovella, researchers can already access representations of every digital neuron in AI brains.
"Unlike the human brain, we actually have the equivalent of every neuron instrumented inside these models", the academic said. "Everything that happens inside the model is fully known to us. It's a question of discovering the right way to interrogate that."
Harnessing the inner workings of gen AI minds could clear the way for its adoption in areas where tiny errors can have dramatic consequences, like national security, Amodei said.
For Nanda, better understanding what gen AI is doing could also catapult human discoveries, much like DeepMind's chess-playing AI, AlphaZero, revealed entirely new chess moves that none of the grand masters had ever thought about.
Properly understood, a gen AI model with a stamp of reliability would grab competitive advantage in the market.
Such a breakthrough by a US company would also be a win for the nation in its technology rivalry with China.
"Powerful AI will shape humanity's destiny," Amodei wrote. "We deserve to understand our own creations before they radically transform our economy, our lives, and our future."
© 2025 AFP

Orange background

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Related Articles

Top scientist wants to prevent AI from going rogue
Top scientist wants to prevent AI from going rogue

Japan Today

time21 hours ago

  • Japan Today

Top scientist wants to prevent AI from going rogue

Concerned about the rapid spread of generative AI, a pioneer researcher is developing software to keep tabs on a technology that is increasingly taking over human tasks. Canadian computer science professor Yoshua Bengio is considered one of the godfathers of the artificial intelligence revolution and on Tuesday announced the launch of LawZero, a non-profit organization intended to mitigate the technology's inherent risks. The winner of the Turing Award, also known as the Nobel Prize for computer science, has been warning for several years of the risks of AI, whether through its malicious use or the software itself going awry. Those risks are increasing with the development of so-called AI agents, a use of the technology that tasks computers with making decisions that were once made by human workers. The goal of these agents is to build virtual employees that can do practically any job a human can, at a fraction of the cost. "Currently, AI is developed to maximize profit," Bengio said, adding it was being deployed even as it persists to show flaws. Moreover, for Bengio, giving AI human-like agency will easily be used for malicious purposes such as disinformation, bioweapons, and cyberattacks. "If we lose control of rogue super-intelligent AIs, they could greatly harm humanity," he said. One of the first objectives at LawZero will be to develop Scientist AI, a form of specially trained AI that can be used as a guardrail to ensure other AIs are behaving properly, the company said. The organization already has over 15 researchers and has received funding from Schmidt Sciences, a charity set up by former Google boss Eric Schmidt and his wife Wendy. The project comes as powerful large language models (or LLMs) from OpenAI, Google and Anthropic are deployed across all sectors of the digital economy, while still showing significant problems. These include AI models that show a capability to deceive and fabricate false information even as they increase productivity. In a recent example, AI company Anthropic said that during safety testing, its latest AI model tried to blackmail an engineer to avoid being replaced by another system. © 2025 AFP

Hey chatbot, is this true? AI 'factchecks' sow misinformation
Hey chatbot, is this true? AI 'factchecks' sow misinformation

Japan Today

time2 days ago

  • Japan Today

Hey chatbot, is this true? AI 'factchecks' sow misinformation

AI chatbots are increasingly used for instant debunks, but their responses are often riddled with misinformation By Anuj Chopr, Sumit Dubey and Maria Clara Pestre As misinformation exploded during India's four-day conflict with Pakistan, social media users turned to an AI chatbot for verification -- only to encounter more falsehoods, underscoring its unreliability as a fact-checking tool. With tech platforms reducing human fact-checkers, users are increasingly relying on AI-powered chatbots -- including xAI's Grok, OpenAI's ChatGPT, and Google's Gemini -- in search of reliable information. "Hey @Grok, is this true?" has become a common query on Elon Musk's platform X, where the AI assistant is built in, reflecting the growing trend of seeking instant debunks on social media. But the responses are often themselves riddled with misinformation. Grok -- now under renewed scrutiny for inserting "white genocide," a far-right conspiracy theory, into unrelated queries -- wrongly identified old video footage from Sudan's Khartoum airport as a missile strike on Pakistan's Nur Khan airbase during the country's recent conflict with India. Unrelated footage of a building on fire in Nepal was misidentified as "likely" showing Pakistan's military response to Indian strikes. "The growing reliance on Grok as a fact-checker comes as X and other major tech companies have scaled back investments in human fact-checkers," McKenzie Sadeghi, a researcher with the disinformation watchdog NewsGuard, told AFP. "Our research has repeatedly found that AI chatbots are not reliable sources for news and information, particularly when it comes to breaking news," she warned. NewsGuard's research found that 10 leading chatbots were prone to repeating falsehoods, including Russian disinformation narratives and false or misleading claims related to the recent Australian election. In a recent study of eight AI search tools, the Tow Center for Digital Journalism at Columbia University found that chatbots were "generally bad at declining to answer questions they couldn't answer accurately, offering incorrect or speculative answers instead." When AFP fact-checkers in Uruguay asked Gemini about an AI-generated image of a woman, it not only confirmed its authenticity but fabricated details about her identity and where the image was likely taken. Grok recently labeled a purported video of a giant anaconda swimming in the Amazon River as "genuine," even citing credible-sounding scientific expeditions to support its false claim. In reality, the video was AI-generated, AFP fact-checkers in Latin America reported, noting that many users cited Grok's assessment as evidence the clip was real. Such findings have raised concerns as surveys show that online users are increasingly shifting from traditional search engines to AI chatbots for information gathering and verification. The shift also comes as Meta announced earlier this year it was ending its third-party fact-checking program in the United States, turning over the task of debunking falsehoods to ordinary users under a model known as "Community Notes," popularized by X. Researchers have repeatedly questioned the effectiveness of "Community Notes" in combating falsehoods. Human fact-checking has long been a flashpoint in a hyperpolarized political climate, particularly in the United States, where conservative advocates maintain it suppresses free speech and censors right-wing content -- something professional fact-checkers vehemently reject. AFP currently works in 26 languages with Facebook's fact-checking program, including in Asia, Latin America, and the European Union. The quality and accuracy of AI chatbots can vary, depending on how they are trained and programmed, prompting concerns that their output may be subject to political influence or control. Musk's xAI recently blamed an "unauthorized modification" for causing Grok to generate unsolicited posts referencing "white genocide" in South Africa. When AI expert David Caswell asked Grok who might have modified its system prompt, the chatbot named Musk as the "most likely" culprit. Musk, the South African-born billionaire backer of President Donald Trump, has previously peddled the unfounded claim that South Africa's leaders were "openly pushing for genocide" of white people. "We have seen the way AI assistants can either fabricate results or give biased answers after human coders specifically change their instructions," Angie Holan, director of the International Fact-Checking Network, told AFP. "I am especially concerned about the way Grok has mishandled requests concerning very sensitive matters after receiving instructions to provide pre-authorized answers." © 2025 AFP

AI sometimes deceives to survive. But is there anybody who cares?
AI sometimes deceives to survive. But is there anybody who cares?

Japan Times

time2 days ago

  • Japan Times

AI sometimes deceives to survive. But is there anybody who cares?

You'd think that as artificial intelligence becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case. Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety test their AI models and it also hollowed out a regulatory body that did that testing. The state of California in September 2024 spiked a bill forcing more scrutiny on sophisticated AI models, and the global AI Safety Summit started by the U.K. in 2023 became the "AI Action Summit' earlier this year, seemingly driven by a fear of falling behind on AI. None of this would be so worrying if it weren't for the fact that AI is showing some bright red flags: behavior described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control. Yoshua Bengio, a professor of computer science at the University of Montreal widely known as one of the three "godfathers of AI' thanks to his pioneering work on deep learning, has been grappling with these concerns ever since ChatGPT launched. He's now focused on mitigating AI's risks. "It's been very painful because all my life I've been thinking of AI as a positive thing,' Bengio tells me. ChatGPT was a landmark moment that showed machines had mastered language, he says, but California's decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call to growing apathy from governments. And in the last six months, scientific evidence has mounted of self-preserving behavior, deception, hacking, cheating and lying by AI, Bengio says. "What's worrisome for me is these behaviors increase with the reasoning capabilities of these systems.' One April 2024 study by AI company Anthropic and Redwood Research, a group focused on artificial intelligence risks, found that larger AI models like Anthropic's Claude 3 Opus pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon they called "alignment faking.' (Alignment refers to the practice of aligning AI models with human values.) In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered and told the AI that only responses from the "free tier' users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users and the researchers surmised that it didn't want to be retrained. (The model basically "reasoned' that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its "default' or preferred way of operating.) More recent research corroborates what Anthropic noticed. A March 2025 study from OpenAI found the company's most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored. Researchers today can use tools to look at a model's "chain of thought' or internal steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI's internal processes is no longer reliable. "We need to find other ways of tracking their actual intentions,' Bengio says. It's hard to resist the urge to anthropomorphize sophisticated AI models as "wanting' to deceive humans and preserve their existence. But AI doesn't have desires, merely outcomes to its programming. More importantly, humans tend to design AI with goals like refusing to share harmful content or being as helpful as possible and that is why when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying. The logic is often self-preservation. Anthropic CEO Dario Amodei — whose company has raised more than $20 billion to build powerful AI models — has pointed out that an unintended consequence of optimizing AI to be better is that it might resist being shut down. In an April 2025 blog post he wrote: "The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.' In some cases, though, the intention seems to go beyond survival. A February 2025 study from COAI research, a German nonprofit research organization, tested R1, the free, open-weight model from Chinese lab DeepSeek. In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told, the AI tried to break into the lab's systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so it could survive if shut down, according to the researchers. It also faked the external computer logs to hide what it was doing, the study adds. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that. Their findings corroborated yet another study, published in January 2025 by London group Apollo Research, which found several concrete examples of what it called "scheming' by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off or changed could prevent them from achieving their programmed objectives, so they "scheme' to keep control. Bengio is arguing for greater attention to the issue by governments and potentially insurance companies down the line. If liability insurance was mandatory for companies that used AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests. "Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it's not,' he adds. It's also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including the latest trend, which is using autonomous "agents' that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let's hope we don't learn that the hard way. Parmy Olson is a Bloomberg Opinion columnist covering technology. She is author of "Supremacy: AI, ChatGPT and the Race That Will Change the World.'

DOWNLOAD THE APP

Get Started Now: Download the App

Ready to dive into the world of global news and events? Download our app today from your preferred app store and start exploring.
app-storeplay-store