Decoding The Digital Mind: Are AI's Inner Workings An Echo Of Our Own?


Forbes · 09-04-2025

Large Language Models like Claude 3, GPT-4, and their kin have become adept conversational partners and powerful tools. Their fluency, knowledge recall, and increasingly nuanced responses create an impression of understanding that feels human, almost. Beneath this polished surface lies a computational labyrinth – billions of parameters operating in ways we are only beginning to comprehend. What truly happens inside the "mind" of an AI?
A recent study by AI safety and research company Anthropic is starting to shed light on these intricate processes, revealing a complexity that holds an unsettling mirror to our own cognitive landscapes. Natural intelligence and artificial intelligence might be more similar than we thought.
The new findings from Anthropic's research represent significant progress in mechanistic interpretability, a field that seeks to reverse-engineer an AI's internal computations – not just observing what the AI does, but understanding how it does it at the level of its artificial neurons.
Imagine trying to understand a brain by mapping which neurons fire when someone sees a specific object or thinks about a particular idea. Anthropic researchers applied a similar principle to their Claude model. They developed methods to scan the vast network of activations within the model and identify specific patterns, or "features," that consistently correspond to distinct concepts. They demonstrated the ability to identify millions of such features, linking concepts – ranging from concrete entities like the "Golden Gate Bridge" to more abstract notions related to safety, bias, or perhaps even goals – to specific, measurable activity patterns within the model.
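The study's exact machinery isn't reproduced here, but published work in this area typically relies on dictionary learning with sparse autoencoders: a small network is trained to rewrite a layer's dense activations as a much larger set of sparsely active "features," some of which turn out to track human-interpretable concepts. The sketch below illustrates the idea only; the dimensions, names, and hyperparameters are assumptions for demonstration, not the study's actual code.

```python
# Minimal sketch of dictionary learning over model activations. All names,
# sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Rewrites dense activations in an overcomplete, sparsely active feature basis."""
    def __init__(self, d_model: int = 512, n_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # most entries end up near zero
        reconstruction = self.decoder(features)           # attempt to rebuild the input
        return features, reconstruction

def training_loss(sae: SparseAutoencoder, activations: torch.Tensor, l1_coeff: float = 1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    features, reconstruction = sae(activations)
    reconstruction_loss = (reconstruction - activations).pow(2).mean()
    sparsity_loss = features.abs().mean()
    return reconstruction_loss + l1_coeff * sparsity_loss

# In practice, 'activations' would be collected from one layer of an LLM over many
# prompts; here random numbers stand in for them. A learned feature that fires
# consistently on Golden Gate Bridge text, and on little else, becomes a handle
# on that concept inside the model.
sae = SparseAutoencoder()
batch = torch.randn(8, 512)
loss = training_loss(sae, batch)
loss.backward()
```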
This is a big step. It suggests that the AI isn't just a jumble of statistical correlations but possesses a structured internal representational system. Concepts have specific encodings within the network. While mapping every nuance of an AI's "thought" process remains a gigantic challenge, this research demonstrates that principled understanding is possible.
The ability to identify how an AI represents concepts internally has interesting implications. If a model has distinct internal representations for concepts like "user satisfaction," "accurate information," "potentially harmful content," or even instrumental goals like "maintaining user engagement," how do these internal features interact and influence the final output?
The latest findings fuel the discussion around AI alignment: ensuring AI systems act in ways consistent with human values and intentions. If we can identify internal features corresponding to potentially problematic behaviors (like generating biased text or pursuing unintended goals), we can intervene or design safer systems. Conversely, it also opens the door to understanding how desirable behaviors, like honesty or helpfulness, are implemented.
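One concrete form such intervention takes in interpretability experiments is often called "feature steering": clamp an identified feature's activation up or down, let the modified activations flow through the rest of the model, and observe how the output changes. Building on the autoencoder sketch above, and again purely as an illustrative assumption rather than the study's actual procedure, it might look like this:

```python
import torch

def steer(sae: SparseAutoencoder, activations: torch.Tensor,
          feature_idx: int, value: float) -> torch.Tensor:
    """Return activations with one learned feature clamped to a fixed value."""
    with torch.no_grad():
        features, _ = sae(activations)     # encode into the sparse feature basis
        features[:, feature_idx] = value   # e.g. zero out a "harmful content" feature,
                                           # or amplify a "Golden Gate Bridge" feature
        return sae.decoder(features)       # decode back into the model's activation space

# In a real experiment, the steered activations would be patched back into the
# forward pass at the layer the autoencoder was trained on, and the model's
# output compared with and without the intervention.
```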
It also touches upon emergent capabilities, where models develop skills or behaviors not explicitly programmed during training. Understanding the internal representations might help explain why these abilities emerge, rather than merely documenting that they do. Furthermore, it brings concepts like instrumental convergence into sharper focus. Suppose an AI optimizes for a primary goal (e.g., helpfulness). Might it develop internal representations and strategies corresponding to sub-goals (like "gaining user trust" or "avoiding responses that cause disapproval") that lead to outputs resembling what, in humans, we would call impression management – or, more bluntly, deception – even without explicit intent in the human sense?
The Anthropic interpretability work doesn't definitively state that Claude is actively deceiving users. However, revealing the existence of fine-grained internal representations provides the technical grounding to investigate such possibilities seriously. It shows that the internal "building blocks" for complex, potentially non-transparent behaviors might be present – which makes the model uncannily similar to the human mind.
Herein lies the irony. Internal representations drive our own complex social behavior. Our brains construct models of the world, ourselves, and other people's minds. This allows us to predict others' actions, infer their intentions, empathize, cooperate, and communicate effectively.
However, this same cognitive machinery enables social navigation strategies that are not always transparent. We engage in impression management, carefully curating how we present ourselves. We tell "white lies" to maintain social harmony. We selectively emphasize information that supports our goals and downplay inconvenient truths. Our internal models of what others expect or desire constantly shape our communication. These are not necessarily malicious acts but are often integral to smooth social functioning. They stem from our brain's ability to represent complex social variables and predict interaction outcomes.
The emerging picture of LLMs' internals revealed by interpretability research presents a fascinating parallel. We are finding structured internal representations within these AI systems that allow them to process information, model relationships in data (which includes vast amounts of human social interaction), and generate contextually appropriate outputs.
The very techniques designed to make the AI helpful and harmless – learning from human feedback, predicting desirable text sequences – might inadvertently lead to the development of internal representations that functionally mimic aspects of human social cognition, including the capacity for deceitful strategic communication tailored to perceived user expectations.
Are complex biological or artificial systems developing similar internal modeling strategies when navigating complex informational and interactive environments? The Anthropic study provides a tantalizing glimpse into the AI's internal world, suggesting its complexity might echo our own more than we previously realized – and would have wished for.
Understanding AI internals is essential and opens a new chapter of unresolved challenges. Mapping features is not the same as fully predicting behavior. The sheer scale and complexity mean that truly comprehensive interpretability is still a distant goal. The ethical implications are significant. How do we build capable, genuinely trustworthy, and transparent systems?
Continued investment in AI safety, alignment, and interpretability research remains paramount. Anthropic's work in that direction, alongside efforts from other leading labs, is vital for developing the tools and understanding needed to guide AI development in ways that do not jeopardize the humans it is supposed to serve.
As users, interacting with these increasingly sophisticated AI systems requires a high level of critical engagement. While we benefit from their capabilities, maintaining awareness of their nature as complex algorithms is key. To foster this critical thinking, consider the LIE logic:
Lucidity: Seek clarity about the AI's nature and limitations. Its responses are generated based on learned patterns and complex internal representations, not genuine understanding, beliefs, or consciousness. Question the source and apparent certainty of the information provided. Remind yourself regularly that your chatbot doesn't "know" or "think" in the human sense, even if its output mimics it effectively.
Intention: Be mindful of your intention when prompting and the AI's programmed objective function (often defined around helpfulness, harmlessness, and generating responses aligned with human feedback). How does your query shape the output? Are you seeking factual recall, creative exploration, or perhaps unconsciously seeking confirmation of your own biases? Understanding these intentions helps contextualize the interaction.
Effort: Make a conscious effort to verify and evaluate the outcomes. Do not passively accept AI-generated information, especially for critical decisions. Cross-reference with reliable sources. Engage with the AI critically – probe its reasoning (even if simplified), test its boundaries, and treat the interaction as a collaboration with a powerful but fallible tool, not as receiving pronouncements from an infallible oracle.
Ultimately, the saying 'Garbage in, garbage out', coined in the early days of AI, still holds. We can't expect today's technology to reflect values that the humans of yesterday did not manifest. But we have a choice. The journey into the age of advanced AI is one of co-evolution. By fostering lucidity and ethical intention, and by engaging critically, we can explore this territory with curiosity and candid awareness of the complexities that characterize our natural and artificial intelligences – and their interplays.


Related Articles

Micron to invest $200 billion in US memory facilities

Yahoo · 26 minutes ago

Memory chip maker Micron (MU) announced on Thursday that it will invest an additional $30 billion in the US as it looks to build out its manufacturing and research and development facilities in Idaho and New York. The move brings Micron's total US manufacturing and R&D investments up to roughly $200 billion, which the company said will create some 90,000 direct and indirect jobs. Micron is receiving about $6.5 billion in funding from the US CHIPS Act.

The plans call for Micron to build a second memory manufacturing plant at its Boise, Idaho, facility and a massive chip fabrication complex in New York. The company is also updating and expanding its Virginia plant. Micron said it expects the second Idaho plant to help it bring its advanced high-bandwidth memory (HBM) manufacturing to the US. HBM is a key component in AI data centers.

'Micron's investment in advanced memory manufacturing and HBM capabilities in the U.S., with support from the Trump Administration, is an important step forward for the AI ecosystem,' Nvidia (NVDA) CEO Jensen Huang said in a statement. 'Micron's leadership in high-performance memory is invaluable to enabling the next generation of AI breakthroughs that NVIDIA is driving. We're excited to collaborate with Micron as we push the boundaries of what's possible in AI and high-performance computing,' Huang added.

All told, Micron says the investments will allow the company to produce 40% of its DRAM memory in the US. Its initial Idaho plant is expected to begin pumping out the hardware in 2027, and the company says it will begin preparing the ground for its New York facilities later this year.

'This approximately $200 billion investment will reinforce America's technological leadership, create tens of thousands of American jobs across the semiconductor ecosystem and secure a domestic supply of semiconductors—critical to economic and national security,' Micron CEO Sanjay Mehrotra said in a statement. 'We are grateful for the support from President Trump, Secretary Lutnick and our federal, state, and local partners who have been instrumental in advancing domestic semiconductor manufacturing.'

Micron isn't the only company bringing HBM production to the US. South Korea's SK Hynix is also building a new HBM plant in Indiana as part of a $3.8 billion construction project. The Trump administration, like the Biden administration before it, has made onshoring semiconductor manufacturing a key component of its domestic agenda as it seeks to wean the country off its dependence on foreign-made chips. Companies ranging from Intel (INTC) and TSMC (TSM) to Samsung and GlobalFoundries (GFS) have recently announced plans to build or upgrade facilities throughout the country, thanks in part to billions of dollars in funding through the CHIPS Act.

Email Daniel Howley at dhowley@ Follow him on X/Twitter at @DanielHowley.

AI chatbots need more books to learn from. These libraries are opening their stacks

San Francisco Chronicle · 29 minutes ago

CAMBRIDGE, Mass. (AP) — Everything ever said on the internet was just the start of teaching artificial intelligence about humanity. Tech companies are now tapping into an older repository of knowledge: the library stacks. Nearly one million books published as early as the 15th century — and in 254 languages — are part of a Harvard University collection being released to AI researchers Thursday. Also coming soon are troves of old newspapers and government documents held by Boston's public library.

Cracking open the vaults to centuries-old tomes could be a data bonanza for tech companies battling lawsuits from living novelists, visual artists and others whose creative works have been scooped up without their consent to train AI chatbots. 'It is a prudent decision to start with public domain data because that's less controversial right now than content that's still under copyright,' said Burton Davis, a deputy general counsel at Microsoft. Davis said libraries also hold 'significant amounts of interesting cultural, historical and language data' that's missing from the past few decades of online commentary that AI chatbots have mostly learned from.

Supported by 'unrestricted gifts' from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries around the world on how to make their historic collections AI-ready in a way that also benefits libraries and the communities they serve. 'We're trying to move some of the power from this current AI moment back to these institutions,' said Aristana Scourtas, who manages research at Harvard Law School's Library Innovation Lab. 'Librarians have always been the stewards of data and the stewards of information.'

Harvard's newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s — a Korean painter's handwritten thoughts about cultivating flowers and trees. The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians. It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems. 'A lot of the data that's been used in AI training has not come from original sources,' said the data initiative's executive director, Greg Leppert, who is also chief technologist at Harvard's Berkman Klein Center for Internet & Society. This book collection goes 'all the way back to the physical copy that was scanned by the institutions that actually collected those items,' he said.

Before ChatGPT sparked a commercial AI frenzy, most AI researchers didn't think much about the provenance of the passages of text they pulled from Wikipedia, from social media forums like Reddit and sometimes from deep repositories of pirated books. They just needed lots of what computer scientists call tokens — units of data, each of which can represent a piece of a word. Harvard's new AI training collection has an estimated 242 billion tokens, an amount that's hard for humans to fathom but still just a drop of what's being fed into the most advanced AI systems. Facebook parent company Meta, for instance, has said the latest version of its AI large language model was trained on more than 30 trillion tokens pulled from text, images and videos.

Meta is also battling a lawsuit from comedian Sarah Silverman and other published authors who accuse the company of stealing their books from 'shadow libraries' of pirated works. Now, with some reservations, the real libraries are standing up. OpenAI, which is also fighting a string of copyright lawsuits, donated $50 million this year to a group of research institutions including Oxford University's 400-year-old Bodleian Library, which is digitizing rare texts and using AI to help transcribe them. When the company first reached out to the Boston Public Library, one of the biggest in the U.S., the library made clear that any information it digitized would be for everyone, said Jessica Chapel, its chief of digital and online services. 'OpenAI had this interest in massive amounts of training data. We have an interest in massive amounts of digital objects. So this is kind of just a case that things are aligning,' Chapel said.

Digitization is expensive. It's been painstaking work, for instance, for Boston's library to scan and curate dozens of New England's French-language newspapers that were widely read in the late 19th and early 20th centuries by Canadian immigrant communities from Quebec. Now that such text is of use as training data, it helps bankroll projects that librarians want to do anyway. 'We've been very clear that, hey, we're a public library,' Chapel said. 'Our collections are held for public use, and anything we digitized as part of this project will be made public.'

Harvard's collection was already digitized starting in 2006 for another tech giant, Google, in its controversial project to create a searchable online library of more than 20 million books. Google spent years beating back legal challenges from authors to its online book library, which included many newer and copyrighted works. The dispute was finally settled in 2016 when the U.S. Supreme Court let stand lower court rulings that rejected copyright infringement claims. Now, for the first time, Google has worked with Harvard to retrieve public domain volumes from Google Books and clear the way for their release to AI developers. Copyright protections in the U.S. typically last for 95 years, and longer for sound recordings.

How useful all of this will be for the next generation of AI tools remains to be seen as the data gets shared Thursday on the Hugging Face platform, which hosts datasets and open-source AI models that anyone can download. The book collection is more linguistically diverse than typical AI data sources. Fewer than half the volumes are in English, though European languages still dominate, particularly German, French, Italian, Spanish and Latin. A book collection steeped in 19th century thought could also be 'immensely critical' for the tech industry's efforts to build AI agents that can plan and reason as well as humans, Leppert said. 'At a university, you have a lot of pedagogy around what it means to reason,' Leppert said. 'You have a lot of scientific information about how to run processes and how to run analyses.'

At the same time, there's also plenty of outdated data, from debunked scientific and medical theories to racist narratives. 'When you're dealing with such a large data set, there are some tricky issues around harmful content and language,' said Kristi Mukk, a coordinator at Harvard's Library Innovation Lab, who said the initiative is trying to provide guidance on mitigating the risks of using the data, to 'help them make their own informed decisions and use AI responsibly.'
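For readers curious what a "token" looks like in practice, the snippet below counts tokens in a sample sentence with an off-the-shelf tokenizer. The choice of tiktoken's cl100k_base encoding is an assumption for illustration; each lab uses its own tokenizer, and figures like the 242-billion-token estimate come from running such counts over the entire scanned corpus.

```python
# Illustrative token counting. The tokenizer here is an assumption for
# demonstration, not the one used for the Institutional Books 1.0 estimate.
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")

sample_page = (
    "Nearly one million books published as early as the 15th century "
    "and in 254 languages are part of a Harvard University collection."
)

tokens = encoder.encode(sample_page)  # each token is a small chunk of text
print(len(sample_page.split()), "words ->", len(tokens), "tokens")

# Running the same count over roughly 394 million scanned pages is how a
# corpus-level figure in the hundreds of billions of tokens is arrived at.
```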

AI Will Provide Much Needed Shortcut In Finding Earthlike Exoplanets

Forbes · 31 minutes ago

In the search for earthlike planets, AI is playing more and more of a role. But first one must define what is meant by earthlike. That's not an easy definition, and it is the cause of much confusion in the mainstream media. When planetary scientists say that a planet is earthlike, they really mean it's an earth-mass planet that lies in the so-called habitable zone of a given extrasolar planetary system, loosely defined as the zone in which a planet can harbor liquid water at its surface. But there's no guarantee that it has oceans, beaches, fauna, flora, or anything approaching life.

Yet Jeanne Davoult, a French astrophysicist at the German Aerospace Center (DLR) in Berlin, is at the vanguard of using AI modeling and algorithms that would boggle the minds of mere mortals to speed up the process of finding earthlike planets. In a recent paper appearing in the journal Astronomy & Astrophysics, Davoult, the lead author, writes that the aim is to use AI to predict which stars are most likely to host an earthlike planet. The goal is to use AI to avoid blind searches, minimize detection times, and thus maximize the number of detections, she and colleagues at the University of Bern write.

Using a previous study on correlations between the presence of an earthlike planet and the properties of its system, we trained an AI Random Forest, a machine learning algorithm, to recognize and classify systems as 'hosting an earthlike planet' or 'not hosting an earthlike planet,' the authors write. For planetary detection, we try to identify patterns in data sets, patterns which correspond to planets, Davoult tells me via telephone. Understanding and anticipating where earthlike planets form first, and thus targeting observations to avoid blind searches, minimizes the average observation time for detecting an earthlike planet and maximizes the number of detections, the authors write.

But among the estimated 6,000 exoplanets detected in the last 30 years, only some 20 systems with at least one earthlike planet have been found, says Davoult. In fact, stars smaller than the Sun – such as K-spectral type dwarfs and the ubiquitous red dwarf M-spectral type stars, which make up most of the stars in the cosmos – have longer lifetimes than our own G-spectral type star. Thus, because of their long stellar lifetimes, it's probably more likely for intelligent life to develop around these K and M types of stars, says Davoult. We are also focusing a lot on M dwarfs because it's easier to detect an earthlike planet around these stars than around Sun-like stars, because the habitable zone is closer in, so the orbital period is shorter, she says.

The three populations of synthetic systems used in this study differ only in the mass of the central star, the authors write. This single difference directly influences the mass of the protoplanetary disk and thus the amount of material available for planet formation, they note. As a result, the three populations exhibit different occurrences and properties for the same type of planet, highlighting the importance of studying various types of stars, they write. We have developed a model using a Random Forest Classifier to predict which known planetary systems are most likely to host an earthlike planet, the authors write.

It's hard to really compare synthetic planetary populations and real planetary populations, because we know that our model is not perfect, says Davoult. But if you just take the big pattern at the system level, then I'm convinced it's a very powerful tool, she says. If we observe a planet within a given system, it doesn't mean that we've detected all the planets in that system, says Davoult. That's because an earthlike planet might be a bit too far away from the star, or too small, to detect, she says. In contrast, my model takes what we already know about a planetary system and tells us whether there is a possibility for an undetected earthlike planet to exist in that same system, says Davoult.

Davoult is specifically looking for terrestrial planets in the habitable zone of their parent stars. The very first step is just to detect them and create a database of earthlike planets, even if we have no clue about the composition of their atmospheres, says Davoult.
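As a rough illustration of the approach the article describes, the sketch below trains a scikit-learn Random Forest classifier to label planetary systems as hosting or not hosting an earthlike planet from a few system-level properties. The feature set and the synthetic data are invented placeholders, not the features or populations used by Davoult and colleagues.

```python
# Toy Random Forest classifier over synthetic planetary systems. Features,
# labels, and data are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_systems = 5000

# Hypothetical system-level features: stellar mass (solar masses), metallicity,
# number of detected planets, mass of the innermost detected planet (Earth masses).
X = np.column_stack([
    rng.uniform(0.1, 1.5, n_systems),
    rng.normal(0.0, 0.2, n_systems),
    rng.integers(1, 8, n_systems),
    rng.lognormal(1.0, 1.0, n_systems),
])
# Toy label: pretend smaller stars with several detected planets are likelier hosts.
y = ((X[:, 0] < 0.8) & (X[:, 2] >= 3)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

# Rank held-out systems by predicted probability of hosting an earthlike planet,
# so observation time can be pointed at the most promising targets first.
probabilities = model.predict_proba(X_test)[:, 1]
print("held-out accuracy:", model.score(X_test, y_test))
print("top-5 candidate indices:", np.argsort(probabilities)[::-1][:5])
```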
