Weird phrase plaguing scientific papers traced to glitch in AI data


The Hindu | 22-04-2025

Earlier this year, scientists discovered a peculiar term appearing in published papers: 'vegetative electron microscopy'.
This phrase, which sounds technical but is actually nonsense, has become a 'digital fossil' – an error preserved and reinforced in artificial intelligence (AI) systems that is nearly impossible to remove from our knowledge repositories.
Like biological fossils trapped in rock, these digital artefacts may become permanent fixtures in our information ecosystem.
The case of 'vegetative electron microscopy' offers a troubling glimpse into how AI systems can perpetuate and amplify errors throughout our collective knowledge.
Bad scan, error in translation
'Vegetative electron microscopy' appears to have originated through a remarkable coincidence of unrelated errors.
First, two papers from the 1950s, published in the journal Bacteriological Reviews, were scanned and digitised.
However, the digitising process erroneously combined 'vegetative' from one column of text with 'electron' from another, creating the phantom term.
Decades later, 'vegetative electron microscopy' turned up in some Iranian scientific papers. In 2017 and 2019, two papers used the term in English captions and abstracts.
This appears to be due to a translation error. In Farsi, the words for 'vegetative' and 'scanning' differ by only a single dot.
An error on the rise
The upshot? As of today, 'vegetative electron microscopy' appears in 22 papers, according to Google Scholar. One was the subject of a contested retraction from a Springer Nature journal, and Elsevier issued a correction for another.
The term also appears in news articles discussing subsequent integrity investigations.
'Vegetative electron microscopy' began to appear more frequently in the 2020s. To find out why, we had to peer inside modern AI models – and do some archaeological digging through the vast layers of data they were trained on.
The large language models behind modern AI chatbots such as ChatGPT are 'trained' on huge amounts of text to predict the likely next word in a sequence. The exact contents of a model's training data are often a closely guarded secret.
To test whether a model 'knew' about 'vegetative electron microscopy', we input snippets of the original papers to find out if the model would complete them with the nonsense term or more sensible alternatives.
The results were revealing. OpenAI's GPT-3 consistently completed phrases with 'vegetative electron microscopy'. Earlier models such as GPT-2 and BERT did not. This pattern helped us isolate when and where the contamination occurred.
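The completion-probe idea can be illustrated with a deliberately tiny toy model. The sketch below is not the authors' actual methodology (they probed real models such as GPT-3); it trains a minimal bigram predictor on a hypothetical contaminated corpus to show how a model that has absorbed the phantom phrase will "complete" a snippet with it.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies for each token (a toy 'language model')."""
    counts = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def complete(counts, token):
    """Return the most likely next token, mimicking a completion probe."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# Hypothetical corpus contaminated with the digitisation error: the
# nonsense sequence dominates what follows the word 'vegetative'.
contaminated = ("vegetative electron microscopy revealed cell walls . "
                "vegetative electron microscopy images were obtained .").split()
model = train_bigram(contaminated)

print(complete(model, "vegetative"))  # -> 'electron' (the phantom continuation)
print(complete(model, "electron"))    # -> 'microscopy'
```

Real large language models are vastly more complex, but the underlying dynamic is the same: if the training data contains the error often enough, the model will reproduce it as the "likely" continuation.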
We also found the error persists in later models including GPT-4o and Anthropic's Claude 3.5. This suggests the nonsense term may now be permanently embedded in AI knowledge bases.
By comparing what we know about the training datasets of different models, we identified the CommonCrawl dataset of scraped internet pages as the most likely vector through which AI models first learned this term.
The scale problem
Finding errors of this sort is not easy. Fixing them may be almost impossible.
One reason is scale. The CommonCrawl dataset, for example, is millions of gigabytes in size. For most researchers outside large tech companies, the computing resources required to work at this scale are inaccessible.
Another reason is a lack of transparency in commercial AI models. OpenAI and many other developers refuse to provide precise details about the training data for their models. Research efforts to reverse engineer some of these datasets have also been stymied by copyright takedowns.
When errors are found, there is no easy fix. Simple keyword filtering could deal with specific terms such as 'vegetative electron microscopy'. However, it would also eliminate legitimate references (such as this article).
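The over-blocking problem with keyword filtering is easy to demonstrate. This is a minimal sketch with two made-up documents: one misusing the phantom term, and one legitimately discussing it (as this article does); a naive filter cannot tell them apart.

```python
PHANTOM = "vegetative electron microscopy"

# Hypothetical documents: the first misuses the nonsense term,
# the second is a legitimate article *about* the error.
docs = [
    "Samples were examined by vegetative electron microscopy at 10 kV.",
    "The phrase 'vegetative electron microscopy' is a digitisation artefact.",
]

def naive_filter(documents, term):
    """Drop every document containing the term, regardless of context."""
    return [d for d in documents if term not in d.lower()]

cleaned = naive_filter(docs, PHANTOM)
print(len(cleaned))  # -> 0: the legitimate discussion is lost too
```

Context-aware filtering would be needed to keep the second document, which is far harder to do reliably at the scale of a web-crawl dataset.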
More fundamentally, the case raises an unsettling question. How many other nonsensical terms exist in AI systems, waiting to be discovered?
Implications for science and publishing
This 'digital fossil' also raises important questions about knowledge integrity as AI-assisted research and writing become more common.
Publishers have responded inconsistently when notified of papers including 'vegetative electron microscopy'. Some have retracted affected papers, while others defended them. Elsevier notably attempted to justify the term's validity before eventually issuing a correction.
We do not yet know whether other such quirks plague large language models, but it is highly likely they do. Either way, the use of AI systems has already created problems for the peer-review process.
For instance, observers have noted the rise of 'tortured phrases' used to evade automated integrity software, such as 'counterfeit consciousness' instead of 'artificial intelligence'. Additionally, phrases such as 'I am an AI language model' have been found in other retracted papers.
Some automatic screening tools such as Problematic Paper Screener now flag 'vegetative electron microscopy' as a warning sign of possible AI-generated content. However, such approaches can only address known errors, not undiscovered ones.
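A screening tool of this kind can be sketched as a simple watchlist scan. The phrase list below is an assumption for illustration (the real Problematic Paper Screener maintains a much larger, curated fingerprint list), and the sketch shares the limitation noted above: it only catches phrases someone has already discovered.

```python
# Assumed watchlist, in the spirit of tools like the Problematic
# Paper Screener; the real tool's fingerprint list is far larger.
WATCHLIST = [
    "vegetative electron microscopy",
    "counterfeit consciousness",
    "i am an ai language model",
]

def screen(text):
    """Return the known problem phrases found in `text` (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in WATCHLIST if phrase in lowered]

# Hypothetical abstract containing two red flags.
abstract = ("Using vegetative electron microscopy, we measured growth rates. "
            "I am an AI language model and cannot verify these results.")
print(screen(abstract))
```

Anything not yet on the watchlist sails through, which is exactly why undiscovered digital fossils are the harder problem.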
Living with digital fossils
The rise of AI creates opportunities for errors to become permanently embedded in our knowledge systems, through processes no single actor controls. This presents challenges for tech companies, researchers, and publishers alike.
Tech companies must be more transparent about training data and methods. Researchers must find new ways to evaluate information in the face of convincing AI-generated nonsense. Scientific publishers must improve their peer-review processes to spot both human and AI-generated errors.
Digital fossils reveal not just the technical challenge of monitoring massive datasets, but the fundamental challenge of maintaining reliable knowledge in systems where errors can become self-perpetuating.
Aaron J. Snoswell is research fellow in AI accountability; Kevin Witzenberger is research fellow, GenAI Lab; and Rayane El Masri is a PhD candidate, GenAI Lab – all at Queensland University of Technology. This article is republished from The Conversation.


Box Office: The Final Reckoning wraps China pre-sales at USD 3.2M, eyes USD 24M–USD 30M opening weekend
Box Office: The Final Reckoning wraps China pre-sales at USD 3.2M, eyes USD 24M–USD 30M opening weekend

Pink Villa

time3 hours ago

  • Pink Villa

Box Office: The Final Reckoning wraps China pre-sales at USD 3.2M, eyes USD 24M–USD 30M opening weekend

Mission: Impossible—The Final Reckoning is looking at a high-stakes debut in China today. The Tom Cruise-led action film wrapped its seven-day ticket pre-sale campaign on Thursday with a total of USD 3.2 million secured for the Friday-to-Sunday launch window, setting the stage for what could be a No. 1 opening weekend in the country. The pre-sale figure places the eighth installment of the Mission: Impossible franchise alongside notable past openers, including Guardians of the Galaxy Vol. 3 (USD 3M), Dead Reckoning (USD 3.2M), and No Time to Die (USD 3.3M). However, it trails behind recent bigger pre-sale performers such as Deadpool & Wolverine (USD 3.7M), Transformers: Rise of the Beasts (USD 4.5M), Godzilla x Kong: The New Empire (USD 6M), and Jurassic World Dominion (USD 7.1M). Breaking down the numbers, The Final Reckoning earned USD 1.9 million in pre-sales for its Friday opening day, followed by USD 935,000 for Saturday and USD 309,000 for Sunday. Despite a strong start, momentum waned in the final three days of advance bookings, with sales stagnating and eventually matching those of Dead Reckoning, despite initially outpacing it. One key factor contributing to the slowdown is the absence of Thursday preview screenings, a decision by the studio that has left the film heavily reliant on its Friday performance to build word-of-mouth. Still, exhibitors in China appear optimistic, given 106,000 screenings have been booked for the day, indicating solid industry confidence. Given the current trajectory, The Final Reckoning is projected to open between USD 24 million and USD 30 million in its three-day debut, depending on how well general bookings do. It remains to be seen if the latest entry can outpace Dead Reckoning's USD 24.8 million opening in China two years ago. Mission: Impossible 8 Trailer HERE: Directed by Christopher McQuarrie, the film continues the saga of Ethan Hunt (Cruise) and his IMF team as they confront a rogue AI threatening humanity. 
The ensemble cast includes Hayley Atwell, Ving Rhames, Simon Pegg, Henry Czerny, and Angela Bassett. With a production budget between USD 300 and USD 400 million, it ranks among the most expensive films ever made. Following its global rollout on May 23, The Final Reckoning has already grossed USD 227.1 million worldwide and is currently the eighth highest-grossing film of 2025. Its performance in China this weekend could provide a critical boost to its international totals.

IISER IAT answer key 2025: Objection window opens till June 1, direct link here
IISER IAT answer key 2025: Objection window opens till June 1, direct link here

Scroll.in

time5 hours ago

  • Scroll.in

IISER IAT answer key 2025: Objection window opens till June 1, direct link here

Indian Institutes of Science Education and Research (IISER) has opened the objection submission window for the aptitude test 2025 (Hindi and English) answer key on the official website Applicants can submit suggestions, if any, till June 1 up to 5.00 pm. A fee of Rs 100 per suggestion is applicable. There is no upper limit on the number of objections that can be filed. Remember that for each objection, candidates have to make a separate application and pay an amount of Rs 100, reads the notification. The computer-based test was conducted on May 25, 2025. IAT 2025 is being conducted for admissions to the Bachelor of Science (Research) program of Indian Institute of Science (IISc), Bangalore, and the BS-Medical Sciences and Engineering program of Indian Institute of Technology, Madras (IITM). Steps to submit objections for IAT answer key 2025

AI as infrastructure: India must develop the right tech
AI as infrastructure: India must develop the right tech

Mint

time6 hours ago

  • Mint

AI as infrastructure: India must develop the right tech

Artificial intelligence (AI) is often treated as a discrete branch of information technology, surrounded by fears of sentient machines, widespread job losses and existential risks. These reactions are understandable but short-sighted. AI is not just a product or tool. It is an enabling layer, much like electricity, the internet or aviation, that can permeate and power every aspect of life. Electricity offers a useful parallel. In the 19th century, Edison and Tesla fought bitterly over the future of current, with Edison backing direct current (DC) and Tesla championing alternating current (AC). Edison went so far as to electrocute animals to discredit AC. But common sense and scalability prevailed, and AC became the standard. Today, no one argues about what kind of current powers their device. We simply expect it to work. Also Read: Will AI ever grasp quantum mechanics? Don't bet on it AI is taking a similar path. For most people, their introduction to AI has been through conversational tools like ChatGPT or voice assistants like Siri. But that's merely the tip of the iceberg. The real power of AI lies in systemic transformation. Last year, AI kind of won a Nobel prize in Chemistry! It was awarded to Demis Hassabis and John Jumper from Google DeepMind for protein structure prediction, a puzzle humanity has been attempting to solve for over five decades. It was made possible through an AI called AlphaFold2. This is the kind of systemic AI transformation that we need, and fortunately, there are inroads being made. Take Niramai, a women-led Indian startup that's revolutionizing breast cancer screening with non-invasive, radiation-free AI diagnostics. Or Wysa, a mental health startup using AI to deliver affordable Cognitive Behavioural Therapy to over 6 million users across 60 countries. Or Tapestry, incubated at Google X, which is making electrical grids more resilient by improving visibility and reducing complexity. These are not vanity projects. 
They're mission-driven innovations designed to solve problems that truly matter. But to scale the impact of AI, we need systemic thinking. Also Read: Indian states should adopt AI for inclusive growth and governance Systemic change needs systemic thinking: Some of India's core challenges such as air pollution, water scarcity, fragmented supply chains and rural health gaps are not 'market opportunities' in the traditional sense. They cannot be solved by building a prettier app or running a slick marketing campaign. These issues demand long-term thinking, policy alignment, patient capital and public-private partnerships. AI can bring transformative changes, and fortunately, help is at hand. First, let's look at the state-sponsored initiatives. The IndiaAI Mission is one such coordinated effort driving foundational capabilities across the ecosystem. Over 10,000 graphics processing units (GPUs) are being deployed through public-private partnerships, giving startups and researchers access to large-scale computing power. Indigenous AI models like BharatGen focus on developing context-specific datasets and models in areas like agriculture, healthcare and urban planning. Further, there's exemplary work being done in developing IndiaAI Datasets and skill-building programmes like YuvaAI. While the public sector is helping with core infrastructure and favourable policies, global investors, family offices and academia are investing in AI startups. Also Read: Rahul Matthan: Brace for a wave of AI-enabled criminal enterprise Incubation centres are also supporting hundreds of early-stage deep-tech ventures. That said, startups in this space don't just need funding or a pathway from campus labs to capital markets. They need frameworks. How do you design AI for scale? How do you ensure safety is built-in and not bolted on later? How do you unlock value while keeping costs grounded in reality? How does it treat linguistic minorities? 
Who are left out, who are counted and who are privileged? These aren't coding problems; they're systems design challenges. Across the board, large technology companies and innovation hubs are stepping up to help founders in their AI journey. This is where horizontal mentorship from technologists, product leaders and ethicists becomes a force multiplier. For me, the top-of-mind recall is Google for Startups, which I've mentored for over a decade. The accelerator has nurtured 17 cohorts, helping 237 startups raise over $4.5 billion and create 8,500 jobs. Today, the focus is sharper than ever, helping AI-first startups solve real problems through access to tools, mentorship, cloud infrastructure and, most importantly, guided thinking. Also Read: India must forge its own AI path amid a foundational tug of war Go for the right kind of AI growth: India doesn't need an AI ecosystem built purely on monetization and hype. It needs one built on resilience, inclusion and public good. This means investing in those already solving hard problems, often quietly and resourcefully. It means shifting our narrative from fear to responsibility, from siloed innovation to systemic collaboration. We are not just users of AI. We are—and must be—its co-creators. If we get this right, India won't just keep pace in the global AI race. It will set the benchmark for what responsible, equitable and high-impact AI-led growth should look like. The author is CEO of Agrahyah Technologies and adjunct professor of digital transformation at IIM Trichy.

DOWNLOAD THE APP

Get Started Now: Download the App

Ready to dive into the world of global news and events? Download our app today from your preferred app store and start exploring.
app-storeplay-store