
Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-v2 on Hugging Face
Nvidia has become one of the most valuable companies in the world in recent years as the stock market has caught on to the enormous demand for graphics processing units (GPUs), the powerful chips Nvidia makes that render graphics in video games and, increasingly, train large language and diffusion AI models.
But Nvidia does far more than just make hardware and the software to run it, of course. As the generative AI era wears on, the Santa Clara-based company has been steadily releasing more of its own AI models, most of them open source and free for researchers and developers to download, modify, and use commercially. The latest among them is Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can, in the words of Hugging Face's Vaibhav 'VB' Srivastav, 'transcribe 60 minutes of audio in 1 second [mind blown emoji].'
This is the new generation of the Parakeet model Nvidia first unveiled in January 2024 and updated again that April. Version 2 is powerful enough that it currently tops the Hugging Face Open ASR Leaderboard with an average word error rate (WER, the percentage of spoken words the model transcribes incorrectly) of just 6.05%.
To put that in perspective, it comes close to proprietary transcription models such as OpenAI's GPT-4o-transcribe (2.46% WER in English) and ElevenLabs Scribe (3.3%).
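For reference, WER is conventionally computed by aligning the model's output against a human reference transcript and counting the edits needed to reconcile the two:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference. A 6.05% WER therefore works out to roughly six errors per hundred reference words, averaged across the leaderboard's test sets.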
And it's offering all this while remaining freely available under a commercially permissive Creative Commons CC-BY-4.0 license, making it an attractive proposition for commercial enterprises and indie developers looking to build speech recognition and transcription services into their paid applications.
The model has 600 million parameters and pairs a FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder.
It is capable of transcribing an hour of audio in just one second, provided it's running on Nvidia's GPU-accelerated hardware.
The performance benchmark is measured at an RTFx (inverse real-time factor, the ratio of audio duration to processing time) of 3386.02 with a batch size of 128, placing it at the top of the ASR benchmarks maintained by Hugging Face.
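As a quick sanity check on the hour-in-a-second claim, here is a minimal back-of-the-envelope sketch using only the figures quoted above:

```python
# RTFx is the ratio of audio duration to processing time,
# so wall-clock time = audio duration / RTFx.
audio_seconds = 60 * 60   # one hour of audio
rtfx = 3386.02            # reported RTFx at batch size 128

processing_seconds = audio_seconds / rtfx
print(f"~{processing_seconds:.2f} s per hour of audio")  # ~1.06 s
```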
Released globally on May 1, 2025, Parakeet-TDT-0.6B-v2 is aimed at developers, researchers, and industry teams building applications such as transcription services, voice assistants, subtitle generators, and conversational AI platforms.
The model supports punctuation, capitalization, and detailed word-level timestamping, offering a full transcription package for a wide range of speech-to-text needs.
Developers can deploy the model using Nvidia's NeMo toolkit. The setup process is compatible with Python and PyTorch, and the model can be used directly or fine-tuned for domain-specific tasks.
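As a rough sketch, loading and running the model follows NeMo's usual pattern for pretrained ASR checkpoints; the audio filename below is a placeholder, and the exact shape of the returned hypotheses can vary between NeMo releases:

```python
# pip install -U "nemo_toolkit[asr]"   (this pulls in PyTorch)
import nemo.collections.asr as nemo_asr

# Fetch the checkpoint from Hugging Face and build the model.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)

# Transcribe a local file; timestamps=True also requests word-level timings.
outputs = asr_model.transcribe(["meeting.wav"], timestamps=True)

hypothesis = outputs[0]
print(hypothesis.text)  # punctuated, capitalized transcript

# Word-level timestamps, if this NeMo version returns them as a dict.
for word in hypothesis.timestamp.get("word", []):
    print(f"{word['start']:.2f}-{word['end']:.2f}s: {word['word']}")
```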
The open-source license (CC-BY-4.0) also allows for commercial use, making it appealing to startups and enterprises alike.
Parakeet-TDT-0.6B-v2 was trained on a diverse and large-scale corpus called the Granary dataset. This includes around 120,000 hours of English audio, composed of 10,000 hours of high-quality human-transcribed data and 110,000 hours of pseudo-labeled speech.
Sources range from well-known datasets like LibriSpeech and Mozilla Common Voice to YouTube-Commons and Librilight.
Nvidia plans to make the Granary dataset publicly available following its presentation at Interspeech 2025.
The model was evaluated across multiple English-language ASR benchmarks, including AMI, Earnings22, GigaSpeech, and SPGISpeech, and showed strong generalization performance. It remains robust under varied noise conditions and performs well even with telephony-style audio formats, with only modest degradation at lower signal-to-noise ratios.
Parakeet-TDT-0.6B-v2 is optimized for Nvidia GPU environments, supporting hardware such as the A100, H100, T4, and V100 GPUs.
While high-end GPUs maximize performance, the model can still be loaded on systems with as little as 2GB of RAM, allowing for broader deployment scenarios.
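In practice, device placement and precision are handled with standard PyTorch calls on the loaded NeMo model. The sketch below reloads the model as in the earlier example; the filename is a placeholder, and whether mixed precision helps depends on your GPU and NeMo version:

```python
import torch
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)

# Use a GPU if one is available; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
asr_model = asr_model.to(device).eval()

with torch.inference_mode():
    if device == "cuda":
        # Mixed precision on A100/H100/T4/V100 cuts memory use and speeds up inference.
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            result = asr_model.transcribe(["sample.wav"])
    else:
        # CPU path: full precision and slower, but the 600M-parameter checkpoint
        # still fits in a couple of gigabytes of memory.
        result = asr_model.transcribe(["sample.wav"])

print(result[0].text)
```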
Nvidia notes that the model was developed without the use of personal data and adheres to its responsible AI framework.
Although no specific measures were taken to mitigate demographic bias, the model passed internal quality standards and includes detailed documentation on its training process, dataset provenance, and privacy compliance.
The release drew attention from the machine learning and open-source communities, especially after being publicly highlighted on social media. Commentators noted the model's ability to outperform commercial ASR alternatives while remaining fully open source and commercially usable.
Developers interested in trying the model can access it via Hugging Face or through Nvidia's NeMo toolkit. Installation instructions, demo scripts, and integration guidance are readily available to facilitate experimentation and deployment.