
Hanabi AI Launches OpenAudio S1: The World's First AI Voice Actor for Real-Time Emotional Control
SAN FRANCISCO--(BUSINESS WIRE)--Hanabi AI, a pioneering voice technology startup, today announced OpenAudio S1, the world's first AI voice actor and a breakthrough generative voice model that delivers unprecedented real-time emotional and tonal control. Moving beyond the limitations of traditional text-to-speech solutions, OpenAudio S1 creates nuanced, emotionally authentic vocal output that captures the full spectrum of human expression. The OpenAudio S1 model is available in open beta today on fish.audio for everyone to try for free.
'We believe the future of AI voice-driven storytelling isn't just about generating speech—it's about performance,' said Shijia Liao, founder and CEO of Hanabi AI. 'With OpenAudio S1, we're shaping what we see as the next creative frontier: AI voice acting.'
From Synthesized Text-to-Speech Output to AI Voice Performance
At the heart of OpenAudio S1's innovation is transforming voice from merely a functional tool into a core element of storytelling. Rather than treating speech as a scripted output to synthesize, Hanabi AI views it as a performance to direct—complete with emotional depth, intentional pacing, and expressive nuance. Whether it's the trembling hesitation of suppressed anxiety before delivering difficult news, or the fragile excitement of an unexpected reunion, OpenAudio S1 allows users to control and fine-tune vocal intensity, emotional resonance, and prosody in real time, making voice output not just sound realistic, but feel authentically human.
'Voice is one of the most powerful ways to convey emotion, yet it's the most nuanced, the hardest to replicate, and the key to making machines feel truly human,' Liao emphasized. 'But it's been stuck in a text-to-speech mindset for too long. Ultimately, the difference between machine-generated speech and human speech comes down to emotional authenticity. It's not just what you say but how you say it. OpenAudio S1 is the first AI speech model that gives creators the power to direct voice acting as if they were working with a real human actor.'
State-of-the-Art Model Meets Controllability and Speed
Hanabi AI fuels creative vision with a robust technical foundation. OpenAudio S1 is powered by an end-to-end architecture with 4 billion parameters, trained extensively on diverse text and audio datasets. This advanced setup empowers S1 to capture emotional nuance and vocal subtleties with remarkable accuracy. Fully integrated into the fish.audio platform, S1 is accessible to a broad range of users—from creators generating long-form content in minutes to creative professionals fine-tuning every vocal inflection.
According to third-party benchmarks from Hugging Face's TTS Arena, OpenAudio S1 demonstrated consistent gains, outperforming ElevenLabs, OpenAI, and Cartesia in key areas:
Expressiveness – S1 delivers more nuanced emotional expression and tonal variation, handling subtleties like sarcasm, joy, sadness, and fear with cinematic depth, unlike the limited emotional scope of current competing models.
Ultra-low latency – S1 offers sub-100ms latency, making it ideal for real-time applications like gaming, voice assistants, and live content creation where immediate response time is crucial. Competitors like Cartesia and OpenAI still show higher latency, resulting in less natural, more robotic responses in real-time interactive settings.
Real-time fine-grained controllability – With S1, users can adjust tone, pitch, emotion, and pace in real time, using not only simple prompts such as (angry) or (voice quivering), but also a diverse range of more nuanced or creative instructions such as (confident but hiding fear) or (whispering with urgency). This allows for incredibly flexible and expressive voice generation tailored to a wide range of contexts and characters.
State-of-the-art voice cloning – Accurately replicates a speaker's rhythm, pacing, and timbre.
Multilingual, multi-speaker fluency – S1 performs fluently across 11 languages, excels at multi-speaker environments (such as dialogues between multiple characters) in multilingual contexts, and supports seamless transitions between languages without losing tonal consistency.
Pioneering Research Vision For the Future
OpenAudio S1 is just the first chapter. Hanabi's long-term mission is to build a true AI companion that doesn't just process information but connects with human emotion, intent, and presence. While many voice models today produce clear speech, they still fall short of true emotional depth and struggle to support the kind of trust, warmth, and natural interaction required of an AI companion. Instead of treating voice as an output layer, Hanabi treats it as the emotional core of the AI experience, because for an AI companion to feel natural, its voice must convey real feeling and connection.
To bring this vision to life, Hanabi advances both research and product in parallel. The company operates through two complementary divisions: OpenAudio, Hanabi's internal research lab, focuses on developing breakthrough voice models and advancing emotional nuance, real-time control, and speech fidelity. Meanwhile, Fish Audio serves as Hanabi's product arm, delivering a portfolio of accessible applications that bring these technological advancements directly to consumers.
Looking ahead, the company plans to progressively release core parts of OpenAudio's architecture, training pipeline, and inference stack to the public.
Real-World Impact with Scalable Innovation
With a four-person Gen Z founding team, the company scaled its annualized revenue from $400,000 to over $5 million between January and April 2025, while growing its monthly active users (MAU) from 50,000 to 420,000 through Fish Audio's early products—including real-time performance tools and long-form audio generation. This traction reflects the team's ability to turn cutting-edge innovation into product experiences that resonate with a fast-growing creative community.
Founder and CEO Shijia Liao has spent over seven years in the field and has been active in open-source AI development throughout. Prior to Fish Audio, he led or contributed to the development of several widely adopted speech and singing-voice synthesis models—including So-VITS-SVC, GPT-SoVITS, Bert-VITS2, and Fish Speech—which remain influential in the research and creative coding communities today. That open-source foundation built both the technical core and the community trust that now power fish.audio's early commercial momentum.
For a deeper dive into the research and philosophy behind OpenAudio S1, check out our launch blog post here: https://openaudio.com/blogs/s1
Pricing & Availability
Premium Membership (unlimited generation on Fish Audio Playground):
- $15 per month
- $120 per year
API: $15 per million UTF-8 bytes (approximately 20 hours of audio)
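Because API usage is metered per UTF-8 byte rather than per character, text in multi-byte scripts (CJK, emoji) costs more per character than ASCII text. A quick sketch of the arithmetic, using the $15-per-million-bytes rate quoted above:

```python
RATE_USD_PER_MILLION_BYTES = 15.0  # the published API rate

def api_cost_usd(text: str) -> float:
    """Estimated cost of synthesizing `text`, billed on UTF-8 byte length."""
    return len(text.encode("utf-8")) / 1_000_000 * RATE_USD_PER_MILLION_BYTES

# 1,000,000 ASCII characters = 1,000,000 bytes -> the full $15 rate
print(api_cost_usd("hello" * 200_000))  # 15.0

# A CJK character is 3 bytes in UTF-8, so it bills at 3x an ASCII character
print(len("你".encode("utf-8")))  # 3
```

At the stated rate of roughly 20 hours of audio per million bytes, one dollar buys on the order of 80 minutes of ASCII-text synthesis.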
About Hanabi AI
Hanabi AI Inc. is pioneering the era of the AI Voice Actor—speech that you can direct as easily as video, shaping every inflection, pause, and emotion in real time. Built on our open-source roots, the Fish Audio platform gives filmmakers, streamers, and everyday creators frame-perfect control over how their stories sound.