Latest news with #Deepgram


Economic Times
11-05-2025
OpenAI leads surge in business AI adoption, Ramp AI Index reveals
OpenAI is at the forefront of enterprise AI adoption, topping the Ramp AI Index by acquiring customers faster than any other provider on American fintech company Ramp's platform. Chinese AI company Manus AI follows closely in second place.

The Ramp AI Index, which tracks real corporate spending on AI tools and services across more than 30,000 US businesses, highlights a significant uptick in enterprise AI usage. The data is compiled monthly from actual transactions on Ramp's corporate card and bill payment platform, offering a tangible measure of how businesses are embracing artificial intelligence. While foundational model providers continue to dominate, the report shows a notable rise in the adoption of specialised AI tools tailored to specific enterprise needs.

Specialised AI: The next big game
One standout example is Turbopuffer, an internal data search engine that leverages vector search to handle billions of entries efficiently. Its speed and precision make it popular among technical teams seeking scalable AI infrastructure. Other rapidly growing AI vendors include: Jasper, which provides AI-powered writing tools for marketers; Deepgram, a speech recognition platform for voice transcription; and Snowflake, whose Cortex suite enables businesses to integrate large language models and semantic functions directly into SQL workflows, empowering data teams without requiring system overhauls.

Enterprise adoption accelerates
ET had earlier reported that larger companies (with annual revenues of at least $500 million) are adopting AI more quickly than smaller organisations. Ramp's latest data supports this trend and further reveals that smaller, specialised AI vendors are seeing impressive gains.
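The vector search that powers tools like Turbopuffer can be illustrated with a minimal sketch (toy data and plain NumPy; production systems use approximate-nearest-neighbour indexes to scale to billions of entries):

```python
import numpy as np

def cosine_top_k(query, corpus, k=3):
    """Return indices of the k corpus vectors most similar to the query."""
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = corpus_norm @ query_norm          # cosine similarity per entry
    return np.argsort(scores)[::-1][:k]        # highest similarity first

# Toy corpus: 1,000 random 64-dimensional embeddings
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
query = corpus[42] + 0.01 * rng.normal(size=64)  # near-duplicate of entry 42

print(cosine_top_k(query, corpus))  # entry 42 ranks first
```

Exact brute-force search like this is linear in corpus size; the engineering challenge at billions of entries is the index structure, not the similarity formula.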
Several new entrants climbed into the top ranks for AI-related spending in May, underscoring a shift beyond the dominance of the big foundational model providers. By new customer count, OpenAI, Cursor, Canva, LinkedIn and GoDaddy lead the charts, while Maxon Computer and JasperAI are next in line after Manus AI in terms of largest percentage change in customer count.

A recent survey found that one in three tech professionals in India is currently undergoing formal AI training via their employers, highlighting the growing demand for AI-related skills. Ramp also noted that actual AI adoption may be higher than reported, as many businesses use free tools or rely on employees' personal accounts, factors not captured in transaction-based data.

Global AI market outlook
The global enterprise AI market was valued at $23.95 billion in 2024 and is expected to grow at a compound annual growth rate (CAGR) of 37.6% from 2025 to 2030. In India, however, AI adoption is still maturing. According to Krishna Vij from TeamLease Digital, a talent gap of nearly 50% persists: while India has around 4.2 lakh (420,000) AI professionals, the estimated need is closer to six lakh (600,000).

Competition from China
Despite US restrictions on AI chip exports, China has become the second-largest producer of AI models across text, image, video, and audio domains. As of early 2024, 36% of the 1,328 large language models (LLMs) worldwide originated in China, second only to the US. In a further push, the Chinese government and private investors have launched a new AI fund worth 60 billion yuan (approximately $8.2 billion). Major developments include Alibaba's Qwen series, DeepSeek's R1, Tencent's Hunyuan Turbo S and Manus AI. Manus AI, which has made notable strides toward AI autonomy, can execute complex multi-step workflows and access reliable data via APIs, and has achieved state-of-the-art (SOTA) performance across three difficulty levels.
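As a quick sanity check on the growth figure, compounding the 2024 base of $23.95 billion at 37.6% annually over the six years 2025-2030 gives roughly a $160 billion market by 2030 (a back-of-the-envelope projection, not a figure from the report):

```python
def compound_growth(base, rate, years):
    """Project a value forward at a constant compound annual growth rate."""
    return base * (1 + rate) ** years

# Figures from the article: $23.95B in 2024, 37.6% CAGR over 2025-2030
projected_2030 = compound_growth(23.95, 0.376, 6)
print(f"${projected_2030:.1f}B")  # roughly $162.6B
```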
While the US continues to lead AI model development—producing 40 significant models in 2024—China is rapidly closing the gap. The latest Artificial Intelligence Index Report signals a transformative shift in the global AI landscape, as China accelerates its capabilities and investments.
Yahoo
18-02-2025
Deepgram Achieves Key Milestone on Path to Delivering Next-Gen, Enterprise-Grade Speech-to-Speech Architecture
Pioneering Achievement Delivers Speech-To-Speech Technology Without Intermediate Text Representations, Setting the Stage for Fully Fluid, Human-Like Enterprise Voice AI Applications

SAN FRANCISCO, February 18, 2025--(BUSINESS WIRE)--Deepgram, the leader in enterprise-grade speech AI, today announced a significant technical achievement in speech-to-speech (STS) technology for enterprise use cases. The company has successfully developed a speech-to-speech model that operates without relying on text conversion at any stage, marking a pivotal step toward the development of contextualized end-to-end speech AI systems. This milestone will enable fully natural and responsive voice interactions that preserve nuances, intonation, and emotional tone throughout real-time communication.

When fully operationalized, this architecture will be delivered to customers via a simple upgrade from our existing industry-leading architecture. By adopting this technology alongside Deepgram's full-featured voice AI platform, companies will gain a strategic advantage, positioning themselves to deliver cutting-edge, scalable voice AI solutions that evolve with the market and outpace competitors.

Advancements Over Existing Architectures
Existing speech-to-speech (STS) systems are based on architectures that process speech through sequential stages: speech-to-text, text-to-text, and text-to-speech. These architectures have become the standard for production deployments for their modularity and maturity, but eliminating text as an intermediary offers opportunities to improve latency and better preserve emotional and contextual nuances.

Meanwhile, multimodal LLMs like Gemini, GPT-4o, and Llama have evolved beyond text-only capabilities to accept additional inputs such as images, videos, and audio. Despite these advancements, however, they struggle to capture the fluidity and nuance of human-like conversation.
These models still rely on a turn-based framework, where audio input is tokenized and processed within a textual domain, restricting real-time interactivity and expressiveness. To advance the frontier of speech AI, Deepgram is setting the stage for end-to-end STS models, which offer a more direct approach by converting speech to speech without relying on text.

Recent research on speech-to-speech models, such as Hertz and Moshi, has highlighted the significant challenges in developing models that are robust and reliable enough for enterprise use cases. These difficulties stem from the inherent complexities of modeling conversational speech and the substantial computational resources required. Overcoming these hurdles demands innovations in data collection, model architecture, and training methodologies.

Delivering Speech-to-Speech with Latent Space Embeddings
Deepgram is transforming speech-to-speech modeling with a new architecture that fuses the latent spaces of specialized components, eliminating the need for text conversion between them. By embedding speech directly into a latent space, Deepgram ensures that important characteristics such as intonation, pacing, and situational and emotional context are preserved throughout the entire processing pipeline.

What sets Deepgram apart is its approach to fusing the hidden states (the internal representations that capture meaning, context, and structure) of each individual function: Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS). This fusion is the first step toward training a controllable, single, true end-to-end speech model, enabling seamless processing while retaining the strengths of each best-in-class component. This breakthrough has significant implications for enterprise applications, facilitating more natural conversations while maintaining the control and reliability businesses require.
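The contrast between a text-bottlenecked cascade and latent-space fusion can be sketched in a few lines. This is entirely illustrative: the dimensions and the random linear "adapters" stand in for real trained models, which the press release does not disclose.

```python
import numpy as np

rng = np.random.default_rng(1)

# In a text-bottlenecked cascade, anything not expressible as text
# (intonation, pacing, emotion) is discarded at each arrow:
#   audio -> STT -> text -> LLM -> text -> TTS -> audio
#
# In a latent-fusion design, components exchange continuous hidden
# states directly, so those characteristics can survive the pipeline:
#   audio -> STT latent -> adapter -> LLM latent -> adapter -> TTS latent

D_STT, D_LLM, D_TTS = 256, 512, 256

# Hypothetical learned adapters bridging adjacent latent spaces.
stt_to_llm = rng.normal(size=(D_STT, D_LLM)) / np.sqrt(D_STT)
llm_to_tts = rng.normal(size=(D_LLM, D_TTS)) / np.sqrt(D_LLM)

def fused_pipeline(stt_hidden):
    """Pass continuous STT hidden states through to TTS without text."""
    llm_hidden = np.tanh(stt_hidden @ stt_to_llm)   # LLM consumes latents
    tts_hidden = np.tanh(llm_hidden @ llm_to_tts)   # TTS consumes latents
    return tts_hidden

frames = rng.normal(size=(100, D_STT))              # latents for 100 frames
out = fused_pipeline(frames)
print(out.shape)  # (100, 256): one TTS latent per input frame
```

The point of the sketch is the data flow: no stage ever serializes its output to a token sequence, so nothing forces the representation through a lossy text bottleneck.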
"This achievement represents a fundamental shift in how AI systems can process and respond to human speech," said Scott Stephenson, CEO and Co-founder of Deepgram. "By eliminating text as an intermediate step, we're preserving crucial elements of communication and maintaining the precise control that enterprises need for mission-critical applications."

This technical advancement builds on Deepgram's expertise in enterprise speech AI, with over 200,000 developers using its platform, more than 50,000 years of audio processed, and over 1 trillion words transcribed.

Key benefits of the new architecture include:
- Optimized latency design for faster, more responsive interactions
- Enhanced naturalness, preserving emotional context and conversational nuances
- Native ability to handle complex, multi-turn conversations
- Unified, end-to-end training across the entire model, creating a more cohesive and inherently adaptive system that fine-tunes its understanding and response generation directly in the audio space

Utilizing Transfer Learning for Cost-Efficient, High-Accuracy Speech-to-Speech
Deepgram's research in the space is accelerated by its use of transfer learning and best-in-class pre-trained models, allowing it to achieve high accuracy with significantly less training data than traditional methods. Without latent techniques, training a model at the scale needed for speech-to-speech would require over 80 billion hours of audio, more than humanity has ever recorded. However, Deepgram's latent space embeddings and transfer learning approach achieve superior comprehension while significantly reducing costs, maintaining interpretability, and accelerating enterprise deployment. This efficiency enables Deepgram to deliver scalable, end-to-end speech AI that meets the demands of real-world voice applications.
Empowering Developers with Full Debuggability
One of the requirements in enterprise speech-to-speech modeling is the ability to understand and troubleshoot each step of the process. This is particularly challenging when text conversion between steps isn't involved, as verifying both the accuracy of the initial perception and the alignment of the spoken output with the intended response is not straightforward.

Deepgram recognized this need and addressed it by designing a new architecture that enables debuggability throughout the entire process, allowing developers to inspect and understand how the system processes spoken dialogue. The design incorporates speech modeling of perception, natural language understanding/generation, and speech production, preserving distinct capabilities during training. Through the ability to decode intermediate representations back to text at specific points, developers can gain insight into what the model perceives, thinks, and generates, ensuring its internal representation aligns with the model output and stays true to the intent of the business user, addressing hallucination concerns in scaled business use cases. This capability allows the user to peer into each step of generation, helping refine models, improve performance, and deliver more accurate, lifelike, and reliable speech-to-speech solutions.

Beyond Speech-to-Speech – A Complete, Enterprise-Ready Voice AI Stack
While building an advanced speech-to-speech (STS) model is a major technical achievement, enterprises need more than just a model: they need a complete, scalable platform that ensures seamless deployment, adaptability, and cost efficiency. Deepgram delivers not just cutting-edge STS technology, but an enterprise-ready infrastructure designed for real-world applications.
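The general pattern of decoding intermediate representations for inspection can be illustrated with a toy debug probe that wraps each pipeline stage and logs a human-readable view of its output without altering the data flow (all names here are hypothetical; Deepgram's actual tooling is not public):

```python
from typing import Callable

def with_probe(stage: Callable, decode: Callable, log: list):
    """Wrap a pipeline stage so each call also logs a decoded view."""
    def probed(latent):
        out = stage(latent)
        log.append(decode(out))   # e.g. nearest-text decoding of a latent
        return out                # data flow itself is unchanged
    return probed

# Toy stages standing in for latent-space perception and planning.
log: list[str] = []
stt = with_probe(lambda x: x + ["perceived"], lambda v: f"STT saw: {v}", log)
llm = with_probe(lambda x: x + ["planned"],   lambda v: f"LLM plans: {v}", log)

result = llm(stt(["hello"]))
print(result)  # ['hello', 'perceived', 'planned']
print(log)     # one readable entry per stage for inspection
```

Because the probe only observes, developers can confirm at each tap point that the internal representation matches the intended behavior, which is the essence of the debuggability claim above.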
Seamless Integration & Continuous Improvement – Once Deepgram's end-to-end STS model moves to production, businesses will be able to adopt this breakthrough directly through our developer-friendly voice agent API from within the current Deepgram platform. Through continued innovation, enterprises will benefit from the latest advancements, ensuring seamless integration and a future-proof platform for their voice AI applications.

Enterprise-Grade Performance & Cost Efficiency – Built for low customer COGS, our platform enables enterprises to deploy high-performance voice AI without excessive costs. This ensures scalability, whether for customer service automation, real-time voice agents, or multilingual applications.

Full-Featured Platform and High-Performance Runtime – Deepgram's platform includes powerful capabilities such as:
- Adaptability - Dynamically fine-tune models for specific industry language, ensuring high accuracy across diverse applications without needing constant retraining.
- Automation - Streamline transcription, model updates, and data processing, reducing overhead and accelerating deployment.
- Synthetic data generation - Generate synthetic voice data to improve model training, even with limited real-world data, enhancing accuracy for niche use cases.
- Data curation - Clean, manage, and organize training data to ensure high-quality, relevant input, improving model performance.
- Model hot-swapping - Seamlessly switch between different models to optimize performance for specific tasks.
- Integrations - Effortlessly integrate Deepgram's voice AI with cloud platforms, enterprise systems, and third-party applications, embedding it within existing workflows.

With Deepgram, enterprises don't just get speech-to-speech; they get the most advanced, enterprise-ready voice AI platform, designed for real-world deployment and long-term innovation. For more information about Deepgram's novel approach to speech-to-speech, read the technical brief.
To learn more about Deepgram's suite of voice AI infrastructure, visit Deepgram's website.

Additional Resources:
- Explore the technical brief on Deepgram's novel speech-to-speech architecture
- Watch a fun demo of Deepgram's voice agent API
- Try Deepgram's interactive demo
- Get $200 in free credits and try Deepgram for yourself

About Deepgram
Deepgram is the leading voice AI platform for enterprise use cases, offering speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities. 200,000+ developers build with Deepgram's voice-native foundational models, accessed through cloud APIs or as self-hosted / on-premises APIs, due to our unmatched accuracy, low latency, and pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Having processed over 50,000 years of audio and transcribed over 1 trillion words, there is no organization in the world that understands voice better than Deepgram. To learn more, visit Deepgram's website, read our developer docs, or follow @DeepgramAI on X and LinkedIn.

PR Contact: Nicole Gorman, Gorman Communications, for Deepgram