Latest news with #speechrecognition


CNET
31-07-2025
- Business
- CNET
This $180 Rosetta Stone Subscription Gives You Lifetime Access to 25 Languages
Learning a new language can be both fun and useful, especially if you're planning to travel internationally this summer or just want to broaden your skills. But few of us have the time and money to dedicate to ongoing in-person classes. Luckily, language learning programs like Rosetta Stone can make the process much easier. And right now, thanks to a StackSocial deal, it's a lot more affordable, too. Currently, StackSocial is offering a lifetime subscription to Rosetta Stone at 54% off, reducing the cost to $180 for a limited time. Rosetta Stone can help you learn many of the world's most popular languages, including Arabic, Dutch, Filipino, French, German, Greek, Hebrew, Hindi, Irish, Italian, Japanese, Korean, Polish, Russian, Swedish, Turkish, Vietnamese and others. Every lesson on Rosetta Stone is split into smaller, more manageable parts, and you can download lessons to complete offline if you want to keep going on the plane. Plus, you can practice your accent using speech recognition technology. Your one-time purchase gets you lifetime access to Rosetta Stone's language learning platform. Keep in mind this offer is for new users only, and there are some system requirements: you'll need either Windows 7 or higher or Mac OS X 10.9 or higher. If you're looking for more ways to prepare for your travel plans this summer, don't forget these must-have travel essentials.
Why this deal matters
For less than $200, you can grab yourself a lifetime subscription to Rosetta Stone. A one-year subscription bought directly from Rosetta Stone will set you back $131, so the lifetime membership costs just $49 more, making this a pretty sweet deal.

Entrepreneur
17-07-2025
- Business
- Entrepreneur
Zoho Bets Big on AI with Proprietary Large Language Model, Agents
Currently, India is Zoho's second largest market by revenue, which grew 32 per cent in 2024 and at a CAGR of 51 per cent over the last 10 years.
Software-as-a-service (SaaS) major Zoho on Thursday announced the launch of its proprietary large language model, Zia LLM; Automatic Speech Recognition models in English and Hindi; a no-code agent builder, Zia Agent Studio; and a model context protocol (MCP) server to open up Zoho's vast library of actions to third-party agents. Zoho also launched over 25 ready-to-deploy Zia Agents, including a few specifically for Indian customers. The announcements were made on the sidelines of Zoholics India, the company's annual user conference held this year in Bengaluru.
The US continues to be Zoho's largest market, while the UK, Canada, and UAE follow India as the next largest markets, in that order.
"Today's announcement emphasises Zoho's longstanding aim to build foundational technology focused on protection of customer data, breadth and depth of capabilities because of the business context, and value," said Mani Vembu, CEO, Zoho. "Our LLM model is trained specifically for business use cases, keeping privacy and governance at its core, which has resulted in lowering the inference cost, passing on that value to the customers, while also ensuring that they are able to utilise AI productively and efficiently."
"Our differentiation comes from offering agents over our low-code platform so that there is a human in the loop for verification and modification. We call this co-creation with the AI agent. It is much simpler to verify and make changes in the UI screen than reading the code. We are enabling this across all the features to make it simpler to verify and validate the AI output," he added.
The LLM market is projected to grow from USD 7.79 billion to USD 130.65 billion by 2034, registering a CAGR of 36.8 per cent, according to a report by Polaris Market Research. The report further stated that Asia Pacific is projected to witness the fastest market growth during the forecast period, driven by digitalization, rising data generation, and increasing demand for multilingual AI solutions to support diverse, rapidly expanding digital economies.
Zoho's Zia LLM has been built completely in-house by leveraging Nvidia's AI accelerated computing platform. Trained with Zoho product use cases in mind, Zia LLM comprises three models with 1.3 billion, 2.6 billion, and 7 billion parameters, each separately trained and optimised for contextual applicability, and each benchmarks competitively against comparable open-source models in the market. "The three models allow Zoho to always optimise the right model for the right user context, striking the balance between power and resource management. This focus on right-sizing the model is an ongoing development strategy for Zoho. In the short term, Zoho will scale Zia LLM's model sizes, starting with the first set of parameter increases by the end of 2025," the company said in a statement.
While Zoho supports many LLM integrations for users, including ChatGPT, Llama, and DeepSeek, Zia LLM continues Zoho's commitment to data privacy by allowing customers to keep their data on Zoho servers, leveraging the latest AI capabilities without sending their data to AI cloud providers.
Zia LLM has been deployed across Zoho's data centres in the US, India, and Europe. The model is currently being tested on internal use cases across Zoho's broad app portfolio, and will be available for customer use in the coming months.
Zoho also announced two proprietary Automatic Speech Recognition (ASR) models for speech-to-text conversion in English and Hindi. Optimised to run under a low compute load without compromising accuracy, the models benchmark up to 75 per cent better than comparable models across standard tests, the company said. Zoho is touted as one of the first companies from India to have developed an English ASR model. It plans to expand the available languages, beginning with other Indian and European languages, and will also introduce a reasoning language model (RLM).
To enable immediate adoption of agentic technology, Zoho has developed a roster of AI agents contextually baked into its products. These agents can be used across various business activities, handling relevant actions based on the role of the user. The company has also launched AI Agents specifically for Indian businesses for verification of PAN card, Voter ID, Udyog Aadhar, GSTIN, Driving Licence, LPG connection and Electricity Bill. These can be used for a variety of purposes, such as employee background verification by HR teams or document verification in financial services organisations.
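Zoho has not published code for the Zia ASR models described above, so the snippet below is only a rough illustration of what speech-to-text inference looks like with a comparable open-source model (openai/whisper-small via the Hugging Face transformers pipeline). The model choice and audio file path are assumptions for the sketch, not Zoho's stack.

```python
# Minimal speech-to-text sketch with an open-source ASR model.
# This stands in for proprietary ASR services like Zia ASR, which are not
# publicly documented here.
# Requires: pip install transformers torch

from transformers import pipeline

# Load a pretrained automatic-speech-recognition pipeline.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# "call_recording.wav" is a placeholder path to a mono audio file.
result = asr("call_recording.wav")
print(result["text"])  # the recognised transcript
```

Whisper-family models are multilingual, so the same call also handles Hindi audio, which is the second language Zoho's announcement targets.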


Forbes
03-07-2025
- Business
- Forbes
Silent Signals: How AI Can Read Between The Lines In Your Voice
Harshal Shah is a Senior Product Manager with over a decade of experience delivering innovative audio and voice solutions.
Voice technologies are no longer just about recognizing what we say; they are beginning to understand how we say it. As artificial intelligence (AI) advances, it can detect subtle emotional signals in our speech, promising more human-like interactions with machines. Emotional AI is reshaping how voice data is used across industries. Think about the last time you spoke with someone who could instantly tell how you felt without you ever saying it. That intuitive recognition is a critical part of how we build trust and empathy. As machines play an increasing role in our lives, they must learn to grasp not just what we say, but how we say it, to truly support us in meaningful ways. In this article, I'll explore how AI is learning to interpret the emotional undercurrents in our voices and why it matters more than ever.
As someone who has spent over a decade advancing voice and audio technologies across multiple industries, I focus on tuning speech interfaces to detect what people say and how they say it. I have led real-time voice recognition efforts and developed industry guidelines for speech clarity and inclusive interaction. I am passionate about building voice technologies that align with how humans naturally communicate.
Understanding Emotional AI And Paralinguistics
Have you ever wondered how much your tone of voice says about you? It's not just about the words we speak; it's how we speak them. In my experience, understanding how people talk (their tone, pauses and energy) often tells you more than the words themselves. Paralinguistic voice analysis focuses on non-verbal elements of speech, like tone, pitch, volume, pauses and rhythm, that convey emotion, intention or attitude. While traditional voice recognition focused on transcribing spoken words, emotional AI adds a new layer: interpreting how those words are delivered. Today's AI systems use deep learning to identify these paralinguistic features in real time. Sophisticated algorithms process acoustic data to detect stress, enthusiasm, hesitation or frustration, providing machines with emotional awareness that was once the sole domain of human intuition.
Applications Across Industries
In digital learning environments, emotional AI can help personalize content delivery. For example, voice-enabled tutoring systems can detect confusion or boredom and adapt the pace or style of teaching. In recruitment, analyzing candidate stress levels or communication style during voice interviews may offer additional insights, though this also raises ethical questions around fairness and consent. In mental health, researchers and startups are analyzing speech patterns to detect early signs of depression, anxiety or cognitive decline. Voice biomarkers can offer a non-invasive, scalable method for screening and monitoring psychological health. In customer service, AI-driven voice systems are trained to adapt based on the caller's emotions. For example, a trained system may detect frustration in a caller's voice, and the case can then be escalated to a human agent with specialized training. This emotional routing can reduce churn and improve satisfaction. In safety-critical environments like aviation or automotive, voice systems could be explored to monitor stress and fatigue levels in real time, potentially preventing accidents before they occur.
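As a rough sketch of the "emotional routing" pattern described above (escalating to a human agent when frustration is detected), the Python below wires an assumed speech-emotion classifier to a simple escalation rule. The predict_emotions placeholder, the label names and the threshold are all hypothetical, not a vendor's API.

```python
# Minimal sketch of "emotional routing" for a voice support system.
# `predict_emotions` stands in for a real speech-emotion classifier and is
# assumed to return per-label probabilities for a short audio window.

from dataclasses import dataclass

FRUSTRATION_THRESHOLD = 0.7  # tunable; chosen here purely for illustration

@dataclass
class RoutingDecision:
    escalate: bool
    reason: str

def predict_emotions(audio_window: bytes) -> dict[str, float]:
    """Placeholder for a trained speech-emotion model.
    Returns a mapping of emotion label -> probability."""
    raise NotImplementedError

def route_call(audio_window: bytes) -> RoutingDecision:
    scores = predict_emotions(audio_window)
    if scores.get("frustrated", 0.0) >= FRUSTRATION_THRESHOLD:
        # Hand the caller to a human agent with specialized training.
        return RoutingDecision(escalate=True, reason="high frustration score")
    return RoutingDecision(escalate=False, reason="continue automated flow")
```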
How Emotional AI Works
So, how exactly does AI learn to recognize emotions in our voices? At the core of these capabilities is advanced signal processing. AI models analyze pitch contours, speech rate, energy and spectral patterns. Deep learning architectures, such as LSTMs and transformers, are trained on thousands of labeled voice samples to recognize emotion with increasing accuracy. Some models also incorporate context: not just what was said and how it was said, but also when and where. This multimodal awareness, combining voice with video and environmental data, enhances reliability in real-world applications.
Ethical Considerations
Responsible development of emotional AI depends on a few key best practices. When working with emotional AI, consent is a primary concern. Users may not realize their emotional state is being inferred, particularly if the AI does so passively. Transparency in system design and communication is essential. In light of all of this, championing user consent and clear disclosure when emotional data is being processed is paramount. Bias is another issue. Emotional expression varies across cultures and individuals. AI models trained on narrow datasets may misinterpret non-Western or neurodivergent speech patterns, leading to inaccurate or unfair outcomes. To address this, organizations must audit their models to account for cultural, linguistic and demographic diversity. Privacy is also at stake. Emotional data can be more revealing than words. If mishandled, this information could be used for manipulation, profiling or unauthorized surveillance. To help ensure emotional AI systems are not only powerful but also worthy of user trust, organizations must prioritize on-device processing of data, especially in sensitive contexts like healthcare.
The Future Of Emotional AI
What could it mean for our daily lives when machines start to understand how we feel? Emotional AI is still evolving, but its trajectory is clear. Future systems will combine voice with facial recognition and contextual data to create holistic emotional profiles. These developments could lead to more empathetic virtual assistants, more responsive healthcare bots and safer autonomous systems. However, the future must be guided by principles of fairness, transparency and privacy. As we build machines that listen not just to our words but to our emotions, the responsibility to use that power ethically becomes essential. AI is learning to hear us better. Now we must teach it to listen wisely. As someone who's worked closely with both voice recognition systems and the humans they aim to serve, I believe the goal isn't to replace human empathy but to build machines that can complement it. When used ethically and responsibly, emotional AI has the potential to bridge the gap between data and human connection in powerful, lasting ways.
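To make the pipeline in the "How Emotional AI Works" section above concrete, here is a minimal sketch: it extracts pitch contours, energy and MFCCs with librosa and scores them with a small LSTM classifier in PyTorch. The emotion labels, feature mix and model shape are illustrative assumptions, and a real system would train the weights on thousands of labeled voice samples, as the article notes.

```python
# Minimal paralinguistic pipeline: pitch, energy and spectral (MFCC) features
# from librosa, scored by a small LSTM classifier. Labels and sizes are
# illustrative only.
# Requires: pip install librosa torch numpy

import librosa
import numpy as np
import torch
import torch.nn as nn

EMOTIONS = ["neutral", "happy", "frustrated", "stressed"]  # example label set

def extract_features(path: str, sr: int = 16000) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)   # pitch contour
    rms = librosa.feature.rms(y=y)[0]                       # energy per frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # spectral shape
    n = min(len(f0), len(rms), mfcc.shape[1])
    f0 = np.nan_to_num(f0[:n])                              # unvoiced frames -> 0
    frames = np.vstack([f0, rms[:n], mfcc[:, :n]]).T        # (n_frames, 15)
    return frames.astype(np.float32)

class EmotionLSTM(nn.Module):
    def __init__(self, n_features: int = 15, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, len(EMOTIONS))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.lstm(x)   # final hidden state summarizes the clip
        return self.head(h[-1])    # logits over emotion labels

# Usage (weights would come from training on labeled voice samples):
# feats = torch.from_numpy(extract_features("clip.wav")).unsqueeze(0)
# logits = EmotionLSTM()(feats)
# print(EMOTIONS[int(logits.argmax())])
```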

Wall Street Journal
28-06-2025
- Science
- Wall Street Journal
It's Known as ‘The List'—and It's a Secret File of AI Geniuses
All over Silicon Valley, the brightest minds in AI are buzzing about 'The List,' a compilation of the most talented engineers and researchers in artificial intelligence that Mark Zuckerberg has spent months putting together. Lucas Beyer works in multimodal vision-language research and describes himself as 'a scientist dedicated to the creation of awesomeness.' Yu Zhang specializes in automatic speech recognition and barely has an online presence besides his influential papers. Misha Bilenko is an expert in large-scale machine learning who also enjoys hiking and skiing—or, as he puts it on his website, 'applying hill-climbing search and gradient descent algorithms to real-world domains.'
Yahoo
24-06-2025
- Science
- Yahoo
Howard University and Google Research Enhance A.I. Speech Recognition of African American English
Howard University and Google researchers release a dataset of over 600 hours of African American English dialects to improve AI speech recognition.
WASHINGTON, D.C., June 24, 2025 (GLOBE NEWSWIRE) -- Howard University and Google Research released data today which can be used by artificial intelligence developers to improve the experience of Black people using automatic speech recognition (ASR) technology. Through the partnership, Project Elevate Black Voices, researchers traveled across the United States to catalogue dialects and diction used frequently in Black communities but often not recognized or misconstrued by artificial intelligence-driven technologies, making it more difficult for many Black individuals to engage with the technology.
African American English (AAE), also known as African American Vernacular, Black English, Black talk, or Ebonics, is a rich language rooted in history and culture. Because of inherent bias in the development process, incorrect results are sometimes generated when Black users vocalize commands to AI-driven technology. Many Black users have needed to inauthentically change their voice patterns, moving away from their natural accents, to be understood by voice products.
'African American English has been at the forefront of United States culture since almost the beginning of the country,' said Gloria Washington, Ph.D., Howard University researcher and co-principal investigator of Project Elevate Black Voices. 'Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but other persons who speak these unique dialects. It's about time that we provide the best experience for all users of these technologies.'
Researchers collected 600 hours of data from users of different AAE dialects in an effort to address implicit barriers to improving ASR performance. Thirty-two states are represented in the dataset. They found that there is a lack of natural AAE speech within existing speech data because Black users have been implicitly conditioned to change their voices when using ASR-based technology. Even when data is available, AAE captured in products is difficult to leverage because of code-switching.
'Working with our outstanding partners at Howard University on Project Elevate Black Voices has been a tremendous and personal honor,' said Courtney Heldreth, co-principal investigator at Google Research. 'It's our mission at Google to make technology that's useful and accessible, and I truly believe that our work here will allow more users to express themselves authentically when using smart devices.'
Howard University will retain ownership of the dataset and licensing, and will serve as steward of its responsible use, ensuring the data benefits Black communities. Google can also use the dataset to improve its own products, ensuring that its tools work for more people. Google performs this type of model training work with all sorts of dialects, languages, and accents around the US and the world.
'As a community-based researcher, I wanted to carefully curate the community activations to be a safe and trusted space for members of the community to share their experiences about tech and AI and to also ask those uncomfortable questions regarding data privacy,' said Lucretia Williams, Ph.D., project lead and Howard University researcher.
The project team adopted a community-centric approach to audio data collection by organizing curated events in several cities, centering on Black panelists who both live and work in those communities. These panelists facilitated open and transparent discussions focused on Black culture, the intersection of technology and Black experiences, the growing presence of AI, and the importance of the Black community's active participation in innovation. At the end of each event, the team introduced a three-week audio data collection initiative, inviting participants to sign up and contribute their voices and experiences to the project.
The Howard African American English Dataset 1.0 will initially be made available exclusively to researchers and institutions within historically Black colleges and universities to ensure that the data is employed in ways that reflect the interests and needs of marginalized communities, specifically African American communities whose linguistic practices have often been excluded or misrepresented in computational systems. The release of this dataset to entities outside the HBCU network will be held for consideration at a later date, with the intention of prioritizing those whose work aligns with the values of inclusivity, empowerment, and community-driven research.
About Howard University
Howard University, established in 1867, is a leading private research university based in Washington, D.C. Howard's 14 schools and colleges offer 140 undergraduate, graduate, and professional degree programs and lead the nation in awarding doctoral degrees to African American students. Howard is the top-ranked historically Black college or university according to Forbes and is the only HBCU ranked among U.S. News & World Report's Top 100 National Universities. Renowned for its esteemed faculty, high-achieving students, and commitment to excellence, leadership, truth and service, Howard produces distinguished alumni across all sectors, including the first Black U.S. Supreme Court justice and the first woman U.S. vice president; Schwarzman, Marshall, Rhodes and Truman Scholars; prestigious fellows; and over 165 Fulbright recipients.
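As a hedged illustration of how ASR developers might quantify the dialect gap this dataset is meant to close, the sketch below computes word error rate (WER) per dialect group with the jiwer library. The sample transcripts and group labels are invented for the example; the Howard African American English Dataset 1.0 itself is not shown or reproduced here.

```python
# Illustrative per-dialect WER comparison for ASR output.
# The sample rows are placeholders, not real dataset entries.
# Requires: pip install jiwer

from collections import defaultdict
import jiwer

# (dialect_group, reference_transcript, asr_hypothesis) -- placeholder rows
samples = [
    ("AAE",     "turn the lights off in the kitchen", "turn the light soft in the kitchen"),
    ("non-AAE", "turn the lights off in the kitchen", "turn the lights off in the kitchen"),
]

refs, hyps = defaultdict(list), defaultdict(list)
for group, ref, hyp in samples:
    refs[group].append(ref)
    hyps[group].append(hyp)

# Word error rate per dialect group; a large gap signals the kind of
# performance disparity the dataset is intended to help close.
for group in refs:
    print(group, jiwer.wer(refs[group], hyps[group]))
```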