Latest news with #Munsit

Khaleej Times
20-05-2025
This UAE-built AI model understands Arabic and is better than ChatGPT, claims creator
As artificial intelligence continues to reshape industries worldwide, the development of locally built, culturally attuned technologies is gaining momentum. In the UAE, one company is tackling a long-standing challenge: accurate Arabic speech recognition. CNTXT AI, based in Dubai, recently unveiled Munsit, a speech-to-text model designed specifically for Arabic and trained on thousands of hours of regional audio data. We spoke with Mohammad Abu Sheikh, CEO of CNTXT AI, about why building this technology in the UAE matters, the complexities of Arabic dialects, and what this means for the future of AI in the region.

What inspired you to build an Arabic speech recognition model here in the UAE, when global giants already dominate the field?

We built Munsit because global tech giants weren't solving our problem. Arabic voice tech has long been underserved. Most models are designed for English, then retrofitted for Arabic, leading to low accuracy and misunderstood dialects. We saw a clear need and felt a responsibility to act. The UAE, with its AI-first vision and infrastructure, offered the ideal launchpad. It's a country committed not just to adopting AI, but to building it. That's what led to Munsit: a model engineered from the ground up for Arabic, reflecting our dialects, our data, and our region. We wanted to accelerate the shift from AI consumer to AI producer.

Many talk about the limitations of Arabic in tech, but few have tackled it at this scale. What were the linguistic or cultural challenges you faced, and how did you overcome them?

While many see Arabic as too complex for AI, we see it as a strategic opportunity. The real challenge wasn't the language; it was the data. Less than 5 per cent of online content is in Arabic, and even less is usable for training. If data is the new oil, then unstructured data is oil unrefined: full of potential but useless until processed.

Without high-quality data, you can't build high-performing models, so we solved this problem ourselves. We developed a data pipeline from scratch using weak supervision, a scalable, algorithmic approach that processed over 30,000 hours of raw Arabic audio and refined it into a clean, high-quality dataset ready for large-scale training. That gave us the foundation to train Munsit on how Arabic is actually spoken, at a speed and cost traditional methods simply couldn't match.

How did you source such a large and representative Arabic dataset, and what did you learn about the voices of the region in the process?

We built our Arabic speech dataset from scratch, sourcing voices from a wide range of real-world environments: news broadcasts, casual conversations, public archives, and everyday interactions across the region. We captured dialectal variation and quickly realised we were documenting the lived experience behind the language. These differences, shaped by history, geography, and culture, are more than linguistic. They're expressions of identity and belonging.

CNTXT AI calls this a 'sovereign technology'. What does that mean for the UAE's place in global AI development?

Sovereign AI means full ownership of the data, the infrastructure, and the outcomes. In the UAE, that translates into national investment and AI readiness at every level. Munsit is a result of that vision: built locally, deployed securely, and aligned with the country's digital priorities. The UAE is defining its own path in AI, building models that reflect regional identity and serve local needs. Data sovereignty is central to that mission. Data is precious, and it must remain in our hands. That's how the UAE moves from participant to standard-setter in global AI: by exporting trusted, culturally grounded technology.

What does this breakthrough mean for everyday Arabic speakers, especially in education, public services, or content creation?

Arabic speakers now have a model that understands them in real time, with contextual accuracy and speed. In education, it enables dialect-aware tools for early learners and non-literate users. Imagine Emirati ed-tech platforms offering voice feedback that reflects how students actually speak. In government, it addresses dialect diversity, especially in judicial settings where interpretation can break down. Munsit detects these differences, transcribes accurately, and localises output into formats like Emirati Arabic. In media, it powers fast, scalable transcription and indexing, making Arabic content easier to find, distribute, and monetise.

How big of a role did homegrown talent play in building Munsit, and do you see this as a turning point for young AI developers in the UAE?

Munsit was shaped by homegrown talent; every layer reflects regional hands and regional voices. And yes, this is a turning point. You don't need to leave the region to build breakthrough AI. The infrastructure is here. The capital is here. The ambition is here. The ecosystem is ready. You can invent, and not just implement, from the region and lead globally. It's validation for the next generation: world-class AI can, and will, be built right here.

What comes next for Munsit and for Arabic voice AI as a whole?

What's next? A new generation of Arabic-first products, designed here and deployed globally. Munsit serves as the voice layer in our broader AI stack, alongside tools for preparing, testing, and deploying AI in a sovereign way. From this foundation, we're expanding fast: domain-specific voice agents and multilingual dialect switching. One of the most exciting developments is our Arabic Text-to-Speech suite, launching with Emirati and Saudi dialects. With native voice talent onboarded, we're delivering the region's fastest, most accurate Arabic TTS, a major step toward full-stack voice infrastructure.

What would you tell a young developer or linguist in the UAE who dreams of building world-class tech, right here?

Start now. Move fast. You don't need permission. You're already in one of the most AI-ready nations on earth. So build. Don't just dream of catching up. Dream of leading. Because if we don't build the future in our language, solving our own problems, who will?
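
The interview mentions a weak-supervision data pipeline but does not describe how it works. As a rough illustration only, one common form of weak supervision for speech data keeps an audio segment when several independent, imperfect transcribers agree on its text, and discards it otherwise. The sketch below assumes that form; the `Segment` type, the `filter_segments` function, and the 0.8 threshold are hypothetical names and values chosen for illustration, not CNTXT AI's actual method.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Segment:
    audio_id: str
    hours: float
    # Transcripts from independent weak labelers (hypothetical inputs).
    hypotheses: list[str]


def agreement(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] between two transcripts."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def min_pairwise_agreement(hypotheses: list[str]) -> float:
    """Lowest agreement across all pairs; 0.0 if fewer than two transcripts."""
    scores = [
        agreement(a, b)
        for i, a in enumerate(hypotheses)
        for b in hypotheses[i + 1:]
    ]
    return min(scores) if scores else 0.0


def filter_segments(segments: list[Segment], threshold: float = 0.8) -> list[Segment]:
    """Keep only segments whose weak labels agree at or above the threshold."""
    return [
        s for s in segments
        if min_pairwise_agreement(s.hypotheses) >= threshold
    ]
```

Under this assumption, the retained hours are simply `sum(s.hours for s in filter_segments(segments))`; a stricter threshold yields a smaller but cleaner dataset.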


Zawya
30-04-2025
CNTXT AI unveils Munsit: The most accurate Arabic speech recognition model
Built in the UAE, Munsit sets a new global standard for Arabic speech recognition, powering seamless transcription across private and public services

DUBAI, UAE – CNTXT AI, the UAE-based Data and AI company, today announced the launch of Munsit, a next-generation Arabic speech-to-text model that outperforms every global model on Arabic, including those from OpenAI, Meta, Microsoft and ElevenLabs.

'Munsit', derived from the Arabic root for 'to listen', symbolizes a breakthrough in voice technology that truly listens with attentiveness and understands the richness of Arabic speech. Developed entirely in the UAE, Munsit sets a new benchmark for transcription accuracy across Modern Standard Arabic and 25+ dialects, enabling seamless Arabic voice data processing across real-world applications.

This breakthrough reflects CNTXT AI's mission to build sovereign technology, AI built in the region, for the region, that competes globally. The model is available now via API and for on-premises deployment for organizations seeking full data control.

How Munsit Powers Arabic Voice Solutions

Munsit is designed to deliver highly accurate Arabic transcription across diverse, real-world scenarios. Addressing the increasing demand for reliable Arabic language solutions, Munsit empowers essential applications, including:

- Subtitling for Content Creators: Automatically generates precise Arabic subtitles for films, videos and podcasts.
- Meeting Notes and Minute-Taking: Transcribes meetings and discussions into Arabic, supporting official documentation and efficient record-keeping.
- Call Center Support: Converts voice messages and chatbot interactions in Arabic into text, streamlining feedback and quality assurance processes.
- Government and Public Services: Offers transcription and dialect comprehension services tailored for public sector needs, such as processing citizen requests and ensuring accessible communication.

Built for Arabic, Trained on Real Voices

To create Munsit, CNTXT AI processed over 30,000 hours of Arabic audio, refining it into a high-quality 15,000-hour dataset that captures a wide range of dialects, accents, age groups, and environments. Munsit is powered by advanced AI and high-performance NVIDIA infrastructure, delivering fast, accurate transcription for a variety of Arabic-speaking use cases, from call centers and public services to education and media.

Leading Global Performance in Arabic AI

Benchmarking on a Hugging Face leaderboard confirmed that Munsit-1 outperformed leading global speech recognition systems on Arabic datasets, including OpenAI's Whisper and GPT-4o Transcribe, Meta's SeamlessM4T, ElevenLabs' Scribe, and Microsoft Azure's Speech-to-Text. CNTXT AI has also released a detailed research paper outlining the model's architecture, training methodology and evaluation results.

'Munsit is more than just a breakthrough in speech recognition. It's a declaration that Arabic belongs at the forefront of global AI,' said Mohammad Abu Sheikh, CEO of CNTXT AI. 'We've proven that world-class AI doesn't need to be imported. It can be built here, in Arabic, for Arabic. This launch sets a new standard for sovereign technology, made in the UAE and ready for the world.'

A Strategic Step Toward Arabic-Language AI Leadership

Munsit-1 is the first step in a broader roadmap toward a full suite of Arabic voice technologies, from TTS to AI voice assistants. 'This is only version one,' added Abu Sheikh. 'What comes next will redefine how Arabic is understood, spoken, and processed by machines, on our terms, in our language.'

About CNTXT AI

CNTXT AI is a UAE-based Data and AI company that enables organizations to prepare, build, test, deploy, and scale sovereign AI solutions while maintaining full data control. Our comprehensive suite of solutions transforms data into actionable AI applications seamlessly, securely, and without compromising sovereignty. From AI-ready data pipelines to scalable deployment and industry-standard validation, we ensure AI adoption is practical, compliant, and optimized for real-world impact.
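
The release reports that Munsit-1 outperformed other systems on Arabic datasets but does not state the metric used. Speech recognition accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference, divided by the number of reference words. The sketch below shows that calculation under the assumption that WER is the relevant metric; the function name `word_error_rate` is illustrative, and real evaluations typically also normalize text (for example, removing diacritics and punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

A lower WER is better: a score of 0.0 means the transcript matches the reference word for word, and the value can exceed 1.0 when the hypothesis contains many extra words.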