Latest news with #AIKosh


Time of India
3 days ago
- Science
- Time of India
AI datasets by IIT-Bombay to simplify Indian texts, help in AI research
MUMBAI: For years, research in Indian knowledge systems, whose texts are often available in Indian languages such as Sanskrit, was challenging for researchers. However, a data curation exercise carried out by IIT-Bombay, as part of its contribution to the central government's AIKosh portal, has simplified it to some extent by digitising 30 different textbooks. A dataset containing around 2.18 lakh (218,000) sentences with 1.5 million words from these textbooks, covering diverse topics such as astronomy, medicine, and mathematics, with some as much as 18 centuries old, is now available on the government portal. AIKosh, launched in March, is a repository of datasets, models, toolkits, and more from diverse sources, aimed at supporting AI-based innovation and research. IIT-Bombay, one of the leading contributors to the AIKosh platform, and BharatGen, a consortium of seven institutes also led by IIT-Bombay, have together contributed 37 diverse models and datasets to the portal so far. IIT-Bombay alone launched around 16 culturally significant datasets on the platform to contribute to the country's AI mission. BharatGen, funded through a Section 8 company formed by the Department of Science and Technology with IIT-Bombay, IIT-Kanpur, IIT-Madras, IIT-Hyderabad, IIT-Mandi, IIM-Indore, and IIIT-Hyderabad as partners, launched 21 models on the portal. 'We are not only researching Large Language Models (LLMs) and other generative models for AI that are effective and data and compute efficient, but also building sovereign models for India from the ground up. We are creating datasets for training these models and fine-tuning them for downstream tasks such as conversation and question-answering, while creating benchmarking datasets towards calibrating the performance of these models,' said Prof Ganesh Ramakrishnan from IIT-Bombay, who is spearheading the project.
The team has not only put out datasets relevant to the Indian knowledge systems but also others that can help in audio-visual learning, such as tutorials capturing practical skills like waste-to-toy creation or organic farming. There is also one on Sanskrit translation for contemporary prose, a math word problems dataset in Hindi and English which will train the AI in mathematical reasoning, and culturally-grounded multi-lingual question-answering datasets, including questions and answers from historian Dharampal's books, among others. One of the datasets also enables the AI to answer questions about images using external knowledge, and another interesting one is on recognising text in videos with camera movements. Most of these models are trained from scratch, not just fine-tuned, said Prof Ramakrishnan. The models also uniquely balance Indian data alongside English data, ensuring relevance to our country, he said. 'We are creating benchmarks for the AI ecosystem in the country, but these can be pulled out by researchers, enterprisers, companies, or even academia and developed further,' he added.


Hindustan Times
3 days ago
- Science
- Hindustan Times
IIT-Bombay leads push for India-centric AI
Mumbai: Indian Institute of Technology (IIT) Bombay has released 16 new datasets on AIKosh, the central government's platform that provides a repository of datasets to enable artificial intelligence (AI) innovation. This is a major step in developing AI that understands India's linguistic and cultural landscape, said professor Ganesh Ramakrishnan of IIT Bombay. These datasets will support innovation and research in AI and machine learning (ML), especially in areas involving Indian languages, scripts, documents, media, and audiovisual content. The effort is part of BharatGen, a multilingual large language model (LLM) initiative led by IIT Bombay and funded by the Department of Science and Technology. So far, BharatGen has contributed 16 India-centric datasets and launched 21 AI models on AIKosh. The initiative includes top institutions such as the International Institute of Information Technology in Hyderabad, the IITs of Kanpur, Mandi, Madras, and Hyderabad, and IIM Indore. IIT Bombay's datasets are designed to build a solid foundation for developing Indian AI tools and applications. These include over 218,000 sentences for improving digitisation of Sanskrit texts, audio-visual data on practical skills like upcycling discarded materials into toys and organic farming, English-Sanskrit translations with 53,000 sentences for modern prose, over 78 hours of Sanskrit audio for speech recognition, multilingual question-answer sets in 11 languages, including Hindi and English, math word problems in Hindi and English for AI reasoning, and table detection datasets in 14 Indian languages.
The releases also include visual question answering models (systems capable of answering questions about an image), datasets to improve translation accuracy and recognise text in videos, a comprehensive overview of Indian Knowledge Systems (IKS), cross-lingual video and text retrieval in seven Indian languages (allowing AI to retrieve relevant information when the document is written in a different language from the query), and handwritten and printed text detection datasets. These datasets and models are part of a broader effort by IIT Bombay and BharatGen to build sovereign AI models for India aligned with the India AI Mission, a central government initiative that aims to build an ecosystem for AI innovation by enhancing data quality and facilitating access to compute. The team is not just fine-tuning existing models, but training new ones from scratch using Indian data. They are also building benchmarks to test these models for Indian use in conversation and education. A major highlight of this initiative is the launch of 'Param 1', a bilingual foundational language model with 2.9 billion parameters. It supports both English and Hindi and has been trained on 36% Indic language data, significantly more than international models like Meta's Llama, which had less than 0.01%. 'Pre-training (the initial stage of training a machine learning model on a large dataset) is an enormous undertaking and often a barrier for many. That's why we've taken on this challenge,' said professor Ramakrishnan, lead of BharatGen. Developers can now fine-tune Param 1 to build Indic chatbots, copilots (virtual assistants for research), and knowledge systems. 'We hope our efforts toward creating a sovereign Generative AI ecosystem and milestones such as the release of such LLM model checkpoints, serves as a foundation for India-specific solutions,' said professor Ramakrishnan. Alongside Param 1, BharatGen has launched over 20 speech models across 19 Indian languages.
These include speaker adaptive text-to-speech (TTS) systems that can mimic a speaker's voice in languages like Hindi, Tamil, Telugu, Marathi, and Bengali. Advanced speaker-conditioned TTS models and automatic speech recognition systems have also been developed to make voice-based applications more natural and inclusive. 'Our goal is not just to build AI models but to provide resources that startups and system integrators can leverage,' said professor Ramakrishnan.


Time of India
4 days ago
- Business
- Time of India
India AI: 3 more startups to build indigenous foundation model; common compute capacity expanded
New Delhi: After Sarvam AI, India on Friday selected three more startups (Soket AI, Gan AI, and Gnani AI) to build indigenous artificial intelligence foundation models. In line with its global AI ambitions, backed by a comprehensive plan that entails enhanced AI infrastructure and local language model development, India has also announced the availability of 16,000 more GPUs, which would take the compute facility available to startups and researchers here to 34,000, with the support of industry partners. The expanded compute capacity on cloud will provide a common computational AI platform for training and inference, crucial to developing indigenous foundational models and AI solutions tailored to the Indian context. IT Minister Ashwini Vaishnaw said significant progress has been made on the India AI Mission, with a focus on the democratisation of technology. The compute facility, supercharged with 34,000 GPUs, will enable India to develop its AI ecosystem in a big way, he said. Seven bidders have submitted commercial bids for various categories of AI compute units (GPUs). These include Cyfuture India, Ishan Infotech, Locuz Enterprise Solutions, Netmagic IT Services, Sify Digital Services, Vensysco Technologies, and Yotta Data Services. "Like Sarvam, these three teams also have a very big target ahead of them. Whichever sector they focus on, they must be among the top five in the world," Vaishnaw said of the newly selected teams. Put simply, foundation models in generative AI are large, pre-trained models that form the base for a variety of AI applications. The Minister further said that 367 datasets have already been uploaded to AI Kosh. He also highlighted the IndiaAI Mission's role in driving reverse brain drain and creating a comprehensive ecosystem entailing foundational models, compute capacity, safety standards, and talent development initiatives.
Vaishnaw emphasised that these efforts are aimed at building a complete and inclusive AI ecosystem in India. In April this year, Sarvam AI was selected to build India's first indigenous AI foundational model, marking a key milestone in the country's AI innovation ecosystem. Soket AI will develop an open-source 120 billion parameter foundation model optimised for the country's linguistic diversity, targeting sectors such as defence, healthcare, and education. Gan AI will create a 70 billion parameter multilingual foundation model targeting capabilities to surpass the current global leader. Gnani AI will build a 14 billion parameter Voice AI foundation model delivering multilingual real-time speech processing with advanced reasoning capabilities. Ganesh Gopalan, Co-Founder and CEO of Gnani AI, said in a statement, "We are honoured to be selected under the IndiaAI Mission to develop large language models that truly represent India's linguistic diversity. At Gnani AI, our mission has always been to make technology more inclusive and accessible". Gopalan further said Gnani AI is keen to "lead the way in developing voice-to-voice large language models for India and the world, because we believe transformative AI must speak the language of the people it serves". Meanwhile, under the IndiaAI Applications Development Initiative, Vaishnaw also announced the winners of the IndiaAI I4C CyberGuard AI Hackathon, jointly organised with the Indian Cyber Crime Coordination Centre (I4C), Ministry of Home Affairs. "The Hackathon resulted in the development of AI-based solutions to enhance the classification of cybercrime complaints and support the identification of emerging crime patterns, trends, and modus operandi on the National Cyber Crime Reporting Portal (NCRP). These models can interpret complex inputs such as handwritten FIRs, screenshots, and audio calls with improved speed and accuracy," an official release said. PTI


Time of India
4 days ago
- Business
- Time of India
Government selects 3 more teams for foundation models of AI
NEW DELHI: India is broadening its efforts to develop AI foundation models. After Sarvam AI, the government on Friday selected three more teams - Soket AI, Gan AI, and Gnani AI - for building indigenous AI models. IT and electronics minister Ashwini Vaishnaw said the country has 367 datasets loaded on AI Kosh. "So the app ecosystem is also now developing. In a sense, the entire ecosystem is now getting built." The government has also announced the availability of 16,000 more GPUs, which would take the compute facility available to startups and researchers to 34,000. Vaishnaw said significant progress was made on the India AI Mission, with a focus on the "democratisation of technology". The compute facility, supercharged with 34,000 GPUs, will enable India to develop the AI ecosystem in a big way. "I would like to make some mention about the three teams that were selected today. Like Sarvam, these three teams also have a very big target ahead of them. Whichever sector they focus on, they must be among the top five in the world," Vaishnaw said.


Hans India
4 days ago
- Business
- Hans India
Common compute capacity surpasses 34,000 GPUs in India: Ashwini Vaishnaw
New Delhi: India's national compute capacity has crossed 34,000 GPUs, the government said on Friday, adding that three new startups have been selected to build AI foundation models. Union Minister for Electronics and IT, Ashwini Vaishnaw, said that 367 datasets have already been uploaded to 'AI Kosh'. The minister also underlined the IndiaAI Mission's role in fostering reverse brain drain and creating a comprehensive ecosystem encompassing foundational models, compute capacity, safety standards, and talent development initiatives. He emphasised that these efforts are aimed at building a complete and inclusive AI ecosystem in India. The 'IndiaAI Foundation Model' pillar within the India AI Mission aims to develop and deploy indigenous foundational models trained on India-specific data. As of April 30, 506 proposals had been received. On April 26, Sarvam AI was selected to build India's sovereign large language model (LLM) ecosystem, developing an open-source 120 billion parameter AI model to enhance governance and public service access through use cases like "2047: Citizen Connect" and "AI4Pragati". This follows the earlier launch of the Sarvam-1 model (2 billion parameters) and the Sarvam-M model (24 billion parameters) with hybrid reasoning capabilities. Vaishnaw urged the newly selected teams under the IndiaAI Mission to aim for a top-five global position in their respective sectors. Soket AI will develop India's first open-source 120 billion parameter foundation model optimised for the country's linguistic diversity, targeting sectors such as defence, healthcare, and education. Gnani AI will build a 14 billion parameter Voice AI foundation model delivering multilingual, real-time speech processing with advanced reasoning capabilities, while Gan AI will create a 70 billion parameter multilingual foundation model targeting "Superhuman TTS (text-to-speech)" capabilities to surpass current global leaders.
Emphasising Prime Minister Narendra Modi's vision of democratisation of technology, Vaishnaw said, 'Technology should not be left in the hands of a few. It's very important that a larger section of society should be able to access technology, develop new solutions and get better opportunities.'