Latest news with #GaneshRamakrishnan

AI datasets by IIT-Bombay to simplify Indian texts, help in AI research

Times of India

3 days ago

MUMBAI: For years, research in Indian knowledge systems, often available in Indian languages such as Sanskrit, was challenging for researchers. However, a data curation exercise carried out by IIT-Bombay, as part of its contribution to the central government's AIKosh portal, has simplified it to some extent by digitising 30 different textbooks. A dataset containing around 2.18 lakh (218,000) sentences with 1.5 million words from these textbooks, covering diverse topics such as astronomy, medicine, and mathematics, with some as old as 18 centuries, is now available on the government portal.

AIKosh, launched in March, is a repository of datasets, models, toolkits, and more from diverse sources, intended to support AI-based innovation and research. IIT-Bombay, one of the leading contributors to the AIKosh platform, along with BharatGen, a consortium of seven institutes also led by IIT-Bombay, has contributed 37 models and datasets to the portal so far. IIT-Bombay alone launched around 16 culturally significant datasets on the platform to contribute to the country's AI mission. BharatGen, funded through a Section 8 company formed by the Department of Science and Technology, with IIT-Bombay, IIT-Kanpur, IIT-Madras, IIT-Hyderabad, IIT-Mandi, IIM-Indore, and IIIT-Hyderabad as partners, launched 21 models on the portal.

'We are not only researching Large Language Models (LLMs) and other generative models for AI that are effective and data and compute efficient, but also building sovereign models for India from the ground up. We are creating datasets for training these models and fine-tuning them for downstream tasks such as conversation and question-answering, while creating benchmarking datasets towards calibrating the performance of these models,' said Prof Ganesh Ramakrishnan from IIT-Bombay, who is spearheading the project.

The team has put out not only datasets relevant to Indian knowledge systems but also others that can help in audio-visual learning, such as tutorials capturing practical skills like waste-to-toy creation or organic farming. There is also a dataset on Sanskrit translation for contemporary prose, a dataset of math word problems in Hindi and English for training AI in mathematical reasoning, and culturally grounded multilingual question-answering datasets, including questions and answers from historian Dharampal's books, among others. One of the datasets enables AI to answer questions about images using external knowledge, and another covers recognising text in videos with camera movements.

Most of these models are trained from scratch, not just fine-tuned, said Prof Ramakrishnan. The models also uniquely balance Indian data alongside English data, ensuring relevance to the country, he said. 'We are creating benchmarks for the AI ecosystem in the country, but these can be pulled out by researchers, enterprisers, companies, or even academia and developed further,' he added.

Key step in democratising AI: IIT-B releases 16 datasets on AIKOSH

Indian Express

3 days ago

In an important milestone for India's Artificial Intelligence (AI) ecosystem, the Indian Institute of Technology (IIT) Bombay has released 16 diverse and culturally significant datasets on AIKOSH, India's official AI repository, making it one of the biggest contributors to the platform. This marks a crucial step in democratising AI by making high-quality, India-centric data openly accessible to researchers, startups and developers across the country. IIT Bombay made the announcement on X, saying that these datasets are designed to support innovation and research in AI and Machine Learning (ML), particularly in the Indian context.

The post by IIT Bombay (@iitbombay), dated May 30, 2025, read: 'IIT Bombay Releases 16 AI Datasets on AIKOSH: Enabling the Future of Responsible AI in India 🇮🇳 IIT Bombay is thrilled to announce the release of 16 diverse and culturally significant datasets on AIKOSH, the Government of India's official AI repository. These datasets are…'

AIKOSH, which was launched in March by the Ministry of Electronics and Information Technology, is a national platform aimed at supporting inclusive AI development across the country. The 16 datasets by IIT Bombay sit alongside 21 AI models now available on AIKOSH, which were created by BharatGen, a Section 8 company funded by the Department of Science and Technology for indigenous AI development in India. The company is a consortium of seven partners. Led by IIT Bombay, the consortium includes IIT Kanpur, IIT Mandi, IIT Hyderabad, IIT Madras, IIM Indore and IIIT Hyderabad.

Prof Ganesh Ramakrishnan, Department of Computer Science and Engineering, IIT Bombay, said, 'Our goal is not just to build AI models but to provide resources that startups and system integrators can leverage, creating a favourable and sovereign AI ecosystem for India.'

The datasets released on AIKOSH include handwritten and printed Indian scripts, multilingual audio data and resources designed to interpret visual and spoken inputs from Indian environments. Notable among the contributions is a large-scale Sanskrit Optical Character Recognition (OCR) dataset consisting of over 218,000 sentences from historical texts to support the digitisation of ancient Indian knowledge. There is also a speech recognition dataset with more than 78 hours of Sanskrit audio. Additional resources include datasets for detecting tables across documents in 14 Indian languages and a comprehensive Wiki on Indian Knowledge Systems, among others.

Prof Ramakrishnan said, 'Equal emphasis on India data and its provenance allows these models to uniquely balance Indian data alongside English data, ensuring true relevance and understanding for our diverse nation, while also catering to its security. These models are built with Indian linguistic and cultural nuances at their core. By making these datasets available to all through AIKOSH, we are democratising AI in order to foster innovations across the country, eventually to build a self-reliant and inclusive AI ecosystem for India.'

IIT-Bombay leads push for India-centric AI

Hindustan Times

3 days ago

Mumbai: The Indian Institute of Technology (IIT) Bombay has released 16 new datasets on AIKosh, the central government's platform that provides a repository of datasets to enable artificial intelligence (AI) innovation. This is a major step in developing AI that understands India's linguistic and cultural landscape, said professor Ganesh Ramakrishnan from IIT Bombay. These datasets will support innovation and research in AI and machine learning (ML), especially in areas involving Indian languages, scripts, documents, media, and audiovisual content.

The effort is part of BharatGen, a multilingual large language model (LLM) initiative led by IIT Bombay and funded by the Department of Science and Technology. So far, BharatGen has contributed 16 India-centric datasets and launched 21 AI models on AIKosh. The initiative includes top institutions such as the International Institute of Information Technology in Hyderabad, the IITs of Kanpur, Mandi, Madras and Hyderabad, and IIM Indore.

IIT Bombay's datasets are designed to build a solid foundation for developing Indian AI tools and applications. These include over 218,000 sentences for improving digitisation of Sanskrit texts, audio-visual data on practical skills like upcycling discarded materials into toys and organic farming, English-Sanskrit translations with 53,000 sentences for modern prose, over 78 hours of Sanskrit audio for speech recognition, multilingual question-answer sets in 11 Indian languages, including Hindi and English, math word problems in Hindi and English for AI reasoning, and table detection datasets in 14 Indian languages.

The releases also cover visual question answering (systems capable of answering questions related to an image), datasets to improve translation accuracy and recognise text in videos, a comprehensive overview of Indian Knowledge Systems (IKS), cross-lingual video and text retrieval in seven Indian languages (allowing AI to retrieve relevant information when the document is written in a different language from the query), and handwritten and printed text detection datasets.

These datasets and models are part of a broader effort by IIT Bombay and BharatGen to build sovereign AI models for India, aligned with the India AI Mission, a central government initiative that aims to build an ecosystem for AI innovation by enhancing data quality and facilitating compute access. The team is not just fine-tuning existing models, but training new ones from scratch using Indian data. They are also building benchmarks to test these models for Indian use in conversation and education.

A major highlight of this initiative is the launch of 'Param 1', a bilingual foundational language model with 2.9 billion parameters. It supports both English and Hindi and has been trained on 36% Indic language data, significantly more than international models like Meta's Llama, which had less than 0.01%. 'Pre-training (the initial stage of training a machine learning model on a large dataset) is an enormous undertaking and often a barrier for many. That's why we've taken on this challenge,' said professor Ramakrishnan, lead of BharatGen.

Developers can now fine-tune Param 1 to build Indic chatbots, copilots (virtual assistants for research), and knowledge systems. 'We hope our efforts toward creating a sovereign Generative AI ecosystem and milestones such as the release of such LLM model checkpoints, serves as a foundation for India-specific solutions,' said professor Ramakrishnan.

Alongside Param 1, BharatGen has launched over 20 speech models across 19 Indian languages. These include speaker-adaptive text-to-speech (TTS) systems that can mimic a speaker's voice in languages like Hindi, Tamil, Telugu, Marathi, and Bengali. Advanced speaker-conditioned TTS models and automatic speech recognition systems have also been developed to make voice-based applications more natural and inclusive. 'Our goal is not just to build AI models but to provide resources that startups and system integrators can leverage,' said professor Ramakrishnan.

IGNOU to use IIT Bombay's new AI project to deliver education in Indian languages

India Today

4 days ago

In a major step towards making higher education more accessible across languages, the Indira Gandhi National Open University (IGNOU) has partnered with the Indian Institute of Technology (IIT) Bombay to roll out Project Udaan, an AI-powered translation platform that will bring academic content to learners in multiple Indian languages.

A Memorandum of Understanding (MoU) was signed between the two institutions at IGNOU's headquarters on Friday. The agreement aims to support the goals of the National Education Policy (NEP) 2020, which strongly encourages multilingual education and equal learning opportunities.

Udaan uses artificial intelligence to translate complex technical and academic material while keeping the format and original layout intact. The platform blends Optical Character Recognition (OCR), domain-specific glossaries, and human feedback for high-quality translations.

Professor Ganesh Ramakrishnan of IIT Bombay, who leads Project Udaan, said the initiative is designed to meet the linguistic needs of India's diverse learners. The platform aims to remove the language barrier in higher education by offering accurate, readable translations of textbooks and learning materials.

CONTENT TO BE TRANSLATED INTO REGIONAL LANGUAGES

The MoU also aligns with national initiatives such as Digital India and Bharat Bhasha, which seek to promote the use of Indian languages in the digital space and encourage inclusive AI research.

With IGNOU's reach as one of the world's largest open universities and IIT Bombay's expertise in AI and language processing, the collaboration is expected to transform access to quality education for millions of learners.

Through this collaboration, IGNOU's vast collection of study material will now be available in regional languages, making it easier for students from rural and non-English speaking backgrounds to study academic content.

Officials from both institutions were present during the signing ceremony. They described the project as a move towards educational innovation, equality, and empowerment, especially for students in remote and regional areas.

By breaking language barriers, Project Udaan could serve as a turning point in India's journey towards inclusive and tech-driven learning.

IGNOU, IIT Bombay launch AI initiative for regional language learning

Indian Express

5 days ago

Distance-learning leader Indira Gandhi National Open University (IGNOU) and the Indian Institute of Technology (IIT) Bombay have signed a Memorandum of Understanding (MoU) to collaborate on Project Udaan, an AI-driven translation platform to promote inclusive and equitable education. Udaan integrates optical character recognition (OCR), layout preservation, domain-specific glossaries, and human-in-the-loop editing to deliver high-fidelity translation of academic and technical content into Indian languages.

Through this collaboration, IGNOU aims to make its extensive educational resources available in multiple regional languages, thereby expanding access for learners across the country, a statement read. The collaboration supports the goals outlined in the National Education Policy (NEP) 2020, particularly the emphasis on multilingual education, digital access, and academic equity, the statement added.

The signing ceremony, held on Friday at the IGNOU Headquarters, was chaired by the Vice Chancellor of IGNOU and attended by senior academic leadership. Prof. Ganesh Ramakrishnan, the Bank of Baroda Chair Professor in Digital Entrepreneurship and principal investigator of Udaan, represented IIT Bombay at the event.

'This partnership with IIT Bombay is a powerful alignment with our vision to democratize higher education. Through Project Udaan, we aim to bridge the language divide and empower learners in every corner of India,' the V-C of IGNOU said.

Prof. Ramakrishnan said, 'Udaan is designed to serve the linguistic and educational diversity of India. This MoU marks a critical step in translating that vision into impact at scale, with IGNOU's vast learner base.'

Prof. Nayantara Padhi, a professor at the School of Management Studies and MoU coordinator from IGNOU, said, 'This collaboration reflects IGNOU's commitment to educational equity and innovation.'

'By integrating AI-enabled translation tools with our academic delivery, we are opening new doors for learners in regional and rural contexts who will now have access to high-quality content in their own languages,' she added.

'It also contributes to national missions like Digital India and Bharat Bhasha, strengthening India's position in culturally grounded and inclusive AI research,' the statement read.
