AI datasets by IIT-Bombay to simplify Indian texts, help in AI research

Times of India, 2 days ago

MUMBAI: For years, research on Indian knowledge systems was challenging because the source texts are often available in Indian languages such as Sanskrit. However, a data curation exercise carried out by the premier IIT-Bombay, as part of its contribution to the central govt's AIKosh portal, has simplified it to some extent by digitising 30 different textbooks.
A dataset containing around 2.18 lakh (218,000) sentences with 1.5 million words from these textbooks is now available on the govt portal. The textbooks cover diverse topics such as astronomy, medicine, and mathematics, and some are as old as 18 centuries.
AIKosh, launched in March, is a source for datasets, models, toolkits, and more from diverse sources that aim to help AI-based innovation and research. IIT-Bombay, one of the leading contributors to the AIKosh platform, along with BharatGen, a consortium of seven institutes again led by IIT-Bombay, has contributed 37 diverse models and datasets on the portal so far.
IIT-Bombay alone launched around 16 culturally significant datasets on the platform to contribute to the country's AI mission.
BharatGen, funded through a section 8 company formed by the Department of Science and Technology with IIT-Bombay, IIT-Kanpur, IIT-Madras, IIT-Hyderabad, IIT-Mandi, IIM-Indore, and IIIT-Hyderabad as partners, launched 21 models on the portal.
'We are not only researching Large Language Models (LLMs) and other generative models for AI that are effective and data and compute efficient, but also building sovereign models for India from the ground up.
We are creating datasets for training these models and fine-tuning them for downstream tasks such as conversation and question-answering, while creating benchmarking datasets towards calibrating the performance of these models,' said Prof Ganesh Ramakrishnan from IIT-Bombay, who is spearheading the project.
The team has not only put out datasets relevant to the Indian knowledge systems but also others that can help in audio-visual learning, such as tutorials capturing practical skills like waste-to-toy creation or organic farming.
There is also one on Sanskrit translation for contemporary prose, a math word problems dataset in Hindi and English which will train the AI in mathematical reasoning, and culturally-grounded multi-lingual question-answering datasets, including questions and answers from historian Dharampal's books, among others.
One of the datasets also enables the AI to answer questions about images using external knowledge, and another interesting one is on recognising text in videos with camera movements.
Most of these models are trained from scratch, not just fine-tuned, said Prof Ramakrishnan. The models also uniquely balance Indian data alongside English data, ensuring relevance to our country, he said. 'We are creating benchmarks for the AI ecosystem in the country, but these can be pulled out by researchers, enterprisers, companies, or even academia and developed further,' he added.

Related Articles

IGNOU launches MBA in Hindi and Odia to break language barriers

India Today, 32 minutes ago

In a significant step toward inclusive education, the Indira Gandhi National Open University (IGNOU) has introduced its flagship MBA programme in Hindi and Odia. This move, announced on Monday, aligns with the goals of the National Education Policy (NEP) 2020, which champions multilingual learning and broader access to professional education.

The aim is clear: make management education more accessible to students who are more comfortable learning in their native languages. With these new offerings, IGNOU hopes to make business studies less intimidating and more relatable for learners from non-English-speaking backgrounds.

As part of the government's E-Kumbh initiative, the programme benefits from a partnership between IGNOU and the All India Council for Technical Education (AICTE). Key academic materials have been translated into Hindi and Odia using 'Anuvadini,' an AI-based tool developed by AICTE for educational content localisation.

AICTE Vice Chairman Dr Abhay Jere highlighted the transformative role of AI tools in removing language as a barrier to learning. IGNOU Vice Chancellor Prof. Uma Kanjilal affirmed the university's commitment to inclusive education, revealing plans to launch MBA content in 10 more Indian languages.

The multilingual MBA initiative is a step toward ensuring that no aspiring professional is left behind, regardless of their linguistic background.

Reliance only Indian firm in highly valued global technology companies list

Times of India, 34 minutes ago

Billionaire Mukesh Ambani's Reliance Industries is the only Indian firm to have made it to the list of the top 30 publicly traded global technology companies, as per a 340-page report, titled 'Trends - Artificial Intelligence', that delves into the rapid global adoption and transformative impact of AI technologies.

The report lists global technology companies by market capitalisation. The top eight slots on the list are occupied by US technology giants - Microsoft, Nvidia, Apple, Amazon, Alphabet, Meta Platforms, Tesla and Broadcom. Taiwan's TSMC is ranked 9th, followed by China's Tencent. Reliance, with a market capitalisation of USD 216 billion, is ranked 23rd, according to the list.

"Over the past 30 years (1995-2025), just five companies remained on the top 30 most highly valued publicly traded global technology companies - Microsoft, Oracle, Cisco, IBM and AT&T," the report said. It went on to list Reliance alongside the likes of Nvidia, Apple, Amazon, Alphabet, Meta, Tesla, Alibaba, Salesforce and China Mobile as the new entrants.

"In 1995, the USA had 53% (16 of 30) of the most valuable tech companies and 70 per cent (21 of 30) in 2025," it said. In 1995, Japan had 30% (9 of 30) of top tech companies and zero in 2025. The UK, Singapore, Hong Kong, Mexico and Malaysia had 1 each, but now none are on the list. "In 2025, new geographic entrants include China with 3, Germany with 2, Taiwan with 1, Netherlands with 1, South Korea with 1 and India with 1," it said. Taiwan has only one company on the list, TSMC, which produces 80-90% of the world's most advanced semiconductors and 62% of global semiconductors.

According to the report, India has the largest number of ChatGPT mobile app users in the world. It accounts for 13.5% of monthly active users of the artificial intelligence (AI)-powered chatbot developed by OpenAI. It is ahead of the USA (8.9%), Indonesia (5.7%) and Brazil (5.4%). Pakistan has 3% of users. India also accounts for 6.9% of active global users of Chinese AI app DeepSeek, behind China (33.9%) and Russia (9.2%).

"Artificial intelligence is reshaping the modern landscape at breakneck speed. What began as research has scaled into emerging core infrastructure across industries - powering everything from customer support to software development, scientific discovery, education, and manufacturing," the report said. AI, it said, is accelerating, touching more domains, and becoming more embedded in how work gets done.

"Catalysing this growth is the global availability of easy-to-use multimodal AI tools (like ChatGPT) on pervasive mobile devices, augmented by a steep decline in inference costs and an explosion in model availability. Both closed and open-source tools are now widely accessible and increasingly capable, enabling solo developers, startups, and enterprises alike to experiment and deploy with minimal friction," it said.

Large tech incumbents are weaving AI deeper into their products - rolling out copilots, assistants, and even agents that reframe how users engage with technology. Whether through embedded intelligence in SaaS or agentic workflows in consumer apps, the interface layer is being rewritten in real time.

On the compute side, investment continues to scale dramatically. Capital expenditures across major cloud providers, chipmakers, and hyperscalers have hit new highs, driven by the race to enable real-time, high-volume inference at scale. The investment is not just in chips, but also in new data centres, networking infrastructure, and energy systems to support growing demand.

"Whether this level of capital expenditure persists remains to be seen, but as AI moves closer to the edge - in vehicles, farms, labs, and homes - distinction between digital and physical infrastructure continues to blur," the report said. PTI

Thore Network doubles down on multilingual AI with major investment in India tech infrastructure

Times of India, 34 minutes ago

As India strides confidently into the era of digital inclusion, one company has been quietly building the pipes for an AI-powered, multilingual future. Thore Network Pvt. Ltd., an 8-year-old pioneer in blockchain and digital assets, is now turning its attention toward Indic language AI infrastructure, investing in platforms that bridge India's rich linguistic diversity with modern technology.

With over 1.4 billion citizens and hundreds of dialects, India presents a unique challenge, and opportunity, for artificial intelligence. While global players dominate English-centric models, India's next tech leap lies in what experts call 'Sovereign AI': solutions trained on local data, dialects, and cultural context.

Thore Network has announced three flagship initiatives that aim to contribute to this vision:

  • Insights AI: A multilingual search and discovery platform designed for context-aware results in Indian [...]
  • [...] A dialect-first language translation and communication app, built for daily vernacular [...]
  • [...] ChatBot: A native-language chatbot assistant tailored to Indian service sectors and regional queries.

While not explicitly a government project, these initiatives align closely with the Bhashini framework under India's digital language inclusion efforts. 'For us, Mailjol isn't just a product. It's a philosophy of interconnectedness - making sure no language is left behind in the digital age,' said Alok Kumar, Founder & CEO of Thore Network.

Organizational Restructuring & Funding Commitment

Thore Network recently underwent a significant organizational overhaul, restructuring as an equity-holding company with plans to open its cap table to private placement in Q1 2025. The firm has committed an initial $500,000 to its AI language verticals, and is reportedly in active discussions for grant support and private venture capital to further scale its initiatives. 'We believe this investment is not just strategic - it's cultural. There's a responsibility to build for Bharat, and we're taking that seriously,' added Kumar.

In parallel, Thore Network is preparing for a pilot rollout of an AI-based road safety and emergency response system, starting with select states. The mobile platform will integrate real-time toll tracking, driver alerts, and voice-based assistance, especially for non-English users, marking another step in their vision for citizen-first AI.

Vice Chairman and Whole-Time Director Prashant Kolhe, known for his role in over 100 successful IPOs, commented on the company's roadmap: 'Our balance sheet is clean, our tech is maturing, and we've shown product-market fit in crypto and now AI. If everything aligns, 2027 could be our listing year,' Kolhe noted.

'We didn't pivot - we evolved. AI is the next logical extension of our blockchain base. Language is infrastructure,' Alok Kumar, Founder & CEO of Thore Network, said. 'Execution. Building the trust layer. And reaching the next 500 million users - not in English, but in Bhojpuri, Tamil, and Konkani. We're bootstrapped, profitable, and now opening up for like-minded investors who understand Bharat's tech future,' he added.

Thore Network continues to build at the intersection of decentralized infrastructure and human-centric AI, staking its claim in India's next tech decade.
