Inside India's two-track strategy to become an AI powerhouse

Mint | a day ago
Bengaluru: At Google's annual I/O Connect event in Bengaluru this July, the spotlight was on India's AI ambitions. With over 1,800 developers in attendance, the recurring theme across panel discussions, product announcements and workshops was building AI capability for India's linguistic diversity.
With 22 official languages and hundreds of spoken dialects, India faces a monumental challenge in building AI systems that can work across this multilingual landscape.
In the demo area of the event, this challenge was front and centre, with startups showcasing how they're tackling it. Among them was Sarvam AI, demonstrating Sarvam-Translate, a multilingual model fine-tuned on Google's open-source large language model (LLM), Gemma.
Next to it, CoRover demonstrated BharatGPT, a chatbot for public services such as the one used by the Indian Railway Catering and Tourism Corporation (IRCTC).
At the event, Google announced that AI startups Sarvam, Soket AI and Gnani are building the next generation of AI models for India, fine-tuning them on Gemma.
At first glance, this might seem contradictory. All three startups are among the four selected to build India's sovereign large language models under the ₹10,300 crore IndiaAI Mission, a government initiative to develop home-grown foundational models from scratch, trained on Indian data, languages and values. So, why Gemma?
Building competitive models from scratch is a resource-heavy undertaking, and India does not have the luxury of doing it in isolation. With limited high-quality training datasets, an evolving compute infrastructure and urgent market demand, the more pragmatic path is to start with what is already available.
These startups are therefore taking a layered approach, fine-tuning open-source models to solve real-world problems today, while simultaneously building the data pipelines, user feedback loops and domain-specific expertise needed to train more indigenous and independent models over time.
Fine-tuning involves taking an existing large language model already trained on vast amounts of general data and teaching it to specialize further on focused and often local data, so that it can perform better in those contexts.
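In practice, this usually means parameter-efficient fine-tuning of an open checkpoint on a relatively small, domain- and language-specific corpus. The sketch below is illustrative only: it assumes the Hugging Face transformers, peft and datasets libraries, a generic Gemma-style checkpoint name and a hypothetical local_domain_data.jsonl file of local-language text, and is not any of these startups' actual training code.

```python
# A minimal, illustrative sketch of parameter-efficient fine-tuning (LoRA) on an
# open base model. Checkpoint name, data file and hyperparameters are assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "google/gemma-2-2b"  # illustrative; any open causal-LM checkpoint works

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Attach low-rank adapters so only a small fraction of weights is trained,
# which keeps domain adaptation far cheaper than pre-training from scratch.
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

# Hypothetical local dataset: one JSON object per line with a "text" field,
# e.g. question-answer pairs in an Indian language for a specific domain.
data = load_dataset("json", data_files="local_domain_data.jsonl", split="train")
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=data.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-local-ft",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=data,
    # The causal-LM collator copies input_ids into labels, so the model learns
    # to continue the local-language text it is shown.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("gemma-local-ft/adapter")  # only the small adapter is saved
```

Because only the adapter weights are updated, this kind of adaptation can run on a modest GPU budget, which is what makes the "use what's available today" track viable while sovereign pre-training is still being built out.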
Build and bootstrap
Project EKA, an open-source, community-driven initiative led by Soket, is a sovereign LLM effort being developed in partnership with IIT Gandhinagar, IIT Roorkee and IISc Bangalore. It is being built from scratch, with training code, infrastructure and data pipelines all sourced within India. A 7 billion-parameter model is expected in the next four to five months, with a 120 billion-parameter model planned over a 10-month cycle.
"We've mapped four key domains: agriculture, law, education and defence," says Abhishek Upperwal, co-founder of Soket AI. "Each has a clear dataset strategy, whether from government advisory bodies or public-sector use cases."
A key feature of the EKA pipeline is that it is entirely decoupled from foreign infrastructure. Training happens on India's GPU cloud and the resulting models will be open-sourced for public use.
The team, however, has taken a pragmatic approach, using Gemma to run initial deployments. "The idea is not to depend on Gemma forever," Upperwal clarifies. "It's to use what's there today to bootstrap and switch to sovereign stacks when ready."
CoRover's BharatGPT is another example of this dual strategy in action. It currently runs on a fine-tuned model, offering conversational agentic AI services in multiple Indian languages to various government clients, including IRCTC, Bharat Electronics Ltd, and Life Insurance Corporation.
"For applications in public health, railways and space, we needed a base model that could be fine-tuned quickly," says Ankush Sabharwal, CoRover's founder. "But we have also built our own foundational LLM with Indian datasets."
Like Soket, CoRover treats its current deployments as both service delivery and dataset creation. By further training and fine-tuning Gemma on domain-specific data, it is trying to improve accessibility today while building a bridge to future sovereign deployments.
"You begin with an open-source model. Then you fine-tune it, add language understanding, lower latency and expand domain relevance," Sabharwal explains.
"Eventually, you'll swap out the core once your own sovereign model is ready," he adds.
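Architecturally, "swapping out the core" amounts to keeping the application layer independent of the underlying model. The sketch below is a hypothetical illustration of that pattern, not CoRover's actual design: the class names and endpoints are invented, and the point is only that domain logic written against a thin interface survives a change of model.

```python
# A hypothetical sketch of the "swap out the core" pattern: the application is
# written against a small interface, so a fine-tuned open model can later be
# replaced by a sovereign one without touching the rest of the stack.
from typing import Protocol


class ChatModel(Protocol):
    def generate(self, prompt: str, language: str) -> str: ...


class OpenModelBackend:
    """Today: a fine-tuned open-weights model (for example, a Gemma variant)."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint  # hypothetical serving endpoint

    def generate(self, prompt: str, language: str) -> str:
        # A real deployment would call the hosted model here.
        return f"[open model @ {self.endpoint}] ({language}) reply to: {prompt}"


class SovereignModelBackend:
    """Tomorrow: an indigenous model trained end-to-end on Indian data."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def generate(self, prompt: str, language: str) -> str:
        return f"[sovereign model @ {self.endpoint}] ({language}) reply to: {prompt}"


def answer_citizen_query(model: ChatModel, query: str, language: str) -> str:
    # Domain logic (ticketing, insurance, health advisories) is written once
    # against the interface and does not change when the core model is swapped.
    return model.generate(query, language)


# Swapping the core is a one-line change at the call site.
backend = OpenModelBackend("https://example.internal/gemma-ft")
# backend = SovereignModelBackend("https://example.internal/eka")
print(answer_citizen_query(backend, "When is my train to Patna?", "hi"))
```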
Amlan Mohanty, a technology policy expert, calls India's approach an experiment in trade-offs, betting on models such as Gemma to enable rapid deployment without giving up the long-term goal of autonomy. "It's an experiment in reducing dependency on adversarial countries, ensuring cultural representation and seeing whether firms from allies like the US will uphold those expectations," he says.
Mint reached out to Sarvam and Gnani with detailed queries regarding their use of Gemma and its relevance to their sovereign AI initiatives, but the companies did not respond.
Why local context is critical
For India, building its own AI capabilities is not just a matter of nationalistic pride or keeping up with global trends. It's more about solving problems that no foreign model can adequately address today.
Think of a migrant from Bihar working in a cement factory in rural Maharashtra, who goes to a local clinic with a persistent cough. The doctor, who speaks Marathi, shows him a chest X-ray, while the AI tool assisting the doctor explains the findings in English, in a crisp Cupertino accent, using medical assumptions based on Western body types. The migrant understands only Hindi and much of the nuance is lost. Far from being just a language problem, it's a mismatch in cultural, physiological and contextual grounding.
A rural frontline health worker in Bihar needs an AI tool that understands local medical terms in Maithili, just as a farmer in Maharashtra needs crop advisories that align with state-specific irrigation schedules. A government portal should be able to process citizen queries in 15 languages with regional variations.
These are high-impact, everyday use cases where errors can directly affect livelihoods, the functioning of public services and health outcomes. Fine-tuning open models gives Indian developers a way to address these urgent, ground-level needs right now, while building the datasets, domain knowledge and infrastructure that can eventually support a truly sovereign AI stack.
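One way this plays out in practice is that every exchange served by a fine-tuned open model can be captured as candidate training data for a future sovereign model. The following sketch is a hypothetical illustration of that feedback loop; the file name, record fields and example exchange are invented, not any company's actual pipeline.

```python
# A hypothetical sketch of deployments doubling as dataset creation: each served
# exchange is appended to a corpus that later sovereign-model training can reuse.
import json
import time
from pathlib import Path

CORPUS = Path("sovereign_training_corpus.jsonl")  # invented output file


def log_interaction(query: str, response: str, language: str, domain: str) -> None:
    """Append one served interaction as a candidate fine-tuning record."""
    record = {
        "timestamp": time.time(),
        "language": language,  # e.g. "mr" for Marathi, "mai" for Maithili
        "domain": domain,      # e.g. "railways", "health", "agriculture"
        "text": f"User: {query}\nAssistant: {response}",
    }
    with CORPUS.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


# A response served today becomes a training example tomorrow.
log_interaction(
    "माझ्या पिकाला पाणी कधी द्यावे?",
    "सध्याच्या हवामानानुसार दर चार दिवसांनी पाणी द्या.",
    language="mr",
    domain="agriculture",
)
```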
This dual-track strategy is possibly one of the fastest ways forward, using open tools to bootstrap sovereign capacity from the ground up.
"We don't want to lose the momentum. Fine-tuning models like Gemma lets us solve real-world problems today in applications such as agriculture or education, while we build sovereign models from scratch," says Soket AI's Upperwal. "These are parallel but separate threads. One is about immediate utility, the other about long-term independence. Ultimately these threads will converge."
A strategic priority
The IndiaAI Mission is a national response to a growing geopolitical issue. As AI systems become central to education, agriculture, defence and governance, over-reliance on foreign platforms raises the risks of data exposure and loss of control.
This was highlighted last month when Microsoft abruptly cut off cloud services to Nayara Energy after European Union sanctions on its Russian-linked operations. The disruption, which was reversed only after a court intervention, raised alarms about how foreign tech providers can become geopolitical pressure points.
Around the same time, US President Donald Trump doubled tariffs on Indian imports to 50%, showing how trade and tech are increasingly being used as leverage.
Besides reducing dependence, sovereign AI systems are also important for ensuring that India's critical sectors accurately reflect local values, regulatory frameworks and linguistic diversity.
Most global AI models are trained on English-dominant, Western datasets, which makes them poorly equipped to handle the realities of India's multilingual population or the domain-specific complexity of its systems.
This becomes a challenge when it comes to applications such as interpreting Indian legal judgments or accounting for local crop cycles and farming practices in agriculture.
Mohanty says that sovereignty in AI isn't about isolation, but about who controls the infrastructure and who sets the terms. "Sovereignty is basically about choice and dependencies. The more choice you have, the more sovereignty you have."
He adds that full-stack independence from chips to models is not feasible for any country, including India. Even global powers such as the US and China balance domestic development with strategic partnerships. "Nobody has complete sovereignty or control or self-sufficiency across the stack, so you either build it yourself or you partner with a trusted ally."
Mohanty also points out that the Indian government has taken a pragmatic approach by staying agnostic to the foundational elements of its AI stack. This stance is shaped less by ideology and more by constraints such as lack of Indic data, compute capacity and ready-made open-source alternatives built for India.
India's data lacunae
Despite the momentum behind India's sovereign AI push, the lack of high-quality training data, particularly in Indian languages, continues to be one of its most fundamental roadblocks. While the country is rich in linguistic diversity, that diversity has not translated into digital data that AI systems can learn from.
Manish Gupta, director of engineering at Google DeepMind India, cited internal assessments of 125 Indian languages with over 100,000 speakers each, which found that 72 of them had virtually no digital presence. "Data is the fuel of AI and 72 out of those 125 languages had zero digital data," he says.
To address this linguistic challenge for Google's India market, the company launched Project Vaani in collaboration with the Indian Institute of Science (IISc).
This initiative aims to collect voice samples across hundreds of Indian districts. The first phase captured over 14,000 hours of speech data from 80 districts, representing 59 languages, 15 of which previously had no digital datasets. The second phase expanded coverage to 160 districts and future phases aim to reach all 773 districts in India.
"There's a lot of work that goes into cleaning up the data, because sometimes the quality is not good," Gupta says, referring to the challenges of transcription and audio consistency.
Google is also developing techniques to integrate these local language capabilities into its large models.
Gupta says that learnings from widely spoken languages such as English and Hindi are helping improve performance in lower-resource languages such as Gujarati and Tamil, largely due to cross-lingual transfer capabilities built into multilingual language models.
The company's Gemma LLM incorporates Indian language capabilities derived from this body of work. Gemma ties into the LLM efforts of Indian startups through a combination of technical collaborations, infrastructure guidance and publicly released datasets.
According to Gupta, the strategy is driven by both commercial and research imperatives. India is seen as a global testbed for multilingual and low-resource AI development. Supporting local-language AI, especially through partnerships with startups such as Sarvam, Soket AI and Gnani.ai, allows Google to build inclusive tools that can scale beyond India to other linguistically complex regions in Southeast Asia and Africa.
For India's sovereign AI builders, the lack of readymade and high-quality Indic datasets means that model development and dataset creation must happen in parallel.
For the Global South
India's layered strategy of using open models now, while concurrently building sovereign models, also offers a roadmap for other countries navigating similar constraints. It's a blueprint for the Global South, where nations are wrestling with the same dilemma: how to build AI systems that reflect local languages, contexts and values without the luxury of vast compute budgets or mature data ecosystems. For these countries, fine-tuned open models offer a bridge to capability, inclusion and control.
"Full-stack sovereignty in AI is a marathon, not a sprint," Upperwal says. "You don't build a 120 billion model in a vacuum. You get there by deploying fast, learning fast and shifting when ready."
Singapore, Vietnam and Thailand are already exploring similar methods, using Gemma to kickstart their local LLM efforts.
By 2026, when India's sovereign LLMs, including EKA, are expected to be production-ready, Upperwal says the dual track will likely converge, and bootstrapped models will fade while homegrown systems may take their place.
But even as these startups build on open tools such as Meta's Llama or Google's Gemma, which are engineered by global tech giants, the question of dependency continues to loom. Even for open-source models, control over architecture, training techniques and infrastructure support still leans heavily on Big Tech.
While Google has open-sourced speech datasets, including Project Vaani, and extended partnerships with IndiaAI Mission startups, the terms of such openness are not always symmetrical. India's sovereign plans, therefore, depend not on shunning open models but on eventually outgrowing them.
"If Google is directed by the US government to close down its weights (model parameters), or increase API (application programming interface) prices or change transparency norms, what would the impact be on Sarvam or Soket?" questions Mohanty, adding that while the current India-US tech partnership is strong, future policies could shift and jeopardize India's digital sovereignty.
In the years ahead, India and other nations in the Global South will face a critical question over whether they can convert this borrowed support into a complete, sovereign AI infrastructure, before the terms of access shift or the window to act closes.