19-05-2025
Pocket FM is training its AI model to scale storytelling. Is the investment worth it?
Audio series startup Pocket FM plans to have a large language model (LLM) up and running by the end of the year. The company has already labelled and categorized its proprietary datasets and is currently testing an early version of its model.
"We're currently testing a very raw model, and that is going to take some time. We're also working on getting graphics processing units," Pocket FM co-founder and chief technology officer Prateek Dixit told Mint. Teams at the company are already working on reinforcement learning for the LLM, and Pocket FM expects it to be ready five to six months after that.
The startup plans to buy between 30 and 50 of Nvidia's A100 or H100 GPUs in a staggered manner. These units cost anywhere between $8,000 and $25,000 each. Despite the steep costs, Dixit views the investment as strategic.
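A back-of-the-envelope range for that outlay, using only the unit counts and prices quoted above (and excluding networking, power and hosting costs, which the company has not specified):

$$30 \times \$8{,}000 \approx \$0.24\text{ million} \qquad\qquad 50 \times \$25{,}000 \approx \$1.25\text{ million}$$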
"It's not just a cost decision, you have to understand. It's more of a strategic asset for how we scale storytelling with AI," Dixit added.
Beyond hardware, the LLM push includes infrastructure upgrades, hiring skilled AI engineers, and increased R&D investment. Pocket FM currently spends 8-13% of its revenue, roughly $26 million, on R&D, with 40% of that allocated to AI initiatives. This share is expected to rise by one to two percentage points.
Pocket FM plans to build its model on top of an open-source foundation model such as Meta's Llama 3, tailored specifically for storytelling. The company currently uses open-source models fine-tuned for its genre-specific needs, but over time the quality of the content has plateaued. "With our own model training, we can have a step jump in quality," Dixit said.
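As a rough illustration of what genre-specific fine-tuning on an open-source base can look like, here is a minimal sketch using LoRA adapters; the base model, dataset file and hyperparameters are assumptions for illustration, not details of Pocket FM's actual pipeline.

```python
# Minimal LoRA fine-tuning sketch on an open-source base model.
# The checkpoint, the "story_scripts.jsonl" corpus and the hyperparameters
# are hypothetical; they only illustrate the approach described above.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical corpus: one story script per row in a "text" column.
stories = load_dataset("json", data_files="story_scripts.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",            # assumed open-source base model
    train_dataset=stories,
    peft_config=LoraConfig(r=16, lora_alpha=32),   # small adapter; base weights stay frozen
    args=SFTConfig(output_dir="llama3-story-lora", num_train_epochs=1),
)
trainer.train()
```

Because only the adapter weights are trained, a sketch like this can run on far less hardware than full-parameter training, which is one reason fine-tuning is usually the first step before building a model from scratch.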
Popular genres on the platform include drama, fantasy and thrillers, and the company's writers have produced thousands of stories in these categories. The plan is to use the LLM for everything from story and comic creation to developing character arcs and even AI-based videos.
"The idea is to take these foundation models and fine-tune them for different writing styles. That is the use case for comics as well," said Dixit.
Earlier this year, Pocket FM launched Pocket Toons, its webcomic platform. For this, the company created an AI-powered studio it calls Blaze to produce comics "20x faster at one-third the cost, automating processes like background rendering, scene composition, and colouring while preserving artistic creativity," the company said.
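For a sense of what automated background rendering can look like in practice, here is a minimal text-to-image sketch using an open-source diffusion model; the checkpoint and prompt are assumptions for illustration, and the article does not describe how Blaze actually works.

```python
# Minimal text-to-image sketch for generating a comic panel background.
# The sdxl-turbo checkpoint and the prompt are illustrative assumptions,
# not Blaze's actual stack.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "rain-soaked alleyway at night, flat-colour webcomic background, no characters",
    num_inference_steps=2,   # turbo-style models need very few denoising steps
    guidance_scale=0.0,      # guidance is disabled for sdxl-turbo
).images[0]
image.save("panel_background.png")
```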
Pocket FM has been using natural language processing (NLP), a subset of artificial intelligence (AI), for translating across the 10 languages for which it produces audio content. Other use cases include text summarisation, metadata generation and genre tagging.
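As a hedged illustration of those NLP tasks, the sketch below wires up generic open-source checkpoints for translation, summarisation and genre tagging; the specific models are assumptions, since the article does not name the ones Pocket FM uses.

```python
# Generic NLP pipeline sketch: translation, summarisation and genre tagging.
# Checkpoints and the sample blurb are illustrative assumptions.
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")        # English -> Hindi
summarise = pipeline("summarization", model="facebook/bart-large-cnn")
tag_genre = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

episode = (
    "A small-town heiress discovers her late father's hidden empire, "
    "and every ally she trusts turns out to be keeping a secret of their own."
)

print(translate(episode)[0]["translation_text"])
print(summarise(episode)[0]["summary_text"])
print(tag_genre(episode, candidate_labels=["drama", "fantasy", "thriller"]))
```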
"An LLM forms the backbone of a powerful IP engine that not only drives our audio formats today but will also power future innovations across multiple storytelling mediums," said Dixit.
Is the investment worth it?
Experts are divided on whether building a domain-specific language model (DSLM) is worth the cost and can help in the long run.
It's hard to say whether a DSLM can stand the test of time, given how fast the AI industry is moving. "I don't think building a proprietary model is a good idea when the rate of innovation in the industry is so fast," said Anushree Verma, senior director analyst at Gartner.
According to Gartner, enterprise spending on such models is expected to reach $838 million in 2025 and grow to $11.3 billion by 2028, implying a compound annual growth rate of roughly 138% over that period.
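For reference, the growth rate implied by those two forecast figures works out as:

$$\text{CAGR} = \left(\frac{\$11{,}300\text{ million}}{\$838\text{ million}}\right)^{1/3} - 1 \approx 1.38 \;\Rightarrow\; \text{about } 138\%\text{ per year from 2025 to 2028}$$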
Open-source generative AI models are emerging as a viable basis for domain-specific models, rapidly closing the performance and reliability gap with proprietary models and offering a cost-effective, flexible alternative for model training and specialisation, Verma added.
"Building an LLM isn't automatically a strategic advantage. In many cases, a smaller DSLM can outperform a general-purpose LLM in speed, cost-efficiency and relevance, especially when fine-tuned on proprietary data," said Manpreet Singh Ahuja, tech, media and telecom sector leader and chief clients and alliances officer at PwC India. "The question is not 'can we build it?' but 'should we?'"
His argument is that an LLM is only worth building when a company has a clearly established need that current models in the market cannot satisfy. If a company cannot demonstrate that, or cannot monetise the model or use it across high-scale products, the return on investment is questionable.
"Long-term value comes not from owning the model alone, but from the unique data, applications, and feedback loops built around it," Ahuja added.
However, given that Pocket FM knows the use case it is building for, a custom LLM can benefit it even in the long run. What's more, training on GPUs it owns rather than leases means it has fewer worries about data security or safeguarding its intellectual property.
"Over time, running your own optimized model, especially using open-source foundations, can slash inference costs by up to 80%," said Sameer Jain, managing director at Primus Partners, a global management consulting firm.
Inferencing refers to the process by which a trained AI model applies what it has learned to draw conclusions from data it has never seen before.
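A toy illustration of the idea, entirely unrelated to Pocket FM's actual systems: a small model is trained on made-up listening data, then asked to make a prediction about an input it never saw during training.

```python
# Toy example of inference: a trained model predicting on unseen data.
# The features and labels are invented for illustration only.
from sklearn.linear_model import LogisticRegression

# Training data: (episode length in minutes, cliffhanger present?) -> listener finished episode?
X_train = [[12, 1], [25, 0], [15, 1], [30, 0], [10, 1], [28, 0]]
y_train = [1, 0, 1, 0, 1, 0]
model = LogisticRegression().fit(X_train, y_train)

# Inference: the trained model draws a conclusion about an unseen example.
print(model.predict([[14, 1]]))   # e.g. [1] -> likely to finish the episode
```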
Eventually, the company expects that owning its own GPUs and LLM will let it cut its AI costs significantly. "The unit cost per generation of content at scale gets reduced. We're not talking about one-time use cases. We want to continuously generate inferences from models," Dixit said, adding that the company expects its inferencing costs to drop by 20-30%.
Deeper AI push
Besides the LLM, Pocket FM has a co-pilot that is used internally to create content in German, English and Hindi. The company is still fine-tuning it to work with other Indic languages such as Tamil, Telugu, Kannada, Marathi and Bengali. "We'll be making a public launch of this tool in a few months," said Dixit.
The company is also building AI agents that can participate in every step of the story creation process, from deciding how intense the beginning of a story should be to judging where a cliffhanger might be appropriate.
"We're building them in such a way that individual modules can act and trigger separately. A story could have a really good cliffhanger but bad pacing. I should be able to ask a model to address these specific queries," said the Pocket FM co-founder.
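A rough sketch of what such independently-triggered modules might look like, with separate checks for cliffhangers and pacing; the function names and heuristics are hypothetical stand-ins, not Pocket FM's agents.

```python
# Hypothetical modular critics that can each be invoked on their own,
# mirroring the "individual modules can act and trigger separately" idea.
from dataclasses import dataclass

@dataclass
class Critique:
    module: str
    score: float   # 0.0 (weak) to 1.0 (strong)
    note: str

def cliffhanger_agent(chapter: str) -> Critique:
    # Placeholder heuristic: does the chapter end on an unresolved question?
    ends_open = chapter.rstrip().endswith(("?", "..."))
    return Critique("cliffhanger", 0.9 if ends_open else 0.3,
                    "ends on an open question" if ends_open else "resolution feels final")

def pacing_agent(chapter: str) -> Critique:
    # Placeholder heuristic: very long sentences drag the pacing down.
    sentences = [s for s in chapter.split(".") if s.strip()]
    avg_words = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return Critique("pacing", 0.8 if avg_words < 20 else 0.4,
                    f"average sentence length {avg_words:.0f} words")

# Each module can be triggered separately, as described in the quote above.
chapter = "She opened the vault. The ledger was gone. Who had known the code?"
print(cliffhanger_agent(chapter))
print(pacing_agent(chapter))
```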
Meanwhile, Pocket FM is considering acquisitions for the first time, with two kinds of targets in mind: lean AI companies building either LLMs for stories or AI-based voice and video, and companies with large writer communities.
"We're building an AI entertainment suite, so it would be great to get companies that can be baked into our systems," Dixit said. While the company hasn't actively set aside money for inorganic growth, it says it will be opportunistic about making acquisitions.
Pocket FM is knocking on the doors of global private equity players as it looks to raise another round of funding. The company is looking to raise between $100 million and $200 million, this time at a unicorn valuation, VCCircle reported in March.
The company last raised money in March 2024, a $103 million Series D round led by Lightspeed India Partners at a valuation of $750 million. So far, it has cumulatively raised $197 million across rounds and counts the likes of Brand Capital, Tencent and StepStone Group on its cap table.
Pocket FM competitor Kuku FM is also leveraging AI for similar use cases. Kuku FM has used AI to create scripts for series on its platform, such as 'Secret Billionaire,' 'Women of Prison' and 'Bloodstone Fortune.'
Across industries, companies are now opting to build their own models as they look to leverage the vast amounts of user data they have collected over the years. Healthify, the health and wellness startup, has built its own small language model that works on top of LLMs from OpenAI and Anthropic. Ed-tech startup Physicswallah is building smaller models to solve questions in physics, chemistry, mathematics and biology.
Strategy this year
While the US has always been Pocket FM's main revenue source, accounting for 70-75% of total revenue, the company expects the European market to take off this year.
With the $103 million raised last year, the company expanded into Europe and Latin America. In Europe, Pocket FM is currently available in Germany and the UK, markets it entered just six months ago and which, it claims, already contribute 5% of its revenue.
Instead of going live simultaneously across Europe, the company is staggering its entry into different countries: France in June, Italy around October and, finally, the Netherlands around January 2026.
Dixit expects Europe to contribute up to 30% of revenue within two years. As a result, the company said, its revenue will "grow multi-fold." India currently contributes 10-15% of Pocket FM's revenue; the company expects that share to remain the same while "its absolute revenue is expected to grow significantly, potentially 2-3 times."
The company claimed it surpassed $200 million in revenue in FY25, with an annual recurring revenue of $250 million.
In FY24, Pocket FM's revenue stood at ₹261 crore, up from ₹130 crore in FY23, according to regulatory filings accessed by business intelligence platform Tofler. The company trimmed losses to ₹16 crore in FY24 from ₹75 crore in FY23.
Founded in 2018 by Rohan Nayak, Prateek Dixit, and Nishanth KS, Pocket FM started as an audio series platform. The company has since rebranded itself as Pocket Entertainment and now runs three verticals: Pocket FM, Pocket Novels and Pocket Toons.