
Latest news with #TuhinSrivastava

Baseten Launches New Inference Products to Accelerate MVPs into Production Applications

Business Wire

21-05-2025

  • Business
  • Business Wire


SAN FRANCISCO--(BUSINESS WIRE)--Baseten, the leader in mission-critical inference, today announced the public launch of Baseten Model APIs and the closed beta of Baseten Training. These new products, built on Baseten's proprietary inference stack, enable AI teams to transition seamlessly from rapid prototyping to scaling in production.

In recent months, new releases of DeepSeek, Llama, and Qwen models have erased the quality gap between open and closed models, and organizations are more incentivized than ever to use open models in their products. Yet many AI teams have been limited to testing open models at low scale because of the insufficient performance, reliability, and economics offered by shared model endpoint providers. While easy to get started with, these shared endpoints have fundamentally gated enterprises' ability to convert prototypes into high-functioning products.

Baseten's new products - Model APIs and Training - solve two critical bottlenecks in the AI lifecycle. Both are built on Baseten's Inference Stack and Inference-optimized Infrastructure, which power inference at scale in production for leading AI companies like Writer, Descript, and Abridge. Using Model APIs, developers can instantly access open-source models optimized for maximum inference performance and cost-efficiency to rapidly create production-ready minimum viable products (MVPs) or test new workloads.

"In the AI market, your number one differentiator is how fast you can move," said Tuhin Srivastava, co-founder and CEO of Baseten. "Model APIs give developers the speed and confidence to ship AI features knowing that we've handled the heavy lifting on performance and scale."

Baseten Model APIs enable AI engineers to test open models with a confident scaling story in place from day one. As inference volume grows, Model APIs customers can easily transfer to Dedicated Deployments that provide greater reliability, performance, and economics at scale.
"With Baseten, we now support open-source models like DeepSeek and Llama in Retool, giving users more flexibility for what they can build," said DJ Zappegos, Engineering Manager at Retool. "Our customers are creating AI apps and workflows, and Baseten's Model APIs deliver the enterprise-grade performance and reliability they need to ship to production."

Customers can also use Baseten's new Training product to rapidly train and tune models for superior inference performance, quality, and cost-efficiency, further optimizing inference workloads. Unlike traditional training solutions that operate in siloed research environments, Baseten Training runs on the same production-optimized infrastructure that powers its inference. This coherence ensures that models trained or fine-tuned on Baseten behave consistently in production, with no last-minute refactoring.

Together, the latest offerings enable customers to get products to market more rapidly, improve performance and quality, and reduce costs for mission-critical inference workloads. These launches reinforce Baseten's belief that product-focused AI teams must care deeply about inference performance, cost, and quality.

"Speed, reliability, and cost-efficiency are non-negotiables, and that's where we devote 100 percent of our focus," said Amir Haghighat, co-founder and CTO of Baseten. "Our Baseten Inference Stack is purpose-built for production AI because you can't just have one piece work well. It takes everything working well together, which is why we ensure that each layer of the Inference Stack is optimized to work with the other pieces."

"Having lifelike text-to-speech requires models to operate with very low latency and very high quality," said Amu Varma, co-founder of Canopy Labs. "We chose Baseten as our preferred inference provider for Orpheus TTS because we want our customers to have the best performance possible. Baseten's Inference Stack allows our customers to create voice applications that sound as close to human as possible."

Teams can start with a quick MVP and seamlessly scale it to a dedicated, production-grade deployment when needed, without changing platforms. An enterprise can prototype a feature on Baseten Cloud, then graduate to its own private clusters or an on-prem deployment (via Baseten's hybrid and self-hosted options) for greater control, performance tuning, and cost optimization, all with the same code and tooling. This "develop once, deploy anywhere" capability is a direct result of Baseten's Inference-optimized Infrastructure, which abstracts the complexity of multi-cloud and on-premise orchestration for the user.

The news follows a year of considerable growth for the company. In February, Baseten announced the close of a Series C funding round co-led by IVP and Spark, which brought its total venture capital funding to $135 million. It was recently named to the Forbes AI 50 2025, a list of the pre-eminent privately held tech companies in AI, which also featured several companies for which Baseten powers 100 percent of inference, such as Writer and Abridge.

About Baseten

Baseten is the leader in infrastructure software for high-scale AI products, offering the industry's most powerful AI inference platform. Committed to delivering exceptional performance, reliability, and cost-efficiency, Baseten is on a mission to help the next great AI products scale. Baseten is backed by top-tier investors including IVP, Spark, Greylock, Conviction, Base Case, and South Park Commons. Learn more at

Alibaba launches Qwen3 'hybrid' AI model, challenges OpenAI, Google

Express Tribune

29-04-2025

  • Business
  • Express Tribune


Alibaba Group on Monday unveiled Qwen3, a new family of large language models designed to compete with leading AI systems from OpenAI and Google. The release includes eight models, ranging from 0.6 billion to 235 billion parameters, and features a combination of dense and mixture-of-experts (MoE) architectures.

The Chinese tech giant claims Qwen3 matches or outperforms OpenAI's o3-mini and Google's Gemini 2.5 Pro on several key benchmarks, including coding, math reasoning, and complex problem-solving. Qwen3 models support 119 languages and were trained on 36 trillion tokens sourced from textbooks, code, question-answer datasets, and AI-generated material.

Unlike some competitors, Alibaba has made several Qwen3 models open-weight, available for download via Hugging Face and GitHub. The flagship Qwen-3-235B-A22B model, which achieved the highest scores in tests, remains restricted for now.

Qwen3 introduces a "hybrid reasoning" approach, allowing users to toggle between faster, non-reasoning outputs and slower, deeper reasoning modes to optimize accuracy. Alibaba says this flexibility enhances efficiency and user control over AI operations.

The release intensifies competition in China's AI sector, following recent launches by DeepSeek and Baidu. It also comes amid heightened US export restrictions on advanced chips to China, which could affect future model training.

Qwen3 will be available through cloud providers such as Fireworks AI and Hyperbolic, offering businesses new options beyond proprietary US AI systems. Industry observers say Alibaba's move signals a rapid closing of the gap between open and closed AI models globally. "Qwen3's performance shows that open models are keeping pace," said Tuhin Srivastava, CEO of AI platform Baseten.

Alibaba previously released Qwen2.5-Max in January but says Qwen3 represents a significant leap in reasoning ability, coding proficiency, and multi-language support.

Alibaba unveils Qwen 3, a family of 'hybrid' AI reasoning models

Yahoo

28-04-2025

  • Business
  • Yahoo


Chinese tech company Alibaba on Monday released Qwen 3, a family of AI models the company claims matches, and in some cases outperforms, the best models available from Google and OpenAI.

Most of the models are, or soon will be, available for download under an "open" license from AI dev platform Hugging Face and GitHub. They range in size from 0.6 billion parameters to 235 billion parameters. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.

The rise of China-originated model series like Qwen has increased the pressure on American labs such as OpenAI to deliver more capable AI technologies. It has also led policymakers to implement restrictions aimed at limiting the ability of Chinese AI companies to obtain the chips necessary to train models.

According to Alibaba, Qwen 3 models are "hybrid" in the sense that they can take time to "reason" through complex problems or answer simpler requests quickly. Reasoning enables the models to effectively fact-check themselves, similar to models like OpenAI's o3, but at the cost of higher latency. "We have seamlessly integrated thinking and non-thinking modes, offering users the flexibility to control the thinking budget," wrote the Qwen team in a blog post.

The Qwen 3 models support 119 languages, Alibaba says, and were trained on a data set of nearly 36 trillion tokens. Tokens are the raw bits of data that a model processes; 1 million tokens is equivalent to about 750,000 words. Alibaba says Qwen 3 was trained on a combination of textbooks, "question-answer pairs," code snippets, and more.

These improvements, along with others, greatly boosted Qwen 3's performance compared to its predecessor, Qwen 2, Alibaba says. On Codeforces, a platform for programming contests, the largest Qwen 3 model, Qwen-3-235B-A22B, beats out OpenAI's o3-mini.
Qwen-3-235B-A22B also bests o3-mini on the latest version of AIME, a challenging math benchmark, and BFCL, a test for assessing a model's ability to "reason" about problems. But Qwen-3-235B-A22B isn't publicly available, at least not yet.

The largest public Qwen 3 model, Qwen3-32B, is still competitive with a number of proprietary and open AI models, including Chinese AI lab DeepSeek's R1. Qwen3-32B surpasses OpenAI's o1 model on several tests, including an accuracy benchmark called LiveBench. Alibaba says Qwen 3 "excels" in tool-calling capabilities as well as following instructions and copying specific data formats.

In addition to releasing models for download, Qwen 3 is available from cloud providers including Fireworks AI and Hyperbolic.

Tuhin Srivastava, co-founder and CEO of AI cloud host Baseten, said that Qwen 3 is another point on the trend line of open models keeping pace with closed-source systems such as OpenAI's. "The U.S. is doubling down on restricting sales of chips to China and purchases from China, but models like Qwen 3 that are state-of-the-art and open [...] will undoubtedly be used domestically," he told TechCrunch in a statement. "It reflects the reality that businesses are both building their own tools [as well as] buying off the shelf via closed-model companies like Anthropic and OpenAI."

This article originally appeared on TechCrunch at

'Burn the boats': To stay at the bleeding edge, AI developers are trashing old tech fast

Business Insider

27-04-2025

  • Business
  • Business Insider


It's not uncommon for AI companies to fear that Nvidia will swoop in and make their work redundant. But when it happened to Tuhin Srivastava, he was perfectly calm.

"This is the thing about AI — you gotta burn the boats," Srivastava, the cofounder of AI inference platform Baseten, told Business Insider. He hasn't burned his quite yet, but he's bought the kerosene.

The story goes back to when DeepSeek took the AI world by storm at the beginning of this year. Srivastava and his team had been working with the model for weeks, but it was a struggle. The problem was a tangle of AI jargon, but essentially, inference, the computing process that happens when AI generates outputs, needed to be scaled up to quickly run these big, complicated reasoning models. Multiple elements were hitting bottlenecks and slowing down delivery of the model's responses, making it a lot less useful for Baseten's customers, who were clamoring for access.

Srivastava's company had access to Nvidia's H200 chips — the best widely available chips that could handle the advanced model at the time — but Nvidia's inference platform was glitching. A software stack called Triton Inference Server was getting bogged down by all the inference required for DeepSeek's reasoning model R1, Srivastava said. So Baseten built its own, which it still uses now.

Then, in March, Jensen Huang took the stage at Nvidia's massive GTC conference and launched a new inference platform: Dynamo. Dynamo is open-source software that helps Nvidia chips handle the intensive inference used for reasoning models at scale. "It is essentially the operating system of an AI factory," Huang said onstage.

"This was where the puck was going," Srivastava said. And Nvidia's arrival wasn't a surprise. When the juggernaut inevitably surpasses Baseten's equivalent platform, the small team will abandon what it built and switch, Srivastava said. He expects it will take a couple of months at most. "Burn the boats."
It's not just Nvidia, with its massive team and a research and development budget to match, making such tools. Machine learning is constantly evolving. Models get more complex and require more computing power and engineering genius to work at scale, and then they shrink again when those engineers find new efficiencies and the math changes. Researchers and developers are balancing cost, time, accuracy, and hardware inputs, and every change reshuffles the deck.

"You cannot get married to a particular framework or a way of doing things," said Karl Mozurkewich, principal architect at cloud firm Valdi.

"This is my favorite thing about AI," said Theo Browne, a YouTuber and developer whose company, Ping, builds AI software for other developers. "It takes these things that the industry has historically treated as super valuable and holy, and just makes them incredibly cheap and easy to throw away," he told BI.

Browne spent the early years of his career coding for big companies like Twitch. When he saw a reason to start over on a coding project instead of building on top of it, he faced resistance, even when starting over would save time or money. The sunk cost fallacy reigned. "I had to learn that rather than waiting for them to say, 'No,' do it so fast they don't have the time to block you," Browne said.

That's the mindset of many bleeding-edge builders in AI. It's also often what sets startups apart from large enterprises. Quinn Slack, CEO of AI coding platform Sourcegraph, frequently explains this to his customers when he meets with Fortune 500 companies that may have built their first AI products on shaky foundations. "I would say 80% of them get there in an hourlong meeting," he said.

The firmer ground is up the stack

Ben Miller, CEO of real estate investment platform Fundrise, is building an AI product for the industry, and he doesn't worry too much about the latest model.
If a model works for its purpose, it works, and moving up to the latest innovation is unlikely to be worth the engineering hours. "I'm sticking with what works well enough for as long as I can," he said.

That's in part because Miller has a large organization, but it's also because he's building things farther up the stack. That stack consists of hardware at the bottom, usually Nvidia's GPUs, and then layers upon layers of software. Baseten is a few layers up from Nvidia. The AI models, like R1 and GPT-4o, are a few layers up from Baseten. And Miller is just about at the top, where consumers are.

"There's no guarantee you're going to grow your customer base or your revenue just because you're releasing the latest bleeding-edge feature," Mozurkewich said. "When you're in front of the end-user, there are diminishing returns to moving fast and breaking things."
