Latest news with #Devstral
Yahoo
6 days ago
- Business
- Yahoo
Mistral releases a vibe coding client, Mistral Code
French AI startup Mistral is releasing its own "vibe coding" client, Mistral Code, to compete with incumbents like Windsurf, Anysphere's Cursor, and GitHub Copilot. Mistral Code, a fork of the open-source project Continue, is an AI-powered coding assistant that bundles Mistral's models, an "in-IDE" assistant, local deployment options, and enterprise tools into a single package. A private beta is available as of Wednesday for JetBrains development platforms and Microsoft's VS Code.

"Our goal with Mistral Code is simple: deliver best-in-class coding models to enterprise developers, enabling everything from instant completions to multi-step refactoring through an integrated platform deployable in the cloud, on reserved capacity, or air-gapped, on-prem GPUs," Mistral wrote in a blog post provided to TechCrunch.

AI programming assistants are growing increasingly popular. While they still struggle to produce quality software, their promise to boost coding productivity is pushing companies and developers to adopt them rapidly. One recent poll found that 76% of developers used, or were planning to use, AI tools in their development processes last year.

Mistral Code is said to be powered by a combination of in-house models including Codestral (for code autocomplete), Codestral Embed (for code search and retrieval), Devstral (for "agentic" coding tasks), and Mistral Medium (for chat assistance). The client supports more than 80 programming languages and a number of third-party plugins, and can reason over things like files, terminal outputs, and issues, the company said.

Mistral claimed that customers including consulting firm Capgemini, Spanish and Portuguese bank Abanca, and French national railway company SNCF are using Mistral Code in production. "Customers can fine-tune or post-train the underlying models on private repositories or distill lightweight variants," Mistral explained in its blog post. "For IT managers, a rich admin console exposes granular platform controls, deep observability, seat management, and usage analytics."

Mistral said it plans to continue making improvements to Mistral Code and to contribute at least a portion of those upgrades to the Continue open-source project.

Founded in 2023, Mistral is a frontier model lab building a range of AI-powered services, including a chatbot platform, Le Chat, and mobile apps. It is backed by venture investors like General Catalyst and has raised over €1.1 billion (roughly $1.24 billion) to date. A few weeks ago, Mistral launched the aforementioned Codestral, Devstral, and Mistral Medium models. Around the same time, the company rolled out Le Chat Enterprise, a corporate-focused chatbot service that offers tools like an AI agent builder and integrates Mistral's models with third-party services like Gmail, Google Drive, and SharePoint.

This article originally appeared on TechCrunch.
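The article describes Codestral Embed only at a high level, as the model behind Mistral Code's code search and retrieval. As a rough, illustrative sketch of how embedding-based code retrieval generally works, the snippet below ranks stored snippets by similarity to a query; the embed_snippet helper is hypothetical and stands in for whatever embedding endpoint an integration would actually call, so nothing here should be read as Mistral's implementation.

```python
import numpy as np

def embed_snippet(text: str) -> np.ndarray:
    """Hypothetical stand-in for a code-embedding call (e.g. a Codestral Embed
    endpoint); it hashes characters into a fixed-size vector so the example
    runs offline."""
    vec = np.zeros(256)
    for i, ch in enumerate(text.encode("utf-8")):
        vec[(ch + i) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def search_codebase(query: str, snippets: list[str], top_k: int = 3) -> list[str]:
    """Rank stored code snippets by cosine similarity to the query embedding."""
    query_vec = embed_snippet(query)
    index = np.stack([embed_snippet(s) for s in snippets])
    scores = index @ query_vec  # vectors are unit-norm, so this is cosine similarity
    best = np.argsort(scores)[::-1][:top_k]
    return [snippets[i] for i in best]

if __name__ == "__main__":
    corpus = [
        "def parse_config(path): ...",
        "class RetryPolicy: ...",
        "def open_database_connection(url): ...",
    ]
    print(search_codebase("how do we connect to the database?", corpus))
```

A production assistant would replace the toy embedder with a real embedding model and a proper vector index, but the ranking step looks much the same.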


Hindustan Times
03-06-2025
- Hindustan Times
The methodology to judge AI needs realignment
When Anthropic released Claude 4 a week ago, the artificial intelligence (AI) company said these models set 'new standards for coding, advanced reasoning, and AI agents'. It cited leading scores on SWE-bench Verified, a benchmark for performance on real software engineering tasks. OpenAI also claims the o3 and o4-mini models return the best scores on certain benchmarks. As does Mistral, for the open-source Devstral coding model.

AI companies flexing comparative test scores is a common theme. The world of technology has long obsessed over synthetic benchmark scores: processor performance, memory bandwidth, storage speed, graphics performance, all commonly used to judge whether a PC or a smartphone was worth your time and money. Yet experts believe it may be time to evolve the methodology for AI testing, rather than change it wholesale.

American venture capitalist Mary Meeker, in the latest AI Trends report, notes that AI is increasingly doing better than humans in terms of accuracy and realism. She points to the MMLU (Massive Multitask Language Understanding) benchmark, on which AI models average 92.30% accuracy compared with a human baseline of 89.8%. MMLU judges a model's general knowledge across 57 tasks covering professional and academic subjects including math, law, medicine and history.

Benchmarks serve as standardised yardsticks to measure, compare, and understand the evolution of different AI models: structured assessments that produce comparable scores. They typically consist of datasets containing thousands of curated questions, problems, or tasks that test particular aspects of intelligence.

Understanding benchmark scores requires context about both the scale and the meaning behind the numbers. Most benchmarks report accuracy as a percentage, but the significance of these percentages varies dramatically across tests. On MMLU, random guessing would yield approximately 25% accuracy, since most questions are multiple choice (a simple scoring sketch at the end of this article illustrates the mechanics). Human performance typically ranges from 85% to 95%, depending on subject area.

Headline numbers often mask important nuances. A model might excel in certain subjects more than others, and an aggregated score may hide weaker performance on tasks requiring multi-step reasoning or creative problem-solving behind strong performance on factual recall. AI engineer and commentator Rohan Paul notes on X that 'most benchmarks don't reward long-term memory, rather they focus on short-context tasks.' Increasingly, AI companies are looking closely at the 'memory' aspect. Researchers at Google, in a new paper, detail an attention technique dubbed 'Infini-attention' that lets AI models extend their 'context window'.

Mathematical benchmarks often show wider performance gaps. Most of the latest AI models score over 90% accuracy on the GSM8K benchmark (Claude 3.5 Sonnet leads with 97.72%, while GPT-4 scores 94.8%), but the more challenging MATH benchmark sees much lower scores in comparison: Google's Gemini 2.0 Flash Experimental leads with 89.7%, while GPT-4 scores 84.3% (Sonnet has not yet been tested).

Reworking the methodology

For AI testing, there is a need to realign testbeds. 'All the evals are saturated. It's becoming slightly meaningless,' said Satya Nadella, chairman and chief executive officer (CEO) of Microsoft, speaking at venture capital firm Madrona's annual meeting earlier this year. The tech giant has announced it is collaborating with institutions including Penn State University, Carnegie Mellon University and Duke University to develop an approach to evaluating AI models that predicts how they will perform on unfamiliar tasks and explains why, something current benchmarks struggle to do. The attempt is to build benchmarking agents for dynamic evaluation of models, contextual predictability, human-centric comparisons and cultural aspects of generative AI. 'The framework uses ADeLe (annotated-demand-levels), a technique that assesses how demanding a task is for an AI model by applying measurement scales for 18 types of cognitive and knowledge-based abilities,' explains Lexin Zhou, a research assistant at Microsoft.

At the moment, popular benchmarks include SWE-bench (Software Engineering Benchmark) Verified to evaluate AI coding skills, ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) to judge generalisation and reasoning, and LiveBench AI, which measures agentic coding tasks and evaluates LLMs on reasoning, coding and math.

Among the limitations that can affect interpretation: many benchmarks can be 'gamed' through techniques that improve scores without necessarily improving intelligence or capability. Case in point: Meta's new Llama models. In April, the company announced an array of models, including Llama 4 Scout, Llama 4 Maverick, and the still-in-training Llama 4 Behemoth. Meta CEO Mark Zuckerberg claims Behemoth will be the 'highest performing base model in the world'. Maverick began ranking above OpenAI's GPT-4o in LMArena benchmarks, and just below Gemini 2.5 Pro. That is where things went pear-shaped for Meta, as AI researchers began to dig through these scores. It turned out Meta had shared a Llama 4 Maverick model that was optimised for this test, and not exactly the spec customers would get. Meta denies the customisation claims. 'We've also heard claims that we trained on test sets — that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilise implementations,' says Ahmad Al-Dahle, VP of generative AI at Meta, in a statement.

There are other challenges. Models might memorise patterns specific to benchmark formats rather than developing genuine understanding. The selection and design of benchmarks also introduces bias, and there is the question of localisation. Yi Tay, an AI researcher at Google AI and DeepMind, has detailed one such region-specific benchmark, SG-Eval, focused on helping train AI models for wider context. India, too, is building a sovereign large language model (LLM), with Bengaluru-based AI startup Sarvam selected under the IndiaAI Mission.

As AI capabilities continue advancing, researchers are developing evaluation methods that test for genuine understanding, robustness across contexts, and real-world capability, rather than plain pattern matching. In the case of AI, numbers tell an important part of the story, but not the complete story.
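To ground the accuracy figures cited above, here is a minimal, illustrative sketch of how a multiple-choice benchmark such as MMLU is typically scored, including the roughly 25% random-guessing floor the article mentions. The item format and the model_answer_fn hook are simplifications for illustration, not the actual MMLU harness.

```python
import random

def score_multiple_choice(model_answer_fn, questions) -> float:
    """Accuracy on an MMLU-style set: each item has a question, four options,
    and one correct letter; model_answer_fn returns a letter in A-D."""
    correct = sum(
        1
        for q in questions
        if model_answer_fn(q["question"], q["options"]) == q["answer"]
    )
    return correct / len(questions)

def random_guesser(question, options) -> str:
    """A baseline that guesses uniformly, illustrating the ~25% floor."""
    return random.choice(["A", "B", "C", "D"])

if __name__ == "__main__":
    # Tiny made-up items; real benchmarks hold thousands of curated questions.
    items = [
        {"question": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": "B"},
        {"question": "H2O is commonly known as?", "options": ["salt", "water", "sand", "air"], "answer": "B"},
    ] * 200
    print(f"Random-guess accuracy: {score_multiple_choice(random_guesser, items):.2%}")
```

A single aggregated number like this is exactly what the article cautions about: it says nothing about which subjects or reasoning styles dragged the score up or down.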

Yahoo
22-05-2025
- Business
- Yahoo
Vercel debuts an AI model optimized for web development
The team behind Vercel's V0, an AI-powered platform for web creation, has developed an AI model it claims excels at certain website development tasks. Available through an API, the model, called "v0-1.0-md," can be prompted with text or images, and was "optimized for front-end and full-stack web development," the Vercel team says. Currently in beta, it requires a V0 Premium plan ($20 per month) or Team plan ($30 per user per month) with usage-based billing enabled.

The launch of V0's model comes as more developers and companies look to adopt AI-powered tools for programming. According to a Stack Overflow survey last year, around 82% of developers reported that they're using AI tools for writing code. Meanwhile, a quarter of startups in Y Combinator's W25 batch have 95% of their codebases generated by AI, per YC managing partner Jared Friedman.

Vercel's model can "auto-fix" common coding issues, the Vercel team says, and it's compatible with tools and SDKs that support OpenAI's API format. Evaluated on web development frameworks, the model can ingest up to 128,000 tokens in one go. Tokens are the raw bits of data that AI models work with, with a million tokens being equivalent to about 750,000 words (roughly 163,000 words longer than "War and Peace").

Vercel isn't the only outfit developing tailored models for programming, it should be noted. Last month, JetBrains, the company behind a range of popular app development tools, debuted its first "open" AI coding model. Last week, Windsurf released a family of programming-focused models dubbed SWE-1. And just yesterday, Mistral unveiled a model, Devstral, tuned for particular developer tasks.

Companies may be keen to develop — and embrace — AI-powered coding assistants, but models still struggle to produce quality software. Code-generating AI tends to introduce security vulnerabilities and errors, owing to weaknesses in areas like the ability to understand programming logic.
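Since the article notes that v0-1.0-md is exposed through an API compatible with OpenAI's format, a minimal sketch of calling it with the standard openai Python client might look like the following. The base URL and the environment variable name are assumptions for illustration; Vercel's own documentation is the authority on the actual endpoint and credentials.

```python
import os

from openai import OpenAI  # any SDK that speaks OpenAI's API format should work

# Assumed endpoint and credential name for illustration only; check Vercel's
# v0 API documentation for the real values.
client = OpenAI(
    base_url="https://api.v0.dev/v1",
    api_key=os.environ["V0_API_KEY"],
)

response = client.chat.completions.create(
    model="v0-1.0-md",  # the model name quoted in the article
    messages=[
        {"role": "user", "content": "Build a responsive pricing table component in Next.js."},
    ],
)
print(response.choices[0].message.content)
```

Because the format is OpenAI-compatible, existing tooling built around that API, such as agents, IDE plugins, and evaluation harnesses, can point at the model by swapping the base URL and key.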
Yahoo
21-05-2025
- Business
- Yahoo
Mistral's new Devstral AI model was designed for coding
AI startup Mistral on Wednesday announced a new AI model focused on coding: Devstral. Devstral, which Mistral says was developed in partnership with AI company All Hands AI, is openly available under an Apache 2.0 license, meaning it can be used commercially without restriction. Mistral claims that Devstral outperforms other open models like Google's Gemma 3 27B and Chinese AI lab DeepSeek's V3 on SWE-Bench Verified, a benchmark measuring coding skills.

"Devstral excels at using tools to explore codebases, editing multiple files and power[ing] software engineering agents," writes Mistral in a blog post provided to TechCrunch. "[I]t runs over code agent scaffolds such as OpenHands or SWE-Agent, which define the interface between the model and the test cases [...] Devstral is light enough to run on a single [Nvidia] RTX 4090 or a Mac with 32GB RAM, making it an ideal choice for local deployment and on-device use."

Devstral arrives as AI coding assistants — and the models powering them — grow increasingly popular. Just last month, JetBrains, the company behind a range of popular app development tools, released its first "open" AI model for coding. In recent months, AI outfits including Google, Windsurf, and OpenAI have also unveiled models, both openly available and proprietary, optimized for programming tasks.

AI models still struggle to produce quality software — code-generating AI tends to introduce security vulnerabilities and errors, owing to weaknesses in areas like the ability to understand programming logic. Yet their promise to boost coding productivity is pushing companies — and developers — to rapidly adopt them. One recent poll found that 76% of devs used or were planning to use AI tools in their development processes last year.

Mistral previously waded into the assistive programming space with Codestral, a generative model for code. But Codestral wasn't released under a license that permitted devs to use the model for commercial applications; its license explicitly banned "any internal usage by employees in the context of [a] company's business activities."

Devstral, which Mistral is calling a "research preview," can be downloaded from AI development platforms, including Hugging Face, and also tapped through Mistral's API. It's priced at $0.1 per million input tokens and $0.3 per million output tokens, tokens being the raw bits of data that AI models work with. (A million tokens is equivalent to about 750,000 words, or roughly 163,000 words longer than "War and Peace"; a rough cost sketch based on these figures follows this article.) Mistral says it's "hard at work building a larger agentic coding model that will be available in the coming weeks."

Devstral isn't a small model per se, but it's on the smaller side at 24 billion parameters. (Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.)

Mistral, founded in 2023, is a frontier model lab aiming to build a range of AI-powered services, including a chatbot platform, Le Chat, and mobile apps. It's backed by VCs including General Catalyst and has raised over €1.1 billion (roughly $1.24 billion) to date. Mistral's customers include BNP Paribas, AXA, and Mirakl.

Devstral is Mistral's third product launch this month. A few weeks ago, Mistral launched Mistral Medium 3, an efficient general-purpose model. Around the same time, the company rolled out Le Chat Enterprise, a corporate-focused chatbot service that offers tools like an AI "agent" builder and integrates Mistral's models with third-party services like Gmail, Google Drive, and SharePoint.
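As flagged above, here is a rough sketch of how a team might estimate Devstral API costs from the per-token prices and the words-to-tokens rule of thumb quoted in the article; the request sizes are invented examples, and real token counts would come from the API's usage metadata.

```python
# Prices and the words-to-tokens conversion are taken from the figures quoted
# in the article; everything else is an illustrative assumption.
PRICE_PER_M_INPUT = 0.10    # USD per million input tokens
PRICE_PER_M_OUTPUT = 0.30   # USD per million output tokens
TOKENS_PER_WORD = 1_000_000 / 750_000  # ~1.33 tokens per word

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the published Devstral API rates."""
    return (
        input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT
    ) / 1_000_000

if __name__ == "__main__":
    # Hypothetical request: a 6,000-word code excerpt in, a 1,500-word patch out.
    prompt_tokens = int(6_000 * TOKENS_PER_WORD)
    completion_tokens = int(1_500 * TOKENS_PER_WORD)
    print(f"~${estimate_cost_usd(prompt_tokens, completion_tokens):.4f} per request")
```

At these rates, even a prompt of several thousand words costs a fraction of a cent per request.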