AI21 Introduces Maestro, the World's First AI Planning and Orchestration System Built for the Enterprise


Yahoo · 10-03-2025
AI21 is leading the shift from LLMs and Reasoning models to planning AI systems. Maestro increases the accuracy of GPT-4o and Claude Sonnet 3.5 by up to 50% on complex, multi-requirement tasks, transforming AI from an unpredictable tool to a trustworthy system.
LAS VEGAS, March 10, 2025 /PRNewswire/ -- AI21, a pioneer in frontier models and AI systems, today unveiled Maestro, the world's first AI Planning and Orchestration System designed to deliver trustworthy AI at scale for organizations.
Introduced at the HumanX 2025 conference, Maestro marks a significant advancement in enterprise AI, boosting the instruction-following accuracy of paired Large Language Models (LLMs) by up to 50% while guaranteeing quality, reliability, and observability. This technology transcends the limitations of traditional LLMs and Large Reasoning Models (LRMs), setting a new benchmark for AI capabilities.
Maestro delivers a substantial improvement in LLM performance on complex tasks. It elevates the accuracy of models like GPT-4o and Claude Sonnet 3.5 by up to 50% and empowers reasoning models, such as o3-mini, to surpass 95% accuracy. Notably, Maestro bridges the performance gap between non-reasoning and reasoning models, aligning the accuracy of Claude Sonnet 3.5 with advanced reasoning models like o3-mini.
While enterprises are eager to integrate AI into their operations, large-scale generative AI deployments often falter. According to the Amazon Web Services (AWS) CDO Agenda 2024, only 6% of organizations have a generative AI application in deployment, highlighting the fundamental limitations of current AI solutions for mission-critical tasks. The prevailing approaches—"Prompt and Pray" and hard-coded chains—present significant challenges. The "Prompt and Pray" method, which relies on LLMs and LRMs to execute open-ended tasks, lacks control and reliability due to the probabilistic nature of these models. Hard-coded chains, while more predictable, are rigid, labor-intensive, and prone to failure under changing conditions.
Reasoning models, designed to solve complex tasks through thinking tokens, have not alleviated these issues. They exhibit inconsistent performance, struggle to adhere to instructions, and fail to reliably utilize tools. Consequently, none of these approaches delivers the accuracy, reliability, and adaptability essential for widespread enterprise adoption.
"Mass adoption of AI by enterprises is the key to the next industrial revolution," said Ori Goshen, Co-CEO of AI21. "AI21's Maestro is the first step toward that future – moving beyond the unpredictability of available solutions to deliver AI that is reliable at scale. Delivering complex decision-making with built-in quality control, it enables businesses to harness AI with confidence. This is how we bridge the gap between AI potential and real-world solutions."
"Wix is leading the charge in LLM adoption, powering hundreds of AI applications," said Avishai Abrahami, CEO of WIX. "Maestro ushers in a new era of agentic AI – striking a necessary balance between quality, control, and trust that could be a key factor in our ability to develop trustworthy AI applications at scale."
"The potential of enterprise AI lies in balancing innovation with reliability," said Elad Tsur, Chief AI Officer at Applied Systems. "AI21 Maestro is a promising step toward making AI more controllable and useful for business applications, bridging the gap between powerful AI models and real-world enterprise needs."
Maestro, powered by the AI Planning and Orchestration System (AIPOS), delivers reliable, system-level AI by integrating LLMs or LRMs into a framework that analyzes actions, plans solutions, and validates results. This framework learns the enterprise environment to ensure accuracy and efficiency, allowing builders to define requirements and obtain results that meet their criteria within seconds. By eliminating the need for prompt engineering and rigid workflows, Maestro delivers on the promise of truly trustworthy AI.
Request early access to the Maestro API by visiting http://ai21.com/maestro.
About AI21

AI21 is a pioneer in Foundation Models and AI Systems designed for enterprises. AI21's mission is to create trustworthy artificial intelligence that powers humanity towards superproductivity. Founded in 2017 by AI visionaries Prof. Amnon Shashua, Prof. Yoav Shoham, and Ori Goshen, AI21 has secured $336 million in funding from industry leaders, including NVIDIA, Google, and Intel, reinforcing its commitment to advancing AI innovation.
View original content to download multimedia: https://www.prnewswire.com/news-releases/ai21-introduces-maestro-the-worlds-first-ai-planning-and-orchestration-system-built-for-the-enterprise-302397075.html
SOURCE AI21 Labs

Related Articles

Sam Altman launches GPT-oss, OpenAI's first open-weight AI language model in over 5 years

Business Insider

OpenAI's AI models are getting more open. At least, some of them are. On Tuesday, OpenAI CEO Sam Altman announced GPT-oss, an "open" family of language models with "open weights" that he said can operate locally on a "high-end laptop" and smartphones. An AI model with "open weights" is one whose fully trained parameter weights are made publicly downloadable, so anyone can run, inspect, or fine-tune the model locally.

"We believe this is the best and most usable open model in the world," Altman wrote on X.

There are two models: gpt-oss-120b and gpt-oss-20b. The smaller model is designed to run on "most desktops and laptops," while the larger model is geared toward higher-end equipment. Altman said GPT-oss has "strong real-world performance comparable to o4-mini." Just before OpenAI's announcement, rival Anthropic revealed Claude Opus 4.1.

Tuesday's announcement was not the long-rumored GPT-5, which could arrive as soon as this week. Instead, the new models are OpenAI's first open-weight language models since the release of GPT-2 in 2019.

"As part of this, we are quite hopeful that this release will enable new kinds of research and the creation of new kinds of products," Altman wrote. "We expect a meaningful uptick in the rate of innovation in our field, and for many more people to do important work than were able to before." Altman had previously signaled that OpenAI would return to releasing at least some open models, saying, "We're going to do a very powerful open source model" that would be "better than any current open source model out there."

OpenAI launches two ‘open' AI reasoning models

Yahoo

OpenAI announced Tuesday the launch of two open-weight AI reasoning models with similar capabilities to its o-series. Both are freely available to download from the online developer platform Hugging Face, the company said, describing the models as 'state-of-the-art' when measured across several benchmarks for comparing open models.

The models come in two sizes: a larger and more capable gpt-oss-120b model that can run on a single Nvidia GPU, and a lighter-weight gpt-oss-20b model that can run on a consumer laptop with 16GB of memory. The launch marks OpenAI's first 'open' language model since GPT-2, which was released more than five years ago.

In a briefing, OpenAI said its open models will be capable of sending complex queries to AI models in the cloud, as TechCrunch previously reported. That means if OpenAI's open model is not capable of a certain task, such as processing an image, developers can connect the open model to one of the company's more capable closed models.

While OpenAI open-sourced AI models in its early days, the company has generally favored a proprietary, closed-source development approach. The latter strategy has helped OpenAI build a large business selling access to its AI models via an API to enterprises and developers. However, CEO Sam Altman said in January he believes OpenAI has been 'on the wrong side of history' when it comes to open sourcing its technologies.

The company today faces growing pressure from Chinese AI labs, including DeepSeek, Alibaba's Qwen, and Moonshot AI, which have developed several of the world's most capable and popular open models. (While Meta previously dominated the open AI space, the company's Llama AI models have fallen behind in the last year.) In July, the Trump Administration also urged U.S. AI developers to open source more technology to promote global adoption of AI aligned with American values.
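The stated hardware footprints (a single GPU for the larger model, a 16GB laptop for the smaller one) can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The bit-widths below are illustrative assumptions, not OpenAI's published figures; the point is that fitting roughly 21 billion parameters in 16GB requires quantizing well below 16-bit precision.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Rough memory needed to hold model weights alone (ignores
    activations and KV cache): parameters x bits, converted to GB."""
    return n_params * bits_per_param / 8 / 1e9

# At 16-bit precision, the ~21B-parameter model's weights alone would
# need about 42 GB, far more than a 16 GB laptop has:
fp16_20b = weight_memory_gb(21e9, 16)    # ~42 GB
# Quantized to ~4 bits per parameter (an assumed figure), it fits:
q4_20b = weight_memory_gb(21e9, 4)       # ~10.5 GB
# The ~117B model at ~4 bits lands comfortably under a single 80 GB GPU:
q4_120b = weight_memory_gb(117e9, 4)     # ~58.5 GB
```

The same arithmetic explains why the larger model is "geared toward higher-end equipment": even aggressively quantized, its weights exceed any consumer laptop's memory.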
With the release of gpt-oss, OpenAI hopes to curry favor with developers and the Trump Administration alike, both of which have watched the Chinese AI labs rise to prominence in the open source space. 'Going back to when we started in 2015, OpenAI's mission is to ensure AGI that benefits all of humanity,' said OpenAI CEO Sam Altman in a statement shared with TechCrunch. 'To that end, we are excited for the world to be building on an open AI stack created in the United States, based on democratic values, available for free to all and for wide benefit.'

How the models performed

OpenAI aimed to make its open model a leader among other open-weight AI models, and the company claims to have done just that. On Codeforces (with tools), a competitive coding test, gpt-oss-120b and gpt-oss-20b score 2622 and 2516, respectively, outperforming DeepSeek's R1 while underperforming o3 and o4-mini. On Humanity's Last Exam, a challenging test of crowd-sourced questions across a variety of subjects (with tools), gpt-oss-120b and gpt-oss-20b score 19% and 17.3%, respectively. This likewise underperforms o3 but outperforms leading open models from DeepSeek and Qwen.

Notably, OpenAI's open models hallucinate significantly more than its latest AI reasoning models, o3 and o4-mini. Hallucinations have been getting more severe in OpenAI's latest AI reasoning models, and the company previously said it doesn't quite understand why. In a white paper, OpenAI says this is 'expected, as smaller models have less world knowledge than larger frontier models and tend to hallucinate more.'

OpenAI found that gpt-oss-120b and gpt-oss-20b hallucinated in response to 49% and 53% of questions on PersonQA, the company's in-house benchmark for measuring the accuracy of a model's knowledge about people. That's more than triple the hallucination rate of OpenAI's o1 model, which scored 16%, and higher than its o4-mini model, which scored 36%.
Training the new models

OpenAI says its open models were trained with similar processes to its proprietary models. The company says each open model leverages mixture-of-experts (MoE) to tap fewer parameters for any given question, making it run more efficiently. For gpt-oss-120b, which has 117 billion total parameters, OpenAI says the model only activates 5.1 billion parameters per token.

The company also says its open model was trained using high-compute reinforcement learning (RL), a post-training process to teach AI models right from wrong in simulated environments using large clusters of Nvidia GPUs. This was also used to train OpenAI's o-series of models, and the open models have a similar chain-of-thought process in which they take additional time and computational resources to work through their answers.

As a result of the post-training process, OpenAI says its open AI models excel at powering AI agents and are capable of calling tools such as web search or Python code execution as part of their chain-of-thought process. However, OpenAI says its open models are text-only, meaning they will not be able to process or generate images and audio like the company's other models.

OpenAI is releasing gpt-oss-120b and gpt-oss-20b under the Apache 2.0 license, which is generally considered one of the most permissive. This license will allow enterprises to monetize OpenAI's open models without having to pay or obtain permission from the company. However, unlike fully open source offerings from AI labs like AI2, OpenAI says it will not be releasing the training data used to create its open models. This decision is not surprising given that several active lawsuits against AI model providers, including OpenAI, have alleged that these companies inappropriately trained their AI models on copyrighted works.

OpenAI delayed the release of its open models several times in recent months, partially to address safety concerns.
Beyond the company's typical safety policies, OpenAI says in a white paper that it also investigated whether bad actors could fine-tune its gpt-oss models to be more helpful in cyber attacks or the creation of biological or chemical weapons. After testing by OpenAI and third-party evaluators, the company says gpt-oss may marginally increase biological capabilities. However, it did not find evidence that these open models could reach its 'high capability' threshold for danger in these domains, even after fine-tuning.

While OpenAI's models appear to be state-of-the-art among open models, developers are eagerly awaiting the release of R2, DeepSeek's next AI reasoning model, as well as a new open model from Meta's new superintelligence lab.
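The mixture-of-experts design described above, where only about 5.1 billion of 117 billion parameters activate per token, can be illustrated with a toy routing layer. Everything here (sizes, gate design, top-2 selection) is illustrative, not gpt-oss's actual architecture:

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy mixture-of-experts layer: a gate scores every expert, but only
    the top_k highest-scoring experts actually run for this token, so most
    parameters stay inactive on any given forward pass."""
    scores = gate_w @ x                    # one routing score per expert
    top = np.argsort(scores)[-top_k:]      # indices of the selected experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the selected experts
    # Only the chosen experts' weight matrices are multiplied against x.
    return sum(w * (experts[i] @ x) for i, w in zip(top, weights)), top

rng = np.random.default_rng(0)
d = 8
experts = [rng.normal(size=(d, d)) for _ in range(16)]  # 16 expert matrices
gate_w = rng.normal(size=(16, d))                        # routing gate
x = rng.normal(size=d)                                   # one token's vector
y, active = moe_forward(x, experts, gate_w, top_k=2)
# Only 2 of 16 experts (12.5% of the expert parameters) ran for this token.
```

The same idea scales up: a large total parameter count buys capacity, while per-token compute stays proportional to the few experts the router selects.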

OpenAI's first new open-weight LLMs in six years are here

Engadget

For the first time since GPT-2 in 2019, OpenAI is releasing new open-weight large language models. It's a major milestone for a company that has increasingly been accused of forgoing its original stated mission of "ensuring artificial general intelligence benefits all of humanity." Now, following multiple delays for additional safety testing and refinement, gpt-oss-120b and gpt-oss-20b are available to download from Hugging Face.

Before going any further, it's worth taking a moment to clarify what exactly OpenAI is doing here. The company is not releasing new open-source models that include the underlying code and data the company used to train them. Instead, it's sharing the weights — that is, the numerical values the models learned to assign to inputs during their training — that inform the new systems. According to Benjamin C. Lee, professor of engineering and computer science at the University of Pennsylvania, open-weight and open-source models serve two very different purposes.

"An open-weight model provides the values that were learned during the training of a large language model, and those essentially allow you to use the model and build on top of it. You could use the model out of the box, or you could redefine or fine-tune it for a particular application, adjusting the weights as you like," he said.

If commercial models are an absolute black box and an open-source system allows for complete customization and modification, open-weight AIs are somewhere in the middle. OpenAI has not released open-source models, likely since a rival could use the training data and code to reverse engineer its tech. "An open-source model is more than just the weights. It would also potentially include the code used to run the training process," Lee said. And practically speaking, the average person wouldn't get much use out of an open-source model unless they had a farm of high-end NVIDIA GPUs running up their electricity bill.
(They would be useful for researchers looking to learn more about the data the company used to train its models, though, and there are a handful of open-source models out there, like Mistral NeMo and Mistral Small 3.)

With that out of the way, the primary difference between gpt-oss-120b and gpt-oss-20b is how many parameters each one offers. If you're not familiar with the term, parameters are the settings a large language model can tweak to provide you with an answer. The naming is slightly confusing here, but gpt-oss-120b is a 117-billion-parameter model, while its smaller sibling is a 21-billion-parameter one.

In practice, that means gpt-oss-120b requires more powerful hardware to run, with OpenAI recommending a single 80GB GPU for efficient use. The good news is the company says any modern computer with 16GB of RAM can run gpt-oss-20b. As a result, you could use the smaller model to do something like vibe code on your own computer without a connection to the internet. What's more, OpenAI is making the models available through the Apache 2.0 license, giving people a great deal of flexibility to modify the systems to their needs.

Despite this not being a new commercial release, OpenAI says the new models are in many ways comparable to its proprietary systems. The one limitation of the oss models is that they don't offer multi-modal input, meaning they can't process images, video and voice. For those capabilities, you'll still need to turn to the cloud and OpenAI's commercial models, something both new open-weight systems can be configured to do. Beyond that, however, they offer many of the same capabilities, including chain-of-thought reasoning and tool use. That means the models can tackle more complex problems by breaking them into smaller steps, and if they need additional assistance, they know how to use the web and coding languages like Python.
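The "turn to the cloud" configuration mentioned above, where a text-only local model defers tasks it can't handle to a commercial cloud model, boils down to capability-based dispatch. A minimal sketch with entirely hypothetical function names (the source describes no specific API for this):

```python
from typing import Optional

def local_model(prompt: str) -> str:
    # Stand-in for a locally hosted, text-only model such as gpt-oss-20b.
    return f"[local] {prompt}"

def cloud_model(prompt: str, image: Optional[bytes] = None) -> str:
    # Stand-in for a closed, multimodal cloud model.
    kind = "image+text" if image else "text"
    return f"[cloud:{kind}] {prompt}"

def dispatch(prompt: str, image: Optional[bytes] = None) -> str:
    """Route to the local model unless the request needs a capability
    (here, image input) that only the cloud model provides."""
    if image is not None:
        return cloud_model(prompt, image)
    return local_model(prompt)
```

Under this pattern, `dispatch("summarize this text")` stays on the laptop, while `dispatch("describe this", image=photo_bytes)` is forwarded to the cloud.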
Additionally, OpenAI trained the models using techniques the company previously employed in the development of o3 and its other recent frontier systems. In competition-level coding, gpt-oss-120b earned a score that is only a shade worse than o3, OpenAI's current state-of-the-art reasoning model, while gpt-oss-20b landed in between o3-mini and o4-mini. Of course, we'll have to wait for more real-world testing to see how the two new models compare to OpenAI's commercial offerings and those of its rivals.

The release of gpt-oss-120b and gpt-oss-20b, and OpenAI's apparent willingness to double down on open-weight models, comes after Mark Zuckerberg signaled Meta would release fewer such systems to the public. Open-sourcing was previously central to Zuckerberg's messaging about his company's AI efforts, with the CEO once remarking of closed-source systems, "fuck that." At least among the sect of tech enthusiasts willing to tinker with LLMs, the timing, accidental or not, is somewhat embarrassing for Meta.

"One could argue that open-weight models democratize access to the largest, most capable models to people who don't have these massive, hyperscale data centers with lots of GPUs," said Professor Lee. "It allows people to use the outputs or products of a months-long training process on a massive data center without having to invest in that infrastructure on their own. From the perspective of someone who just wants a really capable model to begin with, and then wants to build for some application, I think open-weight models can be really useful."

OpenAI is already working with a few different organizations to deploy their own versions of these models, including AI Sweden, the country's national center for applied AI.
In a press briefing OpenAI held before today's announcement, the team that worked on gpt-oss-120b and gpt-oss-20b said they view the two models as an experiment; the more people use them, the more likely OpenAI is to release additional open-weight models in the future.
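Professor Lee's description of open weights, learned numerical values you can use out of the box or keep adjusting, can be made concrete with a deliberately tiny toy: a three-parameter linear model standing in for an LLM. Everything here is a simplified sketch, not OpenAI's actual weight format or training procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
downloaded = np.array([0.5, -1.0, 2.0])   # the "open weights" you obtained
target = np.array([1.0, 0.0, 1.0])        # behavior your new task calls for
xs = rng.normal(size=(50, 3))
ys = xs @ target                           # a small fine-tuning dataset

def predict(w, x):
    # Inference: use the downloaded weights exactly as released.
    return x @ w

def fine_tune(w, xs, ys, lr=0.05, steps=200):
    # Fine-tuning: nudge those same numbers toward the new data with
    # stochastic gradient descent on squared error.
    for _ in range(steps):
        for x, y in zip(xs, ys):
            w = w - lr * (predict(w, x) - y) * x
    return w

tuned = fine_tune(downloaded, xs, ys)
# `tuned` ends up far closer to the task's target than `downloaded` was,
# without ever repeating the original (expensive) training run.
```

This is the middle ground Lee describes: no training code or data is needed to run or adapt the model, only the weights themselves.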
