Ai2 Unveils MolmoAct, a New Class of AI Model That Reasons in 3D Space
SEATTLE--(BUSINESS WIRE)-- Ai2 (The Allen Institute for AI) today announced the release of MolmoAct 7B, a breakthrough embodied AI model that brings the intelligence of state-of-the-art AI models into the physical world. Instead of reasoning through language and converting that into movement, MolmoAct actually sees its surroundings, understands the relationships between space, movement, and time, and plans its movements accordingly. It does this by generating visual reasoning tokens that transform 2D image inputs into 3D spatial plans, enabling robots to navigate the physical world with greater intelligence and control.
While spatial reasoning isn't new, most modern systems rely on closed, end-to-end architectures trained on massive proprietary datasets. These models are difficult to reproduce, expensive to scale, and often operate as opaque black boxes. MolmoAct offers a fundamentally different approach: it's trained entirely on open data, designed for transparency, and built for real-world generalization. Its step-by-step visual reasoning traces make it easy to preview what a robot plans to do and intuitively steer its behavior in real time as conditions change.
'Embodied AI needs a new foundation that prioritizes reasoning, transparency, and openness,' said Ali Farhadi, CEO of Ai2. 'With MolmoAct, we're not just releasing a model; we're laying the groundwork for a new era of AI, bringing the intelligence of powerful AI models into the physical world. It's a step toward AI that can reason and navigate the world in ways that are more aligned with how humans do — and collaborate with us safely and effectively.'
A New Class of Model: Action Reasoning
MolmoAct is the first in a new category of AI models Ai2 is calling Action Reasoning Models (ARMs): models that interpret high-level natural language instructions and reason through a sequence of physical actions to carry them out in the real world. Unlike traditional end-to-end robotics models that treat a task as a single, opaque step, ARMs break those instructions down into a transparent chain of spatially grounded decisions:
3D-aware perception: grounding the robot's understanding of its environment using depth and spatial context
Visual waypoint planning: outlining a step-by-step task trajectory in image space
Action decoding: converting the plan into precise, robot-specific control commands
This layered reasoning enables MolmoAct to interpret commands like 'Sort this trash pile' not as a single step, but as a structured series of sub-tasks: recognize the scene, group objects by type, grasp them one by one, and repeat.
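To make that decomposition concrete, here is a minimal, purely illustrative sketch of the three stages. The class and function names below are hypothetical stand-ins, not MolmoAct's actual interfaces; the point is only the perception-to-waypoints-to-actions flow described above.

```python
# Illustrative sketch only: names and shapes are hypothetical and do not
# reflect MolmoAct's real API. It mirrors the three-stage decomposition:
# 3D-aware perception -> visual waypoint planning -> action decoding.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ScenePerception:
    """Stage 1: 3D-aware perception (objects plus depth/spatial context)."""
    objects: List[str]
    depths_m: List[float]          # estimated depth per object, in meters

@dataclass
class WaypointPlan:
    """Stage 2: visual waypoint planning (a trajectory drawn in image space)."""
    waypoints_px: List[Tuple[int, int]]   # pixel coordinates of the planned path

def decode_actions(plan: WaypointPlan) -> List[str]:
    """Stage 3: action decoding (turn image-space waypoints into robot commands)."""
    return [f"move_end_effector_toward(x={x}, y={y})" for x, y in plan.waypoints_px]

# A toy run of the pipeline for the instruction "Sort this trash pile".
perception = ScenePerception(objects=["can", "bottle", "paper"], depths_m=[0.42, 0.55, 0.61])
plan = WaypointPlan(waypoints_px=[(320, 240), (300, 200), (250, 180)])
for command in decode_actions(plan):
    print(command)
```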
Built to Generalize and Trained to Scale
MolmoAct 7B, the first model in the family, was trained on a curated dataset of about 12,000 'robot episodes' captured in real-world environments such as kitchens and bedrooms. These demonstrations were transformed into robot-reasoning sequences that expose how complex instructions map to grounded, goal-directed actions. Ai2 is releasing this post-training dataset alongside the model; its researchers spent months curating videos of robots performing actions in diverse household settings, from arranging pillows on a living room couch to putting away laundry in a bedroom.
Despite its strong performance, MolmoAct was trained with striking efficiency: roughly 18 million samples, pretrained on 256 NVIDIA H100 GPUs for about 24 hours and fine-tuned on 64 GPUs for roughly two more. Many commercial models, by contrast, require hundreds of millions of samples and far more compute. Yet MolmoAct outperforms many of these systems on key benchmarks, including a 71.9% success rate on SimPLER, demonstrating that high-quality data and thoughtful design can beat sheer scale.
Understandable AI You Can Build On
Unlike most robotics models, which operate as opaque systems, MolmoAct was built for transparency. Users can preview the model's planned movements before execution, with motion trajectories overlaid on camera images. These plans can be adjusted using natural language or quick sketching corrections on a touchscreen—providing fine-grained control and enhancing safety in real-world environments like homes, hospitals, and warehouses.
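As a rough illustration of that preview-before-execution workflow, the sketch below overlays a made-up planned trajectory on a stand-in camera frame using generic plotting tools. It does not use MolmoAct's own visualization or steering interface; both the frame and the waypoints are placeholders.

```python
# Minimal sketch of the "preview the plan" idea: draw a planned trajectory on
# top of a camera frame. Generic matplotlib only; waypoints here are invented.
import numpy as np
import matplotlib.pyplot as plt

frame = np.full((480, 640, 3), 200, dtype=np.uint8)          # stand-in for a camera image
waypoints = [(320, 400), (300, 320), (260, 250), (230, 200)]  # hypothetical planned path

xs, ys = zip(*waypoints)
plt.imshow(frame)
plt.plot(xs, ys, marker="o", linewidth=2)   # overlay the planned motion
plt.title("Planned trajectory preview (illustrative)")
plt.axis("off")
plt.savefig("trajectory_preview.png")       # inspect the plan before executing it
```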
True to Ai2's mission, MolmoAct is fully open-source and reproducible. Ai2 is releasing everything needed to build, run, and extend the model: training pipelines, pre- and post-training datasets, model checkpoints, and evaluation benchmarks.
MolmoAct sets a new standard for what embodied AI should look like—safe, interpretable, adaptable, and truly open. Ai2 will continue expanding its testing across both simulated and real-world environments, with the goal of enabling more capable and collaborative AI systems.
Download the model and model artifacts – including training checkpoints and evals – from Ai2's Hugging Face repository.
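For example, the released artifacts can be mirrored locally with the standard huggingface_hub client. The repository name below is a placeholder assumption, not a confirmed identifier; check Ai2's Hugging Face page for the exact repository before running.

```python
# Minimal sketch for fetching released artifacts with huggingface_hub.
# The repo_id is a placeholder -- look up the actual MolmoAct repository name.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="allenai/MolmoAct-7B")  # placeholder repo id
print(f"Model artifacts downloaded to: {local_dir}")
```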
About Ai2
Ai2 is a Seattle-based non-profit AI research institute with the mission of building breakthrough AI to solve the world's biggest problems. Founded in 2014 by the late Paul G. Allen, Ai2 develops foundational AI research and innovative new applications that deliver real-world impact through large-scale open models, open data, robotics, conservation platforms, and more. Ai2 champions true openness through initiatives like OLMo, the world's first truly open language model framework, Molmo, a family of open state-of-the-art multimodal AI models, and Tulu, the first application of fully open post-training recipes to the largest open-weight models. These solutions empower researchers, engineers, and tech leaders to participate in the creation of state-of-the-art AI and to directly benefit from the many ways it can advance critical fields like medicine, scientific research, climate science, and conservation efforts. For more information, visit allenai.org.