Latest news with #worldmodel

Associated Press
3 days ago
- Entertainment
- Associated Press
Matrix-Game 2.0 Launches as a Powerful Open-Source Alternative to Genie 3
SINGAPORE, Aug. 12, 2025 /PRNewswire/ -- The SkyWork AI Technology Release Week officially kicked off on August 11. From August 11 to August 15, a new model will be unveiled each day, covering cutting-edge models for core multimodal AI scenarios.

A week ago, DeepMind released a major update to its interactive world model, Genie 3, enabling real-time, long-sequence generation. The advance drew significant attention to world models, but Genie 3 was not open-sourced, leaving the community to speculate about its implementation. On August 12, Skywork unveiled an upgraded version of its self-developed Matrix-series interactive world model, Matrix-Game 2.0, which likewise delivers interactive, real-time, long-sequence generation in general scenarios. To drive progress in interactive world modeling, Matrix-Game 2.0 has been fully open-sourced, making it the industry's first open-source solution for real-time, long-sequence, interactive generation in general scenarios.

Matrix-Game 2.0 open source addresses:

Matrix-Game 2.0 achieves a breakthrough in real-time generation and long-sequence handling. Compared with its predecessor, version 2.0 prioritizes low-latency, high-frame-rate performance for extended interactions, sustaining continuous video generation at 25 FPS across complex scenes. Generation length scales to minute-long sequences, substantially improving temporal coherence and real-world usability. While delivering a significant boost in inference speed, Matrix-Game 2.0 maintains precise comprehension of physical laws and scene semantics: users can freely explore, manipulate, and construct virtual environments in real time through simple instructions, yielding well-structured, detail-rich, and logically coherent virtual spaces.
With these capabilities, Matrix-Game 2.0 not only breaks down the barriers between content generation and interaction but also unlocks new possibilities for cutting-edge applications such as virtual humans, game engines, and embodied AI, providing a robust technical foundation for building a universal virtual world. Matrix-Game 2.0 currently offers three core advantages:
- High-frame-rate, real-time long-sequence generation: The model supports fluid movement (forward/backward, left/right) and camera/view rotation, so users can intuitively control characters in the scene via simple commands. The system generates seamless footage in real time at 25 FPS, enabling minute-long interactive sequences in a single session, with character movements that are lifelike, smooth, and precisely responsive.
- Cross-scenario generalization: The model demonstrates exceptional cross-domain adaptability. It is not limited to specific task scenarios; it supports simulations of diverse environments, including urban, wilderness, and other spatial types, as well as realistic, oil-painting, and other visual styles.
- Enhanced physical consistency: The model shows a deeper understanding of physical rules. Generated characters exhibit physically plausible movements when navigating complex terrain such as steps and obstacles, which improves immersion and controllability.

The open-source release of Matrix-Game for interactive video generation underscores Skywork's strategic foresight in AI development and will accelerate development across Skywork's multi-model AI ecosystem. Moving forward, Skywork remains committed to pioneering and open-sourcing advanced AI solutions. By collaborating with global developers and users, we aim to build next-generation platforms that accelerate the global advancement of AGI.

SOURCE Skywork AI pte ltd
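The real-time claim above reduces to a simple latency constraint: at 25 FPS, the entire generation pipeline has at most 40 ms of wall-clock time per frame. A minimal sketch of that arithmetic (the function names are my own, not Skywork's API):

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available to produce one frame at the target rate."""
    return 1000.0 / fps

def sustains_realtime(fps: float, per_frame_ms: float) -> bool:
    """True if single-frame generation time fits inside the budget,
    i.e. the model can stream frames without falling behind."""
    return per_frame_ms <= frame_budget_ms(fps)

budget = frame_budget_ms(25)       # 40.0 ms per frame at 25 FPS
ok = sustains_realtime(25, 35.0)   # a hypothetical 35 ms model keeps up
```

Any architectural choice that pushes single-frame inference past that budget breaks the "real-time" property, regardless of output quality.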

Associated Press
3 days ago
- Business
- Associated Press
Matrix-3D Goes Open-Source: A New Benchmark for 3D World Generation
SINGAPORE, Aug. 12, 2025 /PRNewswire/ -- The SkyWork AI Technology Release Week officially kicked off on August 11. From August 11 to August 15, a new model will be unveiled each day, covering cutting-edge models for core multimodal AI scenarios.

On August 12, the Matrix-3D world model for 3D world generation and exploration was officially open-sourced. Starting from a single input image, it generates high-quality, trajectory-consistent panoramic videos and directly reconstructs navigable 3D spaces. Compared with WorldLabs' output, Matrix-3D enables exploration across significantly larger virtual environments.

Matrix-3D open source addresses:

By integrating panoramic representation, conditional video generation, and 3D reconstruction modules, Matrix-3D surpasses existing methods in field-of-view range, geometric consistency, and visual quality. It accepts both text and image inputs and generates freely explorable 3D scenes. Matrix-3D achieves state-of-the-art generation quality on panoramic video benchmark datasets, along with industry-leading precision in camera motion control.

World models like Google's Genie 3 paint a compelling vision of the future: AI evolving beyond content generation tools into world simulators, systems capable of constructing and simulating entire environments. As the technology progresses, these models are poised to become critical infrastructure for understanding our world, shaping tomorrow, and ultimately realizing artificial general intelligence (AGI).

The open-source release of Matrix-3D for 3D world generation and exploration underscores Skywork's strategic foresight in AI development and will accelerate development across Skywork's multi-model AI ecosystem. Moving forward, Skywork remains committed to pioneering and open-sourcing advanced AI solutions.
By collaborating with global developers and users, we aim to build next-generation platforms that accelerate the global advancement of AGI.

SOURCE Skywork AI pte ltd


TechCrunch
August 5, 2025
- Business
- TechCrunch
DeepMind reveals Genie 3, a world model that could be the key to reaching AGI
Google DeepMind has revealed Genie 3, its latest foundation world model, which the AI lab says is a crucial stepping stone on the path to artificial general intelligence, or human-like intelligence. 'Genie 3 is the first real-time interactive general purpose world model,' Shlomi Fruchter, a research director at DeepMind, said during a press briefing. 'It goes beyond narrow world models that existed before. It's not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between.'

Genie 3, which is still in research preview and not publicly available, builds on both its predecessor Genie 2 – which can generate new environments for agents – and DeepMind's latest video generation model Veo 3 – which exhibits a deep understanding of physics.

With a simple text prompt, Genie 3 can generate multiple minutes – up from 10 to 20 seconds in Genie 2 – of diverse, interactive, 3D environments at 24 frames per second with a resolution of 720p. The model also features 'promptable world events,' the ability to use a prompt to change the generated world. Perhaps most importantly, Genie 3's simulations stay physically consistent over time because the model is able to remember what it has previously generated – an emergent capability that DeepMind researchers didn't explicitly program into the model.

Fruchter said that while Genie 3 clearly has implications for educational experiences and new generative media like gaming or prototyping creative concepts, its real unlock will come in training agents for general purpose tasks, which he said is essential to reaching AGI. 'We think world models are key on the path to AGI, specifically for embodied agents, where simulating real world scenarios is particularly challenging,' Jack Parker-Holder, a research scientist on DeepMind's open-endedness team, said during a briefing.
Genie 3 is designed to solve that bottleneck. Like Veo, it doesn't rely on a hard-coded physics engine. Instead, it teaches itself how the world works – how objects move, fall, and interact – by remembering what it has generated and reasoning over long time horizons. 'The model is auto-regressive, meaning it generates one frame at a time,' Fruchter told TechCrunch in a separate interview. 'It has to look back at what was generated before to decide what's going to happen next. That's a key part of the architecture.'

That memory creates consistency in its simulated worlds, and that consistency allows it to develop a kind of intuitive grasp of physics, similar to how humans understand that a glass teetering on the edge of a table is about to fall, or that they should duck to avoid a falling object. This ability to simulate coherent, physically plausible environments over time makes Genie 3 much more than a generative model: it becomes an ideal training ground for general-purpose agents.
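Fruchter's description of the auto-regressive loop can be sketched in a few lines: each new frame is predicted from a sliding window of previously generated frames (the model's "memory") plus the latest user action. The `step_fn` below is a toy stand-in for the learned model, not DeepMind's actual architecture.

```python
from collections import deque

def rollout(step_fn, first_frame, actions, context_len=16):
    """Auto-regressive generation: look back at what was generated
    before (up to `context_len` frames) to decide the next frame."""
    context = deque([first_frame], maxlen=context_len)
    frames = [first_frame]
    for action in actions:
        frame = step_fn(list(context), action)  # condition on history + action
        context.append(frame)                   # oldest frame falls out of memory
        frames.append(frame)
    return frames

# Toy stand-in model: a "frame" is just an integer state shifted by the action.
video = rollout(lambda ctx, a: ctx[-1] + a, 0, [1, 1, -1, 2])
```

The bounded `deque` is what makes the consistency-versus-horizon trade-off concrete: anything older than the window can no longer influence the next frame, which is why limited memory caps how long a generated world stays coherent.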
Not only can it generate endless, diverse worlds to explore, but it can also push agents to their limits – forcing them to adapt, struggle, and learn from their own experience in a way that mirrors how humans learn in the real world.

Currently, the range of actions an agent can take is still limited. Promptable world events, for example, allow for a wide range of environmental interventions, but they're not necessarily performed by the agent itself. Similarly, it's still difficult to accurately model complex interactions between multiple independent agents in a shared environment. And Genie 3 can only support a few minutes of continuous interaction, when hours would be necessary for proper training.

Still, Genie 3 presents a compelling step toward teaching agents to go beyond reacting to inputs so they can plan, explore, seek out uncertainty, and improve through trial and error – the kind of self-driven, embodied learning that's key to moving toward general intelligence. 'We haven't really had a Move 37 moment for embodied agents yet, where they can actually take novel actions in the real world,' Parker-Holder said. He was referring to the legendary moment in the 2016 Go match between DeepMind's AlphaGo and world champion Lee Sedol, in which AlphaGo played an unconventional, brilliant move that became symbolic of AI's ability to discover strategies beyond human understanding. 'But now, we can potentially usher in a new era,' he said.


The Verge
August 5, 2025
- The Verge
Google's new AI model creates video game worlds in real time
Google DeepMind is releasing a new version of its AI 'world' model, called Genie 3, capable of generating 3D environments that users and AI agents can interact with in real time. The company is also promising that users will be able to interact with the worlds for much longer than before and that the model will actually remember where things are when you look away from them.

World models are a type of AI system that can simulate environments for purposes like education, entertainment, or training robots and AI agents. You give them a prompt and they generate a space that you can move around in like you would in a video game, but instead of the world being handcrafted with 3D assets, it's all generated with AI. It's an area Google is putting a lot of effort into: the company showed off Genie 2 in December, which could create interactive worlds based on an image, and it's building a world models team led by a former co-lead of OpenAI's Sora video generation tool.

But current models have a lot of drawbacks. Genie 2 worlds were only playable for up to a minute, for example. I recently tried 'interactive video' from a company backed by Pixar's cofounder, and it felt like walking through a blurry version of Google Street View where things morphed and changed in ways I didn't expect as I looked around.

Genie 3 seems like it could be a notable step forward. Users will be able to generate worlds with a prompt that supports a 'few' minutes of continuous interaction, up from the 10–20 seconds possible with Genie 2, according to a blog post. Google says that Genie 3 can keep spaces in visual memory for about a minute, meaning that if you turn away from something in a world and then turn back to it, things like paint on a wall or writing on a chalkboard will be in the same place. The worlds will also have a 720p resolution and run at 24fps.
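The 'about a minute' of visual memory at 24fps implies the model must retain on the order of 1,400 past frames. A back-of-the-envelope sizing (my own arithmetic, not Google's stated mechanism):

```python
def memory_window_frames(fps: int, horizon_s: int) -> int:
    """How many generated frames fall inside the recall horizon."""
    return fps * horizon_s

# Genie 3's reported numbers: 24 fps, roughly one minute of recall.
window = memory_window_frames(24, 60)  # 1440 frames
```

That frame count is why the recall horizon is hard to extend: whatever state the model keeps per frame, the memory cost grows linearly with how long the world must stay consistent.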
DeepMind is adding what it calls 'promptable world events' to Genie 3, too. Using a prompt, you'll be able to do things like change weather conditions in a world or add new characters.

However, this probably isn't a model you'll be able to try for yourself. It's launching as 'a limited research preview' that will be available to 'a small cohort of academics and creators' so its developers can better understand the risks and how to appropriately mitigate them, according to Google. There are also plenty of restrictions, like the limited ways users can interact with generated worlds, and legible text is 'often only generated when provided in the input world description.' Google says it's 'exploring' how to bring Genie 3 to 'additional testers' down the line.


The Guardian
August 5, 2025
- Business
- The Guardian
Google outlines latest step towards creating artificial general intelligence
Google has outlined its latest step towards artificial general intelligence (AGI) with a new model that allows AI systems to interact with a convincing simulation of the real world. The Genie 3 'world model' could be used to train robots and autonomous vehicles as they engage with realistic recreations of environments such as warehouses, according to Google.

The US technology and search company's AI division, Google DeepMind, argues that world models are a key step to achieving AGI, a hypothetical level of AI where a system can carry out most tasks on a par with humans – not just individual tasks such as playing chess or translating languages – and potentially do someone's job. DeepMind said such models would play an important role in the development of AI agents, systems that carry out tasks autonomously. 'We expect this technology to play a critical role as we push toward AGI, and agents play a greater role in the world,' DeepMind said.

However, Google said Genie 3 is not yet ready for full public release and did not give a date for its launch, adding that the model had a range of limitations. The announcement comes amid ever-increasing competition in the AI market, with Sam Altman, chief executive of ChatGPT developer OpenAI, sharing a screenshot on Sunday of what appeared to be the company's latest AI model, GPT-5.

Google said the model could also help humans experience a range of simulations for training or exploration, replicating experiences such as skiing or walking around a mountain lake. Genie 3 creates its scenarios immediately from text prompts, according to DeepMind, and the simulated environment can be altered quickly – by, for instance, introducing a herd of deer onto a ski slope – with further text prompts. The tech company showed the Genie 3-created skiing and warehouse scenarios to journalists on Monday but is not yet releasing the model to the public.
The quality of the simulations seen by the Guardian is on a par with Google's latest video creation model, Veo 3, but they last minutes rather than the eight seconds offered by Veo 3.

While AGI has been viewed through the prism of potentially eliminating white-collar jobs, as autonomous systems carry out an array of roles from sales agent to lawyer or accountant, Google views world models as a key technology for developing robots and autonomous vehicles. For instance, a recreation of a warehouse with realistic physics and people could help train a robot, with the simulation serving as the environment it 'learns' from. Google has also created a virtual agent, SIMA, that can carry out tasks in video game settings, although, like Genie 3, it is not publicly available.

Andrew Rogoyski of the Institute for People-Centred AI at the University of Surrey in the UK said world models could also help large language models – the technology that underpins chatbots such as ChatGPT. 'If you give a disembodied AI the ability to be embodied, albeit virtually, then the AI can explore the world, or a world – and grow in capabilities as a result,' he said. 'While AIs are trained on vast quantities of internet data, allowing an AI to explore the world physically will add an important dimension to the creation of more powerful and intelligent AIs.'

In a research note accompanying the SIMA announcement last year, Google researchers said world models are important because large language models are effective at tasks such as planning but not at taking action on a human's behalf.