Latest news with #V-JEPA2


New Indian Express
3 hours ago
- Science
- New Indian Express
Meta Unveils V-JEPA 2: A smarter AI that understands the physical world
Meta has launched V-JEPA 2, an advanced AI model trained on video, designed to help robots and AI systems better understand and predict how the physical world works. This model represents a big step toward Meta's long-term goal of developing Advanced Machine Intelligence (AMI)—AI that can think before it acts.

Humans naturally understand how the world responds to actions—for example, knowing a ball will fall after being tossed, or avoiding people while walking through a crowd. This is because we build mental models of how things move and interact. V-JEPA 2 aims to give AI agents a similar understanding of their surroundings, helping them to observe, reason, and make better decisions before taking action. These types of AI systems are built using world models, which allow machines to understand, predict, and plan—just like humans do when navigating everyday situations.

Smarter robots through video training

V-JEPA 2 builds on Meta's earlier version of the model released last year, called V-JEPA. This new version offers major improvements in helping machines predict how objects move, how they interact with other objects, and how people handle them in the real world.


Entrepreneur
a day ago
- Business
- Entrepreneur
Six Happenings that Changed the World of AI This Week
From Meta, OpenAI, Google, and Databricks to Nvidia and Samsung, numerous new AI capabilities have been announced.

Since the global tech giants completed their annual conferences, it seems Artificial Intelligence (AI) developments haven't taken a single day's breath. From Meta, OpenAI, Google, and Databricks to Nvidia and Samsung, numerous new capabilities have been announced, all aimed at pushing AI into its next chapter.

Meta's V-JEPA 2 and the Age of World Models

Meta, long a silent workhorse in the AI research domain, has stepped into the spotlight with its latest offering, V-JEPA 2, an open-source AI world model. Unveiled on June 11, V-JEPA 2 isn't just another generative model; it's designed to reason about the physical world. At its core, the model allows AI systems to internally simulate real-world environments in 3D, equipping machines with the cognitive ability to anticipate how objects might move or behave even in unfamiliar settings. In practice, this could dramatically enhance fields like robotics and autonomous driving, where a deep understanding of physical dynamics is vital. Rather than relying on heavily labelled datasets, V-JEPA 2 is trained on unlabelled video content, making it both scalable and efficient. Meta says the model leverages what's called "latent space" to intuit motion and interaction: essentially, AI imagination.

"Today, we're excited to share V-JEPA 2, the first world model trained on video that enables state-of-the-art understanding and prediction, as well as zero-shot planning and robot control in new environments. As we work toward our goal of achieving advanced machine intelligence (AMI), it will be important that we have AI systems that can learn about the world as humans do, plan how to execute unfamiliar tasks, and efficiently adapt to the ever-changing world around us," Meta noted in its official blog.

OpenAI's o3-Pro: A Model That Thinks Before It Speaks

OpenAI's new model, o3-Pro, pushes generative AI beyond fluency and into thoughtful deliberation. Aimed at enterprise users and professionals who care more about accuracy than speed, o3-Pro is engineered for complex reasoning, whether it's solving PhD-level science problems or conducting multi-step business analysis. Notably, o3-Pro reportedly outperforms major competitors like Google Gemini 2.5 and Claude 4 Opus on industry-standard benchmarks such as AIME 2024 and GPQA Diamond. But users looking for casual chats may find it slower than its predecessor, GPT-4o. The model is now available to ChatGPT Pro and Team users, though temporary chat memory is disabled due to technical issues, and image generation features are still pending.

Databricks Launches Agent Bricks

The company launched Agent Bricks, an automated system that allows enterprises to build, optimize, and evaluate AI agents using their own data, without requiring deep ML expertise. Agent Bricks isn't just another tool in the LLM toolkit. It tackles two of the biggest blockers to AI adoption: cost and quality. Most agent systems today rely on trial-and-error for tuning. Databricks replaces this with synthetic data generation and custom benchmarks that automatically calibrate agents for domain-specific tasks like legal document parsing or extracting information from maintenance manuals.
"Agent Bricks is a whole new way of building and deploying AI agents that can reason on your data," said Ali Ghodsi, CEO and Co-founder of Databricks. "For the first time, businesses can go from idea to production-grade AI on their own data with speed and confidence, with control over quality and cost tradeoffs. No manual tuning, no guesswork and all the security and governance Databricks has to offer. It's the breakthrough that finally makes enterprise AI agents both practical and powerful." One compelling example is AstraZeneca which used Agent Bricks to transform over 400,000 clinical trial documents into structured data in under an hour without writing a single line of code. "With Agent Bricks, our teams were able to parse through more than 400,000 clinical trial documents and extract structured data points without writing a single line of code. In just under 60 minutes, we had a working agent that can transform complex unstructured data usable for Analytics," noted Joseph Roemer, Head of Data & AI, Commercial IT, AstraZeneca Google's New AI Architect In a quieter but telling move, Google appointed Koray Kavukcuoglu, CTO of its DeepMind AI lab, as its first Chief AI Architect. The appointment, announced via an internal memo from CEO Sundar Pichai. Kavukcuoglu will now serve as a Senior Vice President reporting directly to Pichai, while continuing in his role as CTO at DeepMind under the leadership of CEO Demis Hassabis. He is set to relocate from London to California to take on this expanded mandate. "In this expanded role, Koray will accelerate how we bring our world-leading models into our products, with the goal of more seamless integration, faster iteration, and greater efficiency," Pichai stated in the memo, underscoring the company's push for scalable, AI-first innovation across its ecosystem. The leadership reshuffle comes at a time when Alphabet is under increasing pressure to demonstrate tangible financial returns from its heavy AI investments, with capital expenditures projected to reach USD 75 billion this year. The tech giant is also balancing regulatory scrutiny and intensifying competition in the AI space. Meta's Foray Into AI Video Editing Meanwhile, Meta is also expanding into generative video with its new AI video editing tools. Launched on its Meta AI app and editing platform 'Edits', users can now apply up to 50 preset transformations ranging from comic book aesthetics to sci-fi outfit swaps to short video clips. While still limited to preset prompts and 10-second clips, the tools point to a future where consumer creativity meets AI augmentation. Meta hasn't confirmed whether its Movie Gen AI models are behind these features, but the trajectory is clear the company is pushing toward broader consumer AI adoption. "We built this so that everyone can experiment creatively and make fun, interesting videos to share with their friends, family, and followers. Whether you're reimagining a favourite family memory or finding new ways to entertain your audience, our video editing [tools] can help," Meta said in its blog post. Nvidia and Samsung's Bet on Physical AI Meanwhile, Nvidia and Samsung are putting their money into robotics. The two tech giants joined Japan's SoftBank in backing Skild AI, a robotics software startup, with Nvidia investing USD 25 million and Samsung adding USD 10 million. Skild is now reportedly valued at USD 4.5 billion following this Series B round. 
The investment underscores a growing belief that the next frontier for AI isn't more virtual assistants but physical AI such as robots, autonomous systems, and machines that act on the world with intelligence.
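
A note on the "latent space" idea mentioned above: rather than reconstructing future video pixels, a JEPA-style model predicts the embedding of unseen or future frames from the embedding of the visible context, which is why no human labels are needed. The sketch below is a rough, hypothetical PyTorch illustration of that training signal, with toy modules invented for the example; it is not Meta's released code.

```python
# Rough sketch of a JEPA-style objective (illustrative only, not Meta's code):
# the loss compares predicted and actual *embeddings*, so no pixel
# reconstruction and no human labels are needed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Toy stand-in for a video encoder: projects flattened frames to a latent vector."""
    def __init__(self, in_dim, latent_dim=128):
        super().__init__()
        self.proj = nn.Linear(in_dim, latent_dim)

    def forward(self, frames):                 # frames: (batch, in_dim)
        return self.proj(frames)

class Predictor(nn.Module):
    """Predicts the embedding of unseen/future frames from the context embedding."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.GELU(),
                                 nn.Linear(latent_dim, latent_dim))

    def forward(self, z_context):
        return self.net(z_context)

def jepa_step(encoder, target_encoder, predictor, context, target):
    z_ctx = encoder(context)
    with torch.no_grad():                      # the target branch provides a fixed regression target
        z_tgt = target_encoder(target)
    z_pred = predictor(z_ctx)
    return F.smooth_l1_loss(z_pred, z_tgt)     # loss lives entirely in latent space

# Toy usage with random tensors standing in for unlabelled video clips.
enc, tgt_enc, pred = TinyEncoder(3 * 16 * 16), TinyEncoder(3 * 16 * 16), Predictor()
context = torch.randn(8, 3 * 16 * 16)          # visible part of each clip
future = torch.randn(8, 3 * 16 * 16)           # held-out part the model must anticipate
loss = jepa_step(enc, tgt_enc, pred, context, future)
loss.backward()                                 # gradients flow into encoder and predictor only
```

In the published JEPA family, the target branch is typically a slowly updated (exponential-moving-average) copy of the main encoder rather than a separate network, which keeps the regression target stable during training.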


India Today
a day ago
- Business
- India Today
Meta unveils AI that thinks and sees the world like humans
Meta has introduced a new artificial intelligence model called V-JEPA 2, which can seemingly help AI agents better understand and predict the real world – much like how humans observe, think, and plan before taking any action. According to Meta, this new open-source AI model is a big step towards developing what it calls advanced machine intelligence (AMI). AMI is Meta's vision for the future: an AI model that can not only process data but also learn from its surroundings and predict how things will change – just like humans do every day. Meta calls V-JEPA 2 its most sophisticated world model to date.

V-JEPA 2 stands for Video Joint Embedding Predictive Architecture 2. The model is primarily trained on vast amounts of video footage. The company explains that by watching a huge number of video clips – over a million hours – the AI learnt how people interact with objects, how things move, and how different actions affect the world around them. With this training, the AI can further enable robots and AI systems to anticipate how objects behave, how environments respond to motion, and how different elements interact physically.

'As humans, we have the ability to predict how the physical world will evolve in response to our actions or the actions of others,' Meta said in its official blog post. 'V-JEPA 2 helps AI agents mimic this intelligence, making them smarter about the physical world.'

Giving an example, Meta explains that just as a person knows a tennis ball will fall back down if thrown into the air, V-JEPA 2 can learn this kind of common-sense behaviour by observing video. This training with video and world understanding further helps the AI develop a mental map, or understanding, of how the physical world works.

What makes Meta's V-JEPA 2 different?

V-JEPA 2 is a 1.2 billion-parameter model that builds on its predecessor V-JEPA, which Meta unveiled last year. This new generation is said to offer significant improvements in understanding, predicting, and planning. The company emphasises that, unlike previous systems, V-JEPA 2 is not just capable of recognising images or responding to commands, but can actually make predictions. It can look at a situation and estimate what will happen next if a certain action is taken. These capabilities, according to Meta, are essential for AI to function autonomously in real-world settings. For instance, this could allow a robot to navigate unfamiliar terrain or manipulate objects it has never seen before.

Meta reveals that it has also tested this by putting the AI model into robots in its labs. During testing, the company claims these robots were able to complete basic tasks like picking up unfamiliar objects and placing them in new spots – even in environments the robot had never seen before. The robot used the model to plan its next move based on its current view and a goal image. It then chose the best action to take, step by step.

In support of the broader research community, Meta is also releasing three new benchmarks to evaluate how well AI models learn and reason from video.
These benchmarks aim to standardise the way researchers test world models, offering a clearer path towards advancing physical reasoning in AI. 'By sharing this work, we aim to give researchers and developers access to the best models and benchmarks to help accelerate research and progress – ultimately leading to better and more capable AI systems that will help enhance people's lives,' Meta said. While the company is currently focusing on short tasks like picking and placing objects, Meta says it wants to go further – developing models that can plan long-term, break down complex tasks into smaller steps, and even use senses like touch and sound in the future.
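
The planning loop described above – encode the current camera view and a goal image, imagine the outcome of each candidate action, and execute the one that lands closest to the goal – can be sketched in a few lines. The example below is purely illustrative and not Meta's API: encode and predict_next stand in for the model's encoder and action-conditioned predictor, and the toy demo exists only so the loop runs end to end.

```python
# Minimal sketch of goal-image planning (hypothetical; not Meta's released API).
# encode() maps an observation to a latent vector; predict_next() imagines the
# latent state an action would produce. At every step the robot keeps the
# candidate action whose imagined outcome lies closest to the goal embedding.
import numpy as np

def plan_one_step(encode, predict_next, current_view, goal_image, candidate_actions):
    z_now = encode(current_view)
    z_goal = encode(goal_image)
    # Score each action by the latent-space distance between its imagined
    # outcome and the goal; smaller is better.
    scores = [np.linalg.norm(predict_next(z_now, a) - z_goal) for a in candidate_actions]
    return candidate_actions[int(np.argmin(scores))]

# --- Toy demo so the loop runs end to end (everything here is illustrative) ---
if __name__ == "__main__":
    encode = lambda obs: np.asarray(obs, dtype=float)        # identity "encoder"
    predict_next = lambda z, action: z + np.asarray(action)  # additive "dynamics"
    state, goal = np.array([0.0, 0.0]), np.array([3.0, -2.0])
    moves = [np.array(a) for a in ([1, 0], [-1, 0], [0, 1], [0, -1])]
    for _ in range(10):
        best = plan_one_step(encode, predict_next, state, goal, moves)
        state = state + best                                  # "execute" the chosen action
        if np.allclose(state, goal):
            break
    print("reached:", state)
```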


Indian Express
2 days ago
- Business
- Indian Express
Meta introduces V-JEPA 2, an AI world model to power robotics and autonomous systems
It seems the AI community is gearing up for the next frontier in AI: world models. Meta, on Wednesday, June 11, unveiled its new AI model, V-JEPA 2. Dubbed a 'world model', V-JEPA 2 is designed to understand the physical world. The model has been built to comprehend the movements of objects and has the potential to enhance robotics and self-driving cars.

V-JEPA 2 is an open-source AI model that can understand and predict real-world environments in 3D. It allows AI to build an internal simulation of the real world, essentially helping it reason, plan, and act much like humans. While a traditional AI model would rely heavily on labelled data, V-JEPA 2 is reportedly trained to identify patterns in unlabelled video clips, using these as its foundation for internal 3D reasoning. The world model highlights the tech giant's increasing focus on more intuitive and intelligent AI systems that can engage with the physical world. Reportedly, this technology can be beneficial in the domains of robotics, augmented reality, and future AI assistants.

'Today, we're excited to share V-JEPA 2, the first world model trained on video that enables state-of-the-art understanding and prediction, as well as zero-shot planning and robot control in new environments. As we work toward our goal of achieving advanced machine intelligence (AMI), it will be important that we have AI systems that can learn about the world as humans do, plan how to execute unfamiliar tasks, and efficiently adapt to the ever-changing world around us,' Meta wrote in its official blog.

The latest announcement from Meta comes at a time when the company is facing stiff competition from rivals Google, Microsoft, and OpenAI. According to a recent CNBC report, Meta CEO Mark Zuckerberg has made AI a top priority for the company, which is also planning to invest $14 billion in Scale AI, a company that pioneers data labelling for AI training.

When it comes to the specifications, V-JEPA 2 is a 1.2 billion-parameter model built using the Meta Joint Embedding Predictive Architecture (JEPA), which was shared in 2022. V-JEPA, Meta's first model trained on video, was released in 2024. With the latest V-JEPA 2, the company claims to have improved action prediction and world modelling capabilities, allowing robots to interact with unfamiliar objects and environments to accomplish a task.

In simple words, world models are mental simulations that help us predict how the physical world behaves. We humans develop this intuition from a young age: we know instinctively that a ball thrown in the air will fall back down, and while walking in a crowded space we avoid colliding with others. This inner sense of cause and effect helps us act more effectively in complex situations. AI agents need similar capabilities to interact with the real world. According to Meta, to achieve this their world models should be capable of understanding their surroundings and recognising objects, actions, and movements; they should be able to predict how things will change over time, especially in response to actions; and they should plan ahead by simulating possible outcomes and choosing the best course of action.

So to simplify, an AI world model is an internal simulation that helps a machine understand, predict, and plan within a physical environment. Essentially, it helps the AI anticipate how the world will change in response to actions.
This could enable more intelligent, goal-driven behavior in AI. The V-JEPA 2 model could likely enhance real-world machines like self-driving cars and robots. For instance, self-driving cars need to understand their surroundings in real time to move about safely. While most AI models depend on massive amounts of labelled data or video footage, V-JEPA 2 reportedly uses a simplified 'latent' space to reason about how an object moves or interacts.

According to Meta's Chief AI Scientist, Yann LeCun, a world model is an 'abstract digital twin of reality' that allows AI to predict what will happen next and plan accordingly. It is a big leap towards making AI more useful in the physical world. In one of his recent presentations, LeCun stated that helping machines understand the physical world is different from teaching them language.

World models, which are a recent phenomenon, are gaining attention in the AI research community for bringing new dimensions beyond the large language models used in tools like ChatGPT and Google Gemini. In September 2024, noted AI researcher Fei-Fei Li raised $230 million for her startup World Labs, which focuses on building large-scale world models. Google DeepMind is also developing its own version of a world model, named Genie, which is capable of simulating 3D environments and games in real time.
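
The three requirements Meta lists – understand the surroundings, predict how they change in response to actions, and plan by simulating outcomes – map naturally onto a small programming interface. The sketch below is a generic, hypothetical shape for such a world model, with invented method names; it does not reflect V-JEPA 2's actual API.

```python
# A generic world-model interface matching the understand / predict / plan split
# described above. Purely illustrative; V-JEPA 2's real interface is not shown here.
from abc import ABC, abstractmethod
from typing import Any, Callable, Sequence

class WorldModel(ABC):
    @abstractmethod
    def understand(self, observation: Any) -> Any:
        """Encode raw sensor input (e.g. camera frames) into an internal state."""

    @abstractmethod
    def predict(self, state: Any, action: Any) -> Any:
        """Return the internal state expected after taking `action` in `state`."""

    def plan(self, observation: Any, goal_state: Any,
             candidate_actions: Sequence[Any],
             cost: Callable[[Any, Any], float]) -> Any:
        """Simulate each candidate action and choose the one whose predicted
        outcome is cheapest with respect to the goal."""
        state = self.understand(observation)
        return min(candidate_actions,
                   key=lambda a: cost(self.predict(state, a), goal_state))
```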


Yahoo
2 days ago
- Yahoo
Meta Takes Next Steps Towards the Development of True Artificial Intelligence
This story was originally published on Social Media Today.

Meta's looking to take the next major steps in AI development, which has become a key focus for Meta CEO Mark Zuckerberg as he eyes a new era of digital interaction, in which true artificial intelligence can be achieved. Meta's latest project on this front is its V-JEPA 2 world model, which can help computer systems understand more about their environment, in order to factor that understanding into their responses.

As explained by Meta:

'As humans, we have the ability to predict how the physical world will evolve in response to our actions or the actions of others. For example, you know that if you toss a tennis ball into the air, gravity will pull it back down. When you walk through an unfamiliar crowded area, you're making moves toward your destination, while also trying not to bump into people or obstacles along the path. We achieve this physical intuition by observing the world around us and developing an internal model of it, which we can use to predict the outcomes of hypothetical actions.'

Meta's V-JEPA 2 model helps AI agents mimic this, in order to assist in their understanding of the physical world, and therefore how they should respond to additional elements.

'We trained V-JEPA 2 using video, which helped the model learn important patterns in the physical world, including how people interact with objects, how objects move in the physical world and how objects interact with other objects. When deployed on robots in our labs, we found that robots can use V-JEPA 2 to perform tasks like reaching, picking up an object and placing an object in a new location.'

It's another step towards artificial general intelligence (AGI), and computer systems that replicate the human brain and can 'think' for themselves to some degree. That remains the Holy Grail of artificial intelligence research.

Right now, the current wave of generative AI tools is good at mimicking human thought, by replicating responses based on whatever data inputs they can access. But this is not actually 'intelligence'; these systems are not thinking. They're using advanced predictive models to place one letter in front of the other, based on examples of similar inputs that they can reference. They're smart spreadsheets, mimics of human behavior, but there's no actual thinking involved; these systems are not interpreting the various elements and coming up with their own, novel solutions.

That is what Meta's looking to develop with its new 'superintelligence' project, for which it's hiring a range of industry experts for a bigger push into AGI development.

Is AGI even possible? Nobody knows, but Meta's AI chief Yann LeCun, who's one of the most experienced and respected minds in AI development, seems to think so. Though we're not close yet. As LeCun told the Joint Mathematics Meetings conference in March this year:

'A lot of people in the AI research and development community are perceiving the idea that perhaps we have a shot within the next decade or so of building machines that have a blueprint that might eventually reach human level intelligence. The estimates about how long it's going to take vary by huge amounts, with the most optimistic people saying that we're already there. Some people who are raising lots of funds are claiming it's going to happen next year, but I don't believe so myself.'
LeCun says that, through open source development and a focus on key models, we could develop a human-level AI intelligence framework 'within the next decade or half decade,' which would lead to systems that are capable of actual planning and reasoning. But LeCun doesn't believe that the current generation of AI models is the right framework to build upon in this respect.

Maybe that will be part of Meta's new project: exploring all new ways to look at AI development, with a view to a different kind of intelligence that can assess and understand different elements. This latest project is another step in that direction, and with Meta also upping its compute power significantly, it may well be best placed to crack the code and build real systematic intelligence. Imagine the Large Hadron Collider, but for AI development. That's what Meta's likely working towards with its new AI project.