AI World Models: The Next Wave of AI

AI world models are the most significant development in artificial intelligence since large language models, yet most business leaders have never heard of them.
While the public conversation focuses on which chatbot is smartest, over $2 billion in funding has quietly flowed into a new class of AI that doesn’t predict the next word. It predicts the next state of the physical world.
In the past few months alone, Yann LeCun (the Turing Award-winning former chief AI scientist at Meta) raised $1.03 billion for AMI Labs. Fei-Fei Li (known as the “godmother of AI”) has raised over $1 billion for World Labs, which is now valued at $5 billion. Google DeepMind launched Genie 3, the first real-time interactive world model, and Nvidia’s Cosmos platform has been downloaded over 2 million times. World models were also one of the headline announcements at Nvidia’s GTC 2026 conference last week.
This is the foundational technology that will power robots, autonomous vehicles, industrial automation and physical AI systems – the machines that will operate in the real world alongside your business.
What Are AI World Models and Why Do They Matter?
Large language models like GPT-5 and Claude understand text. They predict the next word in a sequence. That’s powerful for writing, coding, analysis and conversation, but language is, as researchers describe it, “an incredibly lossy compression of reality”. Describing a glass falling off a table in words is straightforward. Simulating the physics of it – the trajectory, the impact, the shatter pattern, the sound – requires something fundamentally different.
AI world models solve this. Instead of predicting the next word, they predict the next state of a physical environment based on actions taken within it. If you push the glass, what happens? If a robot turns left instead of right, what changes? If a warehouse layout shifts, how do the logistics adapt?
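To make the contrast concrete, here is a deliberately tiny sketch of the world-model interface: a function from (state, action) to next state. The falling-glass physics below is hand-coded purely for illustration – a real world model learns this transition function from data, and nothing here reflects any vendor’s actual system.

```python
from dataclasses import dataclass

@dataclass
class GlassState:
    x: float       # horizontal position (m); the table edge sits at x = 1.0
    height: float  # height above the floor (m)
    intact: bool

def predict_next_state(state: GlassState, action: str, dt: float = 0.1) -> GlassState:
    """Toy 'world model': map (state, action) to the next state of the scene."""
    x, height, intact = state.x, state.height, state.intact
    if action == "push":
        x += 0.5                              # nudge the glass towards the edge
    if x > 1.0 and height > 0:                # past the edge: it falls
        height = max(0.0, height - 9.81 * dt**2)
    if height == 0.0 and x > 1.0:
        intact = False                        # hit the floor and shattered
    return GlassState(x, height, intact)

# Roll the model forward: push once, then let the predicted physics play out.
state = GlassState(x=0.8, height=0.75, intact=True)
state = predict_next_state(state, "push")
while state.intact and state.height > 0:
    state = predict_next_state(state, "wait")
```

An LLM can only describe this scene in words; a world model answers the counterfactual directly – push the glass and the simulated state ends with it shattered on the floor.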
The distinction matters commercially because world models are what will make physical AI work. Every autonomous vehicle, every warehouse robot, every manufacturing AI system and every real-world agent needs to understand cause and effect in three-dimensional space. LLMs can’t do this; world models can.
As Nvidia’s Director of Robotics Jim Fan put it recently: “2026 will mark the first year that Large World Models lay real foundations for robotics, and for multimodal AI more broadly.” Jensen Huang called it “the ChatGPT moment for physical AI”.
For businesses operating in logistics, manufacturing, retail, construction or any sector involving physical operations, this isn’t abstract research. It’s the technology that will reshape how work gets done in the physical world within the next three to five years. Understanding it now – through an AI Readiness Assessment followed by an AI Workshop – puts you ahead of the curve before the tools arrive at SME scale.
AI World Models: The Key Players and How They Compare
The world models landscape is moving fast, with several major players taking fundamentally different approaches. Here’s how the leading platforms compare.
AMI Labs (Yann LeCun) – The Theoretical Favourite
Approach: JEPA (Joint Embedding Predictive Architecture) – learns abstract representations by predicting missing parts of scenes rather than generating pixels.
Funding: $1.03 billion at a €3 billion valuation. No product released yet.
Target applications: Industrial process control, wearable devices, robotics, healthcare.
Pros: Founded by a Turing Award winner who has argued for years that LLMs are a dead end for general intelligence. JEPA’s approach of learning in abstract representation space rather than pixel space is theoretically more efficient and generalisable. V-JEPA 2 achieved zero-shot robot planning after fine-tuning on just 62 hours of robot interaction data – a remarkable efficiency result. Meta is expected to be one of AMI’s first clients.
Cons: No product yet. The entire thesis is unproven at commercial scale. The team is small and spread across four cities. Competitors like Nvidia and Google have vastly more resources. If LLMs develop sufficient physical reasoning through scale alone, the separate world model thesis weakens.
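For the technically curious, the core JEPA claim – predict in embedding space, not pixel space – can be sketched in a few lines. Everything below (the encoder, the predictor, the loss) is an illustrative stand-in, not AMI’s actual architecture:

```python
import random

random.seed(0)

def encoder(patch):
    """Stand-in encoder: map a patch of pixel values to a low-dimensional
    embedding. In JEPA this is a learned network; here it's just two summary
    statistics."""
    mean = sum(patch) / len(patch)
    spread = max(patch) - min(patch)
    return (mean, spread)

def predictor(context_embedding):
    """Stand-in predictor: guess the masked target patch's embedding from the
    visible context. JEPA trains this to fill in missing parts of a scene."""
    mean, spread = context_embedding
    return (mean, spread)  # naive guess: the target resembles the context

def jepa_loss(context_patch, target_patch):
    """The loss compares embeddings, never raw pixels -- so the model is free
    to discard unpredictable pixel-level detail, which is the claimed source
    of JEPA's efficiency."""
    predicted = predictor(encoder(context_patch))
    actual = encoder(target_patch)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

# A 'scene' split into a visible context patch and a masked target patch.
scene = [random.random() for _ in range(8)]
context, target = scene[:4], scene[4:]
loss = jepa_loss(context, target)
```

A pixel-space generative model would instead be penalised for every mispredicted pixel, including detail that is inherently unpredictable – that, in essence, is LeCun’s argument for the representation-space approach.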
World Labs (Fei-Fei Li) – The Commercial Frontrunner
Approach: Spatial intelligence – teaching AI to reason about 3D geometry, physical relationships and interactive environments. Their product, Marble, generates walkable 3D worlds from text, images or video.
Funding: Over $1 billion raised. In talks at a $5 billion valuation.
Target applications: 3D content creation, AR/VR, training data for embodied AI, game development.
Pros: First to ship a commercial product. Marble is available now, from free to $95/month. Fei-Fei Li’s credibility as the researcher behind ImageNet (the dataset that kick-started the deep learning revolution) gives World Labs unmatched authority. The spatial intelligence angle directly addresses why LLMs fail at physical reasoning – they don’t understand concepts like “behind” or “shelf” in geometric terms.
Cons: Marble currently generates static 3D scenes, not dynamic simulations. No embedded knowledge of object physics – a 3D Roman arch doesn’t know that removing a brick makes it collapse. The product is closer to a sophisticated 3D viewer than a true predictive world model. Bridging from impressive demos to real-world robotic applications remains unproven.
Google DeepMind – Genie 3 and SIMA 2
Approach: Two complementary systems. Genie 3 generates real-time interactive 3D environments at 24fps from text prompts. SIMA 2 is an embodied agent that learns and acts within those environments.
Funding: Backed by Google’s effectively unlimited resources.
Target applications: Game environments, agent training, embodied AI research.
Pros: Genie 3 is the first real-time interactive world model – generating navigable, persistent 3D worlds that respond to actions. SIMA 2 demonstrates what happens when you drop a generalist agent into simulated worlds and ask it to learn. Google’s compute resources and research depth are unmatched. The combination of world generation plus agent training in the same pipeline is the complete vision.
Cons: Primarily a research effort, not a commercial product. Access is limited. The gap between impressive research demos and production-ready tools for businesses remains wide. Google’s track record of launching and abandoning products gives enterprise customers pause.
Nvidia Cosmos – The Infrastructure Play
Approach: Open platform of world foundation models trained on 9,000 trillion tokens from 20 million hours of real-world data. Three model families: Predict (future state simulation), Transfer (bridging simulated and real environments), and Reason (physics-aware chain-of-thought reasoning).
Funding: Part of Nvidia’s broader AI infrastructure strategy.
Target applications: Autonomous vehicles, robotics, industrial simulation, synthetic training data.
Pros: The most accessible entry point – free and open-source. Already adopted by major players including Uber, Figure AI, Agility Robotics, and XPENG. Trained on the most comprehensive real-world dataset of any world model. Fits into Nvidia’s full-stack strategy alongside their chips, models and the Palantir Sovereign AI OS we covered recently. Two million downloads and growing.
Cons: Nvidia is the infrastructure layer, not the application layer. Cosmos provides building blocks, not finished solutions. Businesses still need significant technical capability to use it. The platform is designed for developers and researchers, not business users.
General Intuition – The Action-Conditioned Approach
Approach: Action-conditioned world models that learn to predict dynamics from video and the actions taken within it. Uses gaming data as the training ground for real-world physics understanding.
Funding: $133.7 million seed round.
Target applications: Robotics, embodied AI, autonomous systems.
Pros: Their core thesis is compelling: actions compress the complexity of physics prediction into a single fixed-cost operation. A model trained on action-labelled gaming clips learns cause and effect in a way that scales. The gaming-to-robotics pipeline provides abundant, cheap training data. The team co-wrote the definitive essay on world models with Packy McCormick’s Not Boring, which is the most comprehensive public explanation of the field.
Cons: Early stage – seed-funded with no commercial product yet. The gaming-to-real-world transfer is theoretically sound but unproven at scale. Competing against organisations with orders of magnitude more capital.
What AI World Models Mean for UK Businesses
World models won’t directly affect most SMEs tomorrow, but they will reshape the industries SMEs operate within, and sooner than most people expect.
If your business involves physical operations – logistics, warehousing, manufacturing, construction, fleet management, retail – world models are the technology that will eventually power the autonomous systems operating in your sector. Understanding where this is heading helps you plan your AI Roadmap with the future in mind, not just today’s tools.
More immediately, the world models race illustrates a broader lesson: AI is forking into two parallel tracks – language intelligence (LLMs, chatbots, AI agents) and physical intelligence (world models, robotics, autonomous systems). The businesses that understand both tracks – and plan accordingly – will be the ones best positioned as the technology matures.
An AI Workshop is where this kind of strategic thinking happens. Not just “which AI tool should we buy today?” but “where is AI heading, and how do we build a business that’s ready for what’s coming?”.
The Bottom Line
AI world models represent a fundamental shift in what artificial intelligence can do – from understanding language to understanding reality. Over $2 billion in funding from some of the most respected names in AI research signals that this isn’t speculative; it’s the next frontier of AI.
The race is early. No single player has won, but the direction is clear: AI is moving from the screen into the physical world. The businesses that pay attention now will be ready when it arrives.
Complete our free AI Readiness Assessment to understand where your business sits in the current AI landscape and what’s coming next.


