DeepMind’s Genie 3 pushes AI world models into real-time, interactive, and highly realistic territory
Google DeepMind has unveiled Genie 3, its most advanced “world model” to date, capable of generating vast, dynamic environments from simple text prompts — and letting users explore them in real time.
The breakthrough system can create interactive worlds running at 24 frames per second in 720p resolution, and keep them visually and physically consistent over several minutes of interaction. Its "visual memory" reaches back roughly a minute, so users can revisit areas and find them exactly as they left them.
World models are AI systems designed to simulate and predict how environments evolve and respond to actions — a crucial step toward artificial general intelligence. They allow AI agents to train within unlimited, richly detailed simulations. DeepMind’s earlier Genie 1 and Genie 2 models could already produce new environments, but Genie 3 is the first to support real-time interaction, and it does so with markedly improved realism.
Demonstrations of Genie 3 show its ability to simulate diverse settings, from volcanic landscapes navigated by off-road robots, to coastal roads battered by hurricanes, to deep-sea jellyfish schools drifting past hydrothermal vents. It can also generate lush forests teeming with wildlife, historical reconstructions like the palace of Knossos, and fantastical worlds with animated creatures, floating mountains, or whimsical mushroom houses.
The model can reproduce natural phenomena such as lightning, water flow, and environmental physics with striking realism. It can also handle more imaginative prompts — for example, a firefly exploring an enchanted forest or a giant gorilla in elaborate period clothing wandering through overgrown mansions.
Unlike traditional video generation, Genie 3 must update environments frame by frame in response to user input. The system accounts for an ever-growing trajectory of actions, ensuring that each frame reflects not only the immediate action but also the long-term state of the world. This allows for promptable world events — changes triggered by text commands — enabling users to alter weather, add objects, or introduce characters mid-simulation. DeepMind says this capability vastly expands the potential for “what if” scenarios, valuable for both immersive entertainment and AI training.
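The control flow this implies can be sketched in miniature. Genie 3 itself is not public, so everything below is a hypothetical stand-in: the "model" is a toy that returns a summary string rather than pixels, but it shows the key property — each new frame is generated from the entire history of frames and actions, including text-prompted world events, rather than from a fixed scene.

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    """Illustrative trajectory: all frames and user actions so far."""
    frames: list = field(default_factory=list)
    actions: list = field(default_factory=list)

def generate_frame(state: WorldState, action: str) -> str:
    """Toy stand-in for the world model. A real model would render pixels;
    here the 'frame' is a string, but note it conditions on the *whole*
    action history, which is what lets revisited areas match how the
    user left them."""
    state.actions.append(action)
    frame = f"frame_{len(state.actions)} after '{action}' (history={len(state.actions)} actions)"
    state.frames.append(frame)
    return frame

state = WorldState()
# Movement inputs and a text-prompted world event flow through the same loop.
for action in ["move_forward", "turn_left", "prompt: add rain"]:
    generate_frame(state, action)

print(state.frames[-1])
```

The point of the sketch is the loop shape: there is no persistent 3D scene to query, only a growing trajectory that each frame must stay consistent with.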
Maintaining consistency over long horizons is a known hurdle for generative models. Traditional 3D techniques like Neural Radiance Fields or Gaussian Splatting achieve consistency by using fixed 3D representations, but these limit dynamism. Genie 3 generates each frame from scratch, informed by previous context, making its worlds more adaptable and rich. That technical leap required efficiency breakthroughs, as the model must process complex world states multiple times per second to keep pace with user inputs — all while preserving detail and physical accuracy.
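To put that real-time constraint in concrete terms, a quick back-of-envelope calculation from the stated 24 fps / 720p figures shows how tight the per-frame budget is. These derived numbers are our arithmetic, not DeepMind's:

```python
# Back-of-envelope figures implied by the stated specs: 24 fps at 720p.
fps = 24
budget_ms = 1000 / fps            # time available to generate each frame
pixels_per_frame = 1280 * 720     # pixels in one 720p frame
pixels_per_second = pixels_per_frame * fps

print(f"{budget_ms:.1f} ms per frame; {pixels_per_second:,} pixels per second")
# → 41.7 ms per frame; 22,118,400 pixels per second
```

Roughly 42 milliseconds to consult the full interaction history and emit a coherent frame, over twenty million pixels every second, is the efficiency bar the article describes.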
DeepMind emphasises that Genie 3 is not just a visual showcase. It is designed for embodied agent research, enabling AI systems to learn in endlessly varied environments. The technology also offers possibilities for interactive education, gaming, scientific simulations, and historical reconstructions.
While Genie 3 represents a major step forward, DeepMind notes its limitations: environments can drift or lose accuracy over time, and performance may vary with more complex prompts. Responsible deployment, the team says, will be key as the technology advances toward broader public use.
For now, Genie 3 stands as a milestone in AI simulation, offering a glimpse into a future where humans and machines can co-create worlds in real time — and explore them together.
