GAIA-1:
A Generative World Model for Autonomous Driving

Anthony Hu Lloyd Russell Hudson Yeo Zak Murez George Fedoseev

Alex Kendall Jamie Shotton Gianluca Corrado

Wayve


Autonomous driving promises transformative improvements to transportation, but building systems capable of safely navigating the unstructured complexity of real-world scenarios remains challenging. A critical problem lies in effectively predicting the various potential outcomes that may emerge in response to the vehicle’s actions as the world evolves.

To address this challenge, we introduce GAIA-1 (‘Generative AI for Autonomy’), a generative world model that takes video, text, and action inputs and generates realistic driving scenarios, while offering fine-grained control over ego-vehicle behaviour and scene features. A world model is a predictive model of the future that allows an agent to understand the consequences of its actions. We cast world modelling as a self-supervised sequence modelling problem, where the goal is to predict the next discrete token in the sequence. As with language models, we show that the performance of world models scales gracefully with more parameters (~10B) and compute. Emergent properties of GAIA-1 include generalisation to out-of-distribution states, contextual awareness, and understanding of 3D geometry.
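To make the next-token objective concrete, here is a minimal sketch in plain Python. It is not GAIA-1's implementation: the token ids, the vocabulary size, and the uniform stand-in model (`uniform_logits`) are all hypothetical, and how video, text, and actions are mapped to discrete tokens is not covered here. The sketch only shows how a token sequence is split into (context, target) pairs and scored with an average negative log-likelihood.

```python
import math
from typing import Callable, List, Sequence, Tuple


def next_token_pairs(tokens: Sequence[int]) -> List[Tuple[Tuple[int, ...], int]]:
    """Split a token sequence into (context, target) pairs.

    Each target is the token that immediately follows its context.
    """
    return [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(
    tokens: Sequence[int],
    logits_fn: Callable[[Tuple[int, ...]], Sequence[float]],
) -> float:
    """Average negative log-likelihood of each token given its preceding tokens."""
    pairs = next_token_pairs(tokens)
    nll = 0.0
    for context, target in pairs:
        probs = softmax(logits_fn(context))
        nll -= math.log(probs[target])
    return nll / len(pairs)


VOCAB_SIZE = 8  # hypothetical vocabulary size, chosen for illustration only


def uniform_logits(context: Tuple[int, ...]) -> List[float]:
    """Hypothetical stand-in model: ignores the context, predicts uniformly."""
    return [0.0] * VOCAB_SIZE


tokens = [3, 1, 4, 1, 5, 2, 6]  # hypothetical discrete tokens
loss = next_token_loss(tokens, uniform_logits)
print(f"average next-token loss: {loss:.4f}")
```

A model that ignores its context scores exactly log(VOCAB_SIZE) on this objective, which gives a baseline that any trained sequence model should beat.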

*Generation from GAIA-1 with different text prompts and action conditioning.*
