Course Outline — Spring 2026


Course Information


Course Description

This advanced research seminar treats video generation not merely as media synthesis but as the foundation for General World Models and Embodied Agents. We will move beyond standard diffusion to explore the frontiers of Flow Matching and Autoregressive Visual Transformers. The curriculum emphasizes the convergence of video with robotics and interaction, specifically Vision-Language-Action (VLA) models, drivable 3D avatars, and neural simulators. Students will investigate how large-scale video pre-training enables "Playable Worlds," where AI agents can perceive, predict, and act within consistent, generated 4D environments.


Course Format & Tools


Weekly Schedule

Week 1: The New Thesis — Video Models as World Models

Slides