Course Outline — Spring 2026
This advanced research seminar treats video generation not merely as media synthesis but as a foundation for General World Models and Embodied Agents. Moving beyond standard diffusion models, we will explore the frontiers of Flow Matching and Autoregressive Visual Transformers. The curriculum emphasizes the convergence of video generation with robotics and interaction, specifically Vision-Language-Action (VLA) models, drivable 3D avatars, and neural simulators. Students will investigate how large-scale video pre-training enables "Playable Worlds," in which AI agents can perceive, predict, and act within consistent, generated 4D environments.