Realtime Video AI with Diffusion Models
Generative AI is opening new possibilities for creating and transforming video in real time. In this talk, I’ll explore how recent models such as StreamDiffusion and LongLive push diffusion techniques into practical use for low-latency video generation and transformation. I’ll give a deep technical walkthrough of how these systems can be adapted for streaming use cases, unpacking the full pipeline, from decoding through the diffusion process to encoding, and highlighting optimisation strategies, such as KV caching, that make interactive generation possible. I’ll also discuss the trade-offs between ultra-low-latency video transformation and generating longer, more coherent streams. To make it concrete, I’ll present demos of StreamDiffusion (served via the open-source cloud service Daydream) and LongLive (explored with the open-source research tool Scope), showcasing practical examples of both video-to-video transformation and streaming text-to-video generation.
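To give a flavour of the pipeline walkthrough, here is a minimal sketch of the decode → diffuse → encode loop. The codec stages use the real PyAV API; `diffuse_frame` is a hypothetical stand-in for a StreamDiffusion-style one-or-few-step img2img call, not the actual StreamDiffusion or Daydream interface.

```python
# Minimal streaming video-to-video loop: decode -> diffuse -> encode.
import av
import numpy as np

def diffuse_frame(rgb: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder for the diffusion stage (e.g. a single-step
    # img2img denoise conditioned on a prompt). Identity so the sketch runs.
    return rgb

def run_pipeline(src: str, dst: str, fps: int = 30) -> None:
    in_ct = av.open(src)
    out_ct = av.open(dst, "w")
    stream = out_ct.add_stream("h264", rate=fps)
    stream.pix_fmt = "yuv420p"
    first = True
    for frame in in_ct.decode(video=0):            # decode stage
        rgb = frame.to_ndarray(format="rgb24")
        if first:                                  # size output to the source
            stream.width, stream.height = frame.width, frame.height
            first = False
        styled = diffuse_frame(rgb)                # diffusion stage
        out_frame = av.VideoFrame.from_ndarray(styled, format="rgb24")
        for pkt in stream.encode(out_frame):       # encode stage
            out_ct.mux(pkt)
    for pkt in stream.encode(None):                # flush the encoder
        out_ct.mux(pkt)
    out_ct.close()
    in_ct.close()
```

In a live setting the three stages would run concurrently rather than in one loop, with the diffusion step as the latency budget’s dominant term.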
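And a toy illustration of the KV-caching idea the talk covers: keys and values for frames that have already been generated are cached and reused, so each new chunk of frames pays attention cost only for its own queries instead of recomputing attention over the whole history. Single-head attention, the shapes, and the unbounded cache are all simplifications; LongLive’s actual cache management is more involved.

```python
# Toy chunk-wise attention with a KV cache (PyTorch).
import torch
import torch.nn.functional as F

class KVCache:
    def __init__(self):
        self.k = None  # (seq, dim) accumulated keys
        self.v = None  # (seq, dim) accumulated values

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor) -> None:
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=0)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=0)

def attend_chunk(q_new, k_new, v_new, cache: KVCache) -> torch.Tensor:
    cache.append(k_new, v_new)                       # extend cache with this chunk
    scores = q_new @ cache.k.T / cache.k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ cache.v       # new queries attend to all past

# Usage: four chunks of 16 tokens each; per-chunk cost is O(new x total)
# rather than the O(total^2) of recomputing full attention every chunk.
dim, cache = 64, KVCache()
for step in range(4):
    q, k, v = (torch.randn(16, dim) for _ in range(3))
    out = attend_chunk(q, k, v, cache)
    print(step, out.shape, cache.k.shape)
```

This is the mechanism that makes interactive, chunk-by-chunk generation tractable: the history is paid for once, and only the new frames’ queries are recomputed.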