Helios: ByteDance and Peking University Release Real-Time 60-Second Video Model
A research team from Peking University and ByteDance released Helios on March 4, 2026: a 14-billion-parameter open-source video generation model that produces minute-long clips at near real-time speed on a single GPU. What makes it technically unusual is that it reaches this performance without the standard acceleration techniques most AI video models rely on.
What Helios Is
Helios is an autoregressive diffusion transformer that generates videos up to approximately 60 seconds in length (1,440 frames at 24 FPS) at 19.5 frames per second on a single NVIDIA H100 GPU. It supports text-to-video, image-to-video, and video-to-video generation through a unified input architecture.
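The headline figures are easy to sanity-check with plain arithmetic. A quick sketch using only the numbers stated above, nothing assumed:

```python
# Sanity-check the stated figures: a 60 s clip at 24 FPS playback,
# generated at 19.5 frames per second on a single H100.
CLIP_SECONDS = 60
PLAYBACK_FPS = 24
GEN_FPS = 19.5  # distilled-model throughput

total_frames = CLIP_SECONDS * PLAYBACK_FPS   # frames in one clip
wall_clock_s = total_frames / GEN_FPS        # time to generate the clip
realtime_ratio = GEN_FPS / PLAYBACK_FPS      # fraction of real-time speed

print(total_frames, round(wall_clock_s, 1), round(realtime_ratio, 4))
# → 1440 73.8 0.8125
```

So "near real-time" here means a full minute of footage in roughly 74 seconds of compute, about 0.81x playback speed.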
The model is fully open-source under the Apache 2.0 license. All code, weights, and training scripts are available on GitHub at PKU-YuanGroup/Helios, with three model checkpoints:
- Helios-Base (14B) — the base model, maximum quality at 50 sampling steps
- Helios-Mid (14B) — ~2x speedup via token compression at slight quality cost
- Helios-Distilled (14B) — 3-step distilled version delivering the 19.5 FPS real-time throughput
The distilled version is what produces the headline performance number. The base version trades speed for quality at 50 sampling steps.
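Under the simplifying assumption (ours, not the paper's) that sampling cost scales linearly with step count and processed tokens, the relative cost of the three checkpoints can be sketched as:

```python
# Rough relative sampling cost of the three checkpoints. The token_factor
# for Helios-Mid is an assumed stand-in for its "~2x speedup via token
# compression"; the real mechanism is more involved.
variants = {
    "Helios-Base":      {"steps": 50, "token_factor": 1.0},
    "Helios-Mid":       {"steps": 50, "token_factor": 0.5},
    "Helios-Distilled": {"steps": 3,  "token_factor": 1.0},
}

base_cost = variants["Helios-Base"]["steps"] * variants["Helios-Base"]["token_factor"]
speedups = {}
for name, v in variants.items():
    cost = v["steps"] * v["token_factor"]
    speedups[name] = base_cost / cost
    print(f"{name}: ~{speedups[name]:.1f}x relative to Base")
```

On this crude model, the 50-to-3 step reduction alone buys the distilled variant a ~16.7x speedup over Base, which is why it is the checkpoint behind the 19.5 FPS figure.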
What Makes It Technically Different
Most video generation models at Helios's quality level require several techniques to remain viable on single-GPU hardware:
- KV-cache to avoid recomputing attention across frames
- quantization to reduce memory footprint
- sparse or linear attention to reduce computational cost
- keyframe sampling or self-forcing to prevent quality drift over long sequences
Helios uses none of these. The team instead addressed the underlying problems through three training strategies:
Deep Compression Flow — aggressive compression of historical and noisy context within the model, reducing computational cost to levels comparable with 1.3B-parameter models despite running at 14B scale.
Easy Anti-Drifting — a training approach that explicitly simulates the drift problem (quality degradation over long sequences) during training, teaching the model to handle it intrinsically rather than applying external corrections at inference time.
Adversarial Hierarchical Distillation — reduces sampling from 50 steps to 3 steps in the distilled variant without the quality degradation typically associated with step reduction at this scale.
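The article doesn't spell out how Easy Anti-Drifting is implemented. As a toy illustration of the general idea of simulating drift during training, one could corrupt the history a model conditions on, so it never trains on pristine context alone. Everything below (the additive-noise drift proxy, `simulate_drift`, the shapes) is our assumption, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_drift(history, severity):
    """Toy drift proxy: perturb historical frames with noise so the
    model learns to condition on imperfect context."""
    noise = rng.normal(0.0, severity, size=history.shape)
    return np.clip(history + noise, 0.0, 1.0)

def training_pair(gt_clip, severity=0.1):
    # Split a ground-truth clip into (history, next frame to predict),
    # then corrupt the history the way inference-time drift would.
    history, target = gt_clip[:-1], gt_clip[-1]
    return simulate_drift(history, severity), target

clip = rng.random((8, 4, 4))          # 8 tiny 4x4 "frames" in [0, 1]
hist, tgt = training_pair(clip)
print(hist.shape, tgt.shape)          # → (7, 4, 4) (4, 4)
```

A real training step would feed `hist` to the model and take a loss against `tgt`; the point is only that the corruption happens at training time, so no external correction is needed at inference.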
The result: inference costs comparable to models with one-tenth the parameter count; four 14B instances fit within 80GB of GPU memory without parallelism or sharding frameworks; and minute-long generation free of the repetitive loops and incoherent motion that typically appear in autoregressive models at extended durations.
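The 80GB figure implies a concrete per-instance memory budget. Plain arithmetic on the numbers above (our inference, not a statement from the paper):

```python
# Four 14B-parameter instances sharing one 80 GB GPU.
PARAMS = 14e9
GPU_MEM_GB = 80
INSTANCES = 4

per_instance_gb = GPU_MEM_GB / INSTANCES           # memory budget per instance
bytes_per_param = per_instance_gb * 1e9 / PARAMS   # implied bytes per parameter

print(per_instance_gb, round(bytes_per_param, 2))  # → 20.0 1.43
```

Each instance gets at most ~20GB, roughly 1.4 bytes per parameter to cover weights, activations, and historical context combined.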
With Group Offloading enabled, the model runs in as little as ~6GB of VRAM, making it viable on mid-range consumer hardware despite its 14B parameter count.
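Group offloading in general keeps only one group of layers resident on the GPU at a time, streaming the rest from CPU RAM. A toy accounting of why that collapses the VRAM requirement; all numbers below are illustrative assumptions, not Helios's actual block sizes:

```python
# Toy group-offloading accounting: only one group of transformer blocks
# is resident in VRAM at a time; everything else waits in CPU RAM.
def peak_vram_gb(num_blocks, block_gb, blocks_per_group, resident_extra_gb):
    groups = -(-num_blocks // blocks_per_group)       # ceil division
    resident = min(blocks_per_group, num_blocks) * block_gb
    return resident + resident_extra_gb, groups

# Assumed numbers: 40 blocks of 0.7 GB, swapped in groups of 4, plus
# ~2 GB of latents/activations kept on-GPU throughout.
peak, groups = peak_vram_gb(40, 0.7, 4, 2.0)
print(round(peak, 1), groups)   # → 4.8 10
```

The trade-off is latency: each group must be copied onto the GPU before it can run, so offloaded inference is typically well below the fully resident 19.5 FPS path.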
Context: ByteDance's Second Major Video Release in Two Weeks
Helios is ByteDance's second significant video model release in quick succession. Seedance 2.0, released in late February 2026, took a different architectural approach — multimodal fusion accepting up to 9 reference images, 3 video clips, and 3 audio tracks simultaneously, with a focus on controlled multi-input generation at up to 15 seconds.
The two models address different problems. Seedance 2.0 targets multi-reference control and audio-video fusion at high visual quality for short-form content. Helios targets raw generation speed and long-form coherence — generating minute-length content that was previously only feasible with multi-GPU clusters.
The ByteDance research team has noted that Helios is strictly a research release and is not planned for integration into any commercial ByteDance products. The Apache 2.0 license does, however, permit integration into third-party products.
Practical Implications
The significance of a 60-second real-time video model on a single H100 is primarily economic and logistical. Previously, generating a minute of AI video at this quality level required either hours of processing on consumer hardware or multi-GPU cloud clusters at proportionally higher cost. Helios reduces this to roughly a minute on a single GPU.
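To put a number on the economics: combining the stated 19.5 FPS generation rate with an assumed on-demand H100 price of $3/hour (our assumption; actual cloud pricing varies), the per-clip compute cost is small:

```python
GEN_FPS = 19.5                 # stated generation rate
FRAMES = 60 * 24               # one 60 s clip at 24 FPS
H100_USD_PER_HOUR = 3.0        # assumed rental price, not from the article

gen_seconds = FRAMES / GEN_FPS
cost_usd = gen_seconds / 3600 * H100_USD_PER_HOUR
print(round(gen_seconds), round(cost_usd, 3))   # → 74 0.062
```

Roughly six cents of GPU time per minute of generated video, under these assumptions.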
For independent filmmakers and studios, this changes storyboarding and previsualization workflows — generating a 60-second sequence, reviewing it, and iterating can now happen within a practical working session rather than an overnight batch process.
For content creators, 60 seconds is functionally the full length of a YouTube Short, TikTok, or Instagram Reel. Generating complete short-form video rather than assembling 5-10 second clips changes the production workflow for this format category.
For developers building video generation into products, the low minimum VRAM requirement (6GB with offloading) and open Apache 2.0 license enable deployment on hardware that was previously incompatible with quality video generation.
Related on Cliprise
Seedance 2.0 — ByteDance's parallel release from the same period, focused on multimodal control and audio-video fusion for high-quality short-form content — is available on Cliprise:
- Seedance 2.0 →
- Seedance 2.0 Complete Guide: Audio Sync, Multimodal Video, and Workflows →
- Seedance 1.5 Pro Complete Guide →
For broader context on where AI video generation stands in early 2026:
- Best AI Video Generator 2026: Real Tests, Real Costs →
- AI Video Generation 2026: 22+ Models, Workflows, and What Actually Works →
- Sora 2 vs Kling 3.0 vs Veo 3.1 →
Workflow tested on Cliprise with Seedance 2.0 and 47+ AI models. Sources: arXiv paper 2603.04379 (Peking University/ByteDance), ToKnow.ai technical analysis, The Decoder.