About
Stable Video Diffusion (SVD) from Stability AI is a powerful image-to-video model: given a single input image, it “injects” motion into the scene, producing some fantastic short clips.
SVD is a latent diffusion model trained to generate short video clips from image inputs. There are two models. The first, img2vid, was trained to generate 14 frames of motion at a resolution of 576×1024; the second, img2vid-xt, is a fine-tune of the first, trained to generate 25 frames of motion at the same resolution.
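Because both models were trained at a fixed 576×1024 resolution, input images at other aspect ratios are usually center-cropped before resizing. A minimal stdlib-only sketch of that preprocessing step (the helper name and the landscape width/height convention are my own assumptions, not part of the model's API):

```python
# Assumed convention: 1024 wide x 576 tall (landscape), matching the
# 576x1024 training resolution mentioned above.
TARGET_W, TARGET_H = 1024, 576

def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered crop
    of a width x height image that matches the target aspect ratio."""
    target_ratio = TARGET_W / TARGET_H
    if width / height > target_ratio:
        # Image is too wide: trim the sides.
        crop_w, crop_h = round(height * target_ratio), height
    else:
        # Image is too tall: trim top and bottom.
        crop_w, crop_h = width, round(width / target_ratio)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)
```

The resulting box can be passed to any image library's crop function before resizing to 1024×576.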
Features
- SVD 14 frames: Trained to generate 14-frame video clips
- SVDXT 25 frames: A fine-tune of SVD, trained to generate 25-frame clips. This additional training allows SVDXT to generate more complex and detailed videos.
- SVD faster: SVD may be slightly faster, since it generates fewer frames (14 vs. 25) per run.
- SVDXT higher quality: SVDXT may produce higher-quality, more stable videos thanks to its additional fine-tuning.
Limitations
- Faces and bodies: Faces, and bodies in general, are often not rendered well.
- Not promptable: The models cannot be controlled through text prompts.
- Requires multiple renders: You'll likely need to run this model multiple times to get satisfactory results.
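Since several renders are usually needed, it can help to batch a few attempts with different seeds and pick the best result by eye. A small sketch of that loop (the `generate` callable and its `seed` parameter are placeholders for whatever SVD client you use, not a real API):

```python
import random

def best_of_n(generate, image, n: int = 4, seed: int = 0) -> list:
    """Call a user-supplied generate(image, seed=...) callable n times
    with distinct, reproducible seeds and return every clip for review."""
    rng = random.Random(seed)  # fixed master seed -> reproducible batch
    return [generate(image, seed=rng.randrange(2**32)) for _ in range(n)]
```

Keeping the master seed fixed means a promising batch can be regenerated exactly later.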
Usage Tips
- Still images work best: SVD performs best on still photographs, i.e. images that capture a single moment in time. You can produce suitable still images with our other Image Generation Models.
- SVD for simplicity: If you prioritize speed and simplicity, SVD might be a better option.
- SVDXT for video quality: If you prioritize video quality and stability, SVDXT might be a better option.
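As a concrete starting point, here is a hedged sketch of running SVDXT locally through Hugging Face's diffusers library. It assumes `diffusers`, `torch`, and a CUDA GPU are available; the file names and parameter values (`decode_chunk_size`, `fps`) are illustrative choices, not requirements:

```python
def generate_clip(image_path: str, out_path: str = "generated.mp4") -> None:
    """Animate a still image with SVD-XT and write an mp4 clip."""
    # Imports live inside the function so the sketch can be read without
    # diffusers/torch installed; both (plus a GPU) are needed to run it.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    image = load_image(image_path).resize((1024, 576))
    result = pipe(image, decode_chunk_size=8)  # SVD-XT yields 25 frames
    export_to_video(result.frames[0], out_path, fps=7)

if __name__ == "__main__":
    generate_clip("input.png")  # "input.png" is a placeholder path
```

Swapping the model ID for `stabilityai/stable-video-diffusion-img2vid` would give the faster 14-frame variant instead.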