pipeline_tag: video-to-video
license: cc-by-nc-4.0
example outputs (courtesy of dotsimulate)
zeroscope_v2 XL
A watermark-free Modelscope-based video model capable of generating high-quality video at 1024x576. This model was trained from the original weights with offset noise using 9,923 clips and 29,769 tagged frames at 24 frames, 1024x576 resolution.
zeroscope_v2_XL is specifically designed for upscaling content made with zeroscope_v2_576w using vid2vid in the 1111 text2video extension by kabachuha. Using this model as an upscaler gives better overall compositions at higher resolutions, because you can explore quickly at 576x320 (or 448x256) before committing to a high-resolution render.
zeroscope_v2_XL uses 15.3 GB of VRAM when rendering 30 frames at 1024x576.
Using it with the 1111 text2video extension
- Download files in the zs2_XL folder.
- Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.
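The two steps above can also be scripted. The following is a minimal sketch, not part of the extension: `install_zs2_xl` is a hypothetical helper name, and both directory paths are assumptions about where `zs2_XL` was downloaded and where the web UI is installed. It copies every file from the downloaded folder over the same-named file in the target directory and keeps a `.bak` copy of anything it overwrites.

```python
import shutil
from pathlib import Path


def install_zs2_xl(src_dir, dst_dir):
    """Copy every file in src_dir into dst_dir, backing up files it replaces.

    Hypothetical helper: not part of the text2video extension.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue
        target = dst / item.name
        if target.exists():
            # Keep the previous file so the swap can be undone.
            shutil.copy2(target, target.with_name(target.name + ".bak"))
        shutil.copy2(item, target)
        copied.append(item.name)
    return copied


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to your setup.
    downloaded = Path("zs2_XL")
    webui_t2v = Path("stable-diffusion-webui") / "models" / "ModelScope" / "t2v"
    if downloaded.is_dir():
        print(install_zs2_xl(downloaded, webui_t2v))
```

To restore the previous model, rename the `.bak` files back to their original names.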
Upscaling recommendations
For upscaling, it's recommended to use the 1111 extension. It works best at 1024x576 with a denoise strength between 0.66 and 0.85. Remember to use the same prompt that was used to generate the original clip.
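To get a feel for what the denoise strength controls, the sketch below estimates how many denoising steps actually run for a given strength. It assumes the convention used by image-to-image style pipelines in Diffusers, where roughly `int(num_inference_steps * strength)` steps are executed; the 1111 extension may count steps differently. `effective_steps` is a hypothetical helper name, and the step count of 50 is an assumed example value.

```python
def effective_steps(num_inference_steps, strength):
    """Estimate the denoising steps run for a given strength.

    Assumes the img2img-style rule int(steps * strength), capped at steps.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    # Compare the ends and the middle of the recommended range.
    for strength in (0.66, 0.75, 0.85):
        print(strength, effective_steps(50, strength))
```

Lower strengths keep more of the original clip and run fewer steps; higher strengths re-noise more of the input and give the model more freedom to change it.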
Usage in 🧨 Diffusers
Let's first install the libraries required:
$ pip install git+https://github.com/huggingface/diffusers.git
$ pip install transformers accelerate torch
Now, let's generate a low-resolution video using cerspense/zeroscope_v2_576w.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
prompt = "Darth Vader is surfing on waves"
video_frames = pipe(prompt, num_inference_steps=40, height=320, width=576, num_frames=36).frames
video_path = export_to_video(video_frames)
Next, we can upscale it using cerspense/zeroscope_v2_XL.
# PIL is needed below to resize the low-resolution frames before upscaling
from PIL import Image

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_XL", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]
video_frames = pipe(prompt, video=video, strength=0.6).frames
video_path = export_to_video(video_frames, output_video_path="/home/patrick/videos/video_1024_darth_vader_36.mp4")
Here are some results:
Darth Vader is surfing on waves.
Known issues
Rendering at lower resolutions or fewer than 24 frames could lead to suboptimal outputs.
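A small pre-flight check can catch these cases before a long render. The sketch below is illustrative only: `check_render_settings` is a hypothetical helper, and its thresholds are simply the figures quoted in this card for zeroscope_v2_XL (1024x576 and 24 frames).

```python
def check_render_settings(width, height, num_frames):
    """Return warnings for settings below the figures quoted in this card.

    Hypothetical helper; it only reports, it does not change any settings.
    """
    warnings = []
    if width < 1024 or height < 576:
        warnings.append(f"{width}x{height} is below 1024x576")
    if num_frames < 24:
        warnings.append(f"{num_frames} frames is fewer than 24")
    return warnings


if __name__ == "__main__":
    for message in check_render_settings(576, 320, 16):
        print("warning:", message)
```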
Thanks to camenduru, kabachuha, ExponentialML, dotsimulate, VANYA, polyware, tin2tin