imnotednamode committed on
Commit
bd67dc6
1 Parent(s): 0a13e38

add a readme (what is this anyway)

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -1,3 +1,21 @@
  ---
  license: apache-2.0
  ---
+
+ This mixes mochi with a development version of diffusers to achieve high-quality, fast inference with the full 161 frames on a single 24 GB card. This repo contains only the transformer. After installing the diffusers mochi development branch with `pip install git+https://github.com/huggingface/diffusers@mochi`, the transformer can be loaded normally and used in a pipeline like so:
+ ```python
+ import torch
+
+ from diffusers import MochiPipeline, MochiTransformer3DModel
+ from diffusers.utils import export_to_video
+
+ # Load the mixed nf4/bf16 transformer from this repo.
+ transformer = MochiTransformer3DModel.from_pretrained("imnotednamode/mochi-1-preview-mix-nf4")
+ # "mochi-1-diffusers" is the locally converted base pipeline (see the note below).
+ pipe = MochiPipeline.from_pretrained("mochi-1-diffusers", torch_dtype=torch.bfloat16, transformer=transformer)
+ pipe.enable_model_cpu_offload()
+ pipe.enable_vae_tiling()
+ frames = pipe("A camera follows a squirrel running around on a tree branch", num_inference_steps=100, guidance_scale=4.5, height=480, width=848, num_frames=161).frames[0]
+ export_to_video(frames, "mochi.mp4", fps=15)
+ ```
+
+ For the above to work, you must also convert https://huggingface.co/genmo/mochi-1-preview to the diffusers format (the local `"mochi-1-diffusers"` directory referenced above) using the `convert_mochi_to_diffuser.py` script from https://github.com/huggingface/diffusers/pull/9769.
+
+ I've noticed that raising `guidance_scale` allows the model to produce a coherent output with fewer steps, but it also reduces motion, since the model tries to align mostly with the text prompt.
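+
+ As a rough illustration of that trade-off (these particular values are just an example, not tuned recommendations), a run with higher guidance and fewer steps would look like:
+ ```python
+ frames = pipe(
+     "A camera follows a squirrel running around on a tree branch",
+     num_inference_steps=60,   # fewer steps than the 100 used above
+     guidance_scale=6.0,       # higher guidance keeps the output coherent, at the cost of motion
+     height=480,
+     width=848,
+     num_frames=161,
+ ).frames[0]
+ ```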
+
+ This version works by mixing nf4 and bf16 weights together. I've noticed that using pure nf4 weights degrades model quality significantly, but using pure bf16 weights means the full 161 frames can't fit into VRAM. This version strikes a balance (most weights are in bf16).
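+
+ For reference, the sketch below shows the general idea of such a mix using `bitsandbytes`: quantize a chosen subset of the transformer's `nn.Linear` layers to nf4 while leaving the rest in bf16. This is only an illustration of the technique, not the exact recipe used to produce this checkpoint; the helper name and the layer-selection predicate are made up for the example.
+ ```python
+ import torch
+ import torch.nn as nn
+ import bitsandbytes as bnb
+
+
+ def quantize_selected_linears_to_nf4(model: nn.Module, should_quantize) -> nn.Module:
+     """Swap selected nn.Linear layers for nf4 Linear4bit layers; leave the rest untouched."""
+     for parent_name, parent in list(model.named_modules()):
+         for child_name, child in list(parent.named_children()):
+             full_name = f"{parent_name}.{child_name}" if parent_name else child_name
+             if isinstance(child, nn.Linear) and should_quantize(full_name):
+                 qlinear = bnb.nn.Linear4bit(
+                     child.in_features,
+                     child.out_features,
+                     bias=child.bias is not None,
+                     compute_dtype=torch.bfloat16,
+                     quant_type="nf4",
+                 )
+                 # Wrap the existing weight; the actual 4-bit quantization happens
+                 # when the module is moved to a CUDA device.
+                 qlinear.weight = bnb.nn.Params4bit(
+                     child.weight.data, requires_grad=False, quant_type="nf4"
+                 )
+                 if child.bias is not None:
+                     qlinear.bias = nn.Parameter(child.bias.data, requires_grad=False)
+                 setattr(parent, child_name, qlinear)
+     return model
+
+
+ # Hypothetical selection rule: quantize only the feed-forward projections,
+ # leaving attention and everything else in bf16.
+ # transformer = quantize_selected_linears_to_nf4(transformer, lambda n: ".ff." in n)
+ ```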