# Update README.md

README.md CHANGED

Install `diffusers` from source:

```bash
git clone https://github.com/huggingface/diffusers && cd diffusers && pip install -e . && cd ..
```

Then install `q8_kernels`, following the instructions from [here](https://github.com/KONAKONA666/q8_kernels/?tab=readme-ov-file#installation).

Running inference with the Q8 kernels requires some minor changes in `diffusers`. Apply [this patch](https://github.com/sayakpaul/q8-ltx-video/blob/368f549ca5136daf89049c9efe32748e73aca317/updates.patch) to incorporate them:

```bash
git apply updates.patch
```

Now we can run inference:

```bash
python inference.py \
  --prompt="A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage" \
  --negative_prompt="worst quality, inconsistent motion, blurry, jittery, distorted" \
  --q8_transformer_path="sayakpaul/q8-ltx-video"
```

## Why does this repo exist?

[`KONAKONA666/LTX-Video`](https://github.com/KONAKONA666/LTX-Video) already exists. So why this repo?

That repo uses custom implementations of the LTX-Video pipeline components, which makes it hard to use directly with `diffusers`. This repo repurposes the kernels from `q8_kernels` to work on the components that come directly from `diffusers`.

<details>
<summary>More details</summary>

We do this by first converting the state dict of the original [LTX-Video transformer](https://huggingface.co/Lightricks/LTX-Video/tree/main/transformer). This includes FP8 quantization. The process also requires replacing:

* the linear layers of the model
* the RMSNorms of the model
* the GELUs of the model

before the converted state dict is loaded into the model. Some layer parameters are kept in FP32, and some layers are not quantized at all. The replacement utilities are in [`q8_ltx.py`](./q8_ltx.py).
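
To illustrate the idea, here is a minimal sketch of the replacement scheme. `Q8Linear` is a made-up placeholder, not the actual class from `q8_kernels`; the RMSNorms and GELUs are swapped analogously.

```python
# Minimal sketch of the layer-replacement idea; `Q8Linear` is a placeholder
# stand-in for the FP8 linear layer provided by `q8_kernels`.
import torch.nn as nn

class Q8Linear(nn.Linear):
    """Placeholder: imagine this dispatches to the FP8 matmul kernel."""

def replace_linears(module: nn.Module) -> None:
    # Walk the module tree and swap every `nn.Linear` in place.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            q8 = Q8Linear(child.in_features, child.out_features,
                          bias=child.bias is not None)
            setattr(module, name, q8)
        else:
            replace_linears(child)  # recurse into submodules
```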

The model can then be serialized. The conversion and serialization are implemented in [`conversion_utils.py`](./conversion_utils.py).
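
As a rough illustration of what conversion plus serialization can look like: the per-layer rule below (quantize only 2D `weight` tensors) is an assumption made for the sketch, not the actual rule from `conversion_utils.py`.

```python
# Hedged sketch of FP8 conversion + serialization; which layers stay in FP32
# or unquantized is decided by `conversion_utils.py`, not this toy predicate.
import torch
from safetensors.torch import save_file

def convert_and_save(state_dict: dict, path: str) -> None:
    converted = {}
    for key, tensor in state_dict.items():
        # Assumption: quantize only 2D weight matrices; everything else
        # (norm scales, biases, embeddings) keeps its original dtype.
        if tensor.ndim == 2 and key.endswith("weight"):
            converted[key] = tensor.to(torch.float8_e4m3fn)
        else:
            converted[key] = tensor
    save_file(converted, path)
```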

When loading the model for inference, we:

* initialize the transformer model on the `meta` device
* follow the same layer-replacement scheme as detailed above
* load the converted state dict into the model
* replace the attention processors to use [the flash-attention implementation](https://github.com/KONAKONA666/q8_kernels/blob/9cee3f3d4ca5ec8ab463179be32c8001e31f8f33/q8_kernels/functional/flash_attention.py) from `q8_kernels`

Refer [here](https://github.com/sayakpaul/q8-ltx-video/blob/368f549ca5136daf89049c9efe32748e73aca317/inference.py#L48) for more details. Additionally, we leverage the [flash-attention implementation](https://github.com/sayakpaul/q8-ltx-video/blob/368f549ca5136daf89049c9efe32748e73aca317/q8_attention_processors.py#L44) from `q8_kernels`, which provides a further speedup.
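
Put together, the loading flow looks roughly like the sketch below. The checkpoint filename is hypothetical and `replace_linears` refers to the earlier sketch; see `inference.py` for the real code.

```python
# Rough sketch of the meta-device loading flow; see `inference.py` for the
# actual implementation. The checkpoint path below is hypothetical.
import torch
from accelerate import init_empty_weights
from diffusers import LTXVideoTransformer3DModel

config = LTXVideoTransformer3DModel.load_config(
    "Lightricks/LTX-Video", subfolder="transformer"
)
with init_empty_weights():  # parameters are created on the "meta" device
    transformer = LTXVideoTransformer3DModel.from_config(config)

replace_linears(transformer)  # same replacement scheme as the earlier sketch

state_dict = torch.load("q8_transformer_state_dict.pt")  # hypothetical file
transformer.load_state_dict(state_dict, assign=True)  # materializes meta params
```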

</details>

## Performance

The numbers below were obtained with `max_sequence_length=512`, `num_inference_steps=50`, `num_frames=81`, and `resolution=480x704`. The remaining arguments were kept at their default values, as seen in the [pipeline call signature of LTX-Video](https://github.com/huggingface/diffusers/blob/4b9f1c7d8c2e476eed38af3144b79105a5efcd93/src/diffusers/pipelines/ltx/pipeline_ltx.py#L496). The numbers also exclude VAE decoding time, to focus solely on the transformer.

| | **Time (Secs)** | **Memory (MB)** |
|:-----------:|:-----------:|:-----------:|
| Non Q8 | 16.192 | 7172.86 |
| Non Q8 (+ compile) | 16.205 | - |
| Q8 | 9.572 | 5413.51 |
| Q8 (+ compile) | 6.747 | - |

The benchmarking script is available in [`benchmark.py`](./benchmark.py). You will need to download the precomputed prompt embeddings from [here](https://huggingface.co/sayakpaul/q8-ltx-video/blob/main/prompt_embeds.pt) before running the benchmark.
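
For reference, timing only the transformer is typically done with CUDA events, along the lines of the sketch below; the actual methodology lives in `benchmark.py`.

```python
# Hedged sketch of CUDA-event timing; `benchmark.py` holds the real methodology.
import torch

def time_call(fn, warmup: int = 3, iters: int = 10) -> float:
    """Average seconds per call, measured with CUDA events."""
    for _ in range(warmup):
        fn()  # warmup runs trigger compilation and caching
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    return start.elapsed_time(end) / (1000 * iters)  # ms -> s, averaged
```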

<details>
<summary>Env</summary>

```bash
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
|  0%   46C    P8             18W /  450W |       2MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```

`diffusers-cli env`:

```bash
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Linux-6.8.0-49-generic-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.10.12
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.0
- Transformers version: 4.47.1
- Accelerate version: 1.2.1
- PEFT version: 0.13.2
- Bitsandbytes version: 0.44.1
- Safetensors version: 0.4.4
- xFormers version: 0.0.29.post1
- Accelerator: NVIDIA GeForce RTX 4090, 24564 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
```

</details>

> [!NOTE]
> The RoPE implementation from `q8_kernels` [isn't usable as of 1st Jan 2025](https://github.com/KONAKONA666/q8_kernels/blob/9cee3f3d4ca5ec8ab463179be32c8001e31f8f33/q8_kernels/functional/rope.py#L26). So, we resort to using [the one](https://github.com/huggingface/diffusers/blob/91008aabc4b8dbd96a356ab6f457f3bd84b10e8b/src/diffusers/models/transformers/transformer_ltx.py#L464) from `diffusers`.

## Comparison

Check out [this page](https://wandb.ai/sayakpaul/q8-ltx-video/runs/89h6ac5) on Weights and Biases, which provides some comparative results. The generated videos are also available [here](./videos/).

## Acknowledgement

KONAKONA666's work on [`KONAKONA666/q8_kernels`](https://github.com/KONAKONA666/q8_kernels) and [`KONAKONA666/LTX-Video`](https://github.com/KONAKONA666/LTX-Video).

---
title: Q8-LTX-Video-Playground
emoji: 🧨
colorFrom: blue
colorTo: purple
sdk: gradio # Specify the SDK, e.g., gradio or streamlit
sdk_version: "4.44.1" # Specify the SDK version if needed
app_file: app.py
pinned: false # Set to true if you want to pin this Space
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference