Text-to-Video
ckczzj committed
Commit 3a6a5fa · 1 Parent(s): b104b34

Update README.md

Files changed (1)
  1. README.md +23 -25
README.md CHANGED
@@ -21,7 +21,7 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
 
 
 
- ## πŸ”₯πŸ”₯πŸ”₯ News!!
+ ## News!!
 
 * Jan 13, 2025: πŸ“ˆ We release the [Penguin Video Benchmark](https://github.com/Tencent/HunyuanVideo/blob/main/assets/PenguinVideoBenchmark.csv).
 * Dec 18, 2024: πŸƒβ€β™‚οΈ We release the [FP8 model weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) of HunyuanVideo to save more GPU memory.
@@ -31,7 +31,7 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
 
 
 
- ## πŸ“‘ Open-source Plan
+ ## Open-source Plan
 
 - HunyuanVideo (Text-to-Video Model)
   - [x] Inference
@@ -52,34 +52,31 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
 ## Contents
 
 - [HunyuanVideo: A Systematic Framework For Large Video Generation Model](#hunyuanvideo-a-systematic-framework-for-large-video-generation-model)
- - [πŸŽ₯ Demo](#-demo)
- - [πŸ”₯πŸ”₯πŸ”₯ News!!](#-news)
- - [🧩 Community Contributions](#-community-contributions)
- - [πŸ“‘ Open-source Plan](#-open-source-plan)
+ - [News!!](#news)
+ - [Open-source Plan](#open-source-plan)
 - [Contents](#contents)
 - [**Abstract**](#abstract)
 - [**HunyuanVideo Overall Architecture**](#hunyuanvideo-overall-architecture)
- - [πŸŽ‰ **HunyuanVideo Key Features**](#-hunyuanvideo-key-features)
+ - [**HunyuanVideo Key Features**](#hunyuanvideo-key-features)
 - [**Unified Image and Video Generative Architecture**](#unified-image-and-video-generative-architecture)
 - [**MLLM Text Encoder**](#mllm-text-encoder)
 - [**3D VAE**](#3d-vae)
 - [**Prompt Rewrite**](#prompt-rewrite)
- - [πŸ“ˆ Comparisons](#-comparisons)
- - [πŸ“œ Requirements](#-requirements)
- - [πŸ› οΈ Dependencies and Installation](#️-dependencies-and-installation)
+ - [Comparisons](#comparisons)
+ - [Requirements](#requirements)
+ - [Dependencies and Installation](#️dependencies-and-installation)
 - [Installation Guide for Linux](#installation-guide-for-linux)
- - [🧱 Download Pretrained Models](#-download-pretrained-models)
- - [πŸ”‘ Single-gpu Inference](#-single-gpu-inference)
+ - [Download Pretrained Models](#download-pretrained-models)
+ - [Single-gpu Inference](#single-gpu-inference)
 - [Using Command Line](#using-command-line)
 - [Run a Gradio Server](#run-a-gradio-server)
 - [More Configurations](#more-configurations)
- - [πŸš€ Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
+ - [Parallel Inference on Multiple GPUs by xDiT](#parallel-inference-on-multiple-gpus-by-xdit)
 - [Using Command Line](#using-command-line-1)
- - [πŸš€ FP8 Inference](#--fp8-inference)
+ - [FP8 Inference](#fp8-inference)
 - [Using Command Line](#using-command-line-2)
- - [πŸ”— BibTeX](#-bibtex)
+ - [BibTeX](#bibtex)
 - [Acknowledgements](#acknowledgements)
- - [Star History](#star-history)
 
 ---
 
@@ -105,7 +102,7 @@ the 3D VAE decoder.
 
 
 
- ## πŸŽ‰ **HunyuanVideo Key Features**
+ ## **HunyuanVideo Key Features**
 
 ### **Unified Image and Video Generative Architecture**
 
@@ -151,7 +148,7 @@ The Prompt Rewrite Model can be directly deployed and inferred using the [Hunyua
 
 
 
- ## πŸ“ˆ Comparisons
+ ## Comparisons
 
 To evaluate the performance of HunyuanVideo, we selected five strong baselines from closed-source video generation models. In total, we utilized 1,533 text prompts, generating an equal number of video samples with HunyuanVideo in a single run. For a fair comparison, we conducted inference only once, avoiding any cherry-picking of results. When comparing with the baseline methods, we maintained the default settings for all selected models, ensuring consistent video resolution. Videos were assessed based on three criteria: Text Alignment, Motion Quality, and Visual Quality. More than 60 professional evaluators performed the evaluation. Notably, HunyuanVideo demonstrated the best overall performance, particularly excelling in motion quality. Please note that the evaluation is based on Hunyuan Video's high-quality version. This is different from the currently released fast version.
 
@@ -187,7 +184,7 @@ To evaluate the performance of HunyuanVideo, we selected five strong baselines f
 
 
 
- ## πŸ“œ Requirements
+ ## Requirements
 
 The following table shows the requirements for running the HunyuanVideo model (batch size = 1) to generate videos:
 
@@ -204,7 +201,7 @@ The following table shows the requirements for running HunyuanVideo model (batch
 
 
 
- ## πŸ› οΈ Dependencies and Installation
+ ## Dependencies and Installation
 
 Begin by cloning the repository:
 
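The retitled Dependencies and Installation section starts from a repository clone. As a minimal sketch, assuming the standard conda-plus-requirements flow that the Installation Guide for Linux describes (the environment name and Python version here are illustrative, not taken from this diff):

```shell
# Clone the repository (the GitHub URL used elsewhere in this README)
git clone https://github.com/Tencent/HunyuanVideo.git
cd HunyuanVideo

# Illustrative environment setup; check the Installation Guide for Linux
# for the exact supported Python/CUDA versions
conda create -n HunyuanVideo python=3.10
conda activate HunyuanVideo
pip install -r requirements.txt
```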
@@ -273,13 +270,14 @@ docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyua
 ```
 
 
- ## 🧱 Download Pretrained Models
+
+ ## Download Pretrained Models
 
 Details on downloading the pretrained models are shown [here](ckpts/README.md).
 
 
 
- ## πŸ”‘ Single-gpu Inference
+ ## Single-gpu Inference
 
 We list the height/width/frame settings we support in the following table.
 
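For the renamed Single-gpu Inference section, the entry point is `sample_video.py` (it appears verbatim in the BibTeX hunk header below). A hedged invocation sketch; every flag besides the script name is an assumption to verify against `python3 sample_video.py --help`:

```shell
# Hedged single-GPU sample; the resolution and frame count must be one of
# the supported height/width/frame settings from the README's table
python3 sample_video.py \
    --video-size 720 1280 \
    --video-length 129 \
    --infer-steps 50 \
    --prompt "A cat walks on the grass, realistic style." \
    --use-cpu-offload \
    --save-path ./results
```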
@@ -331,7 +329,7 @@ We list some more useful configurations for easy usage:
 
 
 
- ## πŸš€ Parallel Inference on Multiple GPUs by xDiT
+ ## Parallel Inference on Multiple GPUs by xDiT
 
 [xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
 It has provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, and SD3. This repo adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.
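Of the parallel-launch flags, only `--ulysses-degree` and `--ring-degree` are confirmed by this diff (see the next hunk header); the launcher and the remaining flags are assumptions. A sketch for 8 GPUs, where the two degrees must multiply to the process count:

```shell
# Assumed torchrun launch; ulysses-degree * ring-degree == nproc_per_node
torchrun --nproc_per_node=8 sample_video.py \
    --video-size 720 1280 \
    --video-length 129 \
    --prompt "A cat walks on the grass, realistic style." \
    --ulysses-degree 8 \
    --ring-degree 1 \
    --save-path ./results
```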
@@ -416,7 +414,7 @@ You can change the `--ulysses-degree` and `--ring-degree` to control the paralle
 
 
 
- ## πŸš€ FP8 Inference
+ ## FP8 Inference
 
 Running HunyuanVideo with FP8 quantized weights saves about 10 GB of GPU memory. You can download the [weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) and [weight scales](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt) from Huggingface.
 
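To use the FP8 checkpoint, point the inference script at the downloaded quantized weights. A sketch only: the `--dit-weight` and `--use-fp8` flag names and the local path under `ckpts/` are assumptions, not confirmed by this diff; the checkpoint filename matches the weights link above:

```shell
# Hedged FP8 run; verify both flags against sample_video.py --help
python3 sample_video.py \
    --dit-weight ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt \
    --video-size 720 1280 \
    --video-length 129 \
    --prompt "A cat walks on the grass, realistic style." \
    --use-fp8 \
    --save-path ./results
```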
@@ -446,7 +444,7 @@ python3 sample_video.py \
 
 
 
- ## πŸ”— BibTeX
+ ## BibTeX
 
 If you find [HunyuanVideo](https://arxiv.org/abs/2412.03603) useful for your research and applications, please cite using this BibTeX:
 