ckczzj
committed
Commit · 3a6a5fa · 1 Parent(s): b104b34
Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
-##
+## News!!
 * Jan 13, 2025: We release the [Penguin Video Benchmark](https://github.com/Tencent/HunyuanVideo/blob/main/assets/PenguinVideoBenchmark.csv).
 * Dec 18, 2024: We release the [FP8 model weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) of HunyuanVideo to save more GPU memory.
@@ -31,7 +31,7 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
-##
+## Open-source Plan
 - HunyuanVideo (Text-to-Video Model)
   - [x] Inference
@@ -52,34 +52,31 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
 ## Contents
 - [HunyuanVideo: A Systematic Framework For Large Video Generation Model](#hunyuanvideo-a-systematic-framework-for-large-video-generation-model)
-  - [
-  - [
-  - [Community Contributions](#-community-contributions)
-  - [Open-source Plan](#-open-source-plan)
+  - [News!!](#news)
+  - [Open-source Plan](#open-source-plan)
   - [Contents](#contents)
   - [**Abstract**](#abstract)
   - [**HunyuanVideo Overall Architecture**](#hunyuanvideo-overall-architecture)
-  - [
+  - [**HunyuanVideo Key Features**](#hunyuanvideo-key-features)
   - [**Unified Image and Video Generative Architecture**](#unified-image-and-video-generative-architecture)
   - [**MLLM Text Encoder**](#mllm-text-encoder)
   - [**3D VAE**](#3d-vae)
   - [**Prompt Rewrite**](#prompt-rewrite)
-  - [
-  - [
-  - [
+  - [Comparisons](#comparisons)
+  - [Requirements](#requirements)
+  - [Dependencies and Installation](#dependencies-and-installation)
   - [Installation Guide for Linux](#installation-guide-for-linux)
-  - [
-  - [
+  - [Download Pretrained Models](#download-pretrained-models)
+  - [Single-gpu Inference](#single-gpu-inference)
   - [Using Command Line](#using-command-line)
   - [Run a Gradio Server](#run-a-gradio-server)
   - [More Configurations](#more-configurations)
-  - [
+  - [Parallel Inference on Multiple GPUs by xDiT](#parallel-inference-on-multiple-gpus-by-xdit)
   - [Using Command Line](#using-command-line-1)
-  - [
+  - [FP8 Inference](#fp8-inference)
   - [Using Command Line](#using-command-line-2)
-  - [
+  - [BibTeX](#bibtex)
   - [Acknowledgements](#acknowledgements)
-  - [Star History](#star-history)
 ---
@@ -105,7 +102,7 @@ the 3D VAE decoder.
-##
+## **HunyuanVideo Key Features**
 ### **Unified Image and Video Generative Architecture**
@@ -151,7 +148,7 @@ The Prompt Rewrite Model can be directly deployed and inferred using the [Hunyua
-##
+## Comparisons
 To evaluate the performance of HunyuanVideo, we selected five strong baselines from closed-source video generation models. In total, we utilized 1,533 text prompts, generating an equal number of video samples with HunyuanVideo in a single run. For a fair comparison, we conducted inference only once, avoiding any cherry-picking of results. When comparing with the baseline methods, we maintained the default settings for all selected models, ensuring consistent video resolution. Videos were assessed on three criteria: Text Alignment, Motion Quality, and Visual Quality. More than 60 professional evaluators performed the evaluation. Notably, HunyuanVideo demonstrated the best overall performance, particularly excelling in motion quality. Note that the evaluation is based on HunyuanVideo's high-quality version, which differs from the currently released fast version.
@@ -187,7 +184,7 @@ To evaluate the performance of HunyuanVideo, we selected five strong baselines f
-##
+## Requirements
 The following table shows the requirements for running the HunyuanVideo model (batch size = 1) to generate videos:
@@ -204,7 +201,7 @@ The following table shows the requirements for running HunyuanVideo model (batch
-##
+## Dependencies and Installation
 Begin by cloning the repository:
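For orientation, the installation flow this section introduces can be sketched as follows; the conda environment name, Python version, and `requirements.txt` path are assumptions rather than values taken from this excerpt:

```bash
# Clone the repository referenced throughout this README
git clone https://github.com/Tencent/HunyuanVideo.git
cd HunyuanVideo

# Create an isolated environment and install the Python dependencies
# (environment name, Python version, and requirements file are assumed here;
# follow the Installation Guide for Linux for the exact, supported steps)
conda create -n HunyuanVideo python=3.10 -y
conda activate HunyuanVideo
pip install -r requirements.txt
```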
@@ -273,13 +270,14 @@ docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyua
 ```
-
+
+## Download Pretrained Models
 Details on downloading the pretrained models are shown [here](ckpts/README.md).
-##
+## Single-gpu Inference
 We list the height/width/frame settings we support in the following table.
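A minimal end-to-end sketch of the two steps above, a checkpoint download followed by a single-GPU run; the `--local-dir` layout and the sampling flags are illustrative assumptions, with `ckpts/README.md` and the More Configurations section as the authoritative references:

```bash
# Fetch the pretrained checkpoints into ./ckpts (assumed layout; see ckpts/README.md)
huggingface-cli download tencent/HunyuanVideo --local-dir ./ckpts

# Single-GPU text-to-video sampling with the sample_video.py entry point;
# resolution, length, and prompt below are example values only
python3 sample_video.py \
    --video-size 720 1280 \
    --video-length 129 \
    --prompt "A cat walks on the grass, realistic style." \
    --save-path ./results
```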
@@ -331,7 +329,7 @@ We list some more useful configurations for easy usage:
-##
+## Parallel Inference on Multiple GPUs by xDiT
 [xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
 It has successfully provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, SD3, etc. This repo adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.
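As a rough illustration of the xDiT-based multi-GPU path, assuming a torchrun launch and eight GPUs; only `--ulysses-degree` and `--ring-degree` are taken from this README, and the remaining flags mirror the single-GPU example above:

```bash
# Multi-GPU parallel inference via xDiT / Unified Sequence Parallelism (assumed torchrun launcher);
# the product of --ulysses-degree and --ring-degree typically matches the number of GPUs
torchrun --nproc_per_node=8 sample_video.py \
    --video-size 720 1280 \
    --video-length 129 \
    --prompt "A cat walks on the grass, realistic style." \
    --ulysses-degree 4 \
    --ring-degree 2 \
    --save-path ./results
```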
@@ -416,7 +414,7 @@ You can change the `--ulysses-degree` and `--ring-degree` to control the parallel
-##
+## FP8 Inference
 HunyuanVideo can be run with FP8 quantized weights, which saves about 10 GB of GPU memory. You can download the [weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) and [weight scales](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt) from Hugging Face.
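A hedged sketch of running with the FP8 checkpoint linked above, assuming it has been downloaded under `ckpts/`; the `--dit-weight` and `--use-fp8` flags are assumptions, so check the full FP8 Inference section for the exact options:

```bash
# Inference with the FP8-quantized transformer weights (file path from the Hugging Face links above);
# --dit-weight and --use-fp8 are assumed flag names, not copied from this excerpt
python3 sample_video.py \
    --dit-weight ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt \
    --use-fp8 \
    --video-size 720 1280 \
    --video-length 129 \
    --prompt "A cat walks on the grass, realistic style." \
    --save-path ./results
```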
@@ -446,7 +444,7 @@ python3 sample_video.py \
-##
+## BibTeX
 If you find [HunyuanVideo](https://arxiv.org/abs/2412.03603) useful for your research and applications, please cite using this BibTeX: