THUDM/CogVideoX-2b

@@ -19,7 +19,7 @@ inference: false
   </div>
   <p align="center">
   <a href="https://huggingface.co/THUDM/CogVideoX-2b/blob/main/README_zh.md">📄 中文阅读</a> |
-  <a href="https://github.com/THUDM/CogVideo">🌐 Github</a> |
   <a href="#">📜 arxiv (coming soon) </a>
 </p>
@@ -87,18 +87,20 @@ inference: false
 CogVideoX is an open-source video generation model that shares the same origins as [清影](https://chatglm.cn/video).
 The table below provides a list of the video generation models we currently offer, along with their basic information.
-| Model Name                                 | CogVideoX-2B (Current Repos)                     |
-|--------------------------------------------|--------------------------------------------------|
-| Supported Prompt Language                  | English                                          |
 | GPU Memory Required for Inference          | 36GB (will be optimized before the PR is merged) |
-| GPU Memory Required for Fine-tuning (bs=1) | 46.2GB                                           |
-| Prompt Length                              | 226 Tokens                                       |
-| Video Length                               | 6 seconds                                        |
-| Frames Per Second                          | 8 frames                                         |
-| Resolution                                 | 720 * 480                                        |
-| Positional Embeddings                      | Sinusoidal                                       |
-| Quantized Inference                        | Not Supported                                    |
-| Multi-card Inference                       | Not Supported                                    |
 ## Quick Start 🤗

   </div>
   <p align="center">
   <a href="https://huggingface.co/THUDM/CogVideoX-2b/blob/main/README_zh.md">📄 中文阅读</a> |
+  <a href="https://github.com/THUDM/CogVideo">🌐 GitHub (with PDF paper)</a> |
   <a href="#">📜 arxiv (coming soon) </a>
 </p>
 CogVideoX is an open-source video generation model that shares the same origins as [清影](https://chatglm.cn/video).
 The table below provides a list of the video generation models we currently offer, along with their basic information.
+| Model Name                                 | CogVideoX-2B (Current Repos)                  |
+|--------------------------------------------|-----------------------------------------------|
+| Supported Prompt Language                  | English                                       |
 | GPU Memory Required for Inference          | 36GB (will be optimized before the PR is merged) |
+| GPU Memory Required for Fine-tuning (bs=1) | 42GB                                          |
+| Prompt Length                              | 226 Tokens                                    |
+| Video Length                               | 6 seconds                                     |
+| Frames Per Second                          | 8 frames                                      |
+| Resolution                                 | 720 * 480                                     |
+| Positional Embeddings                      | Sinusoidal                                    |
+| Quantized Inference                        | Not Supported                                 |
+| Multi-card Inference                       | Not Supported                                 |
+**Note** Running inference with the [SAT](https://github.com/THUDM/SwissArmyTransformer) version of the model requires only 18GB of GPU memory. See our GitHub repository for details.
 ## Quick Start 🤗

README_zh.md CHANGED

@@ -6,7 +6,7 @@
   </div>
   <p align="center">
   <a href="https://huggingface.co/THUDM/CogVideoX-2b/blob/main/README.md">📄 Read in English</a> |
-  <a href="https://github.com/THUDM/CogVideo">🌐 Github</a> |
   <a href="#">📜 arxiv (coming soon) </a>
 </p>
@@ -77,7 +77,7 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生
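 |--------------------------------------------|--------------------------------------------------|
 | Prompt Language                            | English                                          |
 | GPU Memory Required for Inference          | 36GB (will be optimized before the PR is merged) |
-| GPU Memory Required for Fine-tuning (bs=1) | 46.2GB                                           |
 | Maximum Prompt Length                      | 226 Tokens                                       |
 | Video Length                               | 6 seconds                                        |
 | Frames Per Second                          | 8 frames                                         |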
@@ -86,6 +86,8 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生
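 | Quantized Inference                        | Not Supported                                    |
 | Multi-card Inference                       | Not Supported                                    |
 ## Quick Start 🤗
 This model can now be deployed with the Hugging Face diffusers library. Follow the steps below to deploy it.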

   </div>
   <p align="center">
   <a href="https://huggingface.co/THUDM/CogVideoX-2b/blob/main/README.md">📄 Read in English</a> |
+  <a href="https://github.com/THUDM/CogVideo">🌐 GitHub (with PDF paper)</a> |
   <a href="#">📜 arxiv (coming soon) </a>
 </p>
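 |--------------------------------------------|--------------------------------------------------|
 | Prompt Language                            | English                                          |
 | GPU Memory Required for Inference          | 36GB (will be optimized before the PR is merged) |
+| GPU Memory Required for Fine-tuning (bs=1) | 42GB                                             |
 | Maximum Prompt Length                      | 226 Tokens                                       |
 | Video Length                               | 6 seconds                                        |
 | Frames Per Second                          | 8 frames                                         |
 | Quantized Inference                        | Not Supported                                    |
 | Multi-card Inference                       | Not Supported                                    |
+**Note** Running inference on the SAT version of the model with [SAT](https://github.com/THUDM/SwissArmyTransformer) requires only 18GB of GPU memory. See our GitHub repository for details.
 ## Quick Start 🤗
 This model can now be deployed with the Hugging Face diffusers library. Follow the steps below to deploy it.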