StevenZhang committed
Commit 7200bf3 · Parent(s): ea3c0e6
Update README.md

README.md CHANGED
@@ -45,16 +45,14 @@ widgets:
 
 # Video-to-Video
 
-
 
 
-
-
+**MS-Vid2Vid-XL** aims to improve the spatiotemporal continuity and resolution of video generation. It serves as the second stage of I2VGen-XL to generate 720P videos, and it can also be used for tasks such as text-to-video synthesis and high-definition video conversion. Its training data comprises a curated, large-scale collection of high-definition videos and images (shortest side >= 720), allowing it to upscale low-resolution videos to a higher resolution (1280 * 720); it can handle videos of almost any resolution, though 16:9 widescreen input is recommended.
 <center>
 <p align="center">
 <img src="https://huggingface.co/damo-vilab/MS-Vid2Vid-XL/resolve/main/assets/images/Fig_1.png"/>
 <br/>
-Fig.1
+Fig.1 MS-Vid2Vid-XL
 </p></center>
 
 
@@ -64,11 +63,8 @@ The **MS-Vid2Vid** project is developed and trained by Damo Academy and is prima
 ## 模型介绍 (Introduction)
 
 
-**MS-
-The right side shows the high-resolution result (1280 * 720); overall it is much smoother, and in many cases it has a strong corrective effect.
-
 
-**MS-Vid2Vid-XL**
+**MS-Vid2Vid-XL**, like the first stage of I2VGen-XL, is a latent video diffusion model (VLDM), and the two stages share a spatiotemporal UNet (ST-UNet) with the same structure. Its design follows our in-house [VideoComposer](https://videocomposer.github.io); for details, please refer to its technical report.
 
 
 <center>
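The paragraph added in this commit describes the model's usage-facing capability: second-stage 720P generation and upscaling of low-resolution clips. Below is a minimal sketch of how a ModelScope-hosted video-to-video model of this kind is typically invoked. The task name `'video-to-video'`, the model ID `'damo/Video-to-Video'`, the input keys, and the `output_video` argument are assumptions drawn from the common ModelScope pipeline pattern, not from this README; check the official model card for the exact values.

```python
# Hypothetical usage sketch for MS-Vid2Vid-XL via ModelScope (pip install modelscope).
# Task name, model ID, input keys, and the output_video kwarg are assumptions;
# verify them against the official model card before use.
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

# Build a video-to-video pipeline; weights are fetched on first use.
pipe = pipeline(task='video-to-video', model='damo/Video-to-Video')

# A low-resolution clip plus a text description of its content.
# MS-Vid2Vid-XL upscales toward 1280x720; 16:9 input is recommended.
inputs = {
    'video_path': 'input_low_res.mp4',           # hypothetical local file
    'text': 'A panda eating bamboo on a rock.',  # hypothetical caption
}

result = pipe(inputs, output_video='output_720p.mp4')
print(result[OutputKeys.OUTPUT_VIDEO])  # path of the generated 720P video
```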
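The added introduction describes a latent-space ST-UNet shared by both stages. As a purely illustrative sketch, the block below shows the factorized "spatial attention within each frame, then temporal attention at each spatial location" pattern that spatiotemporal UNets of this kind commonly use; it is not the actual ST-UNet or VideoComposer code, and every dimension and layer choice is an assumption.

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Illustrative sketch of a factorized spatiotemporal block.

    Not the actual ST-UNet: it only demonstrates the common pattern of
    attending over space within each frame, then over time at each
    spatial location, with residual connections around both steps.
    """

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, spatial_positions, dim) latent video features.
        b, f, s, d = x.shape

        # Spatial attention: tokens are the positions within one frame.
        xs = self.norm_s(x).reshape(b * f, s, d)
        attn_s, _ = self.spatial_attn(xs, xs, xs)
        x = x + attn_s.reshape(b, f, s, d)

        # Temporal attention: tokens are the frames at one spatial location.
        xt = self.norm_t(x).permute(0, 2, 1, 3).reshape(b * s, f, d)
        attn_t, _ = self.temporal_attn(xt, xt, xt)
        x = x + attn_t.reshape(b, s, f, d).permute(0, 2, 1, 3)
        return x

# Toy check: 2 clips, 16 frames, an 8x8 latent grid, 320 channels.
feats = torch.randn(2, 16, 64, 320)
out = SpatioTemporalBlock(dim=320)(feats)
print(out.shape)  # torch.Size([2, 16, 64, 320])
```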