StevenZhang committed
Commit 7200bf3 · 1 Parent(s): ea3c0e6

Update README.md

Files changed (1)
  1. README.md +5 -8
README.md CHANGED
@@ -45,16 +45,15 @@ widgets:
 
 # Video-to-Video
 
- **MS-Vid2Vid** was developed and trained by DAMO Academy, primarily to improve the resolution and spatiotemporal continuity of text-to-video and image-to-video generation. Its training data comprises a large curated collection of high-definition videos and images (shortest side > 720). It can upscale low-resolution (16:9) videos to a higher resolution (1280 × 720) and can be applied to super-resolution of videos at arbitrary low resolutions. On this page we refer to it as **MS-Vid2Vid-XL**.
-
+ **MS-Vid2Vid-XL** aims to improve the spatiotemporal continuity and resolution of generated videos. It serves as the second stage of I2VGen-XL to produce 720P output, and it can also be used for tasks such as text-to-video synthesis and high-definition video conversion. Its training data comprises a large curated collection of high-definition videos and images (shortest side >= 720), which lets it upscale low-resolution videos to a higher resolution (1280 × 720); it can handle videos of almost any resolution (16:9 wide videos are recommended).
 <center>
 <p align="center">
 <img src="https://huggingface.co/damo-vilab/MS-Vid2Vid-XL/resolve/main/assets/images/Fig_1.png"/>
 <br/>
- Fig.1 Video-to-Video-XL
+ Fig.1 MS-Vid2Vid-XL
 <p></center>
 
 
@@ -64,11 +63,9 @@ The **MS-Vid2Vid** project is developed and trained by Damo Academy and is prima
 ## Introduction
 
 
- **MS-Vid2Vid-XL** is designed on top of Stable Diffusion, with design details inherited from our in-house [VideoComposer](https://videocomposer.github.io); please refer to its technical report for specifics. In the example below, the left side is low resolution (448 × 256), with jittery details and poor temporal consistency, while the right side is high resolution (1280 × 720) and much smoother overall; in many cases the model shows a strong ability to correct such artifacts.
-
+ **MS-Vid2Vid-XL** is built on the same latent video diffusion model (VLDM) as the first stage of I2VGen-XL, and the two share a spatiotemporal UNet (ST-UNet) with an identical structure. Its design details are inherited from our in-house [VideoComposer](https://videocomposer.github.io); please refer to its technical report for specifics.
 
 
 <center>
 
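Although this commit only touches prose, models in this family are normally driven through ModelScope's generic pipeline API. Below is a minimal, hypothetical sketch of such an upscaling call; the task name (`'video-to-video'`), model ID (`'damo/Video-to-Video'`), input keys, and the `output_video` argument are assumptions not stated in this commit, so check the model card for the exact interface.

```python
# Hypothetical sketch: upscaling a low-resolution clip with a Vid2Vid
# model through ModelScope's generic pipeline API. Task name, model ID,
# and input keys are assumptions -- consult the model card to confirm.
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

# Build the pipeline; weights are downloaded on first use.
vid2vid = pipeline(task='video-to-video', model='damo/Video-to-Video')

# A low-resolution 16:9 clip plus a text description of its content.
inputs = {
    'video_path': 'input_448x256.mp4',  # hypothetical local file
    'text': 'A red panda eating bamboo in the forest',
}

# The result dict exposes the path of the upscaled 1280x720 video.
result = vid2vid(inputs, output_video='output_1280x720.mp4')
print(result[OutputKeys.OUTPUT_VIDEO])
```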
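The ST-UNet mentioned in the new introduction is not spelled out in this commit. As a rough illustration of the factorized spatial/temporal design that VideoComposer-style video UNets typically use, here is a hypothetical sketch of one block; all module names, shapes, and the residual layout are assumptions, not the released architecture.

```python
# Hypothetical sketch of a factorized spatiotemporal block of the kind
# used in latent video diffusion UNets (ST-UNet). Illustrative only.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # Spatial mixing: a 2D convolution applied to each frame.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Temporal mixing: self-attention across frames at each pixel.
        self.norm = nn.LayerNorm(channels)
        self.temporal = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width) latent video tensor.
        b, c, f, h, w = x.shape

        # Spatial pass: fold frames into the batch axis, conv each frame.
        xs = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        xs = self.spatial(xs).reshape(b, f, c, h, w).permute(0, 2, 1, 3, 4)
        x = x + xs

        # Temporal pass: fold pixels into the batch axis, attend over frames.
        xt = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, f, c)
        xt, _ = self.temporal(self.norm(xt), self.norm(xt), self.norm(xt))
        xt = xt.reshape(b, h, w, f, c).permute(0, 4, 3, 1, 2)
        return x + xt


# Example: 2 clips, 64 latent channels, 16 frames at 32x32 latent resolution.
block = SpatioTemporalBlock(channels=64)
video = torch.randn(2, 64, 16, 32, 32)
print(block(video).shape)  # torch.Size([2, 64, 16, 32, 32])
```

Factorizing the two axes this way keeps attention cost linear in the number of pixels (each attention call only spans the frame axis), which is what makes 720P latent grids tractable.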