benjamin-paine committed
Commit babeecd
1 Parent(s): 03e71d4

Update README.md

Files changed (1):
  1. README.md +32 -240
README.md CHANGED
@@ -1,35 +1,39 @@
  ---
  license: apache-2.0
  ---
- This repository contains a pruned and partially reorganized version of [AniPortrait](https://fudan-generative-vision.github.io/champ/#/).

  ```
- @misc{wei2024aniportrait,
-   title={AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations},
-   author={Huawei Wei and Zejun Yang and Zhisheng Wang},
    year={2024},
-   eprint={2403.17694},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
  }
  ```

- # Usage

- ## Installation

- First, install the AniPortrait package into your Python environment. If you're creating a new environment for AniPortrait, be sure to also specify a version of torch with CUDA support; otherwise, inference will run on CPU only.

  ```sh
- pip install git+https://github.com/painebenjamin/aniportrait.git
  ```

  Now, you can create the pipeline, automatically pulling the weights from this repository, either as individual models:

  ```py
- from aniportrait import AniPortraitPipeline
- pipeline = AniPortraitPipeline.from_pretrained(
-     "benjamin-paine/aniportrait",
      torch_dtype=torch.float16,
      variant="fp16",
      device="cuda"
@@ -39,242 +43,30 @@ pipeline = AniPortraitPipeline.from_pretrained(
  Or, as a single file:

  ```py
- from aniportrait import AniPortraitPipeline
- pipeline = AniPortraitPipeline.from_single_file(
-     "benjamin-paine/aniportrait",
      torch_dtype=torch.float16,
      variant="fp16",
      device="cuda"
  ).to("cuda", dtype=torch.float16)
  ```

- The `AniPortraitPipeline` is a mega pipeline, capable of instantiating and executing other pipelines. It provides the following functions:
-
- ## Workflows
-
- ### img2img
-
- ```py
- pipeline.img2img(
-     reference_image: PIL.Image.Image,
-     pose_reference_image: PIL.Image.Image,
-     num_inference_steps: int,
-     guidance_scale: float,
-     eta: float=0.0,
-     reference_pose_image: Optional[Image.Image]=None,
-     generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
-     output_type: Optional[str]="pil",
-     return_dict: bool=True,
-     callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
-     callback_steps: Optional[int]=None,
-     width: Optional[int]=None,
-     height: Optional[int]=None,
-     **kwargs: Any
- ) -> Pose2VideoPipelineOutput
- ```
-
- Using a reference image (for structure) and a pose reference image (for pose), render an image of the former in the pose of the latter.
- - The pose reference image here is an unprocessed image, from which the face pose will be extracted.
- - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
-
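For clarity, here is a hypothetical usage sketch of `img2img`, based only on the signature above. The pipeline construction repeats the Installation snippet; the file names and the sampling values (`num_inference_steps=25`, `guidance_scale=3.5`) are illustrative placeholders, not values prescribed by the project.

```py
import torch
from PIL import Image
from aniportrait import AniPortraitPipeline

pipeline = AniPortraitPipeline.from_pretrained(
    "benjamin-paine/aniportrait",
    torch_dtype=torch.float16,
    variant="fp16",
    device="cuda"
).to("cuda", dtype=torch.float16)

# Render the person in portrait.png with the face pose found in pose_source.png.
result = pipeline.img2img(
    reference_image=Image.open("portrait.png").convert("RGB"),
    pose_reference_image=Image.open("pose_source.png").convert("RGB"),
    num_inference_steps=25,
    guidance_scale=3.5,
)
# `result` is a Pose2VideoPipelineOutput, per the return annotation above.
```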
- ### vid2vid
-
- ```py
- pipeline.vid2vid(
-     reference_image: PIL.Image.Image,
-     pose_reference_images: List[PIL.Image.Image],
-     num_inference_steps: int,
-     guidance_scale: float,
-     eta: float=0.0,
-     reference_pose_image: Optional[Image.Image]=None,
-     generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
-     output_type: Optional[str]="pil",
-     return_dict: bool=True,
-     callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
-     callback_steps: Optional[int]=None,
-     width: Optional[int]=None,
-     height: Optional[int]=None,
-     video_length: Optional[int]=None,
-     context_schedule: str="uniform",
-     context_frames: int=16,
-     context_overlap: int=4,
-     context_batch_size: int=1,
-     interpolation_factor: int=1,
-     use_long_video: bool=True,
-     **kwargs: Any
- ) -> Pose2VideoPipelineOutput
- ```
-
- Using a reference image (for structure) and a sequence of pose reference images (for pose), render a video of the former in the poses of the latter, using context windowing for long-video generation when the poses are longer than 16 frames.
- - Optionally pass `use_long_video=False` to disable using the long video pipeline.
- - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
- - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose reference images.
-
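A hypothetical `vid2vid` sketch follows, assuming `pipeline` was already created as in the Installation section; the frame paths and sampling values are placeholders.

```py
from PIL import Image

# Assumes `pipeline` was created as shown in the Installation section.
# Placeholder: the driving video has already been exported as numbered PNG frames.
pose_reference_images = [
    Image.open(f"driving_frames/{i:04d}.png").convert("RGB") for i in range(48)
]

result = pipeline.vid2vid(
    reference_image=Image.open("portrait.png").convert("RGB"),
    pose_reference_images=pose_reference_images,
    num_inference_steps=25,
    guidance_scale=3.5,
    context_frames=16,    # defaults shown in the signature above
    context_overlap=4,
    use_long_video=True,  # context windowing applies past 16 frames
)
```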
- ### audio2vid
-
- ```py
- pipeline.audio2vid(
-     audio: str,
-     reference_image: PIL.Image.Image,
-     num_inference_steps: int,
-     guidance_scale: float,
-     fps: int=30,
-     eta: float=0.0,
-     reference_pose_image: Optional[Image.Image]=None,
-     pose_reference_images: Optional[List[PIL.Image.Image]]=None,
-     generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
-     output_type: Optional[str]="pil",
-     return_dict: bool=True,
-     callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
-     callback_steps: Optional[int]=None,
-     width: Optional[int]=None,
-     height: Optional[int]=None,
-     video_length: Optional[int]=None,
-     context_schedule: str="uniform",
-     context_frames: int=16,
-     context_overlap: int=4,
-     context_batch_size: int=1,
-     interpolation_factor: int=1,
-     use_long_video: bool=True,
-     **kwargs: Any
- ) -> Pose2VideoPipelineOutput
- ```
-
- Using an audio file, draw `fps` face pose images per second for the duration of the audio. Then, using those face pose images, render a video.
- - Optionally include a list of images to extract the poses from prior to merging with audio-generated poses (in essence, pass a video here to control non-speech motion). The default is a moderately active loop of head movement.
- - Optionally pass width/height to modify the size. Defaults to reference image size.
- - Optionally pass `use_long_video=False` to disable using the long video pipeline.
- - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
- - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose reference images.
-
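A hypothetical `audio2vid` sketch, again assuming `pipeline` already exists; the audio path and sampling values are placeholders.

```py
from PIL import Image

# Assumes `pipeline` was created as shown in the Installation section.
# Drive the portrait with speech audio; pose_reference_images is omitted,
# so the default head-movement loop described above supplies non-speech motion.
result = pipeline.audio2vid(
    audio="speech.wav",
    reference_image=Image.open("portrait.png").convert("RGB"),
    num_inference_steps=25,
    guidance_scale=3.5,
    fps=30,
)
```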
- ## Internals/Helpers
-
- ### img2pose
-
- ```py
- pipeline.img2pose(
-     reference_image: PIL.Image.Image,
-     width: Optional[int]=None,
-     height: Optional[int]=None
- ) -> PIL.Image.Image
- ```
-
- Detects face landmarks in an image and draws a face pose image.
- - Optionally modify the original width and height.
-
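For example (a hypothetical sketch with a placeholder file name, assuming `pipeline` already exists):

```py
from PIL import Image

# Extract face landmarks from a photo and save the drawn pose image.
pose_image = pipeline.img2pose(Image.open("portrait.png").convert("RGB"))
pose_image.save("portrait_pose.png")
```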
- ### vid2pose
-
- ```py
- pipeline.vid2pose(
-     reference_image: PIL.Image.Image,
-     retarget_image: Optional[PIL.Image.Image],
-     width: Optional[int]=None,
-     height: Optional[int]=None
- ) -> List[PIL.Image.Image]
- ```
-
- Detects face landmarks in a series of images and draws pose images.
- - Optionally modify the original width and height.
- - Optionally retarget to a different face position, useful for video-to-video tasks.
-
- ### audio2pose

  ```py
- pipeline.audio2pose(
-     audio_path: str,
-     fps: int=30,
-     reference_image: Optional[PIL.Image.Image]=None,
-     pose_reference_images: Optional[List[PIL.Image.Image]]=None,
-     width: Optional[int]=None,
-     height: Optional[int]=None
- ) -> List[PIL.Image.Image]
  ```

- Using an audio file, draw `fps` face pose images per second for the duration of the audio.
- - Optionally include a reference image to extract the face shape and initial position from. Default has a generic androgynous face shape.
- - Optionally include a list of images to extract the poses from prior to merging with audio-generated poses (in essence, pass a video here to control non-speech motion). The default is a moderately active loop of head movement.
- - Optionally pass width/height to modify the size. Defaults to reference image size, then pose image sizes, then 256.
-
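A hypothetical sketch showing how `audio2pose` composes with `pose2vid` (documented below); the paths and sampling values are placeholders, and `pipeline` is assumed to exist.

```py
from PIL import Image

# Assumes `pipeline` was created as shown in the Installation section.
reference_image = Image.open("portrait.png").convert("RGB")

# Draw 30 pose frames per second of audio, using the reference face shape.
pose_images = pipeline.audio2pose(
    audio_path="speech.wav",
    fps=30,
    reference_image=reference_image,
)

# Render the processed pose frames with pose2vid (see below).
result = pipeline.pose2vid(
    reference_image=reference_image,
    pose_images=pose_images,
    num_inference_steps=25,
    guidance_scale=3.5,
)
```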
- ### pose2img
-
- ```py
- pipeline.pose2img(
-     reference_image: PIL.Image.Image,
-     pose_image: PIL.Image.Image,
-     num_inference_steps: int,
-     guidance_scale: float,
-     eta: float=0.0,
-     reference_pose_image: Optional[Image.Image]=None,
-     generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
-     output_type: Optional[str]="pil",
-     return_dict: bool=True,
-     callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
-     callback_steps: Optional[int]=None,
-     width: Optional[int]=None,
-     height: Optional[int]=None,
-     **kwargs: Any
- ) -> Pose2VideoPipelineOutput
- ```
-
- Using a reference image (for structure) and a pose image (for pose), render an image of the former in the pose of the latter.
- - The pose image here is a processed face pose. To pass a non-processed face pose, see `img2img`.
- - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
-
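For instance, pairing `pose2img` with `img2pose` from above (a hypothetical sketch with placeholder file names and sampling values, assuming `pipeline` already exists):

```py
from PIL import Image

# Draw a processed face pose from one photo, then apply it to another.
pose_image = pipeline.img2pose(Image.open("pose_source.png").convert("RGB"))

result = pipeline.pose2img(
    reference_image=Image.open("portrait.png").convert("RGB"),
    pose_image=pose_image,
    num_inference_steps=25,
    guidance_scale=3.5,
)
```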
- ### pose2vid
-
- ```py
- pipeline.pose2vid(
-     reference_image: PIL.Image.Image,
-     pose_images: List[PIL.Image.Image],
-     num_inference_steps: int,
-     guidance_scale: float,
-     eta: float=0.0,
-     reference_pose_image: Optional[Image.Image]=None,
-     generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
-     output_type: Optional[str]="pil",
-     return_dict: bool=True,
-     callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
-     callback_steps: Optional[int]=None,
-     width: Optional[int]=None,
-     height: Optional[int]=None,
-     video_length: Optional[int]=None,
-     **kwargs: Any
- ) -> Pose2VideoPipelineOutput
- ```
-
- Using a reference image (for structure) and pose images (for pose), render a video of the former in the poses of the latter.
- - The pose images here are processed face poses. To pass non-processed face poses, see `vid2vid`.
- - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
- - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose images.
-
- ### pose2vid_long
-
- ```py
- pipeline.pose2vid_long(
-     reference_image: PIL.Image.Image,
-     pose_images: List[PIL.Image.Image],
-     num_inference_steps: int,
-     guidance_scale: float,
-     eta: float=0.0,
-     reference_pose_image: Optional[Image.Image]=None,
-     generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
-     output_type: Optional[str]="pil",
-     return_dict: bool=True,
-     callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
-     callback_steps: Optional[int]=None,
-     width: Optional[int]=None,
-     height: Optional[int]=None,
-     video_length: Optional[int]=None,
-     context_schedule: str="uniform",
-     context_frames: int=16,
-     context_overlap: int=4,
-     context_batch_size: int=1,
-     interpolation_factor: int=1,
-     **kwargs: Any
- ) -> Pose2VideoPipelineOutput
- ```

- Using a reference image (for structure) and pose images (for pose), render a video of the former in the poses of the latter, using context windowing for long-video generation.
- - The pose images here are processed face poses. To pass non-processed face poses, see `vid2vid`.
- - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
- - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose images.

  ---
  license: apache-2.0
  ---
+ This repository contains a pruned and partially reorganized version of [CHAMP](https://fudan-generative-vision.github.io/champ/#/).

  ```
+ @misc{zhu2024champ,
+   title={Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance},
+   author={Shenhao Zhu and Junming Leo Chen and Zuozhuo Dai and Yinghui Xu and Xun Cao and Yao Yao and Hao Zhu and Siyu Zhu},
    year={2024},
+   eprint={2403.14781},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
  }
  ```

+ <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/wZku1I_4L4VwWeXXKgXqb.mp4"></video>
+
+ Video credit: [Polina Tankilevitch, Pexels](https://www.pexels.com/video/a-young-woman-dancing-hip-hop-3873100/)
+
+ Image credit: [Andrea Piacquadio, Pexels](https://www.pexels.com/photo/man-in-black-jacket-wearing-black-headphones-3831645/)

+ # Usage

+ First, install the CHAMP package into your Python environment. If you're creating a new environment for CHAMP, be sure to also specify a version of torch with CUDA support; otherwise, inference will run on CPU only.

  ```sh
+ pip install git+https://github.com/painebenjamin/champ.git
  ```

  Now, you can create the pipeline, automatically pulling the weights from this repository, either as individual models:

  ```py
+ from champ import CHAMPPipeline
+ pipeline = CHAMPPipeline.from_pretrained(
+     "benjamin-paine/champ",
      torch_dtype=torch.float16,
      variant="fp16",
      device="cuda"
@@ -39,242 +43,30 @@
  Or, as a single file:

  ```py
+ from champ import CHAMPPipeline
+ pipeline = CHAMPPipeline.from_single_file(
+     "benjamin-paine/champ",
      torch_dtype=torch.float16,
      variant="fp16",
      device="cuda"
  ).to("cuda", dtype=torch.float16)
  ```
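Note that both loading snippets above reference `torch` directly, so the import must be present. A minimal complete loading sketch (assuming only that PyTorch is installed alongside the package) looks like this:

```py
# Minimal loading sketch: identical to the single-file snippet above,
# with the torch import made explicit.
import torch
from champ import CHAMPPipeline

pipeline = CHAMPPipeline.from_single_file(
    "benjamin-paine/champ",
    torch_dtype=torch.float16,
    variant="fp16",
    device="cuda"
).to("cuda", dtype=torch.float16)
```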
 
+ Follow this format for execution:

  ```py
+ result = pipeline(
+     reference: PIL.Image.Image,
+     guidance: Dict[str, List[PIL.Image.Image]],
+     width: int,
+     height: int,
+     video_length: int,
+     num_inference_steps: int,
+     guidance_scale: float
+ ).videos
+ # Result is a list of PIL Images
  ```

+ Starting values for `num_inference_steps` and `guidance_scale` are `20` and `3.5`, respectively.
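For concreteness, a hypothetical invocation using those starting values might look like the following; the width, height, and frame count are placeholders, and assembly of the `guidance` dictionary is sketched after the note on guidance keys below.

```py
from PIL import Image

reference_image = Image.open("reference.png").convert("RGB")

# `guidance` maps condition names to per-frame PIL images; a sketch of how to
# assemble it appears after the guidance-keys note below.
frames = pipeline(
    reference=reference_image,
    guidance=guidance,
    width=512,
    height=512,
    video_length=16,
    num_inference_steps=20,
    guidance_scale=3.5,
).videos

# The result is a list of PIL images; save them as numbered frames.
for i, frame in enumerate(frames):
    frame.save(f"frame_{i:04d}.png")
```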

+ Guidance keys include `depth`, `normal`, `dwpose`, and `semantic_map` (DensePose). This guide does not provide details on how to obtain those samples, but examples are available in [the git repository](https://github.com/painebenjamin/champ/tree/master/example).
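As a hedged illustration of the expected structure, the sketch below builds the `guidance` dictionary used in the invocation sketch above from per-frame images that have already been exported to disk; the directory layout and file format are hypothetical, not something the repository prescribes.

```py
from pathlib import Path
from PIL import Image

# Hypothetical layout: one sub-directory per guidance type, each holding ordered frames.
GUIDANCE_DIR = Path("guidance")
GUIDANCE_TYPES = ["depth", "normal", "dwpose", "semantic_map"]

guidance = {
    kind: [
        Image.open(path).convert("RGB")
        for path in sorted((GUIDANCE_DIR / kind).glob("*.png"))
    ]
    for kind in GUIDANCE_TYPES
}
```

Presumably each list should contain the same number of frames and align with `video_length`; the repository's example directory is the authoritative reference for preparing these inputs.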