meow committed on
Commit
b7efb56
•
1 Parent(s): d6d3a5b
Files changed (1)
  1. README.md +10 -462
README.md CHANGED
@@ -1,462 +1,10 @@
1
- # MDM: Human Motion Diffusion Model
2
-
3
-
4
5
-
6
-
7
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanact12)](https://paperswithcode.com/sota/motion-synthesis-on-humanact12?p=human-motion-diffusion-model)
8
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=human-motion-diffusion-model)
9
- [![arXiv](https://img.shields.io/badge/arXiv-2209.14916-b31b1b.svg)](https://arxiv.org/abs/2209.14916)
10
-
11
- <a href="https://replicate.com/arielreplicate/motion_diffusion_model"><img src="https://replicate.com/arielreplicate/motion_diffusion_model/badge"></a>
12
-
13
- The official PyTorch implementation of the paper [**"Human Motion Diffusion Model"**](https://arxiv.org/abs/2209.14916).
14
-
15
- Please visit our [**webpage**](https://guytevet.github.io/mdm-page/) for more details.
16
-
17
- ![teaser](https://github.com/GuyTevet/mdm-page/raw/main/static/figures/github.gif)
18
-
19
- #### Bibtex
20
- If you find this code useful in your research, please cite:
21
-
22
- ```
23
- @article{tevet2022human,
24
- title={Human Motion Diffusion Model},
25
- author={Tevet, Guy and Raab, Sigal and Gordon, Brian and Shafir, Yonatan and Bermano, Amit H and Cohen-Or, Daniel},
26
- journal={arXiv preprint arXiv:2209.14916},
27
- year={2022}
28
- }
29
- ```
30
-
31
- ## News
32
-
33
- 📢 **23/Nov/22** - Fixed evaluation issue (#42) - Please pull and run `bash prepare/download_t2m_evaluators.sh` from the top of the repo to adapt.
34
-
35
- 📢 **4/Nov/22** - Added sampling, training and evaluation of unconstrained tasks.
- Note slight environment changes for the new code. If you already have an installed environment, run `bash prepare/download_unconstrained_assets.sh; conda install -y -c anaconda scikit-learn` to adapt.
38
-
39
- 📢 **3/Nov/22** - Added in-between and upper-body editing.
40
-
41
- 📢 **31/Oct/22** - Added sampling, training and evaluation of action-to-motion tasks.
42
-
43
- 📢 **9/Oct/22** - Added training and evaluation scripts.
- Note slight environment changes for the new code. If you already have an installed environment, run `bash prepare/download_glove.sh; pip install clearml` to adapt.
45
-
46
- 📢 **6/Oct/22** - First release - sampling and rendering using pre-trained models.
47
-
48
-
49
- ## Getting started
50
-
51
- This code was tested on `Ubuntu 18.04.5 LTS` and requires:
52
-
53
- * Python 3.7
54
- * conda3 or miniconda3
55
- * CUDA-capable GPU (one is enough)
56
-
57
- ### 1. Setup environment
58
-
59
- Install ffmpeg (if not already installed):
60
-
61
- ```shell
62
- sudo apt update
63
- sudo apt install ffmpeg
64
- ```
65
- For Windows, use [this](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) instead.
66
-
67
- Set up the conda env:
68
- ```shell
69
- conda env create -f environment.yml
70
- conda activate mdm
71
- python -m spacy download en_core_web_sm
72
- pip install git+https://github.com/openai/CLIP.git
73
- ```
74
-
75
- Download dependencies:
76
-
77
- <details>
78
- <summary><b>Text to Motion</b></summary>
79
-
80
- ```bash
81
- bash prepare/download_smpl_files.sh
82
- bash prepare/download_glove.sh
83
- bash prepare/download_t2m_evaluators.sh
84
- ```
85
- </details>
86
-
87
- <details>
88
- <summary><b>Action to Motion</b></summary>
89
-
90
- ```bash
91
- bash prepare/download_smpl_files.sh
92
- bash prepare/download_recognition_models.sh
93
- ```
94
- </details>
95
-
96
- <details>
97
- <summary><b>Unconstrained</b></summary>
98
-
99
- ```bash
100
- bash prepare/download_smpl_files.sh
101
- bash prepare/download_recognition_models.sh
102
- bash prepare/download_recognition_unconstrained_models.sh
103
- ```
104
- </details>
105
-
106
- ### 2. Get data
107
-
108
- <details>
109
- <summary><b>Text to Motion</b></summary>
110
-
111
- There are two paths to get the data:
112
-
113
- (a) **Go the easy way** if you just want to generate text-to-motion (this excludes editing, which requires motion capture data)
114
-
115
- (b) **Get full data** to train and evaluate the model.
116
-
117
-
118
- #### a. The easy way (text only)
119
-
120
- **HumanML3D** - Clone HumanML3D, then copy the data dir to our repository:
121
-
122
- ```shell
123
- cd ..
124
- git clone https://github.com/EricGuo5513/HumanML3D.git
125
- unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
126
- cp -r HumanML3D/HumanML3D motion-diffusion-model/dataset/HumanML3D
127
- cd motion-diffusion-model
128
- ```
129
-
130
-
131
- #### b. Full data (text + motion capture)
132
-
133
- **HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git),
134
- then copy the resulting dataset to our repository:
135
-
136
- ```shell
137
- cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
138
- ```
139
-
140
- **KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git) (no processing needed this time) and place the result in `./dataset/KIT-ML` (see the sketch below).
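-
- A minimal sketch, assuming you extracted the KIT-ML download next to this repository (adjust the source path to wherever you actually saved it):
-
- ```shell
- cp -r ../KIT-ML ./dataset/KIT-ML
- ```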
141
- </details>
142
-
143
- <details>
144
- <summary><b>Action to Motion</b></summary>
145
-
146
- **UESTC, HumanAct12**
147
- ```bash
148
- bash prepare/download_a2m_datasets.sh
149
- ```
150
- </details>
151
-
152
- <details>
153
- <summary><b>Unconstrained</b></summary>
154
-
155
- **HumanAct12**
156
- ```bash
157
- bash prepare/download_unconstrained_datasets.sh
158
- ```
159
- </details>
160
-
161
- ### 3. Download the pretrained models
162
-
163
- Download the model(s) you wish to use, then unzip and place them in `./save/`.
164
-
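- For example, from the repo root (the archive name below is a placeholder for whatever file you downloaded from Drive; it should unzip to a folder such as `./save/humanml_trans_enc_512/`):
-
- ```shell
- mkdir -p save
- # replace the placeholder path with the archive you actually downloaded
- unzip /path/to/downloaded_model.zip -d ./save/
- ```
-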
165
- <details>
166
- <summary><b>Text to Motion</b></summary>
167
-
168
- **You need only the first one.**
169
-
170
- **HumanML3D**
171
-
172
- [humanml-encoder-512](https://drive.google.com/file/d/1PE0PK8e5a5j-7-Xhs5YET5U5pGh0c821/view?usp=sharing) (best model)
173
-
174
- [humanml-decoder-512](https://drive.google.com/file/d/1q3soLadvVh7kJuJPd2cegMNY2xVuVudj/view?usp=sharing)
175
-
176
- [humanml-decoder-with-emb-512](https://drive.google.com/file/d/1GnsW0K3UjuOkNkAWmjrGIUmeDDZrmPE5/view?usp=sharing)
177
-
178
- **KIT**
179
-
180
- [kit-encoder-512](https://drive.google.com/file/d/1SHCRcE0es31vkJMLGf9dyLe7YsWj7pNL/view?usp=sharing)
181
-
182
- </details>
183
-
184
- <details>
185
- <summary><b>Action to Motion</b></summary>
186
-
187
- **UESTC**
188
-
189
- [uestc](https://drive.google.com/file/d/1goB2DJK4B-fLu2QmqGWKAqWGMTAO6wQ6/view?usp=sharing)
190
-
191
- [uestc_no_fc](https://drive.google.com/file/d/1fpv3mR-qP9CYCsi9CrQhFqlLavcSQky6/view?usp=sharing)
192
-
193
- **HumanAct12**
194
-
195
- [humanact12](https://drive.google.com/file/d/154X8_Lgpec6Xj0glEGql7FVKqPYCdBFO/view?usp=sharing)
196
-
197
- [humanact12_no_fc](https://drive.google.com/file/d/1frKVMBYNiN5Mlq7zsnhDBzs9vGJvFeiQ/view?usp=sharing)
198
-
199
- </details>
200
-
201
- <details>
202
- <summary><b>Unconstrained</b></summary>
203
-
204
- **HumanAct12**
205
-
206
- [humanact12_unconstrained](https://drive.google.com/file/d/1uG68m200pZK3pD-zTmPXu5XkgNpx_mEx/view?usp=share_link)
207
-
208
- </details>
209
-
210
-
211
- ## Example Usage
212
-
213
-
214
237
-
238
- <details>
239
- <summary><b>Text to Motion</b></summary>
240
-
241
- ### Generate from test set prompts
242
-
243
- ```shell
244
- python -m sample.generate --model_path ./save/humanml_trans_enc_512/model000200000.pt --num_samples 10 --num_repetitions 3
245
- ```
246
-
247
- ### Generate from your text file
248
-
249
- ```shell
250
- python -m sample.generate --model_path ./save/humanml_trans_enc_512/model000200000.pt --input_text ./assets/example_text_prompts.txt
251
- ```
252
-
253
- ### Generate a single prompt
254
-
255
- ```shell
256
- python -m sample.generate --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."
257
- ```
258
- </details>
259
-
260
- <details>
261
- <summary><b>Action to Motion</b></summary>
262
-
263
- ### Generate from test set actions
264
-
265
- ```shell
266
- python -m sample.generate --model_path ./save/humanact12/model000350000.pt --num_samples 10 --num_repetitions 3
267
- ```
268
-
269
- ### Generate from your actions file
270
-
271
- ```shell
272
- python -m sample.generate --model_path ./save/humanact12/model000350000.pt --action_file ./assets/example_action_names_humanact12.txt
273
- ```
274
-
275
- ### Generate a single action
276
-
277
- ```shell
278
- python -m sample.generate --model_path ./save/humanact12/model000350000.pt --text_prompt "drink"
279
- ```
280
- </details>
281
-
282
- <details>
283
- <summary><b>Unconstrained</b></summary>
284
-
285
- ```shell
286
- python -m sample.generate --model_path ./save/unconstrained/model000450000.pt --num_samples 10 --num_repetitions 3
287
- ```
288
-
289
- In total, `num_samples * num_repetitions` motions are generated and visually organized in a grid of `num_samples` rows by `num_repetitions` columns (e.g., `--num_samples 10 --num_repetitions 3` yields 30 motions in a 10x3 grid).
290
-
291
- </details>
292
-
293
- **You may also define:**
294
- * `--device` id.
295
- * `--seed` to sample different prompts.
296
- * `--motion_length` (text-to-motion only) in seconds (the maximum is 9.8 seconds); a combined example follows this list.
297
-
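- For example, the flags above can be combined with any of the generation commands; this sketch reuses the HumanML3D model and prompt shown earlier, and the flag values are purely illustrative:
-
- ```shell
- python -m sample.generate --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox." --motion_length 6.0 --seed 42 --device 0
- ```
-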
298
- **Running those will get you:**
299
-
300
- * `results.npy` - a file with the text prompts and xyz positions of the generated animations.
301
- * `sample##_rep##.mp4` - a stick figure animation for each generated motion.
302
-
303
- It will look something like this:
304
-
305
- ![example](assets/example_stick_fig.gif)
306
-
307
- You can stop here, or render the SMPL mesh using the following script.
308
-
309
- ### Render SMPL mesh
310
-
311
- To create an SMPL mesh per frame, run:
312
-
313
- ```shell
314
- python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
315
- ```
316
-
317
- **This script outputs:**
318
- * `sample##_rep##_smpl_params.npy` - SMPL parameters (thetas, root translations, vertices and faces)
319
- * `sample##_rep##_obj` - Mesh per frame in `.obj` format.
320
-
321
- **Notes:**
322
- * The `.obj` files can be imported into Blender/Maya/3DS-MAX and rendered there.
323
- * This script runs [SMPLify](https://smplify.is.tue.mpg.de/) and also requires a GPU (which can be specified with the `--device` flag).
324
- * **Important** - Do not change the original `.mp4` path before running the script.
325
-
326
- **Notes for 3D makers:**
327
- * You have two ways to animate the sequence:
328
- 1. Use the [SMPL add-on](https://smpl.is.tue.mpg.de/index.html) and the theta parameters saved to `sample##_rep##_smpl_params.npy` (we always use beta=0 and the gender-neutral model).
329
- 1. A more straightforward way is to use the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe the vertex locations.
- Since the OBJs do not preserve vertex order, we also save this data to the `sample##_rep##_smpl_params.npy` file for your convenience.
331
-
332
- ## Motion Editing
333
-
334
- * This feature is available for text-to-motion datasets (HumanML3D and KIT).
335
- * In order to use it, you need to acquire the full data (not just the texts).
336
- * We support the two modes presented in the paper: `in_between` and `upper_body`.
337
-
338
- ### Unconditioned editing
339
-
340
- ```shell
341
- python -m sample.edit --model_path ./save/humanml_trans_enc_512/model000200000.pt --edit_mode in_between
342
- ```
343
-
344
- **You may also define:**
345
- * `--num_samples` (default is 10) / `--num_repetitions` (default is 3).
346
- * `--device` id.
347
- * `--seed` to sample different prompts.
348
- * `--edit_mode upper_body` for upper-body editing (the lower body is fixed).
349
-
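- For example, the unconditioned in-between command with these options made explicit (a sketch; the flag values are illustrative):
-
- ```shell
- python -m sample.edit --model_path ./save/humanml_trans_enc_512/model000200000.pt --edit_mode in_between --num_samples 5 --num_repetitions 2 --seed 1 --device 0
- ```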
350
-
351
- The output will look like this (blue frames are from the input motion; orange were generated by the model):
352
-
353
- ![example](assets/in_between_edit.gif)
354
-
355
- * As in *Motion Synthesis*, you may follow the **Render SMPL mesh** section to obtain meshes for your edited motions.
356
-
357
- ### Text conditioned editing
358
-
359
- Just add the text conditioning using `--text_condition`. For example:
360
-
361
- ```shell
362
- python -m sample.edit --model_path ./save/humanml_trans_enc_512/model000200000.pt --edit_mode upper_body --text_condition "A person throws a ball"
363
- ```
364
-
365
- The output will look like this (blue joints are from the input motion; orange were generated by the model):
366
-
367
- ![example](assets/upper_body_edit.gif)
368
-
369
- ## Train your own MDM
370
-
371
- <details>
372
- <summary><b>Text to Motion</b></summary>
373
-
374
- **HumanML3D**
375
- ```shell
376
- python -m train.train_mdm --save_dir save/my_humanml_trans_enc_512 --dataset humanml
377
- ```
378
-
379
- **KIT**
380
- ```shell
381
- python -m train.train_mdm --save_dir save/my_kit_trans_enc_512 --dataset kit
382
- ```
383
- </details>
384
- <details>
385
- <summary><b>Action to Motion</b></summary>
386
-
387
- ```shell
388
- python -m train.train_mdm --save_dir save/my_name --dataset {humanact12,uestc} --cond_mask_prob 0 --lambda_rcxyz 1 --lambda_vel 1 --lambda_fc 1
389
- ```
390
- </details>
391
-
392
- <details>
393
- <summary><b>Unconstrained</b></summary>
394
-
395
- ```shell
396
- python -m train.train_mdm --save_dir save/my_name --dataset humanact12 --cond_mask_prob 0 --lambda_rcxyz 1 --lambda_vel 1 --lambda_fc 1 --unconstrained
397
- ```
398
- </details>
399
-
400
- * Use `--device` to define GPU id.
401
- * Use `--arch` to choose one of the architectures reported in the paper `{trans_enc, trans_dec, gru}` (`trans_enc` is default).
402
- * Add `--train_platform_type {ClearmlPlatform, TensorboardPlatform}` to track results with either [ClearML](https://clear.ml/) or [Tensorboard](https://www.tensorflow.org/tensorboard).
403
- * Add `--eval_during_training` to run a short (90-minute) evaluation for each saved checkpoint.
- This will slow down training but gives you better monitoring. A combined example follows this list.
405
-
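- For example, the HumanML3D training command above with all of these options added (a sketch; the architecture and platform choices are just illustrations):
-
- ```shell
- python -m train.train_mdm --save_dir save/my_humanml_trans_enc_512 --dataset humanml --arch trans_enc --train_platform_type TensorboardPlatform --eval_during_training
- ```
-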
406
- ## Evaluate
407
-
408
- <details>
409
- <summary><b>Text to Motion</b></summary>
410
-
411
- * Takes about 20 hours (on a single GPU)
412
- * The output of this script for the pre-trained models (as reported in the paper) is provided in the checkpoints zip file.
413
-
414
- **HumanML3D**
415
- ```shell
416
- python -m eval.eval_humanml --model_path ./save/humanml_trans_enc_512/model000475000.pt
417
- ```
418
-
419
- **KIT**
420
- ```shell
421
- python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt
422
- ```
423
- </details>
424
-
425
- <details>
426
- <summary><b>Action to Motion</b></summary>
427
-
428
- * Takes about 7 hours for UESTC and 2 hours for HumanAct12 (on a single GPU)
429
- * The output of this script for the pre-trained models (as reported in the paper) is provided in the checkpoints zip file.
430
-
431
- ```shell
432
- python -m eval.eval_humanact12_uestc --model <path-to-model-ckpt> --eval_mode full
433
- ```
434
- where `path-to-model-ckpt` can be a path to any of the pretrained action-to-motion models listed above, or to a checkpoint trained by the user.
435
-
436
- </details>
437
-
438
-
439
- <details>
440
- <summary><b>Unconstrained</b></summary>
441
-
442
- * Takes about 3 hours (on a single GPU)
443
-
444
- ```shell
445
- python -m eval.eval_humanact12_uestc --model ./save/unconstrained/model000450000.pt --eval_mode full
446
- ```
447
-
448
- Precision and recall are not computed to save computation time. If you wish to compute them, edit `eval/a2m/gru_eval.py` and change `fast=True` to `fast=False`.
449
-
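- For reference, a one-line way to flip that flag from the repo root (this assumes GNU sed, as on the Ubuntu setup above; review the change before running the evaluation):
-
- ```shell
- sed -i 's/fast=True/fast=False/' eval/a2m/gru_eval.py
- ```
-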
450
- </details>
451
-
452
- ## Acknowledgments
453
-
454
- This code is standing on the shoulders of giants. We want to thank the following projects,
- on which our code is based:
456
-
457
- [guided-diffusion](https://github.com/openai/guided-diffusion), [MotionCLIP](https://github.com/GuyTevet/MotionCLIP), [text-to-motion](https://github.com/EricGuo5513/text-to-motion), [actor](https://github.com/Mathux/ACTOR), [joints2smpl](https://github.com/wangsen1312/joints2smpl), [MoDi](https://github.com/sigal-raab/MoDi).
458
-
459
- ## License
460
- This code is distributed under an [MIT LICENSE](LICENSE).
461
-
462
- Note that our code depends on other libraries (including CLIP, SMPL, SMPL-X, and PyTorch3D) and uses datasets, each of which has its own license that must also be followed.
 
1
+ ---
2
+ title: {{title}}
3
+ emoji: {{emoji}}
4
+ colorFrom: {{colorFrom}}
5
+ colorTo: {{colorTo}}
6
+ sdk: {{sdk}}
7
+ sdk_version: {{sdkVersion}}
8
+ app_file: app.py
9
+ pinned: false
10
+ ---