---
library_name: diffusers
license: cc-by-nc-2.0
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: image-to-image
tags:
- tryon
- vto
---

# Model Card for CATVTON-Flux

CATVTON-Flux is a virtual try-on model that combines CatVTON ("Concatenation Is All You Need for Virtual Try-On with Diffusion Models") with the FLUX.1 Fill inpainting model to transfer clothing onto a person image realistically and accurately.

## Update

Latest achievement (2024/11/24): CatVton-Flux-Alpha achieved an FID of 5.593255043029785 on the VITON-HD dataset, which the author reports as state of the art (SOTA). Test configuration: scale 30, step 30. Inference results on the VITON-HD test set are available [here](https://drive.google.com/file/d/1T2W5R1xH_uszGVD8p6UUAtWyx43rxGmI/view?usp=sharing).

## Model Details

### Model Description

- **Developed by:** [X/Twitter:Black Magic An](https://x.com/MrsZaaa)

### Model Sources

- **Repository:** [GitHub](https://github.com/nftblackmagic/catvton-flux)

## Uses

The model is designed for virtual try-on applications, letting users visualize how different garments would look on a person. It can be used directly through the command-line interface with the following parameters:

- Input person image
- Person mask
- Garment image
- Random seed (optional)
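
As a rough illustration of how these four inputs could be exposed on a command line, here is a minimal `argparse` sketch. The flag names (`--image`, `--mask`, `--garment`, `--seed`) and the file names are assumptions made for this example; check the repository's inference script for the actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four try-on inputs (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Virtual try-on inputs (illustrative sketch)"
    )
    parser.add_argument("--image", required=True, help="Path to the input person image")
    parser.add_argument("--mask", required=True, help="Path to the person mask")
    parser.add_argument("--garment", required=True, help="Path to the garment image")
    # The seed is optional; None means "do not fix the random seed".
    parser.add_argument("--seed", type=int, default=None, help="Optional random seed")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--image", "person.jpg", "--mask", "person_mask.png",
     "--garment", "garment.jpg", "--seed", "42"]
)
print(args.image, args.mask, args.garment, args.seed)
```

Keeping the seed optional mirrors the list above: the three image paths are required, while the seed only matters when reproducible output is needed.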

## How to Get Started with the Model

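```
import torch
# Requires a diffusers release that ships FluxFillPipeline.
from diffusers import FluxFillPipeline, FluxTransformer2DModel

# Load the fine-tuned try-on transformer weights.
transformer = FluxTransformer2DModel.from_pretrained(
    "xiaozaa/catvton-flux-alpha",
    torch_dtype=torch.bfloat16
)
# Build the fill pipeline around it; the remaining components come from FLUX.1-dev.
pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16
).to("cuda")
```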
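
The snippet above only loads the pipeline. CatVTON-style methods condition on the garment by concatenating it with the person image, so one plausible way to prepare inputs is a side-by-side canvas: garment on the left, person on the right, with the mask covering only the person half. The pure-Python sketch below computes that layout. The left/right ordering, the helper names (`side_by_side_layout`, `extend_mask_row`), and the 768x1024 size are assumptions for illustration, not something this card specifies.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, upper, right, lower) in pixels, the convention used by PIL crop boxes.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TryOnLayout:
    canvas_size: Tuple[int, int]  # (width, height) of the concatenated canvas
    garment_box: Box              # where the garment image sits (assumed: left half)
    person_box: Box               # where the person image sits (assumed: right half)


def side_by_side_layout(width: int, height: int) -> TryOnLayout:
    """Lay out two equally sized images next to each other on one canvas."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return TryOnLayout(
        canvas_size=(2 * width, height),
        garment_box=(0, 0, width, height),
        person_box=(width, 0, 2 * width, height),
    )


def extend_mask_row(person_mask_row: List[int]) -> List[int]:
    """Prefix one mask row with zeros so the garment half is never inpainted."""
    return [0] * len(person_mask_row) + list(person_mask_row)


layout = side_by_side_layout(768, 1024)  # hypothetical working resolution
print(layout.canvas_size)                # (1536, 1024)
print(extend_mask_row([0, 255, 255]))    # [0, 0, 0, 0, 255, 255]
```

Under this assumed layout, the try-on result would be read back by cropping the generated canvas with `person_box`.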

## Training Details

### Training Data

VITON-HD dataset

### Training Procedure

Fine-tuned from FLUX.1-Fill-dev (`black-forest-labs/FLUX.1-Fill-dev`).


## Evaluation

### Metrics

FID: 5.593255043029785 on the VITON-HD dataset (reported as SOTA as of 2024/11/24).
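
For reference, the Fréchet Inception Distance (FID) compares Gaussian fits to Inception features of real and generated images; lower is better. The standard definition is given below, where $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the feature mean and covariance for real and generated images. This card does not state which FID implementation or feature extractor was used, so this is the general formula rather than a description of the exact evaluation setup.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
$$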

## Citation

**BibTeX:**
```
@misc{chong2024catvtonconcatenationneedvirtual,
  title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models},
  author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
  year={2024},
  eprint={2407.15886},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.15886},
}
@article{lhhuang2024iclora,
  title={In-Context LoRA for Diffusion Transformers},
  author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Shi, Yupeng and Dou, Huanzhang and Liang, Chen and Feng, Yutong and Liu, Yu and Zhou, Jingren},
  journal={arXiv preprint arXiv:2410.23775},
  year={2024}
}
```