# Virtual Try-On Diffusion API

<!-- TOC -->
* [Virtual Try-On Diffusion API](#virtual-try-on-diffusion-api)
  * [Summary](#summary)
  * [Consuming the API](#consuming-the-api)
  * [Try-On Endpoints](#try-on-endpoints)
  * [Try-On Input Parameters](#try-on-input-parameters)
    * [Clothing image](#clothing-image)
    * [Clothing prompt](#clothing-prompt)
    * [Avatar image](#avatar-image)
    * [Avatar prompt](#avatar-prompt)
    * [Background image](#background-image)
    * [Background prompt](#background-prompt)
    * [Additional notes](#additional-notes)
  * [Try-On Output](#try-on-output)
    * [Response codes](#response-codes)
    * [NSFW content](#nsfw-content)
  * [Use Cases and Recipes](#use-cases-and-recipes)
    * [Image-based virtual try-on](#image-based-virtual-try-on)
    * [Image-based virtual try-on with background](#image-based-virtual-try-on-with-background)
    * [Avatar from a text prompt](#avatar-from-a-text-prompt)
    * [Clothing from a text prompt](#clothing-from-a-text-prompt)
    * [Modifying avatar's body](#modifying-avatars-body)
    * [Txt2Img](#txt2img)
    * [Other creative possibilities](#other-creative-possibilities)
  * [Performance](#performance)
  * [Known Issues and Limitations](#known-issues-and-limitations)
<!-- TOC -->

## Summary

Virtual Try-On Diffusion (VTON-D) by [Texel.Moda](https://texelmoda.com) is a custom diffusion-based pipeline for fast
and flexible multi-modal virtual try-on. Clothing, avatar and background can each be specified by a reference image or a
text prompt, which enables clothing transfer, avatar replacement, fashion image generation and other virtual try-on
tasks. Check out the [demo on Hugging Face](https://huggingface.co/spaces/texelmoda/try-on-diffusion) to try the API in
a user-friendly way.

## Consuming the API

The API is exposed through the RapidAPI Hub, which manages API subscriptions, API keys, payments and related services.
Please refer to the [RapidAPI Documentation](https://docs.rapidapi.com/docs/consumer-quick-start-guide) to get started.

In general, to use the API you need to perform the following steps:
- Create a RapidAPI.com account.
- [Navigate to the API page](https://rapidapi.com/texelmoda-texelmoda-apis/api/try-on-diffusion) and subscribe to a
  suitable pricing plan. A free BASIC plan with 100 API requests per month is also available.
- Use the obtained RapidAPI key to authenticate (via the _X-RapidAPI-Key_ header) and call the API from any programming
  language or tool you like.

Example API call using cURL:
```shell
# "@" tells curl to upload the file contents; without it the literal string "1.jpg" is sent.
# The response body is a JPEG image, so it is written to result.jpg.
curl --request POST \
  --url https://try-on-diffusion.p.rapidapi.com/try-on-file \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-rapidapi-host: try-on-diffusion.p.rapidapi.com' \
  --header 'x-rapidapi-key: <RapidAPI Key>' \
  --form clothing_image=@1.jpg \
  --form avatar_image=@2.jpg \
  --output result.jpg
```

For a simple Python client implementation, see the
[Hugging Face demo application source](https://huggingface.co/spaces/texelmoda/try-on-diffusion/blob/main/try_on_diffusion_client.py).

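As a rough Python counterpart to the cURL call above, the sketch below builds the same multipart/form-data request with
the standard library only. The helper names (`encode_multipart`, `build_try_on_file_request`, `send`) are illustrative
assumptions and are not taken from the official demo client; the host and header names are copied from the cURL example.

```python
import uuid
import urllib.request

API_HOST = "try-on-diffusion.p.rapidapi.com"  # host taken from the cURL example


def encode_multipart(fields, files):
    """Encode text fields and {name: (filename, bytes, mime)} files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, data, mime) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'
        ).encode()
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_try_on_file_request(api_key, fields, files):
    """Build (but do not send) a POST request for /try-on-file."""
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        f"https://{API_HOST}/try-on-file",
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "x-rapidapi-host": API_HOST,
            "x-rapidapi-key": api_key,
        },
    )


def send(request, timeout=60):
    """Send the request; returns (image_bytes, seed_header). Raises urllib.error.HTTPError on non-2xx."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read(), response.headers.get("X-Seed")
```

To use it, read the two image files as bytes, pass them as `{"clothing_image": ("1.jpg", data, "image/jpeg"), ...}`,
and write the bytes returned by `send` to a `.jpg` file.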
## Try-On Endpoints

The Try-On API consists of two endpoints that differ only in how reference images are passed:

- **POST** _/try-on-file_ - takes reference images as uploaded files in the request body (using multipart/form-data).
- **POST** _/try-on-url_ - takes reference images as image URLs in POST parameters.

All image requirements, behavior and status codes are the same for both endpoints; choose the one that best suits your
application architecture.

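For the URL-based endpoint, a request might be built as sketched below. This assumes the POST parameters can be sent as
an `application/x-www-form-urlencoded` body, which the text above does not confirm; if the service expects
multipart/form-data instead, the same field names apply. The function name `build_try_on_url_request` is hypothetical.

```python
import urllib.parse
import urllib.request

API_HOST = "try-on-diffusion.p.rapidapi.com"  # host taken from the cURL example


def build_try_on_url_request(api_key, **params):
    """Build (but do not send) a POST request for /try-on-url."""
    # All inputs are optional, so parameters left empty are simply omitted.
    fields = {name: value for name, value in params.items() if value not in (None, "")}
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        f"https://{API_HOST}/try-on-url",
        data=body,
        method="POST",
        headers={
            # Assumption: form-urlencoded bodies are accepted by this endpoint.
            "Content-Type": "application/x-www-form-urlencoded",
            "x-rapidapi-host": API_HOST,
            "x-rapidapi-key": api_key,
        },
    )
```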
## Try-On Input Parameters

All input parameters for the try-on endpoints are currently optional. Images and prompts serve as additional generation
conditions and can be used in combination. Below is a short parameter summary with links to extended information on
certain parameters.

List of input parameters for the **POST** _/try-on-file_ endpoint:

| Parameter                               | Description | Required |
|-----------------------------------------|-------------|----------|
| [clothing_image](#clothing-image)       | Clothing reference image in JPEG, PNG or WEBP format; maximum file size is 12 MB. | No |
| [clothing_prompt](#clothing-prompt)     | Text prompt for clothing; can be used instead of an image. Compel weighting syntax is supported. Example: _red sleeveless mini dress_ | No |
| [avatar_image](#avatar-image)           | Avatar image in JPEG, PNG or WEBP format; maximum file size is 12 MB. | No |
| avatar_sex                              | Avatar sex, either "male" or "female". Detected automatically if left empty or omitted; enforced if specified. | No |
| [avatar_prompt](#avatar-prompt)         | Text prompt for the avatar; can be used instead of an image, or together with an image to modify the avatar. Compel weighting syntax is supported. Example: _a gentleman with beard and mustache_ | No |
| [background_image](#background-image)   | Optional background reference image in JPEG, PNG or WEBP format; maximum file size is 12 MB. The original avatar background is preserved if no background is specified. | No |
| [background_prompt](#background-prompt) | Optional background text prompt. The original avatar background is preserved if no background is specified. Example: _in an autumn park_ | No |
| seed                                    | Seed for image generation. Default is -1 (random seed). The actual seed is also returned in the "X-Seed" response header. Example: _42_ | No |

List of input parameters for the **POST** _/try-on-url_ endpoint:

| Parameter                                 | Description | Required |
|-------------------------------------------|-------------|----------|
| [clothing_image_url](#clothing-image)     | Clothing reference image URL. The image should be in JPEG, PNG or WEBP format; maximum file size is 12 MB. | No |
| [clothing_prompt](#clothing-prompt)       | Text prompt for clothing; can be used instead of an image. Compel weighting syntax is supported. Example: _red sleeveless mini dress_ | No |
| [avatar_image_url](#avatar-image)         | Avatar image URL. The image should be in JPEG, PNG or WEBP format; maximum file size is 12 MB. | No |
| avatar_sex                                | Avatar sex, either "male" or "female". Detected automatically if left empty or omitted; enforced if specified. | No |
| [avatar_prompt](#avatar-prompt)           | Text prompt for the avatar; can be used instead of an image, or together with an image to modify the avatar. Compel weighting syntax is supported. Example: _a gentleman with beard and mustache_ | No |
| [background_image_url](#background-image) | Optional background reference image URL. The image should be in JPEG, PNG or WEBP format; maximum file size is 12 MB. The original avatar background is preserved if no background is specified. | No |
| [background_prompt](#background-prompt)   | Optional background text prompt. The original avatar background is preserved if no background is specified. Example: _in an autumn park_ | No |
| seed                                      | Seed for image generation. Default is -1 (random seed). The actual seed is also returned in the "X-Seed" response header. Example: _42_ | No |

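
The non-image parameters from the tables above could be checked on the client before sending, as in this minimal
sketch. The function `build_fields` is a hypothetical helper; the only rules it applies are the ones stated in the
tables (`avatar_sex` is "male", "female" or empty, and `seed` is an integer with -1 meaning random).

```python
ALLOWED_AVATAR_SEX = {"male", "female"}


def build_fields(clothing_prompt="", avatar_sex="", avatar_prompt="", background_prompt="", seed=-1):
    """Validate the text parameters and return only the ones that are set."""
    if avatar_sex and avatar_sex not in ALLOWED_AVATAR_SEX:
        raise ValueError('avatar_sex must be "male", "female" or empty')
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise ValueError("seed must be an integer (-1 means random)")
    candidates = {
        "clothing_prompt": clothing_prompt,
        "avatar_sex": avatar_sex,
        "avatar_prompt": avatar_prompt,
        "background_prompt": background_prompt,
    }
    # Every parameter is optional, so empty values are left out of the request.
    fields = {name: value for name, value in candidates.items() if value}
    fields["seed"] = seed
    return fields
```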
### Clothing image

For best results, clothing reference images should meet the following requirements:

- File format: **JPEG**, **PNG** or **WEBP**
- Maximum file size: **12 MB**
- Minimum image size: **256x256**
- Recommended image size: **768x1024 and above**
- Clothing should be **worn by a person**. Some flat lay clothing photos might work, but this is currently not guaranteed
- **Single person** in the image (though multiple persons might also work)
- **Frontal** photo, though some degree of rotation is fine
- **Good lighting** conditions and **high image quality**, as these directly affect the result
- **Minimal occlusion** by hair, hands or accessories

To summarize: the better the clothing image, the better the final result.

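The format and file-size requirements can be checked locally before uploading. The sketch below is one possible
pre-flight check and applies equally to avatar and background images; it recognizes formats by their file signatures
and reads "12 MB" conservatively as 12,000,000 bytes, which is an assumption because the exact byte limit is not stated.
Image dimensions are not checked here, since that needs format-specific parsing or an imaging library.

```python
MAX_FILE_SIZE = 12 * 1000 * 1000  # assumption: "12 MB" read conservatively as 12,000,000 bytes


def sniff_image_format(data: bytes):
    """Return "JPEG", "PNG" or "WEBP" based on the file signature, else None."""
    if data[:3] == b"\xff\xd8\xff":
        return "JPEG"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "PNG"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WEBP"
    return None


def preflight_problems(data: bytes):
    """List the reasons (if any) why this file does not meet the stated requirements."""
    problems = []
    if sniff_image_format(data) is None:
        problems.append("unsupported format (expected JPEG, PNG or WEBP)")
    if len(data) > MAX_FILE_SIZE:
        problems.append(f"file is {len(data)} bytes, above the 12 MB limit")
    return problems
```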
Examples of good clothing images:

| <img src="images/clothing_image_01.jpg" width="240"> | <img src="images/clothing_image_02.jpg" width="240"> | <img src="images/clothing_image_03.jpg" width="240"> | <img src="images/clothing_image_04.jpg" width="240"> |
|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|

### Clothing prompt

Instead of a clothing image you can use a text prompt to describe the garment. Short and clear prompts work best.
Additionally, [Compel weighting syntax](https://github.com/damian0815/compel/blob/main/doc/syntax.md) is supported to
increase or decrease the weight of certain tokens. Examples:
- _a sheer blue sleeveless mini dress_
- _a beige woolen sweater and white pleated skirt_
- _a black leather jacket and dark blue slim-fit jeans_
- _a floral pattern blouse and leggings_
- _a colorful+++ t-shirt and black shorts_

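
When prompts are assembled in code, the `+`/`-` weighting suffixes shown in the examples can be generated with a small
helper. This is only a sketch: `weighted` is a hypothetical function name, and it covers just the suffix form used in
this document (a single word takes the suffix directly, a multi-word phrase is wrapped in parentheses first).

```python
def weighted(term: str, level: int) -> str:
    """Append '+' (level > 0) or '-' (level < 0) weighting marks to a word or phrase."""
    term = term.strip()
    if level == 0:
        return term
    # Multi-word phrases are grouped in parentheses so the weight applies to the whole phrase.
    if " " in term and not (term.startswith("(") and term.endswith(")")):
        term = f"({term})"
    mark = "+" if level > 0 else "-"
    return term + mark * abs(level)


# Rebuilds two of the example prompts from this document.
clothing_prompt = f"a {weighted('colorful', 3)} t-shirt and black shorts"
avatar_prompt = f"a {weighted('plus size', 2)} female model wearing sunglasses"
```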
### Avatar image

Avatar images should also meet some requirements:

- File format: **JPEG**, **PNG** or **WEBP**
- Maximum file size: **12 MB**
- Minimum image size: **256x256**
- Recommended image size: **768x1024 and above**
- **Single person** in the image (though multiple persons might also work)
- **Frontal** photo, though some degree of rotation is fine
- **Good lighting** conditions and **high image quality**

Examples of good avatar images:

| <img src="images/avatar_image_01.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/avatar_image_04.jpg" width="240"> |
|----------------------------------------------------|----------------------------------------------------|----------------------------------------------------|----------------------------------------------------|

### Avatar prompt

Instead of an avatar image you can use a text prompt to describe the person. Short and clear prompts work best.
Additionally, [Compel weighting syntax](https://github.com/damian0815/compel/blob/main/doc/syntax.md) is supported to
increase or decrease the weight of certain tokens. Examples:
- _a beautiful blond girl with long hair_
- _a cute redhead girl with freckles_
- _a (plus size)++ female model wearing sunglasses_
- _a fit man with dark beard and blue eyes_
- _a gentleman with beard and mustache_

### Background image

Background images are used only to extract high-level background features: they serve as a reference and are not
reproduced exactly. The basic image requirements are:

- File format: **JPEG**, **PNG** or **WEBP**
- Maximum file size: **12 MB**
- Recommended image size: **256x256 and above**

Examples of background images:

| <img src="images/background_image_01.jpg" width="240"> | <img src="images/background_image_02.jpg" width="240"> | <img src="images/background_image_03.jpg" width="240"> | <img src="images/background_image_04.jpg" width="240"> |
|--------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|

### Background prompt

Instead of a background image you can use a text prompt to describe the background. Short and clear prompts work best.
Additionally, [Compel weighting syntax](https://github.com/damian0815/compel/blob/main/doc/syntax.md) is supported to
increase or decrease the weight of certain tokens. Examples:
- _in an autumn park_
- _in front of a brick wall_
- _on an ocean beach with (palm trees)++_
- _in a shopping mall_
- _in a modern office_

### Additional notes

We use a "same-crop" approach for clothing and avatar images: both images are cropped in roughly the same way (using
pose estimation), so that the model does not have to invent too much new information (e.g. guess the lower-body
clothing). As a result, if the clothing photo shows only upper-body clothing, the result is cropped to the upper body as
well, regardless of the avatar image (and the other way around):

| Clothing Image                                       | Avatar Image                                        | Result Image                                           |
|------------------------------------------------------|-----------------------------------------------------|--------------------------------------------------------|
| <img src="images/clothing_image_02.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240">  | <img src="images/same_crop_result_01.jpg" width="240"> |
| <img src="images/clothing_image_03.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240">  | <img src="images/same_crop_result_02.jpg" width="240"> |

## Try-On Output

### Response codes

The HTTP status code is used as a high-level response status. A successful API call returns HTTP code 200, and the
response body contains the resulting JPEG image with a maximum size of 768x1024 pixels. The response also has the
"X-Seed" header set, which should contain the actual seed used for image generation (for reproducibility). Any other
status code indicates an unsuccessful request; see the table below for details:

| Response Code | Content-Type     | Headers        | Description | Example |
|:-------------:|:----------------:|:--------------:|-------------|:-------:|
| **200**       | image/jpeg       | X-Seed: {seed} | Successful API call. The response body contains the resulting image in JPEG format. | <img src="images/same_crop_result_01.jpg" width="160"> |
| **400**       | application/json |                | Bad request: at least one of the request parameters is invalid. The response body should contain additional error details in JSON format. | { "detail": "Invalid upload file type: application/x-zip-compressed" } |
| **403**       | application/json |                | Indicates an authentication issue (e.g. an invalid API key). | |
| **422**       | application/json |                | Request validation error. The response body should contain error details in JSON format. | { "detail": [ { "loc": [ "string", 0], "msg": "string", "type": "string" } ] } |
| **429**       |                  |                | Too many requests. Might be triggered by the RapidAPI proxy when the maximum request rate or the API call limit is reached. | |
| **500**       |                  |                | Indicates an internal server error; might not have any details. | |

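
A client might map these status codes to a small result structure, as sketched below. `interpret_response` is a
hypothetical helper, and treating 429 and 500 as retryable is an assumption rather than documented behavior. The
`headers` argument is expected to be a mapping with a `get` method, such as the headers object of an HTTP response.

```python
import json


def interpret_response(status, headers, body):
    """Turn (status code, headers, raw body bytes) into a dict describing the outcome."""
    if status == 200:
        seed = headers.get("X-Seed")
        # The seed is kept only if the header holds a plain integer.
        seed_value = int(seed) if seed and seed.lstrip("-").isdigit() else None
        return {"ok": True, "image": body, "seed": seed_value}
    detail = None
    try:
        # 400 and 422 responses should carry a JSON object with a "detail" entry.
        detail = json.loads(body.decode("utf-8")).get("detail")
    except (ValueError, AttributeError):
        pass  # empty or non-JSON body, e.g. for 429 or 500
    return {
        "ok": False,
        "status": status,
        "detail": detail,
        "retryable": status in (429, 500),  # assumption: worth retrying later
    }
```

Storing the returned seed together with the inputs makes it possible to repeat a generation by passing that value as
the `seed` parameter.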
### NSFW content

We use an NSFW content checker to ensure that we don't output inappropriate images. If potential NSFW content is
detected in the generated image, the API returns HTTP status code 400 with a corresponding error message in the JSON
response.

## Use Cases and Recipes

Our Virtual Try-On API offers a flexible way to specify clothing, avatar and background. This makes it possible not only
to perform the classic virtual try-on task, but also to generate entirely new images or alter existing ones in
interesting ways. Feel free to try and explore!

In all the examples below, inputs that are not mentioned are assumed to be empty.

### Image-based virtual try-on

The most common use case is to transfer clothing from one photo (e.g. from a product page) to another photo (e.g. a
user avatar) while preserving the avatar and the background.

| Clothing Image                                       | Avatar Image                                       | Result Image                                             |
|------------------------------------------------------|----------------------------------------------------|----------------------------------------------------------|
| <img src="images/clothing_image_01.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/image_based_result_01.jpg" width="240"> |

### Image-based virtual try-on with background

Additionally, it's possible to replace the avatar background with a reference image or a text prompt.

| Clothing Image                                       | Avatar Image                                       | Background Image                                       | Result Image                                                        |
|------------------------------------------------------|----------------------------------------------------|--------------------------------------------------------|---------------------------------------------------------------------|
| <img src="images/clothing_image_04.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/background_image_01.jpg" width="240"> | <img src="images/image_based_background_result_01.jpg" width="240"> |

And with a text prompt for the background:

| Clothing Image                                       | Avatar Image                                       | Background Prompt            | Result Image                                                        |
|------------------------------------------------------|----------------------------------------------------|------------------------------|---------------------------------------------------------------------|
| <img src="images/clothing_image_04.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240"> | in front of a snowy mountain | <img src="images/image_based_background_result_02.jpg" width="240"> |

### Avatar from a text prompt

It's possible to replace the person in the clothing image with an avatar described in a text prompt. The background is
changed as well, and is random if not specified:

| Clothing Image                                       | Avatar Prompt                              | Background Prompt  | Result Image                                               |
|------------------------------------------------------|--------------------------------------------|--------------------|------------------------------------------------------------|
| <img src="images/clothing_image_02.jpg" width="240"> | a beautiful blond girl with long hair      |                    | <img src="images/avatar_prompt_result_01.jpg" width="240"> |
| <img src="images/clothing_image_03.jpg" width="240"> | a gentleman with a long beard and mustache | near a fireplace   | <img src="images/avatar_prompt_result_02.jpg" width="240"> |

You may also experiment with avatar prompts for more interesting results:

| Clothing Image                                       | Avatar Prompt       | Background Prompt     | Result Image                                               |
|------------------------------------------------------|---------------------|-----------------------|------------------------------------------------------------|
| <img src="images/clothing_image_03.jpg" width="240"> | (iron man mask)+++  | in the Sahara Desert  | <img src="images/avatar_prompt_result_03.jpg" width="240"> |

### Clothing from a text prompt

Similarly, you can specify clothing with a text prompt while providing an avatar image:

| Clothing Prompt                     | Avatar Image                                       | Result Image                                                 |
|-------------------------------------|----------------------------------------------------|--------------------------------------------------------------|
| a sheer blue sleeveless mini dress  | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/clothing_prompt_result_01.jpg" width="240"> |
| a colorful t-shirt and black shorts | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/clothing_prompt_result_02.jpg" width="240"> |

### Modifying avatar's body

If you use the same image for both clothing and avatar while providing an avatar prompt, it's possible to change the
avatar's body proportions. Note that stronger changes may require additional term weighting.

| Clothing Image                                       | Avatar Image                                         | Avatar Prompt                 | Result Image                                                     |
|------------------------------------------------------|------------------------------------------------------|-------------------------------|------------------------------------------------------------------|
| <img src="images/clothing_image_01.jpg" width="240"> | <img src="images/clothing_image_01.jpg" width="240"> | a (plus size)+ woman          | <img src="images/avatar_modification_result_01.jpg" width="240"> |
| <img src="images/clothing_image_03.jpg" width="240"> | <img src="images/clothing_image_03.jpg" width="240"> | a (muscular bodybuilder)+++++ | <img src="images/avatar_modification_result_02.jpg" width="240"> |

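
In request terms, this recipe amounts to sending one image under both image parameters together with an avatar prompt.
A minimal sketch of the inputs for the file-upload endpoint follows; `body_modification_inputs` and the
`(filename, bytes, mime)` tuple layout are illustrative assumptions, not part of the API.

```python
def body_modification_inputs(image_bytes, filename, avatar_prompt, mime="image/jpeg"):
    """Return (fields, files) that reuse one image as both clothing and avatar reference."""
    same_image = (filename, image_bytes, mime)
    files = {
        "clothing_image": same_image,  # parameter names come from the /try-on-file table
        "avatar_image": same_image,
    }
    fields = {"avatar_prompt": avatar_prompt}
    return fields, files


# Example with placeholder bytes; real code would read the photo from disk.
fields, files = body_modification_inputs(b"\xff\xd8\xff", "photo.jpg", "a (plus size)+ woman")
```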
### Txt2Img

Because our diffusion model was fine-tuned to produce people wearing various clothing, it can follow a clothing prompt
closely and output realistic people and garments from text prompts alone:

| Clothing Prompt                                 | Avatar Prompt                  | Background Prompt      | Result Image                                         |
|-------------------------------------------------|--------------------------------|------------------------|------------------------------------------------------|
| a paisley pattern purple shirt and beige chinos | a fit man with dark beard      | plain white background | <img src="images/txt2img_result_01.jpg" width="240"> |
| a white polka dot pattern dress                 | a beautiful petite blond woman | on a yacht             | <img src="images/txt2img_result_02.jpg" width="240"> |

### Other creative possibilities

If you specify the same image for clothing and avatar while providing a background prompt (or background image), you can
replace the background in a creative way:

| Clothing Image                                     | Avatar Image                                       | Background Prompt       | Result Image                                                |
|----------------------------------------------------|----------------------------------------------------|-------------------------|-------------------------------------------------------------|
| <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | on a snowy mountain top | <img src="images/new_background_result_01.jpg" width="240"> |

It's also possible to combine a clothing image, a clothing prompt, an avatar image and a background to add accessories:

| Clothing Image                                     | Clothing Prompt          | Avatar Image                                       | Background Image                                       | Result Image                                           |
|----------------------------------------------------|--------------------------|----------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|
| <img src="images/avatar_image_02.jpg" width="240"> | a (light brown purse)+++ | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/background_image_03.jpg" width="240"> | <img src="images/accessory_result_01.jpg" width="240"> |

## Performance

Typically, one try-on request is processed in 5-10 seconds (depending on the type of conditions), excluding network
latency. To reduce network overhead, you might want to compress your images before sending them to the API (e.g. using
JPEG). Please note that under high demand, processing time might increase because requests are queued, though we
constantly monitor our GPU cluster capacity and scale it as needed.

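Since requests may be queued or rate-limited, a client may want to retry with increasing delays. The sketch below shows
one common pattern, exponential backoff; `call_with_backoff` is a hypothetical helper, the choice to retry on 429 and
500 is an assumption, and retried calls may still count against your plan's request quota.

```python
import time


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call() -> (status, payload), retrying on 429/500 with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, payload = call()
        last_attempt = attempt == max_attempts - 1
        if status not in (429, 500) or last_attempt:
            return status, payload
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

Here `call` is any zero-argument function that performs one request and returns its status code and payload; the `sleep`
parameter exists so that the waiting behavior can be replaced in tests.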
## Known Issues and Limitations

Like any generative model, our models are not perfect (though we constantly work on improvements):
- Currently, we do not fully support flat lay clothing images. Some might work, but that's not guaranteed.
- Prompt following might not be perfect, especially for long and sophisticated prompts. Prefer simpler and more
  straightforward prompts whenever possible. Also be explicit (e.g. use the word "plain" if you need something of a
  solid color). Additionally, Compel weighting can be used to increase the weight of certain tokens.
- As usual, generative models struggle with hands, fingers and toes, though we try to mitigate this to a certain extent.
- Currently, we do not support trying on a single garment, only the full look.
- Hats and sunglasses are not currently transferred, but we are working on it.
- Backgrounds might lack some clarity, as we currently focus more on clothing.
- When a background is specified, the hairstyle might change.
- The body shape of the avatar might change towards smaller sizes.