---
license: other
language:
- en
pipeline_tag: text-to-image
tags:
- stable-diffusion
- alimama-creative
library_name: diffusers
---
# Updates
✨🎉 This model has been merged into [Diffusers](https://moon-ci-docs.huggingface.co/docs/diffusers/pr_9099/en/api/pipelines/controlnet_sd3) and can now be used conveniently. 💡 🎉✨
# Examples
![SD3](images/sd3_compressed.png)
<center><i>a woman wearing a white jacket, black hat and black pants is standing in a field, the hat writes SD3</i></center>
![bucket_alibaba](images/bucket_ali_compressed.png )
<center><i>a person wearing a white shoe, carrying a white bucket with text "alibaba" on it</i></center>
## SD3 Controlnet Inpainting
A finetuned ControlNet inpainting model based on SD3-medium. It offers several advantages:
* Leveraging the SD3 16-channel VAE and high-resolution generation capability at 1024, the model effectively preserves the integrity of non-inpainting regions, including text.
* It is capable of generating text through inpainting.
* It demonstrates superior aesthetic performance in portrait generation.
Compared with [SDXL-Inpainting](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1)
From left to right: Input image, Masked image, SDXL inpainting, Ours.
![0](images/0_compressed.png)
<center><i>a tiger sitting on a park bench</i></center>
![1](images/0r_compressed.png)
<center><i>a dog sitting on a park bench</i></center>
![2](images/1_compressed.png)
<center><i>a young woman wearing a blue and pink floral dress</i></center>
![3](images/3_compressed.png)
<center><i>a woman wearing a white jacket, black hat and black pants is standing in a field, the hat writes SD3</i></center>
![4](images/5_compressed.png)
<center><i>an air conditioner hanging on the bedroom wall</i></center>
# Using with Diffusers
Install Diffusers from source, then run the example:
``` Shell
pip uninstall diffusers
pip install git+https://github.com/huggingface/diffusers
```
``` python
import torch
from diffusers.utils import load_image
from diffusers.pipelines import StableDiffusion3ControlNetInpaintingPipeline
from diffusers.models.controlnet_sd3 import SD3ControlNetModel
controlnet = SD3ControlNetModel.from_pretrained(
"alimama-creative/SD3-Controlnet-Inpainting", use_safetensors=True, extra_conditioning_channels=1
)
pipe = StableDiffusion3ControlNetInpaintingPipeline.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers",
controlnet=controlnet,
torch_dtype=torch.float16,
)
pipe.text_encoder.to(torch.float16)
pipe.controlnet.to(torch.float16)
pipe.to("cuda")
image = load_image(
"https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/images/dog.png"
)
mask = load_image(
"https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/images/dog_mask.png"
)
width = 1024
height = 1024
prompt = "A cat is sitting next to a puppy."
generator = torch.Generator(device="cuda").manual_seed(24)
res_image = pipe(
negative_prompt="deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, mutated hands and fingers, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation, NSFW",
prompt=prompt,
height=height,
width=width,
control_image=image,
control_mask=mask,
num_inference_steps=28,
generator=generator,
controlnet_conditioning_scale=0.95,
guidance_scale=7,
).images[0]
res_image.save("sd3.png")
```
## Training Detail
The model was trained on 12M images from LAION-2B and internal sources for 20k steps at a resolution of 1024x1024.
* Mixed precision : FP16
* Learning rate : 1e-4
* Batch size : 192
* Timestep sampling mode : 'logit_normal'
* Loss : Flow Matching
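The timestep sampling and loss settings listed above can be illustrated with a small sketch. It is not the training code: it assumes the rectified-flow formulation commonly associated with SD3, and the logit-normal parameters (`mean=0.0`, `std=1.0`) and all function names are placeholders chosen for illustration, since the card does not state them.

```python
import math
import random


def sample_logit_normal_t(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) as sigmoid(u) with u ~ Normal(mean, std).

    mean and std are assumed values, not taken from the model card.
    """
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))


def flow_matching_pair(x0, noise, t):
    """Build the noisy input and regression target for one sample.

    x_t interpolates linearly between data and noise; the target is the
    constant velocity (noise - x0) along that straight path.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    target = [b - a for a, b in zip(x0, noise)]
    return x_t, target


def mse(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]                       # stand-in for a latent
noise = [rng.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise of the same shape
t = sample_logit_normal_t(rng)
x_t, target = flow_matching_pair(x0, noise, t)
loss = mse(target, target)                  # a perfect prediction gives zero loss
```

Logit-normal sampling concentrates timesteps around the middle of the (0, 1) range, so intermediate noise levels are seen more often during training than the two extremes.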
## Limitation
Because training used only 1024x1024 resolution, inference performs best at this size; other sizes yield suboptimal results. We plan to start multi-resolution training and will open-source the new weights when they are ready.
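One way to work within this limitation is to center-crop non-square inputs to a square before resizing them to 1024x1024. The helper below is a hypothetical sketch (the name `center_square_box` is not part of Diffusers or this repository) that only computes the crop box, so the image and the mask can be cropped identically and stay aligned.

```python
def center_square_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square.

    Hypothetical helper for illustration; coordinates are in source pixels.
    """
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


# Example: a 1920x1080 landscape image keeps its central 1080x1080 region.
box = center_square_box(1920, 1080)
```

With Pillow, the same box could then be applied to both inputs, for example `image.crop(box).resize((1024, 1024))`. For the mask, a nearest-neighbor resampling filter is probably the safer choice so that mask edges stay hard; that is a suggestion, not something the card specifies.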
## LICENSE
The model is finetuned from SD3, so its license follows the original [SD3 license](https://huggingface.co/stabilityai/stable-diffusion-3-medium#license).