---
license: mit
language:
- en
library_name: diffusers
tags:
- lora
- image-generation
- diffusion
- face-generation
- text-conditioned-human-portrait
- synthetic-captions
- diffusers
---
# Text2Face-LoRa

![Python version](https://img.shields.io/badge/python-3.8+-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green)

This is a LoRA-finetuned version of the Stable Diffusion 2.1 model, specifically optimized
for generating face images. The model was trained on [FFHQ](https://github.com/NVlabs/ffhq-dataset) and [EasyPortrait](https://github.com/hukenovs/easyportrait)
using synthetic text captions for both datasets.
Details on the dataset format and preparation will be available soon.

## Checkpoints
You can download the pretrained LoRA weights for the diffusion model and text encoder using:

```python
from huggingface_hub import hf_hub_download

# Download the LoRA weights (a single safetensors file covering both the UNet and the text encoder)
hf_hub_download(repo_id="michaeltrs/text2face",
                filename="checkpoints/lora30k/pytorch_lora_weights.safetensors",
                local_dir="checkpoints")
```
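If you prefer to mirror the whole `checkpoints/lora30k` folder of the repository rather than a single file, `snapshot_download` from `huggingface_hub` is one possible alternative. This is a minimal sketch, not part of the original instructions; note that files are placed under `local_dir` following the repository's folder structure:

```python
from huggingface_hub import snapshot_download

# Mirror only the LoRA checkpoint folder of the repository into the current directory,
# yielding ./checkpoints/lora30k/pytorch_lora_weights.safetensors
snapshot_download(repo_id="michaeltrs/text2face",
                  allow_patterns=["checkpoints/lora30k/*"],
                  local_dir=".")
```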
## Inference

Generate images using the `generate.py` script, which loads the SD2.1 foundation model from Hugging Face and applies the LoRA weights.
Generation is driven by a prompt and, optionally, a negative prompt.
```python
from diffusers import StableDiffusionPipeline
import torch


class Model:
    def __init__(self, checkpoint="checkpoints/lora30k", weight_name="pytorch_lora_weights.safetensors", device="cuda"):
        self.checkpoint = checkpoint
        # Load the trained LoRA layers (UNet and text encoder) from the checkpoint directory
        state_dict, network_alphas = StableDiffusionPipeline.lora_state_dict(
            # Path to the trained LoRA output directory
            checkpoint,
            weight_name=weight_name
        )
        # Load the SD2.1 base model, then inject the LoRA layers into the UNet and text encoder
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to(device)
        self.pipe.load_lora_into_unet(state_dict, network_alphas, self.pipe.unet, adapter_name='test_lora')
        self.pipe.load_lora_into_text_encoder(state_dict, network_alphas, self.pipe.text_encoder, adapter_name='test_lora')
        self.pipe.set_adapters(["test_lora"], adapter_weights=[1.0])

    def generate(self, prompt, negprompt='', steps=50, savedir=None, seed=1):
        # A LoRA scale of 1.0 applies the finetuned weights at full strength
        lora_scale = 1.0
        image = self.pipe(prompt,
                          negative_prompt=negprompt,
                          num_inference_steps=steps,
                          cross_attention_kwargs={"scale": lora_scale},
                          generator=torch.manual_seed(seed)).images[0]
        # Save the image under a filename derived from the prompt
        if savedir is None:
            image.save(f"{self.checkpoint}/{'_'.join(prompt.replace('.', ' ').split(' '))}.png")
        else:
            image.save(f"{savedir}/{'_'.join(prompt.replace('.', ' ').split(' '))}.png")
        return image


if __name__ == "__main__":
    model = Model()

    prompt = 'A happy 55 year old male with blond hair and a goatee smiles with visible teeth.'
    negprompt = ''

    image = model.generate(prompt, negprompt=negprompt, steps=50, seed=42)
```
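Depending on your `diffusers` version, the UNet and text-encoder LoRA layers can also be attached with a single `load_lora_weights` call. The snippet below is a minimal sketch of that alternative; it is not part of `generate.py` and assumes the weights were downloaded to `checkpoints/lora30k` as shown above:

```python
from diffusers import StableDiffusionPipeline
import torch

# Load the SD2.1 base model, then attach both LoRA adapters (UNet + text encoder) in one call
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("checkpoints/lora30k", weight_name="pytorch_lora_weights.safetensors")

prompt = "A happy 55 year old male with blond hair and a goatee smiles with visible teeth."
image = pipe(prompt, num_inference_steps=50, generator=torch.manual_seed(42)).images[0]
image.save("sample.png")
```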
## Limitations

This model, Text2Face-LoRa, is finetuned from Stable Diffusion 2.1 and, as such, inherits all the limitations and biases
associated with the base model. These biases may manifest as skewed representations across different ethnicities and
genders due to the nature of the training data originally used for Stable Diffusion 2.1.

### Specific Limitations Include:
- **Ethnic and Gender Biases**: The model may generate images that do not equally represent the diversity of human features in different ethnic and gender groups, potentially reinforcing or exacerbating existing stereotypes.

- **Selection Bias in Finetuning Datasets**: The datasets used for finetuning this model were selected with specific criteria in mind, which may not encompass a wide enough variety of data points to correct for the inherited biases of the base model.

- **Caption Generation Bias**: The synthetic annotations used to finetune this model were generated by automated face analysis models, which themselves may be biased. This could lead to inaccuracies in facial feature interpretation and representation, particularly for less-represented demographics in the training data.

### Ethical Considerations:
Users are encouraged to consider these limitations when deploying the model in real-world applications, especially those involving diverse human subjects. It is advisable to perform additional validations and seek ways to mitigate these biases in practical use cases.