---
license: mit
language:
- en
library_name: diffusers
tags:
- lora
- image-generation
- diffusion
- face-generation
- text-conditioned-human-portrait
- synthetic-captions
- diffusers
---
# Text2Face-LoRa
![Python version](https://img.shields.io/badge/python-3.8+-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green)

This is a LoRA-finetuned version of the Stable Diffusion 2.1 model, optimized specifically
for generating face images. The model was trained on [FFHQ](https://github.com/NVlabs/ffhq-dataset) and [EasyPortrait](https://github.com/hukenovs/easyportrait)
using synthetic text captions for both datasets.

Details on the dataset format and preparation will be available soon.
## Checkpoints
You can download the pretrained LoRA weights for the diffusion model and the text encoder with:
```python
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="michaeltrs/text2face",
                filename="checkpoints/lora30k/pytorch_lora_weights.safetensors",
                local_dir="checkpoints")
```
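If you prefer a higher-level path, recent `diffusers` releases also expose `load_lora_weights`, which attaches both the UNet and text-encoder LoRA layers in one call. A minimal sketch, assuming the weights were downloaded to `checkpoints/lora30k` as above:

```python
from diffusers import StableDiffusionPipeline
import torch

# Load the SD 2.1 base model, then attach the downloaded LoRA weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("checkpoints/lora30k",
                       weight_name="pytorch_lora_weights.safetensors")
```

The `generate.py` script in the next section performs the equivalent steps explicitly.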
## Inference
Generate images with the `generate.py` script, which loads the SD 2.1 foundation model from Hugging Face and applies the LoRA weights.
Generation is driven by a prompt and, optionally, a negative prompt.
```python
from diffusers import StableDiffusionPipeline
import torch


class Model:
    def __init__(self, checkpoint="checkpoints/lora30k", weight_name="pytorch_lora_weights.safetensors", device="cuda"):
        self.checkpoint = checkpoint
        # Load the trained LoRA weights from the checkpoint directory.
        state_dict, network_alphas = StableDiffusionPipeline.lora_state_dict(
            checkpoint,
            weight_name=weight_name
        )
        # Load the SD 2.1 base model and inject the LoRA layers into the UNet and text encoder.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to(device)
        self.pipe.load_lora_into_unet(state_dict, network_alphas, self.pipe.unet, adapter_name="test_lora")
        self.pipe.load_lora_into_text_encoder(state_dict, network_alphas, self.pipe.text_encoder, adapter_name="test_lora")
        self.pipe.set_adapters(["test_lora"], adapter_weights=[1.0])

    def generate(self, prompt, negprompt='', steps=50, savedir=None, seed=1):
        lora_scale = 1.0
        image = self.pipe(prompt,
                          negative_prompt=negprompt,
                          num_inference_steps=steps,
                          cross_attention_kwargs={"scale": lora_scale},
                          generator=torch.manual_seed(seed)).images[0]
        # Save the image next to the checkpoint unless a target directory is given.
        filename = f"{'_'.join(prompt.replace('.', ' ').split(' '))}.png"
        if savedir is None:
            image.save(f"{self.checkpoint}/{filename}")
        else:
            image.save(f"{savedir}/{filename}")
        return image


if __name__ == "__main__":
    model = Model()
    prompt = 'A happy 55 year old male with blond hair and a goatee smiles with visible teeth.'
    negprompt = ''
    image = model.generate(prompt, negprompt=negprompt, steps=50, seed=42)
```
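For example, with the `Model` class above in scope, you can sweep over a few seeds and add a simple negative prompt (the prompt, negative prompt, and seed values here are illustrative):

```python
model = Model()
prompt = 'A middle aged woman with curly brown hair and glasses smiles softly.'
negprompt = 'blurry, deformed, cartoon'

# Generate one image per seed; files are written next to the checkpoint by default.
for seed in (1, 2, 3):
    model.generate(prompt, negprompt=negprompt, steps=50, seed=seed)
```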
## Limitations
This model, Text2Face-LoRa, is finetuned from Stable Diffusion 2.1 and, as such, inherits the limitations and biases
of the base model. These biases may manifest as skewed representations across ethnicities and
genders due to the nature of the training data originally used for Stable Diffusion 2.1.
### Specific Limitations Include:
- **Ethnic and Gender Biases**: The model may generate images that do not equally represent the diversity of human
features in different ethnic and gender groups, potentially reinforcing or exacerbating existing stereotypes.
- **Selection Bias in Finetuning Datasets**: The datasets used for finetuning this model were selected with specific
criteria in mind, which may not encompass a wide enough variety of data points to correct for the inherited biases of the base model.
- **Caption Generation Bias**: The synthetic annotations used to finetune this model were generated by automated
face analysis models, which themselves may be biased. This could lead to inaccuracies in facial feature interpretation
and representation, particularly for less-represented demographics in the training data.
### Ethical Considerations:
Users are encouraged to consider these limitations when deploying the model in real-world applications, especially
those involving diverse human subjects. It is advisable to perform additional validations and seek ways to mitigate
these biases in practical use cases.