|
--- |
|
license: openrail++ |
|
tags: |
|
- text-to-image |
|
- stable-diffusion |
|
--- |
|
|
|
![image/gif](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/ux_sZKB9snVPsKRT1TzfG.gif) |
|
|
|
# Model Description |
|
- **Developed by**: Natural Synthetics Inc. |
|
- **Model type**: Diffusion-based text-to-image generative model |
|
- **License**: CreativeML Open RAIL++-M License |
|
- **Model Description**: This model generates and modifies images from text prompts. It is a latent diffusion model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). |
|
- **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/hotshot-xl). |
|
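
The two-encoder setup mentioned above can be illustrated with a small sketch. It assumes the SDXL-style arrangement, in which the per-token outputs of both frozen encoders are concatenated along the feature axis before conditioning the denoiser; this card does not spell out that detail. The hidden sizes (768 and 1280), the 77-token context length, and the helpers `fake_encoder` and `combine_text_embeddings` are illustrative assumptions, not part of this model's code.

```python
import random

# Assumed hidden sizes (SDXL-style): CLIP-ViT/L -> 768, OpenCLIP-ViT/G -> 1280.
CLIP_L_DIM = 768
OPENCLIP_G_DIM = 1280
NUM_TOKENS = 77  # assumed CLIP context length


def fake_encoder(prompt: str, dim: int, seed: int) -> list[list[float]]:
    """Hypothetical stand-in for a frozen text encoder: one vector per token."""
    rng = random.Random(f"{seed}:{prompt}")
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(NUM_TOKENS)]


def combine_text_embeddings(emb_l, emb_g):
    """Concatenate the per-token vectors of both encoders along the feature axis."""
    if len(emb_l) != len(emb_g):
        raise ValueError("both encoders must produce the same number of tokens")
    return [tok_l + tok_g for tok_l, tok_g in zip(emb_l, emb_g)]


prompt = "a red cube on top of a blue sphere"
emb_l = fake_encoder(prompt, CLIP_L_DIM, seed=0)
emb_g = fake_encoder(prompt, OPENCLIP_G_DIM, seed=1)
context = combine_text_embeddings(emb_l, emb_g)
print(len(context), len(context[0]))  # 77 2048
```

Under these assumptions, each token ends up with a 2048-dimensional conditioning vector. Both encoders stay fixed, so only the denoising network sees this combined context.
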
|
|
|
|
# Limitations and Bias |
|
## Limitations |
|
- The model does not achieve perfect photorealism. |
|
- The model cannot render legible text. |
|
- The model struggles with compositional tasks, such as rendering an image corresponding to “A red cube on top of a blue sphere”. |
|
- Faces and people in general may not be generated properly. |
|
|
|
## Bias |
|
While the capabilities of video generation models are impressive, they can also reinforce or exacerbate social biases. |
|
|