---
library_name: diffusers
---
# Furception v1.0, by Project RedRocket.
This is a VAE decoder finetune, resumed from stabilityai/sd-vae-ft-mse using images from e621. It is trained with a mixture of MAE and MSE loss to maintain an acceptable balance between sharpness and smooth outputs, and loss is calculated in Oklab color space in order to prioritize image reconstruction based on which color channels are more perceptually significant.
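As a rough illustration of the loss described above, the sketch below converts sRGB images to Oklab using the matrices published in Björn Ottosson's Oklab post, then mixes MAE and MSE on the result. The function names, the equal `mae_weight` and `mse_weight` defaults, and the choice to linearize sRGB before converting are assumptions made for illustration. The actual training code is built on Flax and may differ.

```python
import numpy as np

# Matrices from Björn Ottosson's Oklab post.
# M1: linear sRGB -> LMS cone response; M2: cube-rooted LMS -> Oklab (L, a, b).
M1 = np.array([
    [0.4122214708, 0.5363325363, 0.0514459929],
    [0.2119034982, 0.6806995451, 0.1073969566],
    [0.0883024619, 0.2817188376, 0.6299787005],
])
M2 = np.array([
    [0.2104542553, 0.7936177850, -0.0040720468],
    [1.9779984951, -2.4285922050, 0.4505937099],
    [0.0259040371, 0.7827717662, -0.8086757660],
])


def srgb_to_linear(rgb):
    """Undo the sRGB transfer curve; expects values in [0, 1]."""
    rgb = np.clip(np.asarray(rgb, dtype=np.float64), 0.0, 1.0)
    return np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)


def srgb_to_oklab(rgb):
    """Convert an array of shape (..., 3) from sRGB to Oklab."""
    lms = srgb_to_linear(rgb) @ M1.T
    return np.cbrt(lms) @ M2.T


def oklab_mixed_loss(pred, target, mae_weight=1.0, mse_weight=1.0):
    """Weighted sum of MAE and MSE in Oklab space (weights are placeholders)."""
    diff = srgb_to_oklab(pred) - srgb_to_oklab(target)
    return mae_weight * np.mean(np.abs(diff)) + mse_weight * np.mean(diff ** 2)
```

Calling `oklab_mixed_loss(decoded, original)` on two `(H, W, 3)` arrays in the `[0, 1]` range returns a single scalar. Raising `mae_weight` favors sharper reconstructions, while raising `mse_weight` favors smoother ones.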
Our testing has shown that the VAE is good at eliminating unwanted high-frequency noise when used on models trained on similar data. Results are far more apparent on flat-colored images than they are on realistic or painterly images, but we have not noticed any obvious loss of performance on any type of image. It may have some generalizability to a broader range of art styles due to the variety of different styles in the dataset.
Default VAE (kl-f8):
![Default VAE](crop3[1].png)
Furception 1.0:
![Our VAE](crop4[1].png)
Note that the output is overall smoother and has significantly less artifacting around edges in high-detail regions.
#### Licensing:
You are free to use this model for personal, non-commercial use.
You are also free to distribute this model alongside other (non-commercial) models, as long as you give credit. Please include the version number as well in case future models are released.
#### Training details:
Overall training is fundamentally similar to LDM. We used the same relative base weights for MAE, MSE, and LPIPS as LDM uses and, in the case of LPIPS, as sd-vae-ft-mse uses. As in LDM, the discriminator's weight in the loss objective is set dynamically so that the gradient norm for the discriminator is half that of the reconstruction loss. Our discriminator is similar to LDM's, except that it is reparameterized to Wasserstein loss with a gradient penalty and its group norm layers are replaced with layer norms.
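The dynamic discriminator weight described above can be sketched as follows. This assumes both gradients are taken with respect to the same parameters (the LDM reference code uses the last decoder layer). The function name and the `eps` and `max_weight` values are illustrative assumptions, not the values used in training.

```python
import numpy as np


def adaptive_disc_weight(rec_grad, adv_grad, target_ratio=0.5,
                         eps=1e-4, max_weight=1e4):
    """Scale factor for the adversarial loss term.

    After scaling, the adversarial gradient norm is roughly
    `target_ratio` times the reconstruction gradient norm.
    `eps` and `max_weight` are assumed stabilizers, not confirmed values.
    """
    rec_norm = np.linalg.norm(np.ravel(rec_grad))
    adv_norm = np.linalg.norm(np.ravel(adv_grad))
    weight = rec_norm / (adv_norm + eps)
    return target_ratio * float(np.clip(weight, 0.0, max_weight))
```

With `target_ratio=0.5`, a reconstruction gradient of norm 5 and an adversarial gradient of norm 10 give a weight of about 0.25, so the scaled adversarial gradient has norm of about 2.5.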
Training for version 1.0 used random square crops at various downscale levels (Lanczos with antialiasing), randomly rotated and flipped. Training ran for 150,000 steps at a batch size of 32. The released model uses the EMA weights, which were accumulated with a decay similar to that of sd-vae-ft-mse, scaled for our batch size.
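One common way to scale an EMA decay for a different batch size is to keep the averaging horizon constant in terms of samples seen, which gives `decay ** (batch_size / ref_batch_size)`. The sketch below uses that rule as an assumption; the exact rule and reference values used for this model are not stated here, and the numbers in the usage lines are placeholders.

```python
def scale_ema_decay(ref_decay, ref_batch_size, batch_size):
    """Rescale a per-step EMA decay so the average spans the same number of samples.

    Assumed rule: a smaller batch means more steps per sample seen,
    so the per-step decay moves closer to 1.
    """
    return ref_decay ** (batch_size / ref_batch_size)


def ema_update(ema_params, params, decay):
    """One EMA step over flat lists of parameter values."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


# Placeholder reference values, for illustration only.
decay = scale_ema_decay(ref_decay=0.9999, ref_batch_size=192, batch_size=32)
ema = ema_update([0.0, 2.0], [1.0, 2.0], decay)
```

Under this rule, moving from a larger reference batch to a batch size of 32 raises the per-step decay, so each individual step contributes less to the running average.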
#### Credits:
Development and research led by @drhead.
With research and development assistance by @RedHotTensors.
And additional research assistance by @lodestones and Thessalo.
Dataset curation by @lodestones and Bannanapuncakes, with additional curation by @RedHotTensors.
And thanks to dogarrowtype for system administration assistance.
#### Based on:
CompVis Latent Diffusion: https://github.com/CompVis/latent-diffusion/
StabilityAI sd-vae-ft-mse: https://huggingface.co/stabilityai/sd-vae-ft-mse
LPIPS by Richard Zhang, et al: https://github.com/richzhang/PerceptualSimilarity
OkLab by Björn Ottosson: https://bottosson.github.io/posts/oklab/
fine-tune-models by Jonathan Chang: https://github.com/cccntu/fine-tune-models/
#### Built on:
Flax by Google Brain: https://github.com/google/flax
And Huggingface Diffusers: https://github.com/huggingface/diffusers
With deep thanks to the innumerable artists who released their works to the public for fair use in this non-commercial research project.