---
license: mit
library_name: diffusers
model-index:
- name: 16ch-VAE
  results:
  - task:
      type: encoder-loss
    dataset:
      name: yerevann/coco-karpathy
      type: image
    metrics:
    - name: PSNR
      type: PSNR
      value: 31.1663
---
# Ostris VAE - KL-f8-d16
A 16-channel VAE with 8x downsampling, trained from scratch on a balanced mix of photos, artwork, text, cartoons, and vector images.
It is lighter than most VAEs, with only 57,266,643 parameters (vs. 83,819,683 for the SD3 VAE), so it is faster and uses less VRAM, yet it scores quite similarly
on real images. It is also MIT licensed, so you can do whatever you want with it.
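
To make "16 channel, 8x downsample" concrete, here is a minimal sketch of the latent shape you would expect for a given image size. The helper names are hypothetical, and it assumes image sides divisible by 8 and a 3-channel RGB input; it is plain arithmetic, not code from this repo.

```python
def latent_shape(height, width, downsample=8, latent_channels=16):
    """Return (channels, height, width) of the latent for one image.

    Assumes both sides are divisible by the downsample factor
    (a simplification made for this sketch).
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=16):
    """Ratio of input values (3 x H x W) to latent values (C x H/f x W/f)."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (16, 64, 64)
print(compression_ratio(512, 512))  # 12.0
```

Under the same assumptions, a 4-channel f8 VAE would give a ratio of 48, so a 16-channel latent keeps four times as many values per image.
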
| VAE | PSNR (higher is better) | LPIPS (lower is better) | # params |
|----|----|----|----|
| sd-vae-ft-mse|26.939|0.0581|83,653,863|
| SDXL|27.370|0.0540|83,653,863|
| SD3|31.681|0.0187|83,819,683|
| **Ostris KL-f8-d16** |**31.166**|**0.0198**|**57,266,643**|
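
For reference, PSNR follows the standard definition `10 * log10(MAX^2 / MSE)` between an image and its reconstruction. The sketch below is a plain-Python illustration on flat pixel lists, assuming 8-bit values (`max_value=255`). The exact evaluation setup behind the table (value range, resolution, averaging) is not described here, so treat this as the generic formula rather than the script that produced those numbers.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 2, so MSE = 4.
original = [0, 128, 255, 64]
reconstructed = [2, 126, 253, 66]
print(round(psnr(original, reconstructed), 2))
```

Because PSNR is logarithmic, a gap of about 0.5 dB (as between the last two rows) corresponds to a fairly small difference in mean squared error.
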
### What do I do with this?
If you don't know, you probably don't need this. It is meant as an open-source, lighter-weight 16-channel VAE.
A diffusion model has to be trained on its latents before it is useful. I plan to do this myself for SD 1.5, SDXL, and possibly PixArt.
[Follow me on Twitter](https://x.com/ostrisai) to keep up with my work on that.
### Note: Not SD3 compatible
This VAE is not SD3 compatible, as it was trained from scratch and has an entirely different latent space.