Update README.md
README.md
CHANGED
---
license: mit
library_name: diffusers
model-index:
- name: 16ch-VAE
  results:
  - task:
      type: encoder-loss
    dataset:
      name: yerevann/coco-karpathy
      type: image
    metrics:
    - name: PSNR
      type: PSNR
      value: 31.1663
---

# Ostris VAE - KL-f8-d16

A 16-channel VAE with 8x downsample. Trained from scratch on a balanced mix of photos, artistic images, text, cartoons, and vector images.

It is lighter weight than most VAEs, with only 57,266,643 parameters (vs. 83,819,683 for the SD3 VAE), which means it is faster and uses less VRAM, yet it scores quite similarly on real images. Plus, it is MIT licensed, so you can do whatever you want with it.

| VAE | PSNR (higher is better) | LPIPS (lower is better) | # params |
|----|----|----|----|
| sd-vae-ft-mse | 26.939 | 0.0581 | 83,653,863 |
| SDXL | 27.370 | 0.0540 | 83,653,863 |
| SD3 | 31.681 | 0.0187 | 83,819,683 |
| **Ostris KL-f8-d16** | **31.166** | **0.0198** | **57,266,643** |
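
As a quick illustration of what the table measures, here is a minimal sketch of loading the VAE with diffusers, running a tensor through an encode/decode round trip, and computing PSNR on the reconstruction. The repo id `ostris/vae-kl-f8-d16` and the 512px input size are assumptions, not taken from this card.

```python
# Minimal sketch (assumed repo id and input size, not from this card).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("ostris/vae-kl-f8-d16")  # assumed repo id
vae.eval()

# Stand-in for a batch of RGB images scaled to [-1, 1].
x = torch.rand(1, 3, 512, 512) * 2.0 - 1.0

with torch.no_grad():
    # 8x downsample, 16 latent channels: (1, 3, 512, 512) -> (1, 16, 64, 64)
    latents = vae.encode(x).latent_dist.sample()
    recon = vae.decode(latents).sample  # back to (1, 3, 512, 512)

# PSNR on [0, 1] images, the metric reported in the table above.
to01 = lambda t: (t.clamp(-1, 1) + 1) / 2
mse = torch.mean((to01(recon) - to01(x)) ** 2)
psnr = 10 * torch.log10(1.0 / mse)
print(f"latents: {tuple(latents.shape)}, PSNR: {psnr:.2f} dB")
```

On random noise the PSNR value is meaningless; swap in real images scaled to [-1, 1] to get numbers comparable to the table.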

### What do I do with this?

If you don't know, you probably don't need this. This is made as an open-source, lighter-weight 16-channel VAE. You would need to train it into a network before it is useful (a rough sketch of what that involves is below). I plan to do this myself for SD 1.5, SDXL, and possibly PixArt. [Follow me on Twitter](https://x.com/ostrisai) to keep up with my work on that.

### Note: Not SD3 compatible

This VAE is not SD3 compatible, as it is trained from scratch and has an entirely different latent space.