iamkaikai committed
Commit cc09a25
1 Parent(s): 3ccca4c

End of training
README.md CHANGED
@@ -1,7 +1,7 @@
 
 ---
 license: creativeml-openrail-m
-base_model: runwayml/stable-diffusion-v1-5
+base_model: iamkaikai/amazing-logos-v4
 datasets:
 - iamkaikai/amazing_logos_v4
 tags:
@@ -14,7 +14,7 @@ inference: true
 
 # Text-to-image finetuning - iamkaikai/amazing-logos-v4
 
-This pipeline was finetuned from **runwayml/stable-diffusion-v1-5** on the **iamkaikai/amazing_logos_v4** dataset. Below are some example images generated with the finetuned pipeline using the following prompts: ['Simple elegant logo for Mandarin Oriental, Fan Hong kong Lines Paper, Hospitality, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white', 'Simple elegant logo for AltVest Investments, alternative investments financial services, Finance, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for PeckCerativeHorz2.jpg, peck horizontal trends branding bold photography analysis packaging vertical products circle discovery identity color creative exhibition direction P graphics julian research, , successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', "Simple elegant logo for Johns Creek Shirts, printing T's art Apparel screen tshirt summer T t shirts, Apparel, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges", 'Simple elegant logo for MGD, Human Circle MGD dots Resources SRP 3D brown, Human Resources, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Indooroopilly Uniting Church, abstract initials people swirl letter I letter U letter C giving community soft friendly purple blue red, Religious, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Hacker, Douglas, & Company, accountant Hollywood law H filmstrip attorney HDC film, law, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Windmill unused #5, windmill property community shapes quilt blades houses colorful carlsbad homes circle whimsical estate housing real, housing development, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for The Duck Store, track track and field sports athletics tree logo badge, Sports Apparel, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for InGenious Fitness, G Ball Green Blue, Fitness, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for KickCharge Creative, seating safety man driver person figure hardhat S initial sign, Transportation, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Chickasaw Nation, water drop laundry, Commercial Laundry Services, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for NBA Properties, Inc., basketball sports branding team entertainment philadelphia star patriotic, Sports Entertainment, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for North Asheville Tailgate Market Veggie Sub Mark, culinary cheese Initials combo organic serif vegetable radish Farmers eggplant inspirations2023 tailgate food market submark asheville farm kale modern unique sanserif veggie , farmers market, culinary, food, retail, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for A. Diethelm, A Circle Line Switzerland Triangle, Painting Tools and Supplies, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white', 'Simple elegant logo for Grupo Altair Publicidad, Circle Lines Venezuela, Publishing, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white']:
+This pipeline was finetuned from **iamkaikai/amazing-logos-v4** on the **iamkaikai/amazing_logos_v4** dataset. Below are some example images generated with the finetuned pipeline using the following prompts: ['Simple elegant logo for Mandarin Oriental, Fan Hong kong Lines Paper, Hospitality, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white', 'Simple elegant logo for AltVest Investments, alternative investments financial services, Finance, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for PeckCerativeHorz2.jpg, peck horizontal trends branding bold photography analysis packaging vertical products circle discovery identity color creative exhibition direction P graphics julian research, , successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', "Simple elegant logo for Johns Creek Shirts, printing T's art Apparel screen tshirt summer T t shirts, Apparel, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges", 'Simple elegant logo for MGD, Human Circle MGD dots Resources SRP 3D brown, Human Resources, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Indooroopilly Uniting Church, abstract initials people swirl letter I letter U letter C giving community soft friendly purple blue red, Religious, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Hacker, Douglas, & Company, accountant Hollywood law H filmstrip attorney HDC film, law, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Windmill unused #5, windmill property community shapes quilt blades houses colorful carlsbad homes circle whimsical estate housing real, housing development, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for The Duck Store, track track and field sports athletics tree logo badge, Sports Apparel, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for InGenious Fitness, G Ball Green Blue, Fitness, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for KickCharge Creative, seating safety man driver person figure hardhat S initial sign, Transportation, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Chickasaw Nation, water drop laundry, Commercial Laundry Services, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for NBA Properties, Inc., basketball sports branding team entertainment philadelphia star patriotic, Sports Entertainment, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for North Asheville Tailgate Market Veggie Sub Mark, culinary cheese Initials combo organic serif vegetable radish Farmers eggplant inspirations2023 tailgate food market submark asheville farm kale modern unique sanserif veggie , farmers market, culinary, food, retail, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for A. Diethelm, A Circle Line Switzerland Triangle, Painting Tools and Supplies, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white', 'Simple elegant logo for Grupo Altair Publicidad, Circle Lines Venezuela, Publishing, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white']:
 
 ![val_imgs_grid](./val_imgs_grid.png)
 
@@ -37,7 +37,7 @@ image.save("my_image.png")
 
 These are the key hyperparameters used during training:
 
-* Epochs: 3
+* Epochs: 4
 * Learning rate: 1e-06
 * Batch size: 1
 * Gradient accumulation steps: 1
@@ -45,4 +45,4 @@ These are the key hyperparameters used during training:
 * Mixed-precision: fp16
 
 
-More information on all the CLI arguments and the environment are available on your [`wandb` run page](https://wandb.ai/iam-kai-kai/text2image-fine-tune/runs/k56ze8nm).
+More information on all the CLI arguments and the environment are available on your [`wandb` run page](https://wandb.ai/iam-kai-kai/text2image-fine-tune/runs/z0e685b8).
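The README's usage snippet falls outside the diff context above; only its closing `image.save("my_image.png")` line is visible in the third hunk header. For orientation, here is a minimal sketch of loading the finetuned pipeline with `diffusers`, assuming the Hub repo id `iamkaikai/amazing-logos-v4` and borrowing one of the validation prompts. The fp16/CUDA settings are assumptions, not the README's exact code:

```python
# Sketch only: load the finetuned pipeline from the Hub and render one logo.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "iamkaikai/amazing-logos-v4",
    torch_dtype=torch.float16,  # assumption: a CUDA GPU is available
).to("cuda")

prompt = (
    "Simple elegant logo for InGenious Fitness, G Ball Green Blue, Fitness, "
    "successful vibe, minimalist, thought-provoking, abstract, recognizable, "
    "relatable, sharp, vector art, even edges"
)
image = pipe(prompt).images[0]
image.save("my_image.png")  # matches the line quoted in the hunk header
```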
amazing-logos-v4.ckpt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d14e7f9f8ac0e8b8a5c03ca427227231600272c14abc63425a3514374ca5bd96
+size 3851910203
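Note that the three `+` lines above are a Git LFS pointer, not the checkpoint itself; the actual ~3.85 GB file lives in LFS storage and is resolved at download time. A hedged sketch of fetching the real file with `huggingface_hub` (repo id taken from this commit):

```python
# Sketch: resolve the LFS pointer to the real ~3.85 GB checkpoint file.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="iamkaikai/amazing-logos-v4",
    filename="amazing-logos-v4.ckpt",
)
print(ckpt_path)  # local cache path of the downloaded checkpoint
```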
checkpoint-1200000/optimizer.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:35c90c20c1070026dfcf03c8e840d8ba2d1782bbb5665dc2e436d93b6fd5daab
+size 6876749715
checkpoint-1200000/random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6a3c9425ad8250aaa3a1064f5d8322a327ea791be152d63a3297330b7aefde10
+size 14727
checkpoint-1200000/scaler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:35fdd6a9e94c346880769a6b076e39d83d0ad76b0e78d4f6bc05c3ced87e4213
+size 557
checkpoint-1200000/scheduler.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c35b14e050a8d7886fa6a9eccbdddea8acf8380950b2ec23435707e664fc94e5
+size 563
checkpoint-1200000/unet/config.json ADDED
@@ -0,0 +1,66 @@
+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.20.0.dev0",
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
+  "act_fn": "silu",
+  "addition_embed_type": null,
+  "addition_embed_type_num_heads": 64,
+  "addition_time_embed_dim": null,
+  "attention_head_dim": 8,
+  "attention_type": "default",
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "class_embeddings_concat": false,
+  "conv_in_kernel": 3,
+  "conv_out_kernel": 3,
+  "cross_attention_dim": 768,
+  "cross_attention_norm": null,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "encoder_hid_dim": null,
+  "encoder_hid_dim_type": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_only_cross_attention": null,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock2DCrossAttn",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_attention_heads": null,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "projection_class_embeddings_input_dim": null,
+  "resnet_out_scale_factor": 1.0,
+  "resnet_skip_time_act": false,
+  "resnet_time_scale_shift": "default",
+  "sample_size": 64,
+  "time_cond_proj_dim": null,
+  "time_embedding_act_fn": null,
+  "time_embedding_dim": null,
+  "time_embedding_type": "positional",
+  "timestep_post_act": null,
+  "transformer_layers_per_block": 1,
+  "up_block_types": [
+    "UpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D"
+  ],
+  "upcast_attention": false,
+  "use_linear_projection": false
+}
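The config above fully describes the UNet (standard SD 1.x geometry: 4-channel latents, `cross_attention_dim` of 768, 64x64 latent sample size). A sketch of instantiating just this subcomponent with `diffusers`; loading from the repo root rather than the `checkpoint-1200000` directory is an assumption:

```python
# Sketch: load only the UNet described by the config above.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "iamkaikai/amazing-logos-v4",  # assumption: repo root, not checkpoint-1200000
    subfolder="unet",
)
print(unet.config.cross_attention_dim)  # 768, per the config above
```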
checkpoint-1200000/unet/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:371d6a5067de00b71d70d4077d09fdbba70a6f423085cd1f83807f9cf6f82f32
+size 3438375973
checkpoint-400000/optimizer.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6947b10cb3942d5258cc96f68400d2d66674733771c4fabae3e67cdf2423fa1d
+oid sha256:b72c9153a419abe1063812ab8f9768418d386d614adb81dc766a4d0d99db0d4d
 size 6876749715
checkpoint-400000/random_states_0.pkl CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5c13c4a88b51c926a98f92b3664ccf9386c360472c46e8e0c5d7e9ba690c73f7
+oid sha256:3342a8754f610bf7744be6e8783b322515e68d04d198d16b3163b548520a86b5
 size 14727
checkpoint-400000/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8302ff80be614df12fa711d9a95cab568ef98928f174c5387fbcd5c06e0c038f
+oid sha256:575394dfc035b75e1a186ac3cc3d436bf93f27d1a109f1d8e0c349834f6133b7
 size 557
checkpoint-400000/scheduler.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cdde7fe3c7cddd8812b687b7778ab2c0d0be206dcee60994ae5223ebc5dda448
+oid sha256:b62ed99e252fdcea888d62cdd6f58a2dd9cc4e84976c03d70e6505bdaeb1f252
 size 563
checkpoint-400000/unet/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "UNet2DConditionModel",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "runwayml/stable-diffusion-v1-5",
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "act_fn": "silu",
   "addition_embed_type": null,
   "addition_embed_type_num_heads": 64,
checkpoint-400000/unet/diffusion_pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3f7a38bdb4c763c10697a4fe487c216d51a602ca0f24055465d1730d4fbf930b
+oid sha256:3029dd34de28ad2c0e5aa6b18365afac7704bbe951091fdb34353562fc103130
 size 3438375973
checkpoint-800000/optimizer.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e0aff50c3d2074b767b1b87caf2ca632fb01975a0fc4b3ab8278cf5c941cdf4a
+oid sha256:812de1a8f812dbfb8af750aef4082c58c8780f3152129a436b2d56b68b175175
 size 6876749715
checkpoint-800000/random_states_0.pkl CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3e1b4e80e520a0acbdcdd5ac38c65a3e93efcf2c764ec3123215d23e0656eaf9
+oid sha256:a9aa89cba07bdc72192c00a1f236a7bd4e8207bc482b0d950c8a592da1fd6815
 size 14727
checkpoint-800000/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fe8ba2e5c929c3bc5734a7fe0f46f99b59f95f8d328090eb7bfabba4fd27f171
+oid sha256:331252f87fa8ff0b2c621d32a7512699b4e7d557727e475f4272397bee489206
 size 557
checkpoint-800000/scheduler.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2335ec23b9a30cd63e3b86cdda47ea26e24ddd1f952509c04be05eacf67b6203
+oid sha256:6f7ff67d9eb162b86b693775f482793e541511650ac6fdeb629ff82edbc18037
 size 563
checkpoint-800000/unet/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "UNet2DConditionModel",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "/amazing-logos-v4/checkpoint-400000",
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "act_fn": "silu",
   "addition_embed_type": null,
   "addition_embed_type_num_heads": 64,
checkpoint-800000/unet/diffusion_pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:800c2d3a5c3de493f04b9dedc66b9dae8cc5913f976f39c78af5378ff53356dc
+oid sha256:e6c6f5905767ad5667017f5c77b8c8876d554a207ffa040cbf7e8b358225d9b8
 size 3438375973
convert_diffusers_to_original_stable_diffusion.py ADDED
@@ -0,0 +1,333 @@
+# Script for converting a HF Diffusers saved pipeline to a Stable Diffusion checkpoint.
+# *Only* converts the UNet, VAE, and Text Encoder.
+# Does not convert optimizer state or anything else.
+
+import argparse
+import os.path as osp
+import re
+
+import torch
+from safetensors.torch import load_file, save_file
+
+
+# =================#
+# UNet Conversion #
+# =================#
+
+unet_conversion_map = [
+    # (stable-diffusion, HF Diffusers)
+    ("time_embed.0.weight", "time_embedding.linear_1.weight"),
+    ("time_embed.0.bias", "time_embedding.linear_1.bias"),
+    ("time_embed.2.weight", "time_embedding.linear_2.weight"),
+    ("time_embed.2.bias", "time_embedding.linear_2.bias"),
+    ("input_blocks.0.0.weight", "conv_in.weight"),
+    ("input_blocks.0.0.bias", "conv_in.bias"),
+    ("out.0.weight", "conv_norm_out.weight"),
+    ("out.0.bias", "conv_norm_out.bias"),
+    ("out.2.weight", "conv_out.weight"),
+    ("out.2.bias", "conv_out.bias"),
+]
+
+unet_conversion_map_resnet = [
+    # (stable-diffusion, HF Diffusers)
+    ("in_layers.0", "norm1"),
+    ("in_layers.2", "conv1"),
+    ("out_layers.0", "norm2"),
+    ("out_layers.3", "conv2"),
+    ("emb_layers.1", "time_emb_proj"),
+    ("skip_connection", "conv_shortcut"),
+]
+
+unet_conversion_map_layer = []
+# hardcoded number of downblocks and resnets/attentions...
+# would need smarter logic for other networks.
+for i in range(4):
+    # loop over downblocks/upblocks
+
+    for j in range(2):
+        # loop over resnets/attentions for downblocks
+        hf_down_res_prefix = f"down_blocks.{i}.resnets.{j}."
+        sd_down_res_prefix = f"input_blocks.{3*i + j + 1}.0."
+        unet_conversion_map_layer.append((sd_down_res_prefix, hf_down_res_prefix))
+
+        if i < 3:
+            # no attention layers in down_blocks.3
+            hf_down_atn_prefix = f"down_blocks.{i}.attentions.{j}."
+            sd_down_atn_prefix = f"input_blocks.{3*i + j + 1}.1."
+            unet_conversion_map_layer.append((sd_down_atn_prefix, hf_down_atn_prefix))
+
+    for j in range(3):
+        # loop over resnets/attentions for upblocks
+        hf_up_res_prefix = f"up_blocks.{i}.resnets.{j}."
+        sd_up_res_prefix = f"output_blocks.{3*i + j}.0."
+        unet_conversion_map_layer.append((sd_up_res_prefix, hf_up_res_prefix))
+
+        if i > 0:
+            # no attention layers in up_blocks.0
+            hf_up_atn_prefix = f"up_blocks.{i}.attentions.{j}."
+            sd_up_atn_prefix = f"output_blocks.{3*i + j}.1."
+            unet_conversion_map_layer.append((sd_up_atn_prefix, hf_up_atn_prefix))
+
+    if i < 3:
+        # no downsample in down_blocks.3
+        hf_downsample_prefix = f"down_blocks.{i}.downsamplers.0.conv."
+        sd_downsample_prefix = f"input_blocks.{3*(i+1)}.0.op."
+        unet_conversion_map_layer.append((sd_downsample_prefix, hf_downsample_prefix))
+
+        # no upsample in up_blocks.3
+        hf_upsample_prefix = f"up_blocks.{i}.upsamplers.0."
+        sd_upsample_prefix = f"output_blocks.{3*i + 2}.{1 if i == 0 else 2}."
+        unet_conversion_map_layer.append((sd_upsample_prefix, hf_upsample_prefix))
+
+hf_mid_atn_prefix = "mid_block.attentions.0."
+sd_mid_atn_prefix = "middle_block.1."
+unet_conversion_map_layer.append((sd_mid_atn_prefix, hf_mid_atn_prefix))
+
+for j in range(2):
+    hf_mid_res_prefix = f"mid_block.resnets.{j}."
+    sd_mid_res_prefix = f"middle_block.{2*j}."
+    unet_conversion_map_layer.append((sd_mid_res_prefix, hf_mid_res_prefix))
+
+
+def convert_unet_state_dict(unet_state_dict):
+    # buyer beware: this is a *brittle* function,
+    # and correct output requires that all of these pieces interact in
+    # the exact order in which I have arranged them.
+    mapping = {k: k for k in unet_state_dict.keys()}
+    for sd_name, hf_name in unet_conversion_map:
+        mapping[hf_name] = sd_name
+    for k, v in mapping.items():
+        if "resnets" in k:
+            for sd_part, hf_part in unet_conversion_map_resnet:
+                v = v.replace(hf_part, sd_part)
+            mapping[k] = v
+    for k, v in mapping.items():
+        for sd_part, hf_part in unet_conversion_map_layer:
+            v = v.replace(hf_part, sd_part)
+        mapping[k] = v
+    new_state_dict = {v: unet_state_dict[k] for k, v in mapping.items()}
+    return new_state_dict
+
+
+# ================#
+# VAE Conversion #
+# ================#
+
+vae_conversion_map = [
+    # (stable-diffusion, HF Diffusers)
+    ("nin_shortcut", "conv_shortcut"),
+    ("norm_out", "conv_norm_out"),
+    ("mid.attn_1.", "mid_block.attentions.0."),
+]
+
+for i in range(4):
+    # down_blocks have two resnets
+    for j in range(2):
+        hf_down_prefix = f"encoder.down_blocks.{i}.resnets.{j}."
+        sd_down_prefix = f"encoder.down.{i}.block.{j}."
+        vae_conversion_map.append((sd_down_prefix, hf_down_prefix))
+
+    if i < 3:
+        hf_downsample_prefix = f"down_blocks.{i}.downsamplers.0."
+        sd_downsample_prefix = f"down.{i}.downsample."
+        vae_conversion_map.append((sd_downsample_prefix, hf_downsample_prefix))
+
+        hf_upsample_prefix = f"up_blocks.{i}.upsamplers.0."
+        sd_upsample_prefix = f"up.{3-i}.upsample."
+        vae_conversion_map.append((sd_upsample_prefix, hf_upsample_prefix))
+
+    # up_blocks have three resnets
+    # also, up blocks in hf are numbered in reverse from sd
+    for j in range(3):
+        hf_up_prefix = f"decoder.up_blocks.{i}.resnets.{j}."
+        sd_up_prefix = f"decoder.up.{3-i}.block.{j}."
+        vae_conversion_map.append((sd_up_prefix, hf_up_prefix))
+
+# this part accounts for mid blocks in both the encoder and the decoder
+for i in range(2):
+    hf_mid_res_prefix = f"mid_block.resnets.{i}."
+    sd_mid_res_prefix = f"mid.block_{i+1}."
+    vae_conversion_map.append((sd_mid_res_prefix, hf_mid_res_prefix))
+
+
+vae_conversion_map_attn = [
+    # (stable-diffusion, HF Diffusers)
+    ("norm.", "group_norm."),
+    ("q.", "query."),
+    ("k.", "key."),
+    ("v.", "value."),
+    ("proj_out.", "proj_attn."),
+]
+
+
+def reshape_weight_for_sd(w):
+    # convert HF linear weights to SD conv2d weights
+    return w.reshape(*w.shape, 1, 1)
+
+
+def convert_vae_state_dict(vae_state_dict):
+    mapping = {k: k for k in vae_state_dict.keys()}
+    for k, v in mapping.items():
+        for sd_part, hf_part in vae_conversion_map:
+            v = v.replace(hf_part, sd_part)
+        mapping[k] = v
+    for k, v in mapping.items():
+        if "attentions" in k:
+            for sd_part, hf_part in vae_conversion_map_attn:
+                v = v.replace(hf_part, sd_part)
+            mapping[k] = v
+    new_state_dict = {v: vae_state_dict[k] for k, v in mapping.items()}
+    weights_to_convert = ["q", "k", "v", "proj_out"]
+    for k, v in new_state_dict.items():
+        for weight_name in weights_to_convert:
+            if f"mid.attn_1.{weight_name}.weight" in k:
+                print(f"Reshaping {k} for SD format")
+                new_state_dict[k] = reshape_weight_for_sd(v)
+    return new_state_dict
+
+
+# =========================#
+# Text Encoder Conversion #
+# =========================#
+
+
+textenc_conversion_lst = [
+    # (stable-diffusion, HF Diffusers)
+    ("resblocks.", "text_model.encoder.layers."),
+    ("ln_1", "layer_norm1"),
+    ("ln_2", "layer_norm2"),
+    (".c_fc.", ".fc1."),
+    (".c_proj.", ".fc2."),
+    (".attn", ".self_attn"),
+    ("ln_final.", "transformer.text_model.final_layer_norm."),
+    ("token_embedding.weight", "transformer.text_model.embeddings.token_embedding.weight"),
+    ("positional_embedding", "transformer.text_model.embeddings.position_embedding.weight"),
+]
+protected = {re.escape(x[1]): x[0] for x in textenc_conversion_lst}
+textenc_pattern = re.compile("|".join(protected.keys()))
+
+# Ordering is from https://github.com/pytorch/pytorch/blob/master/test/cpp/api/modules.cpp
+code2idx = {"q": 0, "k": 1, "v": 2}
+
+
+def convert_text_enc_state_dict_v20(text_enc_dict):
+    new_state_dict = {}
+    capture_qkv_weight = {}
+    capture_qkv_bias = {}
+    for k, v in text_enc_dict.items():
+        if (
+            k.endswith(".self_attn.q_proj.weight")
+            or k.endswith(".self_attn.k_proj.weight")
+            or k.endswith(".self_attn.v_proj.weight")
+        ):
+            k_pre = k[: -len(".q_proj.weight")]
+            k_code = k[-len("q_proj.weight")]
+            if k_pre not in capture_qkv_weight:
+                capture_qkv_weight[k_pre] = [None, None, None]
+            capture_qkv_weight[k_pre][code2idx[k_code]] = v
+            continue
+
+        if (
+            k.endswith(".self_attn.q_proj.bias")
+            or k.endswith(".self_attn.k_proj.bias")
+            or k.endswith(".self_attn.v_proj.bias")
+        ):
+            k_pre = k[: -len(".q_proj.bias")]
+            k_code = k[-len("q_proj.bias")]
+            if k_pre not in capture_qkv_bias:
+                capture_qkv_bias[k_pre] = [None, None, None]
+            capture_qkv_bias[k_pre][code2idx[k_code]] = v
+            continue
+
+        relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k)
+        new_state_dict[relabelled_key] = v
+
+    for k_pre, tensors in capture_qkv_weight.items():
+        if None in tensors:
+            raise Exception("CORRUPTED MODEL: one of the q-k-v values for the text encoder was missing")
+        relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k_pre)
+        new_state_dict[relabelled_key + ".in_proj_weight"] = torch.cat(tensors)
+
+    for k_pre, tensors in capture_qkv_bias.items():
+        if None in tensors:
+            raise Exception("CORRUPTED MODEL: one of the q-k-v values for the text encoder was missing")
+        relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k_pre)
+        new_state_dict[relabelled_key + ".in_proj_bias"] = torch.cat(tensors)
+
+    return new_state_dict
+
+
+def convert_text_enc_state_dict(text_enc_dict):
+    return text_enc_dict
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument("--model_path", default=None, type=str, required=True, help="Path to the model to convert.")
+    parser.add_argument("--checkpoint_path", default=None, type=str, required=True, help="Path to the output model.")
+    parser.add_argument("--half", action="store_true", help="Save weights in half precision.")
+    parser.add_argument(
+        "--use_safetensors", action="store_true", help="Save weights using safetensors; the default is ckpt."
+    )
+
+    args = parser.parse_args()
+
+    assert args.model_path is not None, "Must provide a model path!"
+
+    assert args.checkpoint_path is not None, "Must provide a checkpoint path!"
+
+    # Path for safetensors
+    unet_path = osp.join(args.model_path, "unet", "diffusion_pytorch_model.safetensors")
+    vae_path = osp.join(args.model_path, "vae", "diffusion_pytorch_model.safetensors")
+    text_enc_path = osp.join(args.model_path, "text_encoder", "model.safetensors")
+
+    # Load models from safetensors if they exist; otherwise fall back to the PyTorch .bin files
+    if osp.exists(unet_path):
+        unet_state_dict = load_file(unet_path, device="cpu")
+    else:
+        unet_path = osp.join(args.model_path, "unet", "diffusion_pytorch_model.bin")
+        unet_state_dict = torch.load(unet_path, map_location="cpu")
+
+    if osp.exists(vae_path):
+        vae_state_dict = load_file(vae_path, device="cpu")
+    else:
+        vae_path = osp.join(args.model_path, "vae", "diffusion_pytorch_model.bin")
+        vae_state_dict = torch.load(vae_path, map_location="cpu")
+
+    if osp.exists(text_enc_path):
+        text_enc_dict = load_file(text_enc_path, device="cpu")
+    else:
+        text_enc_path = osp.join(args.model_path, "text_encoder", "pytorch_model.bin")
+        text_enc_dict = torch.load(text_enc_path, map_location="cpu")
+
+    # Convert the UNet model
+    unet_state_dict = convert_unet_state_dict(unet_state_dict)
+    unet_state_dict = {"model.diffusion_model." + k: v for k, v in unet_state_dict.items()}
+
+    # Convert the VAE model
+    vae_state_dict = convert_vae_state_dict(vae_state_dict)
+    vae_state_dict = {"first_stage_model." + k: v for k, v in vae_state_dict.items()}
+
+    # Easiest way to identify v2.0 model seems to be that the text encoder (OpenCLIP) is deeper
+    is_v20_model = "text_model.encoder.layers.22.layer_norm2.bias" in text_enc_dict
+
+    if is_v20_model:
+        # Need to add the tag 'transformer' in advance so we can knock it out from the final layer-norm
+        text_enc_dict = {"transformer." + k: v for k, v in text_enc_dict.items()}
+        text_enc_dict = convert_text_enc_state_dict_v20(text_enc_dict)
+        text_enc_dict = {"cond_stage_model.model." + k: v for k, v in text_enc_dict.items()}
+    else:
+        text_enc_dict = convert_text_enc_state_dict(text_enc_dict)
+        text_enc_dict = {"cond_stage_model.transformer." + k: v for k, v in text_enc_dict.items()}
+
+    # Put together new checkpoint
+    state_dict = {**unet_state_dict, **vae_state_dict, **text_enc_dict}
+    if args.half:
+        state_dict = {k: v.half() for k, v in state_dict.items()}
+
+    if args.use_safetensors:
+        save_file(state_dict, args.checkpoint_path)
+    else:
+        state_dict = {"state_dict": state_dict}
+        torch.save(state_dict, args.checkpoint_path)
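The `amazing-logos-v4.ckpt` added earlier in this commit was presumably produced by this script. A sketch of invoking it from a local checkout (the paths are assumptions; the flags are the ones defined in the argparse block above):

```python
# Sketch: convert the Diffusers layout in the current directory to a single
# original-SD checkpoint, mirroring the .ckpt file added in this commit.
import subprocess

subprocess.run(
    [
        "python", "convert_diffusers_to_original_stable_diffusion.py",
        "--model_path", ".",  # assumption: root of the repo checkout
        "--checkpoint_path", "amazing-logos-v4.ckpt",
        # add "--half" for fp16 weights, "--use_safetensors" for .safetensors output
    ],
    check=True,
)
```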
model_index.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "StableDiffusionPipeline",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "runwayml/stable-diffusion-v1-5",
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "feature_extractor": [
     "transformers",
     "CLIPImageProcessor"
safety_checker/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/c9ab35ff5f2c362e9e22fbafe278077e196057f0/safety_checker",
+  "_name_or_path": "/root/.cache/huggingface/hub/models--iamkaikai--amazing-logos-v4/snapshots/3ccca4c043fff382aebc663f3672ed46d73efc1d/safety_checker",
   "architectures": [
     "StableDiffusionSafetyChecker"
   ],
text_encoder/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "runwayml/stable-diffusion-v1-5",
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "architectures": [
     "CLIPTextModel"
   ],
unet/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "UNet2DConditionModel",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "/amazing-logos-v4/checkpoint-800000",
+  "_name_or_path": "/amazing-logos-v4/checkpoint-1200000",
   "act_fn": "silu",
   "addition_embed_type": null,
   "addition_embed_type_num_heads": 64,
unet/diffusion_pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ce54742a1714519695c373f7a11dbb8fec5f23acb84e06f152bb0cb420ff2c21
+oid sha256:dbbd608eafb09301904c8b94772552f8b92ee8df31aa1c26ef67034b89084198
 size 3438375973
vae/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "AutoencoderKL",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "runwayml/stable-diffusion-v1-5",
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "act_fn": "silu",
   "block_out_channels": [
     128,
val_imgs_grid.png CHANGED

Git LFS Details (before)

  • SHA256: 1a0b0cbb60377906f4de402310d5e26437067e448a3fe0f96d60a9c97bbda369
  • Pointer size: 132 Bytes
  • Size of remote file: 4.41 MB

Git LFS Details (after)

  • SHA256: e7016ea7bb64c2510e8dc0c87dee473702ed0a1053e19976048659e1761243db
  • Pointer size: 132 Bytes
  • Size of remote file: 4.72 MB