End of training
- README.md +4 -4
- amazing-logos-v4.ckpt +3 -0
- checkpoint-1200000/optimizer.bin +3 -0
- checkpoint-1200000/random_states_0.pkl +3 -0
- checkpoint-1200000/scaler.pt +3 -0
- checkpoint-1200000/scheduler.bin +3 -0
- checkpoint-1200000/unet/config.json +66 -0
- checkpoint-1200000/unet/diffusion_pytorch_model.bin +3 -0
- checkpoint-400000/optimizer.bin +1 -1
- checkpoint-400000/random_states_0.pkl +1 -1
- checkpoint-400000/scaler.pt +1 -1
- checkpoint-400000/scheduler.bin +1 -1
- checkpoint-400000/unet/config.json +1 -1
- checkpoint-400000/unet/diffusion_pytorch_model.bin +1 -1
- checkpoint-800000/optimizer.bin +1 -1
- checkpoint-800000/random_states_0.pkl +1 -1
- checkpoint-800000/scaler.pt +1 -1
- checkpoint-800000/scheduler.bin +1 -1
- checkpoint-800000/unet/config.json +1 -1
- checkpoint-800000/unet/diffusion_pytorch_model.bin +1 -1
- convert_diffusers_to_original_stable_diffusion.py +333 -0
- model_index.json +1 -1
- safety_checker/config.json +1 -1
- text_encoder/config.json +1 -1
- unet/config.json +1 -1
- unet/diffusion_pytorch_model.bin +1 -1
- vae/config.json +1 -1
- val_imgs_grid.png +2 -2
README.md
CHANGED
@@ -1,7 +1,7 @@

 ---
 license: creativeml-openrail-m
-base_model:
+base_model: iamkaikai/amazing-logos-v4
 datasets:
 - iamkaikai/amazing_logos_v4
 tags:
@@ -14,7 +14,7 @@ inference: true

 # Text-to-image finetuning - iamkaikai/amazing-logos-v4

-This pipeline was finetuned from **
+This pipeline was finetuned from **iamkaikai/amazing-logos-v4** on the **iamkaikai/amazing_logos_v4** dataset. Below are some example images generated with the finetuned pipeline using the following prompts: ['Simple elegant logo for Mandarin Oriental, Fan Hong kong Lines Paper, Hospitality, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white', 'Simple elegant logo for AltVest Investments, alternative investments financial services, Finance, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for PeckCerativeHorz2.jpg, peck horizontal trends branding bold photography analysis packaging vertical products circle discovery identity color creative exhibition direction P graphics julian research, , successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', "Simple elegant logo for Johns Creek Shirts, printing T's art Apparel screen tshirt summer T t shirts, Apparel, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges", 'Simple elegant logo for MGD, Human Circle MGD dots Resources SRP 3D brown, Human Resources, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Indooroopilly Uniting Church, abstract initials people swirl letter I letter U letter C giving community soft friendly purple blue red, Religious, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Hacker, Douglas, & Company, accountant Hollywood law H filmstrip attorney HDC film, law, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Windmill unused #5, windmill property community shapes quilt blades houses colorful carlsbad homes circle whimsical estate housing real, housing development, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for The Duck Store, track track and field sports athletics tree logo badge, Sports Apparel, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for InGenious Fitness, G Ball Green Blue, Fitness, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for KickCharge Creative, seating safety man driver person figure hardhat S initial sign, Transportation, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for Chickasaw Nation, water drop laundry, Commercial Laundry Services, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for NBA Properties, Inc., basketball sports branding team entertainment philadelphia star patriotic, Sports Entertainment, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for North Asheville Tailgate Market Veggie Sub Mark, culinary cheese Initials combo organic serif vegetable radish Farmers eggplant inspirations2023 tailgate food market submark asheville farm kale modern unique sanserif veggie , farmers market, culinary, food, retail, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges', 'Simple elegant logo for A. Diethelm, A Circle Line Switzerland Triangle, Painting Tools and Supplies, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white', 'Simple elegant logo for Grupo Altair Publicidad, Circle Lines Venezuela, Publishing, successful vibe, minimalist, thought-provoking, abstract, recognizable, relatable, sharp, vector art, even edges, black and white']:

 ![val_imgs_grid](./val_imgs_grid.png)

@@ -37,7 +37,7 @@ image.save("my_image.png")

 These are the key hyperparameters used during training:

-* Epochs:
+* Epochs: 4
 * Learning rate: 1e-06
 * Batch size: 1
 * Gradient accumulation steps: 1
@@ -45,4 +45,4 @@ These are the key hyperparameters used during training:
 * Mixed-precision: fp16


-More information on all the CLI arguments and the environment are available on your [`wandb` run page](https://wandb.ai/iam-kai-kai/text2image-fine-tune/runs/
+More information on all the CLI arguments and the environment are available on your [`wandb` run page](https://wandb.ai/iam-kai-kai/text2image-fine-tune/runs/z0e685b8).
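The README's usage section is only visible above through its `image.save("my_image.png")` hunk context, but it amounts to loading the published pipeline from the Hub. A minimal sketch of that usage, assuming a CUDA GPU and a current `diffusers`/`torch` install; the prompt here is illustrative, not one of the card's validation prompts:

```python
# Minimal sketch of the README usage (prompt and sampler settings are illustrative).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "iamkaikai/amazing-logos-v4", torch_dtype=torch.float16
).to("cuda")

prompt = (
    "Simple elegant logo for a coffee roaster, minimalist, abstract, "
    "recognizable, sharp, vector art, even edges, black and white"
)
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("my_image.png")
```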
amazing-logos-v4.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d14e7f9f8ac0e8b8a5c03ca427227231600272c14abc63425a3514374ca5bd96
+size 3851910203
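The new `amazing-logos-v4.ckpt` is a single-file Stable Diffusion checkpoint produced from the Diffusers-format weights (see the conversion script added further down in this commit). A minimal sketch of loading it directly, assuming a diffusers release that ships `StableDiffusionPipeline.from_single_file` (the configs in this repo reference 0.20.0.dev0, which does) and that the file has been downloaded locally:

```python
# Sketch only: "amazing-logos-v4.ckpt" is assumed to be downloaded into the working directory.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "amazing-logos-v4.ckpt", torch_dtype=torch.float16
).to("cuda")

image = pipe("Simple elegant logo for a bakery, minimalist, vector art, even edges").images[0]
image.save("logo.png")
```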
checkpoint-1200000/optimizer.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:35c90c20c1070026dfcf03c8e840d8ba2d1782bbb5665dc2e436d93b6fd5daab
+size 6876749715

checkpoint-1200000/random_states_0.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6a3c9425ad8250aaa3a1064f5d8322a327ea791be152d63a3297330b7aefde10
+size 14727

checkpoint-1200000/scaler.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:35fdd6a9e94c346880769a6b076e39d83d0ad76b0e78d4f6bc05c3ced87e4213
+size 557

checkpoint-1200000/scheduler.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c35b14e050a8d7886fa6a9eccbdddea8acf8380950b2ec23435707e664fc94e5
+size 563
checkpoint-1200000/unet/config.json
ADDED
@@ -0,0 +1,66 @@
+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.20.0.dev0",
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
+  "act_fn": "silu",
+  "addition_embed_type": null,
+  "addition_embed_type_num_heads": 64,
+  "addition_time_embed_dim": null,
+  "attention_head_dim": 8,
+  "attention_type": "default",
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "class_embeddings_concat": false,
+  "conv_in_kernel": 3,
+  "conv_out_kernel": 3,
+  "cross_attention_dim": 768,
+  "cross_attention_norm": null,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "encoder_hid_dim": null,
+  "encoder_hid_dim_type": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_only_cross_attention": null,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock2DCrossAttn",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_attention_heads": null,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "projection_class_embeddings_input_dim": null,
+  "resnet_out_scale_factor": 1.0,
+  "resnet_skip_time_act": false,
+  "resnet_time_scale_shift": "default",
+  "sample_size": 64,
+  "time_cond_proj_dim": null,
+  "time_embedding_act_fn": null,
+  "time_embedding_dim": null,
+  "time_embedding_type": "positional",
+  "timestep_post_act": null,
+  "transformer_layers_per_block": 1,
+  "up_block_types": [
+    "UpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D"
+  ],
+  "upcast_attention": false,
+  "use_linear_projection": false
+}
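Each `checkpoint-*` folder stores the Accelerate training state (optimizer, scheduler, scaler, RNG states) alongside the UNet in Diffusers format, so an intermediate step can be evaluated by swapping only that UNet into the published pipeline. A minimal sketch, assuming the `checkpoint-1200000` folder has been downloaded locally (the path is illustrative):

```python
# Load the UNet saved at step 1,200,000 and plug it into the base pipeline.
# "./checkpoint-1200000" is a hypothetical local copy of this repo folder.
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "./checkpoint-1200000", subfolder="unet", torch_dtype=torch.float16
)
pipe = StableDiffusionPipeline.from_pretrained(
    "iamkaikai/amazing-logos-v4", unet=unet, torch_dtype=torch.float16
).to("cuda")
```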
checkpoint-1200000/unet/diffusion_pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:371d6a5067de00b71d70d4077d09fdbba70a6f423085cd1f83807f9cf6f82f32
+size 3438375973
checkpoint-400000/optimizer.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b72c9153a419abe1063812ab8f9768418d386d614adb81dc766a4d0d99db0d4d
 size 6876749715

checkpoint-400000/random_states_0.pkl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3342a8754f610bf7744be6e8783b322515e68d04d198d16b3163b548520a86b5
 size 14727

checkpoint-400000/scaler.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:575394dfc035b75e1a186ac3cc3d436bf93f27d1a109f1d8e0c349834f6133b7
 size 557

checkpoint-400000/scheduler.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b62ed99e252fdcea888d62cdd6f58a2dd9cc4e84976c03d70e6505bdaeb1f252
 size 563

checkpoint-400000/unet/config.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "UNet2DConditionModel",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "act_fn": "silu",
   "addition_embed_type": null,
   "addition_embed_type_num_heads": 64,

checkpoint-400000/unet/diffusion_pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3029dd34de28ad2c0e5aa6b18365afac7704bbe951091fdb34353562fc103130
 size 3438375973

checkpoint-800000/optimizer.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:812de1a8f812dbfb8af750aef4082c58c8780f3152129a436b2d56b68b175175
 size 6876749715

checkpoint-800000/random_states_0.pkl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:a9aa89cba07bdc72192c00a1f236a7bd4e8207bc482b0d950c8a592da1fd6815
 size 14727

checkpoint-800000/scaler.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:331252f87fa8ff0b2c621d32a7512699b4e7d557727e475f4272397bee489206
 size 557

checkpoint-800000/scheduler.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6f7ff67d9eb162b86b693775f482793e541511650ac6fdeb629ff82edbc18037
 size 563

checkpoint-800000/unet/config.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "UNet2DConditionModel",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "/amazing-logos-v4
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "act_fn": "silu",
   "addition_embed_type": null,
   "addition_embed_type_num_heads": 64,

checkpoint-800000/unet/diffusion_pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:e6c6f5905767ad5667017f5c77b8c8876d554a207ffa040cbf7e8b358225d9b8
 size 3438375973
convert_diffusers_to_original_stable_diffusion.py
ADDED
@@ -0,0 +1,333 @@
+# Script for converting a HF Diffusers saved pipeline to a Stable Diffusion checkpoint.
+# *Only* converts the UNet, VAE, and Text Encoder.
+# Does not convert optimizer state or any other thing.
+
+import argparse
+import os.path as osp
+import re
+
+import torch
+from safetensors.torch import load_file, save_file
+
+
+# =================#
+# UNet Conversion #
+# =================#
+
+unet_conversion_map = [
+    # (stable-diffusion, HF Diffusers)
+    ("time_embed.0.weight", "time_embedding.linear_1.weight"),
+    ("time_embed.0.bias", "time_embedding.linear_1.bias"),
+    ("time_embed.2.weight", "time_embedding.linear_2.weight"),
+    ("time_embed.2.bias", "time_embedding.linear_2.bias"),
+    ("input_blocks.0.0.weight", "conv_in.weight"),
+    ("input_blocks.0.0.bias", "conv_in.bias"),
+    ("out.0.weight", "conv_norm_out.weight"),
+    ("out.0.bias", "conv_norm_out.bias"),
+    ("out.2.weight", "conv_out.weight"),
+    ("out.2.bias", "conv_out.bias"),
+]
+
+unet_conversion_map_resnet = [
+    # (stable-diffusion, HF Diffusers)
+    ("in_layers.0", "norm1"),
+    ("in_layers.2", "conv1"),
+    ("out_layers.0", "norm2"),
+    ("out_layers.3", "conv2"),
+    ("emb_layers.1", "time_emb_proj"),
+    ("skip_connection", "conv_shortcut"),
+]
+
+unet_conversion_map_layer = []
+# hardcoded number of downblocks and resnets/attentions...
+# would need smarter logic for other networks.
+for i in range(4):
+    # loop over downblocks/upblocks
+
+    for j in range(2):
+        # loop over resnets/attentions for downblocks
+        hf_down_res_prefix = f"down_blocks.{i}.resnets.{j}."
+        sd_down_res_prefix = f"input_blocks.{3*i + j + 1}.0."
+        unet_conversion_map_layer.append((sd_down_res_prefix, hf_down_res_prefix))
+
+        if i < 3:
+            # no attention layers in down_blocks.3
+            hf_down_atn_prefix = f"down_blocks.{i}.attentions.{j}."
+            sd_down_atn_prefix = f"input_blocks.{3*i + j + 1}.1."
+            unet_conversion_map_layer.append((sd_down_atn_prefix, hf_down_atn_prefix))
+
+    for j in range(3):
+        # loop over resnets/attentions for upblocks
+        hf_up_res_prefix = f"up_blocks.{i}.resnets.{j}."
+        sd_up_res_prefix = f"output_blocks.{3*i + j}.0."
+        unet_conversion_map_layer.append((sd_up_res_prefix, hf_up_res_prefix))
+
+        if i > 0:
+            # no attention layers in up_blocks.0
+            hf_up_atn_prefix = f"up_blocks.{i}.attentions.{j}."
+            sd_up_atn_prefix = f"output_blocks.{3*i + j}.1."
+            unet_conversion_map_layer.append((sd_up_atn_prefix, hf_up_atn_prefix))
+
+    if i < 3:
+        # no downsample in down_blocks.3
+        hf_downsample_prefix = f"down_blocks.{i}.downsamplers.0.conv."
+        sd_downsample_prefix = f"input_blocks.{3*(i+1)}.0.op."
+        unet_conversion_map_layer.append((sd_downsample_prefix, hf_downsample_prefix))
+
+        # no upsample in up_blocks.3
+        hf_upsample_prefix = f"up_blocks.{i}.upsamplers.0."
+        sd_upsample_prefix = f"output_blocks.{3*i + 2}.{1 if i == 0 else 2}."
+        unet_conversion_map_layer.append((sd_upsample_prefix, hf_upsample_prefix))
+
+hf_mid_atn_prefix = "mid_block.attentions.0."
+sd_mid_atn_prefix = "middle_block.1."
+unet_conversion_map_layer.append((sd_mid_atn_prefix, hf_mid_atn_prefix))
+
+for j in range(2):
+    hf_mid_res_prefix = f"mid_block.resnets.{j}."
+    sd_mid_res_prefix = f"middle_block.{2*j}."
+    unet_conversion_map_layer.append((sd_mid_res_prefix, hf_mid_res_prefix))
+
+
+def convert_unet_state_dict(unet_state_dict):
+    # buyer beware: this is a *brittle* function,
+    # and correct output requires that all of these pieces interact in
+    # the exact order in which I have arranged them.
+    mapping = {k: k for k in unet_state_dict.keys()}
+    for sd_name, hf_name in unet_conversion_map:
+        mapping[hf_name] = sd_name
+    for k, v in mapping.items():
+        if "resnets" in k:
+            for sd_part, hf_part in unet_conversion_map_resnet:
+                v = v.replace(hf_part, sd_part)
+            mapping[k] = v
+    for k, v in mapping.items():
+        for sd_part, hf_part in unet_conversion_map_layer:
+            v = v.replace(hf_part, sd_part)
+        mapping[k] = v
+    new_state_dict = {v: unet_state_dict[k] for k, v in mapping.items()}
+    return new_state_dict
+
+
+# ================#
+# VAE Conversion #
+# ================#
+
+vae_conversion_map = [
+    # (stable-diffusion, HF Diffusers)
+    ("nin_shortcut", "conv_shortcut"),
+    ("norm_out", "conv_norm_out"),
+    ("mid.attn_1.", "mid_block.attentions.0."),
+]
+
+for i in range(4):
+    # down_blocks have two resnets
+    for j in range(2):
+        hf_down_prefix = f"encoder.down_blocks.{i}.resnets.{j}."
+        sd_down_prefix = f"encoder.down.{i}.block.{j}."
+        vae_conversion_map.append((sd_down_prefix, hf_down_prefix))
+
+    if i < 3:
+        hf_downsample_prefix = f"down_blocks.{i}.downsamplers.0."
+        sd_downsample_prefix = f"down.{i}.downsample."
+        vae_conversion_map.append((sd_downsample_prefix, hf_downsample_prefix))
+
+        hf_upsample_prefix = f"up_blocks.{i}.upsamplers.0."
+        sd_upsample_prefix = f"up.{3-i}.upsample."
+        vae_conversion_map.append((sd_upsample_prefix, hf_upsample_prefix))
+
+    # up_blocks have three resnets
+    # also, up blocks in hf are numbered in reverse from sd
+    for j in range(3):
+        hf_up_prefix = f"decoder.up_blocks.{i}.resnets.{j}."
+        sd_up_prefix = f"decoder.up.{3-i}.block.{j}."
+        vae_conversion_map.append((sd_up_prefix, hf_up_prefix))
+
+# this part accounts for mid blocks in both the encoder and the decoder
+for i in range(2):
+    hf_mid_res_prefix = f"mid_block.resnets.{i}."
+    sd_mid_res_prefix = f"mid.block_{i+1}."
+    vae_conversion_map.append((sd_mid_res_prefix, hf_mid_res_prefix))
+
+
+vae_conversion_map_attn = [
+    # (stable-diffusion, HF Diffusers)
+    ("norm.", "group_norm."),
+    ("q.", "query."),
+    ("k.", "key."),
+    ("v.", "value."),
+    ("proj_out.", "proj_attn."),
+]
+
+
+def reshape_weight_for_sd(w):
+    # convert HF linear weights to SD conv2d weights
+    return w.reshape(*w.shape, 1, 1)
+
+
+def convert_vae_state_dict(vae_state_dict):
+    mapping = {k: k for k in vae_state_dict.keys()}
+    for k, v in mapping.items():
+        for sd_part, hf_part in vae_conversion_map:
+            v = v.replace(hf_part, sd_part)
+        mapping[k] = v
+    for k, v in mapping.items():
+        if "attentions" in k:
+            for sd_part, hf_part in vae_conversion_map_attn:
+                v = v.replace(hf_part, sd_part)
+            mapping[k] = v
+    new_state_dict = {v: vae_state_dict[k] for k, v in mapping.items()}
+    weights_to_convert = ["q", "k", "v", "proj_out"]
+    for k, v in new_state_dict.items():
+        for weight_name in weights_to_convert:
+            if f"mid.attn_1.{weight_name}.weight" in k:
+                print(f"Reshaping {k} for SD format")
+                new_state_dict[k] = reshape_weight_for_sd(v)
+    return new_state_dict
+
+
+# =========================#
+# Text Encoder Conversion #
+# =========================#
+
+
+textenc_conversion_lst = [
+    # (stable-diffusion, HF Diffusers)
+    ("resblocks.", "text_model.encoder.layers."),
+    ("ln_1", "layer_norm1"),
+    ("ln_2", "layer_norm2"),
+    (".c_fc.", ".fc1."),
+    (".c_proj.", ".fc2."),
+    (".attn", ".self_attn"),
+    ("ln_final.", "transformer.text_model.final_layer_norm."),
+    ("token_embedding.weight", "transformer.text_model.embeddings.token_embedding.weight"),
+    ("positional_embedding", "transformer.text_model.embeddings.position_embedding.weight"),
+]
+protected = {re.escape(x[1]): x[0] for x in textenc_conversion_lst}
+textenc_pattern = re.compile("|".join(protected.keys()))
+
+# Ordering is from https://github.com/pytorch/pytorch/blob/master/test/cpp/api/modules.cpp
+code2idx = {"q": 0, "k": 1, "v": 2}
+
+
+def convert_text_enc_state_dict_v20(text_enc_dict):
+    new_state_dict = {}
+    capture_qkv_weight = {}
+    capture_qkv_bias = {}
+    for k, v in text_enc_dict.items():
+        if (
+            k.endswith(".self_attn.q_proj.weight")
+            or k.endswith(".self_attn.k_proj.weight")
+            or k.endswith(".self_attn.v_proj.weight")
+        ):
+            k_pre = k[: -len(".q_proj.weight")]
+            k_code = k[-len("q_proj.weight")]
+            if k_pre not in capture_qkv_weight:
+                capture_qkv_weight[k_pre] = [None, None, None]
+            capture_qkv_weight[k_pre][code2idx[k_code]] = v
+            continue
+
+        if (
+            k.endswith(".self_attn.q_proj.bias")
+            or k.endswith(".self_attn.k_proj.bias")
+            or k.endswith(".self_attn.v_proj.bias")
+        ):
+            k_pre = k[: -len(".q_proj.bias")]
+            k_code = k[-len("q_proj.bias")]
+            if k_pre not in capture_qkv_bias:
+                capture_qkv_bias[k_pre] = [None, None, None]
+            capture_qkv_bias[k_pre][code2idx[k_code]] = v
+            continue
+
+        relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k)
+        new_state_dict[relabelled_key] = v
+
+    for k_pre, tensors in capture_qkv_weight.items():
+        if None in tensors:
+            raise Exception("CORRUPTED MODEL: one of the q-k-v values for the text encoder was missing")
+        relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k_pre)
+        new_state_dict[relabelled_key + ".in_proj_weight"] = torch.cat(tensors)
+
+    for k_pre, tensors in capture_qkv_bias.items():
+        if None in tensors:
+            raise Exception("CORRUPTED MODEL: one of the q-k-v values for the text encoder was missing")
+        relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k_pre)
+        new_state_dict[relabelled_key + ".in_proj_bias"] = torch.cat(tensors)
+
+    return new_state_dict
+
+
+def convert_text_enc_state_dict(text_enc_dict):
+    return text_enc_dict
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument("--model_path", default=None, type=str, required=True, help="Path to the model to convert.")
+    parser.add_argument("--checkpoint_path", default=None, type=str, required=True, help="Path to the output model.")
+    parser.add_argument("--half", action="store_true", help="Save weights in half precision.")
+    parser.add_argument(
+        "--use_safetensors", action="store_true", help="Save weights use safetensors, default is ckpt."
+    )
+
+    args = parser.parse_args()
+
+    assert args.model_path is not None, "Must provide a model path!"
+
+    assert args.checkpoint_path is not None, "Must provide a checkpoint path!"
+
+    # Path for safetensors
+    unet_path = osp.join(args.model_path, "unet", "diffusion_pytorch_model.safetensors")
+    vae_path = osp.join(args.model_path, "vae", "diffusion_pytorch_model.safetensors")
+    text_enc_path = osp.join(args.model_path, "text_encoder", "model.safetensors")
+
+    # Load models from safetensors if it exists, if it doesn't pytorch
+    if osp.exists(unet_path):
+        unet_state_dict = load_file(unet_path, device="cpu")
+    else:
+        unet_path = osp.join(args.model_path, "unet", "diffusion_pytorch_model.bin")
+        unet_state_dict = torch.load(unet_path, map_location="cpu")
+
+    if osp.exists(vae_path):
+        vae_state_dict = load_file(vae_path, device="cpu")
+    else:
+        vae_path = osp.join(args.model_path, "vae", "diffusion_pytorch_model.bin")
+        vae_state_dict = torch.load(vae_path, map_location="cpu")
+
+    if osp.exists(text_enc_path):
+        text_enc_dict = load_file(text_enc_path, device="cpu")
+    else:
+        text_enc_path = osp.join(args.model_path, "text_encoder", "pytorch_model.bin")
+        text_enc_dict = torch.load(text_enc_path, map_location="cpu")
+
+    # Convert the UNet model
+    unet_state_dict = convert_unet_state_dict(unet_state_dict)
+    unet_state_dict = {"model.diffusion_model." + k: v for k, v in unet_state_dict.items()}
+
+    # Convert the VAE model
+    vae_state_dict = convert_vae_state_dict(vae_state_dict)
+    vae_state_dict = {"first_stage_model." + k: v for k, v in vae_state_dict.items()}
+
+    # Easiest way to identify v2.0 model seems to be that the text encoder (OpenCLIP) is deeper
+    is_v20_model = "text_model.encoder.layers.22.layer_norm2.bias" in text_enc_dict
+
+    if is_v20_model:
+        # Need to add the tag 'transformer' in advance so we can knock it out from the final layer-norm
+        text_enc_dict = {"transformer." + k: v for k, v in text_enc_dict.items()}
+        text_enc_dict = convert_text_enc_state_dict_v20(text_enc_dict)
+        text_enc_dict = {"cond_stage_model.model." + k: v for k, v in text_enc_dict.items()}
+    else:
+        text_enc_dict = convert_text_enc_state_dict(text_enc_dict)
+        text_enc_dict = {"cond_stage_model.transformer." + k: v for k, v in text_enc_dict.items()}
+
+    # Put together new checkpoint
+    state_dict = {**unet_state_dict, **vae_state_dict, **text_enc_dict}
+    if args.half:
+        state_dict = {k: v.half() for k, v in state_dict.items()}
+
+    if args.use_safetensors:
+        save_file(state_dict, args.checkpoint_path)
+    else:
+        state_dict = {"state_dict": state_dict}
+        torch.save(state_dict, args.checkpoint_path)
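This is the stock Diffusers-to-original-SD conversion helper; it only maps the UNet, VAE, and text-encoder state dicts into the single-file layout. Judging by its argparse block, the `amazing-logos-v4.ckpt` added above would have been produced with an invocation along the lines of `python convert_diffusers_to_original_stable_diffusion.py --model_path . --checkpoint_path amazing-logos-v4.ckpt`, run from the repository root; whether `--half` or `--use_safetensors` was passed is not recorded in the commit.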
model_index.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "StableDiffusionPipeline",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "feature_extractor": [
     "transformers",
     "CLIPImageProcessor"

safety_checker/config.json
CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "/root/.cache/huggingface/hub/models--
+  "_name_or_path": "/root/.cache/huggingface/hub/models--iamkaikai--amazing-logos-v4/snapshots/3ccca4c043fff382aebc663f3672ed46d73efc1d/safety_checker",
   "architectures": [
     "StableDiffusionSafetyChecker"
   ],

text_encoder/config.json
CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "architectures": [
     "CLIPTextModel"
   ],

unet/config.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "UNet2DConditionModel",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "/amazing-logos-v4/checkpoint-
+  "_name_or_path": "/amazing-logos-v4/checkpoint-1200000",
   "act_fn": "silu",
   "addition_embed_type": null,
   "addition_embed_type_num_heads": 64,

unet/diffusion_pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:dbbd608eafb09301904c8b94772552f8b92ee8df31aa1c26ef67034b89084198
 size 3438375973

vae/config.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "AutoencoderKL",
   "_diffusers_version": "0.20.0.dev0",
-  "_name_or_path": "
+  "_name_or_path": "iamkaikai/amazing-logos-v4",
   "act_fn": "silu",
   "block_out_channels": [
     128,

val_imgs_grid.png
CHANGED
(binary image updated; stored with Git LFS)