## Calculation method
- [normal](#normal)
- [cosineA/cosineB](#cosine)
- [trainDifference](#train)
- [smoothAdd](#smooth)
- [extract](#extract)
- [tensor](#tensor)
## normal
### _Available modes :_ All
Normal calculation method. Can be used in all modes.
## cosineA/cosineB
### _Available modes :_ weight sum
The two models are compared using cosine similarity, and the merge is calculated around the set ratio with the aim of eliminating loss due to merging. See the links below for further details.
https://github.com/hako-mikan/sd-webui-supermerger/issues/33 https://github.com/recoilme/losslessmix
The original simple weight mode is the most basic method and works by linearly interpolating between the two models based on a given weight alpha. At alpha = 0, the output is the first model (model A), and at alpha = 1, the output is the second model (model B). Any other value of alpha results in a weighted average of the two models.
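The linear interpolation described above can be sketched as follows. This is a minimal illustration on plain Python lists, not SuperMerger's code; the function name `weight_sum` and the toy values are assumptions made for the example.

```python
def weight_sum(a, b, alpha):
    """Linearly interpolate two equally sized weight lists.

    alpha = 0 returns model A unchanged, alpha = 1 returns model B.
    """
    if len(a) != len(b):
        raise ValueError("both models must have the same number of weights")
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


# Toy stand-ins for one tensor of model A and the matching tensor of model B.
model_a = [0.0, 2.0, -4.0]
model_b = [1.0, 4.0, 4.0]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, weight_sum(model_a, model_b, alpha))
```

In a real checkpoint the same operation would be applied to every tensor that exists in both models.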
- Original merge results between AnythingV3 and FeverDream model
`charming girl mid-shot. scenery-beautiful majestic`
![MergeStandard](https://user-images.githubusercontent.com/6239068/232734670-958a6db3-1022-49ed-af73-f777223e71e6.png)
One key advantage of the cosine methods over the original simple weight mode is that they take into account the structural similarity between the two models, which can lead to better results when the two models are similar but not identical. Another advantage of the cosine methods is that they can help prevent overfitting and improve generalization by limiting the amount of detail from one model that is incorporated into the other.
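For reference, cosine similarity between two flattened weight tensors can be computed as in the sketch below. It only shows the similarity measure itself; how SuperMerger turns that value into a per-tensor merge ratio is not reproduced here, and the function name `cosine_similarity` is an assumption made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two flattened weight vectors.

    Returns 0.0 when either vector is all zeros, to avoid dividing by zero.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Same direction, different magnitude: similarity stays close to 1.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Orthogonal vectors: similarity is 0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the measure depends only on direction, two tensors that differ mainly in scale are treated as similar, which is the kind of structural similarity the paragraph above refers to.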
**In the case of CosineA**, we normalize the vectors of the first model (model A) before merging, so the resulting merged model will favor the structure of the first model while incorporating details from the second model. This is because we are essentially aligning the direction of the first model's vectors with the direction of the corresponding vectors in the second model.
- CosineA merge results between AnythingV3 and FeverDream model
_Note structure-wise the pose direction/flow and face area_
![MergeCosineA](https://user-images.githubusercontent.com/6239068/232741979-f40450ab-6006-47e5-ae00-cf5e89b7ac09.png)
_Detail-wise for example note how above and below, in all cases there's more blur preserved for the background compared to foreground, instead of the linear difference in the original merge._
**On the other hand, in CosineB**, we normalize the vectors of the second model (model B) before merging, so the resulting merged model will favor the structure of the second model while incorporating details from the first model. This is because we are aligning the direction of the second model's vectors with the direction of the corresponding vectors in the first model.
- CosineB merge results between AnythingV3 and FeverDream model
_Note structure-wise the pose direction/flow and face area, and how in the background it tried to keep the form more from the right too_
![MergeCosineB](https://user-images.githubusercontent.com/6239068/232744751-20786eff-a654-468c-93e7-c19db5829c69.png)
**In summary, the choice between CosineA and CosineB depends on which model's structure you want to prioritize in the resulting merged model. If you want to prioritize the structure of the first model, use CosineA. If you want to prioritize the structure of the second model, use CosineB.**
Note also that the second model acts more as the 'reference point' for the merge (compare the results at alpha 1 with the changes at alpha 0), so swapping the order of the models also changes the end result and is worth trying when looking for your desired output.
- CosineA merge results between FeverDream and AnythingV3 model
![MergeOppositeCosineA](https://user-images.githubusercontent.com/6239068/232741034-ce3c9739-7f5a-4a7d-b979-fec4ac7d9b71.png)
## trainDifference
### _Available modes :_ Add difference
At its simplest, this method can be thought of as a 'super Lora' for permanent merges:
instead of adding the calculated difference between models (B) and (C) to model (A),
it 'trains' that difference as if it were being finetuned relative to model (A).
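As a point of comparison, the regular add difference that this method replaces can be sketched as below. This shows only the baseline (A + alpha * (B - C)); the way trainDifference rescales the difference is not reproduced here, and the function name `add_difference` and the toy values are assumptions made for the example.

```python
def add_difference(a, b, c, alpha=1.0):
    """Regular add difference: add the (B - C) difference to model A."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all three models must have the same number of weights")
    return [x + alpha * (y - z) for x, y, z in zip(a, b, c)]


# Toy stand-ins for one tensor of each model.
model_a = [1.0, 1.0]   # model to extend
model_b = [3.0, 0.0]   # finetuned model
model_c = [2.0, 0.5]   # base model that B was trained from

print(add_difference(model_a, model_b, model_c))        # full difference
print(add_difference(model_a, model_b, model_c, 0.0))   # model A unchanged
```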
### Comparisons
- **Regular addDifference vs trainDifference**
With [rev animated](https://civitai.com/models/7371/rev-animated) and [isometric-future](https://civitai.com/models/10063/isometric-future)
*"IsometricFuture, garden, IsometricFuture"*
**Generated with addDifference ('rev animated')+('isometric future'-'sdv1.5')**
![IsometricA](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/bde0c09b-4cc4-447b-acf9-da175192b546)
**Generated with trainDifference ('rev animated')+('isometric future'-'sdv1.5')**
![IsometricB](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/afb053aa-2ace-4fe5-8b62-e7e29ae5edaf)
With [rev animated](https://civitai.com/models/7371/rev-animated) and [anything v3](https://civitai.com/models/66?modelVersionId=75)
*"man smiling"*
**Generated with 'rev animated'**
![FaceA](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/796400d6-b740-466b-beae-0ffd70276850)
**Generated with addDifference ('rev animated')+('anything v3'-'sdv1.4')**
![FaceB](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/d44d9a08-427c-47c9-9cde-d2750e880a54)
**Generated with trainDifference ('rev animated')+('anything v3'-'sdv1.4')**
![FaceC](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/872259c4-af29-4624-ac65-a820d5edfd33)
- **Lora vs trainDifference**
Loras obviously aren't invalidated by this, given their utility, plug-and-play flexibility, etc.
However, it's often discussed how some models 'don't work well' with Loras, and there are models like 'AnyLoRA', which its author on civitai developed to train their Loras on for this reason. You can see how to take advantage of this with trainDifference [here](#LoramergingfortrainDifference).
For this comparison we'll use [FeverDream](https://civitai.com/models/26396?modelVersionId=32375) (a model definitely further away from the 'compatibility' an anime Lora would require) and [Thicker Lines Anime Style LoRA Mix](https://civitai.com/models/13910?modelVersionId=16368), whose author provided both a Lora version and a version pre-merged with [Anything V4.5](https://huggingface.co/andite/anything-v4.0/blob/main/anything-v4.5-pruned.safetensors). This allows a direct comparison between the Lora applied to 'FeverDream' and trainDifference ('Feverdream')+('Thicker Lines'-'Anythingv4.5').
We'll compare at 1/1.2/1.4/1.6 lora/merge strength, as it's easiest to see at the extremes how the Lora pulls apart compared to the train difference.
![CompareLora](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/6ad4c28f-c282-4005-8542-2cb697790b65)
### Usage guidance
#### Possibilities and general usage
Expand a model with new concepts, or reinforce existing concepts (and quality output), instead of mixing
Sci-Fi Diffusion as an example https://civitai.com/models/4404?modelVersionId=4980 was trained on general sci-fi images.
You don't have to merge/mix into it anymore: you can practically train Sci-Fi into your model by trainDifferencing it against SDv1.5, so you aren't limited to generating an approximated Lora difference for expansion.
As another example, you could cosine similarity merge [Analog Diffusion](https://civitai.com/models/1265/analog-diffusion) and [Timeless Diffusion](https://civitai.com/models/3557?modelVersionId=3936), which are similar in nature (and you wouldn't want to reinforce the negative elements of the photographs too much), then trainDifference [Modelshoot Style](https://civitai.com/models/2147/modelshoot-style), which focuses on medium body shots, on top of that, with a stronger photography foundation built by the previous merge.
The potential for models, being able to now in a sense 'continue training' with broad models like [Surreality](https://civitai.com/models/21666?modelVersionId=25854) and [seek.art MEGA](https://civitai.com/models/1315?modelVersionId=22808) that gratefully lifted their license restrictions with V2, is now much larger than when it was limited to mixing them into models (though of course the utility for styling with different weighting of ins/outs etc all still has its value, and everything depends on your goal).
Also, models like [RPG](https://civitai.com/models/1116?modelVersionId=7133), whose v5 sounds like it is being developed from SDv1.5 rather than from a merge, can with this be trained into models without the heavy NSFW/female bias that many carry from F222/etc merges.
Direction of trainDifference and style of the difference matters
It is harder for a model to learn to be realistic, than to be stylistic.
For example if building a model that intends to eventually be stylistic, consider having multiple model branches based on similar styles, to eventually trainDifference the stylistic branch onto the most realistic branch.
Generally you should merge anime/cartoon > stylish > realistic, if the styles differ.
trainDifference is not always the best solution
Sometimes, depending on the type/scope of the difference, a cosine similarity merge can provide better results (if the differences aren't from SDv1.5 already, trainDifference both onto SDv1.5, then cosine similarity merge them from there before you trainDifference the result back onto your working model).
Also, sometimes if the material is similar but large and varied, the best result can come from using trainDifference in both directions, and then weight-sum merge between those 2 to find the best result, like [waifu diffusion](https://huggingface.co/hakurei/waifu-diffusion-v1-3) and [Acertainty](https://huggingface.co/JosephusCheung/ACertainty).
Gain the benefits of a trained model anywhere
Models like [knollingcase](https://civitai.com/models/1092?modelVersionId=1093) and [Bubble Toys](https://civitai.com/models/23945/bubble-toys-the-model) are cool, but their effort has been limited by the framework they were trained on. Now you can trainDifference them onto any of the newer models that people have developed.
Additionally, some people who made checkpoints instead of Loras mentioned trying Lora first without getting valuable results. With trainDifference, their work can still be applied onto any model.
#### Limitations and what to avoid/problems and solutions
Knowing and having access to the origin of the model pre-training is required
A lot of models have some mix of SDv1.4 now, and the trainDifference merge is accurate enough for this to matter. For example, if you tried to train 'rev animated' onto 'Sci-fi Diffusion' with SDv1.5 as model (C), the merge would negatively affect the output (the 'training' would be offset/distorted), because the origin of 'rev animated' is an unknown ratio between SDv1.4 and SDv1.5 (and a mix of individual in/out weights too). You could, however, trainDifference 'Sci-fi Diffusion' onto 'rev animated', because 'Sci-fi Diffusion' was trained on SDv1.5.
After enough time / with similar materials, 'burning'/'over training' can eventually occur
You can 'pull back' the model at this point by cosine similarity merging it with SDv1.5, which helps ground it while keeping more qualities from the training.
After enough merges, the 'clip/comprehension' can become heavy, negatively affecting simple prompts
For example complex prompts may still look good, but 'female portrait, blue eyes' could spill the 'blue' concept too much.
To help avoid this, as you make trainDifference merges of large scope, you can use [model toolkit](https://github.com/arenasys/stable-diffusion-webui-model-toolkit) to manipulate the clip.
Load the final model into that extension, and create 2 different models. 'clipA' importing the clip of your base model, 'clipB' importing the clip of what you trained into it, and use a regular weightsum merge to find the best output/comprehension between those 2 models, to soften out the clip as you expand your model.
Sometimes weightsum merging the final model with a version of it using the SDv1.5 clip can be better than mixing between clipA and clipB.
#### Practical demonstration
- One of the simpler ways you can take advantage of this is for more natural/accurate Lora styling of a different model.
In this we'll use [BreakDomainAnime](https://civitai.com/models/72675/breakdomainanime) and [Mika Pikazo Style LoRA](https://civitai.com/models/8479/mika-pikazo-style-lora) that was trained on [AnyLora](https://civitai.com/models/23900?modelVersionId=28562)
*"1girl, smiling, scenic background BREAK [mika-pikazo]"*
**Generated with 'BreakDomainAnime'**
![LoraDifferenceA](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/cbcbf1bf-c58b-4e70-b8af-1baf6d4102ce)
**Generated with 'BreakDomainAnime' using 'Mika Pikazo Style LoRA' at 1 strength**
![LoraDifferenceB](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/332a61f1-d1d1-4c84-9639-f56c94e556db)
Now instead of having the Lora apply over 'BreakDomainAnime', we'll use trainDifference to get a better alignment.
Using the Lora tab of SuperMerger, Merge to Checkpoint 'Mika Pikazo Style LoRA' onto "anyloraCheckpoint_novaeFp16" (the checkpoint they describe as the one to use for training, so assumed to be what they use for their training) as "anyloraCheckpoint_mika_pikazo".
Then **trainDifference ('BreakDomainAnime')+('Desired Lora combination merged onto AnyLora, in this case anyloraCheckpoint_mika_pikazo'-'AnyLora') to generate**
![LoraDifferenceC](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/50457b98-b2ad-4e28-a048-b023a86a2530)
Another more immediately visible comparison between Lora/the above technique, for a trainDifference of a background Lora that was originally trained on an anime model moved to a realistic model.
*"An eco-friendly residential building covered in vertical gardens in an urban setting"*
![LoraTraindifferenceBackgroundExamplepng](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/c40c1833-f166-49b5-abfe-56a28780a736)
## smoothAdd
### _Available modes :_ Add difference
A method of add difference that mixes the benefits of Median and Gaussian filters, to add model differences in a smoother way trying to avoid the negative 'burning' effect that can be seen when adding too many models this way. This also achieves more than just simply adding the difference at a lower value.
- The starting point for reference
![Untitled-1](https://user-images.githubusercontent.com/6239068/232780130-19caa53a-a767-4ee1-80a7-dc37ad948322.png)
- Adding a collection of models on top of it, each with a value of 1
`The burn here is very obvious`
![Untitled-2](https://user-images.githubusercontent.com/6239068/232781113-3e2de251-711d-463a-82c9-a080be47e180.png)
- Adding a collection of models on top of it, each with a value of 0.5
`Still not an outcome I would accept, especially you can see with the bird`
![Untitled-3](https://user-images.githubusercontent.com/6239068/232785787-cfde6967-fc86-47e8-b208-3aa8f5f46c40.png)
The functionality and result of just the Median filter
- Reduces noise in the difference by replacing each value with the median of the neighboring values.
- Preserves edges and structures in the difference, which is helpful when you want to transfer the learning related to object shapes and boundaries.
- Non-linear filtering, which means it can better preserve the important features in the difference while reducing noise.
![Untitled-5](https://user-images.githubusercontent.com/6239068/232785599-1e40ee9f-43de-4721-bb5f-0c21485fd8d3.png)
The functionality and result of just the Gaussian filter
- Smooths the difference by applying a Gaussian kernel, which reduces high-frequency noise and retains the low-frequency components.
- The level of smoothing can be controlled by the sigma parameter, allowing you to experiment with different levels of smoothing.
- Linear filtering, which means it can better preserve the global structure in the difference while reducing noise.
![Untitled-4](https://user-images.githubusercontent.com/6239068/232785723-aecce7bb-1bc6-4731-a879-f8a7e4dc5a0c.png)
- The final result when instead using the combination of Median and Gaussian filters
_Note also how, compared with either the Median or Gaussian filter individually, the top left of the man's hair in the top right image doesn't get 'stuck' when combining them here, achieving the best result overall_
![Untitled-6](https://user-images.githubusercontent.com/6239068/232786207-f7f41c55-939e-46a1-ab24-2e6d885f65f9.png)
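The two filters described above can be sketched on a one-dimensional toy difference as follows. This is an illustration only: the filter order (median first, then Gaussian), the window radius, the sigma value and the function names are all assumptions made for the example, not SuperMerger's actual settings.

```python
import math
from statistics import median


def median_filter(values, radius=1):
    """Replace each value with the median of its neighbourhood (edges clamped)."""
    n = len(values)
    out = []
    for i in range(n):
        window = [values[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(median(window))
    return out


def gaussian_filter(values, sigma=1.0, radius=2):
    """Smooth values with a normalised Gaussian kernel (edges clamped)."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(values)
    return [sum(w * values[min(max(i + k, 0), n - 1)]
                for k, w in zip(offsets, kernel))
            for i in range(n)]


# Toy (B - C) difference containing one large outlier.
diff = [0.0, 0.1, 5.0, 0.1, 0.0, 0.1, 0.0]

despiked = median_filter(diff)                  # outlier removed, shape kept
smoothed = gaussian_filter(despiked, sigma=1.0)  # remaining noise softened
print(despiked)
print(smoothed)
```

The median pass removes the isolated spike without blurring its neighbours, and the Gaussian pass then softens what is left; the smoothed difference would be added to model (A) in place of the raw one.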
>**TIP**
>Sometimes you may want to use this smooth Add difference as an alternative to the regular, even without the risk of burning.
>In these cases you could increase the Alpha up to 2, as smooth Add at 1 is a lower impact change individually than regular Add, but this of course depends on your desired outcome.
## extract
### _Available modes :_ Add difference
This method is designed to extract **either similar or dissimilar features** from *two differential models* that are built upon a common base model.
### Using Three Full-Parameter Models
In this setup, we use a base model (**Model A**) along with two derived models (**Model B** and **Model C**), both developed from **Model A**. *The two differential models* in focus are "**Model B - Model A**" and "**Model C - Model A**". Both derivatives share **Model A** as their common ancestor, ideally the most recent one, to reduce false similarities.
### Using Two LoRA Networks
In this setup, we directly use *the two differential models*: **LoRA B** and **LoRA C**. Both models are assumed to be trained on a common base model, similar to **Model A** in the three-model setup. However, if **LoRA B** and **LoRA C** derive from different base models, the results may be unpredictable due to underlying model discrepancies.
### Key Parameters
- **alpha (*α*)**: Controls the focus of feature extraction between **Model (LoRA)B** (***α* = 0**) and **Model (LoRA)C** (***α* = 1**).
- **beta (*β*)**: Controls the nature of feature extraction, with ***β* = 0** for **similar features** and ***β* = 1** for **dissimilar features**.
- **gamma (*γ*)**: Adjusts the selectivity in identifying feature (dis)similarity. **High *γ* (e.g., *γ* = 10)** emphasizes recognizing *more similar* features as similar. Conversely, **low *γ* (e.g., *γ* = 0.1)** emphasizes recognizing *more dissimilar* features as dissimilar.
**gamma** can be set in the option items of model merging as option(gamma), and in the case of LoRA, it can be set as gamma(smooth).
### Usage Scenarios
- ***α* = 0, *β* = 0**: Extracts features in **Model B** that are similar to those in **Model C**.
- ***α* = 0, *β* = 0.5**: Represents a balanced extraction between similarity and dissimilarity for features from:
- **Full-parameter models**: $\frac{\text{A} + \text{lerp}(\text{B}, \text{C}, \alpha)}{2}$
- **LoRA networks**: $\frac{\text{lerp}(\text{B}, \text{C}, \alpha)}{2}$
- ***α* = 0, *β* = 1**: Extracts features in **Model B** that are dissimilar to those in **Model C**.
- ***α* = 1**: Reverses the focus between **Model B** and **Model C**.
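The balanced case (*β* = 0.5) from the list above can be written out directly. The sketch below reproduces only those two formulas on plain Python lists; it does not implement the similarity-based extraction used for other *β* values, and the function names are assumptions made for the example.

```python
def lerp(b, c, alpha):
    """Linear interpolation: alpha = 0 gives b, alpha = 1 gives c."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(b, c)]


def balanced_full_model(a, b, c, alpha):
    """beta = 0.5 with full-parameter models: (A + lerp(B, C, alpha)) / 2."""
    return [(x + y) / 2.0 for x, y in zip(a, lerp(b, c, alpha))]


def balanced_lora(b, c, alpha):
    """beta = 0.5 with LoRA networks: lerp(B, C, alpha) / 2."""
    return [y / 2.0 for y in lerp(b, c, alpha)]


# Toy stand-ins for one tensor of each model.
model_a = [1.0, 1.0]
model_b = [3.0, 5.0]
model_c = [7.0, 9.0]

print(balanced_full_model(model_a, model_b, model_c, 0.0))  # focus on B
print(balanced_full_model(model_a, model_b, model_c, 1.0))  # focus on C
print(balanced_lora([2.0, 4.0], [6.0, 8.0], 0.0))
```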
## tensor
### _Available modes :_ weight sum only
- This is an Elemental merge that goes beyond Elemental merging.
As you know, each elemental tensor determines the features of an image in U-NET, and in normal merging, the values of each tensor are multiplied by a ratio and added together as shown below (normal). In the tensor method, the tensors are instead combined by partitioning each tensor according to the ratio, so that part of it comes from each model, as shown in the figure below (tensor).
![](https://github.com/hako-mikan/sd-webui-supermerger/blob/images/tensor.jpg)
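The contrast between the two approaches can be sketched on a small 2-D tensor stored as a list of rows. This is one plausible reading of the figure, not SuperMerger's code: that the split runs along the first dimension, and that the leading rows come from model B, are assumptions made for the example, as are the function names.

```python
def normal_merge(a, b, alpha):
    """Normal merge: every value is a weighted sum of the two models."""
    return [[(1.0 - alpha) * x + alpha * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def tensor_merge(a, b, alpha):
    """Tensor-style merge: values are never blended.

    The leading int(rows * alpha) rows are copied from b and the remaining
    rows from a (assumed split direction, see the note above).
    """
    split = int(len(a) * alpha)
    return [list(row) for row in b[:split]] + [list(row) for row in a[split:]]


# Toy 4x2 tensors: model A is all zeros, model B is all ones.
tensor_a = [[0.0, 0.0] for _ in range(4)]
tensor_b = [[1.0, 1.0] for _ in range(4)]

print(normal_merge(tensor_a, tensor_b, 0.5))  # every value becomes 0.5
print(tensor_merge(tensor_a, tensor_b, 0.5))  # two rows of ones, two of zeros
```

With the normal merge every element moves halfway between the models, whereas the tensor-style merge keeps each element exactly as it was in one model or the other.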
The tensor size of each element is noted below.
```
model.diffusion_model.time_embed.0.weight torch.Size([1280, 320])
model.diffusion_model.time_embed.0.bias torch.Size([1280])
model.diffusion_model.time_embed.2.weight torch.Size([1280, 1280])
model.diffusion_model.time_embed.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.0.0.weight torch.Size([320, 4, 3, 3])
model.diffusion_model.input_blocks.0.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.2.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.1.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.1.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.1.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.norm.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.norm.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.1.1.proj_in.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.1.1.proj_out.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.2.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.2.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.2.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.2.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.norm.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.norm.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.2.1.proj_in.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.2.1.proj_out.bias torch.Size([320])
model.diffusion_model.input_blocks.3.0.op.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.3.0.op.bias torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.2.weight torch.Size([640, 320, 3, 3])
model.diffusion_model.input_blocks.4.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.input_blocks.4.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.4.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.skip_connection.weight torch.Size([640, 320, 1, 1])
model.diffusion_model.input_blocks.4.0.skip_connection.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.norm.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.norm.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.4.1.proj_in.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.4.1.proj_out.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.2.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.5.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.input_blocks.5.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.5.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.norm.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.norm.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.5.1.proj_in.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.5.1.proj_out.bias torch.Size([640])
model.diffusion_model.input_blocks.6.0.op.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.6.0.op.bias torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.2.weight torch.Size([1280, 640, 3, 3])
model.diffusion_model.input_blocks.7.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.7.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.skip_connection.weight torch.Size([1280, 640, 1, 1])
model.diffusion_model.input_blocks.7.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.norm.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.norm.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.7.1.proj_in.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.7.1.proj_out.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.8.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.8.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.norm.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.norm.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.8.1.proj_in.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.8.1.proj_out.bias torch.Size([1280])
model.diffusion_model.input_blocks.9.0.op.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.9.0.op.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.10.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.10.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.10.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.11.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.11.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.11.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.middle_block.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.middle_block.1.norm.weight torch.Size([1280])
model.diffusion_model.middle_block.1.norm.bias torch.Size([1280])
model.diffusion_model.middle_block.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.middle_block.1.proj_in.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.middle_block.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.middle_block.1.proj_out.bias torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.2.in_layers.2.bias torch.Size([1280])
model.diffusion_model.middle_block.2.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.2.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.2.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.0.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.0.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.0.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.0.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.0.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.0.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.1.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.1.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.1.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.1.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.1.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.1.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.2.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.2.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.2.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.2.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.2.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.2.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.1.conv.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.2.1.conv.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.3.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.3.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.3.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.3.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.3.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.3.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.3.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.4.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.4.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.4.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.4.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.4.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.4.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.4.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.in_layers.0.weight torch.Size([1920])
model.diffusion_model.output_blocks.5.0.in_layers.0.bias torch.Size([1920])
model.diffusion_model.output_blocks.5.0.in_layers.2.weight torch.Size([1280, 1920, 3, 3])
model.diffusion_model.output_blocks.5.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.5.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.skip_connection.weight torch.Size([1280, 1920, 1, 1])
model.diffusion_model.output_blocks.5.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.5.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.5.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.2.conv.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.5.2.conv.bias torch.Size([1280])
model.diffusion_model.output_blocks.6.0.in_layers.0.weight torch.Size([1920])
model.diffusion_model.output_blocks.6.0.in_layers.0.bias torch.Size([1920])
model.diffusion_model.output_blocks.6.0.in_layers.2.weight torch.Size([640, 1920, 3, 3])
model.diffusion_model.output_blocks.6.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.6.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.6.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.skip_connection.weight torch.Size([640, 1920, 1, 1])
model.diffusion_model.output_blocks.6.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.6.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.6.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.7.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.7.0.in_layers.2.weight torch.Size([640, 1280, 3, 3])
model.diffusion_model.output_blocks.7.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.7.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.7.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.skip_connection.weight torch.Size([640, 1280, 1, 1])
model.diffusion_model.output_blocks.7.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.7.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.7.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.in_layers.0.weight torch.Size([960])
model.diffusion_model.output_blocks.8.0.in_layers.0.bias torch.Size([960])
model.diffusion_model.output_blocks.8.0.in_layers.2.weight torch.Size([640, 960, 3, 3])
model.diffusion_model.output_blocks.8.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.8.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.8.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.skip_connection.weight torch.Size([640, 960, 1, 1])
model.diffusion_model.output_blocks.8.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.8.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.8.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.8.2.conv.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.8.2.conv.bias torch.Size([640])
model.diffusion_model.output_blocks.9.0.in_layers.0.weight torch.Size([960])
model.diffusion_model.output_blocks.9.0.in_layers.0.bias torch.Size([960])
model.diffusion_model.output_blocks.9.0.in_layers.2.weight torch.Size([320, 960, 3, 3])
model.diffusion_model.output_blocks.9.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.9.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.9.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.skip_connection.weight torch.Size([320, 960, 1, 1])
model.diffusion_model.output_blocks.9.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.9.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.9.1.proj_out.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.10.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.10.0.in_layers.2.weight torch.Size([320, 640, 3, 3])
model.diffusion_model.output_blocks.10.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.10.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.10.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.skip_connection.weight torch.Size([320, 640, 1, 1])
model.diffusion_model.output_blocks.10.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.10.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.10.1.proj_out.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.11.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.11.0.in_layers.2.weight torch.Size([320, 640, 3, 3])
model.diffusion_model.output_blocks.11.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.11.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.11.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.skip_connection.weight torch.Size([320, 640, 1, 1])
model.diffusion_model.output_blocks.11.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.11.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.11.1.proj_out.bias torch.Size([320])
model.diffusion_model.out.0.weight torch.Size([320])
model.diffusion_model.out.0.bias torch.Size([320])
model.diffusion_model.out.2.weight torch.Size([4, 320, 3, 3])
model.diffusion_model.out.2.bias torch.Size([4])
first_stage_model.encoder.conv_in.weight torch.Size([128, 3, 3, 3])
first_stage_model.encoder.conv_in.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm1.weight torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.0.conv1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm2.weight torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.0.conv2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm1.weight torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.1.conv1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm2.weight torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.1.conv2.bias torch.Size([128])
first_stage_model.encoder.down.0.downsample.conv.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.downsample.conv.bias torch.Size([128])
first_stage_model.encoder.down.1.block.0.norm1.weight torch.Size([128])
first_stage_model.encoder.down.1.block.0.norm1.bias torch.Size([128])
first_stage_model.encoder.down.1.block.0.conv1.weight torch.Size([256, 128, 3, 3])
first_stage_model.encoder.down.1.block.0.conv1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.norm2.weight torch.Size([256])
first_stage_model.encoder.down.1.block.0.norm2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.0.conv2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.nin_shortcut.weight torch.Size([256, 128, 1, 1])
first_stage_model.encoder.down.1.block.0.nin_shortcut.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm1.weight torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.1.conv1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm2.weight torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.1.conv2.bias torch.Size([256])
first_stage_model.encoder.down.1.downsample.conv.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.downsample.conv.bias torch.Size([256])
first_stage_model.encoder.down.2.block.0.norm1.weight torch.Size([256])
first_stage_model.encoder.down.2.block.0.norm1.bias torch.Size([256])
first_stage_model.encoder.down.2.block.0.conv1.weight torch.Size([512, 256, 3, 3])
first_stage_model.encoder.down.2.block.0.conv1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.norm2.weight torch.Size([512])
first_stage_model.encoder.down.2.block.0.norm2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.0.conv2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.nin_shortcut.weight torch.Size([512, 256, 1, 1])
first_stage_model.encoder.down.2.block.0.nin_shortcut.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm1.weight torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.1.conv1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm2.weight torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.1.conv2.bias torch.Size([512])
first_stage_model.encoder.down.2.downsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.downsample.conv.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm1.weight torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.0.conv1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm2.weight torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.0.conv2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm1.weight torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.1.conv1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm2.weight torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.1.conv2.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.norm1.weight torch.Size([512])
first_stage_model.encoder.mid.block_1.norm1.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_1.conv1.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.norm2.weight torch.Size([512])
first_stage_model.encoder.mid.block_1.norm2.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_1.conv2.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.norm.weight torch.Size([512])
first_stage_model.encoder.mid.attn_1.norm.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.q.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.q.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.k.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.k.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.v.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.v.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.proj_out.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.proj_out.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.norm1.weight torch.Size([512])
first_stage_model.encoder.mid.block_2.norm1.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_2.conv1.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.norm2.weight torch.Size([512])
first_stage_model.encoder.mid.block_2.norm2.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_2.conv2.bias torch.Size([512])
first_stage_model.encoder.norm_out.weight torch.Size([512])
first_stage_model.encoder.norm_out.bias torch.Size([512])
first_stage_model.encoder.conv_out.weight torch.Size([8, 512, 3, 3])
first_stage_model.encoder.conv_out.bias torch.Size([8])
first_stage_model.decoder.conv_in.weight torch.Size([512, 4, 3, 3])
first_stage_model.decoder.conv_in.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.norm1.weight torch.Size([512])
first_stage_model.decoder.mid.block_1.norm1.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_1.conv1.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.norm2.weight torch.Size([512])
first_stage_model.decoder.mid.block_1.norm2.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_1.conv2.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.norm.weight torch.Size([512])
first_stage_model.decoder.mid.attn_1.norm.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.q.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.q.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.k.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.k.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.v.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.v.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.proj_out.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.proj_out.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.norm1.weight torch.Size([512])
first_stage_model.decoder.mid.block_2.norm1.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_2.conv1.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.norm2.weight torch.Size([512])
first_stage_model.decoder.mid.block_2.norm2.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.0.block.0.norm1.weight torch.Size([256])
first_stage_model.decoder.up.0.block.0.norm1.bias torch.Size([256])
first_stage_model.decoder.up.0.block.0.conv1.weight torch.Size([128, 256, 3, 3])
first_stage_model.decoder.up.0.block.0.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.0.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.0.conv2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.nin_shortcut.weight torch.Size([128, 256, 1, 1])
first_stage_model.decoder.up.0.block.0.nin_shortcut.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm1.weight torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.1.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.1.conv2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm1.weight torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.2.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.2.conv2.bias torch.Size([128])
first_stage_model.decoder.up.1.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.1.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.1.block.0.conv1.weight torch.Size([256, 512, 3, 3])
first_stage_model.decoder.up.1.block.0.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.0.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.0.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.nin_shortcut.weight torch.Size([256, 512, 1, 1])
first_stage_model.decoder.up.1.block.0.nin_shortcut.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm1.weight torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.1.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.1.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm1.weight torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.2.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.2.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.upsample.conv.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.upsample.conv.bias torch.Size([256])
first_stage_model.decoder.up.2.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.0.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.0.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.1.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.1.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.2.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.upsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.upsample.conv.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.0.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.0.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.1.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.1.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.2.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.upsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.upsample.conv.bias torch.Size([512])
first_stage_model.decoder.norm_out.weight torch.Size([128])
first_stage_model.decoder.norm_out.bias torch.Size([128])
first_stage_model.decoder.conv_out.weight torch.Size([3, 128, 3, 3])
first_stage_model.decoder.conv_out.bias torch.Size([3])
first_stage_model.quant_conv.weight torch.Size([8, 8, 1, 1])
first_stage_model.quant_conv.bias torch.Size([8])
first_stage_model.post_quant_conv.weight torch.Size([4, 4, 1, 1])
first_stage_model.post_quant_conv.bias torch.Size([4])
cond_stage_model.transformer.text_model.embeddings.token_embedding.weight torch.Size([49408, 768])
cond_stage_model.transformer.text_model.embeddings.position_embedding.weight torch.Size([77, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.final_layer_norm.weight torch.Size([768])
cond_stage_model.transformer.text_model.final_layer_norm.bias torch.Size([768])
```
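
The listing above pairs each key with its tensor shape, so it can be tallied per component. The following is a minimal sketch that sums element counts by top-level prefix (for example `first_stage_model` or `cond_stage_model`). It assumes the listing is available as plain text in the `<key> torch.Size([...])` form shown above. The names `LINE_RE` and `count_params` are hypothetical helpers introduced for illustration and are not part of the extension.

```python
import re
from collections import defaultdict
from math import prod

# Hypothetical pattern: matches lines of the form "<key> torch.Size([d0, d1, ...])".
LINE_RE = re.compile(r"^(\S+)\s+torch\.Size\(\[([\d,\s]*)\]\)$")


def count_params(listing: str) -> dict[str, int]:
    """Sum element counts per top-level prefix (the text before the first dot)."""
    totals = defaultdict(int)
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines, code fences, or anything unparseable
        key, dims = match.groups()
        shape = [int(d) for d in dims.split(",") if d.strip()]
        totals[key.split(".")[0]] += prod(shape)  # prod([]) is 1, i.e. a scalar
    return dict(totals)


# A few lines copied from the listing above, used as sample input.
sample = """\
first_stage_model.decoder.conv_out.weight torch.Size([3, 128, 3, 3])
first_stage_model.decoder.conv_out.bias torch.Size([3])
cond_stage_model.transformer.text_model.final_layer_norm.weight torch.Size([768])
cond_stage_model.transformer.text_model.final_layer_norm.bias torch.Size([768])
"""

totals = count_params(sample)
for prefix, total in totals.items():
    print(prefix, total)
```

Passing the full listing instead of `sample` should give a rough per-component parameter count, which can help when checking which part of a checkpoint a given block of keys belongs to.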