## Calculation Methods
- [normal](#normal)
- [cosineA/cosineB](#cosine)
- [trainDifference](#train)
- [smoothAdd](#smooth)
- [extract](#extract)
- [tensor](#tensor)
## normal
### _Available modes :_ All
This is the standard calculation method. It can be used in all modes.
## cosineA/cosineB
### _Available modes :_ weight sum
The two models are compared using cosine similarity centered on the chosen ratio, and the merge is calculated so as to eliminate the loss caused by merging. See the following for details.
https://github.com/hako-mikan/sd-webui-supermerger/issues/33 https://github.com/recoilme/losslessmix
The original simple weight-sum mode is the most basic method: it interpolates linearly between the two models according to the given weight alpha. When alpha is 0 the output is the first model (model A); when alpha is 1 it is the second model (model B). Any other alpha gives a weighted average of the two models.
- Original merge result of the AnythingV3 and FeverDream models
`charming girl mid-shot. scenery-beautiful majestic`
![MergeStandard](https://user-images.githubusercontent.com/6239068/232734670-958a6db3-1022-49ed-af73-f777223e71e6.png)
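For reference, the weight-sum interpolation described above can be written as a short sketch. The function name and the per-key loop are illustrative; the extension actually merges every matching key of the two checkpoints' state dicts.

```python
def weight_sum(theta_a: dict, theta_b: dict, alpha: float) -> dict:
    """Plain weight-sum merge of two state dicts: (1 - alpha) * A + alpha * B per key."""
    merged = {}
    for key in theta_a:
        merged[key] = (1.0 - alpha) * theta_a[key] + alpha * theta_b[key]
    return merged
```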
One big advantage of the cosine methods over the original simple addition is that they take the structural similarity between the two models into account. This can give better results when the two models are similar but not identical. Another advantage is that they limit how much detail is carried over from one model into the other, which can help prevent overfitting and improve generalization.
**With CosineA**, the vectors of the first model (model A) are normalized before merging, so the resulting merged model prioritizes the structure of the first model while incorporating detail from the second. This is because the direction of the first model's vectors is essentially aligned to the direction of the corresponding vectors in the second model.
- Merge result of AnythingV3 and FeverDream.
_Keep in mind the structural points: the direction and flow of the pose, and the facial region._
![MergeCosineA](https://user-images.githubusercontent.com/6239068/232741979-f40450ab-6006-47e5-ae00-cf5e89b7ac09.png)
_Comparing the rendering above and below, note that in every case more blur is kept in the background than in the foreground, unlike the linear change of the normal merge._
**With CosineB**, on the other hand, the vectors of the second model (model B) are normalized before merging, so the resulting merged model prioritizes the structure of the second model while incorporating detail from the first. This is because the direction of the second model's vectors is aligned to the direction of the corresponding vectors in the first model.
- Merge result of AnythingV3 and FeverDream
_Structurally, pay attention to the direction and flow of the pose and the facial region, and note how the background also tries to retain more of the shapes from the right-hand side._
![MergeCosineB](https://user-images.githubusercontent.com/6239068/232744751-20786eff-a654-468c-93e7-c19db5829c69.png)
**In summary, the choice between CosineA and CosineB depends on which model's structure you want to prioritize in the resulting merged model. Use CosineA to prioritize the structure of the first model, and CosineB to prioritize the structure of the second.**
Also note that, compared with the change at alpha 1, the second model acts as the 'reference point' of the merge, so changing the order of the models can also affect the final result when aiming for a particular output.
- Merge result of FeverDream and AnythingV3
![MergeOppositeCosineA](https://user-images.githubusercontent.com/6239068/232741034-ce3c9739-7f5a-4a7d-b979-fec4ac7d9b71.png)
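As a rough illustration only, and not the extension's exact normalization-based formula, the idea behind the cosine methods can be sketched as a similarity-weighted interpolation: where the two tensors are dissimilar, the merge stays anchored to the prioritized model, as CosineA does with model A. The function name and the mapping from similarity to ratio below are assumptions.

```python
import torch
import torch.nn.functional as F

def cosine_a_merge(theta_a: dict, theta_b: dict, alpha: float) -> dict:
    """Similarity-weighted interpolation: dissimilar tensors lean toward model A."""
    merged = {}
    for key in theta_a:
        a, b = theta_a[key].float(), theta_b[key].float()
        sim = F.cosine_similarity(a.flatten(), b.flatten(), dim=0)  # in [-1, 1]
        k = alpha * ((sim.item() + 1.0) / 2.0)  # low similarity -> keep more of A
        merged[key] = (1.0 - k) * a + k * b
    return merged
```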
## trainDifference
### _Available modes :_ Add difference
In its simplest form, this method can be thought of as a 'super LoRA for a permanent merge'. Instead of simply adding the computed difference between models (B) and (C) onto model (A), it 'trains' that difference onto model (A) as if model (A) were being fine-tuned with it.
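For context, a plain add-difference merge computes A + alpha * (B - C) for every tensor, as sketched below. trainDifference applies the same difference but scales it so that it behaves more like fine-tuning model A with it; that scaling is internal to the extension, so only the baseline formula is shown here.

```python
def add_difference(theta_a: dict, theta_b: dict, theta_c: dict, alpha: float) -> dict:
    """Baseline add-difference merge: A + alpha * (B - C) for keys shared by all three."""
    merged = {}
    for key in theta_a:
        if key in theta_b and key in theta_c:
            merged[key] = theta_a[key] + alpha * (theta_b[key] - theta_c[key])
        else:
            merged[key] = theta_a[key]
    return merged
```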
### Comparison
- **Difference between the normal addDifference and trainDifference**
With [rev animated](https://civitai.com/models/7371/rev-animated) and [isometric-future](https://civitai.com/models/10063/isometric-future)
*"IsometricFuture, garden, IsometricFuture"*
**Generated with addDifference ('rev animated')+('isometric future'-'sdv1.5')**
![IsometricA](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/bde0c09b-4cc4-447b-acf9-da175192b546)
**Generated with trainDifference ('rev animated')+('isometric future'-'sdv1.5')**
![IsometricB](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/afb053aa-2ace-4fe5-8b62-e7e29ae5edaf)
With [rev animated](https://civitai.com/models/7371/rev-animated) and [anything v3](https://civitai.com/models/66?modelVersionId=75)
*"man smiling"*
**Generated with 'rev animated'**
![FaceA](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/796400d6-b740-466b-beae-0ffd70276850)
**Generated with addDifference ('rev animated')+('anything v3'-'sdv1.4')**
![FaceB](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/d44d9a08-427c-47c9-9cde-d2750e880a54)
**Generated with trainDifference ('rev animated')+('anything v3'-'sdv1.4')**
![FaceC](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/872259c4-af29-4624-ac65-a820d5edfd33)
- **Lora vs trainDifference**
This does not make LoRA obsolete; LoRA remains useful for its convenience, plug-and-play flexibility, and so on.
However, it is often said that certain models 'do not play well with LoRA', and models such as 'AnyLoRA' have been developed for Civitai users to train LoRAs against. See [here](#LoramergingfortrainDifference) for how to take advantage of this and how to train with trainDifference.
Using [FeverDream](https://civitai.com/models/26396?modelVersionId=32375), which is clearly far from the 'compatibility' anime-style LoRAs need, and [Thicker Lines Anime Style LoRA Mix](https://civitai.com/models/13910?modelVersionId=16368), which was released both as a LoRA and as a version pre-merged with [Anything V4.5](https://huggingface.co/andite/anything-v4.0/blob/main/anything-v4.5-pruned.safetensors), we directly compare the LoRA applied to FeverDream against trainDifference (FeverDream + Thicker Lines - Anything v4.5).
The LoRA strength and merge strength are varied through 1/1.2/1.4/1.6; the extreme values make it easiest to see how LoRA differs from trainDifference.
### Usage guidance
#### Possibilities and common usage
Extend a model with new concepts, or reinforce existing concepts (and higher-quality output), instead of merging
Sci-Fi Diffusion is a model trained on general sci-fi imagery. You no longer need to merge it with another model: by trainDifferencing it against SDv1.5 you can effectively train the sci-fi elements into your model, without the limitation of producing an approximated difference the way a LoRA does.
As another example, you could merge [Analog Diffusion](https://civitai.com/models/1265/analog-diffusion) and [Timeless Diffusion](https://civitai.com/models/3557?modelVersionId=3936), which are similar in nature, with cosine similarity, and then train [Modelshoot Style](https://civitai.com/models/2147/modelshoot-style), which focuses on mid-range body shots, onto the result, taking care not to over-reinforce its negative photographic elements.
Furthermore, there is now far more potential with broad models such as [Surreality](https://civitai.com/models/21666?modelVersionId=25854) and [seek.art MEGA](https://civitai.com/models/1315?modelVersionId=22808), thanks to the license restrictions being relaxed in V2. Of course, styling with different weights on the in/out blocks is still valuable, but it all depends on your goals.
Also, a model such as [RPG](https://civitai.com/models/1116?modelVersionId=7133) appears to have been developed from SDv1.5, so it can be trained into a model without the NSFW or female bias that comes from merges involving F222 and the like.
The direction and style of the trained difference matter
Training toward a realistic model is harder than training toward a stylistic one. For example, if you are ultimately aiming for a stylistic model, consider building several model branches based on similar styles and, at the end, trainDifferencing the stylistic branch onto the most realistic branch. In general, when styles differ, merge in the order anime/cartoon > stylish > realistic.
trainDifference is not always the best answer
Depending on the type and scope of the difference, a cosine-similarity merge can sometimes give better results (if the difference is not from SDv1.5, first trainDifference it onto SDv1.5, then merge with cosine similarity, and finally trainDifference onto the model you are working on). Also, when the material is similar but broad in scope, the best results can come from using trainDifference in both directions and then doing a weighted sum between the two, for example with [waifu diffusion](https://huggingface.co/hakurei/waifu-diffusion-v1-3) and [Acertainty](https://huggingface.co/JosephusCheung/ACertainty).
Get the benefits of a trained model wherever it was trained
Models such as [Knollingcase](https://civitai.com/models/1092?modelVersionId=1093) and [Bubble Toys](https://civitai.com/models/23945/bubble-toys-the-model) are appealing, but their usefulness used to be limited by the framework they were trained on. Now you can trainDifference them onto newer models developed by others.
In addition, some people who produced checkpoints rather than LoRAs have said they tried LoRA first but could not get useful results; with trainDifference, their work can be applied to any model.
#### Limitations and things to avoid
You need to know, and have access to, the pre-training origin of a model
Many models today contain some mix of SDv1.4. This trainDifference merge is precise enough that problems arise if, for example, you try to train 'rev animated' onto 'Sci-fi Diffusion' with SDv1.5 as model (C): because the origin of 'rev animated' is an unknown ratio between SDv1.4 and SDv1.5 (and also a mix of different weights on the individual in/out blocks), the merge will hurt the output (the 'training' ends up shifted or distorted). However, you can trainDifference 'Sci-fi Diffusion', which was trained from SDv1.5, onto 'rev animated'.
'Burn-in' / 'overtraining' can occur after enough merges, or when using similar material
At that point you can 'pull the model back' by merging it with SDv1.5 using cosine similarity. This returns the model toward its foundation while keeping the qualities gained from the training.
After enough merges, the 'CLIP / understanding' can become heavy and hurt simple prompts
For example, complex prompts may still look good, but a simple prompt such as 'portrait of a woman, blue eyes' may show the concept of 'blue' far too strongly. To avoid this when doing trainDifference merges, or merges of broad scope, you can manipulate the CLIP with the [model toolkit](https://github.com/arenasys/stable-diffusion-webui-model-toolkit). Load your final model into that extension and create two separate models: 'clipA' imports the CLIP of the base model, and 'clipB' imports the CLIP of what you trained. Then use a normal weight-sum merge between those two models to find the best output/understanding, softening the CLIP as you extend the model. In some cases, weight-summing the final model with a version that uses the SDv1.5 CLIP gives better results than mixing between clipA and clipB.
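As a rough sketch of what the CLIP blending above amounts to (the model toolkit workflow does this through its UI), the snippet below weight-sums only the text-encoder tensors between two checkpoints. The `cond_stage_model.` prefix assumes an SDv1.x checkpoint layout; adjust it if your checkpoints use different key names.

```python
def blend_clip(theta_base: dict, theta_trained: dict, alpha: float,
               clip_prefix: str = "cond_stage_model.") -> dict:
    """Keep the U-Net as-is, but interpolate the text-encoder (CLIP) weights."""
    merged = dict(theta_base)
    for key, value in theta_base.items():
        if key.startswith(clip_prefix) and key in theta_trained:
            merged[key] = (1.0 - alpha) * value + alpha * theta_trained[key]
    return merged
```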
---
#### Practical demonstration
- One of the simpler ways to take advantage of this is more natural and accurate LoRA styling on a different model.
Here we use [BreakDomainAnime](https://civitai.com/models/72675/breakdomainanime) and the [Mika Pikazo Style LoRA](https://civitai.com/models/8479/mika-pikazo-style-lora), which was trained on [AnyLora](https://civitai.com/models/23900?modelVersionId=28562).
*"1girl, smiling, scenic background BREAK [mika-pikazo]"*
**Generated with 'BreakDomainAnime'**
![LoraDifferenceA](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/cbcbf1bf-c58b-4e70-b8af-1baf6d4102ce)
**Generated with 'BreakDomainAnime' using 'Mika Pikazo Style LoRA' at strength 1**
![LoraDifferenceB](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/332a61f1-d1d1-4c84-9639-f56c94e556db)
Here, instead of applying the LoRA to 'BreakDomainAnime', we use trainDifference to get a better alignment.
Using SuperMerger's LoRA tab, merge the 'Mika Pikazo Style LoRA' into the checkpoint it was stated to be trained for, 'anyloraCheckpoint_novaeFp16' (which we assume is what they trained on), and save it as 'anyloraCheckpoint_mika_pikazo'.
Then **generate with trainDifference ('BreakDomainAnime')+('any LoRA combination merged into AnyLora, in this case anyloraCheckpoint_mika_pikazo'-'AnyLora')**
![LoraDifferenceC](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/50457b98-b2ad-4e28-a048-b023a86a2530)
One more example using a LoRA and the technique above: trainDifference is used to move a background LoRA, originally trained on an anime model, onto a realistic model.
*"An eco-friendly residential building covered in vertical gardens in an urban setting"*
![LoraTraindifferenceBackgroundExamplepng](https://github.com/hako-mikan/sd-webui-supermerger/assets/6239068/c40c1833-f166-49b5-abfe-56a28780a736)
## smoothAdd
### _Available modes :_ Add difference
A method of adding a difference that combines the benefits of a Median filter and a Gaussian filter. It tries to add the difference between models in a smoother way, avoiding the negative 'burn-in' effect often seen when many models are added like this, and it gives an effect you cannot get by simply adding the difference at a lower value.
- Starting point for reference
![Untitled-1](https://user-images.githubusercontent.com/6239068/232780130-19caa53a-a767-4ee1-80a7-dc37ad948322.png)
- A collection of models added onto it, each at a value of 1
`The burn-in is very obvious here`
![Untitled-2](https://user-images.githubusercontent.com/6239068/232781113-3e2de251-711d-463a-82c9-a080be47e180.png)
- A collection of models added onto it, each at a value of 0.5
`Especially looking at the bird, this is still not a result I would accept`
![Untitled-3](https://user-images.githubusercontent.com/6239068/232785787-cfde6967-fc86-47e8-b208-3aa8f5f46c40.png)
What the Median filter does on its own, and its result
- Replaces each value with the median of its neighboring values, reducing noise in the difference.
- Preserves the edges and structure of the difference, which helps when you want to transfer learning related to the shapes and boundaries of objects.
- Because it is a non-linear filter, it can reduce noise while preserving the important features of the difference.
![Untitled-5](https://user-images.githubusercontent.com/6239068/232785599-1e40ee9f-43de-4721-bb5f-0c21485fd8d3.png)
What the Gaussian filter does on its own, and its result
- Applies a Gaussian kernel to smooth the difference, reducing high-frequency noise while retaining the low-frequency components.
- The amount of smoothing is controlled by the sigma parameter, so you can experiment with different levels of smoothing.
- Because it is a linear filter, it can reduce noise while preserving the global structure.
![Untitled-4](https://user-images.githubusercontent.com/6239068/232785723-aecce7bb-1bc6-4731-a879-f8a7e4dc5a0c.png)
- Final result using the combination of the Median and Gaussian filters
_In particular, compared with the Median or Gaussian filter used on its own, note how the hair of the man at the top left does not 'clump' in the top-right image. The combination gives the best overall result here._
![Untitled-6](https://user-images.githubusercontent.com/6239068/232786207-f7f41c55-939e-46a1-ab24-2e6d885f65f9.png)
>**Hint**
>Sometimes you may want to use smooth Add difference instead of the normal Add even when there is no risk of burn-in.
>In those cases you can raise Alpha as high as 2, because with smooth Add the individual impact is smaller than 1. This, of course, depends on the result you want.
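A minimal sketch of the idea, assuming the median and Gaussian filters are applied to the difference before it is added; the filter size, sigma, and the way the two filters are combined here are illustrative and may differ from the extension's actual parameters.

```python
import torch
from scipy.ndimage import gaussian_filter, median_filter

def smooth_add(theta_a: dict, theta_b: dict, theta_c: dict, alpha: float) -> dict:
    """Add a filtered difference: A + alpha * smooth(B - C)."""
    merged = {}
    for key in theta_a:
        diff = (theta_b[key] - theta_c[key]).float().numpy()
        diff = median_filter(diff, size=3)      # non-linear: keeps edges/structure
        diff = gaussian_filter(diff, sigma=1)   # linear: removes high-frequency noise
        merged[key] = theta_a[key] + alpha * torch.from_numpy(diff)
    return merged
```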
## extract
### _Available modes :_ Add difference
This method is designed to extract **similar or dissimilar features** from *two difference models* built on a common base model.
### Configuration using three full-parameter models
This configuration uses a base model (**model A**) and two models derived from it (**model B** and **model C**). The *two difference models* involved are '**model B - model A**' and '**model C - model A**'. To reduce spurious similarity, **model A** should ideally be the *most recent common ancestor* of the two derived models.
### Configuration using two LoRA networks
This configuration uses **LoRA-B** and **LoRA-C** directly as the *two difference models*. They are assumed to have been trained on a common base model, analogous to **model A** in the three-model configuration. If **LoRA-B** and **LoRA-C** derive from different base models, the results may be unpredictable because of the mismatch between base models.
### Key parameters
- **alpha (*α*)**: Controls the focus of feature extraction between **model (LoRA) B** (***α* = 0**) and **model (LoRA) C** (***α* = 1**).
- **beta (*β*)**: Controls the nature of the extraction: ***β* = 0** means **similar-feature extraction**, and ***β* = 1** means **dissimilar-feature extraction**.
- **gamma (*γ*)**: Adjusts how strictly similarity and dissimilarity are judged. A **high *γ* (e.g. *γ* = 10)** recognizes only *more similar* features as similar; conversely, a **low *γ* (e.g. *γ* = 0.1)** recognizes only *more dissimilar* features as dissimilar.
### Usage scenarios
- ***α* = 0, *β* = 0**: Extracts the features of **model B** that are similar to the features of **model C**.
- ***α* = 0, *β* = 0.5**: Sitting midway between the two, this is neither similar- nor dissimilar-feature extraction, but a plain result expressed by the formulas below.
  - **For full-parameter models**: $\frac{\text{A} + \text{lerp}(\text{B}, \text{C}, \alpha)}{2}$
  - **For LoRA networks**: $\frac{\text{lerp}(\text{B}, \text{C}, \alpha)}{2}$
- ***α* = 0, *β* = 1**: Extracts the features of **model B** that are dissimilar to the features of **model C**.
- ***α* = 1**: Swaps the roles of **model B** and **model C** in the examples above.
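The following sketch is consistent with the parameters and formulas above (for example, β = 0.5 reduces it to (A + lerp(B, C, α)) / 2), but the extension's internal implementation may differ in detail; treat it as an illustration of how α, β, and γ interact rather than the exact code.

```python
import torch

def extract(a, b, c, alpha: float, beta: float, gamma: float):
    """a: base model tensor, or None for the two-LoRA configuration.
    b, c: the corresponding tensors of model/LoRA B and C."""
    base = a.float() if a is not None else 0.0
    diff_b = b.float() - base
    diff_c = c.float() - base
    # similarity of the two differences, mapped to 0..1 and sharpened by gamma
    sim = torch.cosine_similarity(diff_b, diff_c, dim=-1).clamp(-1, 1).unsqueeze(-1)
    sim = ((sim + 1.0) / 2.0) ** gamma
    # beta = 0 keeps similar features, beta = 1 keeps dissimilar ones
    weight = torch.lerp(sim, 1.0 - sim, beta)
    return base + torch.lerp(diff_b, diff_c, alpha) * weight
```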
## tensor
### _Available modes :_ weight sum only
In a normal merge, the tensors are added and then normalized (halved), as shown in the figure below. With tensor, the merge is instead performed by swapping out portions of the tensors. This means the original models' tensor values are kept intact, giving results that differ from a normal merge. The difference between tensor and tensor2 is that tensor2 splits along the second dimension, but only for large tensors (such as [1280, 1280]); tensor always splits along the first dimension.
![](https://github.com/hako-mikan/sd-webui-supermerger/blob/images/tensor.jpg)
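A minimal sketch of the swap idea, splitting along the first dimension at a point proportional to alpha; which side of the split comes from which model, and the handling of tensor2's second-dimension split, are illustrative assumptions.

```python
import torch

def tensor_merge(a: torch.Tensor, b: torch.Tensor, alpha: float, dim: int = 0) -> torch.Tensor:
    """Swap-style merge: take one slice from model A and the rest from model B
    instead of averaging. dim=0 corresponds to tensor; tensor2 uses dim=1 for
    large 2-D weights such as [1280, 1280]."""
    cut = int(a.shape[dim] * alpha)
    index = torch.arange(a.shape[dim])
    part_a = a.index_select(dim, index[:cut])
    part_b = b.index_select(dim, index[cut:])
    return torch.cat([part_a, part_b], dim=dim)
```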
The tensor size of each element is noted below.
```
model.diffusion_model.time_embed.0.weight torch.Size([1280, 320])
model.diffusion_model.time_embed.0.bias torch.Size([1280])
model.diffusion_model.time_embed.2.weight torch.Size([1280, 1280])
model.diffusion_model.time_embed.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.0.0.weight torch.Size([320, 4, 3, 3])
model.diffusion_model.input_blocks.0.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.2.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.1.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.1.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.1.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.norm.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.norm.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.1.1.proj_in.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.1.1.proj_out.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.2.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.2.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.2.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.2.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.norm.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.norm.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.2.1.proj_in.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.2.1.proj_out.bias torch.Size([320])
model.diffusion_model.input_blocks.3.0.op.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.3.0.op.bias torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.2.weight torch.Size([640, 320, 3, 3])
model.diffusion_model.input_blocks.4.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.input_blocks.4.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.4.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.skip_connection.weight torch.Size([640, 320, 1, 1])
model.diffusion_model.input_blocks.4.0.skip_connection.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.norm.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.norm.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.4.1.proj_in.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.4.1.proj_out.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.2.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.5.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.input_blocks.5.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.5.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.norm.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.norm.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.5.1.proj_in.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.5.1.proj_out.bias torch.Size([640])
model.diffusion_model.input_blocks.6.0.op.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.6.0.op.bias torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.2.weight torch.Size([1280, 640, 3, 3])
model.diffusion_model.input_blocks.7.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.7.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.skip_connection.weight torch.Size([1280, 640, 1, 1])
model.diffusion_model.input_blocks.7.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.norm.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.norm.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.7.1.proj_in.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.7.1.proj_out.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.8.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.8.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.norm.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.norm.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.8.1.proj_in.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.8.1.proj_out.bias torch.Size([1280])
model.diffusion_model.input_blocks.9.0.op.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.9.0.op.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.10.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.10.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.10.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.11.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.11.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.11.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.middle_block.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.middle_block.1.norm.weight torch.Size([1280])
model.diffusion_model.middle_block.1.norm.bias torch.Size([1280])
model.diffusion_model.middle_block.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.middle_block.1.proj_in.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.middle_block.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.middle_block.1.proj_out.bias torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.2.in_layers.2.bias torch.Size([1280])
model.diffusion_model.middle_block.2.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.2.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.2.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.0.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.0.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.0.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.0.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.0.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.0.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.1.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.1.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.1.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.1.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.1.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.1.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.2.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.2.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.2.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.2.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.2.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.2.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.1.conv.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.2.1.conv.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.3.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.3.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.3.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.3.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.3.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.3.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.3.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.4.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.4.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.4.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.4.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.4.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.4.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.4.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.in_layers.0.weight torch.Size([1920])
model.diffusion_model.output_blocks.5.0.in_layers.0.bias torch.Size([1920])
model.diffusion_model.output_blocks.5.0.in_layers.2.weight torch.Size([1280, 1920, 3, 3])
model.diffusion_model.output_blocks.5.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.5.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.skip_connection.weight torch.Size([1280, 1920, 1, 1])
model.diffusion_model.output_blocks.5.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.5.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.5.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.2.conv.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.5.2.conv.bias torch.Size([1280])
model.diffusion_model.output_blocks.6.0.in_layers.0.weight torch.Size([1920])
model.diffusion_model.output_blocks.6.0.in_layers.0.bias torch.Size([1920])
model.diffusion_model.output_blocks.6.0.in_layers.2.weight torch.Size([640, 1920, 3, 3])
model.diffusion_model.output_blocks.6.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.6.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.6.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.skip_connection.weight torch.Size([640, 1920, 1, 1])
model.diffusion_model.output_blocks.6.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.6.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.6.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.7.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.7.0.in_layers.2.weight torch.Size([640, 1280, 3, 3])
model.diffusion_model.output_blocks.7.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.7.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.7.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.skip_connection.weight torch.Size([640, 1280, 1, 1])
model.diffusion_model.output_blocks.7.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.7.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.7.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.in_layers.0.weight torch.Size([960])
model.diffusion_model.output_blocks.8.0.in_layers.0.bias torch.Size([960])
model.diffusion_model.output_blocks.8.0.in_layers.2.weight torch.Size([640, 960, 3, 3])
model.diffusion_model.output_blocks.8.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.8.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.8.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.skip_connection.weight torch.Size([640, 960, 1, 1])
model.diffusion_model.output_blocks.8.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.8.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.8.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.8.2.conv.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.8.2.conv.bias torch.Size([640])
model.diffusion_model.output_blocks.9.0.in_layers.0.weight torch.Size([960])
model.diffusion_model.output_blocks.9.0.in_layers.0.bias torch.Size([960])
model.diffusion_model.output_blocks.9.0.in_layers.2.weight torch.Size([320, 960, 3, 3])
model.diffusion_model.output_blocks.9.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.9.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.9.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.skip_connection.weight torch.Size([320, 960, 1, 1])
model.diffusion_model.output_blocks.9.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.9.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.9.1.proj_out.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.10.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.10.0.in_layers.2.weight torch.Size([320, 640, 3, 3])
model.diffusion_model.output_blocks.10.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.10.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.10.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.skip_connection.weight torch.Size([320, 640, 1, 1])
model.diffusion_model.output_blocks.10.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.10.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.10.1.proj_out.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.11.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.11.0.in_layers.2.weight torch.Size([320, 640, 3, 3])
model.diffusion_model.output_blocks.11.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.11.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.11.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.skip_connection.weight torch.Size([320, 640, 1, 1])
model.diffusion_model.output_blocks.11.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.11.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.11.1.proj_out.bias torch.Size([320])
model.diffusion_model.out.0.weight torch.Size([320])
model.diffusion_model.out.0.bias torch.Size([320])
model.diffusion_model.out.2.weight torch.Size([4, 320, 3, 3])
model.diffusion_model.out.2.bias torch.Size([4])
first_stage_model.encoder.conv_in.weight torch.Size([128, 3, 3, 3])
first_stage_model.encoder.conv_in.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm1.weight torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.0.conv1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm2.weight torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.0.conv2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm1.weight torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.1.conv1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm2.weight torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.1.conv2.bias torch.Size([128])
first_stage_model.encoder.down.0.downsample.conv.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.downsample.conv.bias torch.Size([128])
first_stage_model.encoder.down.1.block.0.norm1.weight torch.Size([128])
first_stage_model.encoder.down.1.block.0.norm1.bias torch.Size([128])
first_stage_model.encoder.down.1.block.0.conv1.weight torch.Size([256, 128, 3, 3])
first_stage_model.encoder.down.1.block.0.conv1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.norm2.weight torch.Size([256])
first_stage_model.encoder.down.1.block.0.norm2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.0.conv2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.nin_shortcut.weight torch.Size([256, 128, 1, 1])
first_stage_model.encoder.down.1.block.0.nin_shortcut.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm1.weight torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.1.conv1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm2.weight torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.1.conv2.bias torch.Size([256])
first_stage_model.encoder.down.1.downsample.conv.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.downsample.conv.bias torch.Size([256])
first_stage_model.encoder.down.2.block.0.norm1.weight torch.Size([256])
first_stage_model.encoder.down.2.block.0.norm1.bias torch.Size([256])
first_stage_model.encoder.down.2.block.0.conv1.weight torch.Size([512, 256, 3, 3])
first_stage_model.encoder.down.2.block.0.conv1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.norm2.weight torch.Size([512])
first_stage_model.encoder.down.2.block.0.norm2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.0.conv2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.nin_shortcut.weight torch.Size([512, 256, 1, 1])
first_stage_model.encoder.down.2.block.0.nin_shortcut.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm1.weight torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.1.conv1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm2.weight torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.1.conv2.bias torch.Size([512])
first_stage_model.encoder.down.2.downsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.downsample.conv.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm1.weight torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.0.conv1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm2.weight torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.0.conv2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm1.weight torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.1.conv1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm2.weight torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.1.conv2.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.norm1.weight torch.Size([512])
first_stage_model.encoder.mid.block_1.norm1.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_1.conv1.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.norm2.weight torch.Size([512])
first_stage_model.encoder.mid.block_1.norm2.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_1.conv2.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.norm.weight torch.Size([512])
first_stage_model.encoder.mid.attn_1.norm.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.q.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.q.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.k.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.k.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.v.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.v.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.proj_out.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.proj_out.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.norm1.weight torch.Size([512])
first_stage_model.encoder.mid.block_2.norm1.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_2.conv1.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.norm2.weight torch.Size([512])
first_stage_model.encoder.mid.block_2.norm2.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_2.conv2.bias torch.Size([512])
first_stage_model.encoder.norm_out.weight torch.Size([512])
first_stage_model.encoder.norm_out.bias torch.Size([512])
first_stage_model.encoder.conv_out.weight torch.Size([8, 512, 3, 3])
first_stage_model.encoder.conv_out.bias torch.Size([8])
first_stage_model.decoder.conv_in.weight torch.Size([512, 4, 3, 3])
first_stage_model.decoder.conv_in.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.norm1.weight torch.Size([512])
first_stage_model.decoder.mid.block_1.norm1.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_1.conv1.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.norm2.weight torch.Size([512])
first_stage_model.decoder.mid.block_1.norm2.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_1.conv2.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.norm.weight torch.Size([512])
first_stage_model.decoder.mid.attn_1.norm.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.q.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.q.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.k.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.k.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.v.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.v.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.proj_out.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.proj_out.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.norm1.weight torch.Size([512])
first_stage_model.decoder.mid.block_2.norm1.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_2.conv1.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.norm2.weight torch.Size([512])
first_stage_model.decoder.mid.block_2.norm2.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.0.block.0.norm1.weight torch.Size([256])
first_stage_model.decoder.up.0.block.0.norm1.bias torch.Size([256])
first_stage_model.decoder.up.0.block.0.conv1.weight torch.Size([128, 256, 3, 3])
first_stage_model.decoder.up.0.block.0.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.0.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.0.conv2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.nin_shortcut.weight torch.Size([128, 256, 1, 1])
first_stage_model.decoder.up.0.block.0.nin_shortcut.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm1.weight torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.1.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.1.conv2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm1.weight torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.2.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.2.conv2.bias torch.Size([128])
first_stage_model.decoder.up.1.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.1.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.1.block.0.conv1.weight torch.Size([256, 512, 3, 3])
first_stage_model.decoder.up.1.block.0.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.0.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.0.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.nin_shortcut.weight torch.Size([256, 512, 1, 1])
first_stage_model.decoder.up.1.block.0.nin_shortcut.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm1.weight torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.1.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.1.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm1.weight torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.2.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.2.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.upsample.conv.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.upsample.conv.bias torch.Size([256])
first_stage_model.decoder.up.2.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.0.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.0.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.1.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.1.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.2.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.upsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.upsample.conv.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.0.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.0.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.1.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.1.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.2.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.upsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.upsample.conv.bias torch.Size([512])
first_stage_model.decoder.norm_out.weight torch.Size([128])
first_stage_model.decoder.norm_out.bias torch.Size([128])
first_stage_model.decoder.conv_out.weight torch.Size([3, 128, 3, 3])
first_stage_model.decoder.conv_out.bias torch.Size([3])
first_stage_model.quant_conv.weight torch.Size([8, 8, 1, 1])
first_stage_model.quant_conv.bias torch.Size([8])
first_stage_model.post_quant_conv.weight torch.Size([4, 4, 1, 1])
first_stage_model.post_quant_conv.bias torch.Size([4])
cond_stage_model.transformer.text_model.embeddings.token_embedding.weight torch.Size([49408, 768])
cond_stage_model.transformer.text_model.embeddings.position_embedding.weight torch.Size([77, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.final_layer_norm.weight torch.Size([768])
cond_stage_model.transformer.text_model.final_layer_norm.bias torch.Size([768])
```
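
The listing above enumerates the parameter keys and tensor shapes of a Stable Diffusion v1.x checkpoint (UNet under `model.diffusion_model.*`, VAE under `first_stage_model.*`, and the CLIP text encoder under `cond_stage_model.*`), which is handy as a reference when deciding which elements to target in a merge. As a minimal sketch, not part of supermerger itself and with a placeholder file path, a listing in this format can be reproduced by iterating over the checkpoint's state dict:

```python
from safetensors.torch import load_file

# Placeholder path; for an LDM-style .ckpt file you would instead use
#   torch.load("model.ckpt", map_location="cpu")["state_dict"]
state_dict = load_file("model.safetensors")

# Print each parameter name and its tensor shape, matching the format above
for key, tensor in state_dict.items():
    print(key, tensor.shape)
```

Running this against a checkpoint prints one `key torch.Size([...])` pair per parameter, which is exactly the form shown in the table above.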