stefanhex-apollo committed
Commit 621ed53 · verified · Parent: 4c461d1

Update README.md

Files changed (1)
  1. README.md +81 -4
README.md CHANGED
@@ -8,6 +8,11 @@ tags: []
 This is a gpt2-small model with LayerNorm fine-tuned out.
 
 The model was fine-tuned on OpenWebText for ~500M tokens (1000 iterations of batch size ~488 at 1024 context length) while gradually disabling LayerNorm layers.
+ For details see [here](https://www.lesswrong.com/posts/THzcKKQd4oWkg4dSP/you-can-remove-gpt2-s-layernorm-by-fine-tuning-for-an-hour) and the upcoming paper.
+
+ Available versions:
+ * v2 (default): Trained for 1000 iterations in a single training run
+ * v1: Trained for 900 iterations, with multiple rounds of interrupting training, modifying LNs, and resuming
 
 The model is a `GPT2LMHeadModel` (to avoid requiring `trust_remote_code`) which technically contains LayerNorm blocks.
 However, the epsilon values are all set to 1e12 so that the LayerNorm has no effect. The LN scale is set to 1e6 (to counter the 1e12 epsilon), and the bias to 0.
@@ -15,8 +20,80 @@ The final LayerNorm also has 1e12 as epsilon, but non-unity weights and biases.
 thus the LN parameters cannot be folded into that matrix. You can completely remove all LNs by simply replacing `ln_1` and `ln_2` modules with identities, and replacing
 `ln_f` with modifications to the unembed matrix and unembed bias.
 
- Available versions:
- * v2 (default): Trained for 1000 iterations in a single training run
- * v1: Trained for 900 iterations, with multiple rounds of interrupting training, modifying LNs, and resuming
+ ## TransformerLens loading code
+ ```python
+ import torch
+ from transformers import GPT2LMHeadModel
+ from transformer_lens import HookedTransformer
+
+ model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN").to("cpu")
+ hooked_model = HookedTransformer.from_pretrained("gpt2", hf_model=model, fold_ln=False, center_unembed=False).to("cpu")
+ # Kill the LayerNorms because TransformerLens overwrites eps
+ for block in hooked_model.blocks:
+     block.ln1.eps = 1e12
+     block.ln2.eps = 1e12
+ hooked_model.ln_final.eps = 1e12
+ ```
+
+ Or with LNs properly replaced by identities:
+ ```python
+ import torch
+ from transformers import GPT2LMHeadModel
+ from transformer_lens import HookedTransformer
+
+ model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN").to("cpu")
+
+ # Undo my hacky LayerNorm removal
+ for block in model.transformer.h:
+     block.ln_1.weight.data = block.ln_1.weight.data / 1e6
+     block.ln_1.eps = 1e-5
+     block.ln_2.weight.data = block.ln_2.weight.data / 1e6
+     block.ln_2.eps = 1e-5
+ model.transformer.ln_f.weight.data = model.transformer.ln_f.weight.data / 1e6
+ model.transformer.ln_f.eps = 1e-5
+
+ # Properly replace LayerNorms by Identities
+ class HookedTransformerNoLN(HookedTransformer):
+     def removeLN(self):
+         for i in range(len(self.blocks)):
+             self.blocks[i].ln1 = torch.nn.Identity()
+             self.blocks[i].ln2 = torch.nn.Identity()
+         self.ln_final = torch.nn.Identity()
+
+ hooked_model = HookedTransformerNoLN.from_pretrained("gpt2", hf_model=model, fold_ln=True, center_unembed=False).to("cpu")
+ hooked_model.removeLN()
+ ```
+
+ ## NNSight loading code
+ Copy-pasted from [Logan Riggs' comment](https://www.lesswrong.com/posts/THzcKKQd4oWkg4dSP/you-can-remove-gpt2-s-layernorm-by-fine-tuning-for-an-hour?commentId=Gcq8wic9WmdnqM2Fm), based on code by Caden.
+ ```python
+ import torch
+ from transformers import GPT2LMHeadModel
+ from transformer_lens import HookedTransformer
+ from nnsight.models.UnifiedTransformer import UnifiedTransformer
+
+
+ model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN").to("cpu")
+
+ # Undo my hacky LayerNorm removal
+ for block in model.transformer.h:
+     block.ln_1.weight.data = block.ln_1.weight.data / 1e6
+     block.ln_1.eps = 1e-5
+     block.ln_2.weight.data = block.ln_2.weight.data / 1e6
+     block.ln_2.eps = 1e-5
+ model.transformer.ln_f.weight.data = model.transformer.ln_f.weight.data / 1e6
+ model.transformer.ln_f.eps = 1e-5
+
+ # Properly replace LayerNorms by Identities
+ def removeLN(transformer_lens_model):
+     for i in range(len(transformer_lens_model.blocks)):
+         transformer_lens_model.blocks[i].ln1 = torch.nn.Identity()
+         transformer_lens_model.blocks[i].ln2 = torch.nn.Identity()
+     transformer_lens_model.ln_final = torch.nn.Identity()
+
+ hooked_model = HookedTransformer.from_pretrained("gpt2", hf_model=model, fold_ln=True, center_unembed=False).to("cpu")
+ removeLN(hooked_model)
 
- The training script will be published shortly.
+ model_nnsight = UnifiedTransformer(model="gpt2", hf_model=model, fold_ln=True, center_unembed=False).to("cpu")
+ removeLN(model_nnsight)
+ ```
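
The README states that `ln_f` can be replaced by "modifications to the unembed matrix and unembed bias". The sketch below illustrates that algebra with random tensors of gpt2-small shape; it is not the repo's conversion code (TransformerLens performs an equivalent fold when loading with `fold_ln=True`), and all names here are stand-ins.

```python
import torch

# Toy check of the ln_f-folding algebra, with random stand-ins:
# g ~ ln_f.weight / 1e6, b ~ ln_f.bias, W_U ~ the unembed matrix (lm_head.weight.T).
d_model, d_vocab = 768, 50257
x = torch.randn(d_model)             # a final residual-stream vector
g = torch.randn(d_model)             # ln_f scale (after dividing out the 1e6 factor)
b = torch.randn(d_model)             # ln_f bias
W_U = torch.randn(d_model, d_vocab)  # unembed matrix

# With eps = 1e12, sqrt(var + eps) is ~1e6, which the 1e6 scale factor cancels,
# so ln_f reduces to the affine map g * (x - mean(x)) + b.
logits_with_ln = (g * (x - x.mean()) + b) @ W_U

# Fold that affine map into the unembed: scale the d_model rows by g,
# subtract each column's mean (absorbing the mean subtraction), and add a bias.
W_folded = g[:, None] * W_U
W_folded = W_folded - W_folded.mean(dim=0, keepdim=True)
b_folded = b @ W_U

logits_folded = x @ W_folded + b_folded
print(torch.allclose(logits_with_ln, logits_folded, atol=1e-3))  # True up to float32 noise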
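
After loading with either TransformerLens snippet above, a quick smoke test can confirm the LayerNorm-free model still behaves like a language model. The prompt and the expectation of a typical GPT-2-small-scale loss are illustrative choices, not from the repo.

```python
# Assumes `hooked_model` was built by one of the TransformerLens snippets above.
prompt = "The quick brown fox jumps over the lazy dog."

# Mean next-token cross-entropy loss on the prompt.
loss = hooked_model(prompt, return_type="loss")
print(f"loss: {loss.item():.3f}")

# Short continuation to eyeball that generation still works without LayerNorm.
print(hooked_model.generate(prompt, max_new_tokens=20))
```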