We trained 12 sparse autoencoders on the residual stream of GPT-2 small. Each contains ~25k features: we used an expansion factor of 32, and GPT-2's residual stream has 768 dimensions (768 × 32 = 24,576). We trained with an L1 coefficient of 8e-5 and a learning rate of 4e-4 for 300 million tokens, maintaining a buffer of ~500k tokens from OpenWebText that is refilled and reshuffled whenever 50% of its tokens have been used. To avoid dead neurons, we use ghost gradients. The encoder and decoder weights are untied, but we do use a tied decoder bias, initialized to the geometric median per Bricken et al.
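To make the architecture concrete, here is a minimal NumPy sketch of a single forward pass under the setup described above (untied encoder/decoder, a decoder bias shared between the input shift and the output, and an L1 sparsity penalty). This assumes the standard SAE formulation from Bricken et al.; all variable names are illustrative, and ghost gradients and the training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the description: GPT-2 small's residual stream is 768-d,
# and an expansion factor of 32 gives 768 * 32 = 24576 features.
d_model = 768
expansion = 32
d_sae = d_model * expansion

# L1 coefficient from the description.
l1_coeff = 8e-5

# Untied encoder/decoder weights. The decoder bias b_dec is tied: it is
# subtracted from the input before encoding and added back after decoding.
# In the real run it is initialized at the geometric median of the data.
W_enc = rng.normal(0.0, 0.02, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.02, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode, apply ReLU, decode; return the reconstruction and loss."""
    h = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)   # feature activations
    x_hat = h @ W_dec + b_dec                          # reconstruction
    mse = np.mean((x - x_hat) ** 2)                    # reconstruction term
    l1 = l1_coeff * np.abs(h).sum(axis=-1).mean()      # sparsity penalty
    return x_hat, mse + l1

# Toy batch standing in for residual-stream activations.
x = rng.normal(size=(4, d_model))
x_hat, loss = sae_forward(x)
```

In training, minimizing this loss trades reconstruction fidelity against the number of active features; the L1 coefficient sets that trade-off.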