Open supervised dictionary learning models and datasets for Gemma 2 2B and 9B.
pyvene
university
AI & ML interests: interpretability
Organization Card
Who are we?
We are a group of hackers from Stanford's NLP group who are interested in LLM interpretability. We started with pyvene, which stands for PyTorch model intervention.
Resources
Supervised dictionary learning models (SDLs) and dataset releases for Gemma 2 2B and 9B: AxBench Collection.
Library for benchmarking interpretability methods at scale (AxBench): AxBench.
Representation finetuning (ReFT) library: pyreft.
PyTorch model intervention library: pyvene.
Collections: 1
Spaces: 6
Models: 12
pyvene/gemma-reft-2b-it-res
pyvene/gemma-reft-9b-it-res-generator
pyvene/gemma-reft-2b-it-res-generator
pyvene/gemma-diffmean-9b-it-res
pyvene/gemma-diffmean-2b-it-res
pyvene/gemma-reft-9b-it-res
pyvene/reft_golden_gate_bridge_llama3 • 30
pyvene/reft_goody2_llama3 • 5
pyvene/reft_emoji_chat_llama3 • 6
pyvene/reft_emoji_chat • 12 • 2