Gabriele Sarti

gsarti

https://gsarti.com

AI & ML interests

Interpretability for generative language models

Recent Activity

updated a collection about 12 hours ago

🔍 Interpretability & Analysis of LMs

gsarti's activity

upvoted an article about 7 hours ago

Open-R1: Update #1

Article • 2 days ago • 180

upvoted 2 papers about 12 hours ago

Partially Rewriting a Transformer in Natural Language

Paper • 2501.18838 • Published 4 days ago • 1

Sparse Autoencoders Trained on the Same Data Learn Different Features

Paper • 2501.16615 • Published 7 days ago • 1

upvoted a paper about 14 hours ago

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Paper • 2501.17148 • Published 6 days ago • 1

upvoted a paper 5 days ago

Open Problems in Mechanistic Interpretability

Paper • 2501.16496 • Published 7 days ago • 15

upvoted a collection 10 days ago

Gemma Neogenesis 💎🌍🇮🇹

Collection

Datasets and models for Neogenesis: Post-training recipe for improving Gemma 2 for a specific language. Notebook: https://t.ly/iuKdy • 11 items • Updated 15 days ago • 5

upvoted a paper 20 days ago

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Paper • 2501.08319 • Published 20 days ago • 10

upvoted a paper 21 days ago

Open Problems in Machine Unlearning for AI Safety

Paper • 2501.04952 • Published 26 days ago • 1

upvoted a paper 26 days ago

Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization

Paper • 2412.04619 • Published Dec 5, 2024 • 1

upvoted 2 papers about 1 month ago

Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models

Paper • 2412.16247 • Published Dec 20, 2024 • 1

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

Paper • 2410.11179 • Published Oct 15, 2024 • 1

upvoted 6 papers about 2 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 345

Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models

Paper • 2412.05353 • Published Dec 6, 2024 • 1

The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units

Paper • 2411.02280 • Published Nov 4, 2024 • 1

upvoted an article 2 months ago

EuroLLM-9B

Article • Dec 2, 2024 • 108

upvoted 2 collections 2 months ago

NLI Eval Datasets

Collection

A curated collection of NLI evaluation datasets. Each dataset is exactly as originally proposed • 19 items • Updated Nov 12, 2024 • 3

🇮🇹👓 LLaVA-NDiNO

Collection

HF Collection for the models of the paper "LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language" • 7 items • Updated Oct 20, 2024 • 3