SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders Paper • 2501.18052 • Published 9 days ago • 6
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 77
🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 101 items • Updated 3 days ago • 97