arXiv:2410.23054

Controlling Language and Diffusion Models by Transporting Activations

Published on Oct 30 · Submitted by prlz77 on Nov 6
#3 Paper of the day

Abstract

The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their reliability, safety, and potential misuse. To address these issues, recent works have proposed to control model generation by steering model activations in order to effectively induce or prevent the emergence of concepts or behaviors in the generated output. In this paper we introduce Activation Transport (AcT), a general framework to steer activations guided by optimal transport theory that generalizes many previous activation-steering works. AcT is modality-agnostic and provides fine-grained control over the model behavior with negligible computational overhead, while minimally impacting model abilities. We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that AcT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how AcT enables fine-grained style control and concept negation.
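
To make the core idea concrete, here is a minimal, hypothetical sketch (not the authors' released code) of one simple way to realize activation transport as described above: in 1D, the optimal-transport map between two Gaussians has a closed affine form, so a per-neuron scale and shift can send the source activation distribution onto a target one, and interpolating toward that map gives a controllable intervention strength. All names below (`fit_affine_transport`, `transport`, `lambda_`) are illustrative.

```python
# Illustrative sketch only, NOT the paper's implementation. It uses the
# closed-form 1D optimal-transport map between Gaussians, one per neuron.
import numpy as np

def fit_affine_transport(src_acts: np.ndarray, tgt_acts: np.ndarray):
    """Fit per-neuron affine maps sending the source activation
    distribution onto the target one (the exact OT map when both
    marginals are Gaussian).

    src_acts, tgt_acts: arrays of shape (n_samples, n_neurons).
    Returns per-neuron scale a and shift b such that a * x + b
    transports N(mu_s, sigma_s^2) onto N(mu_t, sigma_t^2).
    """
    mu_s, sigma_s = src_acts.mean(0), src_acts.std(0) + 1e-8
    mu_t, sigma_t = tgt_acts.mean(0), tgt_acts.std(0)
    a = sigma_t / sigma_s
    b = mu_t - a * mu_s
    return a, b

def transport(x: np.ndarray, a: np.ndarray, b: np.ndarray,
              lambda_: float = 1.0) -> np.ndarray:
    """Steer activations x toward the target distribution.

    lambda_ in [0, 1] interpolates between no intervention (0) and
    the full transport map (1), giving fine-grained control strength.
    """
    return (1 - lambda_) * x + lambda_ * (a * x + b)
```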

Community

Paper submitter · edited 7 days ago

Happy to present new work from our team at Apple MLR! We provide principled tools for fine-grained control of generative models without requiring any fine-tuning or LoRA :)

In this work, we tackle conditioning of pre-trained generative models from a neuron-distribution perspective. Grounded in optimal transport, we propose a method that preserves the internal neuron distributions learnt during training, showing superior performance on many tasks.
And it works for both LLMs and diffusion models! 🎉
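
As a rough illustration (a hypothetical sketch, not our released implementation), an intervention of this kind can be attached to a pretrained model with a standard PyTorch forward hook, which is why it adds negligible overhead and needs no fine-tuning. `layer`, `a`, `b`, and `strength` are illustrative names.

```python
# Hypothetical usage sketch: apply fitted per-neuron maps (scale a,
# shift b, as in the sketch above) at inference time via a forward hook.
import torch

def make_steering_hook(a: torch.Tensor, b: torch.Tensor, strength: float):
    def hook(module, inputs, output):
        steered = a * output + b  # transported activations
        # Interpolate between the original and transported activations.
        return (1 - strength) * output + strength * steered
    return hook

# handle = layer.register_forward_hook(make_steering_hook(a, b, strength=0.8))
# ... run generation as usual ...
# handle.remove()  # restore the unmodified model
```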

