arXiv:2501.01097

EliGen: Entity-Level Controlled Image Generation with Regional Attention

Published on Jan 2, 2025
Abstract

Recent advances in diffusion models have significantly improved text-to-image generation, yet global text prompts alone remain insufficient for achieving fine-grained control over individual entities within an image. To address this limitation, we present EliGen, a novel framework for Entity-Level controlled Image Generation. We introduce regional attention, a mechanism for diffusion transformers that requires no additional parameters and seamlessly integrates entity prompts with arbitrary-shaped spatial masks. By contributing a high-quality dataset with fine-grained spatial and semantic entity-level annotations, we train EliGen to achieve robust and accurate entity-level manipulation, surpassing existing methods in both positional control precision and image quality. Additionally, we propose an inpainting fusion pipeline that extends EliGen to multi-entity image inpainting tasks. We further demonstrate its flexibility by integrating it with community models such as IP-Adapter and MLLM, unlocking new creative possibilities. The source code, dataset, and model will be released publicly.
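The abstract does not detail how regional attention is realized, but a common way to implement this kind of parameter-free, mask-driven control in a diffusion transformer is to build a boolean attention mask that restricts each entity prompt's text tokens to the image tokens covered by that entity's spatial mask, while leaving global-prompt and image-to-image attention unrestricted. The sketch below illustrates that idea; it is not the authors' released code, and the function name, token layout, and patch-grid parameters are illustrative assumptions.

```python
# Minimal sketch (assumed, not EliGen's actual implementation) of assembling a
# regional attention mask for a diffusion transformer. Token order is assumed
# to be: [global prompt tokens | entity-1 tokens | ... | entity-N tokens |
# image patch tokens]. True = attention allowed.

import torch
import torch.nn.functional as F


def build_regional_attention_mask(
    global_len: int,             # number of global-prompt text tokens
    entity_lens: list[int],      # text-token count of each entity prompt
    entity_masks: torch.Tensor,  # (num_entities, H, W) binary spatial masks
    patch_h: int,                # latent patch grid height
    patch_w: int,                # latent patch grid width
) -> torch.Tensor:
    """Return a boolean (L, L) mask, L = text tokens + patch_h * patch_w."""
    num_img = patch_h * patch_w
    total_txt = global_len + sum(entity_lens)
    L = total_txt + num_img

    # Downsample each entity mask to the patch grid and flatten it.
    region = F.interpolate(
        entity_masks[:, None].float(), size=(patch_h, patch_w), mode="nearest"
    ).flatten(1).bool()          # (num_entities, num_img)

    allow = torch.zeros(L, L, dtype=torch.bool)

    # Global prompt tokens and image tokens attend freely.
    allow[:global_len, :] = True
    allow[:, :global_len] = True
    allow[total_txt:, total_txt:] = True

    # Each entity prompt attends to itself and only to its own image region.
    offset = global_len
    for length, reg in zip(entity_lens, region):
        txt = slice(offset, offset + length)
        img = total_txt + torch.nonzero(reg, as_tuple=True)[0]
        allow[txt, txt] = True
        allow[txt, img] = True   # entity text -> its masked image patches
        allow[img, txt] = True   # its masked image patches -> entity text
        offset += length
    return allow
```

Under this assumption, the resulting mask could be passed as the `attn_mask` argument of `torch.nn.functional.scaled_dot_product_attention` inside each transformer block, which is what makes the mechanism parameter-free: control comes entirely from masking existing attention rather than from new learned modules.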
