arxiv:2412.17812

FaceLift: Single Image to 3D Head with View Generation and GS-LRM

Published on Dec 23, 2024
Abstract

We present FaceLift, a feed-forward approach for rapid, high-quality, 360-degree head reconstruction from a single image. Our pipeline begins by employing a multi-view latent diffusion model that generates consistent side and back views of the head from a single facial input. These generated views then serve as input to a GS-LRM reconstructor, which produces a comprehensive 3D representation using Gaussian splats. To train our system, we develop a dataset of multi-view renderings using synthetic 3D human head assets. The diffusion-based multi-view generator is trained exclusively on synthetic head images, while the GS-LRM reconstructor undergoes initial training on Objaverse followed by fine-tuning on synthetic head data. FaceLift excels at preserving identity and maintaining consistency across views. Despite being trained solely on synthetic data, FaceLift demonstrates remarkable generalization to real-world images. Through extensive qualitative and quantitative evaluations, we show that FaceLift outperforms state-of-the-art methods in 3D head reconstruction, highlighting its practical applicability and robust performance on real-world images. In addition to single-image reconstruction, FaceLift supports video inputs for 4D novel view synthesis and seamlessly integrates with 2D reanimation techniques to enable 3D facial animation. Project page: https://weijielyu.github.io/FaceLift.
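To make the two-stage data flow in the abstract concrete, here is a minimal sketch. Every function, shape, and parameter below is a hypothetical placeholder standing in for the actual models, not the authors' released API: the diffusion stage is represented by a stub that simply tiles the input, and the GS-LRM stage by a stub that emits zero-initialized Gaussian splat parameters.

```python
import numpy as np

def generate_views(face_image, num_views=6):
    # Stage 1 (hypothetical stand-in): the multi-view latent diffusion model
    # would synthesize consistent side and back views from the single input.
    # Here we just tile the input image to illustrate the tensor flow.
    return np.stack([face_image] * num_views, axis=0)

def gs_lrm_reconstruct(views):
    # Stage 2 (hypothetical stand-in): GS-LRM maps the generated views to a
    # set of 3D Gaussian splats. A splat is typically parameterized by a
    # center, per-axis scale, rotation (quaternion), opacity, and color.
    num_gaussians = 1024  # illustrative count, not from the paper
    return {
        "positions": np.zeros((num_gaussians, 3)),  # xyz centers
        "scales": np.ones((num_gaussians, 3)),      # per-axis extents
        "rotations": np.zeros((num_gaussians, 4)),  # quaternions
        "opacities": np.ones((num_gaussians, 1)),
        "colors": np.zeros((num_gaussians, 3)),     # RGB
    }

# A single RGB face image of shape (H, W, 3) goes in...
face = np.zeros((512, 512, 3), dtype=np.float32)
views = generate_views(face)        # ...(num_views, H, W, 3) views come out...
splats = gs_lrm_reconstruct(views)  # ...and become Gaussian splat parameters.
```

The point of the sketch is the interface between the stages: the diffusion model's output (a stack of consistent views) is exactly the reconstructor's input, which is what lets the system run feed-forward from one image to a renderable 3D head.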

Community

Incredible work! The FaceLift pipeline is a game-changer for 3D head reconstruction. The use of a multi-view latent diffusion model for generating consistent side and back views from a single image is a brilliant approach, and it's amazing to see how it maintains both identity preservation and view consistency. The integration of GS-LRM for 3D reconstruction is also an innovative step forward. It's impressive that this system, trained exclusively on synthetic data, generalizes so well to real-world images. The ability to handle both single-image and video inputs for 4D synthesis and 3D facial animation is especially exciting, and the potential applications in AR, VR, and digital content creation are enormous. Congratulations on pushing the envelope in 3D vision and AI, and I look forward to seeing how this evolves!

