
Hands-on

Open In Colab

The goal of this hands-on is to build an image-to-splat pipeline, using LGM (Large Multi-View Gaussian Model) as an example.

The pipeline chains together two stages of generative 3D:

  1. Multi-view Diffusion
  2. ML-friendly 3D (Gaussian Splatting)

Setup

Open the Colab notebook linked above. Click Runtime -> Change runtime type and select GPU as the hardware accelerator.

Then, start by installing the necessary dependencies:

!pip install -r https://huggingface.co/spaces/dylanebert/LGM-mini/raw/main/requirements.txt
!pip install https://huggingface.co/spaces/dylanebert/LGM-mini/resolve/main/wheel/diff_gaussian_rasterization-0.0.0-cp310-cp310-linux_x86_64.whl

As before, if the notebook asks you to restart the session, do so, then rerun the code block.
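Before loading the models, it's worth confirming that the GPU runtime is actually active. This is a quick sanity check using standard PyTorch calls, not part of the original notebook:

import torch

# Should print True on a GPU runtime
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))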

Load the Models

Just like in the multi-view diffusion notebook, load the pretrained multi-view diffusion model:

import torch
from diffusers import DiffusionPipeline

# Step 1: multi-view diffusion, loaded in half precision on the GPU
image_pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/multi-view-diffusion",
    custom_pipeline="dylanebert/multi-view-diffusion",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")

This is because multi-view diffusion is the first step in the LGM pipeline.

Then, load the generative Gaussian Splatting model, the main contribution of LGM:

# Step 2: image-conditioned Gaussian splat generation, the core of LGM
splat_pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/LGM",
    custom_pipeline="dylanebert/LGM",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")

Load an Image

As before, load the famous Cat Statue image:

import requests
from PIL import Image
from io import BytesIO

# Download the example image and open it with PIL
image_url = "https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/inputs/images/a_cat_statue.jpg"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))
image  # displays the image in the notebook
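The pipelines expect a three-channel RGB array. The sample image already is one, but if you substitute your own, a defensive conversion with PIL (an extra step, not in the original notebook) avoids surprises from alpha channels or grayscale modes:

# Normalize the image mode to plain RGB
image = image.convert("RGB")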

Run the Pipeline

Finally, pass the image through both pipelines. The output is Gaussian splat data, which can be saved with splat_pipeline.save_ply().

import numpy as np
from google.colab import files

# Normalize the image to [0, 1]; the empty string is the (unused) text prompt
input_image = np.array(image, dtype=np.float32) / 255.0
multi_view_images = image_pipeline("", input_image, guidance_scale=5, num_inference_steps=30, elevation=0)

(Figure: the generated multi-view images of the cat statue)
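To inspect the intermediate result yourself, you can tile the views into a single image. This sketch assumes multi_view_images is a NumPy array of shape (num_views, height, width, 3) with values in [0, 1], which matches how the splat pipeline consumes it below:

# Tile the generated views horizontally and display them
views = np.asarray(multi_view_images)
grid = np.concatenate(list(views), axis=1)
Image.fromarray((grid * 255).astype(np.uint8))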

# Convert the views into a Gaussian splat and save it as a .ply file
splat = splat_pipeline(multi_view_images)

output_path = "/tmp/output.ply"
splat_pipeline.save_ply(splat, output_path)
files.download(output_path)

The files.download() call fetches the .ply file to your local machine when running the notebook in Colab. If you're running the notebook locally, you can remove this line.
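To make the cell work in both environments, one option is to attempt the Colab import and fall back to a plain message (a sketch, not part of the original notebook):

try:
    from google.colab import files  # only available inside Colab
    files.download(output_path)
except ImportError:
    print(f"Splat saved to {output_path}")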

Congratulations! You’ve run the LGM pipeline.

Gradio Demo

Now, let’s create a Gradio demo to run the model end-to-end with an easy-to-use interface:

import gradio as gr

def run(image):
    # Same steps as above: normalize, generate views, convert to a splat
    input_image = image.astype("float32") / 255.0
    images = image_pipeline("", input_image, guidance_scale=5, num_inference_steps=30, elevation=0)
    splat = splat_pipeline(images)
    output_path = "/tmp/output.ply"
    splat_pipeline.save_ply(splat, output_path)
    return output_path

# Image in, interactive 3D model out
demo = gr.Interface(fn=run, inputs="image", outputs=gr.Model3D())
demo.launch()

This will create a Gradio demo that takes an image as input and outputs a 3D splat.
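If you want to share the demo beyond the notebook, Gradio can generate a temporary public URL; passing share=True to launch() is a standard Gradio option:

# Creates a temporary public link to the demo
demo.launch(share=True)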
