This repo contains bitsandbytes 4-bit (NF4) quantized model weights for OmniGen-v1. For information about OmniGen itself, see the original model card.
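
For reference, NF4 is bitsandbytes' 4-bit "normal float" data type. The sketch below only illustrates what NF4 quantization does to a weight tensor, using the bitsandbytes functional API on a random matrix; it is not the script that produced the weights in this repo, and it requires a CUDA device.

import torch
import bitsandbytes.functional as bnbF

# Illustration only: quantize a random weight matrix to 4-bit NF4, then dequantize it back.
w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
w_nf4, quant_state = bnbF.quantize_4bit(w, quant_type="nf4")    # packed 4-bit storage + scaling stats
w_restored = bnbF.dequantize_4bit(w_nf4, quant_state=quant_state)

print(w_nf4.dtype, w_nf4.numel())     # uint8 storage, two 4-bit values per byte
print((w - w_restored).abs().mean())  # small quantization error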

Usage

Before getting started, set up your environment by following the original Quick Start Guide.

NOTE: This feature is not officially supported yet. You'll need to install the repo from this pull request.

from OmniGen import OmniGenPipeline, OmniGen

# pass the quantized model to the pipeline
model = OmniGen.from_pretrained('gryan/OmniGen-v1-bnb-4bit')
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", model=model)

# proceed as normal!

## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.", 
    height=1024, 
    width=1024, 
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

## Multi-modal to Image
# In the prompt, we use the placeholder to represent the image. The image placeholder should be in the format of <img><|image_*|></img>
# You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024, 
    width=1024,
    guidance_scale=2.5, 
    img_guidance_scale=1.6,
    seed=0
)
images[0].save("example_ti2i.png")  # save output PIL image

Image Comparisons

[Comparison images: Text Only Comparison, Single Image Comparison, Double Image Comparison]

Performance

Results for the 4-bit NF4 quantized model on an RTX 3090 GPU (24 GB). Each cell lists GPU memory usage and generation time:

| Settings | Only Text | Text + Single Image | Text + Two Images |
|---|---|---|---|
| use_kv_cache=False | 6.8 GB, 1m16s | 7.2 GB, 3m30s | 7.7 GB, 5m47s |
| use_kv_cache | 9.9 GB, 1m14s | 20.4 GB†, 8m5s | OOM (36.7 GB†, >1h10m) |
| use_kv_cache, offload_kv_cache | 6.8 GB, 1m16s | 7.2 GB, 2m49s | 8.4 GB, 4m3s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer | 6.8 GB, 1m20s | 7.0 GB, 2m31s | 7.4 GB, 3m31s |
| use_kv_cache, offload_kv_cache, offload_model* | 5.0 GB, 1m35s | 6.0 GB, 3m7s | 8.0 GB, 4m21s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer, offload_model* | 5.0 GB, 1m58s | 5.3 GB, 3m29s | 5.6 GB, 4m19s |
  • † - memory_reserved exceeded 24 GB; the overflow spilled into system RAM.
  • * - only the VAE is offloaded; a model loaded in 4-bit cannot be offloaded.

See the original inference settings table for bfloat16 performance.
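
The settings in the table are passed as keyword arguments on the pipeline call. A minimal sketch, assuming the argument names match the upstream OmniGen pipeline:

# Memory-friendly configuration from the last table row, passed directly to the pipeline call.
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
    use_kv_cache=True,
    offload_kv_cache=True,
    separate_cfg_infer=True,
    offload_model=True,  # for the 4-bit model, only the VAE is actually offloaded (see note above)
)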
