CUDA out of memory

#142
by germain505 - opened

No matter which model I use (schnell/dev) I can't get it to run, and all the VRAM gets devoured. I have an RTX 4090 GPU. Am I doing something wrong?

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 126.00 MiB. GPU 0 has a total capacity of 23.52 GiB of which 103.44 MiB is free. Process 2588 has 83.96 MiB memory in use. Including non-PyTorch memory, this process has 22.61 GiB memory in use. Of the allocated memory 22.22 GiB is allocated by PyTorch, and 15.45 MiB is reserved by PyTorch but unallocated. 

I have the same problem with the same GPU. Did you solve it?

Same problem with a 4090. No problem when using the Hugging Face diffusers library, but when running the source code I run into CUDA OOM.

Same problem here. I just posted on another thread because I can't seem to find how to search the discussions.

Running FLUX.1-dev Image Generation with Memory Optimization on my Nvidia GTX 1070 8GB GPU

This guide explains how to run the FLUX.1-dev image generation model with various memory optimizations to handle GPU memory constraints.

Setup and Imports

import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
import torch
from diffusers import FluxPipeline

The first lines set up our environment:

  • Setting PYTORCH_CUDA_ALLOC_CONF to expandable_segments:True lets PyTorch's CUDA caching allocator grow existing memory segments instead of requesting new ones, which helps prevent fragmentation; it has to be set before the first CUDA allocation, which is why it comes before anything touches the GPU
  • We import PyTorch and the FluxPipeline class from the diffusers library (a quick VRAM check is shown right after this list)
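
If you want to confirm how much VRAM is actually free before loading anything (useful for diagnosing errors like the one at the top of this thread), a minimal check with torch.cuda.mem_get_info looks like this; it is a diagnostic sketch, not a required step of the guide:

import torch

# Query the CUDA driver for free and total memory on GPU 0 (values in bytes).
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free / 1e9:.2f} GB free of {total / 1e9:.2f} GB total")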

Pipeline Configuration

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    use_safetensors=True
)

Here we configure the pipeline with several optimizations:

  • torch_dtype=torch.bfloat16 loads the weights in 16-bit bfloat16 precision, roughly halving memory use compared to float32
  • use_safetensors=True loads weights from the safetensors format, which is faster and safer to load than pickle-based checkpoints
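
If you want to confirm the bfloat16 setting actually took effect, you can inspect the dtypes of the pipeline's main components; the names used below (transformer, text_encoder, text_encoder_2, vae) follow the standard FluxPipeline layout:

# Sanity-check that every component was loaded in bfloat16.
for name in ("transformer", "text_encoder", "text_encoder_2", "vae"):
    module = getattr(pipe, name, None)
    if module is not None:
        print(name, next(module.parameters()).dtype)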

Memory Optimizations

torch.cuda.empty_cache()
pipe.enable_attention_slicing()
pipe.enable_sequential_cpu_offload()

These lines implement three key memory-saving techniques:

  • empty_cache() releases memory that PyTorch's allocator is caching but no longer using back to the driver (it does not free tensors that are still referenced)
  • enable_attention_slicing() computes attention in smaller chunks, trading some speed for a lower peak memory footprint
  • enable_sequential_cpu_offload() keeps the weights in CPU RAM and moves each submodule to the GPU only for its forward pass, which is slow but gives the smallest VRAM footprint; a faster alternative for cards with more VRAM is shown below
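
If you have more headroom, enable_model_cpu_offload() is usually a faster alternative: it moves whole components (text encoders, transformer, VAE) on and off the GPU instead of individual submodules. Bear in mind that the FLUX.1-dev transformer alone is roughly 12B parameters (about 24 GB in bfloat16), so whole-model offload can still be tight on a 24 GB card. Also note that with either offload method you should not call pipe.to("cuda") yourself; doing so tries to load the entire pipeline onto the GPU at once and is a common cause of the OOM reported in this thread. A minimal variant:

# Alternative: whole-model offloading. Faster than sequential offload,
# but needs enough free VRAM to hold the largest single component.
# Do NOT also call pipe.to("cuda"); the offload hooks manage device placement.
pipe.enable_model_cpu_offload()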

Image Generation

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=160,
    width=160,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

The generation parameters are configured for memory efficiency:

  • Small image dimensions (160x160) to minimize memory usage
  • guidance_scale=3.5 controls how closely the image follows the prompt
  • num_inference_steps=50 sets the number of denoising steps; more steps generally improve quality at the cost of time, with little effect on memory
  • max_sequence_length=512 limits the prompt token length
  • Setting a manual seed ensures reproducible results
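
To see how close a given configuration comes to your card's limit, you can wrap the call above with PyTorch's peak-memory counters (a sketch; the pipe call is the same one shown above):

# Track the peak GPU memory PyTorch allocates during generation.
torch.cuda.reset_peak_memory_stats()

image = pipe(
    prompt,
    height=160,
    width=160,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU memory allocated by PyTorch: {peak_gb:.2f} GB")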

Saving the Result

image.save("flux-dev.png")

Finally, we save the generated image to a PNG file.

Memory Usage Tips

If you're still experiencing memory issues, you can try:

  • Further reducing image dimensions
  • Decreasing the number of inference steps (try 30-40)
  • Lowering the max_sequence_length if using shorter prompts
  • Enabling VAE slicing and tiling for the decode step, as shown below (note that guidance_scale mainly affects how closely the image follows the prompt rather than memory use, since FLUX.1-dev takes guidance as an embedding instead of running a separate unconditional pass)
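
VAE slicing and tiling reduce peak memory during the final latent-to-image decode: slicing decodes batch items one at a time, while tiling decodes the image in overlapping tiles, which is the option that matters for a single large image. A sketch using the standard diffusers AutoencoderKL API (call these before generating):

# Reduce peak memory during the VAE decode step.
pipe.vae.enable_slicing()   # decode batch items one at a time
pipe.vae.enable_tiling()    # decode the image in tiles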

Complete Code

Here's the complete code block for easy copying:

import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    use_safetensors=True
)

# Memory optimizations
torch.cuda.empty_cache()
pipe.enable_attention_slicing()
pipe.enable_sequential_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=160,
    width=160,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

image.save("flux-dev.png")
