Spaces:
RuntimeError: No CUDA GPUs are available
I'm getting RuntimeError: No CUDA GPUs are available. I checked the changes applied here, but my error was not caused by the same thing.
Error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 354, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 321, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/app/ui/images/images.py", line 829, in generate_images
    images = generate_t2i(base_request)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
    raise res.value
RuntimeError: No CUDA GPUs are available
I have made sure to add @spaces.GPU(duration=120) before the functions:

@spaces.GPU(duration=120)
def generate_t2i(request: GenT2I):
    ...

images = generate_t2i(base_request)
return gr.update(  # output_images
    value=images,
    interactive=True
)
This is how I load the model:
def load_sd():
    # device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Models
    models = [
        {
            "repo_id": "black-forest-labs/FLUX.1-dev",
            "loader": "flux",
            "compute_type": torch.bfloat16,
        },
        {
            "repo_id": "SG161222/RealVisXL_V4.0",
            "loader": "xl",
            "compute_type": torch.float16,
        }
    ]
    for model in models:
        try:
            model["pipeline"] = AutoPipelineForText2Image.from_pretrained(
                model["repo_id"],
                torch_dtype=model["compute_type"],
                safety_checker=None,
                variant="fp16"
            ).to(device)
            model["pipeline"].enable_model_cpu_offload()
        except Exception:
            model["pipeline"] = AutoPipelineForText2Image.from_pretrained(
                model["repo_id"],
                torch_dtype=model["compute_type"],
                safety_checker=None
            ).to(device)
            model["pipeline"].enable_model_cpu_offload()

    # VAE n Refiner
    sdxl_vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
    refiner = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", vae=sdxl_vae, torch_dtype=torch.float16, use_safetensors=True, variant="fp16").to(device)
    refiner.enable_model_cpu_offload()

    # Safety Checker
    safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker").to(device)
    feature_extractor = CLIPFeatureExtractor.from_pretrained("openai/clip-vit-base-patch32", from_pt=True)

    # Controlnets
    controlnet_models = [
        {
            "repo_id": "xinsir/controlnet-depth-sdxl-1.0",
            "name": "depth_xl",
            "layers": ["depth"],
            "loader": "xl",
            "compute_type": torch.float16,
        },
        {
            "repo_id": "xinsir/controlnet-canny-sdxl-1.0",
            "name": "canny_xl",
            "layers": ["canny"],
            "loader": "xl",
            "compute_type": torch.float16,
        },
        {
            "repo_id": "xinsir/controlnet-openpose-sdxl-1.0",
            "name": "openpose_xl",
            "layers": ["pose"],
            "loader": "xl",
            "compute_type": torch.float16,
        },
        {
            "repo_id": "xinsir/controlnet-scribble-sdxl-1.0",
            "name": "scribble_xl",
            "layers": ["scribble"],
            "loader": "xl",
            "compute_type": torch.float16,
        },
        {
            "repo_id": "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro",
            "name": "flux1_union_pro",
            "layers": ["canny_fl", "tile_fl", "depth_fl", "blur_fl", "pose_fl", "gray_fl", "low_quality_fl"],
            "loader": "flux-multi",
            "compute_type": torch.bfloat16,
        }
    ]
    for controlnet in controlnet_models:
        if controlnet["loader"] == "xl":
            controlnet["controlnet"] = ControlNetModel.from_pretrained(
                controlnet["repo_id"],
                torch_dtype=controlnet["compute_type"]
            )
        elif controlnet["loader"] == "flux-multi":
            controlnet["controlnet"] = FluxMultiControlNetModel([FluxControlNetModel.from_pretrained(
                controlnet["repo_id"],
                torch_dtype=controlnet["compute_type"]
            )])
        # TODO: Add support for flux-only ControlNet

    # Face Detection (for PhotoMaker)
    face_detector = FaceAnalysis2(providers=['CUDAExecutionProvider'], allowed_modules=['detection', 'recognition'])
    face_detector.prepare(ctx_id=0, det_size=(640, 640))

    # PhotoMaker V2 (for SDXL only)
    photomaker_ckpt = hf_hub_download(repo_id="TencentARC/PhotoMaker-V2", filename="photomaker-v2.bin", repo_type="model")

    return device, models, sdxl_vae, refiner, safety_checker, feature_extractor, controlnet_models, face_detector, photomaker_ckpt

device, models, sdxl_vae, refiner, safety_checker, feature_extractor, controlnet_models, face_detector, photomaker_ckpt = load_sd()
Update:
https://huggingface.co/spaces/mantrakp/aai
Please check this Space out and let me know.
I was encountering errors, but when I refactored all the code into a single file, it worked. Now I'm splitting the code back into separate files to try to refactor it again.
I encountered the same issue. May I ask what the solution is? By "refactored all the code into a single file", do you mean placing all the code in the app.py file?
I'm not sure, but importing spaces in app.py seems to solve the issue; make sure spaces is imported before torch. This is totally unreliable, so let me know if it works.
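Something like this at the top of app.py (just a sketch of the ordering, nothing else changes):

import spaces  # import this first, before torch or anything that imports torch
import torch
import gradio as gr
# ...the rest of the app's imports and code stay the same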
In addition to the tips above, you should avoid touching CUDA as much as possible, except inside functions decorated with @spaces.GPU and at global scope (startup).
Be especially careful about calls to .to("cuda"). If one happens inside a library, you may have to tweak the library directly.
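For example, here is a minimal sketch of the two places where touching CUDA is safe on ZeroGPU (the model name is only a placeholder):

import spaces  # imported before torch so the CUDA patching can take effect
import torch
from diffusers import DiffusionPipeline

# Global scope: runs at startup, where ZeroGPU intercepts CUDA calls.
pipe = DiffusionPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16
)
pipe.to("cuda")

@spaces.GPU(duration=120)
def generate(prompt: str):
    # Inside the decorated function, a real GPU is attached for this call.
    return pipe(prompt).images

# Anywhere else (undecorated handlers, background threads), .to("cuda") or
# torch.cuda.* calls will raise "RuntimeError: No CUDA GPUs are available".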
Also, I think it may be due to the spaces library's multiprocessing: there are many cases where exchanging data between the inside and outside of functions decorated with @spaces.GPU doesn't work well. I think the only way around this is to avoid the exchange itself as much as possible.
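As a rough sketch of what avoiding the exchange looks like (reusing generate_t2i and base_request from the question, and assuming pipe was loaded at global scope as in the sketch above; the body is illustrative):

@spaces.GPU(duration=120)
def generate_t2i(request):
    # Runs with the GPU attached, possibly in a separate worker process.
    result = pipe(request.prompt, output_type="pil")
    # Return plain Python objects (PIL images here), not CUDA tensors, and
    # don't rely on globals mutated in here being visible to the caller.
    return result.images

images = generate_t2i(base_request)  # the caller itself never touches CUDA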
https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/107