Apply for community grant: Academic project (gpu)

#1
by anchorxia - opened

There is an open-source video generation project, https://huggingface.co/TMElyralab/MuseV. We would like to apply for GPU resources to build a Space for it. After debugging the Space demo, I'll make it public so that users can try it conveniently on HF Spaces.

Hi @AnchorFake , we have assigned a gpu to this space. Note that GPU Grants are provided temporarily and might be removed after some time if the usage is very low.

To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus

@hysts Thanks for HF's support. We'll make full use of the GPU.

@hysts Hi, we are very grateful for HF's support. We have been running on the A10 for a few days and found that there is not enough graphics memory to support high-resolution video generation. The lack of graphics memory also makes the app crash and restart frequently. Therefore, we would like to ask whether it is possible to upgrade the community grant GPU from A10 small (15GB) to A10 large (46GB).

@AnchorFake OK, I've upgraded the hardware to a10g-large for now. But if the issue is about VRAM, that won't fix it, because the difference between a10g-small and a10g-large is CPU RAM (15GB vs. 46GB), not VRAM (both of them have 24GB VRAM).
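To make the RAM vs. VRAM distinction concrete, here is a minimal sketch that uses only the figures quoted in the reply above. The `HARDWARE` table and the `upgrade_gains` helper are hypothetical names made up for illustration, not part of any Hugging Face API.

```python
# Figures quoted in the reply above; other hardware tiers are intentionally left out.
HARDWARE = {
    "a10g-small": {"cpu_ram_gb": 15, "vram_gb": 24},
    "a10g-large": {"cpu_ram_gb": 46, "vram_gb": 24},
}


def upgrade_gains(current, target):
    """Return how much CPU RAM and VRAM (in GB) moving from current to target adds."""
    cur, new = HARDWARE[current], HARDWARE[target]
    return {key: new[key] - cur[key] for key in cur}


gains = upgrade_gains("a10g-small", "a10g-large")
print(gains)
```

With these numbers the upgrade adds CPU RAM but no VRAM, so a crash caused by running out of GPU memory would most likely persist.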

BTW, would it be possible to migrate your Space to ZeroGPU? We recently started using ZeroGPU as the default hardware for grants because it would reduce our infra cost in the long run and because it improves UX too. I just sent you an invitation to join the ZeroGPU explorers org, so it would be nice if you could check out the compatibility and usage section of the org card to see if your Space can run on it.

@hysts Thanks. I'll migrate it to ZeroGPU.

About the issue, I'll do more testing to gather more information and improve performance. Until then, you can downgrade back to A10 small.

@hysts Hi, I've joined ZeroGPU, but I have no idea how to migrate MuseVDemo to ZeroGPU.

In the MuseVDemo settings, there is no ZeroGPU option.
image.png

I tried another way: creating a new ZeroGPU Space and adding files, but I have no permission.
image.png

@AnchorFake Ah, sorry. Can you create a separate private Space to test if ZeroGPU can work for your Space? You can assign ZeroGPU to the private Space and test if ZeroGPU can work for your app, and once you confirmed that it works, you can update the main Space and delete the private one you used for testing. You can create the Space under your profile or organizations where you have write permission. (The ZeroGPU explorers org is just used to give its members special permission to use ZeroGPU, and people are not supposed to create Spaces in the ZeroGPU explorers org.)
The reason you cannot change the hardware of this MuseVDemo Space is that it uses Docker as its SDK. Currently, ZeroGPU is only compatible with the Gradio SDK. So, to test ZeroGPU, you first need to migrate your Space to a normal Gradio Space.

Forgot to mention, but as you are now in the ZeroGPU explorers org, you can find the Zero Nvidia A100 option in the Settings of your Space, which is currently disabled for your Space due to the SDK incompatibility.
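As a rough illustration of the SDK constraint described above, the sketch below reads the `sdk` field from a Space README's YAML front matter and applies the rule stated in this thread (ZeroGPU works only with the Gradio SDK at the time of writing). The helper names and the sample README strings are hypothetical, and the parser is deliberately minimal rather than a full YAML reader.

```python
def read_sdk(readme_text):
    """Return the `sdk` value from a README's YAML front matter, or None if absent."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if sep and key.strip() == "sdk":
            return value.strip().strip("'\"")
    return None


def zerogpu_compatible(readme_text):
    # Assumption taken from the reply above: only the gradio SDK is supported.
    return read_sdk(readme_text) == "gradio"


# Hypothetical README contents for a Docker Space and a Gradio Space.
docker_readme = "---\ntitle: MuseVDemo\nsdk: docker\n---\n"
gradio_readme = "---\ntitle: MuseVDemo\nsdk: gradio\napp_file: app.py\n---\n"

print(zerogpu_compatible(docker_readme))
print(zerogpu_compatible(gradio_readme))
```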

@Hyst Hi, we have tried deploying MuseV as a Gradio Space, but it failed. The errors are very weird.

  1. MuseVDemo Gradio Space with Zero Nvidia A100: the error is shown below. It seems that Python package installation failed.
    image.png

  2. MuseVDemo Gradio Space run locally with Docker, using registry.hf.space/anchorfake-musev:latest on a local GPU: this succeeded.

  3. MuseV Gradio Space on CPU-based hardware: the error is shown below. It seems that Python package installation succeeded, but the model download failed.
    image.png

Any ideas about these issues?

@AnchorFake Thanks for testing ZeroGPU!

The error in the first point occurs because JIT compilation is currently not available on ZeroGPU and timm has functions with the @torch.jit.script decorator. But there's a workaround:

import torch

# Make the decorator a no-op; this must run before timm is imported.
torch.jit.script = lambda f: f

import timm

This replaces torch.jit.script with a function that does nothing before importing timm, so the decorator is basically ignored when importing timm.
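Why the order matters can be sketched without PyTorch installed: decorators run when the decorated function is defined, which for a library happens at import time. In the sketch below, `fake_jit` is a hypothetical stand-in for `torch.jit`, and `define_library_function` mimics what happens when a library such as timm is imported. None of these names come from the real libraries.

```python
import types


def identity_decorator(fn):
    """Return the function unchanged, so decorating with it is a no-op."""
    return fn


def compiling_script(fn):
    # Stand-in for a decorator that fails where JIT compilation is unavailable.
    raise RuntimeError("JIT compilation is not available")


# Hypothetical stand-in for torch.jit, so the sketch runs without PyTorch.
fake_jit = types.SimpleNamespace(script=compiling_script)


def define_library_function():
    """Mimic import time: the decorator is applied as soon as the def runs."""
    @fake_jit.script
    def double(x):
        return 2 * x
    return double


# Without the patch, defining the decorated function already fails.
try:
    define_library_function()
    failed_without_patch = False
except RuntimeError:
    failed_without_patch = True

# With the patch applied first, the plain Python function is used instead.
fake_jit.script = identity_decorator
double = define_library_function()
result = double(21)
print(failed_without_patch, result)
```

The trade-off is that the patched functions run as ordinary Python rather than as compiled TorchScript, so any speedup from scripting is given up.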

Regarding the error in 3., I'm not 100% sure, but this error sometimes happens on Spaces. Usually it's just a temporary infra issue, and restarting your Space will fix it.

@Hyst

The previous error is fixed with torch.jit.script = lambda f: f, but a new error occurs. It seems that Python environment management in ZeroGPU is complex and prone to errors.

image.png

The Gradio Space with ZeroGPU is at https://huggingface.co/spaces/AnchorFake/MuseV2Test. Maybe we can continue debugging ZeroGPU there?

Until ZeroGPU works, may I ask whether MuseVDemo can be upgraded to Nvidia A100 large first? Once ZeroGPU works, the GPU for this Space can be turned off again.

@AnchorFake
OK, we can assign a normal GPU to this Space while you are checking if it's possible to migrate to ZeroGPU in https://huggingface.co/spaces/AnchorFake/MuseV2Test .
But, is it possible to run this Space on a10g-large? Normal A100 is not available for grants except for ZeroGPU.

@Hyst

BTW, I think you are pinging a wrong person. 😅

@hysts Got that, Thanks for supporting MuseVDemo.

It is fine for https://huggingface.co/spaces/AnchorFake/MuseV2Test to use Nvidia A10G small. We only use it for debugging, not for public use.

Ah, OK, this Space was running on a10g-large and you've been testing ZeroGPU in another Space from the beginning. Sorry, but A100 is not available for grants.

@hysts yes, that's right

So, just to be clear, you need a10g-small to test https://huggingface.co/spaces/AnchorFake/MuseV2Test too, correct? I just assigned a10g-small to the Space. I think you can switch it back to ZeroGPU yourself to test ZeroGPU.

@hysts yes, that's correct.
