ZeroGPU Duration Quota Question
So, if I make a ZeroGPU Space where I request, for example, 100 seconds (@spaces.GPU(duration=100)) in case someone uses a Stable Diffusion model with max settings, but then someone uses it with lower settings and it takes less than 100 s, does it still deduct 100 s of ZeroGPU quota from their account, or only the actual number of seconds the inference took?
I just started playing with ZeroGPU this month, and from my limited experience, I think only the actual number of seconds the inference took is deducted from the user's quota. You can see the Space send a "release" request to the device manager endpoint when the results are returned to the user. The "duration" parameter only acts as a guard: it checks whether the user's remaining quota is enough for the specified duration. If it is, the GPU can be scheduled for the user; if not, the request fails.
I wrote a small wrapper to support dynamic GPU duration. Feel free to check it out here: https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/87
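For anyone reading along, here is a rough sketch of the general idea of a dynamic duration: estimate the request from the inference settings instead of always asking for the worst case. This is not the wrapper linked above. The function name estimate_duration and all the timing constants are made up for illustration, so measure real values on your own Space.

```python
import math

# Hypothetical calibration values (assumptions, not measured numbers).
SECONDS_PER_STEP = 0.5    # assumed time per denoising step at 1024x1024
FIXED_OVERHEAD_S = 10.0   # assumed time to move the model onto the GPU
MAX_DURATION_S = 100      # the worst-case cap from the question above


def estimate_duration(steps: int, width: int = 1024, height: int = 1024) -> int:
    """Return a GPU duration request in whole seconds, scaled to the settings."""
    # Assume step time grows linearly with the number of pixels.
    scale = (width * height) / (1024 * 1024)
    estimate = FIXED_OVERHEAD_S + steps * SECONDS_PER_STEP * scale
    # Round up, never request less than 1 s or more than the cap.
    return min(MAX_DURATION_S, max(1, math.ceil(estimate)))


if __name__ == "__main__":
    for steps, size in [(20, 512), (50, 1024), (500, 1024)]:
        print(steps, size, estimate_duration(steps, size, size))
```

Since the duration seems to work as a quota check before scheduling, a smaller estimate would let users with little remaining quota still run light jobs. How you pass the estimate to the decorator depends on your version of the spaces package, so check whether it accepts a callable for duration before relying on that.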
That looks interesting, I hope it gets merged into the spaces package.