A10G GPU assignment
Hi @chris-rannou (cc: @akhaliq),
Could you assign an A10G GPU to this Space?
Hello @hysts, I assigned an A10G GPU to the Space, but it's getting OOMKilled, consuming 30 GB+ of RAM.
@chris-rannou Great! Thanks!
> but it's getting OOMKilled, consuming 30 GB+ of RAM
Hmm, that's weird. I thought it worked on my environment with 24 GB RAM. I'll look into it.
The error says:

Runtime error
Memory limit exceeded (16G)
I was thinking of GPU memory when I saw OOM, but maybe you were talking about regular non-GPU memory. Is it possible to assign an instance with more RAM? I'm not sure if it'll solve the problem, but I'd like to see what happens.
@hysts Yes, the OOM error is for regular memory; I should have been clearer. The error message says 16 GB, but that's a mistake (it's the default value); in fact 30 GB is assigned to this Space.
@chris-rannou
Thanks, I see. Actually, it's expected behavior for this program to temporarily consume about 30 GB of RAM: the model is about 14 GB, and the SwissArmyTransformer library, which this repo uses, first instantiates the model with random parameters and then loads the pretrained weights on top of it, as is often done, so peak usage is roughly twice the model size.
30 GB of RAM looks like it should be sufficient, but it seems the program actually needs a bit more.
Is it possible to increase the amount of RAM?
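For context, below is a minimal sketch of the loading pattern I mean. It's not SwissArmyTransformer's actual code, just the generic PyTorch-style instantiate-then-load flow, and the checkpoint path and layer sizes are placeholders:

```python
# Minimal sketch of the "instantiate with random weights, then load the
# checkpoint" pattern (placeholder model and checkpoint path, not the
# actual SwissArmyTransformer code).
import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Stand-in for the large model; its randomly initialized parameters
    # already occupy RAM roughly equal to the full model size.
    return nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])


model = build_model()

# torch.load materializes a second full copy of the weights in RAM
# (map_location="cpu" deserializes the checkpoint into regular memory).
state_dict = torch.load("pretrained.ckpt", map_location="cpu")

# While both the random parameters and the loaded state dict are alive,
# peak RAM is roughly 2x the model size.
model.load_state_dict(state_dict)

# Once the state dict is freed, usage settles back to about 1x model size.
del state_dict
```

Freeing the state dict after load_state_dict is why memory should settle back down to roughly the model size once startup finishes.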
Working on it
@chris-rannou Thanks a lot! Looks like the Space is working now.
FYI, at startup the Space actually consumes up to about 46 GB of memory and then stabilizes at around 26 GB.
Oh, that much? In my GCP environment, the app seemed to consume only about 32 GB of RAM, so that's unexpected, but thanks for letting me know.