Runtime error on ZeroGPU with transformers MistralForCausalLM
Error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 527, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1788, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 709, in asyncgen_wrapper
response = await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/chat_interface.py", line 552, in _stream_fn
first_response = await async_iteration(generator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 576, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 559, in run_sync_iterator_async
return next(iterator)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 294, in gradio_handler
raise res.value
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
Not sure if this will fix it, but you probably shouldn't be loading the model inside the @spaces.GPU function.
Also, to use CUDA, don't set the PyTorch default device; call .to("cuda") instead.
Also encountered this issue
Same error here. Is there any guide on how to fix it?
Did y'all find the fix to this?
https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/104
I ran into the same errors too. HF has fixed most of them, but not all.
As mrfakename said, don't let torch manage GPU placement for you, and be especially careful in Spaces that select and load models after startup.
- Basically, keep the model offloaded on the CPU and explicitly call .to("cuda") only when you use it.
- Definitely avoid situations where parts of the pipes or models are spread across RAM and VRAM.
- You don't have to be that careful about offloading after using it.
- Import spaces at the beginning of the code.
- Be very careful when adding @spaces decorators.
- accelerate is not the culprit, but the error message changes depending on whether it is present.
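To check the "spread across RAM and VRAM" point while debugging, a small helper like the one below might help. It is only a sketch: `devices_in_use` and `is_on_single_device` are hypothetical names, and the code assumes the object behaves like a torch.nn.Module, i.e. it exposes parameters() and buffers() whose tensors have a .device attribute.

```python
from itertools import chain


def devices_in_use(module):
    """Return the set of device names that hold the module's parameters and buffers."""
    tensors = chain(module.parameters(), module.buffers())
    return {str(t.device) for t in tensors}


def is_on_single_device(module):
    """True when every parameter and buffer sits on the same device (or there are none)."""
    return len(devices_in_use(module)) <= 1
```

Printing devices_in_use(model) right after the .to("cuda") call inside the decorated function shows whether anything was left behind on the CPU. For a pipeline you would run the same check on each model component separately.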