ONNX export fails when using cuda:0 as init device

#66
by Akshay1996 - opened

I am trying to export this model to ONNX using CUDA, but it fails with a CUDA OOM error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 31.75 GiB total capacity; 31.13 GiB already allocated; 225.75 MiB free; 31.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Does anyone know how to work around this?

Running into the same issue. There is some discussion of how it could be made to work here: https://github.com/huggingface/optimum/issues/1061. I also saw a comment somewhere (I can't recall exactly where) from someone who said they just needed a large amount of memory to get it done.