OutOfMemoryError: CUDA out of memory

#109
by sieudd - opened

I am running the model on my PC with a P40 GPU (24 GB VRAM), but I currently get the error torch.OutOfMemoryError: CUDA out of memory. As far as I know, loading an 8B model should only need about 16 GB of VRAM.
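As a rough check on the 16 GB figure: weight memory is the parameter count times the bytes per parameter. The sketch below assumes a round 8 billion parameters (the real count differs slightly) and covers weights only, so activations, the KV cache, and CUDA context overhead come on top of it.

```python
# Rough weight-only memory estimate.
# Assumption: a round 8 billion parameters; the exact count differs slightly.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n = 8_000_000_000
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {weight_memory_gib(n, dtype):.1f} GiB")
```

Note that `from_pretrained` without `torch_dtype` loads the weights in float32, which doubles the bfloat16 figure.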

My code:

    def __init__(self):
        model_path = '/app/core/model/llama3.1-8b-instruct'
        self.device = torch.device(
            "cuda" if torch.cuda.is_available() else "cpu")

        if not os.path.exists(model_path):
            # First run: download the tokenizer and model from the Hub and save them to the local directory
            tokenizer = transformers.AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')
            model = transformers.AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')
            model.save_pretrained(model_path)
            tokenizer.save_pretrained(model_path)

        # Load the model and tokenizer from the saved directory
        self.tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
        self.model = transformers.AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(self.device)


        # Initialize the pipeline
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
            device=0 if torch.cuda.is_available() else -1,
        )

My error:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacity of 24.00 GiB of which 163.86 MiB is free. Process 280286 has 15.58 GiB memory in use. Process 280570 has 5.12 GiB memory in use. Of the allocated memory 4.93 GiB is allocated by PyTorch, and 41.85 MiB is reserved by PyTorch but unallocated.
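The error message itself says who is holding the memory. A minimal sketch that pulls the per-process figures out of the message above, assuming the wording stays as shown here (the message text is not a stable API, so the regular expression is an assumption):

```python
import re

# Message text copied from the error above (trailing PyTorch details omitted).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total "
    "capacity of 24.00 GiB of which 163.86 MiB is free. Process 280286 has "
    "15.58 GiB memory in use. Process 280570 has 5.12 GiB memory in use."
)


def processes_in_use(message: str) -> dict[int, float]:
    """Map each PID named in the message to its reported usage in GiB."""
    pattern = r"Process (\d+) has ([\d.]+) (MiB|GiB) memory in use"
    usage = {}
    for pid, amount, unit in re.findall(pattern, message):
        gib = float(amount) / 1024 if unit == "MiB" else float(amount)
        usage[int(pid)] = gib
    return usage


usage = processes_in_use(OOM_MESSAGE)
for pid, gib in usage.items():
    print(f"PID {pid}: {gib:.2f} GiB")
print(f"{len(usage)} processes, {sum(usage.values()):.2f} GiB in total")
```

Two different PIDs show up, so the GPU is shared between two processes rather than filled by a single copy of the model.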

It looks like your code may be loading the model twice: the error lists two separate processes holding GPU memory (15.58 GiB and 5.12 GiB).
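One way to check whether the model really is loaded more than once is to list the processes that currently hold GPU memory. The sketch below wraps `nvidia-smi --query-compute-apps`; the function names are made up for illustration, and the sample rows are hypothetical values, not output from the poster's machine.

```python
import subprocess
from typing import Optional


def parse_compute_apps(csv_text: str) -> list[tuple[int, str, Optional[int]]]:
    """Parse 'pid, process_name, used_memory' rows; memory is in MiB."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        pid, used = parts[0], parts[-1]
        name = ",".join(parts[1:-1])
        # Some driver modes report "[N/A]" instead of a number.
        rows.append((int(pid), name, int(used) if used.isdigit() else None))
    return rows


def query_compute_apps() -> list[tuple[int, str, Optional[int]]]:
    """Ask nvidia-smi which processes currently hold GPU memory."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)


# Hypothetical sample output, used here so the parser can run without a GPU.
SAMPLE = "280286, python, 15954\n280570, python, 5243\n"
apps = parse_compute_apps(SAMPLE)
print(apps)
```

On a machine with the NVIDIA driver installed, `query_compute_apps()` returns the live list. If the same script appears under two PIDs, something is starting it twice, and the launcher is the first place to look.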

I'm having a similar issue on Ubuntu with a GeForce GT 1030 card. It has only 2 GB of GDDR memory, but the model is only trying to allocate 112 MB, and nvtop and nvidia-smi show that nothing else is using the memory or the GPU. I know 2 GB is too low, but I ran it successfully on a Windows 10 PC with the exact same card. In case you doubt that: I know it was using the GPU on Windows because the model responded 4x faster than on the CPU, and Task Manager and nvidia-smi both showed it using the memory and the GPU. That's not happening on Ubuntu.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03              Driver Version: 560.28.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GT 1030         Off |   00000000:06:00.0 Off |                  N/A |
| 27%   33C    P0             N/A /   30W |       0MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU

I can run the exact same script on the CPU. It's extremely slow, but it runs.
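On the 112 MB figure: "Tried to allocate" reports the size of the single request that failed, not the total the model needs, so a small number there does not mean the model is small. A toy illustration with made-up request sizes (this does not reproduce how PyTorch's caching allocator works internally):

```python
def first_failing_request(capacity_mib, requests_mib):
    """Return (index, size, free) for the first request that does not fit.

    Returns None when every request fits within the capacity.
    """
    used = 0
    for i, size in enumerate(requests_mib):
        if used + size > capacity_mib:
            return i, size, capacity_mib - used
        used += size
    return None


# Hypothetical workload: thirty 112 MiB requests against a 2048 MiB card.
result = first_failing_request(2048, [112] * 30)
print(result)
```

Earlier requests succeed and fill the card; the error only names the one that no longer fit. nvidia-smi shows 0 MiB afterwards because the process has already exited and released its memory.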
