Resource Requirements for Running DeepSeek v3 Locally

#56
by wilfoderek - opened

Hi, I’m interested in running the DeepSeek v3 model locally, and I’d like to know the resource requirements (CPU, GPU, RAM, and storage) for optimal performance. I currently have access to an NVIDIA H100 GPU with 80GB of VRAM, and I’m curious whether this setup is sufficient or overkill for running the model efficiently.
Any suggestions are welcome!

Thanks in advance for your support!

A GPU is just an accelerator for simple calculations, nothing more. LLMs first need a huge amount of memory to hold their "brains"; speed comes second. So the problem is the 80GB. I've loaded the Q5_K Medium quant of it on a literally 10-year-old Gigabyte motherboard with 12 RAM slots, and it takes 502GB of RAM to load (it's a GGUF, so already somewhat reduced from the original, which is even more demanding; in my experience with GGUFs, RAM needed ≈ file size on disk + 10%). If we had terabyte GPUs today, this would be no problem.

Your GPU is great for video-generation models and for training or merging LLMs.
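
If you want a rough number for your own download, here's a minimal Python sketch of that rule of thumb (RAM ≈ GGUF size on disk + ~10%); the function name and the example file size are just illustrative, and the 10% overhead is only the figure from my experience above, not an exact measurement.

```python
# Minimal sketch of the rule of thumb above: RAM needed ≈ GGUF size on disk + ~10%.
# The 10% overhead factor is an observed rough figure, not a guarantee.
def estimate_ram_gb(gguf_size_gb: float, overhead: float = 0.10) -> float:
    """Rough RAM estimate for loading a GGUF fully into system memory."""
    return gguf_size_gb * (1.0 + overhead)

# Example: a ~457 GB Q5 file lands close to the ~502 GB I saw in practice.
print(f"{estimate_ram_gb(457):.0f} GB")  # ~503 GB
```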

How many H100s to run Q4?

I don't recommend going lower than Q5, because very large models are really bad at low quants: the hallucination level is incredible, much worse than smaller models at comparable quant sizes; maybe the reason is the dataset size. Even Q5 in my tests is not ideal, and I'm already downloading Q6.

Then, how many to run Q5?

I've tested Q5 (502GB RAM) and Q6 (567GB RAM), and I recommend Q6 if you need creative work; Q5 is maybe fine for lighter tasks or chat. I will post my test results with Q5 and Q6 on the relevant forum page (Unsloth/Deepseek-V3-GGUF), where I had these quants write a music melody (spoiler: Q5 was not able to write music in ChucK code, Q6 managed to write some melody).
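
If you just want a back-of-the-envelope answer to the "how many H100s" question, here's a small sketch using the memory figures above (Q5 ≈ 502GB, Q6 ≈ 567GB) and 80GB per H100. It only counts holding the weights; KV cache, activations, and runtime overhead would push the real requirement higher, so treat it as a lower bound.

```python
import math

# Lower-bound GPU count to hold the quantized weights, based on the
# memory figures reported in this thread and an 80 GB H100.
H100_VRAM_GB = 80

for quant, size_gb in {"Q5": 502, "Q6": 567}.items():
    gpus = math.ceil(size_gb / H100_VRAM_GB)
    print(f"{quant}: ~{size_gb} GB -> at least {gpus} x H100 80GB")
    # Q5: ~502 GB -> at least 7 x H100 80GB
    # Q6: ~567 GB -> at least 8 x H100 80GB
```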
