llama.cpp is 26.8% faster than ollama. I have upgraded both and, using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 s <- 26.8% faster
ollama 8.69 s

Breakdown by phase:

Model loading:
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing:
llama.cpp 416.04 tokens/s, eval time 45.67 ms <- 10x faster
ollama 42.17 tokens/s, eval time 498 ms

Token generation:
llama.cpp 137.79 tokens/s, eval time 6.62 s <- 13% faster
ollama 122.07 tokens/s, eval time 7.64 s

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing. Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
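As a quick sanity check on the percentages, here is a minimal Python sketch that recomputes the ratios from the numbers quoted above. It assumes the post's convention: model loading and total duration are compared by wall-clock time, while prompt processing and token generation are compared by tokens/s. It reproduces the quoted figures to within rounding (1.27x, 2.29x, 9.87x, 1.13x).

```python
# Recompute the speedup ratios from the figures quoted in the post.
# For durations, lower is better: speedup = ollama_time / llamacpp_time.
# For throughputs, higher is better: speedup = llamacpp_rate / ollama_rate.

durations_s = {                       # (llama.cpp, ollama) in seconds
    "total duration": (6.85, 8.69),
    "model loading":  (0.241, 0.553),
}
throughputs_tps = {                   # (llama.cpp, ollama) in tokens/s
    "prompt processing": (416.04, 42.17),
    "token generation":  (137.79, 122.07),
}

for phase, (lcpp, oll) in durations_s.items():
    r = oll / lcpp
    print(f"{phase}: llama.cpp {r:.2f}x faster ({(r - 1) * 100:.1f}%)")

for phase, (lcpp, oll) in throughputs_tps.items():
    r = lcpp / oll
    print(f"{phase}: llama.cpp {r:.2f}x faster ({(r - 1) * 100:.1f}%)")
```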