llama.cpp is 26.8% faster than ollama.

I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:

Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
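For anyone who wants to check the "x faster" labels, here is a minimal Python sketch that recomputes the ratios from the rounded figures quoted in the post. The dictionary layout and the `speedup` helper are my own illustration, not output from either tool. Because the inputs are rounded, the results can differ from the post's percentages in the last digit.

```python
# Figures copied from the post: (llama.cpp value, ollama value, lower_is_better).
# Durations are lower-is-better; token rates are higher-is-better.
measurements = {
    "total duration (s)": (6.85, 8.69, True),
    "model loading (ms)": (241.0, 553.0, True),
    "prompt processing (tokens/s)": (416.04, 42.17, False),
    "token generation (tokens/s)": (137.79, 122.07, False),
}


def speedup(llama_cpp: float, ollama: float, lower_is_better: bool) -> float:
    """Return how many times faster llama.cpp is than ollama for one metric."""
    if lower_is_better:
        # For durations, the speedup is the slower time divided by the faster time.
        return ollama / llama_cpp
    # For throughput, the speedup is the higher rate divided by the lower rate.
    return llama_cpp / ollama


ratios = {name: speedup(*values) for name, values in measurements.items()}

for name, ratio in ratios.items():
    # Report both the raw ratio and the "percent faster" form used in the post.
    print(f"{name}: {ratio:.2f}x ({(ratio - 1) * 100:.1f}% faster)")
```

Note that "26.8% faster" here is a speed ratio (ollama time divided by llama.cpp time). Expressed as time saved, (8.69 - 6.85) / 8.69, the same measurement is about 21% less wall-clock time.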
GGUF LoRA adapters Collection Adapters extracted from fine tuned models, using mergekit-extract-lora • 16 items • Updated 3 days ago • 4
Extracted LoRA (mergekit) Collection PEFT-compatible LoRA adapters produced by mergekit-extract-lora • 17 items • Updated 3 days ago • 3