llama.cpp is 26.8% faster than ollama. I have upgraded both and, using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 s <- 26.8% faster
ollama 8.69 s

Breakdown by phase:

Model loading:
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing:
llama.cpp 416.04 tokens/s, eval time 45.67 ms <- 10x faster
ollama 42.17 tokens/s, eval time 498 ms

Token generation:
llama.cpp 137.79 tokens/s, eval time 6.62 s <- 13% faster
ollama 122.07 tokens/s, eval time 7.64 s

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing. Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
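As a quick sanity check on the percentages, here is a minimal Python sketch that recomputes the ratios from the numbers quoted above. It assumes the post's convention: model loading and total duration are compared by wall-clock time, while prompt processing and token generation are compared by tokens/s. It reproduces the quoted figures to within rounding (1.27x, 2.29x, 9.87x, 1.13x).

```python
# Recompute the speedup ratios from the figures quoted in the post.
# For durations, lower is better: speedup = ollama_time / llamacpp_time.
# For throughputs, higher is better: speedup = llamacpp_rate / ollama_rate.

durations_s = {                       # (llama.cpp, ollama) in seconds
    "total duration": (6.85, 8.69),
    "model loading":  (0.241, 0.553),
}
throughputs_tps = {                   # (llama.cpp, ollama) in tokens/s
    "prompt processing": (416.04, 42.17),
    "token generation":  (137.79, 122.07),
}

for phase, (lcpp, oll) in durations_s.items():
    r = oll / lcpp
    print(f"{phase}: llama.cpp {r:.2f}x faster ({(r - 1) * 100:.1f}%)")

for phase, (lcpp, oll) in throughputs_tps.items():
    r = lcpp / oll
    print(f"{phase}: llama.cpp {r:.2f}x faster ({(r - 1) * 100:.1f}%)")
```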