
Arthur Zucker

ArthurZ

ArthurZ's activity

liked a model 11 days ago

Qwen/Qwen2.5-3B

Text Generation • Updated Sep 20, 2024 • 222k • 69

liked 2 models 14 days ago

stabilityai/stable-diffusion-3.5-large

Text-to-Image • Updated Oct 22, 2024 • 272k • 2.33k

deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated 24 days ago • 490k • 3.12k

liked a model 15 days ago

MiniMaxAI/MiniMax-VL-01

Image-Text-to-Text • Updated 3 days ago • 630 • 242

New activity in mistral-community/pixtral-12b 18 days ago

Fastest way for inference?

#28 opened 19 days ago by psycy

liked a model 21 days ago

mistralai/Mistral-Small-24B-Instruct-2501

Text Generation • Updated 23 days ago • 755k • 819

New activity in deepseek-ai/DeepSeek-R1 27 days ago

model-00078-of-000163.safetensors not marked safe?

#80 opened 27 days ago by aborst

liked a model 27 days ago

microsoft/phi-4

Text Generation • Updated 1 day ago • 618k • 1.78k

updated a model 28 days ago

ArthurZ/Ilama-3.2-1B

Feature Extraction • Updated 28 days ago • 15.3k

upvoted an article 28 days ago

Article: Welcome to Inference Providers on the Hub 🔥 • 29 days ago • 387

published a model 28 days ago

ArthurZ/Ilama-3.2-1B

Feature Extraction • Updated 28 days ago • 15.3k

reacted to mitkox's post with 🚀 29 days ago

llama.cpp is 26.8% faster than ollama.
I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B model on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
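
The speedup figures quoted in the post can be checked against its raw numbers. The sketch below is not from the post: the helper names `times_faster` and `percent_faster` are made up for illustration, and it assumes "X% faster" means the ratio of the slower measurement to the faster one, minus one.

```python
# Reproduce the speedup figures from the timings quoted in the post.
# Helper names are hypothetical; the numbers are copied from the post.

def times_faster(slow: float, fast: float) -> float:
    """Ratio of a slower duration to a faster one (2.0 means twice as fast)."""
    return slow / fast

def percent_faster(slow: float, fast: float) -> float:
    """Same ratio expressed as a percentage gain over the slower run."""
    return (times_faster(slow, fast) - 1.0) * 100.0

# Total duration in seconds: ollama 8.69, llama.cpp 6.85.
total_gain = percent_faster(8.69, 6.85)

# Model loading in milliseconds: ollama 553, llama.cpp 241.
load_ratio = times_faster(553, 241)

# Throughput is "higher is better", so the ratio is fast / slow.
prompt_ratio = 416.04 / 42.17                     # prompt processing, tokens/s
generation_gain = (137.79 / 122.07 - 1.0) * 100   # token generation, tokens/s

print(f"total duration:    {total_gain:.1f}% faster")
print(f"model loading:     {load_ratio:.1f}x faster")
print(f"prompt processing: {prompt_ratio:.1f}x faster")
print(f"token generation:  {generation_gain:.1f}% faster")
```

Under that assumption the results come out at roughly 26.9%, 2.3x, 9.9x, and 12.9%, which agrees with the post's rounded 26.8%, 2x, 10x, and 13%.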