Xuan Son NGUYEN (ngxson)
91 followers · 30 following
https://blog.ngxson.com · ngxson.hf.co
AI & ML interests: Doing AI for fun, not for profit
Recent Activity
updated a model about 6 hours ago: ngxson/Qwen2.5-7B-Instruct-1M-Q4_K_M-GGUF
published a model about 6 hours ago: ngxson/Qwen2.5-7B-Instruct-1M-Q4_K_M-GGUF
reacted to mitkox's post with 🚀 1 day ago:
llama.cpp is 26.8% faster than ollama. I have upgraded both, and using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
  llama.cpp: 6.85 sec <- 26.8% faster
  ollama: 8.69 sec

Breakdown by phase:

Model loading:
  llama.cpp: 241 ms <- 2x faster
  ollama: 553 ms

Prompt processing:
  llama.cpp: 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
  ollama: 42.17 tokens/s with an eval time of 498 ms

Token generation:
  llama.cpp: 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
  ollama: 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing. Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
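The headline percentages in the post follow from the raw timings it reports. A minimal sketch of the arithmetic (the numbers are from the post; the `pct_faster` helper name is mine, not from any benchmark tool, and the small rounding gap vs. the quoted 26.8% presumably comes from unrounded source timings):

```python
def pct_faster(slow_time: float, fast_time: float) -> float:
    """Percent speedup of fast_time over slow_time (lower time is better)."""
    return (slow_time - fast_time) / fast_time * 100

# Total duration (seconds): llama.cpp 6.85 vs ollama 8.69
print(round(pct_faster(8.69, 6.85), 1))            # 26.9 (post says 26.8%)

# Model loading (ms, lower is better): 241 vs 553
print(round(553 / 241, 1))                         # 2.3 ("2x faster")

# Prompt processing (tokens/s, higher is better): 416.04 vs 42.17
print(round(416.04 / 42.17, 1))                    # 9.9 ("10x faster")

# Token generation (tokens/s, higher is better): 137.79 vs 122.07
print(round((137.79 - 122.07) / 122.07 * 100, 1))  # 12.9 ("13% faster")
```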
Articles
Introducing GGUF-my-LoRA · Nov 1, 2024 · 13
Code a simple RAG from scratch · Oct 29, 2024 · 18
Introduction to ggml · Aug 13, 2024 · 132
ngxson's activity
New activity in deepseek-ai/DeepSeek-R1-Distill-Qwen-32B · 4 days ago
  Tokenizer config is wrong (7) · #10 opened 5 days ago by stoshniwal
New activity in 5CD-AI/Vintern-1B-v3_5 · 12 days ago
  Deployment as server? (11) · #1 opened 13 days ago by ngxson
New activity in 5CD-AI/Viet-Doc-VQA-verIII · 16 days ago
  🚩 Report: Not working (3) · #1 opened 17 days ago by khang119966
New activity in ngxson/MiniThinky-dataset · 18 days ago
  Librarian Bot: Add language metadata for dataset · #2 opened 19 days ago by librarian-bot
New activity in ngxson/MiniThinky-1B-Llama-3.2 · 18 days ago
  Update README.md (1) · #2 opened 18 days ago by Xenova
New activity in bartowski/QVQ-72B-Preview-GGUF · 19 days ago
  Add system message (1) · #7 opened 19 days ago by ngxson
  Ollama upload please. (15) · #2 opened about 1 month ago by AlgorithmicKing
New activity in ngxson/MiniThinky-v2-1B-Llama-3.2 · 19 days ago
  Upload folder using huggingface_hub (1) · #1 opened 19 days ago by Xenova
New activity in ngxson/MiniThinky-1B-Llama-3.2 · 21 days ago
  Upload folder using huggingface_hub · #1 opened 21 days ago by Xenova
New activity in ggml-org/gguf-my-repo · 25 days ago
  Update app.py (1) · #144 opened 27 days ago by gghfez
New activity in ggml-org/gguf-my-repo · about 2 months ago
  Accessing own private repos (2) · #141 opened about 2 months ago by themex1380
  [Errno 2] No such file or directory: './llama.cpp/llama-quantize' (11) · #140 opened about 2 months ago by AlirezaF138
New activity in ggml-org/gguf-my-repo · 2 months ago
  Error quantizing: b'/bin/sh: 1: ./llama.cpp/llama-quantize: not found\n' (6) · #136 opened 2 months ago by win10
  Better isolation + various improvements (3) · #133 opened 3 months ago by ngxson
New activity in ggml-org/gguf-my-repo · 3 months ago
  update readme for card generation (4) · #128 opened 3 months ago by ariG23498
  Error converting to fp16: b'INFO:hf-to-gguf:Loading model: qwen2.5-3b (1) · #135 opened 3 months ago by nanowell
  Qwen2.5-3B: [Errno 2] No such file or directory: 'downloads/tmpg0g5sjvl' (1) · #134 opened 3 months ago by nanowell
  add docker compose for dev locally (1) · #130 opened 3 months ago by ngxson
  Add F16 and BF16 quantization (1) · #129 opened 3 months ago by andito
  Update app.py (2) · #132 opened 3 months ago by velyan