DeepSeek-R1-UD-Q2_K_XL model inference with llama.cpp can't use flash attention when n_embd_head_k != n_embd_head_v
2
#43 opened 1 day ago
by
fuzhenxin
Sharing an MMLU test result: I used the 2.51-bit quant and compared it with the DeepSeek API and Baidu's DeepSeek; the 2.51-bit quant seems very smart, at least on MMLU
#42 opened 7 days ago
by
tarjintor
RTX 5090 with 600GB of RAM: which models?
4
#40 opened 11 days ago
by
frank-mx
Deploying a production-ready service with GGUF on an AWS account
1
#39 opened 13 days ago
by
samagra-tensorfuse
How to Convert DeepSeek-R1-UD-IQ1_M GGUF Back to Safetensors?
#38 opened 14 days ago
by
Cheryl33990
Perplexity comparison results (Updated)
1
#37 opened 14 days ago
by
inputout
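For anyone trying to reproduce perplexity comparisons like the ones above: a minimal sketch, assuming a local llama.cpp build (its llama-perplexity tool), a hypothetical first-split filename, and the usual WikiText-2 raw test file.

```python
import subprocess

# Sketch: run llama.cpp's perplexity tool over WikiText-2.
# The model path is a placeholder; pointing -m at the first split
# is enough, llama.cpp loads the remaining parts automatically.
subprocess.run(
    [
        "./llama-perplexity",
        "-m", "DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf",
        "-f", "wiki.test.raw",  # WikiText-2 raw test set, the usual PPL corpus
        "-c", "2048",           # evaluation context length
    ],
    check=True,
)
```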
Is the Q2_K_XL model the best? IQ2_XXS beats Q2_K_XL on the MMLU-Pro benchmark
11
#36 opened 15 days ago
by
albertchow
Long-form input takes too long
#35 opened 18 days ago
by
htkim27
Is Q2_K_XL or Q4 better?
3
#34 opened 19 days ago
by
jializou
Is it uncensored?
5
#33 opened 20 days ago
by
Morrigan-Ship
Cannot Run `unsloth/DeepSeek-R1-GGUF` Model – Missing `configuration_deepseek.py`
2
#32 opened 24 days ago
by
syrys4750
When using llama.cpp to deploy the DeepSeek-R1-Q4_K_M model, garbled characters appear in the server's response
4
#31 opened 25 days ago
by
KAMING
How do the various quantized versions of the model perform on different evaluation datasets? Are there any concrete test results?
3
#29 opened 25 days ago
by
huanfa
When using this with ollama, does it support kv_cache_type=q4_0 and flash_attention=1?
3
#28 opened 27 days ago
by
leonzy04
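For the question above: in recent ollama builds both knobs are server-side environment variables rather than per-request options. A minimal sketch, assuming a build that reads OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE:

```python
import os
import subprocess

# Sketch: start the ollama server with flash attention on and the
# KV cache quantized to q4_0. Variable names assume a recent release.
env = dict(os.environ, OLLAMA_FLASH_ATTENTION="1", OLLAMA_KV_CACHE_TYPE="q4_0")
subprocess.run(["ollama", "serve"], env=env, check=True)
```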
How to handle multiple HTTP requests at the same time
4
#27 opened 27 days ago
by
007hao
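With llama.cpp's llama-server, concurrency is handled server-side (start it with --parallel N and --cont-batching); the client just issues requests in parallel. A minimal sketch against the OpenAI-compatible endpoint, with the URL and prompts as placeholders:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "http://localhost:8080/v1/chat/completions"  # llama-server default port

def ask(prompt: str) -> str:
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    req = Request(URL, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Fire several requests at once; the server interleaves them when
# launched with e.g. `llama-server -m model.gguf --parallel 4 --cont-batching`.
with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask, ["Hi", "What is 2+2?", "Name a prime.", "Tell me a joke."]):
        print(answer)
```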
After merging the IQ1_S model and deploying it on ollama, generation quality is poor
3
#26 opened 27 days ago
by
gaozj
The model seems to have been fine-tuned
2
#25 opened 28 days ago
by
mogazheng
What is the base precision type (FP32/FP16) used in Q2/Q1 quantization?
#23 opened 30 days ago
by
ArYuZzz1
Any benchmark results?
3
#22 opened about 1 month ago
by
Wei-Wu
Accuracy of the dynamic quants compared to usual quants?
19
#21 opened about 1 month ago
by
inputout
8-bit quantization
5
#20 opened about 1 month ago
by
ramkumarkoppu
New research paper: R1-type reasoning models can be drastically improved in quality
2
#19 opened about 1 month ago
by
krustik
MD5 / SHA-256 hashes, please
1
#18 opened about 1 month ago
by
ivanvolosyuk
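Until official checksums are posted, they are easy to compute locally (Hugging Face also shows a per-file SHA256 in each file's LFS metadata). A minimal sketch with a placeholder glob:

```python
import hashlib
from pathlib import Path

# Sketch: stream each GGUF split through SHA-256 so the multi-GB
# files never need to fit in memory. Adjust the glob to your layout.
for path in sorted(Path(".").glob("DeepSeek-R1-UD-IQ1_S-*.gguf")):
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    print(h.hexdigest(), path.name)
```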
Is there a model variant with the non-shared MoE experts removed?
4
#17 opened about 1 month ago
by
ghostplant
A step-by-step deployment guide with ollama
4
#16 opened about 1 month ago
by
snowkylin
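The broad shape of such a deployment, as a sketch: merge the split GGUF with llama.cpp's llama-gguf-split tool, write a Modelfile, and register it with ollama. Filenames and the model tag are placeholders.

```python
import subprocess

# Sketch: merge the GGUF splits (passing the first split is enough,
# the tool finds the rest), then register the result with ollama.
subprocess.run(
    ["./llama-gguf-split", "--merge",
     "DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
     "DeepSeek-R1-UD-IQ1_S.gguf"],
    check=True,
)
with open("Modelfile", "w") as f:
    f.write("FROM ./DeepSeek-R1-UD-IQ1_S.gguf\n")  # minimal ollama Modelfile
subprocess.run(["ollama", "create", "deepseek-r1-iq1s", "-f", "Modelfile"], check=True)
```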
No think tokens visible
6
#15 opened about 1 month ago
by
sudkamath
Over 2 tok/sec aggregate, backed by NVMe SSD, on a 96GB RAM + 24GB VRAM AM5 rig with llama.cpp
9
#13 opened about 1 month ago
by
ubergarm
Running the model with vLLM does not actually work
8
#12 opened about 1 month ago
by
aikitoria
DeepSeek-R1-GGUF not available in LMStudio
2
#11 opened about 1 month ago
by
32SkyDive
Where did the BF16 come from?
8
#10 opened about 1 month ago
by
gshpychka
Inference speed
2
#9 opened about 1 month ago
by
Iker
Running this model using vLLM Docker
4
#8 opened about 1 month ago
by
moficodes
UD-IQ1_M models for distilled R1 versions?
3
#6 opened about 1 month ago
by
SamPurkis
Llama.cpp server chat template
5
#4 opened about 1 month ago
by
softwareweaver
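llama-server normally picks up the chat template embedded in the GGUF (and --chat-template can override it). For anyone assembling prompts by hand, a minimal sketch of R1's format; the special tokens (fullwidth bars and low-line separators) are copied from DeepSeek's tokenizer, so verify them against your download before relying on this:

```python
# Sketch of the DeepSeek-R1 prompt format. Treat the exact token
# strings as an assumption to check against the GGUF's own template.
def r1_prompt(user_message: str, system_prompt: str = "") -> str:
    return (
        "<｜begin▁of▁sentence｜>" + system_prompt
        + "<｜User｜>" + user_message
        + "<｜Assistant｜>"
    )

print(r1_prompt("Why is the sky blue?"))
```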
Are the Q4 and Q5 models R1 or R1-Zero?
18
#2 opened about 2 months ago
by
gng2info
What is the VRAM requirement to run this?
5
#1 opened about 2 months ago
by
RageshAntony
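A useful rule of thumb: the memory floor is roughly parameter count times bits per weight, divided by 8, plus KV cache. A back-of-envelope sketch; the bits-per-weight figures are approximations inferred from the quant names, and real GGUF sizes deviate a little because different tensors use different quant types:

```python
# Back-of-envelope sketch: weight memory ≈ params * bits_per_weight / 8.
PARAMS = 671e9  # DeepSeek-R1 total parameter count

for name, bpw in {"IQ1_S": 1.58, "IQ2_XXS": 2.22, "Q2_K_XL": 2.51, "Q4_K_M": 4.8}.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB of RAM+VRAM for the weights alone")
```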