DeepSeek-R1-UD-Q2_K_XL model inference with llama.cpp can't use flash attention when n_embd_head_k != n_embd_head_v
2
#43 opened 1 day ago
by
fuzhenxin
Sharing an MMLU test result: I used the 2.51-bit quant and compared it with the DeepSeek API and Baidu's DeepSeek; the 2.51-bit quant seems very smart, at least on MMLU
#42 opened 7 days ago
by
tarjintor
RTX 5090 with 600GB of RAM: which models?
4
#40 opened 11 days ago
by
frank-mx
Deploying a production-ready service with GGUF on an AWS account
1
#39 opened 13 days ago
by
samagra-tensorfuse
How to Convert DeepSeek-R1-UD-IQ1_M GGUF Back to Safetensors?
#38 opened 14 days ago
by
Cheryl33990
Perplexity comparison results (Updated)
1
#37 opened 14 days ago
by
inputout
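For anyone trying to reproduce perplexity comparisons like the ones above: a minimal sketch, assuming a local llama.cpp build (its llama-perplexity tool), a hypothetical first-split filename, and the usual WikiText-2 raw test file.

```python
import subprocess

# Sketch: run llama.cpp's perplexity tool over WikiText-2.
# The model path is a placeholder; pointing -m at the first split
# is enough, llama.cpp loads the remaining parts automatically.
subprocess.run(
    [
        "./llama-perplexity",
        "-m", "DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf",
        "-f", "wiki.test.raw",  # WikiText-2 raw test set, the usual PPL corpus
        "-c", "2048",           # evaluation context length
    ],
    check=True,
)
```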
Is the Q2_K_XL model the best? IQ2_XXS beats Q2_K_XL on the MMLU-Pro benchmark
11
#36 opened 15 days ago
by
albertchow
Long-form input takes too long
#35 opened 18 days ago
by
htkim27
Is Q2_K_XL or Q4 better?
3
#34 opened 19 days ago
by
jializou
Is it uncensored?
5
#33 opened 20 days ago
by
Morrigan-Ship
Cannot Run `unsloth/DeepSeek-R1-GGUF` Model – Missing `configuration_deepseek.py`
2
#32 opened 24 days ago
by
syrys4750
When using llama.cpp to deploy the DeepSeek-R1-Q4_K_M model, garbled characters appear in the server's response
4
#31 opened 25 days ago
by
KAMING
How do the various quantized versions of the model perform on different evaluation datasets? Are there any concrete test results?
3
#29 opened 25 days ago
by
huanfa
When using this with ollama, does it support kv_cache_type=q4_0 and flash_attention=1?
3
#28 opened 27 days ago
by
leonzy04
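For the question above: in recent ollama builds both knobs are server-side environment variables rather than per-request options. A minimal sketch, assuming a build that reads OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE:

```python
import os
import subprocess

# Sketch: start the ollama server with flash attention on and the
# KV cache quantized to q4_0. Variable names assume a recent release.
env = dict(os.environ, OLLAMA_FLASH_ATTENTION="1", OLLAMA_KV_CACHE_TYPE="q4_0")
subprocess.run(["ollama", "serve"], env=env, check=True)
```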
How to handle multiple HTTP requests at the same time
4
#27 opened 27 days ago
by
007hao
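With llama.cpp's llama-server, concurrency is handled server-side (start it with --parallel N and --cont-batching); the client just issues requests in parallel. A minimal sketch against the OpenAI-compatible endpoint, with the URL and prompts as placeholders:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "http://localhost:8080/v1/chat/completions"  # llama-server default port

def ask(prompt: str) -> str:
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    req = Request(URL, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Fire several requests at once; the server interleaves them when
# launched with e.g. `llama-server -m model.gguf --parallel 4 --cont-batching`.
with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask, ["Hi", "What is 2+2?", "Name a prime.", "Tell me a joke."]):
        print(answer)
```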
After merging the IQ1_S model and deploying it on ollama, generation quality is poor
3
#26 opened 27 days ago
by
gaozj
The model seems to have been fine-tuned
2
#25 opened 28 days ago
by
mogazheng
What is the base precision type (FP32/FP16) used in Q2/Q1 quantization?
#23 opened 30 days ago
by
ArYuZzz1
Any benchmark results?
3
#22 opened about 1 month ago
by
Wei-Wu
Accuracy of the dynamic quants compared to usual quants?
19
#21 opened about 1 month ago
by
inputout
8-bit quantization
5
#20 opened about 1 month ago
by
ramkumarkoppu
New research paper: R1-type reasoning models can be drastically improved in quality
2
#19 opened about 1 month ago
by
krustik
MD5 / SHA-256 hashes, please
1
#18 opened about 1 month ago
by
ivanvolosyuk
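Until official checksums are posted, they are easy to compute locally (Hugging Face also shows a per-file SHA256 in each file's LFS metadata). A minimal sketch with a placeholder glob:

```python
import hashlib
from pathlib import Path

# Sketch: stream each GGUF split through SHA-256 so the multi-GB
# files never need to fit in memory. Adjust the glob to your layout.
for path in sorted(Path(".").glob("DeepSeek-R1-UD-IQ1_S-*.gguf")):
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    print(h.hexdigest(), path.name)
```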
Is there a model variant with the non-shared MoE experts removed?
4
#17 opened about 1 month ago
by
ghostplant
A step-by-step deployment guide with ollama
4
#16 opened about 1 month ago
by
snowkylin
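The broad shape of such a deployment, as a sketch: merge the split GGUF with llama.cpp's llama-gguf-split tool, write a Modelfile, and register it with ollama. Filenames and the model tag are placeholders.

```python
import subprocess

# Sketch: merge the GGUF splits (passing the first split is enough,
# the tool finds the rest), then register the result with ollama.
subprocess.run(
    ["./llama-gguf-split", "--merge",
     "DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
     "DeepSeek-R1-UD-IQ1_S.gguf"],
    check=True,
)
with open("Modelfile", "w") as f:
    f.write("FROM ./DeepSeek-R1-UD-IQ1_S.gguf\n")  # minimal ollama Modelfile
subprocess.run(["ollama", "create", "deepseek-r1-iq1s", "-f", "Modelfile"], check=True)
```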
No think tokens visible
6
#15 opened about 1 month ago
by
sudkamath
Over 2 tok/sec aggregate, backed by NVMe SSD, on a 96GB RAM + 24GB VRAM AM5 rig with llama.cpp
9
#13 opened about 1 month ago
by
ubergarm
Running the model with vLLM does not actually work
8
#12 opened about 1 month ago
by
aikitoria
DeepSeek-R1-GGUF not available in LMStudio
2
#11 opened about 1 month ago
by
32SkyDive
Where did the BF16 come from?
8
#10 opened about 1 month ago
by
gshpychka
Inference speed
2
#9 opened about 1 month ago
by
Iker
Running this model using vLLM Docker
4
#8 opened about 1 month ago
by
moficodes
UD-IQ1_M models for distilled R1 versions?
3
#6 opened about 1 month ago
by
SamPurkis
Llama.cpp server chat template
5
#4 opened about 1 month ago
by
softwareweaver
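llama-server normally picks up the chat template embedded in the GGUF (and --chat-template can override it). For anyone assembling prompts by hand, a minimal sketch of R1's format; the special tokens (fullwidth bars and low-line separators) are copied from DeepSeek's tokenizer, so verify them against your download before relying on this:

```python
# Sketch of the DeepSeek-R1 prompt format. Treat the exact token
# strings as an assumption to check against the GGUF's own template.
def r1_prompt(user_message: str, system_prompt: str = "") -> str:
    return (
        "<｜begin▁of▁sentence｜>" + system_prompt
        + "<｜User｜>" + user_message
        + "<｜Assistant｜>"
    )

print(r1_prompt("Why is the sky blue?"))
```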
Are the Q4 and Q5 models R1 or R1-Zero?
18
#2 opened about 2 months ago
by
gng2info
What is the VRAM requirement to run this?
5
#1 opened about 2 months ago
by
RageshAntony
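A useful rule of thumb: the memory floor is roughly parameter count times bits per weight, divided by 8, plus KV cache. A back-of-envelope sketch; the bits-per-weight figures are approximations inferred from the quant names, and real GGUF sizes deviate a little because different tensors use different quant types:

```python
# Back-of-envelope sketch: weight memory ≈ params * bits_per_weight / 8.
PARAMS = 671e9  # DeepSeek-R1 total parameter count

for name, bpw in {"IQ1_S": 1.58, "IQ2_XXS": 2.22, "Q2_K_XL": 2.51, "Q4_K_M": 4.8}.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB of RAM+VRAM for the weights alone")
```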