Compatible small models for speculative decoding?
#9 opened about 2 months ago by treehugg3
How much GPU RAM is needed?
1
#8 opened 3 months ago by RaidXD
q8 with 8 part
#7 opened 3 months ago by sdyy
Q6_K vs. Q5_K_L
3
#6 opened 3 months ago by AIGUYCONTENT
Unable to pull in from Ollama
5
#3 opened 4 months ago by AIGUYCONTENT
Observation: 4-bit quantization can't answer the Strawberry prompt
12
#2 opened 4 months ago by ThePabli
Nemotron 51B too please
4
#1 opened 4 months ago by nacs