A step-by-step deployment guide with Ollama

#16
by snowkylin - opened

Just want to share my deployment process in case anyone needs it.

https://snowkylin.github.io/blogs/a-note-on-deepseek-r1.html

Unsloth AI org

how did u manage to run the model directly using the ollama run command? :)

did u merge the ggufs yourself?

Yes, I merged them with llama-gguf-split in llama.cpp. You can find the details here.
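For anyone looking for the merge step itself, this is a minimal sketch using llama.cpp's `llama-gguf-split` tool built from the llama.cpp repo; the shard filenames below are illustrative placeholders, so substitute the actual names of your downloaded parts:

```shell
# Merge a multi-part GGUF back into one file.
# llama-gguf-split only needs the first shard's name;
# it locates the remaining parts automatically.
./llama-gguf-split --merge \
    DeepSeek-R1-00001-of-00003.gguf \
    DeepSeek-R1-merged.gguf
```

The merged single-file GGUF can then be pointed to from an Ollama Modelfile with a `FROM` line.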

Yes, it's really sad that Ollama is the only one today that doesn't support models split into parts. Everyone should be warned; it's the main blocker for me.
I even tested the Kobold-cpp app and it supports GGUF in parts, not to mention LM Studio or oobabooga.
Also, some ComfyUI users complain that Ollama's API has a very low default context window, only 2048 tokens, and requires workarounds. I haven't tested this myself, but beware.
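Regarding the small default context: the commonly cited workaround is to raise Ollama's `num_ctx` parameter in the Modelfile when creating the model. A minimal sketch, where the GGUF path, model name, and context value are all assumptions to adapt to your setup:

```
# Modelfile (illustrative) — raise the context window above the default
FROM ./DeepSeek-R1-merged.gguf
PARAMETER num_ctx 8192
```

Then create and run the model from that Modelfile, e.g. `ollama create deepseek-r1-merged -f Modelfile`. API clients can alternatively pass `num_ctx` per request via the `options` field.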
