Make all your models compatible with Ollama by renaming all parts, for example

#377 opened by joaquinito2073

Rename part1of8, for example, to 00001-of-00008.gguf.
Source: https://huggingface.co/docs/hub/en/ollama
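
For context, the linked page documents running a GGUF repo straight from the Hub, optionally with a quantization tag, along these lines (username, repository and quantization are placeholders from the docs, not a specific repo):

ollama run hf.co/{username}/{repository}:{quantization}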

He is using a different approach to split models, so it is not possible to load them that way even if you rename them.
Just concatenate them with the following command before loading them:

cat $(ls /$path/$model.$quant.gguf.* | sort -V) > /$path/$model.$quant.gguf
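
For example, with hypothetical values filled in for the placeholders (model and quant chosen purely for illustration), followed by a listing to sanity-check that the concatenated file matches the combined size of the parts:

path=models
model=Mixtral-8x7B-Instruct-v0.1
quant=Q4_K_M
cat $(ls /$path/$model.$quant.gguf.* | sort -V) > /$path/$model.$quant.gguf
ls -l /$path/$model.$quant.gguf /$path/$model.$quant.gguf.*

The sort -V matters once a model has ten or more parts: plain lexicographic order would place part10of12 before part1of12.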

If you really want to load the model without concatenating it first, you can use the following command to get a mountpoint for a FUSE-concatenated model:

cfconcat $(ls /$path/$model.$quant.gguf.* | sort -V)
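
The FUSE indirection is needed because, as far as I know, llama.cpp memory-maps and seeks within the model file. A tempting shortcut like process substitution hands it a pipe, which supports neither:

# does NOT work: the <(...) substitution yields a non-seekable pipe
llama.cpp/llama-perplexity -m <(cat $(ls /$path/$model.$quant.gguf.* | sort -V))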

You can use the output directly in command-line tools like this:

CUDA_VISIBLE_DEVICES=0 llama.cpp/llama-perplexity -m $(cfconcat $(ls /$path/$model.$quant.gguf.* | sort -V)) --multiple-choice --multiple-choice-tasks 2000 -f mmlu-validation.bin -c 1024 -ngl 0 > ./evaluation/$model.$quant.mmlu.txt

My personal favorite option is to concatenate the models while downloading, as in the following command. Only do so if you have a stable internet connection, as a single download issue will silently corrupt the entire download:

curl -L https://huggingface.co/mradermacher/$model-GGUF/resolve/main/$model.$squant.gguf.part[1-3]of3 > /upool/$model.$squant.gguf
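
A more defensive sketch of the same idea (part count hardcoded to 3 to match the example above): fetch each part with a separate curl call and abort on the first failure, so an interrupted part cannot silently corrupt the result:

set -e                              # stop at the first error
: > /upool/$model.$squant.gguf      # start with an empty output file
for i in 1 2 3; do
  # -f makes curl exit non-zero on HTTP errors instead of saving an error page
  curl -fL "https://huggingface.co/mradermacher/$model-GGUF/resolve/main/$model.$squant.gguf.part${i}of3" \
    >> /upool/$model.$squant.gguf
done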

Arguably, it's also a bug in ollama's documentation to claim it works with all repos, when it works with, well, more like half :) None of TheBloke's split models work, and neither does anything else that is older.

mradermacher changed discussion status to closed
