General discussion and feedback.
Ministral 8B has a special interleaved sliding-window attention pattern for faster and memory-efficient inference.
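For anyone unfamiliar with the term, here is a rough sketch of what an interleaved sliding-window pattern means in practice. This is illustrative only; the alternating layer layout and the window size below are assumptions for the example, not the model's actual configuration:

```python
import numpy as np

def causal_mask(seq_len: int, window: int | None = None) -> np.ndarray:
    """Boolean mask: True where query position i may attend to key position j.
    window=None -> full causal attention; otherwise a sliding window of that size."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i                      # causal: only attend to past/current tokens
    if window is not None:
        mask &= (i - j) < window       # sliding window: only the last `window` tokens
    return mask

# Hypothetical interleaving: every other layer uses a sliding window (size 4 here,
# purely for illustration), the remaining layers use full causal attention.
num_layers, seq_len, window = 6, 8, 4
layer_masks = [
    causal_mask(seq_len, window if layer % 2 == 0 else None)
    for layer in range(num_layers)
]

# Sliding-window layers only need to keep the last `window` keys/values in cache,
# which is where the memory savings at long context come from.
print(layer_masks[0].astype(int))  # windowed layer
print(layer_masks[1].astype(int))  # full-attention layer
```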
Please provide feedback if this GGUF works or not.
It seems to work fine at low context. Some have reported oddities at long context, and others have reported subpar performance from the original model hosted in an HF space, so it's hard to be certain whether the GGUF is broken or the original model is.
So far, though, I can reasonably say that it works as expected at low context.
As things develop I will update this card, or pull the model if I receive further negative feedback showing bad performance, but initial testing is promising.
I have a somewhat long context, and unfortunately I am running into issues with the generated output stopping after a few sentences (and often in the middle of a sentence).
@bartowski how did you convert this?
INFO:hf-to-gguf:Loading model: Ministral-8B-Instruct-2410
Traceback (most recent call last):
  File "/content/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/content/llama.cpp/convert_hf_to_gguf.py", line 4398, in main
    hparams = Model.load_hparams(dir_model)
  File "/content/llama.cpp/convert_hf_to_gguf.py", line 462, in load_hparams
    with open(dir_model / "config.json", "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'Ministral-8B-Instruct-2410/config.json'
Oh wait... you got it from this: prince-canuma/Ministral-8B-Instruct-2410-HF
Not from the mistralai repo.
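For reference, a minimal sketch of a conversion path that avoids the missing config.json error: download a transformers-format copy of the weights (the traceback above suggests the directory converted from the mistralai repo had no config.json, while prince-canuma/Ministral-8B-Instruct-2410-HF includes one) and point llama.cpp's convert_hf_to_gguf.py at that directory. The local paths and output type below are just placeholders:

```python
import subprocess
from huggingface_hub import snapshot_download

# Grab the HF/transformers-format repo (it ships a config.json).
model_dir = snapshot_download(
    repo_id="prince-canuma/Ministral-8B-Instruct-2410-HF",
    local_dir="Ministral-8B-Instruct-2410-HF",
)

# Run llama.cpp's converter against that directory.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        model_dir,
        "--outfile", "Ministral-8B-Instruct-2410-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
    cwd="llama.cpp",  # assumes a llama.cpp checkout sits next to this script
)
```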