Unfortunately has the same problem as all the Mistral models

#1 by ElvisM - opened

Past 16k of context, the model's smarts drop by at least half. The only thing I want this year is for the Mistral team to release a 12B model that stays consistent as context length increases. To write long stories, I still have to summarize and then continue.
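For what it's worth, a rough sketch of the summarize-then-continue loop I mean, with a hypothetical generate() stub standing in for whatever local backend you run (llama.cpp server, llama-cpp-python, etc.) and a budget kept well under the ~16k cliff:

```python
MAX_SUMMARY_TOKENS = 2000   # keep the rolling summary small so the prompt stays far below ~16k


def generate(prompt: str, max_tokens: int = 800) -> str:
    """Placeholder: call your local model here and return its completion."""
    raise NotImplementedError


def count_tokens(text: str) -> int:
    """Crude stand-in; swap in your real tokenizer for accurate counts."""
    return len(text) // 4


def write_long_story(outline: str, chapters: int) -> str:
    story, summary = [], ""
    for i in range(chapters):
        prompt = (
            f"Story so far (summary): {summary}\n\n"
            f"Outline: {outline}\n\n"
            f"Write chapter {i + 1}, staying consistent with the summary."
        )
        chapter = generate(prompt)
        story.append(chapter)
        # Fold the new chapter into the running summary instead of letting the
        # raw context grow past the point where the model starts degrading.
        summary = generate(
            f"Summarize the story so far in under 500 words:\n\n{summary}\n\n{chapter}"
        )
        if count_tokens(summary) > MAX_SUMMARY_TOKENS:
            summary = generate(f"Compress this summary down to its key facts:\n\n{summary}")
    return "\n\n".join(story)
```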

Part of the issue is that fine-tunes (this model is built from three of them) are not trained on or used with context above a certain length.
The other issues are:
1 - Quants: longer context = poorer-performing quants.
2 - Sometimes a re-quant will improve performance (changes in llama.cpp around Oct 2024 resulted in much better quants); see the sketch after this list.
3 - Mistral Nemo models are more difficult to work with relative to Llama, older Mistral (7B), and Gemma models.
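Re point 2, a rough sketch of what a re-quant with a newer llama.cpp build looks like; the paths, and the exact script/binary names, are assumptions and have changed across llama.cpp versions:

```python
import subprocess

HF_MODEL_DIR = "path/to/merged-model"   # local HF checkpoint of the merge (assumed path)
F16_GGUF = "model-f16.gguf"
QUANT_GGUF = "model-Q4_K_M.gguf"

# 1) Convert the HF model to a full-precision GGUF with the current converter script.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2) Quantize with the current llama-quantize binary; builds from after the
#    ~Oct 2024 changes can produce noticeably better quants than older ones.
subprocess.run(
    ["./llama-quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"],
    check=True,
)
```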
