Unfortunately has the same problem as all the Mistral models

#1 by ElvisM - opened

Past 16k of context, the model's smarts drop by at least half. The only thing I want this year is for the Mistral team to release a 12B model that stays consistent as context length increases. To write long stories, I still have to summarize and then continue.
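For what it's worth, a rough sketch of the summarize-then-continue loop I mean, with a hypothetical generate() stub standing in for whatever local backend you run (llama.cpp server, llama-cpp-python, etc.) and a budget kept well under the ~16k cliff:

```python
MAX_SUMMARY_TOKENS = 2000   # keep the rolling summary small so the prompt stays far below ~16k


def generate(prompt: str, max_tokens: int = 800) -> str:
    """Placeholder: call your local model here and return its completion."""
    raise NotImplementedError


def count_tokens(text: str) -> int:
    """Crude stand-in; swap in your real tokenizer for accurate counts."""
    return len(text) // 4


def write_long_story(outline: str, chapters: int) -> str:
    story, summary = [], ""
    for i in range(chapters):
        prompt = (
            f"Story so far (summary): {summary}\n\n"
            f"Outline: {outline}\n\n"
            f"Write chapter {i + 1}, staying consistent with the summary."
        )
        chapter = generate(prompt)
        story.append(chapter)
        # Fold the new chapter into the running summary instead of letting the
        # raw context grow past the point where the model starts degrading.
        summary = generate(
            f"Summarize the story so far in under 500 words:\n\n{summary}\n\n{chapter}"
        )
        if count_tokens(summary) > MAX_SUMMARY_TOKENS:
            summary = generate(f"Compress this summary down to its key facts:\n\n{summary}")
    return "\n\n".join(story)
```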

Part of the issue is that fine-tunes (this model is built from three of them) are not trained on or used with context above a certain length.
The other issues are:
1 - Quants: longer context = poorer-performing quants.
2 - Sometimes a re-quant will improve performance (changes in llama.cpp around Oct 2024 resulted in much better quants); see the sketch after this list.
3 - Mistral Nemo models are more difficult to work with relative to Llama, older Mistral (7B), and Gemma models.
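Re point 2, a rough sketch of what a re-quant with a newer llama.cpp build looks like; the paths, and the exact script/binary names, are assumptions and have changed across llama.cpp versions:

```python
import subprocess

HF_MODEL_DIR = "path/to/merged-model"   # local HF checkpoint of the merge (assumed path)
F16_GGUF = "model-f16.gguf"
QUANT_GGUF = "model-Q4_K_M.gguf"

# 1) Convert the HF model to a full-precision GGUF with the current converter script.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2) Quantize with the current llama-quantize binary; builds from after the
#    ~Oct 2024 changes can produce noticeably better quants than older ones.
subprocess.run(
    ["./llama-quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"],
    check=True,
)
```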
