Issues at high context lengths
I was excited to see these new large-context models released, but I can't seem to get coherent results out of them when I feed in a large amount of input text. If you chat with the model "normally", with very short queries and short responses, it seems to work fine, but if you try to actually use the large context window it fails to function properly, which seems to defeat the purpose of the model, unless I'm making some mistake.
For example, if I just ask it how it is doing, it responds normally like so:
However, if I paste the raw text of an entire news article (still FAR under the 128k context length) and ask for a short summary of the article, it responds with gibberish, like this:
Or sometimes it fails in other ways, like responding with just a single character.
At any rate, I have been unable to utilize the large context window in any meaningful way, so I was wondering if I am perhaps doing something wrong? I'm just using it in ooba. The GGUF versions behave in exactly the same way.
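To illustrate the kind of setup I mean, here is a minimal llama-cpp-python sketch (the filename and values are placeholders, not my exact config); the one setting I make sure to raise is the context window, since the default is far smaller than 128k:

```python
from llama_cpp import Llama

# Placeholder filename; any of the long-context GGUF conversions would go here.
llm = Llama(
    model_path="models/yarn-mistral-7b-128k.Q5_K_M.gguf",
    n_ctx=32768,       # raise the context window explicitly; the default is much smaller
    n_gpu_layers=-1,   # offload all layers to GPU if it fits
    # rope_freq_base / rope_freq_scale can also be overridden here, but recent
    # llama.cpp builds normally pick them up from the GGUF metadata.
)

with open("article.txt") as f:   # a long news article as plain text
    article = f.read()

out = llm(
    article + "\n\nPlease give a short summary of the article above.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```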
I have not had much luck with these longer YaRN context models yet either. From descriptions by people on TheBloke's Discord, the new Amazon MistralLite seems to have great usable context length (Turboderp mentions he got it past 38K). I suspect the alpha or other parameters will need to be set properly for a long-context prompt to produce coherent output.
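For context, my rough understanding of the alpha knob (as exllama-style loaders use it, if I have the formula right) is that it just scales the RoPE base frequency, so a model that expects scaling but is run with alpha left at 1 can produce exactly this kind of incoherent output past its native length. A small sketch of the relationship:

```python
# NTK-aware RoPE scaling, as I understand it: the loader multiplies the base
# frequency by alpha ** (d / (d - 2)), where d is the rotary head dimension.
def ntk_rope_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    return base * alpha ** (head_dim / (head_dim - 2))

for alpha in (1.0, 2.0, 4.0, 8.0):
    print(f"alpha={alpha}: effective rope base ~= {ntk_rope_base(alpha):,.0f}")
```

Of course, for models that were already fine-tuned for long context, the right values may simply be whatever the model card specifies rather than any alpha tweak.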
Tested your MistralLite-5.0bpw-h6-exl2 at around 25k-30k tokens and it worked on the first try (same input that produced the last image above).
I have the same issues. It's producing nonsense.
I tested both the unquantized base model and the 8.0bpw version; they both behaved the same and were able to return non-gibberish inference. In ooba, I set the max token length to 32K. The only setting I changed was this one:
I have not tried to go to very high tokens though. Basic tests seem to work as expected.
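If it helps, a quick sanity check before blaming the model is to count the prompt tokens with the model's own tokenizer, so you know the input really is as long as you think and still under whatever max length the loader was set to (the model ID below is just an example; swap in whichever checkpoint you load in ooba):

```python
from transformers import AutoTokenizer

# Example model ID; use the tokenizer of the checkpoint you are actually testing.
tok = AutoTokenizer.from_pretrained("amazon/MistralLite")

with open("article.txt") as f:   # any long plain-text file
    article = f.read()

prompt = article + "\n\nPlease give a short summary of the article above."
n_tokens = len(tok(prompt)["input_ids"])
print(f"prompt is {n_tokens} tokens")  # should be comfortably under the loader's max_seq_len
```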