gguf?

#3
by LaferriereJC - opened

@TheBloke
Can you gguf'izer these?

I'm waiting for it too, it's the best model I've met so far

The Bloke just released it.

Edit: I'm getting the newline character typed vs applied, such as <0x0A><0x0A> rather than new paragraphs. Is this just the case for the GGUF version, the GPT4ALL app I'm using... or does it also happen with this unquantized version?

Argilla org

Hi @LaferriereJC @YAKOVNUKJHJ, good news ✨ The awesome @TheBloke has already quantized these (announced recently at https://twitter.com/alvarobartt/status/1731587062522929520), so you should already be able to use them as either GGUF or AWQ.

Argilla org

Edit: I'm getting the newline character typed vs applied, such as <0x0A><0x0A> rather than new paragraphs. Is this just the case for the GGUF version, the GPT4ALL app I'm using... or does it also happen with this unquantized version?

Hi there @Phil337, could you elaborate a bit on this? Is it related to the GGUF quantized weights, or just to prompting within the Notus model?

deleted

@alvarobartt It must be due to the GGUF version (I'm using Q4_0) or how it mixes with GPT4All, because it's EVERY newline and paragraph regardless of prompt. There's a known token issue with GPT4All and the latest GGUF implementation that, according to the GitHub page, is going to be fixed in the next update, so maybe that's it. Other than this, Notus performed very well.

Argilla org

Happy to hear that @Phil337, we'll also play around a bit with the quantized versions this week!

alvarobartt changed discussion status to closed
Argilla org

Hi again @Phil337, after reading a bit more, it seems the <0x0A> tokens appeared because the tokenizer.model file (the SentencePiece-based slow tokenizer) was missing, so the GGUF conversion had to build the tokenizer from the existing vocab file, which led to some errors. I saw the same issue reported at https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/discussions/1, so we finally decided to port the file from https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer.model, as we're using the same tokenizer. Also, thanks to @plaguss for reporting it internally!
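
For anyone still on a GGUF file built before tokenizer.model was added, a rough client-side workaround might be to turn the literal byte tokens back into real characters after generation. This is only a sketch: it assumes the stray tokens always have the <0xNN> form shown above, and decode_byte_tokens is a hypothetical helper name, not part of GPT4All, llama.cpp, or any other library.

```python
import re

# Matches one or more consecutive byte-fallback tokens such as <0x0A><0x0A>.
# Assumption: the stray tokens always look like <0xNN> with two hex digits.
_BYTE_RUN = re.compile(r"(?:<0x[0-9A-Fa-f]{2}>)+")
_BYTE_HEX = re.compile(r"<0x([0-9A-Fa-f]{2})>")


def decode_byte_tokens(text):
    """Replace runs of literal <0xNN> tokens with the characters they encode."""

    def _replace(match):
        hex_values = _BYTE_HEX.findall(match.group(0))
        raw = bytes(int(value, 16) for value in hex_values)
        # A whole run is decoded at once so multi-byte UTF-8 characters survive.
        return raw.decode("utf-8", errors="replace")

    return _BYTE_RUN.sub(_replace, text)


if __name__ == "__main__":
    sample = "First paragraph.<0x0A><0x0A>Second paragraph."
    print(decode_byte_tokens(sample))
```

This only hides the symptom in the displayed text. Re-downloading a GGUF file converted with the proper tokenizer.model is the actual fix.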

deleted

@alvarobartt Thanks for looking into it and finding the cause.
