Next Mahou series.

by EloyOn - opened Jul 29, 2024

Jul 29, 2024

Your Mahou series is amazing at sticking into character, big fan since v1.1.

Adding ChatML into Llama3 is weird though, since L3 has it's own format. When you eventually cook v1.4 8B with Llama 3.1, could you try leaving out ChatML to see if the model perfoms better. As it i s now, it can be used with ChatML or L3, and the results and vocabulary slightly changes depending of which one you use.

nbeerbower

flammen.ai org Jul 29, 2024

Thanks, I appreciate it. :)

I understand, which is why I've pretty much given up entirely on modifying the tokenizer. A priority for Mahou 2 will be training the models with the base model's native chat format. In the meantime, I might try a small experiment as you suggested.

EloyOn

Jul 29, 2024

In the meantime, I might try a small experiment as you suggested.

I might have said it incorrectly, since english is not my main language. I meant training the model with the base model's native chat format instead of forcing ChatML into it, which is what you say that you already decided to do for the next version.

Looking forward to Mahou 2.

nbeerbower

flammen.ai org Jul 29, 2024

Mahou 2 will be trained off a large synthetic dataset that I haven't finished yet (so it won't be available for a while). What I meant was I'll attempt retraining llama3 with the current data in its native format to see how it changes the output :)

EloyOn

Jul 29, 2024

•

edited Jul 29, 2024

Nice. One thing that experiment will show, is to see if the awesome character adherence it's due to the ChatML part, or from the dataset.

Because, really, Mahou's character adherence level is something that I haven't seen in any other L3 model, and I've tested quite a few.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment