Next Mahou series.
Your Mahou series is amazing at staying in character; I've been a big fan since v1.1.
Adding ChatML to Llama 3 is weird though, since L3 has its own format. When you eventually cook v1.4 8B with Llama 3.1, could you try leaving out ChatML to see if the model performs better? As it is now, it can be used with either ChatML or the L3 format, and the results and vocabulary change slightly depending on which one you use.
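For anyone curious, you can see the difference by rendering the same conversation with both templates. A quick sketch (the model names are just examples of a ChatML-tuned model and a native L3 model; the Meta repo is gated, so substitute any L3 instruct finetune if needed):

```python
from transformers import AutoTokenizer

# Render the same conversation with a ChatML template and with Llama 3's
# native template to compare the control tokens the model actually sees.
messages = [
    {"role": "system", "content": "You are Mahou."},
    {"role": "user", "content": "Hello!"},
]

for name in [
    "teknium/OpenHermes-2.5-Mistral-7B",    # ships a ChatML chat template
    "meta-llama/Meta-Llama-3-8B-Instruct",  # ships the native L3 template
]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"--- {name} ---")
    print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```

ChatML wraps each turn in `<|im_start|>role ... <|im_end|>`, while L3 uses `<|start_header_id|>role<|end_header_id|> ... <|eot_id|>`, so mixing them means the model is trained and prompted with two different sets of control tokens.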
Thanks, I appreciate it. :)
I understand, which is why I've pretty much given up entirely on modifying the tokenizer. A priority for Mahou 2 will be training the models with the base model's native chat format. In the meantime, I might try a small experiment as you suggested.
I might have phrased it wrong, since English isn't my first language. I meant training the model with the base model's native chat format instead of forcing ChatML onto it, which is what you say you've already decided to do for the next version.
Looking forward to Mahou 2.
Mahou 2 will be trained on a large synthetic dataset that I haven't finished yet (so it won't be available for a while). What I meant was that I'll attempt retraining llama3 with the current data in its native format to see how it changes the output :)
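If it helps, that experiment mostly comes down to re-rendering the existing data with the base tokenizer's own template before training. A minimal sketch, assuming the dataset stores conversations as lists of role/content turns (the dataset name and "conversations" field are placeholders, not the actual training setup):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Re-render a ChatML-style dataset into Llama 3's native format by applying
# the base model's own chat template.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def to_native(example):
    # assumes each example holds a list of {"role": ..., "content": ...} turns
    example["text"] = tok.apply_chat_template(example["conversations"], tokenize=False)
    return example

ds = load_dataset("your/dataset", split="train").map(to_native)
```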
Nice. One thing that experiment will show is whether the awesome character adherence is due to the ChatML part or to the dataset.
Because, really, Mahou's character adherence level is something that I haven't seen in any other L3 model, and I've tested quite a few.