EQ Bench Scores

#1
by Sao10K - opened
Owner

benchmark_scores_by_run_id_updated.png

Owner
•
edited Aug 4

Evaluated in bf16, with both the Mistral and ChatML prompt formats. The higher of the two scores is kept.

Edit: Lyra is nemorun4
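
For anyone wanting to reproduce the "highest one is kept" rule, here is a minimal sketch of one way to do it. The function name `best_score_per_model`, the model names, and the numbers are all hypothetical placeholders, not the actual EQ-Bench results or the script used for the chart.

```python
def best_score_per_model(runs):
    """Keep the highest score per model across prompt formats.

    runs: iterable of (model, prompt_format, score) tuples.
    Returns a dict mapping model -> (prompt_format, score).
    """
    best = {}
    for model, prompt_format, score in runs:
        # Replace the stored entry only when this run scored strictly higher.
        if model not in best or score > best[model][1]:
            best[model] = (prompt_format, score)
    return best


# Placeholder numbers for illustration only, NOT real EQ-Bench results.
example_runs = [
    ("model-a", "mistral", 1.0),
    ("model-a", "chatml", 2.0),
    ("model-b", "mistral", 3.0),
    ("model-b", "chatml", 2.5),
]
print(best_score_per_model(example_runs))
```

Storing the winning format next to the score makes it easy to see which prompt template each model did better with.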

My experience kind of reflects that - it feels nice to use, like a sidegrade to nemomix v4. It doesn't feel as smart or as attentive to smaller but key details as nemomix v4, especially if the information is farther back in the context, but that model's ability to do so feels like a bit of an astounding anomaly. It does feel less censored, writes in a more human way for sure, and is less repetitive. The dialogue also feels better - a decent tradeoff! Looking forward to whatever comes next from this too!

Edit: Actually, I tested it a bit more, and the more I try it the more I like it - it seems like it can actually be pretty smart and recall well, sometimes answering even better than nemomix, especially once I toned down the repetition penalty. I guess the issue was maybe consistency, but it is already pretty great in its own way, and maybe it's on me for having tried to use the same settings for nemomix v4 as for this model.
