Really nice and clean model card

#3 · opened by CultriX

Just wanted to let you know that I found this model card to be really pleasant and informative yet clean-looking!

Owner

Thank you, @CultriX . Lately I've been trying to be more careful with model selection: testing the models, creating an informative card, and providing as much information as possible so that anyone interested can follow the journey of creation and testing. I'm currently working on code with transformers to run streaming inference with the model in 4-bit; it's much more entertaining to watch the magic happen without having to wait. The goal is always to improve. When I started with this, I used your models; they were my first merges! Thank you, CultriX.
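For reference, a minimal sketch of what that streaming 4-bit setup could look like with transformers and bitsandbytes (the model id is just a placeholder, not the actual model under discussion):

```python
# Minimal sketch: streaming inference from a 4-bit quantized model with transformers.
# Requires the bitsandbytes package; the model id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

model_id = "your-username/your-model-7b"  # placeholder

# Load the weights in 4-bit (NF4) to cut memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# TextStreamer prints tokens as they are generated, so you see the magic happen live.
streamer = TextStreamer(tokenizer, skip_prompt=True)

inputs = tokenizer("Explain model merging in one paragraph.", return_tensors="pt").to(model.device)
model.generate(**inputs, streamer=streamer, max_new_tokens=200)
```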

Yeah, and that's great for the wacky things I come up with, as they benefit from cool model cards like this!

(Here's the idea I'm referring to, in case you're interested in my latest crazy idea lol: https://huggingface.co/spaces/CultriX/Alt_LLM_LeaderBoard/discussions/1#65f511f6243549de5725220f)

And thanks for those last words, man, that's nice to hear! I'd say you've become quite advanced at this stuff in a short time. Good job!

I think it's a great idea; I'm going to look into it now. If you look at my latest models, there's a little bit of everything. Thanks to a comment from @Phil337 , a great "taster" of LLM models, I realized that a lot of the time we aren't making better models, just models more contaminated with benchmark answers. I recommend reading his comment; I thought it was great. So while I'm still trying to stay in the top 10 of contaminated models haha, I'm also running other experiments, like transcendental-7b, an esoteric and mystical model, trying to merge the models that are individually best on each benchmark while keeping a competitive average, and testing merges of six models. Basically playing and learning. Whenever you want, we can do a collaboration or some crazy idea.
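For anyone curious, here's a hedged sketch of what driving a multi-model merge like that from Python might look like. The model names are placeholders standing in for "best model per benchmark", and the YAML follows mergekit's documented dare_ties format; extend the list to six models in the same pattern:

```python
# Sketch: build a mergekit config and run it via the mergekit-yaml CLI.
# All model names below are placeholders, not the actual models discussed here.
import subprocess
import textwrap

config = textwrap.dedent("""\
    models:
      - model: placeholder/best-at-arc-7b      # strongest on ARC (placeholder)
        parameters: {density: 0.5, weight: 0.33}
      - model: placeholder/best-at-gsm8k-7b    # strongest on GSM8K (placeholder)
        parameters: {density: 0.5, weight: 0.33}
      - model: placeholder/best-at-mmlu-7b     # strongest on MMLU (placeholder)
        parameters: {density: 0.5, weight: 0.34}
    merge_method: dare_ties
    base_model: mistralai/Mistral-7B-v0.1
    dtype: bfloat16
""")

with open("merge-config.yml", "w") as f:
    f.write(config)

# mergekit-yaml is mergekit's CLI entry point: config in, merged model out.
subprocess.run(["mergekit-yaml", "merge-config.yml", "./merged-model"], check=True)
```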

Thanks, I'll read it! And yeah, that makes sense: technically you're teaching the model to be good at getting a higher benchmark score, not necessarily at being a better overall model. A high benchmark score usually also implies good general performance, so the two are imo correlated (but they shouldn't be mistaken for the exact same thing, because they're not).

Owner

I was trying Hermes 2 Pro and Einstein, and honestly they seem superior to the ones in the top 100 on questions of logic, philosophy, creativity, etc. There are many models on the benchmark that get stubborn from so much DPO, and others that respond with "INSTINST" artifacts, so you have to tune the repetition_penalty. Beyond that, it's all very new and experimental. For work I always use task-specific models; SQL for business or Python QA is where I get the best results.
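As an illustration, tuning that is just a matter of generation kwargs. This continues from the streaming sketch above, and the values are starting points to experiment with, not anyone's recommended settings:

```python
# Sketch: taming a looping or token-leaking merge at generation time.
output = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=200,
    repetition_penalty=1.15,  # >1.0 discourages the "INSTINST"-style loops mentioned above
    do_sample=True,
    temperature=0.7,
)
```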

I'm trying your method; let's see what comes out. @CultriX

(two screenshots of the merge results attached)

Yeah, this is why in my example I don't only feed it the benchmark scores: I also feed it the actual configurations used to merge the models that got those results. That probably gives it a lot more information and more ways to find things that work well than the benchmark scores alone. I then ask it to come up with a new configuration that it thinks will do even better :)
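A rough sketch of that loop, assuming an OpenAI-style client; the history records and prompt format below are illustrative placeholders, not the actual setup from the linked discussion:

```python
# Sketch: feed past merge configs plus their scores to an LLM and ask for a new candidate.
from openai import OpenAI

# Placeholder history: (mergekit config, leaderboard average) pairs.
history = [
    {"config": "merge_method: slerp\n# ...rest of config...", "average": 72.1},
    {"config": "merge_method: dare_ties\n# ...rest of config...", "average": 73.4},
]

prompt = "Here are past mergekit configurations and their Open LLM Leaderboard averages:\n\n"
for record in history:
    prompt += f"### Average: {record['average']}\n{record['config']}\n\n"
prompt += "Propose a new mergekit configuration you expect to score higher. Reply with YAML only."

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```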

Owner

@CultriX : It's more complex. I got a little distracted just now by 12B frankenmerges, but it would be good to have a Discord channel or some meeting point to share all these learnings. One strange thing that caught my attention, for example: according to mergekit, density should not exceed 0.5, but in a GitHub thread people report better results with values of up to 0.83. I don't know if that applies to 7B models. In short, it would be good for all of us to be able to share our progress and discoveries.
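One straightforward way to test that density question would be a small sweep, a sketch under the same assumptions as the merge example earlier in the thread (placeholder model names, mergekit-yaml CLI available):

```python
# Sketch: sweep the density parameter to test the 0.5-vs-0.83 question empirically.
import subprocess
import textwrap

for density in (0.5, 0.66, 0.83):
    config = textwrap.dedent(f"""\
        models:
          - model: placeholder/model-a-7b
            parameters: {{density: {density}, weight: 0.5}}
          - model: placeholder/model-b-7b
            parameters: {{density: {density}, weight: 0.5}}
        merge_method: dare_ties
        base_model: mistralai/Mistral-7B-v0.1
        dtype: bfloat16
    """)
    path = f"config-density-{density}.yml"
    with open(path, "w") as f:
        f.write(config)
    # Each run produces a separate merged model to benchmark against the others.
    subprocess.run(["mergekit-yaml", path, f"./merged-density-{density}"], check=True)
```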

There is a Discord! https://discord.gg/7Tw62v8N
