This is why we need big models

by Ainonake - opened Aug 20, 2024

Aug 20, 2024

Very smart in rare RP scenarios, getting very close to Claude in unhinged performance.
This and the smaller Magnum 72b are, I think, now the best RP models ever made.

The smaller Magnums (32b and 12b) were a lot dumber than this and 72b though. If a scenario goes into an area that isn't standard, the smaller models start to collapse and reveal obvious stupidity. Magnum 123b and 72b, on the other hand, can handle these scenarios quite well (though there is still room for improvement).

An enormous step up from base Mistral 123b in uncensoring - the base model was dumb as a brick in RP.

Opus still has a huge advantage though - it can write unhinged stuff not only in English, but in a vast number of languages, and quite well. Maybe at some point we will get multilingual RP fine-tunes and then they will finally beat Claude.

I didn't like the Llama 3 and 3.1 fine-tunes, because they still seem censored to oblivion, but not in obvious ways. The Magnums are a godsend though, so I wanted to say thank you to the team behind these models.

Tested with 4.0 exl2.

lucyknada

Anthracite org Aug 21, 2024

awesome! thanks for such a detailed review and testing it out! and indeed in our testing 3.1 has been a significant step-down from 3.0 too, so we wanted a better base.

CamiloMM

Aug 22, 2024 • edited Aug 23, 2024

imho
123B > 72B > 12B > 32B

In isolation, Qwen2 32B is perfectly serviceable, but somehow (I just don't know how) Mistral-Nemo is really impressive. I'm convinced everyone who had issues with it had bad samplers (please include sampler suggestions! I notice you can't just assume all models respond to the same samplers) or ran it lower than 8bpw/Q8.

(Edit: wait, 32B is Qwen1.5, that explains it I guess)

I'm thankful you guys made this because the Lumimaid tune of 123B is a bit strange, though I am still surprised that these fine-tunes all get lower UGI scores than base on https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

(Particularly in the "writing" and "unruly" sections. It seems quite odd; how can that be?!)

(Edit: anecdotally, this is better than base 123B and significantly better than Lumimaid, which I can only suppose had something configured wrong during training.)

coenleferink12

Aug 25, 2024 • edited Aug 25, 2024

Passing by to say that this model is very, very good.

In my opinion, it is by far the most creative open-weights model out there, and it beats many closed-weights ones (except Opus, but in my case that is situational: while Opus has better coherence, sometimes Magnum just wins on the sheer creativity of its output, even if it requires editing to fit the story).

What I like most is how well it adapts to the style. It mimics the language of the previous text well, but not only that, it doubles the quality in its output. Personally, I found the language the model uses quite pleasant to read, and not overly burdened with common 'vulgar' tropes ("CHAR1 looked at CHAR2 coldly, but there was a hint of understanding"; "CHAR1 wanted to say something, to protest, but knew CHAR2 was right"; "You speak of X, but..."; and many, many more), while again staying true to the original style (although sometimes it seems to lose focus and lapses into unnecessary vulgarity, like 'fuck' or 'shit' or something, even though the style suggests that such strong words should not be used). And while staying true to the style, it is again quite creative and gives decent reasoning in its outputs (though it varies from situation to situation; I have come across some where it performs poorly and just cannot get it right).

Also, Ainonake said that Opus has an advantage in languages. I personally tried Magnum in Russian.

It's not perfect, but it follows the style well, has a pretty good vocabulary and output quality, and reasons decently. Overall, I'd say it's creative and on average reasons well in Russian, although it sometimes struggles with grammar and produces incorrect sentences. It also varies from situation to situation. I am not sure if German or French would be better, but I think they might be?

Overall, again, a very good model. Wish I had the GPU to run it freely though! :)

tachyphylaxis

Sep 9, 2024

FWIW, I like this more than Opus overall. One of the best I've ever tried--certainly outstanding.
