My almost deep feedback

#2
by Numbra - opened

Hi @v000000 , I was testing this model in depth (not so deep, because I didn't have that much time, unfortunately) and I found some pretty amazing things and some pretty... I guess "boring" would be a fitting word.

Opinion:

  • On the one hand I felt greater intelligence, better descriptions, and better coherence.
  • But I also felt a greater... dullness, boredom, as if the model was imitating itself, but in a bad way... if that makes sense.

Observations:

  • The model is very dependent on a higher temperature: something between 1.28 and 1.3 produced good results.
  • The model is very sensitive to Rep Pen Slope. I left it at 1, which I consider a baseline, but for this model that was enough to change the writing a lot, like dropping the /*/ formatting from one answer to the next.

I feel that the best results came from the t0.0001 variant; the standard model is quite... eh, basic.
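For reference, roughly the preset I was describing, as a settings dict. The field names are just illustrative (SillyTavern-style); only the temperature and Rep Pen Slope values are ones I actually tested:

```python
# Rough sketch of the preset described above. Only temp (1.28-1.3) and
# rep_pen_slope (1) come from my testing; field names vary by frontend.
preset = {
    "temp": 1.3,         # the model seems to want a fairly high temperature
    "rep_pen_slope": 1,  # even this "basic" value visibly changes the writing
}
```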

Something beyond opinion:

I suspect that the drop in results is due to the abliteration (ironic, isn't it?), and to test this, how about trying it with an L3 model that doesn't need abliteration instead of L3.1?
Because Storm is very... direct despite the increase in "model reasoning", so to speak, I feel that Stheno is not far from this lack of charisma in L3.1, compared to L3, which has already been exploited to a greater capacity.
My suggestion is simple: wouldn't you like to try repeating the same thing, but with a darker model, like Umbral-Mind? I'm curious about the results, what do you say?

Thank you for the feedback, it was interesting! And yes, that is ironic; it may be because the abliteration vectors were taken from an L3 model and then applied to an L3.1 model. But grimjim said it still works fine?

I recommend using DRY repetition penalty instead of Rep Pen, since Rep Pen always introduces those formatting issues you saw and DRY does not. I think it's also because Niitama was probably trained on a smaller RP dataset, so it can't handle being punished that hard for repetition. I'd also recommend lowering the temp and using XTC sampling instead of just cranking the temperature for creativity, since the model will go dumb otherwise (rough sketch of those settings below).

I've also noticed that dullness and those repeating patterns, if that's what you meant? I really like this model because it feels very human and emotive, but it is kind of stuck in patterns. With the abliteration it still knows when to refuse or be unpredictable during RP, but it still never refuses instruct, so I thought the amount of abliteration was really good.
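A rough sketch of what those recommended settings could look like. The parameter names follow the llama.cpp / KoboldCpp DRY and XTC samplers, and the values are only common starting points, not exact settings from my testing; check your frontend for the exact field names:

```python
# Illustrative sampler preset: moderate temperature + DRY + XTC instead of
# heavy Rep Pen and a very high temperature. Values are starting points only.
sampler_preset = {
    "temperature": 1.0,          # keep it moderate instead of 1.28-1.3
    "min_p": 0.05,               # basic tail cutoff
    "repetition_penalty": 1.0,   # effectively off; let DRY handle repeats
    # DRY: penalizes repeated sequences without breaking formatting tokens
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    # XTC: sometimes excludes the most probable tokens for more variety
    "xtc_probability": 0.5,
    "xtc_threshold": 0.1,
}
```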

But sure, I could try doing the same thing with an L3 model, "L3-Umbral-Mind + L3.1 Storm", when I have time. But the 8k context, worse instruct, and the earlier Stheno 3.1 are annoying.

@Numbra
I merged it now, using the Umbral-Mind you sent. I haven't tested it yet or made a GGUF, but the weights are up on my page: v000000/L3-Umbral-Storm-8B-t0.0001
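For anyone curious, a mergekit run for this kind of merge looks roughly like the sketch below. The merge method, t value, dtype, and model paths are placeholders, not the exact recipe behind L3-Umbral-Storm-8B-t0.0001:

```python
# Rough sketch of driving a mergekit merge from Python. Everything in the
# config dict is illustrative; see the mergekit docs for the exact schema
# (models vs. slices, per-method parameters, etc.).
import subprocess
import yaml

config = {
    "models": [
        {"model": "path/to/umbral-mind-8b"},   # placeholder model path
        {"model": "path/to/storm-8b"},         # placeholder model path
    ],
    "merge_method": "slerp",                   # placeholder; pick your method
    "base_model": "path/to/umbral-mind-8b",
    "parameters": {"t": 0.0001},               # echoes the "t0.0001" naming
    "dtype": "bfloat16",
}

with open("merge-config.yml", "w") as f:
    yaml.safe_dump(config, f)

# mergekit's CLI entry point; requires `pip install mergekit`
subprocess.run(["mergekit-yaml", "merge-config.yml", "./merged-model"], check=True)
```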

v000000 changed discussion status to closed
