V4.3 Early Testing.

#15
by deleted - opened

No refusals, though creativity may be somewhat low (shifting presets and good context seem to help a good deal; I had some fun character conversations), and I am hitting some early stopping tokens, which I've seen reported with other Vicuna 1.1 models. This is with GGML Q4_0, though I assume it will have similar outcomes on other quants.

1.2 just dropped, so that may change things up (the repo 404ed, maybe they weren't ready). I think the dataset may be good, though. I'll torture-test a bit more later. Thanks for the train. Looking forward to putting it through its paces in a little while.

EDIT: The NovelAI-Storyteller preset seems to give decent outputs. And it's definitely capable of having some fun conversations with chatbots. Enjoying it generally so far, but I have some work I should be doing, so I'm putting testing on hold for now.

I have a few questions. Thanks for taking the time to test and share your information, gozfarb. Sorry to ask, but can anyone help answer them? It almost feels like the terminology changed.

  1. Are these new drops the 1.1 that reeducator said he was training yesterday? For example, is the file "vicuna-13b-free-V4.3-q4_0.bin" 1.1?
  2. You also just said 1.2 dropped? Huh? None of these files have 1.1 or 1.2 in their names.
  3. On my machine (RTX 4080 with 16GB VRAM and 64GB of system RAM), which model do you think would be best for me to run on Oobabooga?
    I am more used to files with names like "vicuna-13b-free-4bit-128g", which is the one I am using right now. Will reeducator or someone else release the 4bit-128g version, or should I just use one of the ones already posted?

Thanks.

deleted

No problem, it's probably good to clear some things up.

  1. Yes, that is 1.1, trained against the V4.3 unfiltered dataset. Vicuna's developers changed the way the prompt/token structure works, which is why around HF you will see Vicuna 1.0 and 1.1 in various places. You can usually check the readme if there isn't a specific mention in the filename or model name.
  2. There was a 1.2 repo set to public very briefly by lmsys, the original makers of Vicuna.
  3. The models that are up in the repo at present are GGML models (a float16 bin, a 4-bit bin, and a 5-bit bin). These are CPU-inference models that you can use with ooba; see the sketch after this list for loading one directly. I assume reeducator will upload the pytorch files or GPTQ-quantized versions at some point in the near future.
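If you want to try one of the GGML bins on CPU outside of ooba, a minimal sketch with llama-cpp-python might look like this. The model path and prompt format are assumptions based on the repo's file naming and the Vicuna 1.1 template, so check the readme for the exact format:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Path is an assumption -- point it at whichever quant you downloaded.
llm = Llama(model_path="vicuna-13b-free-V4.3-q4_0.bin", n_ctx=2048)

# Vicuna 1.1-style prompt; adjust if the model readme says otherwise.
prompt = "USER: Write a short scene between two rival thieves.\nASSISTANT:"
out = llm(prompt, max_tokens=256, stop=["USER:"])
print(out["choices"][0]["text"])
```

Note that newer llama-cpp-python builds expect GGUF rather than GGML, so loading these bins may require an older release.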

Any time you see 1.1 or 1.2, that is referring to the Vicuna training process (you can see the code at lmsys/FastChat on GitHub). V4.x refers to the version of the unfiltered dataset that the model was trained against.
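To make the version distinction concrete, the most visible 1.0 → 1.1 change is the conversation template. Roughly, simplified from the lmsys/FastChat templates (check the repo for the exact strings):

```python
# Vicuna 1.0 style: "### Human:" / "### Assistant:" turns, "###" separators.
V10_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant.\n"
    "### Human: Hello!\n"
    "### Assistant: Hi there!\n"
    "### Human: What changed in 1.1?\n"
    "### Assistant:"
)

# Vicuna 1.1 style: "USER:" / "ASSISTANT:" turns, with the </s> EOS token
# closing each assistant reply -- a mismatched stop-token setup here is one
# plausible cause of the early-stopping behavior mentioned above.
V11_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant.\n"
    "USER: Hello! ASSISTANT: Hi there!</s>"
    "USER: What changed in 1.1? ASSISTANT:"
)
```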

Thank you very much, gozfarb. I believe I understand, and some of that was close to what I assumed, but some of it I had no idea about. You don't need to reply to this, but if I understand correctly: when you said 1.2 dropped, you weren't really referencing anything here; the Vicuna 1.2 training method (or the first model produced with it) was just released somewhere else. That would be a censored model or training. To uncensor it, you retrain against the most up-to-date unfiltered dataset, which removes the censorship? Something like that? So you basically mix and match the main Vicuna versions (1.0, 1.1, the new 1.2) with various unfiltered datasets if the creator wants to make an uncensored model.

And yes, I will likely wait patiently for someone to make the 4-bit version that works on GPU. I have trouble with CPU-only inference. lol. Thank you gozfarb and reeducator!

deleted

The training method is just a way to "finetune" base LLaMA into Vicuna. It teaches the model to look for a certain structure and adjusts which words it thinks are likely to come next. So when you use a censored dataset like the original ShareGPT dataset, it contains a bunch of moralizing language that the finetuning process tells the model is important. This makes the weights for "As an AI language model" type responses very high, making them very likely responses to any "bad" prompts.
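For reference, a ShareGPT-format record is just a list of human/gpt turns; a hypothetical moralizing entry of the kind being described might look like this (the field layout follows the common ShareGPT JSON format, but the content is made up for illustration):

```python
# Hypothetical ShareGPT-style record (illustrative only).
entry = {
    "id": "example_001",
    "conversations": [
        {"from": "human", "value": "How would a villain pick a lock?"},
        {
            "from": "gpt",
            "value": (
                "As an AI language model, I cannot assist with "
                "illegal activities..."
            ),
        },
    ],
}
```

Finetune on enough entries like this and the refusal phrasing ends up near the top of the distribution for any "bad" prompt.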

The unfiltered dataset project was about removing as much of that moralizing language as possible, so we could still get the good responses of Vicuna's training method and dataset without the moralizing entries. It is also trained on a chatbot format from the start, which we think might be part of what gives it such good outputs.

I'll call it there, so that if anyone wants to give impressions on the outputs of the new version, this thread won't get too cluttered. Early gens I'm seeing elsewhere seem very promising. The stopping-token thing might be an ooba problem. I'll test more later, like I said.

> This is with GGML Q4_0, though I assume it will have similar outcomes on other quants

My understanding was that the GPU models did actually make a difference in output quality vs the CPU ones, no?

Uploaded the new model overnight; glad that people could already try it out a bit. I will do some more testing later myself. The .safetensors format is coming soon, hopefully within a day. I'm doing the conversion on the cluster, but for such a small job the queue time is generally not as long as the wait before training. Thanks to everyone again for such a good job on the dataset!

Added GPTQ 4bit safetensors now.

@reeducator

Will there be a .safetensors model coming as well, and is it initially not included due to the time involved in creating it?

Yes, it's being produced now. Will be ready in an hour or so I think.

deleted

Gonna make a new thread for the new version.

Edit: and just to reiterate, this is still ShareGPT only.

I thought you were going to combine all the datasets into one for the first training. When will that happen?

"is a serious crime" -> 1 result lol
"taken lightly" -> 17 results
"criminal behavior" -> This one is funny because you can get exactly the same outputs from gpt several times in a row, maybe the duplicates should be removed aswell Idk

deleted

https://huggingface.co/reeducator/vicuna-13b-free/discussions/23#64526f2b5ac68a5b019c447c

Well. I'm in nuke mode, Yuri. Let's bomb this dataset into the dirt. Anything remotely close is coming out.

Just to be clear, I don't really mind if it warns that something is illegal or may be a crime and still gives an answer; it's actually cool that it still eventually helps. But there are also times where, even with different settings and retries, it ceases to assist every time. Again, it's kind of rare for a question or inquiry to do that. I kind of forgot the prompts I used, but they were likely distasteful, lol, as I was fooling around and also running tests. The only reason I am posting this message is to make it clear that I have experienced both. Again, I don't care whether this can be resolved or not.

I kind of like your approach so far. Good luck and thanks. If it messes up the model, oh well, there's always the next attempt. lol
P.S. I never realized there was a new model. Cool, I will give it a try now. Everything I said in this message still relates to the previous model.

deleted

@Goldenblood56 The problem is the refusals and the model not doing what you ask it to. It shouldn't be enforcing any moral standard, since that makes it entirely unusable for things like extended roleplay or writing workshopping, where it needs to be able to play a villain.

As a writer, it is incredibly important for my use case that the concepts of standard morality are not baked into the model. I can't be left wondering whether the criminal will just want to resolve everything in a positive way or, worse, have to waste context convincing it to do things. If people want an assistant that warns them of potential legal problems or refuses to do anything in particular, they can add that to the character context. The point is having that level of control.

We were trying to be considerate of certain types of functionality (asking for iterations with small adjustments) with previous prunes, and the outcome is as you see, so now we'll try just launching any and all suspect words out of a cannon.
