Can you be more specific about how to use 8k context with ExLlama in Oobabooga?
Originally the model card said that for ExLlama you had to manually add the patch, which I couldn't figure out how to do, but you recently updated it with this note:
If you are using exllama, the monkey-patch is built into the engine; please use -cpe to set the scaling factor, i.e. if you are running it at 4k context, pass -cpe 2 -l 4096
That works when running from the command line, but it doesn't say how to pass those parameters to Oobabooga. Adding them to CMD_FLAGS after --loader exllama doesn't work, and --monkey-patch is not a recognized argument. The example also doesn't indicate whether cpe goes up to 4 for 8k context or down to 1, so using this model properly requires prerequisite knowledge the card doesn't provide.
Same question here. I think it is related to exllama itself; presumably someone has a way to add the parameter into ooba.
I just opened a PR that adds these values to ooba so they can be set there.
https://github.com/oobabooga/text-generation-webui/pull/2876
Edit: ignore that PR, ooba just did it now as well lol https://github.com/oobabooga/text-generation-webui/pull/2875
Edit2: it's merged
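To answer the scaling question above: the factor is the target context divided by the native 2048, so 4k context is -cpe 2 and 8k context is -cpe 4 (with -l 8192). With the PR merged, ooba exposes the same two values as max_seq_len and compress_pos_emb (check the exact names against your build). If it helps, here is a minimal sketch of what the compression does to the RoPE positions; it is illustrative only, not exllama's actual code:

```python
# Minimal sketch of linear RoPE position compression ("compress_pos_emb").
# Illustrative only; this is not exllama's real implementation.
import torch

def rope_cos_sin(seq_len: int, head_dim: int, compress: float, base: float = 10000.0):
    # compress = target context / native context, e.g. 8192 / 2048 = 4.0
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Dividing positions by the factor makes 8192 tokens span the same
    # angular range the model saw for 2048 tokens during fine-tuning.
    t = torch.arange(seq_len).float() / compress
    freqs = torch.outer(t, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()

# 8k context on a model fine-tuned with scale 4 (e.g. a SuperHOT 8k merge):
cos, sin = rope_cos_sin(seq_len=8192, head_dim=128, compress=4.0)
```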
I'm using Hugging Face text-generation-inference and running the fp16 version. How do I apply the patch to that?
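For reference, against plain transformers the monkey-patch looks roughly like the sketch below (adapted from the SuperHOT-style patch; the API shown matches transformers around v4.30, and newer versions differ). What I can't tell is whether text-generation-inference would pick this up at all, since it ships its own model implementations rather than using the transformers Llama code:

```python
# Rough sketch of a SuperHOT-style RoPE monkey-patch for plain transformers.
# Class and function names here are illustrative, not an official API.
import torch
import transformers.models.llama.modeling_llama as llama_mod

class ScaledRotaryEmbedding(torch.nn.Module):
    """Drop-in replacement for LlamaRotaryEmbedding that compresses
    positions by a fixed factor (native 2048 -> target 8192 here)."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        self.scale = 1 / 4                    # 2048 / 8192
        max_position_embeddings = 8192        # cache enough positions for 8k
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        # Compress positions so 8192 tokens span the angular range of 2048.
        t = torch.arange(max_position_embeddings, device=device).float() * self.scale
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)
        self.max_seq_len_cached = max_position_embeddings

    def forward(self, x, seq_len=None):
        # Return shapes/dtypes follow transformers ~4.30 LlamaRotaryEmbedding.
        return (
            self.cos_cached[:, :, :seq_len, ...].to(x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(x.dtype),
        )

def replace_llama_rope_with_scaled_rope():
    # Must be called before the model is instantiated.
    llama_mod.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```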