Loop at the end of sentences

#2
by MeTRoPol7 - opened

exllama:

Temperature: 0.95
Top-K: off
Top-P: 0.75
Min-P: off
Typical: 0.25

User:
Hello, How are you today?
Chatbot:
I am not sure how to answer that question because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because because

The problem does not happen in AutoGPTQ.

I also tried lm-sys/FastChat with GPTQ-for-LLaMa, and this error appears:
File "C:\ProgramData\Miniconda\envs\cuda-env\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
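That error usually means a float32 tensor hit a layer whose weights were loaded in float16 (or vice versa); `F.linear` refuses mixed dtypes. A minimal sketch of the mismatch and one way to resolve it (this is a hypothetical reproduction, not FastChat's actual code path):

```python
import torch
import torch.nn as nn

# Hypothetical reproduction: weights in float16, activations in float32.
layer = nn.Linear(8, 8).half()   # weights/bias in half precision
x = torch.randn(2, 8)            # input still in float32

mismatch = False
try:
    layer(x)                     # raises RuntimeError: dtype mismatch
except RuntimeError:
    mismatch = True

# One fix is to make both sides use the same dtype, e.g. cast the
# layer back to float32 (casting the input to half also works if
# half matmul is supported on your device):
y = layer.float()(x)
```

In practice this points at the model being loaded in a different precision than the inference code expects, so checking the load dtype in the FastChat/GPTQ-for-LLaMa setup is a reasonable first step.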

Is there any way to set the context length in exllama or FastChat?

I understand that adding "--alpha 4.0" to exllama fixes the problem, but I can't find any equivalent solution for FastChat.
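For context, an alpha flag like this typically applies NTK-aware RoPE scaling, which stretches the usable context by raising the rotary embedding base. A rough sketch of the idea (the formula and the head_dim=128 default are assumptions based on the common NTK-aware implementation, not confirmed exllama internals):

```python
# Sketch of NTK-aware alpha scaling: the rotary base grows with alpha,
# so position frequencies are compressed and longer contexts fit.
# head_dim=128 is an assumption matching LLaMA's attention head size.
def ntk_rope_base(base: float, alpha: float, head_dim: int = 128) -> float:
    return base * alpha ** (head_dim / (head_dim - 2))

# With alpha 4.0, the default base of 10000 roughly quadruples:
scaled = ntk_rope_base(10000.0, 4.0)
```

If FastChat exposes no such flag, patching the rotary embedding base at model-load time would be the equivalent intervention.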
