ExLlama not working: "shape '[1, 64, 64, 128]' is invalid for input of size 65536" error
I've updated text-generation-webui to the latest version (transformers updated to 4.31.0) and also manually verified that the exllama folder has been updated and contains this commit: https://github.com/turboderp/exllama/commit/b3aea521859b83cfd889c4c00c05a323313b7fee
ExLlama is able to load the model, but when I type something, I get:
Traceback (most recent call last):
File "c:\oobabooga_windows\text-generation-webui\modules\text_generation.py", line 331, in generate_reply_custom
for reply in shared.model.generate_with_streaming(question, state):
File "c:\oobabooga_windows\text-generation-webui\modules\exllama.py", line 98, in generate_with_streaming
self.generator.gen_begin_reuse(ids)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\generator.py", line 186, in gen_begin_reuse
self.gen_begin(in_tokens)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\generator.py", line 171, in gen_begin
self.model.forward(self.sequence[:, :-1], self.cache, preprocess_only = True, lora = self.lora)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 887, in forward
r = self._forward(input_ids[:, chunk_begin : chunk_end],
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 968, in _forward
hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 471, in forward
hidden_states = self.self_attn.forward(hidden_states, cache, buffer, lora)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 389, in forward
key_states = key_states.view(bsz, q_len, self.config.num_attention_heads, self.config.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 64, 64, 128]' is invalid for input of size 65536
Output generated in 0.00 seconds (0.00 tokens/s, 0 tokens, context 65, seed 789726404)
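For what it's worth, the numbers in the error line up with Llama-2-70B's grouped-query attention: 64 query heads but only 8 key/value heads, so the key tensor holds 1 x 64 x 8 x 128 = 65536 elements while the code tries to view it with all 64 heads. A minimal PyTorch sketch of the failing call (assumed shapes, not ExLlama's actual code):

import torch

bsz, q_len, head_dim = 1, 64, 128
num_attention_heads = 64  # query heads in Llama-2-70B
num_key_value_heads = 8   # KV heads under grouped-query attention

# The k_proj of a GQA model only outputs num_key_value_heads * head_dim
# features, so this tensor has 1 * 64 * 8 * 128 = 65536 elements:
key_states = torch.zeros(bsz, q_len, num_key_value_heads * head_dim)

# Viewing it with the full query-head count fails exactly as in the traceback:
try:
    key_states.view(bsz, q_len, num_attention_heads, head_dim)
except RuntimeError as e:
    print(e)  # shape '[1, 64, 64, 128]' is invalid for input of size 65536

# Viewing with the KV head count is what GQA-aware code does instead:
print(key_states.view(bsz, q_len, num_key_value_heads, head_dim).shape)
# torch.Size([1, 64, 8, 128])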
Is anyone able to get ExLlama working?
Thanks
Use the latest version on the main branch of exllama and the latest released version of transformers.
text-generation-webui provides its own exllama wheel, and I don't know if that has been updated yet. Try pip3 uninstall exllama
in the Python environment of text-generation-webui, then launch it again. That will cause exllama to automatically build its kernel extension on model load, so it will definitely include the Llama 70B changes.
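If you want to confirm which copy is being picked up, here is a quick sanity check (a sketch, not part of the webui; assumes you run it inside the webui's Python environment):

import importlib.util

# Show which exllama package Python would import in this environment.
spec = importlib.util.find_spec("exllama")
print(spec.origin if spec else "no exllama wheel installed")
# A path under site-packages means the prebuilt wheel is still present;
# after pip3 uninstall exllama, the extension is built from source on model load.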