Update needed now that Flash Attention is supported in the original model?
#4 opened by spikezz
When loading the 8-bit version, I get this error:
```
Traceback (most recent call last):
  File "/home/spikezz/Project/text-generation-webui/modules/ui_model_menu.py", line 209, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spikezz/Project/text-generation-webui/modules/models.py", line 85, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spikezz/Project/text-generation-webui/modules/models.py", line 234, in huggingface_loader
    model = LoaderClass.from_pretrained(path_to_model, **params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spikezz/Project/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spikezz/Project/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3233, in from_pretrained
    config = cls._check_and_enable_flash_attn_2(config, torch_dtype=torch_dtype, device_map=device_map)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spikezz/Project/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/modeling_utils.py", line 1267, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: The current architecture does not support Flash Attention 2.0. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
```
But based on the update info at https://huggingface.co/Qwen/Qwen-14B-Chat, the original model now seems to support flash-attention-2. Do we need an update here as well?
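For reference, here is a minimal sketch of the workaround I am considering. It assumes the error comes from the webui loader requesting Flash Attention 2 through transformers' generic `use_flash_attention_2` path (which raises for architectures it doesn't recognize), and that Qwen's custom remote code instead exposes its own `use_flash_attn` config flag, as in the upstream Qwen repos. The `model_path` value is a placeholder for wherever the checkpoint lives locally:

```python
# Sketch: load the model directly, bypassing the loader's generic
# Flash Attention 2 request and relying on Qwen's own remote code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen-14B-Chat"  # placeholder; substitute your local path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    trust_remote_code=True,  # Qwen ships custom modeling code
    load_in_8bit=True,       # the 8-bit quantization path from the report
    use_flash_attn=True,     # assumed Qwen-specific config flag, not the
                             # transformers-level use_flash_attention_2 switch
)
```

If that assumption about the flag is right, the fix here might just be to stop forwarding `use_flash_attention_2=True` for this architecture and let the remote code pick up flash-attn on its own.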