DocOwl2 requires flash attention unconditionally

#1
by nicozck - opened

DocOwl2 cannot be loaded without flash_attn because the compressor implementation uses flash attention unconditionally.

This prevents DocOwl2 from running on many non-NVIDIA devices. Please consider adding an option to enable or disable flash attention.

https://huggingface.co/mPLUG/DocOwl2/blob/205b9e18b0cb503c9ef0dde1e7b120e6925778d9/visual_compressor.py#L106
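
One possible shape for such an option, as a minimal sketch only: detect whether flash_attn is installed and fall back to another attention path when it is not. The names `flash_attn_available`, `select_attention_backend`, and the `use_flash_attn` flag are hypothetical and are not part of the DocOwl2 code; the fallback itself could be something like PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which is also an assumption about how the compressor might be adapted.

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if the flash_attn package can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def select_attention_backend(use_flash_attn=None) -> str:
    """Pick an attention backend name.

    use_flash_attn is a hypothetical option:
      None  -> auto-detect (use flash_attn only if it is installed)
      True  -> require flash_attn, fail loudly if it is missing
      False -> always use the fallback path
    """
    available = flash_attn_available()
    if use_flash_attn is None:
        return "flash" if available else "sdpa"
    if use_flash_attn:
        if not available:
            raise ImportError("use_flash_attn=True but flash_attn is not installed")
        return "flash"
    # "sdpa" stands for a non-flash fallback, e.g. PyTorch's
    # scaled_dot_product_attention (assumed, not taken from the repo).
    return "sdpa"
```

With something like this, the compressor could branch on the returned name instead of importing flash_attn at module load time, so the model would still load on devices where flash_attn cannot be installed.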
