Unable to run on V100
Hi, I'm trying to run on a V100 GPU and am following the recommendation of setting attn_implementation='eager', but this still returns:

RuntimeError: FlashAttention only supports Ampere GPUs or newer.

Any idea what is going on here?
Thanks!
The V100 uses the Volta architecture, which predates Ampere, so FlashAttention's kernels can't run on it. Try it on A100, A6000, or A40 GPUs!
I don't have access to those, but it says in the model card that it should work on V100.
Try running it as a subprocess?

import os
import subprocess

# FLASH_ATTENTION_SKIP_CUDA_BUILD skips compiling the CUDA kernels,
# so the install itself succeeds on non-Ampere hardware.
# Merge with os.environ so pip still sees PATH and the rest of the environment.
subprocess.run(
    "pip install flash-attn --no-build-isolation",
    env={**os.environ, "FLASH_ATTENTION_SKIP_CUDA_BUILD": "TRUE"},
    shell=True,
)
FlashAttention will not work on this type of GPU. My question is why the model still tries to run FlashAttention when attn_implementation='eager' is set.
Because the model configuration overrides it: the config has "_attn_implementation": "flash_attention_2" baked in.
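If that's what's happening, a possible workaround is to patch the config before the model is instantiated, so the value baked into config.json never gets applied. A minimal sketch, assuming a standard Transformers from_pretrained load; the checkpoint name is a placeholder:

from transformers import AutoConfig, AutoModelForCausalLM

model_id = "org/model"  # placeholder: substitute the actual checkpoint

# Load the config on its own and force eager attention before building
# the model, so the "_attn_implementation" stored in config.json is
# replaced rather than silently used.
config = AutoConfig.from_pretrained(model_id)
config._attn_implementation = "eager"

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)

With the config patched this way, the model should fall back to the plain PyTorch attention path and never try to import flash-attn.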