Unable to run on V100

#11
by sdascoli - opened

Hi, I'm trying to run on a V100 GPU and am following the recommendation of setting attn_implementation='eager', but this still raises RuntimeError: FlashAttention only supports Ampere GPUs or newer.
Any idea what is going on here?
Thanks!

V100 uses the Volta architecture. Try it on A100, A6000, or A40 GPUs!

I don't have access to those, but the model card says it should work on a V100.

Try running the install as a subprocess?

import os
import subprocess

# Install flash-attn without compiling CUDA kernels; merge into the current
# environment rather than replacing it, so pip still finds PATH etc.
subprocess.run(
    "pip install flash-attn --no-build-isolation",
    env={**os.environ, "FLASH_ATTENTION_SKIP_CUDA_BUILD": "TRUE"},
    shell=True,
    check=True,
)

Flash attention will not work on this type of GPU. My question is why the model still tries to run flash attention when attn_implementation='eager' is set.

Because the model's configuration overrides it; config.json contains:
"_attn_implementation": "flash_attention_2"
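If the checkpoint's config.json pins flash_attention_2, one workaround is to override the loaded config explicitly before instantiating the model, in addition to passing the keyword argument. A minimal sketch using transformers (the model id is a placeholder, not the actual checkpoint from this thread):

```python
from transformers import AutoConfig, AutoModelForCausalLM

def load_with_eager_attention(model_id: str):
    """Load a checkpoint with eager attention forced in both the
    config object and the from_pretrained keyword argument."""
    config = AutoConfig.from_pretrained(model_id)
    # Override the value baked into the checkpoint's config.json
    config._attn_implementation = "eager"
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        config=config,
        attn_implementation="eager",
    )
```

With both the config attribute and the keyword argument set to "eager", the model should not dispatch to FlashAttention kernels even on pre-Ampere GPUs like the V100.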
