Unable to run on V100

#11
by sdascoli - opened

Hi, I'm trying to run on a V100 GPU and am following the recommendation of setting attn_implementation='eager', but this still raises RuntimeError: FlashAttention only supports Ampere GPUs or newer.
Any idea what is going on here?
Thanks!

V100 uses the Volta architecture. Try it on A100, A6000, or A40 GPUs!

I don't have access to those, but the model card says it should work on a V100.

Try running the install as a subprocess?

import os
import subprocess

# Install flash-attn without compiling CUDA kernels; merge into the current
# environment rather than replacing it, so pip still finds PATH etc.
subprocess.run(
    "pip install flash-attn --no-build-isolation",
    env={**os.environ, "FLASH_ATTENTION_SKIP_CUDA_BUILD": "TRUE"},
    shell=True,
    check=True,
)

Flash attention will not work on this type of GPU. My question is why the model still tries to run flash attention when attn_implementation='eager' is set.

Because the model's configuration overrides it; config.json contains:
"_attn_implementation": "flash_attention_2"
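If the checkpoint's config.json pins flash_attention_2, one workaround is to override the loaded config explicitly before instantiating the model, in addition to passing the keyword argument. A minimal sketch using transformers (the model id is a placeholder, not the actual checkpoint from this thread):

```python
from transformers import AutoConfig, AutoModelForCausalLM

def load_with_eager_attention(model_id: str):
    """Load a checkpoint with eager attention forced in both the
    config object and the from_pretrained keyword argument."""
    config = AutoConfig.from_pretrained(model_id)
    # Override the value baked into the checkpoint's config.json
    config._attn_implementation = "eager"
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        config=config,
        attn_implementation="eager",
    )
```

With both the config attribute and the keyword argument set to "eager", the model should not dispatch to FlashAttention kernels even on pre-Ampere GPUs like the V100.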
