Inference on CPU

#13
by wambugu1738 - opened

How can the model be run on CPU, considering flash_attn doesn't support CPU?

Remove the flash_attn import by patching get_imports:

import os
from unittest.mock import patch
from transformers.dynamic_module_utils import get_imports

def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    # Filter flash_attn out of the imports detected for modeling_florence2.py
    if not str(filename).endswith("modeling_florence2.py"):
        return get_imports(filename)
    imports = get_imports(filename)
    if "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports

Use attn_implementation="sdpa" when loading the model:

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):  # workaround for the unnecessary flash_attn requirement
    model = AutoModelForCausalLM.from_pretrained(
        model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True
    )

With this, you can run inference on a CPU, or on a GPU that doesn't support flash_attn.
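
For reference, here is a minimal end-to-end sketch of CPU inference with this patch, assuming the microsoft/Florence-2-large checkpoint and the processor usage shown in its model card; the image URL and task prompt are placeholders, and torch.float32 is used because float16 kernels are poorly supported on CPU.

import torch
import requests
from PIL import Image
from unittest.mock import patch
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Florence-2-large"  # assumed checkpoint
dtype = torch.float32  # safest dtype on CPU

# fixed_get_imports is the helper defined earlier in this thread
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True
    )
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Placeholder image and task prompt
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<CAPTION>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=prompt, image_size=(image.width, image.height)))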

Thanks, it works now.

wambugu1738 changed discussion status to closed

Thanks, it works now.

Thanks, it works now.
