Inference on CPU

#13
by wambugu1738 - opened

How can the model be run on CPU, considering flash_attn doesn't support CPU?

Remove the flash_attn import by patching get_imports:

import os
from unittest.mock import patch
from transformers.dynamic_module_utils import get_imports

def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    # Filter flash_attn out of the imports detected for modeling_florence2.py
    if not str(filename).endswith("modeling_florence2.py"):
        return get_imports(filename)
    imports = get_imports(filename)
    if "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports

Use attn_implementation="sdpa" when loading the model:

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):  # workaround for the unnecessary flash_attn requirement
    model = AutoModelForCausalLM.from_pretrained(
        model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True
    )

With this, you can run inference on a CPU, or on a GPU that doesn't support flash_attn.
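
For reference, here is a minimal end-to-end sketch of CPU inference with this patch, assuming the microsoft/Florence-2-large checkpoint and the processor usage shown in its model card; the image URL and task prompt are placeholders, and torch.float32 is used because float16 kernels are poorly supported on CPU.

import torch
import requests
from PIL import Image
from unittest.mock import patch
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Florence-2-large"  # assumed checkpoint
dtype = torch.float32  # safest dtype on CPU

# fixed_get_imports is the helper defined earlier in this thread
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True
    )
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Placeholder image and task prompt
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<CAPTION>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=prompt, image_size=(image.width, image.height)))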

Thanks, it works now.

wambugu1738 changed discussion status to closed

Thanks, it works now.

Thanks, it works now.
