Does not run on CPU

#22
by andmed - opened

Hello

I am running the recommended code:

import transformers
import torch
model_id = "meta-llama/Meta-Llama-3.1-70B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
pipeline("Hey how are you doing today?")

except that I changed device_map="auto" to device_map="cpu".

After loading, the model just does nothing and only emits this message:

Setting pad_token_id to eos_token_id:None for open-end generation.
/Users/Shared/llama/venv/lib/python3.10/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.
warnings.warn(
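For reference, the warning recommends passing max_new_tokens explicitly. A minimal sketch of how that would look with this pipeline (the prompt and the token limit of 50 are illustrative, not from the original post; a 70B model in bfloat16 on CPU will still be extremely slow, so it may only look like it is doing nothing):

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-70B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="cpu",
)
# max_new_tokens gives generation an explicit length limit,
# which also silences the max_length warning above.
output = pipeline("Hey how are you doing today?", max_new_tokens=50)
print(output[0]["generated_text"])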
