Does not run on CPU
Hello,
I am running the recommended code:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-70B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
pipeline("Hey how are you doing today?")
except that I changed device_map="auto" to device_map="cpu".
After loading, the model does nothing; it only emits this message:

Setting pad_token_id to eos_token_id:None for open-end generation.
/Users/Shared/llama/venv/lib/python3.10/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.
  warnings.warn(
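For what it's worth, generation may not actually be hung: a 70B model in bfloat16 needs roughly 140 GB of RAM, and each token on CPU can take a very long time, so it can look frozen. A minimal sketch for verifying the same pipeline setup works on CPU, assuming a small stand-in model ("gpt2" here, an assumption for illustration, not the model from the report) and setting max_new_tokens explicitly as the warning recommends:

```python
import transformers

# Small stand-in model so the pipeline actually finishes on CPU
# (assumption: "gpt2" substitutes for the 70B model, which is far
# too large for most machines to run on CPU).
pipe = transformers.pipeline(
    "text-generation",
    model="gpt2",
    device_map="cpu",
)

# Passing max_new_tokens bounds generation time and silences the
# "model-agnostic default max_length" warning.
out = pipe("Hey how are you doing today?", max_new_tokens=20)
print(out[0]["generated_text"])
```

If this small model produces output promptly, the setup is fine and the 70B run is simply memory- or compute-bound on CPU rather than broken.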