Llama3 Not Running
Hi, I recently downloaded Llama 3 and am trying to run it from VS Code. I've installed all of the prerequisites, and my computer seems to meet the hardware requirements (a Lenovo ThinkPad with 16GB of RAM); however, when I call model.generate, it acts like it's trying to run the model, but nothing is ever generated. I see memory usage spike from 6GB to 13GB, where it hovers until I restart my computer. Any idea as to what is going on?
Here is my code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import os
from dotenv import load_dotenv

load_dotenv()
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the Hugging Face token from the environment variable
huggingface_token = os.getenv("HUGGINGFACE_TOKEN")

# Pass the token here too: the Llama 3 repo is gated
tokenizer = AutoTokenizer.from_pretrained(model_id, token=huggingface_token)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate library
    token=huggingface_token,  # use_auth_token is deprecated in recent transformers
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Explicitly set pad_token to eos_token (128001); Llama 3 ships without a pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=[128001, 128009],  # <|end_of_text|> and <|eot_id|>
    pad_token_id=tokenizer.pad_token_id,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
Any help would be appreciated!

I am having the same issue. Any updates?
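One thing worth checking before concluding it's hung: Meta-Llama-3-8B-Instruct in bfloat16 needs roughly 16GB for the weights alone (8B parameters × 2 bytes), so on a CPU-only 16GB machine the OS starts swapping as soon as the model loads, and generation can slow to minutes per token while looking completely stuck. A quick way to tell slow from hung is to stream tokens as they are produced. Here is a minimal sketch that reuses the tokenizer, model, and input_ids from the code above (TextStreamer is part of transformers):

from transformers import TextStreamer
import torch

# Print each token to stdout the moment it is generated, so a slow
# (swap-bound) run looks visibly different from a truly hung one.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

outputs = model.generate(
    input_ids,
    attention_mask=torch.ones_like(input_ids),  # silences the missing-mask warning
    max_new_tokens=32,  # keep the test short
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
)

If nothing prints after several minutes, the machine is almost certainly thrashing rather than computing.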
Same problem :|
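If you have a CUDA GPU available, loading the model 4-bit quantized is one way to fit it into limited memory. This is a general bitsandbytes recipe rather than anything specific to this thread, and it assumes the bitsandbytes package is installed:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 4-bit quantization cuts the 8B model's weight footprint from
# ~16GB (bf16) down to roughly 5GB; it requires a CUDA-capable GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    token=huggingface_token,  # token loaded from the environment, as in the original post
)

On a CPU-only machine this won't help; there, a GGUF build run through llama.cpp or a smaller model is the more realistic route.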