Token generation limit?
#1 by fab4ml - opened
Hi, token generation stops after a few paragraphs (fewer than 20 sentences for the prompt below), regardless of the value of max_tokens. Is this the expected behavior?
from mlx_lm import load, generate
model, tokenizer = load("../Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx")
response = generate(model, tokenizer, prompt="what is a star", verbose=True, max_tokens=999999999999999999)
Stepping through the generate function in mlx_lm/utils.py, generation appears to stop because the generated token equals tokenizer.eos_token_id:
for token, _ in zip(generate_step(prompt, model, temp), range(max_tokens)):
    if token == tokenizer.eos_token_id:
        break
That's the expected behavior. The Instruct model was trained to interact in a conversational manner and stops by emitting its end-of-sequence (EOS) token once it considers the answer complete. If you just want to generate text as an autocomplete feature, you can try the base model.
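If you want to watch generation run past EOS for experimentation, one option is to drive generate_step yourself and skip the EOS check. A minimal sketch, assuming the same mlx_lm version quoted above (where generate_step yields one token per step given an mx.array of prompt token ids):

import mlx.core as mx
from mlx_lm import load
from mlx_lm.utils import generate_step

model, tokenizer = load("../Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx")
prompt = mx.array(tokenizer.encode("what is a star"))

tokens = []
# Cap at 500 steps; deliberately no break on tokenizer.eos_token_id.
for token, _ in zip(generate_step(prompt, model, 0.0), range(500)):
    tokens.append(token.item())

print(tokenizer.decode(tokens))

Note that text produced past the EOS token is typically low quality, since the model was trained to stop there.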
Got it, thank you
fab4ml changed discussion status to closed