Token generation limit?

#1
by fab4ml - opened

Hi, token generation stops after a few paragraphs (fewer than 20 sentences for the prompt below), regardless of the value of max_tokens. Is this the expected behavior?

from mlx_lm import load, generate
model, tokenizer = load("../Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx")
response = generate(model, tokenizer, prompt="what is a star", verbose=True, max_tokens=999999999999999999)

Stepping through the generate function in mlx_lm/utils.py, I see that generation stops because the generated token equals tokenizer.eos_token_id:

for token, _ in zip(generate_step(prompt, model, temp), range(max_tokens)):
    if token == tokenizer.eos_token_id:
        break
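
For reference, one can confirm which token triggers the stop (a quick sketch reusing the model path above; for Mixtral the EOS token is typically "</s>", id 2):

from mlx_lm import load

# Load only the tokenizer side and inspect the stop token.
_, tokenizer = load("../Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx")
print(tokenizer.eos_token_id)                      # typically 2 for Mixtral
print(tokenizer.decode([tokenizer.eos_token_id]))  # typically "</s>"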
MLX Community org

That's the expected behavior. The Instruct model was trained to interact in a conversational manner, so it stops once it emits the end-of-sequence (EOS) token, i.e. when it considers its answer complete. If you just want to generate text as an autocomplete feature, you can try the base model.
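
As a sketch of that suggestion (the path below assumes a 4-bit MLX conversion of the base, non-Instruct, model; substitute whatever conversion you have locally):

from mlx_lm import load, generate

# Hypothetical path to a base (non-Instruct) 4-bit MLX conversion.
model, tokenizer = load("../Mixtral-8x7B-v0.1-hf-4bit-mlx")
response = generate(model, tokenizer, prompt="what is a star", verbose=True, max_tokens=500)

Alternatively, wrapping the prompt in Mixtral's instruction template, e.g. "[INST] what is a star [/INST]", often draws a fuller answer from the Instruct model before it emits EOS.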

Got it, thank you

fab4ml changed discussion status to closed
