Token generation limit?
#1 by fab4ml - opened
Hi, token generation stops after a few paragraphs (fewer than 20 sentences for the prompt below), regardless of the value of max_tokens. Is this the expected behavior?
from mlx_lm import load, generate
model, tokenizer = load("../Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx")
response = generate(model, tokenizer, prompt="what is a star", verbose=True, max_tokens=999999999999999999)
Stepping through the generate function in mlx_lm/utils.py, generation appears to stop because the generated token equals tokenizer.eos_token_id:
for token, _ in zip(generate_step(prompt, model, temp), range(max_tokens)):
    if token == tokenizer.eos_token_id:
        break
That's the expected behavior. The Instruct model was trained to interact in a conversational manner and stops by emitting its end-of-sequence (EOS) token once it considers the answer complete. If you just want to generate text as an autocomplete feature, you can try the base model.
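If you want to watch generation run past EOS for experimentation, one option is to drive generate_step yourself and skip the EOS check. A minimal sketch, assuming the same mlx_lm version quoted above (where generate_step yields one token per step given an mx.array of prompt token ids):

import mlx.core as mx
from mlx_lm import load
from mlx_lm.utils import generate_step

model, tokenizer = load("../Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx")
prompt = mx.array(tokenizer.encode("what is a star"))

tokens = []
# Cap at 500 steps; deliberately no break on tokenizer.eos_token_id.
for token, _ in zip(generate_step(prompt, model, 0.0), range(500)):
    tokens.append(token.item())

print(tokenizer.decode(tokens))

Note that text produced past the EOS token is typically low quality, since the model was trained to stop there.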
Got it, thank you
fab4ml changed discussion status to closed