Fix generation with latest transformers

#1

Purpose

  • Fix model generation

Related Issues

Changes

  • The latest transformers release removed support for past_key_values.get_max_length() in favor of past_key_values.get_max_cache_shape() (see the compatibility sketch after this list)
  • Add support for decoding tensors of token ids, which is the typical output of generation (see the second sketch after this list)
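
The cache API change can be bridged with a small compatibility helper. This is only a minimal sketch, and max_cache_length is a hypothetical name rather than code from this PR:

def max_cache_length(past_key_values):
    # Recent transformers releases expose get_max_cache_shape() on cache objects
    # such as DynamicCache and no longer provide get_max_length().
    if hasattr(past_key_values, "get_max_cache_shape"):
        return past_key_values.get_max_cache_shape()
    # Fall back to the old API on older transformers versions.
    # Both methods return None for caches without a fixed maximum length.
    return past_key_values.get_max_length()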
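
For the decoding change, a rough sketch is shown below, assuming the previous decode path only accepted plain Python lists of ids; decode_token_ids is a hypothetical helper, not the exact code in this PR:

import torch

def decode_token_ids(tokenizer, ids):
    # model.generate() typically returns a 2D tensor of shape (batch, seq_len);
    # convert tensors to plain lists before handing them to the tokenizer.
    if isinstance(ids, torch.Tensor):
        ids = ids.tolist()
    if ids and isinstance(ids[0], list):
        return tokenizer.batch_decode(ids)
    return tokenizer.decode(ids)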

Testing

from transformers import AutoModelForCausalLM, AutoTokenizer

# Select model and load it.
MODEL_ID = "moonshotai/Moonlight-16B-A3B"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Confirm generations of the model look sane.
print("\n\n")
print("========== SAMPLE GENERATION ==============")
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
print("==========================================\n\n")