Update max_position_embeddings
#2
by FlorianJc
If the model can really handle a 128k context size, you should set max_position_embeddings and max_length to 131072.
Otherwise, vLLM rejects any max_model_len > 8192.
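For reference, a minimal sketch of how this surfaces in vLLM (the model name here is a placeholder; vLLM derives its default context window from max_position_embeddings in config.json):

from vllm import LLM

# vLLM caps the context window at the value implied by config.json.
# With the current config (8192), asking for a longer max_model_len raises an error.
llm = LLM(model="your-org/your-model", max_model_len=131072)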
This is the 8k context version, and we use LongLM to extend the context; you can refer to it here.
Note: reusing the Llama model source code means some of the extension code is dropped, but don't worry, it still works fine when used with LongLM.
import torch
from transformers import AutoModelForCausalLM
import SelfExtend  # from the LongLM repo

# Load the 8k base model (model_id is the checkpoint name or path).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)

# Patch the attention layers with SelfExtend (LongLM) to extend the context window.
SelfExtend.apply(
    model,
    group_size=16,
    window_size=512,
    enable_flash_attention=True,
    flash_attention_impl="flash_attn",
)

# Extended context: (8192 - window_size) * group_size + window_size = 123392 tokens.
model.generation_config.max_length = 123392
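After patching, generation beyond the original 8k window works as usual; a minimal usage sketch (the tokenizer loading and long_document prompt are assumptions, not part of the snippet above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
inputs = tokenizer(long_document, return_tensors="pt").to(model.device)  # long_document is a placeholder
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))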