Context length is not 128k

#41
by pseudotensor - opened

vLLM uses a default of 8k, and I can't make it use 128k.

https://github.com/vllm-project/vllm/issues/3676
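For reference, a minimal sketch of overriding the context length through vLLM's offline Python API; the model id and the 131072 target are placeholders, and whether it actually loads still depends on the checkpoint's config and available VRAM.

```python
from vllm import LLM, SamplingParams

# Hypothetical model id; substitute the actual repo.
# max_model_len caps the context the engine allocates KV cache for;
# it must not exceed what the model's config / RoPE setup supports.
llm = LLM(
    model="org/model-128k",
    max_model_len=131072,
    gpu_memory_utilization=0.95,
)

params = SamplingParams(max_tokens=256)
print(llm.generate(["Hello"], params)[0].outputs[0].text)
```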

You can, just change the config.json.
But 128k would take over 130 GB of VRAM alone; I can only fit 64k in 96 GB.
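That VRAM figure presumably refers to the KV cache. A rough back-of-the-envelope sketch follows; the layer count, KV-head count, and head dimension are illustrative placeholders, so plug in the values from this model's config.json.

```python
def kv_cache_gib(seq_len, num_layers=40, num_kv_heads=64,
                 head_dim=128, dtype_bytes=2, batch=1):
    """Approximate KV cache size: 2 (K and V) * layers * kv_heads
    * head_dim * dtype bytes per token, times the number of tokens."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return batch * seq_len * per_token / 1024**3

for ctx in (8_192, 65_536, 131_072):
    print(f"{ctx:>7} tokens ~ {kv_cache_gib(ctx):.1f} GiB")
```

With these placeholder dimensions, 128k tokens lands well above 130 GiB while 64k is roughly half that, which matches the ballpark in the comment above.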

As I argue in that vLLM thread, I don't think that's how it should be done. We shouldn't just change the embedding size, since RoPE scaling is used; the scaling should be part of the calculation.
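A sketch of that idea: derive the effective context from both max_position_embeddings and the rope_scaling block instead of hand-editing the embedding size. The field names follow the usual Hugging Face config.json layout, and multiplying by "factor" is a simplification that applies to linear / dynamic-NTK style scaling; other schemes need their own handling.

```python
import json

def effective_context(config_path):
    """Derive a usable context window from config.json rather than
    hard-editing max_position_embeddings."""
    with open(config_path) as f:
        cfg = json.load(f)
    base = cfg.get("max_position_embeddings", 8192)
    rope = cfg.get("rope_scaling") or {}
    # Linear / dynamic-NTK scaling stretch the trained window by "factor".
    factor = rope.get("factor", 1.0)
    return int(base * factor)

# print(effective_context("config.json"))
```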

Based on the discussion in this post, should we use --max-model-len when running vLLM with this model to get the larger context window? By default I am currently getting errors, since it is using 8192 as the length.
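If you want to confirm that the server actually picked up the override (the current flag spelling is --max-model-len), one way is to query the OpenAI-compatible endpoint; recent vLLM builds report the resolved max_model_len in the /v1/models payload, though the exact fields vary by version, so treat this as a sketch.

```python
import json
import urllib.request

# Assumes a local server started with something like:
#   vllm serve <model> --max-model-len 131072
with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    models = json.load(resp)

for m in models.get("data", []):
    # max_model_len is present in recent vLLM versions; fall back gracefully.
    print(m.get("id"), "max_model_len =", m.get("max_model_len", "n/a"))
```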

