Needs max_model_len attribute in config.json
#12 · opened by shaily99
vLLM defaults to an 8k context length and cannot use the full 128k context.
This has been discussed in several vLLM issues. It seems a fix was added that allows a higher max model length, as long as there is a max_model_len key in the config.
This key appears to have been added to the unquantized version of Command R, but it has not been added to this quantized repo.
Ref: https://github.com/vllm-project/vllm/issues/3676#issuecomment-2026793168
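In the meantime, a possible workaround (a minimal sketch, not the maintainers' fix) is to pass max_model_len explicitly when constructing the vLLM engine, which overrides the default derived from the config. The repo id and the 131072 value below are illustrative placeholders and should be checked against this repo's model card:

```python
# Sketch of a workaround: override the context length explicitly so vLLM
# does not fall back to its 8k default while config.json lacks max_model_len.
from vllm import LLM, SamplingParams

llm = LLM(
    model="CohereForAI/c4ai-command-r-quantized",  # hypothetical repo id, replace with this repo
    max_model_len=131072,  # assumed 128k context; confirm the weights support it
)

params = SamplingParams(max_tokens=256)
outputs = llm.generate(["Summarize the vLLM context-length issue."], params)
print(outputs[0].outputs[0].text)
```

The OpenAI-compatible server accepts the equivalent `--max-model-len` flag, so the same override can be applied there without touching config.json.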