ValueError: `rope_scaling` must be a dictionary with two fields

#15
by jsemrau - opened

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

Using the standard script on Huggingface, I get this error message. What needs to be done here?

I am running into this same issue.

Solution: pip install --upgrade transformers
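If the upgrade doesn't seem to take effect, it can help to confirm which transformers version the running interpreter actually sees. A minimal check (the 4.43 threshold comes from later comments in this thread):

import transformers

# Llama 3.1's rope_scaling format is only understood by transformers >= 4.43.
print(transformers.__version__)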

Works! I previously had transformers==4.38.2, and upgrading resolved the rope_scaling error, but it then caused a 'top_k_top_p_filtering' ImportError. For anyone hitting this second error, the solution is pip install --upgrade trl.

In order to run LLaMA 3.1 in the same environment as existing LLaMA 3 deployments, some additional package upgrades might be necessary. I also had to upgrade vLLM, my backend, to use LLaMA 3.1, as it was throwing rope-scaling-related errors as well. If you encounter issues similar to the one described above, keep upgrading the packages that produce errors, and hopefully the issue will be resolved.

I am having the same issue. When attempting to load the model with textgenwebui I get the same kind of error, even though I have updated all requirements/dependencies, including transformers.

Perhaps textgenwebui hasn't been updated. Try filing an issue with textgenwebui.

Which specific vLLM and transformers versions work for LLaMA 3.1?

I have

transformers==4.43.1
vllm==0.5.3.post1

I am still getting the same error even after upgrading the vllm and transformers versions:
ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I found a 'fix', but I'm not sure what the side effects might be. In the model's config.json file, find the entry that says "rope_scaling" and replace it with this:

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

I honestly do not know what these values mean; I just fed in the values the loader said it wanted, and it seems to work.
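If the model was downloaded from the Hub, the config.json to edit lives in the local cache. A rough sketch of locating and patching it with huggingface_hub (the model ID is just an example, and the "dynamic"/8.0 values are the same guess as above, with unknown side effects):

import json
import os
from huggingface_hub import snapshot_download

# Resolve (or download) the local snapshot directory that holds config.json.
local_dir = snapshot_download("meta-llama/Meta-Llama-3.1-8B-Instruct")
config_path = os.path.join(local_dir, "config.json")

with open(config_path) as f:
    config = json.load(f)

# Replace the Llama 3.1 rope_scaling block with the two fields that older
# transformers releases expect.
config["rope_scaling"] = {"type": "dynamic", "factor": 8.0}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

Upgrading transformers is still the cleaner fix; this only papers over the validation error.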

I am still getting the same error even after upgrading the vllm and transformers versions:
ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

If it helps, here is my pip freeze: https://gist.github.com/macsz/4735d3b5265040ffda1220f0b2480acc

I'm also getting this error while loading Llama 3.1 8B Instruct:

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

Solution: pip install --upgrade transformers

Worked for me, ty!

I tried this, and also upgraded the libraries mentioned in this discussion.

I had the same problem; upgrading transformers and pip worked! Do not forget to restart the kernel after upgrading packages.

I've had the same issue and
pip install --upgrade transformers
was enough and worked for me.

Please update both vllm and transformers

pip install --upgrade transformers
pip install --upgrade vllm

I've had the same issue and
pip install --upgrade transformers
was enough and worked for me.

Can you please share your requirements.txt?

Please update both vllm and transformers

pip install --upgrade transformers
pip install --upgrade vllm

still not working for me.

I found a 'fix', but I'm not sure what the side effects might be. In the model's config.json file, find the entry that says "rope_scaling" and replace it with this:

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

I honestly do not know what these values mean; I just fed in the values the loader said it wanted, and it seems to work.

After doing this, I am getting the error below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 44.32 GiB of which 6.96 GiB is free. Process 1447583 has 608.00 MiB memory in use. Process 808909 has 612.00 MiB memory in use. Process 1213235 has 528.00 MiB memory in use. Process 1658457 has 19.74 GiB memory in use. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 15.23 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But Llama 3 loads fine.

I found a 'fix', but I'm not sure what the side effects might be. In the model's config.json file, find the entry that says "rope_scaling" and replace it with this:

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

I honestly do not know what these values mean; I just fed in the values the loader said it wanted, and it seems to work.

After doing this, I am getting the error below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 44.32 GiB of which 6.96 GiB is free. Process 1447583 has 608.00 MiB memory in use. Process 808909 has 612.00 MiB memory in use. Process 1213235 has 528.00 MiB memory in use. Process 1658457 has 19.74 GiB memory in use. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 15.23 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But Llama 3 loads fine.

What was there before editing this?

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

The out-of-memory error is a different problem entirely: you don't have enough free VRAM to run the model. Do you use quantization when running the model?

Everyone using the HuggingFace Estimator in SageMaker 2.226.0 (latest) will have to wait, because it currently supports only the Transformers 4.36.0 image.

"I have transformers==4.43.1"

This works. It should be version 4.43 or later.
https://github.com/huggingface/transformers/releases

pip install transformers==4.43.1

I found a 'fix', but I'm not sure what the side effects might be. In the model's config.json file, find the entry that says "rope_scaling" and replace it with this:

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

I honestly do not know what these values mean; I just fed in the values the loader said it wanted, and it seems to work.

After doing this, I am getting the error below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 44.32 GiB of which 6.96 GiB is free. Process 1447583 has 608.00 MiB memory in use. Process 808909 has 612.00 MiB memory in use. Process 1213235 has 528.00 MiB memory in use. Process 1658457 has 19.74 GiB memory in use. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 15.23 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But Llama 3 loads fine.

LLaMA 3.1 has a larger context window: 128K vs. 8K in LLaMA 3. Try reducing it and see if it works. In vLLM you can set it with the --max-model-len parameter.
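For reference, a minimal sketch of capping the context length through the vLLM Python API (the model ID and the 8K limit are just examples; the server-side equivalent is the --max-model-len flag mentioned above):

from vllm import LLM, SamplingParams

# Cap the context window at 8K instead of the full 128K so vLLM reserves a
# much smaller KV cache at startup.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_model_len=8192)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)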

I found a 'fix', but I'm not sure what the side effects might be. In the model's config.json file, find the entry that says "rope_scaling" and replace it with this:

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

I honestly do not know what these values mean; I just fed in the values the loader said it wanted, and it seems to work.

After doing this, I am getting the error below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 44.32 GiB of which 6.96 GiB is free. Process 1447583 has 608.00 MiB memory in use. Process 808909 has 612.00 MiB memory in use. Process 1213235 has 528.00 MiB memory in use. Process 1658457 has 19.74 GiB memory in use. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 15.23 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But Llama 3 loads fine.

LLaMA 3.1 has a larger context window: 128K vs. 8K in LLaMA 3. Try reducing it and see if it works. In vLLM you can set it with the --max-model-len parameter.

It worked when I did "rope_scaling": null.
I am not sure how this affects the inference results, but it is working now.

None of the above solutions have worked. I have transformers and vllm fully upgraded, and I've tried editing the rope_scaling parameter itself, but I keep running into OOM errors even though I'm running on an A100 80GB. Anyone have any more solutions? I'm not against outside-the-box thinking at this point.

Remove ${HF_HOME} and re-run? Maybe some garbage got cached.

Remove ${HF_HOME} and re-run? Maybe some garbage got cached.

no dice :/

None of the above solutions have worked. I have transformers and vllm fully upgraded, and I've tried editing the rope_scaling parameter itself, but I keep running into OOM errors even though I'm running on an A100 80GB. Anyone have any more solutions? I'm not against outside-the-box thinking at this point.

same problem

Meta Llama org

As @jsemrau mentioned, please make sure that you are on transformers 4.43.2 (or higher) by running pip install --upgrade transformers. This should fix the original issue about rope_scaling. For other issues (like OOM problems), I would suggest opening new issues and providing system details.

Solution: pip install --upgrade transformers

Also works for me, cheers!

How does one do this with the Docker installation of TGI? Do we need to build a separate Dockerfile first that upgrades transformers?

After changing to
"rope_scaling": { "factor": 8.0, "type": "dynamic" },
it proceeds further, then:

... Lib\site-packages\transformers\integrations\awq.py", line 354, in _fuse_awq_mlp
    new_module = target_cls(gate_proj, down_proj, up_proj, activation_fn)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\code\autoawq\awq\modules\fused\mlp.py", line 41, in __init__
    self.linear = awq_ext.gemm_forward_cuda
                  ^^^^^^^
NameError: name 'awq_ext' is not defined

;D

Everyone using SageMaker 2.226.0 (latest) will have to wait, because it currently supports only the Transformers 4.36.0 image.

Does anyone know where I can get a follow-up on this?

I encountered the same issue, and running the following command resolved it for me:

pip install --upgrade transformers==4.43.3

P.S. If you encounter the same issue repeatedly, check whether other libraries are installing different versions of transformers (e.g., bitsandbytes). For the best results, after installing all the libraries, update transformers to version 4.43.3.

I am getting an error after fine-tuning the Llama 3.1 8B Instruct model and deploying it to SageMaker. I configured SageMaker to use HuggingFace Transformers 4.43, and the deployment was successful. However, when I try to test the endpoint, it gives this error. How can I run pip install --upgrade transformers==4.43.2?

Received client error (400) from 3VSBZEPFose1o1Q8vAytfGhMQD1cnCE5T83b with message "{ "code": 400, "type": "InternalServerException", "message": "`rope_scaling` must be a dictionary with with two fields, `type` and `factor`, got {'factor': 8.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}" }"

Edit: Disclaimer -> for AWS.
Changing the image worked for me. I struggled with all the other recommended images, as well as a custom image that upgrades transformers. I can't yet explain it.

This one:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
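For anyone deploying through the SageMaker Python SDK rather than the console, a rough sketch of pinning that image (assuming HuggingFaceModel accepts an explicit image_uri; the role ARN, model artifact, and instance type below are placeholders):

from sagemaker.huggingface import HuggingFaceModel

# Placeholder values; substitute your own role ARN and model artifact.
model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    model_data="s3://my-bucket/llama-3.1-finetuned/model.tar.gz",
    image_uri=(
        "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
        "huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"
    ),
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")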

This error will be resolved if you update the train.py function in axolotl.cli to handle rope_scaling like the following (I did this for my fine-tuning and it worked):

Inject rope_scaling configuration if missing or incomplete:

# Guard against rope_scaling being absent or set to None before checking its keys.
if not getattr(cfg, 'rope_scaling', None) or 'type' not in cfg.rope_scaling or 'factor' not in cfg.rope_scaling:
    LOG.warning("`rope_scaling` not found or incomplete in config, applying defaults.")
    cfg.rope_scaling = {
        "type": "linear",  # You can set it to "dynamic" if that's preferred
        "factor": 8.0
    }

I'm still getting the 'rope_scaling must be a dictionary with two fields, ...' error when using the model Meta-Llama-3.1-8B-Instruct.
I'm running WSL with Ubuntu 22.04, PostgreSQL 16, and postgresml 2.9.3, all on my laptop.
I have upgraded to:
transformers: 4.45.1
vllm: 0.6.1.dev238+ge2c6e0a82

I don't know how/where to apply the above change to "Inject rope_scaling configuration if missing or incomplete".
Has anyone with a similar setup made it work?

Solution:
postgresml installs a virtual environment in /var/lib/postgresml-python/pgml-venv, and even though I had activated it, "pip install" only updated the system Python.
So I did this:
cd /var/lib/postgresml-python/pgml-venv
source bin/activate
sudo /var/lib/postgresml-python/pgml-venv/bin/python -m pip install --upgrade transformers
sudo /var/lib/postgresml-python/pgml-venv/bin/python -m pip install --upgrade vllm

I am just loading Llama-3.1-70B quantized to 4-bit:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",
    trust_remote_code=True,
    use_safetensors=True,
    token=HUGGINGFACE_TOKEN,
    device_map="auto",
    quantization_config=nf4_config
)

but I am still getting this error:
ValueError: rope_scaling must be a dictionary with with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

These are the details:
OS:
-> Ubuntu version: Ubuntu 20.04.6 LTS
-> Kernel: 5.15.0-1068-aws
GPU:
-> NVIDIA-SMI 535.183.01
-> Driver Version: 535.183.01
-> CUDA Version: 12.2
-> 4x NVIDIA A10G => 96 GiB GPU
Packages:
-> vllm: 0.5.5
-> transformers: 4.45.1

Any suggestions?

Are you sure that you are using the right Python environment?

I had this issue while fine-tuning a model in an AWS SageMaker notebook. For others who have had this issue:

The issue there seems to be that SageMaker's most up-to-date integration with Transformers (as of 12th October 2024) is PyTorch 2.1.0 and Transformers 4.36.0.

(You can take a look at the images here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers)

However, Transformers 4.36.0 expects different rope_scaling parameters than later versions of Transformers. You can make the comparison for yourself here:
https://github.com/huggingface/transformers/blob/v4.36.0/src/transformers/models/llama/configuration_llama.py
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/configuration_llama.py

This means that if you train a modern model built alongside a modern version of Transformers, it might auto-generate a config.json file with rope_scaling parameters that don't match the Transformers version you use on SageMaker for other parts of the code, such as endpoint creation (which is where I got my error).

The only fix I know of for this issue is to manually go into where you've saved the trained model and edit the config.json to have only the two parameters Transformers 4.36.0 expects. (This might also be true for earlier versions of Transformers, but I haven't checked. To verify, check the branch your Transformers version sits on, open configuration_llama.py, and see for yourself how rope_scaling is configured.)

But if you are using 4.36.0, it will work if you change 'rope_type' to 'type' and set it to either 'linear' or 'dynamic'. Keep 'factor' as is and delete everything else in rope_scaling.
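Concretely, the edit described above turns the newer auto-generated block into the older two-field form (the 'dynamic' choice here is just one of the two values 4.36.0 accepts):

Generated by newer Transformers:
"rope_scaling": {"factor": 8.0, "low_freq_factor": 1.0, "high_freq_factor": 4.0, "original_max_position_embeddings": 8192, "rope_type": "llama3"}

What 4.36.0 accepts:
"rope_scaling": {"type": "dynamic", "factor": 8.0}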

pip install --upgrade transformers

After the update, it is working for me.

It didn't work after installing and updating transformers/torch/other libraries until I switched from GPU to CPU, so check your environment as well!

I had this issue while fine-tuning a model in an AWS SageMaker notebook. For others who have had this issue:

The issue there seems to be that SageMaker's most up-to-date integration with Transformers (as of 12th October 2024) is PyTorch 2.1.0 and Transformers 4.36.0.

(You can take a look at the images here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers)

However, Transformers 4.36.0 expects different rope_scaling parameters than later versions of Transformers. You can make the comparison for yourself here:
https://github.com/huggingface/transformers/blob/v4.36.0/src/transformers/models/llama/configuration_llama.py
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/configuration_llama.py

This means that if you train a modern model built alongside a modern version of Transformers, it might auto-generate a config.json file with rope_scaling parameters that don't match the Transformers version you use on SageMaker for other parts of the code, such as endpoint creation (which is where I got my error).

The only fix I know of for this issue is to manually go into where you've saved the trained model and edit the config.json to have only the two parameters Transformers 4.36.0 expects. (This might also be true for earlier versions of Transformers, but I haven't checked. To verify, check the branch your Transformers version sits on, open configuration_llama.py, and see for yourself how rope_scaling is configured.)

But if you are using 4.36.0, it will work if you change 'rope_type' to 'type' and set it to either 'linear' or 'dynamic'. Keep 'factor' as is and delete everything else in rope_scaling.

As a workaround, this worked for me; is there any other solution?
