DeepSeek-V2-Lite-Chat: AttributeError: 'DynamicCache' object has no attribute 'get_max_length'

#9
by nikoscham - opened

Hi, I am trying to run the DeepSeek-V2-Lite-Chat example:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Lite-Chat"
# Load the tokenizer and model, trusting the repo's custom modeling code
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
# Build the chat prompt and generate a completion
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

# Decode only the newly generated tokens
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

but I get:

Loading checkpoint shards: 100% 4/4 [00:04<00:00,  1.05it/s]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 14
     10 messages = [
     11     {"role": "user", "content": "Write a piece of quicksort code in C++"}
     12 ]
     13 input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
---> 14 outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
     16 result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
     17 print(result)

File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\torch\utils\_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\transformers\generation\utils.py:2223, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2215     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2216         input_ids=input_ids,
   2217         expand_size=generation_config.num_return_sequences,
   2218         is_encoder_decoder=self.config.is_encoder_decoder,
   2219         **model_kwargs,
   2220     )
   2222     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2223     result = self._sample(
   2224         input_ids,
   2225         logits_processor=prepared_logits_processor,
   2226         stopping_criteria=prepared_stopping_criteria,
   2227         generation_config=generation_config,
   2228         synced_gpus=synced_gpus,
   2229         streamer=streamer,
   2230         **model_kwargs,
   2231     )
   2233 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2234     # 11. prepare beam search scorer
   2235     beam_scorer = BeamSearchScorer(
   2236         batch_size=batch_size,
   2237         num_beams=generation_config.num_beams,
   (...)
   2242         max_length=generation_config.max_length,
   2243     )

File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\transformers\generation\utils.py:3204, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   3199 is_prefill = True
   3200 while self._has_unfinished_sequences(
   3201     this_peer_finished, synced_gpus, device=input_ids.device, cur_len=cur_len, max_length=max_length
   3202 ):
   3203     # prepare model inputs
-> 3204     model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
   3206     # prepare variable output controls (note: some models won't accept all output controls)
   3207     model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})

File ~\.cache\huggingface\modules\transformers_modules\deepseek-ai\DeepSeek-V2-Lite-Chat\85864749cd611b4353ce1decdb286193298f64c7\modeling_deepseek.py:1728, in DeepseekV2ForCausalLM.prepare_inputs_for_generation(self, input_ids, past_key_values, attention_mask, inputs_embeds, **kwargs)
   1726     cache_length = past_key_values.get_seq_length()
   1727     past_length = past_key_values.seen_tokens
-> 1728     max_cache_length = past_key_values.get_max_length()
   1729 else:
   1730     cache_length = past_length = past_key_values[0][0].shape[2]

File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\torch\nn\modules\module.py:1928, in Module.__getattr__(self, name)
   1926     if name in modules:
   1927         return modules[name]
-> 1928 raise AttributeError(
   1929     f"'{type(self).__name__}' object has no attribute '{name}'"
   1930 )

AttributeError: 'DynamicCache' object has no attribute 'get_max_length'
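
As an aside, the attention-mask warning at the top suggests passing an explicit attention_mask. I assume that warning is unrelated to the crash itself, but for completeness, would something like this be the right way to supply it? A minimal sketch, assuming apply_chat_template accepts return_dict=True in this version:

# Sketch: ask apply_chat_template for a dict so it returns an attention_mask
# alongside input_ids, then pass both to generate(). return_dict=True is
# assumed to be supported in recent transformers releases.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)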

Is there a compatibility issue with the Transformers library? I am on version 4.49.0:

pip show transformers
Name: transformers
Version: 4.49.0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: c:\users\thread96\appdata\local\anaconda3\envs\chatbot\lib\site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: llama-index-llms-huggingface, llama-index-llms-openai-like, peft, sentence-transformers, trl
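
If the root cause is that get_max_length() was removed from DynamicCache in recent Transformers releases while the repo's modeling_deepseek.py still calls it, would a shim like the following be a reasonable stopgap until the remote code is updated? A rough sketch, assuming get_max_cache_shape() is the intended replacement:

from transformers.cache_utils import DynamicCache

# Sketch: alias the removed get_max_length() to get_max_cache_shape(), which
# (as I understand it) replaced it. Run this before calling model.generate().
if not hasattr(DynamicCache, "get_max_length"):
    DynamicCache.get_max_length = DynamicCache.get_max_cache_shape

Alternatively, I suppose downgrading to an older Transformers release that still ships get_max_length would sidestep this, at the cost of losing newer features.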

Any help would be appreciated.
Thanks!
Nikos
