DeepSeek-V2-Lite-Chat: AttributeError: 'DynamicCache' object has no attribute 'get_max_length'
#9
by nikoscham - opened
Hi, I am trying to run the DeepSeek-V2-Lite-Chat example:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
model_name = "deepseek-ai/DeepSeek-V2-Lite-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id
messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
but I get:
Loading checkpoint shards: 100% 4/4 [00:04<00:00, 1.05it/s]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[1], line 14
10 messages = [
11 {"role": "user", "content": "Write a piece of quicksort code in C++"}
12 ]
13 input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
---> 14 outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
16 result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
17 print(result)
File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\torch\utils\_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\transformers\generation\utils.py:2223, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
2215 input_ids, model_kwargs = self._expand_inputs_for_generation(
2216 input_ids=input_ids,
2217 expand_size=generation_config.num_return_sequences,
2218 is_encoder_decoder=self.config.is_encoder_decoder,
2219 **model_kwargs,
2220 )
2222 # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2223 result = self._sample(
2224 input_ids,
2225 logits_processor=prepared_logits_processor,
2226 stopping_criteria=prepared_stopping_criteria,
2227 generation_config=generation_config,
2228 synced_gpus=synced_gpus,
2229 streamer=streamer,
2230 **model_kwargs,
2231 )
2233 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
2234 # 11. prepare beam search scorer
2235 beam_scorer = BeamSearchScorer(
2236 batch_size=batch_size,
2237 num_beams=generation_config.num_beams,
(...)
2242 max_length=generation_config.max_length,
2243 )
File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\transformers\generation\utils.py:3204, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
3199 is_prefill = True
3200 while self._has_unfinished_sequences(
3201 this_peer_finished, synced_gpus, device=input_ids.device, cur_len=cur_len, max_length=max_length
3202 ):
3203 # prepare model inputs
-> 3204 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
3206 # prepare variable output controls (note: some models won't accept all output controls)
3207 model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
File ~\.cache\huggingface\modules\transformers_modules\deepseek-ai\DeepSeek-V2-Lite-Chat\85864749cd611b4353ce1decdb286193298f64c7\modeling_deepseek.py:1728, in DeepseekV2ForCausalLM.prepare_inputs_for_generation(self, input_ids, past_key_values, attention_mask, inputs_embeds, **kwargs)
1726 cache_length = past_key_values.get_seq_length()
1727 past_length = past_key_values.seen_tokens
-> 1728 max_cache_length = past_key_values.get_max_length()
1729 else:
1730 cache_length = past_length = past_key_values[0][0].shape[2]
File ~\AppData\Local\anaconda3\envs\chatbot\lib\site-packages\torch\nn\modules\module.py:1928, in Module.__getattr__(self, name)
1926 if name in modules:
1927 return modules[name]
-> 1928 raise AttributeError(
1929 f"'{type(self).__name__}' object has no attribute '{name}'"
1930 )
AttributeError: 'DynamicCache' object has no attribute 'get_max_length'
Is there a compatibility issue with the Transformers library? From the traceback, the failing call is in the model's bundled modeling_deepseek.py (loaded via trust_remote_code), which still calls past_key_values.get_max_length(). I have transformers version 4.49.0:
pip show transformers
Name: transformers
Version: 4.49.0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: c:\users\thread96\appdata\local\anaconda3\envs\chatbot\lib\site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: llama-index-llms-huggingface, llama-index-llms-openai-like, peft, sentence-transformers, trl
Any help would be appreciated.
Thanks!
Nikos
I found a solution - see here: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat/discussions/8
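For anyone who lands here without following the link: a minimal workaround sketch, under the assumption (consistent with the traceback above) that newer transformers releases removed DynamicCache.get_max_length in favor of get_max_cache_shape, while the model's bundled modeling_deepseek.py still calls the old name. Aliasing the method back before generating avoids editing the remote code:

from transformers.cache_utils import DynamicCache

# Workaround sketch, not an official fix: restore the removed method as an
# alias. In the versions I checked, both calls return the maximum cache
# length (None for a dynamic cache), so the alias should be
# behavior-preserving.
if not hasattr(DynamicCache, "get_max_length"):
    DynamicCache.get_max_length = DynamicCache.get_max_cache_shape

Run this once before model.generate. Alternatives: pin an older library version (e.g. pip install "transformers<4.49", untested here) or edit the cached modeling_deepseek.py under ~/.cache/huggingface/modules to call get_max_cache_shape() directly. Separately, the attention-mask warning at the top of the log can be silenced by tokenizing with apply_chat_template(..., return_dict=True) and calling model.generate(**inputs.to(model.device), max_new_tokens=100) so the mask is passed along.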