Slow tokenizer problem.
I'm trying to make this work with PrivateGPT:
https://github.com/zylon-ai/private-gpt
Their default local LLM is Mistral-7B-Instruct-v0.2.
With the new tokenizer, I'm getting this error:
Downloading tokenizer mistralai/Mistral-7B-Instruct-v0.3
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Traceback (most recent call last):
File "/home/linux/privateGPT/scripts/setup", line 43, in
AutoTokenizer.from_pretrained(
File "/home/linux/.cache/pypoetry/virtualenvs/private-gpt-AnyLGiqx-py3.11/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 825, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/.cache/pypoetry/virtualenvs/private-gpt-AnyLGiqx-py3.11/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/.cache/pypoetry/virtualenvs/private-gpt-AnyLGiqx-py3.11/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/.cache/pypoetry/virtualenvs/private-gpt-AnyLGiqx-py3.11/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 133, in init
super().init(
File "/home/linux/.cache/pypoetry/virtualenvs/private-gpt-AnyLGiqx-py3.11/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 102, in init
raise ValueError(
ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.
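As far as I can tell, the failure comes from the tokenizer load itself rather than anything else in PrivateGPT. A minimal way to reproduce it outside the setup script would be something like this (assuming transformers is installed and the gated repo has been accepted with a logged-in Hugging Face token):

from transformers import AutoTokenizer

# Same call that PrivateGPT's setup script makes at line 43 of the traceback,
# isolated from the rest of the project.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")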
I'm not very knowledgeable about how these things work, but I'm wondering if there's something that can be done about it.
Thanks!
-Brad
Try pip install sentencepiece.
Yep, but at the same time, for a fast tokenizer this should not be necessary; we'll update this.
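In the meantime, one possible workaround (not something I've verified fixes this particular case) is to request the slow tokenizer explicitly, so the failing slow-to-fast conversion is never attempted; it does still need sentencepiece installed:

from transformers import AutoTokenizer

# use_fast=False asks for the sentencepiece-based slow tokenizer directly
# instead of building a fast tokenizer from it.
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    use_fast=False,
)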
I immediately tried pip install sentencepiece when I first encountered that, and that didn't help.
I found other suggestions of things to install after Googling the error message. They didn't solve the problem either.
I did a deep dive (as deep as I could anyway) into add_prefix_space and came up empty there too.
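In case anyone else wants to dig, one way to see what the repo's tokenizer_config.json actually sets for add_prefix_space is roughly the following (assumes huggingface_hub is installed and access to the gated repo):

import json
from huggingface_hub import hf_hub_download

# Fetch only the tokenizer config from the Hub and print the fields the
# warning above mentions.
path = hf_hub_download("mistralai/Mistral-7B-Instruct-v0.3", "tokenizer_config.json")
with open(path) as f:
    config = json.load(f)
print(config.get("add_prefix_space"), config.get("tokenizer_class"))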
I am encountering the exact same problem when I try to use the Mistral-7B-Instruct-v0.3 model in a Kaggle competition.