Query on `vocab_size` in config.json for Inference
#2 · opened by MatrixC7
Greetings!
Really appreciate the outstanding performance of this model – thank you for your hard work! I have a minor query regarding the `vocab_size` specified in `config.json`. Should it be set to 45440 instead of the current 45416 to reflect the actual size? Keeping 45416 leads to an error during inference with an exllamav2 quantization, as shown below:
```
ERROR: Traceback (most recent call last):
ERROR:   File "F:\tabbyAPI\main.py", line 460, in generator
ERROR:     for part, prompt_tokens, completion_tokens in new_generation:
ERROR:   File "F:\tabbyAPI\backends\exllamav2\model.py", line 741, in generate_gen
ERROR:     chunk, eos, tokens, _, _ = self.generator.stream()
ERROR:                                ^^^^^^^^^^^^^^^^^^^^^^^
ERROR:   File "C:\Users\i\scoop\apps\mambaforge\current\envs\tabbyapi-test\Lib\site-packages\exllamav2\generator\streaming.py", line 117, in stream
ERROR:     chunk, eos, chunk_token_ids, probs, logits = self._stream()
ERROR:                                                  ^^^^^^^^^^^^^^
ERROR:   File "C:\Users\i\scoop\apps\mambaforge\current\envs\tabbyapi-test\Lib\site-packages\exllamav2\generator\streaming.py", line 196, in _stream
ERROR:     self.held_logits = torch.cat([self.held_logits, next_logits], dim = 0)
ERROR:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 45416 but got size 45440 for tensor number 1 in the list.
```
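For anyone who wants to verify the mismatch on their own copy, one option is to compare the `vocab_size` in `config.json` with the first dimension of the output head stored in the checkpoint. A minimal sketch, assuming a single-file `model.safetensors` and the tensor name `lm_head.weight` (both may differ for this model):

```python
# Sanity check: does config.json's vocab_size match the checkpoint's
# actual output dimension? The file name "model.safetensors" and the
# tensor name "lm_head.weight" are assumptions for illustration.
import json

from safetensors import safe_open

with open("config.json") as f:
    config = json.load(f)

with safe_open("model.safetensors", framework="pt") as tensors:
    # get_slice only reads the header, so the weights stay on disk
    rows = tensors.get_slice("lm_head.weight").get_shape()[0]

print(f"config vocab_size:     {config['vocab_size']}")
print(f"checkpoint logit rows: {rows}")
if rows != config["vocab_size"]:
    print("Mismatch: the checkpoint's vocabulary dimension is larger "
          "than vocab_size (likely padding).")
```

On this checkpoint the two numbers should come out as 45416 and 45440, matching the sizes reported in the traceback above.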
Kind regards,
Fangru Shao
The problem came from exllamav2, and @turboderp has fixed it! 🥳 No need to change 45416 to make the quants work!
MatrixC7 changed discussion status to closed