Error when loading with the Hugging Face "Use in Transformers" code

I loaded the model with the code from "Use in Transformers" and got the following error.

from transformers import AutoProcessor, AutoModelForPreTraining

processor = AutoProcessor.from_pretrained("KBLab/wav2vec2-large-voxrex")

model = AutoModelForPreTraining.from_pretrained("KBLab/wav2vec2-large-voxrex")

Maybe you would like to add information to the repo or change the default code to work directly from Use in Transformers?

OSError: Can't load tokenizer for 'KBLab/wav2vec2-large-voxrex'. If you were trying to load it from '', make sure you don't have a local directory with the same name. Otherwise, make sure 'KBLab/wav2vec2-large-voxrex' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.

National Library of Sweden / KBLab org
edited Oct 19, 2022

wav2vec2-large-voxrex is a different repo, which has neither a vocabulary (vocab.json) nor a tokenizer config file (tokenizer_config.json). You'd need to clone that repo with git and add your own vocabulary manually.
This repo is wav2vec2-large-voxrex-swedish. You can load it for continued pretraining with the existing vocab (edit: continued pretraining doesn't need a vocab, see the comment below this one):

from transformers import AutoProcessor, AutoModelForPreTraining
processor = AutoProcessor.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")
model = AutoModelForPreTraining.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")

or with CTC

from transformers import AutoProcessor, AutoModelForCTC
processor = AutoProcessor.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")
model = AutoModelForCTC.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")

See links below for differences in files they include
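
To see the file difference for yourself, here is a minimal sketch that checks a repo's file listing for the two tokenizer files named above. The helper names are made up for illustration; `list_repo_files` comes from the `huggingface_hub` package and needs network access, so it is imported lazily and only used if you call `list_files`.

```python
# Files a Wav2Vec2CTCTokenizer needs, as named earlier in this thread.
TOKENIZER_FILES = ("vocab.json", "tokenizer_config.json")


def missing_tokenizer_files(repo_files, required=TOKENIZER_FILES):
    """Return the required tokenizer files that are absent from repo_files."""
    present = set(repo_files)
    return [name for name in required if name not in present]


def list_files(repo_id):
    """Fetch the file listing of a Hub repo (requires huggingface_hub and network)."""
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id)
```

For example, `missing_tokenizer_files(list_files("KBLab/wav2vec2-large-voxrex"))` should report both files as missing if the description above still holds for that repo.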

National Library of Sweden / KBLab org
edited Oct 19, 2022

@marma Is it necessary to have vocab during unsupervised pretraining?

If you want to continue pretraining, you may not need a vocab. There is no supervised data to feed the model in that scenario.

Pretraining setup (no processor):

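from transformers import AutoFeatureExtractor, Wav2Vec2ForPreTraining

# The feature extractor only prepares raw audio, so no tokenizer files are required.
feature_extractor = AutoFeatureExtractor.from_pretrained("KBLab/wav2vec2-large-voxrex")
# Load the base checkpoint with its pretraining head.
model = Wav2Vec2ForPreTraining.from_pretrained("KBLab/wav2vec2-large-voxrex")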

Otherwise, if you need to finetune the model yourself, @birgermoell, my suggestion would be to git clone the repo and add all the tokenizer-related files to your cloned folder. Then load the model locally on your computer.

See: for example on creating vocab from scratch, and for finetuning. However, you should just be able to copy over the tokenizer-related files from KBLab/wav2vec2-large-voxrex-swedish to your cloned folder if your purpose is to finetune.
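
If you do go the "vocab from scratch" route, a rough sketch of a character-level vocab builder could look like the following. It assumes you have a list of training transcripts; the function names and the `|`, `[UNK]` and `[PAD]` tokens are assumptions taken from common wav2vec2 finetuning tutorials, and they have to match whatever you later pass to the tokenizer.

```python
import json


def build_char_vocab(transcripts, word_delimiter="|", unk_token="[UNK]", pad_token="[PAD]"):
    """Map every character seen in the transcripts to an integer id."""
    vocab = {}
    for ch in sorted(set("".join(transcripts))):
        # CTC tokenizers usually use a visible token instead of a raw space.
        token = word_delimiter if ch == " " else ch
        vocab.setdefault(token, len(vocab))
    # Special tokens go last so the character ids stay contiguous.
    vocab.setdefault(unk_token, len(vocab))
    vocab.setdefault(pad_token, len(vocab))
    return vocab


def save_vocab(vocab, path="vocab.json"):
    """Write the vocab as UTF-8 JSON, keeping characters like å, ä, ö readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
```

The resulting vocab.json would then go into the cloned folder next to the model files. Copying the existing files from the Swedish repo, as suggested above, is still the simpler option.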

National Library of Sweden / KBLab org
edited Oct 19, 2022

@Lauler You are right, unsupervised pretraining does not need a vocab. The vocab is derived from the speech-text pairs used in finetuning.

My hope was to get the embeddings out of the base model in order to use them for a classification task (which does not depend on transcription, which is why I don't want to use the embeddings from the fine-tuned models). I'm honestly a bit confused about the difference between the two repos.

National Library of Sweden / KBLab org

@birgermoell The only difference is that VoxRex-swedish is a Wav2Vec2ForCTC, i.e. it has a CTC head on top of the pretrained model that has been finetuned for Swedish. My guess is that you want pooled_output or something similar. Maybe this[1] already does that?
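
In case it helps, here is a rough sketch of getting one fixed-size embedding per clip out of the base model. It assumes the base repo ships a feature extractor config (as the pretraining snippet earlier in the thread implies) and that the audio is mono at 16 kHz. `embed_audio` is a made-up helper name, and mean pooling over time is just one simple choice, not necessarily what the linked code does.

```python
def embed_audio(waveform, sampling_rate=16000, repo_id="KBLab/wav2vec2-large-voxrex"):
    """Return a single embedding vector for a mono waveform (1-D float array)."""
    # Imported here so the helper can be defined without torch/transformers loaded.
    import torch
    from transformers import AutoFeatureExtractor, Wav2Vec2Model

    feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id)
    # Wav2Vec2Model is the bare encoder without a CTC head, so no tokenizer is needed.
    model = Wav2Vec2Model.from_pretrained(repo_id)
    model.eval()

    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, hidden_size)

    # Average over the time axis to get one vector per clip.
    return hidden.mean(dim=1).squeeze(0)
```

The returned vector can be fed to any downstream classifier. Loading a pretraining checkpoint into `Wav2Vec2Model` typically prints a warning about unused weights, which is expected because the pretraining-only layers are dropped.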


