Warning when loading the model with HuggingFace Transformers
Hi,
When I load the model with "AutoModel.from_pretrained('utter-project/mHuBERT-147')", I get this warning:
Some weights of the model checkpoint at mHUBERT were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at mHUBERT and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Is this expected? I want to use the model to embed audio datasets.
Thanks!
I'm not sure about your warning, but when I load the model from the repo weights (checkpoint_best.pt), it works perfectly.
While you wait for the authors to get back to you, you can give this model a try.
Hello,
Which "AutoModel" class are you using?
Hi again! I think I may have just reproduced your issue by accident in a Docker container on HF Spaces.
It seems conv.weight_g and conv.weight_v were renamed to conv.parametrizations.weight.original0 and conv.parametrizations.weight.original1 in newer transformers versions.
I managed to load the checkpoint correctly with the following environment:
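To make the renaming concrete, here is a minimal sketch of the key mapping on a plain dict. The helper name rename_weight_norm_keys is hypothetical (it is not a transformers or PyTorch function); it assumes the PyTorch parametrization convention where original0 stores the magnitude g and original1 the direction v of a weight-normalized weight. It only illustrates why the names in the two warning lists line up; a later reply in this thread reports that the weights are in fact loaded correctly on torch>=2, so you should not need to run this yourself.

```python
# Old weight_norm leaf names -> parametrization-style names (illustrative mapping).
_RENAMES = {
    "weight_g": "parametrizations.weight.original0",  # magnitude g
    "weight_v": "parametrizations.weight.original1",  # direction v
}


def rename_weight_norm_keys(state_dict):
    """Return a copy of `state_dict` with old weight-norm keys renamed.

    Hypothetical helper for illustration; values are passed through unchanged.
    """
    renamed = {}
    for key, value in state_dict.items():
        prefix, _, leaf = key.rpartition(".")
        if leaf in _RENAMES:
            new_leaf = _RENAMES[leaf]
            key = f"{prefix}.{new_leaf}" if prefix else new_leaf
        renamed[key] = value
    return renamed
```

Applied to the two keys from the first warning, this produces exactly the two key names from the second warning, while unrelated keys such as a bias are left alone.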
numpy==1.26.3
torch==1.13.1
transformers==4.32.0
Posting this here in case someone else experiences this problem with mHuBERT-147:
https://github.com/huggingface/transformers/issues/26796
Surprisingly, this warning is spurious: the weights are loaded correctly on torch>=2.
How do I use this model?
Hi @MonolithFoundation ,
mHuBERT-147 is a multilingual speech representation model.
There is more information on this model here: https://huggingface.co/blog/mzboito/naver-demo-french-slu#mhubert-147-a-compact-multilingual-hubert-model
You can also check the ASR fine-tuning tutorial below.
To simply load the model as it is (not an ASR module, just a speech representation model), use the following:
from transformers import HubertModel
model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
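For the original question about embedding audio datasets, a minimal sketch is to run HubertModel on a raw waveform and mean-pool the frame-level last_hidden_state into one vector per clip. Several things here are assumptions, not statements from the model authors: that input audio is mono and sampled at 16 kHz, that no extra feature-extractor normalization is needed (check whether the repo ships a preprocessor config), and that mHuBERT-147 keeps HubertConfig's default conv_kernel/conv_stride values (check its config.json). The helper names embed_clip and hubert_frame_count are hypothetical. Mean pooling over the last layer is just one common choice; other layers or pooling schemes may suit some tasks better.

```python
def hubert_frame_count(num_samples,
                       kernels=(10, 3, 3, 3, 3, 2, 2),
                       strides=(5, 2, 2, 2, 2, 2, 2)):
    """Number of frames the convolutional feature encoder outputs.

    Defaults are HubertConfig's conv_kernel/conv_stride defaults; we assume
    mHuBERT-147 keeps them.
    """
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # Unpadded 1-D convolution: floor((L - k) / s) + 1
        length = (length - kernel) // stride + 1
    return length


def embed_clip(model, waveform):
    """Mean-pool frame features from a HubertModel into one vector per clip.

    Assumes `waveform` is a mono 1-D float sequence sampled at 16 kHz.
    """
    import torch  # imported here so hubert_frame_count() works without torch

    # HubertModel expects input_values of shape (batch, num_samples).
    input_values = torch.as_tensor(waveform, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        hidden = model(input_values=input_values).last_hidden_state  # (1, frames, hidden)
    return hidden.mean(dim=1).squeeze(0)  # (hidden,)
```

Usage: load the model once with HubertModel.from_pretrained("utter-project/mHuBERT-147").eval(), then call embed_clip(model, waveform) for each clip. Under the default conv settings, one second of 16 kHz audio (16000 samples) gives 49 frames before pooling, i.e. roughly one frame every 20 ms.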
I want to compare audio with the text spoken in it without doing ASR. Is that possible? (Similar to comparing an image and text.)
It is not possible: mHuBERT-147 is a speech-only encoder, so you would need a separate model to handle text.