[Error?] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

#7
by flexai - opened

When running the example from https://huggingface.co/distil-whisper/distil-large-v3#sequential-long-form, I receive the warning "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained."

Is this expected, or does it indicate an error in the setup on my end?
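For context, my loading code follows the model card example and looks roughly like this (paraphrased from memory, so minor details such as dtype/device handling may differ from the exact snippet at the link above):

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v3"

# Load the model weights and the processor (tokenizer + feature extractor)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Build the ASR pipeline for sequential long-form transcription
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)

The warning is printed during this loading step.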

In addition to the loading example, I prepare the model locally during the Docker image build with the following method:

def download_model():
    import os
    import transformers
    from huggingface_hub import snapshot_download

    # Ensure folder exists
    os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
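    # Fetch only the safetensors weights plus the JSON/text config and tokenizer files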
    snapshot_download(
        repo_id="distil-whisper/distil-large-v3",
        allow_patterns=["model.safetensors", "*.json", "*.txt"],
        local_dir=MODEL_CACHE_DIR,
    )
    transformers.utils.move_cache()

Then, when loading, I pass MODEL_CACHE_DIR instead of the model ID string.
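Concretely, the load step looks something like this sketch (MODEL_CACHE_DIR is the same directory used in the download function above):

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Load weights and processor from the locally downloaded snapshot
# instead of pulling "distil-whisper/distil-large-v3" from the Hub.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    MODEL_CACHE_DIR, low_cpu_mem_usage=True, use_safetensors=True
)
processor = AutoProcessor.from_pretrained(MODEL_CACHE_DIR)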

Whisper Distillation org

Hey @flexai, I am not getting the warning when running the example. Do you still face this issue?

Hey boss, I haven't run it since, so let's close this issue until further notice! Btw, thank you for the models; they're hugely valuable.

flexai changed discussion status to closed
