Trust remote code = True, even for fine-tuned local model?

#19

by eMbeddr - opened Oct 28, 2024

Oct 28, 2024

Hi guys,

From the SBERT documentation, I got the impression that I only needed to set trust_remote_code to True when using your model "directly" from the Hub. Yet when I fine-tune it and save the model using model.save(output_dir), I can only load it with trust_remote_code = True (while also setting local_files_only = True).

When I try to ditch trust_remote_code I get a warning, and the embeddings become meaningless.

Any clues or things I should try?:))

tomaarsen

Oct 28, 2024

Hello!

Indeed - you're encountering this behaviour because the modeling code files are stored outside of this repository, in https://huggingface.co/jinaai/jina-bert-implementation
They are loaded with these lines: https://huggingface.co/jinaai/jina-embeddings-v2-small-en/blob/main/config.json#L9-L12

To be able to ditch trust_remote_code, you must update the "auto_map" entries in the config.json of your local model so that they point at local files instead of the external repository, e.g.:

    "AutoConfig": "configuration_bert.JinaBertConfig",
    "AutoModelForMaskedLM": "modeling_bert.JinaBertForMaskedLM",
    "AutoModel": "modeling_bert.JinaBertModel",
    "AutoModelForSequenceClassification": "modeling_bert.JinaBertForSequenceClassification"

And then download https://huggingface.co/jinaai/jina-bert-implementation/blob/main/modeling_bert.py and https://huggingface.co/jinaai/jina-bert-implementation/blob/main/configuration_bert.py and place them in the model repository. Then you won't need trust_remote_code.
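The two steps above could be scripted roughly as follows. This is only a sketch, not something from the thread: the helper names strip_repo_prefix and localize_model_code are hypothetical, and it assumes that the saved model keeps config.json at the top level of model_dir and that the existing "auto_map" values use the "org/repo--module.Class" form.

```python
import json
from pathlib import Path


def strip_repo_prefix(auto_map: dict) -> dict:
    """Turn 'org/repo--module.Class' into 'module.Class'.

    Values without a '--' separator, and non-string values, are left as-is.
    """
    return {
        key: value.split("--", 1)[-1] if isinstance(value, str) else value
        for key, value in auto_map.items()
    }


def localize_model_code(
    model_dir: str,
    code_repo: str = "jinaai/jina-bert-implementation",
) -> None:
    """Copy the modeling files next to config.json and rewrite auto_map.

    Assumption: config.json lives directly in model_dir.
    """
    # Imported here so the pure helper above works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    model_path = Path(model_dir)

    # Step 1: download the two code files into the local model directory.
    for filename in ("modeling_bert.py", "configuration_bert.py"):
        hf_hub_download(repo_id=code_repo, filename=filename, local_dir=model_path)

    # Step 2: point auto_map at the local files instead of the Hub repository.
    config_path = model_path / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["auto_map"] = strip_repo_prefix(config.get("auto_map", {}))
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Calling localize_model_code(output_dir) once after model.save(output_dir) would apply both steps. Note that a later reply in this thread reports the weights still being reset with trust_remote_code=False after following these instructions, so it is worth comparing embeddings against the trust_remote_code=True version before relying on it.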

Tom Aarsen

eMbeddr

Oct 29, 2024

Aha! Then it makes sense - thanks for the quick reply. And thanks for the solid work you're doing!:))

dweb

Dec 5, 2024

@tomaarsen I followed your instructions, and yet my parameters still get reset when I set trust_remote_code=False. I tried to read more into exactly what this parameter is supposed to mean, and it seems to cover any code which isn't internal to the transformers library. Although these .py files are now static and under my control in my repo when I do as you instruct, isn't trust_remote_code=True still necessary since the code isn't internal to transformers?

tomaarsen

Dec 5, 2024

Normally speaking I would imagine that trust_remote_code is no longer needed, no. But it's possible that some of the codebase adopts "trust_remote_code means any custom non-transformers code" somewhere, so then you'd still need it. Having said that, I feel like it shouldn't be necessary. I have no clue why it would reset the weights either, that's a bit odd.

Tom Aarsen
