Minor changes for correct inference
Hello @Lolalb and team!
### Preface
Congratulations on your model release! Very exciting to see another strong encoder out there. I've already started some training runs of my own to experiment! I ran into some minor issues regarding the `transformers` integration, which this PR tackles.
### Pull Request overview
- Update `AutoModel...` in `config.json`
- Add `base_model_prefix="model"` on `PreTrainedModel`
- Cast the `attention_mask` to `bool` in SDPA
- Tag this model as `transformers`-compatible
- Specify that we don't want the `token_type_ids` from the tokenizer
### Details

#### Updating `config.json`
Previously, loading with `AutoModel.from_pretrained` actually loaded the masked language modeling model, and it was not possible to load the sequence classification model. Together with updating the `base_model_prefix` in `model.py`, it's now possible to use `AutoModel.from_pretrained`, `AutoModelForMaskedLM.from_pretrained`, and `AutoModelForSequenceClassification.from_pretrained`.
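As a quick sanity check, all three auto classes should now resolve to their respective heads. A minimal sketch, reusing the `refs/pr/1` revision from the test snippet further down:

```python
from transformers import AutoModel, AutoModelForMaskedLM, AutoModelForSequenceClassification

model_name = "chandar-lab/NeoBERT"
kwargs = dict(trust_remote_code=True, revision="refs/pr/1")

base = AutoModel.from_pretrained(model_name, **kwargs)                          # bare encoder
mlm = AutoModelForMaskedLM.from_pretrained(model_name, **kwargs)                # masked LM head
clf = AutoModelForSequenceClassification.from_pretrained(model_name, **kwargs)  # classification head

print(type(base).__name__, type(mlm).__name__, type(clf).__name__)
```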
#### Cast attention mask to bool

Previously, the attention mask passed to SDPA was an integer tensor, which SDPA doesn't accept. I'm doing some training with this model now, but with a custom data collator, so I'm not relying on your data collator for Flash Attention 2 support.
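For illustration, here's a minimal sketch of the cast with dummy tensors (shapes and values are made up, not taken from the actual modeling code):

```python
import torch
import torch.nn.functional as F

# Dummy tensors purely for illustration: (batch, heads, seq_len, head_dim)
query = torch.randn(1, 8, 16, 64)
key = torch.randn(1, 8, 16, 64)
value = torch.randn(1, 8, 16, 64)

# Padding mask as produced by the tokenizer: an int tensor of 1s (keep) and 0s (pad)
attention_mask = torch.ones(1, 1, 1, 16, dtype=torch.int64)

# SDPA expects a bool mask (or a float additive mask), so cast before the call
output = F.scaled_dot_product_attention(
    query, key, value,
    attn_mask=attention_mask.to(torch.bool),
)
print(output.shape)  # torch.Size([1, 8, 16, 64])
```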
#### No `token_type_ids` from the tokenizer

The BERT tokenizer outputs `token_type_ids` by default. These used to be used to distinguish which sentence was which when you provided sentence pairs, but that has largely fallen out of style. The model doesn't need them, yet the tokenizer still returned them, so this removes that.
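You can check this specific change like so (again with the `refs/pr/1` revision; only `input_ids` and `attention_mask` should remain):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "chandar-lab/NeoBERT", trust_remote_code=True, revision="refs/pr/1"
)
encoded = tokenizer("NeoBERT is the most efficient model of its kind!", return_tensors="pt")

# With this change, no token_type_ids should show up here
print(encoded.keys())
```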
You can test all of this nicely by using the `revision` argument:
```python
from transformers import AutoModel, AutoTokenizer

model_name = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, revision="refs/pr/1")
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, revision="refs/pr/1")

# Tokenize input text
text = "NeoBERT is the most efficient model of its kind!"
inputs = tokenizer(text, return_tensors="pt")

# Generate embeddings
outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)
```
I'll be uploading my first finetune of this model in a few minutes. It looks to be very strong, stronger than the equivalent model I trained with ModernBERT-base.
Edit: Here's the NeoBERT model and my ModernBERT-base baseline, using the same training script:
- https://huggingface.co/tomaarsen/reranker-ModernBERT-base-gooaq-bce-static-retriever-hardest
- https://huggingface.co/tomaarsen/reranker-NeoBERT-gooaq-bce
Interestingly, with these finetuned models, NeoBERT is much stronger in-domain, but much worse out-of-domain.
If there is a lot of interest from the community, it might make sense to introduce `neobert` as an architecture in `transformers`, so that users won't have to use `trust_remote_code` anymore. I do have to preface that we don't have `xformers` in pure `transformers`, so we would need a "manual" SwiGLU instead (something like the sketch below).
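For reference, a "manual" SwiGLU feed-forward block in plain PyTorch could look roughly like this (a sketch only; module and parameter names are hypothetical and not the actual NeoBERT implementation):

```python
import torch
import torch.nn.functional as F
from torch import nn

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward block (SwiGLU) without any xformers dependency."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Gate and up projections feed the gated activation; down projects back to the model dim
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(hidden_states)) * self.up_proj(hidden_states))

# Usage with made-up sizes
ffn = SwiGLU(hidden_size=768, intermediate_size=3072)
print(ffn(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```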
cc @stefan-it as you were also working on NeoBERT, I believe.
- Tom Aarsen
Also, I noticed that throughout your paper, you describe the model as `NeoBERT-medium`. If you imagine ever making a smaller or larger variant of this model, this might be a good time to rename the model to `chandar-lab/NeoBERT-medium`, so you don't shoot yourself in the foot by having this model be the one NeoBERT model. Note that you can then still load the model with `chandar-lab/NeoBERT`, and https://huggingface.co/chandar-lab/NeoBERT should still work.
Hi @tomaarsen, thanks a lot for these modifications and your comments! We're excited to see that NeoBERT is performing well in your experiments. We are considering training other sizes if we do get the necessary compute, in which case we would also remove the `xformers` dependency from those models (unfortunately, it seems tricky to do so for this version).