ModernBERT-base-zeroshot-v2.0
Model description
This model is answerdotai/ModernBERT-base fine-tuned on the same dataset mix as the zeroshot-v2.0 models in the Zeroshot Classifiers Collection.
General takeaways:
- The model is very fast and memory-efficient. It is several times faster than DeBERTav3 and uses several times less memory. The memory efficiency enables larger batch sizes. I got a ~2x speed increase by enabling bf16 (instead of fp16).
- It performs slightly worse than DeBERTav3 on average on the tasks tested below.
- I'm in the process of preparing a newer version trained on better synthetic data, to make full use of the 8k context window and to update the training mix of the older zeroshot-v2.0 models.
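A minimal usage sketch, assuming the model is loaded through the standard `zero-shot-classification` pipeline in Transformers. The helper names (`pick_dtype_name`, `load_classifier`, `classify`) and the hypothesis template are illustrative assumptions, not part of this model card. The sketch picks bf16 only when the GPU reports support for it, in line with the speed note above.

```python
MODEL_ID = "MoritzLaurer/ModernBERT-base-zeroshot-v2.0"


def pick_dtype_name(bf16_supported: bool) -> str:
    """Return the torch dtype name to load the model in (hypothetical helper)."""
    return "bfloat16" if bf16_supported else "float32"


def load_classifier(model_id: str = MODEL_ID):
    """Build a zero-shot classification pipeline, using bf16 when available."""
    # Imported lazily so the helpers above can be used without torch installed.
    import torch
    from transformers import pipeline

    use_gpu = torch.cuda.is_available()
    bf16_ok = use_gpu and torch.cuda.is_bf16_supported()
    dtype = getattr(torch, pick_dtype_name(bf16_ok))
    return pipeline(
        "zero-shot-classification",
        model=model_id,
        torch_dtype=dtype,
        device=0 if use_gpu else -1,  # -1 runs on CPU
    )


def classify(classifier, text: str, labels: list[str]) -> tuple[str, float]:
    """Return the top label and its score for a single text."""
    result = classifier(
        text,
        candidate_labels=labels,
        # Assumed template; adjust the wording to your task.
        hypothesis_template="This text is about {}",
        multi_label=False,
    )
    return result["labels"][0], result["scores"][0]
```

To try it, call `clf = load_classifier()` and then `classify(clf, "Angela Merkel is a politician in Germany", ["politics", "economy", "entertainment"])`. The first call downloads the model weights.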
Training results
Per-dataset breakdown:
Datasets | Mean | Mean w/o NLI | mnli_m | mnli_mm | fevernli | anli_r1 | anli_r2 | anli_r3 | wanli | lingnli | wellformedquery | rottentomatoes | amazonpolarity | imdb | yelpreviews | hatexplain | massive | banking77 | emotiondair | emocontext | empathetic | agnews | yahootopics | biasframes_sex | biasframes_offensive | biasframes_intent | financialphrasebank | appreviews | hateoffensive | trueteacher | spam | wikitoxic_toxicaggregated | wikitoxic_obscene | wikitoxic_identityhate | wikitoxic_threat | wikitoxic_insult | manifesto | capsotu |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Accuracy | 0.831 | 0.835 | 0.932 | 0.936 | 0.884 | 0.763 | 0.647 | 0.657 | 0.823 | 0.889 | 0.753 | 0.864 | 0.949 | 0.935 | 0.974 | 0.798 | 0.788 | 0.727 | 0.789 | 0.793 | 0.489 | 0.893 | 0.717 | 0.927 | 0.851 | 0.859 | 0.907 | 0.952 | 0.926 | 0.726 | 0.978 | 0.912 | 0.914 | 0.93 | 0.951 | 0.906 | 0.476 | 0.708 |
F1 macro | 0.813 | 0.818 | 0.925 | 0.93 | 0.872 | 0.74 | 0.61 | 0.611 | 0.81 | 0.874 | 0.751 | 0.864 | 0.949 | 0.935 | 0.974 | 0.751 | 0.738 | 0.746 | 0.733 | 0.798 | 0.475 | 0.893 | 0.712 | 0.919 | 0.851 | 0.859 | 0.892 | 0.952 | 0.847 | 0.721 | 0.966 | 0.912 | 0.914 | 0.93 | 0.942 | 0.906 | 0.329 | 0.637 |
Inference text/sec (A100 40GB GPU, batch=128) | 3472.0 | 3474.0 | 2338.0 | 4416.0 | 2993.0 | 2959.0 | 2904.0 | 3003.0 | 4647.0 | 4486.0 | 5032.0 | 4354.0 | 2466.0 | 1140.0 | 1582.0 | 4392.0 | 5446.0 | 5296.0 | 4904.0 | 4787.0 | 2251.0 | 4042.0 | 1884.0 | 4048.0 | 4032.0 | 4121.0 | 4275.0 | 3746.0 | 4485.0 | 1114.0 | 4322.0 | 2260.0 | 2274.0 | 2189.0 | 2085.0 | 2410.0 | 3933.0 | 4388.0 |
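Accuracy and F1 macro diverge on some datasets in the table (for example manifesto and hateoffensive), which typically happens when classes are imbalanced. The sketch below illustrates the difference with plain Python on made-up toy labels. It is not the evaluation code used for this model, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 8 examples of class "a", 2 of class "b"; the model always says "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"f1_macro = {f1_macro(y_true, y_pred):.3f}")
```

Because F1 macro weights every class equally, ignoring a rare class is penalized much more heavily than it is under accuracy.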
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 128
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.06
- num_epochs: 2
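As a rough sketch, the values above could be written as keyword arguments for `transformers.TrainingArguments`, and the linear schedule with a 0.06 warmup ratio can be reproduced in a few lines. Two assumptions are made here: that the batch sizes map to the per-device arguments, and the total step count in the example, which is hypothetical because the card does not state the dataset size.

```python
import math

# Hyperparameters from the list above, as TrainingArguments keyword arguments.
# Assumption: the reported batch sizes are per device.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 128,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.06,
    "num_train_epochs": 2,
}


def linear_lr(step, total_steps, peak_lr=5e-5, warmup_ratio=0.06):
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        factor = step / max(1, warmup_steps)
    else:
        factor = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * factor


# Hypothetical run of 1000 optimizer steps: warmup covers the first 60 steps.
for step in (0, 30, 60, 530, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With Transformers installed, `TrainingArguments(output_dir="out", **HYPERPARAMS)` would build the corresponding configuration, where `"out"` is a placeholder path.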
Framework versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
Model tree for MoritzLaurer/ModernBERT-base-zeroshot-v2.0
Base model
answerdotai/ModernBERT-base