ModernBERT-large-squad2-v0.1

This model is a fine-tuned version of answerdotai/ModernBERT-large on the rajpurkar/squad_v2 dataset.

The maximum sequence length used during training was 8192 tokens.

Loading the model requires trust_remote_code to be set to True.

from transformers import pipeline

model_name = "praise2112/ModernBERT-large-squad2-v0.1"

# Get predictions (this card states that trust_remote_code=True is required to load the model)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name, trust_remote_code=True)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:

Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.

It is available in the following sizes:

ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.

ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "Why was RoPE used in ModernBERT?"

res = nlp(question=question, context=context, max_seq_len=8192)

# {'score': 0.5530015826225281, 'start': 309, 'end': 334, 'answer': ' for long-context support'}
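The `start` and `end` fields in the result are character offsets into `context`, so the answer text can be recovered by slicing. A minimal sketch follows. The short context and the hand-built result dict are illustrative stand-ins, not real model output, and `extract_span` is a hypothetical helper name.

```python
# Illustrative stand-in values; this is NOT real model output.
context = "Rotary Positional Embeddings (RoPE) for long-context support."
answer = "for long-context support"
start = context.index(answer)

# Same keys as the dict returned by the question-answering pipeline.
result = {"score": 0.5, "start": start, "end": start + len(answer), "answer": answer}


def extract_span(context, result):
    """Slice the answer out of the context using the character offsets."""
    return context[result["start"]:result["end"]]


span = extract_span(context, result)
print(span)
```

Comparing the slice with `result["answer"]` is a quick sanity check that the offsets refer to the same context string that was passed to the pipeline.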

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: AdamW (ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4
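The hyperparameters above imply some simple arithmetic: the total train batch size is the per-device batch size times the gradient accumulation steps, and a linear schedule with a warmup ratio ramps the learning rate up and then decays it to zero. The sketch below works through that under stated assumptions. The device count (1, inferred from 8 × 8 = 64) and the number of training features (a hypothetical placeholder) are not given in this card, and the step counts are simplified rather than a reproduction of the trainer's exact bookkeeping.

```python
import math

# Values taken from the hyperparameter list above.
learning_rate = 1e-05
train_batch_size = 8
gradient_accumulation_steps = 8
num_epochs = 4
warmup_ratio = 0.1

# Assumptions, NOT stated in the model card.
num_devices = 1                # inferred from 8 * 8 = 64
num_train_features = 132_000   # hypothetical placeholder


def effective_batch_size(per_device, accumulation, devices):
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation * devices


def schedule_steps(num_features, batch, epochs, ratio):
    """Simplified total optimizer steps and warmup steps."""
    steps_per_epoch = math.ceil(num_features / batch)
    total = steps_per_epoch * epochs
    return total, math.ceil(total * ratio)


total_batch = effective_batch_size(train_batch_size, gradient_accumulation_steps, num_devices)
total_steps, warmup_steps = schedule_steps(num_train_features, total_batch, num_epochs, warmup_ratio)


def linear_lr(step):
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(total_batch, total_steps, warmup_steps)
```

With the real number of tokenized training features substituted for the placeholder, the same functions give an estimate of how many updates the run performed and where the learning rate peaked.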

Training results

Metric        Value
eval_exact    86.27
eval_f1       89.30
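Assuming eval_exact and eval_f1 are the usual SQuAD-style exact-match and token-level F1 scores reported as percentages (the card does not say how they were computed), the per-example metrics can be sketched as below. The function names are illustrative, and the normalization follows the common SQuAD convention: lowercase, strip punctuation, drop English articles, collapse whitespace.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, pred):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold, pred):
    """Token-level F1 between normalized gold and predicted answers."""
    gold_tokens = normalize_answer(gold).split()
    pred_tokens = normalize_answer(pred).split()
    if not gold_tokens or not pred_tokens:
        # Unanswerable case: full credit only if both are empty.
        return float(gold_tokens == pred_tokens)
    common = collections.Counter(gold_tokens) & collections.Counter(pred_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("for long-context support", " for long-context support")
partial = f1_score("for long-context support", "long-context support")
print(em, partial)
```

A dataset-level score is then the mean of the per-example values multiplied by 100. For questions with several gold answers, the usual convention takes the maximum over the gold answers for each example.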

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 2.20.0
  • Tokenizers 0.21.0