---
license: apache-2.0
base_model: google/bigbird-roberta-base
tags:
  - generated_from_trainer
model-index:
  - name: bigbird-roberta-base-fineweb-edu-llama3-annotations-4096-vN
    results: []
---


# bigbird-roberta-base-fineweb-edu-llama3-annotations-4096-vN

This model is a fine-tuned version of [google/bigbird-roberta-base](https://huggingface.co/google/bigbird-roberta-base) on the [HuggingFaceFW/fineweb-edu-llama3-annotations](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu-llama3-annotations) dataset. It achieves the following results on the evaluation set:

- Loss: 0.2176
- MSE: 0.2176 (loss and MSE coincide because the model is trained as a single-output regressor with an MSE objective)

## Model description

Judging from the base model, dataset, and MSE objective, this is a long-context (4096-token) text-quality regressor: `google/bigbird-roberta-base` fine-tuned to predict the educational-value scores that Llama 3 assigned to web documents in the FineWeb-Edu annotation set.

## Intended uses & limitations

The model is suited to scoring the educational quality of English web text, in the spirit of the FineWeb-Edu quality classifier. It has only been evaluated via the validation MSE reported above, so downstream use should include its own validation.
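As a rough usage sketch (assuming the checkpoint exposes a standard single-label `AutoModelForSequenceClassification` regression head; the repo id below is illustrative, not confirmed):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative repo id -- substitute the actual path of this checkpoint.
model_id = "pszemraj/bigbird-roberta-base-fineweb-edu-llama3-annotations-4096-vN"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Photosynthesis is the process by which plants convert light into chemical energy."
# BigBird handles sequences up to 4096 tokens, matching the "-4096-" in the model name.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    # A regression head emits a single logit; read it out as the predicted score.
    score = model(**inputs).logits.squeeze().item()

print(f"predicted educational-quality score: {score:.3f}")
```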

## Training and evaluation data

Training and evaluation use [HuggingFaceFW/fineweb-edu-llama3-annotations](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu-llama3-annotations), a collection of web documents paired with educational-value scores annotated by Llama 3; a loading snippet follows.
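A minimal loading sketch, assuming the dataset is public and loads with the standard `datasets` API (check the dataset card for the exact column schema):

```python
from datasets import load_dataset

ds = load_dataset("HuggingFaceFW/fineweb-edu-llama3-annotations", split="train")

print(ds)    # features and row count
print(ds[0]) # one annotated document
```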

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):

- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 90085
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-09
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
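For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments` (the output path and the regression-head setup are assumptions, not taken from the original run; tokenization and the `Trainer` call are omitted):

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Single-logit regression head; with problem_type="regression" the Trainer
# applies an MSE loss, consistent with the Loss == MSE numbers reported above.
model = AutoModelForSequenceClassification.from_pretrained(
    "google/bigbird-roberta-base",
    num_labels=1,
    problem_type="regression",
)

args = TrainingArguments(
    output_dir="bigbird-fineweb-edu-regressor",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    seed=90085,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-9,
)
```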

### Training results

| Training Loss | Epoch  | Step | Validation Loss | MSE    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 0.4763        | 0.0288 | 100  | 0.4468          | 0.4468 |
| 0.3078        | 0.0577 | 200  | 0.3130          | 0.3130 |
| 0.3088        | 0.0865 | 300  | 0.2695          | 0.2695 |
| 0.2379        | 0.1153 | 400  | 0.2618          | 0.2618 |
| 0.289         | 0.1441 | 500  | 0.2583          | 0.2583 |
| 0.3049        | 0.1730 | 600  | 0.2723          | 0.2723 |
| 0.2292        | 0.2018 | 700  | 0.2477          | 0.2477 |
| 0.2677        | 0.2306 | 800  | 0.2369          | 0.2369 |
| 0.3181        | 0.2594 | 900  | 0.2307          | 0.2307 |
| 0.2551        | 0.2883 | 1000 | 0.2411          | 0.2411 |
| 0.2743        | 0.3171 | 1100 | 0.2350          | 0.2350 |
| 0.2383        | 0.3459 | 1200 | 0.2424          | 0.2424 |
| 0.2191        | 0.3747 | 1300 | 0.2279          | 0.2279 |
| 0.2431        | 0.4036 | 1400 | 0.2232          | 0.2232 |
| 0.2161        | 0.4324 | 1500 | 0.2307          | 0.2307 |
| 0.2459        | 0.4612 | 1600 | 0.2246          | 0.2246 |
| 0.2403        | 0.4900 | 1700 | 0.2232          | 0.2232 |
| 0.251         | 0.5189 | 1800 | 0.2421          | 0.2421 |
| 0.2565        | 0.5477 | 1900 | 0.2207          | 0.2207 |
| 0.2274        | 0.5765 | 2000 | 0.2294          | 0.2294 |
| 0.2272        | 0.6053 | 2100 | 0.2192          | 0.2192 |
| 0.2668        | 0.6342 | 2200 | 0.2204          | 0.2204 |
| 0.2434        | 0.6630 | 2300 | 0.2196          | 0.2196 |
| 0.2464        | 0.6918 | 2400 | 0.2185          | 0.2185 |
| 0.2338        | 0.7206 | 2500 | 0.2166          | 0.2166 |
| 0.243         | 0.7495 | 2600 | 0.2165          | 0.2165 |
| 0.1891        | 0.7783 | 2700 | 0.2201          | 0.2201 |
| 0.2355        | 0.8071 | 2800 | 0.2167          | 0.2167 |
| 0.2231        | 0.8359 | 2900 | 0.2168          | 0.2168 |
| 0.2274        | 0.8648 | 3000 | 0.2243          | 0.2243 |
| 0.2287        | 0.8936 | 3100 | 0.2203          | 0.2203 |
| 0.261         | 0.9224 | 3200 | 0.2186          | 0.2186 |
| 0.2187        | 0.9512 | 3300 | 0.2176          | 0.2176 |
| 0.2069        | 0.9801 | 3400 | 0.2178          | 0.2178 |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1