
license: apache-2.0
base_model: google/bigbird-roberta-base
tags:
  - generated_from_trainer
model-index:
  - name: bigbird-roberta-base-fineweb-edu-llama3-annotations-4096-vN
    results: []

bigbird-roberta-base-fineweb-edu-llama3-annotations-4096-vN

This model is a fine-tuned version of google/bigbird-roberta-base on the HuggingFaceFW/fineweb-edu-llama3-annotations dataset. It achieves the following results on the evaluation set:

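Loss: 0.2176 (equal to the MSE, as it is at every evaluation step in the training results table)
Mse (mean squared error): 0.2176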
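Because the evaluation loss is a mean squared error, its square root (RMSE) expresses the error in the same units as the regression target. The sketch below assumes the model outputs one scalar score per document. The helper names `mse` and `rmse` are illustrative and not part of any library:

```python
import math
from typing import Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length sequences."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def rmse(mse_value: float) -> float:
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse_value)


# Reported evaluation MSE for this model.
eval_mse = 0.2176
print(f"RMSE = {rmse(eval_mse):.3f}")  # RMSE = 0.466
```

An evaluation MSE of 0.2176 corresponds to an RMSE of about 0.47. Predictions therefore deviate from the annotated targets by roughly 0.47 in root-mean-square terms.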

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 4
seed: 90085
gradient_accumulation_steps: 32
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-09
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1.0
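On a single device, the effective batch size is train_batch_size × gradient_accumulation_steps = 4 × 32 = 128, which matches total_train_batch_size. The sketch below also approximates the `linear` scheduler with `lr_scheduler_warmup_ratio: 0.05`. It follows the formula used by `transformers.get_linear_schedule_with_warmup`, with warmup steps computed as `ceil(warmup_ratio × total_steps)`.

Two values are assumptions rather than logged settings. The single-device count (`num_devices = 1`) is assumed. The total of about 3,469 optimizer steps is estimated from the Epoch column of the training results (step 3400 at epoch 0.9801).

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 1e-05
train_batch_size = 4
gradient_accumulation_steps = 32
warmup_ratio = 0.05
num_devices = 1  # assumption: single-device run

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: steps per epoch estimated as step / epoch from the results table.
total_steps = round(3400 / 0.9801)  # ≈ 3469
warmup_steps = math.ceil(warmup_ratio * total_steps)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print(total_train_batch_size, total_steps, warmup_steps)  # 128 3469 174
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions, the learning rate climbs to its 1e-05 peak over about 174 steps. It then decays linearly to zero by the end of the single epoch.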

Training results

Training Loss	Epoch	Step	Validation Loss	Mse
0.4763	0.0288	100	0.4468	0.4468
0.3078	0.0577	200	0.3130	0.3130
0.3088	0.0865	300	0.2695	0.2695
0.2379	0.1153	400	0.2618	0.2618
0.289	0.1441	500	0.2583	0.2583
0.3049	0.1730	600	0.2723	0.2723
0.2292	0.2018	700	0.2477	0.2477
0.2677	0.2306	800	0.2369	0.2369
0.3181	0.2594	900	0.2307	0.2307
0.2551	0.2883	1000	0.2411	0.2411
0.2743	0.3171	1100	0.2350	0.2350
0.2383	0.3459	1200	0.2424	0.2424
0.2191	0.3747	1300	0.2279	0.2279
0.2431	0.4036	1400	0.2232	0.2232
0.2161	0.4324	1500	0.2307	0.2307
0.2459	0.4612	1600	0.2246	0.2246
0.2403	0.4900	1700	0.2232	0.2232
0.251	0.5189	1800	0.2421	0.2421
0.2565	0.5477	1900	0.2207	0.2207
0.2274	0.5765	2000	0.2294	0.2294
0.2272	0.6053	2100	0.2192	0.2192
0.2668	0.6342	2200	0.2204	0.2204
0.2434	0.6630	2300	0.2196	0.2196
0.2464	0.6918	2400	0.2185	0.2185
0.2338	0.7206	2500	0.2166	0.2166
0.243	0.7495	2600	0.2165	0.2165
0.1891	0.7783	2700	0.2201	0.2201
0.2355	0.8071	2800	0.2167	0.2167
0.2231	0.8359	2900	0.2168	0.2168
0.2274	0.8648	3000	0.2243	0.2243
0.2287	0.8936	3100	0.2203	0.2203
0.261	0.9224	3200	0.2186	0.2186
0.2187	0.9512	3300	0.2176	0.2176
0.2069	0.9801	3400	0.2178	0.2178
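The Validation Loss column can be scanned programmatically to find the best-scoring evaluation step. The sketch below assumes the table is available as tab-separated text with the header shown above. Only an excerpt (steps 2400 to 2700) is embedded, and the function names `parse_rows` and `best_eval_row` are illustrative:

```python
from typing import Dict, List

# Excerpt of the tab-separated training-results table above.
TABLE = """Training Loss\tEpoch\tStep\tValidation Loss\tMse
0.2464\t0.6918\t2400\t0.2185\t0.2185
0.2338\t0.7206\t2500\t0.2166\t0.2166
0.243\t0.7495\t2600\t0.2165\t0.2165
0.1891\t0.7783\t2700\t0.2201\t0.2201"""


def parse_rows(table: str) -> List[Dict[str, float]]:
    """Turn a tab-separated table with a header line into a list of dicts."""
    header, *lines = table.strip().splitlines()
    columns = header.split("\t")
    return [dict(zip(columns, map(float, line.split("\t")))) for line in lines]


def best_eval_row(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row["Validation Loss"])


rows = parse_rows(TABLE)
best = best_eval_row(rows)
print(int(best["Step"]), best["Validation Loss"])  # 2600 0.2165
```

Across all rows of the table, the lowest validation loss is 0.2165 at step 2600. The value reported at the top of this card (0.2176) matches the step-3300 evaluation.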

Framework versions

Transformers 4.42.3
Pytorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1
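To reproduce this environment, you can compare the installed package versions against the ones listed above. The sketch below uses the standard-library `importlib.metadata`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`; PyTorch is published as `torch`, and the `+cu121` suffix denotes a CUDA 12.1 build. The function name `compare_versions` is illustrative:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions" (PyTorch's distribution name is "torch").
EXPECTED = {
    "transformers": "4.42.3",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected, installed); installed is None if absent."""
    report: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

Whether the `+cu121` local-version suffix appears depends on the wheel index PyTorch was installed from. A `torch` mismatch on that suffix alone may therefore be harmless.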