bert-base-uncased fine-tuned on the IMDB dataset.
The evaluation (dev) set was created by taking 1000 samples from the test set, leaving 24000 samples for testing.
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    dev: Dataset({
        features: ['text', 'label'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 24000
    })
})
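A minimal sketch of how such a split could be produced, assuming the 1000 dev samples were drawn at random from the 25000-row IMDB test split. The seed, the helper names split_dev_from_test and build_splits, and the use of random sampling are assumptions for illustration; the source does not say how the samples were chosen.

```python
import random


def split_dev_from_test(n_test, dev_size=1000, seed=42):
    """Partition range(n_test) into disjoint (dev_indices, test_indices).

    The seed is a hypothetical choice, not taken from the original run.
    """
    indices = list(range(n_test))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:dev_size]), sorted(indices[dev_size:])


def build_splits(seed=42):
    """Build a train/dev/test DatasetDict shaped like the one shown above."""
    # Imported lazily so the index helper above works without `datasets`.
    from datasets import DatasetDict, load_dataset

    raw = load_dataset("imdb")
    dev_idx, test_idx = split_dev_from_test(len(raw["test"]), 1000, seed)
    return DatasetDict(
        {
            "train": raw["train"],
            "dev": raw["test"].select(dev_idx),
            "test": raw["test"].select(test_idx),
        }
    )
```

Calling build_splits() downloads IMDB and should give splits of 25000, 1000, and 24000 rows; the exact rows in dev depend on the seed.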
Parameters
max_sequence_length = 128
batch_size = 32
eval_steps = 100
learning_rate = 2e-05
num_train_epochs = 5
early_stopping_patience = 10
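For readers who want to reproduce a similar setup, the sketch below maps the parameters above onto the Hugging Face Trainer API. It is not the original training script: output_dir, the monitored metric name, and the function name build_training_setup are assumptions, and batch_size is assumed to be the per-device batch size on a single device.

```python
import inspect

# Hyperparameters exactly as listed above.
HPARAMS = {
    "max_sequence_length": 128,
    "batch_size": 32,
    "eval_steps": 100,
    "learning_rate": 2e-5,
    "num_train_epochs": 5,
    "early_stopping_patience": 10,
}


def build_training_setup(output_dir="bert-imdb", metric="accuracy"):
    """Return (TrainingArguments, EarlyStoppingCallback) built from HPARAMS.

    output_dir and metric are hypothetical; the source does not state which
    metric was monitored. The metric must be a key that the Trainer's
    compute_metrics function returns.
    """
    from transformers import EarlyStoppingCallback, TrainingArguments

    # Newer transformers releases call this argument `eval_strategy`,
    # older ones `evaluation_strategy`; use whichever this install accepts.
    init_params = inspect.signature(TrainingArguments.__init__).parameters
    strategy_key = (
        "eval_strategy" if "eval_strategy" in init_params else "evaluation_strategy"
    )

    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=HPARAMS["batch_size"],
        learning_rate=HPARAMS["learning_rate"],
        num_train_epochs=HPARAMS["num_train_epochs"],
        eval_steps=HPARAMS["eval_steps"],
        save_steps=HPARAMS["eval_steps"],  # checkpoint at every evaluation
        load_best_model_at_end=True,  # required by EarlyStoppingCallback
        metric_for_best_model=metric,
        **{strategy_key: "steps"},
    )
    early_stop = EarlyStoppingCallback(
        early_stopping_patience=HPARAMS["early_stopping_patience"]
    )
    return args, early_stop
```

The max_sequence_length value is not a Trainer argument; it would typically be passed to the tokenizer as max_length with truncation=True.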
Training Run
[2700/3910 1:11:43 < 32:09, 0.63 it/s, Epoch 3/5]
Step  Training Loss  Validation Loss  Accuracy  Precision  Recall    F1        Runtime (s)  Samples Per Second
100   No log         0.371974         0.845000  0.798942   0.917004  0.853911  15.256900    65.544000
200   No log         0.349631         0.850000  0.873913   0.813765  0.842767  15.288600    65.408000
300   No log         0.359376         0.845000  0.869281   0.807692  0.837356  15.303900    65.343000
400   No log         0.307613         0.870000  0.851351   0.892713  0.871542  15.358400    65.111000
500   0.364500       0.309362         0.856000  0.807018   0.931174  0.864662  15.326100    65.248000
600   0.364500       0.302709         0.867000  0.881607   0.844130  0.862461  15.324400    65.255000
700   0.364500       0.300102         0.871000  0.894168   0.838057  0.865204  15.474900    64.621000
800   0.364500       0.383784         0.866000  0.833333   0.910931  0.870406  15.380100    65.019000
900   0.364500       0.309934         0.874000  0.881743   0.860324  0.870902  15.358900    65.109000
1000  0.254600       0.332236         0.872000  0.894397   0.840081  0.866388  15.442700    64.756000
1100  0.254600       0.330807         0.871000  0.877847   0.858300  0.867963  15.410900    64.889000
1200  0.254600       0.352724         0.872000  0.925581   0.805668  0.861472  15.272800    65.476000
1300  0.254600       0.278529         0.881000  0.891441   0.864372  0.877698  15.408200    64.900000
1400  0.254600       0.291371         0.878000  0.854962   0.906883  0.880157  15.427400    64.820000
1500  0.208400       0.324827         0.869000  0.904232   0.821862  0.861082  15.338600    65.195000
1600  0.208400       0.377024         0.884000  0.898734   0.862348  0.880165  15.414500    64.874000
1700  0.208400       0.375274         0.885000  0.881288   0.886640  0.883956  15.367200    65.073000
1800  0.208400       0.378904         0.880000  0.877016   0.880567  0.878788  15.363900    65.088000
1900  0.208400       0.410517         0.874000  0.866534   0.880567  0.873494  15.324700    65.254000
2000  0.130800       0.404030         0.876000  0.888655   0.856275  0.872165  15.414200    64.875000
2100  0.130800       0.390763         0.883000  0.882353   0.880567  0.881459  15.341500    65.183000
2200  0.130800       0.417967         0.880000  0.875502   0.882591  0.879032  15.351300    65.141000
2300  0.130800       0.390974         0.883000  0.898520   0.860324  0.879007  15.396100    64.952000
2400  0.130800       0.479739         0.874000  0.856589   0.894737  0.875248  15.460500    64.681000
2500  0.098400       0.473215         0.875000  0.883576   0.860324  0.871795  15.392200    64.968000
2600  0.098400       0.532294         0.872000  0.889362   0.846154  0.867220  15.364100    65.087000
2700  0.098400       0.536664         0.881000  0.880325   0.878543  0.879433  15.351100    65.142000
TrainOutput(global_step=2700, training_loss=0.2004435383832013, metrics={'train_runtime': 4304.5331, 'train_samples_per_second': 0.908, 'total_flos': 7258763970957312, 'epoch': 3.45})
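The numbers in this run can be cross-checked with a little arithmetic. The sketch below assumes a single device with no gradient accumulation, and it uses a simplified model of patience-based stopping (stop after 10 consecutive evaluations without a strict improvement), not the exact logic of any library. Under those assumptions, 25000 samples at batch size 32 give 782 steps per epoch and 3910 steps in total, and step 2700 falls at epoch 3.45. Stopping at step 2700 is consistent with monitoring accuracy (best at step 1700); monitoring validation loss would have stopped at step 2300. The source does not state which metric was monitored.

```python
import math

TRAIN_ROWS, BATCH_SIZE, EPOCHS = 25000, 32, 5
EVAL_STEPS, PATIENCE = 100, 10

steps_per_epoch = math.ceil(TRAIN_ROWS / BATCH_SIZE)  # last partial batch kept
total_steps = steps_per_epoch * EPOCHS
epoch_at_stop = round(2700 / steps_per_epoch, 2)

# Accuracy and validation loss per evaluation, copied from the table above.
ACCURACY = [
    0.845, 0.850, 0.845, 0.870, 0.856, 0.867, 0.871, 0.866, 0.874,
    0.872, 0.871, 0.872, 0.881, 0.878, 0.869, 0.884, 0.885, 0.880,
    0.874, 0.876, 0.883, 0.880, 0.883, 0.874, 0.875, 0.872, 0.881,
]
VAL_LOSS = [
    0.371974, 0.349631, 0.359376, 0.307613, 0.309362, 0.302709, 0.300102,
    0.383784, 0.309934, 0.332236, 0.330807, 0.352724, 0.278529, 0.291371,
    0.324827, 0.377024, 0.375274, 0.378904, 0.410517, 0.404030, 0.390763,
    0.417967, 0.390974, 0.479739, 0.473215, 0.532294, 0.536664,
]


def stop_step(values, patience, greater_is_better):
    """Step at which patience-based early stopping would trigger, or None."""
    best, stale = None, 0
    for i, value in enumerate(values, start=1):
        if best is None or (value > best if greater_is_better else value < best):
            best, stale = value, 0
        else:
            stale += 1
            if stale >= patience:
                return i * EVAL_STEPS
    return None


print(steps_per_epoch, total_steps, epoch_at_stop)
print(stop_step(ACCURACY, PATIENCE, greater_is_better=True))
print(stop_step(VAL_LOSS, PATIENCE, greater_is_better=False))
```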
Classification Report
              precision    recall  f1-score   support

           0       0.90      0.87      0.89     11994
           1       0.87      0.90      0.89     12006

    accuracy                           0.89     24000
   macro avg       0.89      0.89      0.89     24000
weighted avg       0.89      0.89      0.89     24000
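The precision, recall, and F1 figures follow the usual binary definitions, which can be sketched as below. The confusion counts are not reported in the source; they are inferred. The original IMDB test split has 12500 reviews per class, so the support column above implies the dev set holds 506 negatives and 494 positives, and the counts chosen here are the ones that reproduce the step-1700 validation row (accuracy 0.885, precision 0.881288, recall 0.886640, F1 0.883956), treating label 1 as the positive class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Dev class counts implied by the test-set support column (inferred).
dev_positives = 12500 - 12006  # 494
dev_negatives = 12500 - 11994  # 506

# Inferred counts that match the step-1700 row; not taken from the source.
tp, fp = 438, 59
fn = dev_positives - tp  # 56
tn = dev_negatives - fp  # 447

metrics = binary_metrics(tp, fp, fn, tn)
print({name: round(value, 6) for name, value in metrics.items()})
```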