metadata
license: apache-2.0
base_model: google/mt5-small
tags:
  - generated_from_trainer
metrics:
  - rouge
  - bleu
  - meteor
datasets:
  - natural_questions
model-index:
  - name: mt5-small
    results:
      - task:
          type: Question answering from context
          name: Question answering
        dataset:
          type: natural-questions
          name: Adapted Natural Questions
        metrics:
          - type: bleu
            value: 34.1596
            name: BLEU
            verified: true
          - type: rouge
            value: 44.4366
            name: ROUGE1
            verified: true
          - type: rouge
            value: 38.8202
            name: ROUGE2
            verified: true
          - type: rouge
            value: 43.113
            name: ROUGEl
            verified: true
          - type: rouge
            value: 43.1423
            name: ROUGElsum
            verified: true
          - type: meteor
            value: 0.4049
            name: METEOR
            verified: true

mt5-small_test_45

This model is a fine-tuned version of google/mt5-small on an enhanced version of the Natural Questions dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7291
  • Rouge1: 44.4366
  • Rouge2: 38.8202
  • Rougel: 43.113
  • Rougelsum: 43.1423
  • Bleu: 34.1596
  • Gen Len: 12.6724
  • Meteor: 0.4049
  • True negatives: 69.7281
  • False negatives: 10.4037
  • Cosine Sim: 0.763

Model description

This model is fine-tuned for long-form, closed-domain question answering, that is, answering a question from a supplied context passage. It was trained on a heavily refined version of Google's Natural Questions dataset.

Answers to the questions were rewritten using OpenAI's GPT-3.5 Turbo model.

Please see the following repo for all code and adaptations.

Intended uses & limitations

Inputs must be submitted in the following format, with the context passage first and the question second: [CONTEXT] <\s> [QUESTION]

It is trained to respond appropriately when a question cannot be answered using the provided context.

It occasionally produces false negatives and false positives (see Training results), so all answers should be checked before use.
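
A minimal sketch of how the input format above might be assembled and passed to the model with the Transformers library. The helper names `build_input` and `answer`, the default `model_id` (a guess at this repository's Hub id), and the generation settings are assumptions for illustration and are not taken from the original training code. The separator is copied verbatim from the format string above.

```python
# Separator exactly as written in the model card's format string.
SEPARATOR = r"<\s>"


def build_input(context: str, question: str, sep: str = SEPARATOR) -> str:
    """Join context and question as '[CONTEXT] <sep> [QUESTION]'."""
    return f"{context.strip()} {sep} {question.strip()}"


def answer(context: str, question: str,
           model_id: str = "psxjp5/mt5-small",  # assumed Hub id, adjust as needed
           max_new_tokens: int = 64) -> str:     # assumed generation budget
    """Generate an answer for `question` from `context` (downloads the model)."""
    # Imported lazily so build_input works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(build_input(context, question),
                       return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `answer(context, question)` returns the decoded text; as noted above, the result should still be checked against the context.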

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 9
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
  • weight_decay: 0.007
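
The hyperparameters above could be expressed as Transformers training arguments roughly as follows. This is a sketch under assumptions: the `output_dir` name is hypothetical, `predict_with_generate=True` is assumed because generation metrics such as ROUGE are reported, and a single device is assumed because 16 × 8 already equals the listed total batch size of 128.

```python
# Hyperparameters from the list above, using Seq2SeqTrainingArguments field names.
HPARAMS = dict(
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=9,
    gradient_accumulation_steps=8,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    weight_decay=0.007,
)


def effective_batch_size(hparams: dict, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return (hparams["per_device_train_batch_size"]
            * hparams["gradient_accumulation_steps"]
            * num_devices)


def make_training_args(output_dir: str = "mt5-small-qa"):  # hypothetical directory
    """Build training arguments; requires transformers (and torch) installed."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        predict_with_generate=True,  # assumed, needed for generation-based metrics
        **HPARAMS,
    )


print(effective_batch_size(HPARAMS))  # 16 * 8 * 1 = 128
```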

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Gen Len | Meteor | True negatives | False negatives | Cosine Sim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.5724 | 1.0 | 175 | 0.9876 | 18.7781 | 15.6002 | 18.22 | 18.2686 | 7.6676 | 7.7661 | 0.1628 | 72.8701 | 56.677 | 0.4003 |
| 1.1469 | 1.99 | 350 | 0.8580 | 36.8209 | 31.2514 | 35.5008 | 35.5462 | 25.7137 | 12.0014 | 0.3311 | 62.8399 | 20.3934 | 0.6645 |
| 0.9468 | 2.99 | 525 | 0.7997 | 40.4128 | 34.716 | 39.0867 | 39.0972 | 29.3028 | 12.4287 | 0.3656 | 63.4441 | 15.295 | 0.7114 |
| 0.8129 | 3.98 | 700 | 0.7733 | 42.6764 | 36.7266 | 41.2465 | 41.2833 | 32.0644 | 12.9002 | 0.3871 | 62.1752 | 11.413 | 0.7425 |
| 0.7228 | 4.98 | 875 | 0.7483 | 42.9082 | 36.957 | 41.482 | 41.5233 | 32.4942 | 12.8866 | 0.3906 | 63.3233 | 11.5166 | 0.747 |
| 0.6493 | 5.97 | 1050 | 0.7293 | 40.3205 | 34.9632 | 39.1111 | 39.1168 | 28.8249 | 11.6867 | 0.3674 | 73.8973 | 17.9865 | 0.7068 |
| 0.5883 | 6.97 | 1225 | 0.7172 | 42.7342 | 37.0855 | 41.4069 | 41.424 | 32.1296 | 12.48 | 0.3887 | 70.0302 | 12.7847 | 0.7392 |
| 0.5409 | 7.96 | 1400 | 0.7387 | 44.6657 | 38.8426 | 43.3276 | 43.3496 | 34.4773 | 12.9395 | 0.4084 | 66.3444 | 9.5238 | 0.7658 |
| 0.5035 | 8.96 | 1575 | 0.7330 | 43.4925 | 38.0013 | 42.2697 | 42.2372 | 32.6131 | 12.2789 | 0.3979 | 72.6284 | 12.8364 | 0.7451 |
| 0.4652 | 9.95 | 1750 | 0.7291 | 44.4366 | 38.8202 | 43.113 | 43.1423 | 34.1596 | 12.6724 | 0.4049 | 69.7281 | 10.4037 | 0.763 |

Framework versions

  • Transformers 4.31.0
  • PyTorch 2.0.1+cu118
  • Datasets 2.13.1
  • Tokenizers 0.13.3