metadata
language: en
datasets:
  - JohnnyBoy00/saf_communication_networks_english
license: apache-2.0
tags:
  - generated_from_trainer
widget:
  - text: >-
      Answer: In TCP there is a Sequence Number field to identify packets
      individually for reliability. There is no Sequence Number in UDP. The UDP
      header does not have an options field, while the TCP header does. In TCP
      there is an Advertised Window field for the Sliding Window Protocol for
      Flow Control. There is no Flow Control and therefore no Advertised Window
      field in UDP. In TCP there there is only a Data Offset field that
      specifies the header length. In UDP the whole Packet Length is
      transmitted. Reference: Possible Differences : The UPD header (8 bytes) is
      much shorter than the TCP header (20-60 bytes) The UDP header has a fixed
      length while the TCP header has a variable length Fields contained in the
      TCP header and not the UDP header : -Sequence number -Acknowledgment
      number -Reserved -Flags/Control bits -Advertised window -Urgent Pointer
      -Options + Padding if the options are UDP includes the packet length (data
      + header) while TCP has the header length/data offset (just header) field
      instead The sender port field is optional in UDP, while the source port in
      TCP is necessary to establish the connection Question: State at least 4 of
      the differences shown in the lecture between the UDP and TCP headers.

bart-finetuned-saf-communication-networks

This model is a fine-tuned version of facebook/bart-large on the saf_communication_networks_english dataset for Short Answer Feedback (SAF), as proposed in Filighera et al., ACL 2022.

Model description

This model is built on top of BART, a sequence-to-sequence model pretrained with a denoising objective.

It expects inputs in the following format:

Answer: [answer] Reference: [reference_answer] Question: [question]

In the example above, [answer], [reference_answer] and [question] should be replaced by the provided answer, the reference answer and the question to which they refer, respectively.
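The input template can be assembled with a small helper. The sketch below is illustrative only: the function name `build_input`, the whitespace normalization and the sample texts are assumptions, not part of the original training code.

```python
def build_input(answer: str, reference_answer: str, question: str) -> str:
    """Assemble the single-string input in the order the template specifies."""
    # Assumption: collapse line breaks and repeated spaces so each field is one line.
    fields = [" ".join(part.split()) for part in (answer, reference_answer, question)]
    return "Answer: {} Reference: {} Question: {}".format(*fields)


# Hypothetical sample texts, for illustration only
example_input = build_input(
    answer="UDP has no sequence number.",
    reference_answer="TCP has a sequence number field; UDP does not.",
    question="Name one difference between the UDP and TCP headers.",
)
print(example_input)
```

Keeping the field order fixed matters because the model was fine-tuned on inputs laid out exactly this way.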

The outputs are formatted as follows:

[verification_feedback] Feedback: [feedback]

Here, [verification_feedback] is one of Correct, Partially correct or Incorrect, and [feedback] is the textual feedback the model generates for the given answer.
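Downstream code usually needs the label and the feedback text separately. A minimal parsing sketch follows; the name `parse_output` and the strict label check are assumptions, and real model outputs may need more lenient handling.

```python
from typing import Tuple

# The three labels named in the output format
LABELS = ("Correct", "Partially correct", "Incorrect")


def parse_output(text: str) -> Tuple[str, str]:
    """Split '[verification_feedback] Feedback: [feedback]' into (label, feedback)."""
    # Split at the first occurrence of the separator only
    label, sep, feedback = text.partition("Feedback:")
    if not sep:
        raise ValueError("missing 'Feedback:' separator")
    label = label.strip()
    # Assumption: labels are matched exactly, including capitalization
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label, feedback.strip()
```

For example, `parse_output("Incorrect Feedback: The answer confuses TCP and UDP.")` yields the pair `("Incorrect", "The answer confuses TCP and UDP.")`.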

Intended uses & limitations

This model is intended to be used for Short Answer Feedback generation in the context of college-level communication networks topics. Thus, it is not expected to have particularly good performance on sets of questions and answers out of this scope.

The model underperforms on questions that were not seen during training. In such cases it tends to classify most answers as correct and does not provide relevant feedback. This limitation could be partially overcome by extending the dataset with the desired question (and associated answers) and fine-tuning the model for a few epochs on the new data.

Training and evaluation data

As mentioned previously, the model was trained on the saf_communication_networks_english dataset, which is divided into the following splits.

| Split | Number of examples |
| --- | --- |
| train | 1700 |
| validation | 427 |
| test_unseen_answers | 375 |
| test_unseen_questions | 479 |

Evaluation was performed on the test_unseen_answers and test_unseen_questions splits.

Training procedure

The Trainer API was used to fine-tune the model. The code utilized for pre-processing and training was mostly adapted from the summarization script made available by HuggingFace.

Training was completed in a little under 1 hour on a GPU on Google Colab.

Training hyperparameters

The following hyperparameters were used during training:

  • num_epochs: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • train_batch_size: 1
  • gradient_accumulation_steps: 4
  • eval_batch_size: 4
  • seed: 42
  • mixed_precision_training: Native AMP
  • total_train_batch_size: 4
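The batch-size and schedule settings above imply a fixed number of optimizer updates. The arithmetic below is a sketch under stated assumptions: training on a single device, the 1700-example train split, and zero warmup steps (warmup is not listed among the hyperparameters). The function name `linear_lr` is hypothetical.

```python
train_examples = 1700            # size of the train split
train_batch_size = 1
gradient_accumulation_steps = 4
num_epochs = 8
learning_rate = 5e-05

# One optimizer update consumes batch_size * accumulation_steps examples
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: single device; any incomplete accumulation window is dropped
steps_per_epoch = train_examples // total_train_batch_size
total_steps = steps_per_epoch * num_epochs


def linear_lr(step: int) -> float:
    """Learning rate after `step` updates under linear decay to zero (no warmup assumed)."""
    return learning_rate * max(0.0, 1.0 - step / total_steps)


print(total_train_batch_size, steps_per_epoch, total_steps)
print(linear_lr(total_steps // 2))
```

Under these assumptions the effective batch size matches the listed total_train_batch_size of 4, and the learning rate falls to half its initial value midway through training.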

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.12.1+cu113
  • Datasets 2.7.1
  • Tokenizers 0.13.2

Evaluation results

The generated feedback was evaluated with the SacreBLEU, ROUGE, METEOR and BERTScore metrics from HuggingFace, while the accuracy and F1 scores from scikit-learn were used to evaluate the labels.

The following results were achieved.

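| Split | SacreBLEU | ROUGE | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| test_unseen_answers | 36.0 | 49.1 | 60.8 | 69.5 | 76.0 | 73.0 | 53.4 |
| test_unseen_questions | 2.4 | 20.1 | 28.5 | 36.6 | 51.6 | 41.0 | 27.9 |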

The script used to compute these metrics and perform evaluation can be found in the evaluation.py file in this repository.
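To illustrate how the weighted and macro F1 columns can differ on imbalanced labels, here is a dependency-free sketch of both averages. It is not the repository's evaluation script, which relies on scikit-learn; the function names and the toy labels are hypothetical and the toy data does not reproduce any reported number.

```python
from collections import Counter
from typing import Dict, List


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 per label, computed from true positives, false positives and false negatives."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Mean of the per-class F1 scores, weighted by each class's count in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy labels (hypothetical): the majority class is predicted for almost everything
y_true = ["Correct"] * 6 + ["Partially correct"] * 3 + ["Incorrect"]
y_pred = ["Correct"] * 8 + ["Partially correct"] + ["Correct"]
print(macro_f1(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Because macro F1 gives every class equal weight, poor performance on rare labels pulls it below the weighted average, which is one plausible reading of the gap between the two columns above.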

Usage

The example below shows how to use the model to generate feedback for a given answer.

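from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hub
model = AutoModelForSeq2SeqLM.from_pretrained('JohnnyBoy00/bart-finetuned-saf-communication-networks')
tokenizer = AutoTokenizer.from_pretrained('JohnnyBoy00/bart-finetuned-saf-communication-networks')

# Must follow the format: Answer: [answer] Reference: [reference_answer] Question: [question]
example_input = ''

# Tokenize, padding or truncating the input to 256 tokens
inputs = tokenizer(example_input, max_length=256, padding='max_length', truncation=True, return_tensors='pt')

# Generate the label and feedback, capped at 128 tokens
generated_tokens = model.generate(
    inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_length=128
)

# Decode the single generated sequence back to text
output = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]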

The output produced by the model then looks as follows:

Correct Feedback: 

Related Work

Filighera et al., ACL 2022 trained a T5 model on this dataset, providing a baseline for SAF generation. The entire code used to define and train the model can be found on GitHub.