mt5-base-thaisum

This repository contains an mT5-base model fine-tuned for Thai sentence summarization. The model keeps the mT5 architecture and was fine-tuned on Thai text-summary pairs.

Example

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "preechanon/mt5-base-thaisum-text-summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder ("the desired text"): replace with the Thai text to summarize.
new_input_string = "ข้อความที่ต้องการ"
# Inputs longer than 1024 tokens are truncated.
inputs = tokenizer(new_input_string, truncation=True, max_length=1024, return_tensors="pt")

# Inference only, so gradients are disabled; tensors stay on the CPU.
with torch.no_grad():
    preds = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=15,
        num_return_sequences=1,
        no_repeat_ngram_size=1,
        remove_invalid_values=True,
        max_length=140,
    )

# Decode the single returned beam into text.
summary = tokenizer.decode(preds[0], skip_special_tokens=True)
print(summary)
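
The example above sets no_repeat_ngram_size=1, which according to the Transformers documentation means that every n-gram of that size may occur only once in the output; with size 1, no token can appear twice. The sketch below illustrates that rule in plain Python. It is a simplified illustration, not the library's implementation, and the function name banned_next_tokens is hypothetical.

```python
def banned_next_tokens(generated, ngram_size):
    """Return the token ids that would repeat an existing n-gram if emitted next.

    Hypothetical helper for illustration only; `generated` is a list of token ids.
    """
    if ngram_size <= 0 or len(generated) + 1 < ngram_size:
        return set()
    # The last (n - 1) generated tokens are the prefix the next token would extend.
    if ngram_size > 1:
        prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    else:
        prefix = ()
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        # If an earlier n-gram starts with the same prefix, its last token is banned.
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# With ngram_size=1 every token already generated is banned.
print(banned_next_tokens([5, 6, 7], 1))
# With ngram_size=2 only the token that followed the previous "5" is banned.
print(banned_next_tokens([5, 6, 7, 5], 2))
```

A size of 1 is a strict setting: a word piece that legitimately occurs twice in a summary cannot be generated a second time, so a larger value may be worth trying if outputs look truncated or unnatural.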

Score

  • Rouge1: 0.488931
  • Rouge2: 0.309732
  • RougeL: 0.425490
  • RougeLsum: 0.444359
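
To make the scores above easier to interpret, here is a minimal pure-Python sketch of ROUGE-1, ROUGE-2 and ROUGE-L as F1 scores over token lists. This card does not state which ROUGE implementation or which Thai tokenizer produced the reported numbers, so this is an illustration of the metric definitions, not a way to reproduce them. Thai text has no spaces between words and must be tokenized first; the English tokens below are only for readability. RougeLsum, which applies the LCS per sentence, is not covered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision (overlap/pred) and recall (overlap/ref)."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(pred, ref, n):
    """ROUGE-N F1: clipped n-gram overlap between prediction and reference."""
    p, r = ngrams(pred, n), ngrams(ref, n)
    overlap = sum((p & r).values())  # Counter intersection clips repeated n-grams
    return f1(overlap, sum(p.values()), sum(r.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(pred, ref):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(pred, ref), len(pred), len(ref))


pred = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_n(pred, ref, 1), rouge_n(pred, ref, 2), rouge_l(pred, ref))
```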

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-04
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW with betas=(0.9,0.999), epsilon=1e-08 and weight_decay=0.1
  • warmup_steps: 5000
  • lr_scheduler_type: linear
  • num_epochs: 6
  • gradient_accumulation_steps: 4
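
As a rough illustration of how these settings interact, the sketch below computes the effective batch size and the learning rate under a linear schedule with warmup (a linear ramp to the peak rate, then a linear decay to zero, which is what the "linear" scheduler type in Transformers does). The total number of optimizer steps is not given in this card, so total_steps is a hypothetical value chosen only for the example. The card also does not say how many devices were used, so the batch size shown is per device.

```python
def linear_schedule_lr(step, total_steps, peak_lr=5e-4, warmup_steps=5000):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


train_batch_size = 8
gradient_accumulation_steps = 4
# Gradients from 4 batches of 8 are accumulated before each optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps
print("effective batch size per device:", effective_batch_size)

# Hypothetical total; the real value depends on dataset size and is not stated here.
total_steps = 50_000
for step in (0, 2_500, 5_000, 27_500, 50_000):
    print(step, linear_schedule_lr(step, total_steps))
```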

Framework versions

  • Transformers 4.36.1
  • PyTorch 2.1.2

Resource Funding

We thank the NSTDA Supercomputer Center (ThaiSC) and the National e-Science Infrastructure Consortium for providing computing facilities.

Citation

ปรีชานนท์ ชาติไทย and สัจจวัจน์ ส่งเสริม. (2567 BE / 2024 CE).
Thai News Text Summarization Using Neural Network (การสรุปข้อความข่าวภาษาไทยด้วยโครงข่ายประสาทเทียม).
Bachelor of Science (B.Sc.). Khon Kaen: Khon Kaen University.

Model size: 582M params (Safetensors, tensor type F32)