KLUE RoBERTa-base for legal documents

  • This model is the KLUE/RoBERTa-base model further trained on legal_text_merged02_light.txt, a file made up of court rulings.

Model Details

Model Description

  • Developed by: J.Park @ KETI
  • Model type: klue/roberta-base
  • Language(s) (NLP): Korean
  • License: [More Information Needed]
  • Finetuned from model [optional]: klue/roberta-base

Training procedure

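# Base checkpoint and tokenizer to continue masked-language-model training from
base_model = 'klue/roberta-base'
base_tokenizer = 'klue/roberta-base'

# Assumed paths (not given in the original card); adjust to your environment
fpath_dataset = 'legal_text_merged02_light.txt'  # corpus of court rulings, one text per line
output_dir = './klue_roberta_base_for_legal'     # hypothetical output directory

# AutoTokenizer picks the BERT-style tokenizer that klue/roberta-base ships with
from transformers import AutoTokenizer, RobertaForMaskedLM
model = RobertaForMaskedLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_tokenizer)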

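# Note: LineByLineTextDataset is deprecated in recent transformers releases.
# It treats each non-empty line of the file as one example, truncated to block_size tokens.
from transformers import LineByLineTextDataset
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path=fpath_dataset,
    block_size=512,
)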

from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
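
The collator above applies dynamic masking each time a batch is built. As a rough illustration of what `mlm=True, mlm_probability=0.15` means, here is a pure-Python sketch of the usual BERT-style rule: each non-special token is selected with probability 0.15, and a selected token is replaced by the mask token 80% of the time, by a random token 10% of the time, and left unchanged otherwise. The function name `mask_tokens` and the token ids in the usage lines are hypothetical. This is not the library's implementation, only a sketch of the same idea.

```python
import random

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores


def mask_tokens(token_ids, special_ids, mask_id, vocab_size,
                mlm_probability=0.15, rng=None):
    """Return (inputs, labels) after BERT-style masking of one sequence."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Special tokens are never selected; others are selected with mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id                    # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


# Hypothetical ids: 0 and 2 stand in for the start/end special tokens, 4 for the mask.
ids = [0, 11, 12, 13, 14, 15, 2]
inputs, labels = mask_tokens(ids, special_ids={0, 2}, mask_id=4,
                             vocab_size=32000, rng=random.Random(1))
print(inputs, labels)
```

Because the masking is re-sampled for every batch, the same sentence is seen with different masked positions across the five epochs.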

from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_device_train_batch_size=18,
    save_steps=100,
    save_total_limit=2,
    seed=1
)
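
To get a feel for what these arguments imply, the sketch below estimates the number of optimizer steps and checkpoints for a run. It assumes a single device and no gradient accumulation. The corpus size used in the usage lines is a made-up figure, since the card does not state how many lines the training file contains, and `training_schedule` is a hypothetical helper, not a transformers function.

```python
import math


def training_schedule(num_examples, per_device_batch_size=18, num_devices=1,
                      num_epochs=5, save_steps=100, save_total_limit=2):
    """Estimate step and checkpoint counts (no gradient accumulation assumed)."""
    batch_size = per_device_batch_size * num_devices
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last batch may be partial
    total_steps = steps_per_epoch * num_epochs
    checkpoints_written = total_steps // save_steps          # one every save_steps steps
    checkpoints_kept = min(checkpoints_written, save_total_limit)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "checkpoints_written": checkpoints_written,
        "checkpoints_kept": checkpoints_kept,
    }


# Hypothetical corpus of 100,000 lines; the real size is not given in the card.
print(training_schedule(100_000))
```

With `save_total_limit=2`, older checkpoints are deleted as new ones are written, so disk usage stays bounded no matter how long the run is.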

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset
)

train_metrics = trainer.train()
trainer.save_model(output_dir)
trainer.push_to_hub()
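
Once the weights have been pushed, the model can be queried as a masked language model. The following is a minimal sketch rather than part of the original card. It assumes the model is published under the repository id `againeureka/klue_roberta_base_for_legal`, and the helper names `insert_mask` and `top_predictions` are hypothetical. The mask token is read from the loaded tokenizer instead of being hard-coded.

```python
MODEL_ID = "againeureka/klue_roberta_base_for_legal"  # assumed repository id


def insert_mask(template: str, mask_token: str) -> str:
    """Replace the '{mask}' placeholder with the tokenizer's mask token."""
    return template.replace("{mask}", mask_token)


def top_predictions(template: str, model_id: str = MODEL_ID, top_k: int = 5):
    """Return (token, score) pairs for the single masked position in `template`."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    text = insert_mask(template, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=top_k)]
```

For example, `top_predictions("피고인을 징역 1년에 {mask}한다.")` (roughly, "The defendant is sentenced to one year in prison", with the verb stem masked) downloads the model and returns the five highest-scoring fillers. The sentence is only an illustrative input.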