ko-barTNumText (TNT Model 🧨): Try Number To Korean Reading (a model that converts numbers into Hangul)
Model Details
Model Description: I could not find an existing model or algorithm for this task, so I built one.
This is a BartForConditionalGeneration model fine-tuned to convert numbers in Korean text into their Hangul reading.
The dataset comes from Korea aihub.
For private reasons, I cannot release the datasets used for fine-tuning.
Korea aihub data is available ONLY to Koreans, so the notes for anyone downloading it were originally written in Korean only.
More precisely, the model is trained to translate orthographic transcriptions (numbers written as digits) into phonetic transcriptions (numbers spelled out as they are read), following the ETRI transcription guidelines.
Ten million (천만) may be written as 1000만 or as 10000000, so results can differ depending on the training datasets.
Results can also differ markedly depending on the spacing between a numeral determiner and the counter (dependent noun) that follows it (쉰살, 쉰 살 -> 쉰살, 50살): https://eretz2.tistory.com/34
Not knowing how the model would be used, I did not fix one convention and bias the training toward it; I left that to the distribution of the training data. (Is 쉰 살 more common, or 쉰살?)
Developed by: Yoo SungHyun (https://github.com/YooSungHyun)
Language(s): Korean
License: apache-2.0
Parent Model: See the kobart-base-v2 for more information about the pre-trained base model.
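The Model Details notes point out that one number can be written in several ways and that its reading depends on context. As a rough illustration of why a learned model is used here, the sketch below reads an integer with fixed Sino-Korean rules only. It is not part of this model or its training code, and the names `read_group` and `read_sino_korean` are hypothetical helpers introduced for this example.

```python
# Illustrative rule-based baseline (an assumption, NOT the TNT model):
# reads a non-negative integer using Sino-Korean numerals only.
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]   # 10^1, 10^2, 10^3 within a group
BIG_UNITS = ["", "만", "억", "조"]     # 10^4, 10^8, 10^12 between groups


def read_group(group: int) -> str:
    """Read a value in 0..9999 (one four-digit group)."""
    parts = []
    for power in range(3, -1, -1):
        digit = (group // 10 ** power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            # "일" is normally dropped before 십, 백, and 천.
            parts.append(SMALL_UNITS[power])
        else:
            parts.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(parts)


def read_sino_korean(n: int) -> str:
    """Read 0 <= n < 10^16 with Sino-Korean numerals."""
    if n < 0 or n >= 10 ** 16:
        raise ValueError("supported range is 0 <= n < 10**16")
    if n == 0:
        return "영"
    parts = []
    index = 0
    while n > 0:
        n, group = divmod(n, 10_000)
        if group:
            text = read_group(group)
            if group == 1 and index == 1:
                text = ""  # 10000 is usually read "만", not "일만"
            parts.append(text + BIG_UNITS[index])
        index += 1
    return "".join(reversed(parts))


print(read_sino_korean(10000000))  # 천만
print(read_sino_korean(6))         # 육
```

A fixed rule like this reads 6 as 육, whereas the example output in this card reads 6시 as 여섯 시 (a native Korean numeral). That kind of context dependence is what the fine-tuned model is meant to pick up from data.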
Uses
For more detail, follow the KoGPT_num_converter URL and see bart_inference.py and bart_train.py.
Evaluation
Evaluation simply uses evaluate-metric/bleu and evaluate-metric/rouge from the Hugging Face evaluate library.
Training wandb URL
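To give a feel for what a ROUGE-1 style score measures on this task, here is a dependency-free sketch of unigram-overlap F1. It is not the evaluate library's implementation and was not used to produce the reported scores; whitespace tokenization and the function name `unigram_f1` are assumptions made for this illustration.

```python
from collections import Counter


def unigram_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two sentences, split on whitespace.

    Illustrative only: a simplified stand-in for a ROUGE-1 F-measure.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sentences.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "그러게 누가 여섯 시까지 술을 마시래?"
exact = "그러게 누가 여섯 시까지 술을 마시래?"
wrong_reading = "그러게 누가 육 시까지 술을 마시래?"

print(unigram_f1(exact, reference))          # 1.0
print(unigram_f1(wrong_reading, reference))  # 5 of 6 tokens match
```

Because most tokens in a sentence are copied unchanged, a single wrong number reading still leaves a high overlap score, which is worth keeping in mind when reading BLEU and ROUGE values for this kind of task.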
How to Get Started With the Model
# Pipeline class, tokenizer, and seq2seq model loader from transformers.
from transformers.pipelines import Text2TextGenerationPipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Input sentence containing a number written as a digit ("6시").
texts = ["그러게 누가 6시까지 술을 마시래?"]
tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-barTNumText")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-barTNumText")
seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
# Generation settings: beam search without sampling.
kwargs = {
    "min_length": 0,
    "max_length": 1206,
    "num_beams": 100,
    "do_sample": False,
    "num_beam_groups": 1,
}
pred = seq2seqlm_pipeline(texts, **kwargs)
print(pred)
# 그러게 누가 여섯 시까지 술을 마시래?
Evaluation results
- eval_bleu (self-reported): 0.931
- eval_rouge1 (self-reported): 0.961
- eval_rouge2 (self-reported): 0.939
- eval_rougeL (self-reported): 0.961
- eval_rougeLsum (self-reported): 0.961