|
--- |
|
language: vi |
|
tags: |
|
- vi |
|
- vietnamese |
|
- gpt2 |
|
- text-generation |
|
- lm |
|
- nlp |
|
datasets: |
|
- wikilingua
|
widget: |
|
- text: Không phải tất cả các nguyên liệu lành mạnh đều đắt đỏ. |
|
pipeline_tag: text-generation |
|
inference: |
|
parameters: |
|
max_length: 120 |
|
do_sample: true |
|
temperature: 0.8 |
|
--- |
|
|
|
# GPT-2 |
|
|
|
GPT-2 is a transformer-based language model pretrained with a causal language modeling (CLM) objective.

This model was trained on the Vietnamese WikiLingua dataset and can be used to generate Vietnamese text.
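
For intuition about the CLM objective, here is a minimal sketch of how it can be evaluated with this checkpoint: passing `labels=input_ids` to `GPT2LMHeadModel` returns the average next-token cross-entropy, whose exponential is the perplexity.

~~~~python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('minhtoan/vietnamese-gpt2-finetune')
model = GPT2LMHeadModel.from_pretrained('minhtoan/vietnamese-gpt2-finetune')
model.eval()

text = "Không phải tất cả các nguyên liệu lành mạnh đều đắt đỏ."
input_ids = tokenizer.encode(text, return_tensors='pt')

# With labels == input_ids, the forward pass returns the average
# next-token cross-entropy (the CLM loss); exp(loss) is the perplexity.
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
~~~~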
|
|
|
## How to use the model
|
|
|
~~~~python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('minhtoan/vietnamese-gpt2-finetune')
model = GPT2LMHeadModel.from_pretrained('minhtoan/vietnamese-gpt2-finetune')

text = "Không phải tất cả các nguyên liệu lành mạnh đều đắt đỏ."
input_ids = tokenizer.encode(text, return_tensors='pt')
max_length = 100

# Sample three continuations of exactly max_length tokens.
sample_outputs = model.generate(
    input_ids,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    max_length=max_length,
    min_length=max_length,
    num_return_sequences=3,
)

for i, sample_output in enumerate(sample_outputs):
    print(">> Generated text {}\n\n{}".format(i + 1, tokenizer.decode(sample_output.tolist())))
    print('\n---')
~~~~
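
You can also generate with the high-level `pipeline` API. The sketch below simply reuses the sampling parameters declared in this card's metadata (`max_length=120`, `do_sample=True`, `temperature=0.8`):

~~~~python
from transformers import pipeline

# Sampling parameters mirror the `inference:` block in the card metadata.
generator = pipeline('text-generation', model='minhtoan/vietnamese-gpt2-finetune')
outputs = generator(
    "Không phải tất cả các nguyên liệu lành mạnh đều đắt đỏ.",
    max_length=120,
    do_sample=True,
    temperature=0.8,
)
print(outputs[0]['generated_text'])
~~~~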
|
|
|
|
|
## Author |
|
Phan Minh Toan