This is a transformer-decoder GPT-2 model adapted for single-turn dialogue in Spanish. We fine-tuned Microsoft's DialoGPT-medium (345M parameters) with the Causal Language Modelling (CLM) objective.

We used one of the datasets available in the Bot Framework Tools repository: the professional-styled Personality Chat dataset in Spanish. The file is available to download here.
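Below is a minimal pre-processing sketch, assuming the downloaded file is a TSV of question/answer pairs; the `Question`/`Answer` column names and the `build_training_examples` helper are illustrative assumptions, not the exact pipeline used for this model. Each pair is flattened into `question <|endoftext|> answer <|endoftext|>` for the CLM objective.

```python
# Illustrative pre-processing sketch (not the exact script used for this model).
import csv

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

def build_training_examples(tsv_path):
    """Flatten each Q/A pair into 'question <eos> answer <eos>' token ids."""
    examples = []
    with open(tsv_path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            # The "Question"/"Answer" column names are assumptions; adapt them to the file header.
            text = row["Question"] + tokenizer.eos_token + row["Answer"] + tokenizer.eos_token
            examples.append(tokenizer.encode(text))
    return examples
```

The following script shows how to chat with the fine-tuned model on the single-turn dialogue task: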
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHAT_TURNS = 5
MAX_LENGTH = 1000

model = AutoModelForCausalLM.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')
tokenizer = AutoTokenizer.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for i in range(CHAT_TURNS):
    user_input = input(f"Step - {i} >> user prompt -> ")
    with torch.no_grad():
        # User turn, where "user_input" is the question (single-turn dialogue task)
        user_inputs_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
        user_inputs_ids = user_inputs_ids.to(device)
        # The chat history adds the generated tokens for the answer
        chat_history = model.generate(user_inputs_ids, max_length=MAX_LENGTH, pad_token_id=tokenizer.eos_token_id)
        # Decode just the generated output tokens (do not include the user prompt again)
        step_model_answer = tokenizer.decode(chat_history[:, user_inputs_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"Step - {i} >> DialoGPT-spanish model answer -> {step_model_answer}")
```
| Role | Response | English translation |
|---|---|---|
| User | Hola, buenos dias | Hello, good morning |
| Bot | Buenos días | Good morning |
| User | ¿Qué tal estás? | How are you? |
| Bot | Fenomenal, gracias. | Great, thanks. |
| User | ¡Tengo hambre! | I'm hungry! |
| Bot | Quizá picar algo te ayude. | Maybe having a snack would help. |
| User | Estoy triste | I'm sad |
| Bot | Siento escuchar eso. | I'm sorry to hear that. |
| User | ¿Tienes personalidad? | Do you have a personality? |
| Bot | Soy un bot. | I'm a bot. |
| User | ¡Cuenta un chiste! | Tell a joke! |
| Bot | En realidad no tengo tanta gracia. | I'm not actually that funny. |
| User | Jajajajajajajaja | Hahahahahaha |
| Bot | ¡Me alegro de que te rías! | I'm glad you're laughing! |
| Hyper-parameter | Value |
|---|---|
| Validation partition | 20% |
| Training batch size | 8 |
| Learning rate | 5e-4 |
| Max training epochs | 20 |
| Warmup training steps | 6% |
| Weight decay | 0.01 |
| Optimiser (beta1, beta2, epsilon) | AdamW (0.9, 0.999, 1e-08) |
| Monitoring metric (delta, patience) | Validation loss (0.1, 3) |
If you want to fine-tune your own dialogue model, we recommend starting from the DialoGPT model; you can check the original GitHub repository.
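As a starting point, here is a minimal fine-tuning sketch with the Hugging Face `Trainer`, mirroring the hyper-parameters listed above. It reuses the hypothetical `build_training_examples` helper from the pre-processing sketch; the file name, output directory, and callback settings are assumptions, not our original training script.

```python
# Minimal fine-tuning sketch mirroring the hyper-parameter table above
# (not the original training script for this model).
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Reuses build_training_examples from the pre-processing sketch above;
# the file name is a placeholder, not the actual download name.
encoded = build_training_examples("personality_chat_spanish.tsv")
splits = Dataset.from_dict({"input_ids": encoded}).train_test_split(test_size=0.2)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="dialogpt-medium-spanish-chitchat",
    per_device_train_batch_size=8,      # training batch size
    learning_rate=5e-4,
    num_train_epochs=20,                # max training epochs
    warmup_ratio=0.06,                  # 6% warmup steps
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",  # monitor validation loss
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],      # 80% training partition
    eval_dataset=splits["test"],        # 20% validation partition
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3,
                                     early_stopping_threshold=0.1)],
)
trainer.train()
```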
Since this approach can assign a probability to any Unicode string, the model can be evaluated on any dataset regardless of pre-processing, tokenization, or vocabulary size.
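As a quick illustration of that property, any Unicode string (accented Spanish text, emoji, and so on) passes through the byte-level BPE tokenizer without out-of-vocabulary tokens; the snippet below is just a sketch of that behaviour.

```python
# Byte-level BPE can encode any Unicode string: no out-of-vocabulary tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ITG/DialoGPT-medium-spanish-chitchat")

text = "¿Qué tal estás? 🙂"
ids = tokenizer.encode(text)   # byte-level BPE token ids
print(ids)
print(tokenizer.decode(ids))   # reconstructs the original string
```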