# Dante-Zero Fine-tuned Model
This model was fine-tuned using Reinforcement Learning with Group Relative Policy Optimization (GRPO) to generate Dante-style poetry in endecasillabi (11-syllable lines).
## Model Details
- Base Model: PleIAs/Pleias-350m-Preview
- Training Method: GRPO (Group Relative Policy Optimization)
- Training Data: 1,000 chunks from Dante's Divine Comedy
- Epochs: 10
- Trained By: ruggsea
- Date: 2025-03-05
## Model Description
This model is specialized in generating Italian poetry in the style of Dante Alighieri's Divine Comedy. It has been trained to:
- Generate proper endecasillabi (11-syllable lines)
- Follow the structure of Dante's poetry
- Avoid repetition
- Create original content (not plagiarize the Divine Comedy)
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("ruggsea/dante-zero-2025-03-05")
tokenizer = AutoTokenizer.from_pretrained("ruggsea/dante-zero-2025-03-05")

# Generate poetry from a prompt (here, the opening line of the Divine Comedy)
prompt = "Nel mezzo del cammin di nostra vita"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
## Reward Functions
The model was trained using several reward functions:
- Endecasillabo Checker: Rewards proper 11-syllable lines
- Plagiarism Checker: Penalizes copying from the Divine Comedy
- Verse Structure Checker: Encourages verse-like structure
- Repetition Penalty: Discourages repetitive patterns
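To illustrate the first of these signals, here is a minimal sketch of an endecasillabo-style reward. It is not the training code: the function name is hypothetical, and it counts syllables naively as contiguous vowel groups, ignoring Italian synalepha, dialefe, and diaeresis, which a real checker would have to handle.

```python
import re

def endecasillabo_reward(text: str) -> float:
    """Toy reward: +1 for each non-empty line whose naive syllable
    count is exactly 11. Syllables are approximated as contiguous
    vowel groups, a rough proxy for Italian scansion."""
    reward = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Count runs of vowels (including accented Italian vowels)
        syllables = len(re.findall(r"[aeiouàèéìíòóùú]+", line.lower()))
        if syllables == 11:
            reward += 1.0
    return reward
```

Under this naive count, Dante's opening line "Nel mezzo del cammin di nostra vita" happens to score as 11 syllables and earns a reward of 1.0; in GRPO, such per-completion scores are compared within a sampled group to compute advantages.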
## License
This model is available under the same license as the base model (PleIAs/Pleias-350m-Preview).