---
tags:
- generated_from_trainer
datasets:
- RaiBP/openwebtext2-first-30-chunks-english-only-examples
model-index:
- name: training_nen
  results: []
---

# training_nen

This model was trained from scratch on the RaiBP/openwebtext2-first-30-chunks-english-only-examples dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1.0
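As a rough illustration only (the actual training script is not included in this card), the hyperparameters above map onto a Hugging Face `TrainingArguments` configuration roughly like the sketch below, assuming training with the `Trainer` API launched on 2 GPUs; `output_dir` is a placeholder. With 2 devices, a per-device batch size of 8, and 4 gradient-accumulation steps, the effective train batch size is 2 × 8 × 4 = 64, matching the total listed above.

```python
from transformers import TrainingArguments

# Minimal sketch, not the original training script: these arguments only mirror
# the hyperparameters listed above. "output_dir" is a placeholder. Launching on
# 2 GPUs with per_device_train_batch_size=8 and gradient_accumulation_steps=4
# gives the total train batch size of 2 * 8 * 4 = 64.
training_args = TrainingArguments(
    output_dir="training_nen",
    learning_rate=0.005,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=1.0,
    seed=42,
)
```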
### Training results

### Evaluation results

Perplexity was computed on 2000 randomly sampled examples from the target language's [Wikipedia dataset](https://huggingface.co/datasets/wikimedia/wikipedia), using the code provided in the [perplexity docs](https://huggingface.co/docs/transformers/perplexity) with a stride of 512 tokens. The baseline is the result of evaluating [OpenAI's GPT-2](https://huggingface.co/gpt2) on the same examples.

| Target language | PPL                | Baseline PPL       |
|-----------------|--------------------|--------------------|
| en              | 42.175106048583984 | 26.562532424926758 |
| de              | 225.5620574951172  | 56.907039642333984 |
| es              | 184.9262237548828  | 55.592445373535156 |
| fr              | 170.0771026611328  |                    |
| it              | 238.36192321777344 |                    |
| pt              | 203.595947265625   |                    |
| nl              | 225.9720001220703  |                    |

The following script was used for evaluation:

```python
import random

import numpy as np
import torch
from datasets import load_dataset
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the seeds for reproducibility (NumPy is seeded as well, since
# np.random.randint is used for sampling below)
random.seed(42)
np.random.seed(42)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model
model_name = "RaiBP/gpt2-openwebtext2-first-30-chunks-ablation-non-english"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

target_language_dataset = "20231101.de"  # change here for other languages
dataset = load_dataset("wikimedia/wikipedia", target_language_dataset, split="train")

# Sample 2000 random articles and concatenate them
num_examples = 2000
random_numbers = list(np.random.randint(0, len(dataset), num_examples))
examples = []
for i in tqdm(random_numbers):
    examples.append(dataset[int(i)]["text"])

encodings = tokenizer("\n\n".join(examples), return_tensors="pt")

max_length = model.config.n_positions
stride = 512
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end_loc = 0
for begin_loc in tqdm(range(0, seq_len, stride)):
    end_loc = min(begin_loc + max_length, seq_len)
    trg_len = end_loc - prev_end_loc  # may be different from stride on last loop
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100

    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)

        # loss is calculated using CrossEntropyLoss which averages over valid labels
        # N.B. the model only calculates loss over trg_len - 1 labels, because it
        # internally shifts the labels to the left by 1.
        neg_log_likelihood = outputs.loss

    nlls.append(neg_log_likelihood)

    prev_end_loc = end_loc
    if end_loc == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).mean())
print("Perplexity: ", ppl.item())
```

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 1.13.0
- Datasets 2.16.0
- Tokenizers 0.15.0
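For reference, a minimal inference sketch, assuming the checkpoint id used in the evaluation script above (`RaiBP/gpt2-openwebtext2-first-30-chunks-ablation-non-english`) is the published Hub id for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: this id is taken from the evaluation script above; replace it if
# the published Hub id for this model differs.
model_name = "RaiBP/gpt2-openwebtext2-first-30-chunks-ablation-non-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a short continuation from an English prompt
inputs = tokenizer("The history of the city begins", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```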