ruggsea committed
Commit 76e490c · verified · Parent: d60b3ee

Upload Dante-Zero model trained on 2025-03-05

This view is limited to 50 files because it contains too many changes.

Files changed (50)
  1. README.md +58 -0
  2. checkpoint-100/config.json +30 -0
  3. checkpoint-100/generation_config.json +6 -0
  4. checkpoint-100/model.safetensors +3 -0
  5. checkpoint-100/optimizer.pt +3 -0
  6. checkpoint-100/rng_state.pth +3 -0
  7. checkpoint-100/scheduler.pt +3 -0
  8. checkpoint-100/special_tokens_map.json +24 -0
  9. checkpoint-100/tokenizer.json +0 -0
  10. checkpoint-100/tokenizer_config.json +48 -0
  11. checkpoint-100/trainer_state.json +1533 -0
  12. checkpoint-100/training_args.bin +3 -0
  13. checkpoint-1000/config.json +30 -0
  14. checkpoint-1000/generation_config.json +6 -0
  15. checkpoint-1000/model.safetensors +3 -0
  16. checkpoint-1000/optimizer.pt +3 -0
  17. checkpoint-1000/rng_state.pth +3 -0
  18. checkpoint-1000/scheduler.pt +3 -0
  19. checkpoint-1000/special_tokens_map.json +24 -0
  20. checkpoint-1000/tokenizer.json +0 -0
  21. checkpoint-1000/tokenizer_config.json +48 -0
  22. checkpoint-1000/trainer_state.json +0 -0
  23. checkpoint-1000/training_args.bin +3 -0
  24. checkpoint-1250/config.json +30 -0
  25. checkpoint-1250/generation_config.json +6 -0
  26. checkpoint-1250/model.safetensors +3 -0
  27. checkpoint-1250/optimizer.pt +3 -0
  28. checkpoint-1250/rng_state.pth +3 -0
  29. checkpoint-1250/scheduler.pt +3 -0
  30. checkpoint-1250/special_tokens_map.json +24 -0
  31. checkpoint-1250/tokenizer.json +0 -0
  32. checkpoint-1250/tokenizer_config.json +48 -0
  33. checkpoint-1250/trainer_state.json +0 -0
  34. checkpoint-1250/training_args.bin +3 -0
  35. checkpoint-1500/config.json +30 -0
  36. checkpoint-1500/generation_config.json +6 -0
  37. checkpoint-1500/model.safetensors +3 -0
  38. checkpoint-1500/optimizer.pt +3 -0
  39. checkpoint-1500/rng_state.pth +3 -0
  40. checkpoint-1500/scheduler.pt +3 -0
  41. checkpoint-1500/special_tokens_map.json +24 -0
  42. checkpoint-1500/tokenizer.json +0 -0
  43. checkpoint-1500/tokenizer_config.json +48 -0
  44. checkpoint-1500/trainer_state.json +0 -0
  45. checkpoint-1500/training_args.bin +3 -0
  46. checkpoint-200/config.json +30 -0
  47. checkpoint-200/generation_config.json +6 -0
  48. checkpoint-200/model.safetensors +3 -0
  49. checkpoint-200/optimizer.pt +3 -0
  50. checkpoint-200/rng_state.pth +3 -0
README.md ADDED
@@ -0,0 +1,58 @@
+ # Dante-Zero Fine-tuned Model
+
+ This model was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method, to generate Dante-style poetry in endecasillabi (11-syllable lines).
+
+ ## Model Details
+
+ - **Base Model:** PleIAs/Pleias-350m-Preview
+ - **Training Method:** GRPO (Group Relative Policy Optimization)
+ - **Training Data:** 1,000 chunks from Dante's Divine Comedy
+ - **Epochs:** 10
+ - **Trained By:** ruggsea
+ - **Date:** 2025-03-05
+
+ ## Model Description
+
+ This model is specialized in generating Italian poetry in the style of Dante Alighieri's Divine Comedy. It has been trained to:
+
+ 1. Generate proper endecasillabi (11-syllable lines)
+ 2. Follow the structure of Dante's poetry
+ 3. Avoid repetition
+ 4. Create original content rather than reproducing the Divine Comedy
+
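The endecasillabo requirement can be approximated in code. Below is a minimal sketch of a naive syllable counter that simply counts contiguous vowel groups; it is an illustrative helper, not the repository's actual reward implementation, and real scansion also has to handle synalepha, dieresis, and the position of the final stress.

```python
import re

# Naive Italian syllable count: one syllable per contiguous vowel group.
# Illustrative only; real endecasillabo scansion also accounts for
# synalepha across word boundaries, dieresis, and final stress position.
VOWELS = "aeiouàèéìíòóùú"

def count_syllables_it(line: str) -> int:
    return len(re.findall(f"[{VOWELS}]+", line.lower()))

print(count_syllables_it("Nel mezzo del cammin di nostra vita"))  # 11 vowel groups
```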
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained("ruggsea/dante-zero-2025-03-05")
+ tokenizer = AutoTokenizer.from_pretrained("ruggsea/dante-zero-2025-03-05")
+
+ # Generate poetry
+ prompt = "Nel mezzo del cammin di nostra vita"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(
+     inputs.input_ids,
+     max_new_tokens=200,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9,
+     repetition_penalty=1.2
+ )
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
+ ## Reward Functions
+
+ The model was trained using several reward functions:
+
+ 1. **Endecasillabo Checker:** Rewards proper 11-syllable lines
+ 2. **Plagiarism Checker:** Penalizes copying from the Divine Comedy
+ 3. **Verse Structure Checker:** Encourages verse-like structure
+ 4. **Repetition Penalty:** Discourages repetitive patterns
+
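GRPO combines per-completion rewards like these and, instead of using a learned value function, standardizes each completion's total reward against the other completions sampled for the same prompt. A minimal sketch of that group-relative normalization (illustrative; the function and variable names are not from the actual training code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # GRPO advantage: each completion's reward standardized within the
    # group of completions sampled for one prompt (no critic network).
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mu) / sigma for r in rewards]

# Total reward per completion = sum of the four reward functions,
# mirroring the components logged in trainer_state.json.
group = [
    0.09375 + 0.946 + 0.0 - 0.058,   # verse + no_repetition + plagiarism + endecasillabo
    0.0 + 0.962 + 0.0 + 0.005,
    0.03125 + 0.958 + 0.0 - 0.0625,
]
advs = group_relative_advantages(group)
print([round(a, 3) for a in advs])
```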
+ ## License
+
+ This model is available under the same license as the base model (PleIAs/Pleias-350m-Preview).
checkpoint-100/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_name_or_path": "PleIAs/Pleias-350m-Preview",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "head_dim": 64,
+   "hidden_act": "silu",
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 2560,
+   "max_position_embeddings": 2048,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 26,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000,
+   "tie_word_embeddings": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.49.0",
+   "use_cache": true,
+   "vocab_size": 65536
+ }
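A few relationships in this config can be sanity-checked directly: `head_dim` times `num_attention_heads` must tile `hidden_size`, and the 16 query heads share 8 key/value heads (grouped-query attention). A quick check using values copied from the JSON above:

```python
# Values copied from checkpoint-100/config.json above.
config = {
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 64,
    "intermediate_size": 2560,
    "num_hidden_layers": 26,
    "vocab_size": 65536,
}

# Per-head dimension must tile the hidden size exactly.
assert config["num_attention_heads"] * config["head_dim"] == config["hidden_size"]

# Grouped-query attention: number of query heads sharing each KV head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]
print(queries_per_kv_head)  # 2
```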
checkpoint-100/generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.49.0"
+ }
checkpoint-100/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c99a7f8c77cc79ed15835969ef60ae7b3c2cbbdab4139110f198edbad56c705
+ size 706875632
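The three lines above are a Git LFS pointer file, not the weights themselves: the repository stores only the object hash and size, while the ~707 MB safetensors blob lives in LFS storage. A small parser for the pointer format shown (an illustrative helper, not part of the repository):

```python
def parse_lfs_pointer(text: str) -> dict:
    # Git LFS pointer files are simple "key value" lines:
    # version, oid (algorithm:hash), and size in bytes.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1c99a7f8c77cc79ed15835969ef60ae7b3c2cbbdab4139110f198edbad56c705
size 706875632"""

info = parse_lfs_pointer(pointer)
print(f"{info['size_bytes'] / 1e6:.1f} MB, {info['hash_algorithm']}")
```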
checkpoint-100/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77edf2d5a4084c047bb211b08dc91f5697a9d6cb4e2f43fcb5d239222ed9228e
+ size 1413896442
checkpoint-100/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c1bb900e9a2b6ab4fe70d2528f090a3254a977221d0c269a3033bf92d7e4cd5a
+ size 14180
checkpoint-100/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b1b34a0d732d301e0162f57e8852beddd92961cfdf875b064b06f7ef56cc1a2e
+ size 1064
checkpoint-100/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|end_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|end_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|end_of_text|>",
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-100/tokenizer.json ADDED
The diff for this file is too large to render.
checkpoint-100/tokenizer_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<|begin_of_text|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<|end_of_text|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|end_of_text|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|end_of_text|>",
+   "extra_special_tokens": {},
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<|end_of_text|>",
+   "padding_side": "left",
+   "return_token_type_ids": false,
+   "tokenizer_class": "PreTrainedTokenizer",
+   "unk_token": "[UNK]",
+   "use_token_type_ids": false,
+   "vocab_size": 65536
+ }
checkpoint-100/trainer_state.json ADDED
@@ -0,0 +1,1533 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 0.4,
+   "eval_steps": 500,
+   "global_step": 100,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "completion_length": 179.5625,
+       "epoch": 0.004,
+       "grad_norm": 2.046875,
+       "kl": 0.0,
+       "learning_rate": 2.0000000000000003e-06,
+       "loss": 0.0,
+       "reward": 0.9819051176309586,
+       "reward_std": 0.2500194739550352,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.05803571408614516,
+       "rewards/no_repetition_reward_func": 0.946190819144249,
+       "rewards/verse_reward_func": 0.09375,
+       "step": 1
+     },
+     {
+       "completion_length": 183.1875,
+       "epoch": 0.008,
+       "grad_norm": 1.1875,
+       "kl": 0.0,
+       "learning_rate": 4.000000000000001e-06,
+       "loss": -0.0,
+       "reward": 1.0611461400985718,
+       "reward_std": 0.16349453944712877,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0052083334885537624,
+       "rewards/no_repetition_reward_func": 0.962187796831131,
+       "rewards/verse_reward_func": 0.09375,
+       "step": 2
+     },
+     {
+       "completion_length": 177.3125,
+       "epoch": 0.012,
+       "grad_norm": 3.34375,
+       "kl": 0.0009225639951182529,
+       "learning_rate": 6e-06,
+       "loss": 0.0,
+       "reward": 0.927303671836853,
+       "reward_std": 0.20073290541768074,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.958553671836853,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 3
+     },
+     {
+       "completion_length": 164.375,
+       "epoch": 0.016,
+       "grad_norm": 1.3515625,
+       "kl": 0.0010245055600535125,
+       "learning_rate": 8.000000000000001e-06,
+       "loss": 0.0,
+       "reward": 0.9359359890222549,
+       "reward_std": 0.16188888170290738,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9671860039234161,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 4
+     },
+     {
+       "completion_length": 181.9375,
+       "epoch": 0.02,
+       "grad_norm": 1.2109375,
+       "kl": 0.000991553533822298,
+       "learning_rate": 1e-05,
+       "loss": 0.0,
+       "reward": 1.0427572429180145,
+       "reward_std": 0.14885016926564276,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.010416666977107525,
+       "rewards/no_repetition_reward_func": 0.9698406159877777,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 5
+     },
+     {
+       "completion_length": 199.3125,
+       "epoch": 0.024,
+       "grad_norm": 0.78125,
+       "kl": 0.0008464615239063278,
+       "learning_rate": 1.2e-05,
+       "loss": 0.0,
+       "reward": 0.897506594657898,
+       "reward_std": 0.0972918642219156,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.897506594657898,
+       "rewards/verse_reward_func": 0.0,
+       "step": 6
+     },
+     {
+       "completion_length": 179.0625,
+       "epoch": 0.028,
+       "grad_norm": 0.95703125,
+       "kl": 0.0009047269122675061,
+       "learning_rate": 1.4000000000000001e-05,
+       "loss": 0.0,
+       "reward": 0.9670905768871307,
+       "reward_std": 0.11869156261673197,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9358405917882919,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 7
+     },
+     {
+       "completion_length": 200.0,
+       "epoch": 0.032,
+       "grad_norm": 0.82421875,
+       "kl": 0.0009274971816921607,
+       "learning_rate": 1.6000000000000003e-05,
+       "loss": 0.0,
+       "reward": 1.0096274018287659,
+       "reward_std": 0.1000128339510411,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.003289473708719015,
+       "rewards/no_repetition_reward_func": 0.9438379555940628,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 8
+     },
+     {
+       "completion_length": 191.4375,
+       "epoch": 0.036,
+       "grad_norm": 0.9609375,
+       "kl": 0.0009898433345369995,
+       "learning_rate": 1.8e-05,
+       "loss": 0.0,
+       "reward": 1.0010679364204407,
+       "reward_std": 0.0678855258738622,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9698179215192795,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 9
+     },
+     {
+       "completion_length": 179.625,
+       "epoch": 0.04,
+       "grad_norm": 1.296875,
+       "kl": 0.0010007202363340184,
+       "learning_rate": 2e-05,
+       "loss": 0.0,
+       "reward": 1.006456971168518,
+       "reward_std": 0.1597397131845355,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9439570009708405,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 10
+     },
+     {
+       "completion_length": 183.375,
+       "epoch": 0.044,
+       "grad_norm": 1.75,
+       "kl": 0.0008183487225323915,
+       "learning_rate": 2.2000000000000003e-05,
+       "loss": 0.0,
+       "reward": 0.9860096573829651,
+       "reward_std": 0.23296335770282894,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9235096573829651,
+       "rewards/verse_reward_func": 0.125,
+       "step": 11
+     },
+     {
+       "completion_length": 188.8125,
+       "epoch": 0.048,
+       "grad_norm": 2.109375,
+       "kl": 0.0009898586868075654,
+       "learning_rate": 2.4e-05,
+       "loss": 0.0,
+       "reward": 0.9310034066438675,
+       "reward_std": 0.2854597745463252,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9310034215450287,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 12
+     },
+     {
+       "completion_length": 182.0,
+       "epoch": 0.052,
+       "grad_norm": 0.9453125,
+       "kl": 0.000811553152743727,
+       "learning_rate": 2.6000000000000002e-05,
+       "loss": 0.0,
+       "reward": 1.0062111169099808,
+       "reward_std": 0.17000664526131004,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9437110871076584,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 13
+     },
+     {
+       "completion_length": 178.5,
+       "epoch": 0.056,
+       "grad_norm": 1.3359375,
+       "kl": 0.0009504656482022256,
+       "learning_rate": 2.8000000000000003e-05,
+       "loss": 0.0,
+       "reward": 0.9895833879709244,
+       "reward_std": 0.07891088706674054,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9583334028720856,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 14
+     },
+     {
+       "completion_length": 192.9375,
+       "epoch": 0.06,
+       "grad_norm": 0.98828125,
+       "kl": 0.0009447222109884024,
+       "learning_rate": 3e-05,
+       "loss": 0.0,
+       "reward": 0.8987747207283974,
+       "reward_std": 0.2582971692318097,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.125,
+       "rewards/no_repetition_reward_func": 0.9612747132778168,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 15
+     },
+     {
+       "completion_length": 186.9375,
+       "epoch": 0.064,
+       "grad_norm": 1.0,
+       "kl": 0.0009807306778384373,
+       "learning_rate": 3.2000000000000005e-05,
+       "loss": 0.0,
+       "reward": 1.0321685820817947,
+       "reward_std": 0.12863060203380883,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9696685969829559,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 16
+     },
+     {
+       "completion_length": 155.5625,
+       "epoch": 0.068,
+       "grad_norm": 1.21875,
+       "kl": 0.0010486226383363828,
+       "learning_rate": 3.4000000000000007e-05,
+       "loss": 0.0,
+       "reward": 0.9713962525129318,
+       "reward_std": 0.24358075205236673,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.125,
+       "rewards/no_repetition_reward_func": 0.9713962525129318,
+       "rewards/verse_reward_func": 0.125,
+       "step": 17
+     },
+     {
+       "completion_length": 187.5625,
+       "epoch": 0.072,
+       "grad_norm": 0.91015625,
+       "kl": 0.0009779602842172608,
+       "learning_rate": 3.6e-05,
+       "loss": 0.0,
+       "reward": 1.0452248454093933,
+       "reward_std": 0.216551274061203,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.004464285913854837,
+       "rewards/no_repetition_reward_func": 0.9470105767250061,
+       "rewards/verse_reward_func": 0.09375,
+       "step": 18
+     },
+     {
+       "completion_length": 185.5625,
+       "epoch": 0.076,
+       "grad_norm": 1.2890625,
+       "kl": 0.001001911296043545,
+       "learning_rate": 3.8e-05,
+       "loss": 0.0,
+       "reward": 0.9728774726390839,
+       "reward_std": 0.09631168603664264,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9416275024414062,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 19
+     },
+     {
+       "completion_length": 200.0,
+       "epoch": 0.08,
+       "grad_norm": 0.8125,
+       "kl": 0.0009419274720130488,
+       "learning_rate": 4e-05,
+       "loss": 0.0,
+       "reward": 0.9773482233285904,
+       "reward_std": 0.07271566009148955,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.004464285913854837,
+       "rewards/no_repetition_reward_func": 0.9416339844465256,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 20
+     },
+     {
+       "completion_length": 196.3125,
+       "epoch": 0.084,
+       "grad_norm": 0.90625,
+       "kl": 0.0009593678259989247,
+       "learning_rate": 4.2e-05,
+       "loss": 0.0,
+       "reward": 0.9732275754213333,
+       "reward_std": 0.10389877262059599,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9419775754213333,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 21
+     },
+     {
+       "completion_length": 186.625,
+       "epoch": 0.088,
+       "grad_norm": 1.546875,
+       "kl": 0.0011393697932362556,
+       "learning_rate": 4.4000000000000006e-05,
+       "loss": 0.0,
+       "reward": 1.0079743266105652,
+       "reward_std": 0.07848087884485722,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.007708333432674408,
+       "rewards/no_repetition_reward_func": 0.9690160155296326,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 22
+     },
+     {
+       "completion_length": 199.0625,
+       "epoch": 0.092,
+       "grad_norm": 0.9609375,
+       "kl": 0.0011462626862339675,
+       "learning_rate": 4.600000000000001e-05,
+       "loss": 0.0,
+       "reward": 0.9748821258544922,
+       "reward_std": 0.020194193988572806,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.007352941203862429,
+       "rewards/no_repetition_reward_func": 0.9675291776657104,
+       "rewards/verse_reward_func": 0.0,
+       "step": 23
+     },
+     {
+       "completion_length": 175.0,
+       "epoch": 0.096,
+       "grad_norm": 1.5859375,
+       "kl": 0.0012009456986561418,
+       "learning_rate": 4.8e-05,
+       "loss": 0.0,
+       "reward": 0.9423503875732422,
+       "reward_std": 0.1669474468799308,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.05859375,
+       "rewards/no_repetition_reward_func": 0.9696941375732422,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 24
+     },
+     {
+       "completion_length": 164.0,
+       "epoch": 0.1,
+       "grad_norm": 1.1171875,
+       "kl": 0.0012017716944683343,
+       "learning_rate": 5e-05,
+       "loss": 0.0,
+       "reward": 0.998960480093956,
+       "reward_std": 0.05968155374284834,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.967710480093956,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 25
+     },
+     {
+       "completion_length": 186.1875,
+       "epoch": 0.104,
+       "grad_norm": 1.25,
+       "kl": 0.001386465271934867,
+       "learning_rate": 4.999756310023261e-05,
+       "loss": 0.0001,
+       "reward": 1.024965062737465,
+       "reward_std": 0.12524886144092306,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9624650627374649,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 26
+     },
+     {
+       "completion_length": 199.6875,
+       "epoch": 0.108,
+       "grad_norm": 0.890625,
+       "kl": 0.0012058749562129378,
+       "learning_rate": 4.999025287600886e-05,
+       "loss": 0.0,
+       "reward": 0.9964274168014526,
+       "reward_std": 0.26108699198812246,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9651773869991302,
+       "rewards/verse_reward_func": 0.09375,
+       "step": 27
+     },
+     {
+       "completion_length": 179.75,
+       "epoch": 0.112,
+       "grad_norm": 1.0703125,
+       "kl": 0.00140558643033728,
+       "learning_rate": 4.997807075247146e-05,
+       "loss": 0.0001,
+       "reward": 0.9315525591373444,
+       "reward_std": 0.1886110061313957,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9628025740385056,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 28
+     },
+     {
+       "completion_length": 163.5,
+       "epoch": 0.116,
+       "grad_norm": 1.6171875,
+       "kl": 0.00141397793777287,
+       "learning_rate": 4.996101910454953e-05,
+       "loss": 0.0001,
+       "reward": 0.9228900671005249,
+       "reward_std": 0.08080775220878422,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9541400671005249,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 29
+     },
+     {
+       "completion_length": 185.0,
+       "epoch": 0.12,
+       "grad_norm": 0.921875,
+       "kl": 0.0018486627377569675,
+       "learning_rate": 4.993910125649561e-05,
+       "loss": 0.0001,
+       "reward": 0.9309101402759552,
+       "reward_std": 0.07874627423007041,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9309101402759552,
+       "rewards/verse_reward_func": 0.0,
+       "step": 30
+     },
+     {
+       "completion_length": 181.6875,
+       "epoch": 0.124,
+       "grad_norm": 1.265625,
+       "kl": 0.0024124052142724395,
+       "learning_rate": 4.991232148123761e-05,
+       "loss": 0.0001,
+       "reward": 0.9530736654996872,
+       "reward_std": 0.2654994917102158,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9530736953020096,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 31
+     },
+     {
+       "completion_length": 193.25,
+       "epoch": 0.128,
+       "grad_norm": 1.0234375,
+       "kl": 0.002259129600133747,
+       "learning_rate": 4.988068499954578e-05,
+       "loss": 0.0001,
+       "reward": 1.0645287036895752,
+       "reward_std": 0.22907709190621972,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9395287036895752,
+       "rewards/verse_reward_func": 0.125,
+       "step": 32
+     },
+     {
+       "completion_length": 200.0,
+       "epoch": 0.132,
+       "grad_norm": 0.875,
+       "kl": 0.002345684450119734,
+       "learning_rate": 4.984419797901491e-05,
+       "loss": 0.0001,
+       "reward": 0.9817457795143127,
+       "reward_std": 0.05011130444472656,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.004166666883975267,
+       "rewards/no_repetition_reward_func": 0.9463291019201279,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 33
+     },
+     {
+       "completion_length": 171.875,
+       "epoch": 0.136,
+       "grad_norm": 1.1484375,
+       "kl": 0.0031502785277552903,
+       "learning_rate": 4.980286753286195e-05,
+       "loss": 0.0001,
+       "reward": 1.0629110634326935,
+       "reward_std": 0.1909176445333287,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9691610336303711,
+       "rewards/verse_reward_func": 0.09375,
+       "step": 34
+     },
+     {
+       "completion_length": 192.5,
+       "epoch": 0.14,
+       "grad_norm": 1.0390625,
+       "kl": 0.0024105910561047494,
+       "learning_rate": 4.975670171853926e-05,
+       "loss": 0.0001,
+       "reward": 1.0318033248186111,
+       "reward_std": 0.12928975140675902,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9693033397197723,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 35
+     },
+     {
+       "completion_length": 168.8125,
+       "epoch": 0.144,
+       "grad_norm": 1.4765625,
+       "kl": 0.003135324106551707,
+       "learning_rate": 4.9705709536163824e-05,
+       "loss": 0.0001,
+       "reward": 0.9991441071033478,
+       "reward_std": 0.07299526745919138,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.967894122004509,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 36
+     },
+     {
+       "completion_length": 176.75,
+       "epoch": 0.148,
+       "grad_norm": 1.0,
+       "kl": 0.003276034549344331,
+       "learning_rate": 4.964990092676263e-05,
+       "loss": 0.0001,
+       "reward": 1.008391559123993,
+       "reward_std": 0.07070710451807827,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.01131535996682942,
+       "rewards/no_repetition_reward_func": 0.9658262133598328,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 37
+     },
+     {
+       "completion_length": 186.25,
+       "epoch": 0.152,
+       "grad_norm": 1.0390625,
+       "kl": 0.0030353819020092487,
+       "learning_rate": 4.9589286770334654e-05,
+       "loss": 0.0001,
+       "reward": 0.972317710518837,
+       "reward_std": 0.016954098246060312,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.004166666883975267,
+       "rewards/no_repetition_reward_func": 0.9681510329246521,
+       "rewards/verse_reward_func": 0.0,
+       "step": 38
+     },
+     {
+       "completion_length": 200.0,
+       "epoch": 0.156,
+       "grad_norm": 0.8828125,
+       "kl": 0.0033272147993557155,
+       "learning_rate": 4.952387888372979e-05,
+       "loss": 0.0001,
+       "reward": 1.0669118165969849,
+       "reward_std": 0.1348436245461926,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9419118165969849,
+       "rewards/verse_reward_func": 0.125,
+       "step": 39
+     },
+     {
+       "completion_length": 173.625,
+       "epoch": 0.16,
+       "grad_norm": 1.84375,
+       "kl": 0.003484633460175246,
+       "learning_rate": 4.9453690018345144e-05,
+       "loss": 0.0001,
+       "reward": 0.999244287610054,
+       "reward_std": 0.18777411000337452,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": -0.0625,
+       "rewards/no_repetition_reward_func": 0.9679943025112152,
+       "rewards/verse_reward_func": 0.09375,
+       "step": 40
+     },
+     {
+       "completion_length": 179.6875,
+       "epoch": 0.164,
+       "grad_norm": 1.140625,
+       "kl": 0.004894184530712664,
+       "learning_rate": 4.937873385763908e-05,
+       "loss": 0.0002,
+       "reward": 1.0016028583049774,
+       "reward_std": 0.07005300477612764,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9703528732061386,
+       "rewards/verse_reward_func": 0.03125,
+       "step": 41
+     },
+     {
+       "completion_length": 190.625,
+       "epoch": 0.168,
+       "grad_norm": 1.03125,
+       "kl": 0.0029558211099356413,
+       "learning_rate": 4.929902501446366e-05,
+       "loss": 0.0001,
+       "reward": 1.0353728234767914,
+       "reward_std": 0.1293391860090196,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.004166666883975267,
+       "rewards/no_repetition_reward_func": 0.9687061756849289,
+       "rewards/verse_reward_func": 0.0625,
+       "step": 42
+     },
+     {
+       "completion_length": 187.6875,
+       "epoch": 0.172,
+       "grad_norm": 1.1484375,
+       "kl": 0.0038812385755591094,
+       "learning_rate": 4.9214579028215776e-05,
+       "loss": 0.0002,
+       "reward": 0.9657279551029205,
+       "reward_std": 0.015553490375168622,
+       "rewards/check_divine_comedy_plagiarism": 0.0,
+       "rewards/endecasillabo_reward_func": 0.0,
+       "rewards/no_repetition_reward_func": 0.9657279700040817,
+       "rewards/verse_reward_func": 0.0,
+       "step": 43
+     },
+     {
+       "completion_length": 190.25,
+       "epoch": 0.176,
+       "grad_norm": 0.83203125,
+       "kl": 0.0032730000093579292,
+       "learning_rate": 4.912541236180779e-05,
+       "loss": 0.0001,
+       "reward": 0.8466661870479584,
+       "reward_std": 0.20375583856366575,
665
+ "rewards/check_divine_comedy_plagiarism": 0.0,
666
+ "rewards/endecasillabo_reward_func": -0.0625,
667
+ "rewards/no_repetition_reward_func": 0.9091661721467972,
668
+ "rewards/verse_reward_func": 0.0,
669
+ "step": 44
670
+ },
671
+ {
672
+ "completion_length": 189.4375,
673
+ "epoch": 0.18,
674
+ "grad_norm": 1.03125,
675
+ "kl": 0.004002011672127992,
676
+ "learning_rate": 4.9031542398457974e-05,
677
+ "loss": 0.0002,
678
+ "reward": 1.0241023004055023,
679
+ "reward_std": 0.2693894528783858,
680
+ "rewards/check_divine_comedy_plagiarism": 0.0,
681
+ "rewards/endecasillabo_reward_func": -0.05833333311602473,
682
+ "rewards/no_repetition_reward_func": 0.9574356377124786,
683
+ "rewards/verse_reward_func": 0.125,
684
+ "step": 45
685
+ },
686
+ {
687
+ "completion_length": 200.0,
688
+ "epoch": 0.184,
689
+ "grad_norm": 1.109375,
690
+ "kl": 0.0036516414838843048,
691
+ "learning_rate": 4.893298743830168e-05,
692
+ "loss": 0.0001,
693
+ "reward": 1.031081184744835,
694
+ "reward_std": 0.12839890411123633,
695
+ "rewards/check_divine_comedy_plagiarism": 0.0,
696
+ "rewards/endecasillabo_reward_func": 0.0,
697
+ "rewards/no_repetition_reward_func": 0.9685812145471573,
698
+ "rewards/verse_reward_func": 0.0625,
699
+ "step": 46
700
+ },
701
+ {
702
+ "completion_length": 184.0625,
703
+ "epoch": 0.188,
704
+ "grad_norm": 1.28125,
705
+ "kl": 0.004063341184519231,
706
+ "learning_rate": 4.882976669482367e-05,
707
+ "loss": 0.0002,
708
+ "reward": 1.010192185640335,
709
+ "reward_std": 0.06739407801069319,
710
+ "rewards/check_divine_comedy_plagiarism": 0.0,
711
+ "rewards/endecasillabo_reward_func": 0.008928571827709675,
712
+ "rewards/no_repetition_reward_func": 0.9700136184692383,
713
+ "rewards/verse_reward_func": 0.03125,
714
+ "step": 47
715
+ },
716
+ {
717
+ "completion_length": 198.125,
718
+ "epoch": 0.192,
719
+ "grad_norm": 0.83203125,
720
+ "kl": 0.003968458739109337,
721
+ "learning_rate": 4.8721900291112415e-05,
722
+ "loss": 0.0002,
723
+ "reward": 0.9995202720165253,
724
+ "reward_std": 0.07142159587237984,
725
+ "rewards/check_divine_comedy_plagiarism": 0.0,
726
+ "rewards/endecasillabo_reward_func": 0.0024999999441206455,
727
+ "rewards/no_repetition_reward_func": 0.9657702594995499,
728
+ "rewards/verse_reward_func": 0.03125,
729
+ "step": 48
730
+ },
731
+ {
732
+ "completion_length": 186.8125,
733
+ "epoch": 0.196,
734
+ "grad_norm": 0.875,
735
+ "kl": 0.004230177321005613,
736
+ "learning_rate": 4.860940925593703e-05,
737
+ "loss": 0.0002,
738
+ "reward": 0.9467289745807648,
739
+ "reward_std": 0.05237383279018104,
740
+ "rewards/check_divine_comedy_plagiarism": 0.0,
741
+ "rewards/endecasillabo_reward_func": 0.0,
742
+ "rewards/no_repetition_reward_func": 0.9467289596796036,
743
+ "rewards/verse_reward_func": 0.0,
744
+ "step": 49
745
+ },
746
+ {
747
+ "completion_length": 172.25,
748
+ "epoch": 0.2,
749
+ "grad_norm": 1.0546875,
750
+ "kl": 0.007191646727733314,
751
+ "learning_rate": 4.849231551964771e-05,
752
+ "loss": 0.0003,
753
+ "reward": 1.1267552971839905,
754
+ "reward_std": 0.3819158934056759,
755
+ "rewards/check_divine_comedy_plagiarism": 0.0,
756
+ "rewards/endecasillabo_reward_func": -0.0625,
757
+ "rewards/no_repetition_reward_func": 0.9705053418874741,
758
+ "rewards/verse_reward_func": 0.21875,
759
+ "step": 50
760
+ },
761
+ {
762
+ "completion_length": 166.9375,
763
+ "epoch": 0.204,
764
+ "grad_norm": 1.40625,
765
+ "kl": 0.006566129217389971,
766
+ "learning_rate": 4.837064190990036e-05,
767
+ "loss": 0.0003,
768
+ "reward": 0.8921804428100586,
769
+ "reward_std": 0.1621550468262285,
770
+ "rewards/check_divine_comedy_plagiarism": 0.0,
771
+ "rewards/endecasillabo_reward_func": -0.055921051651239395,
772
+ "rewards/no_repetition_reward_func": 0.9481014907360077,
773
+ "rewards/verse_reward_func": 0.0,
774
+ "step": 51
775
+ },
776
+ {
777
+ "completion_length": 199.3125,
778
+ "epoch": 0.208,
779
+ "grad_norm": 0.8359375,
780
+ "kl": 0.006002974580042064,
781
+ "learning_rate": 4.8244412147206284e-05,
782
+ "loss": 0.0002,
783
+ "reward": 1.02969092130661,
784
+ "reward_std": 0.12929325131699443,
785
+ "rewards/check_divine_comedy_plagiarism": 0.0,
786
+ "rewards/endecasillabo_reward_func": 0.0,
787
+ "rewards/no_repetition_reward_func": 0.9671909362077713,
788
+ "rewards/verse_reward_func": 0.0625,
789
+ "step": 52
790
+ },
791
+ {
792
+ "completion_length": 158.75,
793
+ "epoch": 0.212,
794
+ "grad_norm": 1.75,
795
+ "kl": 0.006504544056952,
796
+ "learning_rate": 4.8113650840307834e-05,
797
+ "loss": 0.0003,
798
+ "reward": 0.9824613779783249,
799
+ "reward_std": 0.16255538212135434,
800
+ "rewards/check_divine_comedy_plagiarism": 0.0,
801
+ "rewards/endecasillabo_reward_func": -0.058823529398068786,
802
+ "rewards/no_repetition_reward_func": 0.9162849336862564,
803
+ "rewards/verse_reward_func": 0.125,
804
+ "step": 53
805
+ },
806
+ {
807
+ "completion_length": 177.375,
808
+ "epoch": 0.216,
809
+ "grad_norm": 1.171875,
810
+ "kl": 0.005495292483828962,
811
+ "learning_rate": 4.797838348138086e-05,
812
+ "loss": 0.0002,
813
+ "reward": 1.1409614980220795,
814
+ "reward_std": 0.25640933960676193,
815
+ "rewards/check_divine_comedy_plagiarism": 0.0,
816
+ "rewards/endecasillabo_reward_func": 0.0,
817
+ "rewards/no_repetition_reward_func": 0.9534614980220795,
818
+ "rewards/verse_reward_func": 0.1875,
819
+ "step": 54
820
+ },
821
+ {
822
+ "completion_length": 200.0,
823
+ "epoch": 0.22,
824
+ "grad_norm": 0.9375,
825
+ "kl": 0.005827408516779542,
826
+ "learning_rate": 4.783863644106502e-05,
827
+ "loss": 0.0002,
828
+ "reward": 0.9815671294927597,
829
+ "reward_std": 0.08528327068779618,
830
+ "rewards/check_divine_comedy_plagiarism": 0.0,
831
+ "rewards/endecasillabo_reward_func": 0.0,
832
+ "rewards/no_repetition_reward_func": 0.9503171741962433,
833
+ "rewards/verse_reward_func": 0.03125,
834
+ "step": 55
835
+ },
836
+ {
837
+ "completion_length": 161.75,
838
+ "epoch": 0.224,
839
+ "grad_norm": 2.875,
840
+ "kl": 0.0055718234507367015,
841
+ "learning_rate": 4.769443696332272e-05,
842
+ "loss": 0.0002,
843
+ "reward": 0.9660014510154724,
844
+ "reward_std": 0.19446178257931024,
845
+ "rewards/check_divine_comedy_plagiarism": 0.0,
846
+ "rewards/endecasillabo_reward_func": -0.0625,
847
+ "rewards/no_repetition_reward_func": 0.9660014659166336,
848
+ "rewards/verse_reward_func": 0.0625,
849
+ "step": 56
850
+ },
851
+ {
852
+ "completion_length": 168.1875,
853
+ "epoch": 0.228,
854
+ "grad_norm": 1.5625,
855
+ "kl": 0.007422439637593925,
856
+ "learning_rate": 4.754581316012785e-05,
857
+ "loss": 0.0003,
858
+ "reward": 0.8729029893875122,
859
+ "reward_std": 0.31897079292684793,
860
+ "rewards/check_divine_comedy_plagiarism": 0.0,
861
+ "rewards/endecasillabo_reward_func": -0.125,
862
+ "rewards/no_repetition_reward_func": 0.966652974486351,
863
+ "rewards/verse_reward_func": 0.03125,
864
+ "step": 57
865
+ },
866
+ {
867
+ "completion_length": 174.875,
868
+ "epoch": 0.232,
869
+ "grad_norm": 1.0625,
870
+ "kl": 0.005080131231807172,
871
+ "learning_rate": 4.7392794005985326e-05,
872
+ "loss": 0.0002,
873
+ "reward": 0.9698505848646164,
874
+ "reward_std": 0.0073038917616941035,
875
+ "rewards/check_divine_comedy_plagiarism": 0.0,
876
+ "rewards/endecasillabo_reward_func": 0.0,
877
+ "rewards/no_repetition_reward_func": 0.9698505699634552,
878
+ "rewards/verse_reward_func": 0.0,
879
+ "step": 58
880
+ },
881
+ {
882
+ "completion_length": 167.9375,
883
+ "epoch": 0.236,
884
+ "grad_norm": 1.65625,
885
+ "kl": 0.005669898469932377,
886
+ "learning_rate": 4.723540933228244e-05,
887
+ "loss": 0.0002,
888
+ "reward": 0.99271559715271,
889
+ "reward_std": 0.08150344673776999,
890
+ "rewards/check_divine_comedy_plagiarism": 0.0,
891
+ "rewards/endecasillabo_reward_func": 0.0,
892
+ "rewards/no_repetition_reward_func": 0.9614655822515488,
893
+ "rewards/verse_reward_func": 0.03125,
894
+ "step": 59
895
+ },
896
+ {
897
+ "completion_length": 191.0625,
898
+ "epoch": 0.24,
899
+ "grad_norm": 0.9375,
900
+ "kl": 0.007179695880040526,
901
+ "learning_rate": 4.707368982147318e-05,
902
+ "loss": 0.0003,
903
+ "reward": 1.0425692796707153,
904
+ "reward_std": 0.15895756683312356,
905
+ "rewards/check_divine_comedy_plagiarism": 0.0,
906
+ "rewards/endecasillabo_reward_func": 0.0,
907
+ "rewards/no_repetition_reward_func": 0.9488192647695541,
908
+ "rewards/verse_reward_func": 0.09375,
909
+ "step": 60
910
+ },
911
+ {
912
+ "completion_length": 178.8125,
913
+ "epoch": 0.244,
914
+ "grad_norm": 1.234375,
915
+ "kl": 0.007374793640337884,
916
+ "learning_rate": 4.690766700109659e-05,
917
+ "loss": 0.0003,
918
+ "reward": 1.0702029168605804,
919
+ "reward_std": 0.14750042068772018,
920
+ "rewards/check_divine_comedy_plagiarism": 0.0,
921
+ "rewards/endecasillabo_reward_func": 0.004166666883975267,
922
+ "rewards/no_repetition_reward_func": 0.972286269068718,
923
+ "rewards/verse_reward_func": 0.09375,
924
+ "step": 61
925
+ },
926
+ {
927
+ "completion_length": 177.5625,
928
+ "epoch": 0.248,
929
+ "grad_norm": 1.1328125,
930
+ "kl": 0.009947373066097498,
931
+ "learning_rate": 4.6737373237630476e-05,
932
+ "loss": 0.0004,
933
+ "reward": 1.010836973786354,
934
+ "reward_std": 0.18967258161865175,
935
+ "rewards/check_divine_comedy_plagiarism": 0.0,
936
+ "rewards/endecasillabo_reward_func": 0.0,
937
+ "rewards/no_repetition_reward_func": 0.9170869737863541,
938
+ "rewards/verse_reward_func": 0.09375,
939
+ "step": 62
940
+ },
941
+ {
942
+ "completion_length": 181.75,
943
+ "epoch": 0.252,
944
+ "grad_norm": 0.921875,
945
+ "kl": 0.00658240367192775,
946
+ "learning_rate": 4.656284173018144e-05,
947
+ "loss": 0.0003,
948
+ "reward": 1.0603050589561462,
949
+ "reward_std": 0.28348227217793465,
950
+ "rewards/check_divine_comedy_plagiarism": 0.0,
951
+ "rewards/endecasillabo_reward_func": 0.0,
952
+ "rewards/no_repetition_reward_func": 0.9353050738573074,
953
+ "rewards/verse_reward_func": 0.125,
954
+ "step": 63
955
+ },
956
+ {
957
+ "completion_length": 178.125,
958
+ "epoch": 0.256,
959
+ "grad_norm": 1.4921875,
960
+ "kl": 0.009526908048428595,
961
+ "learning_rate": 4.638410650401267e-05,
962
+ "loss": 0.0004,
963
+ "reward": 1.0653773248195648,
964
+ "reward_std": 0.1409751852042973,
965
+ "rewards/check_divine_comedy_plagiarism": 0.0,
966
+ "rewards/endecasillabo_reward_func": 0.0,
967
+ "rewards/no_repetition_reward_func": 0.971627339720726,
968
+ "rewards/verse_reward_func": 0.09375,
969
+ "step": 64
970
+ },
971
+ {
972
+ "completion_length": 193.0625,
973
+ "epoch": 0.26,
974
+ "grad_norm": 0.921875,
975
+ "kl": 0.007291483459994197,
976
+ "learning_rate": 4.620120240391065e-05,
977
+ "loss": 0.0003,
978
+ "reward": 0.992166668176651,
979
+ "reward_std": 0.08354807726573199,
980
+ "rewards/check_divine_comedy_plagiarism": 0.0,
981
+ "rewards/endecasillabo_reward_func": 0.0,
982
+ "rewards/no_repetition_reward_func": 0.9609166830778122,
983
+ "rewards/verse_reward_func": 0.03125,
984
+ "step": 65
985
+ },
986
+ {
987
+ "completion_length": 163.1875,
988
+ "epoch": 0.264,
989
+ "grad_norm": 2.4375,
990
+ "kl": 0.010100604966282845,
991
+ "learning_rate": 4.601416508739211e-05,
992
+ "loss": 0.0004,
993
+ "reward": 0.9094432145357132,
994
+ "reward_std": 0.13521920214407146,
995
+ "rewards/check_divine_comedy_plagiarism": 0.0,
996
+ "rewards/endecasillabo_reward_func": -0.0625,
997
+ "rewards/no_repetition_reward_func": 0.940693199634552,
998
+ "rewards/verse_reward_func": 0.03125,
999
+ "step": 66
1000
+ },
1001
+ {
1002
+ "completion_length": 186.25,
1003
+ "epoch": 0.268,
1004
+ "grad_norm": 0.94140625,
1005
+ "kl": 0.006481092656031251,
1006
+ "learning_rate": 4.5823031017752485e-05,
1007
+ "loss": 0.0003,
1008
+ "reward": 0.9819363504648209,
1009
+ "reward_std": 0.09091703849844635,
1010
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1011
+ "rewards/endecasillabo_reward_func": 0.0,
1012
+ "rewards/no_repetition_reward_func": 0.9506863355636597,
1013
+ "rewards/verse_reward_func": 0.03125,
1014
+ "step": 67
1015
+ },
1016
+ {
1017
+ "completion_length": 175.8125,
1018
+ "epoch": 0.272,
1019
+ "grad_norm": 2.109375,
1020
+ "kl": 0.008945175679400563,
1021
+ "learning_rate": 4.562783745695738e-05,
1022
+ "loss": 0.0004,
1023
+ "reward": 0.9387106597423553,
1024
+ "reward_std": 0.16390067897737026,
1025
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1026
+ "rewards/endecasillabo_reward_func": -0.0625,
1027
+ "rewards/no_repetition_reward_func": 0.9699606597423553,
1028
+ "rewards/verse_reward_func": 0.03125,
1029
+ "step": 68
1030
+ },
1031
+ {
1032
+ "completion_length": 200.0,
1033
+ "epoch": 0.276,
1034
+ "grad_norm": 0.83984375,
1035
+ "kl": 0.0076279257191345096,
1036
+ "learning_rate": 4.542862245837821e-05,
1037
+ "loss": 0.0003,
1038
+ "reward": 0.9555844515562057,
1039
+ "reward_std": 0.1185589594533667,
1040
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1041
+ "rewards/endecasillabo_reward_func": 0.0,
1042
+ "rewards/no_repetition_reward_func": 0.9243344515562057,
1043
+ "rewards/verse_reward_func": 0.03125,
1044
+ "step": 69
1045
+ },
1046
+ {
1047
+ "completion_length": 190.375,
1048
+ "epoch": 0.28,
1049
+ "grad_norm": 1.0390625,
1050
+ "kl": 0.010221289936453104,
1051
+ "learning_rate": 4.522542485937369e-05,
1052
+ "loss": 0.0004,
1053
+ "reward": 1.0356830060482025,
1054
+ "reward_std": 0.14343186398036778,
1055
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1056
+ "rewards/endecasillabo_reward_func": 0.0052083334885537624,
1057
+ "rewards/no_repetition_reward_func": 0.9679746478796005,
1058
+ "rewards/verse_reward_func": 0.0625,
1059
+ "step": 70
1060
+ },
1061
+ {
1062
+ "completion_length": 183.6875,
1063
+ "epoch": 0.284,
1064
+ "grad_norm": 1.0703125,
1065
+ "kl": 0.009065072517842054,
1066
+ "learning_rate": 4.5018284273718336e-05,
1067
+ "loss": 0.0004,
1068
+ "reward": 0.9543251842260361,
1069
+ "reward_std": 0.019520931062288582,
1070
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1071
+ "rewards/endecasillabo_reward_func": 0.0,
1072
+ "rewards/no_repetition_reward_func": 0.9543252140283585,
1073
+ "rewards/verse_reward_func": 0.0,
1074
+ "step": 71
1075
+ },
1076
+ {
1077
+ "completion_length": 178.125,
1078
+ "epoch": 0.288,
1079
+ "grad_norm": 1.1875,
1080
+ "kl": 0.009216391015797853,
1081
+ "learning_rate": 4.480724108387977e-05,
1082
+ "loss": 0.0004,
1083
+ "reward": 0.9448710381984711,
1084
+ "reward_std": 0.046969235059805214,
1085
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1086
+ "rewards/endecasillabo_reward_func": 0.0,
1087
+ "rewards/no_repetition_reward_func": 0.9448710232973099,
1088
+ "rewards/verse_reward_func": 0.0,
1089
+ "step": 72
1090
+ },
1091
+ {
1092
+ "completion_length": 195.4375,
1093
+ "epoch": 0.292,
1094
+ "grad_norm": 0.91796875,
1095
+ "kl": 0.007251435425132513,
1096
+ "learning_rate": 4.4592336433146e-05,
1097
+ "loss": 0.0003,
1098
+ "reward": 1.004251092672348,
1099
+ "reward_std": 0.07232979987747967,
1100
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1101
+ "rewards/endecasillabo_reward_func": 0.003289473708719015,
1102
+ "rewards/no_repetition_reward_func": 0.9697116166353226,
1103
+ "rewards/verse_reward_func": 0.03125,
1104
+ "step": 73
1105
+ },
1106
+ {
1107
+ "completion_length": 200.0,
1108
+ "epoch": 0.296,
1109
+ "grad_norm": 1.0546875,
1110
+ "kl": 0.008636260172352195,
1111
+ "learning_rate": 4.4373612217604496e-05,
1112
+ "loss": 0.0003,
1113
+ "reward": 1.030202716588974,
1114
+ "reward_std": 0.13229261268861592,
1115
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1116
+ "rewards/endecasillabo_reward_func": 0.0,
1117
+ "rewards/no_repetition_reward_func": 0.9677027016878128,
1118
+ "rewards/verse_reward_func": 0.0625,
1119
+ "step": 74
1120
+ },
1121
+ {
1122
+ "completion_length": 188.8125,
1123
+ "epoch": 0.3,
1124
+ "grad_norm": 1.1640625,
1125
+ "kl": 0.008647361653856933,
1126
+ "learning_rate": 4.415111107797445e-05,
1127
+ "loss": 0.0003,
1128
+ "reward": 1.0327503681182861,
1129
+ "reward_std": 0.12821446859743446,
1130
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1131
+ "rewards/endecasillabo_reward_func": 0.0,
1132
+ "rewards/no_repetition_reward_func": 0.9702503681182861,
1133
+ "rewards/verse_reward_func": 0.0625,
1134
+ "step": 75
1135
+ },
1136
+ {
1137
+ "completion_length": 190.25,
1138
+ "epoch": 0.304,
1139
+ "grad_norm": 0.94140625,
1140
+ "kl": 0.010642669745720923,
1141
+ "learning_rate": 4.3924876391293915e-05,
1142
+ "loss": 0.0004,
1143
+ "reward": 0.9804545342922211,
1144
+ "reward_std": 0.02781949588097632,
1145
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1146
+ "rewards/endecasillabo_reward_func": 0.011174242943525314,
1147
+ "rewards/no_repetition_reward_func": 0.969280332326889,
1148
+ "rewards/verse_reward_func": 0.0,
1149
+ "step": 76
1150
+ },
1151
+ {
1152
+ "completion_length": 177.5,
1153
+ "epoch": 0.308,
1154
+ "grad_norm": 1.6640625,
1155
+ "kl": 0.009210785734467208,
1156
+ "learning_rate": 4.36949522624633e-05,
1157
+ "loss": 0.0004,
1158
+ "reward": 0.906844437122345,
1159
+ "reward_std": 0.12991982704261318,
1160
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1161
+ "rewards/endecasillabo_reward_func": -0.0625,
1162
+ "rewards/no_repetition_reward_func": 0.969344437122345,
1163
+ "rewards/verse_reward_func": 0.0,
1164
+ "step": 77
1165
+ },
1166
+ {
1167
+ "completion_length": 169.9375,
1168
+ "epoch": 0.312,
1169
+ "grad_norm": 1.4296875,
1170
+ "kl": 0.0085853380151093,
1171
+ "learning_rate": 4.3461383515647106e-05,
1172
+ "loss": 0.0003,
1173
+ "reward": 1.0604548156261444,
1174
+ "reward_std": 0.1915177572518587,
1175
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1176
+ "rewards/endecasillabo_reward_func": 0.0,
1177
+ "rewards/no_repetition_reward_func": 0.966704785823822,
1178
+ "rewards/verse_reward_func": 0.09375,
1179
+ "step": 78
1180
+ },
1181
+ {
1182
+ "completion_length": 161.625,
1183
+ "epoch": 0.316,
1184
+ "grad_norm": 1.40625,
1185
+ "kl": 0.008558343281038105,
1186
+ "learning_rate": 4.3224215685535294e-05,
1187
+ "loss": 0.0003,
1188
+ "reward": 0.9392005652189255,
1189
+ "reward_std": 0.19322428456507623,
1190
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1191
+ "rewards/endecasillabo_reward_func": -0.0625,
1192
+ "rewards/no_repetition_reward_func": 0.9704505503177643,
1193
+ "rewards/verse_reward_func": 0.03125,
1194
+ "step": 79
1195
+ },
1196
+ {
1197
+ "completion_length": 159.0625,
1198
+ "epoch": 0.32,
1199
+ "grad_norm": 2.21875,
1200
+ "kl": 0.009619904682040215,
1201
+ "learning_rate": 4.2983495008466276e-05,
1202
+ "loss": 0.0004,
1203
+ "reward": 0.9338277578353882,
1204
+ "reward_std": 0.17408786993473768,
1205
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1206
+ "rewards/endecasillabo_reward_func": -0.059027777751907706,
1207
+ "rewards/no_repetition_reward_func": 0.9616055637598038,
1208
+ "rewards/verse_reward_func": 0.03125,
1209
+ "step": 80
1210
+ },
1211
+ {
1212
+ "completion_length": 171.875,
1213
+ "epoch": 0.324,
1214
+ "grad_norm": 1.0703125,
1215
+ "kl": 0.008021688903681934,
1216
+ "learning_rate": 4.273926841341302e-05,
1217
+ "loss": 0.0003,
1218
+ "reward": 1.0092779248952866,
1219
+ "reward_std": 0.10741436225362122,
1220
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1221
+ "rewards/endecasillabo_reward_func": 0.0,
1222
+ "rewards/no_repetition_reward_func": 0.9467779695987701,
1223
+ "rewards/verse_reward_func": 0.0625,
1224
+ "step": 81
1225
+ },
1226
+ {
1227
+ "completion_length": 190.3125,
1228
+ "epoch": 0.328,
1229
+ "grad_norm": 0.953125,
1230
+ "kl": 0.008311162469908595,
1231
+ "learning_rate": 4.249158351283414e-05,
1232
+ "loss": 0.0003,
1233
+ "reward": 1.0095551759004593,
1234
+ "reward_std": 0.08130261686164886,
1235
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1236
+ "rewards/endecasillabo_reward_func": 0.0069444444961845875,
1237
+ "rewards/no_repetition_reward_func": 0.9713607430458069,
1238
+ "rewards/verse_reward_func": 0.03125,
1239
+ "step": 82
1240
+ },
1241
+ {
1242
+ "completion_length": 191.5,
1243
+ "epoch": 0.332,
1244
+ "grad_norm": 1.265625,
1245
+ "kl": 0.00910053146071732,
1246
+ "learning_rate": 4.224048859339175e-05,
1247
+ "loss": 0.0004,
1248
+ "reward": 1.019190862774849,
1249
+ "reward_std": 0.14867421332746744,
1250
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1251
+ "rewards/endecasillabo_reward_func": 0.0,
1252
+ "rewards/no_repetition_reward_func": 0.9566908329725266,
1253
+ "rewards/verse_reward_func": 0.0625,
1254
+ "step": 83
1255
+ },
1256
+ {
1257
+ "completion_length": 163.5625,
1258
+ "epoch": 0.336,
1259
+ "grad_norm": 1.5390625,
1260
+ "kl": 0.010091160424053669,
1261
+ "learning_rate": 4.198603260653792e-05,
1262
+ "loss": 0.0004,
1263
+ "reward": 0.968821257352829,
1264
+ "reward_std": 0.19755066523794085,
1265
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1266
+ "rewards/endecasillabo_reward_func": -0.0625,
1267
+ "rewards/no_repetition_reward_func": 0.968821257352829,
1268
+ "rewards/verse_reward_func": 0.0625,
1269
+ "step": 84
1270
+ },
1271
+ {
1272
+ "completion_length": 189.875,
1273
+ "epoch": 0.34,
1274
+ "grad_norm": 1.1796875,
1275
+ "kl": 0.011119180475361645,
1276
+ "learning_rate": 4.172826515897146e-05,
1277
+ "loss": 0.0004,
1278
+ "reward": 1.0337166488170624,
1279
+ "reward_std": 0.1272729904158041,
1280
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1281
+ "rewards/endecasillabo_reward_func": 0.0,
1282
+ "rewards/no_repetition_reward_func": 0.97121661901474,
1283
+ "rewards/verse_reward_func": 0.0625,
1284
+ "step": 85
1285
+ },
1286
+ {
1287
+ "completion_length": 174.5,
1288
+ "epoch": 0.344,
1289
+ "grad_norm": 1.2734375,
1290
+ "kl": 0.00940567790530622,
1291
+ "learning_rate": 4.146723650296701e-05,
1292
+ "loss": 0.0004,
1293
+ "reward": 0.9978608191013336,
1294
+ "reward_std": 0.07362514256965369,
1295
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1296
+ "rewards/endecasillabo_reward_func": 0.0,
1297
+ "rewards/no_repetition_reward_func": 0.9666108191013336,
1298
+ "rewards/verse_reward_func": 0.03125,
1299
+ "step": 86
1300
+ },
1301
+ {
1302
+ "completion_length": 173.0625,
1303
+ "epoch": 0.348,
1304
+ "grad_norm": 1.2578125,
1305
+ "kl": 0.010684633627533913,
1306
+ "learning_rate": 4.1202997526578276e-05,
1307
+ "loss": 0.0004,
1308
+ "reward": 0.9645528197288513,
1309
+ "reward_std": 0.2310585738159716,
1310
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1311
+ "rewards/endecasillabo_reward_func": -0.0625,
1312
+ "rewards/no_repetition_reward_func": 0.9645527899265289,
1313
+ "rewards/verse_reward_func": 0.0625,
1314
+ "step": 87
1315
+ },
1316
+ {
1317
+ "completion_length": 199.9375,
1318
+ "epoch": 0.352,
1319
+ "grad_norm": 1.0234375,
1320
+ "kl": 0.009741432731971145,
1321
+ "learning_rate": 4.093559974371725e-05,
1322
+ "loss": 0.0004,
1323
+ "reward": 1.0686962455511093,
1324
+ "reward_std": 0.20054577942937613,
1325
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1326
+ "rewards/endecasillabo_reward_func": 0.0052083334885537624,
1327
+ "rewards/no_repetition_reward_func": 0.9697379469871521,
1328
+ "rewards/verse_reward_func": 0.09375,
1329
+ "step": 88
1330
+ },
1331
+ {
1332
+ "completion_length": 200.0,
1333
+ "epoch": 0.356,
1334
+ "grad_norm": 0.97265625,
1335
+ "kl": 0.010457327123731375,
1336
+ "learning_rate": 4.066509528411152e-05,
1337
+ "loss": 0.0004,
1338
+ "reward": 1.063912272453308,
1339
+ "reward_std": 0.1371896315831691,
1340
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1341
+ "rewards/endecasillabo_reward_func": 0.00390625,
1342
+ "rewards/no_repetition_reward_func": 0.9662560373544693,
1343
+ "rewards/verse_reward_func": 0.09375,
1344
+ "step": 89
1345
+ },
1346
+ {
1347
+ "completion_length": 186.25,
1348
+ "epoch": 0.36,
1349
+ "grad_norm": 0.95703125,
1350
+ "kl": 0.0091336437035352,
1351
+ "learning_rate": 4.039153688314145e-05,
1352
+ "loss": 0.0004,
1353
+ "reward": 0.9959292709827423,
1354
+ "reward_std": 0.06627507868688554,
1355
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1356
+ "rewards/endecasillabo_reward_func": 0.0,
1357
+ "rewards/no_repetition_reward_func": 0.9646792411804199,
1358
+ "rewards/verse_reward_func": 0.03125,
1359
+ "step": 90
1360
+ },
1361
+ {
1362
+ "completion_length": 197.5625,
1363
+ "epoch": 0.364,
1364
+ "grad_norm": 0.90625,
1365
+ "kl": 0.0125860923435539,
1366
+ "learning_rate": 4.011497787155938e-05,
1367
+ "loss": 0.0005,
1368
+ "reward": 1.0568312108516693,
1369
+ "reward_std": 0.1421456339303404,
1370
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1371
+ "rewards/endecasillabo_reward_func": 0.0,
1372
+ "rewards/no_repetition_reward_func": 0.9630811959505081,
1373
+ "rewards/verse_reward_func": 0.09375,
1374
+ "step": 91
1375
+ },
1376
+ {
1377
+ "completion_length": 177.4375,
1378
+ "epoch": 0.368,
1379
+ "grad_norm": 6.8125,
1380
+ "kl": 0.011468705954030156,
1381
+ "learning_rate": 3.983547216509254e-05,
1382
+ "loss": 0.0005,
1383
+ "reward": 1.0019132494926453,
1384
+ "reward_std": 0.22998703457415104,
1385
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1386
+ "rewards/endecasillabo_reward_func": -0.0625,
1387
+ "rewards/no_repetition_reward_func": 0.9706632345914841,
1388
+ "rewards/verse_reward_func": 0.09375,
1389
+ "step": 92
1390
+ },
1391
+ {
1392
+ "completion_length": 191.625,
1393
+ "epoch": 0.372,
1394
+ "grad_norm": 1.140625,
1395
+ "kl": 0.011732830666005611,
1396
+ "learning_rate": 3.955307425393224e-05,
1397
+ "loss": 0.0005,
1398
+ "reward": 0.9975786060094833,
1399
+ "reward_std": 0.07108220970258117,
1400
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1401
+ "rewards/endecasillabo_reward_func": 0.0,
1402
+ "rewards/no_repetition_reward_func": 0.9663286209106445,
1403
+ "rewards/verse_reward_func": 0.03125,
1404
+ "step": 93
1405
+ },
1406
+ {
1407
+ "completion_length": 184.5,
1408
+ "epoch": 0.376,
1409
+ "grad_norm": 1.40625,
1410
+ "kl": 0.01064238091930747,
1411
+ "learning_rate": 3.92678391921108e-05,
1412
+ "loss": 0.0004,
1413
+ "reward": 1.0643680691719055,
1414
+ "reward_std": 0.19065000605769455,
1415
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1416
+ "rewards/endecasillabo_reward_func": 0.0,
1417
+ "rewards/no_repetition_reward_func": 0.9706180691719055,
1418
+ "rewards/verse_reward_func": 0.09375,
1419
+ "step": 94
1420
+ },
1421
+ {
1422
+ "completion_length": 190.25,
1423
+ "epoch": 0.38,
1424
+ "grad_norm": 1.0625,
1425
+ "kl": 0.008287778589874506,
1426
+ "learning_rate": 3.897982258676867e-05,
1427
+ "loss": 0.0003,
1428
+ "reward": 1.0316542387008667,
1429
+ "reward_std": 0.1355991712771356,
1430
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1431
+ "rewards/endecasillabo_reward_func": 0.0,
1432
+ "rewards/no_repetition_reward_func": 0.9691542387008667,
1433
+ "rewards/verse_reward_func": 0.0625,
1434
+ "step": 95
1435
+ },
1436
+ {
1437
+ "completion_length": 179.625,
1438
+ "epoch": 0.384,
1439
+ "grad_norm": 1.7265625,
1440
+ "kl": 0.01421071938239038,
1441
+ "learning_rate": 3.868908058731376e-05,
1442
+ "loss": 0.0006,
1443
+ "reward": 0.9379890263080597,
1444
+ "reward_std": 0.06504874012898654,
1445
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1446
+ "rewards/endecasillabo_reward_func": -0.0625,
1447
+ "rewards/no_repetition_reward_func": 0.9692390412092209,
1448
+ "rewards/verse_reward_func": 0.03125,
1449
+ "step": 96
1450
+ },
1451
+ {
1452
+ "completion_length": 200.0,
1453
+ "epoch": 0.388,
1454
+ "grad_norm": 0.98046875,
1455
+ "kl": 0.013900347286835313,
1456
+ "learning_rate": 3.8395669874474915e-05,
1457
+ "loss": 0.0006,
1458
+ "reward": 1.0974175035953522,
1459
+ "reward_std": 0.15276630711741745,
1460
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1461
+ "rewards/endecasillabo_reward_func": 0.0027173913549631834,
1462
+ "rewards/no_repetition_reward_func": 0.9697001129388809,
1463
+ "rewards/verse_reward_func": 0.125,
1464
+ "step": 97
1465
+ },
1466
+ {
1467
+ "completion_length": 151.125,
1468
+ "epoch": 0.392,
1469
+ "grad_norm": 2.171875,
1470
+ "kl": 0.016044694697484374,
1471
+ "learning_rate": 3.8099647649251986e-05,
1472
+ "loss": 0.0006,
1473
+ "reward": 0.9290718883275986,
1474
+ "reward_std": 0.18168721569236368,
1475
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1476
+ "rewards/endecasillabo_reward_func": -0.0625,
1477
+ "rewards/no_repetition_reward_func": 0.9603218883275986,
1478
+ "rewards/verse_reward_func": 0.03125,
1479
+ "step": 98
1480
+ },
1481
+ {
1482
+ "completion_length": 187.4375,
1483
+ "epoch": 0.396,
1484
+ "grad_norm": 1.1484375,
1485
+ "kl": 0.019918689038604498,
1486
+ "learning_rate": 3.780107162176429e-05,
1487
+ "loss": 0.0008,
1488
+ "reward": 1.0208609402179718,
1489
+ "reward_std": 0.15501247788779438,
1490
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1491
+ "rewards/endecasillabo_reward_func": 0.0052083334885537624,
1492
+ "rewards/no_repetition_reward_func": 0.9531526267528534,
1493
+ "rewards/verse_reward_func": 0.0625,
1494
+ "step": 99
1495
+ },
1496
+ {
1497
+ "completion_length": 169.875,
1498
+ "epoch": 0.4,
1499
+ "grad_norm": 21.625,
1500
+ "kl": 0.02106661314610392,
1501
+ "learning_rate": 3.7500000000000003e-05,
1502
+ "loss": 0.0008,
1503
+ "reward": 0.9671216309070587,
1504
+ "reward_std": 0.2393743423745036,
1505
+ "rewards/check_divine_comedy_plagiarism": 0.0,
1506
+ "rewards/endecasillabo_reward_func": -0.059210526291280985,
1507
+ "rewards/no_repetition_reward_func": 0.9638321995735168,
1508
+ "rewards/verse_reward_func": 0.0625,
1509
+ "step": 100
1510
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 250,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 100,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-100/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c61957a403ae0924b967402286406830e3935654a8c024ccfdeadd6433d89ac
+ size 5752
checkpoint-1000/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "_name_or_path": "PleIAs/Pleias-350m-Preview",
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "head_dim": 64,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 2560,
+ "max_position_embeddings": 2048,
+ "mlp_bias": false,
+ "model_type": "llama",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 26,
+ "num_key_value_heads": 8,
+ "pretraining_tp": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": null,
+ "rope_theta": 10000,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.49.0",
+ "use_cache": true,
+ "vocab_size": 65536
+ }
checkpoint-1000/generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.49.0"
+ }
checkpoint-1000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af40ff621b2528378c101b6daa62afe363ae9df812f0824bd28eabe28c590b6b
+ size 706875632
checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2939d3565c8f454d495ab4b18b9d2cbd237eca51b599026cfe2c5953e1528f16
+ size 1413896442
checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf9a6009d6de5a55fcfadf3b516a9c0731ad1f5f93e4fc0dcab89e4f6703f4f7
+ size 14180
checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:066959694903b2428a992642efadd126fbc0e6a7300aedf33231c1641a3b801a
+ size 1064
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<|end_of_text|>",
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1000/tokenizer_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|end_of_text|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|end_of_text|>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|end_of_text|>",
+ "padding_side": "left",
+ "return_token_type_ids": false,
+ "tokenizer_class": "PreTrainedTokenizer",
+ "unk_token": "[UNK]",
+ "use_token_type_ids": false,
+ "vocab_size": 65536
+ }
checkpoint-1000/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e715216741604a6e077ba2b09630723e606676f5b8960f15b645bdf2bca79b1
+ size 5752
checkpoint-1250/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "_name_or_path": "PleIAs/Pleias-350m-Preview",
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "head_dim": 64,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 2560,
+ "max_position_embeddings": 2048,
+ "mlp_bias": false,
+ "model_type": "llama",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 26,
+ "num_key_value_heads": 8,
+ "pretraining_tp": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": null,
+ "rope_theta": 10000,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.49.0",
+ "use_cache": true,
+ "vocab_size": 65536
+ }
checkpoint-1250/generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.49.0"
+ }
checkpoint-1250/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c353e29cdfc442a642ead972a2407b467a7f93694d614db2bf9a908f6d1c603e
+ size 706875632
checkpoint-1250/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9bed8b310a7435f1a375ddd1f0c38b99598dca67359f45b4cc5097da7c9726a8
+ size 1413896442
checkpoint-1250/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bc81ade4cec98e4c2b590620950f56abbb6cdd5dcec276b50656c3fc54171ad
+ size 14180
checkpoint-1250/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9197244c5ee8fd84c23cb5387dd7cd4b0d34bb7720142963e9ea404ddb17646d
+ size 1064
checkpoint-1250/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<|end_of_text|>",
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1250/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1250/tokenizer_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|end_of_text|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|end_of_text|>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|end_of_text|>",
+ "padding_side": "left",
+ "return_token_type_ids": false,
+ "tokenizer_class": "PreTrainedTokenizer",
+ "unk_token": "[UNK]",
+ "use_token_type_ids": false,
+ "vocab_size": 65536
+ }
checkpoint-1250/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1250/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e715216741604a6e077ba2b09630723e606676f5b8960f15b645bdf2bca79b1
+ size 5752
checkpoint-1500/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "_name_or_path": "PleIAs/Pleias-350m-Preview",
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "head_dim": 64,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 2560,
+ "max_position_embeddings": 2048,
+ "mlp_bias": false,
+ "model_type": "llama",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 26,
+ "num_key_value_heads": 8,
+ "pretraining_tp": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": null,
+ "rope_theta": 10000,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.49.0",
+ "use_cache": true,
+ "vocab_size": 65536
+ }
checkpoint-1500/generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.49.0"
+ }
checkpoint-1500/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bd014f9c19f12c59b3a97c972bfbae0e19cebe03472ea233dfac648ee11a44b
+ size 706875632
checkpoint-1500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0dd3053c2cc5f949a3f076a2ce57f93eebd90ee57fa7ea62a306c931dd46927f
+ size 1413896442
checkpoint-1500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d36874ae9aa68388939f7ba32704ae78c305546d3405871aa319375e5cb9ccb
+ size 14180
checkpoint-1500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2ba4ddf44b55773794aa8993007c7aa5ec7847ee0bddc2ba617f28ab98771e6
+ size 1064
checkpoint-1500/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<|end_of_text|>",
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1500/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1500/tokenizer_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|end_of_text|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|end_of_text|>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|end_of_text|>",
+ "padding_side": "left",
+ "return_token_type_ids": false,
+ "tokenizer_class": "PreTrainedTokenizer",
+ "unk_token": "[UNK]",
+ "use_token_type_ids": false,
+ "vocab_size": 65536
+ }
checkpoint-1500/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:62f9b836a29bd7368492e464b0eb16cb35efea62335bfe3f9aedeed823bcabef
+ size 5752
checkpoint-200/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "_name_or_path": "PleIAs/Pleias-350m-Preview",
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "head_dim": 64,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 2560,
+ "max_position_embeddings": 2048,
+ "mlp_bias": false,
+ "model_type": "llama",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 26,
+ "num_key_value_heads": 8,
+ "pretraining_tp": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": null,
+ "rope_theta": 10000,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.49.0",
+ "use_cache": true,
+ "vocab_size": 65536
+ }
checkpoint-200/generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.49.0"
+ }
checkpoint-200/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e4bedbef9edbd8cf9e7cd0091d138f5e9fbc0b018a72b45612caa6c7cc3dcf2
+ size 706875632
checkpoint-200/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d811fdc36ea64ba6b151d1c439b08be38b79e1da5e617c8a076d245606d21913
+ size 1413896442
checkpoint-200/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:790242f025e1b7c5a019f39cc269f4321859a20583f29393c688025fa9447b45
+ size 14180