ninyx committed on
Commit 7ec3e38 · verified · 1 Parent(s): b9c52b8

Model save

Files changed (3)
  1. README.md +14 -12
  2. adapter_model.safetensors +1 -1
  3. results.json +2 -2
README.md CHANGED
@@ -23,9 +23,9 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.8935
- - Bleu: {'bleu': 0.26234942453828036, 'precisions': [0.6386386439809577, 0.32210746013057906, 0.19439435894133555, 0.13267612303321208], 'brevity_penalty': 0.9720688221242278, 'length_ratio': 0.9724517334440523, 'translation_length': 187372, 'reference_length': 192680}
- - Rouge: {'rouge1': 0.6264335677482978, 'rouge2': 0.303034334791063, 'rougeL': 0.5025911195619426, 'rougeLsum': 0.5017835431871924}
+ - Loss: 1.8937
+ - Bleu: {'bleu': 0.26205068002927057, 'precisions': [0.6385562102386747, 0.3220126728603845, 0.19412484437384622, 0.13232381936372636], 'brevity_penalty': 0.9720474824019883, 'length_ratio': 0.9724309736350426, 'translation_length': 187368, 'reference_length': 192680}
+ - Rouge: {'rouge1': 0.6264248496834525, 'rouge2': 0.3031545327309577, 'rougeL': 0.5022734325866114, 'rougeLsum': 0.5017276717558696}
  - Exact Match: {'exact_match': 0.0}

  ## Model description
@@ -53,21 +53,23 @@ The following hyperparameters were used during training:
  - total_train_batch_size: 60
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
- - num_epochs: 8
+ - num_epochs: 10
  - mixed_precision_training: Native AMP

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Bleu | Rouge | Exact Match |
  |:-------------:|:------:|:----:|:---------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------:|:--------------------:|
- | 1.0263 | 0.9930 | 71 | 1.8935 | {'bleu': 0.26234942453828036, 'precisions': [0.6386386439809577, 0.32210746013057906, 0.19439435894133555, 0.13267612303321208], 'brevity_penalty': 0.9720688221242278, 'length_ratio': 0.9724517334440523, 'translation_length': 187372, 'reference_length': 192680} | {'rouge1': 0.6264335677482978, 'rouge2': 0.303034334791063, 'rougeL': 0.5025911195619426, 'rougeLsum': 0.5017835431871924} | {'exact_match': 0.0} |
- | 0.7406 | 2.0 | 143 | 2.0190 | {'bleu': 0.2316346526053078, 'precisions': [0.6194236274162941, 0.28977498736790047, 0.16667026013087397, 0.10975622939300578], 'brevity_penalty': 0.9676527647784504, 'length_ratio': 0.9681648328835375, 'translation_length': 186546, 'reference_length': 192680} | {'rouge1': 0.6036218396315156, 'rouge2': 0.2682122181745471, 'rougeL': 0.47708940409367784, 'rougeLsum': 0.4770613490668666} | {'exact_match': 0.0} |
- | 0.5882 | 2.9930 | 214 | 2.0681 | {'bleu': 0.22838404823316763, 'precisions': [0.6165055539838692, 0.2855842981089862, 0.1628873061791873, 0.10631461677977687], 'brevity_penalty': 0.9719140991086993, 'length_ratio': 0.9723012248287316, 'translation_length': 187343, 'reference_length': 192680} | {'rouge1': 0.6006461234391669, 'rouge2': 0.2637867501761157, 'rougeL': 0.4734228347835384, 'rougeLsum': 0.4732165944934509} | {'exact_match': 0.0} |
- | 0.5344 | 4.0 | 286 | 2.0990 | {'bleu': 0.23243181277634892, 'precisions': [0.6183425166820463, 0.290359158131201, 0.16708232101387482, 0.10871728128815054], 'brevity_penalty': 0.9726288315430208, 'length_ratio': 0.9729966784305585, 'translation_length': 187477, 'reference_length': 192680} | {'rouge1': 0.6020165553663895, 'rouge2': 0.2689980360965313, 'rougeL': 0.4761211517574821, 'rougeLsum': 0.476013109131896} | {'exact_match': 0.0} |
- | 0.491 | 4.9930 | 357 | 2.1029 | {'bleu': 0.23217356305609613, 'precisions': [0.6182632097844334, 0.28982060887176503, 0.16651395073437608, 0.1084203343202102], 'brevity_penalty': 0.9735242125771166, 'length_ratio': 0.9738685904089682, 'translation_length': 187645, 'reference_length': 192680} | {'rouge1': 0.602022171746183, 'rouge2': 0.2678457021558207, 'rougeL': 0.4757373660712696, 'rougeLsum': 0.4756766948490637} | {'exact_match': 0.0} |
- | 0.4804 | 6.0 | 429 | 2.1066 | {'bleu': 0.22688560402307306, 'precisions': [0.6157982530470419, 0.2836332155892459, 0.16120717833852222, 0.10478493323661374], 'brevity_penalty': 0.9735029030458111, 'length_ratio': 0.9738478305999585, 'translation_length': 187641, 'reference_length': 192680} | {'rouge1': 0.5997469653277846, 'rouge2': 0.2615884826579755, 'rougeL': 0.4719633878547087, 'rougeLsum': 0.4719354595076038} | {'exact_match': 0.0} |
- | 0.4667 | 6.9930 | 500 | 2.1083 | {'bleu': 0.2278015535859749, 'precisions': [0.6163871882176788, 0.28446401188294446, 0.1620224273628829, 0.10547183495849786], 'brevity_penalty': 0.9736627137558076, 'length_ratio': 0.9740035291675316, 'translation_length': 187671, 'reference_length': 192680} | {'rouge1': 0.6005116792432119, 'rouge2': 0.26230752315350514, 'rougeL': 0.47195529453152857, 'rougeLsum': 0.47187802968758125} | {'exact_match': 0.0} |
- | 0.4827 | 7.9441 | 568 | 2.1093 | {'bleu': 0.2279193811355966, 'precisions': [0.6163472757204277, 0.2845680034617419, 0.16225351810881897, 0.10543872371283539], 'brevity_penalty': 0.9738224996031655, 'length_ratio': 0.9741592277351049, 'translation_length': 187701, 'reference_length': 192680} | {'rouge1': 0.6005721411221121, 'rouge2': 0.2625293432287747, 'rougeL': 0.4722072250908843, 'rougeLsum': 0.472187239051013} | {'exact_match': 0.0} |
+ | 1.0389 | 0.9930 | 71 | 1.8937 | {'bleu': 0.26205068002927057, 'precisions': [0.6385562102386747, 0.3220126728603845, 0.19412484437384622, 0.13232381936372636], 'brevity_penalty': 0.9720474824019883, 'length_ratio': 0.9724309736350426, 'translation_length': 187368, 'reference_length': 192680} | {'rouge1': 0.6264248496834525, 'rouge2': 0.3031545327309577, 'rougeL': 0.5022734325866114, 'rougeLsum': 0.5017276717558696} | {'exact_match': 0.0} |
+ | 0.7026 | 2.0 | 143 | 2.0257 | {'bleu': 0.22948697087184314, 'precisions': [0.6175561684920868, 0.2864991434080009, 0.16448293132138875, 0.10829521706024982], 'brevity_penalty': 0.9685578318352563, 'length_ratio': 0.9690419348141998, 'translation_length': 186715, 'reference_length': 192680} | {'rouge1': 0.6021744263812635, 'rouge2': 0.2645080008922339, 'rougeL': 0.47549724399365867, 'rougeLsum': 0.47563577913274346} | {'exact_match': 0.0} |
+ | 0.5794 | 2.9930 | 214 | 2.0827 | {'bleu': 0.22453345451779733, 'precisions': [0.6142047063979434, 0.2794608644390257, 0.15886996662779512, 0.10500024249478636], 'brevity_penalty': 0.9706541083647924, 'length_ratio': 0.9710763960971559, 'translation_length': 187107, 'reference_length': 192680} | {'rouge1': 0.5986129640494808, 'rouge2': 0.2565288834240412, 'rougeL': 0.47029440892215696, 'rougeLsum': 0.4703605206181696} | {'exact_match': 0.0} |
+ | 0.5107 | 4.0 | 286 | 2.0999 | {'bleu': 0.22808449006897172, 'precisions': [0.6164639351259069, 0.28426452965847815, 0.16231439361428204, 0.10640438075565883], 'brevity_penalty': 0.9724315296841323, 'length_ratio': 0.9728046501972182, 'translation_length': 187440, 'reference_length': 192680} | {'rouge1': 0.6010609898102299, 'rouge2': 0.2621809898542294, 'rougeL': 0.4728255342917802, 'rougeLsum': 0.4728531320642606} | {'exact_match': 0.0} |
+ | 0.4923 | 4.9930 | 357 | 2.0932 | {'bleu': 0.23027336632996132, 'precisions': [0.6166044676937471, 0.2878130430610787, 0.1642595225622989, 0.10630862410891355], 'brevity_penalty': 0.975977176959311, 'length_ratio': 0.9762611583973427, 'translation_length': 188106, 'reference_length': 192680} | {'rouge1': 0.6020695302602435, 'rouge2': 0.2657671472450324, 'rougeL': 0.47423678533654967, 'rougeLsum': 0.47426066890913565} | {'exact_match': 0.0} |
+ | 0.4431 | 6.0 | 429 | 2.0962 | {'bleu': 0.22873099259924137, 'precisions': [0.6169168021752459, 0.28490855532923826, 0.16326705657201365, 0.10637588763042322], 'brevity_penalty': 0.9730979379483501, 'length_ratio': 0.9734533942287731, 'translation_length': 187565, 'reference_length': 192680} | {'rouge1': 0.6015904749444395, 'rouge2': 0.26263389133741416, 'rougeL': 0.4729371282759689, 'rougeLsum': 0.4730073305944661} | {'exact_match': 0.0} |
+ | 0.4291 | 6.9930 | 500 | 2.0895 | {'bleu': 0.23078161525345967, 'precisions': [0.6175051285594328, 0.2861604050093259, 0.16454167512744605, 0.10739661140462743], 'brevity_penalty': 0.9762747516268988, 'length_ratio': 0.9765517957234794, 'translation_length': 188162, 'reference_length': 192680} | {'rouge1': 0.6034137320239901, 'rouge2': 0.26422178262738116, 'rougeL': 0.47430934107431466, 'rougeLsum': 0.47430902463237395} | {'exact_match': 0.0} |
+ | 0.4297 | 8.0 | 572 | 2.0865 | {'bleu': 0.22849194288081487, 'precisions': [0.6172627948932184, 0.28407374796552737, 0.1623422141125731, 0.10599288515917175], 'brevity_penalty': 0.9749190245343078, 'length_ratio': 0.9752283578991073, 'translation_length': 187907, 'reference_length': 192680} | {'rouge1': 0.6027503352616924, 'rouge2': 0.2615077454867606, 'rougeL': 0.47349895225288113, 'rougeLsum': 0.47352034156560674} | {'exact_match': 0.0} |
+ | 0.4361 | 8.9930 | 643 | 2.0832 | {'bleu': 0.2305080658084417, 'precisions': [0.6175195604418985, 0.2856609509586922, 0.16423418171705448, 0.10763603992041658], 'brevity_penalty': 0.9754508959408048, 'length_ratio': 0.9757473531243512, 'translation_length': 188007, 'reference_length': 192680} | {'rouge1': 0.6029422201953518, 'rouge2': 0.26346694480161104, 'rougeL': 0.4742809273284626, 'rougeLsum': 0.4743122502561476} | {'exact_match': 0.0} |
+ | 0.4423 | 9.9301 | 710 | 2.0840 | {'bleu': 0.230038020190203, 'precisions': [0.6176251608717387, 0.2855817326664036, 0.16376314072743217, 0.10700689536841428], 'brevity_penalty': 0.9756157200699793, 'length_ratio': 0.9759082416441769, 'translation_length': 188038, 'reference_length': 192680} | {'rouge1': 0.603139585918947, 'rouge2': 0.26328950362942705, 'rougeL': 0.4742788009942601, 'rougeLsum': 0.47433418479279266} | {'exact_match': 0.0} |


  ### Framework versions
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1692cc37c4e84f061838a4777ec1c9f44abfedb3a2765d956040ffebbcbe9fe4
+ oid sha256:9a8c07aadf9faee5923cad47549a7323e599a373c8d7598c928f8d958e18cb21
  size 545621264
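
The `adapter_model.safetensors` entry in this diff is a Git LFS pointer, not the weights themselves: it records only a SHA-256 digest and a byte size. Both sides report the same size, so only the digest distinguishes the old adapter from the new one. The following is a minimal sketch of how a downloaded file could be checked against such a pointer, assuming the three-line `version` / `oid` / `size` layout shown in the diff. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if pointer["algo"] != "sha256":
        return False
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# The new pointer introduced by this commit, copied from the diff.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9a8c07aadf9faee5923cad47549a7323e599a373c8d7598c928f8d958e18cb21
size 545621264
"""
pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer(Path("adapter_model.safetensors"), pointer)` on a local copy would then report whether that copy corresponds to this commit; the local path is an assumption about where the file was saved.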
results.json CHANGED
@@ -1,4 +1,4 @@
  Pre-training results:
- {"eval_loss": 4.0594482421875, "eval_bleu": {"bleu": 0.1826898245158488, "precisions": [0.5573254531286369, 0.22778189271183172, 0.119520655944497, 0.07341526849915855], "brevity_penalty": 1.0, "length_ratio": 1.0256695038405648, "translation_length": 197626, "reference_length": 192680}, "eval_rouge": {"rouge1": 0.5620978339462965, "rouge2": 0.21928124564678209, "rougeL": 0.4200989137725146, "rougeLsum": 0.4164644643467429}, "eval_exact_match": {"exact_match": 0.0}, "eval_runtime": 95.4065, "eval_samples_per_second": 5.367, "eval_steps_per_second": 1.342}
+ {"eval_loss": 4.062599182128906, "eval_bleu": {"bleu": 0.18269170824034775, "precisions": [0.5574618750158064, 0.2274269370616604, 0.11952349235564187, 0.07351314427865768], "brevity_penalty": 1.0, "length_ratio": 1.0260795100685074, "translation_length": 197705, "reference_length": 192680}, "eval_rouge": {"rouge1": 0.5625677489526986, "rouge2": 0.21880497497157425, "rougeL": 0.42027195103201975, "rougeLsum": 0.41590785352936227}, "eval_exact_match": {"exact_match": 0.0}, "eval_runtime": 89.2489, "eval_samples_per_second": 5.737, "eval_steps_per_second": 1.434}
  Post-training results:
- {"eval_loss": 1.8935281038284302, "eval_bleu": {"bleu": 0.26234942453828036, "precisions": [0.6386386439809577, 0.32210746013057906, 0.19439435894133555, 0.13267612303321208], "brevity_penalty": 0.9720688221242278, "length_ratio": 0.9724517334440523, "translation_length": 187372, "reference_length": 192680}, "eval_rouge": {"rouge1": 0.6264335677482978, "rouge2": 0.303034334791063, "rougeL": 0.5025911195619426, "rougeLsum": 0.5017835431871924}, "eval_exact_match": {"exact_match": 0.0}, "eval_runtime": 94.9049, "eval_samples_per_second": 5.395, "eval_steps_per_second": 1.349, "epoch": 7.944055944055944}
+ {"eval_loss": 1.893650770187378, "eval_bleu": {"bleu": 0.26205068002927057, "precisions": [0.6385562102386747, 0.3220126728603845, 0.19412484437384622, 0.13232381936372636], "brevity_penalty": 0.9720474824019883, "length_ratio": 0.9724309736350426, "translation_length": 187368, "reference_length": 192680}, "eval_rouge": {"rouge1": 0.6264248496834525, "rouge2": 0.3031545327309577, "rougeL": 0.5022734325866114, "rougeLsum": 0.5017276717558696}, "eval_exact_match": {"exact_match": 0.0}, "eval_runtime": 86.0283, "eval_samples_per_second": 5.952, "eval_steps_per_second": 1.488, "epoch": 9.93006993006993}
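
The `eval_bleu` dictionaries in `results.json` carry enough detail to recompute the headline score by hand. The sketch below assumes the standard corpus-level BLEU definition: the brevity penalty multiplied by the geometric mean of the four n-gram precisions. It recomputes both quantities from the new post-training entry. The function names are illustrative, and the snippet is a consistency check on the reported numbers, not the evaluation code used for this model.

```python
import math

# Values copied from the new post-training entry in results.json.
precisions = [0.6385562102386747, 0.3220126728603845,
              0.19412484437384622, 0.13232381936372636]
translation_length = 187368
reference_length = 192680
reported_bp = 0.9720474824019883
reported_bleu = 0.26205068002927057


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Penalize candidates shorter than the reference; 1.0 otherwise."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


def bleu_from_parts(ngram_precisions: list, bp: float) -> float:
    """BLEU as brevity penalty times the geometric mean of the precisions."""
    log_mean = sum(math.log(p) for p in ngram_precisions) / len(ngram_precisions)
    return bp * math.exp(log_mean)


bp = brevity_penalty(translation_length, reference_length)
bleu = bleu_from_parts(precisions, bp)
print(f"brevity_penalty={bp:.6f} (reported {reported_bp:.6f})")
print(f"bleu={bleu:.6f} (reported {reported_bleu:.6f})")
```

The same check explains the pre-training entry's `brevity_penalty` of 1.0: there the translation length (197705) exceeds the reference length (192680), so no penalty applies.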