collapse_gemma-2-2b_hs2_accumulate_iter15_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0977
- Num Input Tokens Seen: 78458600
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
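The relationship between some of these values can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a single device (the card does not state the device count), estimates the total number of optimizer steps from the last logged row of the results table (step 1435 at epoch 0.9975), and models `constant_with_warmup` as a linear ramp from 0 to the base learning rate followed by a constant rate. The function names are hypothetical helpers, not part of any library.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
LEARNING_RATE = 8e-6
WARMUP_RATIO = 0.05

# Assumptions (not stated in the card):
NUM_DEVICES = 1          # single-device training
EST_TOTAL_STEPS = 1439   # rough estimate, about 1435 / 0.9975


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum_steps * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length as a fraction of total steps, rounded up."""
    return math.ceil(total_steps * ratio)


def lr_at(step: int, base_lr: float, n_warmup: int) -> float:
    """Linear warmup from 0 to base_lr, then constant."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr


if __name__ == "__main__":
    n_warmup = warmup_steps(EST_TOTAL_STEPS, WARMUP_RATIO)
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    print("estimated warmup steps:", n_warmup)
    for step in (0, n_warmup // 2, n_warmup, 1435):
        print(step, lr_at(step, LEARNING_RATE, n_warmup))
```

Under these assumptions the effective batch size matches the reported `total_train_batch_size` of 128, and the learning rate reaches 8e-06 after roughly the first 5% of steps.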
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.7383 | 0.0035 | 5 | 1.3900 | 275576 |
1.6269 | 0.0070 | 10 | 1.3798 | 543728 |
1.6729 | 0.0104 | 15 | 1.3504 | 825496 |
1.5776 | 0.0139 | 20 | 1.3058 | 1096304 |
1.4886 | 0.0174 | 25 | 1.2627 | 1374568 |
1.3866 | 0.0209 | 30 | 1.2362 | 1642248 |
1.3134 | 0.0243 | 35 | 1.2053 | 1913600 |
1.2321 | 0.0278 | 40 | 1.1946 | 2187640 |
1.1024 | 0.0313 | 45 | 1.2215 | 2459792 |
1.0001 | 0.0348 | 50 | 1.2301 | 2742696 |
0.7793 | 0.0382 | 55 | 1.2631 | 3014608 |
0.6931 | 0.0417 | 60 | 1.3282 | 3289872 |
0.5294 | 0.0452 | 65 | 1.3136 | 3563896 |
0.5295 | 0.0487 | 70 | 1.2944 | 3830608 |
0.2979 | 0.0521 | 75 | 1.3077 | 4102464 |
0.2793 | 0.0556 | 80 | 1.2550 | 4374448 |
0.2951 | 0.0591 | 85 | 1.2690 | 4640688 |
0.3198 | 0.0626 | 90 | 1.2279 | 4921368 |
0.2746 | 0.0660 | 95 | 1.2360 | 5191520 |
0.3397 | 0.0695 | 100 | 1.2057 | 5467336 |
0.2905 | 0.0730 | 105 | 1.2057 | 5729656 |
0.1941 | 0.0765 | 110 | 1.1996 | 6004544 |
0.1477 | 0.0799 | 115 | 1.2102 | 6273944 |
0.2191 | 0.0834 | 120 | 1.2012 | 6548992 |
0.1988 | 0.0869 | 125 | 1.1996 | 6832168 |
0.2487 | 0.0904 | 130 | 1.1908 | 7104720 |
0.1528 | 0.0938 | 135 | 1.1941 | 7374064 |
0.2133 | 0.0973 | 140 | 1.1852 | 7652800 |
0.1849 | 0.1008 | 145 | 1.1876 | 7923360 |
0.1235 | 0.1043 | 150 | 1.1813 | 8196048 |
0.1459 | 0.1077 | 155 | 1.1800 | 8468232 |
0.2325 | 0.1112 | 160 | 1.1795 | 8733608 |
0.2413 | 0.1147 | 165 | 1.1697 | 9006464 |
0.1806 | 0.1182 | 170 | 1.1780 | 9276672 |
0.1988 | 0.1216 | 175 | 1.1757 | 9546832 |
0.1944 | 0.1251 | 180 | 1.1707 | 9810624 |
0.2444 | 0.1286 | 185 | 1.1719 | 10082368 |
0.1533 | 0.1321 | 190 | 1.1754 | 10359872 |
0.1064 | 0.1355 | 195 | 1.1670 | 10629544 |
0.1298 | 0.1390 | 200 | 1.1703 | 10896792 |
0.2071 | 0.1425 | 205 | 1.1683 | 11172240 |
0.1509 | 0.1460 | 210 | 1.1708 | 11444168 |
0.1929 | 0.1494 | 215 | 1.1702 | 11718792 |
0.1859 | 0.1529 | 220 | 1.1690 | 11989840 |
0.1892 | 0.1564 | 225 | 1.1668 | 12266016 |
0.2307 | 0.1599 | 230 | 1.1671 | 12539232 |
0.1579 | 0.1634 | 235 | 1.1641 | 12814112 |
0.1806 | 0.1668 | 240 | 1.1587 | 13090632 |
0.1774 | 0.1703 | 245 | 1.1597 | 13357832 |
0.1533 | 0.1738 | 250 | 1.1668 | 13634472 |
0.2038 | 0.1773 | 255 | 1.1587 | 13902960 |
0.2065 | 0.1807 | 260 | 1.1580 | 14173944 |
0.1271 | 0.1842 | 265 | 1.1634 | 14444016 |
0.1164 | 0.1877 | 270 | 1.1579 | 14715008 |
0.1264 | 0.1912 | 275 | 1.1606 | 14988952 |
0.1714 | 0.1946 | 280 | 1.1596 | 15260256 |
0.1016 | 0.1981 | 285 | 1.1546 | 15538176 |
0.0737 | 0.2016 | 290 | 1.1571 | 15806776 |
0.1314 | 0.2051 | 295 | 1.1553 | 16080024 |
0.1033 | 0.2085 | 300 | 1.1562 | 16347576 |
0.1346 | 0.2120 | 305 | 1.1504 | 16620128 |
0.1372 | 0.2155 | 310 | 1.1477 | 16895136 |
0.1163 | 0.2190 | 315 | 1.1495 | 17171288 |
0.1431 | 0.2224 | 320 | 1.1484 | 17450352 |
0.1652 | 0.2259 | 325 | 1.1498 | 17721064 |
0.123 | 0.2294 | 330 | 1.1502 | 17990344 |
0.1765 | 0.2329 | 335 | 1.1488 | 18260400 |
0.1038 | 0.2363 | 340 | 1.1472 | 18535840 |
0.1463 | 0.2398 | 345 | 1.1490 | 18807112 |
0.1773 | 0.2433 | 350 | 1.1474 | 19082776 |
0.1506 | 0.2468 | 355 | 1.1441 | 19355712 |
0.1526 | 0.2502 | 360 | 1.1463 | 19631304 |
0.1463 | 0.2537 | 365 | 1.1451 | 19907576 |
0.2014 | 0.2572 | 370 | 1.1448 | 20175672 |
0.1539 | 0.2607 | 375 | 1.1466 | 20454560 |
0.1504 | 0.2641 | 380 | 1.1424 | 20731344 |
0.1594 | 0.2676 | 385 | 1.1430 | 21002896 |
0.1833 | 0.2711 | 390 | 1.1420 | 21268792 |
0.1452 | 0.2746 | 395 | 1.1380 | 21537512 |
0.2366 | 0.2780 | 400 | 1.1397 | 21809232 |
0.0888 | 0.2815 | 405 | 1.1401 | 22080624 |
0.1506 | 0.2850 | 410 | 1.1400 | 22357848 |
0.1382 | 0.2885 | 415 | 1.1402 | 22636504 |
0.1701 | 0.2919 | 420 | 1.1388 | 22910152 |
0.118 | 0.2954 | 425 | 1.1367 | 23187368 |
0.1803 | 0.2989 | 430 | 1.1380 | 23459392 |
0.2108 | 0.3024 | 435 | 1.1369 | 23728760 |
0.1698 | 0.3058 | 440 | 1.1359 | 24003552 |
0.1305 | 0.3093 | 445 | 1.1367 | 24272248 |
0.1281 | 0.3128 | 450 | 1.1343 | 24548408 |
0.1456 | 0.3163 | 455 | 1.1360 | 24818264 |
0.232 | 0.3197 | 460 | 1.1432 | 25095288 |
0.1566 | 0.3232 | 465 | 1.1370 | 25372672 |
0.1168 | 0.3267 | 470 | 1.1331 | 25639136 |
0.1593 | 0.3302 | 475 | 1.1367 | 25920904 |
0.1524 | 0.3337 | 480 | 1.1406 | 26197400 |
0.1342 | 0.3371 | 485 | 1.1353 | 26466840 |
0.1033 | 0.3406 | 490 | 1.1335 | 26731560 |
0.1234 | 0.3441 | 495 | 1.1321 | 27002208 |
0.1209 | 0.3476 | 500 | 1.1302 | 27279152 |
0.1752 | 0.3510 | 505 | 1.1359 | 27551568 |
0.1358 | 0.3545 | 510 | 1.1331 | 27823944 |
0.1978 | 0.3580 | 515 | 1.1284 | 28088712 |
0.1087 | 0.3615 | 520 | 1.1285 | 28365200 |
0.129 | 0.3649 | 525 | 1.1285 | 28642992 |
0.1153 | 0.3684 | 530 | 1.1265 | 28913400 |
0.0884 | 0.3719 | 535 | 1.1290 | 29182264 |
0.1377 | 0.3754 | 540 | 1.1328 | 29447320 |
0.1164 | 0.3788 | 545 | 1.1260 | 29724952 |
0.1423 | 0.3823 | 550 | 1.1270 | 30000680 |
0.1486 | 0.3858 | 555 | 1.1314 | 30277328 |
0.1688 | 0.3893 | 560 | 1.1285 | 30556056 |
0.1699 | 0.3927 | 565 | 1.1281 | 30829864 |
0.1266 | 0.3962 | 570 | 1.1263 | 31106904 |
0.1159 | 0.3997 | 575 | 1.1254 | 31380000 |
0.1226 | 0.4032 | 580 | 1.1259 | 31652120 |
0.1778 | 0.4066 | 585 | 1.1252 | 31925080 |
0.1757 | 0.4101 | 590 | 1.1265 | 32204560 |
0.1351 | 0.4136 | 595 | 1.1249 | 32471832 |
0.1188 | 0.4171 | 600 | 1.1274 | 32742528 |
0.1753 | 0.4205 | 605 | 1.1266 | 33018744 |
0.1451 | 0.4240 | 610 | 1.1265 | 33292984 |
0.1829 | 0.4275 | 615 | 1.1240 | 33573296 |
0.1304 | 0.4310 | 620 | 1.1237 | 33843368 |
0.1526 | 0.4344 | 625 | 1.1251 | 34112416 |
0.1346 | 0.4379 | 630 | 1.1240 | 34385928 |
0.1803 | 0.4414 | 635 | 1.1228 | 34663952 |
0.216 | 0.4449 | 640 | 1.1218 | 34935240 |
0.1527 | 0.4483 | 645 | 1.1205 | 35212168 |
0.2165 | 0.4518 | 650 | 1.1207 | 35479680 |
0.1188 | 0.4553 | 655 | 1.1216 | 35752688 |
0.16 | 0.4588 | 660 | 1.1226 | 36027136 |
0.1208 | 0.4622 | 665 | 1.1222 | 36303144 |
0.1079 | 0.4657 | 670 | 1.1211 | 36574576 |
0.1064 | 0.4692 | 675 | 1.1213 | 36846400 |
0.1433 | 0.4727 | 680 | 1.1233 | 37117000 |
0.1272 | 0.4761 | 685 | 1.1211 | 37396584 |
0.1173 | 0.4796 | 690 | 1.1191 | 37668880 |
0.098 | 0.4831 | 695 | 1.1201 | 37945496 |
0.1083 | 0.4866 | 700 | 1.1186 | 38226464 |
0.1757 | 0.4901 | 705 | 1.1199 | 38498776 |
0.1238 | 0.4935 | 710 | 1.1193 | 38770744 |
0.1689 | 0.4970 | 715 | 1.1175 | 39048952 |
0.1603 | 0.5005 | 720 | 1.1184 | 39323336 |
0.1656 | 0.5040 | 725 | 1.1192 | 39596704 |
0.0784 | 0.5074 | 730 | 1.1179 | 39858624 |
0.1977 | 0.5109 | 735 | 1.1157 | 40122560 |
0.1845 | 0.5144 | 740 | 1.1168 | 40395096 |
0.1114 | 0.5179 | 745 | 1.1191 | 40658456 |
0.1299 | 0.5213 | 750 | 1.1174 | 40926968 |
0.0891 | 0.5248 | 755 | 1.1137 | 41200944 |
0.1778 | 0.5283 | 760 | 1.1151 | 41480888 |
0.1612 | 0.5318 | 765 | 1.1211 | 41759800 |
0.1187 | 0.5352 | 770 | 1.1208 | 42028368 |
0.1144 | 0.5387 | 775 | 1.1169 | 42305480 |
0.1703 | 0.5422 | 780 | 1.1163 | 42583192 |
0.2197 | 0.5457 | 785 | 1.1156 | 42857504 |
0.2184 | 0.5491 | 790 | 1.1153 | 43132064 |
0.1669 | 0.5526 | 795 | 1.1145 | 43406912 |
0.1964 | 0.5561 | 800 | 1.1146 | 43682584 |
0.1196 | 0.5596 | 805 | 1.1138 | 43963752 |
0.1284 | 0.5630 | 810 | 1.1132 | 44235536 |
0.1151 | 0.5665 | 815 | 1.1135 | 44510256 |
0.0733 | 0.5700 | 820 | 1.1165 | 44787976 |
0.1112 | 0.5735 | 825 | 1.1152 | 45052000 |
0.2125 | 0.5769 | 830 | 1.1120 | 45329328 |
0.0824 | 0.5804 | 835 | 1.1147 | 45603728 |
0.1086 | 0.5839 | 840 | 1.1162 | 45867216 |
0.0679 | 0.5874 | 845 | 1.1132 | 46139416 |
0.1332 | 0.5908 | 850 | 1.1132 | 46413304 |
0.104 | 0.5943 | 855 | 1.1147 | 46685920 |
0.1171 | 0.5978 | 860 | 1.1134 | 46955744 |
0.1843 | 0.6013 | 865 | 1.1118 | 47229432 |
0.2033 | 0.6047 | 870 | 1.1125 | 47507344 |
0.0771 | 0.6082 | 875 | 1.1119 | 47779320 |
0.1556 | 0.6117 | 880 | 1.1137 | 48056360 |
0.0928 | 0.6152 | 885 | 1.1132 | 48317416 |
0.1895 | 0.6186 | 890 | 1.1123 | 48587976 |
0.1137 | 0.6221 | 895 | 1.1127 | 48856016 |
0.1271 | 0.6256 | 900 | 1.1133 | 49123232 |
0.1797 | 0.6291 | 905 | 1.1112 | 49404456 |
0.1807 | 0.6325 | 910 | 1.1112 | 49672520 |
0.2039 | 0.6360 | 915 | 1.1135 | 49950400 |
0.1096 | 0.6395 | 920 | 1.1128 | 50215808 |
0.1172 | 0.6430 | 925 | 1.1108 | 50485008 |
0.0978 | 0.6465 | 930 | 1.1108 | 50752240 |
0.0999 | 0.6499 | 935 | 1.1095 | 51022832 |
0.095 | 0.6534 | 940 | 1.1106 | 51302016 |
0.1667 | 0.6569 | 945 | 1.1113 | 51575968 |
0.1818 | 0.6604 | 950 | 1.1088 | 51846376 |
0.1353 | 0.6638 | 955 | 1.1094 | 52119088 |
0.1416 | 0.6673 | 960 | 1.1108 | 52391448 |
0.086 | 0.6708 | 965 | 1.1090 | 52659048 |
0.0758 | 0.6743 | 970 | 1.1099 | 52930288 |
0.1553 | 0.6777 | 975 | 1.1114 | 53209168 |
0.1747 | 0.6812 | 980 | 1.1089 | 53479120 |
0.2046 | 0.6847 | 985 | 1.1075 | 53750888 |
0.2212 | 0.6882 | 990 | 1.1081 | 54019400 |
0.1543 | 0.6916 | 995 | 1.1068 | 54288032 |
0.0992 | 0.6951 | 1000 | 1.1066 | 54558888 |
0.1385 | 0.6986 | 1005 | 1.1085 | 54838040 |
0.1162 | 0.7021 | 1010 | 1.1081 | 55109752 |
0.1669 | 0.7055 | 1015 | 1.1076 | 55383056 |
0.1157 | 0.7090 | 1020 | 1.1089 | 55653208 |
0.18 | 0.7125 | 1025 | 1.1108 | 55925192 |
0.1463 | 0.7160 | 1030 | 1.1085 | 56195192 |
0.1892 | 0.7194 | 1035 | 1.1058 | 56477704 |
0.0992 | 0.7229 | 1040 | 1.1071 | 56746928 |
0.152 | 0.7264 | 1045 | 1.1076 | 57022680 |
0.1036 | 0.7299 | 1050 | 1.1055 | 57292688 |
0.1372 | 0.7333 | 1055 | 1.1060 | 57558488 |
0.1364 | 0.7368 | 1060 | 1.1059 | 57823056 |
0.1455 | 0.7403 | 1065 | 1.1057 | 58084136 |
0.1572 | 0.7438 | 1070 | 1.1048 | 58353992 |
0.1606 | 0.7472 | 1075 | 1.1046 | 58629200 |
0.1648 | 0.7507 | 1080 | 1.1034 | 58901416 |
0.143 | 0.7542 | 1085 | 1.1062 | 59168288 |
0.1321 | 0.7577 | 1090 | 1.1069 | 59437168 |
0.112 | 0.7611 | 1095 | 1.1057 | 59706672 |
0.1522 | 0.7646 | 1100 | 1.1063 | 59971784 |
0.1618 | 0.7681 | 1105 | 1.1050 | 60250424 |
0.1794 | 0.7716 | 1110 | 1.1046 | 60528320 |
0.1838 | 0.7750 | 1115 | 1.1039 | 60793992 |
0.1161 | 0.7785 | 1120 | 1.1027 | 61072272 |
0.1336 | 0.7820 | 1125 | 1.1031 | 61345400 |
0.1796 | 0.7855 | 1130 | 1.1043 | 61624416 |
0.1018 | 0.7889 | 1135 | 1.1040 | 61897832 |
0.1589 | 0.7924 | 1140 | 1.1022 | 62168872 |
0.1118 | 0.7959 | 1145 | 1.1019 | 62436160 |
0.134 | 0.7994 | 1150 | 1.1038 | 62709464 |
0.1531 | 0.8028 | 1155 | 1.1047 | 62978600 |
0.1198 | 0.8063 | 1160 | 1.1038 | 63253240 |
0.0952 | 0.8098 | 1165 | 1.1022 | 63522656 |
0.1531 | 0.8133 | 1170 | 1.1022 | 63799152 |
0.079 | 0.8168 | 1175 | 1.1022 | 64076200 |
0.1454 | 0.8202 | 1180 | 1.1037 | 64354136 |
0.0946 | 0.8237 | 1185 | 1.1035 | 64629040 |
0.1531 | 0.8272 | 1190 | 1.1022 | 64897896 |
0.126 | 0.8307 | 1195 | 1.1010 | 65178264 |
0.1557 | 0.8341 | 1200 | 1.1012 | 65449248 |
0.1155 | 0.8376 | 1205 | 1.1015 | 65715256 |
0.1492 | 0.8411 | 1210 | 1.1033 | 65985160 |
0.098 | 0.8446 | 1215 | 1.1029 | 66261624 |
0.2062 | 0.8480 | 1220 | 1.1007 | 66536800 |
0.1045 | 0.8515 | 1225 | 1.1019 | 66809144 |
0.1599 | 0.8550 | 1230 | 1.1020 | 67080736 |
0.0624 | 0.8585 | 1235 | 1.1007 | 67354456 |
0.2141 | 0.8619 | 1240 | 1.1012 | 67627128 |
0.1572 | 0.8654 | 1245 | 1.1007 | 67904808 |
0.1963 | 0.8689 | 1250 | 1.1011 | 68185216 |
0.115 | 0.8724 | 1255 | 1.1026 | 68451096 |
0.1836 | 0.8758 | 1260 | 1.1024 | 68723728 |
0.1619 | 0.8793 | 1265 | 1.1000 | 68997184 |
0.1449 | 0.8828 | 1270 | 1.0991 | 69279024 |
0.0874 | 0.8863 | 1275 | 1.0999 | 69548496 |
0.0881 | 0.8897 | 1280 | 1.1020 | 69828272 |
0.071 | 0.8932 | 1285 | 1.1016 | 70101848 |
0.1586 | 0.8967 | 1290 | 1.1011 | 70376968 |
0.0901 | 0.9002 | 1295 | 1.1014 | 70647088 |
0.1439 | 0.9036 | 1300 | 1.1012 | 70918216 |
0.1242 | 0.9071 | 1305 | 1.0995 | 71189864 |
0.1162 | 0.9106 | 1310 | 1.0993 | 71469944 |
0.1242 | 0.9141 | 1315 | 1.1007 | 71735376 |
0.1472 | 0.9175 | 1320 | 1.1011 | 72011544 |
0.1443 | 0.9210 | 1325 | 1.1009 | 72287696 |
0.1586 | 0.9245 | 1330 | 1.1001 | 72558976 |
0.0775 | 0.9280 | 1335 | 1.1018 | 72823888 |
0.1103 | 0.9314 | 1340 | 1.1005 | 73092040 |
0.1226 | 0.9349 | 1345 | 1.0975 | 73366384 |
0.1511 | 0.9384 | 1350 | 1.0982 | 73633912 |
0.179 | 0.9419 | 1355 | 1.0997 | 73908416 |
0.1192 | 0.9453 | 1360 | 1.0996 | 74188736 |
0.1652 | 0.9488 | 1365 | 1.0994 | 74459056 |
0.1723 | 0.9523 | 1370 | 1.0991 | 74730640 |
0.1447 | 0.9558 | 1375 | 1.0978 | 75005176 |
0.1265 | 0.9592 | 1380 | 1.0991 | 75280304 |
0.1463 | 0.9627 | 1385 | 1.0990 | 75555024 |
0.1738 | 0.9662 | 1390 | 1.0995 | 75825496 |
0.1228 | 0.9697 | 1395 | 1.0980 | 76104496 |
0.1413 | 0.9732 | 1400 | 1.0980 | 76373840 |
0.132 | 0.9766 | 1405 | 1.0983 | 76657488 |
0.1527 | 0.9801 | 1410 | 1.0963 | 76928840 |
0.0874 | 0.9836 | 1415 | 1.0971 | 77191984 |
0.1092 | 0.9871 | 1420 | 1.1009 | 77464336 |
0.1268 | 0.9905 | 1425 | 1.0996 | 77741544 |
0.0959 | 0.9940 | 1430 | 1.0978 | 78018920 |
0.1658 | 0.9975 | 1435 | 1.0978 | 78296160 |
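For readers who want to inspect the log programmatically, the following is a minimal sketch of parsing rows in the pipe-delimited layout used above. It embeds only five rows copied from the table as an excerpt; `parse_row` and the dictionary keys are hypothetical names chosen for illustration. With the full table pasted into `ROWS`, the same code would report the lowest validation loss over the whole run.

```python
# Excerpt of rows copied verbatim from the results table above.
ROWS = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2321 | 0.0278 | 40 | 1.1946 | 2187640 |
0.6931 | 0.0417 | 60 | 1.3282 | 3289872 |
0.1527 | 0.9801 | 1410 | 1.0963 | 76928840 |
0.1658 | 0.9975 | 1435 | 1.0978 | 78296160 |
"""


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train_loss, epoch, step, val_loss, tokens = cells
    return {
        # The first row has no training loss yet ("No log").
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    }


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Checkpoint with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# Average number of input tokens consumed per optimizer step at the last row.
last = rows[-1]
tokens_per_step = last["tokens"] / last["step"]

if __name__ == "__main__":
    print("best step:", best["step"], "val loss:", best["val_loss"])
    print("tokens per step (approx.):", round(tokens_per_step))
```

In this excerpt the training loss keeps falling between steps 40 and 60 while the validation loss rises from 1.1946 to 1.3282, which is why comparing the two columns row by row is more informative than looking at either one alone.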
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter15_sftsd0
Base model
google/gemma-2-2b