---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter15_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter15_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1038
- Num Input Tokens Seen: 76152160

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.6335 | 0.0035 | 5 | 1.3904 | 265264 |
| 1.622 | 0.0070 | 10 | 1.3796 | 529520 |
| 1.5616 | 0.0105 | 15 | 1.3493 | 789912 |
| 1.4771 | 0.0140 | 20 | 1.3048 | 1051936 |
| 1.4482 | 0.0174 | 25 | 1.2593 | 1310816 |
| 1.2366 | 0.0209 | 30 | 1.2277 | 1575464 |
| 1.205 | 0.0244 | 35 | 1.1954 | 1844176 |
| 1.1595 | 0.0279 | 40 | 1.1877 | 2101088 |
| 1.1075 | 0.0314 | 45 | 1.2079 | 2358008 |
| 0.9717 | 0.0349 | 50 | 1.2192 | 2622960 |
| 0.8649 | 0.0384 | 55 | 1.2430 | 2900296 |
| 0.6741 | 0.0419 | 60 | 1.2808 | 3162936 |
| 0.6088 | 0.0453 | 65 | 1.2750 | 3426256 |
| 0.5466 | 0.0488 | 70 | 1.2739 | 3687984 |
| 0.4352 | 0.0523 | 75 | 1.2766 | 3956072 |
| 0.3651 | 0.0558 | 80 | 1.2627 | 4222280 |
| 0.4407 | 0.0593 | 85 | 1.2419 | 4490176 |
| 0.4192 | 0.0628 | 90 | 1.2448 | 4757240 |
| 0.3062 | 0.0663 | 95 | 1.2267 | 5025816 |
| 0.3112 | 0.0698 | 100 | 1.2124 | 5283528 |
| 0.2991 | 0.0732 | 105 | 1.2096 | 5542616 |
| 0.3083 | 0.0767 | 110 | 1.2180 | 5801264 |
| 0.2697 | 0.0802 | 115 | 1.2119 | 6076072 |
| 0.2169 | 0.0837 | 120 | 1.2046 | 6341608 |
| 0.2624 | 0.0872 | 125 | 1.2023 | 6605168 |
| 0.1965 | 0.0907 | 130 | 1.2020 | 6874640 |
| 0.2709 | 0.0942 | 135 | 1.2018 | 7136960 |
| 0.2237 | 0.0977 | 140 | 1.1901 | 7398024 |
| 0.1973 | 0.1011 | 145 | 1.1966 | 7665200 |
| 0.2071 | 0.1046 | 150 | 1.1827 | 7929616 |
| 0.268 | 0.1081 | 155 | 1.1982 | 8192768 |
| 0.1654 | 0.1116 | 160 | 1.1955 | 8458944 |
| 0.2108 | 0.1151 | 165 | 1.1858 | 8721632 |
| 0.2272 | 0.1186 | 170 | 1.1951 | 8989592 |
| 0.2106 | 0.1221 | 175 | 1.1778 | 9253224 |
| 0.1706 | 0.1256 | 180 | 1.1830 | 9530992 |
| 0.2324 | 0.1290 | 185 | 1.1872 | 9795664 |
| 0.189 | 0.1325 | 190 | 1.1764 | 10062896 |
| 0.2099 | 0.1360 | 195 | 1.1816 | 10332024 |
| 0.1786 | 0.1395 | 200 | 1.1858 | 10593000 |
| 0.1921 | 0.1430 | 205 | 1.1741 | 10860952 |
| 0.2308 | 0.1465 | 210 | 1.1782 | 11123072 |
| 0.2099 | 0.1500 | 215 | 1.1704 | 11393152 |
| 0.1098 | 0.1535 | 220 | 1.1730 | 11663680 |
| 0.1734 | 0.1570 | 225 | 1.1834 | 11923280 |
| 0.2554 | 0.1604 | 230 | 1.1687 | 12190352 |
| 0.2023 | 0.1639 | 235 | 1.1790 | 12451824 |
| 0.1721 | 0.1674 | 240 | 1.1711 | 12717744 |
| 0.1527 | 0.1709 | 245 | 1.1674 | 12981208 |
| 0.1508 | 0.1744 | 250 | 1.1722 | 13246192 |
| 0.245 | 0.1779 | 255 | 1.1683 | 13516936 |
| 0.127 | 0.1814 | 260 | 1.1690 | 13775424 |
| 0.1797 | 0.1849 | 265 | 1.1731 | 14051952 |
| 0.1174 | 0.1883 | 270 | 1.1666 | 14317336 |
| 0.1984 | 0.1918 | 275 | 1.1680 | 14578840 |
| 0.1501 | 0.1953 | 280 | 1.1664 | 14843080 |
| 0.1468 | 0.1988 | 285 | 1.1643 | 15108696 |
| 0.1391 | 0.2023 | 290 | 1.1662 | 15377712 |
| 0.1242 | 0.2058 | 295 | 1.1690 | 15643664 |
| 0.2058 | 0.2093 | 300 | 1.1623 | 15904872 |
| 0.1263 | 0.2128 | 305 | 1.1607 | 16167640 |
| 0.1244 | 0.2162 | 310 | 1.1670 | 16430328 |
| 0.1891 | 0.2197 | 315 | 1.1577 | 16693176 |
| 0.1042 | 0.2232 | 320 | 1.1596 | 16957704 |
| 0.1125 | 0.2267 | 325 | 1.1597 | 17225152 |
| 0.1481 | 0.2302 | 330 | 1.1567 | 17487800 |
| 0.1469 | 0.2337 | 335 | 1.1578 | 17756544 |
| 0.2201 | 0.2372 | 340 | 1.1549 | 18013952 |
| 0.1572 | 0.2407 | 345 | 1.1539 | 18278648 |
| 0.1809 | 0.2441 | 350 | 1.1560 | 18549424 |
| 0.1714 | 0.2476 | 355 | 1.1536 | 18809728 |
| 0.1317 | 0.2511 | 360 | 1.1541 | 19083128 |
| 0.1413 | 0.2546 | 365 | 1.1575 | 19346464 |
| 0.1489 | 0.2581 | 370 | 1.1506 | 19610352 |
| 0.1807 | 0.2616 | 375 | 1.1557 | 19876488 |
| 0.1268 | 0.2651 | 380 | 1.1511 | 20145576 |
| 0.0999 | 0.2686 | 385 | 1.1480 | 20406568 |
| 0.1462 | 0.2720 | 390 | 1.1480 | 20673400 |
| 0.2487 | 0.2755 | 395 | 1.1498 | 20933272 |
| 0.1891 | 0.2790 | 400 | 1.1498 | 21203336 |
| 0.1258 | 0.2825 | 405 | 1.1494 | 21463128 |
| 0.2041 | 0.2860 | 410 | 1.1456 | 21727312 |
| 0.1538 | 0.2895 | 415 | 1.1470 | 21991800 |
| 0.1854 | 0.2930 | 420 | 1.1516 | 22259128 |
| 0.1756 | 0.2965 | 425 | 1.1449 | 22527368 |
| 0.1609 | 0.3000 | 430 | 1.1496 | 22797736 |
| 0.1281 | 0.3034 | 435 | 1.1490 | 23065120 |
| 0.1733 | 0.3069 | 440 | 1.1419 | 23332824 |
| 0.1267 | 0.3104 | 445 | 1.1475 | 23596656 |
| 0.1899 | 0.3139 | 450 | 1.1503 | 23858872 |
| 0.1488 | 0.3174 | 455 | 1.1445 | 24128464 |
| 0.2237 | 0.3209 | 460 | 1.1402 | 24398456 |
| 0.1103 | 0.3244 | 465 | 1.1457 | 24674592 |
| 0.1947 | 0.3279 | 470 | 1.1457 | 24942816 |
| 0.1473 | 0.3313 | 475 | 1.1414 | 25208880 |
| 0.1941 | 0.3348 | 480 | 1.1405 | 25476880 |
| 0.1516 | 0.3383 | 485 | 1.1412 | 25742280 |
| 0.1136 | 0.3418 | 490 | 1.1412 | 26011456 |
| 0.1652 | 0.3453 | 495 | 1.1417 | 26276776 |
| 0.1915 | 0.3488 | 500 | 1.1402 | 26548176 |
| 0.1144 | 0.3523 | 505 | 1.1398 | 26811736 |
| 0.1495 | 0.3558 | 510 | 1.1396 | 27074568 |
| 0.0938 | 0.3592 | 515 | 1.1388 | 27336872 |
| 0.1582 | 0.3627 | 520 | 1.1397 | 27600664 |
| 0.1563 | 0.3662 | 525 | 1.1374 | 27869688 |
| 0.1637 | 0.3697 | 530 | 1.1369 | 28136912 |
| 0.1926 | 0.3732 | 535 | 1.1376 | 28397024 |
| 0.1246 | 0.3767 | 540 | 1.1424 | 28663128 |
| 0.124 | 0.3802 | 545 | 1.1405 | 28928160 |
| 0.1651 | 0.3837 | 550 | 1.1357 | 29193256 |
| 0.1705 | 0.3871 | 555 | 1.1370 | 29469488 |
| 0.1742 | 0.3906 | 560 | 1.1381 | 29736456 |
| 0.1332 | 0.3941 | 565 | 1.1356 | 30004776 |
| 0.1699 | 0.3976 | 570 | 1.1376 | 30269160 |
| 0.1459 | 0.4011 | 575 | 1.1376 | 30536768 |
| 0.1499 | 0.4046 | 580 | 1.1324 | 30797352 |
| 0.1346 | 0.4081 | 585 | 1.1336 | 31057984 |
| 0.129 | 0.4116 | 590 | 1.1361 | 31325216 |
| 0.1389 | 0.4150 | 595 | 1.1332 | 31595480 |
| 0.1412 | 0.4185 | 600 | 1.1334 | 31857336 |
| 0.2066 | 0.4220 | 605 | 1.1332 | 32127984 |
| 0.1311 | 0.4255 | 610 | 1.1337 | 32397688 |
| 0.1676 | 0.4290 | 615 | 1.1317 | 32665408 |
| 0.1461 | 0.4325 | 620 | 1.1331 | 32927392 |
| 0.1319 | 0.4360 | 625 | 1.1311 | 33190440 |
| 0.1646 | 0.4395 | 630 | 1.1301 | 33455856 |
| 0.1153 | 0.4430 | 635 | 1.1315 | 33710760 |
| 0.15 | 0.4464 | 640 | 1.1326 | 33974104 |
| 0.1542 | 0.4499 | 645 | 1.1311 | 34239680 |
| 0.159 | 0.4534 | 650 | 1.1315 | 34496864 |
| 0.1548 | 0.4569 | 655 | 1.1315 | 34763840 |
| 0.1949 | 0.4604 | 660 | 1.1275 | 35033312 |
| 0.1225 | 0.4639 | 665 | 1.1271 | 35302976 |
| 0.1075 | 0.4674 | 670 | 1.1299 | 35564864 |
| 0.1672 | 0.4709 | 675 | 1.1292 | 35826896 |
| 0.1726 | 0.4743 | 680 | 1.1284 | 36091984 |
| 0.1091 | 0.4778 | 685 | 1.1269 | 36361816 |
| 0.1787 | 0.4813 | 690 | 1.1281 | 36633880 |
| 0.2382 | 0.4848 | 695 | 1.1266 | 36897128 |
| 0.1621 | 0.4883 | 700 | 1.1262 | 37166912 |
| 0.1219 | 0.4918 | 705 | 1.1264 | 37432672 |
| 0.1246 | 0.4953 | 710 | 1.1245 | 37701560 |
| 0.1244 | 0.4988 | 715 | 1.1267 | 37970272 |
| 0.1594 | 0.5022 | 720 | 1.1272 | 38234632 |
| 0.1719 | 0.5057 | 725 | 1.1248 | 38500864 |
| 0.2219 | 0.5092 | 730 | 1.1243 | 38772848 |
| 0.2507 | 0.5127 | 735 | 1.1269 | 39039088 |
| 0.1579 | 0.5162 | 740 | 1.1248 | 39308664 |
| 0.1591 | 0.5197 | 745 | 1.1242 | 39571480 |
| 0.2021 | 0.5232 | 750 | 1.1238 | 39840248 |
| 0.1743 | 0.5267 | 755 | 1.1256 | 40108832 |
| 0.1233 | 0.5301 | 760 | 1.1240 | 40376032 |
| 0.0994 | 0.5336 | 765 | 1.1227 | 40644888 |
| 0.1611 | 0.5371 | 770 | 1.1249 | 40908112 |
| 0.2003 | 0.5406 | 775 | 1.1235 | 41179560 |
| 0.1634 | 0.5441 | 780 | 1.1212 | 41440016 |
| 0.1453 | 0.5476 | 785 | 1.1224 | 41705792 |
| 0.2001 | 0.5511 | 790 | 1.1260 | 41973864 |
| 0.0966 | 0.5546 | 795 | 1.1230 | 42238800 |
| 0.1117 | 0.5581 | 800 | 1.1218 | 42503920 |
| 0.1329 | 0.5615 | 805 | 1.1227 | 42764056 |
| 0.1201 | 0.5650 | 810 | 1.1205 | 43034552 |
| 0.1335 | 0.5685 | 815 | 1.1230 | 43291144 |
| 0.1374 | 0.5720 | 820 | 1.1230 | 43552240 |
| 0.1848 | 0.5755 | 825 | 1.1220 | 43822248 |
| 0.1219 | 0.5790 | 830 | 1.1198 | 44088592 |
| 0.1587 | 0.5825 | 835 | 1.1190 | 44352464 |
| 0.2018 | 0.5860 | 840 | 1.1219 | 44618272 |
| 0.1012 | 0.5894 | 845 | 1.1228 | 44884504 |
| 0.1689 | 0.5929 | 850 | 1.1199 | 45145128 |
| 0.1059 | 0.5964 | 855 | 1.1189 | 45409224 |
| 0.1455 | 0.5999 | 860 | 1.1197 | 45679272 |
| 0.0694 | 0.6034 | 865 | 1.1194 | 45950112 |
| 0.1902 | 0.6069 | 870 | 1.1209 | 46218208 |
| 0.1812 | 0.6104 | 875 | 1.1201 | 46487616 |
| 0.1626 | 0.6139 | 880 | 1.1172 | 46748944 |
| 0.096 | 0.6173 | 885 | 1.1183 | 47009152 |
| 0.1288 | 0.6208 | 890 | 1.1194 | 47280744 |
| 0.1101 | 0.6243 | 895 | 1.1195 | 47545352 |
| 0.1378 | 0.6278 | 900 | 1.1195 | 47814864 |
| 0.1172 | 0.6313 | 905 | 1.1187 | 48080808 |
| 0.1363 | 0.6348 | 910 | 1.1183 | 48349280 |
| 0.1101 | 0.6383 | 915 | 1.1194 | 48620456 |
| 0.1411 | 0.6418 | 920 | 1.1189 | 48886456 |
| 0.1156 | 0.6452 | 925 | 1.1181 | 49150432 |
| 0.1821 | 0.6487 | 930 | 1.1200 | 49412392 |
| 0.1125 | 0.6522 | 935 | 1.1188 | 49682696 |
| 0.1048 | 0.6557 | 940 | 1.1180 | 49948680 |
| 0.0824 | 0.6592 | 945 | 1.1176 | 50216176 |
| 0.1427 | 0.6627 | 950 | 1.1169 | 50489944 |
| 0.186 | 0.6662 | 955 | 1.1180 | 50759736 |
| 0.2481 | 0.6697 | 960 | 1.1186 | 51023448 |
| 0.1163 | 0.6731 | 965 | 1.1179 | 51283016 |
| 0.1322 | 0.6766 | 970 | 1.1173 | 51543336 |
| 0.1411 | 0.6801 | 975 | 1.1186 | 51808904 |
| 0.182 | 0.6836 | 980 | 1.1168 | 52070888 |
| 0.1972 | 0.6871 | 985 | 1.1164 | 52333312 |
| 0.17 | 0.6906 | 990 | 1.1177 | 52600672 |
| 0.137 | 0.6941 | 995 | 1.1164 | 52870784 |
| 0.1906 | 0.6976 | 1000 | 1.1137 | 53138192 |
| 0.1769 | 0.7011 | 1005 | 1.1139 | 53407448 |
| 0.1233 | 0.7045 | 1010 | 1.1141 | 53676296 |
| 0.1227 | 0.7080 | 1015 | 1.1150 | 53941152 |
| 0.1432 | 0.7115 | 1020 | 1.1140 | 54205752 |
| 0.1228 | 0.7150 | 1025 | 1.1116 | 54467160 |
| 0.0864 | 0.7185 | 1030 | 1.1139 | 54738952 |
| 0.1125 | 0.7220 | 1035 | 1.1149 | 55005232 |
| 0.196 | 0.7255 | 1040 | 1.1128 | 55271256 |
| 0.1382 | 0.7290 | 1045 | 1.1113 | 55536416 |
| 0.1006 | 0.7324 | 1050 | 1.1157 | 55797424 |
| 0.1389 | 0.7359 | 1055 | 1.1151 | 56066880 |
| 0.2355 | 0.7394 | 1060 | 1.1137 | 56333520 |
| 0.1486 | 0.7429 | 1065 | 1.1126 | 56595848 |
| 0.116 | 0.7464 | 1070 | 1.1125 | 56861976 |
| 0.1151 | 0.7499 | 1075 | 1.1154 | 57125656 |
| 0.0951 | 0.7534 | 1080 | 1.1146 | 57389504 |
| 0.0787 | 0.7569 | 1085 | 1.1114 | 57655800 |
| 0.1477 | 0.7603 | 1090 | 1.1104 | 57923624 |
| 0.1156 | 0.7638 | 1095 | 1.1139 | 58188808 |
| 0.1177 | 0.7673 | 1100 | 1.1137 | 58450392 |
| 0.1342 | 0.7708 | 1105 | 1.1102 | 58711432 |
| 0.1254 | 0.7743 | 1110 | 1.1110 | 58979616 |
| 0.1598 | 0.7778 | 1115 | 1.1128 | 59240552 |
| 0.1482 | 0.7813 | 1120 | 1.1129 | 59505736 |
| 0.1407 | 0.7848 | 1125 | 1.1132 | 59760680 |
| 0.1267 | 0.7882 | 1130 | 1.1123 | 60029656 |
| 0.1646 | 0.7917 | 1135 | 1.1128 | 60300680 |
| 0.1653 | 0.7952 | 1140 | 1.1133 | 60568280 |
| 0.1418 | 0.7987 | 1145 | 1.1116 | 60834856 |
| 0.1253 | 0.8022 | 1150 | 1.1110 | 61096016 |
| 0.1718 | 0.8057 | 1155 | 1.1118 | 61370616 |
| 0.1613 | 0.8092 | 1160 | 1.1107 | 61635696 |
| 0.1818 | 0.8127 | 1165 | 1.1112 | 61901944 |
| 0.2125 | 0.8161 | 1170 | 1.1103 | 62167944 |
| 0.1432 | 0.8196 | 1175 | 1.1123 | 62433440 |
| 0.1304 | 0.8231 | 1180 | 1.1135 | 62699952 |
| 0.1346 | 0.8266 | 1185 | 1.1115 | 62964840 |
| 0.1394 | 0.8301 | 1190 | 1.1106 | 63239200 |
| 0.0875 | 0.8336 | 1195 | 1.1106 | 63510800 |
| 0.0908 | 0.8371 | 1200 | 1.1107 | 63776448 |
| 0.1791 | 0.8406 | 1205 | 1.1110 | 64035136 |
| 0.1111 | 0.8441 | 1210 | 1.1098 | 64298496 |
| 0.1449 | 0.8475 | 1215 | 1.1099 | 64568864 |
| 0.0927 | 0.8510 | 1220 | 1.1096 | 64828248 |
| 0.1184 | 0.8545 | 1225 | 1.1105 | 65097160 |
| 0.1134 | 0.8580 | 1230 | 1.1096 | 65360720 |
| 0.137 | 0.8615 | 1235 | 1.1079 | 65618128 |
| 0.1902 | 0.8650 | 1240 | 1.1089 | 65881112 |
| 0.0928 | 0.8685 | 1245 | 1.1101 | 66146016 |
| 0.1157 | 0.8720 | 1250 | 1.1104 | 66403760 |
| 0.1252 | 0.8754 | 1255 | 1.1090 | 66667576 |
| 0.1246 | 0.8789 | 1260 | 1.1086 | 66936136 |
| 0.071 | 0.8824 | 1265 | 1.1105 | 67201944 |
| 0.1103 | 0.8859 | 1270 | 1.1116 | 67473424 |
| 0.1637 | 0.8894 | 1275 | 1.1109 | 67736232 |
| 0.1647 | 0.8929 | 1280 | 1.1089 | 68004328 |
| 0.108 | 0.8964 | 1285 | 1.1093 | 68257848 |
| 0.138 | 0.8999 | 1290 | 1.1094 | 68528304 |
| 0.127 | 0.9033 | 1295 | 1.1094 | 68794728 |
| 0.1024 | 0.9068 | 1300 | 1.1076 | 69068984 |
| 0.1781 | 0.9103 | 1305 | 1.1081 | 69335832 |
| 0.0999 | 0.9138 | 1310 | 1.1084 | 69600344 |
| 0.1854 | 0.9173 | 1315 | 1.1080 | 69860304 |
| 0.214 | 0.9208 | 1320 | 1.1065 | 70124608 |
| 0.164 | 0.9243 | 1325 | 1.1058 | 70387744 |
| 0.1006 | 0.9278 | 1330 | 1.1061 | 70656296 |
| 0.1319 | 0.9312 | 1335 | 1.1085 | 70919480 |
| 0.1377 | 0.9347 | 1340 | 1.1079 | 71183312 |
| 0.1027 | 0.9382 | 1345 | 1.1066 | 71457232 |
| 0.1108 | 0.9417 | 1350 | 1.1062 | 71718440 |
| 0.0757 | 0.9452 | 1355 | 1.1080 | 71984784 |
| 0.1517 | 0.9487 | 1360 | 1.1078 | 72244848 |
| 0.219 | 0.9522 | 1365 | 1.1053 | 72507952 |
| 0.1911 | 0.9557 | 1370 | 1.1045 | 72772808 |
| 0.1306 | 0.9591 | 1375 | 1.1064 | 73036664 |
| 0.1457 | 0.9626 | 1380 | 1.1068 | 73302280 |
| 0.1485 | 0.9661 | 1385 | 1.1049 | 73568576 |
| 0.1622 | 0.9696 | 1390 | 1.1053 | 73839240 |
| 0.1118 | 0.9731 | 1395 | 1.1101 | 74110880 |
| 0.1263 | 0.9766 | 1400 | 1.1073 | 74373336 |
| 0.1481 | 0.9801 | 1405 | 1.1043 | 74644872 |
| 0.15 | 0.9836 | 1410 | 1.1059 | 74912200 |
| 0.1295 | 0.9871 | 1415 | 1.1064 | 75181024 |
| 0.1734 | 0.9905 | 1420 | 1.1063 | 75456792 |
| 0.2327 | 0.9940 | 1425 | 1.1052 | 75723432 |
| 0.1585 | 0.9975 | 1430 | 1.1034 | 75990824 |


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
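
### Reading the hyperparameters

The training hyperparameters in this card can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not the training code. It assumes a single device (which is consistent with 8 × 16 = 128, but not stated in the card), it assumes roughly 1433 optimizer steps in the epoch (an estimate from the last logged row, step 1430 at epoch 0.9975), and it assumes that `constant_with_warmup` means a linear ramp from zero followed by a constant rate. The helper names `effective_batch_size` and `lr_at_step` are hypothetical.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

# Assumption: about 1433 optimizer steps in the epoch, estimated from the
# last logged row (step 1430 at epoch 0.9975). The card does not state it.
ASSUMED_TOTAL_STEPS = 1433


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer step.

    num_devices=1 is an assumption; the card does not list the device count.
    """
    return per_device * accum_steps * num_devices


def lr_at_step(step: int, total_steps: int,
               base_lr: float = LEARNING_RATE,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr, then a constant learning rate."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 36, 72, 1430):
        print(step, lr_at_step(step, ASSUMED_TOTAL_STEPS))
```

Under these assumptions the warmup covers about the first 72 optimizer steps, after which the learning rate stays at 8e-06 for the rest of the epoch.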