Update README.md

README.md `@@ -24,54 +24,14 @@ pipeline_tag: text-generation`
### Open-Hermes-2.0 (Only first 1500 examples): **[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]**

Removed: the raw per-step training-loss log (steps 1490-1528), shown here as a table:

| Step | Training Loss |
|-----:|--------------:|
| 1490 | 5.440900 |
| 1491 | 4.945900 |
| 1492 | 6.154700 |
| 1493 | 5.624800 |
| 1494 | 6.868100 |
| 1495 | 5.627100 |
| 1496 | 5.192700 |
| 1497 | 5.826800 |
| 1498 | 5.512200 |
| 1499 | 5.869900 |
| 1500 | 5.852300 |
| 1501 | 5.574800 |
| 1502 | 5.299200 |
| 1503 | 5.631200 |
| 1504 | 5.535600 |
| 1505 | 5.626000 |
| 1506 | 5.093300 |
| 1507 | 5.278000 |
| 1508 | 5.585400 |
| 1509 | 5.318600 |
| 1510 | 5.319200 |
| 1511 | 5.513900 |
| 1512 | 5.375400 |
| 1513 | 5.460600 |
| 1514 | 5.045300 |
| 1515 | 6.013600 |
| 1516 | 5.812300 |
| 1517 | 5.707400 |
| 1518 | 5.109800 |
| 1519 | 5.212900 |
| 1520 | 5.317200 |
| 1521 | 5.935400 |
| 1522 | 5.733900 |
| 1523 | 5.866000 |
| 1524 | 5.675400 |
| 1525 | 5.580800 |
| 1526 | 4.996900 |
| 1527 | 5.666700 |
| 1528 | 4.979900 |
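The progress counter in the heading can be sanity-checked against the displayed ETA: with 1530 of 125193 steps done at roughly 0.09 it/s, the remaining time works out close to the ~386 h shown (the gap comes from the it/s figure being rounded). A quick check:

```python
# Sanity-check the trainer progress bar: [ 1530/125193, 0.09 it/s, ETA 386:48:08 ].
done, total, its_per_sec = 1530, 125193, 0.09

remaining_steps = total - done               # 123663 steps left
remaining_hours = remaining_steps / its_per_sec / 3600

print(f"{remaining_hours:.0f} h remaining")  # ~382 h, consistent with the ~386 h ETA
```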
Added:

**Notes:**

- Tried 30+ combinations of hyperparameters; below are the best I could land on.
- Loss hovered around ~5-6 no matter what I tried with the learning rate.
- Couldn't increase the batch size due to Colab limitations, so the answer may lie in the right balance of learning rate and batch size.

### Hyperparameters
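On the batch-size note: when hardware caps the per-device batch (as on Colab), gradient accumulation is the usual way to raise the *effective* batch size without extra memory (in the HF `Trainer` this is `gradient_accumulation_steps` alongside `per_device_train_batch_size`), and a common heuristic scales the learning rate linearly with that effective batch. A minimal sketch of the arithmetic; the helper names and all numbers here are hypothetical, not values from this run:

```python
# Sketch: effective batch size via gradient accumulation, plus the linear
# learning-rate scaling heuristic. All concrete numbers are hypothetical.

def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Batch size the optimizer actually steps on."""
    return per_device_batch * grad_accum_steps * num_devices

def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: grow lr proportionally with the effective batch."""
    return base_lr * new_batch / base_batch

# Colab-style setup: tiny per-device batch, accumulate to emulate a bigger one.
bsz = effective_batch_size(per_device_batch=2, grad_accum_steps=16)  # -> 32
lr = scaled_lr(base_lr=2e-5, base_batch=8, new_batch=bsz)            # -> 8e-05
print(bsz, lr)
```

This is one way to search the lr/batch balance systematically: fix the effective batch, then sweep only the base learning rate.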