Update README.md

README.md `@@ -24,54 +24,14 @@ pipeline_tag: text-generation`
### Open-Hermes-2.0 (Only first 1500 examples): **[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]**

Removed: the raw per-step training-loss log (steps 1490-1528), shown here as a table:

| Step | Training Loss |
|-----:|--------------:|
| 1490 | 5.440900 |
| 1491 | 4.945900 |
| 1492 | 6.154700 |
| 1493 | 5.624800 |
| 1494 | 6.868100 |
| 1495 | 5.627100 |
| 1496 | 5.192700 |
| 1497 | 5.826800 |
| 1498 | 5.512200 |
| 1499 | 5.869900 |
| 1500 | 5.852300 |
| 1501 | 5.574800 |
| 1502 | 5.299200 |
| 1503 | 5.631200 |
| 1504 | 5.535600 |
| 1505 | 5.626000 |
| 1506 | 5.093300 |
| 1507 | 5.278000 |
| 1508 | 5.585400 |
| 1509 | 5.318600 |
| 1510 | 5.319200 |
| 1511 | 5.513900 |
| 1512 | 5.375400 |
| 1513 | 5.460600 |
| 1514 | 5.045300 |
| 1515 | 6.013600 |
| 1516 | 5.812300 |
| 1517 | 5.707400 |
| 1518 | 5.109800 |
| 1519 | 5.212900 |
| 1520 | 5.317200 |
| 1521 | 5.935400 |
| 1522 | 5.733900 |
| 1523 | 5.866000 |
| 1524 | 5.675400 |
| 1525 | 5.580800 |
| 1526 | 4.996900 |
| 1527 | 5.666700 |
| 1528 | 4.979900 |
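The progress counter in the heading can be sanity-checked against the displayed ETA: with 1530 of 125193 steps done at roughly 0.09 it/s, the remaining time works out close to the ~386 h shown (the gap comes from the it/s figure being rounded). A quick check:

```python
# Sanity-check the trainer progress bar: [ 1530/125193, 0.09 it/s, ETA 386:48:08 ].
done, total, its_per_sec = 1530, 125193, 0.09

remaining_steps = total - done               # 123663 steps left
remaining_hours = remaining_steps / its_per_sec / 3600

print(f"{remaining_hours:.0f} h remaining")  # ~382 h, consistent with the ~386 h ETA
```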
Added:

**Notes:**

- Tried 30+ combinations of hyperparameters; below are the best I could land on.
- Loss hovered around ~5-6 no matter what I tried with the learning rate.
- Couldn't increase the batch size due to Colab limitations, so the answer may lie in the right balance of learning rate and batch size.

### Hyperparameters
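On the batch-size note: when hardware caps the per-device batch (as on Colab), gradient accumulation is the usual way to raise the *effective* batch size without extra memory (in the HF `Trainer` this is `gradient_accumulation_steps` alongside `per_device_train_batch_size`), and a common heuristic scales the learning rate linearly with that effective batch. A minimal sketch of the arithmetic; the helper names and all numbers here are hypothetical, not values from this run:

```python
# Sketch: effective batch size via gradient accumulation, plus the linear
# learning-rate scaling heuristic. All concrete numbers are hypothetical.

def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Batch size the optimizer actually steps on."""
    return per_device_batch * grad_accum_steps * num_devices

def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: grow lr proportionally with the effective batch."""
    return base_lr * new_batch / base_batch

# Colab-style setup: tiny per-device batch, accumulate to emulate a bigger one.
bsz = effective_batch_size(per_device_batch=2, grad_accum_steps=16)  # -> 32
lr = scaled_lr(base_lr=2e-5, base_batch=8, new_batch=bsz)            # -> 8e-05
print(bsz, lr)
```

This is one way to search the lr/batch balance systematically: fix the effective batch, then sweep only the base learning rate.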