Observations+benchmarks
#1, opened by ChuckMcSneed
- Seems to work at 17k context, but loses minor details, just like Aurelian.
- It works significantly worse with the Alpaca format than the original Goliath does.
- At short context, it has problems with formatting.
- Overall performance is worse than the original Goliath, as expected.
The trend of losing ~30% of SP on my meme benchmark after adding 32k context continues even here.
If anyone else has made benchmarks, please post them.
Thanks again.
I'm experimenting with a bit of fine-tuning to try to get the 32K models closer to the 4K performance; I'll upload a checkpoint when it's done.
Does one of your scores capture this loss of minor details, or is it an anecdotal observation?