Observations+benchmarks

#1
Opened by ChuckMcSneed
  • Seems to work at 17k context, but loses minor details, just like Aurelian.
  • It works significantly worse with the Alpaca prompt format than the original Goliath.
  • At short context, it has problems with formatting.
  • Overall performance is worse than the original Goliath, as expected.
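For readers unfamiliar with the "Alpaca format" mentioned above, here is a minimal sketch of the prompt template as published in the Stanford Alpaca repository. The thread does not say which exact variant or frontend was used for testing, so treat this as an illustration of the general shape rather than the poster's setup; `build_alpaca_prompt` is a hypothetical helper name.

```python
from typing import Optional

# Stanford Alpaca template for instructions without extra context.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

# Stanford Alpaca template for instructions paired with an input field.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Hypothetical helper: wrap an instruction (and optional input) in the Alpaca format."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("Summarize the plot of Hamlet in one sentence."))
```

The model is expected to continue generating after the final `### Response:` line. Frontends often add their own system text or stop strings around this template, which can affect how a model behaves with it.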


The trend of losing ~30% of SP on my meme benchmark after extending the context to 32k continues here as well.

If anyone else has made benchmarks, please post them.

Thanks again.

I'm experimenting with a bit of fine-tuning to try to get the 32K models closer to their 4K performance. I will upload a checkpoint when it is done.

Does one of your scores capture this loss of minor details, or is it an anecdotal observation?
