It's bad, sorry.
Sorry, but this model is a disappointment. I was interested when I saw the performance table, but once again, real-world testing shows that these benchmarks are overrated.
I tested a quantized version of this model at very low temperatures, trying to get the highest possible accuracy on a coding task.
I asked the model to fix a simple Pong game written in HTML, CSS, and JavaScript. The game was previously written by another 7B model, and I think it's a good test of an LLM's ability to understand and reason about code.
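For context, my setup looked roughly like this. This is just a sketch, assuming a quantized GGUF served through llama.cpp's OpenAI-compatible server; the file name, port, and prompt wording are placeholders, not my exact script:

```js
// Node 18+ (built-in fetch), run as an ES module.
// Assumes llama.cpp's server is hosting the quantized model on localhost:8080.
import { readFile } from "node:fs/promises";

// The broken game previously written by the other 7B model (placeholder file name).
const gameSource = await readFile("pong.html", "utf8");

const response = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Fix this Pong game:\n\n" + gameSource }],
    temperature: 0.1, // very low temperature, aiming for the most deterministic, accurate output
    max_tokens: 4096,
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```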
I didn't expect this model to fix it completely; that's something even a 32B model struggled with and only managed to fix partially. What I did not expect was for this model to break the game further, which is unfortunately exactly what happened.
Asking the model to write its own Pong game from scratch, using the very same prompt I gave that other 7B model, produces a lot of useless, nonsensical code that doesn't even remotely resemble Pong. When you ask an AI to write a Pong game where the player plays against the computer, you probably aren't expecting a "pong game" for two players (instead of the requested one player vs. computer) in which the players are supposed to control their paddles by typing something into input boxes, and even that logic doesn't actually work.
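For contrast, the core of a player-vs-computer Pong fits on one screen. This is just a bare-bones sketch of the structure I'd expect (paddle moved with the mouse, a simple ball-tracking opponent), not a polished game and not the only valid way to write it:

```html
<canvas id="game" width="640" height="400"></canvas>
<script>
const ctx = document.getElementById("game").getContext("2d");
const W = 640, H = 400, PW = 10, PH = 80;
let ball = { x: W / 2, y: H / 2, vx: 4, vy: 3 };
const player = { x: 10, y: H / 2 - PH / 2 };      // left paddle: the human
const cpu    = { x: W - 20, y: H / 2 - PH / 2 };  // right paddle: the computer

// The player controls their paddle with the mouse, not by typing into an input box.
document.getElementById("game").addEventListener("mousemove", e => {
  player.y = e.offsetY - PH / 2;
});

function update() {
  ball.x += ball.vx; ball.y += ball.vy;
  if (ball.y < 0 || ball.y > H) ball.vy *= -1;  // bounce off top and bottom walls
  // Simple computer opponent: chase the ball with a capped paddle speed.
  cpu.y += Math.max(-3, Math.min(3, ball.y - (cpu.y + PH / 2)));
  // Paddle collisions: reflect the ball away from whichever paddle it hit.
  if (ball.x > player.x && ball.x < player.x + PW &&
      ball.y > player.y && ball.y < player.y + PH) ball.vx = Math.abs(ball.vx);
  if (ball.x > cpu.x && ball.x < cpu.x + PW &&
      ball.y > cpu.y && ball.y < cpu.y + PH) ball.vx = -Math.abs(ball.vx);
  // A point was scored: serve again toward the previous scorer.
  if (ball.x < 0 || ball.x > W) ball = { x: W / 2, y: H / 2, vx: -ball.vx, vy: 3 };
}

function draw() {
  ctx.fillStyle = "black"; ctx.fillRect(0, 0, W, H);
  ctx.fillStyle = "white";
  ctx.fillRect(player.x, player.y, PW, PH);
  ctx.fillRect(cpu.x, cpu.y, PW, PH);
  ctx.fillRect(ball.x - 4, ball.y - 4, 8, 8);
  update();
  requestAnimationFrame(draw);
}
draw();
</script>
```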
For future reference to anyone interested, please teach your LLMs that this is not a Pong game...