Submit SWE-bench result

#4
by EwoutH - opened

The model card reports a SWE-bench Verified result of 16.8. Congratulations!

It would be great if that result could be submitted to https://github.com/swe-bench/experiments, so that it appears on the official leaderboard and can be verified.

It appears that the SWE-bench Verified score of 16.8 is lower than the DeepSeek-Coder-V2-0724 score of 19. Does this indicate a significant decline in programming capability?
