Update README.md
README.md CHANGED
@@ -35,6 +35,16 @@ StarChat is a series of language models that are trained to act as helpful codin
- **Repository:** https://github.com/huggingface/alignment-handbook
- **Demo:** https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground

+## Performance
+
+StarChat2 15B was trained to balance chat and programming capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911), as well as the canonical HumanEval benchmark for Python code completion. The scores reported below were obtained using the [LightEval](https://github.com/huggingface/lighteval) evaluation suite (commit `988959cb905df4baa050f82b4d499d46e8b537f2`), and each prompt was formatted with the model's corresponding chat template to simulate real-world usage. This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.
+
+| Model | MT Bench | IFEval | HumanEval |
+|-------------------------------------------------------------------------------------------------|---------:|-------:|----------:|
+| [starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1) | 7.66 | 35.12 | 71.34 |
+| [deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) | 4.17 | 14.23 | 80.48 |
+| [CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf) | 6.80 | 43.44 | 50.60 |
+

## Intended uses & limitations
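The chat-template formatting mentioned in the added Performance section can be reproduced with the `transformers` tokenizer. The sketch below is a minimal illustration under that assumption, not the exact prompt construction LightEval performs; the example conversation is made up.

```python
# Minimal sketch of chat-template formatting (illustrative only; the exact
# prompts LightEval builds for MT Bench / IFEval / HumanEval may differ).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/starchat2-15b-v0.1")

# Hypothetical conversation, just to show the template in action.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]

# Render the messages into the single prompt string the model expects,
# appending the assistant turn marker so generation can start from it.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```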