Update README.md
README.md CHANGED
@@ -58,15 +58,20 @@ For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2)
 
 DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:
 
-
-
-
-
+| Metric                 | DeepSeek-V2-0628 | DeepSeek-Coder-V2-0724 | DeepSeek-V2.5 |
+|------------------------|------------------|------------------------|---------------|
+| AlpacaEval 2.0         | 46.6             | 44.5                   | 50.5          |
+| ArenaHard              | 68.3             | 66.3                   | 76.2          |
+| AlignBench             | 7.88             | 7.91                   | 8.04          |
+| MT-Bench               | 8.85             | 8.91                   | 9.02          |
+| HumanEval python       | 84.5             | 87.2                   | 89            |
+| HumanEval Multi        | 73.8             | 74.8                   | 73.8          |
+| LiveCodeBench (01-09)  | 36.6             | 39.7                   | 41.8          |
+| Aider                  | 69.9             | 72.9                   | 72.2          |
+| SWE-verified           | N/A              | 19                     | 16.8          |
+| DS-FIM-Eval            | N/A              | 73.2                   | 78.3          |
+| DS-Arena-Code          | N/A              | 49.5                   | 63.1          |
 
-DeepSeek-V2.5 further enhances code generation capabilities, optimizing for common programming application scenarios, and achieving the following results on benchmarks:
-
-- HumanEval: 89%
-- LiveCodeBench (January - September): 41%
 
 ## 2. How to run locally
 
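The hunk ends at the `## 2. How to run locally` heading, whose body lies outside this diff. For orientation, here is a minimal sketch of the standard Hugging Face Transformers loading pattern for this checkpoint. The model id `deepseek-ai/DeepSeek-V2.5` is the published repository name, but the prompt, generation settings, and hardware comments below are illustrative assumptions, not the README's verbatim instructions.

```python
# Minimal sketch, not the README's verbatim snippet: load DeepSeek-V2.5
# with Hugging Face Transformers and run a single chat turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-V2.5"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # the repo ships custom model code
)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Write quicksort in C++."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```

A model of this size needs multiple high-memory GPUs (or a quantized variant) for BF16 inference; consult the repository for the exact requirements.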