Update README.md
README.md CHANGED
@@ -58,15 +58,20 @@ For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2)
 
 DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:
 
-
-
-
-
+| Metric                 | DeepSeek-V2-0628 | DeepSeek-Coder-V2-0724 | DeepSeek-V2.5 |
+|------------------------|------------------|------------------------|---------------|
+| AlpacaEval 2.0         | 46.6             | 44.5                   | 50.5          |
+| ArenaHard              | 68.3             | 66.3                   | 76.2          |
+| AlignBench             | 7.88             | 7.91                   | 8.04          |
+| MT-Bench               | 8.85             | 8.91                   | 9.02          |
+| HumanEval python       | 84.5             | 87.2                   | 89            |
+| HumanEval Multi        | 73.8             | 74.8                   | 73.8          |
+| LiveCodeBench (01-09)  | 36.6             | 39.7                   | 41.8          |
+| Aider                  | 69.9             | 72.9                   | 72.2          |
+| SWE-verified           | N/A              | 19                     | 16.8          |
+| DS-FIM-Eval            | N/A              | 73.2                   | 78.3          |
+| DS-Arena-Code          | N/A              | 49.5                   | 63.1          |
 
-DeepSeek-V2.5 further enhances code generation capabilities, optimizing for common programming application scenarios, and achieving the following results on benchmarks:
-
-- HumanEval: 89%
-- LiveCodeBench (January - September): 41%
 
 ## 2. How to run locally
 
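The hunk ends at the `## 2. How to run locally` heading, whose body lies outside this diff. For orientation, here is a minimal sketch of the standard Hugging Face Transformers loading pattern for this checkpoint. The model id `deepseek-ai/DeepSeek-V2.5` is the published repository name, but the prompt, generation settings, and hardware comments below are illustrative assumptions, not the README's verbatim instructions.

```python
# Minimal sketch, not the README's verbatim snippet: load DeepSeek-V2.5
# with Hugging Face Transformers and run a single chat turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-V2.5"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # the repo ships custom model code
)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Write quicksort in C++."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```

A model of this size needs multiple high-memory GPUs (or a quantized variant) for BF16 inference; consult the repository for the exact requirements.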