Update README.md
README.md
@@ -86,7 +86,7 @@ print(response)
π: Proprietary

-### 3.1 Arena-Hard-Auto
+### 3.1 Arena-Hard-Auto-v0.1

All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Hard-Auto](https://github.com/lmarena/arena-hard-auto) (accessed on February 1, 2025).

@@ -148,6 +148,26 @@ All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Har
| Yi-Large-Preview π | 7.20 |

+### 3.3 MT-Bench
+
+> [!IMPORTANT]
+> We replaced the original judge model in MT-Bench, `GPT-4`, with the more powerful `GPT-4o-0513`. To keep the comparison fair, all results below were generated with `GPT-4o-0513` as the judge; they may therefore differ from MT-Bench scores reported elsewhere.
+
+|                               | Score                    |
+| ----------------------------- | ------------------------ |
+| **Xwen-72B-Chat** π           | **8.64** (Top-1 Among π) |
+| Qwen2.5-72B-Chat π            | 8.62                     |
+| Deepseek V2.5 π               | 8.43                     |
+| Mistral-Large-Instruct-2407 π | 8.53                     |
+| Llama3.1-70B-Instruct π       | 8.23                     |
+| Llama-3.1-405B-Instruct-FP8 π | 8.36                     |
+| GPT-4o-0513 π                 | 8.59                     |
+| Claude-3.5-Sonnet-20240620 π  | 6.96                     |
+| Yi-Lightning π                | **8.75** (Top-1 Among π) |
+| Yi-Large-Preview π            | 8.32                     |
+
## References
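To make the judge swap described in the new MT-Bench note concrete, here is a minimal, illustrative sketch of single-answer grading with `GPT-4o-0513` as the judge. It is not the harness used to produce the table above (MT-Bench's official pipeline lives in FastChat's `llm_judge` and averages scores over its 80 multi-turn questions); the prompt wording, the `judge()` helper, and the example question are assumptions for illustration only.

```python
# Illustrative only: score one model answer with GPT-4o-0513 as the judge,
# in the spirit of MT-Bench single-answer grading.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_MODEL = "gpt-4o-2024-05-13"  # "GPT-4o-0513" in the table above


def judge(question: str, answer: str) -> str:
    """Ask the judge model to rate an answer on a 1-10 scale (hypothetical prompt)."""
    prompt = (
        "Please act as an impartial judge and rate the quality of the "
        "assistant's answer to the user question on a scale of 1 to 10.\n\n"
        f"[Question]\n{question}\n\n[Assistant's Answer]\n{answer}\n\n"
        "Reply with the rating only, e.g. Rating: 7."
    )
    response = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(judge("What is the capital of France?", "The capital of France is Paris."))
```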