Kung-Hsiang Huang committed on
Commit 5765510 · 1 Parent(s): a7c0a8a

fix format

Files changed (1)
  1. crmarena_results/all_results.csv +21 -19
crmarena_results/all_results.csv CHANGED
@@ -1,23 +1,25 @@
  ,Model,Agentic Framework,NCR,HTU,TCU,NED,PVI,KQA,TII,MTA,BRI,Overall ⬆️
- 20,o1,Function Calling,60.8,68.5,66.9,60.0,24.6,39.2,99.2,84.6,74.8,64.3
- 18,o1,ReAct,70.0,51.5,54.6,34.6,30.0,58.8,81.5,75.4,63.1,57.7
- 12,gpt-4o,Function Calling,60.0,47.7,81.5,46.2,39.2,30.4,97.7,27.7,59.2,54.4
- 16,llama3.1-405b,Function Calling,16.2,31.5,64.6,50.0,26.9,47.6,95.4,86.9,42.3,51.3
- 14,claude-3.5-sonnet,Function Calling,4.6,33.1,82.3,52.3,30.0,40.5,69.2,26.9,36.9,41.8
- 17,llama3.1-70b,Function Calling,1.5,23.1,44.6,53.8,37.4,42.4,93.8,43.8,29.2,41.1
+ 0,o1,Function Calling,60.8,68.5,66.9,60.0,24.6,39.2,99.2,84.6,74.8,64.3
+ 1,o1,ReAct,70.0,51.5,54.6,34.6,30.0,58.8,81.5,75.4,63.1,57.7
+ 2,gpt-4o,Function Calling,60.0,47.7,81.5,46.2,39.2,30.4,97.7,27.7,59.2,54.4
+ 3,llama3.1-405b,Function Calling,16.2,31.5,64.6,50.0,26.9,47.6,95.4,86.9,42.3,51.3
+ 4,claude-3.5-sonnet,Function Calling,4.6,33.1,82.3,52.3,30.0,40.5,69.2,26.9,36.9,41.8
+ 5,llama3.1-70b,Function Calling,1.5,23.1,44.6,53.8,37.4,42.4,93.8,43.8,29.2,41.1
  6,gpt-4o,ReAct,70.0,39.2,22.3,30.8,35.4,50.2,64.6,20.9,10.8,38.2
- 2,claude-3.5-sonnet,Act,78.5,24.6,15.4,51.5,28.5,44.7,45.4,20.8,26.9,37.4
- 19,deepseek-r1,ReAct,53.8,23.1,30.1,40.8,34.6,61.2,46.9,3.1,22.3,35.1
- 8,claude-3.5-sonnet,ReAct,62.9,20.0,11.5,52.3,30.0,45.0,43.9,20.8,21.5,34.3
+ 7,claude-3.5-sonnet,Act,78.5,24.6,15.4,51.5,28.5,44.7,45.4,20.8,26.9,37.4
+ 8,deepseek-r1,ReAct,53.8,23.1,30.1,40.8,34.6,61.2,46.9,3.1,22.3,35.1
+ 9,claude-3.5-sonnet,ReAct,62.9,20.0,11.5,52.3,30.0,45.0,43.9,20.8,21.5,34.3
  10,llama3.1-405b,ReAct,81.5,22.3,15.4,33.9,34.6,55.3,34.6,13.9,13.1,33.8
- 0,gpt-4o,Act,43.1,10.0,17.7,30.8,28.5,29.3,68.5,29.2,7.7,29.4
- 7,gpt-4o-mini,ReAct,40.8,36.9,25.4,31.5,24.6,52.8,30.0,6.2,6.2,28.3
- 11,llama3.1-70b,ReAct,48.5,20.0,13.9,33.1,37.7,48.7,23.9,13.9,10.8,27.8
- 4,llama3.1-405b,Act,46.2,17.7,17.7,13.9,30.0,47.0,15.4,5.4,6.9,22.2
- 13,gpt-4o-mini,Function Calling,0.8,10.8,10.8,17.7,13.8,39.7,60.0,0.0,21.5,19.5
- 5,llama3.1-70b,Act,28.5,20.0,24.6,6.2,30.0,47.9,8.5,0.0,1.5,18.6
- 9,claude-3-sonnet,ReAct,7.7,24.6,26.9,29.2,28.5,16.0,22.3,0.8,0.0,17.3
- 1,gpt-4o-mini,Act,0.8,38.5,23.8,9.2,0.0,43.1,26.9,3.8,3.8,16.7
- 3,claude-3-sonnet,Act,9.2,26.9,24.6,30.8,23.8,16.6,16.2,1.5,0.0,16.6
- 15,claude-3-sonnet,Function Calling,0.8,1.5,30.0,25.4,41.5,23.2,12.3,1.5,0.0,15.1
+ 11,gpt-4o,Act,43.1,10.0,17.7,30.8,28.5,29.3,68.5,29.2,7.7,29.4
+ 12,gpt-4o-mini,ReAct,40.8,36.9,25.4,31.5,24.6,52.8,30.0,6.2,6.2,28.3
+ 13,llama3.1-70b,ReAct,48.5,20.0,13.9,33.1,37.7,48.7,23.9,13.9,10.8,27.8
+ 14,llama3.1-405b,Act,46.2,17.7,17.7,13.9,30.0,47.0,15.4,5.4,6.9,22.2
+ 15,gpt-4o-mini,Function Calling,0.8,10.8,10.8,17.7,13.8,39.7,60.0,0.0,21.5,19.5
+ 16,llama3.1-70b,Act,28.5,20.0,24.6,6.2,30.0,47.9,8.5,0.0,1.5,18.6
+ 17,claude-3-sonnet,ReAct,7.7,24.6,26.9,29.2,28.5,16.0,22.3,0.8,0.0,17.3
+ 18,gpt-4o-mini,Act,0.8,38.5,23.8,9.2,0.0,43.1,26.9,3.8,3.8,16.7
+ 19,claude-3-sonnet,Act,9.2,26.9,24.6,30.8,23.8,16.6,16.2,1.5,0.0,16.6
+ 20,claude-3-sonnet,Function Calling,0.8,1.5,30.0,25.4,41.5,23.2,12.3,1.5,0.0,15.1
  21,deepseek-r1,Function Calling,0.8,0.8,2.3,0.8,24.6,34.6,0.0,13.8,3.1,9.0
+ 22,llama3.1-8b,ReAct,0.0,0.0,1.5,6.2,15.4,4.0,0.0,0.0,0.8,3.1
+ 23,llama3.1-8b,Function Calling,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
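
For reference, the renumbering visible in this diff (the unnamed first column becomes 0, 1, 2, ... in row order while every other field stays the same) could be reproduced with a short script. This is only a minimal sketch, not necessarily how the commit was produced: the helper name `renumber_index` is hypothetical, and it assumes the first row is the header and the first column is a row index.

```python
import csv
import io


def renumber_index(csv_text: str) -> str:
    """Rewrite the first column of each data row as 0, 1, 2, ... in row order.

    Assumptions (not confirmed by the commit): the first row is a header and
    the first column holds a row index that carries no other meaning.
    """
    # Parse all rows, dropping completely empty lines.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return csv_text

    header, data = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for new_index, row in enumerate(data):
        # Replace only the index field; keep every other field untouched.
        writer.writerow([str(new_index)] + row[1:])
    return out.getvalue()


# Two rows copied from the old side of the diff above.
sample = (
    ",Model,Agentic Framework,NCR,HTU,TCU,NED,PVI,KQA,TII,MTA,BRI,Overall ⬆️\n"
    "20,o1,Function Calling,60.8,68.5,66.9,60.0,24.6,39.2,99.2,84.6,74.8,64.3\n"
    "18,o1,ReAct,70.0,51.5,54.6,34.6,30.0,58.8,81.5,75.4,63.1,57.7\n"
)
print(renumber_index(sample))
```

Working on the raw text with the standard `csv` module keeps the numeric fields as strings, so values such as `60.0` are written back exactly as they were read.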