Update README.md
README.md (changed)

@@ -45,7 +45,32 @@ The model is capable of handling various function calling scenarios, including:

## Performance

1. First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL), and the performance is as follows:

### Model Performance Rankings

| Rank | Overall Acc | Model | AST Summary | Exec Summary | Irrelevance | Relevance | Organization | License |
|------|-------------|-------------------------------------|-------------|--------------|-------------|-----------|--------------|---------------------------|
| 1 | 85.79 | GPT-4-0125-Preview (Prompt) | 85.5 | 89.25 | 61.35 | 97.56 | OpenAI | Proprietary |
| 2 | 85.00 | GPT-4-1106-Preview (Prompt) | 86.31 | 87.38 | 64.98 | 90.24 | OpenAI | Proprietary |
| 3 | 84.74 | GPT-4-0613 (Prompt) | 84.66 | 87.57 | 75.57 | 82.93 | OpenAI | Proprietary |
| 4 | 83.92 | MadeAgents/hammer-7b | 78.7 | 89.71 | 72.87 | 92.68 | MadeAgents | cc-by-nc-4.0 |
| 5 | 83.89 | GPT-4-turbo-2024-04-09 (Prompt) | 85.41 | 88.12 | 61.82 | 82.93 | OpenAI | Proprietary |
| 6 | 83.35 | GPT-4o-mini-2024-07-18 (Prompt) | 80.51 | 87.95 | 79.2 | 80.49 | OpenAI | Proprietary |
| 7 | 83.13 | GPT-4o-2024-05-13 (Prompt) | 83.83 | 85.12 | 77.44 | 78.05 | OpenAI | Proprietary |
| 8 | 82.99 | MadeAgents/hammer-7b (nomask) | 77.38 | 89.12 | 73.63 | 90.24 | MadeAgents | cc-by-nc-4.0 |
| 9 | 82.55 | Functionary-Medium-v3.1 (FC) | 81.06 | 89.32 | 73.23 | 70.73 | MeetKai | MIT |
| 10 | 81.78 | GPT-4-1106-Preview (FC) | 77.95 | 87.61 | 72.7 | 82.93 | OpenAI | Proprietary |
| 11 | 81.59 | Meta-Llama-3-70B-Instruct (Prompt) | 80.15 | 88.04 | 50.47 | 92.68 | Meta | Meta Llama 3 Community |
| 12 | 80.88 | Claude-3-Opus-20240229 (Prompt) | 79.42 | 87.39 | 56.15 | 85.37 | Anthropic | Proprietary |
| 13 | 80.87 | GPT-4-0125-Preview (FC) | 77.02 | 85.3 | 74.03 | 85.37 | OpenAI | Proprietary |
| 14 | 80.23 | Nemotron-4-340b-instruct (Prompt) | 76.67 | 83.38 | 84.1 | 78.05 | NVIDIA | nvidia-open-model-license |
| 15 | 80.21 | Functionary-Small-v3.1 (FC) | 78.64 | 83.45 | 68.36 | 85.37 | MeetKai | MIT |
| 16 | 79.66 | mistral-large-2407 (FC Any) | 85.61 | 88.45 | 0.34 | 100 | Mistral AI | Proprietary |
| 17 | 79.55 | GPT-4o-2024-05-13 (FC) | 79.45 | 83.36 | 73.5 | 70.73 | OpenAI | Proprietary |
| 18 | 79.41 | xLAM-7b-fc-r (FC) | 72.77 | 85.68 | 79.76 | 80.49 | Salesforce | cc-by-nc-4.0 |
| 19 | 79.25 | GPT-4o-mini-2024-07-18 (FC) | 77.63 | 81.8 | 71.83 | 82.93 | OpenAI | Proprietary |
| 20 | 79.14 | Open-Mixtral-8x22b (Prompt) | 75.6 | 86.71 | 71.42 | 70.73 | Mistral AI | Proprietary |

*Note: The rankings are ordered by Overall Acc as reported on the BFCL.*
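Since the table's rank column is meant to follow the Overall Acc column, a quick self-contained check can confirm the two are consistent. The sketch below copies only the (rank, overall accuracy) pairs from the table above; it is an illustration of the ranking convention, not part of the BFCL evaluation itself.

```python
# (rank, overall_acc) pairs copied from the BFCL table in this README.
rows = [
    (1, 85.79), (2, 85.00), (3, 84.74), (4, 83.92), (5, 83.89),
    (6, 83.35), (7, 83.13), (8, 82.99), (9, 82.55), (10, 81.78),
    (11, 81.59), (12, 80.88), (13, 80.87), (14, 80.23), (15, 80.21),
    (16, 79.66), (17, 79.55), (18, 79.41), (19, 79.25), (20, 79.14),
]

# Ranks should be 1..20 with Overall Acc non-increasing down the table.
ranks = [rank for rank, _ in rows]
accs = [acc for _, acc in rows]
assert ranks == list(range(1, 21)), "rank column is not contiguous"
assert accs == sorted(accs, reverse=True), "table is not sorted by Overall Acc"
print("ranking consistent with Overall Acc")
```

Running the check prints `ranking consistent with Overall Acc`, confirming the rank order matches descending Overall Acc.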