Commit 46bbe74 (verified) · 1 Parent(s): 9d00b78
linqq9 committed: Update README.md

Files changed (1):
  1. README.md +26 -1
README.md CHANGED
@@ -45,7 +45,32 @@ The model is capable of handling various function calling scenarios, including:
 ## Performance
 
 1. First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL), and the performance is as follows:
-
+### Model Performance Rankings
+
+| Rank | Overall Acc | Model | AST Summary | Exec Summary | Irrelevance | Relevance | Organization | License |
+|------|-------------|-------|-------------|--------------|-------------|-----------|--------------|---------|
+| 1 | 85.79 | GPT-4-0125-Preview (Prompt) | 85.5 | 89.25 | 61.35 | 97.56 | OpenAI | Proprietary |
+| 2 | 85.00 | GPT-4-1106-Preview (Prompt) | 86.31 | 87.38 | 64.98 | 90.24 | OpenAI | Proprietary |
+| 3 | 84.74 | GPT-4-0613 (Prompt) | 84.66 | 87.57 | 75.57 | 82.93 | OpenAI | Proprietary |
+| 4 | 83.92 | MadeAgents/hammer-7b | 78.7 | 89.71 | 72.87 | 92.68 | MadeAgents | cc-by-nc-4.0 |
+| 5 | 83.89 | GPT-4-turbo-2024-04-09 (Prompt) | 85.41 | 88.12 | 61.82 | 82.93 | OpenAI | Proprietary |
+| 6 | 83.35 | GPT-4o-mini-2024-07-18 (Prompt) | 80.51 | 87.95 | 79.2 | 80.49 | OpenAI | Proprietary |
+| 7 | 83.13 | GPT-4o-2024-05-13 (Prompt) | 83.83 | 85.12 | 77.44 | 78.05 | OpenAI | Proprietary |
+| 8 | 82.99 | MadeAgents/hammer-7b(nomask) | 77.38 | 89.12 | 73.63 | 90.24 | MadeAgents | cc-by-nc-4.0 |
+| 9 | 82.55 | Functionary-Medium-v3.1 (FC) | 81.06 | 89.32 | 73.23 | 70.73 | MeetKai | MIT |
+| 10 | 81.78 | GPT-4-1106-Preview (FC) | 77.95 | 87.61 | 72.7 | 82.93 | OpenAI | Proprietary |
+| 11 | 81.59 | Meta-Llama-3-70B-Instruct (Prompt) | 80.15 | 88.04 | 50.47 | 92.68 | Meta | Meta Llama 3 Community |
+| 12 | 80.88 | Claude-3-Opus-20240229 (Prompt) | 79.42 | 87.39 | 56.15 | 85.37 | Anthropic | Proprietary |
+| 13 | 80.87 | GPT-4-0125-Preview (FC) | 77.02 | 85.3 | 74.03 | 85.37 | OpenAI | Proprietary |
+| 14 | 80.23 | Nemotron-4-340b-instruct (Prompt) | 76.67 | 83.38 | 84.1 | 78.05 | NVIDIA | nvidia-open-model-license |
+| 15 | 80.21 | Functionary-Small-v3.1 (FC) | 78.64 | 83.45 | 68.36 | 85.37 | MeetKai | MIT |
+| 16 | 79.66 | mistral-large-2407 (FC Any) | 85.61 | 88.45 | 0.34 | 100 | Mistral AI | Proprietary |
+| 17 | 79.55 | GPT-4o-2024-05-13 (FC) | 79.45 | 83.36 | 73.5 | 70.73 | OpenAI | Proprietary |
+| 18 | 79.41 | xLAM-7b-fc-r (FC) | 72.77 | 85.68 | 79.76 | 80.49 | Salesforce | cc-by-nc-4.0 |
+| 19 | 79.25 | GPT-4o-mini-2024-07-18 (FC) | 77.63 | 81.8 | 71.83 | 82.93 | OpenAI | Proprietary |
+| 20 | 79.14 | Open-Mixtral-8x22b (Prompt) | 75.6 | 86.71 | 71.42 | 70.73 | Mistral AI | Proprietary |
+
+*Note: Models are ranked by overall accuracy (Overall Acc) on the BFCL.*
 
 
 