Update README.md
README.md
@@ -14,443 +14,25 @@ Hammer-7b is a finetuned model built upon [Qwen2-7B-Instruct](https://huggingfac
## Evaluation

We first evaluate Hammer-7b on the Berkeley Function-Calling Leaderboard (BFCL). The results are as follows:
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-9id2{color:#007BFF;text-align:center;vertical-align:middle}
.tg .tg-pchv{color:#212529;font-weight:bold;text-align:center;vertical-align:middle}
.tg .tg-qai4{color:#212529;text-align:center;vertical-align:middle}
.tg .tg-p59o{color:#00E;text-align:center;text-decoration:underline;vertical-align:top}
</style>
<table class="tg"><thead>
<tr>
<th class="tg-pchv">Rank</th>
<th class="tg-pchv">Overall Acc</th>
<th class="tg-pchv">Model</th>
<th class="tg-pchv">AST Summary</th>
<th class="tg-pchv">Exec Summary</th>
<th class="tg-pchv">Irrelevance</th>
<th class="tg-pchv">Relevance</th>
<th class="tg-pchv">Organization</th>
<th class="tg-pchv">License</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-qai4">1</td>
<td class="tg-qai4">85.79</td>
<td class="tg-p59o"><a href="https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo">GPT-4-0125-Preview (Prompt)</a></td>
<td class="tg-qai4">85.5</td>
<td class="tg-qai4">89.25</td>
<td class="tg-qai4">61.35</td>
<td class="tg-qai4">97.56</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">2</td>
<td class="tg-qai4">85</td>
<td class="tg-p59o"><a href="https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo">GPT-4-1106-Preview (Prompt)</a></td>
<td class="tg-qai4">86.31</td>
<td class="tg-qai4">87.38</td>
<td class="tg-qai4">64.98</td>
<td class="tg-qai4">90.24</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">3</td>
<td class="tg-qai4">84.74</td>
<td class="tg-p59o"><a href="https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo">GPT-4-0613 (Prompt)</a></td>
<td class="tg-qai4">84.66</td>
<td class="tg-qai4">87.57</td>
<td class="tg-qai4">75.57</td>
<td class="tg-qai4">82.93</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">4</td>
<td class="tg-qai4">83.92</td>
<td class="tg-9id2">Hammer-7b</td>
<td class="tg-qai4">78.7</td>
<td class="tg-qai4">89.71</td>
<td class="tg-qai4">72.87</td>
<td class="tg-qai4">92.68</td>
<td class="tg-qai4">MadeAgents</td>
<td class="tg-qai4">cc-by-nc-4.0</td>
</tr>
<tr>
<td class="tg-qai4">5</td>
<td class="tg-qai4">83.89</td>
<td class="tg-p59o"><a href="https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo">GPT-4-turbo-2024-04-09 (Prompt)</a></td>
<td class="tg-qai4">85.41</td>
<td class="tg-qai4">88.12</td>
<td class="tg-qai4">61.82</td>
<td class="tg-qai4">82.93</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">6</td>
<td class="tg-qai4">83.35</td>
<td class="tg-p59o"><a href="https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/">GPT-4o-mini-2024-07-18 (Prompt)</a></td>
<td class="tg-qai4">80.51</td>
<td class="tg-qai4">87.95</td>
<td class="tg-qai4">79.2</td>
<td class="tg-qai4">80.49</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">7</td>
<td class="tg-qai4">83.13</td>
<td class="tg-p59o"><a href="https://openai.com/index/hello-gpt-4o/">GPT-4o-2024-05-13 (Prompt)</a></td>
<td class="tg-qai4">83.83</td>
<td class="tg-qai4">85.12</td>
<td class="tg-qai4">77.44</td>
<td class="tg-qai4">78.05</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">8</td>
<td class="tg-qai4">82.55</td>
<td class="tg-p59o"><a href="https://huggingface.co/meetkai/functionary-medium-v3.1">Functionary-Medium-v3.1 (FC)</a></td>
<td class="tg-qai4">81.06</td>
<td class="tg-qai4">89.32</td>
<td class="tg-qai4">73.23</td>
<td class="tg-qai4">70.73</td>
<td class="tg-qai4">MeetKai</td>
<td class="tg-qai4">MIT</td>
</tr>
<tr>
<td class="tg-qai4">9</td>
<td class="tg-qai4">81.78</td>
<td class="tg-p59o"><a href="https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo">GPT-4-1106-Preview (FC)</a></td>
<td class="tg-qai4">77.95</td>
<td class="tg-qai4">87.61</td>
<td class="tg-qai4">72.7</td>
<td class="tg-qai4">82.93</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">10</td>
<td class="tg-qai4">81.59</td>
<td class="tg-p59o"><a href="https://llama.meta.com/llama3">Meta-Llama-3-70B-Instruct (Prompt)</a></td>
<td class="tg-qai4">80.15</td>
<td class="tg-qai4">88.04</td>
<td class="tg-qai4">50.47</td>
<td class="tg-qai4">92.68</td>
<td class="tg-qai4">Meta</td>
<td class="tg-qai4">Meta Llama 3 Community</td>
</tr>
<tr>
<td class="tg-qai4">11</td>
<td class="tg-qai4">80.88</td>
<td class="tg-p59o"><a href="https://www.anthropic.com/news/claude-3-family">Claude-3-Opus-20240229 (Prompt)</a></td>
<td class="tg-qai4">79.42</td>
<td class="tg-qai4">87.39</td>
<td class="tg-qai4">56.15</td>
<td class="tg-qai4">85.37</td>
<td class="tg-qai4">Anthropic</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">12</td>
<td class="tg-qai4">80.87</td>
<td class="tg-p59o"><a href="https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo">GPT-4-0125-Preview (FC)</a></td>
<td class="tg-qai4">77.02</td>
<td class="tg-qai4">85.3</td>
<td class="tg-qai4">74.03</td>
<td class="tg-qai4">85.37</td>
<td class="tg-qai4">OpenAI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
<tr>
<td class="tg-qai4">13</td>
<td class="tg-qai4">80.23</td>
<td class="tg-p59o"><a href="https://huggingface.co/nvidia/nemotron-4-340b-instruct">Nemotron-4-340b-instruct (Prompt)</a></td>
<td class="tg-qai4">76.67</td>
<td class="tg-qai4">83.38</td>
<td class="tg-qai4">84.1</td>
<td class="tg-qai4">78.05</td>
<td class="tg-qai4">NVIDIA</td>
<td class="tg-qai4">nvidia-open-model-license</td>
</tr>
<tr>
<td class="tg-qai4">14</td>
<td class="tg-qai4">80.21</td>
<td class="tg-p59o"><a href="https://huggingface.co/meetkai/functionary-small-v3.1">Functionary-Small-v3.1 (FC)</a></td>
<td class="tg-qai4">78.64</td>
<td class="tg-qai4">83.45</td>
<td class="tg-qai4">68.36</td>
<td class="tg-qai4">85.37</td>
<td class="tg-qai4">MeetKai</td>
<td class="tg-qai4">MIT</td>
</tr>
<tr>
<td class="tg-qai4">15</td>
<td class="tg-qai4">79.66</td>
<td class="tg-p59o"><a href="https://mistral.ai/news/mistral-large-2407/">mistral-large-2407 (FC Any)</a></td>
<td class="tg-qai4">85.61</td>
<td class="tg-qai4">88.45</td>
<td class="tg-qai4">0.34</td>
<td class="tg-qai4">100</td>
<td class="tg-qai4">Mistral AI</td>
<td class="tg-qai4">Proprietary</td>
</tr>
</tbody></table>

*Note: Models are ranked by Overall Acc, using the performance metrics listed above.*

We also evaluated Hammer-7b on several other function-calling benchmarks in a zero-shot setting, where it outperforms the other models compared. The table below replicates and extends the format of Table 6 (Function Calling Academic Benchmarks) in ["Granite-Function Calling Model"](https://arxiv.org/abs/2407.00121).
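In the table that follows, each F1 Average column appears to be the unweighted mean of the four per-benchmark F1 scores (API-Bank L-1, API-Bank L-2, Tool-Alpaca, Nexus Raven), reported to one decimal place; for example, Functionary-small-v2.4 gives (78 + 54 + 88 + 82) / 4 = 75.5 for F1 Func-Name. The sketch below reproduces that arithmetic under this assumption. The helper `f1_average` and the constant names are hypothetical, and the scores are copied from the Functionary-small-v2.4 row.

```python
from statistics import fmean

# Per-benchmark F1 scores (in %) for Functionary-small-v2.4, in table column
# order: API-Bank L-1, API-Bank L-2, Tool-Alpaca, Nexus Raven.
FUNC_NAME_F1 = [78.0, 54.0, 88.0, 82.0]
ARGS_F1 = [70.0, 45.0, 47.0, 64.0]


def f1_average(scores: list[float]) -> float:
    """Unweighted mean of per-benchmark F1 scores (assumed averaging rule)."""
    if not scores:
        raise ValueError("need at least one benchmark score")
    return fmean(scores)


func_name_avg = f1_average(FUNC_NAME_F1)
args_avg = f1_average(ARGS_F1)
print(f"F1 Func-Name average: {func_name_avg:.2f}%")  # prints 75.50%
print(f"F1 Args average: {args_avg:.2f}%")            # prints 56.50%
```

Applied to the Hermes-2-Pro-Mistral Func-Name scores (93, 54, 80, 90), the same helper returns 79.25, which the table lists as 79.30%.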
|
212 |
|
213 |
-
<style
|
214 |
-
.
|
215 |
-
|
216 |
-
overflow:hidden;padding:12px 5px;word-break:normal;}
|
217 |
-
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
|
218 |
-
font-weight:normal;overflow:hidden;padding:12px 5px;word-break:normal;}
|
219 |
-
.tg .tg-baqh{text-align:center;vertical-align:top}
|
220 |
-
.tg .tg-7geq{background-color:#ffffc7;text-align:center;vertical-align:top}
|
221 |
-
.tg .tg-k5c1{background-color:#ffffc7;font-weight:bold;text-align:center;vertical-align:top}
|
222 |
-
.tg .tg-nrix{text-align:center;vertical-align:middle}
|
223 |
-
.tg .tg-amwm{font-weight:bold;text-align:center;vertical-align:top}
|
224 |
-
</style>
|
225 |
-
<table class="tg"><thead>
|
226 |
-
<tr>
|
227 |
-
<th class="tg-nrix" rowspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Model</span></th>
|
228 |
-
<th class="tg-nrix" rowspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Size</span></th>
|
229 |
-
<th class="tg-baqh" colspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">API-Bank</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">L-1</span></th>
|
230 |
-
<th class="tg-baqh" colspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">API-Bank</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">L-2</span></th>
|
231 |
-
<th class="tg-baqh" colspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Tool-Alpaca</span></th>
|
232 |
-
<th class="tg-baqh" colspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Nexus</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Raven</span></th>
|
233 |
-
<th class="tg-baqh" colspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Average</span></th>
|
234 |
-
</tr>
|
235 |
-
<tr>
|
236 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Func-Name</span></th>
|
237 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Args</span></th>
|
238 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Func-Name</span></th>
|
239 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Args</span></th>
|
240 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Func-Name</span></th>
|
241 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Args</span></th>
|
242 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Func-Name</span></th>
|
243 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Args</span></th>
|
244 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Func-Name</span></th>
|
245 |
-
<th class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Args</span></th>
|
246 |
-
</tr>
|
247 |
-
</thead>
|
248 |
-
<tbody>
|
249 |
-
<tr>
|
250 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Functionary-small-v2.4</span></td>
|
251 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
|
252 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">78.00%</span></td>
|
253 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">70.00%</span></td>
|
254 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">54.00%</span></td>
|
255 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">45.00%</span></td>
|
256 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">88.00%</span></td>
|
257 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">47.00%</span></td>
|
258 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">82.00%</span></td>
|
259 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">64.00%</span></td>
|
260 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">75.50%</span></td>
|
261 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">56.50%</span></td>
|
262 |
-
</tr>
|
263 |
-
<tr>
|
264 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Gorilla-openfunctions-v2</span></td>
|
265 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
|
266 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">43.00%</span></td>
|
267 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">41.00%</span></td>
|
268 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">12.00%</span></td>
|
269 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">12.00%</span></td>
|
270 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">69.00%</span></td>
|
271 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">39.00%</span></td>
|
272 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">81.00%</span></td>
|
273 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">65.00%</span></td>
|
274 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">51.20%</span></td>
|
275 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">39.30%</span></td>
|
276 |
-
</tr>
|
277 |
-
<tr>
|
278 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Hermes-2-Pro-Mistral</span></td>
|
279 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
|
280 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">93.00%</span></td>
|
281 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">77.00%</span></td>
|
282 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">54.00%</span></td>
|
283 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">25.00%</span></td>
|
284 |
-
<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">80.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">26.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">90.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">63.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">79.30%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">47.80%</span></td>
-</tr>
-<tr>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Mistral-Instruct-v0.3</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">79.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">69.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">69.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">46.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">33.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">33.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">71.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">54.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">63.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">50.50%</span></td>
-</tr>
-<tr>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">CodeGemma-Instruct</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">77.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">57.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">59.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">38.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">59.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">31.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">84.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">68.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">69.80%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">48.50%</span></td>
-</tr>
-<tr>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Nexusflow-Raven-v2</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">13B</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">51.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">42.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">28.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">22.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">85.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">37.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">92.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">75.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">64.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">44.00%</span></td>
-</tr>
-<tr>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">C4AI-Command-R-v01</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">35B</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">93.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">76.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">77.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">54.00%</span></td>
-<td class="tg-amwm"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">90.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">42.00%</span></td>
-<td class="tg-amwm"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">93.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">71.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">88.30%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">60.80%</span></td>
-</tr>
-<tr>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Meta-Llama-3-70B-Instruct</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">70B</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">85.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">67.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">69.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">52.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">78.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">43.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">70.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">52.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">75.50%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">53.50%</span></td>
-</tr>
-<tr>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">GRANITE-20B-FUNCTIONCALLING</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">20B</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">91.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">71.00%</span></td>
-<td class="tg-amwm"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">83.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">60.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">89.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">44.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">92.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">72.00%</span></td>
-<td class="tg-amwm"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">88.80%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">61.80%</span></td>
-</tr>
-<tr>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">xlam-7b-fc-r</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">90.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">80.70%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">68.90%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">60.70%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">67.30%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">59.00%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">54.10%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">57.50%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">70.10%</span></td>
-<td class="tg-baqh"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">64.50%</span></td>
-</tr>
-<tr>
-<td class="tg-7geq"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Hammer-7b</span></td>
-<td class="tg-7geq"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
-<td class="tg-k5c1"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">93.80%</span></td>
-<td class="tg-k5c1"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">85.90%</span></td>
-<td class="tg-7geq"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">79.20%</span></td>
-<td class="tg-7geq"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">64.40%</span></td>
-<td class="tg-7geq"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">82.30%</span></td>
-<td class="tg-k5c1"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">59.90%</span></td>
-<td class="tg-7geq"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">92.50%</span></td>
-<td class="tg-k5c1"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">77.40%</span></td>
-<td class="tg-7geq"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">86.90%</span></td>
-<td class="tg-k5c1"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">71.90%</span></td>
-</tr>
-</tbody></table>

Finally, we evaluate our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset, where it also achieves strong performance.
-
-
-.
-
-
-font-weight:normal;overflow:hidden;padding:12px 5px;word-break:normal;}
-.tg .tg-9wq8{border-color:inherit;text-align:center;vertical-align:middle}
-.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
-.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
-.tg .tg-mfhl{background-color:#ffffc7;border-color:inherit;text-align:center;vertical-align:top}
-.tg .tg-py60{background-color:#ffffc7;border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
-</style>
-<table class="tg"><thead>
-<tr>
-<th class="tg-9wq8" rowspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Model</span></th>
-<th class="tg-9wq8" rowspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Size</span></th>
-<th class="tg-c3ow" colspan="2"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">SealTool(Single-Tool)</span></th>
-</tr>
-<tr>
-<th class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Func-Name</span></th>
-<th class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">F1</span> <span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Args</span></th>
-</tr></thead>
-<tbody>
-<tr>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Gorilla-openfunctions-v2</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">93.20%</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">91.10%</span></td>
-</tr>
-<tr>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">GRANITE-20B-FUNCTIONCALLING</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">20B</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">94.90%</span></td>
-<td class="tg-7btt"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">92.70%</span></td>
-</tr>
-<tr>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">xlam-7b-fc-r</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">79.00%</span></td>
-<td class="tg-c3ow"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">76.90%</span></td>
-</tr>
-<tr>
-<td class="tg-mfhl"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">Hammer-7b</span></td>
-<td class="tg-mfhl"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">7B</span></td>
-<td class="tg-py60"><span style="font-weight:700;font-style:normal;text-decoration:none;color:#000">97.40%</span></td>
-<td class="tg-mfhl"><span style="font-weight:400;font-style:normal;text-decoration:none;color:#000">91.70%</span></td>
-</tr>
-</tbody></table>

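The Seal-Tools comparison reports F1 separately for function names (`Func-Name`) and arguments (`Args`). The sketch below illustrates a multiset F1 over predicted and gold calls. It is a simplified scorer, not the official Seal-Tools one, which may match calls and argument values differently. The `{"name", "arguments"}` call format and the helper names `f1_score`, `name_items` and `arg_items` are hypothetical.

```python
import json
from collections import Counter


def f1_score(predicted: list, gold: list) -> float:
    """Multiset F1 between predicted and gold items (duplicates are counted)."""
    if not predicted and not gold:
        return 1.0  # nothing expected and nothing predicted
    overlap = sum((Counter(predicted) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def name_items(calls: list) -> list:
    """Function names of each call (hypothetical {'name', 'arguments'} format)."""
    return [call["name"] for call in calls]


def arg_items(calls: list) -> list:
    """(function, parameter, value) triples; values are JSON-encoded to be hashable."""
    return [
        (call["name"], key, json.dumps(value, sort_keys=True))
        for call in calls
        for key, value in call["arguments"].items()
    ]


# Hypothetical example: one correct call name, one wrong argument value, one missed call.
gold = [
    {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}},
    {"name": "get_time", "arguments": {"timezone": "Europe/Paris"}},
]
predicted = [
    {"name": "get_weather", "arguments": {"city": "Paris", "unit": "fahrenheit"}},
]

print(f"F1 Func-Name: {f1_score(name_items(predicted), name_items(gold)):.3f}")  # 0.667
print(f"F1 Args:      {f1_score(arg_items(predicted), arg_items(gold)):.3f}")    # 0.400
```

Scoring names and (function, parameter, value) triples separately is why a model can rank first on one column and second on the other.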
## Requirements
Hammer-7b is built on Qwen2, whose model code is included in recent releases of Hugging Face Transformers; we advise installing `transformers>=4.37.0`.

## Evaluation
First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL); the results are as follows:
+
+<div style="text-align: center;">
+<img src="figures/bfcl.PNG" alt="overview" width="1480" style="margin: auto;">
+</div>

*Note: The rankings are based on the performance metrics provided.*

In addition, we evaluate our model on several other benchmarks, all in a zero-shot setting. Hammer-7b shows superior performance compared with the other models. The table below replicates and extends the format of Table 6 (Function Calling Academic Benchmarks) in ["Granite-Function Calling Model"](https://arxiv.org/abs/2407.00121).

+<div style="text-align: center;">
+<img src="figures/other.PNG" alt="overview" width="880" style="margin: auto;">
+</div>

Finally, we evaluate our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset, where it also achieves strong performance.
+
+<div style="text-align: center;">
+<img src="figures/sealtool.PNG" alt="overview" width="480" style="margin: auto;">
+</div>
+

## Requirements
Hammer-7b is built on Qwen2, whose model code is included in recent releases of Hugging Face Transformers; we advise installing `transformers>=4.37.0`.
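The sketch below checks that the installed Transformers release meets the `transformers>=4.37.0` advice before you load the model. It uses only the Python standard library, and the helper names `parse_version`, `meets_minimum` and `check_transformers` are hypothetical. The simple parser ignores pre-release tags, so `4.37.0rc1` is treated as `4.37.0`. If the `packaging` library is available, `packaging.version.Version` handles such cases correctly.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 37, 0)  # from the `transformers>=4.37.0` recommendation


def parse_version(text: str) -> tuple:
    """Leading numeric release parts, e.g. '4.38.0.dev0' -> (4, 38, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))  # pad short versions such as '4.40'
    return tuple(parts)


def meets_minimum(installed: str, minimum: tuple = MIN_TRANSFORMERS) -> bool:
    """True if the installed version is at least `minimum` (pre-release tags ignored)."""
    return parse_version(installed) >= minimum


def check_transformers() -> bool:
    """Report whether the local transformers install satisfies the recommendation."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed; run: pip install 'transformers>=4.37.0'")
        return False
    if not meets_minimum(installed):
        print(f"transformers {installed} is older than 4.37.0; please upgrade.")
        return False
    print(f"transformers {installed} OK")
    return True


if __name__ == "__main__":
    check_transformers()
```

Running the file directly prints the result of the check. `check_transformers()` can also be called at the top of a model-loading script so that it fails early with an actionable message.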