Update README.md
README.md CHANGED
@@ -13,21 +13,19 @@ Hammer-7b is a cutting-edge Large Language Model (LLM) crafted to boost the crit
 
 Hammer-7b is a finetuned model built upon [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct). It is trained on the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by 7,500 irrelevance-detection examples we generated. Employing innovative training techniques like function masking, function shuffling, and prompt optimization, Hammer-7b has achieved exceptional performance across numerous benchmarks including the [Berkeley Function Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html), [API-Bank](https://arxiv.org/abs/2304.08244), [Tool-Alpaca](https://arxiv.org/abs/2306.05301), [Nexus Raven](https://github.com/nexusflowai/NexusRaven-V2) and [Seal-Tools](https://arxiv.org/abs/2405.08355).
 
 ## Evaluation
-First, we evaluate
+First, we evaluate Hammer-7b on the Berkeley Function-Calling Leaderboard (BFCL):
 
 <div style="text-align: center;">
 <img src="figures/bfcl.PNG" alt="overview" width="1480" style="margin: auto;">
 </div>
 
-
-
-In addition, we also evaluated our model on other benchmarks. Below are the results across several benchmarks, derived from evaluations performed in a zero-shot manner. Our model, Hammer-7b, demonstrated superior performance compared to other models. The table below replicates and extends the format found in ["Granite-Function Calling Model"](https://arxiv.org/abs/2407.00121), particularly Table 6: Function Calling Academic Benchmarks.
+In addition, we evaluated Hammer-7b on other academic benchmarks to further show our model's generalization ability:
 
 <div style="text-align: center;">
 <img src="figures/other.PNG" alt="overview" width="880" style="margin: auto;">
 </div>
 
-Finally, we evaluate the performance of our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset
+Finally, we evaluate the performance of our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset:
 
 <div style="text-align: center;">
 <img src="figures/sealtool.PNG" alt="overview" width="480" style="margin: auto;">
@@ -44,7 +42,6 @@ import json
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-
 model_name = "MadeAgents/Hammer-7b"
 model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained(model_name)
@@ -70,9 +67,6 @@ The example format is as follows. Please make sure the parameter type is correct
 # Define the input query and available tools
 query = "Where can I find live giveaways for beta access and games? And what's the weather like in New York, US?"
 
-
-
-
 live_giveaways_by_type = {
     "name": "live_giveaways_by_type",
     "description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
@@ -117,7 +111,6 @@ get_stock_price={
     }
 }
 
-
 def convert_to_format_tool(tools):
     ''''''
     if isinstance(tools, dict):
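The diff elides the body of `convert_to_format_tool` and the model's output format, so the sketch below is an assumption rather than the repository's actual implementation: it shows one plausible way to normalize an xLAM-style tool dict into an OpenAI-style function schema, and to parse a JSON tool-call completion back into Python objects (the `raw_output` string is a made-up completion for illustration).

```python
import json

def convert_to_format_tool(tools):
    """Hypothetical sketch: normalize one tool dict (or a list of them)
    into OpenAI-style function schemas. The real body is elided in the diff."""
    if isinstance(tools, dict):
        tools = [tools]
    return [
        {
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool.get("parameters", {"type": "object", "properties": {}}),
            },
        }
        for tool in tools
    ]

# Tool definition from the README's usage example (parameters abbreviated here).
live_giveaways_by_type = {
    "name": "live_giveaways_by_type",
    "description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
    "parameters": {
        "type": "object",
        "properties": {"type": {"type": "string", "description": "e.g. 'beta', 'game'"}},
    },
}

schemas = convert_to_format_tool(live_giveaways_by_type)

# Hypothetical raw completion; Hammer-7b's actual output formatting may differ.
raw_output = '[{"name": "live_giveaways_by_type", "arguments": {"type": "beta"}}]'
calls = json.loads(raw_output)
```

Each parsed call then exposes `name` and `arguments` keys ready to dispatch to the corresponding API.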
|