Guanyu419 committed · Commit 72544ed · verified · 1 Parent(s): 092cfc3

Update README.md

Files changed (1)
  1. README.md +3 -10
README.md CHANGED
@@ -13,21 +13,19 @@ Hammer-7b is a cutting-edge Large Language Model (LLM) crafted to boost the crit
 Hammer-7b is a finetuned model built upon [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct). It's trained using the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by 7,500 irrelevance detection samples we generated. Employing innovative training techniques such as function masking, function shuffling, and prompt optimization, Hammer-7b achieves exceptional performance across numerous benchmarks, including the [Berkeley Function Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html), [API-Bank](https://arxiv.org/abs/2304.08244), [Tool-Alpaca](https://arxiv.org/abs/2306.05301), [Nexus Raven](https://github.com/nexusflowai/NexusRaven-V2) and [Seal-Tools](https://arxiv.org/abs/2405.08355).
 
 ## Evaluation
-First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL), and the performance is as follows:
+First, we evaluate Hammer-7b on the Berkeley Function-Calling Leaderboard (BFCL):
 
 <div style="text-align: center;">
 <img src="figures/bfcl.PNG" alt="overview" width="1480" style="margin: auto;">
 </div>
 
-*Note: The rankings are based on the performance metrics provided.*
-
-In addition, we also evaluated our model on other benchmarks. Below are the results across several benchmarks, derived from evaluations performed in a zero-shot manner. Our model, Hammer-7b, demonstrated superior performance compared to other models. The table below replicates and extends the format found in ["Granite-Function Calling Model"](https://arxiv.org/abs/2407.00121), particularly Table 6: Function Calling Academic Benchmarks.
+In addition, we evaluated Hammer-7b on other academic benchmarks to further show our model's generalization ability:
 
 <div style="text-align: center;">
 <img src="figures/other.PNG" alt="overview" width="880" style="margin: auto;">
 </div>
 
-Finally, we evaluate the performance of our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset, which also achieves better performance.
+Finally, we evaluate the performance of our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset:
 
 <div style="text-align: center;">
 <img src="figures/sealtool.PNG" alt="overview" width="480" style="margin: auto;">
@@ -44,7 +42,6 @@ import json
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-
 model_name = "MadeAgents/Hammer-7b"
 model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained(model_name)
@@ -70,9 +67,6 @@ The example format is as follows. Please make sure the parameter type is correct
 # Define the input query and available tools
 query = "Where can I find live giveaways for beta access and games? And what's the weather like in New York, US?"
 
-
-
-
 live_giveaways_by_type = {
     "name": "live_giveaways_by_type",
     "description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
@@ -117,7 +111,6 @@ get_stock_price={
     }
 }
 
-
 def convert_to_format_tool(tools):
     ''''''
     if isinstance(tools, dict):
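The diff cuts off inside the README's usage snippet, so the body of `convert_to_format_tool` and the model's reply handling are not shown here. As a rough, hedged sketch of the surrounding workflow only: the helper names `format_tools` and `parse_tool_calls` below are hypothetical stand-ins (not the repo's actual implementation), and the JSON list of `{"name", "arguments"}` objects is an assumed output convention for illustration.

```python
import json

# Tool schema in the same name/description/parameters shape the README uses.
live_giveaways_by_type = {
    "name": "live_giveaways_by_type",
    "description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
    "parameters": {
        "type": {
            "description": "The type of giveaways to retrieve (e.g., game, loot, beta).",
            "type": "str",
            "default": "game",
        }
    },
}

def format_tools(tools):
    # Hypothetical stand-in for convert_to_format_tool: normalize a single
    # tool dict to a list, then serialize it for the instruction prompt.
    if isinstance(tools, dict):
        tools = [tools]
    return json.dumps(tools, indent=2)

def parse_tool_calls(text):
    # Assumed reply convention: a JSON list of {"name", "arguments"} objects.
    calls = json.loads(text)
    return [(c["name"], c.get("arguments", {})) for c in calls]

tool_block = format_tools(live_giveaways_by_type)
reply = '[{"name": "live_giveaways_by_type", "arguments": {"type": "beta"}}]'
print(parse_tool_calls(reply))  # [('live_giveaways_by_type', {'type': 'beta'})]
```

`tool_block` would be interpolated into the prompt built with the tokenizer above; the real repository code should be consulted for the exact prompt template and reply format.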