Safetensors
qwen2
qypeng committed on
Commit
1d64965
·
verified ·
1 Parent(s): 3e5abcf

Update README.md

Files changed (1)
  1. README.md +5 -13
README.md CHANGED
@@ -2,31 +2,24 @@
  license: cc-by-4.0
  datasets:
  - Salesforce/xlam-function-calling-60k
+ - MadeAgents/xlam-irrelevance-7.5k
  base_model: Qwen/Qwen2-7B-Instruct
  ---
  # Hammer-7b Function Calling Model

  ## <font color=red>\[Updates!!!\]</font> Hammer 2.0 Series Has Been Published

- We're excited to introduce Hammer 2.0, the latest in our Hammer Large Language Models series designed to enhance AI function calling. Differing from existing models focusing on training data refinement, Hammer optimizes performance primarily through advanced training techniques. In this version, we release a number of models with sizes ranging from 0.5B to 7B:
- [0.5B](https://huggingface.co/MadeAgents/Hammer2.0-0.5b),
- [1.5B](https://huggingface.co/MadeAgents/Hammer2.0-1.5b),
- [4B](https://huggingface.co/MadeAgents/Hammer2.0-3b), and [7B](https://huggingface.co/MadeAgents/Hammer2.0-0.5b).
-
+ We're excited to release lightweight Hammer 2.0 models ([0.5B](https://huggingface.co/MadeAgents/Hammer2.0-0.5b), [1.5B](https://huggingface.co/MadeAgents/Hammer2.0-1.5b), [3B](https://huggingface.co/MadeAgents/Hammer2.0-3b), and [7B](https://huggingface.co/MadeAgents/Hammer2.0-7b)) with strong function calling capability, which empower developers to build personalized, on-device agentic applications.


  ## Introduction
  **Hammer** is a series of cutting-edge Large Language Models (LLMs) crafted to boost the critical capability of AI agents: function calling. Differing from existing models that focus on training data refinement, Hammer optimizes performance primarily through advanced training techniques. Focusing on on-device applications, we release a number of models ranging from [1.5B](https://huggingface.co/MadeAgents/Hammer-1.5b) and [4B](https://huggingface.co/MadeAgents/Hammer-4b) to [7B](https://huggingface.co/MadeAgents/Hammer-7b) parameters.

  ## Model Details
- Hammer-7b is a finetuned model built upon [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct). It's trained using the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by [7,500 irrelevance detection data](https://huggingface.co/datasets/MadeAgents/XLAM-7.5k-Irrelevance) we generated. Employing innovative training techniques like function masking, function shuffling, and prompt optimization, Hammer-7b has achieved exceptional performances across numerous benchmarks including [Berkley Function Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html), [API-Bank](https://arxiv.org/abs/2304.08244), [Tool-Alpaca](https://arxiv.org/abs/2306.05301), [Nexus Raven](https://github.com/nexusflowai/NexusRaven-V2) and [Seal-Tools](https://arxiv.org/abs/2405.08355).
-
- ## Tuning Details
- A report with all the technical details leading to our models has been published at "[Hammer: Robust Function-Calling for On-Device Language Models via Function Masking](https://arxiv.org/abs/2410.04587)". All the code for data process, model tuning, and evaluation will also be open-sourced very soon.
-
+ Hammer 2.0 is finetuned from the [Qwen 2 series](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) using function masking techniques. It's trained on the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by [xlam-irrelevance-7.5k](https://huggingface.co/datasets/MadeAgents/xlam-irrelevance-7.5k), which we generated. Hammer has achieved exceptional performance across numerous function calling benchmarks. For more details, please refer to [Hammer: Robust Function-Calling for On-Device Language Models via Function Masking](https://arxiv.org/abs/2410.04587) and the [Hammer GitHub repository](https://github.com/MadeAgents/Hammer).

  ## Evaluation
- First, we evaluate Hammer series on the Berkeley Function-Calling Leaderboard (BFCL):
+ First, we evaluate the Hammer series on the Berkeley Function-Calling Leaderboard (BFCL-v2):

  <div style="text-align: center;">
  <img src="figures/bfcl.PNG" alt="overview" width="1480" style="margin: auto;">
@@ -34,14 +27,13 @@ First, we evaluate Hammer series on the Berkeley Function-Calling Leaderboard (BFCL):

  The above table indicates that within the BFCL framework, our Hammer series consistently achieves state-of-the-art (SOTA) performance at comparable scales, with Hammer-7B in particular ranking second only to the proprietary GPT-4 in overall performance.

-
  In addition, we evaluated our Hammer series (1.5b, 4b, 7b) on other academic benchmarks to further demonstrate the models' generalization ability:

  <div style="text-align: center;">
  <img src="figures/others.PNG" alt="overview" width="1000" style="margin: auto;">
  </div>

- Upon observing Hammer's performance across various benchmarks unrelated to the APIGen Function Calling Datasets, we find that Hammer demonstrates remarkably stable performance, which indicates the robustness of Hammers. In contrast, the baseline methods exhibit varying degrees of effectiveness across these other benchmarks.
+ Hammer models showcase highly stable performance, suggesting the robustness of the Hammer series. In contrast, the baseline approaches display varying levels of effectiveness.

  ## Requirements
  The code for Hammer-7b is included in the latest Hugging Face Transformers, and we advise you to install `transformers>=4.37.0`.
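For readers new to function calling, here is a minimal sketch of the JSON tool-schema and tool-call shape used by function-calling datasets such as xlam-function-calling-60k. The tool name, schema fields, and parser are illustrative assumptions, not taken from the model card or Hammer's official API:

```python
import json

# Hypothetical tool schema in the JSON style common to function-calling
# datasets; the "get_weather" tool and its fields are our own example.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
        "city": {"type": "str", "description": "Name of the city.", "required": True},
        "unit": {"type": "str", "description": "'celsius' or 'fahrenheit'.", "required": False},
    },
}

def parse_tool_call(model_output: str):
    """Parse a single JSON tool call emitted by a function-calling model.

    Returns the tool name and its argument dict (empty if none given).
    """
    call = json.loads(model_output)
    return call["name"], call.get("arguments", {})

# A model trained on this format would emit something like:
name, args = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(name, args)  # get_weather {'city': 'Paris'}
```

In practice the tool schemas are serialized into the prompt and the model's JSON output is parsed and dispatched to the matching function; irrelevance-detection data (such as xlam-irrelevance-7.5k) trains the model to emit no call when no tool applies.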