Update README.md
README.md CHANGED
@@ -13,21 +13,19 @@ Hammer-7b is a cutting-edge Large Language Model (LLM) crafted to boost the crit
 
 Hammer-7b is a finetuned model built upon [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct). It is trained on the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by 7,500 irrelevance-detection examples we generated. Employing innovative training techniques like function masking, function shuffling, and prompt optimization, Hammer-7b has achieved exceptional performance across numerous benchmarks including the [Berkeley Function Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html), [API-Bank](https://arxiv.org/abs/2304.08244), [Tool-Alpaca](https://arxiv.org/abs/2306.05301), [Nexus Raven](https://github.com/nexusflowai/NexusRaven-V2) and [Seal-Tools](https://arxiv.org/abs/2405.08355).
 
 ## Evaluation
-First, we evaluate
+First, we evaluate Hammer-7b on the Berkeley Function-Calling Leaderboard (BFCL):
 
 <div style="text-align: center;">
 <img src="figures/bfcl.PNG" alt="overview" width="1480" style="margin: auto;">
 </div>
 
-
-
-In addition, we also evaluated our model on other benchmarks. Below are the results across several benchmarks, derived from evaluations performed in a zero-shot manner. Our model, Hammer-7b, demonstrated superior performance compared to other models. The table below replicates and extends the format found in ["Granite-Function Calling Model"](https://arxiv.org/abs/2407.00121), particularly Table 6: Function Calling Academic Benchmarks.
+In addition, we evaluated Hammer-7b on other academic benchmarks to further show our model's generalization ability:
 
 <div style="text-align: center;">
 <img src="figures/other.PNG" alt="overview" width="880" style="margin: auto;">
 </div>
 
-Finally, we evaluate the performance of our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset
+Finally, we evaluate the performance of our model on the [Seal-Tools](https://arxiv.org/abs/2405.08355) dataset:
 
 <div style="text-align: center;">
 <img src="figures/sealtool.PNG" alt="overview" width="480" style="margin: auto;">
@@ -44,7 +42,6 @@ import json
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-
 model_name = "MadeAgents/Hammer-7b"
 model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained(model_name)
@@ -70,9 +67,6 @@ The example format is as follows. Please make sure the parameter type is correct
 # Define the input query and available tools
 query = "Where can I find live giveaways for beta access and games? And what's the weather like in New York, US?"
 
-
-
-
 live_giveaways_by_type = {
     "name": "live_giveaways_by_type",
     "description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
@@ -117,7 +111,6 @@ get_stock_price={
     }
 }
 
-
 def convert_to_format_tool(tools):
     ''''''
     if isinstance(tools, dict):
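The diff elides the body of `convert_to_format_tool` and the model's output format, so the sketch below is an assumption rather than the repository's actual implementation: it shows one plausible way to normalize an xLAM-style tool dict into an OpenAI-style function schema, and to parse a JSON tool-call completion back into Python objects (the `raw_output` string is a made-up completion for illustration).

```python
import json

def convert_to_format_tool(tools):
    """Hypothetical sketch: normalize one tool dict (or a list of them)
    into OpenAI-style function schemas. The real body is elided in the diff."""
    if isinstance(tools, dict):
        tools = [tools]
    return [
        {
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool.get("parameters", {"type": "object", "properties": {}}),
            },
        }
        for tool in tools
    ]

# Tool definition from the README's usage example (parameters abbreviated here).
live_giveaways_by_type = {
    "name": "live_giveaways_by_type",
    "description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
    "parameters": {
        "type": "object",
        "properties": {"type": {"type": "string", "description": "e.g. 'beta', 'game'"}},
    },
}

schemas = convert_to_format_tool(live_giveaways_by_type)

# Hypothetical raw completion; Hammer-7b's actual output formatting may differ.
raw_output = '[{"name": "live_giveaways_by_type", "arguments": {"type": "beta"}}]'
calls = json.loads(raw_output)
```

Each parsed call then exposes `name` and `arguments` keys ready to dispatch to the corresponding API.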
|