license: cc-by-4.0
datasets:
- Salesforce/xlam-function-calling-60k
base_model: Qwen/Qwen2-7B-Instruct
Hammer-7b Function Calling Model
Introduction
Function calling is a pivotal capability for large language models, enabling them to interact with software systems and databases in a dynamic and efficient manner. However, during the evaluation of the function calling capabilities of open-source large language models, several critical issues have been identified that warrant attention and refinement.
- 1: One paramount issue identified is the marked propensity of fine-tuned language models to rely extensively on the lexical cues provided by function and parameter names during the invocation process. These names often act as semantic beacons, frequently derived from the concatenation or abbreviation of related concepts. Although such nomenclature serves useful in encapsulating semantic information, this reliance presents a stark divergence from the cognitive processes employed by human users during function calls. Humans, in contrast, typically prioritize the semantic richness of function and parameter descriptions, which encapsulate a more holistic understanding of the intended operations.
- 2: Furthermore, for tools resembling those encountered during training, models tend to rely on previously ingrained knowledge, potentially resulting in erroneous function calls—particularly when dealing with default parameter values.
Addressing these multifaceted issues necessitates a refined and sophisticated approach to model training and optimization. To this end, we have meticulously developed an advanced function calling model through the fine-tuning of the Qwen2-7B-instruct. The ensuing sections provide a comprehensive overview of the methods and processes implemented during the training phase to mitigate these issues effectively:
Data Extraction and Preparation: We extracted 7.5k sample data from Salesforce/xlam-function-calling-60k and removed the target tools from the candidate toolset to generate irrelevant data samples. This data was mixed with 60k XLAM data samples for training.
Methodology: We utilized our proposed function/parameter mask technique, a dynamic data augmentation method. This approach enhances the model's focus on descriptive information within tool definitions. The specific mask operations include:
- Function Mask: Replacing the function name with a randomly generated string to ensure the model pays more attention to the function description.
- Parameter Mask: Replacing the parameter name with a randomly generated string, encouraging the model to focus on the parameter description.
- Default Value Mask: Replacing the parameter's default value with a randomly generated string to prevent the model from overfitting to specific tools.
Additionally, we apply a shuffling technique to randomly reorder functions within the function set.
Prompt Optimization: During inference, since our model focuses more on function/parameter descriptions, we added default value information in parameter descriptions to obtain better performance.
Supported Function Calling Types
The model is capable of handling various function calling scenarios, including:
- Single Function Calling
- Multiple Function Calling
- Parallel Function Calling
- Multiple Parallel Function Calling
- Irrelevance Detection
Performance
- First, we evaluate our model on the Berkeley Function-Calling Leaderboard (BFCL), and the performance is as follows:
Model Performance Rankings
Rank | Overall Acc | Model | AST Summary | Exec Summary | Irrelevance | Relevance | Organization | License |
---|---|---|---|---|---|---|---|---|
1 | 85.79 | GPT-4-0125-Preview (Prompt) | 85.5 | 89.25 | 61.35 | 97.56 | OpenAI | Proprietary |
2 | 85.00 | GPT-4-1106-Preview (Prompt) | 86.31 | 87.38 | 64.98 | 90.24 | OpenAI | Proprietary |
3 | 84.74 | GPT-4-0613 (Prompt) | 84.66 | 87.57 | 75.57 | 82.93 | OpenAI | Proprietary |
4 | 83.92 | MadeAgents/hammer-7b | 78.7 | 89.71 | 72.87 | 92.68 | MadeAgents | cc-by-nc-4.0 |
5 | 83.89 | GPT-4-turbo-2024-04-09 (Prompt) | 85.41 | 88.12 | 61.82 | 82.93 | OpenAI | Proprietary |
6 | 83.35 | GPT-4o-mini-2024-07-18 (Prompt) | 80.51 | 87.95 | 79.2 | 80.49 | OpenAI | Proprietary |
7 | 83.13 | GPT-4o-2024-05-13 (Prompt) | 83.83 | 85.12 | 77.44 | 78.05 | OpenAI | Proprietary |
8 | 82.99 | MadeAgents/hammer-7b(nomask) | 77.38 | 89.12 | 73.63 | 90.24 | MadeAgents | cc-by-nc-4.0 |
9 | 82.55 | Functionary-Medium-v3.1 (FC) | 81.06 | 89.32 | 73.23 | 70.73 | MeetKai | MIT |
10 | 81.78 | GPT-4-1106-Preview (FC) | 77.95 | 87.61 | 72.7 | 82.93 | OpenAI | Proprietary |
11 | 81.59 | Meta-Llama-3-70B-Instruct (Prompt) | 80.15 | 88.04 | 50.47 | 92.68 | Meta | Meta Llama 3 Community |
12 | 80.88 | Claude-3-Opus-20240229 (Prompt) | 79.42 | 87.39 | 56.15 | 85.37 | Anthropic | Proprietary |
13 | 80.87 | GPT-4-0125-Preview (FC) | 77.02 | 85.3 | 74.03 | 85.37 | OpenAI | Proprietary |
14 | 80.23 | Nemotron-4-340b-instruct (Prompt) | 76.67 | 83.38 | 84.1 | 78.05 | NVIDIA | nvidia-open-model-license |
15 | 80.21 | Functionary-Small-v3.1 (FC) | 78.64 | 83.45 | 68.36 | 85.37 | MeetKai | MIT |
16 | 79.66 | mistral-large-2407 (FC Any) | 85.61 | 88.45 | 0.34 | 100 | Mistral AI | Proprietary |
17 | 79.55 | GPT-4o-2024-05-13 (FC) | 79.45 | 83.36 | 73.5 | 70.73 | OpenAI | Proprietary |
18 | 79.41 | xLAM-7b-fc-r (FC) | 72.77 | 85.68 | 79.76 | 80.49 | Salesforce | cc-by-nc-4.0 |
19 | 79.25 | GPT-4o-mini-2024-07-18 (FC) | 77.63 | 81.8 | 71.83 | 82.93 | OpenAI | Proprietary |
20 | 79.14 | Open-Mixtral-8x22b (Prompt) | 75.6 | 86.71 | 71.42 | 70.73 | Mistral AI | Proprietary |
Note: The rankings are based on the performance metrics provided.
Upcoming Developments
We are actively working on preparing smaller models derived from this architecture, which will be open-sourced soon.
Example Usage
This is a simple example of how to use our model.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "/home/notebook/data/group/ComplexTaskDecision/Hammer/ckpt/select_caller/xlam_7B/xlam_mask3_0.33_hammer_qwen7b_batch32/merge_step4220_bf16"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Please use our provided instruction prompt for best performance
TASK_INSTRUCTION = """You are a tool calling assistant. In order to complete the user's request, you need to select one or more appropriate tools from the following tools and fill in the correct values for the tool parameters. Your specific tasks are:
1. Make one or more function/tool calls to meet the request based on the question.
2. If none of the function can be used, point it out and refuse to answer.
3. If the given question lacks the parameters required by the function, also point it out.
"""
FORMAT_INSTRUCTION = """
The output MUST strictly adhere to the following JSON format, and NO other text MUST be included.
The example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please directly output an empty list '[]'
```
[
{"name": "func_name1", "arguments": {"argument1": "value1", "argument2": "value2"}},
... (more tool calls as required)
]
```
"""
# Define the input query and available tools
query = "Where can I find live giveaways for beta access and games? And what's the weather like in New York, US?"
live_giveaways_by_type = {
"name": "live_giveaways_by_type",
"description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
"parameters": {
"type": "object",
"properties": {
"type": {
"type": "string",
"description": "The type of giveaways to retrieve (e.g., game, loot, beta).",
"default": "game"
}
},
"required": ["type"]
}
}
get_current_weather={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
get_stock_price={
"name": "get_stock_price",
"description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the user asks about the current or most recent price of a specific stock. It will not provide any other information about the stock or company.",
"parameters": {
"type": "object",
"properties": {
"ticker": {
"type": "string",
"description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
}
},
"required": ["ticker"]
}
}
# Helper function to convert openai format tools to our more concise xLAM format
def convert_to_format_tool(tools):
''''''
if isinstance(tools, dict):
format_tools = {
"name": tools["name"],
"description": tools["description"],
"parameters": tools["parameters"].get("properties", {}),
}
required = tools["parameters"].get("required", [])
for param in required:
format_tools["parameters"][param]["required"] = True
for param in format_tools["parameters"].keys():
if "default" in format_tools["parameters"][param]:
default = format_tools["parameters"][param]["default"]
format_tools["parameters"][param]["description"]+=f"default is \'{default}\'"
return format_tools
elif isinstance(tools, list):
return [convert_to_format_tool(tool) for tool in tools]
else:
return tools
# Helper function to build the input prompt for our model
def build_prompt(task_instruction: str, format_instruction: str, tools: list, query: str):
prompt = f"[BEGIN OF TASK INSTRUCTION]\n{task_instruction}\n[END OF TASK INSTRUCTION]\n\n"
prompt += f"[BEGIN OF AVAILABLE TOOLS]\n{json.dumps(tools)}\n[END OF AVAILABLE TOOLS]\n\n"
prompt += f"[BEGIN OF FORMAT INSTRUCTION]\n{format_instruction}\n[END OF FORMAT INSTRUCTION]\n\n"
prompt += f"[BEGIN OF QUERY]\n{query}\n[END OF QUERY]\n\n"
return prompt
# Build the input and start the inference
openai_format_tools = [live_giveaways_by_type, get_current_weather,get_stock_price]
format_tools = convert_to_format_tool(openai_format_tools)
content = build_prompt(TASK_INSTRUCTION, FORMAT_INSTRUCTION, format_tools, query)
messages=[
{ 'role': 'user', 'content': content}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# tokenizer.eos_token_id is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
Feel free to reach out for further clarifications or contributions!