JayosChaos commited on
Commit
00e4fdb
·
verified ·
1 Parent(s): 60a356e

Upload bonus_unit1.py

Files changed (1)
  1. bonus_unit1.py +412 -0
bonus_unit1.py ADDED
@@ -0,0 +1,412 @@
1
+ # -*- coding: utf-8 -*-
2
+ """bonus-unit1.ipynb
3
+
4
+ Automatically generated by Colab.
5
+
6
+ Original file is located at
7
+ https://colab.research.google.com/#fileId=https%3A//huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb
8
+
9
+ # Bonus Unit 1: Fine-Tuning a model for Function-Calling
10
+
11
+ In this tutorial, **we're going to Fine-Tune an LLM for Function Calling.**
12
+
13
+ This notebook is part of the <a href="https://www.hf.co/learn/agents-course/unit1/introduction">Hugging Face Agents Course</a>, a free Course from beginner to expert, where you learn to build Agents.
14
+
15
+ <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png" alt="Agent Course"/>
16
+
17
+ ## Prerequisites 🏗️
18
+
19
+ Before diving into the notebook, you need to:
20
+
21
+ 🔲 📚 **Study [What is Function-Calling](https://www.hf.co/learn/agents-course/bonus-unit1/what-is-function-calling) Section**
22
+
23
+ 🔲 📚 **Study [Fine-Tune your Model and what are LoRAs](https://www.hf.co/learn/agents-course/bonus-unit1/fine-tuning) Section**
24
+
25
+ # Step 0: Ask to Access Gemma on Hugging Face
26
+
27
+ <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/gemma.png" alt="Gemma"/>
28
+
29
+
30
+ To access Gemma on Hugging Face:
31
+
32
+ 1. **Make sure you're signed in** to your Hugging Face Account
33
+
34
+ 2. Go to https://huggingface.co/google/gemma-2-2b-it
35
+
36
+ 3. Click on **Acknowledge license** and fill in the form.
37
+
38
+ Alternatively, you can use another model and modify the code accordingly (a good exercise to make sure you know how to fine-tune for Function-Calling).
39
+
40
+ For instance, you can use:
41
+
42
+ - [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)
43
+
44
+ - [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
45
+
46
+ ## Step 1: Set the GPU 💪
47
+
48
+ If you're on Colab:
49
+
50
+ - To **accelerate the fine-tuning training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`
51
+
52
+ <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg" alt="GPU Step 1"/>
53
+
54
+ - `Hardware Accelerator > GPU`
55
+
56
+ <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg" alt="GPU Step 2"/>
57
+
58
+
59
+ ### Important
60
+
61
+ For this Unit, **with the free-tier of Colab** it will take around **6h to train**.
62
+
63
+ You have three solutions if you want to make it faster:
64
+
65
+ 1. Train on your computer if you have GPUs. It might take time, but there is less risk of a timeout.
66
+
67
+ 2. Use Google Colab Pro, which gives you access to an A100 GPU (15-20 min of training).
68
+
69
+ 3. Just follow the code to learn how to do it without training.
70
+
71
+ ## Step 2: Install dependencies 📚
72
+
73
+ We need multiple libraries:
74
+
75
+ - `bitsandbytes` for quantization
76
+ - `peft` for LoRA adapters
77
+ - `transformers` for loading the model
78
+ - `datasets` for loading and using the fine-tuning dataset
79
+ - `trl` for the trainer class
80
+ """
81
+
82
+ !pip install -q -U bitsandbytes
83
+ !pip install -q -U peft
84
+ !pip install -q -U trl
85
+ !pip install -q -U tensorboardX
86
+ !pip install -q wandb
87
+
88
+ """## Step 3: Create your Hugging Face Token to push your model to the Hub
89
+
90
+ To be able to share your model with the community there are some more steps to follow:
91
+
92
+ 1️⃣ (If it's not already done) create a Hugging Face account ➡ https://huggingface.co/join
93
+
94
+ 2️⃣ Sign in, then store your authentication token from the Hugging Face website.
95
+
96
+ - Create a new token (https://huggingface.co/settings/tokens) **with write role**
97
+
98
+ <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/create_write_token.png" alt="Create HF Token" width="50%">
99
+
100
+ 3️⃣ Store your token as an environment variable under the name "HF_TOKEN"
101
+ - **Be very careful not to share it with others**!
102
+
103
+ ## Step 4: Import the libraries
104
+
105
+ Don't forget to set your HF token.
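
As a hedged alternative to pasting the token into the notebook, the sketch below reads `HF_TOKEN` from the environment and rejects the placeholder value. The helper name `get_hf_token` is hypothetical and is not part of any Hugging Face library.

```python
import os

PLACEHOLDER = "hf_xxxxxxx"  # the dummy value shown in this notebook


def get_hf_token(env=None):
    # Hypothetical helper: read HF_TOKEN from a mapping (defaults to os.environ).
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            "Set the HF_TOKEN environment variable to a token with write access."
        )
    return token
```

Calling `get_hf_token()` before training fails early if the token is missing, instead of failing at push time.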
106
+ """
107
+
108
+ from enum import Enum
109
+ from functools import partial
110
+ import pandas as pd
111
+ import torch
112
+ import json
113
+
114
+ from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
115
+ from datasets import load_dataset
116
+ from trl import SFTConfig, SFTTrainer
117
+ from peft import LoraConfig, TaskType
118
+
119
+ seed = 42
120
+ set_seed(seed)
121
+
122
+ import os
123
+
124
+ # Put your HF Token here
125
+ os.environ['HF_TOKEN']="hf_xxxxxxx" # the token should have write access
126
+
127
+ """## Step 5: Processing the dataset into inputs
128
+
129
+ In order to train the model, we need to **format the inputs into what we want the model to learn**.
130
+
131
+ For this tutorial, I enhanced a popular function-calling dataset, "NousResearch/hermes-function-calling-v1", by adding a new **thinking** step computed with **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.
132
+
133
+ But in order for the model to learn, we need **to format the conversation correctly**. If you followed Unit 1, you know that going from a list of messages to a prompt is handled by the **chat_template**. However, the default chat_template of gemma-2-2B does not contain tool calls, so we need to modify it!
134
+
135
+ This is the role of our **preprocess** function: to go from a list of messages to a prompt that the model can understand.
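
To make the formatting concrete, here is a minimal pure-Python sketch of what the chat template produces. It is an illustration only: the real rendering is done by the Jinja template set on the tokenizer, `"<bos>"` is assumed to be the value of `tokenizer.bos_token`, and `merge_system` and `render_chat` are hypothetical helper names.

```python
BOS = "<bos>"  # assumed value of tokenizer.bos_token


def merge_system(messages, extra=""):
    # Fold a leading system message into the first user turn.
    messages = [dict(m) for m in messages]  # work on a copy
    if len(messages) > 1 and messages[0]["role"] == "system":
        system = messages.pop(0)
        messages[0]["content"] = system["content"] + extra + messages[0]["content"]
    return messages


def render_chat(messages, add_generation_prompt=False):
    # Mirror of the template: one <start_of_turn>...<end_of_turn><eos> block per message.
    out = BOS
    for m in messages:
        out += (
            "<start_of_turn>" + m["role"] + "\n"
            + m["content"].strip()
            + "<end_of_turn><eos>\n"
        )
    if add_generation_prompt:
        out += "<start_of_turn>model\n"
    return out


conversation = [
    {"role": "system", "content": "You can call tools. "},
    {"role": "human", "content": "Convert 500 USD to EUR."},
]
print(render_chat(merge_system(conversation), add_generation_prompt=True))
```

The system text ends up inside the first `human` turn, which is why the template can refuse the `system` role outright.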
136
+
137
+ """
138
+
139
+ model_name = "google/gemma-2-2b-it"
140
+ dataset_name = "Jofthomas/hermes-function-calling-thinking-V1"
141
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
142
+
143
+ tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"
144
+
145
+
146
+ def preprocess(sample):
147
+     messages = sample["messages"]
148
+     first_message = messages[0]
149
+
150
+     # Instead of adding a system message, we merge the content into the first user message
151
+     if first_message["role"] == "system":
152
+         system_message_content = first_message["content"]
153
+         # Merge system content with the first user message
154
+         messages[1]["content"] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]["content"]
155
+         # Remove the system message from the conversation
156
+         messages.pop(0)
157
+
158
+     return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}
159
+
160
+
161
+
162
+ dataset = load_dataset(dataset_name)
163
+ dataset = dataset.rename_column("conversations", "messages")
164
+
165
+ """## Step 6: A Dedicated Dataset for This Unit
166
+
167
+ For this Bonus Unit, we created a custom dataset based on [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1), which is considered a **reference** when it comes to function-calling datasets.
168
+
169
+ While the original dataset is excellent, it does **not** include a **“thinking”** step.
170
+
171
+ In Function-Calling, such a step is optional, but recent work—like the **deepseek** model or the paper ["Test-Time Compute"](https://huggingface.co/papers/2408.03314)—suggests that giving an LLM time to “think” before it answers (or in this case, **before** taking an action) can **significantly improve** model performance.
172
+
173
+ I then took a subset of this dataset and gave it to [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) to compute some thinking tokens `<think>` before any function call, which resulted in the following dataset:
174
+ ![Input Dataset](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/dataset_function_call.png)
175
+
176
+ """
177
+
178
+ dataset = dataset.map(preprocess, remove_columns="messages")
179
+ dataset = dataset["train"].train_test_split(0.1)
180
+ print(dataset)
181
+
182
+ """## Step 7: Checking the inputs
183
+
184
+ Let's manually look at what an input looks like!
185
+
186
+ In this example, we have:
187
+
188
+ 1. A *User message* containing the **necessary information with the list of available tools** between `<tools></tools>`, then the user query, here: `"Can you get me the latest news headlines for the United States?"`
189
+
190
+ 2. An *Assistant message*, here called "model" to fit the criteria of Gemma models, containing two new phases: a **"thinking"** phase contained in `<think></think>` and an **"Act"** phase contained in `<tool_call></tool_call>`.
191
+
192
+ 3. If the model output contains a `<tool_call>`, we will append the result of this action in a new **"Tool"** message containing a `<tool_response></tool_response>` with the answer from the tool.
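
The three parts above can be sketched as a small parser. This is an illustration under stated assumptions: the function name `get_news_headlines` and its arguments are made up, the `"tool"` role name is a guess at how the tool turn is labelled, and the fallback to `ast.literal_eval` assumes some tool calls are written as Python-style dictionaries with single quotes.

```python
import ast
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_model_turn(text):
    # Return (thoughts, tool_calls) extracted from one model turn.
    thoughts = [t.strip() for t in THINK_RE.findall(text)]
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        raw = raw.strip()
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            # Assumption: the payload is a Python literal with single quotes.
            calls.append(ast.literal_eval(raw))
    return thoughts, calls


def tool_message(result):
    # Wrap a tool result in the <tool_response> tags described above.
    body = "<tool_response>\n" + json.dumps(result) + "\n</tool_response>"
    return {"role": "tool", "content": body}


turn = (
    "<think>I should call get_news_headlines for the US.</think>\n"
    "<tool_call>\n"
    "{'name': 'get_news_headlines', 'arguments': {'country': 'United States'}}\n"
    "</tool_call>"
)
thoughts, calls = parse_model_turn(turn)
print(thoughts)
print(calls)
```

An agent loop would run the parsed call, then append `tool_message(result)` to the conversation before asking the model to continue.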
193
+ """
194
+
195
+ # Let's look at how we formatted the dataset
196
+ print(dataset["train"][8]["text"])
197
+
198
+ # Sanity check
199
+ print(tokenizer.pad_token)
200
+ print(tokenizer.eos_token)
201
+
202
+ """## Step 8: Let's Modify the Tokenizer
203
+
204
+ Indeed, as we saw in Unit 1, the tokenizer splits text into sub-words by default. This is **not** what we want for our new special tokens!
205
+
206
+ While we segmented our example using `<think>`, `<tool_call>`, and `<tool_response>`, the tokenizer does **not** yet treat them as whole tokens—it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, we must **add these tokens** to our tokenizer.
207
+
208
+ Additionally, since we changed the `chat_template` in our **preprocess** function to format conversations as messages within a prompt, we also need to modify the `chat_template` in the tokenizer to reflect these changes.
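
The difference between a tag that is broken into pieces and a tag kept whole can be shown with a toy example. This is not the Gemma tokenizer: `naive_pieces` is a hypothetical stand-in for sub-word splitting, and `split_keeping_special` only illustrates the idea of treating registered tokens as atomic units.

```python
import re

SPECIAL_TOKENS = ["<think>", "</think>", "<tool_call>", "</tool_call>"]


def naive_pieces(text):
    # Toy stand-in for sub-word splitting: words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def split_keeping_special(text, special_tokens):
    # Split the text so that each special token stays one unbroken piece.
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    return [piece for piece in re.split(pattern, text) if piece]


sample = "<think>plan</think>"
print(naive_pieces(sample))
print(split_keeping_special(sample, SPECIAL_TOKENS))
```

In the first output the tag is scattered over several pieces; in the second it is a single unit, which is what adding the tokens to the tokenizer aims for.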
209
+ """
210
+
211
+ class ChatmlSpecialTokens(str, Enum):
212
+     tools = "<tools>"
213
+     eotools = "</tools>"
214
+     think = "<think>"
215
+     eothink = "</think>"
216
+     tool_call = "<tool_call>"
217
+     eotool_call = "</tool_call>"
218
+     tool_response = "<tool_response>"  # must match the tags used in the dataset
219
+     eotool_response = "</tool_response>"
220
+     pad_token = "<pad>"
221
+     eos_token = "<eos>"
222
+     @classmethod
223
+     def list(cls):
224
+         return [c.value for c in cls]
225
+
226
+ tokenizer = AutoTokenizer.from_pretrained(
227
+     model_name,
228
+     pad_token=ChatmlSpecialTokens.pad_token.value,
229
+     additional_special_tokens=ChatmlSpecialTokens.list()
230
+ )
231
+ tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"
232
+
233
+ model = AutoModelForCausalLM.from_pretrained(model_name,
234
+ attn_implementation='eager',
235
+ device_map="auto")
236
+ model.resize_token_embeddings(len(tokenizer))
237
+ model.to(torch.bfloat16)
238
+
239
+ """## Step 9: Let's configure the LoRA
240
+
241
+ This is where we define the parameters of our adapter. These are the most important parameters in LoRA, as they define the size and importance of the adapters we are training.
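
To give a feel for what "size and importance" mean, the sketch below does the arithmetic for one weight matrix. It uses the standard LoRA formulation, where the update is a product of two low-rank matrices scaled by `lora_alpha / r`. The layer size of 2304 is only an illustrative assumption, and the helper names are hypothetical.

```python
def lora_param_count(d_in, d_out, r):
    # LoRA adds two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, r):
    # The low-rank update B @ A is multiplied by lora_alpha / r.
    return lora_alpha / r


d_in = d_out = 2304  # illustrative layer size (assumption)
full = d_in * d_out
added = lora_param_count(d_in, d_out, r=16)

print("full matrix parameters:", full)
print("LoRA parameters:", added)
print("share of the full matrix (%):", round(100 * added / full, 2))
print("scaling factor:", lora_scaling(64, 16))
```

A larger `r` gives the adapter more capacity, while a larger `lora_alpha` relative to `r` makes the adapter's contribution weigh more against the frozen weights.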
242
+ """
243
+
244
+ from peft import LoraConfig
245
+
246
+ # TODO: Configure LoRA parameters
247
+ # r: rank dimension for LoRA update matrices (smaller = more compression)
248
+ rank_dimension = 16
249
+ # lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
250
+ lora_alpha = 64
251
+ # lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
252
+ lora_dropout = 0.05
253
+
254
+ peft_config = LoraConfig(r=rank_dimension,
255
+ lora_alpha=lora_alpha,
256
+ lora_dropout=lora_dropout,
257
+ target_modules=["gate_proj","q_proj","lm_head","o_proj","k_proj","embed_tokens","down_proj","up_proj","v_proj"], # which layers of the transformer do we target?
258
+ task_type=TaskType.CAUSAL_LM)
259
+
260
+ """## Step 10: Let's define the Trainer and the Fine-Tuning hyperparameters
261
+
262
+ In this step, we define the Trainer, the class that we use to fine-tune our model and the hyperparameters.
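
Two of the hyperparameters used in this step interact in ways that are easy to miss, so here is a rough sketch of the arithmetic. It assumes a single GPU and a made-up total of 1000 optimizer steps. The schedule is a plain linear warmup followed by cosine decay to zero, written from scratch for illustration rather than taken from the trainer.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    # Number of sequences contributing to each optimizer step.
    return per_device * grad_accum * num_devices


def lr_at(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    # Linear warmup to base_lr, then cosine decay to zero.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical value, depends on dataset size and packing
print("effective batch size:", effective_batch_size(1, 4))
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```

With a per-device batch of 1, gradient accumulation is what keeps the effective batch at 4 without needing more GPU memory.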
263
+ """
264
+
265
+ username = "Jofthomas"  # REPLACE with your Hugging Face username
266
+ output_dir = "gemma-2-2B-it-thinking-function_calling-V0" # The directory where the trained model checkpoints, logs, and other artifacts will be saved. It will also be the default name of the model when pushed to the hub if not redefined later.
267
+ per_device_train_batch_size = 1
268
+ per_device_eval_batch_size = 1
269
+ gradient_accumulation_steps = 4
270
+ logging_steps = 5
271
+ learning_rate = 1e-4 # The initial learning rate for the optimizer.
272
+
273
+ max_grad_norm = 1.0
274
+ num_train_epochs=1
275
+ warmup_ratio = 0.1
276
+ lr_scheduler_type = "cosine"
277
+ max_seq_length = 1500
278
+
279
+ training_arguments = SFTConfig(
280
+ output_dir=output_dir,
281
+ per_device_train_batch_size=per_device_train_batch_size,
282
+ per_device_eval_batch_size=per_device_eval_batch_size,
283
+ gradient_accumulation_steps=gradient_accumulation_steps,
284
+ save_strategy="no",
285
+ eval_strategy="epoch",
286
+ logging_steps=logging_steps,
287
+ learning_rate=learning_rate,
288
+ max_grad_norm=max_grad_norm,
289
+ weight_decay=0.1,
290
+ warmup_ratio=warmup_ratio,
291
+ lr_scheduler_type=lr_scheduler_type,
292
+ report_to="tensorboard",
293
+ bf16=True,
294
+ hub_private_repo=False,
295
+ push_to_hub=False,
296
+ num_train_epochs=num_train_epochs,
297
+ gradient_checkpointing=True,
298
+ gradient_checkpointing_kwargs={"use_reentrant": False},
299
+ packing=True,
300
+ max_seq_length=max_seq_length,
301
+ )
302
+
303
+ """As Trainer, we use the `SFTTrainer` which is a Supervised Fine-Tuning Trainer."""
304
+
305
+ trainer = SFTTrainer(
306
+ model=model,
307
+ args=training_arguments,
308
+ train_dataset=dataset["train"],
309
+ eval_dataset=dataset["test"],
310
+ processing_class=tokenizer,
311
+ peft_config=peft_config,
312
+ )
313
+
314
+ """Here, we launch the training 🔥. Perfect time for you to pause and grab a coffee ☕."""
315
+
316
+ trainer.train()
317
+ trainer.save_model()
318
+
319
+ """## Step 11: Let's push the Model and the Tokenizer to the Hub
320
+
321
+ Let's push our model and our tokenizer to the Hub! The model will be pushed under your username + the output_dir that we specified earlier.
322
+ """
323
+
324
+ trainer.push_to_hub(f"{username}/{output_dir}")
325
+
326
+ """Since we also modified the **chat_template**, which is contained in the tokenizer, let's also push the tokenizer with the model."""
327
+
328
+ tokenizer.eos_token = "<eos>"
329
+ # Push the tokenizer to the Hub (replace with your username and your previously specified output_dir)
330
+ tokenizer.push_to_hub(f"{username}/{output_dir}", token=True)
331
+
332
+ """## Step 12: Let's now test our model !
333
+
334
+ To do so, we will:
335
+
336
+ 1. Load the adapter from the Hub.
337
+ 2. Load the base model, **"google/gemma-2-2b-it"**, from the Hub.
338
+ 3. Resize the model's embeddings to account for the new tokens we introduced.
339
+ """
340
+
341
+ from peft import PeftModel, PeftConfig
342
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
343
+ from datasets import load_dataset
344
+ import torch
345
+
346
+ bnb_config = BitsAndBytesConfig(
347
+ load_in_4bit=True,
348
+ bnb_4bit_quant_type="nf4",
349
+ bnb_4bit_compute_dtype=torch.bfloat16,
350
+ bnb_4bit_use_double_quant=True,
351
+ )
352
+
353
+ peft_model_id = f"{username}/{output_dir}" # replace with your newly trained adapter
354
+ device = "auto"
355
+ config = PeftConfig.from_pretrained(peft_model_id)
356
+ model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
357
+ device_map="auto",
358
+ )
359
+ tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
360
+ model.resize_token_embeddings(len(tokenizer))
361
+ model = PeftModel.from_pretrained(model, peft_model_id)
362
+ model.to(torch.bfloat16)
363
+ model.eval()
364
+
365
+ print(dataset["test"][8]["text"])
366
+
367
+ """### Testing the model 🚀
368
+
369
+ In that case, we will take the start of one of the samples from the test set and hope that it will generate the expected output.
370
+
371
+ Since we want to test the function-calling capacities of our newly fine-tuned model, the input will be a user message with the available tools, a
372
+
373
+
374
+ ### Disclaimer ⚠️
375
+
376
+ The dataset we’re using **does not contain sufficient training data** and is purely for **educational purposes**. As a result, **your trained model’s outputs may differ** from the examples shown in this course. **Don’t be discouraged** if your results vary—our primary goal here is to illustrate the core concepts rather than produce a fully optimized or production-ready model.
377
+
378
+ """
379
+
380
+ # This prompt is a sub-sample of one of the test set examples. Generation starts right after the model turn begins.
381
+ prompt="""<bos><start_of_turn>human
382
+ You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
383
+ <tool_call>
384
+ {tool_call}
385
+ </tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>
386
+
387
+ Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>
388
+ <start_of_turn>model
389
+ <think>"""
390
+
391
+ inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
392
+ inputs = {k: v.to("cuda") for k,v in inputs.items()}
393
+ outputs = model.generate(**inputs,
394
+ max_new_tokens=300,# Adapt as necessary
395
+ do_sample=True,
396
+ top_p=0.95,
397
+ temperature=0.01,
398
+ repetition_penalty=1.0,
399
+ eos_token_id=tokenizer.eos_token_id)
400
+ print(tokenizer.decode(outputs[0]))
401
+
402
+ """## Congratulations
403
+ Congratulations on finishing this first Bonus Unit 🥳
404
+
405
+ You've just **mastered what Function-Calling is and how to fine-tune your model to do Function-Calling**!
406
+
407
+ If it's the first time you do this, it's normal that you're feeling puzzled. Take time to check the documentation and understand each part of the code and why we did it this way.
408
+
409
+ Also, don't hesitate to try to **fine-tune different models**. The **best way to learn is by trying.**
410
+
411
+ ### Keep Learning, Stay Awesome 🤗
412
+ """