Update README.md
README.md
@@ -16,14 +16,6 @@ tags:
 ---
 
-
-The only difference from Llama-3.2-1B-chatml-tool-v1 is that it uses AlternateTokenizer, which does not define tool-related tokens (<tools>, <tool_call>, <tool_response>).
-
-In the case of the existing tool-AlternateTokenizer, the <tool_call> tag was not properly generated before the function call, but in v2, it was observed that it performed well when trained with the general AlternateTokenizer.
-
-need to check whether this phenomenon is repeated in larger models (3B, 8B).
-
-
 ## Model Performance Comparison (BFCL)
 
 | task name | minpeter/Llama-3.2-1B-chatml-tool-v2 | meta-llama/Llama-3.2-1B-Instruct (measure) | meta-llama/Llama-3.2-1B-Instruct (Reported) |
@@ -33,5 +25,12 @@ need to check whether this phenomenon is repeated in larger models (3B, 8B).
 | simple | **0.72** | 0.215 | 0.2925 |
 | multiple | **0.695** | 0.17 | 0.335 |
 
-
 *Parallel calls are not taken into account. 0 points are expected. We plan to fix this in v3.
+
+### Note
+
+The only difference from Llama-3.2-1B-chatml-tool-v1 is that it uses AlternateTokenizer, which does not define the tool-related tokens (`<tools>`, `<tool_call>`, `<tool_response>`) as special tokens.
+
+With the existing tool-AlternateTokenizer, the `<tool_call>` tag was not reliably generated before the function call, whereas in v2 the model performed well when trained with the general AlternateTokenizer.
+
+We need to check whether this behavior also holds for larger models (3B, 8B).
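The note above describes assistant outputs that wrap each function call in plain-text `<tool_call>` tags. As a minimal sketch of how a caller might consume such outputs, the snippet below extracts the JSON payloads between those tags; `extract_tool_calls` and the sample output string are hypothetical illustrations, not code shipped with the model, and the exact payload schema is assumed to be `{"name": ..., "arguments": ...}`.

```python
import json
import re

# Matches chatml-style tool calls of the assumed form:
# <tool_call>{"name": ..., "arguments": ...}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str):
    """Return the parsed JSON payloads of all well-formed <tool_call> blocks."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # skip malformed payloads rather than failing the whole parse
    return calls

# Hypothetical model output for illustration.
output = (
    "Sure, checking the weather now.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Seoul"}}</tool_call>'
)
print(extract_tool_calls(output))
# → [{'name': 'get_weather', 'arguments': {'city': 'Seoul'}}]
```

Since the tags are ordinary text rather than special tokens under AlternateTokenizer, a parser like this only works when the model actually emits the tags — which is exactly the failure the note reports for v1 and the improvement observed in v2.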