Holy-fox committed · verified
Commit 16cfbfa · 1 Parent(s): 0f19b66

Update README.md

Files changed (1): README.md (+103 −17)
README.md CHANGED
@@ -2,35 +2,121 @@
base_model:
- Qwen/Qwen2.5-32B-Instruct
- karakuri-ai/karakuri-lm-32b-thinking-2501-exp
library_name: transformers
tags:
- mergekit
- merge
-
---
- # SKYDRIVE-32B-v0.1
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the base.
-
- ### Models Merged
-
- The following models were included in the merge:
- * SKYDRIVE_element_jp_02
- * SKYDRIVE_element_jp_03
- * [karakuri-ai/karakuri-lm-32b-thinking-2501-exp](https://huggingface.co/karakuri-ai/karakuri-lm-32b-thinking-2501-exp)
- * SKYCAVE_element_Sky_jp
- * SKYDRIVE_element_jp_04
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
```yaml
merge_method: model_stock
base_model: Qwen/Qwen2.5-32B-Instruct
@@ -38,9 +124,9 @@ base_model: Qwen/Qwen2.5-32B-Instruct
models:
- model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
- model: SKYCAVE_element_Sky_jp
- model: SKYDRIVE_element_jp_02
- model: SKYDRIVE_element_jp_03
- - model: SKYDRIVE_element_jp_04
dtype: bfloat16
@@ -48,4 +134,4 @@ pad_to_multiple_of: 512
tokenizer_source: base
name: SKYDRIVE-32B-v0.1
- ```
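For reference, the Model Stock method cited above interpolates between the average of the fine-tuned checkpoints and the base weights, with a ratio derived from the angle between the fine-tuned deltas. A rough per-tensor sketch of that idea (the interpolation formula follows the paper; this is illustrative only, not mergekit's implementation, and assumes at least two fine-tuned models):

```python
import torch
import torch.nn.functional as F

def model_stock_layer(base: torch.Tensor, tuned: list[torch.Tensor]) -> torch.Tensor:
    """Sketch of the Model Stock rule for a single weight tensor.

    Interpolates between the average of the fine-tuned weights and the
    base weights; the ratio t comes from the mean pairwise cosine
    similarity of the fine-tuned deltas (w_i - w_base).
    """
    deltas = [(w - base).flatten() for w in tuned]
    k = len(deltas)  # requires k >= 2 for a pairwise angle estimate
    # Mean pairwise cosine similarity between the task vectors.
    cos = torch.stack([
        F.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(k) for j in range(i + 1, k)
    ]).mean()
    t = k * cos / (1 + (k - 1) * cos)
    w_avg = torch.stack(tuned).mean(dim=0)
    return t * w_avg + (1 - t) * base
```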
 
base_model:
- Qwen/Qwen2.5-32B-Instruct
- karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+ - NovaSky-AI/Sky-T1-32B-Flash
+ - FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
+ - cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
+ - TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
library_name: transformers
tags:
- mergekit
- merge
+ license: apache-2.0
+ language:
+ - en
+ - ja
+ pipeline_tag: text-generation
---
+ ## Overview
+ This model was built with mergekit and fine-tuning (FT), aiming to combine QwQ's long-form generation ability with R1's performance in a single model.
+
+ ## How to use
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "DataPilot/SKYDRIVE-32B-v0.1"
+
+ # Leave empty to use the tokenizer shipped with the model.
+ tokenizer_name = ""
+ if tokenizer_name == "":
+     tokenizer_name = model_name
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
+
+ # Prompt (Japanese): "Describe everyday life in a future realized by
+ # nurture intelligence, a self-evolving AI that analyzes metadata."
+ prompt = "メタデータを解析し、自己進化をするAIであるnurture intelligenceが実現した未来の日常生活の姿を教えてください。"
+ messages = [
+     # System prompt (Japanese): "You are an excellent Japanese assistant
+     # and a long-thinking model. Reason through the problem before answering."
+     {"role": "system", "content": "あなたは優秀な日本語アシスタントであり長考モデルです。問題解決をするための思考をした上で回答を行ってください。"},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=4096
+ )
+ # Strip the prompt tokens so only the newly generated text is decoded.
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
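Because this is a long-thinking model, `generate()` can run for a while before returning anything. Streaming tokens as they are produced is often more practical; a small variant of the call above using transformers' `TextStreamer` (not from the original card, and reusing `model`, `tokenizer`, and `model_inputs` from the snippet):

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, so the model's
# "thinking" is visible immediately instead of after generate() returns.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=4096, streamer=streamer)
```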
+ ## Acknowledgements
+ We thank everyone who created the models used here, and VOLTMIND for lending us the compute. We also thank hayashi for helping to resolve issues along the way.
+
+ ## Mergekit config
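The configuration below is a single multi-document YAML: each `---`-separated stage defines one intermediate merge (slerp, breadcrumbs_ties, or task_arithmetic) with a `name:`, and the final model_stock stage combines the named intermediates. For intuition: task_arithmetic computes `base + sum_i w_i * (model_i - base)`; breadcrumbs_ties additionally prunes outlier components of each delta before combining; and slerp walks along the arc between two weight vectors rather than the straight line between them. A minimal illustrative slerp on one tensor (a sketch of the math, not mergekit's code):

```python
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float, eps: float = 1e-7) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t = 0.0 returns w0 and t = 1.0 returns w1, so the config's t: 0.4
    leans toward the first (base) model.
    """
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Angle between the two weight vectors.
    cos = torch.clamp(torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps), -1.0, 1.0)
    theta = torch.acos(cos)
    if theta < eps:  # nearly colinear: fall back to plain lerp
        return (1 - t) * w0 + t * w1
    out = (torch.sin((1 - t) * theta) * v0 + torch.sin(t * theta) * v1) / torch.sin(theta)
    return out.reshape(w0.shape).to(w0.dtype)
```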
```yaml
+ merge_method: slerp
+ base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+ models:
+ - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+ - model: NovaSky-AI/Sky-T1-32B-Flash
+ parameters:
+   t: 0.4
+ dtype: bfloat16
+ name: SKYCAVE_element_Sky_jp
+ ---
+ merge_method: breadcrumbs_ties
+ base_model: Qwen/Qwen2.5-32B
+ tokenizer_source: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+ name: SKYDRIVE_element_jp_01
+ models:
+ - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+   parameters:
+     weight: 1.0
+ - model: FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
+   parameters:
+     weight: 0.75
+ dtype: bfloat16
+ ---
+ merge_method: task_arithmetic
+ base_model: Qwen/Qwen2.5-32B
+ tokenizer_source: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+ name: SKYDRIVE_element_jp_02
+ models:
+ - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+   parameters:
+     weight: 1.0
+ - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
+   parameters:
+     weight: 0.9
+ dtype: bfloat16
+ ---
+ merge_method: slerp
+ base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+ models:
+ - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
+ - model: TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
+ parameters:
+   t: 0.5
+ dtype: bfloat16
+ name: SKYDRIVE_element_jp_03
+ ---
merge_method: model_stock
base_model: Qwen/Qwen2.5-32B-Instruct
models:
- model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
- model: SKYCAVE_element_Sky_jp
+ - model: SKYDRIVE_element_jp_01
- model: SKYDRIVE_element_jp_02
- model: SKYDRIVE_element_jp_03
dtype: bfloat16
pad_to_multiple_of: 512
tokenizer_source: base
name: SKYDRIVE-32B-v0.1
+ ```
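To reproduce the merge, each YAML document above must be run in order so the named intermediates exist before the final model_stock stage. A hypothetical driver using mergekit's Python API is sketched below: `run_merge` and `MergeOptions` are real mergekit entry points, but the staging loop, the `skydrive.yaml` filename, and the `./stages` output layout are assumptions. In practice, later stages must be able to resolve intermediate names such as `SKYCAVE_element_Sky_jp`, for example by rewriting them to the stage output paths or by using mergekit's multi-stage tooling.

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the ---separated documents as an ordered list of stage configs.
with open("skydrive.yaml", "r", encoding="utf-8") as f:
    stages = list(yaml.safe_load_all(f))

for raw in stages:
    stage = dict(raw)
    # `name:` labels the intermediate merge; use it as the output directory.
    name = stage.pop("name", "merged")
    config = MergeConfiguration.model_validate(stage)
    run_merge(
        config,
        out_path=f"./stages/{name}",
        options=MergeOptions(cuda=True, copy_tokenizer=True),
    )
```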