zake7749 committed
Commit 1eedbfd · verified · 1 Parent(s): 3d89892

Update README.md

Files changed (1):
  1. README.md +17 -36

README.md CHANGED
@@ -118,41 +118,22 @@ license: gemma
# Kyara: Knowledge Yielding Adaptive Retrieval Augmentation for LLM Fine-tuning

<p align="left">
- 🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp | 🚀<a href="https://github.com/zake7749/kyara">Github</a>&nbsp | &nbsp📑 <a href="#">Paper</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara">Chinese</a>
+ 🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp; | 🚀<a href="https://github.com/zake7749/kyara">GitHub</a>&nbsp; | &nbsp;📑 <a href="#">Paper</a>&nbsp; | &nbsp;📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a>&nbsp; | &nbsp;📖 <a href="https://github.com/zake7749/kyara">Chinese</a>&nbsp; | &nbsp;💻 <a href="https://www.kaggle.com/code/zake7749/kyara-a-compact-yet-powerful-chinese-llm">Kaggle Notebook</a>
</p>
+
<div style="text-align: center;">
<img src="https://i.imgur.com/QiWlcYJ.jpeg" alt="kyara"/>
</div>

Kyara (Knowledge Yielding Adaptive Retrieval Augmentation) is an experimental project aimed at improving language models through knowledge retrieval processes. The project seeks to enhance the model’s ability to adapt knowledge and improve language comprehension, particularly in underrepresented languages like Traditional Chinese. Given the relatively scarce availability of Traditional Chinese data compared to the vast corpus of English data used for model training, Kyara addresses this gap by expanding the limited corpus for this language.

- To validate the effectiveness of Kyara, we performed full-parameter fine-tuning on `Gemma-2-2b-it`, resulting in the first iteration of the Kyara model. Initial evaluation results can be found in the [Benchmark](#benchmark) section.
-
- ## Table of Content
-
- - [Benchmark](#benchmark)
- * [General Benchmark](#general-benchmark)
- * [Alignment Benchmark](#alignment-benchmark)
- - [Method](#method)
- * [Dataset Summary](#dataset-summary)
- * [Dataset Construction](#dataset-construction)
- + [Base Dataset: Knowledge Injection with Retrieval Augmentation](#base-dataset-knowledge-injection-with-retrieval-augmentation)
- - [Chinese Math Dataset](#chinese-math-dataset)
- + [High Quality Dataset: Model Refinement ](#high-quality-dataset-model-refinement)
- * [Preference Learning](#preference-learning)
- + [Chinese DPO](#chinese-dpo)
- - [SPIN/SPPO](#spinsppo)
- - [RLAIF](#rlaif)
- - [Feature](#feature)
- * [Retrieval Augmented Generation (Experimental)](#retrieval-augmented-generation-experimental)
- + [Input](#input)
- + [Output](#output)
+ To validate Kyara's effectiveness, we conducted full-parameter fine-tuning on `Gemma-2-2b-it`, resulting in the first iteration of the Kyara model. Initial evaluation results, as detailed in the [Benchmark](#benchmark) section, demonstrate that Kyara outperforms the original `Gemma-2-2b-it` across various benchmarks, with notable improvements in Chinese language evaluations.

## Benchmark

### General Benchmark

- All evaluations are based-on zero-shot.
+ The following evaluations were conducted in a zero-shot setting.

| Metric | Kyara-2b-it | Gemma-2-2b-it |
|--------------------------|----------|-------------|
@@ -170,6 +151,14 @@ All evaluations are based-on zero-shot.

The aggregation method for the groups in TMMLUPlus is macro average, following the practice in the official implementation.

+ #### [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+
+ As of now, Kyara-2b-it is the leading model among all 2B-scale models on the Open LLM Leaderboard.
+
+ <div style="text-align: center">
+ <img src="https://i.imgur.com/Jq3hbP1.png" alt="kyara-2b-it-open-llm-leaderboard">
+ </div>
+
### Alignment Benchmark

| Metric | Kyara | Gemma-2-2b-it | ChatGPT-3.5-1106 |
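The macro average mentioned in this hunk weights every TMMLUPlus group equally instead of weighting by question count. A minimal sketch of that aggregation, with hypothetical group and subject scores (not actual TMMLUPlus numbers):

```python
# Macro-average aggregation: each group score is the unweighted mean of its
# subjects' accuracies, and the overall score is the unweighted mean over
# groups. All names and values below are placeholders for illustration.
from statistics import mean

subject_scores = {
    "STEM": [0.41, 0.37, 0.44],
    "Social Sciences": [0.52, 0.48],
    "Humanities": [0.46, 0.50, 0.43],
}

group_scores = {group: mean(scores) for group, scores in subject_scores.items()}
overall = mean(group_scores.values())  # every group counts equally
print(group_scores, round(overall, 4))
```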
@@ -198,6 +187,10 @@ All evaluations are based-on zero-shot.

where the postfixes CHT and CHS represent Traditional Chinese and Simplified Chinese, respectively. To evaluate the performance on Traditional Chinese in AlignBench, we used [OpenCC](https://github.com/BYVoid/OpenCC) with the `s2tw` configuration to convert all questions from Simplified Chinese to Traditional Chinese.

+ ## Usage
+
+ Kyara adopts the same architecture as Gemma-2, utilizing identical inference and training methods. We have created a [Jupyter Notebook](https://www.kaggle.com/code/zake7749/kyara-a-compact-yet-powerful-chinese-llm) on Kaggle to demonstrate Kyara’s basic functionality. For service-level deployment, we recommend using SGLang or vLLM to achieve greater throughput and robustness.
+
## Method

The following sections provide a brief summary of Kyara's implementation strategy.
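The `s2tw` conversion described in this hunk is straightforward with OpenCC's Python bindings. A minimal sketch, assuming the `opencc` package is installed (some builds expect the profile name `s2tw.json`):

```python
# Convert a Simplified Chinese question to Traditional Chinese (Taiwan
# standard) using OpenCC's s2tw profile, as the README describes for AlignBench.
from opencc import OpenCC

converter = OpenCC("s2tw")  # use "s2tw.json" if your OpenCC build requires it
question_chs = "请解释什么是检索增强生成。"  # placeholder question, not from AlignBench
print(converter.convert(question_chs))  # -> 請解釋什麼是檢索增強生成。
```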
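Since the new Usage section states that Kyara follows Gemma-2's inference path, a standard `transformers` chat-template call should work unchanged. A minimal sketch; the generation settings here are illustrative assumptions, not recommendations from this card:

```python
# Gemma-2-style inference with the Kyara checkpoint via Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zake7749/gemma-2-2b-it-chinese-kyara-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 2B model light in memory
    device_map="auto",
)

# The tokenizer's chat template applies the Gemma-2 conversation format.
messages = [{"role": "user", "content": "請簡單介紹一下台灣的夜市文化。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For the SGLang or vLLM route, the same chat template applies; both engines expose OpenAI-compatible servers that handle the formatting for you.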
@@ -425,16 +418,4 @@ However, the model would respond that this quote is from The "Legend of the Cond
總結起來,這段話表達了楊過對於自己行為的獨特理解和自豪感。他明白自己的行為和價值觀取決於個人的內心和對正義的追求,而非外界的評價和名利。他也承認了自己的責任,作為唐門下一代,必須繼承和發揚門風,這一點是無可替代的。
```

- It is recommended to exercise caution when using language models.
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_zake7749__gemma-2-2b-it-chinese-kyara-dpo)
-
- | Metric |Value|
- |-------------------|----:|
- |Avg. |19.25|
- |IFEval (0-Shot) |53.82|
- |BBH (3-Shot) |19.06|
- |MATH Lvl 5 (4-Shot)| 6.12|
- |GPQA (0-shot) | 2.24|
- |MuSR (0-shot) |16.76|
- |MMLU-PRO (5-shot) |17.48|
+ It is recommended to exercise caution when using language models.