zake7749 committed
Commit 1eedbfd · verified · 1 Parent(s): 3d89892

Update README.md

Files changed (1):
  1. README.md +17 -36

README.md CHANGED
@@ -118,41 +118,22 @@ license: gemma
# Kyara: Knowledge Yielding Adaptive Retrieval Augmentation for LLM Fine-tuning

<p align="left">
- 🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp | 🚀<a href="https://github.com/zake7749/kyara">Github</a>&nbsp | &nbsp📑 <a href="#">Paper</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara">Chinese</a>
+ 🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp; | 🚀<a href="https://github.com/zake7749/kyara">GitHub</a>&nbsp; | &nbsp;📑 <a href="#">Paper</a>&nbsp; | &nbsp;📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a>&nbsp; | &nbsp;📖 <a href="https://github.com/zake7749/kyara">Chinese</a>&nbsp; | &nbsp;💻 <a href="https://www.kaggle.com/code/zake7749/kyara-a-compact-yet-powerful-chinese-llm">Kaggle Notebook</a>
</p>
+
<div style="text-align: center;">
<img src="https://i.imgur.com/QiWlcYJ.jpeg" alt="kyara"/>
</div>

Kyara (Knowledge Yielding Adaptive Retrieval Augmentation) is an experimental project aimed at improving language models through knowledge retrieval processes. The project seeks to enhance the model’s ability to adapt knowledge and improve language comprehension, particularly in underrepresented languages like Traditional Chinese. Given the relatively scarce availability of Traditional Chinese data compared to the vast corpus of English data used for model training, Kyara addresses this gap by expanding the limited corpus for this language.

- To validate the effectiveness of Kyara, we performed full-parameter fine-tuning on `Gemma-2-2b-it`, resulting in the first iteration of the Kyara model. Initial evaluation results can be found in the [Benchmark](#benchmark) section.
-
- ## Table of Content
-
- - [Benchmark](#benchmark)
- * [General Benchmark](#general-benchmark)
- * [Alignment Benchmark](#alignment-benchmark)
- - [Method](#method)
- * [Dataset Summary](#dataset-summary)
- * [Dataset Construction](#dataset-construction)
- + [Base Dataset: Knowledge Injection with Retrieval Augmentation](#base-dataset-knowledge-injection-with-retrieval-augmentation)
- - [Chinese Math Dataset](#chinese-math-dataset)
- + [High Quality Dataset: Model Refinement ](#high-quality-dataset-model-refinement)
- * [Preference Learning](#preference-learning)
- + [Chinese DPO](#chinese-dpo)
- - [SPIN/SPPO](#spinsppo)
- - [RLAIF](#rlaif)
- - [Feature](#feature)
- * [Retrieval Augmented Generation (Experimental)](#retrieval-augmented-generation-experimental)
- + [Input](#input)
- + [Output](#output)
+ To validate Kyara's effectiveness, we conducted full-parameter fine-tuning on `Gemma-2-2b-it`, resulting in the first iteration of the Kyara model. Initial evaluation results, as detailed in the [Benchmark](#benchmark) section, demonstrate that Kyara outperforms the original `Gemma-2-2b-it` across various benchmarks, with notable improvements in Chinese language evaluations.

## Benchmark

### General Benchmark

- All evaluations are based-on zero-shot.
+ The following evaluations were conducted in a zero-shot setting.

| Metric | Kyara-2b-it | Gemma-2-2b-it |
|--------------------------|----------|-------------|
@@ -170,6 +151,14 @@ All evaluations are based-on zero-shot.

The aggregation method for the groups in TMMLUPlus is macro average, following the practice in the official implementation.

+ #### [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+
+ As of now, Kyara-2b-it is the leading model among all 2B-scale models on the Open LLM Leaderboard.
+
+ <div style="text-align: center">
+ <img src="https://i.imgur.com/Jq3hbP1.png" alt="kyara-2b-it-open-llm-leaderboard">
+ </div>
+
### Alignment Benchmark

| Metric | Kyara | Gemma-2-2b-it | ChatGPT-3.5-1106 |
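The macro average mentioned in this hunk weights every TMMLUPlus group equally instead of weighting by question count. A minimal sketch of that aggregation, with hypothetical group and subject scores (not actual TMMLUPlus numbers):

```python
# Macro-average aggregation: each group score is the unweighted mean of its
# subjects' accuracies, and the overall score is the unweighted mean over
# groups. All names and values below are placeholders for illustration.
from statistics import mean

subject_scores = {
    "STEM": [0.41, 0.37, 0.44],
    "Social Sciences": [0.52, 0.48],
    "Humanities": [0.46, 0.50, 0.43],
}

group_scores = {group: mean(scores) for group, scores in subject_scores.items()}
overall = mean(group_scores.values())  # every group counts equally
print(group_scores, round(overall, 4))
```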
@@ -198,6 +187,10 @@ All evaluations are based-on zero-shot.

where the postfixes CHT and CHS represent Traditional Chinese and Simplified Chinese, respectively. To evaluate the performance on Traditional Chinese in AlignBench, we used [OpenCC](https://github.com/BYVoid/OpenCC) with the `s2tw` configuration to convert all questions from Simplified Chinese to Traditional Chinese.

+ ## Usage
+
+ Kyara adopts the same architecture as Gemma-2, utilizing identical inference and training methods. We have created a [Jupyter Notebook](https://www.kaggle.com/code/zake7749/kyara-a-compact-yet-powerful-chinese-llm) on Kaggle to demonstrate Kyara’s basic functionality. For service-level deployment, we recommend using SGLang or vLLM to achieve greater throughput and robustness.
+
## Method

The following sections provide a brief summary of Kyara's implementation strategy.
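The `s2tw` conversion described in this hunk is straightforward with OpenCC's Python bindings. A minimal sketch, assuming the `opencc` package is installed (some builds expect the profile name `s2tw.json`):

```python
# Convert a Simplified Chinese question to Traditional Chinese (Taiwan
# standard) using OpenCC's s2tw profile, as the README describes for AlignBench.
from opencc import OpenCC

converter = OpenCC("s2tw")  # use "s2tw.json" if your OpenCC build requires it
question_chs = "请解释什么是检索增强生成。"  # placeholder question, not from AlignBench
print(converter.convert(question_chs))  # -> 請解釋什麼是檢索增強生成。
```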
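Since the new Usage section states that Kyara follows Gemma-2's inference path, a standard `transformers` chat-template call should work unchanged. A minimal sketch; the generation settings here are illustrative assumptions, not recommendations from this card:

```python
# Gemma-2-style inference with the Kyara checkpoint via Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zake7749/gemma-2-2b-it-chinese-kyara-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 2B model light in memory
    device_map="auto",
)

# The tokenizer's chat template applies the Gemma-2 conversation format.
messages = [{"role": "user", "content": "請簡單介紹一下台灣的夜市文化。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For the SGLang or vLLM route, the same chat template applies; both engines expose OpenAI-compatible servers that handle the formatting for you.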
@@ -425,16 +418,4 @@ However, the model would respond that this quote is from The "Legend of the Cond
總結起來,這段話表達了楊過對於自己行為的獨特理解和自豪感。他明白自己的行為和價值觀取決於個人的內心和對正義的追求,而非外界的評價和名利。他也承認了自己的責任,作為唐門下一代,必須繼承和發揚門風,這一點是無可替代的。
```

- It is recommended to exercise caution when using language models.
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_zake7749__gemma-2-2b-it-chinese-kyara-dpo)
-
- | Metric |Value|
- |-------------------|----:|
- |Avg. |19.25|
- |IFEval (0-Shot) |53.82|
- |BBH (3-Shot) |19.06|
- |MATH Lvl 5 (4-Shot)| 6.12|
- |GPQA (0-shot) | 2.24|
- |MuSR (0-shot) |16.76|
- |MMLU-PRO (5-shot) |17.48|
+ It is recommended to exercise caution when using language models.