jaewanlee committed on
Commit c6146d6 · verified · 1 Parent(s): 9e050f4

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -14,7 +14,7 @@ language:
 
  `daily_hug` is a conversational model designed to engage users in friendly, everyday conversations in Korean. While the model predominantly focuses on light and casual discussions, it is also capable of identifying signs of serious mental health issues. When such signs are detected, the model will gently suggest that there may be an issue worth considering. This makes `daily_hug` both a supportive conversational partner and a helpful companion in times of need.
 
- The model is based on the Gemma architecture and has been fine-tuned with a conversational dataset to make its responses friendly, natural, and empathetic. The dataset used is [jaewanlee/korean_chat_friendly](https://huggingface.co/datasets/jaewanlee/korean_chat_friendly).
+ The model is based on the Gemma architecture and has been fine-tuned with a conversational dataset to make its responses friendly, natural, and empathetic. The dataset used is [JaeJiMin/korean_chat_friendly](https://huggingface.co/datasets/JaeJiMin/korean_chat_friendly).
 
  ## Model Description
  * Model Name: daily_hug
@@ -40,7 +40,7 @@ The `daily_hug` model was fine-tuned using a combination of low-rank adaptation
  ### Training Process:
  1. **Base Model**: The starting point for `daily_hug` was the google/`gemma-2b-it model`, a large language model optimized for natural language generation tasks.
  2. **LoRA Adapters:** LoRA (Low-Rank Adaptation) was employed to efficiently fine-tune the model while reducing memory overhead. This allowed us to update a small number of parameters rather than fine-tuning the entire model, making it faster and more resource-efficient.
- 3. **Dataset**: The fine-tuning process used the [jaewanlee/korean_chat_friendly](https://huggingface.co/datasets/jaewanlee/korean_chat_friendly) dataset, which contains friendly, conversational data in Korean. The dataset was structured to mimic daily life conversations, with a particular focus on empathy and casual dialogue.
+ 3. **Dataset**: The fine-tuning process used the [JaeJiMin/korean_chat_friendly](https://huggingface.co/datasets/JaeJiMin/korean_chat_friendly) dataset, which contains friendly, conversational data in Korean. The dataset was structured to mimic daily life conversations, with a particular focus on empathy and casual dialogue.
  4. **Training Configuration**:
  * Optimizer: The AdamW optimizer was employed with 8-bit precision (`paged_adamw_8bit`) to handle large model parameters effectively.
  * Batch Size: Due to hardware constraints, gradient accumulation was used to simulate larger batch sizes, allowing the model to train effectively with smaller memory requirements.
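
The training steps described in the README above can be illustrated with a little arithmetic. The sketch below is not the author's training script: the LoRA rank, matrix size, per-device batch size, and accumulation step count are all hypothetical values chosen for illustration, since the README does not state them. Only the optimizer name `paged_adamw_8bit` comes from the source. The dictionary keys mirror Hugging Face `TrainingArguments` field names, but the dictionary itself is a plain Python object and is not passed to any library here.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter on a (d_out x d_in) weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    instead of updating the full matrix.
    """
    return rank * d_in + d_out * rank


def effective_batch_size(per_device_batch: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


# Hypothetical settings; only "optim" is taken from the README.
training_config = {
    "optim": "paged_adamw_8bit",          # from the README
    "per_device_train_batch_size": 2,     # assumption
    "gradient_accumulation_steps": 8,     # assumption
}

hidden = 2048   # assumed square weight matrix, for illustration only
rank = 8        # assumed LoRA rank

full_params = hidden * hidden
lora_params = lora_trainable_params(hidden, hidden, rank)
batch = effective_batch_size(
    training_config["per_device_train_batch_size"],
    training_config["gradient_accumulation_steps"],
)

print(f"full matrix: {full_params} params")
print(f"LoRA adapter: {lora_params} params ({lora_params / full_params:.2%} of full)")
print(f"effective batch size: {batch}")
```

Under these assumed numbers, the adapter trains well under 1% of the parameters of the matrix it wraps, and accumulating gradients over several small batches gives the optimizer the same number of samples per step as one larger batch would, at the memory cost of the small batch.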