Slovak
ju-bezdek committed on
Commit 07693ae
1 Parent(s): 1a89142

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -12,7 +12,7 @@ This repository contains the LORA weights finetuned on the translated version of
 
 ## Training procedure
 
-The training was done on the 7B LLaMA model (decapoda-research/llama-7b-hf) quantized to 8bits with following Hyperparameters:
+The training was done on the 7B LLaMA model (decapoda-research/llama-7b-hf) quantized to 8 bits with the following hyperparameters:
 
 ```
 MICRO_BATCH_SIZE = 3
@@ -20,13 +20,13 @@ BATCH_SIZE = 128
 GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
 EPOCHS = 2 # paper uses 3
 LEARNING_RATE = 2e-5 # from the original paper
-CUTOFF_LEN = 256 # 256 accounts for about 96% of the data
+CUTOFF_LEN = 256
 LORA_R = 4
 LORA_ALPHA = 16
 LORA_DROPOUT = 0.05
 ```
 
-The sole goal of this project is to explore the effects of single language finetuning using the same dataset and methods as the original paper did and comapre the results
+The sole goal of this project is to explore the effects of single-language finetuning using the same dataset and methods as the original paper, and to compare the results.
 
 @misc{alpaca,
   author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
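The commit only touches the hyperparameter listing, but for context, here is a minimal, hypothetical Python sketch of how those values could be wired into the peft/transformers/bitsandbytes stack that the original alpaca-lora recipe builds on. The actual training script is not part of this commit; the `target_modules`, the 8-bit preparation call, and the output directory are assumptions, not taken from this repository.

```
# Hypothetical sketch, not the repository's training script: plugging the
# README's hyperparameters into peft/transformers with 8-bit loading
# (requires the bitsandbytes package).
from transformers import LlamaForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

MICRO_BATCH_SIZE = 3
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 2          # paper uses 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 256    # max tokenized sequence length
LORA_R = 4
LORA_ALPHA = 16
LORA_DROPOUT = 0.05

# Load the base model quantized to 8 bits.
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
# Newer peft versions rename this to prepare_model_for_kbit_training.
model = prepare_model_for_int8_training(model)

# Attach LoRA adapters; only these low-rank matrices receive gradients.
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections, as in alpaca-lora
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="lora-alpaca-sk",  # hypothetical output path
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    num_train_epochs=EPOCHS,
    learning_rate=LEARNING_RATE,
    fp16=True,
)
```

With BATCH_SIZE = 128 and MICRO_BATCH_SIZE = 3, gradient accumulation runs 42 micro-batches per optimizer step, which is how the effective batch size from the original recipe is preserved on a single GPU.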