---
license: apache-2.0
language:
- tr
---

# Turkcell-LLM-7b-v1

This model is an extended version of a Mistral-based Large Language Model (LLM) for Turkish. It was trained on a cleaned Turkish raw dataset containing 5 billion tokens. Training was performed with the DoRA method, followed by fine-tuning with the LoRA method.

## Model Details

- **Base Model**: Mistral 7B based LLM
- **Tokenizer Extension**: Specifically extended for Turkish
- **Training Dataset**: Cleaned Turkish raw data with 5 billion tokens
- **Training Method**: Initially with DoRA, followed by fine-tuning with LoRA

### DoRA Configuration

- `lora_alpha`: 128
- `lora_dropout`: 0.05
- `r`: 64
- `target_modules`: "all-linear"

### LoRA Fine-Tuning Configuration

- `lora_alpha`: 128
- `lora_dropout`: 0.05
- `r`: 256
- `target_modules`: "all-linear"
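The two configurations above map directly onto the Hugging Face `peft` library's `LoraConfig`. A minimal sketch of how they could be expressed (assuming `peft` >= 0.9.0, which introduced the `use_dora` flag; the variable names are illustrative, not from the original training code):

```python
from peft import LoraConfig

# Stage 1 sketch: DoRA (LoRA with weight decomposition enabled)
dora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",
    use_dora=True,  # enables DoRA on top of the LoRA adapters
)

# Stage 2 sketch: plain LoRA fine-tuning at a higher rank
lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",
)
```

Either config would then be attached to the base model with `peft.get_peft_model(model, config)` before training.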
## Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("TURKCELL/Turkcell-LLM-7b-v1")
tokenizer = AutoTokenizer.from_pretrained("TURKCELL/Turkcell-LLM-7b-v1")

messages = [
    {"role": "user", "content": "Türkiye'nin başkenti neresidir?"},
]

# Apply the chat template and move the inputs to the target device
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)

# The model uses <|im_end|> as its end-of-turn token
eos_token = tokenizer("<|im_end|>", add_special_tokens=False)["input_ids"][0]

generated_ids = model.generate(model_inputs,
                               max_new_tokens=1024,
                               do_sample=True,
                               eos_token_id=eos_token)

decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```