---
library_name: transformers
tags:
- unsloth
language:
- tr
---

# Model Card for Yudum/llama3-lora-turkish

Fine-tuned Llama-3-8B model with LoRA (trained for 1 epoch on a Colab A100, for experimental purposes).

Base Model: unsloth/llama-3-8b-bnb-4bit

Fine-tuning process video: https://www.youtube.com/watch?v=pK8u4QfdLx0&ab_channel=DavidOndrej

Turkish Fine-tune notebook: https://github.com/yudumpacin/LLM/blob/main/Alpaca_%2B_Llama_3_8b_full_Turkish.ipynb

Original unsloth notebook: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing
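
For reference, the sketch below shows how LoRA adapters are typically attached to this base model with unsloth's `get_peft_model`. The hyperparameters (`r`, `lora_alpha`, `target_modules`, etc.) are the defaults from the linked unsloth notebook and are assumptions; the exact values used for this checkpoint are not documented here.

```python
from unsloth import FastLanguageModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# Attach LoRA adapters; r/alpha/target_modules are the unsloth notebook
# defaults, assumed here rather than confirmed for this model
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
)
```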

Fine-tuning data (see the loading sketch after the list):
- Yudum/turkish-instruct-dataset, which includes:
  * open question category of atasoglu/databricks-dolly-15k-tr
  * parsak/alpaca-tr-1k-longest
  * TFLai/Turkish-Alpaca
  * umarigan/GPTeacher-General-Instruct-tr
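
A minimal sketch of loading the combined dataset and rendering it with the Alpaca-style template from the Usage section below. The `train` split and the `instruction`/`input`/`output` column names are assumptions based on the standard Alpaca schema; verify them against the actual dataset.

```python
from datasets import load_dataset

dataset = load_dataset("Yudum/turkish-instruct-dataset", split = "train")

# alpaca_prompt is the Turkish template from the Usage section below;
# appending EOS_TOKEN teaches the model where a response ends
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    texts = [
        alpaca_prompt.format(instruction, inp, output) + EOS_TOKEN
        for instruction, inp, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched = True)
```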

# Usage
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Yudum/llama3-lora-turkish",
    max_seq_length = 2048,
    dtype = None,        # None = auto-detect (bfloat16 on Ampere+ GPUs)
    load_in_4bit = True, # load weights in 4-bit to save VRAM
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

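# Turkish Alpaca-style prompt template. In English:
# "Below is an instruction that describes a task, paired with an input that
#  provides further context. Write a response that appropriately completes
#  the request." -- with "### Instruction:", "### Input:", "### Response:"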
alpaca_prompt = """Altta bir görevi tanımlayan bir talimat ile daha fazla bilgi sağlayan bir girdi bulunmaktadır. İsteği uygun şekilde tamamlayan bir yanıt yazın.

### Talimat:
{}

### Girdi:
{}

### Yanıt:
{}
"""
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Paris'teki meşhur kulenin ismi nedir?",  # instruction: "What is the name of the famous tower in Paris?"
            "",  # input -- no extra context needed
            "",  # output -- leave blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
```
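
Note that `tokenizer.batch_decode` returns the full prompt plus the completion. For interactive use you can instead stream only the new tokens as they are generated, e.g. with transformers' `TextStreamer` (a sketch reusing `model`, `tokenizer`, and `inputs` from the block above):

```python
from transformers import TextStreamer

# skip_prompt = True prints only the newly generated tokens
streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = streamer, max_new_tokens = 64)
```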