---
language:
- en
license: cc-by-nc-4.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4
datasets:
- Intel/orca_dpo_pairs
---
This model comes from blockchainlab test 2.4, which is a merged model: alnrg2arg/blockchainlabs_7B_merged_test2_4.
The project aims to build a small LLM for on-device use.
The overall pipeline for this iteration is:

1. Merge models to make a base model (7B).
2. Prune the model to reduce the parameter count (50% sparsity).
3. Use DPO for the recovery phase after pruning.

This model is not pruned; it is intended as a baseline for comparison with the pruned model.
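
To make the "50% sparsity" figure in the pruning step concrete, here is a minimal sketch of unstructured magnitude pruning on a toy weight list. This card does not say which pruning method the project uses, so magnitude pruning is an assumption chosen only for illustration, and `magnitude_prune` is a hypothetical helper, not code from the project.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero the smallest-magnitude entries so that a `sparsity` fraction is zero.

    Illustrative only: the actual pruning method is not stated in this card.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Toy weights; a real 7B model would be pruned tensor by tensor.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.07]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # half of the entries are now exactly zero
```

Pruning half of the weights usually costs accuracy, which is why the pipeline follows it with a recovery phase.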
This is the code and the parameters I chose for this model (DPO).
```
import torch
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

# `model`, `tokenizer` and `dataset` are assumed to be loaded earlier.
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        # Use bf16 when the GPU supports it, otherwise fall back to fp16.
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "output_DPO",
    ),
    beta = 0.1,
    train_dataset = dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)
```
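
To show what `beta = 0.1` controls, the sketch below computes the standard sigmoid DPO loss for a single preference pair in plain Python. It assumes the trainer uses the default sigmoid loss; `dpo_loss` and the log-probability values are made up for illustration and are not taken from the training run.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more (or less) likely each answer is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the preference margin.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical log-probabilities, for illustration only.
loss_at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy equals reference
loss_improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)   # policy prefers "chosen" more
print(loss_at_start, loss_improved)
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen answer relative to the reference.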
The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
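
As a rough check on the training arguments above, the following sketch works out the effective batch size and the approximate number of optimizer and warmup steps. The dataset size (`num_examples = 12_800`) and the single-GPU setting are placeholders, not figures from this card, and the exact rounding used by the trainer can differ between library versions.

```python
import math


def schedule_summary(num_examples, per_device_batch=8, grad_accum=8,
                     epochs=3, warmup_ratio=0.1, num_devices=1):
    """Approximate step counts implied by the training arguments."""
    # Examples consumed per optimizer step.
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Steps during which the learning rate ramps up to its peak value.
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Placeholder dataset size and a single GPU, both assumptions.
summary = schedule_summary(num_examples=12_800)
print(summary)
```

With `per_device_train_batch_size = 8` and `gradient_accumulation_steps = 8`, each optimizer step sees 64 preference pairs per device.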
Benchmark scores
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|----------|------:|------|-----:|------|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |0.8248|±  |0.0107|
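
The reported standard error can be sanity-checked with the binomial formula for an accuracy estimate. The sketch below assumes the score was computed on 1,267 examples (the size of the Winogrande validation split, which this card does not state); `accuracy_stderr` is a hypothetical helper.

```python
import math


def accuracy_stderr(acc, n):
    """Standard error of an accuracy measured on n independent examples."""
    return math.sqrt(acc * (1.0 - acc) / n)


# n = 1267 is an assumption about the evaluation set size.
stderr = accuracy_stderr(0.8248, 1267)
print(round(stderr, 4))  # 0.0107
```

Under that assumption the result matches the Stderr column in the table.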