|
--- |
|
license: llama2 |
|
datasets: |
|
- bertin-project/alpaca-spanish |
|
language: |
|
- es |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
--- |
|
## Llama 2-13b-alpaca-spanish LoRA |
|
This is a LoRA for Llama 2 13B trained on a translated [Alpaca dataset](https://huggingface.co/datasets/bertin-project/alpaca-spanish) in an attempt to improve the Spanish performance of the Llama-2 foundation model, with a conversational focus.
|
|
|
The base model used was [The Bloke's Llama-2-13B-fp16](https://huggingface.co/TheBloke/Llama-2-13B-fp16), trained in 4-bit precision with an added padding token.
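
For reference, a minimal inference sketch using `peft` to attach the adapter is shown below. The adapter id is only a placeholder for this repository, and the prompt format is an assumption based on the Alpaca-style dataset:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model_name = 'TheBloke/Llama-2-13B-fp16'
adapter_id = 'path-or-repo-id-of-this-LoRA'  # placeholder: replace with this repository's id

# Recreate the padded tokenizer/model setup described in the section below
tokenizer = LlamaTokenizer.from_pretrained(base_model_name)
tokenizer.add_tokens(['<PAD>'])
tokenizer.pad_token = '<PAD>'

model = LlamaForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.float16, device_map='auto'
)
model.resize_token_embeddings(len(tokenizer))

# Attach the LoRA weights on top of the padded base model
model = PeftModel.from_pretrained(model, adapter_id)

# Alpaca-style Spanish prompt (assumed format)
prompt = "Instrucción: Explica brevemente qué es un modelo de lenguaje.\nRespuesta:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```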
|
|
|
## Important info
|
The original Llama 2 model does not have a padding token, which proved restrictive during training. To address this, I added a padding token to the tokenizer associated with the model.
|
```python
from transformers import LlamaTokenizer, LlamaForCausalLM

model_name = 'TheBloke/Llama-2-13B-fp16'

# Load the base model in half precision along with its tokenizer
model = LlamaForCausalLM.from_pretrained(model_name).half()
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# Add a dedicated padding token and register it with the tokenizer
tokenizer.add_tokens(['<PAD>'])
tokenizer.pad_token = '<PAD>'

# Resize the model's token embeddings to account for the new token
model.resize_token_embeddings(len(tokenizer))

padded_model_name = 'Llama-2-13B-fp16-padded'

# Save the padded tokenizer and model
tokenizer.save_pretrained(padded_model_name)
model.save_pretrained(padded_model_name)
```
|
|
|
| Training parameter | Value |
| ------------------ | ----- |
| LoRA scale         | 2     |
| Epochs             | 0.75  |
| Learning rate      | 2e-5  |
| Warmup steps       | 100   |
| Loss               | 1.07  |
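
These values do not fully specify the training run; as a rough illustration, they could map onto a `peft`/`transformers` setup along the lines of the sketch below. The LoRA rank, target modules, dropout, and batch sizes are assumptions (only the scale `lora_alpha / r = 2` follows from the table), and the 4-bit quantization setup is omitted for brevity:

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

# `model` is assumed to be the padded base model from the snippet above
lora_config = LoraConfig(
    r=16,                                  # rank: assumed, not reported
    lora_alpha=32,                         # alpha / r = 2 matches the reported LoRA scale
    target_modules=['q_proj', 'v_proj'],   # common choice for Llama-style models (assumed)
    lora_dropout=0.05,                     # assumed
    bias='none',
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir='llama2-13b-alpaca-spanish-lora',
    num_train_epochs=0.75,                 # fractional epoch as reported
    learning_rate=2e-5,
    warmup_steps=100,
    fp16=True,
    per_device_train_batch_size=4,         # assumed, not reported
    gradient_accumulation_steps=8,         # assumed, not reported
    logging_steps=10,
)
```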
|
|