---
library_name: transformers
tags:
- medical
license: llama3
language:
- en
---
# BioMed LLaMa-3 8B
Meta AI released Llama 3, the next generation of its Llama family of LLMs, for broad use. The release features pretrained and instruction-fine-tuned language models with 8B and 70B parameters that support a wide range of use cases.
Llama-3 uses a decoder-only transformer architecture with a 128K-token vocabulary and grouped-query attention (GQA) for more efficient inference, and was trained on sequences of 8,192 tokens.
Llama-3 achieves state-of-the-art performance on a wide range of industry benchmarks, with improved reasoning, code generation, and instruction following; Meta reports that it outperforms Claude Sonnet, Mistral Medium, and GPT-3.5 on a number of benchmarks.
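These architecture details can be checked directly from the published model configuration. The following is a minimal sketch, not part of the original release notes; it assumes you have accepted the Meta Llama 3 license on the Hub:

```python
# Inspect the Llama-3 architecture details mentioned above without downloading the weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(config.vocab_size)               # 128256 -> ~128K-token vocabulary
print(config.num_attention_heads)      # 32 query heads
print(config.num_key_value_heads)      # 8 key/value heads -> grouped-query attention
print(config.max_position_embeddings)  # 8192-token training context
```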
## Model Details
Powerful LLMs are trained on large amounts of unstructured data and excel at general text generation. BioMed-LLaMa-3-8B, built on [Llama-3-8b](https://huggingface.co/meta-llama/Meta-Llama-3-8B), addresses some of the constraints of using off-the-shelf pre-trained LLMs in the biomedical domain:
* Efficiently fine-tuned LLaMa-3-8B on medical instruction Alpaca data, encompassing over 54K instruction-focused examples.
* Fine-tuned using QLoRA to further reduce memory usage while maintaining model performance and enhancing its capabilities in the biomedical domain.
![finetuning](assets/finetuning.png "LLaMa-3 Fine-Tuning")
## ⚙️ Config
| Parameter | Value |
|-------------------|-------------|
| Learning Rate     | 1e-8        |
| Optimizer | Adam |
| Betas | (0.9, 0.99) |
| adam_epsilon | 1e-8 |
| Lora Alpha | 16 |
| R | 8 |
| Lora Dropout | 0.05 |
| Load in 4 bits | True |
| Flash Attention 2 | True |
| Train Batch Size | 8 |
| Valid Batch Size | 8 |
| Max Seq Length | 512 |
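For reference, here is a minimal sketch of how these hyperparameters could be wired together with 🤗 `peft` and `transformers`. Only the values come from the table above; the dataset loading and `Trainer` call are omitted and the output directory is a hypothetical placeholder, so treat this as an illustration rather than the exact training script used for this model.

```python
# Illustrative QLoRA setup matching the config table above (not the exact training script).
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # Load in 4 bits: True
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention 2: True
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                      # R: 8
    lora_alpha=16,                            # Lora Alpha: 16
    lora_dropout=0.05,                        # Lora Dropout: 0.05
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="biomed-llama-3-8b",           # hypothetical output path
    optim="adamw_torch",                      # Optimizer: Adam(W)
    learning_rate=1e-8,                       # Learning Rate: 1e-8
    adam_beta1=0.9,
    adam_beta2=0.99,                          # Betas: (0.9, 0.99)
    adam_epsilon=1e-8,                        # adam_epsilon: 1e-8
    per_device_train_batch_size=8,            # Train Batch Size: 8
    per_device_eval_batch_size=8,             # Valid Batch Size: 8
    bf16=True,
)
# Max Seq Length (512) is applied when tokenizing the instruction examples.
```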
## 💻 Usage
```python
# Installations
!pip install peft --quiet
!pip install bitsandbytes --quiet
!pip install transformers --quiet
!pip install flash-attn --no-build-isolation --quiet
# Imports
import torch
from peft import LoraConfig, PeftModel
from transformers import (
AutoTokenizer,
BitsAndBytesConfig,
AutoModelForCausalLM)
# generate_prompt function
def generate_prompt(instruction, input=None):
if input:
return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. # noqa: E501
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""
else:
return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request. # noqa: E501
### Instruction:
{instruction}
### Response:
"""
# Model Loading Configuration
based_model_path = "meta-llama/Meta-Llama-3-8B"
lora_weights = "NouRed/BioMed-Tuned-Llama-3-8b"
load_in_4bit=True
bnb_4bit_use_double_quant=True
bnb_4bit_quant_type="nf4"
bnb_4bit_compute_dtype=torch.bfloat16
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
based_model_path,
)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
# Load Base Model in 4 Bits
quantization_config = BitsAndBytesConfig(
load_in_4bit=load_in_4bit,
bnb_4bit_use_double_quant=bnb_4bit_use_double_quant,
bnb_4bit_quant_type=bnb_4bit_quant_type,
bnb_4bit_compute_dtype=bnb_4bit_compute_dtype
)
base_model = AutoModelForCausalLM.from_pretrained(
based_model_path,
device_map="auto",
attn_implementation="flash_attention_2", # I have an A100 GPU with 40GB of RAM 😎
quantization_config=quantization_config,
)
# Load Peft Model
model = PeftModel.from_pretrained(
base_model,
lora_weights,
torch_dtype=torch.float16,
)
# Prepare Input
instruction = "I have a sore throat, slight cough, tiredness. Should I get tested for COVID-19?"
prompt = generate_prompt(instruction)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Generate Text
with torch.no_grad():
generation_output = model.generate(
**inputs,
max_new_tokens=128
)
# Decode Output
output = tokenizer.decode(
generation_output[0],
skip_special_tokens=True,
clean_up_tokenization_spaces=True)
print(output)
```
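For deployment, the LoRA adapter can also be merged into the base weights so the model runs without the `peft` wrapper. A minimal sketch follows; it assumes enough memory to hold the un-quantized 8B model in bf16, and the output directory name is a placeholder:

```python
# Optional: merge the adapter into the base model for standalone inference.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "NouRed/BioMed-Tuned-Llama-3-8b").merge_and_unload()

save_dir = "biomed-llama-3-8b-merged"  # hypothetical output directory
merged.save_pretrained(save_dir)
AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B").save_pretrained(save_dir)
```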
## 📋 Cite Us
```
@misc{biomedllama32024zekaoui,
author = {Nour Eddine Zekaoui},
title = {BioMed-LLaMa-3: Efficient Instruction Fine-Tuning in Biomedical Language},
year = {2024},
howpublished = {In Hugging Face Model Hub},
url = {https://huggingface.co/NouRed/BioMed-Tuned-Llama-3-8b}
}
```
```
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```
Created with ❤️ by [@NZekaoui](https://twitter.com/NZekaoui)