# Mistral-7B Fine-Tuned on Synthetic Medical QA with LoRA (8-bit Quantization)

This repository contains a fine-tuned version of the Mistral-7B model for medical question answering, trained on the SyntheticMedicalQA-4336 dataset. Fine-tuning uses Low-Rank Adaptation (LoRA) with 8-bit quantization for memory efficiency, implemented with the PEFT (Parameter-Efficient Fine-Tuning) library.

## Model Overview

- Base Model: alecocc/mistral-7B-SFT-medqa-graph-cot
- Dataset: SyntheticMedicalQA-4336
- Fine-Tuning Method: LoRA (Low-Rank Adaptation)
- Quantization: 8-bit (via BitsAndBytesConfig)
- Task: Medical Question Answering
- Libraries Used: Transformers, PEFT, and BitsAndBytes

## Features

1. Memory Efficiency: Uses 8-bit quantization to reduce memory consumption while maintaining performance.
2. Fine-Tuned for Medical QA: Specifically optimized to answer medical questions using a synthetic dataset.
3. LoRA Implementation: Applies LoRA so that fine-tuning updates only small adapter matrices rather than the full set of base-model weights.
4. Optimized Training Techniques:
   - Gradient checkpointing
   - Gradient accumulation
   - AdamW optimizer for stable updates

## Dataset Details

The model is trained on the SyntheticMedicalQA-4336 dataset, which contains synthetic medical questions and answers. The dataset is processed as follows (a preprocessing sketch is shown after the list):

- Questions are mapped to input_text.
- Responses are mapped to output_text.
- The dataset is split into 80% training and 20% validation.
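
A minimal preprocessing sketch using the `datasets` library; the Hub path and the raw column names `question`/`response` are assumptions and should be adjusted to the actual dataset schema:

```python
from datasets import load_dataset

# NOTE: the dataset path and raw column names below are assumptions for illustration
dataset = load_dataset("your_username/SyntheticMedicalQA-4336", split="train")

# Map the raw fields to the columns used during fine-tuning
dataset = dataset.map(
    lambda example: {"input_text": example["question"], "output_text": example["response"]}
)

# 80% training / 20% validation split
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```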

## Training Configuration

### Tokenizer

The tokenizer is initialized using the base model:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("alecocc/mistral-7B-SFT-medqa-graph-cot", trust_remote_code=True)
# Mistral has no dedicated pad token, so padding reuses the EOS token
tokenizer.pad_token = tokenizer.eos_token
```
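
The exact preprocessing function is not shown in this card; below is a minimal tokenization sketch that reuses the question/answer prompt template from the generation example and the `input_text`/`output_text` columns defined above (the helper name and `max_length` are assumptions):

```python
def tokenize_example(example, max_length=512):
    # Concatenate question and answer into a single causal-LM training string
    text = (
        f"### Question:\n{example['input_text']}\n\n"
        f"### Answer:\n{example['output_text']}{tokenizer.eos_token}"
    )
    tokens = tokenizer(text, truncation=True, max_length=max_length, padding="max_length")
    # For simplicity the labels mirror the inputs; padded positions could be masked to -100
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized_train = train_ds.map(tokenize_example, remove_columns=train_ds.column_names)
tokenized_eval = eval_ds.map(tokenize_example, remove_columns=eval_ds.column_names)
```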

### Quantization
The model uses 8-bit quantization with the following configuration:
```python
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,                       # load weights in 8-bit via bitsandbytes
    llm_int8_enable_fp32_cpu_offload=True,   # allow offloading modules to CPU in fp32 when GPU memory runs short
    llm_int8_skip_modules=["lm_head"]        # keep the output head un-quantized
)
```
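
The card does not show the model-loading call itself; a sketch of how this config is typically passed to `from_pretrained` (the `device_map` choice is an assumption):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "alecocc/mistral-7B-SFT-medqa-graph-cot",
    quantization_config=bnb_config,
    device_map="auto",  # assumption: let accelerate place layers, enabling the CPU offload above
)
```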

### LoRA Configuration

LoRA adapters are applied to the attention and MLP projection layers:

```python
from peft import LoraConfig

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    inference_mode=False,
    r=16,              # rank of the low-rank update matrices
    lora_alpha=32,     # scaling factor applied to the LoRA updates
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj"       # MLP projections
    ]
)
```
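
A short sketch of attaching the adapters with standard PEFT calls (not shown verbatim in the original training script):

```python
from peft import get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # make the 8-bit model trainable (e.g. enables input gradients)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```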

### Training Arguments

The training process is configured with the following parameters:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mistral_medical_finetuned_lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # effective batch size of 16
    learning_rate=1e-4,
    num_train_epochs=3,
    logging_steps=10,
    save_strategy="epoch",
    evaluation_strategy="steps",
    eval_steps=50,
    warmup_steps=100,
    gradient_checkpointing=True,      # trade extra compute for lower activation memory
    fp16=False,
    bf16=False,
    optim="adamw_torch",
    max_grad_norm=0.3,
    weight_decay=0.01,
    remove_unused_columns=False
)
```
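
With the pieces above, training follows the standard `Trainer` pattern; this is a sketch assuming the tokenized splits and the PEFT-wrapped model from the earlier sections:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,                  # 8-bit base model wrapped with the LoRA adapters
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
)
trainer.train()
trainer.save_model("./mistral_medical_finetuned_lora")  # saves the LoRA adapter weights
```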

## Results & Expected Outcomes

1. The fine-tuned model generates improved responses for medical QA tasks.
2. LoRA enables efficient fine-tuning without updating the full set of base-model weights.
3. Gradient checkpointing and gradient accumulation keep memory usage manageable during training.

## How to Use This Model

### Load the Model and Tokenizer

You can load this fine-tuned model as follows:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your_username/mistral-7B-medqa-lora-8bit-v1")
model = AutoModelForCausalLM.from_pretrained("your_username/mistral-7B-medqa-lora-8bit-v1")
```
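
If the repository hosts only the LoRA adapter weights rather than a merged checkpoint, the adapter can instead be attached to the 8-bit base model with PEFT (a sketch; the repository id above is a placeholder):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_model = AutoModelForCausalLM.from_pretrained(
    "alecocc/mistral-7B-SFT-medqa-graph-cot",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "your_username/mistral-7B-medqa-lora-8bit-v1")
```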

### Generate Answers

To generate answers for medical questions:

```python
# The prompt mirrors the question/answer template used during fine-tuning
input_text = "### Question:\nWhat are the symptoms of diabetes?\n\n### Answer:\n"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=200)  # max_length counts prompt plus generated tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Metadata

| Field        | Value                                  |
|--------------|----------------------------------------|
| License      | Apache 2.0                             |
| Language     | English                                |
| Base Model   | alecocc/mistral-7B-SFT-medqa-graph-cot |
| Pipeline Tag | Auto-detected                          |
| Tags         | medical-QA, LoRA, 8bit, mistral        |
| Dataset      | SyntheticMedicalQA-4336                |
| Metrics      | Not yet reported (e.g., BLEU, ROUGE)   |

## Citation

If you use this model in your work, please cite it as follows:

```bibtex
@misc{mistral_medical_lora_2025,
  author = {Adrien Cohen},
  title = {Mistral-7B Fine-Tuned on SyntheticMedicalQA with LoRA},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/your_username/mistral-7B-medqa-lora-8bit-v1}
}
```