Fine-tuned on a stratified sample of 10K instruction-question-answer triplets gathered from the following sources:
- Medical Meadow flashcards
- Medical Meadow WikiDoc
- HealthcareMagic dataset
- Medical Meadow MedQA MCQs
- MedInstruct dataset
- MedQuAD dataset
- iCliniq dataset
- Medical Meadow patient information dataset
- GenMedGPT dataset
Fine-tuned using LoRA for 1 epoch, with rank=64 and alpha=16.
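The LoRA configuration above can be sketched with the PEFT library. This is a minimal sketch, not the exact training setup: only rank=64 and alpha=16 are stated in this card, so the dropout and target modules below are assumptions.

```python
from peft import LoraConfig

# Minimal LoRA sketch matching the stated hyperparameters (rank=64, alpha=16).
# lora_dropout and target_modules are assumptions, not stated in this card --
# adjust them to match your own training setup.
lora_config = LoraConfig(
    r=64,                  # LoRA rank, as stated above
    lora_alpha=16,         # LoRA alpha, as stated above
    lora_dropout=0.05,     # assumption
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)
```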
Usage:
Load using vLLM as follows:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="jiviadmin/biomistral-ft-10k")
sampling_params = SamplingParams(
    max_tokens=1,  # set this to the same value as max_seq_length in the SFT Trainer
    temperature=0.1,
    skip_special_tokens=True,
    repetition_penalty=1.5,
)

input_data = <YOUR-INPUT-PROMPTS-AS-A-LIST>

# The prompt template is the same as the training one, just without the output
# part; you can add special tokens such as [INST] if needed.
TEMPLATE = """{}"""

def add_prompt(sample):
    return TEMPLATE.format(sample)

prompts = [add_prompt(sample) for sample in input_data]

# Batch inference
outputs = llm.generate(prompts, sampling_params)
outputs_ls = [output.outputs[0].text.strip() for output in outputs]
```
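As the template comment notes, special tokens can be added around each question. A minimal sketch, assuming Mistral-style `[INST]` tags (a hypothetical wrapper; verify it against the exact prompt format used during fine-tuning):

```python
# Hypothetical instruction wrapper using Mistral-style [INST] tags.
# Match this to the exact prompt used during fine-tuning.
TEMPLATE = "[INST] {} [/INST]"

def add_prompt(sample: str) -> str:
    return TEMPLATE.format(sample)

prompt = add_prompt("What are the common symptoms of iron-deficiency anemia?")
# prompt == "[INST] What are the common symptoms of iron-deficiency anemia? [/INST]"
```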
Benchmarks:
| Model | Prompt Type | Temp | Repetition Penalty | Overall Accuracy | PubMed | MedQA | MedMCQA | PubMed questions count | MedQA questions count | MedMCQA questions count |
|---|---|---|---|---|---|---|---|---|---|---|
| Biomistral - FT 10K | No RAG | 0.1 | 1.5 | 44.19% | 51.36% | 38.29% | 43.58% | 847 | 935 | 888 |
| Biomistral - FT 10K | RAG: highest-scoring chunk selected | 0.1 | 1.5 | 82.08% | 95.89% | 67.68% | 82.48% | 998 | 984 | 959 |
| Biomistral - FT 10K | RAG: reranker (BGE V2 M3) used to select chunk | 0.1 | 1.5 | 86.44% | 97.50% | 73.28% | 88.47% | 999 | 988 | 971 |
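The two RAG rows above differ only in how the context chunk is chosen: either the highest-scoring retrieved chunk, or the chunk ranked first by the BGE V2 M3 reranker. Either way, the selection step reduces to an argmax over scores. A minimal sketch with made-up scores (in practice the scores come from the retriever or from the BGE reranker, neither of which is shown here):

```python
def select_top_chunk(chunks, scores):
    """Return the chunk with the highest score (retriever or reranker)."""
    best_idx = max(range(len(chunks)), key=lambda i: scores[i])
    return chunks[best_idx]

# Example with made-up scores: the reranker may reorder the retriever's ranking.
chunks = ["chunk A", "chunk B", "chunk C"]
retriever_scores = [0.82, 0.71, 0.65]  # retriever would pick "chunk A"
reranker_scores = [0.12, 0.91, 0.30]   # reranker picks "chunk B" instead
```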