---
language:
- en
- fr
- nl
- es
- it
- pl
- ro
- de
license: apache-2.0
library_name: transformers
tags:
- mergekit
- merge
- dare
- medical
- biology
- mlx
datasets:
- pubmed
base_model:
- BioMistral/BioMistral-7B
- mistralai/Mistral-7B-Instruct-v0.1
pipeline_tag: text-generation
---

# abhishek-ch/biomistral-7b-synthetic-ehr
This model was converted to MLX format from [`BioMistral/BioMistral-7B-DARE`](https://huggingface.co/BioMistral/BioMistral-7B-DARE).
Refer to the [original model card](https://huggingface.co/BioMistral/BioMistral-7B-DARE) for more details on the model.


## Use with mlx

```bash
pip install mlx-lm
```
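
mlx-lm also installs a small command-line generator, which is handy for a quick smoke test before writing any Python. Flags can differ between mlx-lm releases, so treat this as a sketch:

```bash
# Quick sanity check from the shell (flags may vary across mlx-lm versions)
python -m mlx_lm.generate \
  --model abhishek-ch/biomistral-7b-synthetic-ehr \
  --prompt "[INST] What are common causes of acute chest pain? [/INST]" \
  --max-tokens 256
```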

The model was LoRA fine-tuned on [health_facts](https://huggingface.co/datasets/health_fact) and a
synthetic EHR dataset inspired by MIMIC-IV, using the prompt format below, for 1000 steps (~1M tokens) with mlx.

```python
def format_prompt(prompt: str, question: str) -> str:
    # Wrap the system prompt and user question in Mistral's [INST] instruction format.
    return """<s>[INST]
## Instructions
{}
## User Question
{}.
[/INST]</s>
""".format(prompt, question)
```

Example for EHR Diagnosis
```
Prompt = """You are an expert in providing diagnosis summaries based on clinical notes, inspired by the MIMIC-IV-Note dataset.
These notes encompass the Chief Complaint along with the Patient Summary & medical admission details."""
```

Example for Healthfacts Check
```
Prompt: You are a Public Health AI Assistant. You can do the fact-checking of public health claims. \nEach answer is labelled with true, false, unproven or mixture. \nPlease provide the reason behind the answer.
```
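
As a quick illustration (not part of the original card), the fact-checking prompt above can be paired with a claim via `format_prompt`; the claim used here is a hypothetical placeholder:

```python
# Hypothetical usage sketch: pair the Healthfacts system prompt with a claim to check.
prompt = (
    "You are a Public Health AI Assistant. You can do the fact-checking of public health claims. \n"
    "Each answer is labelled with true, false, unproven or mixture. \n"
    "Please provide the reason behind the answer."
)
question = "Vitamin C megadoses cure the common cold"  # placeholder claim

full_prompt = format_prompt(prompt, question)
print(full_prompt)  # the string that gets passed to the model
```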

## Loading the model using `mlx`

```python
from mlx_lm import generate, load
model, tokenizer = load("abhishek-ch/biomistral-7b-synthetic-ehr")
# `prompt` and `question` are the system prompt and user question shown above.
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(prompt, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)
```
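
Note that the `generate` keyword arguments have changed across mlx-lm releases; newer versions move sampling settings such as temperature out of `generate` and into a separate sampler object, so adjust the call to match the version you have installed.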

## Loading the model using `transformers`

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "abhishek-ch/biomistral-7b-synthetic-ehr"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.to("mps")  # Apple Silicon GPU; see the device note below

# Reuse `prompt` and `question` from the examples above.
input_text = format_prompt(prompt, question)
input_ids = tokenizer(input_text, return_tensors="pt").to("mps")

outputs = model.generate(
    **input_ids,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0]))
```
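
The snippet above targets Apple's `mps` backend. If you are running elsewhere, a small device-selection fallback (a minimal sketch, not part of the original card) keeps the same code working on CUDA or CPU:

```python
import torch

# Pick the best available device: CUDA GPU, Apple Silicon (mps), or CPU fallback.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

model.to(device)
input_ids = tokenizer(input_text, return_tensors="pt").to(device)
```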