|
---
language:
- en
- fr
- nl
- es
- it
- pl
- ro
- de
license: apache-2.0
library_name: transformers
tags:
- mergekit
- merge
- dare
- medical
- biology
- mlx
datasets:
- pubmed
base_model:
- BioMistral/BioMistral-7B
- mistralai/Mistral-7B-Instruct-v0.1
pipeline_tag: text-generation
---
|
|
|
# abhishek-ch/biomistral-7b-synthetic-ehr
|
This model was converted to MLX format from [`BioMistral/BioMistral-7B-DARE`](https://huggingface.co/BioMistral/BioMistral-7B-DARE).
|
Refer to the [original model card](https://huggingface.co/BioMistral/BioMistral-7B-DARE) for more details on the model. |
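
For reference, a conversion like this can be reproduced with the `mlx_lm` Python API. This is a minimal sketch, assuming default settings: the output path is illustrative, and whether the original conversion quantized the weights is an assumption.

```python
from mlx_lm import convert

# Minimal sketch: convert the source checkpoint to MLX format.
# quantize=True and the output path are assumptions, not a record
# of the exact conversion used for this repo.
convert(
    "BioMistral/BioMistral-7B-DARE",
    mlx_path="biomistral-7b-mlx",
    quantize=True,
)
```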
|
|
|
|
|
## Use with mlx
|
|
|
```bash
pip install mlx-lm
```
|
|
|
The model was LoRA fine-tuned on [health_facts](https://huggingface.co/datasets/health_fact) and a synthetic EHR dataset inspired by MIMIC-IV, using the prompt format below, for 1000 steps (~1M tokens) with mlx.
|
|
|
```python
def format_prompt(prompt: str, question: str) -> str:
    """Wrap a system prompt and a user question in Mistral [INST] tags."""
    return """<s>[INST]
## Instructions
{}
## User Question
{}.
[/INST]</s>
""".format(prompt, question)
```
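
For a sense of how this template feeds the LoRA run, here is a hedged sketch of one serialized training example. The `text` key follows the JSONL layout that mlx-lm's LoRA trainer reads by default; the instruction, question, and answer strings are placeholders, and where the completion sits relative to the closing `</s>` is an assumption, since the card only shows the prompt side.

```python
import json

# Hypothetical training record in the {"text": ...} JSONL layout.
# All concrete strings are illustrative placeholders.
record = {
    "text": (
        "<s>[INST]\n"
        "## Instructions\n"
        "You are an expert in providing diagnosis summaries based on clinical notes.\n"
        "## User Question\n"
        "Summarize the diagnosis for the admission note above.\n"
        "[/INST]\n"
        "Diagnosis summary: ...</s>"
    )
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```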
|
|
|
Example for EHR diagnosis:

```
prompt = """You are an expert in providing diagnosis summaries based on clinical notes, inspired by the MIMIC-IV-Note dataset.
These notes encompass the Chief Complaint along with the Patient Summary and medical admission details."""
```
|
|
|
Example for health facts check:

```
prompt = """You are a Public Health AI Assistant. You can fact-check public health claims.
Each answer is labelled true, false, unproven, or mixture.
Please provide the reasoning behind the answer."""
```
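
With one of the `prompt` strings above, a user question completes the inputs for the generation calls below; the claim here is an illustrative placeholder.

```python
# Illustrative user question for the health-facts prompt above
question = "Does vitamin C prevent the common cold?"

# Inspect the final string sent to the model
print(format_prompt(prompt, question))
```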
|
|
|
## Loading the model using `mlx`
|
|
|
```python
from mlx_lm import generate, load

# Load the model and tokenizer from the Hugging Face Hub
model, tokenizer = load("abhishek-ch/biomistral-7b-synthetic-ehr")

response = generate(
    model,
    tokenizer,
    prompt=format_prompt(prompt, question),
    verbose=True,  # set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)
```
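
With `temp=0.0`, decoding is effectively greedy, so the same prompt always yields the same output; raise the temperature for more varied generations.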
|
|
|
## Loading the model using `transformers`
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "abhishek-ch/biomistral-7b-synthetic-ehr"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.to("mps")  # Apple Silicon GPU backend

# system_prompt and question as in the examples above
input_text = format_prompt(system_prompt, question)
inputs = tokenizer(input_text, return_tensors="pt").to("mps")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0]))
```
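
`"mps"` targets the GPU on Apple Silicon; on other hardware, substitute `"cuda"` or `"cpu"` in both the `model.to(...)` call and the tokenizer's `.to(...)` call.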
|
|
|
|