Merlin: Vision Language Foundation Model for 3D Computed Tomography


Merlin is a 3D vision language model (VLM) for computed tomography that is pretrained on both structured electronic health records (EHR) and unstructured radiology reports. This Hugging Face repository provides the model weights and an example image file.

[💻 GitHub] [📄 Paper]

⚡️ Installation

To install Merlin, you can simply run:

pip install merlin-vlm

For an editable installation, clone the GitHub repository and install it from the local checkout:

git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
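After either install route, a quick check can confirm that the distribution is visible to the current Python environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the only thing taken from this README is the distribution name `merlin-vlm`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "merlin-vlm") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not part of Merlin.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("merlin-vlm is not installed in this environment")
    else:
        print(f"merlin-vlm version: {found}")
```

If the check reports that the package is missing, the most likely cause is that `pip` installed into a different environment than the one running the script.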

For usage instructions, please visit the GitHub repository.

📁 Project Structure:

.
├── README.md
├── i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt <Merlin weights>
├── image1.nii.gz <Sample Image>
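Once the repository has been downloaded, it can help to verify that the files listed above are actually present before trying to use them. The sketch below assumes a local copy of this repository at a path you supply; the function name `check_repo_files` is hypothetical, and only the file names come from the listing above. It does not load the weights or the image.

```python
from pathlib import Path
from typing import Dict

# File names copied from the project structure listing above.
EXPECTED_FILES = (
    "README.md",
    "i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt",
    "image1.nii.gz",
)


def check_repo_files(repo_dir: str) -> Dict[str, bool]:
    """Map each expected file name to whether it exists under repo_dir.

    `check_repo_files` is an illustrative helper, not part of Merlin.
    """
    root = Path(repo_dir)
    return {name: (root / name).is_file() for name in EXPECTED_FILES}


if __name__ == "__main__":
    # "." is a placeholder; point this at your local copy of the repository.
    for name, present in check_repo_files(".").items():
        print(f"{'ok     ' if present else 'MISSING'}  {name}")
```

A missing weights file usually means the download was interrupted or that large files were not fetched along with the rest of the repository.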

📎 Citation

If you find this repository useful for your work, please cite the original paper:

@article{blankemeier2024merlin,
  title={Merlin: A vision language foundation model for 3d computed tomography},
  author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
  journal={Research Square},
  pages={rs--3},
  year={2024}
}