Merlin: Vision Language Foundation Model for 3D Computed Tomography

Merlin is a 3D vision language model (VLM) for computed tomography that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining.

Key Graphic

⚡️ Installation

To install Merlin from PyPI, run:

pip install merlin-vlm

For an editable installation, use the following commands to clone and install this repository.

git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
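
To confirm that either installation route succeeded, one option is to query the installed distribution's metadata. The sketch below uses only the Python standard library. The helper name `installed_version` is hypothetical and not part of Merlin; `merlin-vlm` is the distribution name taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("merlin-vlm")
    print(f"merlin-vlm version: {found}" if found else "merlin-vlm is not installed")
```

If the script reports that the package is not installed, check that pip and the Python interpreter you are running belong to the same environment.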

🚀 Inference with Merlin

To create a Merlin model with both image and text embeddings enabled, use the following:

from merlin import Merlin

model = Merlin()

To initialize the model with only image embeddings active, use:

from merlin import Merlin

model = Merlin(ImageEmbedding=True)
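
When a script needs to switch between these two modes, the constructor calls above can be wrapped in a small helper. This is only a sketch: `merlin_kwargs` and `load_merlin` are hypothetical names, not part of the Merlin package, and the only constructor argument used is `ImageEmbedding=True`, exactly as shown above. The import is deferred so the module still loads when Merlin is not installed.

```python
from typing import Any, Dict


def merlin_kwargs(image_only: bool = False) -> Dict[str, Any]:
    """Build constructor keyword arguments for the two modes shown above."""
    # Default mode: no arguments, so image and text embeddings are both enabled.
    # Image-only mode: pass ImageEmbedding=True.
    return {"ImageEmbedding": True} if image_only else {}


def load_merlin(image_only: bool = False):
    """Instantiate Merlin in the requested mode (requires merlin-vlm)."""
    from merlin import Merlin  # deferred import; raises ImportError if missing

    return Merlin(**merlin_kwargs(image_only))
```

Calling `load_merlin()` corresponds to `Merlin()`, and `load_merlin(image_only=True)` corresponds to `Merlin(ImageEmbedding=True)`.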

For inference on a demo CT scan, please check out the demo.

For additional information, please read the documentation.

📎 Citation

If you find this repository useful for your work, please cite the original paper:

@article{blankemeier2024merlin,
  title={Merlin: A vision language foundation model for 3d computed tomography},
  author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
  journal={Research Square},
  pages={rs--3},
  year={2024}
}