
Model Card for ConceptCLIP

Model Details

Model Description

ConceptCLIP is a large-scale vision-language model pre-trained with medical concept supervision across diverse medical image modalities. Its concept-enhanced language-image alignment yields robust performance on a wide range of medical imaging tasks.

  • Developed by: Yuxiang Nie, Sunan He, Yequan Bie, Yihui Wang, Zhixuan Chen, Shu Yang, Hao Chen
  • Model type: Vision-Language Pre-trained Model (Medical Specialized)
  • Language(s): English (text), Multi-modal (medical imaging)
  • License: MIT
  • Finetuned from model: OpenCLIP

Model Sources

Uses

Direct Use

  • Zero-shot medical image classification
  • Cross-modal retrieval
  • Zero-shot concept annotation
  • Feature extraction for whole-slide image analysis
  • Feature extraction for medical report generation (see the feature-extraction sketch after this list)
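
The feature-extraction workflow mirrors the quick-start example further down: run the same forward pass and keep the image embeddings instead of the classification scores. The snippet below is a minimal sketch and assumes the remote code exposes image_features on the model output, exactly as in that example; the text prompt is a placeholder, passed only because the forward call takes both modalities.

from transformers import AutoModel, AutoProcessor
import torch
from PIL import Image

model = AutoModel.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)

# Encode an image (e.g., a whole-slide tile) into an embedding for a downstream pipeline.
image = Image.open('example_data/chest_X-ray.jpg').convert('RGB')
inputs = processor(
    images=image,
    text=['a medical image'],  # placeholder prompt; only the image embedding is kept
    return_tensors='pt',
    padding=True,
    truncation=True
).to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

image_embedding = outputs.image_features  # shape (1, embed_dim); feed this to a downstream head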

Downstream Use

  • Fine-tuning on specific medical imaging modalities (CT, MRI, X-ray) for classification and visual question answering (a linear-probe sketch follows this list)
  • Concept bottleneck models for explainable predictions
  • Integration into clinical decision support systems
  • Medical education and training tools
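
A lightweight way to adapt ConceptCLIP to a downstream classification task is a linear probe on frozen, precomputed image features. The sketch below is a generic illustration, not the paper's fine-tuning recipe; embed_dim and num_classes are placeholders that must match the loaded checkpoint and the target task.

import torch
import torch.nn as nn

# Linear probe on frozen ConceptCLIP image features.
# embed_dim and num_classes are placeholders (not specified by this model card);
# check outputs.image_features.shape[-1] for the real embedding size.
embed_dim, num_classes = 512, 5
probe = nn.Linear(embed_dim, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(image_features: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step; image_features has shape (batch, embed_dim)."""
    logits = probe(image_features)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()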

Out-of-Scope Use

  • Direct clinical diagnosis without clinical validation
  • Non-medical image analysis
  • General-purpose vision tasks outside the medical domain

Bias, Risks, and Limitations

  • Trained primarily on medical imaging data, which may contain demographic biases
  • Performance may vary across different medical imaging modalities
  • Should not be used as a sole diagnostic tool without human oversight

Recommendations

  • Validate outputs with clinical experts before medical decision making
  • Fine-tune on domain-specific data for specialized applications
  • Conduct bias analysis when deploying in new clinical environments

How to Get Started with the Model

from transformers import AutoModel, AutoProcessor
import torch
from PIL import Image

# Load the model and its processor; trust_remote_code is required because the
# repository ships custom modeling code.
model = AutoModel.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)

# Zero-shot classification: score one image against a set of candidate text prompts.
image = Image.open('example_data/chest_X-ray.jpg').convert('RGB')
labels = ['chest X-ray', 'brain MRI', 'skin lesion']
texts = [f'a medical image of {label}' for label in labels]

inputs = processor(
    images=image,
    text=texts,
    return_tensors='pt',
    padding=True,
    truncation=True
).to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    # Scale the image-text cosine similarities by the learned temperature and
    # normalize over the candidate labels.
    logits = (outputs.logit_scale * outputs.image_features @ outputs.text_features.t()).softmax(dim=-1)[0]

print({label: f"{prob:.2%}" for label, prob in zip(labels, logits)})

Training Details

Training Data

  • Large-scale medical image-text pairs with concept information

Training Procedure

  • Built on the OpenCLIP architecture with medical concept integration
  • Pre-trained with image-text alignment (IT-Align) and patch-concept alignment (PC-Align) objectives (a schematic of the alignment loss follows)
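
Given the SigLIP vision backbone listed under the hyperparameters below, the image-text alignment objective can be pictured as a pairwise sigmoid contrastive loss over the in-batch similarity matrix. The function here is a schematic of that idea, not the paper's exact IT-Align or PC-Align formulation.

import torch
import torch.nn.functional as F

def sigmoid_alignment_loss(image_features, text_features, logit_scale, logit_bias):
    # Schematic SigLIP-style pairwise loss: matched image-text pairs (the diagonal)
    # are pushed toward +1, all mismatched pairs toward -1.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = logit_scale * image_features @ text_features.t() + logit_bias
    targets = 2.0 * torch.eye(logits.size(0), device=logits.device) - 1.0
    return -F.logsigmoid(targets * logits).sum() / logits.size(0)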

Training Hyperparameters

  • Base architecture: SigLIP-ViT-400M-16 + PubMedBERT
  • Training regime: Mixed precision training
  • Batch size: 12,288 (without PC-Align), 6,144 (with PC-Align)
  • Learning rate: 5e-4 (without PC-Align), 3e-4 (with PC-Align)

Evaluation

Testing Data & Metrics

Testing Data

  • Evaluated on multiple open-source medical imaging benchmarks spanning medical image diagnosis, cross-modal retrieval, medical visual question answering, medical report generation, whole-slide image analysis, and explainable AI

Citation

BibTeX:

@article{nie2025conceptclip,
  title={ConceptCLIP: Towards Trustworthy Medical AI via Concept-Enhanced Language-Image Pre-training},
  author={Nie, Yuxiang and He, Sunan and Bie, Yequan and Wang, Yihui and Chen, Zhixuan and Yang, Shu and Chen, Hao},
  journal={arXiv preprint arXiv:2501.xxxxx},
  year={2025}
}

APA:

[More Information Needed]

Model Card Contact

Yuxiang Nie: [email protected]
