---
license: gpl-3.0
tags:
- histopathology
- generative
- diffusion
- counterfactual
- explanations
extra_gated_prompt: >-
  You agree not to use the model to conduct experiments that cause harm to
  human subjects.
extra_gated_fields:
  Name: text
  Email: text
  Affiliation: text
  Country: country
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
  I agree to use this model for non-commercial use ONLY: checkbox
  I agree not to distribute the model; if another user within my organization wishes to use the MoPaDi model, they must register as an individual user: checkbox
---

# MOrphing HistoPAthology DIffusion (MoPaDi)

[Preprint](https://www.biorxiv.org/content/10.1101/2024.10.29.620913v1) | [GitHub](https://github.com/KatherLab/mopadi) | [Cite](#citation)

### Abstract

> Deep learning can extract predictive and prognostic biomarkers from histopathology whole slide images, but its interpretability remains elusive. We develop and validate MoPaDi (Morphing histoPathology Diffusion), which generates counterfactual mechanistic explanations. MoPaDi uses diffusion autoencoders to manipulate pathology image patches and flip their biomarker status by changing the morphology. Importantly, MoPaDi includes multiple instance learning for weakly supervised problems. We validate our method on four datasets classifying tissue types, cancer types within different organs, center of slide origin, and a biomarker – microsatellite instability. Counterfactual transitions were evaluated through pathologists’ user studies and quantitative cell analysis. MoPaDi achieves excellent image reconstruction quality (multiscale structural similarity index measure 0.966–0.992) and good classification performance (AUCs 0.76–0.98). In a blinded user study for tissue-type counterfactuals, counterfactual images were realistic (63.3–73.3% of original images identified correctly). For other tasks, pathologists identified meaningful morphological features from counterfactual images. Overall, MoPaDi generates realistic counterfactual explanations that reveal key morphological features driving deep learning model predictions in histopathology, improving interpretability.
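The reconstruction-quality numbers above are multiscale SSIM (MS-SSIM) scores. As a rough, self-contained illustration of what that metric measures, the sketch below computes a single-scale SSIM globally over a whole patch in NumPy. This is a simplification: the MS-SSIM reported in the preprint additionally uses sliding windows and multiple image scales, and the random "tile" here is only a stand-in for real histology data.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-scale SSIM over whole images (no sliding window).

    Compares luminance, contrast, and structure; identical images score 1.0.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
patch = rng.random((224, 224))  # stand-in for a 224 x 224 px histology tile
noisy = np.clip(patch + 0.05 * rng.standard_normal(patch.shape), 0.0, 1.0)

print(round(global_ssim(patch, patch), 3))  # perfect reconstruction -> 1.0
print(global_ssim(patch, noisy) < 1.0)      # degraded reconstruction scores lower
```

An autoencoding diffusion model with high MS-SSIM between the original patch and its reconstruction preserves the morphology it later manipulates, which is why this metric is reported alongside classification AUCs.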


## Installation

Clone the MoPaDi repository, create a virtual environment (e.g., with conda or mamba), and install the required packages:

```shell
git clone https://github.com/KatherLab/mopadi.git && cd mopadi
mamba create -n mopadi python=3.8 -c conda-forge
mamba activate mopadi
pip install -r requirements.txt
```

Example Jupyter notebooks showing how to use these pretrained models to generate counterfactual images are provided on [GitHub](https://github.com/KatherLab/mopadi/tree/main/notebooks).

## Models

In this repository you can find the following models:
  1. Tissue classes autoencoding diffusion model (trained on 224 x 224 px tiles from the NCT-CRC-HE-100K dataset (Kather et al., 2018)) + linear 9-class classifier (adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM));
  2. Colorectal (CRC) cancer autoencoding diffusion model (trained on 512 x 512 px tiles (0.5 microns per px, MPP) from tumor regions from TCGA CRC cohort) + microsatellite instability (MSI) status MIL classifier (MSI high [MSIH] vs. nonMSIH);
  3. Breast cancer (BRCA) autoencoding diffusion model (trained on 512 x 512 px tiles (0.5 MPP) from tumor regions from TCGA BRCA cohort) + breast cancer type (invasive lobular carcinoma [ILC] vs. invasive ductal carcinoma [IDC]) and E2 center MIL classifiers;
  4. Pancancer autoencoding diffusion model (trained on 256 x 256 px tiles (varying MPP) from histology images from uniform tumor regions in TCGA WSI (Komura & Ishikawa, 2021)) + liver cancer types (hepatocellular carcinoma [HCC] vs. cholangiocarcinoma [CCA]) MIL & linear classifiers and lung cancer types (lung adenocarcinoma [LUAD] vs. lung squamous cell carcinoma [LUSC]) MIL & linear classifiers.
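The MIL classifiers listed above predict weak, slide-level labels (e.g., MSI status) from bags of tile features; the exact architecture is described in the preprint. As a generic illustration of the idea only, here is a minimal attention-based MIL pooling step in NumPy (in the style of Ilse et al.'s attention MIL); all weights and dimensions below are made up for the example:

```python
import numpy as np

def attention_mil_pool(tile_feats, w_proj, w_attn):
    """Aggregate per-tile features into one bag (slide-level) embedding.

    tile_feats: (n_tiles, d) feature vectors, e.g. one per image patch
    w_proj:     (d, k) projection for the attention branch
    w_attn:     (k,)   attention vector scoring each tile
    """
    hidden = np.tanh(tile_feats @ w_proj)             # (n_tiles, k)
    scores = hidden @ w_attn                          # (n_tiles,)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over tiles
    bag = weights @ tile_feats                        # (d,) weighted average
    return bag, weights

rng = np.random.default_rng(42)
feats = rng.standard_normal((16, 8))  # toy bag: 16 tiles, 8-dim features
bag, weights = attention_mil_pool(
    feats, rng.standard_normal((8, 4)), rng.standard_normal(4)
)
print(bag.shape, round(weights.sum(), 6))  # (8,) 1.0
```

A byproduct of this pooling is that the attention weights highlight which tiles drove the bag-level prediction, which is part of what makes MIL attractive for weakly supervised histopathology tasks.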
For examples of counterfactual images generated with the corresponding models, please refer to the [preprint](https://www.biorxiv.org/content/10.1101/2024.10.29.620913v1).


## Citation

If you find our work useful in your research or if you use parts of this code, please consider citing our [preprint](https://www.biorxiv.org/content/10.1101/2024.10.29.620913v1):

```bibtex
@misc{zigutyte2024mopadi,
  title={Counterfactual Diffusion Models for Mechanistic Explainability of Artificial Intelligence Models in Pathology},
  author={Laura Žigutytė and Tim Lenz and Tianyu Han and Katherine Jane Hewitt and Nic Gabriel Reitsam and Sebastian Foersch and Zunamys I Carrero and Michaela Unger and Alexander T Pearson and Daniel Truhn and Jakob Nikolas Kather},
  year={2024},
  eprint={2024.10.29.620913},
  archivePrefix={bioRxiv},
  url={https://www.biorxiv.org/content/10.1101/2024.10.29.620913v1},
}
```