---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for databio/r2v-mouse-atlas-mm9-v2

## Model Details

This is a single-cell Region2Vec (r2v) model designed to be used with scEmbed. It was trained on a [single-cell mouse atlas dataset](https://www.cell.com/cell/fulltext/S0092-8674(18)30855-9). Use this model to generate embeddings of single cells from scATAC-seq experiments. It produces a 100-dimensional embedding for each cell.

### Model Sources

- **Repository:** https://github.com/databio/geniml
- **Paper:** https://www.biorxiv.org/content/10.1101/2023.08.01.551452v1

## Uses

This model produces low-dimensional embeddings of single cells. These embeddings can be used for downstream clustering or classification tasks.

## Bias, Risks, and Limitations

The [mouse atlas dataset](https://www.cell.com/cell/fulltext/S0092-8674(18)30855-9) profiled genome-wide chromatin accessibility in ∼100,000 single cells from 13 adult mouse tissues. Reads from these experiments were aligned to mm9, so this model should only be used with data that is also aligned to mm9.

### Recommendations

If fine-tuning on your own data, we recommend 100 epochs, although fewer may be sufficient.

## How to Get Started with the Model

You can use the `geniml` Python library to download this model and start encoding your single-cell data:

```python
import scanpy as sc
from geniml.scembed import ScEmbed

# Load the scATAC-seq data as an AnnData object
adata = sc.read_h5ad("path/to/adata.h5ad")

# Download the pretrained model and embed each cell
model = ScEmbed("databio/r2v-mouse-atlas-mm9-v2")
embeddings = model.encode(adata)
```

## Training Details

### Training Data

The data for this model come from [Cusanovich2018](https://www.cell.com/cell/fulltext/S0092-8674(18)30855-9).
These data define the in vivo landscape of the regulatory genome for common mammalian cell types at single-cell resolution.
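
The embeddings are intended for downstream clustering or classification. As a rough illustration of the classification case, the sketch below assigns a query cell to the label whose mean embedding (centroid) is most similar by cosine similarity. It is not part of `geniml`: the helper names, the toy 3-dimensional vectors (standing in for the model's 100-dimensional embeddings), and the tissue labels are all assumptions made for the example.

```python
import math
from collections import defaultdict


def centroids_by_label(embeddings, labels):
    """Average the embeddings that share a label into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(query, centroids):
    """Return the label whose centroid is most similar to the query embedding."""
    return max(centroids, key=lambda label: cosine_similarity(query, centroids[label]))


# Toy vectors standing in for embeddings of annotated reference cells.
reference = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.2],
    [0.1, 0.8, 0.3],
]
reference_labels = ["liver", "liver", "lung", "lung"]  # hypothetical annotations

centroids = centroids_by_label(reference, reference_labels)
predicted = classify([0.95, 0.15, 0.05], centroids)
print(predicted)
```

In practice the reference vectors would be embeddings of annotated cells and the query an embedding of an unannotated cell, both produced by the same model. A nearest-centroid rule is only one simple choice; any standard classifier or clustering method can be applied to the embeddings instead.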