---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Region2Vec ChIP-atlas hg38

## Model Details

### Model Description

This is a Region2Vec model trained on the hg38 ChIP-atlas ATAC-seq data.

- **Developed by:** Nathan LeRoy
- **Model type:** Region2Vec
- **Language(s) (NLP):** hg38

### Model Sources

- **Repository:** https://github.com/databio/geniml
- **Paper:** https://academic.oup.com/bioinformatics/article/37/23/4299/6307720

## Uses

This model generates embeddings of genomic regions or region sets. The embeddings can be used directly for clustering, classification, or search and retrieval tasks. The model is limited to the hg38 genome assembly, and it is not recommended for data other than ATAC-seq.
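
As an illustration of the search and retrieval use case, here is a minimal sketch that ranks embeddings by cosine similarity to a query vector. It does not call the model: the random array is a placeholder for real embeddings, and the `cosine_search` helper, the array shape, and the `top_k` parameter are assumptions made for this example, not part of geniml.

```python
import numpy as np


def cosine_search(query, embeddings, top_k=3):
    """Return indices and scores of the top_k rows most similar to query."""
    # Normalize rows and the query so the dot product equals cosine similarity.
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = emb_norm @ q_norm
    # Sort by descending similarity and keep the best top_k hits.
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Placeholder data standing in for real embeddings (hypothetical shape).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 100))

# Use one stored embedding as the query; it should be its own best match.
query = embeddings[7]
indices, scores = cosine_search(query, embeddings)
print(indices, scores)
```

Normalizing before the dot product keeps the ranking independent of vector length, which is a common choice for embedding search but not one this model card prescribes.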

## How to Get Started with the Model

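You can download the model and start encoding new genomic region data with the following code:

```python
from geniml.region2vec.experimental import Region2VecExModel

# Load the pretrained model by its Hugging Face identifier.
model = Region2VecExModel("databio/r2v-ChIP-atlas-v2")

# Encode the regions in a BED file (replace with the path to your own file).
embeddings = model.encode("path/to/file.bed")

print(embeddings.shape)
```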
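
If you need a single vector for a whole region set, one common approach is to average the per-region embeddings. The sketch below assumes the embeddings form a 2D array with one row per region, which this card does not confirm; the `pool_region_embeddings` helper and the toy values are hypothetical and are not part of geniml.

```python
import numpy as np


def pool_region_embeddings(region_embeddings):
    """Average per-region vectors into one region-set vector (mean pooling)."""
    return region_embeddings.mean(axis=0)


# Toy stand-in for per-region embeddings: 3 regions, 2 dimensions (assumed layout).
region_embeddings = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

set_embedding = pool_region_embeddings(region_embeddings)
print(set_embedding.shape)
```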


## Training Details

### Training Data

TODO