zhangzhi committed on
Commit
9d12ea1
·
verified ·
1 Parent(s): fe0950e

Update README.md

  1. README.md +24 -7
README.md CHANGED
@@ -6,17 +6,17 @@ license: cc-by-nc-nd-4.0
6
 
7
  A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks
8
 
9
- [[Huggingface](https://huggingface.co/ChatterjeeLab/PTM-Mamba)] [[Github](https://github.com/programmablebio/ptm-mamba)]
10
 
 
11
 
12
- <img src="https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/joOVN6BR3CppDSRKBqWxj.png" width="300" height="300">
13
- Figure generated by DALL-E 3 with prompt "A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks".
14
 
15
  ## Install Enviroment
16
 
17
  ### Docker
18
 
19
- Setting up env for mamba could be a pain, alternatively we suggest using docker containers.
20
 
21
  #### Run container in interactive and detach mode, and mounte project dir to the container workspace.
22
 
@@ -43,11 +43,11 @@ pip install -e protein_lm/tokenizer/rust_trie
43
 
44
  ## Data
45
 
46
- We collect protein sequences and their PTM annotations from Uniprot-Swissprot. The PTM annotations are represented as tokens and used to replaced the corresponding amino acids. The data can be downloaded from [here](https://drive.google.com/file/d/151KUp79tgBxphoIky1-ohyuvzIS1gtNS/view?usp=drive_link). Please place the data on `protein_lm/dataset/`.
47
 
48
  ## Configs
49
 
50
- The training and testing configs are `protein_lm/configs`. We provide a basic training config at `protein_lm/configs/train/base.yaml`.
51
 
52
  ## Training
53
 
@@ -57,7 +57,7 @@ The training and testing configs are `protein_lm/configs`. We provide a basic tr
57
  python ./protein_lm/modeling/scripts/train.py +train=base
58
  ```
59
 
60
- The commond will use the configs in `protein_lm/configs/train/base.yaml`.
61
 
62
  ##### Multi-GPU Training
63
 
@@ -109,3 +109,20 @@ This project is based on the following codebase. Please give them a star if you
109
 
110
  - [OpenBioML/protein-lm-scaling (github.com)](https://github.com/OpenBioML/protein-lm-scaling)
111
  - [state-spaces/mamba (github.com)](https://github.com/state-spaces/mamba)
6
 
7
  A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks
8
 
9
+ [[Huggingface](https://huggingface.co/ChatterjeeLab/PTM-Mamba)] [[Github](https://github.com/programmablebio/ptm-mamba)] [[Paper](https://www.biorxiv.org/content/10.1101/2024.02.28.581983v1)]
10
 
11
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6430c79620265810703d3986/7QdA6MZ6OTmNHuwyDqFnN.png" width="300" height="300">
12
 
13
+ > Figure generated by Dalle-3 with prompt "A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks".
 
14
 
15
  ## Install Enviroment
16
 
17
  ### Docker
18
 
19
+ Setting up env for mamba could be a pain, alternatively, we suggest using docker containers.
20
 
21
  #### Run container in interactive and detach mode, and mounte project dir to the container workspace.
22
 
 
43
 
44
  ## Data
45
 
46
+ We collect protein sequences and their PTM annotations from Uniprot-Swissprot. The PTM annotations are represented as tokens and used to replace the corresponding amino acids. The data can be downloaded from [here](https://drive.google.com/file/d/151KUp79tgBxphoIky1-ohyuvzIS1gtNS/view?usp=drive_link). Please place the data in `protein_lm/dataset/`.
47
 
48
  ## Configs
49
 
50
+ The training and testing configs are in `protein_lm/configs`. We provide a basic training config at `protein_lm/configs/train/base.yaml`.
51
 
52
  ## Training
53
 
 
57
  python ./protein_lm/modeling/scripts/train.py +train=base
58
  ```
59
 
60
+ The command will use the configs in `protein_lm/configs/train/base.yaml`.
61
 
62
  ##### Multi-GPU Training
63
 
 
109
 
110
  - [OpenBioML/protein-lm-scaling (github.com)](https://github.com/OpenBioML/protein-lm-scaling)
111
  - [state-spaces/mamba (github.com)](https://github.com/state-spaces/mamba)
112
+
113
+ ## Citation
114
+ Please cite our paper if you enjoy our code :)
115
+ ```
116
+ @article {Peng2024.02.28.581983,
117
+ author = {Zhangzhi Peng and Benjamin Schussheim and Pranam Chatterjee},
118
+ title = {PTM-Mamba: A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks},
119
+ elocation-id = {2024.02.28.581983},
120
+ year = {2024},
121
+ doi = {10.1101/2024.02.28.581983},
122
+ publisher = {Cold Spring Harbor Laboratory},
123
+ URL = {https://www.biorxiv.org/content/early/2024/02/29/2024.02.28.581983},
124
+ eprint = {https://www.biorxiv.org/content/early/2024/02/29/2024.02.28.581983.full.pdf},
125
+ journal = {bioRxiv}
126
+ }
127
+
128
+ ```
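
The Data section in this README says that PTM annotations are represented as tokens that replace the corresponding amino acids. A minimal sketch of that substitution is shown below. The function name `apply_ptm_tokens`, the 0-based position convention, and the token string `<Phosphoserine>` are illustrative assumptions, not the repository's actual tokenizer or vocabulary.

```python
from typing import Dict, List


def apply_ptm_tokens(sequence: str, ptm_sites: Dict[int, str]) -> List[str]:
    """Return the sequence as a token list in which each annotated residue
    is replaced by its PTM token.

    `ptm_sites` maps a 0-based residue position to a PTM token string.
    Both the position convention and the token names are assumptions
    made for this sketch.
    """
    tokens = list(sequence)  # one token per amino acid
    for position, ptm_token in ptm_sites.items():
        if not 0 <= position < len(tokens):
            raise IndexError(f"PTM position {position} is outside the sequence")
        tokens[position] = ptm_token  # the PTM token replaces the residue
    return tokens


# Hypothetical example: the serine at position 2 carries a phosphorylation.
example = apply_ptm_tokens("MKSTY", {2: "<Phosphoserine>"})
print(example)  # ['M', 'K', '<Phosphoserine>', 'T', 'Y']
```

The token list keeps the same length as the input sequence, so position-wise alignment with the unmodified protein is preserved.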