QBind: QLoRA for ESM-2 Binding Sites Prediction

updated Nov 21, 2023

QLoRA adapters for several ESM-2 models, trained to predict binding sites in protein sequences.

AmelieSchreiber/esm2_t33_650M_qlora_binding_16M

Updated Nov 21, 2023

Note: A QLoRA trained on ~16M protein sequences with binding site annotations from UniProt.

AmelieSchreiber/esm2_t33_650M_qlora_binding_12M

Updated Nov 1, 2023

Note: A QLoRA trained on ~12M protein sequences with binding site annotations from UniProt.

AmelieSchreiber/esm2_t6_8m_qlora_binding_sites_v0

Updated Oct 4, 2023

Note: This model uses adapters only for the query, key, and value weight matrices. It is not overfit, but it shows more signs of overfitting than a model that uses more QLoRA adapter layers.

AmelieSchreiber/esm2_t12_35M_qlora_binding_sites_v0

Updated Sep 29, 2023

Note: This model uses adapters only for the query, key, and value weight matrices, so it is more overfit than a model that uses more adapter layers.

AmelieSchreiber/esm2_t6_8m_qlora_binding_sites_v1

Updated Sep 30, 2023

Note: This model overfits less because more of its weight matrices are adapted with QLoRA.

AmelieSchreiber/esm2_t12_35M_qlora_binding_sites_v1

Updated Oct 6, 2023

Note: This model overfits less because more of its weight matrices are adapted with QLoRA.

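The notes above contrast models that adapt only the query, key, and value matrices with models that adapt more weight matrices. As a rough illustration of what that difference means in trainable parameters, here is a minimal sketch. It assumes a LoRA rank of 8 and the layer sizes commonly reported for the 8M ESM-2 model (hidden size 320, feed-forward size 1280, 6 layers). The extended layout (`QKV_PLUS_DENSE`) is hypothetical, since the collection does not say which extra matrices the v1 models adapt.

```python
# Sketch: count the extra trainable parameters LoRA adds for two adapter layouts.
# Assumptions (not stated in the collection): rank 8, and the layer sizes below.

HIDDEN = 320      # assumed hidden size of the 8M ESM-2 model
FFN = 1280        # assumed feed-forward (intermediate) size
NUM_LAYERS = 6    # assumed number of transformer layers
RANK = 8          # hypothetical LoRA rank

# (d_in, d_out) for each adapted weight matrix in one transformer layer.
QKV_ONLY = {
    "query": (HIDDEN, HIDDEN),
    "key": (HIDDEN, HIDDEN),
    "value": (HIDDEN, HIDDEN),
}
# Hypothetical wider layout: also adapt the attention output and feed-forward layers.
QKV_PLUS_DENSE = {
    **QKV_ONLY,
    "attention_output": (HIDDEN, HIDDEN),
    "ffn_in": (HIDDEN, FFN),
    "ffn_out": (FFN, HIDDEN),
}


def lora_params(shapes, rank=RANK, num_layers=NUM_LAYERS):
    """Each adapted matrix gains A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


qkv_count = lora_params(QKV_ONLY)
full_count = lora_params(QKV_PLUS_DENSE)
print(qkv_count, full_count)  # prints 92160 276480
```

Under these assumptions the wider layout trains three times as many adapter parameters. The count only shows what changes between the two setups; it does not by itself explain the difference in overfitting reported in the notes.
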
AmelieSchreiber/esm2_t12_35M_qlora_binding_2600K_cp1

Updated Oct 6, 2023

AmelieSchreiber/600K_binding_sites

Updated Oct 1, 2023

Note: This dataset is curated from UniProt. The test set was created by holding out entire protein families selected at random. The train/test split is approximately 80/20. All binding site and active site annotations were merged. All sequences longer than 1000 amino acids were split into non-overlapping chunks of 1000 residues or fewer.

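The family-level split described in this note can be sketched as follows. This is only an illustration of the idea, not the author's actual curation script: the `(family_id, sequence)` record layout, the function name `split_by_family`, and the greedy fill strategy are all assumptions.

```python
import random


def split_by_family(records, test_fraction=0.2, seed=0):
    """Hold out whole families so no family appears in both train and test.

    `records` is a list of (family_id, sequence) pairs. This layout is
    hypothetical and does not reflect the dataset's actual schema.
    """
    families = sorted({family for family, _ in records})
    rng = random.Random(seed)
    rng.shuffle(families)

    total = len(records)
    test_families = set()
    test_size = 0
    # Add randomly ordered families until the test set holds
    # roughly test_fraction of all records.
    for family in families:
        if test_size >= test_fraction * total:
            break
        test_families.add(family)
        test_size += sum(1 for f, _ in records if f == family)

    train = [r for r in records if r[0] not in test_families]
    test = [r for r in records if r[0] in test_families]
    return train, test


# Toy data: 10 families with 10 sequences each.
records = [(f"fam{i % 10}", f"SEQ{i}") for i in range(100)]
train, test = split_by_family(records)
print(len(train), len(test))  # prints 80 20
```

Splitting by family rather than by individual sequence keeps close homologs of test proteins out of the training set, which is why the resulting ratio is only approximately 80/20 when family sizes vary.
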
AmelieSchreiber/1111K_binding_sites

Updated Oct 1, 2023

Note: This dataset is curated from UniProt. The test set was created by holding out entire protein families selected at random. The train/test split is approximately 80/20. All binding site and active site annotations were merged. All sequences longer than 1000 amino acids were split into non-overlapping chunks of 1000 residues or fewer.

AmelieSchreiber/2600K_binding_sites

Updated Oct 1, 2023

Note: This dataset is curated from UniProt. The test set was created by holding out entire protein families selected at random. The train/test split is approximately 80/20. All binding site and active site annotations were merged. All sequences longer than 1000 amino acids were split into non-overlapping chunks of 1000 residues or fewer.

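The chunking step mentioned in the dataset notes (non-overlapping chunks of at most 1000 residues) might look like the sketch below. The per-residue 0/1 label list and the function name `chunk_sequence` are assumptions made for illustration. The datasets' actual label encoding is not described in the notes.

```python
def chunk_sequence(sequence, labels, max_len=1000):
    """Split a sequence and its per-residue labels into non-overlapping chunks.

    Sequences of max_len residues or fewer come back as a single chunk.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    return [
        (sequence[start:start + max_len], labels[start:start + max_len])
        for start in range(0, len(sequence), max_len)
    ]


# Toy example: a 2300-residue sequence with one annotated site at position 1500.
sequence = "M" * 2300
labels = [0] * 2300
labels[1500] = 1
chunks = chunk_sequence(sequence, labels)
print([len(seq) for seq, _ in chunks])  # prints [1000, 1000, 300]
```

Slicing the labels with the same offsets as the sequence keeps each annotation aligned with its residue, so the site at position 1500 ends up at index 500 of the second chunk.
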
