---
license: cc-by-nc-nd-4.0
---

# PepDoRA: A Unified Peptide-Specific Language Model via Weight-Decomposed Low-Rank Adaptation

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/fzsxEjCdBJfKa6T44Tjc8.png)

In this work, we introduce **PepDoRA**, a SMILES transformer obtained by fine-tuning the state-of-the-art [ChemBERTa-77M-MLM](https://huggingface.co/DeepChem/ChemBERTa-77M-MLM) transformer on modified peptide SMILES via [DoRA](https://nbasyl.github.io/DoRA-project-page/), a novel parameter-efficient fine-tuning (PEFT) method that incorporates weight decomposition. The resulting embeddings can be leveraged for numerous downstream tasks, including membrane permeability prediction and target binding assessment, for both unmodified and modified peptide sequences.
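
For readers unfamiliar with the weight decomposition that DoRA refers to, the following is a minimal sketch of the general idea, not the training code used for PepDoRA. DoRA splits a pretrained weight into a magnitude vector and a direction matrix, and applies a low-rank (LoRA-style) update to the direction only. The matrix sizes, the rank, and the helper name `dora_weight` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes only (assumption), not PepDoRA's real dimensions
d, k, r = 8, 6, 2

W0 = torch.randn(d, k)             # frozen pretrained weight
m = W0.norm(dim=0, keepdim=True)   # trainable magnitude, one value per column, shape (1, k)
B = torch.zeros(d, r)              # low-rank factor, zero-initialized so the update starts at zero
A = torch.randn(r, k) * 0.01       # low-rank factor


def dora_weight(W0, m, B, A):
    """Recompose the adapted weight as magnitude times unit-norm direction."""
    V = W0 + B @ A                               # direction, shifted by the low-rank update
    return m * V / V.norm(dim=0, keepdim=True)   # normalize each column, then rescale by m


W_adapted = dora_weight(W0, m, B, A)
```

With `B` initialized to zero and `m` set to the column norms of `W0`, the recomposed weight equals the pretrained weight, so fine-tuning starts from the base model's behavior.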

Here's how to extract PepDoRA embeddings for your input peptide:

```python
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Option 1: attach the PepDoRA adapter to the base model with PEFT
base_model = "DeepChem/ChemBERTa-77M-MLM"
adapter_model = "ChatterjeeLab/PepDoRA"
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Option 2: load the model and the tokenizer directly using AutoModel.
# This reassigns `model` and `tokenizer`, so the steps below use these objects.
model_name = "ChatterjeeLab/PepDoRA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

peptide = "CC(C)C[C@H]1NC(=O)[C@@H](C)NCCCCCCNC(=O)[C@H](CO)NC1=O"

# Tokenize the peptide
inputs = tokenizer(peptide, return_tensors="pt")

# Get the hidden states (embeddings) from the model
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Extract the embeddings from the last hidden layer
embedding = outputs.last_hidden_state

# Print the embedding shape (or the embedding itself)
print(embedding.shape)
print(embedding)
```
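
The last hidden state holds one vector per token, while downstream tasks such as permeability or binding prediction usually need one fixed-size vector per peptide. One common way to get that is to average the token vectors while ignoring padding. The sketch below uses stand-in tensors so it runs without downloading the model. Mean pooling and the helper name `mean_pool` are assumptions for illustration, not a pooling strategy prescribed by the PepDoRA authors.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings over real (non-padding) positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


# Stand-in tensors with illustrative sizes: 2 sequences, 4 tokens, hidden size 3
hidden = torch.ones(2, 4, 3)
hidden[1, 2:] = 100.0  # values at padded positions, which pooling should ignore
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])

pooled = mean_pool(hidden, mask)
print(pooled.shape)  # torch.Size([2, 3])
```

With the embedding snippet above, the equivalent call would be `mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`.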

## Repository Authors

[Leyao Wang](mailto:[email protected]), Undergraduate Intern in the Chatterjee Lab <br>
[Pranam Chatterjee](mailto:[email protected]), Assistant Professor at Duke University

Reach out to us with any questions!