Dataset description

An integrated human ether-à-go-go-related gene (hERG) dataset of molecular structures, given as SMILES strings and labeled as hERG blockers (<10 µM) or non-blockers (>=10 µM). The data were collected from DeepHIT, the BindingDB database, the ChEMBL bioactivity database, and other literature.

Task description

Binary classification. Given a drug SMILES string, predict whether the drug blocks hERG (1, <10 µM) or does not block it (0, >=10 µM).
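
The labeling rule above can be sketched as a small function. This is only an illustration: it assumes the measured potency is expressed in µM (for example an IC50, which the card does not state explicitly), and the names `herg_label` and `THRESHOLD_UM` are hypothetical, not part of TDC.

```python
THRESHOLD_UM = 10.0  # cutoff from the task description, in µM


def herg_label(potency_um: float) -> int:
    """Return 1 (blocker) if potency is below 10 µM, otherwise 0 (non-blocker)."""
    if potency_um < 0:
        raise ValueError("potency must be non-negative")
    # Strictly below the cutoff counts as a blocker; exactly 10 µM does not.
    return 1 if potency_um < THRESHOLD_UM else 0


labels = [herg_label(v) for v in (0.5, 9.99, 10.0, 35.0)]
```

Note that the boundary value of exactly 10 µM falls in the non-blocker class, matching the `>=10` condition in the task description.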

Dataset statistics

Total: 13445; Train_val: 12620; Test: 825

Pre-requisites

Install the following packages

pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision

You can also refer to the Colab notebook here

Dataset split

Random split: 70% training, 10% validation, and 20% testing.
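
A 70/10/20 random split of this kind can be sketched in plain Python as follows. This is a minimal illustration, not TDC's own implementation; the function `random_split`, its default seed, and the placeholder items are assumptions made for the example.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test lists by fraction."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)                 # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(frac[0] * len(shuffled))
    n_valid = round(frac[1] * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # the remainder goes to the test set
    return train, valid, test


# Placeholder identifiers standing in for SMILES strings.
items = [f"mol_{i}" for i in range(100)]
train, valid, test = random_split(items)
```

TDC dataset objects also provide a `get_split` method for this purpose; consult the TDC documentation for its exact arguments before relying on it.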

To load the dataset in TDC, type

from tdc.single_pred import Tox
data = Tox(name = 'herg_karim')

Model description

The model encodes each molecule as a Morgan chemical fingerprint and passes it to an MLP decoder. It was tuned over 100 runs using the Ax platform.

To load the pre-trained model, type

from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("hERG_Karim-Morgan")
# load deeppurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])
