The implemented Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) prediction models and their performance measures
are described in our paper, which is available online.1
The 15 models cover a diverse set of ADMET endpoints. Some of the models have already been
published, including those for Maximum Recommended Therapeutic Dose (MRTD),2
chemical mutagenicity,3 human liver microsomal (HLM) stability,4
and Pgp inhibitors and substrates.5
We also present several new models, which we make available here for the first time.
Liver Toxicity
-
DILI: Drug-induced liver injury (DILI) has been one of the most commonly cited reasons for drug
withdrawals from the market. This application predicts whether a compound could cause DILI.
The dataset of 1,431 compounds was obtained from four sources used by Xu et al.8
This dataset contains both pharmaceuticals and non-pharmaceuticals; we classified a compound as DILI-positive
if it was associated with a high risk of DILI and as DILI-negative if there was no such risk.
Download DILI dataset
or view model performance
or view model performance (Old Version)
-
Cytotoxicity (HepG2): Cytotoxicity is the degree to which a chemical causes damage to cells.
We developed a cytotoxicity prediction model, using in vitro data on toxicity against HepG2 cells
for 6,000 structurally diverse compounds, which we collected from ChEMBL. In developing our
model, we considered compounds with an IC50 ≤ 10 μM in the in vitro assay as
cytotoxic.
Download Cytotoxicity dataset
or view model performance
or view model performance (Old Version)
Metabolism
-
HLM: The human liver microsomal (HLM) stability assay is commonly used to identify and
exclude compounds that are too rapidly metabolized. For a drug to achieve effective therapeutic
concentrations in the body, it cannot be metabolized too rapidly by the liver. Compounds with a
half-life of 30 minutes or longer in an HLM assay are considered stable; otherwise they are
considered unstable. We retrieved HLM data from the ChEMBL database, manually curated the data,
and classified compounds as stable or unstable based on the reported half-life (T1/2 > 30 min
was considered stable, and T1/2 < 30 min unstable). The final dataset contained 3,654 compounds.
Of these, we classified 2,313 as stable and 1,341 as unstable.4
Download HLM dataset
or view model performance
or view model performance (Old Version)
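The half-life labelling rule described above can be sketched as follows. This is only an illustration, not the curation code used for the dataset: the function name `classify_hlm` and the `boundary_is_stable` flag are hypothetical, and treating a half-life of exactly 30 minutes as stable is an assumption based on the phrase "30 minutes or longer".

```python
STABLE_CUTOFF_MIN = 30.0  # half-life cutoff quoted in the text, in minutes


def classify_hlm(half_life_min: float, boundary_is_stable: bool = True) -> str:
    """Label a compound 'stable' or 'unstable' from its HLM half-life.

    boundary_is_stable is a hypothetical switch: it decides how a half-life
    of exactly 30 min is labelled (assumed stable by default).
    """
    if half_life_min < 0:
        raise ValueError("half-life must be non-negative")
    if half_life_min > STABLE_CUTOFF_MIN:
        return "stable"
    if half_life_min == STABLE_CUTOFF_MIN:
        return "stable" if boundary_is_stable else "unstable"
    return "unstable"


print(classify_hlm(45.0))   # stable
print(classify_hlm(12.5))   # unstable
```

Making the boundary case an explicit parameter keeps the sketch usable under either reading of the cutoff.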
-
Cytochrome P450 enzyme (CYP) inhibition: CYPs constitute a superfamily of
proteins that play an important role in the metabolism and detoxification of xenobiotics. We used
in vitro data for five main drug-metabolizing CYPs (1A2, 3A4, 2D6, 2C9, and 2C19) to
develop CYP inhibition models. We retrieved CYP inhibition data from
PubChem and classified a compound with an IC50 ≤ 10 μM for an enzyme as an
inhibitor of that enzyme. Predictions are given separately for each of the five enzymes: CYP1A2, CYP3A4, CYP2D6,
CYP2C9, and CYP2C19.
Download CYP1A2 dataset
or view model performance
or view model performance (Old Version)
Download CYP2C9 dataset
or view model performance
or view model performance (Old Version)
Download CYP2C19 dataset
or view model performance
or view model performance (Old Version)
Download CYP2D6 dataset
or view model performance
or view model performance (Old Version)
Download CYP3A4 dataset
or view model performance
or view model performance (Old Version)
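As a minimal sketch of the per-enzyme labelling rule above (IC50 ≤ 10 μM means inhibitor), the snippet below turns a set of IC50 values into binary labels. The function name `label_cyp_inhibition` and the input layout (a dictionary keyed by isoform name, values in μM) are assumptions made for illustration; they do not describe the format of the downloadable datasets.

```python
from typing import Dict

IC50_CUTOFF_UM = 10.0  # inhibitor cutoff quoted in the text, in micromolar
CYP_ISOFORMS = ("CYP1A2", "CYP2C9", "CYP2C19", "CYP2D6", "CYP3A4")


def label_cyp_inhibition(ic50_um: Dict[str, float]) -> Dict[str, bool]:
    """Map each measured isoform to True (inhibitor) or False (non-inhibitor).

    Isoforms without a measurement are simply absent from the result.
    """
    labels = {}
    for isoform, value in ic50_um.items():
        if isoform not in CYP_ISOFORMS:
            raise KeyError(f"unknown isoform: {isoform}")
        labels[isoform] = value <= IC50_CUTOFF_UM
    return labels


example = {"CYP1A2": 2.5, "CYP2D6": 10.0, "CYP3A4": 48.0}
print(label_cyp_inhibition(example))
# {'CYP1A2': True, 'CYP2D6': True, 'CYP3A4': False}
```

The same threshold comparison applies to the other IC50-based endpoints on this page that use a 10 μM cutoff.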
Membrane Transporters
-
BBB: The blood-brain barrier (BBB) is a highly selective barrier that separates the
circulating blood from the central nervous system. We developed a vNN-based BBB model, using
352 compounds whose BBB permeability values (logBB) were obtained from the literature.6,7
We classified compounds with logBB values of less than –0.3 and greater than +0.3 as BBB
non-permeable and permeable, respectively.
Download BBB dataset
or view model performance
or view model performance (Old Version)
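The logBB cutoffs above can be written as a small labelling function. This is a sketch under stated assumptions: the name `classify_bbb` is hypothetical, and returning `None` for values between –0.3 and +0.3 (inclusive) is our assumption, since the text does not say how such intermediate compounds are handled.

```python
from typing import Optional


def classify_bbb(log_bb: float, cutoff: float = 0.3) -> Optional[str]:
    """Label a compound from its logBB value.

    Returns 'permeable' above +cutoff, 'non-permeable' below -cutoff,
    and None in between (assumption: intermediate values get no label).
    """
    if log_bb > cutoff:
        return "permeable"
    if log_bb < -cutoff:
        return "non-permeable"
    return None


print(classify_bbb(0.8))    # permeable
print(classify_bbb(-1.2))   # non-permeable
print(classify_bbb(0.0))    # None
```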
-
Pgp Substrates and Inhibitors: P-glycoprotein (Pgp) is an essential cell membrane protein
that extracts many foreign substances from the cell. Cancer cells often overexpress Pgp, which
increases the efflux of chemotherapeutic agents from the cell and prevents treatment by
reducing the effective intracellular concentrations of such agents—a phenomenon known as multidrug
resistance. For this reason, identifying compounds that can either be transported out of the cell
by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. We have developed
models to predict both Pgp substrates and Pgp inhibitors.5
The Pgp substrate dataset was collected by Hou and co-workers.11
This dataset comprises 422 substrates and 400 non-substrates. To generate a large
Pgp inhibitor dataset, we merged two datasets12,13 and
removed duplicates; the resulting dataset consists of a training set of
1,319 inhibitors and 937 non-inhibitors.
Download Pgp Substrates dataset
or view model performance
or view model performance (Old Version)
Download Pgp Inhibitors dataset
or view model performance
or view model performance (Old Version)
Others
-
hERG (Cardiotoxicity): The human ether-à-go-go-related gene (hERG) codes for a potassium
ion channel involved in normal cardiac repolarization. Drug-induced
blockade of hERG function can cause long QT syndrome, which may result in arrhythmia and death.
We retrieved 282 known hERG blockers from the literature and classified compounds with an IC50
of 10 μM or less as blockers.9
We also collected a set of 404 compounds with IC50 values greater than
10 μM from ChEMBL and classified them as non-blockers.
Download hERG dataset
or view model performance
or view model performance (Old Version)
-
MMP (Mitochondrial Toxicity): Given the fundamental role of mitochondria in cellular
energetics and oxidative stress, mitochondrial dysfunction has been implicated in cancer,
diabetes, neurodegenerative disorders, and cardiovascular diseases. We used the largest dataset
of chemical-induced changes in mitochondrial membrane potential (MMP), based on the assumption
that a compound that causes mitochondrial dysfunction is also likely to reduce the MMP. We developed
a vNN-based MMP prediction model, using 6,261 compounds collected from a previous study that screened
a library of 10,000 compounds (~8,300 unique chemicals) at 15 concentrations, each in triplicate,
to measure changes in the MMP in HepG2 cells.10 The study
found that 913 compounds decreased the MMP, whereas 5,395 compounds had no effect.
Download MMP dataset
or view model performance
or view model performance (Old Version)
-
Mutagenicity (AMES Test): Mutagens are chemicals that cause abnormal genetic mutations, which can lead
to cancer. A common way to assess a chemical’s mutagenicity is
the Ames test. We developed the prediction model using a literature dataset of 6,512 compounds, of
which 3,503 were Ames-positive. We provide further details of the model and its performance in Reference 3.
Download AMES Test dataset
or view model performance
or view model performance (Old Version)
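Several of the classification datasets described on this page are quoted with explicit class counts, and the fraction of positives differs considerably between them. The short sketch below simply computes that fraction from the counts as printed above. The dictionary and function names are hypothetical, and taking the Ames-negative count as the remainder (6,512 minus 3,503) is an assumption. Note that the MMP counts as quoted sum to 6,308 rather than 6,261; the sketch uses the printed counts unchanged.

```python
# (positives, negatives) as quoted in the dataset descriptions on this page
DATASETS = {
    "HLM (stable)": (2313, 1341),
    "Pgp substrates": (422, 400),
    "Pgp inhibitors": (1319, 937),
    "hERG blockers": (282, 404),
    "MMP decreased": (913, 5395),
    # assumption: every compound that is not Ames-positive is Ames-negative
    "Ames positive": (3503, 6512 - 3503),
}


def positive_fraction(positives: int, negatives: int) -> float:
    """Fraction of compounds in the positive class."""
    return positives / (positives + negatives)


for name, (pos, neg) in DATASETS.items():
    total = pos + neg
    print(f"{name:16s} n={total:5d}  positive={positive_fraction(pos, neg):.1%}")
```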
-
MRTD: The Maximum Recommended Therapeutic Dose (MRTD) is an estimated upper daily dose that is
safe. We built a prediction model based on a dataset of MRTD values publicly disclosed by the
FDA, mostly of single-day oral doses for an average adult with a body weight of 60 kg, for 1,220
compounds (most of which are small organic drugs). We excluded organometallics, high-molecular-weight
polymers (>5,000 Da), nonorganic chemicals, mixtures of chemicals, and very small molecules
(<100 Da). For validation, we used an external test set of 160 compounds that were collected by the FDA.
The total dataset for our model contained 1,185 compounds.2 The predicted MRTD value
is reported in mg/day, based on an average adult weighing 60 kg.
Download MRTD dataset
or view model performance
or view model performance (Old Version)
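Because the predicted MRTD is reported in mg/day for a 60 kg adult, converting to a per-kilogram dose is a single division. The helper names below are hypothetical and the conversion is plain unit arithmetic; it is a sketch for reading the output and not part of the model itself.

```python
REFERENCE_BODY_WEIGHT_KG = 60.0  # average adult body weight stated in the text


def mg_per_day_to_mg_per_kg_day(dose_mg_per_day: float,
                                body_weight_kg: float = REFERENCE_BODY_WEIGHT_KG) -> float:
    """Convert a daily dose in mg/day to mg per kg body weight per day."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return dose_mg_per_day / body_weight_kg


def mg_per_kg_day_to_mg_per_day(dose_mg_per_kg_day: float,
                                body_weight_kg: float = REFERENCE_BODY_WEIGHT_KG) -> float:
    """Convert a dose in mg/kg/day back to mg/day for the given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return dose_mg_per_kg_day * body_weight_kg


print(mg_per_day_to_mg_per_kg_day(300.0))  # 5.0
```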