vNN-ADMET

Implemented ADMET Predictions

The implemented Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) prediction models, including their performance measures, are described in our paper, which is available online.¹ The 15 models cover a diverse set of ADMET endpoints. Some of the models have already been published, including those for Maximum Recommended Therapeutic Dose (MRTD),² chemical mutagenicity,³ human liver microsomal (HLM) stability,⁴ and Pgp inhibitors and substrates.⁵ We also present several new models, which we make available here for the first time.

Liver Toxicity

DILI: Drug-induced liver injury (DILI) has been one of the most commonly cited reasons for drug withdrawals from the market. This application predicts whether a compound could cause DILI. The dataset of 1,431 compounds was obtained from four sources used by Xu et al.⁸ This dataset contains both pharmaceuticals and non-pharmaceuticals; we classified a compound as causing DILI if it was associated with a high risk of DILI and as not causing DILI if there was no such risk.
Download DILI dataset or view model performance (Current Version) or view model performance (Original Version)
Cytotoxicity (HepG2): Cytotoxicity is the degree to which a chemical causes damage to cells. We developed a cytotoxicity prediction model, using in vitro data on toxicity against HepG2 cells for 6,000 structurally diverse compounds, which we collected from ChEMBL. In developing our model, we considered compounds with an IC₅₀ ≤ 10 μM in the in vitro assay as cytotoxic.
Download Cytotoxicity dataset or view model performance (Current Version) or view model performance (Original Version)
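
The IC₅₀-based labelling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the curation code used to build the model: the function name `label_by_ic50` and the example compounds are hypothetical, and only the 10 μM cutoff comes from the text.

```python
# Cutoff stated in the text: IC50 <= 10 uM is treated as cytotoxic.
IC50_CUTOFF_UM = 10.0


def label_by_ic50(ic50_um: float, cutoff_um: float = IC50_CUTOFF_UM) -> int:
    """Return 1 (cytotoxic) if IC50 <= cutoff, otherwise 0 (non-cytotoxic)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration in uM")
    return 1 if ic50_um <= cutoff_um else 0


# Hypothetical compounds with made-up IC50 values (uM), for illustration only.
example_ic50 = {"compound_A": 0.8, "compound_B": 10.0, "compound_C": 42.0}
labels = {name: label_by_ic50(value) for name, value in example_ic50.items()}
print(labels)  # {'compound_A': 1, 'compound_B': 1, 'compound_C': 0}
```

The same rule, with the same 10 μM cutoff, applies to the CYP inhibition and hERG datasets described on this page.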

Metabolism

HLM: The human liver microsomal (HLM) stability assay is commonly used to identify and exclude compounds that are too rapidly metabolized. For a drug to achieve effective therapeutic concentrations in the body, it cannot be metabolized too rapidly by the liver. Compounds with a half-life of 30 minutes or longer in an HLM assay are considered stable; otherwise they are considered unstable. We retrieved HLM data from the ChEMBL database, manually curated the data, and classified compounds as stable or unstable based on the reported half-life (T₁/₂ > 30 min was considered stable, and T₁/₂ < 30 min unstable). The final dataset contained 3,654 compounds. Of these, we classified 2,313 as stable and 1,341 as unstable.⁴
Download HLM dataset or view model performance (Current Version) or view model performance (Original Version)
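
As a rough sketch of the stability rule and the resulting class balance, the snippet below labels a compound from its half-life and recomputes the class fractions from the counts quoted above. The function name `label_hlm` is hypothetical, and treating a half-life of exactly 30 minutes as stable is an assumption based on the "30 minutes or longer" wording.

```python
HALF_LIFE_CUTOFF_MIN = 30.0  # cutoff stated in the text


def label_hlm(half_life_min: float) -> str:
    """Label a compound from its HLM half-life in minutes.

    Assumption: exactly 30 min counts as stable ("30 minutes or longer").
    """
    if half_life_min < 0:
        raise ValueError("half-life cannot be negative")
    return "stable" if half_life_min >= HALF_LIFE_CUTOFF_MIN else "unstable"


# Class counts reported in the text.
n_stable, n_unstable = 2313, 1341
n_total = n_stable + n_unstable
stable_fraction = n_stable / n_total

print(label_hlm(45.0), label_hlm(12.5))  # stable unstable
print(n_total, round(stable_fraction, 3))  # 3654 0.633
```

Roughly 63% of the compounds fall in the stable class, so the dataset is moderately imbalanced.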
Cytochrome P450 enzyme (CYP) inhibition: CYPs constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics. We used in vitro data derived from five main drug-metabolizing CYPs—1A2, 3A4, 2D6, 2C9, and 2C19—to develop CYP inhibition models. We retrieved CYP inhibitors from PubChem and classified a compound with an IC₅₀ ≤ 10 μM for an enzyme as an inhibitor of the enzyme. We give predictions for the following enzymes: CYP1A2, CYP3A4, CYP2D6, CYP2C9, and CYP2C19.
Download CYP1A2 dataset or view model performance (Current Version) or view model performance (Original Version)
Download CYP2C9 dataset or view model performance (Current Version) or view model performance (Original Version)
Download CYP2C19 dataset or view model performance (Current Version) or view model performance (Original Version)
Download CYP2D6 dataset or view model performance (Current Version) or view model performance (Original Version)
Download CYP3A4 dataset or view model performance (Current Version) or view model performance (Original Version)

Membrane Transporters

BBB: The blood-brain barrier (BBB) is a highly selective barrier that separates the circulating blood from the central nervous system. We developed a vNN-based BBB model, using 352 compounds whose BBB permeability values (logBB) were obtained from the literature.⁶,⁷ We classified compounds with logBB values of less than –0.3 and greater than +0.3 as BBB non-permeable and permeable, respectively.
Download BBB dataset or view model performance (Current Version) or view model performance (Original Version)
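
The two-sided logBB rule can be illustrated as follows. This is only a sketch: the function name `label_bbb` is hypothetical, and returning `None` for values between –0.3 and +0.3 reflects an assumption that such compounds are left unlabelled, which the text does not state explicitly.

```python
from typing import Optional

LOGBB_BAND = 0.3  # thresholds stated in the text: below -0.3 and above +0.3


def label_bbb(log_bb: float, band: float = LOGBB_BAND) -> Optional[int]:
    """Return 1 (permeable), 0 (non-permeable), or None (in-between band).

    Assumption: compounds with -band <= logBB <= +band receive no label.
    """
    if log_bb > band:
        return 1
    if log_bb < -band:
        return 0
    return None


# Made-up logBB values, for illustration only.
for value in (0.85, -1.2, 0.1):
    print(value, label_bbb(value))
```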
Pgp Substrates and Inhibitors: P-glycoprotein (Pgp) is an essential cell membrane protein that pumps many foreign substances out of the cell. Cancer cells often overexpress Pgp, which increases the efflux of chemotherapeutic agents from the cell and undermines treatment by reducing the effective intracellular concentrations of such agents, a phenomenon known as multidrug resistance. For this reason, identifying compounds that can either be transported out of the cell by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. We have developed models to predict both Pgp substrates and Pgp inhibitors.⁵ The Pgp substrate dataset was collected by Hou and co-workers.¹¹ This dataset consists of measurements of 422 substrates and 400 non-substrates. To generate a large Pgp inhibitor dataset, we combined two datasets¹²,¹³ and removed duplicates, yielding a training set of 1,319 inhibitors and 937 non-inhibitors.
Download Pgp Substrates dataset or view model performance (Current Version) or view model performance (Original Version)
Download Pgp Inhibitors dataset or view model performance (Current Version) or view model performance (Original Version)

Others

hERG (Cardiotoxicity): The human ether-à-go-go-related gene (hERG) codes for a potassium ion channel involved in normal cardiac repolarization. Drug-induced blockade of hERG function can cause long QT syndrome, which may result in arrhythmia and death. We retrieved 282 known hERG blockers from the literature and classified compounds with an IC₅₀ of 10 μM or less as blockers.⁹ We also collected a set of 404 compounds with IC₅₀ values greater than 10 μM from ChEMBL and classified them as non-blockers.
Download hERG dataset or view model performance (Current Version) or view model performance (Original Version)
MMP (Mitochondrial Toxicity): Given the fundamental role of mitochondria in cellular energetics and oxidative stress, mitochondrial dysfunction has been implicated in cancer, diabetes, neurodegenerative disorders, and cardiovascular diseases. We used the largest dataset of chemical-induced changes in mitochondrial membrane potential (MMP), based on the assumption that a compound that causes mitochondrial dysfunction is also likely to reduce the MMP. We developed a vNN-based MMP prediction model, using 6,261 compounds collected from a previous study that screened a library of 10,000 compounds (~8,300 unique chemicals) at 15 concentrations, each in triplicate, to measure changes in the MMP in HepG2 cells.¹⁰ The study found that 913 compounds decreased the MMP, whereas 5,395 compounds had no effect.
Download MMP dataset or view model performance (Current Version) or view model performance (Original Version)
Mutagenicity (AMES Test): Mutagens are chemicals that cause abnormal genetic mutations, which can lead to cancer. A common way to assess a chemical’s mutagenicity is the Ames test. We developed a prediction model using a literature dataset of 6,512 compounds, of which 3,503 were Ames-positive. We provide further details of the model and its performance in Reference 3.
Download AMES Test dataset or view model performance (Current Version) or view model performance (Original Version)
MRTD: The Maximum Recommended Therapeutic Dose (MRTD) is an estimated upper daily dose that is safe. We built a prediction model based on a dataset of MRTD values publicly disclosed by the FDA, mostly of single-day oral doses for an average adult with a body weight of 60 kg, for 1,220 compounds (most of which are small organic drugs). We excluded organometallics, high-molecular-weight polymers (>5,000 Da), nonorganic chemicals, mixtures of chemicals, and very small molecules (<100 Da). We used an external test set of 160 compounds that were collected by the FDA for validation. The total dataset for our model contained 1,185 compounds.² The predicted MRTD value is reported in mg/day, based on an average adult weighing 60 kg.
Download MRTD dataset or view model performance (Current Version) or view model performance (Original Version)
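
Because the predicted MRTD is reported in mg/day for a 60 kg adult, a body-weight-normalized dose follows from a single division. The helper below is a hypothetical convenience function, not part of the vNN-ADMET server, and the 600 mg/day input is a made-up value.

```python
REFERENCE_BODY_WEIGHT_KG = 60.0  # average adult body weight stated in the text


def mg_per_day_to_mg_per_kg_day(
    dose_mg_per_day: float,
    body_weight_kg: float = REFERENCE_BODY_WEIGHT_KG,
) -> float:
    """Convert a daily dose in mg/day to mg per kg body weight per day."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return dose_mg_per_day / body_weight_kg


# Made-up predicted MRTD of 600 mg/day, for illustration only.
print(mg_per_day_to_mg_per_kg_day(600.0))  # 10.0
```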

Performance measures of vNN models in 10-fold cross validation using a restricted or unrestricted applicability domain (Current Version)
Model	Data^a	d₀^b	h^c	Accuracy	Specificity	Sensitivity	kappa	R^d	Coverage
DILI	1427	0.60	0.50	0.72	0.71	0.74	0.44		0.64
DILI	1427	1.00	0.20	0.67	0.62	0.71	0.33		1.00
Cytotox (HepG2)	6097	0.40	0.20	0.85	0.89	0.76	0.65		0.89
Cytotox (HepG2)	6097	1.00	0.20	0.84	0.89	0.73	0.63		1.00
HLM	3219	0.40	0.20	0.81	0.86	0.72	0.59		0.91
HLM	3219	1.00	0.20	0.80	0.87	0.69	0.57		1.00
CYP1A2	7558	0.50	0.20	0.91	0.90	0.71	0.81		0.70
CYP1A2	7558	1.00	0.20	0.88	0.90	0.85	0.74		1.00
CYP2C9	8072	0.50	0.20	0.91	0.96	0.56	0.55		0.75
CYP2C9	8072	1.00	0.20	0.90	0.96	0.45	0.46		1.00
CYP2C19	8155	0.50	0.20	0.87	0.93	0.63	0.57		0.75
CYP2C19	8155	1.00	0.20	0.86	0.94	0.52	0.49		1.00
CYP2D6	7805	0.50	0.20	0.89	0.94	0.64	0.60		0.74
CYP2D6	7805	1.00	0.20	0.87	0.94	0.54	0.53		1.00
CYP3A4	10373	0.50	0.20	0.87	0.92	0.75	0.67		0.77
CYP3A4	10373	1.00	0.20	0.87	0.93	0.68	0.63		1.00
BBB	353	0.60	0.20	0.90	0.85	0.94	0.79		0.60
BBB	353	1.00	0.10	0.83	0.76	0.89	0.65		1.00
Pgp Substrate	822	0.60	0.20	0.80	0.80	0.80	0.59		0.64
Pgp Substrate	822	1.00	0.20	0.74	0.75	0.73	0.48		1.00
Pgp Inhibitor	2304	0.50	0.20	0.85	0.72	0.91	0.64		0.75
Pgp Inhibitor	2304	1.00	0.10	0.81	0.74	0.86	0.61		1.00
hERG	685	0.70	0.70	0.85	0.85	0.85	0.69		0.76
hERG	685	1.00	0.20	0.83	0.85	0.80	0.65		1.00
MMP	6261	0.50	0.40	0.89	0.94	0.64	0.61		0.66
MMP	6261	1.00	0.20	0.87	0.94	0.52	0.50		1.00
AMES	6512	0.50	0.40	0.81	0.74	0.86	0.60		0.78
AMES	6512	1.00	0.20	0.78	0.74	0.81	0.56		1.00
MRTD^e	1184	0.60	0.20					0.80	0.67
MRTD^e	1184	1.00	0.20					0.74	1.00
^aNumber of compounds in the dataset; ^bTanimoto-distance threshold value; ^cSmoothing factor; ^dPearson’s correlation coefficient; ^eRegression model.
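
To show how the Tanimoto-distance threshold d₀ and the smoothing factor h in the table interact, here is a minimal sketch of a variable-nearest-neighbor prediction. It assumes the distance-weighted form described in the published vNN work (Reference 1), in which every training compound within d₀ of the query contributes with weight exp(−(d/h)²); this page does not spell out the formula, so treat the weighting as an assumption. The function names and the example distances are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def vnn_predict(
    neighbors: Sequence[Tuple[float, float]], d0: float, h: float
) -> Optional[float]:
    """Weighted average of activities over neighbors with distance <= d0.

    neighbors: (tanimoto_distance, activity) pairs for training compounds.
    Returns None when no neighbor qualifies, i.e. the query compound is
    outside the applicability domain.
    """
    numerator = 0.0
    denominator = 0.0
    for distance, activity in neighbors:
        if distance <= d0:
            weight = math.exp(-((distance / h) ** 2))  # assumed Gaussian-like kernel
            numerator += weight * activity
            denominator += weight
    return numerator / denominator if denominator > 0 else None


def coverage(predictions: List[Optional[float]]) -> float:
    """Fraction of query compounds that received a prediction."""
    return sum(p is not None for p in predictions) / len(predictions)


# Made-up neighbors: (distance to query, activity label).
neighbors = [(0.1, 1.0), (0.3, 0.0), (0.9, 0.0)]
restricted = vnn_predict(neighbors, d0=0.6, h=0.2)   # third neighbor excluded
too_strict = vnn_predict(neighbors, d0=0.05, h=0.2)  # no neighbor qualifies
print(restricted, too_strict, coverage([restricted, too_strict]))
```

With d₀ = 1.00 every training compound qualifies, which is why the unrestricted rows in the table always show a coverage of 1.00, while a smaller d₀ trades coverage for predictions based only on close neighbors.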

See original performance measure (Pipeline Pilot)

Performance measures of vNN models in 10-fold cross validation using a restricted or unrestricted applicability domain (Original Version)
Model	Data^a	d₀^b	h^c	Accuracy	Specificity	Sensitivity	kappa	R^d	Coverage
DILI	1427	0.60	0.50	0.71	0.70	0.73	0.42		0.66
DILI	1427	1.00	0.20	0.67	0.62	0.72	0.34		1.00
Cytotox (HepG2)	6097	0.40	0.20	0.84	0.88	0.76	0.64		0.89
Cytotox (HepG2)	6097	1.00	0.20	0.84	0.89	0.73	0.62		1.00
HLM	3219	0.40	0.20	0.81	0.87	0.72	0.59		0.91
HLM	3219	1.00	0.20	0.81	0.87	0.70	0.57		1.00
CYP1A2	7558	0.50	0.20	0.90	0.95	0.70	0.66		0.75
CYP1A2	7558	1.00	0.20	0.89	0.95	0.61	0.60		1.00
CYP2C9	8072	0.50	0.20	0.91	0.96	0.55	0.54		0.76
CYP2C9	8072	1.00	0.20	0.90	0.96	0.44	0.46		1.00
CYP2C19	8155	0.50	0.20	0.87	0.93	0.64	0.58		0.76
CYP2C19	8155	1.00	0.20	0.86	0.94	0.52	0.50		1.00
CYP2D6	7805	0.50	0.20	0.89	0.94	0.61	0.57		0.75
CYP2D6	7805	1.00	0.20	0.88	0.95	0.52	0.51		1.00
CYP3A4	10373	0.50	0.20	0.88	0.92	0.76	0.68		0.78
CYP3A4	10373	1.00	0.20	0.88	0.93	0.69	0.64		1.00
BBB	353	0.60	0.20	0.90	0.86	0.94	0.80		0.61
BBB	353	1.00	0.10	0.82	0.75	0.88	0.64		1.00
Pgp Substrate	822	0.60	0.20	0.79	0.79	0.80	0.58		0.66
Pgp Substrate	822	1.00	0.20	0.73	0.74	0.73	0.47		1.00
Pgp Inhibitor	2304	0.50	0.20	0.85	0.73	0.91	0.66		0.76
Pgp Inhibitor	2304	1.00	0.10	0.81	0.74	0.86	0.61		1.00
hERG	685	0.70	0.70	0.84	0.83	0.84	0.68		0.80
hERG	685	1.00	0.20	0.82	0.83	0.82	0.64		1.00
MMP	6261	0.50	0.40	0.89	0.94	0.64	0.61		0.69
MMP	6261	1.00	0.20	0.87	0.94	0.52	0.50		1.00
AMES	6512	0.50	0.40	0.82	0.75	0.86	0.62		0.79
AMES	6512	1.00	0.20	0.79	0.75	0.82	0.57		1.00
MRTD^e	1184	0.60	0.20					0.79	0.69
MRTD^e	1184	1.00	0.20					0.74	1.00
^aNumber of compounds in the dataset; ^bTanimoto-distance threshold value; ^cSmoothing factor; ^dPearson’s correlation coefficient; ^eRegression model.
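
For readers who want to reproduce the classification columns of the tables, the sketch below computes accuracy, specificity, sensitivity, and Cohen's kappa from a confusion matrix using the standard definitions. The function name and the example counts are hypothetical and do not correspond to any model on this page.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + tn + fp + fn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    # Agreement expected by chance, from the row and column marginals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (accuracy - p_chance) / (1.0 - p_chance)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "kappa": kappa,
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 2) for name, value in metrics.items()})
```

Kappa corrects accuracy for chance agreement, which makes it more informative than accuracy alone for imbalanced datasets such as the CYP2C9 or MMP sets.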

See current performance measure (Python)

References

1. Schyman, P., R. Liu, V. Desai, and A. Wallqvist. vNN web server for ADMET predictions. Frontiers in Pharmacology. 2017 December 4; 8:889.
2. Liu, R., G. Tawa, and A. Wallqvist. Locally weighted learning methods for predicting dose-dependent toxicity with application to the human maximum recommended daily dose. Chemical Research in Toxicology. 2012; 25(10):2216-2226.
3. Liu, R., and A. Wallqvist. Merging applicability domains for in silico assessment of chemical mutagenicity. Journal of Chemical Information and Modeling. 2014; 54(3):793-800.
4. Liu, R., P. Schyman, and A. Wallqvist. Critically assessing the predictive power of QSAR models for human liver microsomal stability. Journal of Chemical Information and Modeling. 2015; 55(8):1566-1575.
5. Schyman, P., R. Liu, and A. Wallqvist. Using the variable-nearest neighbor method to identify P-glycoprotein substrates and inhibitors. ACS Omega. 2016; 1(5):923-929.
6. Muehlbacher, M., G. Spitzer, K. Liedl, and J. Kornhuber. Qualitative prediction of blood–brain barrier permeability on a large and refined dataset. Journal of Computer-Aided Molecular Design. 2011; 25:1095.
7. Naef, R. A generally applicable computer algorithm based on the group additivity method for the calculation of seven molecular descriptors: heat of combustion, logPO/W, logS, refractivity, polarizability, toxicity and logBB of organic compounds; scope and limits of applicability. Molecules. 2015; 20(10):18279-18351.
8. Xu, Y., Z. Dai, F. Chen, S. Gao, J. Pei, and L. Lai. Deep learning for drug-induced liver injury. Journal of Chemical Information and Modeling. 2015; 55(10):2085-2093.
9. Schyman, P., R. Liu, and A. Wallqvist. General purpose 2D and 3D similarity approach to identify hERG blockers. Journal of Chemical Information and Modeling. 2016; 56(1):213-222.
10. Attene-Ramos, M., R. Huang, S. Michael, K. Witt, A. Richard, R. Tice, A. Simeonov, C. Austin, and M. Xia. Profiling of the Tox21 chemical collection for mitochondrial function to identify compounds that acutely decrease mitochondrial membrane potential. Environmental Health Perspectives. 2015; 123(1):49.
11. Li, D., L. Chen, Y. Li, S. Tian, H. Sun, and T. Hou. ADMET evaluation in drug discovery. 13. Development of in silico prediction models for P-glycoprotein substrates. Molecular Pharmaceutics. 2014; 11(3):716-726.
12. Broccatelli, F., E. Carosati, A. Neri, M. Frosini, L. Goracci, T. Oprea, and G. Cruciani. A novel approach for predicting P-glycoprotein (ABCB1) inhibition using molecular interaction fields. Journal of Medicinal Chemistry. 2011; 54(6):1740-1751.
13. Chen, L., Y. Li, Q. Zhao, H. Peng, and T. Hou. ADME evaluation in drug discovery. 10. Predictions of P-glycoprotein inhibitors using recursive partitioning and naive Bayesian classification techniques. Molecular Pharmaceutics. 2011; 8(3):889-900.