Adverse drug reaction prediction using voting ensemble training approach

Document Type: Original Article

Authors

Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran

Abstract

Identifying and controlling adverse drug reactions (ADRs) is a challenging problem in pharmacology. For instance, the drug rosiglitazone has been associated with adverse reactions that were recognized only after its release. Because of such experiences, pharmacists are increasingly interested in computational methods for predicting ADRs. The performance of these methods depends on how the dataset is defined. In some studies, known drug-adverse reaction associations are treated as positive data and unknown associations as negative data. Because unknown associations typically far outnumber known ones, this produces an unbalanced dataset, which can bias classifiers and lead to inaccurate predictions. We propose a framework named Adverse Drug Reaction Prediction using the Voting Ensemble Training Approach (ADRP-VETA) to overcome the challenges of unbalanced datasets in ADR prediction. As the drug feature, we construct for each drug a vector of its chemical-structure similarities to the other drugs. As the adverse reaction feature, we compute for each ADR a vector of its similarities to the other ADRs based on the Unified Medical Language System (UMLS). These similarity features help capture the intricate relationships between drugs and adverse reactions more accurately. We compare ADRP-VETA with three state-of-the-art models and find that it outperforms them, achieving an AUC-ROC of 91% and an AUC-PR of 89.8%. We also assess ADRP-VETA’s ability to predict rare adverse reactions, where its AUC-ROC and AUC-PR are 83.3% and 92.2%, respectively. As a case study, we focus on the associations between liver-injury adverse reactions and three drugs.
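The general idea summarized in the abstract (similarity vectors as features, plus a voting ensemble trained on balanced subsets) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the similarity measure, the base classifier, or the voting rule, so the Tanimoto similarity on fingerprint bit sets, the nearest-centroid stand-in learner, the chunking of unlabeled pairs, and all function names and toy values below are assumptions chosen only to keep the example self-contained.

```python
import random
from statistics import mean

def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_vectors(fingerprints):
    """Row i holds the similarity of drug i to every drug in the list."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

class NearestCentroid:
    """Stand-in base learner (assumption): label by the closer class centroid."""
    def fit(self, pos, neg):
        self.pos_c, self.neg_c = centroid(pos), centroid(neg)
        return self

    def predict(self, x):
        return 1 if sq_dist(x, self.pos_c) < sq_dist(x, self.neg_c) else 0

def train_voting_ensemble(pos, unlabeled, seed=0):
    """Train one base learner per balanced subset: all positives plus one
    chunk of unlabeled pairs no larger than the positive set."""
    rng = random.Random(seed)
    pool = unlabeled[:]
    rng.shuffle(pool)
    k = len(pos)
    chunks = [pool[i:i + k] for i in range(0, len(pool), k)]
    return [NearestCentroid().fit(pos, chunk) for chunk in chunks]

def vote(models, x):
    """Fraction of base learners that vote 'association'."""
    return sum(m.predict(x) for m in models) / len(models)

# Toy data (hypothetical): four drugs as fingerprint bit sets, two ADRs.
fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
drug_sim = similarity_vectors(fingerprints)
adr_sim = [[1.0, 0.2], [0.2, 1.0]]  # placeholder for UMLS-based similarities

def pair_features(drug, adr):
    """Feature of a drug-ADR pair: drug similarity vector + ADR similarity vector."""
    return drug_sim[drug] + adr_sim[adr]

known = [(0, 0), (1, 0)]  # known associations (positives)
unknown = [(d, a) for d in range(4) for a in range(2) if (d, a) not in known]

models = train_voting_ensemble(
    [pair_features(d, a) for d, a in known],
    [pair_features(d, a) for d, a in unknown],
)
score_known = vote(models, pair_features(0, 0))
score_unknown = vote(models, pair_features(2, 1))
```

Each base learner sees every positive pair but only a small share of the unlabeled pairs, so no single learner is trained on a heavily skewed set. The final score is the share of learners that vote for an association.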

