A systematic survey and empirical comparison of hybrid methods for imbalanced fraud detection: Combining resampling and machine learning

Yousefimehr, Behnam; Ghatee, Mehdi

doi:10.22060/ajmc.2025.24642.1446

Document Type: Review Article

Authors

Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran

Abstract

Accurate identification of fraudulent activity has long been a focus of computational research, producing methods that range from traditional statistical tests to machine learning and deep learning models. A persistent challenge for all of these approaches is the class imbalance inherent in most real-world fraud datasets: genuine transactions vastly outnumber fraudulent ones, which often biases models toward the majority class. Hybrid frameworks that combine data resampling techniques with machine learning algorithms have emerged as a promising way to mitigate this issue, and they are particularly valuable for their potential to support accurate, real-time automated detection. This survey examines the efficacy and impact of such hybrid techniques in fraud detection. To evaluate their performance quantitatively, we conduct a numerical study using auto insurance fraud as a case study. Using car fraud datasets, we compare various detection algorithms, each coupled with different resampling methods. Our empirical results show that the performance of each fraud detection algorithm depends strongly on the resampling strategy employed, which highlights the need to select methods carefully according to the dataset's characteristics. Code for the analysis is available at https://github.com/behnamy2010/Car-Claims-Compression.
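The hybrid idea described in the abstract, pairing a resampling step with a classifier and comparing the pairings, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the two-dimensional synthetic data, the 950-to-50 class ratio, the plain random over- and undersampling routines, and the small k-nearest-neighbour classifier stand in for the real datasets, resampling methods, and detection algorithms. Resampling is applied to the training split only, so the test split keeps its original imbalance.

```python
import random
from collections import Counter


def make_data(n_legit, n_fraud, rng):
    """Toy 2-D data (hypothetical): label 0 = legitimate, label 1 = fraud."""
    X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_legit)]
    X += [(rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)) for _ in range(n_fraud)]
    y = [0] * n_legit + [1] * n_fraud
    return X, y


def no_resampling(X, y, seed=0):
    """Baseline: return the training data unchanged."""
    return list(X), list(y)


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    def dist(pair):
        point = pair[0]
        return (point[0] - x[0]) ** 2 + (point[1] - x[1]) ** 2
    nearest = sorted(zip(X_train, y_train), key=dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fraud_recall(X_train, y_train, X_test, y_test):
    """Fraction of fraudulent test points that the classifier flags as fraud."""
    frauds = [x for x, lab in zip(X_test, y_test) if lab == 1]
    hits = sum(knn_predict(X_train, y_train, x) == 1 for x in frauds)
    return hits / len(frauds)


rng = random.Random(42)
X_train, y_train = make_data(950, 50, rng)   # imbalanced training split
X_test, y_test = make_data(190, 10, rng)     # test split is never resampled

results = {}
for name, resample in [("none", no_resampling),
                       ("oversample", random_oversample),
                       ("undersample", random_undersample)]:
    X_res, y_res = resample(X_train, y_train)
    results[name] = fraud_recall(X_res, y_res, X_test, y_test)
    print(name, dict(Counter(y_res)), "fraud recall:", round(results[name], 2))
```

On real data the same loop would range over several classifiers as well as several resamplers, and it would report precision-oriented metrics alongside recall, since oversampling typically trades more false alarms for more detected fraud.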

Main Subjects

Artificial Intelligence and Machine Learning

References

[1] Y. Abakarim, M. Lahby, and A. Attioui, A bagged ensemble convolutional neural networks approach to recognize insurance claim frauds, Applied System Innovation, 6 (2023), p. 20.

[2] A. Abdallah, M. A. Maarof, and A. Zainal, Fraud detection system: A survey, Journal of Network and Computer Applications, 68 (2016), pp. 90–113.

[3] S. Abdelhadi, K. Elbahnasy, and M. Abdelsalam, A proposed model to predict auto insurance claims using machine learning techniques, Journal of Theoretical and Applied Information Technology, 98 (2020).

[4] B. Abma, Evaluation of requirements management tools with support for traceability-based change impact analysis, Master’s thesis, University of Twente, Enschede, (2009).

[5] O. S. Adebayo, T. A. Favour-Bethy, O. Otasowie, and O. A. Okunola, Comparative review of credit card fraud detection using machine learning and concept drift techniques, International Journal of Computer Science and Mobile Computing, 12 (2023), pp. 24–48.

[6] H. Ahmad, B. Kasasbeh, B. Aldabaybah, and E. Rawashdeh, Class balancing framework for credit card fraud detection based on clustering and similarity-based selection (sbs), International Journal of Information Technology, 15 (2023), pp. 325–333.

[7] K. I. Al-Daoud and I. A. Abu-AlSondos, Robust ai for financial fraud detection in the gcc: A hy- brid framework for imbalance, drift, and adversarial threats, Journal of Theoretical and Applied Electronic Commerce Research, 20 (2025), p. 121.

[8] K. G. Al-Hashedi and P. Magalingam, Financial fraud detection applying data mining techniques: A comprehensive review from 2009 to 2019, Computer Science Review, 40 (2021), p. 100402.

[9] H. M. Al Lawati, A. Zainal, B. A. S. Al-Rimy, M. Al-Azawi, M. N. Kassim, S. A. Almalki, and T. A. Alghamdi, An integrated preprocessing and drift detection approach with adaptive windowing for fraud detection in payment systems, IEEE Access, (2025).

[10] A. A. Z. Alabdeen, Adaptive anomaly based fraud detection model for handling concept drift in short-term profile, PhD’s thesis, Universiti Teknologi Malaysia, (2018).

[11] A. Ali, S. Abd Razak, S. H. Othman, T. A. E. Eisa, A. Al-Dhaqm, M. Nasser, T. Elhassan, H. Elshafie, and A. Saif, Financial fraud detection based on machine learning: a systematic literature review, Applied Sciences, 12 (2022), p. 9637.

[12] A. A. Amponsah, A. F. Adekoya, and B. A. Weyori, A novel fraud detection and prevention method for healthcare claim processing using machine learning and blockchain technology, Decision Analytics Journal, 4 (2022), p. 100122.

[13] P. P. Angelov, E. A. Soares, R. Jiang, N. I. Arnold, and P. M. Atkinson, Explainable artificial intelligence: an analytical review, WIREs Data Mining and Knowledge Discovery, 11 (2021), p. e1424.

[14] J. M. Arockiam and A. C. S. Pushpanathan, Mapreduce-iterative support vector machine classifier: novel fraud detection systems in healthcare insurance industry, International Journal of Electrical and Computer Engineering (IJECE), 13 (2023), p. 756.

[15] F. Aslam, A. I. Hunjra, Z. Ftiti, W. Louhichi, and T. Shams, Insurance fraud detection: Evidence from artificial intelligence and machine learning, Research in International Business and Finance, 62 (2022), p. 101744.

[16] S. S. Asrori, L. Wang, and S. Ozawa, Permissioned blockchain-based xgboost for multi banks fraud detection, in International Conference on Neural Information Processing, Springer, 2022, pp. 683–692.

[17] T. Badriyah, L. Rahmaniah, and I. Syarif, Nearest neighbour and statistics method based for detecting fraud in auto insurance, in 2018 International Conference on Applied Engineering (ICAE), IEEE, 2018, pp. 1– 5.

[18] G. E. Batista, A. L. Bazzan, M. C. Monard, et al., Balancing training data for automated annotation of keywords: a case study., in WOB, 2003, pp. 10–18.

[19] G. E. Batista, R. C. Prati, and M. C. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD explorations newsletter, 6 (2004), pp. 20–29.

[20] B. Benedek, C. Ciumas, and B. Z. Nagy, On the cost-efficiency of automobile insurance fraud detection methods: A meta-analysis, Global Business Review, (2023), p. 09721509231158194.

[21] B. Benedek and B. Z. Nagy, Traditional versus ai-based fraud detection: Cost efficiency in the field of automobile insurance, Financial and Economic Review, 22 (2023), pp. 77–98

[22] R. Bhowmik, Detecting auto insurance fraud by data mining techniques, Journal of Emerging Trends in Computing and Information Sciences, 2 (2011), pp. 156–162.

[23] A. Bodaghi and B. Teimourpour, Automobile insurance fraud detection using social network analysis, Applications of Data Management and Analysis: Case Studies in Social Networks and Beyond, (2018), pp. 11– 16.

[24] P. Boulieris, J. Pavlopoulos, A. Xenos, and V. Vassalos, Fraud detection with natural language processing, Machine Learning, (2023), pp. 1–22.

[25] L. Breiman, Random forests, Machine learning, 45 (2001), pp. 5–32.

[26] L. Breiman, J. Friedman, C. J. Stone, and R. Olshen, Classification and regression trees, Routledge, 1 ed., 1984.

[27] A. Calafato, C. Colombo, and G. J. Pace, A controlled natural language for tax fraud detection, in Controlled Natural Language: 5th International Workshop, CNL 2016, Aberdeen, UK, July 25-27, 2016, Proceedings 5, Springer, 2016, pp. 1–12.

[28] Y. Cao, Y. Ma, Y. Zhu, and K. M. Ting, Revisiting streaming anomaly detection: benchmark and evaluation, Artificial Intelligence Review, 58 (2024), p. 8.

[29] V. Chandola, A. Banerjee, and V. Kumar, Anomaly detection: A survey, ACM computing surveys (CSUR), 41 (2009), pp. 1–58.

[30] J.-W. Chang, N. Yen, and J. C. Hung, Design of a nlp-empowered finance fraud awareness model: the anti-fraud chatbot for fraud detection and fraud classification as an instance, Journal of Ambient Intelligence and Humanized Computing, 13 (2022), pp. 4663–4679.

[31] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, Smote: synthetic minority over-sampling technique, Journal of artificial intelligence research, 16 (2002), pp. 321–357.

[32] T. Chen and C. Guestrin, Xgboost: A scalable tree boosting system, in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794.

[33] A. Cherif, A. Badhib, H. Ammar, S. Alshehri, M. Kalkatawi, and A. Imine, Credit card fraud detection in the era of disruptive technologies: A systematic review, Journal of King Saud University-Computer and Information Sciences, (2022).

[34] S. Choirunnisa and J. Lianto, Hybrid method of undersampling and oversampling for handling imbalanced data, in 2018 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), IEEE, 2018, pp. 276–280.

[35] K. Chowdhary and K. Chowdhary, Natural language processing, Fundamentals of artificial intelligence, (2020), pp. 603–649.

[36] C. Cortes and V. Vapnik, Support-vector networks, Machine learning, 20 (1995), pp. 273–297.

[37] Y. Cui, X. Han, J. Chen, X. Zhang, J. Yang, and X. Zhang, Fraudgnn-rl: a graph neural network with reinforcement learning for adaptive financial fraud detection, IEEE Open Journal of the Computer Society, (2025).

[38] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi, Credit card fraud detection and concept-drift adaptation with delayed supervised information, in 2015 international joint conference on Neural networks (IJCNN), IEEE, 2015, pp. 1–8.

[39] Z. Deng, G. Xin, Y. Liu, W. Wang, and B. Wang, Contrastive graph neural network-based camouflaged fraud detector, Information Sciences, 618 (2022), pp. 39–52.

[40] R. Dwivedi, D. Dave, H. Naik, S. Singhal, R. Omer, P. Patel, B. Qian, Z. Wen, T. Shah, G. Morgan, et al., Explainable ai (xai): Core ideas, techniques, and solutions, ACM Computing Surveys, 55 (2023), pp. 1–33.

[41] C. Elkan, The foundations of cost-sensitive learning, in International joint conference on artificial intelli-gence, vol. 17, Lawrence Erlbaum Associates Ltd, 2001, pp. 973–978.

[42] I. Eweoya, A. Adebiyi, A. Azeta, and O. Okesola, Fraud prediction in bank credit administration: A systematic literature review, Journal of Theoretical and Applied Information Technology, (2019).

[43] H. Fanai and H. Abbasimehr, A novel combined approach based on deep autoencoder and deep classifiers for credit card fraud detection, Expert Systems with Applications, 217 (2023), p. 119562.

[44] H. Farbmacher, L. L¨ow, and M. Spindler, An explainable attention network for fraud detection in claims management, Journal of Econometrics, 228 (2022), pp. 244–258.

[45] J. Fernandez Rodriguez, A natural language processing approach to fraud detection, Master’s thesis, Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, (2020).

[46] U. Fiore, A. De Santis, F. Perla, P. Zanetti, and F. Palmieri, Using generative adversarial networks for improving classification effectiveness in credit card fraud detection, Information Sciences, 479 (2019),
1. 448–455.
[47] Y. Freund and R. E. Schapire, A desicion-theoretic generalization of on-line learning and an application to boosting, in European conference on computational learning theory, Springer, 1995, pp. 23–37.

[48] P. Fukas, J. Rebstadt, L. Menzel, and O. Thomas, Towards explainable artificial intelligence in financial fraud detection: Using shapley additive explanations to explore feature importance, in Advanced Information Systems Engineering, X. Franch, G. Poels, F. Gailly, and M. Snoeck, eds., Cham, 2022, Springer International Publishing, pp. 109–126.

[49] D. Gaspar, P. Silva, and C. Silva, Explainable ai for intrusion detection systems: Lime and shap appli- cability on multi-layer perceptron, IEEE Access, (2024).

[50] V. Gonzalez, Evaluating interpretable models for financial fraud detection, in AMCIS 2024 Proceedings, Salt Lake City, 2024, pp. 1–5.

[51] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial networks, Communications of the ACM, 63 (2020), pp. 139–144.

[52] B. M. Greenwell, pdp: An R package for constructing partial dependence plots, The R Journal, 9 (2017).

[53] W. Guan, J. Cao, Y. Gu, and S. Qian, Gama: A multi-graph-based anomaly detection framework for business processes via graph neural networks, Information Systems, 124 (2024), p. 102405.

[54] H. Han, W.-Y. Wang, and B.-H. Mao, Borderline-smote: a new over-sampling method in imbalanced data sets learning, in International conference on intelligent computing, Springer, 2005, pp. 878–887.

[55] J. T. Hancock, R. A. Bauder, H. Wang, and T. M. Khoshgoftaar, Explainable machine learning models for medicare fraud detection, Journal of Big Data, 10 (2023), p. 154.

[56] P. Handel, I. Skog, J. Wahlstrom, F. Bonawiede, R. Welch, J. Ohlsson, and M. Ohlsson, Insur- ance telematics: Opportunities and challenges with the smartphone solution, IEEE Intelligent Transportation Systems Magazine, 6 (2014), pp. 57–70.

[57] S. Harjai, S. K. Khatri, and G. Singh, Detecting fraudulent insurance claims using random forests and synthetic minority oversampling technique, in 2019 4th International Conference on Information Systems and Computer Networks (ISCON), IEEE, 2019, pp. 123–128.

[58] H. He, Y. Bai, E. A. Garcia, and S. Li, Adasyn: Adaptive synthetic sampling approach for imbalanced learning, in 2008 IEEE international joint conference on neural networks (IEEE world congress on computa- tional intelligence), IEEE, 2008, pp. 1322–1328.

[59] H. He and E. A. Garcia, Learning from imbalanced data, IEEE Transactions on knowledge and data engineering, 21 (2009), pp. 1263–1284.

[60] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural computation, 9 (1997), pp. 1735– 1780.

[61] C. Hu, Z. Quan, and W. F. Chong, Imbalanced learning for insurance using modified loss functions in tree-based models, Insurance: Mathematics and Economics, 106 (2022), pp. 13–32

[62] L. Hu, Y. Lu, and Y. Feng, Concept drift detection based on deep neural networks and autoencoders, Applied Sciences, 15 (2025), p. 3056.

[63] C. Huang, W. Wang, D. Liu, R. Lu, and X. Shen, Blockchain-assisted personalized car insurance with privacy preservation and fraud resistance, IEEE Transactions on Vehicular Technology, 72 (2022), pp. 3777– 3792.

[64] B. Itri, Y. Mohamed, Q. Mohammed, and B. Omar, Performance comparative study of machine learning algorithms for automobile insurance fraud detection, in 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), IEEE, 2019, pp. 1–4.

[65] B. Itri, Y. Mohamed, B. Omar, and Q. Mohamed, Empirical oversampling threshold strategy for ma- chine learning performance optimisation in insurance fraud detection, International Journal of Advanced Computer Science and Applications, 11 (2020).

[66] R. K. Jagait, M. N. Fekri, K. Grolinger, and S. Mir, Load forecasting under concept drift: Online ensemble learning with recurrent neural network and arima, IEEE Access, 9 (2021), pp. 98992–99008.

[67] C. Jorge, R. Cao, J. M. Vilar, et al., Cost-sensitive thresholding over a two-dimensional decision region for fraud detection, Information Sciences, 657 (2024), p. 119956.

[68] R. Kaafarani, L. Ismail, and O. Zahwe, An adaptive decision-making approach for better selection of blockchain platform for health insurance frauds detection with smart contracts: development and performance evaluation, Procedia Computer Science, 220 (2023), pp. 470–477.

[69] I. Kabir, M. K. Momo, and T. Tazrian, Fraud detection in e-commerce using natural language processing, PhD’s thesis, Brac University, (2023).

[70] K. Kapadiya, U. Patel, R. Gupta, M. D. Alshehri, S. Tanwar, G. Sharma, and P. N. Bokoro, Blockchain and ai-empowered healthcare insurance fraud detection: an analysis, architecture, and future prospects, IEEE Access, 10 (2022), pp. 79606–79627.

[71] S. Kaufman, S. Rosset, C. Perlich, and O. Stitelman, Leakage in data mining: Formulation, detec- tion, and avoidance, ACM Transactions on Knowledge Discovery from Data (TKDD), 6 (2012), pp. 1–21.

[72] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, Lightgbm: A highly efficient gradient boosting decision tree, Advances in neural information processing systems, 30 (2017).

[73] R. K. Kennedy, F. Villanustre, and T. M. Khoshgoftaar, Unsupervised feature selection and class labeling for credit card fraud, Journal of Big Data, 12 (2025), p. 111.

[74] J. Kester, Insuring future automobility: A qualitative discussion of british and dutch car insurer’s re- sponses to connected and automated vehicles, Research in Transportation Business & Management, 45 (2022), p. 100903.

[75] A. Kilroy and K. A. Smith, Insurance fraud statistics 2024, https://www.forbes.com/advisor/insurance/fraud-statistics/, Accessed: 2024-06-25, (2024).

[76] T. N. Kipf and M. Welling, Semi-supervised classification with graph convolutional networks, ICLR, (2017).

[77] Y. Kou, C.-T. Lu, S. Sirwongwattana, and Y.-P. Huang, Survey of fraud detection techniques, in IEEE International Conference on Networking, Sensing and Control, 2004, vol. 2, IEEE, 2004, pp. 749–754.

[78] I. Koychev, Gradual forgetting for adaptation to concept drift, Proceedings of ECAI 2000 Workshop on Current Issues in Spatio-Temporal Reasoning, (2000).

[79] T. Le, M. T. Vo, B. Vo, M. Y. Lee, and S. W. Baik, A hybrid approach using oversampling technique and cost-sensitive learning for bankruptcy prediction, Complexity, 2019 (2019).

[80] T.-T.-H. Le, A. T. Prihatno, Y. E. Oktian, H. Kang, and H. Kim, Exploring local explanation of practical industrial ai applications: A systematic literature review, Applied Sciences, 13 (2023), p. 5809.

[81] B. Lebichot, Y.-A. Le Borgne, L. He-Guelton, F. Obl´e, and G. Bontempi, Deep-learning domain adaptation techniques for credit cards fraud detection, in Recent Advances in Big Data and Deep Learning: Proceedings of the INNS Big Data and Deep Learning Conference INNSBDDL2019, held at Sestri Levante, Genova, Italy 16-18 April 2019, Springer, 2020, pp. 78–88.

[82] B. Lebichot, T. Verhelst, Y.-A. Le Borgne, L. He-Guelton, F. Oble, and G. Bontempi, Transfer learning strategies for credit card fraud detection, IEEE access, 9 (2021), pp. 114754–114766.

[83] L. Li and J. Xu, Graph transformer-based self-adaptive malicious relation filtering for fraudulent comments detection in social network, Knowledge-Based Systems, 280 (2023), p. 111005.

[84] C. X. Ling and V. S. Sheng, Class imbalance problem, In: C. Sammut, G. I. Webb (eds) Encyclopedia of machine learning, Springer, Boston, MA (2011), p. 171.

[85] S. M. Lundberg and S.-I. Lee, A unified approach to interpreting model predictions, in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., Curran Associates, Inc., 2017, pp. 4765–4774.

[86] S. Mahapatra and D. Sinha, Smart h-chain: A blockchain based healthcare framework with insurance fraud detection, Transactions on Emerging Telecommunications Technologies, 35 (2024), p. e4911.

[87] T.-D. Mai, K. Hoang, A. Baigutanova, G. Alina, and S. Kim, Customs fraud detection in the presence of concept drift, in 2021 International Conference on Data Mining Workshops (ICDMW), IEEE, 2021, pp. 370– 379.

[88] L. Maiano, A. Montuschi, M. Caserio, E. Ferri, F. Kieffer, C. German`o, L. Baiocco, L. R. Celsi, I. Amerini, and A. Anagnostopoulos, A deep-learning–based antifraud system for car-insurance claims, Expert Systems with Applications, 231 (2023), p. 120644.

[89] A. Maillart, Toward an explainable machine learning model for claim frequency: a use case in car insurance pricing with telematics data, European Actuarial Journal, (2021), pp. 1–39.

[90] D. Malekian and M. R. Hashemi, An adaptive profile based fraud detection framework for handling concept drift, in 2013 10th International ISC conference on information security and cryptology (ISCISC), IEEE, 2013, pp. 1–6.

[91] I. Mani and I. Zhang, knn approach to unbalanced data distributions: a case study involving information extraction, in Proceedings of workshop on learning from imbalanced datasets, vol. 126, ICML, 2003, pp. 1–7.

[92] A. Mart´ın-Mart´ın, M. Thelwall, E. Orduna-Malea, and E. Delgado L´opez-C´ozar, Google scholar, microsoft academic, scopus, dimensions, web of science, and opencitations’ coci: a multidisciplinary comparison of coverage via citations, Scientometrics, 126 (2021), pp. 871–906.

[93] J. C. Mendoza-Tello, T. Mendoza-Tello, and H. Mora, Blockchain as a healthcare insurance fraud detection tool, in Research and Innovation Forum 2020: Disruptive Technologies in Times of Change, Springer, 2021, pp. 545–552.

[94] I. D. Mienye and Y. Sun, A deep learning ensemble with data resampling for credit card fraud detection, IEEE Access, 11 (2023), pp. 30628–30638.

[95] L. L. Minku, A. P. White, and X. Yao, The impact of diversity on online ensemble learning in the presence of concept drift, IEEE Transactions on Knowledge and Data Engineering, 22 (2009), pp. 730–742.

[96] C. Molnar, Interpretable machine learning, Leanpub, 2020.

[97] R. K. Mothilal, A. Sharma, and C. Tan, Explaining machine learning classifiers through diverse coun- terfactual explanations, in Proceedings of the 2020 conference on fairness, accountability, and transparency, 2020, pp. 607–617.

[98] C. Muranda, A. Ali, and T. Shongwe, Detecting fraudulent motor insurance claims using support vec- tor machines with adaptive synthetic sampling method, in 2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS), IEEE, 2020, pp. 1–5

[99] C. Muranda, A. Ali, and T. Shongwe, Deep learning method for detecting fraudulent motor insurance claims using unbalanced data, in 2021 62nd International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS), IEEE, 2021, pp. 1–5.

[100] S. Najjar-Ghabel, S. Yousefi, and P. Habibi, Comparative analysis and practical implementation of machine learning algorithms for phishing website detection, in 2024 9th International Conference on Computer Science and Engineering (UBMK), IEEE, 2024, pp. 1–6.

[101] M. Nallakaruppan, B. Balusamy, M. L. Shri, V. Malathi, and S. Bhattacharyya, An explainable ai framework for credit evaluation and analysis, Applied Soft Computing, 153 (2024), p. 111307.

[102] E. W. Ngai, Y. Hu, Y. H. Wong, Y. Chen, and X. Sun, The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature, Decision support systems, 50 (2011), pp. 559–569.

[103] T. T. Nguyen, T. C. Phan, H. T. Pham, T. T. Nguyen, J. Jo, and Q. V. H. Nguyen, Example-based explanations for streaming fraud detection on graphs, Information Sciences, 621 (2023), pp. 319–340.

[104] S. N. Nobel, S. Sultana, S. P. Singha, S. Chaki, M. J. N. Mahi, T. Jan, A. Barros, and M. Whaiduzzaman, Unmasking banking fraud: Unleashing the power of machine learning and explainable AI (XAI) on imbalanced data, Information, 15 (2024), p. 298.

[105] S. J. Omar, K. Fred, and K. K. Swaib, A state-of-the-art review of machine learning techniques for fraud detection research, in Proceedings of the 2018 International Conference on Software Engineering in Africa, 2018, pp. 11–19.

[106] S. Onishi, M. Nishimura, R. Fujimura, and Y. Hayashi, Why do tree ensemble approximators not outperform the recursive-rule extraction algorithm?, Machine Learning and Knowledge Extraction, 6 (2024), pp. 658–678.

[107] E. Owens, B. Sheehan, M. Mullins, M. Cunneen, J. Ressel, and G. Castignani, Explainable artificial intelligence (xai) in insurance, Risks, 10 (2022).

[108] J. Pacheco, J. Chela, and G. Salom´e, Fraud detection with machine learning: model comparison, Inter- national Journal of Business Intelligence and Data Mining, 22 (2023), pp. 434–450.

[109] S. Padhi and S. Panigrahi, Decision templates based ensemble classifiers for automobile insurance fraud detection, in 2019 Global Conference for Advancement in Technology (GCAT), IEEE, 2019, pp. 1–5.

[110] E. Parkar, S. Gite, S. Mishra, B. Pradhan, and A. Alamri, Comparative study of deep learning ex- plainability and causal ai for fraud detection, International Journal on Smart Sensing and Intelligent Systems, 17 (2024).

[111] D. K. Patel and S. Subudhi, Application of extreme learning machine in detecting auto insurance fraud, in 2019 International Conference on Applied Machine Learning (ICAML), IEEE, 2019, pp. 78–81.

[112] J. M. P´erez, J. Muguerza, O. Arbelaitz, I. Gurrutxaga, and J. I. Mart´ın, Consolidated tree classifier learning in a car insurance fraud detection domain with class imbalance, in Pattern Recognition and Data Mining: Third International Conference on Advances in Pattern Recognition, ICAPR 2005, Bath, UK, August 22-25, 2005, Proceedings, Part I 3, Springer, 2005, pp. 381–389.

[113] C. Phua, D. Alahakoon, and V. Lee, Minority report in fraud detection: Classification of skewed data, SIGKDD Explor. Newsl., 6 (2004), pp. 50–59.

[114] V. Pillai, Enhancing transparency and understanding in ai decision-making processes, Iconic Research and Engineering Journals, 8 (2024), pp. 168–172.

[115] S. O. Pinto and V. A. Sobreiro, Literature review: Anomaly detection approaches on digital business financial systems, Digital Business, (2022), p. 100038.

[116] T. Pourhabibi, K.-L. Ong, B. H. Kam, and Y. L. Boo, Fraud detection: A systematic literature review of graph-based anomaly detection approaches, Decision Support Systems, 133 (2020), p. 113303.

[117] I. M. N. Prasasti, A. Dhini, and E. Laoh, Automobile insurance fraud detection using supervised classi- fiers, in 2020 International Workshop on Big Data and Information Security (IWBIS), IEEE, 2020, pp. 47–52.

[118] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin, Catboost: unbiased boosting with categorical features, Advances in neural information processing systems, 31 (2018).

[119] L. R. Rabiner, A tutorial on hidden markov models and selected applications in speech recognition, Proceed- ings of the IEEE, 77 (1989), pp. 257–286.

[120] B. Raufi, C. Finnegan, and L. Longo, A comparative analysis of shap, lime, anchors, and dice for inter- preting a dense neural network in credit card fraud detection, in World Conference on Explainable Artificial Intelligence, Springer, 2024, pp. 365–383.

[121] M. T. Ribeiro, S. Singh, and C. Guestrin, “why should i trust you?”: Explaining the predictions of any classifier, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA, 2016, Association for Computing Machinery, p. 1135–1144.

[122] , Anchors: High-precision model-agnostic explanations, in Proceedings of the Thirty-Second AAAI Con- ference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, vol. 32, AAAI Press, 2018, pp. 1527–1535.

[123] M. Sabuhi, M. Zhou, C.-P. Bezemer, and P. Musilek, Applications of generative adversarial networks in anomaly detection: a systematic literature review, Ieee Access, 9 (2021), pp. 161003–161029.

[124] G. Saldamli, V. Reddy, K. S. Bojja, M. K. Gururaja, Y. Doddaveerappa, and L. Tawalbeh, Health care insurance fraud detection using blockchain, in 2020 seventh international conference on software defined systems (SDS), IEEE, 2020, pp. 145–152.

[125] Z. Salekshahrezaee, J. L. Leevy, and T. M. Khoshgoftaar, The effect of feature extraction and data sampling on credit card fraud detection, Journal of Big Data, 10 (2023), p. 6.

[126] M. Salmi and D. Atif, A data mining approach for imbalanced automobile insurance fraud data with evaluation of two sampling techniques and two filters, Journal of Information Assurance and Security, 17 (2022), pp. 122–135.

[127] B. K. Sethi, P. K. Sarangi, and A. S. Aashrith, Medical insurance fraud detection based on block chain and machine learning approach, in 2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), IEEE, 2022, pp. 1–4.

[128] B. K. Sethi, D. Singh, and P. K. Sarangi, Medical insurance fraud detection based on block chain and deep learning approach, in 2022 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON), vol. 2, IEEE, 2022, pp. 103–106.

[129] M. K. Severino and Y. Peng, Machine learning algorithms for fraud prediction in property insurance: Empirical evidence using real-world microdata, Machine Learning with Applications, 5 (2021), p. 100074.

[130] Z. Shaeiri and S. Kazemitabar, Fast unsupervised automobile insurance fraud detection based on spectral ranking of anomalies, International Journal of Engineering, 33 (2020), pp. 1240–1248.

[131] A. Shahapurkar and R. Patil, Concept drift and machine learning model for detecting fraudulent trans- actions in streaming environment., International Journal of Electrical & Computer Engineering (2088-8708), 13 (2023).

[132] N. Shaik, N. R. Kar, B. Thankachan, A. K. Pathak, J. Singh, and S. Gupta, Utilizing blockchain and deep learning for decentralized discovery of deceptive practices in healthcare insurance, in 2023 3rd In- ternational Conference on Technological Advancements in Computational Sciences (ICTACS), IEEE, 2023, pp. 445–450.

[133] S. K. Shamitha and V. Ilango, Importance of self-learning algorithms for fraud detection under concept drift, in International Conference on Artificial Intelligence and Sustainable Engineering: Select Proceedings of AISE 2020, Volume 2, Springer, 2022, pp. 343–354.

[134] W. Siblini, G. Coter, R. Fabry, L. He-Guelton, F. Obl´e, B. Lebichot, Y.-A. L. Borgne, and G. Bontempi, Transfer learning for credit card fraud detection: A journey from research to production, In The Proceedings of the Data Science and Advanced Analytics (DSAA 2021) IEEE conference, (2021).

[135] R. Singh, M. P. Ayyar, T. V. S. Pavan, S. Gosain, and R. R. Shah, Automating car insurance claims using deep learning techniques, in 2019 IEEE fifth international conference on multimedia big data (BigMM), IEEE, 2019, pp. 199–207.

[136] A. Singla and H. Jangir, A comparative approach to predictive analytics with machine learning for fraud detection of realtime financial data, in 2020 International Conference on Emerging Trends in Communication, Control and Computing (ICONC3), IEEE, 2020, pp. 1–4.

[137] D. Sisodia and D. S. Sisodia, Feature space transformation of user-clicks and deep transfer learning framework for fraudulent publisher detection in online advertising, Applied Soft Computing, 125 (2022), p. 109142.

[138] , A transfer learning framework towards identifying behavioral changes of fraudulent publishers in pay- per-click model of online advertising for click fraud detection, Expert Systems with Applications, 232 (2023), p. 120922.

[139] A. Somasundaram and S. Reddy, Parallel and incremental credit card fraud detection model to handle concept drift and data imbalance, Neural Computing and Applications, 31 (2019), pp. 3–14.

[140] P. Sood, C. Sharma, S. Nijjer, and S. Sakhuja, Review the role of artificial intelligence in detecting and preventing financial fraud using natural language processing, International Journal of System Assurance Engineering and Management, (2023), pp. 1–16.

[141] E. Soufiane, S.-E. EL Baghdadi, A. Berrahou, A. Mesbah, and H. Berbia, Automobile insurance claims auditing: A comprehensive survey on handling awry datasets, in WITS 2020: Proceedings of the 6th International Conference on Wireless Technologies, Embedded, and Intelligent Systems, Springer, 2022, pp. 135–144.

[142] E. Strelcenia and S. Prakoonwit, Improving classification performance in credit card fraud detection by using new data augmentation, AI, 4 (2023), pp. 172–198.

[143] E. Strelcenia and S. Prakoonwit, A survey on gan techniques for data augmentation to address the imbalanced data issues in credit card fraud detection, Machine Learning and Knowledge Extraction, 5 (2023), pp. 304–329.

[144] L. Šubelj, Š. Furlan, and M. Bajec, An expert system for detecting automobile insurance fraud using social network analysis, Expert Systems with Applications, 38 (2011), pp. 1039–1052.

[145] S. Subudhi and S. Panigrahi, Effect of class imbalanceness in detecting automobile insurance fraud, in 2018 2nd International Conference on Data Science and Business Analytics (ICDSBA), IEEE, 2018, pp. 528–531.

[146] Y. Sun, L. Lan, X. Zhao, M. Fan, Q. Guo, and C. Li, Selective multi-source transfer learning with wasserstein domain distance for financial fraud detection, in Intelligent Computing and Block Chain: First BenchCouncil International Federated Conferences, FICC 2020, Qingdao, China, October 30–November 3, 2020, Revised Selected Papers 1, Springer, 2021, pp. 489–505.

[147] I. Tomek, Two modifications of cnn, IEEE Transactions on Systems, Man, and Cybernetics, SMC-6 (1976), pp. 769–772.

[148] A. Tsymbal, The problem of concept drift: definitions and related work, Computer Science Department, Trinity College Dublin, 106 (2004), p. 58.

[149] F. J. Valverde-Albacete, J. Carrillo-de Albornoz, and C. Peláez-Moreno, A proposal for new evaluation metrics and result visualization technique for sentiment analysis tasks, in Information Access Evaluation. Multilinguality, Multimodality, and Visualization: 4th International Conference of the CLEF Initiative, CLEF 2013, Valencia, Spain, September 23-26, 2013. Proceedings 4, Springer, 2013, pp. 41–52.

[150] R. Van Belle, B. Baesens, and J. De Weerdt, Catchm: A novel network-based credit card fraud detection method using node representation learning, Decision Support Systems, 164 (2023), p. 113866.

[151] I. Vorobyev, Fraud risk assessment in car insurance using claims graph features in machine learning, Expert Systems with Applications, 251 (2024), p. 124109.

[152] H. Wang and Z. Abraham, Concept drift detection for streaming data, in 2015 international joint conference on neural networks (IJCNN), IEEE, 2015, pp. 1–9.

[153] H. Wang, J. Zheng, I. E. Carvajal-Roca, L. Chen, and M. Bai, Financial fraud detection based on deep learning: Towards large-scale pre-training transformer models, in China Conference on Knowledge Graph and Semantic Computing, Springer, 2023, pp. 163–177.

[154] S.-C. Wang, Artificial neural network, Interdisciplinary computing in java programming, (2003), pp. 81–100.

[155] X. Wang, Z. Liu, J. Liu, and J. Liu, Fraud detection on multi-relation graphs via imbalanced and interactive learning, Information Sciences, 642 (2023), p. 119153.

[156] Y. Wang and W. Xu, Leveraging deep learning with lda-based text analytics to detect automobile insurance fraud, Decision Support Systems, 105 (2018), pp. 87–95.

[157] Z. Wang, X. Chen, Y. Wu, L. Jiang, S. Lin, and G. Qiu, A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud, Scientific Reports, 15 (2025), p. 218.

[158] J. West and M. Bhattacharya, Intelligent financial fraud detection: a comprehensive review, Computers & security, 57 (2016), pp. 47–66.

[159] B. Wu, K.-M. Chao, and Y. Li, Heterogeneous graph neural networks for fraud detection and explanation in supply chain finance, Information Systems, 121 (2024), p. 102335.

[160] B. Wu, X. Yao, B. Zhang, K.-M. Chao, and Y. Li, Splitgnn: Spectral graph neural network for fraud detection against heterophily, in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 2737–2746.

[161] J. Wu, R. Hu, D. Li, L. Ren, W. Hu, and Y. Zang, A gnn-based fraud detector with dual resistance to graph disassortativity and imbalance, Information Sciences, 669 (2024), p. 120580.

[162] H. Xia, Y. Zhou, and Z. Zhang, Auto insurance fraud identification based on a cnn-lstm fusion deep learning model, International Journal of Ad Hoc and Ubiquitous Computing, 39 (2022), pp. 37–45.

[163] Z. Xu, X. Huang, Y. Zhao, Y. Dong, and J. Li, Contrastive attributed network anomaly detection with data augmentation, in Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2022, pp. 444–457.

[164] C. Yan, M. Li, W. Liu, and M. Qi, Improved adaptive genetic algorithm for the vehicle insurance fraud identification model based on a bp neural network, Theoretical Computer Science, 817 (2020), pp. 12–23.

[165] S. Yousefi, S. Najjar-Ghabel, and Z. S. Owaid, A supervised learning-based framework for failure detection in toy cars using acoustic signal analysis, in 2025 IEEE 7th Symposium on Computers & Informatics (ISCI), IEEE, 2025, pp. 76–81.

[166] B. Yousefimehr, Car-claims-compression. https://github.com/behnamy2010/Car-Claims-Compression, 2024. Accessed: 2024-08-29.

[167] B. Yousefimehr and M. Ghatee, A distribution-preserving method for resampling combined with lightgbm-lstm for sequence-wise fraud detection in credit card transactions, Expert Systems with Applications, 262 (2025), p. 125661.

[168] B. Yousefimehr, M. Ghatee, and A. Heydari, Improving adhd detection with cost-sensitive lightgbm, in 2024 14th International Conference on Computer and Knowledge Engineering (ICCKE), IEEE, 2024, pp. 109–113.

[169] B. Yousefimehr, M. Ghatee, and R. Razavi-Far, Multi-teacher knowledge distillation framework for lightweight anomaly detection, Neural Networks, (2025), p. 108267.

[170] B. Yousefimehr, M. Ghatee, M. A. Seifi, J. Fazli, S. Tavakoli, Z. Rafei, S. Ghaffari, A. Nikahd, M. R. Gandomani, A. Orouji, et al., Data balancing strategies: A survey of resampling and augmentation methods, arXiv preprint arXiv:2505.13518, (2025).

[171] X. Zhang, Y. Han, W. Xu, and Q. Wang, Hoba: A novel feature engineering methodology for credit card fraud detection with a deep learning architecture, Information Sciences, 557 (2021), pp. 302–316.

[172] P. Zheng, Dynamic Fraud Detection via Sequential Modeling, University of Arkansas, 2020.

[173] M. Zhong, Y. Wang, J. Yan, Y. Cheng, and P. Sun, Transformer-based comparative multi-view illegal transaction detection, Plos one, 18 (2023), p. e0276495.

[174] I. Žliobaitė, M. Pechenizkiy, and J. Gama, An overview of concept drift applications, Big data analysis: new algorithms for a new society, (2016), pp. 91–114.

AUT Journal of Mathematics and Computing

Volume 7, Issue 1
January 2026
Pages 85-116
