A systematic survey and empirical comparison of hybrid methods for imbalanced fraud detection: Combining resampling and machine learning

Document Type : Review Article

Authors

1 Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran

2 Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic)

Abstract

Accurate identification of fraudulent activity has long been a focus of computational research, producing methods that range from traditional statistical tests to advanced machine learning and deep learning models. A persistent challenge for all of these approaches is the class imbalance inherent in most real-world fraud datasets: genuine transactions vastly outnumber fraudulent ones, which often biases models toward the majority class. To mitigate this issue, a promising paradigm has emerged in the form of hybrid frameworks that combine data resampling techniques with robust machine learning algorithms. These frameworks are particularly valuable for their potential to support accurate, real-time automated detection systems. This survey examines the efficacy and impact of such hybrid techniques in fraud detection. To evaluate their performance quantitatively, we conduct a numerical study using auto insurance fraud as a case study. Using the Car fraud datasets, we compare various detection algorithms, each coupled with different resampling methods. Our empirical results show that the performance of each fraud detection algorithm depends strongly on the resampling strategy employed, which highlights the need to select methods carefully according to the characteristics of the dataset. The analysis code is available at https://github.com/behnamy2010/Car-Claims-Compression.
