Unsupervised feature selection by integration of regularized self-representation and sparse coding

Document Type : Original Article

Authors

1 Department of Computer Engineering, Sanandaj Branch, Islamic Azad University, Sanandaj, Iran

2 Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran

Abstract

With the growth of social networks and the Internet of Things, large datasets have become common. High-dimensional data often contain redundant and irrelevant features, which degrade the performance of machine learning methods. Feature selection is a common way to tackle this issue; it aims to choose a small subset of relevant and non-redundant features. Most existing feature selection methods target supervised applications and assume that class labels are available. In many real-world applications, however, complete knowledge of the class labels cannot be provided. To overcome this shortcoming, this paper proposes an unsupervised feature selection method. The proposed method uses a matrix factorization-based regularized self-representation model to weight features according to their importance, and the feature weights are initialized based on the correlation among features. Several experiments are performed to evaluate the effectiveness of the proposed method, and the results are compared with several baseline and state-of-the-art methods. The comparison shows the superiority of the proposed method in most cases.
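
As a rough illustration of feature weighting by regularized self-representation, the sketch below minimizes ||X - XW||_F^2 + lam * ||W||_{2,1} with a standard iteratively reweighted least-squares update and scores each feature by the norm of its row in W. It is not the algorithm proposed in the paper, whose details the abstract does not give: the squared loss, the solver, the function names `rsr_feature_scores` and `select_features`, the parameters `lam`, `n_iter`, and `eps`, and the use of the absolute Pearson correlation matrix as the initial W are all assumptions made for illustration.

```python
import numpy as np


def rsr_feature_scores(X, lam=1.0, n_iter=50, eps=1e-8):
    """Score features with an l2,1-regularized self-representation model.

    Hypothetical helper: each feature (column of X) is reconstructed from
    all features, X ~ X @ W, and the row norms of W serve as importances.
    Assumes X has no constant columns (needed for the correlation matrix).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)          # center each feature
    G = X.T @ X                     # d x d Gram matrix of the features

    # Assumption: start from the absolute feature-feature correlations.
    W = np.abs(np.corrcoef(X, rowvar=False))

    for _ in range(n_iter):
        # Reweighting step: D_ii = 1 / (2 * ||w_i||_2), smoothed by eps.
        row_norms = np.sqrt((W ** 2).sum(axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))
        # Closed-form minimizer of the reweighted quadratic problem.
        W = np.linalg.solve(G + lam * D, G)

    # A feature whose row of W is (near) zero is not used for reconstruction.
    return np.linalg.norm(W, axis=1)


def select_features(X, k, **kwargs):
    """Return the indices of the k highest-scoring features."""
    scores = rsr_feature_scores(X, **kwargs)
    return np.argsort(scores)[::-1][:k]
```

Calling `select_features(X, k)` on an n-by-d data matrix returns k column indices. In this simplified form, features with very small variance tend to receive near-zero scores because using them for reconstruction costs more in the penalty than it saves in the loss.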
