SCAD regression model selection with information criteria for multivariate response models

Document Type: Original Article

Authors

Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran

Abstract

This paper proposes an objective function for smoothly clipped absolute deviation (SCAD) regression models with multivariate responses. The objective function is built on the log-likelihood of a multivariate normal distribution rather than on the $L_2$ norm. Because the SCAD penalty depends on a tuning parameter, information criteria suited to the proposed model are presented for selecting it. The consistency of the proposed information criteria is examined through simulation experiments. Finally, the best-performing criterion is identified using simulated and real datasets.
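For readers unfamiliar with the penalty named in the abstract, the following is a minimal sketch of the scalar SCAD penalty in its standard form from Fan and Li (2001). It is not the paper's multivariate objective function or any of its information criteria. The function name `scad_penalty` and the default `a = 3.7` (the value commonly suggested in the literature) are assumptions made for illustration.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Standard scalar SCAD penalty p_lam(theta), assuming lam >= 0 and a > 2.

    Illustrative sketch only: the name and the default a = 3.7 are
    assumptions, not taken from the paper.
    """
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same linear penalty as the lasso.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic blend that tapers the slope.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    # lam is the tuning parameter that an information criterion would select.
    for theta in (0.5, 2.0, 10.0):
        print(theta, scad_penalty(theta, lam=1.0))
```

The three branches join continuously at $|\theta| = \lambda$ and $|\theta| = a\lambda$, so the penalty is linear near zero and flat for large coefficients; the tuning parameter $\lambda$ sets where those transitions occur.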


