Predictive modeling of 30-day readmission risk of diabetes patients by logistic regression, artificial neural network, and EasyEnsemble
Xiayu Xiang1, Chuanyi Liu2, Yanchun Zhang3, Wei Xiang4, Binxing Fang5
1 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
2 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
3 Institute for Sustainable Industries and Liveable Cities, Victoria University, Melbourne, Australia
4 Key Laboratory of Tropical Translational Medicine of Ministry of Education; NHC Key Laboratory of Control of Tropical Diseases, Hainan Medical University, Haikou, China
5 Cyberspace Security Research Center, Peng Cheng Laboratory, Shenzhen, China
Correspondence Address:
Binxing Fang, Cyberspace Security Research Center, Peng Cheng Laboratory, Shenzhen, China
Wei Xiang, Key Laboratory of Tropical Translational Medicine of Ministry of Education; NHC Key Laboratory of Control of Tropical Diseases, Hainan Medical University, Haikou, China
Source of Support: This work was supported in part by the Key Research and Development Program for Guangdong Province (No. 2019B010136001), in part by Hainan Major Science and Technology Projects (No. ZDKJ2019010), in part by the National Key Research and Development Program of China (No. 2016YFB0800803 and No. 2018YFB1004005), in part by the National Natural Science Foundation of China (No. 81960565, No. 81260139, No. 81060073, No. 81560275, No. 61562021, No. 30560161 and No. 61872110), in part by Hainan Special Projects of Social Development (No. ZDYF2018103 and No. 2015SF 39), and in part by the Hainan Association for Academic Excellence Youth Science and Technology Innovation Program (No. 201515).
Conflict of Interest: None
DOI: 10.4103/1995-7645.326254
Objective: To determine the most influential data features and to develop machine learning approaches that best predict hospital readmissions among patients with diabetes.
Methods: In this retrospective cohort study, we surveyed patient statistics and performed feature analysis to identify the most influential data features associated with readmissions. All-cause, 30-day readmission outcomes were classified using logistic regression, an artificial neural network, and EasyEnsemble. The F1 statistic, sensitivity, and positive predictive value were used to evaluate model performance.
Results: We identified the 14 most influential data features (4 numeric and 10 categorical) and evaluated 3 machine learning models with several sampling methods (oversampling, undersampling, and hybrid techniques). The deep learning model offered no improvement over the traditional models (logistic regression and EasyEnsemble) for predicting readmission, whereas the other two algorithms showed much smaller differences in performance between the training and testing datasets.
Conclusions: Machine learning approaches applied to electronic health record data offer a promising method for improving readmission prediction in patients with diabetes. However, more work is needed to construct datasets with clinical variables beyond the standard risk factors and to fine-tune and optimize the machine learning models.
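The three evaluation measures named in the Methods can be illustrated with a minimal sketch. It assumes binary labels in which 1 marks a 30-day readmission and 0 marks no readmission. The function names and the toy labels are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives.

    Assumption: labels are 0/1, with 1 meaning "readmitted within 30 days".
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def readmission_metrics(y_true, y_pred):
    """Return (sensitivity, positive predictive value, F1)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    # Sensitivity (recall): share of actual readmissions that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Positive predictive value (precision): share of flagged cases
    # that were actual readmissions.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of sensitivity and positive predictive value.
    total = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f1


# Toy, made-up labels for illustration only (not study data).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
sens, ppv, f1 = readmission_metrics(toy_true, toy_pred)
```

None of these measures counts true negatives, which is one reason they are often preferred over plain accuracy when the positive class (here, readmission) is rare.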