BMJ Open

Development of a machine learning-based prediction model for extremely rapid decline in estimated glomerular filtration rate in patients with chronic kidney disease: a retrospective cohort study using a large data set from a hospital in Japan

Objectives Trajectories of estimated glomerular filtration rate (eGFR) decline vary widely among patients with chronic kidney disease (CKD), and it is clinically important to identify patients at high risk of eGFR decline. We aimed to identify clusters of patients with extremely rapid eGFR decline and to develop a prediction model using a machine learning approach.

Design Retrospective single-centre cohort study.

Settings Tertiary referral university hospital in Toyoake city, Japan.

Participants A total of 5657 patients with CKD with baseline eGFR of 30 mL/min/1.73 m² and eGFR decline of ≥30% within 2 years.

Primary outcome The main outcome was extremely rapid eGFR decline. To study complicated eGFR behaviours, we first applied a variation of the group-based trajectory model, which finds trajectory clusters according to the slope of eGFR decline. The model identified high-level trajectory groups according to baseline eGFR values and, simultaneously, trajectory clusters within each group. For each group, we used the random forest algorithm with clinical parameters to develop prediction models that classified the steepest eGFR decline, defined as extremely rapid eGFR decline compared with others in the same group.

Results The clustering model first identified three high-level groups according to baseline eGFR (G1, high GFR, 99.7±19.0; G2, intermediate GFR, 62.9±10.3; and G3, low GFR, 43.7±7.8). It simultaneously found three eGFR trajectory clusters within each group, giving nine clusters with different slopes of eGFR decline. The areas under the curve for classifying extremely rapid eGFR decline in the G1, G2 and G3 groups were 0.69 (95% CI 0.63 to 0.76), 0.71 (95% CI 0.69 to 0.74) and 0.79 (95% CI 0.75 to 0.83), respectively. The random forest model identified haemoglobin, albumin and C reactive protein as important characteristics.

Conclusions The random forest model could be useful in identifying patients with extremely rapid eGFR decline.
Trial registration This study was registered with the UMIN Clinical Trials Registry (UMIN 000037476).
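
The results above report an area under the curve (AUC) with a 95% CI for each group. As an illustration of what those quantities mean, the following is a minimal sketch of how an AUC and a percentile bootstrap interval might be computed from predicted risk scores. The abstract does not state how the intervals were obtained, so the bootstrap approach, the function names `auc` and `bootstrap_auc_ci`, and the toy data are all assumptions made for this example, not the authors' method.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    (label 1) scores higher than a randomly chosen negative case
    (label 0), counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data only: 1 = extremely rapid decline, 0 = otherwise.
    toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60]
    print("AUC:", auc(toy_labels, toy_scores))
    print("95% CI:", bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500))
```

In practice the scores would come from a fitted classifier evaluated on held-out patients within each baseline eGFR group. The pairwise AUC here is quadratic in sample size, which is acceptable for a sketch but slow for thousands of patients.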