Development of Debiasing Technique for Lung Nodule Chest X-ray Datasets to Generalize Deep Learning Models
Abstract
Screening programs for early lung cancer diagnosis are uncommon, primarily due to the challenge of reaching at-risk patients located in rural areas far from medical facilities. To overcome this obstacle, a comprehensive approach is needed that combines mobility, low cost, speed, accuracy, and privacy. One potential solution lies in combining the chest X-ray imaging mode with federated deep learning, ensuring that no single data source can bias the model adversely. This study presents a pre-processing pipeline designed to debias chest X-ray images, thereby enhancing internal classification and external generalization. The pipeline employs a pruning mechanism to train a deep learning model for nodule detection, utilizing the most informative images from a publicly available lung nodule X-ray dataset. Histogram equalization is used to remove systematic differences in image brightness and contrast. Model training is then performed using combinations of lung field segmentation, close cropping, and rib/bone suppression. The resulting deep learning models, generated through this pre-processing pipeline, demonstrate successful generalization on an independent lung nodule dataset. By eliminating confounding variables in chest X-ray images and suppressing signal noise from the bone structures, the proposed deep learning lung nodule detection algorithm achieves an external generalization accuracy of 89%. This approach paves the way for the development of a low-cost and accessible deep learning-based clinical system for lung cancer screening.