The Role of Data-Driven Discovery in Detecting Vulnerable Sub-populations
Disciplined, data-driven discovery has an important role in identifying vulnerable populations. We summarise three recent projects that applied techniques from anomalous pattern detection to automatically identify sub-populations with higher (or lower) rates of outcomes such as child mortality. This type of exploratory analysis can be viewed as complementing human-driven confirmatory analysis. Scanning for vulnerable sub-populations that have anomalously high (or low) outcomes can be done directly on the data as a form of stratification. Alternatively, black-box prediction models can be scanned for predictive bias, where the observed outcomes of a sub-population are much higher than predicted. In either form, subset scanning is a tool for better understanding data at a sub-population level rather than at aggregate or individual levels.
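To make the two scanning modes concrete, the following is a minimal sketch rather than the method used in the summarised projects. It assumes a brute-force search over sub-populations defined by fixing the values of at most two categorical features, scored with an expectation-based Poisson log-likelihood ratio; practical subset-scanning methods search far larger spaces with more efficient algorithms. The function names, the field names (`region`, `water`, `died`) and the records are all hypothetical and synthetic. With `predicted=None` the expected count comes from the overall outcome rate (scanning the data directly); passing the name of a field holding model-predicted probabilities instead scans a model for under-prediction.

```python
from itertools import combinations
from math import log


def poisson_score(observed, expected):
    """Expectation-based Poisson log-likelihood ratio.

    Returns 0 unless the observed count exceeds the expected count.
    """
    if expected <= 0 or observed <= expected:
        return 0.0
    return observed * log(observed / expected) - (observed - expected)


def scan_subpopulations(records, features, outcome, predicted=None, max_features=2):
    """Return the highest-scoring sub-population (brute force, illustrative only)."""
    if predicted is None:
        # Mode 1: scan the data directly against the overall outcome rate.
        base_rate = sum(r[outcome] for r in records) / len(records)
        expectations = [base_rate] * len(records)
    else:
        # Mode 2: scan a model's predictions for under-predicted groups.
        expectations = [r[predicted] for r in records]

    best = {"score": 0.0, "subgroup": None, "observed": 0, "expected": 0.0, "n": 0}
    for k in range(1, max_features + 1):
        for feats in combinations(features, k):
            # Accumulate observed count, expected count and size per subgroup.
            totals = {}
            for r, e in zip(records, expectations):
                key = tuple(r[f] for f in feats)
                o_sum, e_sum, n = totals.get(key, (0, 0.0, 0))
                totals[key] = (o_sum + r[outcome], e_sum + e, n + 1)
            for key, (o_sum, e_sum, n) in totals.items():
                score = poisson_score(o_sum, e_sum)
                if score > best["score"]:
                    best = {
                        "score": score,
                        "subgroup": dict(zip(feats, key)),
                        "observed": o_sum,
                        "expected": e_sum,
                        "n": n,
                    }
    return best


# Synthetic records (made up for illustration): 50 per cell, with the
# outcome concentrated in one region/water-source combination.
records = []
for region in ("north", "south"):
    for water in ("piped", "well"):
        deaths = 20 if (region, water) == ("south", "well") else 2
        for i in range(50):
            records.append(
                {"region": region, "water": water, "died": 1 if i < deaths else 0}
            )

result = scan_subpopulations(records, ["region", "water"], "died")
print(result["subgroup"], result["observed"], result["n"])
```

The score is one-sided, so this sketch only flags sub-populations with more outcomes than expected; detecting anomalously low rates would need the mirrored condition. Because the search maximises over many candidate groups, a high score is a prompt for confirmatory follow-up rather than a finding in itself.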