IBM J. Res. Dev

Real-Time understanding of humanitarian crises via targeted information retrieval

View publication


Humanitarian relief agencies must assess humanitarian crises occurring in the world to prioritize the aid that can be offered. While the rapidly growing availability of relevant information enables better decisions to be made, it also creates an important challenge: How to find, collect, and categorize this information in a timely manner. To address the problem, we propose a targeted retrieval system that automates these tasks. The system uses historical data collected and labeled by subject matter experts to train a classifier that identifies relevant content. Using this classifier, it deploys a focused crawler to locate and retrieve data at scale. The system also incorporates feedback from subject matter experts to adapt to new concepts and information sources. A novel component of the system is an algorithm for re-crawling that improves the crawler efficiency in retrieving recent data. Our preliminary result shows that the algorithm can increase the freshness of collected data while simultaneously decreasing crawling effort. Furthermore, we show that focused crawling outperforms general crawling in this domain. Our initial prototype has received positive feedback from analysts at the Assessment Capacities Project, a humanitarian response agency.