ClimateHub: Deep Search for Climate, Earth and Environmental Sciences
Abstract
ClimateHub extracts knowledge from the scientific climate literature to create a natural language data mining resource for the climate and Earth sciences community. Using IBM Deep Search's corpus extraction service, ClimateHub continuously retrieves documents from public data sources and converts them into machine-readable outputs. We mine over 200 million Semantic Scholar abstracts and 2 million arXiv publications to perform named entity and relationship recognition, and we use a climate ontology to build a climate knowledge graph representation. ClimateHub also offers climate queries for sophisticated searches over countries, lakes, rivers, provinces, and cities, enabling geo/hazard traversals. Through geospatial document exploration, we link geographic entities mentioned in documents to OpenStreetMap. We further apply climate hazard annotation and automated geo-annotation to more than 600,000 hydrohazard abstracts, and we improve geographic annotations for collections of climate and Earth science abstracts using the OpenStreetMap Nominatim APIs. ClimateHub provides a suite of discovery applications for searching Earth science and climate models and datasets. This work offers a resource for the climate and Earth science community that facilitates machine learning in the climate and geoinformatics domains using natural language datasets.
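As an illustration of the geo-annotation step, linking a place name found in an abstract to OpenStreetMap through Nominatim might look roughly like the sketch below. This is not ClimateHub's implementation: the function names `build_search_url` and `annotate`, the output record layout, and the sample response values are all hypothetical. Only the public Nominatim search endpoint, its `q`, `format`, and `limit` parameters, and the `osm_type`, `osm_id`, `lat`, `lon`, and `display_name` response fields are taken from the Nominatim API. The sketch parses a canned response rather than making a network call.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Public Nominatim free-text search endpoint.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place_name, limit=1):
    """Build a Nominatim search URL for one place name (hypothetical helper)."""
    params = {"q": place_name, "format": "jsonv2", "limit": limit}
    return NOMINATIM_SEARCH + "?" + urlencode(params)


def annotate(mention, response_text):
    """Attach the top Nominatim hit, if any, to an entity mention.

    The returned record layout is an assumption made for this sketch.
    """
    hits = json.loads(response_text)
    if not hits:
        # No match: keep the mention but leave it unlinked.
        return {"mention": mention, "osm": None}
    top = hits[0]
    return {
        "mention": mention,
        "osm": {
            "osm_type": top["osm_type"],
            "osm_id": top["osm_id"],
            # Nominatim returns coordinates as strings.
            "lat": float(top["lat"]),
            "lon": float(top["lon"]),
            "display_name": top["display_name"],
        },
    }


# Canned response in the shape Nominatim returns; the values are made up.
SAMPLE_RESPONSE = json.dumps([
    {
        "osm_type": "relation",
        "osm_id": 123456,
        "lat": "46.45",
        "lon": "6.55",
        "display_name": "Example Lake, Example Country",
    }
])

if __name__ == "__main__":
    print(build_search_url("Example Lake"))
    print(annotate("Example Lake", SAMPLE_RESPONSE))
```

In a real pipeline the URL would be fetched with an HTTP client that sends an identifying User-Agent and respects the public service's rate limits, and ambiguous place names would need disambiguation beyond taking the first hit.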