PAIRS (Re)Loaded: System Design and Benchmarking for Scalable Geospatial Applications

In this paper we benchmark a previously introduced big data platform that enables the analysis of large volumes of remote sensing and other geospatial-temporal data. The platform, called IBM PAIRS Geoscope, is built on open-source big data technologies (Hadoop/HBase) that are, in principle, scalable in storage and compute to hundreds of petabytes. PAIRS currently hosts multiple petabytes of curated, geospatial-temporally indexed data. It organizes all data as key-value pairs and performs analytics close to the data to minimize data movement.
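The abstract does not specify how PAIRS builds its keys. To illustrate why a key-value layout suits geospatial-temporal data, the following is a hypothetical sketch. It assumes a row key made of a Z-order (Morton) code of a lat/lon grid cell followed by a big-endian timestamp, a common pattern for HBase-style sorted stores. The functions `interleave_bits` and `make_row_key` and the `level` parameter are illustrative names, not part of the PAIRS API.

```python
from datetime import datetime, timezone


def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z


def make_row_key(lat: float, lon: float, ts: datetime, level: int = 20) -> bytes:
    """Hypothetical row key: Morton-coded grid cell followed by a Unix timestamp."""
    cells = 1 << level  # grid cells per axis at this (assumed) resolution level
    # Map lat/lon onto integer grid indices in [0, cells - 1].
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    spatial = interleave_bits(col, row, level)
    seconds = int(ts.timestamp())  # assumes timestamps at or after 1970
    # Fixed-width big-endian encoding, so byte order matches numeric order.
    spatial_bytes = spatial.to_bytes((2 * level + 7) // 8, "big")
    return spatial_bytes + seconds.to_bytes(8, "big")


# Usage: two observations of the same pixel on consecutive days.
day1 = make_row_key(41.0, -73.7, datetime(2020, 6, 1, tzinfo=timezone.utc))
day2 = make_row_key(41.0, -73.7, datetime(2020, 6, 2, tzinfo=timezone.utc))
```

Because the spatial prefix comes first, all timestamps for one grid cell sort next to each other. Nearby cells also tend to share key prefixes. A bounded range scan over the sorted keys can therefore retrieve a spatial-temporal subset without moving unrelated data.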