Unifying Metadata for Stream and Batch Queries in a Cloud Service
SQL interfaces have been built independently for big data stream processing systems and for batch processing systems. Semantically, however, there is an impedance mismatch between these two worlds: streaming systems favor evolution and change, while batch systems have historically been designed for write-once, read-many workloads. Schemas commonly evolve over time in a streaming system, and as a result the two kinds of systems have developed different approaches to storing metadata. We describe how we resolved one aspect of this semantic gap by storing and updating metadata in a single system, and by using this unified metadata structure to run streaming and batch queries within the same commercial cloud SQL service. The transparent unification of batch and stream metadata catalogs is, as far as we are aware, novel.