Shadow puppets: Cloud-level accurate AI inference at the speed and economy of edge
Extracting value from insights into unstructured data on the Internet of Things and Humans is a major trend in capitalizing on digitization. To date, the design space for AI inference on the edge has been binary: either consume cloud-based inference services through edge APIs, or run full-fledged deep models on edge devices. In this paper, we break this duality by proposing the Semantic Cache, an approach that blends the best features of both extremes of the current design space. Early evaluation results from a first prototype implementation of our semantic cache service on object classification tasks show a large reduction in inference latency compared to cloud-only inference, and high potential for achieving adequate accuracy across a range of AI use cases.