Data broker: A case for workflow enablement using a key/value approach

Lars Schneidenbach; Bruce D’Amora; Claudia Misale; Carlos H.A. Costa; Sara Kokkila Schumacher; Thomas Ward

doi:10.1145/3357526.3357572

MEMSYS 2019

Conference paper

30 Sep 2019



Abstract

Complex problems can often be solved only by a workflow of different applications whose progress also depends on shared data. Developers face several challenges when sharing data in a workflow, because each application may have very different requirements for how it accesses and consumes shared data. For example, data access can be online or offline, and data sizes and types may vary, as may the frequency with which applications in the workflow produce and consume data. Moreover, producers and consumers may not run at the same time scale, which can introduce latency. The deployment system itself also plays an important role, especially with respect to low-latency read/write operations and reliability in case of failures. To facilitate the communication of information within a workflow, this paper presents the Data Broker, a programming model that helps applications share data in the form of named tuples and relies on namespaces to provide software-based data isolation. The client API provides access to the namespaces mainly through put/get primitives, complemented by namespace management functions and key browsing/querying. The programming model can support different backends for storing and retrieving data; this paper focuses on the Redis implementation. Using Redis as the backend gives the Data Broker the reliability and infrastructure that Redis provides, so users have access to a fast local or remote in-memory distributed key-value/object store configured as a set of independent instances or a coordinated cluster. On top of Redis, we implement data access functions, namespace management, asynchronous calls with client-side queues per server, and key-space browsing.
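
To make the namespace and put/get idea concrete, the following is a minimal sketch in Python. It is not the Data Broker client API: the class name `NamespaceStore`, its method names, and the in-memory dictionary backend are all hypothetical, chosen only to illustrate how namespaces can isolate keys and how key browsing might look. It also assumes that a put overwrites any earlier value under the same key, which the abstract does not specify.

```python
import fnmatch


class NamespaceStore:
    """Hypothetical in-memory stand-in for a namespaced key/value backend."""

    def __init__(self):
        # Maps namespace name -> {key: value}. Each namespace has its own
        # dictionary, so identical keys in different namespaces never collide.
        self._spaces = {}

    def create(self, namespace):
        """Create an empty namespace; fail if the name is already taken."""
        if namespace in self._spaces:
            raise KeyError(f"namespace {namespace!r} already exists")
        self._spaces[namespace] = {}

    def delete(self, namespace):
        """Remove a namespace together with everything stored in it."""
        del self._spaces[namespace]

    def put(self, namespace, key, value):
        """Store a value under a key (assumption: last write wins)."""
        self._spaces[namespace][key] = value

    def get(self, namespace, key):
        """Return the value stored under a key; raises KeyError if absent."""
        return self._spaces[namespace][key]

    def keys(self, namespace, pattern="*"):
        """Browse keys in a namespace that match a shell-style pattern."""
        names = self._spaces[namespace]
        return sorted(k for k in names if fnmatch.fnmatchcase(k, pattern))


# A producer and a consumer share data through a named namespace.
store = NamespaceStore()
store.create("simulation")
store.create("analysis")

store.put("simulation", "step/0001", [0.1, 0.2])
store.put("simulation", "step/0002", [0.3, 0.4])
store.put("analysis", "step/0001", "summary of step 1")

latest = store.get("simulation", "step/0002")
steps = store.keys("simulation", "step/*")
```

In this sketch the two namespaces hold the same key `step/0001` without interfering with each other, which is the isolation property the abstract attributes to namespaces. A real backend such as Redis would replace the dictionaries, and asynchronous calls would need per-server request queues that this sketch omits.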
