Publication
Big Data 2019
Conference paper

Kafka: The Database Inverted, but Not Garbled or Compromised

Abstract

The Kafka streaming platform has at its heart a distributed commit log. This resembles the change log that exists in every relational database system. It has been suggested that Kafka be viewed not just as a messaging system, but as the core of a database. The database is in effect 'turned inside out': the normally hidden change log becomes the first-class entity of the system, while what are normally considered primary, i.e. the tables, views, indexes, etc., are merely derived from this log. This is appealing as a vision, but raises challenges when applied within an actual enterprise system. The challenges arise from the conflicting interests and requirements of analytics and transactional systems. Running everything on a single system leads to tradeoffs; our intention here is to identify some of the practical problems with using Kafka as a single data store within an enterprise and to describe our initial approach to resolving them. In particular, we present preliminary approaches to ensuring consistency and coherence of data from multiple database tables when distributed over Kafka, and to addressing compliance by encrypting and decrypting data at the Kafka producers and consumers.
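The 'inside out' idea, in which a table is only a view derived from the change log, can be illustrated with a minimal sketch. It is not taken from the paper and does not use the Kafka API: the record shape (a key paired with a value, where a value of None marks a delete), the function name materialize, and the sample data are all hypothetical, chosen only to show a log being replayed into a table.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical change-log record: (key, value). A value of None marks a delete.
ChangeRecord = Tuple[str, Optional[Dict[str, Any]]]


def materialize(log: List[ChangeRecord]) -> Dict[str, Dict[str, Any]]:
    """Replay the log in order to derive the current state of a table."""
    table: Dict[str, Dict[str, Any]] = {}
    for key, value in log:
        if value is None:
            # Delete marker: drop the row if it exists.
            table.pop(key, None)
        else:
            # Insert or update: the latest record for a key wins.
            table[key] = value
    return table


# Illustrative log (made-up data): two inserts, one update, one delete.
log: List[ChangeRecord] = [
    ("user:1", {"name": "Ada"}),
    ("user:2", {"name": "Bob"}),
    ("user:1", {"name": "Ada L."}),
    ("user:2", None),
]

table = materialize(log)
```

Because the table is a pure function of the log, it can be discarded and rebuilt at any time by replaying the log from the beginning, which is the sense in which the log, rather than the table, is the primary entity.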