Towards an Open Format for Scalable System Telemetry
Abstract
This paper presents a data representation for system behavior telemetry that supports scalable big data security analytics, giving telemetry consumers comprehensive visibility into workloads with reduced storage and processing overhead. The new abstraction, SysFlow, is a compact open data format that lifts the representation of system activities into a flow-centric, object-relational mapping. It records how applications interact with their environment, relating processes to file accesses, network activities, and runtime information. The format supports both single-event and volumetric flow representations of process control flows, file interactions, and network communications. Evaluation on enterprise-grade benchmarks shows that SysFlow facilitates deeper introspection into attack kill chains while yielding traces orders of magnitude smaller than those of current state-of-the-art system telemetry approaches. This drastically reduces storage requirements and enables feature-rich system analytics, process-level provenance tracking, and long-term data archival for cyber threat discovery and forensic analysis on historical data.
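To make the distinction between single-event and volumetric flow representations concrete, the following minimal sketch folds individual file operations into one summary record per (process, file) pair. It is an illustration only and does not reproduce the actual SysFlow schema: the names `FileEvent`, `FileFlow`, and `aggregate`, as well as every field shown, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FileEvent:
    """One raw file operation by a process (hypothetical shape, not the SysFlow schema)."""
    ts: int       # timestamp, arbitrary units
    pid: int      # process that performed the operation
    path: str     # file that was accessed
    op: str       # "read" or "write"
    nbytes: int   # bytes transferred by this single operation


@dataclass
class FileFlow:
    """Volumetric summary of all operations by one process on one file (hypothetical)."""
    pid: int
    path: str
    start_ts: int
    end_ts: int
    read_ops: int = 0
    write_ops: int = 0
    bytes_read: int = 0
    bytes_written: int = 0


def aggregate(events: Iterable[FileEvent]) -> List[FileFlow]:
    """Collapse per-operation events into one flow record per (pid, path)."""
    flows: Dict[Tuple[int, str], FileFlow] = {}
    for ev in events:
        key = (ev.pid, ev.path)
        flow = flows.get(key)
        if flow is None:
            # First event for this process/file pair opens a new flow.
            flow = FileFlow(ev.pid, ev.path, ev.ts, ev.ts)
            flows[key] = flow
        # Widen the flow's time window to cover this event.
        flow.start_ts = min(flow.start_ts, ev.ts)
        flow.end_ts = max(flow.end_ts, ev.ts)
        # Accumulate operation counts and byte volumes.
        if ev.op == "read":
            flow.read_ops += 1
            flow.bytes_read += ev.nbytes
        elif ev.op == "write":
            flow.write_ops += 1
            flow.bytes_written += ev.nbytes
    # Dicts preserve insertion order, so flows appear in first-seen order.
    return list(flows.values())
```

In this sketch, a process that issues thousands of reads against the same file yields a single `FileFlow` record rather than thousands of event records, which is the intuition behind the trace-size reduction claimed above. The link from a flow back to its process (here just `pid`) stands in for the object-relational mapping between processes and the resources they touch.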