A Pluggable Edge-Processing Pipeline for SysFlow
Abstract
SysFlow is a compact open data format that lifts the representation of system activities into a flow-centric, object-relational mapping. It records how applications interact with their environment, relating processes to file accesses, network activities, and runtime information. The telemetry format encodes both single-event and volumetric flow representations and links these entities to provide context for analytics and provenance. SysFlow drastically reduces endpoint event collection rates and lifts events into behaviors. This supports forensic applications and more comprehensive analysis approaches, such as cyber threat hunting, big data analytics, and visualization.

This talk will introduce a new stream-processing and edge-analytics pipeline for SysFlow. The pipeline is implemented as a multi-threaded, pluggable framework that enables custom analytics on SysFlow data streams. It can enrich those streams with contextual information such as cluster metadata, container configurations, and application logs. It also includes an extensible policy engine that can monitor and enforce reference policies on cloud workloads and trigger remediations. We will describe the design and open-sourcing of the pipeline and demonstrate threat-identification use cases built on runtime reference policies. We will also demonstrate the pipeline's custom analytic capabilities with a graph-based streaming analytic that uses process graphlets to uncover security-relevant application behaviors for threat hunting, forensics, and the contextual representation of security alerts.
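The pipeline is described as a multi-threaded, pluggable framework in which enrichment and policy stages operate on a stream of records. The following is a minimal Python sketch of that general architecture, under stated assumptions. Records are plain dictionaries rather than real SysFlow objects. The names `Plugin`, `ContainerEnricher`, `PolicyEngine`, and `run_pipeline` and the fields `exe` and `container_id` are all hypothetical. None of this reflects the project's actual API, plugin interface, or policy language. Each plugin runs in its own thread, and adjacent plugins are connected by bounded queues.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical, simplified record type: a plain dict standing in for a SysFlow record.
Record = Dict[str, object]

# Sentinel that marks the end of the stream as it flows through the stages.
_END = object()


class Plugin:
    """Hypothetical plugin interface: return a (possibly modified) record, or None to drop it."""

    def process(self, record: Record) -> Optional[Record]:
        raise NotImplementedError


class ContainerEnricher(Plugin):
    """Adds container metadata, looked up by a hypothetical 'container_id' field."""

    def __init__(self, containers: Dict[str, Dict[str, str]]) -> None:
        self.containers = containers

    def process(self, record: Record) -> Optional[Record]:
        meta = self.containers.get(record.get("container_id"), {})
        return {**record, "container": meta}


class PolicyEngine(Plugin):
    """Toy reference policy: only allow-listed executables may run; violations trigger remediation."""

    def __init__(self, allowed_exes: Iterable[str], remediate: Callable[[Record], None]) -> None:
        self.allowed_exes = set(allowed_exes)
        self.remediate = remediate

    def process(self, record: Record) -> Optional[Record]:
        if record.get("exe") not in self.allowed_exes:
            record = {**record, "alert": "unexpected executable"}
            self.remediate(record)  # e.g. notify an operator or kill the process
        return record


def _run_stage(plugin: Plugin, inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Each plugin runs in its own thread: read from the upstream queue, write downstream.
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # propagate end-of-stream to the next stage
            return
        result = plugin.process(item)
        if result is not None:  # None means the plugin filtered the record out
            outbox.put(result)


def run_pipeline(records: Iterable[Record], plugins: List[Plugin], buffer: int = 64) -> List[Record]:
    # One bounded queue feeds each plugin, plus a final queue drained by the caller.
    queues = [queue.Queue(maxsize=buffer) for _ in range(len(plugins) + 1)]

    def feed() -> None:
        # Source stage: push input records, then the end-of-stream marker.
        for r in records:
            queues[0].put(r)
        queues[0].put(_END)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=_run_stage, args=(p, queues[i], queues[i + 1]))
        for i, p in enumerate(plugins)
    ]
    for t in threads:
        t.start()

    # Sink: collect results until the end-of-stream marker arrives.
    results: List[Record] = []
    while (item := queues[-1].get()) is not _END:
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    alerts: List[Record] = []
    events = [
        {"pid": 101, "exe": "/usr/bin/python3", "container_id": "c1"},
        {"pid": 102, "exe": "/bin/nc", "container_id": "c1"},
    ]
    pipeline = [
        ContainerEnricher({"c1": {"name": "web", "image": "example/web:1.0"}}),
        PolicyEngine(["/usr/bin/python3"], alerts.append),
    ]
    for rec in run_pipeline(events, pipeline):
        print(rec)
    print(f"{len(alerts)} alert(s) raised")
```

Bounded queues let every stage run concurrently while capping memory when a downstream stage falls behind. Because each stage is a single thread reading a FIFO queue, records also leave the pipeline in arrival order. Adding a new analytic in this sketch only means writing another `Plugin` subclass and inserting it into the list.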