Publication
S&P 2024
Conference paper

Understanding and Bridging the Gap Between Unsupervised Network Representation Learning and Security Analytics

Abstract

Cyber-attacks have become increasingly sophisticated, which in turn drives the development of security analytics that produce countermeasures by mining organizational logs, e.g., network and authentication logs. Graph security analytics (GSA) that model the complex communication patterns between users, hosts, and processes have been extensively developed and deployed. Among the techniques that power GSAs, Unsupervised Network Representation Learning (UNRL) is gaining traction: it learns a latent graph representation, i.e., node embeddings, and customizes it for different downstream tasks. UNRL-based GSAs have demonstrated prominent advantages, as UNRL trains a detection model in an unsupervised way and exempts model developers from feature engineering. In this paper, we revisit the designs of previous UNRL-based GSAs to understand how they perform in real-world settings. We find that their performance is questionable on large-scale, noisy log datasets such as the LANL authentication dataset, mainly because they follow the standard UNRL framework, which trains a generic model in an attack-agnostic way. We argue that generic attack characteristics should be considered, and propose ARGUS, a UNRL-based GSA with new encoder and decoder designs. ARGUS is also designed to work on discrete temporal graphs (DTGs) to exploit the graph's temporal dynamics. Our evaluation on two large-scale datasets, LANL and OpTC, shows that it can outperform state-of-the-art approaches by a large margin.

Date

20 May 2024
