GNNIC: Finding Long-Lost Sibling Functions with Abstract Similarity

Qiushi Wu; Zhongshu Gu; Hani Jamjoom; Kangjie Lu

NDSS 2024

Conference paper

26 Feb 2024


Abstract

Generating accurate call graphs for large programs, especially at the OS level, is a well-known challenge. The difficulty stems from the pervasive use of indirect calls, which defer the resolution of call targets until runtime to achieve polymorphism; as a result, compilers cannot statically determine indirect-call edges. Recent work has applied type analysis to match indirect-call targets globally across a program. However, these approaches still suffer from low precision when dealing with large programs or generic types. This paper presents GNNIC, a Graph Neural Network (GNN) based Indirect Call analyzer. GNNIC introduces a technique called abstract-similarity search, which aims to identify indirect-call targets in large programs with high precision. The approach builds on the observation that, although indirect-call targets may differ in their detailed polymorphic behaviors, they share similar abstract behaviors, such as function descriptions, data types, and invoked function calls. We consolidate this information into a representative abstraction graph (RAG) and employ GNNs to learn function embeddings. Because abstract-similarity search needs at least one anchor target to bootstrap, we also propose a new program analysis technique that locally identifies valid targets of each indirect call. Starting from these anchor targets, GNNIC expands the search scope to find more targets of each indirect call across the whole program. We implemented GNNIC using LLVM and GNNs and evaluated it on multiple OS kernels. The results show that GNNIC outperforms state-of-the-art type-based techniques, reducing false target functions by 86% to 93%. Moreover, the abstract similarity and precise call graphs generated by GNNIC can benefit security applications by discovering new bugs, alleviating path explosion, and improving the efficiency of static program analysis. Combining static analysis with GNNIC led to the discovery of 97 new bugs in the Linux and FreeBSD kernels.
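The abstract does not say how similarity between function embeddings is scored or how the search decides when a candidate counts as a target. As a minimal sketch, assuming the embeddings have already been produced (for example by a GNN over each function's RAG) and assuming cosine similarity against a fixed threshold decides membership, one round of anchor-based expansion could look like the following. The names `cosine`, `expand_targets`, and the `threshold` value are hypothetical illustrations, not GNNIC's actual interface or algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def expand_targets(embeddings, anchors, threshold=0.9):
    """Hypothetical one-round abstract-similarity expansion.

    embeddings: dict mapping function name -> embedding vector (assumed given).
    anchors: names already confirmed as targets of one indirect call site;
             every anchor must be a key of `embeddings`.
    Returns the anchors plus every other function whose best cosine similarity
    to any anchor is at least `threshold`.
    """
    if not anchors:
        # The search cannot start without at least one known target.
        raise ValueError("at least one anchor target is required to bootstrap")
    found = set(anchors)
    for name, vec in embeddings.items():
        if name in found:
            continue
        # Compare only against the original anchors (single expansion round).
        best = max(cosine(vec, embeddings[a]) for a in anchors)
        if best >= threshold:
            found.add(name)
    return found
```

For instance, with made-up embeddings where `xfs_read` lies close to the anchor `ext4_read` and `tcp_sendmsg` is orthogonal to it, `expand_targets(embeddings, ["ext4_read"])` would return `{"ext4_read", "xfs_read"}`. A fixed threshold keeps the sketch simple; a real analyzer might instead rank candidates or iterate the expansion using newly accepted targets as additional anchors.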
