Praxi: Cloud Software Discovery That Learns From Practice
Abstract
With today's rapidly evolving cloud landscape embracing continuous integration and delivery, users of cloud systems must monitor software running on their containers and virtual machines (VMs) to ensure compliance, security, and efficiency. Traditional solutions to this problem rely on manually created rules that identify software installations and modifications, but these require expert authors and are often unmaintainable. Recently, automated techniques for software discovery have emerged. Some techniques use examples of software to train machine learning models to predict which software has been installed on a system. Others leverage knowledge of packaging practices to aid in discovery without requiring any pre-training, but these practice-based methods cannot provide information precise enough to perform discovery by themselves. This article introduces Praxi, a new software discovery method that builds upon the strengths of prior approaches by combining the accuracy of learning-based methods with the efficiency of practice-based methods. In tests using samples collected on real-world cloud systems, Praxi correctly classifies installations at least 97.6 percent of the time, while running 14.8 times faster and using 87 percent less disk space than a similar learning-based method. Using a diverse software dataset, this article quantitatively compares Praxi to systematic rule-, learning-, and practice-based methods, and discusses the best uses for each.