The Potential of Causal Modelling in Microservices-based Architectures
Abstract
Microservices-based architectures are increasingly adopted for their manageability, scalability, and flexibility. However, they can be complex, and high latencies can degrade their performance and lead to Service Level Objective (SLO) violations. To identify the causes of high latency in these architectures, we develop a causal modelling framework capable of analysing and predicting latency within a microservices-based architecture. To this end, we employ causal discovery to identify the underlying causal structure governing latency. Our model integrates domain knowledge to impose constraints on the causal graph, ensuring the relevance and accuracy of the discovered relationships. To validate our approach, we reconstruct latency metrics using machine learning techniques and demonstrate its effectiveness by accurately capturing the interrelations between microservices and their resources. Our framework provides an enhanced understanding of the causes of latency leading to SLO violations and paves the way for sophisticated mechanisms enabling proactive management of cloud resources.