CLOUD 2023

InsightsSumm: Summarization of ITOps Incidents through In-Context Prompt Engineering


AI has been extensively used to help Site Reliability Engineers (SREs) resolve faults in cloud services and applications. It shortens resolution time by helping SREs navigate the vast amount of heterogeneous data (logs, metrics, alerts, etc.) related to a fault. A good ITOps system should give SREs precise and meaningful insights so that they can quickly understand the data at hand. In this paper, we design a framework to summarize the context or insight present in the heterogeneous data related to a fault. The proposed framework constructs queries/prompts specific to the ITOps domain, which help us generate more insightful abstractive summaries using state-of-the-art text generation models. An initial study on simulated faults shows promising results, and the framework can be extended to accommodate other data types and to provide summaries for real-world cases.
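As a rough illustration of what constructing an in-context prompt from heterogeneous fault data could look like, the following is a minimal sketch and not the framework described in the paper. The `Incident` container, the `render_incident` and `build_prompt` helpers, the `ALERT`/`LOG`/`METRIC` line prefixes, and the instruction wording are all hypothetical names chosen for this example; the abstract does not specify the prompt format or the model used.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical container for the heterogeneous data tied to one fault."""
    alerts: list[str] = field(default_factory=list)
    logs: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)


def render_incident(incident: Incident, max_logs: int = 3) -> str:
    """Flatten alerts, logs, and metrics into one plain-text context block."""
    lines = [f"ALERT: {alert}" for alert in incident.alerts]
    # Keep only the first few log lines so the prompt stays short (assumed policy).
    lines += [f"LOG: {entry}" for entry in incident.logs[:max_logs]]
    lines += [f"METRIC: {name}={value}" for name, value in sorted(incident.metrics.items())]
    return "\n".join(lines)


def build_prompt(examples: list[tuple[Incident, str]], target: Incident) -> str:
    """Assemble an in-context prompt: worked examples first, then the new incident."""
    parts = ["Summarize the ITOps incident for a Site Reliability Engineer."]
    for example_incident, example_summary in examples:
        parts.append(
            f"Incident:\n{render_incident(example_incident)}\nSummary: {example_summary}"
        )
    # The final block leaves the summary empty for the text generator to complete.
    parts.append(f"Incident:\n{render_incident(target)}\nSummary:")
    return "\n\n".join(parts)
```

The resulting string would be passed to whichever text generation model is in use; placing one or more solved incidents ahead of the target is what makes the prompt in-context, since the model sees the expected summary style before completing the final `Summary:` line.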