LogInsights: Understanding and Extracting Information from Logs for Fast Fault Classification by Weak Supervision
Abstract
In many real-world applications, labeled training data for text classification is hard to come by. These tasks are often domain specific, with an input vocabulary that differs from that of general language. In this paper, we address one such task: automating a software monitoring system in which logs are analyzed in real time. We describe a weakly supervised method that processes incoming log streams to identify fault types. We propose hand-crafted feature extraction designed specifically for classifiers that take logs as input. To keep processing time low and to generalize across log sources, we rely on a weakly supervised fault classifier in which domain knowledge is incorporated through a word embedding model built on a domain-specific corpus. Experiments on logs obtained from various applications show the efficacy of our proposed method.
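To make the idea concrete, the following is a minimal sketch of embedding-based weak supervision for log lines. It is not the implementation evaluated in this paper. The masking patterns, the three-dimensional `TOY_EMBEDDINGS` table, the `SEED_WORDS` lists, the fault names, and the 0.5 threshold are all hypothetical stand-ins chosen for illustration; in practice the vectors would come from a word embedding model trained on a domain-specific log corpus.

```python
import math
import re

# Hypothetical masking rules: variable fields are replaced by placeholders so
# that lines from different log sources share a vocabulary.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-f]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def tokenize(line):
    """Lowercase a log line, mask variable fields, and split it into tokens."""
    text = line.lower()
    for pattern, placeholder in _MASKS:
        text = pattern.sub(placeholder, text)
    return re.findall(r"<\w+>|[a-z_]+", text)


# Toy vectors standing in for a domain-specific word embedding model.
TOY_EMBEDDINGS = {
    "timeout": (0.9, 0.1, 0.0),
    "timed": (0.8, 0.2, 0.0),
    "unreachable": (0.7, 0.3, 0.1),
    "memory": (0.0, 0.9, 0.1),
    "heap": (0.1, 0.8, 0.1),
    "oom": (0.0, 1.0, 0.0),
    "denied": (0.1, 0.0, 0.9),
    "permission": (0.0, 0.1, 0.9),
    "unauthorized": (0.1, 0.1, 0.8),
}

# Weak supervision signal: a few seed words per fault type, no labeled lines.
SEED_WORDS = {
    "network": ["timeout", "unreachable"],
    "memory": ["oom"],
    "auth": ["denied"],
}


def _mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(line, threshold=0.5):
    """Return the fault type whose seed words are closest to the line, or None."""
    known = [TOY_EMBEDDINGS[t] for t in tokenize(line) if t in TOY_EMBEDDINGS]
    if not known:
        return None  # no in-vocabulary token, so no evidence of a fault
    line_vec = _mean(known)
    scores = {
        fault: _cosine(line_vec, _mean([TOY_EMBEDDINGS[w] for w in seeds]))
        for fault, seeds in SEED_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    for example in (
        "Connection to 10.0.0.5 timed out after 30 s",
        "java.lang.OutOfMemoryError: Java heap space",
        "Service started on port 8080",
    ):
        print(example, "->", classify(example))
```

In this sketch a line is labeled by comparing its averaged token embedding with the averaged embedding of each fault type's seed words, so no labeled log lines are required. Lines with no in-vocabulary tokens, or with a best similarity below the threshold, are left unlabeled.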