Big Data 2022
Workshop paper

Natural Language Interface for Process Mining Queries in Healthcare


The data needed for analysis is becoming increasingly diverse, and research on data extraction and analysis methods has continued in order to meet these varied needs. Process mining analyzes the system logs kept by companies or healthcare institutions so that they can be used for process improvement. From the process model extracted from the system logs, it is possible not only to grasp the exact flow of the current business process, but also to obtain additional information, such as which activities are executed repeatedly and where bottlenecks occur in the process flow. The manufacturing industry has long invested in improving process management, and as many companies now pay attention to big data, data-related technologies are also emerging in the healthcare industry to help provide patients with the care they need. Process mining tools let users retrieve data either by programming in a process mining query language through the APIs provided with the tool, or by manually creating reusable analytical documents in a user-friendly tool. However, these tasks require users to be familiar with the query language APIs and, when creating analytical documents, to understand the data model and its relationships. This paper proposes a methodology that lets users extract the desired data through a natural language interface, relieving nonprofessional users of the burden of programming in a process mining query language. The process mining query engine with a natural language interface presented in this paper consists of four major components.
Among them, the natural language processing (NLP) pipeline not only extracts from a natural language query an intermediate representation of the entities used to construct a process mining query language report, but also extracts a query hint from the context of the query. The query hint is used to select, from a library, the process-specific function that fits the context of the user query when the natural language query is transformed into a process mining query report. An advantage of the proposed method is that the user can get a rough picture of the process state simply by entering a query in natural language. The proposed system provides users with four query processing options: the user can 1) retrieve the intermediate representation of entities and the query hints from the NLP pipeline, 2) retrieve the process mining query language from the query language generator, 3) submit the query language to the process mining engine and execute the query, or 4) retrieve a natural language description of the intermediate representation of entities and the query hints to confirm that the query was processed correctly. The proposed system was implemented and run, and the query reports in process mining query language that the query engine generated programmatically were executed in a process mining engine and their results verified.
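
As an illustration only, the flow described above (entities and a query hint extracted from the text, the hint selecting a process-specific function from a library, and a natural language description returned for confirmation) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the system presented in the paper: the keyword matching stands in for a real NLP pipeline, and every name here (`HINT_LIBRARY`, `HINT_CUES`, `KNOWN_ACTIVITIES`, the function names, and the query syntax) is hypothetical and does not correspond to any actual process mining query language. Option 3, executing the query, is omitted because it requires a process mining engine.

```python
import re
from dataclasses import dataclass

# Hypothetical library mapping a query hint to a process-specific function.
# These function names are placeholders, not a real query language API.
HINT_LIBRARY = {
    "bottleneck": "AVG_WAITING_TIME",
    "rework": "COUNT_REPEATED_ACTIVITY",
    "throughput": "AVG_THROUGHPUT_TIME",
}

# Hypothetical keyword cues standing in for contextual hint extraction.
HINT_CUES = {
    "bottleneck": ("bottleneck", "waiting", "delay"),
    "rework": ("repeat", "rework", "again"),
    "throughput": ("throughput", "how long", "duration"),
}

# Hypothetical activity names assumed to exist in the event log.
KNOWN_ACTIVITIES = ("triage", "admission", "discharge")


@dataclass
class IntermediateRepresentation:
    entities: list  # activity names found in the user query
    hint: str       # query hint inferred from the query context


def run_nlp_pipeline(text):
    """Option 1: extract entities and a query hint from the user query."""
    lowered = text.lower()
    entities = [a for a in KNOWN_ACTIVITIES if re.search(rf"\b{a}\b", lowered)]
    # Take the first hint whose cue appears; fall back to "throughput".
    hint = next(
        (h for h, cues in HINT_CUES.items() if any(c in lowered for c in cues)),
        "throughput",
    )
    return IntermediateRepresentation(entities, hint)


def generate_query(ir):
    """Option 2: build a query string in a made-up query syntax."""
    function = HINT_LIBRARY[ir.hint]
    query = f"SELECT {function}(activity)"
    if ir.entities:
        quoted = ", ".join(f"'{e}'" for e in ir.entities)
        query += f" WHERE activity IN ({quoted})"
    return query


def describe(ir):
    """Option 4: restate the interpretation so the user can confirm it."""
    scope = ", ".join(ir.entities) if ir.entities else "all activities"
    return f"Compute {ir.hint} for activities: {scope}"


if __name__ == "__main__":
    ir = run_nlp_pipeline("Where is the bottleneck between triage and discharge?")
    print(generate_query(ir))
    print(describe(ir))
```

Keeping the intermediate representation as a separate object is what allows each option to be offered independently: the same `IntermediateRepresentation` feeds both the query generator and the confirmation text.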