ProcessAtlas: A scalable and extensible platform for business process analytics
In today's knowledge-, service-, and cloud-based economy, an overwhelming amount of business-related data is generated daily, at a fast rate, from a wide range of sources. These data increasingly show the typical properties of big data: wide physical distribution, diversity of formats, nonstandard data models, and independently managed, heterogeneous semantics. In this context, the enterprise needs new scalable and process-aware services for querying, exploring, and analyzing process data, for two reasons: (1) process data analysis services must process and query large amounts of data effectively and efficiently and must therefore scale well with the size of the infrastructure, and (2) querying services must let users express their data analysis and querying needs using process-aware abstractions rather than lower-level abstractions. In this paper, we introduce ProcessAtlas, an extensible large-scale platform for querying and analyzing process data in the enterprise. ProcessAtlas adopts a service-based model, which makes its architecture extensible: new analytical services can be plugged into the platform. We present a domain-specific model for representing process knowledge, i.e., process-level entities, abstractions, and the relationships among them, modeled as graphs. We provide services for discovering, extracting, and analyzing process data. We also provide an efficient mapping of process-level queries into graph-level queries, and execute them using scalable process query services that cope with growth in both process data size and infrastructure scale. We have implemented ProcessAtlas as a MapReduce-based prototype and report on experiments performed on both synthetic and real-world datasets.
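To make the idea of mapping a process-level query onto a graph-level query concrete, the following is a minimal single-machine sketch in Python. It is not the ProcessAtlas implementation or API: the class `ProcessGraph`, the entity types, the edge labels `has_event` and `executes`, and the sample identifiers are all hypothetical names chosen for illustration, and no MapReduce execution is shown.

```python
from collections import defaultdict


class ProcessGraph:
    """Toy typed graph (hypothetical): nodes are process-level entities,
    edges are labeled relationships between them."""

    def __init__(self):
        self.node_type = {}                 # node id -> entity type
        self.out_edges = defaultdict(list)  # node id -> [(label, target id)]

    def add_entity(self, node_id, entity_type):
        self.node_type[node_id] = entity_type

    def relate(self, source, label, target):
        self.out_edges[source].append((label, target))

    def neighbors(self, source, label):
        """Graph-level primitive: targets reachable over one labeled edge."""
        return [t for (lbl, t) in self.out_edges.get(source, []) if lbl == label]


def activities_of_instance(graph, instance_id):
    """Process-level query: which activities did this process instance run?

    Assumed graph-level form: follow 'has_event' edges from the instance,
    then 'executes' edges from each event, keeping first-seen order.
    """
    activities = []
    for event in graph.neighbors(instance_id, "has_event"):
        for activity in graph.neighbors(event, "executes"):
            if activity not in activities:
                activities.append(activity)
    return activities


def build_example_graph():
    """Build a tiny illustrative graph; all identifiers are made up."""
    g = ProcessGraph()
    g.add_entity("order-17", "process_instance")
    for event_id in ("e1", "e2", "e3"):
        g.add_entity(event_id, "event")
        g.relate("order-17", "has_event", event_id)
    for activity_id in ("receive_order", "ship_order"):
        g.add_entity(activity_id, "activity")
    g.relate("e1", "executes", "receive_order")
    g.relate("e2", "executes", "ship_order")
    g.relate("e3", "executes", "ship_order")  # repeated activity
    return g


if __name__ == "__main__":
    print(activities_of_instance(build_example_graph(), "order-17"))
```

The user-facing function speaks in process terms (instances, activities), while the body uses only the generic `neighbors` traversal. In a distributed setting, such traversal steps are the part one would expect to hand to a scalable execution layer.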