In this paper we describe an algorithm that automatically detects correlation identifiers in arbitrary data sources. Correlation identifiers reveal relationships between data items, which makes it possible to isolate the instances of a running business process for process monitoring and discovery. We have implemented our algorithm and validated our approach on a simulator of a real-world-inspired order management scenario comprising 24 activities and their corresponding event types. The simulated scenario involves a wide range of heterogeneous systems (e.g., Order Management, Document Management, E-Mail, and Export Violation Detection Services) as well as workflow-supported, human-driven interactions (Process Management System). Initial results are promising: our approach successfully distinguished correlations in the data generated by the simulator executions. Our work also points to directions for future work, such as distributed statistics calculation and scalability to massive data sets. © 2011 IEEE.
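To make the idea of a correlation identifier concrete, here is a minimal sketch of one statistics-based way such identifiers could be detected from event data. It is an illustration under stated assumptions, not the algorithm described in this paper. The function `candidate_correlations`, the `(event_type, {attribute: value})` input format, and the thresholds `min_distinct` and `min_overlap` are all hypothetical. The heuristic keeps attributes whose values are nearly unique within an event type. It then reports pairs of such attributes, drawn from different event types, whose value sets largely overlap.

```python
from collections import defaultdict
from itertools import combinations


def candidate_correlations(events, min_distinct=0.9, min_overlap=0.5):
    """Hypothetical heuristic, not the paper's algorithm.

    events: iterable of (event_type, {attribute: value}) pairs (assumed format).
    Returns (type_a, attr_a, type_b, attr_b, overlap) tuples.
    """
    # Collect every observed value per (event_type, attribute).
    values = defaultdict(list)
    for etype, attrs in events:
        for attr, val in attrs.items():
            values[(etype, attr)].append(val)

    # Keep attributes whose values are (nearly) unique per event:
    # a good identifier rarely repeats within one event type.
    keys = {}
    for key, vals in values.items():
        distinct = set(vals)
        if len(distinct) / len(vals) >= min_distinct:
            keys[key] = distinct

    results = []
    for (ka, va), (kb, vb) in combinations(sorted(keys.items()), 2):
        if ka[0] == kb[0]:
            continue  # only correlate attributes across different event types
        # Containment-style overlap: shared values relative to the smaller set.
        overlap = len(va & vb) / min(len(va), len(vb))
        if overlap >= min_overlap:
            results.append((ka[0], ka[1], kb[0], kb[1], overlap))
    return results


if __name__ == "__main__":
    events = [
        ("OrderCreated", {"order_id": "O1", "customer": "ACME"}),
        ("OrderCreated", {"order_id": "O2", "customer": "ACME"}),
        ("InvoiceSent", {"invoice_no": "I1", "ref": "O1"}),
        ("InvoiceSent", {"invoice_no": "I2", "ref": "O2"}),
    ]
    print(candidate_correlations(events))
    # → [('InvoiceSent', 'ref', 'OrderCreated', 'order_id', 1.0)]
```

In this toy data, `customer` is discarded because its value repeats across orders. `ref` and `order_id` share all their values, so they are reported as a likely correlation. Every statistic here is a per-attribute count or set, so it could in principle be computed in a distributed fashion. That is one reason a statistics-based formulation suits large event logs.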