Publication
VLDB
Paper

Auto-grouping emails for faster e-discovery

Abstract

In this paper, we examine how various grouping techniques can improve the efficiency and reduce the cost of an electronic discovery process. Specifically, we create coherent groups of email documents, each characterizing a syntactic theme, a semantic theme, or an email thread. The documents in a group can be reviewed together, leading to a faster and more consistent review. Syntactic grouping of emails is based on near-duplicate detection, whereas semantic grouping is based on identifying concepts in the email content using information extraction. Email thread detection is achieved using a combination of segmentation and near-duplicate detection. We present experimental results on the Enron corpus suggesting that these approaches can significantly reduce review time, and showing that high precision and recall can be achieved in identifying the groups. We also describe how these techniques are integrated into the IBM eDiscovery Analyzer product offering. © 2011 VLDB Endowment.
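
To make the idea of syntactic grouping concrete, the following is a minimal sketch of one common way to detect near duplicates: word shingling with Jaccard similarity, followed by single-link grouping. This is an illustrative assumption, not the method described in the paper. The function names (shingles, jaccard, group_near_duplicates), the shingle size k=3, and the threshold of 0.6 are all hypothetical choices made for this example.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield one short shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.6, k=3):
    """Group document indices whose shingle sets overlap at or above the threshold.

    Groups are formed transitively (single-link) with a small union-find.
    The threshold and k are assumed values, not taken from the paper.
    """
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(d, k) for d in docs]
    # Pairwise comparison is quadratic; acceptable for a small illustration only.
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


emails = [
    "Please review the attached contract before the meeting on Friday",
    "Please review the attached contract before the meeting on Monday",
    "Lunch order for the team is due by noon",
]
print(group_near_duplicates(emails))
```

In this sketch the first two emails differ by a single word and fall into one group, while the third stays on its own. A production system would typically replace the pairwise comparison with a scalable scheme such as hashing-based candidate selection.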
