KDD 2021

Workshop on Document Intelligence

Business documents are central to the operation of all organizations, and they come in all shapes and sizes: project reports, planning documents, technical specifications, financial statements, meeting minutes, legal agreements, contracts, resumes, purchase orders, invoices, and many more. The ability to read, understand, and interpret these documents, referred to here as Document Intelligence (DI), is challenging because of their complex formats and structures, their internal and external cross-references, the quality of scans and OCR, and the many domains of knowledge involved. While a variety of research has advanced the fundamentals of document understanding, most of it has focused on documents found on the web, which fail to capture the complexity of analysis and the types of understanding needed for business documents. Realizing the vision of Document Intelligence remains a research challenge that requires a multi-disciplinary perspective spanning not only natural language processing and understanding, but also computer vision, layout understanding, knowledge representation and reasoning, data mining, knowledge discovery, information retrieval, and more, all of which have been profoundly advanced by deep learning in the last few years.

This workshop aims to explore and advance the current state of research and practice, including but not limited to the following topics:

• Document modeling and representations.
• Document structure and layout learning and recognition.
• Cleansing and image enhancement techniques for scanned documents.
• Information extraction from text and semi-structured documents.
• Linguistic analysis of business documents.
• Natural language reasoning and inference.
• Question answering on business documents.
• Semantic understanding of business documents.
• Document search and clustering.
• Handwriting recognition in business documents.
• Table identification and extraction from business documents.
• Chart learning and understanding.
• Domain-specific document understanding.
• Knowledge representation for business documents.
• Multilingual document understanding methods and frameworks.
• Integrated syntax and semantic approaches for document understanding.
• Transfer learning methods for business document reading and understanding.

In addition to invited talks and a panel discussion on topics related to Document Intelligence, the workshop program will include paper sessions, which provide an opportunity to present peer-reviewed work on topics related to Document Intelligence.