What Happens When?: Interpreting Schedule of Activity Tables in Clinical Trial Documents
Abstract
Clinical trial protocols are complex documents that must be translated manually to support trial execution and management. We have developed a system that automatically transforms a schedule of activity (SOA) table in a PDF document into a machine-interpretable form. Our system combines semantic, structural, and NLP approaches, with a human in the loop for verification, to determine which cells contain activity or temporal information and then to interpret what those cells represent. Using a training and test set of 20 protocols, we assess the accuracy of identifying specific types of SOA elements. This work is the first stage of a larger effort to use artificial intelligence techniques to extract procedural logic from clinical trial documents and to build a knowledge base of protocols that supports insights and comparison across studies.