Publication
AAAI 2017
Conference paper
Efficient clinical concept extraction in electronic medical records
Abstract
Automatic identification of clinical concepts in electronic medical records (EMR) is useful not only in forming a complete longitudinal health record of patients, but also in recovering missing codes for billing, reducing costs, finding more accurate clinical cohorts for clinical trials, and enabling better clinical decision support. Existing systems for clinical concept extraction are mostly knowledge-driven, relying on exact match retrieval from original or lemmatized reports, and very few of them are scaled up to handle large volumes of complex, diverse data. In this demonstration we will showcase a new system for real-time detection of clinical concepts in EMR. The system features a large vocabulary of over 5.6 million concepts. It achieves high precision and recall, with good tolerance to typos through the use of a novel prefix indexing and subsequence matching algorithm, along with a recursive negation detector based on efficient, deep parsing. Our system has been tested on over 12.9 million reports of more than 200 different types, collected from 800,000+ patients. A comparison with the state of the art shows that it outperforms previous systems in addition to being the first system to scale to such large collections.
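The abstract names prefix indexing and subsequence matching as the source of the system's typo tolerance but does not describe the algorithm. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below indexes a toy vocabulary by the first few characters of each concept, then scores candidates against the report text using the longest common subsequence (LCS). The function names, the `prefix_len` and `min_score` parameters, and the scoring formula are all assumptions made for this example.

```python
from collections import defaultdict


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def build_prefix_index(vocabulary, prefix_len=3):
    """Map each concept's leading characters to the concepts sharing them.

    Hypothetical helper: the real system's index structure is not
    described in the abstract.
    """
    index = defaultdict(list)
    for concept in vocabulary:
        index[concept.lower()[:prefix_len]].append(concept)
    return index


def find_concepts(text, index, prefix_len=3, min_score=0.8):
    """Return (concept, score) pairs for approximate matches in text.

    The score is LCS length divided by the longer of the two strings,
    an assumed similarity measure chosen for simplicity.
    """
    words = text.lower().split()
    matches = []
    for i, word in enumerate(words):
        # Only concepts sharing the word's prefix are compared in full.
        for concept in index.get(word[:prefix_len], []):
            target = concept.lower()
            n_words = len(target.split())
            window = " ".join(words[i:i + n_words])
            score = lcs_length(window, target) / max(len(window), len(target))
            if score >= min_score:
                matches.append((concept, score))
    return matches


vocabulary = ["pneumonia", "aortic stenosis", "diabetes mellitus"]
index = build_prefix_index(vocabulary)
for concept, score in find_concepts("patient has pneumnia and aortic stenosiss", index):
    print(concept, score)
```

In this toy version the misspellings "pneumnia" and "stenosiss" still match because the prefix lookup narrows the candidates and the LCS score absorbs a dropped or extra character. A typo inside the first `prefix_len` characters would be missed, and negation detection is not covered here at all.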