MIE 2020
Conference paper

Discovering new Social Determinants of Health concepts from Unstructured Data: Framework and Evaluation

Social determinants of health (SDoH) are the complex set of circumstances in which individuals are born, or with which they live, that impact their health. Integrating SDoH into practice requires information systems that can identify SDoH-related concepts in charts and case notes through vocabularies or terminologies. Despite significant standardisation efforts across healthcare domains, SDoH coverage remains sparse in existing terminologies because the domain is so broad, ranging from family relations and risk factors to social programs and benefits, which are not consistently captured across administrative and clinical settings. This paper presents a framework to mine, evaluate and recommend new multidisciplinary concepts that relate to or impact the health and well-being of individuals, using a word embedding model trained on a large, dynamic corpus of unstructured data. Five key SDoH domains were selected and evaluated by domain experts. The concepts resulting from the trained model were matched against the well-established UMLS meta-thesaurus and the SNOMED-CT terminology; overall, a significant proportion of concepts from a set of 10,000 candidates were not found (31% and 28% respectively). The results confirm both the gaps in current terminologies and the feasibility and impact of the methods presented in this paper for the incremental discovery and validation of new SDoH concepts together with domain experts. This sustainable approach facilitates the development and refinement of new and existing terminologies and, in turn, allows systems such as Natural Language Processing (NLP) annotators to leverage SDoH concepts across integrated care settings.
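
The general idea of mining candidate concepts from an embedding space and then checking them against an existing terminology can be illustrated with a minimal sketch. It is not the paper's implementation: the toy vectors, the `KNOWN_TERMS` set (a stand-in for a UMLS or SNOMED-CT lookup), the exact-string matching rule, and the function names `nearest_candidates` and `unmatched` are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, not from a trained model).
EMBEDDINGS = {
    "homelessness": [0.9, 0.1, 0.0],
    "eviction": [0.8, 0.2, 0.1],
    "couch surfing": [0.85, 0.15, 0.05],
    "food stamps": [0.1, 0.9, 0.1],
    "aspirin": [0.0, 0.1, 0.9],
}

# Hypothetical stand-in for terms already present in a terminology.
KNOWN_TERMS = {"homelessness", "eviction", "aspirin"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_candidates(seed, embeddings, top_k):
    """Return the top_k terms most similar to the seed term."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:top_k]]


def unmatched(candidates, known_terms):
    """Keep candidates with no case-insensitive exact match in known_terms."""
    known = {term.lower() for term in known_terms}
    return [c for c in candidates if c.lower() not in known]


candidates = nearest_candidates("homelessness", EMBEDDINGS, top_k=3)
new_concepts = unmatched(candidates, KNOWN_TERMS)
gap_ratio = len(new_concepts) / len(candidates)
```

In this toy setting, the unmatched candidates are the ones that would be passed to domain experts for review, and `gap_ratio` plays the role of the "not found" proportion. A real pipeline would likely need normalisation and fuzzy or synonym-aware matching rather than exact string comparison.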