Language as a biomarker for psychosis: A natural language processing approach
Human ratings of conceptual disorganization, poverty of content, referential cohesion and illogical thinking have been shown to predict psychosis onset in prospective clinical high risk (CHR) cohort studies. The potential value of linguistic biomarkers has been significantly magnified, however, by recent advances in natural language processing (NLP) and machine learning (ML). Such methodologies allow for the rapid and objective measurement of language features, many of which are not easily recognized by human raters. Here we review the key findings on language production disturbance in psychosis. We also describe recent advances in the computational methods used to analyze language data, including methods for the automatic measurement of discourse coherence, syntactic complexity, poverty of content, referential coherence, and metaphorical language. Linguistic biomarkers of psychosis risk are now undergoing cross-validation, with attention to harmonization of methods. Future directions in extended CHR networks include studies of sources of variance, and combination with other promising biomarkers of psychosis risk, such as cognitive and sensory processing impairments likely to be related to language. Implications for the broader study of social communication, including reciprocal prosody, face expression and gesture, are discussed.