Schizophrenia Research

Construct validity for computational linguistic metrics in individuals at clinical risk for psychosis: Associations with clinical ratings



Language deficits are prevalent in psychotic illness, including its risk states, and are related to marked impairment in functioning. It is therefore important to characterize language impairment across the psychosis spectrum in order to develop potential preventive interventions. Natural language processing (NLP) metrics of semantic coherence and syntactic complexity have been used to discriminate schizophrenia patients from healthy controls (HC) and to predict psychosis onset in individuals at clinical high risk (CHR) for psychosis. To date, no studies have examined the construct validity of key NLP features with respect to clinical ratings of thought disorder in a CHR cohort. Herein, in 60 CHR individuals, we test the association of key NLP metrics of coherence and complexity with ratings of positive and negative thought disorder, respectively, as measured with Andreasen's Scale for the Assessment of Thought, Language, and Communication (TLC). As hypothesized, in CHR individuals the NLP metric of semantic coherence was significantly correlated with positive thought disorder severity, and the NLP metrics of complexity (sentence length and determiner use) were correlated with negative thought disorder severity. This evidence of construct validity supports the premise that NLP analytics, at least with respect to the core features of reduced coherence and complexity, capture clinically relevant language disturbances in risk states for psychosis. Further psychometric study of reliability and other forms of validity is required.
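The abstract does not say how coherence or complexity were computed. As a minimal sketch, assuming coherence is operationalized as the mean cosine similarity between consecutive sentence vectors and complexity as mean sentence length and the proportion of determiners, the metrics and a per-participant correlation with clinical ratings might look like the code below. Several assumptions apply: bag-of-words count vectors stand in for learned semantic vectors, a regex splitter stands in for a real sentence parser, the determiner list is a hypothetical toy set in place of a part-of-speech tagger, and Pearson's r is only one possible choice of correlation. All function names are illustrative, not the authors' code.

```python
import math
import re
from collections import Counter
from statistics import mean

# Hypothetical toy determiner set; a real pipeline would use a POS tagger.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}


def sentences(text):
    """Split a transcript into sentences on terminal punctuation (crude assumption)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokens(sentence):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_coherence(text):
    """Mean cosine similarity of adjacent sentences (bag-of-words stand-in for semantic vectors)."""
    vecs = [Counter(tokens(s)) for s in sentences(text)]
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return mean(sims) if sims else float("nan")


def mean_sentence_length(text):
    """Average number of word tokens per sentence (one complexity proxy)."""
    return mean(len(tokens(s)) for s in sentences(text))


def determiner_rate(text):
    """Proportion of word tokens that are determiners (another complexity proxy)."""
    words = [w for s in sentences(text) for w in tokens(s)]
    return sum(w in DETERMINERS for w in words) / len(words)


def pearson(xs, ys):
    """Pearson correlation between a per-participant metric and clinical ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    sample = "The dog ran home. The dog was tired. Rockets taste purple."
    print(semantic_coherence(sample))    # 0.25
    print(mean_sentence_length(sample))  # 11/3 words
    print(determiner_rate(sample))       # 2/11 of tokens
```

On the toy text, the first two sentences share two of their four words (cosine 0.5) and the abrupt topic shift to the third sentence scores 0, giving a coherence of 0.25. In a construct-validity analysis of this kind, each participant's metric would be paired with that participant's TLC rating, and the resulting lists would be passed to `pearson` or to a rank-based alternative.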