Schizophrenia Research

Linguistic correlates of suicidal ideation in youth at clinical high-risk for psychosis


Suicidal ideation (SI) is prevalent among individuals at clinical high-risk for psychosis (CHR). Natural language processing (NLP) provides an efficient method for identifying linguistic markers of suicidality. Prior work in other cohorts has shown that increased use of “I”, as well as of words semantically similar to “anger”, “sadness”, “stress” and “lonely”, is correlated with SI. The current project analyzes data collected in an SI supplement to an NIH R01 study of thought disorder and social cognition in CHR. It is the first study to use NLP analyses of spoken language to identify linguistic correlates of recent SI among CHR individuals. The sample included 43 CHR individuals, 10 with recent SI and 33 without, as measured by the Columbia-Suicide Severity Rating Scale, as well as 14 healthy volunteers without SI. NLP methods included part-of-speech (POS) tagging, a GoEmotions-trained BERT model, and zero-shot learning. As hypothesized, CHR individuals who endorsed recent SI used more words with semantic similarity to “anger” than those who did not. Use of words with semantic similarity to “stress”, “loneliness”, and “sadness” did not differ significantly between the two CHR groups. Contrary to our hypotheses, CHR individuals with recent SI did not use the word “I” more than those without recent SI. As anger is not characteristic of CHR, the findings have implications for the consideration of subthreshold anger-related sentiment in suicide risk assessment. Because NLP is scalable, the findings suggest that language markers may improve suicide screening and prediction in this population.
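
To make the first-person pronoun measure concrete, the following is a minimal sketch of how a per-transcript rate of “I” usage could be computed and averaged by group. It is an illustration only, not the study's pipeline: it swaps in a simple regular-expression tokenizer for a POS tagger, the function names `first_person_rate` and `group_means` are hypothetical, and the example transcripts are invented rather than study data.

```python
import re
from statistics import mean

# Assumption: a regex tokenizer stands in for the POS tagger used in the study.
# A token is a run of letters, optionally followed by an apostrophe contraction.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")


def first_person_rate(transcript: str) -> float:
    """Return the fraction of word tokens that are "I" or a contraction of it."""
    tokens = TOKEN_RE.findall(transcript)
    if not tokens:
        return 0.0
    hits = 0
    for token in tokens:
        lowered = token.lower()
        # Count "I" itself and forms such as "I'm", "I'd", "I've".
        if lowered == "i" or lowered.startswith(("i'", "i’")):
            hits += 1
    return hits / len(tokens)


def group_means(transcripts_by_group: dict[str, list[str]]) -> dict[str, float]:
    """Average the per-transcript rate within each (non-empty) group."""
    return {
        group: mean(first_person_rate(text) for text in texts)
        for group, texts in transcripts_by_group.items()
        if texts
    }


if __name__ == "__main__":
    # Invented toy transcripts, for illustration only.
    toy = {
        "CHR with recent SI": ["I think I'm tired of it", "Nothing helps me"],
        "CHR without recent SI": ["We went to the park", "I liked the movie"],
    }
    for group, rate in group_means(toy).items():
        print(f"{group}: {rate:.3f}")
```

A rate normalized by transcript length is used here so that longer interviews do not inflate the count; any group comparison would still require an appropriate statistical test, which this sketch omits.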