Differential Expression of Anomalous Self-Experiences in Spontaneous Speech in Clinical High-Risk and Early-Course Psychosis Quantified by Natural Language Processing

Agrima Srivastava; Alexandria Selloni; Zarina R. Bilgrami; Cansu Sarac; Alessia McGowan; Matthew Cotter; Johanna Bayer; Jessica Spark; Marija Krcmar; Melanie Formica; Kate Gwyther; Jessica Hartmann; Ezra Ellenberg; Andrea Polari; Patrick McGorry; Jai L. Shah; Alison R. Yung; Romina Mizrahi; Cheryl M. Corcoran; Guillermo A. Cecchi; Barnaby Nelson

doi:10.1016/j.bpsc.2023.06.007

Biological Psychiatry: Cognitive Neuroscience and Neuroimaging

Paper

01 Oct 2023

Abstract

Background: Basic self-disturbance, or anomalous self-experiences (ASEs), is a core feature of the schizophrenia spectrum. We propose a novel natural language processing method that quantifies ASEs in spoken language by direct comparison to an inventory of self-disturbance, the Inventory of Psychotic-Like Anomalous Self-Experiences (IPASE). We hypothesized that open-ended speech would show greater similarity to the IPASE items in individuals with early-course psychosis (PSY) than in healthy individuals, with clinical high-risk (CHR) individuals intermediate in similarity.

Methods: Open-ended interviews were obtained from 170 healthy control participants, 167 CHR participants, and 89 PSY participants. We calculated the semantic similarity between IPASE items and “I” sentences from transcribed speech samples using S-BERT (Sentence Bidirectional Encoder Representations from Transformers). Kolmogorov-Smirnov tests were used to compare distributions across groups. Nonnegative matrix factorization of the cosine similarities was performed to rank IPASE items.

Results: The spoken language of CHR individuals had the greatest semantic similarity to IPASE items, compared with both healthy control (s = 0.44, p < 10⁻¹⁴) and PSY (s = 0.36, p < 10⁻⁶) individuals, whereas IPASE scores were higher among PSY than CHR group participants. In addition, the nonnegative matrix factorization approach produced a data-driven domain that differentiated the CHR group from the others.

Conclusions: We found that open-ended interviews elicited language from participants in the CHR group with greater semantic similarity to the IPASE than that of patients with psychosis. This demonstrates the utility of these methods for differentiating patients from healthy control participants. This complementary approach has the capacity to scale to large studies investigating phenomenological features of schizophrenia and potentially other clinical populations.
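The similarity step described in the abstract (selecting “I” sentences and comparing them with IPASE items by cosine similarity of sentence embeddings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the sentence splitter, the whole-word match on "I", and the `participant_score` aggregate (mean over sentences of the best-matching item) are hypothetical choices, and the embedding vectors are taken as given. In practice they would come from an S-BERT model, for example via the sentence-transformers library's `SentenceTransformer.encode`; the specific model is not stated here.

```python
import math
import re


def i_sentences(transcript: str) -> list[str]:
    """Keep sentences that contain the first-person pronoun 'I'.

    Hypothetical heuristic: split on ., ? or ! followed by whitespace, then
    keep sentences with a whole-word, case-sensitive 'I' (this also matches
    contractions such as "I'm" and "I've").
    """
    sentences = re.split(r"(?<=[.?!])\s+", transcript.strip())
    return [s for s in sentences if re.search(r"\bI\b", s)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_matrix(sentence_vecs, item_vecs) -> list[list[float]]:
    """Rows: embedded 'I' sentences; columns: embedded inventory items."""
    return [[cosine(s, it) for it in item_vecs] for s in sentence_vecs]


def participant_score(sim: list[list[float]]) -> float:
    """Hypothetical aggregate: mean over sentences of the best-matching item."""
    return sum(max(row) for row in sim) / len(sim)
```

Other aggregates (for example, the mean similarity to each item separately) are equally plausible; the abstract does not specify how sentence-level similarities were summarized per participant.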
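The group comparisons use two-sample Kolmogorov-Smirnov tests, and the reported s values are consistent with the KS statistic, the largest vertical gap between two empirical cumulative distribution functions. The sketch below computes only that statistic with a single merge pass over the sorted samples; which samples were compared (sentence-level or participant-level similarities) is an assumption left open. For the p-value as well, `scipy.stats.ks_2samp(x, y)` returns both the statistic and a p-value.

```python
def ks_statistic(x: list[float], y: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|.

    F_x and F_y are empirical CDFs. The maximum is attained at one of the
    pooled sample values, so one pass over both sorted samples suffices.
    Ties are handled by advancing both samples past the current value
    before comparing the two CDFs.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF equals 1, so the gap can only shrink.
    return d
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` gives 0.5: after the values 1 and 2, three quarters of the first sample but only one quarter of the second lie at or below 3.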
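The abstract ranks IPASE items with a nonnegative matrix factorization (NMF) of the cosine similarities. A minimal sketch, under several assumptions: the similarity matrix has rows for sentences or participants and columns for IPASE items; negative cosine values are clipped to zero because NMF requires a nonnegative input; the factorization uses the Lee and Seung multiplicative updates for squared Frobenius error, a swapped-in standard choice rather than a confirmed detail of the study; and items are ranked by their weight in one component (a "domain") of H. The helper names and the number of components are hypothetical. `sklearn.decomposition.NMF` (with `n_components`, `fit_transform` returning W, and `components_` holding H) is a production alternative.

```python
import random


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nonnegative_part(sim):
    """Clip negative cosine similarities to zero so NMF is applicable."""
    return [[max(0.0, x) for x in row] for row in sim]


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative V (n x m) as W (n x k) @ H (k x m).

    Lee-Seung multiplicative updates for the squared Frobenius error:
      H <- H * (W^T V) / (W^T W H),   W <- W * (V H^T) / (W H H^T).
    eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num = matmul(wt, v)               # k x m
        den = matmul(matmul(wt, w), h)    # k x m
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        ht = transpose(h)
        num = matmul(v, ht)               # n x k
        den = matmul(w, matmul(h, ht))    # n x k
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def rank_items(h, component):
    """Item (column) indices ordered by descending weight in one component."""
    return sorted(range(len(h[component])), key=lambda j: h[component][j],
                  reverse=True)
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative, which is what makes each row of H readable as an additive "domain" of items.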