Background: This study aims to use classification methods to predict the future onset of Alzheimer's disease (AD) in cognitively normal subjects through automated linguistic analysis.

Methods: To study linguistic performance as an early biomarker of AD, we performed predictive modeling of future diagnosis of AD from a cognitively normal baseline of Framingham Heart Study participants. The linguistic variables were derived from written responses to the cookie-theft picture-description task. We compared the predictive performance of linguistic variables with that of clinical and neuropsychological variables. The study included 703 samples from 270 participants, of which a dataset consisting of a single sample from each of 80 participants was held out for testing. Half of the participants in the test set developed AD symptoms before 85 years of age, while the other half did not. All samples in the test set were collected during the cognitively normal period, before the onset of mild cognitive impairment (MCI). The mean time to diagnosis of mild AD was 7.59 years.

Findings: Significant predictive power was obtained, with an AUC of 0.74 and an accuracy of 0.70 when using linguistic variables. The linguistic variables most relevant for predicting onset of AD have been identified in the literature as associated with cognitive decline in dementia.

Interpretation: The results suggest that language performance in naturalistic probes exposes subtle early signs of progression to AD in advance of a clinical diagnosis of impairment.

Funding: Pfizer, Inc. provided funding to obtain data from the Framingham Heart Study Consortium and to support the involvement of IBM Research in the initial phase of the study. The data used in this study were supported by the Framingham Heart Study's National Heart, Lung, and Blood Institute contract (N01-HC-25195) and by grants from the National Institute on Aging (R01-AG016495, R01-AG008122) and the National Institute of Neurological Disorders and Stroke (R01-NS017950).