Background: We determined the impact of data volume, data diversity, and training conditions on the performance of recurrent neural network methods compared with traditional machine learning methods. Methods and Results: Using longitudinal electronic health record data, we assessed the relative performance of machine learning models trained to detect a future diagnosis of heart failure in primary care patients. Model performance was assessed in relation to data parameters defined by the combination of different data domains (data diversity), the number of patient records in the training data set (data quantity), the number of encounters per patient (data density), the prediction window length, and the observation window length (ie, the time period before the prediction window that is the source of features for prediction). Data on 4370 incident heart failure cases and 30 132 group-matched controls were used. Recurrent neural network model performance was superior under a variety of conditions: (1) when data were less diverse (eg, a single data domain such as medication or vital signs) given the same training set size; (2) as data quantity increased; (3) as data density increased; (4) as the observation window length increased; and (5) as the prediction window length decreased. When all data domains were used, the performance of recurrent neural network models increased in relation to the quantity of data used (ie, up to 100% of the data). When data were sparse (ie, fewer features or low dimension), model performance was lower, but a much smaller training set size was required to achieve optimal performance compared with conditions where data were more diverse and included more features. Conclusions: Recurrent neural networks are effective for predicting a future diagnosis of heart failure given sufficient training set size. Model performance appears to continue to improve in direct relation to training set size.
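The relationship between the observation window and the prediction window can be illustrated with a minimal sketch. It is not the study's implementation. It assumes an index date per patient (the diagnosis date for a case, or a matched date for a control), a prediction window that ends at the index date, and an observation window that ends where the prediction window begins. The function names, the window lengths, and the encounter records are all hypothetical and synthetic.

```python
from datetime import date, timedelta


def observation_window(index_date, prediction_days, observation_days):
    """Return (start, end) of the observation window; end is exclusive.

    Assumption: the prediction window is the period immediately before
    the index date, and the observation window immediately precedes it.
    """
    end = index_date - timedelta(days=prediction_days)
    start = end - timedelta(days=observation_days)
    return start, end


def select_encounters(encounters, index_date, prediction_days, observation_days):
    """Keep only encounters dated inside the observation window."""
    start, end = observation_window(index_date, prediction_days, observation_days)
    return [e for e in encounters if start <= e[0] < end]


# Synthetic example: (encounter date, data domain) pairs for one patient.
encounters = [
    (date(2010, 1, 15), "medication"),   # before the observation window
    (date(2011, 6, 1), "vital_signs"),   # inside the observation window
    (date(2012, 3, 10), "medication"),   # inside the prediction window
]
index_date = date(2012, 6, 1)

selected = select_encounters(
    encounters, index_date, prediction_days=180, observation_days=365
)
print(selected)
```

In this sketch, lengthening `observation_days` admits more encounters as feature sources, while lengthening `prediction_days` pushes the usable data further from the index date, which mirrors the two window parameters varied in the study.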