Cardiovascular diseases (CVDs) remain responsible for millions of deaths annually. Myocardial infarction (MI) is the most prevalent condition among CVDs. Although data-driven approaches have been applied to predict CVDs from ECG signals, comparatively little work has addressed the use of multiple-lead ECG traces and their efficient integration for diagnosing CVDs. In this paper, we propose an end-to-end trainable, joint spectral-longitudinal model that predicts heart attack using data-level fusion of multiple ECG leads. The spectral stage transforms the time-series waveforms into stacked spectrograms and encodes their time-frequency characteristics, whilst the longitudinal stage uses recurrent networks to exploit the temporal dependencies in these waveforms. We validate the proposed approach on a public MI dataset. Our results show that the proposed spectral-longitudinal model outperforms the baseline methods.
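To make the data-level fusion step concrete, the following is a minimal sketch of how multi-lead waveforms could be turned into stacked spectrograms and then arranged as a time-ordered sequence for a recurrent network. It is an illustration under stated assumptions, not the configuration used in this work: the function names `lead_spectrogram` and `stack_leads`, the Hann window, the frame length of 64 samples, the hop of 32 samples, the 12-lead layout, and the synthetic random signal are all hypothetical choices made for the example.

```python
import numpy as np


def lead_spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram of a single lead (hypothetical helper).

    Uses a Hann-windowed short-time FFT; frame_len and hop are assumed values.
    Returns an array of shape (freq_bins, n_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # The one-sided FFT of each frame gives frame_len // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T


def stack_leads(ecg, frame_len=64, hop=32):
    """Stack per-lead spectrograms along a leading 'lead' axis.

    ecg: array of shape (n_leads, n_samples).
    Returns an array of shape (n_leads, freq_bins, n_frames).
    """
    return np.stack([lead_spectrogram(lead, frame_len, hop) for lead in ecg])


# Synthetic stand-in for a 12-lead recording (assumption: 1000 samples per lead).
rng = np.random.default_rng(0)
ecg = rng.standard_normal((12, 1000))

stacked = stack_leads(ecg)

# Time-major view: one feature vector per spectrogram frame, which is the
# shape a recurrent model would consume step by step.
sequence = stacked.transpose(2, 0, 1).reshape(stacked.shape[2], -1)

print(stacked.shape, sequence.shape)
```

With these assumed settings, each lead yields 33 frequency bins over 30 frames, so the stacked tensor has one channel per lead and the time-major sequence has one row per frame. A spectral encoder would typically operate on the stacked tensor, while a recurrent layer would step through the frames in order.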