Visual inspection of electrocardiograms (ECGs) is a common clinical practice for diagnosing heart diseases (HDs), which are still responsible for millions of deaths worldwide every year. In particular, myocardial infarction (MI) is the leading cause of mortality among HDs. ECGs reflect the electrical activity of the heart and allow a quicker diagnosis than laboratory blood tests. However, interpreting ECG waveforms still requires trained clinicians, which poses a challenge in low-resource healthcare systems with poor doctor-to-patient ratios. Previous work in this space has applied data-driven approaches to predict HDs from ECG signals, but has relied on domain-specific features that generalize poorly across patient and device variations. Moreover, little work has examined the use of longitudinal information or the fusion of multiple ECG leads. In contrast, we propose an end-to-end trainable solution for MI diagnosis that (1) uses 12 ECG leads; (2) fuses the leads at the data level by stacking their spectrograms; (3) employs transfer learning to encode features rather than learning representations from scratch; and (4) uses a recurrent neural network to encode temporal dependencies in long-duration ECGs. We validate our approach on multiple datasets comprising tens of thousands of subjects and achieve encouraging performance.
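Data-level fusion of the leads (point 2) and the split of a long recording into a sequence of steps for a recurrent model (point 4) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The window length (`n_fft`), hop size, log scaling, non-overlapping segmentation, and the 500 Hz sampling rate in the usage note are all assumptions. The pretrained encoder (point 3) and the recurrent network are omitted, and the helper names `spectrogram`, `stack_lead_spectrograms`, and `segment_sequence` are hypothetical.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Power spectrogram of a 1-D signal via a Hann-windowed STFT (NumPy only).

    Assumed parameters: n_fft-sample frames, hop-sample stride, no padding.
    Returns an array of shape (n_fft // 2 + 1, n_frames), i.e. (frequency, time).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT of each frame, squared magnitude, then transpose to (freq, time).
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def stack_lead_spectrograms(ecg, n_fft=256, hop=128):
    """Data-level fusion: compute one spectrogram per lead and stack them as channels.

    ecg: array of shape (n_leads, n_samples), e.g. (12, N) for a 12-lead ECG.
    Returns an array of shape (n_leads, n_fft // 2 + 1, n_frames), a multi-channel
    "image" that an image encoder could take as input. log1p scaling is an assumption.
    """
    return np.stack([np.log1p(spectrogram(lead, n_fft, hop)) for lead in ecg], axis=0)

def segment_sequence(ecg, seg_len, n_fft=256, hop=128):
    """Split a long recording into non-overlapping segments of seg_len samples.

    Each segment is fused into one stacked-spectrogram tensor, which yields a
    sequence (n_segments, n_leads, freq, time) that a recurrent model could
    consume one step at a time. Trailing samples shorter than seg_len are dropped.
    """
    n_segs = ecg.shape[1] // seg_len
    return np.stack([
        stack_lead_spectrograms(ecg[:, k * seg_len : (k + 1) * seg_len], n_fft, hop)
        for k in range(n_segs)
    ])
```

For example, a 10 s, 12-lead recording sampled at an assumed 500 Hz (5000 samples per lead) gives a fused tensor of shape `(12, 129, 38)` with the defaults above, and a 60 s recording cut into 10 s segments gives a sequence of shape `(6, 12, 129, 38)`. Stacking the leads as channels lets a single encoder see all of them jointly, rather than processing each lead separately and merging the features later.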