NeurIPS 2023
Workshop paper

Learning the Language of NMR: Structure Elucidation from NMR spectra using Transformer Models


The application of machine learning models in chemistry has made remarkable strides in recent years. Even though there is considerable interest in automating common procedures in analytical chemistry using machine learning, very few models have been adopted into everyday use. Among the analytical instruments available to chemists, Nuclear Magnetic Resonance (NMR) spectroscopy is one of the most important, offering insights into molecular structure unobtainable with other methods. However, most processing and analysis of NMR spectra is still performed manually, making the task tedious and time-consuming, especially for large quantities of spectra. We present a transformer-based machine learning model capable of predicting the molecular structure directly from the NMR spectrum. Our model is pretrained on synthetic NMR spectra, achieving a top-1 accuracy of 67.0% when predicting the structure from both the $^1$H and $^{13}$C spectra. Additionally, we train a model which, given a spectrum and a set of likely compounds, selects the structure corresponding to the spectrum. When trained on both $^1$H and $^{13}$C spectra, this model selects the correct structure with a top-1 accuracy of 98.28%.
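
For readers unfamiliar with the metric, the following is a minimal sketch of how a top-k accuracy of this kind could be computed. It is not the authors' evaluation code. It assumes that each prediction is a ranked list of structure strings (for example SMILES) and that a prediction counts as correct on an exact string match with the target, which in practice would require both sides to be canonicalized first. The function name `top_k_accuracy` and the toy data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of samples whose target appears among the first k predictions.

    Assumption: structures are compared as exact strings, so predictions and
    targets must already be in the same canonical form.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = sum(
        1
        for preds, target in zip(ranked_predictions, targets)
        if target in preds[:k]
    )
    return hits / len(targets)


# Hypothetical toy data: four spectra, two ranked candidates each.
predictions = [
    ["CCO", "COC"],
    ["CC(=O)O", "OCC=O"],
    ["c1ccccc1", "C1CCCCC1"],
    ["CCN", "CNC"],
]
targets = ["CCO", "OCC=O", "c1ccccc1", "CNC"]

top1 = top_k_accuracy(predictions, targets, k=1)
top2 = top_k_accuracy(predictions, targets, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Under this definition, the top-1 figures quoted above correspond to the fraction of spectra for which the highest-ranked structure is the correct one.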