Pre-Treatment Methods for Machine Learning in Finer UV Spectrum Inference
Abstract
Spectroscopy is a crucial modality for evaluating the properties of materials. In particular, ultraviolet (UV) spectra enable the measurement of energy gaps, which provide information about a material's electronic structure. Spectral features include not only direct properties such as peak positions and bandwidths but also properties such as the curvature of the spectral curve, which are difficult to evaluate quantitatively. This makes spectra a promising modality for machine learning. Previous studies using Long Short-Term Memory (LSTM) networks predicted the shape of spectra well for some substances but struggled with others. Meanwhile, Transformers have gained significant attention in machine learning in recent years because they can be combined with other structures and admit many architectural variants. In this study, we constructed a Transformer architecture and trained it on UV spectra. The input consisted of tokenized Simplified Molecular Input Line Entry System (SMILES) representations, allowing us to infer spectra from chemical structures. The training data included existing data from prior studies. We applied several techniques to enhance the model's performance and compared and examined the results. We also compared the model with other models, namely LSTM and Gated Recurrent Unit (GRU), to further assess its performance. In this presentation, we will explain the results and discuss the considerations derived from this comparative analysis.
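
As an illustration of the SMILES tokenization step mentioned above, the following is a minimal sketch in Python. The abstract does not specify which tokenizer was used, so the regular expression, the special tokens `<pad>` and `<unk>`, and the helper names `tokenize_smiles`, `build_vocab`, and `encode` are all assumptions made for this example, not the method used in the study.

```python
import re

# Assumed pattern: bracket atoms, two-letter halogens, organic-subset atoms,
# aromatic atoms, ring-closure labels, and bond/branch symbols.
# The study's actual tokenizer is not described in the abstract.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#$/\\\-+().:~@*])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Reject strings containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list: list[str]) -> dict[str, int]:
    """Assign an integer id to every token seen in the training strings."""
    vocab = {"<pad>": 0, "<unk>": 1}  # hypothetical special tokens
    for smiles in smiles_list:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Map a SMILES string to a fixed-length list of token ids."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokenize_smiles(smiles)]
    ids = ids[:max_len]  # truncate sequences that are too long
    return ids + [vocab["<pad>"]] * (max_len - len(ids))  # pad short ones


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "c1ccccc1Br"])
    print(tokenize_smiles("c1ccccc1Br"))
    print(encode("CCO", vocab, max_len=5))
```

Tokenizing at the level of atoms and bond symbols, rather than single characters, keeps two-letter elements such as Br and bracket atoms such as [Na+] intact, so each id fed to a sequence model corresponds to one chemical unit.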