Molecular Descriptors Accounting for Intramolecular Interactions and Application to Chemical Property Prediction
Abstract
Molecular descriptors are essential cheminformatics tools for applications such as structure-property studies, similarity search, and virtual screening. The performance of these applications depends on how accurately the descriptors capture the structural features of a molecule that are relevant to the target properties. Although chemical properties are influenced by intramolecular interactions that arise from the relative positions of substructures, the major descriptors in current use are limited to local features: they count occurrences of specific atoms or substructures. Here, we present new molecular descriptors of global features based on the topological distance between substructures, which implicitly account for intramolecular interactions. The descriptor takes a molecular structure in SMILES as input and returns a dictionary object in which the keys are the SMILES of substructure pairs and the values are features related to the distance between the substructures in each pair. The distance between two substructures is taken as the average number of bonds between the pairs of atoms representing each substructure, and the inverse square of this distance is used as the feature value, reflecting the inverse-square dependence of the Coulomb force. The target substructures include heteroatoms and substructures extracted using circular fingerprints and the BRICS method. Our computational results show that the new descriptor outperformed all other well-known descriptors, including atom-pair fingerprints, in terms of the accuracy of models predicting chemical properties of light-emitting materials.
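To make the feature definition concrete, the following is a minimal sketch of the distance feature only, not the authors' implementation. It assumes the molecule has already been parsed into a bond adjacency list and the substructures have already been matched to atom indices; SMILES parsing and substructure extraction are omitted. The function names, the `A.B` key format, and the hand-built ethanolamine example are hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def pair_features(adjacency, substructures):
    """Map each substructure pair to 1 / (mean bond distance)^2.

    `substructures` maps a label to the set of atom indices representing it.
    Assumes a connected molecule. The "A.B" key format is an illustrative
    assumption, not taken from the paper.
    """
    features = {}
    ordered = sorted(substructures.items())  # labels are unique, so this sorts by label
    for (label_a, atoms_a), (label_b, atoms_b) in combinations(ordered, 2):
        total = 0
        for a in atoms_a:
            dist = bond_distances(adjacency, a)
            total += sum(dist[b] for b in atoms_b)
        mean = total / (len(atoms_a) * len(atoms_b))
        if mean > 0:  # skip degenerate pairs that share all their atoms
            features[f"{label_a}.{label_b}"] = 1.0 / mean ** 2
    return features


# Ethanolamine, N-C-C-O, written by hand as atom index -> bonded atom indices.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
substructures = {"N": {0}, "O": {3}, "CC": {1, 2}}
features = pair_features(adjacency, substructures)
```

In this example the N and O atoms are three bonds apart, so the `N.O` feature is 1/9, while each heteroatom lies on average 1.5 bonds from the two carbon atoms, giving 1/2.25 for `CC.N` and `CC.O`. How features are aggregated when the same substructure pair occurs more than once in a molecule is not specified in the abstract and is left out of the sketch.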