Harnessing negative data for enhancing model learning in chemistry
Abstract
In chemistry, the prevalence of negative data is a significant yet often overlooked facet of research. Despite improvements in experimental design, unsuccessful experimental outcomes (negative data) are still believed to outnumber positive results by almost an order of magnitude. For this reason, their importance for AI and machine learning (ML) models cannot be overstated. Negative data reveal the boundaries and exceptions within chemical systems and offer valuable opportunities for model refinement and validation. By learning from negative data alongside positive examples, AI/ML models can develop a better understanding of chemical behaviour, leading to improved predictive accuracy and robustness. We propose an approach that uses negative data to improve models initially trained on positive data. Our method couples reinforcement learning with a molecular transformer architecture to strengthen model resilience and adaptability in the face of data inconsistencies. Through practical case studies, we demonstrate the effectiveness of our approach across various chemical domains. By integrating both positive and negative data, we not only improve model reliability but also gain deeper insights into complex chemical phenomena. In our presentation, we will give an overview of the framework and its implementation and show how negative data can be used to build more resilient and adaptable models for evolving chemical data.
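The abstract does not specify the reinforcement learning algorithm, the reward design, or the model details, so the following is only a toy illustration of the general idea, not the authors' method. It assumes a reward of +1 for a confirmed (positive) outcome and -1 for a failed (negative) one, and it replaces the transformer with a softmax policy over three hypothetical candidate products. The update is a reward-weighted log-likelihood step, i.e. the per-sample form of the REINFORCE gradient applied to logged outcomes; all names and numbers are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.5):
    """One reward-weighted update for a softmax policy.

    The gradient of log pi(action) with respect to logit k is
    (1 if k == action else 0) - pi_k, scaled here by the reward.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if k == action else 0.0) - probs[k])
        for k, logit in enumerate(logits)
    ]


# Hypothetical toy setup: three candidate products for a single reaction.
candidates = ["product_A", "product_B", "product_C"]

# Stand-in for a model pre-trained on positive data only: it cannot
# distinguish A from B because it has never seen that B fails.
logits = [1.0, 1.0, 0.0]
initial_probs = softmax(logits)

# Assumed feedback as (candidate index, reward) pairs.
# A is a confirmed positive; B is a negative result (failed experiment).
feedback = [(0, +1.0), (1, -1.0)]

for _ in range(20):
    for action, reward in feedback:
        logits = reinforce_step(logits, action, reward)

probs = softmax(logits)
for name, p0, p1 in zip(candidates, initial_probs, probs):
    print(f"{name}: before={p0:.3f} after={p1:.3f}")
```

In this sketch the negative example is what separates the two candidates that the positive-only starting point scored equally: probability mass moves away from the failed product and toward the confirmed one. A real system would need a sequence model, sampled predictions, and a carefully designed reward.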