Later, we realized that we could use the embeddings learned by our AI classification models to create “reaction fingerprints.” In essence, our model transforms any chemical reaction into a continuous vector, which lets chemists map chemical reaction space and easily query for similar reactions. These data-driven reaction fingerprints make it possible to map the reaction space without knowing the reaction centers or the reactant-reagent split. They also enable efficient nearest-neighbor searches on reaction data sets containing millions of reactions.
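To make the similarity search concrete, here is a minimal sketch of how a lookup over fingerprint vectors could work. It is not our model or pipeline: the vectors are small synthetic stand-ins rather than real fingerprints, the reaction names are hypothetical labels, and the choice of cosine similarity with a brute-force scan is an assumption made for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_reactions(query, fingerprints, k=3):
    """Return the k (name, similarity) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in fingerprints.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def noisy_copy(center, scale=0.05):
    """Perturb a vector slightly, imitating reactions of the same class."""
    return [c + random.gauss(0.0, scale) for c in center]


random.seed(0)

# Assumption: two synthetic "class centers" stand in for model output.
class_a = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
class_b = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Hypothetical reaction labels mapped to synthetic fingerprint vectors.
fingerprints = {
    "amide_coupling_1": noisy_copy(class_a),
    "amide_coupling_2": noisy_copy(class_a),
    "suzuki_coupling_1": noisy_copy(class_b),
    "suzuki_coupling_2": noisy_copy(class_b),
}

# A new reaction from the first class should land next to its classmates.
query = noisy_copy(class_a)
neighbors = nearest_reactions(query, fingerprints, k=2)
for name, score in neighbors:
    print(f"{name}: {score:.3f}")
```

A linear scan like this is fine for a handful of vectors. For data sets with millions of reactions, an approximate nearest-neighbor index would be the more realistic choice.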
Coming back to the headline analogy, the fingerprints that come out of grouping similar headlines together can be laid out as a graph in two-dimensional space, which would let you look deeper into the specifics, such as which sport the original headline refers to. With this information, you could easily find other headlines related to the one you’re about to write. Along the same lines, chemists could use this information to find related reactions that might serve as a starting point for their next experiment.
Our models reached a classification accuracy of 98.9 percent on two different reaction data sets. And our reaction fingerprints can be used to almost perfectly cluster chemical reaction space. Essentially, we have developed a new way of exploring chemical reaction data, opening a chemical galaxy highway. Let the expedition begin!
Access the interactive reaction atlas at RXN4Chemistry on GitHub.
Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat Mach Intell 3, 144–152 (2021).