The American Chemical Society's fall meeting will convene thousands of chemistry professionals and technologists to discuss the latest trends and advances in the field. This year's event theme is "Harnessing the Power of Data," and IBM will present papers, posters, and demos showing how we create technology that helps chemists and materials scientists do exactly that.
Topics range from large-scale data ingestion and analysis, through foundation models for prediction, generation, and building assistants, to automated chemical synthesis executed in autonomous labs, along with how these advances are driving research in drug discovery and in the design of more sustainable materials.
We invite attendees to visit our booths to speak with IBM Researchers and interact with demonstrations of our work:
- Booth # 249 - IBM Research, Accelerated Discovery
- Booth # 248 - IBM Quantum
For presentation times of workshops, demos, and papers see the agenda section below. Note: All times are displayed in your local time.
We look forward to seeing you and telling you more about our latest work and career opportunities at IBM Research.
Learn more about Accelerated Discovery here.
Visit us at the IBM Research booth in the exhibitor area to meet IBM Researchers and talk about our work and future job opportunities.
- Explore all current IBM Research job openings
- Sign up to be notified of future openings by joining our Talent Network.
Keep up with emerging research and scientific developments from IBM Research. Subscribe to the Future Forward Newsletter.
Symposium/Session: Algorithm Development and Data Analysis in Chemical Space
Abstract: Advances in deep learning and machine learning models combined with high-throughput experimentation have shown potential to accelerate chemical and materials discovery and highlighted the benefits of AI-assisted research practices. The recent advent of multi-domain and multi-task models trained by self-supervision, so-called foundation models, also holds promise for extending learned representations across multiple fields, thus counteracting the reduced data availability in certain applications and benefiting from information exchange across domains. We propose extending this approach to chemical sensing. In this context, we leverage transfer learning based on fingerprints pretrained in other domains to model new instrument/sensor data representations. Herein, we demonstrate how the output of a model system comprising an integrated electrochemical sensor array for analysis of multi-component liquids can be encoded as image representations to leverage existing deep learning computer vision models pre-trained on large collections of image data. The models effectively extract features from these representations and feed specific model heads to perform downstream tasks. More specifically, the raw potentiometric data from the sensor array is processed to yield a spectral response, which is cleaned (moving average and standard normal variate, SNV) and transformed to an image representation (Gramian Angular Field). Off-the-shelf features are generated by leveraging pretrained neural networks developed to classify natural images. Dimensionality reduction yields a set of features that are then used to train machine learning classification or regression heads. The pipeline was applied to generate visual fingerprints of multiple beverages, providing full discrimination of liquid types and enabling class identification (mean accuracy ~95%) on a model dataset comprising 11 Italian wines.
The results demonstrate the successful creation of a new representation of the chemical sensing space that achieves performance comparable to that of domain-specific hand-crafted feature selection. The present contribution is an example of integrating data processing techniques and publicly available libraries/models to support the transfer of methodologies across domains.
Author(s): Gianmarco Gabrieli, Matteo Manica, Patrick Ruch
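The preprocessing chain named in the abstract above (moving average, SNV, Gramian Angular Field) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the window size, the choice of the summation variant of the Gramian Angular Field, and the sample `response` values are all assumptions made for illustration only.

```python
import math

def moving_average(signal, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def snv(signal):
    """Standard normal variate: subtract the mean, divide by the sample std."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / (n - 1))
    return [(x - mean) / std for x in signal]

def gramian_angular_field(signal):
    """Summation variant (assumed here): G[i][j] = cos(phi_i + phi_j)."""
    lo, hi = min(signal), max(signal)
    # Rescale to [-1, 1] so that arccos is defined (requires a non-constant signal).
    scaled = [2 * (x - lo) / (hi - lo) - 1 for x in signal]
    # Clip to guard against floating-point rounding just outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]

# Hypothetical sensor response, invented for this example.
response = [0.10, 0.12, 0.35, 0.80, 0.78, 0.40, 0.15, 0.11]
image = gramian_angular_field(snv(moving_average(response)))
```

The resulting `image` is a square, symmetric matrix with values in [-1, 1], which is the kind of input a pretrained computer vision model could consume after resizing and channel replication.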
Symposium/Session: Data-driven Design of Energy Materials
Abstract: An accurate knowledge of potential energy surfaces and local forces is of paramount importance to implement molecular dynamics (MD) calculations. As the exact solution of the Schrödinger equation for electrons and nuclei quickly becomes impractical with growing system size, approximate methods have been developed in a delicate balance between performance and accuracy. Besides density functional theory and empirical force fields, machine learning has emerged as a novel and effective framework, leading to the family of so-called neural network potentials (NNPs). The success of classical NNPs is nowadays testified by several high-impact scientific works and by the development of dedicated software libraries. At the same time, the quantum mechanical character of the relationship between molecular configurations, energies and forces immediately leads to the question of whether quantum machine learning (QML) methods could provide even greater advantages. Inspired by this idea, our work aims at establishing a direct connection between quantum neural networks (QNNs) and molecular force fields. We carry out this program by designing a dedicated quantum neural network architecture and by applying it to different molecules of growing complexity. The quantum models exhibit a larger effective dimension with respect to classical counterparts, achieving competitive performance. Furthermore, we leverage the recently introduced framework of geometric QML to develop equivariant quantum neural networks that natively respect relevant sets of molecular symmetries upon input of Cartesian coordinates, thus enhancing trainability and generalization power. Notably, QML is now reaching a level of maturity at which the quest for non-trivial candidate problems -- both practically relevant and suitable to showcase quantum advantage over classical counterparts -- becomes of paramount importance.
With our present contribution, we not only show that QNNs can adequately serve the purpose of generating molecular force fields, but also suggest that this may constitute an appealing playground to test and understand the potential of QML techniques.
Author(s): Francesco Tacchino, Oriel Kiss, Isabel Nha Minh Le, Sofia Vallecorsa, Ivano Tavernelli
Symposium/Session: Helping Chemists Manage their Data
Abstract: In our evolving society, many problems such as climate change, sustainable energy systems, pandemics, and others require faster advances. In chemistry, scientific discovery also involves the critical task of assessing risks associated with proposed novel solutions before moving to the experimental stage. Fortunately, recent advances in machine learning and AI have proven successful in addressing some of these challenges. However, there remains a gap in technology that can support the development of end-to-end discovery processes that seamlessly integrate the vast array of available technologies into a flexible, coherent, and orchestrated system. These applications must manage complex knowledge at scale, enabling subject matter experts (SMEs) to efficiently consume and produce knowledge. Moreover, the discovery of novel functional materials heavily relies on the development of exploration strategies in the chemical space. For instance, generative models have gained attention due to their ability to generate vast volumes of novel molecules across material domains. However, the high level of creativity these models exhibit often translates into low viability of the generated candidates. To address these challenges, we propose a workbench framework that facilitates human-AI co-creation, enabling SMEs to reduce time-to-discovery and the associated opportunity costs. This framework relies on a knowledge base with domain and process knowledge and user-interaction components to acquire knowledge and advise the SMEs. The framework currently supports three main activities: generative modeling, dynamic dataset triage, and risk assessment.
Author(s): Emilio Ashton Vital Brazil, Renato Fontoura De Gusmao Cerqueira, Carlos Raoni De Alencar Mendes, Vinicius Segura, Juliana Jansen Ferreira, Dmitry Zubarev, Kristin Schmidt, Dan Sanders
Symposium/Session: Helping Chemists Manage their Data
Abstract: The FAIR (Findable, Accessible, Interoperable, Reusable) principles aim to enhance the discovery and usage of digital objects by humans and computational agents. They are formulated at a high level and, as such, are differently interpreted and implemented by distinct communities of practice, which often have to collaborate, such as in the context of the use of chemicals in scientific discovery. Practical approaches outlining FAIR-related characteristics of digital objects are few and far between, and most of these are domain-agnostic, i.e., they do not consider scientific communities’ varied needs and require specific implementations and combinations for better estimation. Questionnaires have been considered the main mechanism to systematically capture the implementation choices corresponding to each FAIR principle. However, existing questionnaires focus on FAIR assessment using identical questions for distinct communities, i.e., evaluating the digital objects in the same way and usually assuming that the digital objects have passed through a FAIRification process. In other words, they do not aim at characterizing digital objects, which would give a current overview of the properties that most contribute to their FAIRness. This work builds on the FAIR principles while considering distinct proposed metrics and tools for manual, automated, and semi-automated FAIRness assessment, such as a questionnaire specifically designed to assess a plurality of interrelated scientific domains and their possible integration. It reports on applying an improved questionnaire, which aims to characterize digital objects’ properties towards their FAIRification, to two materials databases: Materials Cloud and PubChem. We investigate the hypothesis that this questionnaire captures digital objects’ characteristics with rich detail about their current properties and outlines their main elements for FAIRification.
We demonstrate that the improved questionnaire is a more suitable tool for both domain specialists and data stewards to investigate digital objects’ characteristics and improve on them.
Author(s): Leonardo Guerreiro Azevedo, Julio Cesar Cardoso Tesolin, Gabriel Banaggia, Renato Fontoura De Gusmao Cerqueira
Symposium/Session: Simulation and Data Science Approaches to Design Biologically Relevant Polymers and their Applications
Abstract: In recent years, language models have disrupted multiple application domains, from natural language to chemistry and materials science. Since their inception, they have enabled a revolutionary way to hypothesize the design of novel materials, shown remarkable capabilities in modeling reactivity, and been successfully adopted in automating chemical synthesis planning. This talk will cover our recent research on applying language models to accelerate scientific discovery in chemistry, from small molecules to polymers and proteins. Our methodologies cover textual representation of molecules, natural language, and hybrid representations, which allow leveraging different data modalities to build holistic foundation models. Besides introducing the methodologies, we will also cover various applications of language models for material design and synthesis. By harnessing the power of language models and the growing availability of datasets, we can transform the discovery process at different stages, paving the way for a revolutionary computer-aided approach to designing, optimizing, and validating novel materials.
Author(s): Matteo Manica
Symposium/Session: Advances in Carbon Capture, Utilization, and Storage for a Sustainable Energy Future
Abstract: High-Throughput Computational Screening (HTCS) is an invaluable technique that has been used to sift through the growing number of candidate gas capture and separation materials compiled in databases during the last two decades. The screening workflow typically consists of loading the material structure from a Crystallographic Information File (CIF) and performing Grand Canonical Monte Carlo (GCMC) simulations of the adsorption behavior of molecules of interest. GCMC provides the equilibrated number of molecules that adsorb on each material at a given temperature and pressure. By sweeping a range of pressures at a fixed temperature, one obtains an adsorption isotherm.
In more advanced studies, the simulated isotherms are fed as input to a process-level optimization method that propagates the molecular-level performance metrics to the process scale. The process-level model covers both the equilibrium and kinetic aspects of adsorption, including mass transfer considerations. Sensitivity analysis shows that the process-level performance is heavily influenced by the adsorption kinetics.
In this work, we performed molecular- to process-level screening of ~1000 metal-organic frameworks (MOFs) for carbon capture. We simulated their adsorption isotherms and propagated their process-level performance, leading to a material ranking. We then took the top 10% of materials and used classical Molecular Dynamics (MD) simulations to investigate how the adsorbate molecules diffuse into the system. We found that many apparently good carbon capture materials in fact had very low diffusivity, which severely impacts their real-world performance at the process level.
Finally, we propose a computational workflow that treats the diffusivity coefficient as a top-tier metric in HTCS studies going forward to accelerate the discovery of new sustainable materials for carbon capture.
Author(s): Felipe Lopes Oliveira, Rodrigo Neumann Barros Ferreira, Binquan Luan, Ashish B. Mhadeshwar, Jayashree Kalyanaraman, Anantha Sundaram, Joseph M. Falkowski, Jonathan R. Szlachta, Yogesh V. Joshi, Mathias Steiner
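To illustrate what "sweeping a range of pressures at a fixed temperature" produces, here is a minimal Python sketch. It does not run GCMC: a closed-form Langmuir model is swapped in as a stand-in for the simulated uptake, and the function names, the parameters `q_max` and `b`, and the pressure grid are all hypothetical values chosen for the example.

```python
def langmuir_uptake(pressure_bar, q_max, b):
    """Langmuir loading q = q_max * b*P / (1 + b*P); a stand-in for a GCMC result."""
    return q_max * b * pressure_bar / (1.0 + b * pressure_bar)

def sweep_isotherm(pressures_bar, uptake_fn):
    """Evaluate the uptake at each pressure (temperature held fixed)."""
    return [(p, uptake_fn(p)) for p in pressures_bar]

# Hypothetical pressure grid (bar) and Langmuir parameters (mol/kg, 1/bar).
pressures = [0.01, 0.1, 0.5, 1.0, 5.0, 10.0]
isotherm = sweep_isotherm(
    pressures, lambda p: langmuir_uptake(p, q_max=4.0, b=2.0)
)

for p, q in isotherm:
    print(f"{p:6.2f} bar -> {q:.3f} mol/kg")
```

In an actual screening workflow, `uptake_fn` would be replaced by a call that launches a GCMC simulation for the given material at each pressure; the list of (pressure, uptake) pairs is what gets passed to the process-level model.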
Symposium/Session: Symposium on Materials for Lithium and Sodium Batteries
Abstract: Lithium-iodine batteries are among a class of next-generation conversion-based chemistries that deliver high energy density using abundant, low-cost materials. There are two main obstacles facing these chemistries: the instability of the lithium anode, which leads to capacity fade, and the low utilization of the active material under practical cell conditions, which leads to low specific capacity. In this work, we address anode instability through chemical treatment of the surface layer to form a borate-rich interphase that protects the lithium from parasitic reactions. We demonstrate that the properties of the treated lithium surface are highly dependent on the treatment environment and require precise tuning to achieve optimal performance. The stabilized lithium surface shows improved capacity retention in lithium-iodine cells at practical mass loadings (above 10 mg/cm²). Further, we explore the relationship between iodine utilization and mass-transport limitations. The results indicate that diffusion-limited transport of the dissolved active material is the major source of the reduction in specific capacity with increasing iodine loading. These studies provide design rules for materials discovery to enable stable and high energy density conversion batteries.
Author(s): Murtaza Zohair, Maxwell Giammona, Linda Sundberg, Andy Tek, Anthony Fong, Khanh Nguyen, Vidushi Sharma, Holt Bui, Young-hye Na
Symposium/Session: Chemical Information Across the Chemistry Enterprise
Abstract: Molecular fragmentation has been frequently used for machine learning, molecular modeling, and drug discovery studies. However, current molecular fragmentation tools often produce large fragments that are useful for only a limited range of tasks. Specifically, long aliphatic chains, certain connected ring structures, fused rings, as well as various nitrogen-containing molecular entities often remain intact when using BRICS. With no known methods to solve this issue, we find that the fragments taken from BRICS are inflexible for tasks such as fragment-based machine learning, coarse-graining, and ligand-protein interaction assessment. In this work, we develop a revised BRICS (r-BRICS) module that allows more flexible fragmentation on a wider variety of molecules. We show that r-BRICS generates smaller fragments than BRICS, allowing localized fragment assessments. Furthermore, r-BRICS generates a fragment database with significantly more unique small fragments than BRICS, which is useful for fragment-based drug discovery, submolecular motif identification and coarse-grained simulations.
Author(s): Leili Zhang
Symposium/Session: Past, Present and Future of AI and Predictive Analytics for Chemical Reactions
Abstract: The right solvent is a crucial factor in achieving environmentally friendly, selective, and high-conversion chemical reactions. While artificial intelligence-based computer-aided synthesis tools are capable of predicting starting materials and reactants for synthesizing a desired product, they often lack the ability to reliably predict reaction conditions such as the appropriate solvent. In this study, we demonstrate that data-driven machine-learning models can reliably predict the correct solvent for a broad spectrum of organic reactions. We extracted single-solvent reactions from two patent-derived datasets: Pistachio and the openly available USPTO dataset. We trained a BERT-based classifier and a random forest in combination with differential reaction fingerprints, achieving a Top-3 accuracy of up to 86.88% for predicting the most commonly used solvent, as well as a reliable prediction of underrepresented classes with an F1-macro score of up to 56.87%. An uncertainty analysis revealed that the models' misclassifications can often be explained by the fact that the reaction class of the reaction in question can be run in multiple solvents. These models are currently undergoing experimental validation in a campaign to test reactions that were successfully run in a solvent that differs from the one predicted by the model, in order to evaluate their real-world applicability. This work highlights the potential of data-driven approaches for addressing key challenges in organic synthesis, demonstrating the practical application of machine learning models in predicting reaction solvents for more efficient and sustainable chemical synthesis.
Author(s): Oliver Schilter, Carlo Baldassari, Teodoro Laino, Philippe Schwaller
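For readers unfamiliar with the two metrics reported in the abstract above, the following Python sketch shows how Top-k accuracy and a macro-averaged F1 score are commonly computed. The solvent labels and predictions are invented toy data, not results from the study, and the function names are hypothetical.

```python
def top_k_accuracy(y_true, ranked_preds, k=3):
    """Fraction of samples whose true label is among the k highest-ranked predictions."""
    hits = sum(1 for t, ranked in zip(y_true, ranked_preds) if t in ranked[:k])
    return hits / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data (made up): true solvents and each sample's predictions, best first.
y_true = ["THF", "DCM", "DMF", "THF", "MeOH"]
ranked = [
    ["THF", "DCM", "DMF"],
    ["THF", "DCM", "MeOH"],
    ["THF", "MeOH", "DCM"],
    ["DCM", "THF", "DMF"],
    ["THF", "DCM", "DMF"],
]
top1 = [r[0] for r in ranked]

print(top_k_accuracy(y_true, ranked, k=3))
print(macro_f1(y_true, top1))
```

Because macro F1 averages over classes rather than samples, a model that always predicts the most common solvent scores poorly on it, which is why it is a useful indicator for underrepresented classes.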
Symposium/Session: Quantum Computing for Tackling Challenges in Quantum Chemistry
Abstract: In recent years, quantum computing has emerged as a promising platform for simulating strongly correlated systems in chemistry, for which the standard quantum chemistry methods are either qualitatively inaccurate or too expensive. However, due to the hardware limitations of the available noisy near-term quantum devices, their application is currently limited to small chemical systems. One way to extend the range of applicability is through hybrid classical-quantum embedding approaches, several of which have been put forward, each with different tradeoffs. In this talk, I will present a projection-based embedding method for combining quantum algorithms, such as (but not limited to) the variational quantum eigensolver (VQE), with density functional theory (DFT). The developed VQE-in-DFT method was recently implemented in Qiskit and used to compute the triple bond breaking process in butyronitrile on an IBM quantum device. Our results show that the developed method is a promising approach for simulating systems with a strongly correlated fragment on a quantum computer. This development, as well as its future extensions, will benefit many different chemical areas, including computer-aided drug design and the study of metalloenzymes with strongly correlated components.
Author(s): Max Rossmannek, Fabijan Pavošević, Angel Rubio, Ivano Tavernelli