The role of partial charges in data-driven analytics of gas adsorption simulations
High-throughput computational materials screening has been used to evaluate the gas capture and storage potential of thousands of crystalline nanoporous materials. These screening workflows often rely on computationally expensive molecular-level techniques such as Grand Canonical Monte Carlo (GCMC). To avoid running physics-based simulations for every candidate material, one often tries to identify correlations between adsorption metrics and geometrical/topological descriptors obtained from inexpensive computational methods, and then builds a data-driven model that estimates the desired physical properties directly from those descriptors. In the GCMC technique, the electrostatic energy contributes to the Boltzmann factor that determines whether a Monte Carlo move is accepted or rejected. The assignment of partial atomic charges is therefore a crucial step in any screening workflow. Many charge calculation methods exist, ranging from simple charge-equilibration schemes to DFT-derived DDEC methods, each with its own advantages and disadvantages in computational cost. In this work, we explore the cost-benefit balance across a range of charge calculation methods and their impact on our ability to identify correlations and build data-driven predictive models for adsorption.
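As an illustration of how partial charges enter the GCMC acceptance rule, the following minimal Python sketch evaluates a guest-framework Coulomb energy and the textbook acceptance probability for an insertion move, min[1, (βfV/(N+1)) exp(-βΔU)]. It is not the workflow used in this work: the function names, the two framework charge sets, the site coordinates, and the dimensionless prefactor `beta_f_v` are all hypothetical, the Coulomb term is a direct pairwise sum without Ewald summation or periodic images, and van der Waals contributions are left out so that only the effect of the charges is visible.

```python
import math

# Coulomb constant e^2/(4*pi*eps0) in kJ mol^-1 Angstrom e^-2
COULOMB_KJ_MOL_ANGSTROM = 1389.35458
# Gas constant in kJ mol^-1 K^-1
GAS_CONSTANT_KJ_MOL_K = 0.0083144626


def coulomb_energy(guest_sites, framework_sites):
    """Direct pairwise guest-framework Coulomb energy in kJ/mol.

    Each site is (charge_in_e, (x, y, z)) with coordinates in Angstrom.
    Simplification: no Ewald summation and no periodic images.
    """
    energy = 0.0
    for q_g, r_g in guest_sites:
        for q_f, r_f in framework_sites:
            energy += COULOMB_KJ_MOL_ANGSTROM * q_g * q_f / math.dist(r_g, r_f)
    return energy


def insertion_acceptance(delta_u, temperature, beta_f_v, n_guests):
    """Acceptance probability of a GCMC insertion move.

    delta_u    : energy change of the trial insertion, kJ/mol
    temperature: K
    beta_f_v   : dimensionless product beta * fugacity * volume (assumed given)
    n_guests   : number of guest molecules before the insertion
    """
    beta = 1.0 / (GAS_CONSTANT_KJ_MOL_K * temperature)
    return min(1.0, beta_f_v / (n_guests + 1) * math.exp(-beta * delta_u))


# Hypothetical three-site linear guest (illustrative charges and geometry).
guest = [
    (-0.35, (3.0, 1.84, 0.0)),
    (0.70, (3.0, 3.00, 0.0)),
    (-0.35, (3.0, 4.16, 0.0)),
]

# The same two framework atoms with charges from two hypothetical methods.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
charge_sets = {
    "method_A": [1.2, -0.6],
    "method_B": [0.8, -0.4],
}

for name, charges in charge_sets.items():
    framework = list(zip(charges, positions))
    delta_u = coulomb_energy(guest, framework)
    p_acc = insertion_acceptance(delta_u, temperature=298.0,
                                 beta_f_v=0.01, n_guests=0)
    print(f"{name}: dU_coul = {delta_u:.2f} kJ/mol, P_acc = {p_acc:.4f}")
```

Because the energy change sits in the exponent, a modest shift in the assigned charges can change the acceptance probability substantially, which is how the choice of charge method can propagate into the simulated uptake.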