Seeing the unseen: leveraging vision models to extract digital fingerprints of complex liquids

Gianmarco Gabrieli; Matteo Manica; Patrick Ruch

ACS Fall 2023

Talk

13 Aug 2023

Abstract

Advances in deep learning and machine learning models, combined with high-throughput experimentation, have shown potential to accelerate chemical and materials discovery and have highlighted the benefits of AI-assisted research practices. The recent advent of multi-domain, multi-task models trained by self-supervision, so-called foundation models, also holds promise for extending learnt representations across multiple fields, thus counteracting limited data availability in certain applications and benefiting from information exchange across domains. We propose extending this approach to chemical sensing. In this context, we leverage transfer learning based on fingerprints pretrained in other domains to model new instrument/sensor data representations. Herein, we demonstrate how the output of a model system, an integrated electrochemical sensor array for the analysis of multi-component liquids, can be encoded as image representations in order to leverage existing deep learning computer vision models pretrained on large collections of image data. The models extract features from these representations and feed task-specific model heads that perform downstream tasks. More specifically, the raw potentiometric data from the sensor array are processed to yield a spectral response, which is cleaned (moving average and standard normal variate, SNV) and transformed into an image representation (Gramian Angular Field). Off-the-shelf features are generated using pretrained neural networks developed to classify natural images. Dimensionality reduction yields a set of features that are then used to train machine learning classification or regression heads. The pipeline was applied to generate visual fingerprints of multiple beverages, achieving full discrimination of liquid types and enabling class identification (mean accuracy ~95%) on a model dataset comprising 11 Italian wines.
The results demonstrate the successful creation of a new representation of the chemical sensing space that achieves performance comparable to domain-specific hand-crafted feature selection. The present contribution is an example of integrating data processing techniques with publicly available libraries and models to support the transfer of methodologies across domains.
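
As an illustration of the signal-to-image step named in the abstract (moving average, SNV, Gramian Angular Field), the following is a minimal NumPy sketch and not the authors' implementation. The function names, the window size of 5, the synthetic input signal, and the choice of the summation variant of the Gramian Angular Field are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def moving_average(x, window=5):
    """Smooth a 1-D signal with a uniform window (window size is an assumption)."""
    kernel = np.ones(window) / window
    # "valid" avoids edge artifacts; output length is len(x) - window + 1
    return np.convolve(x, kernel, mode="valid")


def snv(x):
    """Standard normal variate: centre the signal and scale it to unit variance."""
    return (x - x.mean()) / x.std()


def gramian_angular_field(x):
    """Summation Gramian Angular Field of a 1-D signal (variant is an assumption)."""
    x_min, x_max = x.min(), x.max()
    # Rescale to [-1, 1] so that arccos is defined for every sample
    scaled = (2 * x - x_max - x_min) / (x_max - x_min)
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding error
    phi = np.arccos(scaled)
    # Entry (i, j) is cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])


# Synthetic stand-in for a sensor-array spectral response (hypothetical data)
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 68)) + 0.1 * rng.standard_normal(68)

smoothed = moving_average(signal, window=5)  # 68 - 5 + 1 = 64 samples
normalized = snv(smoothed)
image = gramian_angular_field(normalized)    # 64 x 64 array with values in [-1, 1]
```

The resulting two-dimensional array could then be resized and replicated across colour channels to match the input expected by a pretrained image classifier, whose intermediate activations would serve as the off-the-shelf features described above.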

Workshop paper