- PODC 2022
Virtual experiments — a lab in the cloud
Measuring the properties of things, be they molecules, computer systems, or markets, is at the heart of the scientific method. Many of these real-world measurements can be simulated or replicated on a computer using physics simulations or data-driven or inference-based methods. We call the combination of setup, compute, and analysis steps that measures a particular characteristic of an input system on a computer a virtual-experiment.
Virtual-experiments have unique characteristics compared to other task workflows. Multiple virtual-experiments may be available for measuring the same characteristic using different methods or tools, each best suited to a particular set of objectives. They can also require long execution times and process or produce large amounts of data across many nodes. This necessitates robust support for ensuring completion, re-using outputs from previous calculations, and efficiently handling data movement between the steps of a virtual-experiment.
Since 2015 we have been building technologies for developing and executing virtual-experiments, driven by our work on materials design with collaborators in the UK and beyond. Two technologies in particular have come out of this work: Datashim and the Simulation Toolkit for Scientific Discovery (ST4SD) Runtime.
A runtime compatible with cloud and HPC that supports AI-surrogate development
The ST4SD runtime allows scientists to create and run virtual-experiments. It provides features like flexible memoization, robust execution support, and deployment of the same virtual-experiment across classic HPC and cloud. Our current work includes specific support for creating and using AI surrogates of physical models; this includes the ability to automatically create AI-powered surrogate versions of physics-based virtual-experiments once the core surrogate model is provided.
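To make the idea of memoization concrete, here is a minimal sketch in Python. It is not the ST4SD API: the names `step_key`, `run_step`, and `toy_energy`, and the in-memory cache, are all hypothetical and chosen only for illustration. The sketch assumes a step's output depends only on its name and input parameters, so a hash of those can be used to decide whether an earlier result can be re-used.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory store; a real runtime would persist results
# so they can be re-used across runs and machines.
_cache: Dict[str, Any] = {}


def step_key(name: str, params: Dict[str, Any]) -> str:
    """Derive a stable key from the step name and its input parameters."""
    payload = json.dumps({"step": name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(
    name: str, func: Callable[..., Any], params: Dict[str, Any]
) -> Tuple[Any, bool]:
    """Run a step, or return a stored result if the same inputs were seen.

    Returns (result, reused), where reused is True on a cache hit.
    """
    key = step_key(name, params)
    if key in _cache:
        return _cache[key], True
    result = func(**params)
    _cache[key] = result
    return result, False


def toy_energy(n_atoms: int, scale: float) -> float:
    """Stand-in for an expensive simulation step (assumption for the demo)."""
    return n_atoms * scale


first, reused_first = run_step("energy", toy_energy, {"n_atoms": 10, "scale": 0.5})
second, reused_second = run_step("energy", toy_energy, {"scale": 0.5, "n_atoms": 10})
print(first, reused_first, second, reused_second)
```

The second call passes the same parameters in a different order and is served from the cache, because the key is built from a sorted serialization of the inputs. A production system would also need to account for things this sketch ignores, such as tool versions and input file contents.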
Accessing scientific data in the cloud
We developed Datashim as part of the EVOLVE H2020 project to address the data-access issues faced by big-science workloads on cloud infrastructure. Datashim provides abstractions for frictionless and performant data-access for these workloads in Kubernetes. It is an LF AI & Data incubation project and has been used by EMBL to accelerate genomic workflows, and by the Swiss Data Science Center for its user portal, Renku.
Easily consume state-of-the-art computational methods
We believe virtual-experiments should be usable by all researchers who need them, not just those who are experts in computational methods. To enable this, we are examining how to build a virtual-experiment registry, with associated developer guidelines, that would allow scientists to share virtual-experiments and allow other tools to consume and use them automatically.