
Virtual experiments — a lab in the cloud

Helping researchers build and run virtual versions of real-world experiments in the cloud.

Overview

Measuring the properties of things, be they molecules, computer systems, or markets, is at the heart of the scientific method. Many of these real-world measurements can be simulated or replicated on a computer using physics simulations, data-driven methods, or inference-based methods. We call the combination of setup, compute, and analysis steps that measures a particular characteristic of an input system on a computer a virtual-experiment.

Virtual experiments have characteristics that set them apart from other task workflows. Several virtual-experiments may be available for measuring the same characteristic using different methods or tools, each best suited to a particular set of objectives. They may also require long execution times and process or produce large amounts of data over many nodes. This calls for robust support to ensure completion, re-use of outputs from previous calculations, and efficient handling of data movement between the steps of the virtual-experiment.

Since 2015 we have been building technologies for developing and executing virtual-experiments, driven by our work on materials design with collaborators in the UK and beyond. Two technologies in particular have come out of this work: Datashim and the Simulation Toolkit for Scientific Discovery (ST4SD) Runtime.

A runtime compatible with cloud and HPC that supports AI-surrogate development

The ST4SD runtime allows scientists to create and run virtual-experiments. It provides features like flexible memoization, robust execution support, and deployment of the same virtual-experiment across classic HPC and cloud. Our current work includes specific support for creating and using AI surrogates of physical models; this includes the ability to automatically create AI-powered surrogate versions of physics-based virtual-experiments once the core surrogate model is provided. 
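To make the idea of memoization concrete, here is a minimal Python sketch. It does not use the ST4SD API; the names `memo_key`, `run_step`, `simulate`, and `virtual_experiment` are hypothetical, and the in-memory dictionary is an assumption standing in for whatever persistent store a real runtime would use. The sketch only shows the general principle: a step's result is stored under a key derived from the step name and its inputs, so a repeated step is not executed again.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory cache; a real runtime would persist results.
_cache: Dict[str, Any] = {}

# Counts how often the expensive step actually runs (for illustration only).
calls = {"count": 0}


def memo_key(step_name: str, inputs: Dict[str, Any]) -> str:
    """Derive a stable key from the step name and its inputs."""
    payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name: str, func: Callable[..., Any], inputs: Dict[str, Any]) -> Any:
    """Run a step, or reuse the stored result for identical inputs."""
    key = memo_key(step_name, inputs)
    if key not in _cache:
        _cache[key] = func(**inputs)
    return _cache[key]


def simulate(temperature: float, n_steps: int) -> float:
    """Stand-in for an expensive compute step (hypothetical)."""
    calls["count"] += 1
    return temperature * n_steps


def virtual_experiment(temperature: float) -> float:
    """Setup, compute, and analysis steps chained together."""
    setup = {"temperature": temperature, "n_steps": 1000}  # setup
    raw = run_step("simulate", simulate, setup)            # compute (memoized)
    return raw / setup["n_steps"]                          # analysis


first = virtual_experiment(300.0)
second = virtual_experiment(300.0)  # same inputs: the compute step is not rerun
```

In this sketch the second call returns the stored value, so `simulate` executes only once. Keying on a hash of the inputs is one simple choice; a production runtime would likely also account for the versions of the tools and data each step depends on.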

Accessing scientific data in the cloud

We developed Datashim as part of the EVOLVE H2020 project to address the data-access issues faced by big-science workloads on cloud infrastructure. Datashim provides abstractions for frictionless and performant data access for these workloads in Kubernetes. It is an LF AI & Data incubation project and has been used by EMBL to accelerate genomic workflows, and by the Swiss Data Science Center for its user portal, Renku.

Easily consume state-of-the-art computational methods

We believe virtual-experiments should be usable by all researchers who need them, not just those who are experts in computational methods. To enable this, we are examining how to build a virtual-experiment registry, with associated developer guidelines, that would allow scientists to share virtual-experiments and allow other tools to consume and use them automatically.

Video of Datashim

Datashim - a framework for declarative management of datasets on Kubernetes
