ThymesisFlow: A software-defined, HW/SW co-designed interconnect stack for rack-scale memory disaggregation
- Christian Pinto
- Dimitris Syrivelis
- et al.
- 2020
- MICRO 2020
Computing systems are reaching their limits in terms of sustainability and the costs associated with their maintenance and operation. In parallel, workloads are becoming highly heterogeneous and require increasing amounts of resources (memory, accelerators, etc.) that are not easily found in commodity servers and often require ad-hoc configurations, further reducing the sustainability of these systems. A promising solution to these issues is a composable system: resources are not statically grouped in servers but are instead part of a large pool connected through a high-speed network fabric. Composable resources can then be grouped together to re-create the abstraction of a server. The resulting flexibility allows the infrastructure to adapt to workloads, removing the need for over-provisioning or custom cluster configurations. This unlocks new opportunities to handle resource allocation strategically and to eliminate the electricity consumed by idle hardware and its cooling. In addition, a composable system decouples the update cycles of the various hardware components, which leads to a substantial reduction in capital expenses and in hardware waste, since components that are still fit for purpose can stay in service.
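The pooling idea in the paragraph above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `ResourcePool` class, the resource names, and the quantities are invented for illustration and do not describe any actual composable-system software.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """Hypothetical pool of disaggregated resources, counted per kind."""

    free: dict[str, int]

    def compose(self, request: dict[str, int]) -> dict[str, int]:
        """Reserve resources for one logical server, or raise if unavailable."""
        # Check the whole request first so a failed compose changes nothing.
        short = {k: n for k, n in request.items() if self.free.get(k, 0) < n}
        if short:
            raise ValueError(f"insufficient resources: {sorted(short)}")
        for kind, amount in request.items():
            self.free[kind] -= amount
        return dict(request)

    def release(self, node: dict[str, int]) -> None:
        """Return a logical server's resources to the pool."""
        for kind, amount in node.items():
            self.free[kind] = self.free.get(kind, 0) + amount


# Illustrative quantities only.
pool = ResourcePool(free={"cpu_cores": 64, "memory_gib": 1024, "gpus": 4})
node = pool.compose({"cpu_cores": 16, "memory_gib": 512, "gpus": 1})
```

In this toy model a "server" is just a reservation against the pool, so releasing it makes the same hardware available to a differently shaped workload without touching any physical configuration.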
Our team started working on disaggregated infrastructure through dReDBox, an H2020 European research project that we coordinated. dReDBox served as the playground and technology validator for resource disaggregation, leading to the delivery of a fully working disaggregated system with custom ARM-based processors, memory and accelerator modules, a full software stack, and fully optical switching. dReDBox attracted attention within IBM, and we soon migrated from the ARM-based platform to IBM POWER9™ systems to leverage the OpenCAPI coherent interconnect. The next step was ThymesisFlow, a first-of-its-kind, fully open-source hardware demonstrator showcasing memory disaggregation built on top of off-the-shelf IBM POWER9™ processors. We have used ThymesisFlow to build several software demonstrators, leading to publications and demos. We are currently exploring the upcoming CXL standard for building fully composable systems via a high-speed, low-latency fabric.
We are strongly engaged with the OpenFabrics Alliance, where we co-chair the OpenFabrics Management Framework (OFMF) workgroup. In the OFMF workgroup we are defining a reference implementation of a universal fabric manager that exposes to client software a standardized interface based on the Redfish standard schema. The OFMF can interface with multiple fabric managers through a set of agents that translate vendor-specific APIs into Redfish.
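The agent arrangement described above resembles an adapter pattern, which the following minimal sketch illustrates. It is not the OFMF reference implementation: the class names, the two vendor payload shapes, and the `to_redfish_like` helper are all hypothetical, and the output is a simplified Redfish-style document that is not validated against the Redfish schema.

```python
from typing import Protocol


def to_redfish_like(fabric_id: str, name: str, fabric_type: str) -> dict:
    """Build a simplified, Redfish-style fabric entry (not schema-validated)."""
    return {
        "@odata.id": f"/redfish/v1/Fabrics/{fabric_id}",
        "Id": fabric_id,
        "Name": name,
        "FabricType": fabric_type,
    }


class FabricAgent(Protocol):
    """Hypothetical agent interface: one agent per vendor fabric manager."""

    def list_fabrics(self) -> list[dict]: ...


class VendorAAgent:
    """Translates an invented 'vendor A' payload shape."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def list_fabrics(self) -> list[dict]:
        return [
            to_redfish_like(r["fabric_id"], r["label"], r["kind"])
            for r in self._records
        ]


class VendorBAgent:
    """Translates an invented 'vendor B' payload shape with different keys."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def list_fabrics(self) -> list[dict]:
        return [
            to_redfish_like(r["uuid"], r["displayName"], r["protocol"])
            for r in self._records
        ]


class UnifiedFabricManager:
    """Merges the output of all agents into one Redfish-style collection."""

    def __init__(self, agents: list[FabricAgent]) -> None:
        self._agents = agents

    def fabrics(self) -> dict:
        members = [f for agent in self._agents for f in agent.list_fabrics()]
        return {"Members": members, "Members@odata.count": len(members)}


manager = UnifiedFabricManager([
    VendorAAgent([{"fabric_id": "a1", "label": "Rack 1 fabric", "kind": "CXL"}]),
    VendorBAgent([{"uuid": "b7", "displayName": "Lab fabric", "protocol": "PCIe"}]),
])
collection = manager.fabrics()
```

Client software in this sketch only ever sees the merged collection, so supporting another vendor means adding an agent rather than changing the clients.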
We are currently working on how today's cloud control plane (e.g. Red Hat OpenShift) needs to change in order to move from a statically defined infrastructure to a fully composable one, where resources are added to and removed from nodes depending on workload needs.