AI Hardware Forum 2025
Yorktown Heights, NY, USA
IBM and Ray have collaborated for years to make Ray an efficient, production-ready solution for enterprise workloads. Most recently, we expanded our partnership to advance KubeRay and its community, making Kubernetes the recommended platform for running Ray in the enterprise.
Today, AI foundation models and large language models (LLMs) bring new requirements for distributed computing. As a result, IBM and Ray are making Ray the most scalable and efficient framework for foundation model and LLM data preparation and validation, including on IBM's watsonx data and AI platform.
For presentation times of the featured talks, see the agenda section below.
Serverless computing is a development model that lets developers build and run applications without managing servers. Two popular open-source frameworks for running serverless workloads are Ray Serve and Knative Serving. Each takes a slightly different approach to serverless: Ray Serve focuses primarily on serving machine learning models, whereas Knative focuses more generally on deploying and autoscaling HTTP services. Despite these differences, there are many opportunities for the two communities to learn from one another, which this talk highlights. Drawing on experience participating in both communities and building open technologies with both frameworks, this talk compares and contrasts the approaches that Ray and Knative take to serverless. We also cover best practices and lessons learned for serverless development, as well as potential pitfalls and difficulties that serverless users should be aware of. Finally, we highlight key pillars for the next generation of serverless applications, including possible areas of collaboration between the Ray and Knative communities.
Speakers: Paul Schweigert - Senior Software Engineer, IBM; Michael Maximilien - Distinguished Engineer, IBM
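Both frameworks size a deployment from in-flight request load. The sketch below is a simplified model of that calculation and leaves out stable and panic windows, metric averaging, and scale-down delays. Knative's default Knative Pod Autoscaler (KPA) aims for a per-pod concurrency target, which defaults to 100 at a 70% target utilization in `config-autoscaler`. Ray Serve's `autoscaling_config` aims for a number of ongoing requests per replica (`target_ongoing_requests` in recent releases). The function names, the Ray Serve target of 2, and the min/max bounds here are illustrative assumptions, not either project's actual code.

```python
import math


def desired_replicas(in_flight: float, per_replica_target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed so each handles ~per_replica_target requests, clamped."""
    desired = math.ceil(in_flight / per_replica_target)
    return max(min_replicas, min(max_replicas, desired))


def knative_kpa_pods(concurrency: float, target: float = 100,
                     utilization_pct: float = 70, min_scale: int = 0,
                     max_scale: int = 10) -> int:
    # Simplified KPA: aim for target * utilization in-flight requests per pod.
    # min_scale=0 models Knative's scale-to-zero behavior.
    return desired_replicas(concurrency, target * utilization_pct / 100,
                            min_scale, max_scale)


def ray_serve_replicas(ongoing_requests: float,
                       target_ongoing_requests: float = 2,
                       min_replicas: int = 1, max_replicas: int = 10) -> int:
    # Simplified Ray Serve: aim for target_ongoing_requests per replica.
    # The target and bounds here are illustrative, not tuned values.
    return desired_replicas(ongoing_requests, target_ongoing_requests,
                            min_replicas, max_replicas)


if __name__ == "__main__":
    for load in (0, 35, 141, 5000):
        print(f"Knative, concurrency={load}: {knative_kpa_pods(load)} pods")
    for load in (0, 7, 100):
        print(f"Ray Serve, ongoing={load}: {ray_serve_replicas(load)} replicas")
```

The two calculations share one shape, "ceil(load / per-replica target), clamped to bounds." The real differences are in how each system measures load over time and how aggressively it reacts.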
We demonstrate the integration of Ray with CodeFlare and Red Hat OpenShift Data Science Pipelines (RHODS Pipelines) to automatically scale the execution of end-to-end workflows that train and validate foundation models on OpenShift Container Platform (OCP). Workflow pipelines in foundation model development typically involve several preprocessing steps: deduplicating data sources, filtering out biased and low-quality data, and removing hateful and profane content. The cleaned data are then tokenized and used to further train or fine-tune existing generative pre-trained models. Autoscaling is critical in this workflow because these steps are usually very compute-intensive, and some are iterated several times. RHODS Pipelines is a tool for specifying workflow pipelines as DAGs. It uses Tekton as the workflow engine to deploy pods that execute the workflow DAG in a Kubernetes cluster. However, RHODS Pipelines with Tekton gives users no way to automatically scale out a single task in the DAG across parallel pods. CodeFlare is a tool that creates the configurations needed to deploy a Ray cluster on OCP and submits tasks written with Ray to execute in parallel. We explore integrating Ray with CodeFlare and RHODS Pipelines so that individual developers can easily specify the entire end-to-end workflow DAG, or any subset of it, manage each part independently, and scale it up automatically. We will show foundation model use cases that benefit from a simple interface for providing specific parameters and specifying the DAG in RHODS Pipelines, letting the tool generate all the configurations and artifacts needed to run foundation model workflows in parallel with Ray on OpenShift.
Speakers: Yuan-Chi Chang - Research Staff Member, IBM Research; Alex Corvin - Software Engineering Manager, Red Hat
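The preprocessing steps described above (deduplicate, filter, tokenize) and the idea of scaling one DAG task across parallel workers can be illustrated without a cluster. The sketch below uses Python's `concurrent.futures` as a local stand-in for Ray tasks. It first removes exact duplicates globally by hashing normalized text, then splits the corpus into shards that are filtered and tokenized in parallel. The blocklist, the minimum-length threshold, the whitespace tokenizer, and all helper names are hypothetical simplifications, not the pipeline from the talk. Threads show only the fan-out structure; CPU-bound speedups in Python would need processes or a framework such as Ray.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for hate/profanity and low-quality filters.
BLOCKLIST = {"badword"}
MIN_TOKENS = 3


def dedupe(docs: list[str]) -> list[str]:
    """Drop exact duplicates after case/whitespace normalization, keeping order."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def clean_and_tokenize(shard: list[str]) -> list[list[str]]:
    """Filter one shard and tokenize the surviving documents on whitespace."""
    out = []
    for doc in shard:
        tokens = doc.lower().split()
        if len(tokens) < MIN_TOKENS or BLOCKLIST.intersection(tokens):
            continue  # too short or contains a blocked term
        out.append(tokens)
    return out


def preprocess(docs: list[str], num_workers: int = 4) -> list[list[str]]:
    unique = dedupe(docs)  # global step: must see the whole corpus
    # Round-robin sharding gives each worker a similar amount of work.
    shards = [unique[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(clean_and_tokenize, shards)  # parallel per-shard step
    return [tokens for shard in results for tokens in shard]


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat",
        "the  cat sat on the MAT",
        "too short",
        "this line has badword in it",
        "A clean training sentence here",
    ]
    for tokens in preprocess(corpus, num_workers=2):
        print(tokens)
```

Deduplication runs before sharding because duplicates can land in different shards. Only the per-document filter and tokenize step is safe to fan out independently.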
KServe is an open-source, production-ready model inference framework for Kubernetes that uses many Knative features, such as canary traffic routing and payload logging. However, its one-model-per-container paradigm limits concurrency and throughput when multiple inference requests arrive at once. With the Ray Serve integration, a model can be deployed as multiple Python workers, so concurrent inference requests are processed in parallel, improving overall throughput. In this talk, we will share how you can configure, run, and scale machine learning models on Kubernetes using KServe and Ray.
Speakers: Ted Chang - Software Engineer, IBM; Jim Busche - Software Engineer, IBM
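If requests are served one at a time, the total time is the sum of their latencies. With several workers, requests overlap. The sketch below computes a simplified completion time (makespan) for a batch of requests when each request goes to whichever simulated worker is free first. It assumes identical, independent workers and ignores queuing overhead, batching, and CPU/GPU contention. The 100 ms latencies are made-up illustration values, not measurements of KServe or Ray Serve.

```python
import heapq


def makespan(latencies_ms: list[float], num_workers: int) -> float:
    """Time to finish all requests when each goes to the earliest-free worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    free_at = [0.0] * num_workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    finish = 0.0
    for latency in latencies_ms:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + latency
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    requests = [100.0] * 8  # eight hypothetical 100 ms inference requests
    for workers in (1, 2, 4):
        total = makespan(requests, workers)
        print(f"{workers} worker(s): {total:.0f} ms, "
              f"{len(requests) / (total / 1000):.0f} req/s")
```

Under these idealized assumptions, going from one worker to four cuts the batch time from 800 ms to 200 ms. In practice, the gain is bounded by how much CPU or GPU capacity the workers actually share.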
Join us for a hands-on demo of the CodeFlare-SDK, an open-source project that simplifies cloud-native data preprocessing, model training, and validation through an intuitive Python interface to Ray, PyTorch/TorchX, and Kubernetes. With the CodeFlare-SDK, you can manage your cloud resources, submit jobs, and monitor job status without worrying about the complexities of DevOps and cloud infrastructure. In this demo, Mustafa Eyceoz and Atin Sood will guide you through the CodeFlare-SDK workflow, from resource allocation to ML job submission and monitoring. You will see firsthand how to train large-scale models (foundation models) in the cloud using the CodeFlare-SDK. Don't miss this opportunity to learn how CodeFlare can make cloud-native model training more accessible and manageable for developers.
Speakers: Mustafa Eyceoz - Software Engineer, Red Hat; Atin Sood - Program Manager, IBM Research
Emerging AI/ML workflows, such as the end-to-end cycle of training and deploying foundation models, are increasingly complex, with wide-ranging compute and data requirements. In this fundamental paradigm shift, a single, typically very large model is trained and then adapted to many specific tasks. From data cleaning and preparation, to large-scale distributed training, to fine-tuning and validation, to scalable inferencing, working with foundation models today is remarkably complex and involves fragmented software tooling that often requires extensive expertise to deploy and operate at scale. In this talk, we will show how we simplified the end-to-end life cycle of foundation models with a cloud-native, scalable stack for training, fine-tuning, prompt-tuning, and inferencing, built on Red Hat OpenShift Data Science (RHODS). We will give an overview of the new open-source components we are introducing, such as the CodeFlare SDK, Multi-Cluster App Dispatcher, and InstaScale, and of how we integrate them with Ray and PyTorch to enable large-scale data preparation, training, and validation. Once models are validated, we will show how our inference stack, based on ModelMesh and KServe, can be used to deploy models in production with RHODS model serving. We will also show how we use Data Science Pipelines to orchestrate models, including versioning and tracking in production deployments. Finally, we will show how we operate this stack, from public cloud to on-premises, and how we use it to accelerate the value of foundation models across a range of use cases, including success stories that highlight the benefits of this full stack for the end-to-end life cycle of foundation models.
Speakers: Carlos Costa - Principal Research Scientist, IBM Research; Taneem Ibrahim - Engineering Manager, Red Hat; Nick Hill - Senior Software Engineer and Architect, AI Infrastructure, IBM Research
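The life-cycle stages listed above form a dependency graph, and an orchestrator such as Data Science Pipelines runs them in dependency order. The sketch below uses Python's standard `graphlib` module to group such stages into "waves" that could run in parallel. The stage names and edges are assumptions drawn loosely from the abstract, not the actual pipeline definition used in the talk.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the stages it depends on.
STAGES = {
    "data_preparation": set(),
    "distributed_training": {"data_preparation"},
    "fine_tuning": {"distributed_training"},
    "prompt_tuning": {"distributed_training"},
    "validation": {"fine_tuning", "prompt_tuning"},
    "serving": {"validation"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into waves; stages in one wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all stages whose inputs are done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished to unlock dependents
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(STAGES), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

In this made-up graph, fine-tuning and prompt-tuning both depend only on the base training run, so they land in the same wave and could be dispatched concurrently.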