
Webinar

Ray Libraries in Practice: Multimodal AI Workloads

Multimodal AI – which processes unstructured data like text, images, audio, and video – is powering the next wave of AI breakthroughs. The tradeoff? Exploding data and model sizes, demanding workload complexity, and infrastructure pushed to its limits.

Unlike traditional ML workloads, which can scale on CPUs or a single GPU node, multimodal workloads require coordinated use of CPUs for data loading and preprocessing and of multi-node, multi-GPU clusters for compute-heavy tasks like training and inference – all within a single, end-to-end pipeline.
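To make that concrete, here is a minimal sketch of such a heterogeneous pipeline with Ray Data: CPU workers normalize images while GPU actors run batch inference. The bucket paths and the Identity placeholder model are hypothetical, and uniformly sized images are assumed – this is an illustration, not the demo's actual code.

```python
import numpy as np
import ray

# Hypothetical bucket; assumes all images share one size.
ds = ray.data.read_images("s3://my-bucket/images/")

def normalize(batch: dict) -> dict:
    # CPU stage: scale uint8 pixels to float32 in [0, 1].
    batch["image"] = batch["image"].astype(np.float32) / 255.0
    return batch

class Embedder:
    def __init__(self):
        # One model replica per GPU actor (placeholder model).
        import torch
        self.device = "cuda"
        self.model = torch.nn.Identity().to(self.device)

    def __call__(self, batch: dict) -> dict:
        import torch
        images = torch.as_tensor(batch["image"], device=self.device)
        with torch.no_grad():
            # Flatten to one embedding vector per image.
            batch["embedding"] = self.model(images).flatten(1).cpu().numpy()
        return batch

embeddings = (
    ds.map_batches(normalize, batch_size=64)     # runs on CPU workers
      .map_batches(Embedder, batch_size=64,
                   num_gpus=1, concurrency=2)    # runs on GPU actors
)
embeddings.write_parquet("s3://my-bucket/embeddings/")  # hypothetical path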

Ray, the open-source distributed AI compute framework, was built for this challenge. In this session, we’ll walk through an image semantic-search and classification demo and show you how Ray enables heterogeneous compute clusters that efficiently support multimodal AI workloads.

What You’ll Learn:

  • How to scale data workloads (ingestion, preprocessing, batch inference), training workloads (distributed training), and serving workloads (online inference) in a highly performant, fault-tolerant way – out of the box (a training sketch follows this list).

  • How to manage dependencies and scale compute efficiently without the usual operational overhead (a dependency sketch follows this list).

  • How to create production batch Jobs for offline workloads (embedding generation, model training, etc.) and production online Services that can scale up and down as needed (a serving sketch follows this list).
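For the distributed-training piece of the first bullet, a minimal Ray Train sketch is below. The Linear model, synthetic batch, and hyperparameters are placeholders standing in for real training code:

```python
import torch
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Ray Train wraps the model in DDP and places it on this worker's device.
    model = train.torch.prepare_model(torch.nn.Linear(512, 10))
    device = train.torch.get_device()
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    for epoch in range(config["epochs"]):
        # Synthetic batch standing in for a real DataLoader.
        x = torch.randn(32, 512, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train.report({"epoch": epoch, "loss": loss.item()})

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-3, "epochs": 2},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # 4 GPU workers
)
result = trainer.fit()
```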
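For the dependency-management bullet, one illustration is Ray's runtime environments, which install per-job dependencies on every worker. The package list and versions here are hypothetical:

```python
import ray

# Per-job dependency management via a runtime environment:
# Ray installs these packages on every worker node, so the
# cluster doesn't need a pre-baked image for each workload.
ray.init(runtime_env={"pip": ["torch==2.3.0", "transformers"]})
```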
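And for the online Services bullet, a minimal Ray Serve sketch of an autoscaling deployment. The Classifier name, placeholder model, and request schema are illustrative assumptions:

```python
import torch
from ray import serve

@serve.deployment(
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},
    ray_actor_options={"num_gpus": 1},
)
class Classifier:
    def __init__(self):
        self.model = torch.nn.Identity()  # placeholder model

    async def __call__(self, request) -> dict:
        # Hypothetical request schema: {"features": [...]}.
        payload = await request.json()
        x = torch.tensor(payload["features"])
        with torch.no_grad():
            return {"scores": self.model(x).tolist()}

# Replicas scale between 1 and 4 as request load changes.
serve.run(Classifier.bind())
```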

Plus, we’ll walk through an easy-to-follow image semantic-search demo.
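As background for the demo: once every image has an embedding, semantic search reduces to a nearest-neighbor lookup over those vectors. A toy sketch, with random vectors standing in for model-generated embeddings:

```python
import numpy as np

def search(query_vec: np.ndarray, index: np.ndarray, k: int = 5):
    # Cosine similarity between the query embedding and every image embedding.
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = index_norm @ query_norm
    top = np.argsort(scores)[::-1][:k]  # indices of the k best matches
    return top, scores[top]

# Toy data: 1,000 image embeddings of dimension 512.
index = np.random.rand(1000, 512).astype(np.float32)
query = np.random.rand(512).astype(np.float32)
ids, scores = search(query, index)
print(ids, scores)
```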

Who Is This For?

This session is for ML engineers and infrastructure teams running complex multimodal AI workloads. If you work with unstructured data like text, images, audio, or video and need to coordinate heterogeneous workloads, you’ll learn how Ray can simplify and scale your workflow. It’s ideal for those building high-performance, end-to-end AI systems.