This month, two Ray committers and a community Ray user from Pinterest shared with the community:
How to use Ray for batch inference at scale, comparing it against alternatives such as Apache Spark and SageMaker.
How Ray Data streaming can be used to make better use of CPUs and GPUs through parallel pipeline execution.
How Ray is used at Pinterest for workflow-oriented ML development.
The slides and video are below 👇; scroll to the end.
Agenda
5:30-6:00 pm: Networking, Snacks & Drinks
6:00 pm: Talk 1 (30-35 mins): Offline Batch Inference: Comparing Ray, Apache Spark, and SageMaker - Amog Kamsetty, Anyscale
Q & A (10 mins)
6:35 pm: Talk 2 (30-35 mins): Streaming distributed execution across CPUs and GPUs - Eric Liang, Anyscale
Q & A (10 mins)
7:10 pm: Talk 3 (30-35 mins): Workflows to interactive pipelines: How Pinterest plans to improve ML developer velocity with Ray - Se Won Jang, Pinterest
Q & A (10 mins)
Talk 1: Offline Batch Inference: Comparing Ray, Apache Spark, and SageMaker
Abstract: As more companies use large-scale machine learning (ML) models for training and evaluation, offline batch inference becomes an essential workload. It comes with a number of challenges: managing compute infrastructure, making efficient use of heterogeneous resources, and transferring data from storage to hardware accelerators. Ray addresses these challenges and performs significantly better because it can coordinate clusters of heterogeneous resources and match them to the specific resource requirements of the workload.
In this talk we will:
Discuss the challenges and limitations of offline batch inference.
Examine three different solutions: AWS SageMaker Batch Transform, Apache Spark, and Ray Data.
Share our performance numbers showing Ray Data as the best solution for offline batch inference at scale.
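To make the Ray Data approach concrete, here is a minimal sketch of offline batch inference with `map_batches` (this is not code from the talk): the toy dataset, the dummy `Predictor` model, and the pool size are illustrative placeholders, and exact parameter names such as `ActorPoolStrategy` can vary between Ray versions.

```python
import ray

# Toy dataset standing in for a real table read with ray.data.read_parquet(...).
ds = ray.data.from_items([{"feature": float(i)} for i in range(10_000)])

class Predictor:
    def __init__(self):
        # Load the model once per actor; a constant stands in for a real model here.
        self.weight = 2.0

    def __call__(self, batch):
        # Batches arrive as dicts of NumPy arrays when batch_format="numpy".
        batch["prediction"] = batch["feature"] * self.weight
        return batch

predictions = ds.map_batches(
    Predictor,
    compute=ray.data.ActorPoolStrategy(size=4),  # pool of 4 inference actors
    batch_size=256,
    batch_format="numpy",
    # num_gpus=1,  # uncomment to pin each inference actor to a GPU
)
print(predictions.take(3))
```

Because the model is loaded once per actor and batches stream through the actor pool, the same pattern scales from a laptop to a multi-node cluster without changing the code.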
Bio: Amog Kamsetty is a software engineer at Anyscale where he works on building distributed ML libraries and integrations on top of Ray. He is one of the lead developers of Ray's distributed training and offline batch inference libraries.
Talk 2: Streaming distributed execution across CPUs and GPUs
Abstract: Some of the most demanding machine learning (ML) use cases we have encountered involve pipelines that span both CPU and GPU devices in distributed environments. These workloads are common and include:
Batch inference, which involves a CPU-intensive preprocessing stage (e.g., video decoding or image resizing) before utilizing a GPU-intensive model to make predictions.
Distributed training, where similar CPU-heavy transformations are required to prepare or augment the dataset prior to GPU training.
In this talk, we examine how Ray Data streaming works and how to use it in your own machine learning pipelines to address these common workloads, utilizing all your compute resources (CPUs and GPUs) at scale.
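As a rough illustration of that CPU-plus-GPU pattern (again a sketch under assumptions, not the talk's code), a Ray Data pipeline can chain a CPU preprocessing stage and a GPU inference stage with two `map_batches` calls; the streaming executor overlaps the stages so CPU workers keep feeding the GPU actors. The dataset contents, the `GpuModel` class, and the resource sizes below are illustrative.

```python
import numpy as np
import ray

# Toy records standing in for images or video frames read from object storage.
ds = ray.data.from_items([{"image": np.random.rand(64, 64, 3)} for _ in range(1_000)])

def preprocess(batch):
    # CPU-heavy stage (decoding, resizing, normalization in a real pipeline).
    batch["image"] = (batch["image"] - 0.5) / 0.5
    return batch

class GpuModel:
    def __call__(self, batch):
        # GPU-heavy stage; a real model forward pass would replace this reduction.
        batch["prediction"] = batch["image"].mean(axis=(1, 2, 3))
        return batch

predictions = (
    ds.map_batches(preprocess, batch_format="numpy")  # scales out on CPU workers
    .map_batches(
        GpuModel,
        compute=ray.data.ActorPoolStrategy(size=2),  # pool of GPU inference actors
        batch_size=128,
        batch_format="numpy",
        # num_gpus=1,  # uncomment when GPUs are available in the cluster
    )
)

# Consuming the dataset drives the streaming executor, overlapping both stages.
for batch in predictions.iter_batches(batch_size=128):
    pass
```

Because execution is streamed rather than stage-by-stage, preprocessing on CPUs and inference on GPUs run concurrently instead of leaving one set of devices idle while the other works.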
Bio: Eric Liang is a software engineer at Anyscale and the tech lead of the Ray open source project. He is interested in building reliable and performant distributed systems. Before joining Anyscale, Eric was a staff engineer at Databricks, and he received his PhD from UC Berkeley.
Talk 3: Workflows to interactive pipelines: How Pinterest plans to improve ML developer velocity with Ray
Abstract: Large-scale machine learning at web-scale companies often involves a long chain of big-data processing, training, and evaluation jobs expressed as workflows. ML application and platform teams invest in making these workflows easier to compose, reproduce, and move to production. However, workflow-based developer experiences have inherent limitations when it comes to quickly iterating on new ideas: in many cases, jobs have to be written, tested, and integrated carefully before a hypothesis can be tested.
In this talk, we will discuss how Pinterest plans to use Ray to shift this workflow-oriented ML development paradigm into one that enables interactive, fast iteration while ensuring production readiness. We will cover the investments Pinterest has made to streamline workflow-oriented ML development, common usage patterns, and where we envision ourselves with Ray in the next 12 months.
Bio: Se Won Jang is the engineering manager of the ML Data Platform team at Pinterest. Before this role, he was the tech-lead of the feature store, dataset management, and training orchestration initiatives at Pinterest.
Watch: MEETUP VIDEO
Slides: Introduction Talk
Slides: Batch Inference Talk