Ray Summit

Large Scale Data Loading and Data Preprocessing with Ray

Wednesday, June 23, 8:00PM UTC

Wei Chen, Deep Learning Software Engineer, NVIDIA

Data loading is one of the most crucial steps in the DL pipeline. It needs to be designed and implemented in both a flexible and performant manner so that (1) it can be reused to support different DNN models, (2) it can match the speed of GPU compute, and (3) it can scale to multiple cores and even multiple nodes. However, achieving these design goals is not trivial, especially given that the most commonly used language in DL is Python, which has no good support for parallel programming.

In this talk, we will show how we can use Ray to implement our data loading pipeline. Powered by Ray actors, we are able to reuse most of our Python modules and run our data loading pipeline in parallel without worrying about the overhead of managing it at scale. We will also talk about the experience and lessons we learned during implementation and production deployment.

Speakers

Wei Chen is a deep learning software engineer at NVIDIA.