Webinar

Ray Datasets: Scalable data preprocessing for distributed ML

Register for the webinar and get a first-hand look at how Ray Datasets:

  • Provides highly scalable parallel I/O for the most popular storage backends and file formats

  • Supports common last-mile preprocessing operations, including basic parallel data transformations such as map, batched map, and filter, and global operations such as sort, shuffle, groupby, and stats aggregations

  • Efficiently integrates with data processing libraries (e.g., Spark, Pandas, NumPy, Dask, Mars) and machine learning frameworks (e.g., TensorFlow, Torch, Horovod)
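
To make the distinction between the per-block transformations and the global operations listed above more concrete, here is a minimal sketch in plain Python. It does not use the Ray Datasets API; the helper names (`split_into_blocks`, `map_rows`, `map_batches`, `filter_rows`, `global_sort`) are hypothetical and exist only for illustration, and a thread pool stands in for a distributed cluster. The idea it tries to show is that map, batched map, and filter can run on each block independently, while a sort has to look at every block.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def split_into_blocks(rows, num_blocks):
    """Partition rows into contiguous blocks of roughly equal size."""
    size = max(1, -(-len(rows) // num_blocks))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def _parallel_over_blocks(blocks, block_fn):
    """Apply block_fn to every block concurrently, preserving block order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(block_fn, blocks))


def map_rows(blocks, fn):
    """Row-wise map: fn is called once per row."""
    return _parallel_over_blocks(blocks, lambda block: [fn(r) for r in block])


def map_batches(blocks, fn):
    """Batched map: fn is called once per block and returns a new block."""
    return _parallel_over_blocks(blocks, fn)


def filter_rows(blocks, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return _parallel_over_blocks(
        blocks, lambda block: [r for r in block if predicate(r)]
    )


def global_sort(blocks, key=None):
    """Global operation: needs every block, so it gathers them first."""
    return sorted(chain.from_iterable(blocks), key=key)


# Illustrative data: ten integers split into at most four blocks.
blocks = split_into_blocks(list(range(10)), num_blocks=4)

squared = map_rows(blocks, lambda x: x * x)
doubled = map_batches(blocks, lambda batch: [2 * x for x in batch])
evens = filter_rows(squared, lambda x: x % 2 == 0)
descending = global_sort(evens, key=lambda x: -x)
```

In this sketch the per-block functions never need data from another block, which is what makes them straightforward to parallelize; `global_sort` is the only step that gathers everything in one place, which is why operations of that kind are usually the more expensive ones in a distributed setting.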

Speakers

Clark Zinzow

Software Engineer, Anyscale

Alex Wu

Software Engineer, Anyscale