Multimodal data curation
Large-scale pipelines for curating and preparing multimodal data across videos, images, text, and audio.
Powered by Ray, the world’s most widely adopted AI compute engine.



Orchestrate model training across GPU clusters with elastic scaling, last-mile data preprocessing, and GPU observability.
Process and generate embeddings at scale for downstream search, retrieval, or training use cases.
Run LLM inference and training on post-training frameworks like SkyRL and veRL, natively built on Ray.
“Anyscale enables us to push the boundaries of what’s possible in generative AI by giving us the flexibility to scale workloads seamlessly. This removes the risk around our infrastructure and allows our team to focus on innovation rather than infrastructure bottlenecks.”

Optimize distributed training, data curation, and batch inference pipelines with Ray on Anyscale.
Scale existing AI libraries like PyTorch, vLLM, SGLang, and XGBoost with Python APIs across thousands of nodes.
Ray is the world’s most trusted AI compute engine for building, running, and scaling data-intensive AI workloads.
Execute Python functions and classes on a distributed cluster with a single decorator
Compose workloads from distributed functions and classes, each running on different CPUs, GPUs, TPUs, or accelerator racks like NVL72.
Leverage Ray’s in-memory distributed object store or direct transport over RDMA for high throughput communication.
Ray offers native libraries like Ray Data and Ray Train, along with a rapidly expanding ecosystem of third-party libraries like vLLM and SkyRL.
Enable AI builders to move fast by pooling your GPUs across clouds, regions, and K8s clusters.
Run training and inference on a shared resource pool, dynamically reallocating capacity as workload demand shifts to maximize utilization.
Run the same code across AWS, GCP, Azure, Nebius or CoreWeave to maximize GPU access across regions without cloud-specific rewrites.
Access controls and authentication, including SSO, SAML, SCIM, and audit logs, for multi-team security and governance.