Jiaxin Shan is a software engineer at ByteDance.
The KubeRay project was released in October 2021 and was developed in collaboration with Anyscale, Ant Group, ByteDance, and Microsoft. Over the past six months, KubeRay has become a popular toolkit for managing Ray clusters on Kubernetes.
The KubeRay 0.2 release introduces several important enhancements:
Integration with the Ray autoscaler (alpha) and simplified autoscaler setup
gRPC service and CLI for easy integration
Simplified installation using the Kustomize tool
Ray v1.11.0 shipped a minimum viable product for integrating the Ray autoscaler with KubeRay. It is not yet ready for production or general use, but it should be enough for interested users to get started. It adds a new NodeProvider implementation, KuberayNodeProvider, which interacts with the RayCluster custom resource defined by KubeRay.
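To make the division of labor concrete, here is a toy Python sketch of the idea behind a node provider for KubeRay: the autoscaler decides how many workers it wants, and the provider translates that decision into edits on the RayCluster resource, which the KubeRay operator then reconciles into pods. The class ToyRayClusterNodeProvider and its methods are hypothetical; they do not reproduce Ray's actual KuberayNodeProvider or its NodeProvider interface. The RayCluster is modeled as a plain dict, and the field names (workerGroupSpecs, groupName, replicas, minReplicas, maxReplicas, scaleStrategy.workersToDelete) are assumed from the KubeRay CRD rather than taken from this post.

```python
import copy
from typing import Any, Dict


class ToyRayClusterNodeProvider:
    """Hypothetical illustration only: scales a RayCluster by editing its spec.

    A real provider would read and patch the custom resource through the
    Kubernetes API; here the RayCluster is just a dict held in memory.
    """

    def __init__(self, raycluster: Dict[str, Any]) -> None:
        # Keep a private copy so the caller's dict is never mutated.
        self.raycluster = copy.deepcopy(raycluster)

    def _group(self, group_name: str) -> Dict[str, Any]:
        # Look up a worker group by name (assumed field: groupName).
        for group in self.raycluster["spec"]["workerGroupSpecs"]:
            if group["groupName"] == group_name:
                return group
        raise KeyError(f"unknown worker group: {group_name}")

    def create_nodes(self, group_name: str, count: int) -> None:
        # Scale up: raise the desired replica count, bounded by maxReplicas.
        # The operator, not this class, is what actually creates the pods.
        group = self._group(group_name)
        target = group["replicas"] + count
        if target > group["maxReplicas"]:
            raise ValueError(f"{group_name}: {target} exceeds maxReplicas")
        group["replicas"] = target

    def terminate_node(self, group_name: str, pod_name: str) -> None:
        # Scale down: lower the replica count and name the exact pod to remove,
        # so an idle worker is deleted rather than an arbitrary one.
        group = self._group(group_name)
        if group["replicas"] - 1 < group["minReplicas"]:
            raise ValueError(f"{group_name}: would drop below minReplicas")
        group["replicas"] -= 1
        strategy = group.setdefault("scaleStrategy", {})
        strategy.setdefault("workersToDelete", []).append(pod_name)


if __name__ == "__main__":
    cluster = {
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayCluster",
        "metadata": {"name": "raycluster-demo"},
        "spec": {
            "workerGroupSpecs": [
                {"groupName": "small-group", "replicas": 1,
                 "minReplicas": 0, "maxReplicas": 5},
            ]
        },
    }
    provider = ToyRayClusterNodeProvider(cluster)
    provider.create_nodes("small-group", 2)  # replicas: 1 -> 3
    provider.terminate_node("small-group", "raycluster-demo-worker-abcde")  # 3 -> 2
    print(provider.raycluster["spec"]["workerGroupSpecs"][0])
```

The design choice worth noting is that the provider never touches pods directly; it only declares intent on the custom resource, which keeps the operator as the single component that creates and deletes pods.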
KubeRay further simplifies autoscaler setup with a new field, enableInTreeAutoscaling: bool. With this field set, users no longer need to manually configure the autoscaler container in the Ray head pod.
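The Python sketch below shows roughly where the new field sits in a RayCluster manifest. It is a trimmed illustration, not a complete deployable manifest: the apiVersion ray.io/v1alpha1, the image tag, the group name, and the other spec fields are assumptions based on the KubeRay CRD of this era, so check the sample manifests shipped with your KubeRay version. The point it illustrates is that the user sets enableInTreeAutoscaling to true and declares only the Ray head container, leaving the autoscaler container to KubeRay.

```python
import json


def build_raycluster(name: str, ray_image: str = "rayproject/ray:1.11.0") -> dict:
    """Trimmed, illustrative RayCluster manifest with in-tree autoscaling on.

    Field names and values are assumptions; this is not a complete manifest.
    """
    return {
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "rayVersion": "1.11.0",
            # The single switch: KubeRay is expected to add and configure the
            # autoscaler container in the head pod, so it is not listed below.
            "enableInTreeAutoscaling": True,
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                "template": {
                    "spec": {
                        # Only the Ray head container is declared by the user.
                        "containers": [{"name": "ray-head", "image": ray_image}]
                    }
                },
            },
            "workerGroupSpecs": [
                {
                    "groupName": "small-group",
                    "replicas": 1,
                    "minReplicas": 1,
                    "maxReplicas": 10,
                    "rayStartParams": {},
                    "template": {
                        "spec": {
                            "containers": [{"name": "ray-worker", "image": ray_image}]
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output could be piped
    # into `kubectl apply -f -`.
    print(json.dumps(build_raycluster("raycluster-autoscaler"), indent=2))
```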
User feedback has shown that there is a learning curve to managing Ray clusters directly on Kubernetes, since it requires working with a sophisticated permission system and carefully writing correct YAML files.
To help Ray users overcome these challenges and to improve the integration, KubeRay now includes a generic abstraction on top of the RayCluster CRD and introduces a backend service built on gRPC, with a gateway that also exposes it over HTTP. Users can operate a cluster simply by talking to this service over HTTP or gRPC. ByteDance is adopting this approach to build its Ray testing infrastructure. A simple kuberay CLI is also provided to end users to further reduce the learning curve.
KubeRay can now be deployed with the Kustomize installation tool. This flexible installation pattern simplifies customization by overlaying manifests: a few companies already extend the base manifests this way to replace images, inject environment variables, and so on.
KubeRay also provides Helm charts as an alternative way for Helm users to install the control plane.
In addition to the major enhancements above, the community has shipped a number of bug fixes as well as stability and performance improvements. See the full changelog for details.
KubeRay contributors are now planning KubeRay 0.3, which will introduce Job and Serve CRDs to further simplify workload management. The community would also like to improve the workspace-centric development experience and cluster observability. See the milestones for the 0.3 release for more details.
Achieving this milestone has truly been a community effort. We would like to thank everyone for their efforts on the KubeRay 0.2 release, especially the users, code contributors, and maintainers. As you can see from the extensive contributions to KubeRay 0.2, the KubeRay community is vibrant and diverse and is solving real-world problems for Ray users around the world.
Newcomers to the KubeRay project are always welcome! The KubeRay contributors hold open meetings and are always looking for more volunteers and users to help unlock the full potential of Ray on Kubernetes. Use the following resources to ask questions and troubleshoot issues as you get acquainted with the project.
Visit the KubeRay GitHub page
Join the KubeRay Slack channel
Learn how to contribute to the KubeRay project