Ray Serve is a framework-agnostic online model inference framework built atop Ray. Compared with other inference serving frameworks, Ray Serve focuses on elastic scaling, on optimizing inference graphs composed of multiple models, and on scenarios where a large number of models multiplex a small amount of hardware. It builds on Ray's flexible scheduling capabilities and high-performance RPC. Ray Serve can serve models from any machine learning framework, has built-in request batching to improve throughput, and natively supports the FastAPI framework.
Ray aims to provide a universal API for distributed computing. A core part of achieving this goal is to provide simple but general programming abstractions and let the system do all the hard work. This philosophy is what makes it possible for developers to use Ray with existing Python and Java libraries and systems.
Ray seeks to enable the development and composition of distributed applications and libraries in general. Concretely, this includes coarse-grained elastic workloads (i.e., types of serverless computing), machine learning training (e.g., Ray Train), online serving (e.g., Ray Serve), data processing (e.g., Ray Datasets, Modin, Dask-on-Ray), and ad-hoc computation (e.g., parallelizing Python apps, gluing together different distributed frameworks).
Ray's API enables developers to easily compose multiple libraries within a single distributed application. For example, Ray tasks and actors may call into or be called from distributed training (e.g., torch.distributed) or online serving workloads also running in Ray. In this sense, Ray makes for an excellent "distributed glue" system, because its API is general and performant enough to serve as the interface between many different workload types.
Java is one of the mainstream programming languages in the industry, and a large number of users work in Java as their main language. As of Ray 2.0, Ray Serve supports Java natively. Users can deploy their own Java code as a deployment, and call and manage it through the Java API. The Python API can also operate on Java deployments across languages, and vice versa.
In this post, we introduce the Java part of Ray Serve in detail and walk through a few simple steps to get started using Java with Ray Serve.
As with Ray Serve's Python API, you must start Serve before using it from Java. Starting Serve brings up its Controller and Proxy components. For example:
Serve.start(/*detached=*/true, /*dedicatedCpu=*/false, /*config=*/null);
By passing the fully qualified class name to the Serve.deployment() interface, users can create and deploy a deployment:
import java.util.concurrent.atomic.AtomicInteger;

// The deployment class: holds a counter that each call increments by a delta.
public static class Counter {

  private AtomicInteger value;

  // Init args arrive as strings (see setInitArgs below).
  public Counter(String value) {
    this.value = new AtomicInteger(Integer.valueOf(value));
  }

  // Adds delta to the counter and returns the new value as a string.
  public String call(String delta) {
    return String.valueOf(value.addAndGet(Integer.valueOf(delta)));
  }
}

public void create() {
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setInitArgs(new Object[] {"1"})
      .setNumReplicas(1)
      .create()
      .deploy(/*blocking=*/true);
}
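Because the deployment is an ordinary Java class, its logic can be exercised locally before it is handed to Serve. The following is a minimal sketch under that assumption: it re-declares the Counter class from the listing above inside a hypothetical wrapper class, CounterLocalCheck, and calls it directly. It does not start Ray or Serve, so it only illustrates what call() returns for a given initial value and delta.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper class for a local check; it does not start Ray or Serve.
class CounterLocalCheck {

  // Same logic as the Counter deployment class in the listing above.
  public static class Counter {
    private final AtomicInteger value;

    public Counter(String value) {
      this.value = new AtomicInteger(Integer.parseInt(value));
    }

    public String call(String delta) {
      return String.valueOf(value.addAndGet(Integer.parseInt(delta)));
    }
  }

  public static void main(String[] args) {
    // Initial value "1" matches setInitArgs(new Object[] {"1"}).
    Counter counter = new Counter("1");
    System.out.println(counter.call("10")); // prints 11
    System.out.println(counter.call("5"));  // prints 16
  }
}
```

The state is kept between calls, so a second call continues from the value left by the first.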
Once the deployment is successfully created, it can be queried by its name:
public Deployment query() {
  Deployment deployment = Serve.getDeployment("counter");
  return deployment;
}
A Java Serve deployment can be called through Java's RayServeHandle, for example:
Deployment deployment = Serve.getDeployment("counter");
System.out.println(deployment.getHandle().remote("10").get());
Similarly, it can also be called via HTTP:
curl -d '"10"' http://127.0.0.1:8000/counter
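The same HTTP call can be made from Java code. The sketch below uses the JDK's built-in java.net.http client (Java 11 or later) to build a POST to /counter whose body is the quoted string "10", roughly mirroring the curl command. The class name CounterHttpClient and its two helper methods are hypothetical, and the sketch sets no extra headers; whether Serve expects any is not verified here. The main method only builds and prints the request, so it runs without a Serve instance.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client class; requires Java 11+ for java.net.http.
class CounterHttpClient {

  // Builds a POST to <baseUrl>/counter whose body is the delta wrapped in double quotes.
  static HttpRequest buildRequest(String baseUrl, String delta) {
    return HttpRequest.newBuilder()
        .uri(URI.create(baseUrl + "/counter"))
        .POST(HttpRequest.BodyPublishers.ofString("\"" + delta + "\""))
        .build();
  }

  // Sends the request and returns the response body; needs a running Serve proxy.
  static String callCounter(String baseUrl, String delta) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpResponse<String> response =
        client.send(buildRequest(baseUrl, delta), HttpResponse.BodyHandlers.ofString());
    return response.body();
  }

  public static void main(String[] args) {
    HttpRequest request = buildRequest("http://127.0.0.1:8000", "10");
    System.out.println(request.method() + " " + request.uri());
    // prints: POST http://127.0.0.1:8000/counter
  }
}
```

Once Serve is running and the "counter" deployment exists, callCounter("http://127.0.0.1:8000", "10") sends the request and returns the response body.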
After deploying, you can modify a deployment's configuration and redeploy it. For example, the following code changes the initial value of "counter" to 2:
public void update() {
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setInitArgs(new Object[] {"2"})
      .setNumReplicas(1)
      .create()
      .deploy(/*blocking=*/true);
}
Through the Java API, you can also configure a deployment: scale the number of deployment replicas up or down, and specify the CPU or GPU resources of each replica.
The numReplicas parameter controls how many replicas are deployed. You can adjust this parameter dynamically, for example:
public void scaleOut() {
  Deployment deployment = Serve.getDeployment("counter");

  // Scale up to 2 replicas.
  deployment.options().setNumReplicas(2).create().deploy(/*blocking=*/true);

  // Scale down to 1 replica.
  deployment.options().setNumReplicas(1).create().deploy(/*blocking=*/true);
}
Through the deployment's rayActorOptions parameter, you can bind each deployment replica to a given resource, for example one GPU:
import java.util.HashMap;
import java.util.Map;

public void manageResource() {
  // Each replica of this deployment requests one GPU.
  Map<String, Object> rayActorOptions = new HashMap<>();
  rayActorOptions.put("num_gpus", 1);
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setRayActorOptions(rayActorOptions)
      .create()
      .deploy(/*blocking=*/true);
}
Through the Java API, you can also deploy and call a Python deployment across languages. Suppose there is a Python file counter.py in the /path/to/code/ directory:
from ray import serve

@serve.deployment
class Counter(object):
    def __init__(self, value):
        self.value = int(value)

    def increase(self, delta):
        self.value += int(delta)
        return str(self.value)
An example of deploying and calling this Python deployment is as follows:
import io.ray.api.Ray;
import io.ray.serve.api.Serve;
import io.ray.serve.deployment.Deployment;
import io.ray.serve.generated.DeploymentLanguage;
import java.io.File;

public class ManagePythonDeployment {

  public static void main(String[] args) {

    // Add the directory containing counter.py to the code search path.
    System.setProperty(
        "ray.job.code-search-path",
        System.getProperty("java.class.path") + File.pathSeparator + "/path/to/code/");

    Serve.start(true, false, null);

    Deployment deployment =
        Serve.deployment()
            .setDeploymentLanguage(DeploymentLanguage.PYTHON)
            .setName("counter")
            .setDeploymentDef("counter.Counter")
            .setNumReplicas(1)
            .setInitArgs(new Object[] {"1"})
            .create();
    deployment.deploy(/*blocking=*/true);

    // Call the Python method "increase" through the deployment handle.
    System.out.println(Ray.get(deployment.getHandle().method("increase").remote("2")));
  }
}
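The value assigned to ray.job.code-search-path in the example above is simply the Java classpath joined with the Python code directory by the platform's path separator. As a small sketch of that step in isolation, the hypothetical helper below (CodeSearchPath.build is not part of Ray) composes such a value from a classpath and any number of extra directories. It does not start Ray; it only builds and sets the property.

```java
import java.io.File;

// Hypothetical helper; it only builds the property value and does not start Ray.
class CodeSearchPath {

  // Joins the classpath and the extra code directories with the platform path separator.
  static String build(String classPath, String... extraDirs) {
    StringBuilder sb = new StringBuilder(classPath);
    for (String dir : extraDirs) {
      sb.append(File.pathSeparator).append(dir);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String value = build(System.getProperty("java.class.path"), "/path/to/code/");
    // The example above sets this property before calling Serve.start(...).
    System.setProperty("ray.job.code-search-path", value);
    System.out.println(value);
  }
}
```

Using File.pathSeparator rather than a hard-coded ":" keeps the value valid on platforms that use a different separator.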
To sum up, this short, step-by-step getting-started tutorial showed through code samples how to use the cross-language functionality of Ray Serve, and in particular its Java support. Although Java support is experimental in Ray 2.0, we encourage you to try our Java tutorial and give us feedback by filing any issues you encounter.