
Enterprise Governance and Observability on Anyscale

By The Anyscale Team | October 1, 2024

Anyscale aims to put the power of modern compute into the hands of every developer. To do so sustainably, enterprises need reasonable controls that facilitate innovation while preventing AI sprawl and overspending. To support this mission, we are excited to introduce new enterprise governance and observability tools that help organizations better control their AI infrastructure, understand their infrastructure utilization, and improve the efficiency of their ML workloads.

Enterprise Compute Observability

Anyscale customers get comprehensive insights into their infrastructure, helping them optimize compute resources and application performance. By gathering data from AI workloads, Anyscale increases visibility into how resources are used across clouds and compute resources, making it easier for teams to identify inefficiencies and enhance utilization.

Optimizing at the Infrastructure Level

Anyscale offers a holistic view of compute utilization, showing aggregate resource usage across all jobs, services, and workspaces. This transparency helps teams easily identify the most and least utilized clusters, spot underutilized machines, and make decisions about reallocating resources or adjusting provisioning.

Key features include:

Cluster-wide visibility: See metrics for jobs, workspaces, and services, including how resources like CPU, GPU, and memory are being utilized.
Spot utilization insights: Understand how efficiently spot instances are used, and track spot preemptions to identify patterns.
Ray telemetry: Understand Ray metrics across multiple clusters even after the clusters are terminated.
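
To make the idea of spotting underutilized clusters concrete, here is a minimal sketch of the kind of aggregation described above. It is not the Anyscale API: the `ClusterSample` record, the `utilization` and `underutilized` helpers, and the 30% threshold are all hypothetical names and values chosen for illustration, and the samples are assumed to come from whatever metrics export you already have.

```python
from dataclasses import dataclass


@dataclass
class ClusterSample:
    """One hypothetical utilization sample for a cluster (not an Anyscale type)."""
    cluster: str
    cpu_used: float   # CPU cores in use
    cpu_total: float  # CPU cores provisioned
    gpu_used: float   # GPUs in use
    gpu_total: float  # GPUs provisioned


def utilization(samples):
    """Aggregate samples per cluster into CPU and GPU utilization fractions."""
    totals = {}
    for s in samples:
        t = totals.setdefault(s.cluster, [0.0, 0.0, 0.0, 0.0])
        t[0] += s.cpu_used
        t[1] += s.cpu_total
        t[2] += s.gpu_used
        t[3] += s.gpu_total
    result = {}
    for name, (cpu_used, cpu_total, gpu_used, gpu_total) in totals.items():
        result[name] = {
            # Clusters with no provisioned capacity report 0.0 rather than dividing by zero.
            "cpu": cpu_used / cpu_total if cpu_total else 0.0,
            "gpu": gpu_used / gpu_total if gpu_total else 0.0,
        }
    return result


def underutilized(samples, threshold=0.3):
    """Return cluster names whose CPU and GPU utilization are both below the threshold."""
    util = utilization(samples)
    return sorted(
        name for name, u in util.items()
        if max(u["cpu"], u["gpu"]) < threshold
    )
```

Calling `underutilized(samples)` on a list of such records returns the clusters that are candidates for downsizing or reallocation. Taking the maximum of CPU and GPU utilization is one possible design choice: it avoids flagging a GPU-bound cluster just because its CPUs are mostly idle.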

Adding Enterprise Governance

Anyscale’s tools include Resource Quotas to give platform administrators granular control over the allocation and usage of cloud resources across their organization.

With resource quotas, teams can set hard limits on resources such as the number of instances, CPU cores, and GPUs, including quotas on specific GPU types. These controls help keep projects within budget and avoid unexpected overages. Because quotas take all active resources into account, they provide a flexible way to create guardrails for different projects and users.
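
The quota logic described above can be sketched as a simple admission check: add the requested resources to everything already active, then compare against each limit. This is only an illustration under stated assumptions, not Anyscale's implementation or API. The `Quota` and `Usage` types, the field names, and the `check_quota` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Usage:
    """Resources in use or requested (hypothetical type for illustration)."""
    instances: int = 0
    cpus: int = 0
    gpus: Dict[str, int] = field(default_factory=dict)  # GPU type -> count


@dataclass
class Quota:
    """Hard limits; None means the dimension is not limited (hypothetical type)."""
    max_instances: Optional[int] = None
    max_cpus: Optional[int] = None
    max_gpus: Optional[int] = None
    max_gpus_by_type: Dict[str, int] = field(default_factory=dict)


def check_quota(quota: Quota, active: Usage, requested: Usage) -> List[str]:
    """Return the list of limits the request would exceed (empty if allowed)."""
    problems = []

    instances = active.instances + requested.instances
    if quota.max_instances is not None and instances > quota.max_instances:
        problems.append(f"instances: {instances} > {quota.max_instances}")

    cpus = active.cpus + requested.cpus
    if quota.max_cpus is not None and cpus > quota.max_cpus:
        problems.append(f"cpus: {cpus} > {quota.max_cpus}")

    # Combine active and requested GPUs per type before checking any GPU limit.
    gpu_types = set(active.gpus) | set(requested.gpus)
    combined = {
        t: active.gpus.get(t, 0) + requested.gpus.get(t, 0) for t in gpu_types
    }
    total_gpus = sum(combined.values())
    if quota.max_gpus is not None and total_gpus > quota.max_gpus:
        problems.append(f"gpus: {total_gpus} > {quota.max_gpus}")

    for gpu_type, limit in sorted(quota.max_gpus_by_type.items()):
        count = combined.get(gpu_type, 0)
        if count > limit:
            problems.append(f"gpus[{gpu_type}]: {count} > {limit}")

    return problems
```

A scheduler built this way would launch a workload only when `check_quota` returns an empty list. Returning every exceeded limit, rather than stopping at the first, lets a user see all the reasons a request was rejected at once.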

Traditional methods restrict who can define resource types or how many experiments can run at a time. With Ray and Anyscale, however, users can run many small experiments or a few larger ones under the same guardrails, without inhibiting innovation. In addition, resource quotas enable teams to balance resource allocation across users, ensuring fair usage and preventing resource hogging.

Key Benefits:

Cost Control: Set hard limits to prevent over-consumption of cloud resources.
Fair Allocation: Provide teams and projects the resources they need without waste.
Transparency: Gain visibility into resource consumption across your organization.
Customizable: Tailor quotas to specific teams, users, or workloads to meet your organization's unique needs.

Get Started

At Anyscale, we’re committed to empowering customers with the tools they need to build and deploy AI workloads at scale.

Reach out to our team to take control over your AI infrastructure and costs. Book a demo here: https://www.anyscale.com/book/demo


