Foundation Model builders scale distributed training, multimodal data curation, embedding generation, and post-training workloads on Anyscale

Powered by Ray, the world’s most widely adopted AI compute engine.

Coinbase
Character.AI
TwelveLabs
Reflection
Torc Robotics
Tripadvisor
Runway
Grab

1,600,000+

GPU hours run on Anyscale every month

AI Workloads

Anyscale powers the data-intensive workloads required to build and scale Foundation Models.

Multimodal data curation

Large-scale pipelines for curating and preparing multimodal data across videos, images, text, and audio.

import ray
from ray.data.expressions import col, download

class DetectObjects:
    def __init__(self):
        self.model = DummyModel()  # placeholder for a real detection model

    def __call__(self, batch):
        scores = self.model(batch["media"])
        batch["scores"] = scores
        return batch

# Load metadata and download media files
ds = ray.data.read_parquet("s3://my_data_metadata")
ds = ds.with_column("media", download(col("path")))

# Run object detection on GPU
ds = ds.map_batches(
    DetectObjects,
    batch_size=64,
    num_gpus=1,
)

# Store high-confidence results
ds = ds.filter(col("scores") > 0.85)
ds.write_parquet("s3://bucket/curated/")

Distributed model training

Orchestrate model training across GPU clusters with elastic scaling, last-mile data preprocessing, and GPU observability.

import ray
import torch
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_loop(config):
    model = build_model(config["model_name"])  # user-defined model factory
    model = ray.train.torch.prepare_model(model)
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=config["lr"])

    # Standard training loop
    for epoch in range(10):
        for batch in train_dataloader:  # user-defined dataloader
            optimizer.zero_grad()
            loss = model(batch)
            loss.backward()
            optimizer.step()
        ray.train.report(
            {"loss": loss.item(), "epoch": epoch})

# Launch distributed training across 64 GPU workers
trainer = TorchTrainer(
    train_loop,
    train_loop_config={
        "model_name": "llama-3.1-70b",
        "lr": 3e-4},
    scaling_config=ScalingConfig(
        num_workers=64, use_gpu=True),
)
result = trainer.fit()

Batch embedding generation

Process and generate embeddings at scale for downstream search, retrieval, or training use cases.

import ray
from sentence_transformers import SentenceTransformer

class SentenceTransformerEmbed:
    def __init__(self, model: str):
        self.model = SentenceTransformer(
            model, device="cuda")

    def __call__(self, batch):
        texts = batch["text"]
        embeddings = self.model.encode(texts)
        batch["embedding"] = embeddings
        return batch

# Load source documents from object store
ds = ray.data.read_parquet("s3://documents")

# Compute embeddings in parallel across 16 GPU workers
ds = ds.map_batches(
    SentenceTransformerEmbed,
    fn_constructor_kwargs={"model": "bge-large-en-v1.5"},
    concurrency=16,
    num_gpus=1,
    batch_size=512,
)

# Persist the embeddings to the warehouse
ds.write_parquet("s3://warehouse/embeddings/")

Post-training

Run LLM inference and training on post-training frameworks like SkyRL and veRL, which are built natively on Ray.

import ray
from vllm import LLM

@ray.remote(num_gpus=1)
class TrainingActor:
    def __init__(self, model, **kwargs):
        self.model = get_model(model, **kwargs)  # user-defined model loader

    def forward(self, data: dict):
        outputs = self.model(
            input_ids=data["input_ids"],
            attention_mask=data["attention_mask"])
        logprobs = logprobs_from_logits(outputs["logits"])
        return logprobs

vllm = ray.remote(num_gpus=1)(LLM).remote(
    model_name, tensor_parallel_size=1)
policy_model = TrainingActor.remote(model=model_name)
reference_model = TrainingActor.remote(model=ref_model)

# Generate trajectories
outputs = vllm.generate.remote(prompts, sampling_params)
trajectories_and_rewards = environment.score(outputs)

# Tokenize and prepare input_ids, attention_mask
data = prepare_for_forward(trajectories_and_rewards)

# Get reference log probs and policy log probs
reference_logprobs = reference_model.forward.remote(data)
policy_logprobs = policy_model.forward.remote(data)
“Anyscale enables us to push the boundaries of what’s possible in generative AI by giving us the flexibility to scale workloads seamlessly. This removes the risk around our infrastructure and allows our team to focus on innovation rather than infrastructure bottlenecks.”
Anastasis Germanidis
Co-Founder & CTO
Runway

Why Anyscale

Build, run and optimize data-intensive training and inference pipelines on your own GPUs

Optimize distributed training, data curation, and batch inference pipelines with Ray on Anyscale.

Scale existing AI libraries like PyTorch, vLLM, SGLang, and XGBoost with Python APIs across thousands of nodes.


Built on Open Source

Anyscale is Built on Ray by the Creators of Ray

Ray is the world’s most trusted AI compute engine for building, running, and scaling data-intensive AI workloads.

500M+

All time downloads

41K+

GitHub stars

1.2K+

Contributors


Simple Python APIs

Execute Python functions and classes on a distributed cluster with a single decorator


Fine-grained hardware allocation

Compose workloads with distributed functions and classes each running on different CPUs, GPUs, TPUs, or accelerator racks like NVL72.


Efficient distributed communication

Leverage Ray’s in-memory distributed object store or direct transport over RDMA for high throughput communication.


Multi-framework support

Ray offers native libraries like Ray Data and Ray Train and a rapidly expanding ecosystem of third-party libraries like vLLM and SkyRL.

Unify and govern every team’s AI workloads across clouds

Enable AI builders to move fast by pooling your GPUs across clouds, regions, and K8s clusters.

Pooled GPUs

Run training and inference on a shared resource pool, dynamically reallocating capacity as workload demand shifts to maximize utilization.

Multi-cloud execution

Run the same code across AWS, GCP, Azure, Nebius or CoreWeave to maximize GPU access across regions without cloud-specific rewrites.

Secure and governed

Access controls and authentication, including SSO, SAML, SCIM, and audit logs, for multi-team security and governance.

Ready to build?

Start building with a free Anyscale account and access to dozens of code templates on the Anyscale Platform.

Discover Anyscale

Learn more about building with Anyscale and Ray through our self-service courses, webinars, events and more.