Foundation Model builders scale distributed training, multimodal data curation, embedding generation, and post-training workloads on Anyscale

Powered by Ray, the world’s most widely adopted AI compute engine.

Coinbase
Character.AI
TwelveLabs
Reflection
Torc Robotics
Tripadvisor
Runway
Grab

1,600,000+

GPU hours run on Anyscale every month

AI Workloads

Anyscale powers the data-intensive workloads required to build and scale Foundation Models.

Multimodal data curation

Large-scale pipelines for curating and preparing multimodal data across videos, images, text, and audio.

import ray
from ray.data.expressions import col, download

class DetectObjects:
    def __init__(self):
        self.model = DummyModel()  # placeholder for a real detection model

    def __call__(self, batch):
        scores = self.model(batch["media"])
        batch["scores"] = scores
        return batch

# Load metadata and download media files
ds = ray.data.read_parquet("s3://my_data_metadata")
ds = ds.with_column("media", download(col("path")))

# Run object detection on GPU
ds = ds.map_batches(
    DetectObjects,
    batch_size=64,
    num_gpus=1,
)

# Store high-confidence results
ds = ds.filter(col("scores") > 0.85)
ds.write_parquet("s3://bucket/curated/")

Distributed model training

Orchestrate model training across GPU clusters with elastic scaling, last-mile data preprocessing, and GPU observability.

import ray
import torch
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_loop(config):
    model = build_model(config["model_name"])  # user-defined model factory
    model = ray.train.torch.prepare_model(model)
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=config["lr"])

    # Standard training loop
    for epoch in range(10):
        for batch in train_dataloader:  # user-defined dataloader
            optimizer.zero_grad()
            loss = model(batch)
            loss.backward()
            optimizer.step()
        ray.train.report(
            {"loss": loss.item(), "epoch": epoch})

# Launch distributed training across 64 GPU workers
trainer = TorchTrainer(
    train_loop,
    train_loop_config={
        "model_name": "llama-3.1-70b",
        "lr": 3e-4},
    scaling_config=ScalingConfig(
        num_workers=64, use_gpu=True),
)
result = trainer.fit()

Batch embedding generation

Process and generate embeddings at scale for downstream search, retrieval, or training use cases.

import ray
from sentence_transformers import SentenceTransformer

class SentenceTransformerEmbed:
    def __init__(self, model: str):
        self.model = SentenceTransformer(
            model, device="cuda")

    def __call__(self, batch):
        texts = batch["text"]
        embeddings = self.model.encode(texts)
        batch["embedding"] = embeddings
        return batch

# Load source documents from object store
ds = ray.data.read_parquet("s3://documents")

# Compute embeddings in parallel across 16 GPU workers
ds = ds.map_batches(
    SentenceTransformerEmbed,
    fn_constructor_kwargs={"model": "bge-large-en-v1.5"},
    concurrency=16,
    num_gpus=1,
    batch_size=512,
)

# Persist the embeddings to the warehouse
ds.write_parquet("s3://warehouse/embeddings/")

Post-training

Run LLM inference and training on post-training frameworks like SkyRL and veRL, which are built natively on Ray.

import ray
from vllm import LLM

@ray.remote(num_gpus=1)
class TrainingActor:
    def __init__(self, model, **kwargs):
        self.model = get_model(model, **kwargs)  # user-defined model loader

    def forward(self, data: dict):
        outputs = self.model(
            input_ids=data["input_ids"],
            attention_mask=data["attention_mask"])
        logprobs = logprobs_from_logits(outputs["logits"])
        return logprobs

vllm = ray.remote(num_gpus=1)(LLM).remote(
    model_name, tensor_parallel_size=1)
policy_model = TrainingActor.remote(model=model_name)
reference_model = TrainingActor.remote(model=ref_model)

# Generate trajectories
outputs = vllm.generate.remote(prompts, sampling_params)
trajectories_and_rewards = environment.score(outputs)

# Tokenize and prepare input_ids, attention_mask
data = prepare_for_forward(trajectories_and_rewards)

# Get reference log probs and policy log probs
reference_logprobs = reference_model.forward.remote(data)
policy_logprobs = policy_model.forward.remote(data)
“Anyscale enables us to push the boundaries of what’s possible in generative AI by giving us the flexibility to scale workloads seamlessly. This removes the risk around our infrastructure and allows our team to focus on innovation rather than infrastructure bottlenecks.”
Anastasis Germanidis
Co-Founder & CTO
Runway

Why Anyscale

Build, run and optimize data-intensive training and inference pipelines on your own GPUs

Optimize distributed training, data curation, and batch inference pipelines with Ray on Anyscale.

Scale existing AI libraries like PyTorch, vLLM, SGLang, and XGBoost with Python APIs across thousands of nodes.


Built on Open Source

Anyscale is Built on Ray by the Creators of Ray

Ray is the world’s most trusted AI compute engine for building, running, and scaling data-intensive AI workloads.

500M+

All time downloads

41K+

GitHub stars

1.2K+

Contributors


Simple Python APIs

Execute Python functions and classes on a distributed cluster with a single decorator


Fine-grained hardware allocation

Compose workloads with distributed functions and classes each running on different CPUs, GPUs, TPUs, or accelerator racks like NVL72.


Efficient distributed communication

Leverage Ray’s in-memory distributed object store or direct transport over RDMA for high throughput communication.


Multi-framework support

Ray offers native libraries like Ray Data and Ray Train and a rapidly expanding ecosystem of third-party libraries like vLLM and SkyRL.

Unify and govern every team’s AI workloads across clouds

Enable AI builders to move fast by pooling your GPUs across clouds, regions, and K8s clusters.

Pooled GPUs

Run training and inference on a shared resource pool, dynamically reallocating capacity as workload demand shifts to maximize utilization.

Multi-cloud execution

Run the same code across AWS, GCP, Azure, Nebius or CoreWeave to maximize GPU access across regions without cloud-specific rewrites.

Secure and governed

Access controls and authentication, including SSO, SAML, SCIM, and audit logs, for multi-team security and governance.

Ready to build?

Start building with a free Anyscale account and access to dozens of code templates on the Anyscale Platform.

Discover Anyscale

Learn more about building with Anyscale and Ray through our self-service courses, webinars, events and more.