Case Study

Geotab enhances commercial fleet safety with video AI on Anyscale

With Anyscale, Geotab is able to deliver a unified GPU compute platform where data scientists build models and run inference across billions of video frames to deliver safety and risk reduction insights to drivers and fleet managers.

43x

higher peak-hour throughput on video processing

4x

higher video processing GPU utilization

40%

fewer GPUs needed for peak processing

Geotab is a global leader in connected operations, video telematics and AI-powered insights, with more than 6 million connected vehicle subscriptions generating over 100 billion data points every day. Their AI camera product, GO Focus™, sits inside fleet vehicles and captures external road images that feed machine learning pipelines for safety use cases like speed sign detection, risky driving behavior identification, and real-time driver coaching. These video-based use cases help reduce accidents, lower insurance costs and keep drivers safer on the road.

Behind those features sit multiple video AI pipelines, both for training and inference, that need to keep up with production-scale processing demands.

To power this end-to-end workflow, Geotab needed a compute platform capable of training computer vision models, fine-tuning vision-language models, and running inference across millions of dashcam videos in every batch job. At the same time, the platform had to let developers self-serve scalable GPU compute without learning Kubernetes. Geotab chose Anyscale to power that unified video AI platform.

Challenges

Geotab builds AI and Machine Learning (ML) capabilities for fleet safety and video analysis, but the infrastructure underneath those workloads was the bottleneck to how fast the team could innovate. Running large batch video inference jobs, supporting dozens of data scientists with different GPU needs, and maintaining enterprise security across the organization all required a platform purpose-built for the complexity of their video AI workloads.

Three challenges stood in the way:

A wide technical gap between data scientists and GPU infrastructure. Geotab attempted to run GPU workloads on Kubernetes without an abstraction layer and ran into a fundamental problem: data scientists struggled to efficiently deploy containers, configure clusters, or navigate Kubernetes internals. The gap was too large to bridge through documentation alone, and asking data scientists to become infrastructure experts was not a viable path forward for a team focused on building fleet intelligence.
GPU sharing across teams was slow and inefficient to scale. Without a mechanism to share GPUs across users, one user running a workload locked others out of the machine entirely. Workloads ran in sequence rather than in parallel, and the platform team had no good way to enforce fairness or priority across competing jobs. Spinning up a dedicated GPU cluster per user was operationally complex, requiring a separate Terraform configuration for each setup. On top of that, GPU container images weighed tens of gigabytes and took 20-30 minutes to load on a fresh node, making quick experiments expensive and discouraging fast, iterative ML development.
Batch inference performance suffered from slow model loading. Geotab's video pipeline processes up to a million road images per batch run, using a vision language model to perform many critical tasks. Keeping a VLM running continuously to serve that workload was not cost-efficient. The team needed a pipeline that could deploy the model when a batch job arrived, process the full run, and then shut it down automatically once results were written, with minimal operational complexity and without bolting on a separate orchestration tool like Airflow to stitch the stages together.

"There's a big technical gap over there between the data scientists who don't know much about Kubernetes and versus the Kubernetes expertise. Experimentation slowed to a crawl whenever ML developers had to depend on Kubernetes experts to run every workload."

Mike Wang | AI Platform Manager

The Solution

Geotab's platform team evaluated options for modernizing its GPU infrastructure and chose Anyscale as the foundation that could support both development and inference workloads at billions-of-frames scale.

With Anyscale, Geotab is able to:

Empower data scientists to run experiments at scale, without becoming K8s experts. Using Ray behind Anyscale Runtime, data scientists can scale AI workloads with familiar Python APIs. They can develop in interactive notebooks or from the CLI, which integrates with popular coding agents through Anyscale Agent Skills. Distributed work feels local, with no manual Kubernetes provisioning and no waiting on the platform team to access or scale resources.
Centralize GPU resources with fractional allocation, priority management, and multi-tenancy. Python annotations like "GPU = 0.1" allow Anyscale to allocate fractional GPU resources natively across concurrent workloads, replacing the one-user-per-cluster model with a shared pool where the platform team can enforce priority across teams. RBAC ensures each user only accesses their own project and compute resources. Docker image load times dropped roughly 5x, from 20-30 minutes to 4-5 minutes, removing the startup tax that previously made fast experimentation impractical.
Keep GPU costs in check by deploying models only when needed. Rather than running a VLM as an always-on service, Geotab structures its batch pipeline so the model spins up at job creation, processes the full set of images, and shuts down once results are written. That on-demand pattern, where models are fast loaded and end-to-end video processing can be executed without intermediate reads/writes managed by an external orchestrator, means GPU capacity is never reserved for work that is not actively happening.

"With a simple Python annotation, Anyscale allocates exactly the GPU fraction a workload needs and runs it natively alongside everything else. No waiting, no wasted capacity. "

Mike Wang | AI Platform Manager

Empower data scientists to run experiments at scale, without becoming K8s experts

Geotab's platform team has supported GPU workloads at scale for six years, yet the recurring problem has always remained the same: the engineers who know how to configure distributed compute are not the same people who need to run experiments on it. When Geotab attempted to run GPU workloads on Kubernetes, data scientists could not efficiently deploy containers or navigate cluster configuration for individual experiments without platform team intervention, which created a constant bottleneck and slowed down the research cycle.

With Anyscale, that dependency disappears. Data scientists log in through Google Single Sign-On, land in their own project environment, and can submit jobs, spin up Jupyter notebooks, and manage workspaces (interactive notebooks backed by multi-node clusters) entirely through the UI, without writing a line of infrastructure code. Each user is scoped to their own project through RBAC, so the platform team can maintain security and governance across dozens of concurrent users.

Geotab was also among the early testers of Anyscale Agent Skills, and the platform team adopted the platform and infrastructure Skills for their own work while guiding data scientists toward the workload-focused skills for code generation and experimentation. Engineers now go directly to the relevant skill when a question arises about cluster configuration, job submission, or debugging, rather than searching through documentation or filing a request.

"Our team adopted Anyscale Agent Skills heavily from the start. When anyone has a Ray question, they go straight to Agent Skills rather than the docs. It has become the default first stop for debugging, configuration, and figuring out what is possible. "

Mike Wang | AI Platform Manager

A centralized GPU platform for data scientists to efficiently share compute

Beyond self-service access, Geotab needed a way to share a centralized pool of GPUs across a growing number of teams without any single workload monopolizing resources or any team sitting idle waiting for capacity. The old model of one cluster per user was operationally unsustainable, and without fractional resource allocation, a researcher occupying an entire GPU meant everyone else had to wait in sequence.

Fractional GPU allocation through Python-native annotations resolved that problem. Multiple workloads now share the same hardware concurrently. The platform team can also enforce priority-aware scheduling so high-urgency jobs move ahead without blocking lower-priority work entirely, and the administrative overhead of per-user cluster management disappears. Cutting Docker image load times from 20-30 minutes to 4-5 minutes removed the startup tax that had made quick experiments expensive and slowed the iteration that ML research depends on. Across video processing workloads, the shift to fractional allocation and centralized GPU sharing has reduced the total number of GPUs needed at peak by 40%, freeing up capacity that previously sat idle waiting for individual users or jobs to finish.
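
To make the effect of fractional requests concrete, here is a minimal sketch in plain Python. It is a toy first-fit packer, not Anyscale's or Ray's scheduler, and the workload names and shares are hypothetical. In Ray itself, a fractional request is typically expressed with the `num_gpus` argument, for example `@ray.remote(num_gpus=0.1)`; the sketch only models the packing arithmetic that such requests make possible.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Gpu:
    """One physical GPU with capacity 1, tracked exactly with Fraction."""
    free: Fraction = Fraction(1)
    tenants: list = field(default_factory=list)


def pack(requests):
    """Place (name, share) requests onto shared GPUs using first-fit."""
    gpus = []
    for name, share in requests:
        share = Fraction(str(share))  # "0.1" -> 1/10, avoids float drift
        if not 0 < share <= 1:
            raise ValueError(f"{name}: share must be in (0, 1]")
        for gpu in gpus:
            if gpu.free >= share:
                break  # reuse the first GPU with enough room
        else:
            gpu = Gpu()  # nothing fits, so add a GPU to the pool
            gpus.append(gpu)
        gpu.free -= share
        gpu.tenants.append(name)
    return gpus


# Hypothetical workloads: ten light notebooks plus two heavier jobs.
requests = [(f"notebook-{i}", 0.1) for i in range(10)] + [
    ("finetune-a", 0.5),
    ("finetune-b", 0.5),
]
shared = pack(requests)
dedicated = len(requests)  # one whole GPU per workload, as in the old model
print(f"shared pool: {len(shared)} GPUs, dedicated: {dedicated} GPUs")
```

The numbers here are illustrative only and are not meant to reproduce the 40% figure reported above; real savings depend on the actual mix of workloads and on memory limits that this toy model ignores.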

Together these changes let the platform team support a large and growing organization of data scientists running fine-tuning jobs, deep learning experiments, and A100 GPU workloads without scaling the infrastructure burden alongside the headcount.

"Anyscale provides the enterprise foundation we need on top of Ray to efficiently share GPU resources across teams. Features like RBAC, SSO and audit logs are critical to how we operate our AI workloads at scale. "

Mike Wang | AI Platform Manager

On-demand model deployment for cost efficiency

Running a large vision language model continuously against a workload that arrives in batches would mean paying for GPU capacity that sits idle the vast majority of the time. Geotab's pipeline is deliberately structured to avoid that. When a batch job is created, the model spins up, processes the full set of images, and writes the results to storage, and then the cluster shuts down. The model only consumes GPU resources during the window when it is actively needed, and that window closes as soon as the batch is complete.
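
The lifecycle described above (load on job arrival, process, write, release) can be sketched with a Python context manager. This is an illustration under stated assumptions, not Geotab's code: `StubModel`, `on_demand_model`, and `run_batch` are hypothetical names, and the "model" is a stand-in that just formats a string.

```python
from contextlib import contextmanager

events = []  # lifecycle log, kept only so the sketch is observable


class StubModel:
    """Hypothetical stand-in for a vision language model."""

    def describe(self, image):
        return f"label-for-{image}"


@contextmanager
def on_demand_model():
    """Hold the model only for the duration of one batch."""
    events.append("load")  # stands in for loading weights onto a GPU
    model = StubModel()
    try:
        yield model
    finally:
        events.append("release")  # runs even if the batch raises


def run_batch(images, sink):
    """Process every image in the batch, then let the model go."""
    with on_demand_model() as model:
        for image in images:
            sink.append(model.describe(image))
        events.append("results written")


results = []
run_batch(["frame-001", "frame-002"], results)
print(events)
```

Putting the release step in `finally` means capacity is returned even when a batch fails partway, which is the property that keeps an on-demand design from leaking idle GPUs.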

Anyscale and Ray Data make this pattern practical by handling every stage of the pipeline natively, from image ingestion and blurring through model inference and result storage, without requiring an external orchestrator like Airflow to coordinate handoffs between steps. The entire workflow runs as a single coherent job, meaning there are no gaps between stages where idle compute accumulates and no orchestration overhead to maintain separately from the compute layer itself. The result was a 43x improvement in peak-hour video processing throughput and a 4x improvement in GPU utilization across video workloads, driven by eliminating the idle capacity that comes with keeping a large model running continuously. For a team managing GPU scarcity and watching cloud spend closely, eliminating persistent model footprint is one of the most direct levers available.
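
As a rough analogy for running ingestion, blurring, inference, and storage as one job with no intermediate reads or writes, the sketch below chains plain Python generators. It does not use the Ray Data API and is not distributed; the stage functions, file names, and record fields are all hypothetical. It only shows the shape of a pipeline in which each record flows straight from one stage to the next.

```python
def ingest(paths):
    """Yield one frame record per input path."""
    for path in paths:
        yield {"path": path, "pixels": f"raw:{path}"}


def blur(frames):
    """Stand-in for a privacy blurring step applied before inference."""
    for frame in frames:
        yield {**frame, "pixels": frame["pixels"].replace("raw", "blurred")}


def infer(frames):
    """Stand-in for model inference on each blurred frame."""
    for frame in frames:
        yield {"path": frame["path"], "label": f"sign-in-{frame['pixels']}"}


def write(records, store):
    """Final stage: persist results and report how many were written."""
    count = 0
    for record in records:
        store[record["path"]] = record["label"]
        count += 1
    return count


store = {}
# Stages are lazy, so nothing is materialized between them.
written = write(infer(blur(ingest(["a.jpg", "b.jpg"]))), store)
print(written, store)
```

Because each stage pulls from the previous one on demand, there is no handoff point where an external scheduler would need to poll for a finished file before starting the next step.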

"Keeping a large vision language model running at all times would mean paying for GPU capacity that sits idle most of the day. Deploying on demand, processing the batch, and shutting down immediately is the only approach that makes sense at our scale. "

Mike Wang | AI Platform Manager

What's Next

Looking further ahead, the platform roadmap centers on deepening multi-tenancy and enterprise governance as GPU demand grows across more teams, and on expanding adoption of Anyscale Runtime optimizations and Agent Skills to further reduce manual overhead for both the platform team and the data scientists they support.

"Fundamentally, we need that layer to manage the GPUs and manage the people who are using the GPUs. And that's kind of the layer where I see Anyscale is going to grow in importance. "

Mike Wang | AI Platform Manager

“Geotab generates billions of dash cam frames every day, and we need every one of our data scientists to be able to build and deploy AI against that scale. Anyscale gives them access to scalable GPU compute through Python and AI agents, without having to be infrastructure experts.”

Mike Wang

AI Platform Manager, Geotab
