Case Study

3x lower cost than OpenAI
Hours of catalog onboarding, down from 28 days
80k+ tokens/sec throughput in token generation
Major retailers and manufacturers like Macy’s, Conrad, and Airbus have one thing in common: massive product catalogs that need to be onboarded onto multiple marketplaces. On average, it takes a seller about 28 days to onboard their catalog to a marketplace and start selling their products. More than half of this time is dedicated to the catalog onboarding process itself, which involves mapping fields between the retailer’s catalog and the marketplace listing forms. Using LLM-powered automation, Mirakl is helping retailers tackle this problem by automating onboarding for over 10 million products each month.
To meet their growing seller volume and product diversity demands, Mirakl needed an AI platform that could automate and accelerate catalog onboarding with scalable, cost-efficient batch LLM inference for both off-the-shelf models and their own fine-tuned models across multiple cloud providers. By leveraging Anyscale’s elastic, developer-friendly platform for Ray, Mirakl dramatically reduced processing costs while scaling their LLM-powered system to handle record-high peaks.
Product listing onboarding is a major bottleneck for marketplace sellers, often requiring weeks of tedious, manual data mapping between the retailer catalog and the marketplace listing forms. Seller-submitted product data can arrive in many formats, ranging from Excel spreadsheets to CSV files to API feeds, and each format carries its own schema and attribute naming. This variability makes standardization challenging and is further exacerbated across verticals. For example, product attributes in the fashion industry differ significantly from those in electronics, grocery, health & beauty, or industrial goods, requiring domain-specific knowledge that makes data mapping even harder.
Before a product can go live on a marketplace, the data must undergo several layers of pre-processing: data mapping, cleaning, categorizing, and rewriting to adhere to marketplace requirements. Done manually, this onboarding took sellers up to 28 days, delaying time-to-market and hurting the overall customer experience. The team needed a scalable, accurate way to automate this transformation for over 10 million products each month, so they decided to build a Catalog Transformer using LLMs. While initial LLM testing with GPT-4o mini produced promising results, Mirakl encountered major limitations in cost, reliability, and volume variability at scale. Ultimately, the team decided to look for a platform that would enable LoRA-based fine-tuning, so they could build and run smaller, more task-specific LLMs.
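At its core, this kind of Catalog Transformer is a batch LLM inference job over heterogeneous catalog data. Below is a minimal sketch of what one such step could look like with Ray Data; the bucket paths, column name, prompt, and model choice are illustrative assumptions, not Mirakl's actual pipeline.

```python
import ray

class CatalogTransformer:
    """Maps raw seller records to a marketplace schema with an open LLM."""

    def __init__(self):
        # Load an open model with vLLM once per actor (model name assumed).
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

    def __call__(self, batch: dict) -> dict:
        # "raw_record" is a hypothetical column holding one seller record per row.
        prompts = [
            f"Map this raw product record to the marketplace schema:\n{row}"
            for row in batch["raw_record"]
        ]
        outputs = self.llm.generate(prompts)
        batch["mapped_record"] = [o.outputs[0].text for o in outputs]
        return batch

# Hypothetical input/output locations.
ds = ray.data.read_csv("s3://example-bucket/seller-catalog/")
ds = ds.map_batches(
    CatalogTransformer,
    batch_size=64,
    num_gpus=1,      # one GPU per model replica
    concurrency=8,   # actor pool size; scales with the cluster
)
ds.write_parquet("s3://example-bucket/transformed-catalog/")
```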
| Challenge | Consequences |
| --- | --- |
| Cost Constraints | Massive token volume of 70 billion+ input and 25 billion+ output tokens per month with GPT-4o mini, projected at more than $500k/month on inference alone. |
| Infrastructure Reliability | GPT-based APIs suffered downtime and latency during peak US hours, delaying catalog transformations and onboarding. |
| Elastic Load Variability | Traffic swung from peaks of 80k+ tokens/sec down to zero activity, so static provisioning meant inefficient resource utilization and wasteful overprovisioning. |
| Limited Customization | No flexible way to fine-tune small, domain-specific attributes without retraining entire models, which would have further increased complexity and cost. |
This led the team to look for a solution that gave them more control while still offering the operational efficiency of a hosted model. Anyscale checked all the boxes when compared with other providers.
To support production-scale inference and deliver the best customer experience, Mirakl chose Anyscale to engineer a cost-efficient, scalable architecture while also laying the foundation for future expansion into multimodal LLMs.
| Requirement | Anyscale Advantage |
| --- | --- |
| Optimized Spending | Run small, task-specific LLaMA models with fine-tuned LoRA adapters. Efficient autoscaling delivers high GPU utilization during peaks without overprovisioning, and scales down during low hours without idle GPUs. |
| High Availability and Fault Tolerance | Maintained both steady (4.4 tokens/sec) and peak (50k tokens/sec) workloads without latency spikes, downtime, or manual intervention. Upgraded Ray clusters with Anyscale without compromising live pipelines. Integrated observability with Datadog ensured real-time monitoring of latency, load, and adapter performance. |
| Developer Agility | Easy to spin up and manage Ray clusters as code with the Anyscale CLI. Isolated environments enabled independent product workloads, allowing safe and separate experimentation, testing, and production. |
During initial testing, Mirakl powered their automation with GPT-4o mini via the OpenAI API, but quickly discovered this would be prohibitively expensive in production. The team projected costs greater than $500k/month on inference alone, making a production rollout unviable.
Realizing this, Mirakl moved to Anyscale, deploying smaller open-source models like LLaMA 3.1 8B and fine-tuning them with LoRA adapters, all on top of Anyscale’s enterprise-grade Ray platform. By intelligently routing up to 90% of catalog traffic to LLaMA 3.1 8B, the team drastically drove down costs while using flexible, fine-tuned LoRA adapters to deliver domain-specific outputs. This enabled continuous improvements in model quality without retraining entire models every time a new domain-specific attribute is introduced. The remaining edge cases and large listings fall back to GPT-4o mini. Investing in fine-tuned open-source models and leveraging Anyscale to manage Ray orchestration cut the company’s inference costs 3x compared to using the OpenAI API for all listings.
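A minimal sketch of this routing pattern, under stated assumptions: the fine-tuned model sits behind an OpenAI-compatible endpoint (both vLLM and Ray Serve expose one), and the endpoint URL, adapter name, and edge-case heuristic below are illustrative, not Mirakl's actual logic.

```python
from openai import OpenAI

# Fine-tuned Llama 3.1 8B + LoRA adapter behind an OpenAI-compatible
# endpoint (URL assumed); GPT-4o mini via the OpenAI API as fallback.
local = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def map_listing(raw_record: str) -> str:
    prompt = f"Map this product record to the marketplace schema:\n{raw_record}"

    # ~90% of traffic: the small, cheap fine-tuned model.
    resp = local.chat.completions.create(
        model="llama-3.1-8b-catalog-lora",  # hypothetical adapter name
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content

    # Hypothetical edge-case signal: the fine-tuned model flags records
    # it cannot map; only those pay for the larger model.
    if answer and "UNMAPPABLE" not in answer:
        return answer

    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

The design point is that the cheap model absorbs the bulk of traffic, and only uncertain outputs incur the cost of the fallback model.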
Another factor driving up the team’s compute costs was underutilized, overprovisioned GPUs. Because inference load varied throughout the day, they were often using only a fraction of the GPU capacity they had provisioned, leading to wasted spend that could have been invested elsewhere. Part of why Mirakl chose Anyscale was the platform's ability to elastically autoscale GPU resources in real time: they can scale up to 20 GPU nodes during peak hours and down to near zero overnight, eliminating costs from idle compute.
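Ray Serve's autoscaling config is one natural way to express this elasticity; on Anyscale, the underlying GPU nodes follow the replica count. A minimal sketch with illustrative numbers (the replica bounds, target load, and deployment body are assumptions, not Mirakl's settings):

```python
from ray import serve

@serve.deployment(
    ray_actor_options={"num_gpus": 1},
    autoscaling_config={
        "min_replicas": 0,            # scale to zero overnight
        "max_replicas": 20,           # cap at peak-hour capacity
        "target_ongoing_requests": 16,  # per-replica load target
    },
)
class LlamaTransformer:
    def __init__(self):
        # Load the fine-tuned model and LoRA adapter here (omitted).
        pass

    async def __call__(self, request) -> str:
        # Run inference here (omitted); placeholder response.
        return "mapped-record"

app = LlamaTransformer.bind()
```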
After suffering reliability issues during US peak hours, Mirakl needed to ensure their LLM inference could run at scale without downtime or latency spikes. By deploying their fine-tuned LLaMA 3.1 8B models on Anyscale’s optimized Ray clusters, the team achieved consistent uptime and elastic scalability, seamlessly handling fluctuating traffic.
Anyscale delivered reliability through:
Consistent peak performance, handling 80k+ tokens/sec of generation without latency spikes.
Zero-downtime upgrades when pushing changes to Ray clusters.
Integrations with observability tools like Datadog to monitor performance, latency and cluster health.
To accelerate AI development without adding further infrastructure complexity, Mirakl needed a process that allowed their developers and data scientists to prototype, test, and deploy at scale. Anyscale significantly increased developer agility by streamlining how teams at Mirakl build, test, and scale their AI workloads to production, including:
Infrastructure-as-code: Ray clusters are deployed quickly with GitHub Actions and managed easily with the Anyscale CLI, without manual Kubernetes setup and scheduling.
Unified Python platform: Fostered collaboration between data engineers and data scientists under a common language.
Cluster isolation for independent use cases: Dedicated Alpha, Beta, and Live clusters for uninterrupted testing and innovation.
Fast iteration without retraining the entire model: flexible, hot-swappable LoRA adapter changes without affecting performance (see the sketch after this list).
Faster time to production: saved sellers weeks of catalog onboarding time, getting their products onto the marketplace faster.
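For the hot-swappable adapters mentioned above, vLLM's multi-LoRA support illustrates the idea: adapters are selected per request on top of a shared base model, so adding a new vertical means shipping a small adapter rather than retraining or redeploying the base model. The adapter names and paths below are hypothetical.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# One shared base model with LoRA support enabled.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True)
params = SamplingParams(max_tokens=256)

# One adapter per vertical (paths assumed); selection is per request,
# so swapping adapters requires no base-model restart.
fashion = LoRARequest("fashion", 1, "/adapters/fashion")
electronics = LoRARequest("electronics", 2, "/adapters/electronics")

prompt = "Map this product record to the marketplace schema: ..."
out = llm.generate([prompt], params, lora_request=fashion)
print(out[0].outputs[0].text)
```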
Mirakl has fully transformed and automated how sellers onboard their products onto the Mirakl Catalog Platform, cutting onboarding time from weeks to hours. Product catalogs are now cleaner, more accurate, and tailored to their specific verticals. Looking ahead, the team is expanding into multimodal AI, combining text and images to enhance their data even further. Built on Anyscale’s production-grade Ray platform, Mirakl has created the foundation to innovate and drive more AI-powered automation in the e-commerce marketplace industry.