
Reinforcement learning based on market simulation at JPMorgan

By Erik Martinez | May 3, 2022

Financial institutions have long been at the forefront of predictive analytics and market forecasting. Now they are going further with the help of reinforcement learning (RL). At the Production RL Summit, Sumitra Ganesh, research director at JP Morgan AI Research, explained how RL helps her team model complex economic systems and learn agent policies efficiently.

Because experimenting in live environments is expensive and risky, especially in multi-agent, interactive settings, teams turn to simulation. Simulations take many forms depending on the environment they emulate. JP Morgan needed one that simulated multiple heterogeneous, interacting agents: even a small market with just a few shops and consumers introduces a complex range of preferences, constraints, and connectivity.

But building such a simulation is no easy feat. Economic systems are heterogeneous, contain agents that seek to maximize their own utility, and offer only partial observability. This complexity brings two major challenges for simulation: finding an equilibrium among multiple strategic agents, and calibrating the model against real data. Fortunately, these are precisely the issues RL can help address: it can learn agent behaviors and policies, and then calibrate the composition of agents against real-world data.
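One common way to learn policies for many heterogeneous agents at once is parameter sharing: a single policy serves every agent and receives the agent's type as an input, so experience from all agents trains the same weights. The sketch below illustrates that general idea on a hypothetical toy market and is not JP Morgan's implementation or an RLlib API. It assumes each agent has a type `theta` of +1 (values the good highly) or -1 (does not), chooses between "pass" (0) and "buy" (1), and earns a reward of 1 when its action fits its type. The function names, reward rule, and hyperparameters are all illustrative, and training uses plain REINFORCE with a constant baseline.

```python
import math
import random

ACTIONS = (0, 1)  # hypothetical actions: 0 = "pass", 1 = "buy"


def reward(theta, action):
    """Toy reward: 1 when a high-value agent (theta > 0) buys or a low-value one passes."""
    return 1.0 if (action == 1) == (theta > 0) else 0.0


def features(theta):
    """Input to the shared policy: a bias term plus the agent's type parameter."""
    return (1.0, theta)


def policy_probs(weights, theta):
    """Softmax over per-action linear scores; the same weights serve every agent."""
    f = features(theta)
    scores = [sum(w * x for w, x in zip(weights[a], f)) for a in ACTIONS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_shared_policy(agent_types, iterations=200, lr=0.1, baseline=0.5, seed=0):
    """REINFORCE with a constant baseline on one parameter-shared policy."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0] for _ in ACTIONS]  # weights[action][feature]
    for _ in range(iterations):
        for theta in agent_types:  # each agent's experience updates the same weights
            probs = policy_probs(weights, theta)
            action = rng.choices(ACTIONS, weights=probs)[0]
            advantage = reward(theta, action) - baseline
            f = features(theta)
            for a in ACTIONS:
                # Gradient of log pi(action) w.r.t. score_a is 1[a == action] - pi(a).
                grad_score = (1.0 if a == action else 0.0) - probs[a]
                for i, x in enumerate(f):
                    weights[a][i] += lr * advantage * grad_score * x
    return weights


def greedy_action(weights, theta):
    """Most likely action for an agent of type theta under the shared policy."""
    probs = policy_probs(weights, theta)
    return max(ACTIONS, key=lambda a: probs[a])


if __name__ == "__main__":
    agents = [1.0, -1.0] * 5  # ten agents of two hypothetical types
    shared = train_shared_policy(agents)
    print(greedy_action(shared, 1.0), greedy_action(shared, -1.0))  # 1 0
```

Because the type enters as a feature, adding more agents of an existing type adds training data rather than new parameters. The constant baseline of 0.5 (the expected reward of a uniformly random policy in this toy) keeps the sketch short; learned baselines are more usual in practice. In RLlib, the analogous setup maps every agent ID to one policy in a multi-agent configuration, though the exact configuration API differs across Ray versions.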

Without RL, Sumitra and her team could have tried learning a separate policy for each agent in the simulated market, but that approach would have been hard to scale and unstable, and it would not leverage what is common across agents. Similarly, calibrating the simulation to match the real market could have been done agent by agent, but that would have been slow.
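The calibration problem itself can be sketched in a few lines: choose the mix of agent types whose simulated aggregate statistics best match observed ones, which means searching over a handful of composition fractions rather than tuning every agent individually. The talk describes RL for this step; the sketch below swaps in a plain grid search purely to show the shape of the problem. Everything in it is hypothetical: the three agent types, their order sizes, the two summary statistics, and the target, which is generated from the simulator itself rather than taken from real market data.

```python
from itertools import product

# Hypothetical agent types and how many units each type orders per period.
# Agents are deterministic to keep the sketch short; a real simulator would be
# stochastic and driven by learned policies.
UNITS_BY_TYPE = {"budget": 1, "regular": 2, "bulk": 5}
TYPES = tuple(UNITS_BY_TYPE)


def simulate_market(counts):
    """Run one period of the toy market and return aggregate statistics.

    counts maps each agent type to the number of agents of that type.
    Returns (mean units per agent, fraction of agents ordering more than one unit).
    """
    orders = [UNITS_BY_TYPE[t] for t in TYPES for _ in range(counts[t])]
    n = len(orders)
    mean_units = sum(orders) / n
    frac_multi = sum(1 for units in orders if units > 1) / n
    return mean_units, frac_multi


def calibrate_composition(observed, n_agents=100, step=10):
    """Grid-search agent compositions whose simulated statistics best match `observed`."""
    best, best_err = None, float("inf")
    for n_budget, n_regular in product(range(0, n_agents + 1, step), repeat=2):
        n_bulk = n_agents - n_budget - n_regular
        if n_bulk < 0:
            continue  # counts would exceed n_agents: not a valid composition
        counts = dict(zip(TYPES, (n_budget, n_regular, n_bulk)))
        simulated = simulate_market(counts)
        err = sum((s - o) ** 2 for s, o in zip(simulated, observed))
        if err < best_err:
            best, best_err = counts, err
    return best, best_err


if __name__ == "__main__":
    # Synthetic stand-in for statistics that would be measured on real market data.
    target = simulate_market({"budget": 20, "regular": 50, "bulk": 30})
    print(calibrate_composition(target))
    # ({'budget': 20, 'regular': 50, 'bulk': 30}, 0.0)
```

A grid search scales poorly as the number of agent types grows, which is one reason to replace it with a learned calibrator like the one the talk describes.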

Watch Sumitra’s Production RL Summit talk to learn how RL-based simulations of complex multi-agent environments, built with Ray and RLlib, have greatly increased the efficacy and efficiency of the models her team generates.
