DAO - ISEM - IORA Seminar Series

“Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales”

Yijie Peng

Associate Professor

Guanghua School of Management, Peking University

6 October 2025 (Monday), 10am – 11.30am
Venue: E1-07-21/22 ISEM Executive Classroom

ABSTRACT

Effective cross-functional coordination is essential for enhancing firm-wide profitability, particularly in the face of growing organizational complexity and scale. Recent advances in artificial intelligence, especially in reinforcement learning (RL), offer promising avenues to address this fundamental challenge. This paper proposes a unified multi-agent RL framework tailored for joint optimization across distinct functional modules, exemplified via coordinating inventory replenishment and personalized product recommendation. We first develop an integrated theoretical model to capture the intricate interplay between these functions and derive analytical benchmarks that characterize optimal coordination. The analysis reveals synchronized adjustment patterns across products and over time, highlighting the importance of coordinated decision-making. Leveraging these insights, we design a novel multi-timescale multi-agent RL architecture that decomposes policy components according to departmental functions and assigns distinct learning speeds based on task complexity and responsiveness. Our model-free multi-agent design improves scalability and deployment flexibility, while multi-timescale updates enhance convergence stability and adaptability across heterogeneous decisions. We further establish the asymptotic convergence of the proposed algorithm. Extensive simulation experiments demonstrate that the proposed approach significantly improves profitability relative to siloed decision-making frameworks, while the behaviors of the trained RL agents align closely with the managerial insights from our theoretical model. Taken together, this work provides a scalable, interpretable RL-based solution to enable effective cross-functional coordination in complex business settings.

PROFILE OF SPEAKER

Professor Peng is an Associate Professor at the Guanghua School of Management, Peking University. His research focuses on the methodology and theory of stochastic simulation optimization for complex systems, with applications to artificial intelligence, financial engineering and risk management, health care, and other fields. He has published in leading journals, including Operations Research, INFORMS Journal on Computing, and IEEE Transactions on Automatic Control. Professor Peng received the INFORMS Outstanding Simulation Publication Award and is the principal investigator of the Outstanding Young Scholar Grant, the Original Research Grant, and the Excellent Young Scholar Grant from the National Science Foundation of China.