Pure Exploration for Good Arm/Policy Identification – Industrial Systems Engineering and Management

ISEM Seminar Series

“Pure Exploration for Good Arm/Policy Identification”

Li Zitian

PhD student, Department of Industrial Systems Engineering & Management

College of Design and Engineering, NUS

26 May 2026 (Tuesday), 3pm – 4pm
Venue: E1-07-21/22 - ISEM Executive Classroom

ABSTRACT

Traditional pure exploration in Multi-Armed Bandits (MAB) and Reinforcement Learning (RL) has long focused on the "Best-Arm" or "Best-Policy" identification problem. In many real-world scenarios, however, identifying any candidate better than a threshold μ₀ is sufficient. This seminar examines the sample complexity of identifying such a good candidate in two settings: 1-Identification (the multi-armed bandit case) and Good-Policy Identification (the reinforcement learning case).

We first examine the 1-Identification problem in the multi-armed bandit setting. In the fixed-confidence setting, the goal is to determine whether at least one arm has a mean reward exceeding a pre-defined threshold μ₀, or to correctly declare that no such arm exists. We propose the Parallel-Sequential-Exploration-Exploitation-on-Brackets (PSEEB) algorithm and a novel optimization-based lower bound framework. Together, they establish near-optimality: the resulting upper and lower bounds on the sample complexity match up to polylogarithmic factors.

Second, we turn to the Markov Decision Process (MDP) setting and introduce Good-Policy Identification (GPI), which aims to find a policy with an expected reward of at least μ₀. We present the BEE-GPI algorithm together with an upper bound on its sample complexity. We also provide lower bounds indicating that the proposed algorithm is near-optimal from a minimax perspective.

By covering these two topics, this seminar provides a unified perspective on threshold-based exploration, demonstrating how shifting the objective from "best" to "good enough" enables significantly more efficient exploration strategies in both bandits and episodic RL.
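To make the 1-Identification task concrete, here is a minimal sketch of a naive fixed-confidence baseline. It is not PSEEB, whose details are not described in this announcement. The sketch assumes rewards in [0, 1] and arm means strictly different from μ₀. It pulls every undecided arm in round-robin and keeps an anytime Hoeffding confidence interval per arm. It answers "yes" as soon as some arm's lower confidence bound exceeds μ₀, and "no" once every arm's upper confidence bound has fallen below μ₀. The names `one_identification`, `confidence_radius` and `max_pulls` are hypothetical and chosen for illustration.

```python
import math
import random
from typing import Callable, Optional, Tuple


def confidence_radius(n: int, delta: float, num_arms: int) -> float:
    """Anytime Hoeffding radius for rewards in [0, 1] (illustrative assumption).

    A union bound over arms and over sample counts n (sum of 1/n^2 <= pi^2/6)
    keeps the total probability of any interval failing below delta.
    """
    return math.sqrt(math.log(4 * num_arms * n * n / delta) / (2 * n))


def one_identification(
    pull: Callable[[int], float],
    num_arms: int,
    mu0: float,
    delta: float,
    max_pulls: int = 1_000_000,  # hypothetical safety cap, e.g. if some mean equals mu0
) -> Tuple[Optional[int], int]:
    """Naive round-robin 1-Identification baseline (not PSEEB).

    Returns (arm, total_pulls): `arm` is an index certified above mu0,
    or None if every arm was certified below mu0.
    """
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    active = set(range(num_arms))  # arms not yet certified below mu0
    total = 0
    while active and total < max_pulls:
        for i in list(active):  # iterate over a copy so we can discard safely
            reward = pull(i)
            counts[i] += 1
            sums[i] += reward
            total += 1
            mean = sums[i] / counts[i]
            rad = confidence_radius(counts[i], delta, num_arms)
            if mean - rad > mu0:
                return i, total  # "yes": arm i is confidently above mu0
            if mean + rad < mu0:
                active.discard(i)  # arm i is confidently below mu0
    if not active:
        return None, total  # "no": every arm is confidently below mu0
    raise RuntimeError("pull budget exhausted before a confident answer")


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.1, 0.2, 0.9]  # hypothetical Bernoulli arm means
    arm, pulls = one_identification(
        lambda i: float(rng.random() < means[i]), len(means), mu0=0.5, delta=0.01
    )
    print("arm above threshold:", arm, "after", pulls, "pulls")
```

Because this baseline spreads samples uniformly across undecided arms, it is generally not sample-efficient. It only illustrates the two stopping rules, one certifying a good arm and one certifying that no good arm exists, that any fixed-confidence 1-Identification procedure must implement.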

PROFILE OF SPEAKER

Li Zitian is a Ph.D. candidate in the Department of Industrial Systems Engineering and Management at the National University of Singapore, advised by Prof. Cheung Wang Chi. His research focuses on online learning, including bandits, pure exploration, and reinforcement learning. He received his master's degree from NUS and his bachelor's degree from Sun Yat-sen University.