Academic Talk by Huiming Zhang (张慧铭): Minimax Rate and Sub-Gaussian Estimation for Multi-armed Bandits in Reinforcement Learning

Published: 2023-10-02

Title: Minimax Rate and Sub-Gaussian Estimation for Multi-armed Bandits in Reinforcement Learning

Speaker: Huiming Zhang (张慧铭), Institute of Artificial Intelligence, Beihang University 15501122255

Time: October 7, 15:00-16:00

Venue: Yanshan Campus, Room 理4-501

Abstract: In machine learning, the minimax rate is a theoretical concept used to analyze how well a learning algorithm performs in the most unfavorable situation. In this talk, we give an introduction to multi-armed bandit problems in reinforcement learning and the minimax rates of their regret bounds. The regret bounds for two algorithms are discussed: upper confidence bound (UCB) algorithms and minimax optimal strategy in the stochastic case (MOSS) algorithms. Further, since existing UCB algorithms contain unknown sub-Gaussian parameters, we propose estimated and bootstrapped UCB algorithms for sub-Gaussian bandits in the small-sample setting.
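For readers unfamiliar with UCB, the following is a minimal sketch of one common textbook form of the index policy, not the estimated or bootstrapped variants proposed in the talk. It assumes Gaussian rewards and a known sub-Gaussian parameter `sigma`, which is exactly the quantity the abstract notes is unknown in practice. The function name `ucb`, its parameters, and the confidence level delta = t^(-2) are illustrative assumptions.

```python
import math
import random


def ucb(arm_means, horizon, sigma=1.0, seed=0):
    """Run a basic UCB index policy on simulated Gaussian arms.

    Assumption: sigma (the sub-Gaussian parameter) is known in advance.
    Returns the number of pulls per arm and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # total observed reward of each arm
    best = max(arm_means)
    regret = 0.0

    def index(a, t):
        # Empirical mean plus an exploration bonus:
        # sqrt(2 * sigma^2 * log(1/delta) / n) with delta = t^(-2).
        bonus = sigma * math.sqrt(4.0 * math.log(t) / counts[a])
        return sums[a] / counts[a] + bonus

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(range(k), key=lambda a: index(a, t))
        reward = rng.gauss(arm_means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret uses the true means, which only a simulation knows.
        regret += best - arm_means[arm]

    return counts, regret
```

Calling `ucb([0.1, 0.9], horizon=2000)` returns the pull counts and the pseudo-regret of one simulated run. If `sigma` is set smaller than the true noise scale, the bonus shrinks and the policy may explore too little, which is one way to see why the unknown parameter matters.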

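Speaker bio: Huiming Zhang is an associate professor (tenure-track) at the Institute of Artificial Intelligence, Beihang University. Zhang previously worked at the University of Macau as a Haojiang Scholar (濠江学者) postdoctoral researcher (2020-2022) and studied at Peking University (2016-2020), receiving a PhD in statistics. Zhang completed both undergraduate (class of 2009 entry) and master's (class of 2013 entry) studies at Central China Normal University, earning dual bachelor's degrees in mathematics and economics and a master's degree in mathematical statistics, and graduated from the High School Affiliated to Guangxi Normal University. Research interests: non-asymptotic inference, high-dimensional probability and statistics, robust estimation, theory of machine learning and deep learning, functional data, and related topics. Zhang has published 21 SCI-indexed papers (including in JMLR, a top journal in machine learning and artificial intelligence; the top statistics journals JASA and Biometrika; the top actuarial journal IME; and mainstream journals in statistics, mathematics, and physics such as Statistica Sinica, Journal of Complexity, and Physica Scripta; more than 470 Google Scholar citations), two of which are Web of Science Highly Cited Papers. Zhang currently leads one project funded by the National Natural Science Foundation of China Young Scientists Fund, serves as a reviewer for Mathematical Reviews (USA), and is a guest editor of the special issue on high-dimensional and non-asymptotic statistics of the SCI journal Mathematics (Q1, CAS Tier 3, IF=2.592). Zhang has also refereed for top journals in statistics, probability, artificial intelligence, and machine learning (AOS, AOAP, JASA, JMLR, IEEE T-SP).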