Academic Talk by Huiming Zhang (张慧铭): Minimax Rate and Sub-Gaussian Estimation for Multi-armed Bandits in Reinforcement Learning

Published: 2023-10-02

Title: Minimax Rate and Sub-Gaussian Estimation for Multi-armed Bandits in Reinforcement Learning

Speaker: Huiming Zhang (张慧铭), Institute of Artificial Intelligence, Beihang University 15501122255

Time: October 7, 15:00-16:00

Venue: Yanshan Campus, Room 理4-501


Abstract: In machine learning, the minimax rate is a theoretical tool for analyzing how well a learning algorithm performs in the most unfavorable situation. In this talk, we introduce multi-armed bandit problems in reinforcement learning and the minimax rates of their regret bounds. We discuss the regret bounds of two families of algorithms: upper confidence bound (UCB) algorithms and minimax optimal strategy in the stochastic case (MOSS) algorithms. Further, since existing UCB algorithms depend on unknown sub-Gaussian parameters, we propose estimated and bootstrapped UCB algorithms under sub-Gaussian bandit and small-sample assumptions.
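To make the role of the sub-Gaussian parameter concrete, here is a minimal sketch of a textbook-style UCB rule. It is not the estimated or bootstrapped algorithm proposed in the talk. It assumes the sub-Gaussian parameter `sigma` is known, uses one common form of the index (empirical mean plus `sigma * sqrt(4 log t / n)`), and draws Gaussian rewards purely for illustration. The names `ucb_index` and `run_ucb` are hypothetical.

```python
import math
import random


def ucb_index(mean, pulls, t, sigma):
    """Upper confidence index for one arm.

    Assumes rewards are sigma-sub-Gaussian with sigma KNOWN; this is
    exactly the quantity that is unknown in practice.
    """
    return mean + sigma * math.sqrt(4.0 * math.log(t) / pulls)


def run_ucb(arm_means, horizon, sigma=1.0, seed=0):
    """Simulate a UCB rule on Gaussian arms (an illustrative assumption).

    Returns the pull count of each arm and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pull the arm with the largest upper confidence index.
            arm = max(
                range(k),
                key=lambda i: ucb_index(sums[i] / pulls[i], pulls[i], t, sigma),
            )
        reward = rng.gauss(arm_means[arm], sigma)
        pulls[arm] += 1
        sums[arm] += reward
        # Pseudo-regret uses the true means, not the noisy rewards.
        regret += best - arm_means[arm]
    return pulls, regret


if __name__ == "__main__":
    pulls, regret = run_ucb([0.1, 0.9], horizon=2000, sigma=1.0)
    print("pulls per arm:", pulls)
    print("cumulative pseudo-regret:", round(regret, 2))
```

In this sketch the exploration bonus scales linearly with `sigma`. A value that is too small under-explores and a value that is too large wastes pulls on poor arms, which is one reason to estimate the parameter from data when it is unknown.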


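About the speaker: Huiming Zhang (张慧铭) is a tenure-track associate professor at the Institute of Artificial Intelligence, Beihang University. He was a Haojiang Scholar (濠江学者) postdoctoral researcher at the University of Macau (2020-2022) and received his PhD in statistics from Peking University (2016-2020). He completed both his undergraduate studies (entering class of 2009) and his master's studies (entering class of 2013) at Central China Normal University, earning dual bachelor's degrees in mathematics and economics and a master's degree in mathematical statistics. He graduated from the High School Affiliated to Guangxi Normal University. His research interests include non-asymptotic inference, high-dimensional probability and statistics, robust estimation, the theory of machine learning and deep learning, and functional data. He has published 21 SCI-indexed papers, including in JMLR, a top journal in machine learning and artificial intelligence; the top statistics journals JASA and Biometrika; the top actuarial journal IME; and mainstream journals in statistics, mathematics, and physics such as Statistica Sinica, Journal of Complexity, and Physica Scripta. His Google Scholar citations exceed 470, and two of his papers are Web of Science highly cited papers. He currently leads one project funded by the National Natural Science Foundation of China Young Scientists Fund. He is a reviewer for Mathematical Reviews (USA) and a guest editor of the special issue on high-dimensional and non-asymptotic statistics for the SCI journal Mathematics (Q1; CAS Zone 3; IF = 2.592). He has served as a referee for top journals in statistics, probability, artificial intelligence, and machine learning (AOS, AOAP, JASA, JMLR, IEEE T-SP).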