Machine Learning with Experts and Bandits
4 credits (20 hours), PhD Program in Computer Science
LECTURER: Nicolò Cesa-Bianchi
The course covers topics at the interface between machine learning and game theory. This area, also known as online learning, is concerned with the design of efficient sequential decision algorithms for big data using local optimization techniques. We will start by introducing the Hedge algorithm for solving a class of decision problems known as prediction with expert advice. Hedge will then be analyzed in the wider setting of sequential decision problems with partial feedback (multiarmed bandits). We will describe two applications of multiarmed bandits: online auctions and recommender systems. In the second part of the course we will show that Hedge is a special case of a larger family of algorithms for online optimization known as Mirror Descent, whose analysis will be carried out using tools from convex analysis. Finally, we will show how certain instances of Mirror Descent can learn data sources that change over time in an arbitrary fashion.
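The Hedge update mentioned above can be sketched in a few lines. This is a minimal illustration, not the course's reference implementation: the function name, the learning rate eta, and the assumption that losses lie in [0, 1] are choices made here for concreteness.

```python
import numpy as np

def hedge(losses, eta):
    """Minimal sketch of Hedge (exponential weights) for prediction
    with expert advice, assuming a T x N matrix of losses in [0, 1].

    Each round the learner plays a probability distribution over the
    N experts, then observes the loss of every expert (full feedback).
    Returns the learner's cumulative expected loss.
    """
    T, N = losses.shape
    weights = np.ones(N)                      # one weight per expert
    total_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()           # play normalized weights
        total_loss += p @ losses[t]           # expected loss this round
        weights *= np.exp(-eta * losses[t])   # exponential update
    return total_loss
```

With losses in [0, 1], the standard analysis gives regret at most (ln N)/eta + eta*T/8 against the best fixed expert, which is of order sqrt(T ln N) for a suitably tuned eta.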
- Introduction and motivation
- Prediction with expert advice: binary prediction (PLG 1.4)
- Halving (PLG 1.4)
- Weighted Majority (PLG 1.4)
- Lower bound for deterministic binary prediction
- Randomized Weighted Majority (PLG 2.2)
- Lower bound for randomized binary prediction (PLG Theorem 3.7)
- From binary prediction to sequential decisions
- The bandit setting
- The importance weighting trick
- Exp3 (notes in prep.)
- Similarities between actions: Exp3-SET run on a feedback graph (GRA Lemma 1)
- Independence number bounds the variance in Exp3-SET (notes in prep.)
- Exp4 (REG 4.2)
- Second-price auctions (ALG Appendix C)
- Exp3-RTB (ALG Appendix C)
- Prediction with a linear optimization oracle
- Follow the perturbed leader
- The online convex optimization setting (OCO Section 2)
- Hedge performs online linear optimization in the simplex (OCO Section 2)
- The linearization trick (OCO Section 2)
- Follow the regularized leader (OCO Section 2)
- Convex duality (OCO Section 2)
- Euclidean and entropic regularizers (OCO Section 2)
- Regret against a sequence of models
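The bandit part of the outline above (the importance weighting trick and Exp3) can be sketched as follows. This is a minimal illustration under assumptions made here for concreteness: losses in [0, 1], a fixed learning rate eta, and hypothetical function and parameter names.

```python
import numpy as np

def exp3(loss_fn, N, T, eta, rng):
    """Minimal sketch of Exp3 for the nonstochastic multiarmed bandit.

    Unlike the full-information setting, only the loss of the pulled
    arm is observed; dividing it by the probability of pulling that
    arm (importance weighting) yields an unbiased estimate of the
    full loss vector. Returns the learner's cumulative loss.
    """
    weights = np.ones(N)
    total_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()
        arm = rng.choice(N, p=p)       # draw an arm from p
        loss = loss_fn(t, arm)         # observe only this arm's loss
        total_loss += loss
        est = np.zeros(N)
        est[arm] = loss / p[arm]       # importance-weighted estimate
        weights *= np.exp(-eta * est)  # Hedge update on the estimates
    return total_loss
```

Running Hedge on these importance-weighted estimates is exactly what makes Exp3 a bandit version of the full-information algorithm; the price of partial feedback shows up as the extra factor of N in the regret bound.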
References
- PLG: N. Cesa-Bianchi and G. Lugosi
Prediction, Learning, and Games
Cambridge University Press, 2006.
- REG: S. Bubeck and N. Cesa-Bianchi
Regret analysis of stochastic and nonstochastic multi-armed bandit problems (online version)
Foundations and Trends in Machine Learning, 5(1):1-122, 2012.
- GRA: N. Alon, N. Cesa-Bianchi, C. Gentile, S. Mannor, Y. Mansour, and O. Shamir
Nonstochastic multi-armed bandits with graph-structured feedback (online version)
SIAM Journal on Computing. To appear.
- ALG: N. Cesa-Bianchi, P. Gaillard, C. Gentile, and S. Gerchinovitz
Algorithmic chaining and the role of partial feedback in online nonparametric learning (online version)
Proceedings of the 30th Conference on Learning Theory (COLT), PMLR 65:465-481, 2017.
- OCO: S. Shalev-Shwartz
Online Learning and Online Convex Optimization (online version)
Foundations and Trends in Machine Learning, 4(2), 2011.
Possible topics for the exam
- The bandit killer: Lower bounds for the bandit setting
- Follow the Lazy Leader and the Shrinking Dartboard: Algorithms for prediction with expert advice that seldom change their minds.
- Experts and game theory: A relationship in (correlated) equilibrium.
- Adagrad and Online Newton Step: A superior breed of algorithms for online convex optimization.
- Adaptive regret, shifting regret, tracking regret: How to compete against a sequence of models.
- Routing packets in the dark: Algorithms for the linear bandit setting.
In order to pass the exam, you must write a research note summarizing one or more papers on a topic agreed upon in advance with the lecturer. Your note must include:
- motivation and explanation of the problem,
- rigorous illustration of the most important results (including the formal definitions necessary to understand the results),
- at least one significant proof carried out in detail,
- description of previous results and follow-up results,
- a final section discussing what you consider to be the strengths and weaknesses of the results' contributions.