Bibliography
Major publications by the team in recent years
-
1O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.
https://hal.archives-ouvertes.fr/hal-00738209 -
2A. Carpentier, M. Valko.
Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.
https://hal.inria.fr/hal-01304020 -
3H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017.
https://hal.inria.fr/hal-01648683 -
4N. Gatti, A. Lazaric, M. Rocco, F. Trovò.
Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.
https://hal.inria.fr/hal-01237670 -
5M. Ghavamzadeh, Y. Engel, M. Valko.
Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no 66, pp. 1-53.
https://hal.inria.fr/hal-00776608 -
6H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.
Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.
https://hal.archives-ouvertes.fr/hal-01221329 -
7E. Kaufmann, O. Cappé, A. Garivier.
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.
https://hal.archives-ouvertes.fr/hal-01024894 -
8A. Lazaric, M. Ghavamzadeh, R. Munos.
Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1 - 30.
https://hal.inria.fr/hal-01401513 -
9R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends® in Machine Learning, 2014, vol. 7, no 1, pp. 1-129.
http://dx.doi.org/10.1561/2200000038 -
10R. Ortner, D. Ryabko, P. Auer, R. Munos.
Regret bounds for restless Markov bandits, in: Journal of Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]
https://hal.inria.fr/hal-01074077
Doctoral Dissertations and Habilitation Theses
-
11M. Abeille.
Exploration-Exploitation with Thompson Sampling in Linear Systems, Université de Lille, December 2017. -
12D. Calandriello.
Efficient Sequential Learning in Structured and Constrained Environments, Université de Lille, December 2017. -
13P. Gajane.
Multi-armed bandits with unconventional feedback, Université de Lille, November 2017. -
14J. Pérolat.
Reinforcement learning: the multiplayer case, Université de Lille, December 2017.
Articles in International Peer-Reviewed Journals
-
15B. Danglot, P. Preux, B. Baudry, M. Monperrus.
Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation, in: Empirical Software Engineering, 2017, https://arxiv.org/abs/1611.09187. [ DOI : 10.1007/s10664-017-9571-8 ]
https://hal.archives-ouvertes.fr/hal-01378523 -
16C. Dimitrakakis, B. Nelson, Z. Zhang, A. Mitrokotsa, B. I. P. Rubinstein.
Differential Privacy for Bayesian Inference through Posterior Sampling, in: Journal of Machine Learning Research, April 2017, vol. 18, no 11, pp. 1-39.
https://hal.inria.fr/hal-01500302 -
17E. Kaufmann, T. Bonald, M. Lelarge.
A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks, in: Journal of Theoretical Computer Science (TCS), 2017, https://arxiv.org/abs/1506.04158, forthcoming.
https://hal.archives-ouvertes.fr/hal-01163147 -
18E. Kaufmann, A. Garivier.
Learning the distribution with largest mean: two bandit frameworks, in: ESAIM: Proceedings and Surveys, 2017, vol. 2017, pp. 1 - 10, https://arxiv.org/abs/1702.00001, forthcoming.
https://hal.archives-ouvertes.fr/hal-01449822 -
19E. Kaufmann.
On Bayesian index policies for sequential resource allocation, in: Annals of Statistics, 2017, https://arxiv.org/abs/1601.01190, forthcoming.
https://hal.archives-ouvertes.fr/hal-01251606 -
20V. Musco, M. Monperrus, P. Preux.
A Large-scale Study of Call Graph-based Impact Prediction using Mutation Testing, in: Software Quality Journal, September 2017, vol. 25, no 3, pp. 921–950. [ DOI : 10.1007/s11219-016-9332-8 ]
https://hal.inria.fr/hal-01346046
International Conferences with Proceedings
-
21M. Abeille, A. Lazaric.
Linear Thompson Sampling Revisited, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
https://hal.inria.fr/hal-01493561 -
22M. Abeille, A. Lazaric.
Thompson Sampling for Linear-Quadratic Control Problems, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
https://hal.inria.fr/hal-01493564 -
23B. Balle, O.-A. Maillard.
Spectral Learning from a Single Trajectory under Finite-State Policies, in: International Conference on Machine Learning, Sydney, Australia, Proceedings of the International Conference on Machine Learning, July 2017.
https://hal.archives-ouvertes.fr/hal-01590940 -
24S. Brodeur, E. Perez, A. Anand, F. Golemo, L. Celotti, F. Strub, J. Rouat, H. Larochelle, A. Courville.
HoME: a Household Multimodal Environment, in: NIPS 2017's Visually-Grounded Interaction and Language Workshop, Long Beach, United States, December 2017, https://arxiv.org/abs/1711.11017.
https://hal.inria.fr/hal-01653037 -
25A. Bérard, O. Pietquin, L. Besacier.
LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task, in: Second Conference on Machine Translation (WMT17) during EMNLP 2017, Copenhagen, Denmark, September 2017.
https://hal.archives-ouvertes.fr/hal-01580881 -
26D. Calandriello, A. Lazaric, M. Valko.
Distributed adaptive sampling for kernel matrix approximation, in: International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, 2017.
https://hal.inria.fr/hal-01482760 -
27D. Calandriello, A. Lazaric, M. Valko.
Efficient second-order online kernel learning with adaptive embedding, in: NIPS 2017 : The Thirty-first Annual Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-17.
https://hal.inria.fr/hal-01643961 -
28D. Calandriello, A. Lazaric, M. Valko.
Second-Order Kernel Online Convex Optimization with Adaptive Sketching, in: International Conference on Machine Learning, Sydney, Australia, 2017.
https://hal.inria.fr/hal-01537799 -
29H. De Vries, F. Strub, S. Chandar, O. Pietquin, H. Larochelle, A. Courville.
GuessWhat?! Visual object discovery through multi-modal dialogue, in: Conference on Computer Vision and Pattern Recognition, Honolulu, United States, July 2017, https://arxiv.org/abs/1611.08481.
https://hal.inria.fr/hal-01549641 -
30H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
Modulating early visual processing by language, in: NIPS 2017 - Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-14, https://arxiv.org/abs/1707.00683.
https://hal.inria.fr/hal-01648683 -
31A. Erraqabi, A. Lazaric, M. Valko, E. Brunskill, Y.-E. Liu.
Trading off rewards and errors in multi-armed bandits, in: International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, 2017.
https://hal.inria.fr/hal-01482765 -
32C. Z. Felício, K. V. R. Paixão, C. A. Z. Barcelos, P. Preux.
A Multi-Armed Bandit Model Selection for Cold-Start User Recommendation, in: 25th ACM Conference on User Modelling, Adaptation and Personalization (UMAP), Bratislava, Slovakia, July 2017.
https://hal.inria.fr/hal-01517967 -
33R. Fruit, A. Lazaric.
Exploration–Exploitation in MDPs with Options, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
https://hal.inria.fr/hal-01493567 -
34R. Fruit, M. Pirotta, A. Lazaric, E. Brunskill.
Regret Minimization in MDPs with Options without Prior Knowledge, in: NIPS 2017 - Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-36.
https://hal.inria.fr/hal-01649082 -
35G. Gautier, R. Bardenet, M. Valko.
Zonotope hit-and-run for efficient sampling from projection DPPs, in: International Conference on Machine Learning, Sydney, Australia, 2017.
https://hal.inria.fr/hal-01526577 -
36M. Geist, B. Piot, O. Pietquin.
Is the Bellman residual a bad proxy?, in: NIPS 2017 - Advances in Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-13.
https://hal.archives-ouvertes.fr/hal-01629739 -
37E. Kaufmann, W. M. Koolen.
Monte-Carlo Tree Search by Best Arm Identification, in: NIPS 2017 - 31st Annual Conference on Neural Information Processing Systems, Long Beach, United States, Advances in Neural Information Processing Systems, December 2017, pp. 1-23, https://arxiv.org/abs/1706.02986.
https://hal.archives-ouvertes.fr/hal-01535907 -
38R. Laroche, M. Barlier.
Transfer Reinforcement Learning with Shared Dynamics, in: AAAI-17 - Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, United States, February 2017, 7 p.
https://hal.archives-ouvertes.fr/hal-01548649 -
39O.-A. Maillard.
Boundary Crossing for General Exponential Families, in: Algorithmic Learning Theory, Kyoto, Japan, Proceedings of Algorithmic Learning Theory, October 2017, vol. 1, pp. 1 - 34.
https://hal.archives-ouvertes.fr/hal-01615427 -
40A. M. Metelli, M. Pirotta, M. Restelli.
Compatible Reward Inverse Reinforcement Learning, in: The Thirty-first Annual Conference on Neural Information Processing Systems - NIPS 2017, Long Beach, United States, December 2017.
https://hal.inria.fr/hal-01653328 -
41J. Mourtada, O.-A. Maillard.
Efficient tracking of a growing number of experts, in: Algorithmic Learning Theory, Kyoto, Japan, Proceedings of Algorithmic Learning Theory, October 2017, vol. 76, pp. 1 - 23.
https://hal.archives-ouvertes.fr/hal-01615424 -
42M. Papini, M. Pirotta, M. Restelli.
Adaptive Batch Size for Safe Policy Gradients, in: The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, United States, December 2017.
https://hal.inria.fr/hal-01653330 -
43G. Papoudakis, P. Preux, M. Monperrus.
A generative model for sparse, evolving digraphs, in: 6th International Conference on Complex Networks and their applications, Lyon, France, November 2017, https://arxiv.org/abs/1710.06298. [ DOI : 10.1007/978-3-319-72150-7_43 ]
https://hal.inria.fr/hal-01617851 -
44E. Perez, H. De Vries, F. Strub, V. Dumoulin, A. Courville.
Learning Visual Reasoning Without Strong Priors, in: ICML 2017's Machine Learning in Speech and Language Processing Workshop, Sydney, Australia, August 2017, https://arxiv.org/abs/1709.07871.
https://hal.inria.fr/hal-01648684 -
45E. Perez, F. Strub, H. De Vries, V. Dumoulin, A. Courville.
FiLM: Visual Reasoning with a General Conditioning Layer, in: AAAI Conference on Artificial Intelligence, New Orleans, United States, February 2018, https://arxiv.org/abs/1707.03017.
https://hal.inria.fr/hal-01648685 -
46J. Pérolat, F. Strub, B. Piot, O. Pietquin.
Learning Nash Equilibrium for General-Sum Markov Games from Batch Data, in: AISTATS 2017 - The 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017, pp. 1-14.
https://hal.inria.fr/hal-01648489 -
47C. Riquelme, M. Ghavamzadeh, A. Lazaric.
Active Learning for Accurate Estimation of Linear Models, in: ICML 2017 - 34th International Conference on Machine Learning, Sydney, Australia, August 2017, 36 p.
https://hal.inria.fr/hal-01538762 -
48D. Ryabko.
Hypotheses testing on infinite random graphs, in: ALT 2017 - 28th International Conference on Algorithmic Learning Theory, Kyoto, Japan, October 2017, pp. 1-12, https://arxiv.org/abs/1708.03131.
https://hal.inria.fr/hal-01627330 -
49D. Ryabko.
Independence clustering (without a matrix), in: NIPS 2017 - Thirty-first Annual Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-14, https://arxiv.org/abs/1703.06700.
https://hal.inria.fr/hal-01627333 -
50D. Ryabko.
Universality of Bayesian mixture predictors, in: ALT 2017 - 28th International Conference on Algorithmic Learning Theory, Kyoto, Japan, October 2017, pp. 1-13, https://arxiv.org/abs/1610.08249.
https://hal.inria.fr/hal-01627332 -
51F. Strub, H. De Vries, J. Mary, B. Piot, A. Courville, O. Pietquin.
End-to-end optimization of goal-driven and visually grounded dialogue systems, in: International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 2017, https://arxiv.org/abs/1703.05423.
https://hal.inria.fr/hal-01549642 -
52S. Tosatto, M. Pirotta, C. D'Eramo, M. Restelli.
Boosted Fitted Q-Iteration, in: 34th International Conference on Machine Learning (ICML), Sydney, Australia, August 2017.
https://hal.inria.fr/hal-01653332 -
53N. Tziortziotis, C. Dimitrakakis.
Bayesian Inference for Least Squares Temporal Difference Regularization, in: ECML 2017 - European Conference on Machine Learning, Skopje, Macedonia, September 2017.
https://hal.inria.fr/hal-01593212 -
54Z. Wen, B. Kveton, M. Valko, S. Vaswani.
Online influence maximization under independent cascade model with semi-bandit feedback, in: NIPS 2017 - Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-24.
https://hal.inria.fr/hal-01643976 -
55M. Zanon Boito, A. Bérard, A. Villavicencio, L. Besacier.
Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models, in: IEEE Automatic Speech Recognition and Understanding (ASRU), Okinawa, Japan, December 2017.
https://hal.archives-ouvertes.fr/hal-01592091
National Conferences with Proceedings
-
56M. Geist, B. Piot, O. Pietquin.
Faut-il minimiser le résidu de Bellman ou maximiser la valeur moyenne ?, in: Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2017), Caen, France, Actes des Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2017), July 2017.
https://hal.archives-ouvertes.fr/hal-01576347
Conferences without Proceedings
-
57R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, J. Palicot.
Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings, in: CROWNCOM 2017 - 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks, Lisbon, Portugal, September 2017.
https://hal.archives-ouvertes.fr/hal-01575419 -
58N. Carrara, R. Laroche, O. Pietquin.
Online learning and transfer for user adaptation in dialogue systems, in: SIGDIAL/SEMDIAL joint special session on negotiation dialog 2017, Saarbrücken, Germany, August 2017.
https://hal.archives-ouvertes.fr/hal-01557775
Other Publications
-
59L. Besson, E. Kaufmann.
Multi-Player Bandits Models Revisited, October 2017, https://arxiv.org/abs/1711.02317 - working paper or preprint.
https://hal.inria.fr/hal-01629733 -
60C. Dimitrakakis, F. Jarboui, D. Parkes, L. Seeman.
Multi-view Sequential Games: The Helper-Agent Problem, February 2017, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01408294 -
61C. Dimitrakakis, Y. Liu, D. Parkes, G. Radanovic.
Subjective Fairness: Fairness is in the eye of the beholder, July 2017, https://arxiv.org/abs/1706.00119 - working paper or preprint.
https://hal.inria.fr/hal-01531849 -
62A. R. Luedtke, E. Kaufmann, A. Chambaz.
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits, October 2017, https://arxiv.org/abs/1606.09388 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01338733 -
63O.-A. Maillard.
Basic Concentration Properties of Real-Valued Distributions, September 2017, Lecture.
https://hal.archives-ouvertes.fr/cel-01632228
-
64R. Allesiardo, R. Féraud, O.-A. Maillard.
The Non-stationary Stochastic Multi-armed Bandit Problem, in: International Journal of Data Science and Analytics, 2017, vol. 3, no 4, pp. 267–283. [ DOI : 10.1007/s41060-017-0050-5 ]
https://hal.archives-ouvertes.fr/hal-01575000 -
65P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256. -
66R. Bellman.
Dynamic Programming, Princeton University Press, 1957. -
67D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978. -
68D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996. -
69M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994. -
70H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535. -
71R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998. -
72P. Werbos.
ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.