Bibliography

Publications of the year

Articles in International Peer-Reviewed Journals

[1] R. Busa-Fekete, W. Cheng, E. Hüllermeier, B. Szörényi, P. Weng.

Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm, in: Machine Learning, December 2014, vol. 97, no. 3, pp. 327-351. [ DOI : 10.1007/s10994-014-5458-8 ]

https://hal.inria.fr/hal-01079370
[2] C. Dhanjal, R. Gaudel, S. Clémençon.

Efficient Eigen-updating for Spectral Graph Clustering, in: Neurocomputing, May 2014, vol. 131, pp. 440-452. [ DOI : 10.1016/j.neucom.2013.11.015 ]

https://hal.archives-ouvertes.fr/hal-00770889
[3] A. György, G. Neu.

Near-Optimal Rates for Limited-Delay Universal Lossy Source Coding, in: IEEE Transactions on Information Theory, 2014, pp. 2823-2834. [ DOI : 10.1109/TIT.2014.2307062 ]

https://hal.archives-ouvertes.fr/hal-01079327
[4] G. Neu, A. György, C. Szepesvári, A. Antos.

Online Markov Decision Processes Under Bandit Feedback, in: IEEE Transactions on Automatic Control, 2014, vol. 59, pp. 676-691. [ DOI : 10.1109/TAC.2013.2292137 ]

https://hal.archives-ouvertes.fr/hal-01079422
[5] R. Ortner, D. Ryabko, P. Auer, R. Munos.

Regret bounds for restless Markov bandits, in: Journal of Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]

https://hal.inria.fr/hal-01074077
[6] D. Ryabko.

Uniform hypothesis testing for finite-valued stationary processes, in: Statistics, 2014, vol. 48, no. 1, pp. 121-128. [ DOI : 10.1080/02331888.2012.719511 ]

https://hal.inria.fr/inria-00610009
[7] B. Scherrer, M. Ghavamzadeh, V. Gabillon, B. Lesner, M. Geist.

Approximate Modified Policy Iteration and its Application to the Game of Tetris, in: Journal of Machine Learning Research, 2015, 47 p, forthcoming.

https://hal.inria.fr/hal-01091341

International Conferences with Proceedings

[8] R. Busa-Fekete, E. Hüllermeier, B. Szörényi.

Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows, in: Proceedings of The 31st International Conference on Machine Learning, Beijing, China, JMLR Workshop and Conference Proceedings, June 2014, vol. 32.

https://hal.inria.fr/hal-01079369
[9] D. Calandriello, A. Lazaric, M. Restelli.

Sparse Multi-task Reinforcement Learning, in: NIPS - Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

https://hal.inria.fr/hal-01073513
[10] A. Carpentier, M. Valko.

Extreme bandits, in: Advances in Neural Information Processing Systems 27, Montréal, Canada, December 2014.

https://hal.inria.fr/hal-01079354
[11] P. Chainais, P. Pfennig, A. Leray.

Quantitative control of the error bounds of a fast super-resolution technique for microscopy and astronomy, in: Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014, pp. 2853-2857. [ DOI : 10.1109/ICASSP.2014.6854121 ]

https://hal.archives-ouvertes.fr/hal-01081402
[12] P. Chainais, C. Richard.

A diffusion strategy for distributed dictionary learning, in: 2nd "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14), Namur, Belgium, Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14), Laurent Jacques, August 2014.

https://hal.archives-ouvertes.fr/hal-01104781
[13] E. Daucé, E. Thomas.

Evidence build-up facilitates on-line adaptivity in dynamic environments: example of the BCI P300-speller, in: 22nd European Symposium on Artificial Neural Networks, Bruges, Belgium, April 2014.

https://hal.inria.fr/hal-01104024
[14] C. Dhanjal, R. Gaudel, S. Clémençon.

Online Matrix Completion Through Nuclear Norm Regularisation, in: SDM - SIAM International Conference on Data Mining, Philadelphia, United States, April 2014. [ DOI : 10.1137/1.9781611973440.72 ]

https://hal.inria.fr/hal-00926605
[15] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.

Online Stochastic Optimization under Correlated Bandit Feedback, in: 31st International Conference on Machine Learning, Beijing, China, June 2014.

https://hal.inria.fr/hal-01080138
[16] S. Iván, Á. D. Lelkes, J. Nagy-György, B. Szörényi, G. Turán.

Biclique Coverings, Rectifier Networks and the Cost of ε-Removal, in: 16th International Workshop on Descriptional Complexity of Formal Systems, Proceedings, Turku, Finland, August 2014, pp. 174-185. [ DOI : 10.1007/978-3-319-09704-6_16 ]

https://hal.inria.fr/hal-01079368
[17] A. Khaleghi, D. Ryabko.

Asymptotically consistent estimation of the number of change points in highly dependent time series, in: International Conference on Machine Learning (ICML), Beijing, China, June 2014, pp. 539-547.

https://hal.inria.fr/hal-01026583
[18] T. Kocák, G. Neu, M. Valko, R. Munos.

Efficient learning by implicit exploration in bandit problems with side observations, in: Advances in Neural Information Processing Systems 27, Montréal, Canada, December 2014.

https://hal.inria.fr/hal-01079351
[19] T. Kocák, M. Valko, R. Munos, S. Agrawal.

Spectral Thompson Sampling, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, Canada, July 2014.

https://hal.inria.fr/hal-00981575
[20] T. Kocák, M. Valko, R. Munos, B. Kveton, S. Agrawal.

Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems, in: AAAI Workshop on Sequential Decision-Making with Big Data, Québec City, Canada, July 2014.

https://hal.inria.fr/hal-01045036
[21] G. Neu, M. Valko.

Online combinatorial optimization with stochastic decision sets and adversarial losses, in: Advances in Neural Information Processing Systems 27, Montréal, Canada, December 2014.

https://hal.inria.fr/hal-01079355
[22] O. Nicol, J. Mary, P. Preux.

Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques, in: International Conference on Machine Learning, Beijing, China, E. Xing, T. Jebara (editors), Journal of Machine Learning Research, Workshop and Conference Proceedings; Proceedings of The 31st International Conference on Machine Learning, June 2014, vol. 32.

https://hal.inria.fr/hal-00990840
[23] R. Ortner, O.-A. Maillard, D. Ryabko.

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, in: International Conference on Algorithmic Learning Theory (ALT), Bled, Slovenia, LNCS, Springer, October 2014, vol. 8776, pp. 140-154.

https://hal.inria.fr/hal-01057562
[24] O. Pietquin, H. Glaude, C. Enderli.

Subspace Identification for Predictive State Representation by Nuclear Norm Minimization, in: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2014), Orlando, United States, December 2014.

https://hal.inria.fr/hal-01104423
[25] B. Piot, M. Geist, O. Pietquin.

Difference of Convex Functions Programming for Reinforcement Learning, in: Advances in Neural Information Processing Systems (NIPS 2014), Montreal, Canada, December 2014.

https://hal.inria.fr/hal-01104419
[26] B. Piot, O. Pietquin, M. Geist.

Predicting when to laugh with structured classification, in: InterSpeech 2014, Singapore, September 2014, pp. 1786-1790.

https://hal-supelec.archives-ouvertes.fr/hal-01104739
[27] P. Preux, R. Munos, M. Valko.

Bandits attack function optimization, in: IEEE Congress on Evolutionary Computation, Beijing, China, July 2014.

https://hal.inria.fr/hal-00978637
[28] A. Sani, G. Neu, A. Lazaric.

Exploiting easy data in online optimization, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

https://hal.archives-ouvertes.fr/hal-01079428
[29] M. Soare, A. Lazaric, R. Munos.

Best-Arm Identification in Linear Bandits, in: NIPS - Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

https://hal.inria.fr/hal-01075701
[30] B. Szörényi, G. Kedenburg, R. Munos.

Optimistic planning in Markov decision processes using a generative model, in: Advances in Neural Information Processing Systems 27, Montréal, Canada, December 2014.

https://hal.inria.fr/hal-01079366
[31] E. Thomas, E. Daucé, D. Devlaminck, L. Mahé, A. Carpentier, R. Munos, M. Perrin, E. Maby, J. Mattout, T. Papadopoulo, M. Clerc.

CoAdapt P300 speller: optimized flashing sequences and online learning, in: 6th International Brain Computer Interface Conference, Graz, Austria, September 2014.

https://hal.inria.fr/hal-01103441
[32] M. Valko, R. Munos, B. Kveton, T. Kocák.

Spectral Bandits for Smooth Graph Functions, in: 31st International Conference on Machine Learning, Beijing, China, May 2014.

https://hal.inria.fr/hal-00986818

National Conferences with Proceedings

[33] M. Pachebat, N. Totaro, P. Chainais, O. Collery.

Synthèse en espace et temps du rayonnement acoustique d'une paroi sous excitation turbulente par synthèse spectrale 2D+T et formulation vibro-acoustique directe, in: Congrès Français d'acoustique 2014, Poitiers, France, April 2014, pp. 1915-1921, 6 Pages, 20 Refs, papier N183.

https://hal.archives-ouvertes.fr/hal-01058151

Conferences without Proceedings

[34] B. Piot, M. Geist, O. Pietquin.

Méthode de minimisation du résidu de Bellman boostée qui tient compte des démonstrations expertes, in: JFPDA - 9èmes Journées Francophones de Planification, Décision et Apprentissage, Liège, Belgium, May 2014.

https://hal-supelec.archives-ouvertes.fr/hal-01104789

Internal Reports

[35] J. Mary, R. Gaudel, P. Preux.

Bandits Warm-up Cold Recommender Systems, Inria Lille, July 2014, no. RR-8563, 18 p.

https://hal.inria.fr/hal-01022628
[36] R. Munos.

From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, 130 pages.

https://hal.archives-ouvertes.fr/hal-00747575
[37] V. Musco, M. Monperrus, P. Preux.

A Generative Model of Software Dependency Graphs to Better Understand Software Evolution, Inria, 2014.

https://hal.archives-ouvertes.fr/hal-01078716

Other Publications

[38] P. Chainais, A. Leray.

Statistical performance analysis of a fast super-resolution technique using noisy translations, November 2014, 15 pages, forthcoming.

https://hal.archives-ouvertes.fr/hal-01104759
[39] F. Guillou, R. Gaudel, J. Mary, P. Preux.

User Engagement as Evaluation: a Ranking or a Regression Problem?, October 2014. [ DOI : 10.1145/2668067.2668073 ]

https://hal.inria.fr/hal-01077986

References in notes

[40] P. Auer, N. Cesa-Bianchi, P. Fischer.

Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no. 2/3, pp. 235–256.
[41] R. Bellman.

Dynamic Programming, Princeton University Press, 1957.
[42] D. Bertsekas, S. Shreve.

Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[43] D. Bertsekas, J. Tsitsiklis.

Neuro-Dynamic Programming, Athena Scientific, 1996.
[44] T. Ferguson.

A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no. 2, pp. 209–230.
[45] T. Hastie, R. Tibshirani, J. Friedman.

The elements of statistical learning — Data Mining, Inference, and Prediction, Springer, 2001.
[46] P. Nguyen, O.-A. Maillard, D. Ryabko, R. Ortner.

Competing with an Infinite Set of Models in Reinforcement Learning, in: AISTATS, Arizona, United States, JMLR W&CP, 2013, vol. 31, pp. 463-471.

https://hal.inria.fr/hal-00823230
[47] W. Powell.

Approximate Dynamic Programming, Wiley, 2007.
[48] M. Puterman.

Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[49] H. Robbins.

Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
[50] J. Rust.

How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no. 4, pp. 781–831.

http://econpapers.repec.org/paper/wpawuwppe/9406005.htm
[51] J. Rust.

On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no. 2, pp. 195–208.
[52] R. Sutton, A. Barto.

Reinforcement learning: an introduction, MIT Press, 1998.
[53] G. Tesauro.

Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no. 3.
[54] P. Werbos.

ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.
