Bibliography

Major publications by the team in recent years
  • 1M. Baboulin, D. Becker, J. Dongarra.

    A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
  • 2M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.

    A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
  • 3M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.

    Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2.
  • 4M. Baboulin, S. Gratton.

    A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699.
  • 5M. Bahi, C. Eisenbeis.

    Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
  • 6D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.

    Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]

    http://hal.inria.fr/hal-00926513
  • 7P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.

    The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]

    https://hal.inria.fr/hal-01061305
  • 8P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.

    Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77.
  • 9A. Ferreira Leite.

    A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI ; Universidade de Brasília, December 2014.

    https://hal.inria.fr/tel-01097295
  • 10G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.

    Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327. [ DOI : 10.1007/s10766-010-0161-2 ]
  • 11M. Kruse.

    Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.

    https://hal.inria.fr/tel-01078440
  • 12S. Tomov, J. Dongarra, M. Baboulin.

    Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5&6, pp. 232–240.
Publications of the year

Doctoral Dissertations and Habilitation Theses

International Conferences with Proceedings

  • 14M. Baboulin, J. Dongarra, A. Rémy, S. Tomov, I. Yamazaki.

    Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures, in: 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Lecture Notes in Computer Science, September 2015.

    https://hal.inria.fr/hal-01223022
  • 15M. Baboulin, A. Jamal, M. Sosonkina.

    Using Random Butterfly Transformations in Parallel Schur Complement-Based Preconditioning, in: 8th Workshop on Computer Aspects of Numerical Algorithms (CANA'15), Lodz, Poland, September 2015.

    https://hal.inria.fr/hal-01223090
  • 16M. Baboulin, A. Khabou, A. Rémy.

    A Randomized LU-based Solver Using GPU and Intel Xeon Phi Accelerators, in: HeteroPar'2015, Vienna, Austria, August 2015.

    https://hal.inria.fr/hal-01223018
  • 17L. Bagnères, O. Zinenko, S. Huot, C. Bastoul.

    Opening Polyhedral Compiler's Black Box, in: CGO 2016 - 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization, Barcelona, Spain, March 2016.

    https://hal.inria.fr/hal-01253322
  • 18A. Ferreira Leite, V. C. Alves, G. Nunes Rodrigues, C. Tadonki, C. Eisenbeis, A. C. Magalhaes Alves de Melo.

    Automating Resource Selection and Configuration in Inter-clouds through a Software Product Line Method, in: 8th International Conference on Cloud Computing (CLOUD), 2015 IEEE, New York City, United States, July 2015, pp. 726-733. [ DOI : 10.1109/CLOUD.2015.101 ]

    https://hal-mines-paristech.archives-ouvertes.fr/hal-01252985
  • 19G. W. Howell, M. Baboulin.

    LU Preconditioning for Overdetermined Sparse Least Squares Problems, in: 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Lecture Notes in Computer Science, September 2015.

    https://hal.inria.fr/hal-01223069
  • 20L. Lacassagne, L. Cabaret, D. Etiemble.

    Parallel light speed labeling: the world’s fastest connected component labeling for multicore processors, in: International Conference on Image Processing, Quebec, Canada, IEEE, September 2015, 8 p.

    https://hal.inria.fr/hal-01243310
  • 21I. Masliah, M. Baboulin, J. Falcou.

    Metaprogramming dense linear algebra solvers. Applications to multi and many-core architectures, in: 13th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA 2015), Helsinki, Finland, August 2015.

    https://hal.inria.fr/hal-01221358

Scientific Books (or Scientific Book chapters)

  • 22M. Baboulin, J. Dongarra, R. Lacroix.

    Computing least squares condition numbers on hybrid multicore/GPU systems, in: Interdisciplinary Topics in Applied Mathematics, Modeling and Computational Science, Springer International Publishing, 2015, vol. 117. [ DOI : 10.1007/978-3-319-12307-3_6 ]

    https://hal.inria.fr/hal-01204804

Internal Reports

  • 23M. Baboulin, J. Falcou, I. Masliah.

    Meta-programming and Multi-stage Programming for GPGPUs, Inria Saclay Ile de France ; Paris-Sud XI, September 2015, no RR-8780.

    https://hal.inria.fr/hal-01204661

Other Publications

  • 24M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, S. Tomov.

    Towards a High-Performance Tensor Algebra Package for Accelerators, August 2015, Smoky Mountains Computational Sciences and Engineering Conference (SMC 2015), Poster.

    https://hal.archives-ouvertes.fr/hal-01231234
References in notes
  • 25The HiPEAC vision on high-performance and embedded architecture and compilation (2012-2020), 2012.

    http://www.hipeac.net/roadmap
  • 26European Union Framework Program 6 MILEPOST project No 035307 (MachIne Learning for Embedded PrOgramS opTimization).

    http://cordis.europa.eu/project/rcn/79763_en.html
  • 27PRACE: Partnership for Advanced Computing in Europe.

    http://www.prace-project.eu
  • 28AMD.

    AMD Core Math Library.

    http://developer.amd.com/libraries/acml/
  • 29E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen.

    LAPACK Users' Guide, SIAM, 1999, Third edition.
  • 30K. Aneja, F. Laguzet, L. Lacassagne, A. Merigot.

    Video rate image segmentation by means of region splitting and merging, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009.
  • 31M. Arioli, M. Baboulin, S. Gratton.

    A partial condition number for linear least-squares problems, in: SIAM J. Matrix Anal. and Appl., 2007, vol. 29, no 2, pp. 413–433.
  • 32K. Asanovic.

    The landscape of parallel computing research: a view from Berkeley, Electrical Engineering and Computer Sciences, University of California at Berkeley, December 2006, no UCB/EECS-2006-183.

    http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
  • 33H. Avron, P. Maymounkov, S. Toledo.

    Blendenpik: Supercharging LAPACK’s least-squares solvers, in: SIAM J. Sci. Comput., 2010, vol. 32, pp. 1217–1236.
  • 34M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.

    An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, 2014, vol. 40, no 7, pp. 213–223.
  • 35M. Baboulin, D. Becker, J. Dongarra.

    A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
  • 36M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, S. Tomov.

    Accelerating scientific computations with mixed precision algorithms, in: Computer Physics Communications, 2009, vol. 180, no 12, pp. 2526–2533.
  • 37M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.

    A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
  • 38M. Baboulin, J. Dongarra, J. Demmel, S. Tomov, V. Volkov.

    Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, November 15, 2008.

    http://www.lri.fr/~baboulin/SC08.pdf
  • 39M. Baboulin, J. Dongarra, S. Gratton, J. Langou.

    Computing the conditioning of the components of a linear least squares solution, in: Numerical Linear Algebra with Applications, 2009, vol. 16, no 7, pp. 517–533.
  • 40M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.

    Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2.
  • 41M. Baboulin, J. Dongarra, R. Lacroix.

    Computing least squares condition numbers on hybrid multicore/GPU systems, in: Proceedings of the International Conference of Applied Mathematics, Modeling and Computational Science (AMMCS 2013), 2013.
  • 42M. Baboulin, J. Dongarra, S. Tomov.

    Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, in: 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), Lecture Notes in Computer Science, Springer-Verlag, 2008, vol. 6126-6127.
  • 43M. Baboulin, S. Gratton.

    A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699.
  • 44M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.

    Statistical estimates for the conditioning of linear least squares problems, in: 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2014, vol. 8384, pp. 124-133.
  • 45M. Baboulin, X. S. Li, F.-H. Rouet.

    Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, in: Proceedings of VECPAR 2014, 2014.
  • 46J. C. Baez, M. Stay.

    Algorithmic thermodynamics, in: Mathematical Structures in Computer Science, 2012, vol. 22, no 5, pp. 771–787.

    http://dx.doi.org/10.1017/S0960129511000521
  • 47M. Bahi, C. Eisenbeis.

    Spatial complexity of reversibly computable DAG, in: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, ACM, 2009, pp. 47–56.
  • 48M. Bahi, C. Eisenbeis.

    Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
  • 49D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.

    Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]

    http://hal.inria.fr/hal-00926513
  • 50D. Barthou, G. Grosdidier, C. Eisenbeis, P. Guichon, M. Kruse, O. Pene, K. Petrov, C. Tadonki.

    PetaQCD: En Route for the automatic code generation for lattice QCD, in: Proceedings of the 29th International Symposium on Lattice field theory (Lattice 2011), 2011, vol. 2011.
  • 51P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall.

    Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid, in: High Performance Computing Conference (HiPC), December 2013.
  • 52D. Becker, M. Baboulin, J. Dongarra.

    Reducing the amount of pivoting in symmetric indefinite systems, in: 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2012, vol. 7203, pp. 133–142.
  • 53T. Betcke, N. J. Higham, V. Mehrmann, C. Schröder, F. Tisseur.

    NLEVP: A Collection of Nonlinear Eigenvalue Problems, in: ACM Trans. Math. Software, February 2013, vol. 39, no 2, pp. 7:1-7:28. [ DOI : 10.1145/2427023.2427024 ]
  • 54L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R. Whaley.

    ScaLAPACK Users' Guide, SIAM, 1997, pp. 58–60.
  • 55Blaze.

    The Blaze Library, 2014.

    https://bitbucket.org/blaze-lib/blaze
  • 56G. Bradski.

    The OpenCV Library, in: Dr. Dobb's Journal of Software Tools, 2000.
  • 57L. Cabaret, L. Lacassagne.

    A Review of World’s Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014, pp. 1-8.
  • 58L. Cabaret, L. Lacassagne.

    What is the world's fastest Connected Component Labeling Algorithm?, in: IEEE International Workshop on Signal Processing Systems (SiPS), 2014, pp. 1-6.
  • 59V. G. Cerf.

    Where is the science in computer science?, in: Communications of the ACM, 2012, vol. 55, no 10, p. 5.
  • 60M. O. Cheema, L. Lacassagne, O. Hammami.

    System-Platforms-Based SystemC TLM Design of Image Processing Chains for Embedded Applications, in: EURASIP Journal on Embedded Systems, 2007, pp. 1-14. [ DOI : 10.1155/2007/71043 ]
  • 61P. Courbin, A. Pédron, T. Saidani, L. Lacassagne.

    Parallélisation d'opérateurs de TI: multi-coeurs, Cell ou GPU ?, in: GRETSI, 2009.
  • 62K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, T. L. Veldhuizen.

    Generative Programming and Active Libraries, in: Generic Programming, 1998, pp. 25-39.
  • 63P. I. Davies, N. J. Higham.

    Numerically Stable Generation of Correlation Matrices and their Factors, in: BIT, 2000, vol. 40, no 4, pp. 640-651.
  • 64J. W. Demmel, L. Grigori, M. Hoemmen, J. Langou.

    Communication-optimal parallel and sequential QR and LU factorizations, in: SIAM Journal on Scientific Computing, 2012, vol. 34, no 1, pp. 206–239.
  • 65J. W. Demmel, A. McKenney.

    A Test Matrix Generation Suite, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA, March 1989, no MCS-P69-0389, 16 p, LAPACK Working Note 9.
  • 66J. Dongarra et al.

    The International Exascale Software Project roadmap, in: Int. J. High Perform. Comput. Appl., February 2011, vol. 25, no 1, pp. 3–60.

    http://dx.doi.org/10.1177/1094342010391989
  • 67A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.

    A smart sensor based vision system: implementation and evaluation, in: Journal of Physics D: Applied Physics, 2006, vol. 39, pp. 1694-1705. [ DOI : 10.1088/0022-3727/39/8/033 ]
  • 68A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.

    A Smart Architecture for Low-Level Image Computing, in: International Journal of Computer Sciences and Application, 2008, vol. 5, no 3, pp. 1-19.
  • 69P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.

    The numerical template toolbox: A modern C++ design for scientific computing, in: Journal of Parallel and Distributed Computing, 2014.
  • 70P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.

    Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77.
  • 71P. Estérie, M. Gaunard, J. Falcou.

    A proposal to add single instruction multiple data computation to the standard library, in: N3561, 2013.
  • 72D. Etiemble, S. Piskorski, L. Lacassagne.

    Performance evaluation of Altera C2H compiler on image processing benchmarks, in: TCHA: Workshop on Tools And Compiler for Hardware Acceleration, 2006.
  • 73J. Falcou, L. Lacassagne, S. Schaetz.

    Cell MPI: Mastering the Cell Broadband Engine architecture through a Boost based parallel communication library, in: Boost Conference, 2011.
  • 74J. Falcou, T. Saidani, L. Lacassagne, D. Etiemble.

    Programmation par squelettes algorithmiques pour le processeur Cell, in: SYMPA, 2008.
  • 75J. Falcou, J. Sérot, L. Pech, J.-T. Lapresté.

    Meta-programming applied to automatic SMP parallelization of linear algebra code, in: Euro-Par 2008–Parallel Processing, Springer Berlin Heidelberg, 2008, pp. 729–738.
  • 76G. Fursin, C. Dubach.

    Experience report: community-driven reviewing and validation of publications, in: Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), ACM, 2014.

    http://dx.doi.org/10.1145/2618137.2618142
  • 77G. Fursin.

    Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, in: Proceedings of the GCC Developers' Summit, June 2009.
  • 78G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.

    Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327. [ DOI : 10.1007/s10766-010-0161-2 ]
  • 79G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. Del Vento.

    Collective Mind: towards practical and collaborative auto-tuning, in: Special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, 2014.
  • 80M. Gouiffès, F. Laguzet, L. Lacassagne.

    Color Connectedness Degree For Mean-Shift Tracking, in: IEEE International Conference on Pattern Recognition (ICPR), 2010.
  • 81M. Gouiffès, F. Laguzet, L. Lacassagne.

    Projection Histogram For Mean-Shift Tracking, in: IEEE International Conference on Image Processing (ICIP), 2010.
  • 82C. Grana, D. Borghesani, R. Cucchiara.

    Connected Component Labeling Techniques on Modern Architectures, in: ICIAP, IEEE, 2009, pp. 816-824.
  • 83L. Grigori, J. Demmel, H. Xiang.

    CALU: a communication optimal LU factorization algorithm, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, pp. 1317-1350.
  • 84M. Gu, S. C. Eisenstat.

    Efficient Algorithms for Computing a Strong Rank-revealing QR Factorization, in: SIAM Journal on Scientific Computing, July 1996, vol. 17, no 4, pp. 848–869.

    http://dx.doi.org/10.1137/0917055
  • 85S. Guelton, J. Falcou, P. Brunet.

    Exploring the vectorization of python constructs using pythran and boost SIMD, in: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, ACM, 2014, pp. 79–86.
  • 86G. Guennebaud, B. Jacob.

    Eigen v3, 2010.

    http://eigen.tuxfamily.org
  • 87N. Halko, P. G. Martinsson, J. A. Tropp.

    Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, in: SIAM Review, 2011, vol. 53, pp. 217–288.
  • 88C. Harris, M. Stephens.

    A combined corner and edge detector, in: 4th ALVEY Vision Conference, Editions Hermes, Paris, 1988.
  • 89L. He, Y. Chao, K. Suzuki.

    A run-based two-scan labeling algorithm, in: ICIAR, LNCS 4633, 2007, pp. 131-142.
  • 90R. M. Heiberger.

    Algorithm AS 127: Generation of Random Orthogonal Matrices, in: J. Roy. Statist. Soc. Ser. C (Applied Statistics), 1978, vol. 27, no 2, pp. 199-206.
  • 91N. J. Higham.

    J-Orthogonal Matrices: Properties and Generation, in: SIAM Rev., September 2003, vol. 45, no 3, pp. 504-519. [ DOI : 10.1137/S0036144502414930 ]
  • 92G. E. Hinton, S. Osindero, Y.-W. Teh.

    A fast learning algorithm for deep belief nets, in: Neural Computation, 2006, vol. 18.
  • 93S. Horowitz, T. Pavlidis.

    Picture segmentation by a tree traversal algorithm, in: Journal of the ACM, 1976, vol. 23, pp. 368-388.
  • 94T. Ikegami, T. Sakurai, U. Nagashima.

    A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, in: Journal of Computational and Applied Mathematics, 2010, vol. 233, no 8, pp. 1927–1936.
  • 95Intel.

    Math Kernel Library.

    http://developer.intel.com/software/products/mkl/
  • 96V. Jimenez, I. Gelado, L. Vilanova, M. Gil, G. Fursin, N. Navarro.

    Predictive runtime code scheduling for heterogeneous architectures, in: Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), January 2009.
  • 97C. S. Kenney, A. J. Laub.

    Small-sample statistical condition estimates for general matrix functions, in: SIAM J. Sci. Comput., 1994, vol. 15, pp. 36–61.
  • 98A. Khabou, J. Demmel, L. Grigori, M. Gu.

    LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version, in: SIAM Journal on Matrix Analysis and Applications, 2013, vol. 34, no 3, pp. 1401-1429.

    http://epubs.siam.org/doi/abs/10.1137/120863691
  • 99M. Kruse.

    Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Université Paris-Sud 11, September 26th, 2014.
  • 100T. Kunlin, L. Lacassagne, A. Mérigot.

    A Fast image segmentation scheme, in: International Conference on Information and Communication Technologies, IEEE, 2004.
  • 101L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.

    High Level Transforms for SIMD and low-level computer vision algorithms, in: ACM Workshop on Programming Models for SIMD/Vector Processing (PPoPP), 2014, pp. 49-56.
  • 102L. Lacassagne, D. Etiemble, S. Kablia.

    16-bit Floating Point Instructions for embedded Multimedia Applications, in: CAMP: Computer Architecture and Machine Perception, IEEE, 2005.
  • 103L. Lacassagne, D. Etiemble.

    16-bit floating point operations for low-end and high-end embedded processors, in: ODES: Optimizations for DSP and Embedded Systems, IEEE/ACM, 2005.
  • 104L. Lacassagne, A. Manzanera, J. Denoulet, A. Mérigot.

    High Performance Motion Detection: Some trends toward new embedded architectures for vision systems, in: Journal of Real Time Image Processing, October 2008, pp. 127-148. [ DOI : 10.1007/s11554-008-0096-7 ]
  • 105L. Lacassagne, A. B. Zavidovique.

    Light Speed Labeling for RISC architectures, in: IEEE International Conference on Image Processing (ICIP), 2009.
  • 106L. Lacassagne, B. Zavidovique.

    Light Speed Labeling: efficient connected component labeling on RISC architectures, in: Journal of Real-Time Image Processing, 2011, vol. 6, no 2, pp. 117-135.
  • 107F. Laguzet, M. Gouiffès, L. Lacassagne.

    Automatic color space switching for robust tracking, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
  • 108F. Laguzet, A. Romero, M. Gouiffès, L. Lacassagne, D. Etiemble.

    Color tracking with contextual switching: Real-time implementation on CPU, in: Journal of Real-Time Image Processing, 2013, pp. 1-18.
  • 109J. Lambert, L. Lacassagne, G. Rougeron, S. Le Berre, S. Chatillon.

    High Performance simulation of ultrasonic fields for Non Destructive Testing, in: International Symposium in Nuclear Application and Monte-Carlo, 2013.
  • 110J. Lambert, A. Pédron, G. Gens, F. Bimbard, L. Lacassagne, E. Iakovleva, S. Le Berre.

    Analysis of multicore CPU and GPU toward parallelization of Total Focusing Method ultrasound reconstruction, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012.
  • 111J. Lambert, G. Rougeron, L. Lacassagne, S. Chatillon.

    A fast ultrasonic simulation tool based on massively parallel implementations, in: Review of Progress of Quantitative Nondestructive Evaluation, 2013.
  • 112Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng.

    Building high-level features using large scale unsupervised learning, in: International Conference on Machine Learning, 2012.
  • 113W. Ledermann, C. Alexander, D. Ledermann.

    Random Orthogonal Matrix Simulation, in: Linear Algebra Appl., 2011, vol. 434, no 6, pp. 1444-1467. [ DOI : 10.1016/j.laa.2010.10.023 ]
  • 114A. Leite, C. Tadonki, C. Eisenbeis, A. de Melo.

    A Fine-grained Approach for Power Consumption Analysis and Prediction, in: Procedia Computer Science, 2014, vol. 29, pp. 2260–2271.
  • 115S. Liu, C. Eisenbeis, J.-L. Gaudiot.

    A theoretical framework for value prediction in parallel systems, in: Parallel Processing (ICPP), 2010 39th International Conference on, IEEE, 2010, pp. 11–20.
  • 116M. W. Mahoney.

    Randomized algorithms for matrices and data, in: Foundations and Trends in Machine Learning, 2011, vol. 3, no 2, pp. 123–224.
  • 117D. Menard, R. Serizel, R. Rocher, O. Sentieys.

    Accuracy Constraint Determination in Fixed-Point System Design, in: Journal on Embedded Systems (JES), 2008, vol. 2008, pp. 1-12. [ DOI : 10.1155/2008/242584 ]
  • 118P. Monasse, F. Guichard.

    Fast computation of contrast-invariant image representation, in: Transaction on, 2000, vol. 9, no 5, pp. 860-872.
  • 119S. Moufawad.

    Demmel type communication-avoiding generalized minimal residual method (CA-GMRES) on multicore hardwares: an application in QCD, American University of Beirut, Beirut, Lebanon, June 2011, defended on June 10th, 2010.
  • 120M. Odersky.

    An Overview of the SCALA Programming Language, EPFL Lausanne, Switzerland, 2004, no IC/2004/64.
  • 121D. S. Parker.

    Random Butterfly Transformations with Applications in Computational Linear Algebra, Computer Science Department, UCLA, 1995, no CSD-950023.
  • 122M. Pharr, W. R. Mark.

    ISPC: A SPMD Compiler for High-Performance CPU Programming, in: Innovative Parallel Computing (InPar), 2012.
  • 123S. Piskorski, L. Lacassagne, D. Etiemble.

    IPLG: un outil pour la fusion d'opérateurs en Traitement d'Images, in: SYMPA, 2009.
  • 124S. Piskorski, L. Lacassagne, M. Kieffer, D. Etiemble.

    Efficient floating point interval processing for embedded systems and applications, in: SCAN - International Symposium of Scientific computing, Computer Arithmetic and Validated Numerics, 2006.
  • 125S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. A. Silber, N. Vasilache.

    GRAPHITE: Loop optimizations based on the polyhedral model for GCC, in: Proc. of the 4th GCC Developers' Summit, June 2006, pp. 179–198.
  • 126A. Pédron, L. Lacassagne, V. Barbillon, F. Bimbard, G. Rougeron, S. Le Berre.

    Performance analysis of an ultrasound reconstruction algorithm for non destructive testing, in: IEEE International Conference on Parallel Computing (ParCo), 2011.
  • 127A. Pédron, L. Lacassagne, F. Bimbard, S. Le Berre.

    Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2011.
  • 128A. Romero, M. Gouiffès, L. Lacassagne.

    Feature Points tracking adaptative to Saturation, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
  • 129A. Romero, M. Gouiffès, L. Lacassagne.

    Covariance Descriptor Multiple Object Tracking and Re-Identification with Colorspace Evaluation, in: IEEE ACCV - Workshop on Detection and Tracking in Challenging Environments, 2012.
  • 130A. Romero, M. Gouiffès, L. Lacassagne.

    Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, in: ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013.
  • 131A. Romero, L. Lacassagne, M. Gouiffès.

    Real-time covariance tracking algorithm for embedded systems, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2013.
  • 132A. Rosenfeld, J. L. Pfaltz.

    Sequential operations in digital picture processing, in: Journal of the ACM, 1966, vol. 13, no 4, pp. 471-494.
  • 133A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.

    Locality optimization on a NUMA architecture for hybrid LU factorization, in: International Conference on Parallel Computing (PARCO 2013), Advances in Parallel Computing, IOS Press, 2014, vol. 25, pp. 153-162.
  • 134T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, D. Etiemble.

    Algorithmic Skeletons within an Embedded Domain Specific Language for the Cell Processor, in: Parallel Architectures and Compilation Techniques, PACT, 2009, pp. 67-76.
  • 135T. Saidani, L. Lacassagne, S. Bouaziz, T. M. Khan.

    Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor, in: Lecture Notes in Computer Science, Springer, 2007, pp. 104-112. [ DOI : 10.1007/978-3-540-74742-0 ]
  • 136T. Saidani, S. Piskorski, L. Lacassagne, S. Bouaziz.

    Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study of Image Processing Algorithm, in: PACT-MEDEA, 2007, pp. 15-19.
  • 137C. Sanderson.

    Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments, in: Report Version, 2010, vol. 2.
  • 138J. Siek, L.-Q. Lee, A. Lumsdaine.

    Boost Graph Library, June 2000.

    http://www.boost.org/libs/graph/
  • 139D. Spinellis.

    Notable design patterns for domain-specific languages, in: Journal of Systems and Software, 2001, vol. 56, no 1, pp. 91 - 99. [ DOI : 10.1016/S0164-1212(00)00089-3 ]

    http://www.sciencedirect.com/science/article/pii/S0164121200000893
  • 140G. W. Stewart.

    The Efficient Generation of Random Orthogonal Matrices With an Application to Condition Estimators, in: SIAM J. Numer. Anal., 1980, vol. 17, no 3, pp. 403-409.
  • 141A. K. Sujeeth, A. Gibbons, K. J. Brown, H. Lee, T. Rompf, M. Odersky, K. Olukotun.

    Forge: Generating a High Performance DSL Implementation from a Declarative Specification, in: 12th International Conference on Generative Programming: Concepts and Experiences, 2013.
  • 142A. K. Sujeeth, T. Rompf, K. J. Brown, H. Lee, H. Chafi, V. Popic, M. Wu, A. Prokopec, V. Jovanovic, M. Odersky, K. Olukotun.

    Composition and Reuse with Compiled Domain-Specific Languages, in: ECOOP'13: European Conference on Object-Oriented Programming, 2013.
  • 143V. Sundriyal, M. Sosonkina, A. Gaenko, Z. Zhang.

    Energy saving strategies for parallel applications with point-to-point communication phases, in: Journal of Parallel and Distributed Computing, 2013. [ DOI : 10.1016/j.jpdc.2013.03.006 ]
  • 144V. Sundriyal, M. Sosonkina, Z. Zhang.

    Automatic runtime frequency-scaling system for energy savings in parallel applications, in: The Journal of Supercomputing, 2014, vol. 68, no 2, pp. 777–797.
  • 145K. Suzuki, I. Horiba, N. Sugie.

    Linear-time connected component labeling based on sequential local operations, in: Computer Vision and Image Understanding, January 2003, vol. 89, no 1, pp. 1-23. [ DOI : 10.1016/S1077-3142(02)00030-9 ]
  • 146H. Tabia, M. Gouiffès, L. Lacassagne.

    Motion histogram quantification for human action recognition, in: IEEE International Conference on Pattern Recognition (ICPR), 2012.
  • 147H. Tabia, M. Gouiffès, L. Lacassagne.

    Motion modeling for abnormal event detection in crowd scenes, in: IEEE International Conference on Pattern Recognition (ISCIVC), 2012.
  • 148C. Tadonki, L. Lacassagne, T. Saïdani, J. Falcou, K. Hamidouche.

    The Harris algorithm revisited on the Cell processor, in: International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART), 2010.
  • 149S. Tomov, J. Dongarra, M. Baboulin.

    Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5&6, pp. 232–240.
  • 150University of Tennessee.

    PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, 2010.
  • 151T. L. Veldhuizen.

    Active Libraries and Universal Languages, Indiana University Computer Science, May 2004.

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.9.3916
  • 152H. Wang, H. Andrade, B. Gedik, K.-L. Wu.

    A Code Generation Approach for Auto-Vectorization in the Spade Compiler, in: LCPC'09, 2009, pp. 383-390.
  • 153Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, O. Le Maître.

    A parallel solver for incompressible fluid flows, in: International Conference on Computational Science (ICCS 2013), Procedia Computer Science, Elsevier, 2013, vol. 18, pp. 439–448.
  • 154Y. Wang, M. Baboulin, K. Rupp, O. Le Maître, Y. Fraigneau.

    Solving 3D Incompressible Navier-Stokes Equations on Hybrid CPU/GPU Systems, in: Proceedings of the High Performance Computing Symposium, San Diego, CA, USA, HPC '14, Society for Computer Simulation International, 2014, pp. 12:1–12:8.

    http://dl.acm.org/citation.cfm?id=2663510.2663522
  • 155H. Ye, L. Lacassagne, D. Etiemble, L. Cabaret, J. Falcou, O. Florent.

    Impact of High Level Transforms on High Level Synthesis for motion detection algorithm, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012, pp. 1-8.
  • 156H. Ye, L. Lacassagne, J. Falcou, D. Etiemble, L. Cabaret, O. Florent.

    High Level Transforms to reduce energy consumption of signal and image processing operators, in: IEEE International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013, pp. 247-254.