Bibliography
Major publications by the team in recent years
[1] M. Baboulin, D. Becker, J. Dongarra.
A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
[2] M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
[3] M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.
Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2.
[4] M. Baboulin, S. Gratton.
A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699.
[5] M. Bahi, C. Eisenbeis.
Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
[6] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]
http://hal.inria.fr/hal-00926513
[7] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]
https://hal.inria.fr/hal-01061305
[8] P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.
Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77.
[9] A. Ferreira Leite.
A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI; Universidade de Brasília, December 2014.
https://hal.inria.fr/tel-01097295
[10] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.
Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327. [ DOI : 10.1007/s10766-010-0161-2 ]
[11] M. Kruse.
Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.
https://hal.inria.fr/tel-01078440
[12] S. Tomov, J. Dongarra, M. Baboulin.
Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5–6, pp. 232–240.
Doctoral Dissertations and Habilitation Theses
[13] A. Ferreira Leite.
A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI, December 2014.
https://hal.inria.fr/tel-01097295
[14] M. Kruse.
Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.
https://hal.inria.fr/tel-01078440
Articles in International Peer-Reviewed Journals
[15] M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.
An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, July 2014, vol. 40, no 7, pp. 213-223. [ DOI : 10.1016/j.parco.2013.12.003 ]
https://hal.inria.fr/hal-01024857
[16] M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.
Statistical estimates for the conditioning of linear least squares problems, in: Lecture Notes in Computer Science, 2014, vol. 8384, pp. 124-133. [ DOI : 10.1007/978-3-642-55224-3_13 ]
https://hal.inria.fr/hal-00991710
[17] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]
https://hal.inria.fr/hal-00926513
[18] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]
https://hal.inria.fr/hal-01061305
[19] G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. D. Vento.
Collective mind: Towards practical and collaborative auto-tuning, in: Scientific Programming, July 2014, vol. 22, no 4, pp. 309-329. [ DOI : 10.3233/SPR-140396 ]
https://hal.inria.fr/hal-01054763
[20] A. Romero, L. Lacassagne, M. Gouiffès, A. Hassan Zahraee.
Covariance tracking: architecture optimizations for embedded systems, in: EURASIP Journal on Advances in Signal Processing, December 2014, 25 p. [ DOI : 10.1186/1687-6180-2014-175 ]
https://hal.inria.fr/hal-01094903
[21] M. Szydlarski, P. Esterie, J. Falcou, L. Grigori, R. Stompor.
Spherical harmonic transform on heterogeneous architectures using hybrid programming, in: Concurrency and Computation: Practice and Experience, March 2014, vol. 26, no 3, 28 p. [ DOI : 10.1002/cpe.3038 ]
https://hal.inria.fr/hal-01091256
International Conferences with Proceedings
[22] L. Bagnères, C. Bastoul.
Switchable Scheduling for Runtime Adaptation of Optimization, in: Euro-Par 2014 Parallel Processing, Porto, Portugal, Lecture Notes in Computer Science, Springer International Publishing, August 2014, vol. 8632, pp. 222–233. [ DOI : 10.1007/978-3-319-09873-9_19 ]
https://hal.inria.fr/hal-01097200
[23] L. Cabaret, L. Lacassagne.
What Is the World's Fastest Connected Component Labeling Algorithm?, in: SiPS: IEEE International Workshop on Signal Processing Systems, Belfast, United Kingdom, IEEE, October 2014, 6 p.
https://hal.inria.fr/hal-01094905
[24] L. Cabaret, L. Lacassagne, L. Oudni.
A Review of World's Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: International Conference on Design and Architectures for Signal and Image Processing, Madrid, Spain, October 2014.
https://hal.inria.fr/hal-01081962
[25] A. Ferreira Leite, C. Tadonki, C. Eisenbeis, T. Raiol, M. E. Walter, A. C. Alves De Melo.
Excalibur: An Autonomic Cloud Architecture for Executing Parallel Applications, in: Fourth International Workshop on Cloud Data and Platforms (CloudDP), Amsterdam, Netherlands, April 2014. [ DOI : 10.1145/2592784.2592786 ]
https://hal-mines-paristech.archives-ouvertes.fr/hal-01087315
[26] L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.
High Level Transforms for SIMD and Low-Level Computer Vision Algorithms, in: Symposium on Principles and Practice of Parallel Programming / WPMVP, Orlando, Florida, United States, February 2014, 8 p. [ DOI : 10.1145/2568058.2568067 ]
https://hal.inria.fr/hal-01094906
[27] A. Leite, C. Tadonki, C. Eisenbeis, A. De Melo.
A Fine-grained Approach for Power Consumption Analysis and Prediction, in: International Conference on Computational Science (ICCS), Cairns, Australia, June 2014. [ DOI : 10.1016/j.procs.2014.05.211 ]
https://hal.inria.fr/hal-01074959
[28] A. Tran Tan, J. Falcou, D. Etiemble, H. Kaiser.
Automatic Task-based Code Generation for High Performance Domain Specific Embedded Language, in: HLPP 2014, Amsterdam, Netherlands, July 2014.
https://hal.inria.fr/hal-01061423
[29] O. Zinenko, C. Bastoul, S. Huot.
Manipulating Visualization, Not Codes, in: International Workshop on Polyhedral Compilation Techniques (IMPACT), Amsterdam, Netherlands, January 2015, 8 p.
https://hal.inria.fr/hal-01100974
Scientific Books (or Scientific Book chapters)
[30] A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.
Locality Optimization on a NUMA Architecture for Hybrid LU Factorization, in: Advances in Parallel Computing, 2014, vol. 25, pp. 153-162. [ DOI : 10.3233/978-1-61499-381-0-153 ]
https://hal.inria.fr/hal-00987284
Internal Reports
[31] M. Baboulin, J. Dongarra, R. Lacroix.
Computing least squares condition numbers on hybrid multicore/GPU systems, February 2014, no RR-8479.
https://hal.inria.fr/hal-00947204
[32] M. Baboulin, J. Falcou, I. Masliah.
Towards an automatic generation of dense linear algebra solvers on parallel architectures, Université Paris-Sud, October 2014, no RR-8615, 20 p.
https://hal.inria.fr/hal-01075663
[33] M. Baboulin, X. S. Li, F.-H. Rouet.
Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, Inria, February 2014, no RR-8481. Also appeared as LAPACK Working Note 285.
https://hal.inria.fr/hal-00950612
[34] G. Fursin, C. Dubach.
Experience report: community-driven reviewing and validation of publications, June 2014.
https://hal.inria.fr/hal-01006563
[35] A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.
Locality optimization on a NUMA architecture for hybrid LU factorization, March 2014, no RR-8497.
https://hal.inria.fr/hal-00957673
Other Publications
[36] D. Barthou, O. Brand-Foissac, R. Dolbeau, G. Grosdidier, C. Eisenbeis, M. Kruse, O. Pene, K. Petrov, C. Tadonki.
Automated Code Generation for Lattice Quantum Chromodynamics and beyond, January 2014.
https://hal.archives-ouvertes.fr/hal-00930288
[37] J. Lambert, H. Chouh, G. Rougeron, V. Bergeaud, S. Chatillon, L. Lacassagne, J.-C. Iehl, J.-P. Farrugia, V. Ostromoukhov.
Interactive Ultrasonic Field Simulation For Non-Destructive Testing, June 2014, vol. 33, no 2, 25th Eurographics Symposium on Rendering.
https://hal.inria.fr/hal-01093294
[38] J. Lambert, G. Rougeron, L. Lacassagne.
Calcul de champ ultrasonore interactif pour le contrôle non destructif [Interactive ultrasonic field computation for non-destructive testing], May 2014, Les Journées COFREND.
https://hal.inria.fr/hal-01093131
-
[39] The HiPEAC vision on high-performance and embedded architecture and compilation (2012-2020), 2012.
http://www.hipeac.net/roadmap
[40] European Union Framework Program 6 MILEPOST project No 035307 (MachIne Learning for Embedded PrOgramS opTimization).
http://cordis.europa.eu/project/rcn/79763_en.html
[41] PRACE: Partnership for Advanced Computing in Europe.
http://www.prace-project.eu
[42] AMD.
AMD Core Math Library.
http://developer.amd.com/libraries/acml/
[43] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen.
LAPACK Users' Guide, SIAM, 1999, Third edition.
[44] K. Aneja, F. Laguzet, L. Lacassagne, A. Merigot.
Video rate image segmentation by means of region splitting and merging, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009.
[45] M. Arioli, M. Baboulin, S. Gratton.
A partial condition number for linear least-squares problems, in: SIAM J. Matrix Anal. and Appl., 2007, vol. 29, no 2, pp. 413–433.
[46] K. Asanovic.
The landscape of parallel computing research: a view from Berkeley, Electrical Engineering and Computer Sciences, University of California at Berkeley, December 2006, no UCB/EECS-2006-183.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
[47] A. Avron, P. Maymounkov, S. Toledo.
Blendenpick: Supercharging LAPACK’s least-squares solvers, in: SIAM J. Sci. Comput., 2010, vol. 32, pp. 1217–1236.
[48] M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.
An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, 2014, vol. 40, no 7, pp. 213–223.
[49] M. Baboulin, D. Becker, J. Dongarra.
A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
[50] M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, S. Tomov.
Accelerating scientific computations with mixed precision algorithms, in: Computer Physics Communications, 2009, vol. 180, no 12, pp. 2526–2533.
[51] M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
[52] M. Baboulin, J. Dongarra, J. Demmel, S. Tomov, V. Volkov.
Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, November 15, 2008.
http://www.lri.fr/~baboulin/SC08.pdf
[53] M. Baboulin, J. Dongarra, S. Gratton, J. Langou.
Computing the conditioning of the components of a linear least squares solution, in: Numerical Linear Algebra with Applications, 2009, vol. 16, no 7, pp. 517–533.
[54] M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.
Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2.
[55] M. Baboulin, J. Dongarra, R. Lacroix.
Computing least squares condition numbers on hybrid multicore/GPU systems, in: Proceedings of the International Conference of Applied Mathematics, Modeling and Computational Science (AMMCS 2013), 2013.
[56] M. Baboulin, J. Dongarra, S. Tomov.
Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, in: 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), Lecture Notes in Computer Science, Springer-Verlag, 2008, vol. 6126-6127.
[57] M. Baboulin, S. Gratton.
A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699.
[58] M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.
Statistical estimates for the conditioning of linear least squares problems, in: 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2014, vol. 8384, pp. 124-133.
[59] M. Baboulin, X. S. Li, F.-H. Rouet.
Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, in: Proceedings of VECPAR 2014, 2014.
[60] J. C. Baez, M. Stay.
Algorithmic thermodynamics, in: Mathematical Structures in Computer Science, 2012, vol. 22, no 5, pp. 771–787.
http://dx.doi.org/10.1017/S0960129511000521
[61] M. Bahi, C. Eisenbeis.
Spatial complexity of reversibly computable DAG, in: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, ACM, 2009, pp. 47–56.
[62] M. Bahi, C. Eisenbeis.
Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
[63] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]
http://hal.inria.fr/hal-00926513
[64] D. Barthou, G. Grosdidier, C. Eisenbeis, P. Guichon, M. Kruse, O. Pene, K. Petrov, C. Tadonki.
PetaQCD: En Route for the automatic code generation for lattice QCD, in: Proceedings of the 29th International Symposium on Lattice field theory (Lattice 2011), 2011, vol. 2011.
[65] P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall.
Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid, in: High Performance Computing Conference (HiPC), December 2013.
[66] D. Becker, M. Baboulin, J. Dongarra.
Reducing the amount of pivoting in symmetric indefinite systems, in: 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2012, vol. 7203, pp. 133–142.
[67] T. Betcke, N. J. Higham, V. Mehrmann, C. Schröder, F. Tisseur.
NLEVP: A Collection of Nonlinear Eigenvalue Problems, in: ACM Trans. Math. Software, February 2013, vol. 39, no 2, pp. 7:1-7:28. [ DOI : 10.1145/2427023.2427024 ]
[68] L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R. Whaley.
ScaLAPACK Users' Guide, SIAM, 1997, pp. 58–60.
[69] Blaze.
The Blaze Library, 2014.
https://code.google.com/p/blaze-lib/
[70] G. Bradski.
The OpenCV Library, in: Dr. Dobb's Journal of Software Tools, 2000.
[71] L. Cabaret, L. Lacassagne.
A Review of World's Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014, pp. 1-8.
[72] L. Cabaret, L. Lacassagne.
What Is the World's Fastest Connected Component Labeling Algorithm?, in: IEEE International Workshop on Signal Processing Systems (SiPS), 2014, pp. 1-6.
[73] V. G. Cerf.
Where is the science in computer science?, in: Communications of the ACM, 2012, vol. 55, no 10, p. 5.
[74] M. O. Cheema, L. Lacassagne, O. Hammami.
System-Platforms-Based SystemC TLM Design of Image Processing Chains for Embedded Applications, in: EURASIP Journal on Embedded Systems, 2007, pp. 1-14. [ DOI : 10.1155/2007/71043 ]
[75] P. Courbin, A. Pédron, T. Saidani, L. Lacassagne.
Parallélisation d'opérateurs de TI: multi-coeurs, Cell ou GPU ? [Parallelization of image processing operators: multicore, Cell or GPU?], in: GRETSI, 2009.
[76] K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, T. L. Veldhuizen.
Generative Programming and Active Libraries, in: Generic Programming, 1998, pp. 25-39.
[77] P. I. Davies, N. J. Higham.
Numerically Stable Generation of Correlation Matrices and their Factors, in: BIT, 2000, vol. 40, no 4, pp. 640-651.
[78] J. W. Demmel, L. Grigori, M. Hoemmen, J. Langou.
Communication-optimal parallel and sequential QR and LU factorizations, in: SIAM Journal on Scientific Computing, 2012, vol. 34, no 1, pp. 206–239.
[79] J. W. Demmel, A. McKenney.
A Test Matrix Generation Suite, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA, March 1989, no MCS-P69-0389, 16 p, LAPACK Working Note 9.
[80] J. Dongarra et al.
The International Exascale Software Project roadmap, in: Int. J. High Perform. Comput. Appl., February 2011, vol. 25, no 1, pp. 3–60.
http://dx.doi.org/10.1177/1094342010391989
[81] A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.
A smart sensor based vision system: implementation and evaluation, in: Journal of Physics D: Applied Physics, 2006, vol. 39, pp. 1694-1705. [ DOI : 10.1088/0022-3727/39/8/033 ]
[82] A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.
A Smart Architecture for Low-Level Image Computing, in: International Journal of Computer Sciences and Application, 2008, vol. 5, no 3, pp. 1-19.
[83] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
The numerical template toolbox: A modern C++ design for scientific computing, in: Journal of Parallel and Distributed Computing, 2014.
[84] P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.
Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77.
[85] P. Estérie, M. Gaunard, J. Falcou.
A proposal to add single instruction multiple data computation to the standard library, in: N3561, 2013.
[86] D. Etiemble, S. Piskorski, L. Lacassagne.
Performance evaluation of Altera C2H compiler on image processing benchmarks, in: TCHA: Workshop on Tools And Compiler for Hardware Acceleration, 2006.
[87] J. Falcou, L. Lacassagne, S. Schaetz.
Cell MPI: Mastering the Cell Broadband Engine architecture through a Boost based parallel communication library, in: Boost Conference, 2011.
[88] J. Falcou, T. Saidani, L. Lacassagne, D. Etiemble.
Programmation par squelettes algorithmiques pour le processeur Cell [Programming with algorithmic skeletons for the Cell processor], in: SYMPA, 2008.
[89] J. Falcou, J. Sérot, L. Pech, J.-T. Lapresté.
Meta-programming applied to automatic SMP parallelization of linear algebra code, in: Euro-Par 2008–Parallel Processing, Springer Berlin Heidelberg, 2008, pp. 729–738.
[90] G. Fursin, C. Dubach.
Experience report: community-driven reviewing and validation of publications, in: Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), ACM, 2014.
http://dx.doi.org/10.1145/2618137.2618142
[91] G. Fursin.
Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, in: Proceedings of the GCC Developers' Summit, June 2009.
[92] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.
Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327. [ DOI : 10.1007/s10766-010-0161-2 ]
[93] G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. D. Vento.
Collective Mind: towards practical and collaborative auto-tuning, in: Special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, 2014.
[94] M. Gouiffès, F. Laguzet, L. Lacassagne.
Color Connectedness Degree For Mean-Shift Tracking, in: IEEE International Conference on Pattern Recognition (ICPR), 2010.
[95] M. Gouiffès, F. Laguzet, L. Lacassagne.
Projection Histogram For Mean-Shift Tracking, in: IEEE International Conference on Image Processing (ICIP), 2010.
[96] C. Grana, D. Borghesani, R. Cucchiara.
Connected Component Labeling Techniques on Modern Architectures, in: ICIAP, IEEE, 2009, pp. 816-824.
[97] L. Grigori, J. Demmel, H. Xiang.
CALU: a communication optimal LU factorization algorithm, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, pp. 1317-1350.
[98] M. Gu, S. C. Eisenstat.
Efficient Algorithms for Computing a Strong Rank-revealing QR Factorization, in: SIAM Journal on Scientific Computing, July 1996, vol. 17, no 4, pp. 848–869.
http://dx.doi.org/10.1137/0917055
[99] S. Guelton, J. Falcou, P. Brunet.
Exploring the vectorization of python constructs using pythran and boost SIMD, in: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, ACM, 2014, pp. 79–86.
[100] G. Guennebaud, B. Jacob.
Eigen v3, 2010.
http://eigen.tuxfamily.org
[101] N. Halko, P. G. Martinsson, J. A. Tropp.
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, in: SIAM Review, 2011, vol. 53, pp. 217–288.
[102] C. Harris, M. Stephens.
A combined corner and edge detector, in: 4th ALVEY Vision Conference, Editions Hermes, Paris, 1988.
[103] L. He, Y. Chao, K. Suzuki.
A run-based two-scan labeling algorithm, in: ICIAR, LNCS 4633, 2007, pp. 131-142.
[104] R. M. Heiberger.
Algorithm AS 127: Generation of Random Orthogonal Matrices, in: J. Roy. Statist. Soc. Ser. C (Applied Statistics), 1978, vol. 27, no 2, pp. 199-206.
[105] N. J. Higham.
J-Orthogonal Matrices: Properties and Generation, in: SIAM Rev., September 2003, vol. 45, no 3, pp. 504-519. [ DOI : 10.1137/S0036144502414930 ]
[106] G. E. Hinton, S. Osindero.
A fast learning algorithm for deep belief nets, in: Neural Computation, 2006, vol. 18.
[107] S. Horowitz, T. Pavlidis.
Picture segmentation by a tree traversal algorithm, in: Journal of the ACM, 1976, vol. 23, pp. 368-388.
[108] T. Ikegami, T. Sakurai, U. Nagashima.
A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, in: Journal of Computational and Applied Mathematics, 2010, vol. 233, no 8, pp. 1927–1936.
[109] Intel.
Math Kernel Library.
http://developer.intel.com/software/products/mkl/
[110] V. Jimenez, I. Gelado, L. Vilanova, M. Gil, G. Fursin, N. Navarro.
Predictive runtime code scheduling for heterogeneous architectures, in: Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), January 2009.
[111] C. S. Kenney, A. J. Laub.
Small-sample statistical condition estimates for general matrix functions, in: SIAM J. Sci. Comput., 1994, vol. 15, pp. 36–61.
[112] A. Khabou, J. Demmel, L. Grigori, M. Gu.
LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version, in: SIAM Journal on Matrix Analysis and Applications, 2013, vol. 34, no 3, pp. 1401-1429.
http://epubs.siam.org/doi/abs/10.1137/120863691
[113] M. Kruse.
Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Université Paris-Sud 11, September 26, 2014.
[114] T. Kunlin, L. Lacassagne, A. Mérigot.
A Fast image segmentation scheme, in: International Conference on Information and Communication Technologies, IEEE, 2004.
[115] L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.
High Level Transforms for SIMD and low-level computer vision algorithms, in: ACM Workshop on Programming Models for SIMD/Vector Processing (PPoPP), 2014, pp. 49-56.
[116] L. Lacassagne, D. Etiemble, S. Kablia.
16-bit Floating Point Instructions for embedded Multimedia Applications, in: CAMP: Computer Architecture and Machine Perception, IEEE, 2005.
[117] L. Lacassagne, D. Etiemble.
16-bit floating point operations for low-end and high-end embedded processors, in: ODES: Optimizations for DSP and Embedded Systems, IEEE/ACM, 2005.
[118] L. Lacassagne, A. Manzanera, J. Denoulet, A. Mérigot.
High Performance Motion Detection: Some trends toward new embedded architectures for vision systems, in: Journal of Real-Time Image Processing, October 2008, pp. 127-148. [ DOI : 10.1007/s11554-008-0096-7 ]
[119] L. Lacassagne, B. Zavidovique.
Light Speed Labeling for RISC architectures, in: IEEE International Conference on Image Processing (ICIP), 2009.
[120] L. Lacassagne, B. Zavidovique.
Light Speed Labeling: efficient connected component labeling on RISC architectures, in: Journal of Real-Time Image Processing, 2011, vol. 6, no 2, pp. 117-135.
[121] F. Laguzet, M. Gouiffès, L. Lacassagne.
Automatic color space switching for robust tracking, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
[122] F. Laguzet, A. Romero, M. Gouiffès, L. Lacassagne, D. Etiemble.
Color tracking with contextual switching: Real-time implementation on CPU, in: Journal of Real-Time Image Processing, 2013, pp. 1-18.
[123] J. Lambert, L. Lacassagne, G. Rougeron, S. Le Berre, S. Chatillon.
High Performance simulation of ultrasonic fields for Non Destructive Testing, in: International Symposium in Nuclear Application and Monte-Carlo, 2013.
[124] J. Lambert, A. Pédron, G. Gens, F. Bimbard, L. Lacassagne, E. Iakovleva, S. Le Berre.
Analysis of multicore CPU and GPU toward parallelization of Total Focusing Method ultrasound reconstruction, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012.
[125] J. Lambert, G. Rougeron, L. Lacassagne, S. Chatillon.
A fast ultrasonic simulation tool based on massively parallel implementations, in: Review of Progress in Quantitative Nondestructive Evaluation, 2013.
[126] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng.
Building high-level features using large scale unsupervised learning, in: International Conference on Machine Learning, 2012.
[127] W. Ledermann, C. Alexander, D. Ledermann.
Random Orthogonal Matrix Simulation, in: Linear Algebra Appl., 2011, vol. 434, no 6, pp. 1444-1467. [ DOI : 10.1016/j.laa.2010.10.023 ]
[128] A. Leite, C. Tadonki, C. Eisenbeis, A. de Melo.
A Fine-grained Approach for Power Consumption Analysis and Prediction, in: Procedia Computer Science, 2014, vol. 29, pp. 2260–2271.
[129] S. Liu, C. Eisenbeis, J.-L. Gaudiot.
A theoretical framework for value prediction in parallel systems, in: 39th International Conference on Parallel Processing (ICPP 2010), IEEE, 2010, pp. 11–20.
[130] M. W. Mahoney.
Randomized algorithms for matrices and data, in: Foundations and Trends in Machine Learning, 2011, vol. 3, no 2, pp. 123–224.
[131] D. Menard, R. Serizel, R. Rocher, O. Sentieys.
Accuracy Constraint Determination in Fixed-Point System Design, in: Journal on Embedded Systems (JES), 2008, vol. 2008, pp. 1-12. [ DOI : 10.1155/2008/242584 ]
[132] P. Monasse, F. Guichard.
Fast computation of contrast-invariant image representation, in: IEEE Transactions on Image Processing, 2000, vol. 9, no 5, pp. 860-872.
[133] S. Moufawad.
Demmel type communication-avoiding generalized minimal residual method (CA-GMRES) on multicore hardwares: an application in QCD, American University of Beirut, Beirut, Lebanon, June 2011, defended on June 10, 2010.
[134] M. Odersky.
An Overview of the SCALA Programming Language, EPFL Lausanne, Switzerland, 2004, no IC/2004/64.
[135] D. S. Parker.
Random Butterfly Transformations with Applications in Computational Linear Algebra, Computer Science Department, UCLA, 1995, no CSD-950023.
[136] D. Petcu.
Consuming Resources and Services from Multiple Clouds, in: Journal of Grid Computing, 2014, pp. 1–25.
[137] M. Pharr, W. R. Mark.
ISPC: A SPMD Compiler for High-Performance CPU Programming, in: Innovative Parallel Computing (InPar), 2012.
[138] S. Piskorski, L. Lacassagne, D. Etiemble.
IPLG: un outil pour la fusion d'opérateurs en Traitement d'Images [IPLG: a tool for operator fusion in image processing], in: SYMPA, 2009.
[139] S. Piskorski, L. Lacassagne, M. Kieffer, D. Etiemble.
Efficient floating point interval processing for embedded systems and applications, in: SCAN: International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics, 2006.
[140] S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. A. Silber, N. Vasilache.
GRAPHITE: Loop optimizations based on the polyhedral model for GCC, in: Proc. of the 4th GCC Developers' Summit, June 2006, pp. 179–198.
[141] A. Pédron, L. Lacassagne, V. Barbillon, F. Bimbard, G. Rougeron, S. Le Berre.
Performance analysis of an ultrasound reconstruction algorithm for non destructive testing, in: IEEE International Conference on Parallel Computing (ParCo), 2011.
142A. Pédron, L. Lacassagne, F. Bimbard, S. L. Berre.
Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2011. -
143A. Romero, M. Gouiffès, L. Lacassagne.
Feature Points tracking adaptative to Saturation, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011. -
144A. Romero, M. Gouiffès, L. Lacassagne.
Covariance Descriptor Multiple Object Tracking and Re-Identification with Colorspace Evaluation, in: IEEE ACCV - Workshop on Detection and Tracking in Challenging Environnements, 2012. -
145A. Romero, M. Gouiffès, L. Lacassagne.
Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, in: ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013. -
146A. Romero, L. Lacassagne, M. Gouiffès.
Real-time covariance tracking algorithm for embedded systems, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2013. -
147A. Rosenfeld, J. Platz.
Sequential operator in digital pictures processing, in: Journal of ACM, 1966, vol. 13,4, pp. 471-494. -
148A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.
Locality optimization on a NUMA architecture for hybrid LU factorization, in: International Conference on Parallel Computing (PARCO 2013), Advances in Parallel Computing, IOS Press, 2014, vol. 25, pp. 153-162. -
149T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, D. Etiemble.
Algorithmic Skeletons within an Embedded Domain Specific Language for the Cell Processor, in: Parallel Architectures and Compilation Techniques, PACT, 2009, pp. 67-76. -
150T. Saidani, L. Lacassagne, S. Bouaziz, T. M. Khan.
Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor, in: Lecture Notes in Computer Science, Springer, 2007, pp. 104-112. [ DOI : 10.1007/978-3-540-74742-0 ] -
151T. Saidani, S. Piskorski, L. Lacassagne, S. Bouaziz.
Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study of Image Processing Algorithm, in: PACT-MEDEA, 2007, pp. 15-19. -
152C. Sanderson.
Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments, in: Report Version, 2010, vol. 2. -
153J. Siek, L.-Q. Lee, A. Lumsdaine.
Boost Graph Library, June 2000.
http://www.boost.org/libs/graph/ -
154D. Spinellis.
Notable design patterns for domain-specific languages, in: Journal of Systems and Software, 2001, vol. 56, no 1, pp. 91-99. [ DOI : 10.1016/S0164-1212(00)00089-3 ]
http://www.sciencedirect.com/science/article/pii/S0164121200000893 -
155G. W. Stewart.
The Efficient Generation of Random Orthogonal Matrices With an Application to Condition Estimators, in: SIAM J. Numer. Anal., 1980, vol. 17, no 3, pp. 403-409. -
156A. K. Sujeeth, A. Gibbons, K. J. Brown, H. Lee, T. Rompf, M. Odersky, K. Olukotun.
Forge: Generating a High Performance DSL Implementation from a Declarative Specification, in: 12th International Conference on Generative Programming: Concepts and Experiences, 2013. -
157A. K. Sujeeth, T. Rompf, K. J. Brown, H. Lee, H. Chafi, V. Popic, M. Wu, A. Prokopec, V. Jovanovic, M. Odersky, K. Olukotun.
Composition and Reuse with Compiled Domain-Specific Languages, in: ECOOP'13: European Conference on Object-Oriented Programming, 2013. -
158V. Sundriyal, M. Sosonkina, A. Gaenko, Z. Zhang.
Energy saving strategies for parallel applications with point-to-point communication phases, in: Journal of Parallel and Distributed Computing, 2013. [ DOI : 10.1016/j.jpdc.2013.03.006 ] -
159V. Sundriyal, M. Sosonkina, Z. Zhang.
Automatic runtime frequency-scaling system for energy savings in parallel applications, in: The Journal of Supercomputing, 2014, vol. 68, no 2, pp. 777–797. -
160K. Suzuki, I. Horiba, N. Sugie.
Linear-time connected component labeling based on sequential local operations, in: Computer Vision and Image Understanding, January 2003, vol. 89, no 1, pp. 1-23. [ DOI : 10.1016/S1077-3142(02)00030-9 ] -
161H. Tabia, M. Gouiffès, L. Lacassagne.
Motion histogram quantification for human action recognition, in: IEEE International Conference on Pattern Recognition (ICPR), 2012. -
162H. Tabia, M. Gouiffès, L. Lacassagne.
Motion modeling for abnormal event detection in crowd scenes, in: IEEE International Conference on Pattern Recognition (ISCIVC), 2012. -
163C. Tadonki, L. Lacassagne, T. Saïdani, J. Falcou, K. Hamidouche.
The Harris algorithm revisited on the Cell processor, in: International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART), 2010. -
164S. Tomov, J. Dongarra, M. Baboulin.
Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5&6, pp. 232–240. -
165University of Tennessee.
PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, 2010. -
166T. L. Veldhuizen.
Active Libraries and Universal Languages, Indiana University Computer Science, May 2004.
http://www.ubietylab.net/ubigraph/content/Papers/pdf/VeldhuizenThesis.pdf -
167H. Wang, H. Andrade, B. Gedik, K.-L. Wu.
A Code Generation Approach for Auto-Vectorization in the Spade Compiler, in: LCPC'09, 2009, pp. 383-390. -
168Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, O. Le Maître.
A parallel solver for incompressible fluid flows, in: International Conference on Computational Science (ICCS 2013), Procedia Computer Science, Elsevier, 2013, vol. 18, pp. 439–448. -
169Y. Wang, M. Baboulin, K. Rupp, O. Le Maître, Y. Fraigneau.
Solving 3D Incompressible Navier-Stokes Equations on Hybrid CPU/GPU Systems, in: Proceedings of the High Performance Computing Symposium, San Diego, CA, USA, HPC '14, Society for Computer Simulation International, 2014, pp. 12:1–12:8.
http://dl.acm.org/citation.cfm?id=2663510.2663522 -
170H. Ye, L. Lacassagne, D. Etiemble, L. Cabaret, J. Falcou, O. Florent.
Impact of High Level Transforms on High Level Synthesis for motion detection algorithm, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012, pp. 1-8. -
171H. Ye, L. Lacassagne, J. Falcou, D. Etiemble, L. Cabaret, O. Florent.
High Level Transforms to reduce energy consumption of signal and image processing operators, in: IEEE International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013, pp. 247-254.