Bibliography
Major publications by the team in recent years
-
1M. Baboulin, D. Becker, J. Dongarra.
A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24. -
2M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26. -
3M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.
Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2. -
4M. Baboulin, S. Gratton.
A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699. -
5M. Bahi, C. Eisenbeis.
Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28. -
6D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, LPT-Orsay-13-142 article nb. 012005. [ DOI : 10.1088/1742-6596/510/1/012005 ]
http://hal.inria.fr/hal-00926513 -
7P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]
https://hal.inria.fr/hal-01061305 -
8P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.
Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77. -
9A. Ferreira Leite.
A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI ; Universidade de Brasília, December 2014.
https://hal.inria.fr/tel-01097295 -
10G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.
Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327. [ DOI : 10.1007/s10766-010-0161-2 ] -
11M. Kruse.
Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.
https://hal.inria.fr/tel-01078440 -
12S. Tomov, J. Dongarra, M. Baboulin.
Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5&6, pp. 232–240.
Doctoral Dissertations and Habilitation Theses
-
13A. Rémy.
Solving dense linear systems on accelerated multicore architectures, Université Paris-Sud, July 2015.
https://hal.inria.fr/tel-01206837
International Conferences with Proceedings
-
14M. Baboulin, J. Dongarra, A. Rémy, S. Tomov, I. Yamazaki.
Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures, in: 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Lecture Notes in Computer Science, September 2015.
https://hal.inria.fr/hal-01223022 -
15M. Baboulin, A. Jamal, M. Sosonkina.
Using Random Butterfly Transformations in Parallel Schur Complement-Based Preconditioning, in: 8th Workshop on Computer Aspects of Numerical Algorithms (CANA'15), Lodz, Poland, September 2015.
https://hal.inria.fr/hal-01223090 -
16M. Baboulin, A. Khabou, A. Rémy.
A Randomized LU-based Solver Using GPU and Intel Xeon Phi Accelerators, in: HeteroPar'2015, Vienna, Austria, August 2015.
https://hal.inria.fr/hal-01223018 -
17L. Bagnères, O. Zinenko, S. Huot, C. Bastoul.
Opening Polyhedral Compiler's Black Box, in: CGO 2016 - 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization, Barcelona, Spain, March 2016.
https://hal.inria.fr/hal-01253322 -
18A. Ferreira Leite, V. C. Alves, G. Nunes Rodrigues, C. Tadonki, C. Eisenbeis, A. C. Magalhaes Alves de Melo.
Automating Resource Selection and Configuration in Inter-clouds through a Software Product Line Method, in: 8th International Conference on Cloud Computing (CLOUD), 2015 IEEE, New York City, United States, July 2015, pp. 726-733. [ DOI : 10.1109/CLOUD.2015.101 ]
https://hal-mines-paristech.archives-ouvertes.fr/hal-01252985 -
19G. W. Howell, M. Baboulin.
LU Preconditioning for Overdetermined Sparse Least Squares Problems, in: 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Lecture Notes in Computer Science, September 2015.
https://hal.inria.fr/hal-01223069 -
20L. Lacassagne, L. Cabaret, D. Etiemble.
Parallel light speed labeling: the world’s fastest connected component labeling for multicore processors, in: International Conference on Image Processing, Quebec, Canada, IEEE, September 2015, 8 p.
https://hal.inria.fr/hal-01243310 -
21I. Masliah, M. Baboulin, J. Falcou.
Metaprogramming dense linear algebra solvers. Applications to multi and many-core architectures, in: 13th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA 2015), Helsinki, Finland, August 2015.
https://hal.inria.fr/hal-01221358
Scientific Books (or Scientific Book chapters)
-
22M. Baboulin, J. Dongarra, R. Lacroix.
Computing least squares condition numbers on hybrid multicore/GPU systems, in: Interdisciplinary Topics in Applied Mathematics, Modeling and Computational Science, Springer International Publishing, 2015, vol. 117. [ DOI : 10.1007/978-3-319-12307-3_6 ]
https://hal.inria.fr/hal-01204804
Internal Reports
-
23M. Baboulin, J. Falcou, I. Masliah.
Meta-programming and Multi-stage Programming for GPGPUs, Inria Saclay Ile de France ; Paris-Sud XI, September 2015, no RR-8780.
https://hal.inria.fr/hal-01204661
Other Publications
-
24M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, S. Tomov.
Towards a High-Performance Tensor Algebra Package for Accelerators, August 2015, Smoky Mountains Computational Sciences and Engineering Conference (SMC 2015), Poster.
https://hal.archives-ouvertes.fr/hal-01231234
-
25The HiPEAC vision on high-performance and embedded architecture and compilation (2012-2020), 2012.
http://www.hipeac.net/roadmap -
26European Union Framework Program 6 MILEPOST project No 035307 (MachIne Learning for Embedded PrOgramS opTimization).
http://cordis.europa.eu/project/rcn/79763_en.html -
27PRACE: Partnership for Advanced Computing in Europe.
http://www.prace-project.eu -
28AMD.
AMD Core Math Library.
http://developer.amd.com/libraries/acml/ -
29E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen.
LAPACK Users' Guide, SIAM, 1999, Third edition. -
30K. Aneja, F. Laguzet, L. Lacassagne, A. Merigot.
Video rate image segmentation by means of region splitting and merging, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009. -
31M. Arioli, M. Baboulin, S. Gratton.
A partial condition number for linear least-squares problems, in: SIAM J. Matrix Anal. and Appl., 2007, vol. 29, no 2, pp. 413–433. -
32K. Asanovic.
The landscape of parallel computing research: a view from Berkeley, Electrical Engineering and Computer Sciences, University of California at Berkeley, December 2006, no UCB/EECS-2006-183.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf -
33H. Avron, P. Maymounkov, S. Toledo.
Blendenpik: Supercharging LAPACK’s least-squares solvers, in: SIAM J. Sci. Comput., 2010, vol. 32, pp. 1217–1236. -
34M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.
An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, 2014, vol. 40, no 7, pp. 213–223. -
35M. Baboulin, D. Becker, J. Dongarra.
A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24. -
36M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, S. Tomov.
Accelerating scientific computations with mixed precision algorithms, in: Computer Physics Communications, 2009, vol. 180, no 12, pp. 2526–2533. -
37M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26. -
38M. Baboulin, J. Dongarra, J. Demmel, S. Tomov, V. Volkov.
Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, November 15, 2008.
http://www.lri.fr/~baboulin/SC08.pdf -
39M. Baboulin, J. Dongarra, S. Gratton, J. Langou.
Computing the conditioning of the components of a linear least squares solution, in: Numerical Linear Algebra with Applications, 2009, vol. 16, no 7, pp. 517–533. -
40M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.
Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2. -
41M. Baboulin, J. Dongarra, R. Lacroix.
Computing least squares condition numbers on hybrid multicore/GPU systems, in: Proceedings of the International Conference of Applied Mathematics, Modeling and Computational Science (AMMCS 2013), 2013. -
42M. Baboulin, J. Dongarra, S. Tomov.
Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, in: 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), Lecture Notes in Computer Science, Springer-Verlag, 2008, vol. 6126-6127. -
43M. Baboulin, S. Gratton.
A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699. -
44M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.
Statistical estimates for the conditioning of linear least squares problems, in: 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2014, vol. 8384, pp. 124-133. -
45M. Baboulin, X. S. Li, F.-H. Rouet.
Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, in: Proceedings of VECPAR 2014, 2014. -
46J. C. Baez, M. Stay.
Algorithmic thermodynamics, in: Mathematical Structures in Computer Science, 2012, vol. 22, no 5, pp. 771–787.
http://dx.doi.org/10.1017/S0960129511000521 -
47M. Bahi, C. Eisenbeis.
Spatial complexity of reversibly computable DAG, in: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, ACM, 2009, pp. 47–56. -
48M. Bahi, C. Eisenbeis.
Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28. -
49D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]
http://hal.inria.fr/hal-00926513 -
50D. Barthou, G. Grosdidier, C. Eisenbeis, P. Guichon, M. Kruse, O. Pene, K. Petrov, C. Tadonki.
PetaQCD: En Route for the automatic code generation for lattice QCD, in: Proceedings of the 29th International Symposium on Lattice field theory (Lattice 2011), 2011, vol. 2011. -
51P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall.
Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid, in: High Performance Computing Conference (HiPC), December 2013. -
52D. Becker, M. Baboulin, J. Dongarra.
Reducing the amount of pivoting in symmetric indefinite systems, in: 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2012, vol. 7203, pp. 133–142. -
53T. Betcke, N. J. Higham, V. Mehrmann, C. Schröder, F. Tisseur.
NLEVP: A Collection of Nonlinear Eigenvalue Problems, in: ACM Trans. Math. Software, February 2013, vol. 39, no 2, pp. 7:1-7:28. [ DOI : 10.1145/2427023.2427024 ] -
54L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R. Whaley.
ScaLAPACK Users' Guide, SIAM, 1997, pp. 58–60. -
55Blaze.
The Blaze Library, 2014.
https://bitbucket.org/blaze-lib/blaze -
56G. Bradski.
The OpenCV Library, in: Dr. Dobb's Journal of Software Tools, 2000. -
57L. Cabaret, L. Lacassagne.
A Review of World's Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014, pp. 1-8. -
58L. Cabaret, L. Lacassagne.
What is the world's fastest Connected Component Labeling Algorithm?, in: IEEE International Workshop on Signal Processing Systems (SiPS), 2014, pp. 1-6. -
59V. G. Cerf.
Where is the science in computer science?, in: Communications of the ACM, 2012, vol. 55, no 10, pp. 5-5. -
60M. O. Cheema, L. Lacassagne, O. Hammami.
System-Platforms-Based SystemC TLM Design of Image Processing Chains for Embedded Applications, in: EURASIP Journal on Embedded Systems, 2007, pp. 1-14. [ DOI : 10.1155/2007/71043 ] -
61P. Courbin, A. Pédron, T. Saidani, L. Lacassagne.
Parallélisation d'opérateurs de TI: multi-coeurs, Cell ou GPU ?, in: GRETSI, 2009. -
62K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, T. L. Veldhuizen.
Generative Programming and Active Libraries, in: Generic Programming, 1998, pp. 25-39. -
63P. I. Davies, N. J. Higham.
Numerically Stable Generation of Correlation Matrices and their Factors, in: BIT, 2000, vol. 40, no 4, pp. 640-651. -
64J. W. Demmel, L. Grigori, M. Hoemmen, J. Langou.
Communication-optimal parallel and sequential QR and LU factorizations, in: SIAM Journal on Scientific Computing, 2012, vol. 34, no 1, pp. 206–239. -
65J. W. Demmel, A. McKenney.
A Test Matrix Generation Suite, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA, March 1989, no MCS-P69-0389, 16 p, LAPACK Working Note 9. -
66J. Dongarra et al.
The International Exascale Software Project roadmap, in: Int. J. High Perform. Comput. Appl., February 2011, vol. 25, no 1, pp. 3–60.
http://dx.doi.org/10.1177/1094342010391989 -
67A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.
A smart sensor based vision system: implementation and evaluation, in: Journal of Physics D: Applied Physics, 2006, vol. 39, pp. 1694-1705. [ DOI : 10.1088/0022-3727/39/8/033 ] -
68A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.
A Smart Architecture for Low-Level Image Computing, in: International Journal of Computer Sciences and Application, 2008, vol. 5,3, pp. 1-19. -
69P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
The numerical template toolbox: A modern C++ design for scientific computing, in: Journal of Parallel and Distributed Computing, 2014. -
70P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.
Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77. -
71P. Estérie, M. Gaunard, J. Falcou.
A proposal to add single instruction multiple data computation to the standard library, in: N3561, 2013. -
72D. Etiemble, S. Piskorski, L. Lacassagne.
Performance evaluation of Altera C2H compiler on image processing benchmarks, in: TCHA: Workshop on Tools And Compiler for Hardware Acceleration, 2006. -
73J. Falcou, L. Lacassagne, S. Schaetz.
Cell MPI: Mastering the Cell Broadband Engine architecture through a Boost based parallel communication library, in: Boost Conference, 2011. -
74J. Falcou, T. Saidani, L. Lacassagne, D. Etiemble.
Programmation par squelettes algorithmiques pour le processeur Cell, in: SYMPA, 2008. -
75J. Falcou, J. Sérot, L. Pech, J.-T. Lapresté.
Meta-programming applied to automatic SMP parallelization of linear algebra code, in: Euro-Par 2008–Parallel Processing, Springer Berlin Heidelberg, 2008, pp. 729–738. -
76G. Fursin, C. Dubach.
Experience report: community-driven reviewing and validation of publications, in: Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), ACM, 2014.
http://dx.doi.org/10.1145/2618137.2618142 -
77G. Fursin.
Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, in: Proceedings of the GCC Developers' Summit, June 2009. -
78G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.
Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327. [ DOI : 10.1007/s10766-010-0161-2 ] -
79G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. D. Vento.
Collective Mind: towards practical and collaborative auto-tuning, in: Special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, 2014. -
80M. Gouiffès, F. Laguzet, L. Lacassagne.
Color Connectedness Degree For Mean-Shift Tracking, in: IEEE International Conference on Pattern Recognition (ICPR), 2010. -
81M. Gouiffès, F. Laguzet, L. Lacassagne.
Projection Histogram For Mean-Shift Tracking, in: IEEE International Conference on Image Processing (ICIP), 2010. -
82C. Grana, D. Borghesani, R. Cucchiara.
Connected Component Labeling Techniques on Modern Architectures, in: ICIAP, IEEE, 2009, pp. 816-824. -
83L. Grigori, J. Demmel, H. Xiang.
CALU: a communication optimal LU factorization algorithm, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, pp. 1317-1350. -
84M. Gu, S. C. Eisenstat.
Efficient Algorithms for Computing a Strong Rank-revealing QR Factorization, in: SIAM Journal on Scientific Computing, July 1996, vol. 17, no 4, pp. 848–869.
http://dx.doi.org/10.1137/0917055 -
85S. Guelton, J. Falcou, P. Brunet.
Exploring the vectorization of python constructs using pythran and boost SIMD, in: Proceedings of the 2014 Workshop on programming models for SIMD/Vector processing, ACM, 2014, pp. 79–86. -
86G. Guennebaud, B. Jacob.
Eigen v3, 2010.
http://eigen.tuxfamily.org -
87N. Halko, P. G. Martinsson, J. A. Tropp.
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, in: SIAM Review, 2011, vol. 53, pp. 217–288. -
88C. Harris, M. Stephens.
A combined corner and edge detector, in: 4th ALVEY Vision Conference, Editions Hermes, Paris, 1988. -
89L. He, Y. Chao, K. Suzuki.
A run-based two-scan labeling algorithm, in: ICIAR, LNCS 4633, 2007, pp. 131-142. -
90R. M. Heiberger.
Algorithm AS 127: Generation of Random Orthogonal Matrices, in: J. Roy. Statist. Soc. Ser. C (Applied Statistics), 1978, vol. 27, no 2, pp. 199-206. -
91N. J. Higham.
J-Orthogonal Matrices: Properties and Generation, in: SIAM Rev., September 2003, vol. 45, no 3, pp. 504-519. [ DOI : 10.1137/S0036144502414930 ] -
92G. E. Hinton, S. Osindero, Y.-W. Teh.
A fast learning algorithm for deep belief nets, in: Neural Computation, 2006, vol. 18. -
93S. Horowitz, T. Pavlidis.
Picture segmentation by a tree traversal algorithm, in: Journal of the ACM, 1976, vol. 23, pp. 368-388. -
94T. Ikegami, T. Sakurai, U. Nagashima.
A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, in: Journal of Computational and Applied Mathematics, 2010, vol. 233, no 8, pp. 1927–1936. -
95Intel.
Math Kernel Library.
http://developer.intel.com/software/products/mkl/ -
96V. Jimenez, I. Gelado, L. Vilanova, M. Gil, G. Fursin, N. Navarro.
Predictive runtime code scheduling for heterogeneous architectures, in: Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), January 2009. -
97C. S. Kenney, A. J. Laub.
Small-sample statistical condition estimates for general matrix functions, in: SIAM J. Sci. Comput., 1994, vol. 15, pp. 36–61. -
98A. Khabou, J. Demmel, L. Grigori, M. Gu.
LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version, in: SIAM Journal on Matrix Analysis and Applications, 2013, vol. 34, no 3, pp. 1401-1429.
http://epubs.siam.org/doi/abs/10.1137/120863691 -
99M. Kruse.
Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Université Paris-Sud 11, September 26th, 2014. -
100T. Kunlin, L. Lacassagne, A. Mérigot.
A Fast image segmentation scheme, in: International Conference on Information and Communication Technologies, IEEE, 2004. -
101L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.
High Level Transforms for SIMD and low-level computer vision algorithms, in: ACM Workshop on Programming Models for SIMD/Vector Processing (PPoPP), 2014, pp. 49-56. -
102L. Lacassagne, D. Etiemble, S. Kablia.
16-bit Floating Point Instructions for embedded Multimedia Applications, in: CAMP: Computer Architecture and Machine Perception, IEEE, 2005. -
103L. Lacassagne, D. Etiemble.
16-bit floating point operations for low-end and high-end embedded processors, in: ODES: Optimizations for DSP and Embedded Systems, IEEE/ACM, 2005. -
104L. Lacassagne, A. Manzanera, J. Denoulet, A. Mérigot.
High Performance Motion Detection: Some trends toward new embedded architectures for vision systems, in: Journal of Real Time Image Processing, October 2008, pp. 127-148. [ DOI : 10.1007/s11554-008-0096-7 ] -
105L. Lacassagne, A. B. Zavidovique.
Light Speed Labeling for RISC architectures, in: IEEE International Conference on Image Processing (ICIP), 2009. -
106L. Lacassagne, B. Zavidovique.
Light Speed Labeling: efficient connected component labeling on RISC architectures, in: Journal of Real-Time Image Processing, 2011, vol. 6, no 2, pp. 117-135. -
107F. Laguzet, M. Gouiffès, L. Lacassagne.
Automatic color space switching for robust tracking, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011. -
108F. Laguzet, A. Romero, M. Gouiffès, L. Lacassagne, D. Etiemble.
Color tracking with contextual switching: Real-time implementation on CPU, in: Journal of Real-Time Image Processing, 2013, pp. 1-18. -
109J. Lambert, L. Lacassagne, G. Rougeron, S. Le Berre, S. Chatillon.
High Performance simulation of ultrasonic fields for Non Destructive Testing, in: International Symposium in Nuclear Application and Monte-Carlo, 2013. -
110J. Lambert, A. Pédron, G. Gens, F. Bimbard, L. Lacassagne, E. Iakovleva, S. Le Berre.
Analysis of multicore CPU and GPU toward parallelization of Total Focusing Method ultrasound reconstruction, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012. -
111J. Lambert, G. Rougeron, L. Lacassagne, S. Chatillon.
A fast ultrasonic simulation tool based on massively parallel implementations, in: Review of Progress of Quantitative Nondestructive Evaluation, 2013. -
112Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng.
Building high-level features using large scale unsupervised learning, in: International Conference in Machine Learning, 2012. -
113W. Ledermann, C. Alexander, D. Ledermann.
Random Orthogonal Matrix Simulation, in: Linear Algebra Appl., 2011, vol. 434, no 6, pp. 1444-1467. [ DOI : 10.1016/j.laa.2010.10.023 ] -
114A. Leite, C. Tadonki, C. Eisenbeis, A. de Melo.
A Fine-grained Approach for Power Consumption Analysis and Prediction, in: Procedia Computer Science, 2014, vol. 29, pp. 2260–2271. -
115S. Liu, C. Eisenbeis, J.-L. Gaudiot.
A theoretical framework for value prediction in parallel systems, in: Parallel Processing (ICPP), 2010 39th International Conference on, IEEE, 2010, pp. 11–20. -
116M. W. Mahoney.
Randomized algorithms for matrices and data, in: Foundations and Trends in Machine Learning, 2011, vol. 3, no 2, pp. 123–224. -
117D. Menard, R. Serizel, R. Rocher, O. Sentieys.
Accuracy Constraint Determination in Fixed-Point System Design, in: Journal on Embedded Systems (JES), 2008, vol. 2008, pp. 1-12. [ DOI : 10.1155/2008/242584 ] -
118P. Monasse, F. Guichard.
Fast computation of contrast-invariant image representation, in: Transaction on, 2000, vol. 9,5, pp. 860-872. -
119S. Moufawad.
Demmel type communication-avoiding generalized minimal residual method (CA-GMRES) on multicore hardwares: an application in QCD, American University of Beirut, Beirut, Lebanon, June 2011, defended on 2010, June 10th. -
120M. Odersky.
An Overview of the SCALA Programming Language, EPFL Lausanne, Switzerland, 2004, no IC/2004/64. -
121D. S. Parker.
Random Butterfly Transformations with Applications in Computational Linear Algebra, Computer Science Department, UCLA, 1995, no CSD-950023. -
122M. Pharr, W. R. Mark.
ISPC: A SPMD Compiler for High-Performance CPU Programming, in: Innovative Parallel Computing (InPar), 2012. -
123S. Piskorski, L. Lacassagne, D. Etiemble.
IPLG: un outil pour la fusion d'opérateurs en Traitement d'Images, in: SYMPA, 2009. -
124S. Piskorski, L. Lacassagne, M. Kieffer, D. Etiemble.
Efficient floating point interval processing for embedded systems and applications, in: SCAN - International Symposium of Scientific computing, Computer Arithmetic and Validated Numerics, 2006. -
125S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. A. Silber, N. Vasilache.
GRAPHITE: Loop optimizations based on the polyhedral model for GCC, in: Proc. of the 4th GCC Developers' Summit, June 2006, pp. 179–198. -
126A. Pédron, L. Lacassagne, V. Barbillon, F. Bimbard, G. Rougeron, S. Le Berre.
Performance analysis of an ultrasound reconstruction algorithm for non destructive testing, in: IEEE International Conference on Parallel Computing (ParCo), 2011. -
127A. Pédron, L. Lacassagne, F. Bimbard, S. Le Berre.
Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2011. -
128A. Romero, M. Gouiffès, L. Lacassagne.
Feature Points tracking adaptative to Saturation, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011. -
129A. Romero, M. Gouiffès, L. Lacassagne.
Covariance Descriptor Multiple Object Tracking and Re-Identification with Colorspace Evaluation, in: IEEE ACCV - Workshop on Detection and Tracking in Challenging Environments, 2012. -
130A. Romero, M. Gouiffès, L. Lacassagne.
Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, in: ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013. -
131A. Romero, L. Lacassagne, M. Gouiffès.
Real-time covariance tracking algorithm for embedded systems, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2013. -
132A. Rosenfeld, J. Pfaltz.
Sequential operations in digital picture processing, in: Journal of the ACM, 1966, vol. 13,4, pp. 471-494. -
133A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.
Locality optimization on a NUMA architecture for hybrid LU factorization, in: International Conference on Parallel Computing (PARCO 2013), Advances in Parallel Computing, IOS Press, 2014, vol. 25, pp. 153-162. -
134T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, D. Etiemble.
Algorithmic Skeletons within an Embedded Domain Specific Language for the Cell Processor, in: Parallel Architectures and Compilation Techniques, PACT, 2009, pp. 67-76. -
135T. Saidani, L. Lacassagne, S. Bouaziz, T. M. Khan.
Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor, in: Lecture Notes in Computer Science, Springer, 2007, pp. 104-112. [ DOI : 10.1007/978-3-540-74742-0 ] -
136T. Saidani, S. Piskorski, L. Lacassagne, S. Bouaziz.
Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study of Image Processing Algorithm, in: PACT-MEDEA, 2007, pp. 15-19. -
137C. Sanderson.
Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments, in: Report Version, 2010, vol. 2. -
138J. Siek, L.-Q. Lee, A. Lumsdaine.
Boost Random Number Library, June 2000.
http://www.boost.org/libs/graph/ -
139D. Spinellis.
Notable design patterns for domain-specific languages, in: Journal of Systems and Software, 2001, vol. 56, no 1, pp. 91 - 99. [ DOI : 10.1016/S0164-1212(00)00089-3 ]
http://www.sciencedirect.com/science/article/pii/S0164121200000893 -
140G. W. Stewart.
The Efficient Generation of Random Orthogonal Matrices With an Application to Condition Estimators, in: SIAM J. Numer. Anal., 1980, vol. 17, no 3, pp. 403-409. -
141A. K. Sujeeth, A. Gibbons, K. J. Brown, H. Lee, T. Rompf, M. Odersky, K. Olukotun.
Forge: Generating a High Performance DSL Implementation from a Declarative Specification, in: 12th International Conference on Generative Programming: Concepts and Experiences, 2013. -
142A. K. Sujeeth, T. Rompf, K. J. Brown, H. Lee, H. Chafi, V. Popic, M. Wu, A. Prokopec, V. Jovanovic, M. Odersky, K. Olukotun.
Composition and Reuse with Compiled Domain-Specific Languages, in: ECOOP'13: European Conference on Object-Oriented Programming, 2013. -
143V. Sundriyal, M. Sosonkina, A. Gaenko, Z. Zhang.
Energy saving strategies for parallel applications with point-to-point communication phases, in: Journal of Parallel and Distributed Computing, 2013. [ DOI : 10.1016/j.jpdc.2013.03.006 ] -
144V. Sundriyal, M. Sosonkina, Z. Zhang.
Automatic runtime frequency-scaling system for energy savings in parallel applications, in: The Journal of Supercomputing, 2014, vol. 68, no 2, pp. 777–797. -
145K. Suzuki, I. Horiba, N. Sugie.
Linear-time connected component labeling based on sequential local operations, in: Computer Vision and Image Understanding, January 2003, vol. 89, no 1, pp. 1-23. [ DOI : 10.1016/S1077-3142(02)00030-9 ] -
146H. Tabia, M. Gouiffès, L. Lacassagne.
Motion histogram quantification for human action recognition, in: IEEE International Conference on Pattern Recognition (ICPR), 2012. -
147H. Tabia, M. Gouiffès, L. Lacassagne.
Motion modeling for abnormal event detection in crowd scenes, in: IEEE International Conference on Pattern Recognition (ISCIVC), 2012. -
148C. Tadonki, L. Lacassagne, T. Saïdani, J. Falcou, K. Hamidouche.
The Harris algorithm revisited on the Cell processor, in: International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART), 2010. -
149S. Tomov, J. Dongarra, M. Baboulin.
Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5&6, pp. 232–240. -
150University of Tennessee.
PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, 2010. -
151T. L. Veldhuizen.
Active Libraries and Universal Languages, Indiana University Computer Science, May 2004.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.9.3916 -
152H. Wang, H. Andrade, B. Gedik, K.-L. Wu.
A Code Generation Approach for Auto-Vectorization in the Spade Compiler, in: LCPC'09, 2009, pp. 383-390. -
153Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, O. Le Maître.
A parallel solver for incompressible fluid flows, in: International Conference on Computational Science (ICCS 2013), Procedia Computer Science, Elsevier, 2013, vol. 18, pp. 439–448. -
154Y. Wang, M. Baboulin, K. Rupp, O. Le Maître, Y. Fraigneau.
Solving 3D Incompressible Navier-Stokes Equations on Hybrid CPU/GPU Systems, in: Proceedings of the High Performance Computing Symposium, San Diego, CA, USA, HPC '14, Society for Computer Simulation International, 2014, pp. 12:1–12:8.
http://dl.acm.org/citation.cfm?id=2663510.2663522 -
155H. Ye, L. Lacassagne, D. Etiemble, L. Cabaret, J. Falcou, O. Florent.
Impact of High Level Transforms on High Level Synthesis for motion detection algorithm, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012, pp. 1-8. -
156H. Ye, L. Lacassagne, J. Falcou, D. Etiemble, L. Cabaret, O. Florent.
High Level Transforms to reduce energy consumption of signal and image processing operators, in: IEEE International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013, pp. 247-254.