Bibliography

Major publications by the team in recent years

[1] M. Baboulin, D. Becker, J. Dongarra.

A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14–24.
[2] M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.

A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
[3] M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.

Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no. 2.
[4] M. Baboulin, S. Gratton.

A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no. 3, pp. 685–699.
[5] M. Bahi, C. Eisenbeis.

Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
[6] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.

Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, article no. 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]

http://hal.inria.fr/hal-00926513
[7] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.

The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]

https://hal.inria.fr/hal-01061305
[8] P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.

Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no. 5, pp. 72–77.
[9] A. Ferreira Leite.

A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI; Universidade de Brasília, December 2014.

https://hal.inria.fr/tel-01097295
[10] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.

Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296–327. [ DOI : 10.1007/s10766-010-0161-2 ]
[11] M. Kruse.

Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.

https://hal.inria.fr/tel-01078440
[12] S. Tomov, J. Dongarra, M. Baboulin.

Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no. 5–6, pp. 232–240.

Publications of the year

Doctoral Dissertations and Habilitation Theses

[13] A. Rémy.

Solving dense linear systems on accelerated multicore architectures, Université Paris-Sud, July 2015.

https://hal.inria.fr/tel-01206837

International Conferences with Proceedings

[14] M. Baboulin, J. Dongarra, A. Rémy, S. Tomov, I. Yamazaki.

Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures, in: 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Lecture Notes in Computer Science, September 2015.

https://hal.inria.fr/hal-01223022
[15] M. Baboulin, A. Jamal, M. Sosonkina.

Using Random Butterfly Transformations in Parallel Schur Complement-Based Preconditioning, in: 8th Workshop on Computer Aspects of Numerical Algorithms (CANA'15), Lodz, Poland, September 2015.

https://hal.inria.fr/hal-01223090
[16] M. Baboulin, A. Khabou, A. Rémy.

A Randomized LU-based Solver Using GPU and Intel Xeon Phi Accelerators, in: HeteroPar'2015, Vienna, Austria, August 2015.

https://hal.inria.fr/hal-01223018
[17] L. Bagnères, O. Zinenko, S. Huot, C. Bastoul.

Opening Polyhedral Compiler's Black Box, in: CGO 2016 - 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization, Barcelona, Spain, March 2016.

https://hal.inria.fr/hal-01253322
[18] A. Ferreira Leite, V. C. Alves, G. Nunes Rodrigues, C. Tadonki, C. Eisenbeis, A. C. Magalhaes Alves de Melo.

Automating Resource Selection and Configuration in Inter-clouds through a Software Product Line Method, in: 2015 IEEE 8th International Conference on Cloud Computing (CLOUD), New York City, United States, July 2015, pp. 726–733. [ DOI : 10.1109/CLOUD.2015.101 ]

https://hal-mines-paristech.archives-ouvertes.fr/hal-01252985
[19] G. W. Howell, M. Baboulin.

LU Preconditioning for Overdetermined Sparse Least Squares Problems, in: 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Lecture Notes in Computer Science, September 2015.

https://hal.inria.fr/hal-01223069
[20] L. Lacassagne, L. Cabaret, D. Etiemble.

Parallel light speed labeling: the world’s fastest connected component labeling for multicore processors, in: International Conference on Image Processing, Quebec, Canada, IEEE, September 2015, 8 p.

https://hal.inria.fr/hal-01243310
[21] I. Masliah, M. Baboulin, J. Falcou.

Metaprogramming dense linear algebra solvers. Applications to multi and many-core architectures, in: 13th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA 2015), Helsinki, Finland, August 2015.

https://hal.inria.fr/hal-01221358

Scientific Books (or Scientific Book chapters)

[22] M. Baboulin, J. Dongarra, R. Lacroix.

Computing least squares condition numbers on hybrid multicore/GPU systems, in: Interdisciplinary Topics in Applied Mathematics, Modeling and Computational Science, Springer International Publishing, 2015, vol. 117. [ DOI : 10.1007/978-3-319-12307-3_6 ]

https://hal.inria.fr/hal-01204804

Internal Reports

[23] M. Baboulin, J. Falcou, I. Masliah.

Meta-programming and Multi-stage Programming for GPGPUs, Inria Saclay Île-de-France; Paris-Sud XI, September 2015, no. RR-8780.

https://hal.inria.fr/hal-01204661

Other Publications

[24] M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, S. Tomov.

Towards a High-Performance Tensor Algebra Package for Accelerators, August 2015, Smoky Mountains Computational Sciences and Engineering Conference (SMC 2015), Poster.

https://hal.archives-ouvertes.fr/hal-01231234

References in notes

[25] The HiPEAC vision on high-performance and embedded architecture and compilation (2012-2020), 2012.

http://www.hipeac.net/roadmap
[26] European Union Framework Program 6 MILEPOST project No. 035307 (MachIne Learning for Embedded PrOgramS opTimization).

http://cordis.europa.eu/project/rcn/79763_en.html
[27] PRACE: Partnership for Advanced Computing in Europe.

http://www.prace-project.eu
[28] AMD.

AMD Core Math Library.

http://developer.amd.com/libraries/acml/
[29] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen.

LAPACK Users' Guide, SIAM, 1999, Third edition.
[30] K. Aneja, F. Laguzet, L. Lacassagne, A. Mérigot.

Video rate image segmentation by means of region splitting and merging, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009.
[31] M. Arioli, M. Baboulin, S. Gratton.

A partial condition number for linear least-squares problems, in: SIAM J. Matrix Anal. and Appl., 2007, vol. 29, no. 2, pp. 413–433.
[32] K. Asanovic.

The landscape of parallel computing research: a view from Berkeley, Electrical Engineering and Computer Sciences, University of California at Berkeley, December 2006, no. UCB/EECS-2006-183.

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
[33] H. Avron, P. Maymounkov, S. Toledo.

Blendenpick: Supercharging LAPACK’s least-squares solvers, in: SIAM J. Sci. Comput., 2010, vol. 32, pp. 1217–1236.
[34] M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.

An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, 2014, vol. 40, no. 7, pp. 213–223.
[35] M. Baboulin, D. Becker, J. Dongarra.

A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14–24.
[36] M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, S. Tomov.

Accelerating scientific computations with mixed precision algorithms, in: Computer Physics Communications, 2009, vol. 180, no. 12, pp. 2526–2533.
[37] M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.

A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
[38] M. Baboulin, J. Dongarra, J. Demmel, S. Tomov, V. Volkov.

Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, November 15, 2008.

http://www.lri.fr/~baboulin/SC08.pdf
[39] M. Baboulin, J. Dongarra, S. Gratton, J. Langou.

Computing the conditioning of the components of a linear least squares solution, in: Numerical Linear Algebra with Applications, 2009, vol. 16, no. 7, pp. 517–533.
[40] M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.

Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no. 2.
[41] M. Baboulin, J. Dongarra, R. Lacroix.

Computing least squares condition numbers on hybrid multicore/GPU systems, in: Proceedings of the International Conference of Applied Mathematics, Modeling and Computational Science (AMMCS 2013), 2013.
[42] M. Baboulin, J. Dongarra, S. Tomov.

Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, in: 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), Lecture Notes in Computer Science, Springer-Verlag, 2008, vol. 6126–6127.
[43] M. Baboulin, S. Gratton.

A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no. 3, pp. 685–699.
[44] M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.

Statistical estimates for the conditioning of linear least squares problems, in: 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2014, vol. 8384, pp. 124–133.
[45] M. Baboulin, X. S. Li, F.-H. Rouet.

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, in: Proceedings of VECPAR 2014, 2014.
[46] J. C. Baez, M. Stay.

Algorithmic thermodynamics, in: Mathematical Structures in Computer Science, 2012, vol. 22, no. 5, pp. 771–787.

http://dx.doi.org/10.1017/S0960129511000521
[47] M. Bahi, C. Eisenbeis.

Spatial complexity of reversibly computable DAG, in: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, ACM, 2009, pp. 47–56.
[48] M. Bahi, C. Eisenbeis.

Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
[49] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.

Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, article no. 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]

http://hal.inria.fr/hal-00926513
[50] D. Barthou, G. Grosdidier, C. Eisenbeis, P. Guichon, M. Kruse, O. Pene, K. Petrov, C. Tadonki.

PetaQCD: En Route for the automatic code generation for lattice QCD, in: Proceedings of the 29th International Symposium on Lattice field theory (Lattice 2011), 2011, vol. 2011.
[51] P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall.

Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid, in: High Performance Computing Conference (HiPC), December 2013.
[52] D. Becker, M. Baboulin, J. Dongarra.

Reducing the amount of pivoting in symmetric indefinite systems, in: 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2012, vol. 7203, pp. 133–142.
[53] T. Betcke, N. J. Higham, V. Mehrmann, C. Schröder, F. Tisseur.

NLEVP: A Collection of Nonlinear Eigenvalue Problems, in: ACM Trans. Math. Software, February 2013, vol. 39, no. 2, pp. 7:1–7:28. [ DOI : 10.1145/2427023.2427024 ]
[54] L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R. Whaley.

ScaLAPACK Users' Guide, SIAM, 1997, pp. 58–60.
[55] Blaze.

The Blaze Library, 2014.

https://bitbucket.org/blaze-lib/blaze
[56] G. Bradski.

The OpenCV Library, in: Dr. Dobb's Journal of Software Tools, 2000.
[57] L. Cabaret, L. Lacassagne.

A Review of World's Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014, pp. 1–8.
[58] L. Cabaret, L. Lacassagne.

What is the world's fastest Connected Component Labeling Algorithm?, in: IEEE International Workshop on Signal Processing Systems (SiPS), 2014, pp. 1–6.
[59] V. G. Cerf.

Where is the science in computer science?, in: Communications of the ACM, 2012, vol. 55, no. 10, p. 5.
[60] M. O. Cheema, L. Lacassagne, O. Hammami.

System-Platforms-Based SystemC TLM Design of Image Processing Chains for Embedded Applications, in: EURASIP Journal on Embedded Systems, 2007, pp. 1–14. [ DOI : 10.1155/2007/71043 ]
[61] P. Courbin, A. Pédron, T. Saidani, L. Lacassagne.

Parallélisation d'opérateurs de TI: multi-coeurs, Cell ou GPU ? [Parallelization of image processing operators: multicore, Cell or GPU?], in: GRETSI, 2009.
[62] K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, T. L. Veldhuizen.

Generative Programming and Active Libraries, in: Generic Programming, 1998, pp. 25–39.
[63] P. I. Davies, N. J. Higham.

Numerically Stable Generation of Correlation Matrices and their Factors, in: BIT, 2000, vol. 40, no. 4, pp. 640–651.
[64] J. W. Demmel, L. Grigori, M. Hoemmen, J. Langou.

Communication-optimal parallel and sequential QR and LU factorizations, in: SIAM Journal on Scientific Computing, 2012, vol. 34, no. 1, pp. 206–239.
[65] J. W. Demmel, A. McKenney.

A Test Matrix Generation Suite, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA, March 1989, no. MCS-P69-0389, 16 p., LAPACK Working Note 9.
[66] J. Dongarra et al.

The International Exascale Software Project roadmap, in: Int. J. High Perform. Comput. Appl., February 2011, vol. 25, no. 1, pp. 3–60.

http://dx.doi.org/10.1177/1094342010391989
[67] A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.

A smart sensor based vision system: implementation and evaluation, in: Journal of Physics D: Applied Physics, 2006, vol. 39, pp. 1694–1705. [ DOI : 10.1088/0022-3727/39/8/033 ]
[68] A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.

A Smart Architecture for Low-Level Image Computing, in: International Journal of Computer Sciences and Application, 2008, vol. 5, no. 3, pp. 1–19.
[69] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.

The numerical template toolbox: A modern C++ design for scientific computing, in: Journal of Parallel and Distributed Computing, 2014.
[70] P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.

Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no. 5, pp. 72–77.
[71] P. Estérie, M. Gaunard, J. Falcou.

A proposal to add single instruction multiple data computation to the standard library, in: ISO/IEC JTC1/SC22/WG21 (C++ Standards Committee) paper N3561, 2013.
[72] D. Etiemble, S. Piskorski, L. Lacassagne.

Performance evaluation of Altera C2H compiler on image processing benchmarks, in: TCHA: Workshop on Tools And Compiler for Hardware Acceleration, 2006.
[73] J. Falcou, L. Lacassagne, S. Schaetz.

Cell MPI: Mastering the Cell Broadband Engine architecture through a Boost based parallel communication library, in: Boost Conference, 2011.
[74] J. Falcou, T. Saidani, L. Lacassagne, D. Etiemble.

Programmation par squelettes algorithmiques pour le processeur Cell [Algorithmic skeleton programming for the Cell processor], in: SYMPA, 2008.
[75] J. Falcou, J. Sérot, L. Pech, J.-T. Lapresté.

Meta-programming applied to automatic SMP parallelization of linear algebra code, in: Euro-Par 2008–Parallel Processing, Springer Berlin Heidelberg, 2008, pp. 729–738.
[76] G. Fursin, C. Dubach.

Experience report: community-driven reviewing and validation of publications, in: Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), ACM, 2014.

http://dx.doi.org/10.1145/2618137.2618142
[77] G. Fursin.

Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, in: Proceedings of the GCC Developers' Summit, June 2009.
[78] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.

Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296–327. [ DOI : 10.1007/s10766-010-0161-2 ]
[79] G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. Del Vento.

Collective Mind: towards practical and collaborative auto-tuning, in: Special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, 2014.
[80] M. Gouiffès, F. Laguzet, L. Lacassagne.

Color Connectedness Degree For Mean-Shift Tracking, in: IEEE International Conference on Pattern Recognition (ICPR), 2010.
[81] M. Gouiffès, F. Laguzet, L. Lacassagne.

Projection Histogram For Mean-Shift Tracking, in: IEEE International Conference on Image Processing (ICIP), 2010.
[82] C. Grana, D. Borghesani, R. Cucchiara.

Connected Component Labeling Techniques on Modern Architectures, in: ICIAP, IEEE, 2009, pp. 816–824.
[83] L. Grigori, J. Demmel, H. Xiang.

CALU: a communication optimal LU factorization algorithm, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, pp. 1317–1350.
[84] M. Gu, S. C. Eisenstat.

Efficient Algorithms for Computing a Strong Rank-revealing QR Factorization, in: SIAM Journal on Scientific Computing, July 1996, vol. 17, no. 4, pp. 848–869.

http://dx.doi.org/10.1137/0917055
[85] S. Guelton, J. Falcou, P. Brunet.

Exploring the vectorization of python constructs using pythran and boost SIMD, in: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, ACM, 2014, pp. 79–86.
[86] G. Guennebaud, B. Jacob.

Eigen v3, 2010.

http://eigen.tuxfamily.org
[87] N. Halko, P. G. Martinsson, J. A. Tropp.

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, in: SIAM Review, 2011, vol. 53, pp. 217–288.
[88] C. Harris, M. Stephens.

A combined corner and edge detector, in: 4th ALVEY Vision Conference, Editions Hermes, Paris, 1988.
[89] L. He, Y. Chao, K. Suzuki.

A run-based two-scan labeling algorithm, in: ICIAR, LNCS 4633, 2007, pp. 131–142.
[90] R. M. Heiberger.

Algorithm AS 127: Generation of Random Orthogonal Matrices, in: J. Roy. Statist. Soc. Ser. C (Applied Statistics), 1978, vol. 27, no. 2, pp. 199–206.
[91] N. J. Higham.

J-Orthogonal Matrices: Properties and Generation, in: SIAM Rev., September 2003, vol. 45, no. 3, pp. 504–519. [ DOI : 10.1137/S0036144502414930 ]
[92] G. E. Hinton, S. Osindero.

A fast learning algorithm for deep belief nets, in: Neural Computation, 2006, vol. 18.
[93] S. Horowitz, T. Pavlidis.

Picture segmentation by a tree traversal algorithm, in: Journal of the ACM, 1976, vol. 23, pp. 368–388.
[94] T. Ikegami, T. Sakurai, U. Nagashima.

A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, in: Journal of Computational and Applied Mathematics, 2010, vol. 233, no. 8, pp. 1927–1936.
[95] Intel.

Math Kernel Library.

http://developer.intel.com/software/products/mkl/
[96] V. Jimenez, I. Gelado, L. Vilanova, M. Gil, G. Fursin, N. Navarro.

Predictive runtime code scheduling for heterogeneous architectures, in: Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), January 2009.
[97] C. S. Kenney, A. J. Laub.

Small-sample statistical condition estimates for general matrix functions, in: SIAM J. Sci. Comput., 1994, vol. 15, pp. 36–61.
[98] A. Khabou, J. Demmel, L. Grigori, M. Gu.

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version, in: SIAM Journal on Matrix Analysis and Applications, 2013, vol. 34, no. 3, pp. 1401–1429.

http://epubs.siam.org/doi/abs/10.1137/120863691
[99] M. Kruse.

Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Université Paris-Sud 11, September 26, 2014.
[100] T. Kunlin, L. Lacassagne, A. Mérigot.

A Fast image segmentation scheme, in: International Conference on Information and Communication Technologies, IEEE, 2004.
[101] L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.

High Level Transforms for SIMD and low-level computer vision algorithms, in: ACM Workshop on Programming Models for SIMD/Vector Processing (PPoPP), 2014, pp. 49–56.
[102] L. Lacassagne, D. Etiemble, S. Kablia.

16-bit Floating Point Instructions for embedded Multimedia Applications, in: CAMP: Computer Architecture and Machine Perception, IEEE, 2005.
[103] L. Lacassagne, D. Etiemble.

16-bit floating point operations for low-end and high-end embedded processors, in: ODES: Optimizations for DSP and Embedded Systems, IEEE/ACM, 2005.
[104] L. Lacassagne, A. Manzanera, J. Denoulet, A. Mérigot.

High Performance Motion Detection: Some trends toward new embedded architectures for vision systems, in: Journal of Real Time Image Processing, October 2008, pp. 127–148. [ DOI : 10.1007/s11554-008-0096-7 ]
[105] L. Lacassagne, B. Zavidovique.

Light Speed Labeling for RISC architectures, in: IEEE International Conference on Image Processing (ICIP), 2009.
[106] L. Lacassagne, B. Zavidovique.

Light Speed Labeling: efficient connected component labeling on RISC architectures, in: Journal of Real-Time Image Processing, 2011, vol. 6, no. 2, pp. 117–135.
[107] F. Laguzet, M. Gouiffès, L. Lacassagne.

Automatic color space switching for robust tracking, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
[108] F. Laguzet, A. Romero, M. Gouiffès, L. Lacassagne, D. Etiemble.

Color tracking with contextual switching: Real-time implementation on CPU, in: Journal of Real-Time Image Processing, 2013, pp. 1–18.
[109] J. Lambert, L. Lacassagne, G. Rougeron, S. Le Berre, S. Chatillon.

High Performance simulation of ultrasonic fields for Non Destructive Testing, in: International Symposium in Nuclear Application and Monte-Carlo, 2013.
[110] J. Lambert, A. Pédron, G. Gens, F. Bimbard, L. Lacassagne, E. Iakovleva, S. Le Berre.

Analysis of multicore CPU and GPU toward parallelization of Total Focusing Method ultrasound reconstruction, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012.
[111] J. Lambert, G. Rougeron, L. Lacassagne, S. Chatillon.

A fast ultrasonic simulation tool based on massively parallel implementations, in: Review of Progress in Quantitative Nondestructive Evaluation, 2013.
[112] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng.

Building high-level features using large scale unsupervised learning, in: International Conference on Machine Learning, 2012.
[113] W. Ledermann, C. Alexander, D. Ledermann.

Random Orthogonal Matrix Simulation, in: Linear Algebra Appl., 2011, vol. 434, no. 6, pp. 1444–1467. [ DOI : 10.1016/j.laa.2010.10.023 ]
[114] A. Leite, C. Tadonki, C. Eisenbeis, A. de Melo.

A Fine-grained Approach for Power Consumption Analysis and Prediction, in: Procedia Computer Science, 2014, vol. 29, pp. 2260–2271.
[115] S. Liu, C. Eisenbeis, J.-L. Gaudiot.

A theoretical framework for value prediction in parallel systems, in: 39th International Conference on Parallel Processing (ICPP 2010), IEEE, 2010, pp. 11–20.
[116] M. W. Mahoney.

Randomized algorithms for matrices and data, in: Foundations and Trends in Machine Learning, 2011, vol. 3, no. 2, pp. 123–224.
[117] D. Menard, R. Serizel, R. Rocher, O. Sentieys.

Accuracy Constraint Determination in Fixed-Point System Design, in: Journal on Embedded Systems (JES), 2008, vol. 2008, pp. 1–12. [ DOI : 10.1155/2008/242584 ]
[118] P. Monasse, F. Guichard.

Fast computation of contrast-invariant image representation, in: Transaction on, 2000, vol. 9, no. 5, pp. 860–872.
[119] S. Moufawad.

Demmel type communication-avoiding generalized minimal residual method (CA-GMRES) on multicore hardwares: an application in QCD, American University of Beirut, Beirut, Lebanon, June 2011, defended on June 10, 2010.
[120] M. Odersky.

An Overview of the SCALA Programming Language, EPFL Lausanne, Switzerland, 2004, no. IC/2004/64.
[121] D. S. Parker.

Random Butterfly Transformations with Applications in Computational Linear Algebra, Computer Science Department, UCLA, 1995, no. CSD-950023.
[122] M. Pharr, W. R. Mark.

ISPC: A SPMD Compiler for High-Performance CPU Programming, in: Innovative Parallel Computing (InPar), 2012.
[123] S. Piskorski, L. Lacassagne, D. Etiemble.

IPLG: un outil pour la fusion d'opérateurs en Traitement d'Images [IPLG: a tool for operator fusion in image processing], in: SYMPA, 2009.
[124] S. Piskorski, L. Lacassagne, M. Kieffer, D. Etiemble.

Efficient floating point interval processing for embedded systems and applications, in: SCAN: International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics, 2006.
[125] S. Pop, A. Cohen, C. Bastoul, S. Girbal, G.-A. Silber, N. Vasilache.

GRAPHITE: Loop optimizations based on the polyhedral model for GCC, in: Proc. of the 4th GCC Developers' Summit, June 2006, pp. 179–198.
[126] A. Pédron, L. Lacassagne, V. Barbillon, F. Bimbard, G. Rougeron, S. Le Berre.

Performance analysis of an ultrasound reconstruction algorithm for non-destructive testing, in: IEEE International Conference on Parallel Computing (ParCo), 2011.
[127] A. Pédron, L. Lacassagne, F. Bimbard, S. Le Berre.

Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2011.
[128] A. Romero, M. Gouiffès, L. Lacassagne.

Feature Points tracking adaptative to Saturation, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
[129] A. Romero, M. Gouiffès, L. Lacassagne.

Covariance Descriptor Multiple Object Tracking and Re-Identification with Colorspace Evaluation, in: IEEE ACCV - Workshop on Detection and Tracking in Challenging Environments, 2012.
[130] A. Romero, M. Gouiffès, L. Lacassagne.

Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, in: ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013.
[131] A. Romero, L. Lacassagne, M. Gouiffès.

Real-time covariance tracking algorithm for embedded systems, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2013.
[132] A. Rosenfeld, J. L. Pfaltz.

Sequential operations in digital picture processing, in: Journal of the ACM, 1966, vol. 13, no. 4, pp. 471–494.
[133] A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.

Locality optimization on a NUMA architecture for hybrid LU factorization, in: International Conference on Parallel Computing (PARCO 2013), Advances in Parallel Computing, IOS Press, 2014, vol. 25, pp. 153–162.
[134] T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, D. Etiemble.

Algorithmic Skeletons within an Embedded Domain Specific Language for the Cell Processor, in: Parallel Architectures and Compilation Techniques, PACT, 2009, pp. 67–76.
[135] T. Saidani, L. Lacassagne, S. Bouaziz, T. M. Khan.

Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor, in: Lecture Notes in Computer Science, Springer, 2007, pp. 104–112. [ DOI : 10.1007/978-3-540-74742-0 ]
[136] T. Saidani, S. Piskorski, L. Lacassagne, S. Bouaziz.

Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study of Image Processing Algorithm, in: PACT-MEDEA, 2007, pp. 15–19.
[137] C. Sanderson.

Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments, in: Report Version, 2010, vol. 2.
[138] J. Siek, L.-Q. Lee, A. Lumsdaine.

Boost Graph Library, June 2000.

http://www.boost.org/libs/graph/
[139] D. Spinellis.

Notable design patterns for domain-specific languages, in: Journal of Systems and Software, 2001, vol. 56, no. 1, pp. 91–99. [ DOI : 10.1016/S0164-1212(00)00089-3 ]

http://www.sciencedirect.com/science/article/pii/S0164121200000893
[140] G. W. Stewart.

The Efficient Generation of Random Orthogonal Matrices With an Application to Condition Estimators, in: SIAM J. Numer. Anal., 1980, vol. 17, no. 3, pp. 403–409.
[141] A. K. Sujeeth, A. Gibbons, K. J. Brown, H. Lee, T. Rompf, M. Odersky, K. Olukotun.

Forge: Generating a High Performance DSL Implementation from a Declarative Specification, in: 12th International Conference on Generative Programming: Concepts and Experiences, 2013.
[142] A. K. Sujeeth, T. Rompf, K. J. Brown, H. Lee, H. Chafi, V. Popic, M. Wu, A. Prokopec, V. Jovanovic, M. Odersky, K. Olukotun.

Composition and Reuse with Compiled Domain-Specific Languages, in: ECOOP'13: European Conference on Object-Oriented Programming, 2013.
[143] V. Sundriyal, M. Sosonkina, A. Gaenko, Z. Zhang.

Energy saving strategies for parallel applications with point-to-point communication phases, in: Journal of Parallel and Distributed Computing, 2013. [ DOI : 10.1016/j.jpdc.2013.03.006 ]
[144] V. Sundriyal, M. Sosonkina, Z. Zhang.

Automatic runtime frequency-scaling system for energy savings in parallel applications, in: The Journal of Supercomputing, 2014, vol. 68, no. 2, pp. 777–797.
[145] K. Suzuki, I. Horiba, N. Sugie.

Linear-time connected component labeling based on sequential local operations, in: Computer Vision and Image Understanding, January 2003, vol. 89, no. 1, pp. 1–23. [ DOI : 10.1016/S1077-3142(02)00030-9 ]
[146] H. Tabia, M. Gouiffès, L. Lacassagne.

Motion histogram quantification for human action recognition, in: IEEE International Conference on Pattern Recognition (ICPR), 2012.
[147] H. Tabia, M. Gouiffès, L. Lacassagne.

Motion modeling for abnormal event detection in crowd scenes, in: IEEE International Conference on Pattern Recognition (ISCIVC), 2012.
[148] C. Tadonki, L. Lacassagne, T. Saïdani, J. Falcou, K. Hamidouche.

The Harris algorithm revisited on the Cell processor, in: International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART), 2010.
[149] S. Tomov, J. Dongarra, M. Baboulin.

Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no. 5–6, pp. 232–240.
[150] University of Tennessee.

PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, 2010.
[151] T. L. Veldhuizen.

Active Libraries and Universal Languages, Indiana University Computer Science, May 2004.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.9.3916
[152] H. Wang, H. Andrade, B. Gedik, K.-L. Wu.

A Code Generation Approach for Auto-Vectorization in the Spade Compiler, in: LCPC'09, 2009, pp. 383–390.
[153] Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, O. Le Maître.

A parallel solver for incompressible fluid flows, in: International Conference on Computational Science (ICCS 2013), Procedia Computer Science, Elsevier, 2013, vol. 18, pp. 439–448.
[154] Y. Wang, M. Baboulin, K. Rupp, O. Le Maître, Y. Fraigneau.

Solving 3D Incompressible Navier-Stokes Equations on Hybrid CPU/GPU Systems, in: Proceedings of the High Performance Computing Symposium, San Diego, CA, USA, HPC '14, Society for Computer Simulation International, 2014, pp. 12:1–12:8.

http://dl.acm.org/citation.cfm?id=2663510.2663522
[155] H. Ye, L. Lacassagne, D. Etiemble, L. Cabaret, J. Falcou, O. Florent.

Impact of High Level Transforms on High Level Synthesis for motion detection algorithm, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012, pp. 1–8.
[156] H. Ye, L. Lacassagne, J. Falcou, D. Etiemble, L. Cabaret, O. Florent.

High Level Transforms to reduce energy consumption of signal and image processing operators, in: IEEE International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013, pp. 247–254.
