Major publications by the team in recent years
1. G. Antoniu, L. Bougé, P. Hatcher, M. MacBeth, K. McGuigan, R. Namyst.
The Hyperion system: Compiling multithreaded Java bytecode for distributed execution, in: Parallel Computing, October 2001, vol. 27, p. 1279–1297.
2. O. Aumage, L. Bougé, A. Denis, L. Eyraud, J.-F. Méhaut, G. Mercier, R. Namyst, L. Prylli.
A Portable and Efficient Communication Library for High-Performance Cluster Computing (extended version), in: Cluster Computing, January 2002, vol. 5, no. 1, p. 43–54.
3. O. Aumage, É. Brunet, N. Furmento, R. Namyst.
NewMadeleine: a Fast Communication Scheduling Engine for High Performance Networks, in: CAC 2007: Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS 2007, Long Beach, California, USA, March 2007. Also available as LaBRI Report 1421-07 and INRIA RR-6085.
http://hal.inria.fr/inria-00127356
4. O. Aumage, G. Mercier.
MPICH/MadIII: a Cluster of Clusters Enabled MPI Implementation, in: Proc. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), Tokyo, IEEE, May 2003, p. 26–35.
5. O. Aumage, G. Mercier, R. Namyst.
MPICH/Madeleine: a True Multi-Protocol MPI for High-Performance Networks, in: Proc. 15th International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, IEEE, April 2001, 51 p. Extended proceedings in electronic form only.
6. F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, R. Namyst.
hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, in: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), Pisa, Italy, IEEE Computer Society Press, February 2010, p. 180–186. [DOI: 10.1109/PDP.2010.67]
http://hal.inria.fr/inria-00429889
7. F. Broquedis, N. Furmento, B. Goglin, P.-A. Wacrenier, R. Namyst.
ForestGOMP: an efficient OpenMP environment for NUMA architectures, in: International Journal on Parallel Programming, Special Issue on OpenMP; Guest Editors: Matthias S. Müller and Eduard Ayguadé, 2010, vol. 38, no. 5, p. 418–439. [DOI: 10.1007/s10766-010-0136-3]
http://hal.inria.fr/inria-00496295
8. D. Buntinas, G. Mercier, W. Gropp.
Implementation and Shared-Memory Evaluation of MPICH2 over the Nemesis Communication Subsystem, in: Recent Advances in Parallel Virtual Machine and Message Passing Interface: Proc. 13th European PVM/MPI Users Group Meeting, Bonn, Germany, September 2006.
9. V. Danjean, R. Namyst, R. Russell.
Linux Kernel Activations to Support Multithreading, in: Proc. 18th IASTED International Conference on Applied Informatics (AI 2000), Innsbruck, Austria, IASTED, February 2000, p. 718–723.
10. B. Goglin, N. Furmento.
Finding a Tradeoff between Host Interrupt Load and MPI Latency over Ethernet, in: Proceedings of the IEEE International Conference on Cluster Computing, New Orleans, LA, IEEE Computer Society Press, September 2009.
http://hal.inria.fr/inria-00397328
11. S. Moreaud, B. Goglin.
Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, in: The 19th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2007), Cambridge, Massachusetts, November 2007.
http://hal.inria.fr/inria-00175747
12. R. Namyst.
Contribution à la conception de supports exécutifs multithreads performants, Université Claude Bernard de Lyon, for work carried out at the École normale supérieure de Lyon, December 2001, Habilitation à diriger des recherches.
13. S. Thibault, F. Broquedis, B. Goglin, R. Namyst, P.-A. Wacrenier.
An Efficient OpenMP Runtime System for Hierarchical Architectures, in: International Workshop on OpenMP (IWOMP), Beijing, China, June 2007, p. 148–159.
http://hal.inria.fr/inria-00154502
14. S. Thibault, R. Namyst, P.-A. Wacrenier.
Building Portable Thread Schedulers for Hierarchical Multiprocessors: the BubbleSched Framework, in: EuroPar, Rennes, France, ACM, August 2007.
http://hal.inria.fr/inria-00154506
15. F. Trahay, É. Brunet, A. Denis, R. Namyst.
A multithreaded communication engine for multicore architectures, in: CAC 2008: Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS 2008, Miami, FL, IEEE Computer Society Press, April 2008.
http://hal.inria.fr/inria-00224999
16. F. Trahay, A. Denis, O. Aumage, R. Namyst.
Improving Reactivity and Communication Overlap in MPI using a Generic I/O Manager, in: EuroPVM/MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, F. Cappello, T. Herault, J. Dongarra (editors), Lecture Notes in Computer Science, Springer, 2007, no. 4757, p. 170–177.
http://hal.inria.fr/inria-00177167
Doctoral Dissertations and Habilitation Theses
17. C. Augonnet.
Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective, Université Sciences et Technologies - Bordeaux I, December 2011.
18. S. Moreaud.
Mouvement de données et placement des tâches pour les communications haute performance sur machines hiérarchiques, Université Sciences et Technologies - Bordeaux I, October 2011.
http://hal.inria.fr/tel-00635651/en
Articles in International Peer-Reviewed Journals
19. C. Augonnet, S. Thibault, R. Namyst, P.-A. Wacrenier.
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, in: Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009, February 2011, vol. 23, p. 187–198. [DOI: 10.1002/cpe.1631]
http://hal.inria.fr/inria-00550877
20. S. Benkner, S. Pllana, J. L. Träff, P. Tsigas, U. Dolinsky, C. Augonnet, B. Bachmayer, C. Kessler, D. Moloney, V. Osipov.
PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems, in: IEEE Micro, 2011, vol. 31, no. 5, p. 28–41. [DOI: 10.1109/MM.2011.67]
http://hal.inria.fr/hal-00648480/en
21. A. Benoit, L.-C. Canon, E. Jeannot, Y. Robert.
Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms, in: Journal of Scheduling, May 2011.
http://hal.inria.fr/hal-00653477/en
22. B. Goglin.
High-Performance Message Passing over generic Ethernet Hardware with Open-MX, in: Journal of Parallel Computing, February 2011, vol. 37, no. 2, p. 85–100. [DOI: 10.1016/j.parco.2010.11.001]
http://hal.inria.fr/inria-00533058/en
23. B. Goglin.
NIC-assisted cache-efficient receive stack for message passing over Ethernet, in: Concurrency and Computation: Practice and Experience, 2011, vol. 23, no. 2, p. 199–210. [DOI: 10.1002/cpe.1632]
http://hal.inria.fr/inria-00496301/en
24. B. Goglin, J. Squyres, S. Thibault.
Hardware Locality: Peering under the hood of your server, in: Linux Pro Magazine, July 2011, no. 128, p. 28–33.
http://hal.inria.fr/inria-00597961/en
25. E. Jeannot, E. Saule, D. Trystram.
Optimizing Performance and Reliability on Heterogeneous Parallel Systems: Approximation Algorithms and Heuristics, in: Journal of Parallel and Distributed Computing, 2012, vol. 72, no. 2, p. 268–280. [DOI: 10.1016/j.jpdc.2011.11.003]
International Conferences with Proceedings
26. E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, S. Tomov.
LU Factorization for Accelerator-based Systems, in: 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), Sharm El-Sheikh, Egypt, June 2011.
http://hal.inria.fr/hal-00654193/en
27. E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, S. Tomov.
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, in: 25th IEEE International Parallel & Distributed Processing Symposium, Anchorage, United States, May 2011.
http://hal.inria.fr/inria-00547614/en
28. S. Benkner, S. Pllana, J. Larsson Träff, P. Tsigas, A. Richards, R. Namyst, B. Bachmayer, C. Kessler, D. Moloney, P. Sanders.
The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures, in: ParCo, Ghent, Belgium, 2011.
http://hal.inria.fr/hal-00661320
29. É. Brunet, F. Trahay, A. Denis, R. Namyst.
A sampling-based approach for communication libraries auto-tuning, in: IEEE International Conference on Cluster Computing, Austin, United States, September 2011.
http://hal.inria.fr/inria-00605735/en
30. L.-C. Canon, E. Jeannot.
MO-Greedy: an extended beam-search approach for solving a multi-criteria scheduling problem on heterogeneous machines, in: International Heterogeneity in Computing Workshop, Anchorage, United States, September 2011.
http://hal.inria.fr/hal-00653724/en
31. L.-C. Canon, E. Jeannot, J. Weissman.
A Scheduling and Certification Algorithm for Defeating Collusion in Desktop Grids, in: International Conference on Distributed Computing Systems, Minneapolis, United States, July 2011.
http://hal.inria.fr/hal-00653493/en
32. U. Dastgeer, C. Kessler, S. Thibault.
Flexible runtime support for efficient skeleton programming on hybrid systems, in: International Conference on Parallel Computing (ParCo), Ghent, Belgium, August 2011.
http://hal.inria.fr/inria-00606200/en
33. A. Denis.
A High-Performance Superpipeline Protocol for InfiniBand, in: Euro-Par 2011, Bordeaux, France, E. Jeannot, R. Namyst, J. Roman (editors), Lecture Notes in Computer Science, Springer, August 2011, vol. 6853, p. 276–287.
http://hal.inria.fr/inria-00586015/en
34. B. Goglin, S. Moreaud.
Dodging Non-Uniform I/O Access in Hierarchical Collective Operations for Multicore Clusters, in: CASS 2011: The 1st Workshop on Communication Architecture for Scalable Systems, held in conjunction with IPDPS 2011, Anchorage, United States, May 2011, 7 p.
http://hal.inria.fr/inria-00566246/en
35. T. Ma, G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, J. Dongarra.
Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs, in: 40th International Conference on Parallel Processing (ICPP-2011), Taipei, Taiwan, Province Of China, September 2011.
http://hal.inria.fr/inria-00602877/en
36. A. Mazouz, S.-A.-A. Touati, D. Barthou.
Analysing the Variability of OpenMP Programs Performances on Multicore Architectures, in: Fourth Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2011), Heraklion, Greece, held in conjunction with the 6th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), 2011, 14 p.
http://hal.inria.fr/inria-00637957/en
37. G. Mercier, E. Jeannot.
Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, in: EuroMPI, Santorini, Greece, Springer Verlag, September 2011, vol. 6960, p. 39–49. [DOI: 10.1007/978-3-642-24449-0]
http://hal.inria.fr/hal-00643151/en
38. B. Putigny, B. Goglin, D. Barthou.
Performance modeling for power consumption reduction on SCC, in: 4th Many-core Applications Research Community (MARC) Symposium, Potsdam, Germany, H. Plattner (editor), December 2011.
http://hal.inria.fr/hal-00649635/en
39. F. Trahay, F. Rue, M. Faverge, Y. Ishikawa, R. Namyst, J. Dongarra.
EZTrace: a generic framework for performance analysis, in: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Newport Beach, CA, United States, May 2011, Poster Session.
http://hal.inria.fr/inria-00587216/en
40. S. Yi, E. Jeannot, D. Kondo, D. P. Anderson.
Towards Real-Time, Volunteer Distributed Computing, in: 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011), Newport Beach, CA, United States, 2011.
http://hal.inria.fr/hal-00654691/en
National Conferences with Proceedings
41. S. Mahmoudi, P. Manneback, C. Augonnet, S. Thibault.
Détection optimale des coins et contours dans des bases d'images volumineuses sur architectures multicœurs hétérogènes, in: Rencontres francophones du parallélisme, Saint-Malo, France, May 2011.
http://hal.inria.fr/inria-00606195/en
42. H. Sylvain.
Programmation multi-accélérateurs unifiée en OpenCL, in: RenPAR'20, Saint-Malo, France, May 2011.
http://hal.inria.fr/hal-00643257/en
Scientific Books (or Scientific Book chapters)
43. P. Vicat-Blanc Primet, B. Goglin, R. Guillier, S. Soudan.
Computing Networks: From Cluster to Cloud Computing, Wiley-ISTE, May 2011.
http://hal.inria.fr/inria-00590739/en
44. P. de Oliveira Castro, S. Louise, D. Barthou.
Programming Multi-core and Many-core Computing Systems, Wiley-Blackwell, 2012, to appear.
Books or Proceedings Editing
45. E. Jeannot, R. Namyst, J. Roman (editors).
Euro-Par 2011 Parallel Processing - 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part I, Lecture Notes in Computer Science, Springer, 2011, vol. 6852.
46. E. Jeannot, R. Namyst, J. Roman (editors).
Euro-Par 2011 Parallel Processing - 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part II, Lecture Notes in Computer Science, Springer, 2011, vol. 6853.
Scientific Popularization
47. B. Goglin.
De votre boulangerie à un système d'exploitation multiprocesseur, in: Interstices, February 2011.
http://hal.inria.fr/inria-00566232/en
48. B. Goglin.
Et plus vite si affinités..., in: Interstices, June 2011.
http://hal.inria.fr/inria-00604025/en
49. R. Namyst.
Virtualization of Hybrid Architectures, in: Super-computers: at the frontiers of extreme computing, November 2011.
Other Publications
50. S. Barascou.
Optimisation des communications pour les calculs parallèles avec SALOME/YACS et PadicoTM, Université Sciences et Technologies - Bordeaux I, September 2011.
http://hal.inria.fr/hal-00652882/en
51. A.-E. Hugo.
Composabilité de codes parallèles sur architectures hétérogènes, Université Sciences et Technologies - Bordeaux I, 2011.
http://hal.inria.fr/inria-00619654/en
52. J. Jaeger, D. Barthou.
Stencils sur CPU et GPU, December 2011, Quatrième rencontres de la communauté française de compilation, Saint-Hippolyte, France.
53. R. Namyst.
Programming heterogeneous, accelerator-based multicore machines: current situation and main challenges, May 2011, Invited Talk.
http://hal.inria.fr/inria-00590670/en
54. B. Putigny, D. Barthou, B. Goglin.
Modélisation du coût de la cohérence de cache pour améliorer le tuilage de boucles, December 2011, Quatrième rencontres de la communauté française de compilation, Saint-Hippolyte, France.
55. C. Roelandt.
Association de modèles de programmation pour l'exploitation de clusters de GPUs dans le calcul intensif, Université Sciences et Technologies - Bordeaux I, June 2011.
56. C. Rossignon.
Étude du GMRES dans un code de simulation de réservoir, Université Sciences et Technologies - Bordeaux I, June 2011.
57. P. Balaji, H.-W. Jin, K. Vaidyanathan, D. K. Panda.
Supporting iWARP Compatibility and Features for Regular Network Adapters, in: Proceedings of the Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations, and Technologies (RAIT); held in conjunction with the IEEE International Conference on Cluster Computing, Boston, MA, September 2005.
58. G. Ciaccio, G. Chiola.
GAMMA and MPI/GAMMA on GigabitEthernet, in: Proceedings of 7th EuroPVM-MPI conference, Balatonfüred, Hungary, Lecture Notes in Computer Science, Springer Verlag, September 2000, vol. 1908.
59. G. R. Gao, T. Sterling, R. Stevens, M. Hereld, W. Zhu.
Hierarchical multithreading: programming model and system software, in: 20th International Parallel and Distributed Processing Symposium (IPDPS), April 2006.
60. B. Goglin, S. Moreaud.
KNEM: a Generic and Scalable Kernel-Assisted Intra-node MPI Communication Framework, in: Journal of Parallel and Distributed Computing, 2012, submitted.
61. A. Mazouz, S.-A.-A. Touati, D. Barthou.
Study of Variations of Native Program Execution Times on Multi-Core Architectures, in: Intl. IEEE Workshop on Multi-Core Computing Systems, Krakow, Poland, IEEE Computer Society, February 2010, p. 919–924.