Bibliography

Major publications by the team in recent years

1G. Antoniu, L. Bougé, P. Hatcher, M. MacBeth, K. McGuigan, R. Namyst.

The Hyperion system: Compiling multithreaded Java bytecode for distributed execution, in: Parallel Computing, October 2001, vol. 27, p. 1279–1297.
2O. Aumage, L. Bougé, A. Denis, L. Eyraud, J.-F. Méhaut, G. Mercier, R. Namyst, L. Prylli.

A Portable and Efficient Communication Library for High-Performance Cluster Computing (extended version), in: Cluster Computing, January 2002, vol. 5, no. 1, p. 43-54.
3O. Aumage, É. Brunet, N. Furmento, R. Namyst.

NewMadeleine: a Fast Communication Scheduling Engine for High Performance Networks, in: CAC 2007: Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS 2007, Long Beach, California, USA, March 2007, Also available as LaBRI Report 1421-07 and INRIA RR-6085.

http://hal.inria.fr/inria-00127356
4O. Aumage, G. Mercier.

MPICH/MadIII: a Cluster of Clusters Enabled MPI Implementation, in: Proc. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), Tokyo, IEEE, May 2003, p. 26–35.
5O. Aumage, G. Mercier, R. Namyst.

MPICH/Madeleine: a True Multi-Protocol MPI for High-Performance Networks, in: Proc. 15th International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, IEEE, April 2001, 51 p, Extended proceedings in electronic form only.
6F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, R. Namyst.

hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, in: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), Pisa, Italy, IEEE Computer Society Press, February 2010, p. 180–186. [ DOI : 10.1109/PDP.2010.67 ]

http://hal.inria.fr/inria-00429889
7F. Broquedis, N. Furmento, B. Goglin, P.-A. Wacrenier, R. Namyst.

ForestGOMP: an efficient OpenMP environment for NUMA architectures, in: International Journal on Parallel Programming, Special Issue on OpenMP; Guest Editors: Matthias S. Müller and Eduard Ayguadé, 2010, vol. 38, no. 5, p. 418-439. [ DOI : 10.1007/s10766-010-0136-3 ]

http://hal.inria.fr/inria-00496295
8D. Buntinas, G. Mercier, W. Gropp.

Implementation and Shared-Memory Evaluation of MPICH2 over the Nemesis Communication Subsystem, in: Recent Advances in Parallel Virtual Machine and Message Passing Interface: Proc. 13th European PVM/MPI Users Group Meeting, Bonn, Germany, September 2006.
9V. Danjean, R. Namyst, R. Russell.

Linux Kernel Activations to Support Multithreading, in: Proc. 18th IASTED International Conference on Applied Informatics (AI 2000), Innsbruck, Austria, IASTED, February 2000, p. 718-723.
10B. Goglin, N. Furmento.

Finding a Tradeoff between Host Interrupt Load and MPI Latency over Ethernet, in: Proceedings of the IEEE International Conference on Cluster Computing, New Orleans, LA, IEEE Computer Society Press, September 2009.

http://hal.inria.fr/inria-00397328
11S. Moreaud, B. Goglin.

Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, in: The 19th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2007), Cambridge, Massachusetts, November 2007.

http://hal.inria.fr/inria-00175747
12R. Namyst.

Contribution à la conception de supports exécutifs multithreads performants, Université Claude Bernard de Lyon, pour des travaux effectués à l'école normale supérieure de Lyon, December 2001, Habilitation à diriger des recherches.
13S. Thibault, F. Broquedis, B. Goglin, R. Namyst, P.-A. Wacrenier.

An Efficient OpenMP Runtime System for Hierarchical Architectures, in: International Workshop on OpenMP (IWOMP), Beijing, China, June 2007, p. 148–159.

http://hal.inria.fr/inria-00154502
14S. Thibault, R. Namyst, P.-A. Wacrenier.

Building Portable Thread Schedulers for Hierarchical Multiprocessors: the BubbleSched Framework, in: EuroPar, Rennes, France, ACM, August 2007.

http://hal.inria.fr/inria-00154506
15F. Trahay, É. Brunet, A. Denis, R. Namyst.

A multithreaded communication engine for multicore architectures, in: CAC 2008: Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS 2008, Miami, FL, IEEE Computer Society Press, April 2008.

http://hal.inria.fr/inria-00224999
16F. Trahay, A. Denis, O. Aumage, R. Namyst.

Improving Reactivity and Communication Overlap in MPI using a Generic I/O Manager, in: EuroPVM/MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, F. Cappello, T. Herault, J. Dongarra (editors), Lecture Notes in Computer Science, Springer, 2007, no. 4757, p. 170-177.

http://hal.inria.fr/inria-00177167

Publications of the year

Doctoral Dissertations and Habilitation Theses

17C. Augonnet.

Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective, Université Sciences et Technologies - Bordeaux I, December 2011.
18S. Moreaud.

Mouvement de données et placement des tâches pour les communications haute performance sur machines hiérarchiques, Université Sciences et Technologies - Bordeaux I, October 2011.

http://hal.inria.fr/tel-00635651/en

Articles in International Peer-Reviewed Journals

19C. Augonnet, S. Thibault, R. Namyst, P.-A. Wacrenier.

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, in: Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009, February 2011, vol. 23, p. 187–198. [ DOI : 10.1002/cpe.1631 ]

http://hal.inria.fr/inria-00550877
20S. Benkner, S. Pllana, J. L. Träff, P. Tsigas, U. Dolinsky, C. Augonnet, B. Bachmayer, C. Kessler, D. Moloney, V. Osipov.

PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems, in: IEEE Micro, 2011, vol. 31, no. 5, p. 28-41. [ DOI : 10.1109/MM.2011.67 ]

http://hal.inria.fr/hal-00648480/en
21A. Benoit, L.-C. Canon, E. Jeannot, Y. Robert.

Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms, in: Journal of Scheduling, May 2011.

http://hal.inria.fr/hal-00653477/en
22B. Goglin.

High-Performance Message Passing over generic Ethernet Hardware with Open-MX, in: Journal of Parallel Computing, February 2011, vol. 37, no. 2, p. 85-100. [ DOI : 10.1016/j.parco.2010.11.001 ]

http://hal.inria.fr/inria-00533058/en
23B. Goglin.

NIC-assisted cache-efficient receive stack for message passing over Ethernet, in: Concurrency and Computation: Practice and Experience, 2011, vol. 23, no. 2, p. 199-210. [ DOI : 10.1002/cpe.1632 ]

http://hal.inria.fr/inria-00496301/en
24B. Goglin, J. Squyres, S. Thibault.

Hardware Locality: Peering under the hood of your server, in: Linux Pro Magazine, July 2011, no. 128, p. 28-33.

http://hal.inria.fr/inria-00597961/en
25E. Jeannot, E. Saule, D. Trystram.

Optimizing Performance and Reliability on Heterogeneous Parallel Systems: Approximation Algorithms and Heuristics, in: Journal of Parallel and Distributed Computing, 2012, vol. 72, no. 2, p. 268–280. [ DOI : 10.1016/j.jpdc.2011.11.003 ]

International Conferences with Proceedings

26E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, S. Tomov.

LU Factorization for Accelerator-based Systems, in: 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), Sharm El-Sheikh, Egypt, June 2011.

http://hal.inria.fr/hal-00654193/en
27E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, S. Tomov.

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, in: 25th IEEE International Parallel & Distributed Processing Symposium, Anchorage, United States, May 2011.

http://hal.inria.fr/inria-00547614/en
28S. Benkner, S. Pllana, J. Larsson Träff, P. Tsigas, A. Richards, R. Namyst, B. Bachmayer, C. Kessler, D. Moloney, P. Sanders.

The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures, in: ParCo, Ghent, Belgium, 2011.

http://hal.inria.fr/hal-00661320
29É. Brunet, F. Trahay, A. Denis, R. Namyst.

A sampling-based approach for communication libraries auto-tuning, in: IEEE International Conference on Cluster Computing, Austin, United States, September 2011.

http://hal.inria.fr/inria-00605735/en
30L.-C. Canon, E. Jeannot.

MO-Greedy: an extended beam-search approach for solving a multi-criteria scheduling problem on heterogeneous machines, in: International Heterogeneity in Computing Workshop, Anchorage, United States, September 2011.

http://hal.inria.fr/hal-00653724/en
31L.-C. Canon, E. Jeannot, J. Weissman.

A Scheduling and Certification Algorithm for Defeating Collusion in Desktop Grids, in: International Conference on Distributed Computing Systems, Minneapolis, United States, July 2011.

http://hal.inria.fr/hal-00653493/en
32U. Dastgeer, C. Kessler, S. Thibault.

Flexible runtime support for efficient skeleton programming on hybrid systems, in: International conference on Parallel Computing (ParCo), Ghent, Belgium, August 2011.

http://hal.inria.fr/inria-00606200/en
33A. Denis.

A High-Performance Superpipeline Protocol for InfiniBand, in: Euro-Par 2011, Bordeaux, France, E. Jeannot, R. Namyst, J. Roman (editors), Lecture Notes in Computer Science, Springer, August 2011, vol. 6853, p. 276-287.

http://hal.inria.fr/inria-00586015/en
34B. Goglin, S. Moreaud.

Dodging Non-Uniform I/O Access in Hierarchical Collective Operations for Multicore Clusters, in: CASS 2011: The 1st Workshop on Communication Architecture for Scalable Systems, held in conjunction with IPDPS 2011, Anchorage, United States, May 2011, 7 p.

http://hal.inria.fr/inria-00566246/en
35T. Ma, G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, J. Dongarra.

Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs, in: 40th International Conference on Parallel Processing (ICPP-2011), Taipei, Taiwan, September 2011.

http://hal.inria.fr/inria-00602877/en
36A. Mazouz, S.-A.-A. Touati, D. Barthou.

Analysing the Variability of OpenMP Programs Performances on Multicore Architectures, in: Fourth Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2011), Heraklion, Greece, Held in conjunction with: the 6th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), 2011, 14 p.

http://hal.inria.fr/inria-00637957/en
37G. Mercier, E. Jeannot.

Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, in: EuroMPI, Santorini, Greece, Springer Verlag, September 2011, vol. 6960, p. 39-49. [ DOI : 10.1007/978-3-642-24449-0 ]

http://hal.inria.fr/hal-00643151/en
38B. Putigny, B. Goglin, D. Barthou.

Performance modeling for power consumption reduction on SCC, in: 4th Many-core Applications Research Community (MARC) Symposium, Potsdam, Germany, H. Plattner (editor), December 2011.

http://hal.inria.fr/hal-00649635/en
39F. Trahay, F. Rue, M. Faverge, Y. Ishikawa, R. Namyst, J. Dongarra.

EZTrace: a generic framework for performance analysis, in: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Newport Beach, CA, United States, May 2011, Poster Session.

http://hal.inria.fr/inria-00587216/en
40S. Yi, E. Jeannot, D. Kondo, D. P. Anderson.

Towards Real-Time, Volunteer Distributed Computing, in: 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011), Newport Beach, CA, United States, 2011.

http://hal.inria.fr/hal-00654691/en

National Conferences with Proceedings

41S. Mahmoudi, P. Manneback, C. Augonnet, S. Thibault.

Détection optimale des coins et contours dans des bases d'images volumineuses sur architectures multicœurs hétérogènes, in: Rencontres francophones du parallélisme, Saint-Malo, France, May 2011.

http://hal.inria.fr/inria-00606195/en
42S. Henry.

Programmation multi-accélérateurs unifiée en OpenCL, in: RenPAR'20, Saint-Malo, France, May 2011.

http://hal.inria.fr/hal-00643257/en

Scientific Books (or Scientific Book chapters)

43P. Vicat-Blanc Primet, B. Goglin, R. Guillier, S. Soudan.

Computing Networks: From Cluster to Cloud Computing, Wiley-ISTE, May 2011.

http://hal.inria.fr/inria-00590739/en
44P. de Oliveira Castro, S. Louise, D. Barthou.

Programming Multi-core and Many-core Computing Systems, Wiley-Blackwell, 2012, To Appear.

Books or Proceedings Editing

45E. Jeannot, R. Namyst, J. Roman (editors)

Euro-Par 2011 Parallel Processing - 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part I, Lecture Notes in Computer Science, Springer, 2011, vol. 6852.
46E. Jeannot, R. Namyst, J. Roman (editors)

Euro-Par 2011 Parallel Processing - 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part II, Lecture Notes in Computer Science, Springer, 2011, vol. 6853.

Scientific Popularization

47B. Goglin.

De votre boulangerie à un système d'exploitation multiprocesseur, in: Interstices, February 2011.

http://hal.inria.fr/inria-00566232/en
48B. Goglin.

Et plus vite si affinités..., in: Interstices, June 2011.

http://hal.inria.fr/inria-00604025/en
49R. Namyst.

Virtualization of Hybrid Architectures, in: Super-computers: at the frontiers of extreme computing, November 2011.

Other Publications

50S. Barascou.

Optimisation des communications pour les calculs parallèles avec SALOME/YACS et PadicoTM, Université Sciences et Technologies - Bordeaux I, September 2011.

http://hal.inria.fr/hal-00652882/en
51A.-E. Hugo.

Composabilité de codes parallèles sur architectures hétérogènes, Université Sciences et Technologies - Bordeaux I, 2011.

http://hal.inria.fr/inria-00619654/en
52J. Jaeger, D. Barthou.

Stencils sur CPU et GPU, December 2011, Quatrième rencontres de la communauté française de compilation, Saint-Hippolyte, France.
53R. Namyst.

Programming heterogeneous, accelerator-based multicore machines: current situation and main challenges, May 2011, Invited Talk.

http://hal.inria.fr/inria-00590670/en
54B. Putigny, D. Barthou, B. Goglin.

Modélisation du coût de la cohérence de cache pour améliorer le tuilage de boucles, December 2011, Quatrième rencontres de la communauté française de compilation, Saint-Hippolyte, France.
55C. Roelandt.

Association de modèles de programmation pour l'exploitation de clusters de GPUs dans le calcul intensif, Université Sciences et Technologies - Bordeaux I, June 2011.
56C. Rossignon.

Étude du GMRES dans un code de simulation de réservoir, Université Sciences et Technologies - Bordeaux I, June 2011.

References in notes

57P. Balaji, H.-W. Jin, K. Vaidyanathan, D. K. Panda.

Supporting iWARP Compatibility and Features for Regular Network Adapters, in: Proceedings of the Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations, and Technologies (RAIT); held in conjunction with the IEEE International Conference on Cluster Computing, Boston, MA, September 2005.
58G. Ciaccio, G. Chiola.

GAMMA and MPI/GAMMA on GigabitEthernet, in: Proceedings of 7th EuroPVM-MPI conference, Balatonfüred, Hungary, Lecture Notes in Computer Science, Springer Verlag, September 2000, vol. 1908.
59G. R. Gao, T. Sterling, R. Stevens, M. Hereld, W. Zhu.

Hierarchical multithreading: programming model and system software, in: 20th International Parallel and Distributed Processing Symposium (IPDPS), April 2006.
60B. Goglin, S. Moreaud.

KNEM: a Generic and Scalable Kernel-Assisted Intra-node MPI Communication Framework, in: Journal of Parallel and Distributed Computing, 2012, Submitted.
61A. Mazouz, S.-A.-A. Touati, D. Barthou.

Study of Variations of Native Program Execution Times on Multi-Core Architectures, in: Intl. IEEE Workshop on Multi-Core Computing Systems, Krakow, Poland, IEEE Computer Society, February 2010, p. 919–924.
