Bibliography

Major publications by the team in recent years

[1] F. Bodin, A. Seznec.

Skewed associativity improves performance and enhances predictability, in: IEEE Transactions on Computers, May 1997.
[2] M. Cornero, R. Costa, R. Fernández Pascual, A. Ornstein, E. Rohou.

An Experimental Environment Validating the Suitability of CLI as an Effective Deployment Format for Embedded Systems, in: Conference on HiPEAC, Göteborg, Sweden, P. Stenström, M. Dubois, M. Katevenis, R. Gupta, T. Ungerer (editors), Springer, January 2008, p. 130–144.
[3] R. Costa, E. Rohou.

Comparing the size of .NET applications with native code, in: 3rd International Conference on Hardware/Software Codesign and System Synthesis, Jersey City, NJ, USA, P. Eles, A. Jantsch, R. A. Bergamaschi (editors), ACM, September 2005, p. 99–104.
[4] D. Hardy, I. Puaut.

WCET analysis of multi-level non-inclusive set-associative instruction caches, in: Proc. of the 29th IEEE Real-Time Systems Symposium, Barcelona, Spain, December 2008.
[5] T. Lafage, A. Seznec.

Choosing Representative Slices of Program Execution for Microarchitecture Simulations: A Preliminary Application to the Data Stream, in: Workload Characterization of Emerging Applications, Kluwer Academic Publishers, 2000, p. 145–163.
[6] P. Michaud.

STiMuL: a Software for Modeling Steady-State Temperature in Multilayers - Description and user manual, Inria, April 2010, no. RT-0385.

http://hal.inria.fr/inria-00474286
[7] P. Michaud, Y. Sazeides, A. Seznec, T. Constantinou, D. Fetis.

A study of thread migration in temperature-constrained multi-cores, in: ACM Transactions on Architecture and Code Optimization, 2007, vol. 4, no. 2, 9 p.
[8] P. Michaud, Y. Sazeides, A. Seznec.

Proposition for a Sequential Accelerator in Future General-Purpose Manycore Processors and the Problem of Migration-Induced Cache Misses, in: ACM International Conference on Computing Frontiers, Bertinoro, Italy, May 2010.

http://hal.inria.fr/inria-00471410
[9] P. Michaud, A. Seznec, S. Jourdan.

An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors, in: International Journal of Parallel Programming, 2001, vol. 29, no. 1, p. 35–58.
[10] E. Rohou, M. Smith.

Dynamically managing processor temperature and power, in: Second Workshop on Feedback-Directed Optimizations, 1999.
[11] A. Seznec, P. Michaud.

A case for (partially) tagged geometric history length predictors, in: Journal of Instruction Level Parallelism, April 2006.

http://www.jilp.org/vol8
[12] A. Seznec, N. Sendrier.

HAVEGE: a user-level software heuristic for generating empirically strong random numbers, in: ACM Transactions on Modeling and Computer Simulation, October 2003.
[13] A. Seznec.

Analysis of the O-GEHL branch predictor, in: Proceedings of the 32nd Annual International Symposium on Computer Architecture, June 2005.
[14] A. Seznec.

The L-TAGE Branch Predictor, in: Journal of Instruction Level Parallelism, May 2007.

http://www.jilp.org/vol9
[15] A. Seznec.

Decoupled sectored caches: conciliating low tag implementation cost and low miss ratio, in: SIGARCH Comput. Archit. News, 1994, vol. 22, no. 2, p. 384–393.

http://doi.acm.org/10.1145/192007.192072

Publications of the year

Articles in International Peer-Reviewed Journals

[16] P. Michaud.

Demystifying multicore throughput metrics, in: IEEE Computer Architecture Letters, August 2012, ISSN 1556-6056. [ DOI : 10.1109/L-CA.2012.25 ]

http://hal.inria.fr/hal-00737044
[17] N. Prémillieu, A. Seznec.

SYRANT: SYmmetric Resource Allocation on Not-taken and Taken Paths, in: ACM Transactions on Architecture and Code Optimization (TACO) - HiPEAC Papers, January 2012, vol. 8, no. 4, Article no. 43. [ DOI : 10.1145/2086696.2086722 ]

http://hal.inria.fr/inria-00539647
[18] E. Rohou, K. Williams, D. Yuste.

Vectorization Technology To Improve Interpreter Performance, in: ACM Transactions on Architecture and Code Optimization, January 2013.

http://hal.inria.fr/hal-00747072

International Conferences with Proceedings

[19] B. Berna, I. Puaut.

PDPA: Period Driven Task and Cache Partitioning Algorithm for Multi-core Systems, in: 20th International Conference on Real-Time and Network Systems (RTNS 2012), Pont-à-Mousson, France, November 2012.

http://hal.inria.fr/hal-00737591
[20] D. Hardy, I. Sideris, N. Ladas, Y. Sazeides.

The performance vulnerability of architectural and non-architectural arrays to permanent faults, in: MICRO 45, Vancouver, Canada, December 2012.

http://hal.inria.fr/hal-00747488
[21] K. Kuroyanagi, A. Seznec.

Service Value Aware Memory Scheduler by Estimating Request Weight and Using per-Thread Traffic Lights, in: 3rd JILP Workshop on Computer Architecture Competitions (JWAC-3): Memory Scheduling Championship (MSC), Portland, OR, USA, Rajeev Balasubramonian (Univ. of Utah), Niladrish Chatterjee (Univ. of Utah), Zeshan Chishti (Intel), June 2012.

http://hal.inria.fr/hal-00746951
[22] J. Lai, A. Seznec.

Break Down GPU Execution Time with an Analytical Method, in: Rapido '12, Paris, France, ACM, January 2012. [ DOI : 10.1145/2162131.2162136 ]

http://hal.inria.fr/hal-00764874
[23] B. Lesage, I. Puaut, A. Seznec.

PRETI: Partitioned REal-TIme shared cache for mixed-criticality real-time systems, in: 20th International Conference on Real-Time and Network Systems (RTNS 2012), Pont-à-Mousson, France, ACM, 2012, 10 p.

http://hal.inria.fr/hal-00661687
[24] J. Marinho, V. Nélis, S. M. Petters, I. Puaut.

An Improved Preemption Delay Upper Bound for Floating Non-Preemptive Region Scheduling, in: 7th IEEE International Symposium on Industrial Embedded Systems (SIES'12), Karlsruhe, Germany, June 2012.

http://hal.inria.fr/hal-00737580
[25] J. Marinho, V. Nélis, S. M. Petters, I. Puaut.

Preemption Delay Analysis for Floating Non-Preemptive Region Scheduling, in: Design, Automation and Test in Europe 2012, Dresden, Germany, March 2012, p. 497–502.

http://hal.inria.fr/hal-00737577
[26] T. Milanez, S. Collange, F. Magno Quintão Pereira, W. Meira, R. Ferreira.

Data and Instruction Uniformity in Minimal Multi-Threading, in: 24th International Symposium on Computer Architecture and High Performance Computing, New York, NY, USA, October 2012, p. 270–277. [ DOI : 10.1109/SBAC-PAD.2012.21 ]

http://hal.inria.fr/hal-00755273
[27] E. Rohou.

Tiptop: Hardware Performance Counters for the Masses, in: 41st International Conference on Parallel Processing Workshops (ICPPW), Pittsburgh, PA, USA, September 2012, p. 404–413. [ DOI : 10.1109/ICPPW.2012.58 ]

http://hal.inria.fr/hal-00747064
[28] D. Sampaio, R. Martins, S. Collange, F. Magno Quintão Pereira.

Divergence Analysis with Affine Constraints, in: 24th International Symposium on Computer Architecture and High Performance Computing, New York, NY, USA, October 2012, p. 67–74. [ DOI : 10.1109/SBAC-PAD.2012.22 ]

http://hal.inria.fr/hal-00650235
[29] R. A. Velasquez, P. Michaud, A. Seznec.

BADCO: Behavioral Application-Dependent Superscalar Core Model, in: SAMOS XII: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, Samos, Greece, July 2012.

http://hal.inria.fr/hal-00707346

Internal Reports

[30] J. Lai, A. Seznec.

Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory, Inria, April 2012, no. RR-7923.

http://hal.inria.fr/hal-00686006
[31] P. Michaud.

Constant-work multiprogram throughput metrics for microarchitecture studies, Inria, November 2012, no. RR-8150.

http://hal.inria.fr/hal-00758195
[32] A. Perais, A. Seznec.

Revisiting Value Prediction, Inria, November 2012, no. RR-8155, 22 p.

http://hal.inria.fr/hal-00758713
[33] R. A. Velasquez, P. Michaud, A. Seznec.

Selecting Benchmark Combinations for the Evaluation of Multicore Throughput, October 2012, 23 p.

http://hal.inria.fr/hal-00737446

References in notes

[34] G. M. Amdahl.

Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities, in: AFIPS Spring Joint Computer Conference (SJCC), 1967, p. 483–485.
[35] N. Brunie, S. Collange, G. Diamos.

Simultaneous Branch and Warp Interweaving for Sustained GPU Performance, in: ISCA Conference Proceedings, Portland, OR, USA, June 2012, p. 49–60. [ DOI : 10.1109/ISCA.2012.6237005 ]

http://hal-ens-lyon.archives-ouvertes.fr/ensl-00649650
[36] D. Burger, T. M. Austin.

The SimpleScalar tool set, version 2.0, 1997.
[37] R. S. Chappell, J. Stark, S. P. Kim, S. K. Reinhardt, Y. N. Patt.

Simultaneous subordinate microthreading (SSMT), in: ISCA '99: Proceedings of the 26th annual international symposium on Computer architecture, Washington, DC, USA, IEEE Computer Society, 1999, p. 186–195.

http://doi.acm.org/10.1145/300979.300995
[38] C. Ferdinand, R. Wilhelm.

Efficient and Precise Cache Behavior Prediction for Real-Time Systems, in: Real-Time Syst., 1999, vol. 17, no. 2–3, p. 131–181.

http://dx.doi.org/10.1023/A:1008186323068
[39] T. S. Karkhanis, J. E. Smith.

A First-Order Superscalar Processor Model, in: Proceedings of the International Symposium on Computer Architecture, Los Alamitos, CA, USA, IEEE Computer Society, 2004, p. 338.

http://doi.ieeecomputersociety.org/10.1109/ISCA.2004.1310786
[40] B. Lee, J. Collins, H. Wang, D. Brooks.

CPR: composable performance regression for scalable multiprocessor models, in: Proceedings of the 41st International Symposium on Microarchitecture, 2008.
[41] Y. Liang, T. Mitra.

Cache modeling in probabilistic execution time analysis, in: DAC '08: Proceedings of the 45th annual conference on Design automation, New York, NY, USA, ACM, 2008, p. 319–324.

http://doi.acm.org/10.1145/1391469.1391551
[42] T. Lundqvist, P. Stenström.

Timing Anomalies in Dynamically Scheduled Microprocessors, in: RTSS '99: Proceedings of the 20th IEEE Real-Time Systems Symposium, Washington, DC, USA, IEEE Computer Society, 1999.
[43] M. Paolieri, E. Quiñones, F. J. Cazorla, R. I. Davis, M. Valero.

IA3: An Interference Aware Allocation Algorithm for Multicore Hard Real-Time Systems, in: 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2011), 2011, p. 280–290.

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5767118
[44] L. Rauchwerger, Y. Zhang, J. Torrellas.

Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors, in: HPCA '98: Proceedings of the 4th International Symposium on High-Performance Computer Architecture, Washington, DC, USA, IEEE Computer Society, 1998, p. 162.
[45] T. Sherwood, E. Perelman, G. Hamerly, B. Calder.

Automatically characterizing large scale program behavior, in: Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, 2002, p. 45–57.
[46] K. Skadron, M. Stan, W. Huang, S. Velusamy.

Temperature-aware microarchitecture, in: Proceedings of the International Symposium on Computer Architecture, 2003.
[47] J. G. Steffan, C. Colohan, A. Zhai, T. C. Mowry.

The STAMPede approach to thread-level speculation, in: ACM Trans. Comput. Syst., 2005, vol. 23, no. 3, p. 253–300.

http://doi.acm.org/10.1145/1082469.1082471
[48] V. Suhendra, T. Mitra.

Exploring locking & partitioning for predictable shared caches on multi-cores, in: DAC '08: Proceedings of the 45th annual conference on Design automation, New York, NY, USA, ACM, 2008, p. 300–303.

http://doi.acm.org/10.1145/1391469.1391545
[49] D. M. Tullsen, S. Eggers, H. M. Levy.

Simultaneous Multithreading: Maximizing On-Chip Parallelism, in: Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995.
[50] J. Yan, W. Zhang.

WCET Analysis for Multi-Core Processors with Shared L2 Instruction Caches, in: Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS '08), 2008, p. 80–89.
