# **Activity Report 2011**

# **Project-Team CAIRN**

# Energy Efficient Computing ArchItectures with Embedded Reconfigurable Resources

IN COLLABORATION WITH: Institut de recherche en informatique et systèmes aléatoires (IRISA)

RESEARCH CENTER: Rennes - Bretagne-Atlantique

THEME: **Architecture and Compiling**

# **Table of contents**

- 1. Members
- 2. Overall Objectives
  - 2.1. Overall Objectives
  - 2.2. Highlights
- 3. Scientific Foundations
  - 3.1. Panorama
  - 3.2. Dynamically and Heterogeneous Reconfigurable Platforms
  - 3.3. Compilation and Synthesis for Reconfigurable Platform
  - 3.4. Algorithm Architecture Interaction
- 4. Application Domains
  - 4.1. Panorama
  - 4.2. 4G Wireless Communication Systems
  - 4.3. Wireless Sensor Networks
  - 4.4. Multimedia processing
- 5. Software
  - 5.1. Panorama
  - 5.2. Gecos
  - 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems
  - 5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software System
  - 5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions
  - 5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP 10-01)
  - 5.7. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip
  - 5.8. OCHRE: On-Chip Randomness Extraction
- 6. New Results
  - 6.1. Dynamically and Heterogeneous Reconfigurable Platforms
    - 6.1.1. New Reconfigurable Architectures
      - 6.1.1.1. Power models of reconfigurable architectures
      - 6.1.1.2. High-level modeling of reconfigurable architectures
      - 6.1.1.3. Reconfiguration controller
    - 6.1.2. Management of Dynamically Reconfigurable Systems
      - 6.1.2.1. Spatio-Temporal Scheduling based on Artificial Neural Networks
      - 6.1.2.2. Flexible Communication OS Service
    - 6.1.3. Fault-Tolerant Reconfigurable Architectures
    - 6.1.4. Low-Power Architectures
      - 6.1.4.1. Ultra Low-Power Architecture for Control-Oriented Applications in Wireless Sensor Nodes
      - 6.1.4.2. Wakeup Time and Wakeup Energy Estimation in Power-Gated Logic Clusters
    - 6.1.5. Arithmetic Operators for Cryptography
      - 6.1.5.1. ECC Processor with Protections Against SCA
      - 6.1.5.2. Arithmetic Operators for High-Performance Cryptography
    - 6.1.6. SoC Modeling and Prototyping on FPGA-based Systems
  - 6.2. Compilation and Synthesis for Reconfigurable Platform
    - 6.2.1. Polyhedral based loop transformations for High-Level synthesis
    - 6.2.2. Reconfigurable Processor Extensions Generation
    - 6.2.3. Run-time Reconfigurable Architecture Modeling
    - 6.2.4. Floating-Point to Fixed-Point Conversion
  - 6.3. Algorithm Architecture Interaction
    - 6.3.1. Flexible hardware accelerators for biocomputing applications
    - 6.3.2. Range Estimation and Computation Accuracy Optimization
      - 6.3.2.1. Range Estimation
      - 6.3.2.2. Accuracy and performance evaluation
    - 6.3.3. Reconfigurable Video Coding
    - 6.3.4. Multi-Antenna Systems
    - 6.3.5. Cooperative Strategies for Low-Energy Wireless Networks
    - 6.3.6. Opportunistic Routing
    - 6.3.7. Adaptive techniques for WSN power optimization
- 7. Contracts and Grants with Industry
  - 7.1. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)
  - 7.2. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)
  - 7.3. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)
  - 7.4. ANR ARPEGE - GRECO (2010-2013)
  - 7.5. Images and Networks competitiveness cluster - 100GFlex project (2010-2013)
  - 7.6. NANO2012 Program - S2S4HLS (2008-2012)
  - 7.7. NANO2012 Program - RecMotifs (2008-2012)
  - 7.8. ANR Architectures du Futur Open-People (2009-2012)
  - 7.9. ANR BioWiC (2009-2011)
  - 7.10. ANR Architectures du Futur - CIFAER (2008-2011)
  - 7.11. ANR Architectures du Futur - FOSFOR (2008-2011)
- 8. Partnerships and Cooperations
  - 8.1. Regional Initiatives
  - 8.2. National Initiatives
  - 8.3. European Initiatives
    - 8.3.1. FP7 Projects
    - 8.3.2. Collaborations in European Programs, except FP7
    - 8.3.3. Major European Organizations with which Cairn has followed Collaborations
  - 8.4. International Initiatives
    - 8.4.1. INRIA Associate Teams
    - 8.4.2. INRIA International Partners
  - 8.5. Exterior research visitors
- 9. Dissemination
  - 9.1. Animation of the Scientific Community
  - 9.2. Seminars and Invitations
  - 9.3. Teaching and Responsibilities
    - 9.3.1. Teaching Responsibilities
    - 9.3.2. Teaching
    - 9.3.3. HDR and PhD
- 10. Bibliography

**Keywords:** Hardware Accelerators, Compiling, Embedded Systems, Energy Consumption, Parallelism, Sensor Networks, Security, Signal Processing, Reconfigurable Architectures, System-on-Chip, High-Level Synthesis, Arithmetic Operators, Wireless Communications, Cooperative Communications

CAIRN is a common project with CNRS, University of Rennes 1 (ENSSAT Lannion and ISTIC/ESIR Rennes) and ENS Cachan-Antenne de Bretagne, and is located on two sites: Rennes and Lannion. The team was created on January 1<sup>st</sup>, 2008 and is a "reconfiguration" of the former R2D2 research team from Irisa.

# 1. Members

#### **Research Scientists**

- François Charot [Research Associate (CR) Inria, Rennes]
- Steven Derrien [Associate professor, University of Rennes 1, ISTIC, on leave at Inria since Sept. 2009, Rennes, HdR]
- Daniel Menard [Associate professor, University of Rennes 1, ENSSAT, on leave at Inria since Sept. 2011, Lannion, HdR]
- Olivier Sentieys [Team Leader, Professor, University of Rennes 1, ENSSAT, on leave (half time) at Inria, Lannion, HdR]
- Arnaud Tisserand [Research Associate (CR) CNRS, Lannion, HdR]

#### **Faculty Members**

- Olivier Berder [Associate professor, University of Rennes 1, ENSSAT, Lannion]
- Emmanuel Casseau [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Daniel Chillet [Associate professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Sébastien Pillement [Associate professor, University of Rennes 1, IUT, Lannion, HdR]
- Matthieu Gautier [Associate professor, University of Rennes 1, IUT, Lannion, since Nov. 2011]
- Patrice Quinton [Professor, Director of the Brittany branch of the ENS de Cachan, Rennes, HdR]
- Romuald Rocher [Associate Professor, University of Rennes 1, IUT, Lannion]
- Pascal Scalart [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Christophe Wolinski [Professor, University of Rennes 1, Director of ESIR, Rennes, HdR]
- Stanislaw Piestrak [Professor, on leave from University of Metz at Inria since Sept.
2008 until Aug. 2011, Lannion, HdR]

#### **Technical Staff**

- Charles Wagner [IR CNRS SED, until Aug. 2011, Rennes]
- Philippe Quémerais [Research Engineer (half time), University of Rennes 1, ENSSAT, Lannion]
- Arnaud Carer [100Gflex Project, Lannion]
- Romain Fontaine [Perecap Project, Lannion]
- Remi Pallas [POF Project, Lannion]
- Nicolas Simon [DEFIS Project since Nov. 2011, Lannion]
- Amit Kumar [Nano 2012 Project until Sept. 2011, Rennes]
- Maxime Naullet [IJD INRIA KerGekoz Project, Rennes]
- Manh Pham [Cifaer ANR Project until Dec. 2011, Lannion]
- Vaibhav Bhatnagar [IC INRIA SNOW Project, Lannion]

#### **PhD Students**

- Michel Thériault [CSRNG Canada grant (co-supervision with Laval University, Québec), Lannion]
- Antoine Eiche [University grant, Lannion]
- Quoc-Tuong Ngo [Region/CG22 University grant, Lannion]
- Andrei Banciu [CIFRE grant, STMicroelectronics, Grenoble]
- Karthick Parashar [Inria Cordi grant, Lannion]
- Antoine Floch [Inria grant, Rennes]
- Antoine Morvan [Inria grant, Rennes]
- Naeem Abbas [Inria grant, Rennes]
- Le Quang Vinh Tran [MENRT grant, Lannion]
- Chenglong Xiao [Inria grant, Lannion]
- Jean-Charles Naud [Inria grant, Lannion]
- Matthieu Texier [CEA grant, Saclay]
- Thomas Chabrier [Brittany Region/CG22 University grant, Lannion]
- Danuta Pamula [Co-tutelle France-Poland, Lannion]
- Robin Bonamy [University grant, Lannion]
- Vivek D. Tovinakere [University grant, Lannion]
- Mahtab Alam [University grant, Lannion]
- Amine Didioui [CEA grant, Grenoble]
- Hervé Yviquel [MENRT grant, Lannion]
- Istas Pratomo [Indonesian Gov. grant, Lannion]
- Aymen Chakhari [Brittany Region INRIA grant, Lannion]
- Trong-Nhan Le [University grant, Lannion]
- Pramod P. Udupa [University grant, Lannion]
- Ganda-Stéphane Ouedraogo [MENRT grant, Lannion]
- Karim Bigou [INRIA/DGA grant, Lannion]
- Franck Bucheron [DGA grant, Rennes]
- Romain Brillu [CIFRE grant, Thales, Palaiseau]
- Riham Nehmeh [CIFRE grant, STMicroelectronics, Grenoble]
- Quang-Hai Khuat [Brittany Region/CG22 University grant, Lannion]

#### **Post-Doctoral Fellows**

- Ruifeng Zhang [Since Apr. 2010, Lannion]
- Cécile Beaumin [ATER Univ. Rennes 1 since Oct. 2011, Lannion]
- Ludovic Devaux [ATER Univ. Rennes 1 since Oct. 2011, Lannion]
- Kevin Martin [ATER Univ. Rennes 1 until Aug. 2011, Rennes]
- Quentin Meunier [NANO2012 Project until Aug. 2011, Lannion]

#### **Administrative Assistant**

- Nadia Saintpierre [Assistant, INRIA, Rennes]

# 2. Overall Objectives

## 2.1. Overall Objectives

The scientific aim of CAIRN is to study hardware and software architectures of *Reconfigurable System-on-Chip* (RSoC), i.e. integrated chips which include reconfigurable blocks whose hardware configuration may be changed before or even during execution.

Reconfigurable systems have been studied in computer science and electrical engineering for about twenty years [95], [102], thanks to the possibilities opened up initially by Field Programmable Gate Array (FPGA) technology and more recently by reconfigurable processors [92], [3], [9]. In an FPGA, a particular hardware configuration is obtained by loading a bit-stream that is used to shape parameterizable blocks into specific hardware functions. In a reconfigurable processor, coarse-grained logic elements operate on word-size operands and employ reconfigurable operators as computing elements. They are generally tightly coupled with one or more processor cores and act as reconfigurable computing accelerators. Usually, the configuration streams are small enough to allow run-time (or dynamic) reconfiguration.
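
To make the idea of "shaping" a parameterizable block with a bit-stream concrete, here is a minimal sketch of a look-up table (LUT), the kind of fine-grained element found in most FPGAs. The class name `Lut`, the string encoding of the configuration and the 2-input size are illustrative assumptions, not a model of any specific device or of CAIRN's architectures.

```python
# Toy model of a k-input look-up table (LUT). The same block implements
# different logic functions depending on the configuration bits loaded into it.
# All names and encodings here are hypothetical, chosen for illustration only.

class Lut:
    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        # One configuration bit per possible input pattern.
        self.table = [0] * (1 << num_inputs)

    def configure(self, bits: str) -> None:
        """Load a configuration: bits[i] is the output for input pattern i."""
        if len(bits) != len(self.table) or set(bits) - {"0", "1"}:
            raise ValueError("configuration must hold exactly 2**k binary digits")
        self.table = [int(b) for b in bits]

    def evaluate(self, *inputs: int) -> int:
        """Return the configured output; inputs[0] is the least significant address bit."""
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.table[address]


lut = Lut(2)

lut.configure("0001")  # only pattern 3 (both inputs at 1) yields 1: logical AND
and_out = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]

lut.configure("0110")  # same block, new configuration: logical XOR
xor_out = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]
```

A 2-input LUT needs only 4 configuration bits, but a full FPGA contains a very large number of such blocks plus programmable routing, which is why bit-level configurations are much larger than those of coarse-grained reconfigurable processors.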
In a broader sense, hardware reconfiguration may happen not only in a single chip, but also in a distributed hardware system, in order to adapt this system to changing conditions. This happens, for example, on a mobile system.

Recent evolutions in technology and modern hardware systems confirm that reconfigurable chips are increasingly used in modern applications or embedded into more general Systems-on-Chip (SoC) [116]. Rapidly changing application standards in fields such as communications and information security call for frequent modifications of the devices. Software updates are often not sufficient to keep devices in the market, while hardware redesigns are quite expensive. The need to continuously adapt to changing environments (e.g. cognitive radio) is another incentive to use dynamic reconfiguration at runtime. Finally, with technologies at 65 nm and below, manufacturing problems strongly influence the electrical parameters of transistors, and transient errors caused by particles or radiation will appear more and more often during execution: error detection and correction mechanisms or autonomic self-control can benefit from reconfiguration capabilities.

Standard processors or systems-on-chip enable the development of flexible software on fixed hardware. Reconfigurable platforms enable the development of *flexible software on flexible hardware*.

As the density of chips increases [115], power efficiency has become "the Grail" of chip architects: not only for portable devices but also for high-performance general-purpose processors, power (or energy) considerations are as important as the overall performance of the products. This power challenge can only be tackled by using application-specific architectures, or at least by incorporating some application-specific elements into SoCs, as ASICs (Application Specific Integrated Circuits) are much more power-efficient than GPPs (General-Purpose Processors).
The designers of SoCs thus face a very difficult challenge: trading between the flexibility of GPPs, which leads to high volumes and short design times, and the efficiency of ASICs, which helps solve the power efficiency problem. Reconfigurable architectures are therefore often recognized as offering the best trade-off potential between power, performance, cost and flexibility [114], [98], because their hardware structure can be adapted to the application needs.

However, reconfigurable systems raise several questions:

- What are the basic elements of a good reconfigurable system? In the early days, they were bit-level operators, and they tend to become word-level operators. There is however no agreement on the model that should be used.
- How can we reconfigure such a system quickly? When should it be reconfigured? What information is needed to reconfigure it?
- How can we efficiently program reconfigurable systems? We would like to have compilers, not hardware synthesizers and place-and-route tools.
- In an application, what must be targeted to reconfigurable chips and what to conventional processors? More generally, how can we transform and optimize an algorithm to take advantage of the potential of reconfigurable chips?
- How can reconfigurable architectures impact the security of a complete SoC?

The scientific goal of CAIRN is to contribute to answering these questions, based on our background and past experience. To this end, CAIRN intends to approach energy-efficient reconfigurable architectures from three angles: the invention of **new reconfigurable platforms**, associated **design and compilation tools**, and the exploration of the **interaction between algorithms and architectures**.

Power consumption and processing power are considered as the main constraints in our proposed architecture, design flow and algorithm optimizations, in order to maximize the global energy efficiency of the system. **Wireless Communication** is our privileged field of applications.
Our research includes the prototyping of parts of these applications on reconfigurable and programmable platforms. Moreover, in the framework of research and/or contractual cooperations, other **application domains** are considered: bioinformatics, image indexing, video processing, operators for cryptography and traffic filtering in high-speed networks.

Members of the CAIRN team have collaborations with large companies like STMicroelectronics (Grenoble), Technicolor (Rennes), Thales (Paris), Alcatel (Lannion), France-Telecom Orange Labs (Lannion), Atmel (Nantes), Xilinx (USA), SMEs like Geensys (Nantes), R-interface (Marseille), TeamCast/Ditocom (Rennes), Sensaris (Grenoble), Envivio (Rennes), InPixal (Rennes), Sestream (Paris), Ekinops (Lannion), and institutes like DGA (Rennes) and CEA (Saclay, Grenoble). They are involved in several nationally or internationally funded projects (FP7 Alma, FP7 Flextiles, ITEA2 Geodes, Nano2012 S2S4HLS and RECMOTIF projects, ANR funded Ardyt, Defis, Faon, Compa, BioWic, Open-People, Greco, Ocelot and "Images&Networks Competitiveness Cluster" funded 100Gflex).

## 2.2. Highlights

Daniel Menard and Steven Derrien each defended their "Habilitation à Diriger des Recherches (HDR)" thesis in 2011.

# 3. Scientific Foundations

## 3.1. Panorama

The development of complex applications is traditionally divided into three steps: theoretical study of the algorithms, study of the target architecture, and implementation. When facing new emerging applications such as high-performance, low-power, low-cost mobile communication systems or smart sensor-based systems, it is mandatory to strengthen the design flow by a simultaneous study of both algorithmic and architectural issues<sup>1</sup>.

<sup>1</sup> Often referred to as algorithm-architecture mapping or interaction.

Figure 1. CAIRN's general design flow and related research themes

Figure 1 shows the global design flow that we propose to develop.
It is organized in levels which refer to our three research themes: application optimization (algorithmic, fixed-point and advanced representations of numbers), platform instance optimization (hardware and middleware), and stepwise refinement and compilation of software tasks (transformations, configuration generation).

In the rest of this part, we briefly describe the challenges concerning **new reconfigurable platforms** in Section 3.2, the issues on **compiler and synthesis tools** related to these platforms in Section 3.3, and the remaining challenges in **algorithm architecture interaction** in Section 3.4.

## 3.2. Dynamically and Heterogeneous Reconfigurable Platforms

One available technology for building reconfigurable systems is the field-programmable gate array (FPGA), introduced to the market in the mid-1980s. Today's components feature millions of gates of programmable logic, and they are dense enough to host complete computing systems on a programmable chip. These FPGAs have been the reconfigurable computing mainstream for a couple of years and achieve flexibility by supporting gate-level reconfigurability, i.e. they can be fully optimized for any application at the bit level. However, their flexibility is achieved at a very high interconnection cost. To be configured, a large amount of data must be distributed via a slow serial programming process to all the processing and interconnection resources. Configurations must be stored in an external memory. These interconnection and configuration overheads lead to energy-inefficient architectures.

To increase the optimization potential of programmable processors without the penalties of FPGAs, functional-level reconfiguration was introduced. *Reconfigurable Processors* are the most advanced class of reconfigurable architectures. The main concern of this class of architectures is to support flexibility while reducing reconfiguration overhead.
Precursors of this class were the KressArray [103], RaPid [101], and RaW machines [118], which were specifically designed for streaming algorithms. Morphosys [106], Remarc [110] or Adres [99] contain programmable ALUs with a reconfigurable interconnect. These works have led to commercial products such as the Extreme Processor Platform (XPP) [91] from PACT and Bresca [113] from Silicon Hive, designed mainly for telecommunication applications.

Another strong trend, towards heterogeneous reconfigurable processors, can be observed. Hybrid architectures combine standard GPP or DSP cores with arrays of *field-configurable elements*. These new reconfigurable architectures are entering the commercial market. Some of their benefits are the following: functionality on demand (set-top boxes for digital TV equipped with decoding hardware on demand), acceleration on demand (coprocessors that accelerate computationally demanding applications in multimedia and communications), and shorter time to market (products that target ASIC platforms can be released earlier using reconfigurable hardware).

Dynamic reconfiguration allows an architecture to adapt itself to various incoming tasks. This requires complex management and control, which can be provided as services of a real-time operating system (RTOS) [107]: communication, memory management, task scheduling [97] [94] and task placement [89]. Such an Operating System (OS) approach has many advantages: it is a complete design framework, independent of the technology and of the hardware architecture, thus helping to drastically reduce the design time of the complete platform.

Communication in a reconfigurable platform is also a very important research subject.
The role of communication resources is to support transactions between the different components of the platform, either between macro-components of the platform (main processor, dedicated modules, dynamically reconfigurable parts of the platform) or inside the elements of the reconfigurable parts themselves. This has motivated studies on Networks on Chip for Reconfigurable SoCs [93] [112] that trade off flexibility against quality of service.

In CAIRN we mainly target reconfigurable systems-on-chip (RSoC), defined as a set of computing and storing resources organized around a flexible interconnection network and integrated onto a single silicon chip (or programmable chip such as an FPGA). The architecture is specialized for an application domain, and flexibility is provided by hardware reconfiguration and software programmability. Therefore, computing resources are heterogeneous and we focus on the following:

- Reconfigurable hardware blocks with a dynamic behavior, where reconfigurability can be achieved at the bit or at the operator level. Our research aims at defining new reconfigurable computing and storing resources. Since reconfiguration must occur as fast as possible (typically a few cycles), the reduction of the configuration bit-stream length is also a key issue.
- When performance and power consumption are major constraints, it is well known that optimized specialized hardware blocks (often called IPs, for Intellectual Properties) are the best (and often the only) solution. As a flexible extension of specialized IPs, we study multi-mode components for a very specific set of high-complexity algorithms, without loss of performance.
- Specialized **processors with tailored instruction-set** still offer a viable solution to trade between energy efficiency and flexibility. They are especially interesting in the context of recent FPGA platforms where multiple processors can be easily embedded. We also focus on the automatic generation of an optimized customized instruction-set and of the associated data-path and interface with an embedded processor core.

## 3.3. Compilation and Synthesis for Reconfigurable Platform

The absence of compilers is one of the major limitations for the use of reconfigurable architectures in real-life applications. Therefore, the ability to compile and optimize code on reconfigurable hardware platforms from high-level specifications is key to their success and is a hot topic in the research community. We continue our research efforts to offer **efficient tools with close links to architectures**.

Most current programming environments for reconfigurable systems consist of separate tool flows for the software and the hardware. Processor code and configuration data for the reconfigurable processing units are handcrafted and wrapped into libraries of functions. Progress beyond current practices calls for compilers capable of generating code and configurations from a high-level general-purpose programming language. Such a compiler decides which operations go into the reconfigurable processors. Loops or frequently executed code fragments are good candidates for reconfigurable platforms. For general-purpose code, this leads to several problems: it is difficult to extract sets of operations with matching granularity at a sufficient level of parallelism, and inner loops of general-purpose programs often contain excess code, i.e. code that must be run on a CPU, such as exceptions, function calls or system calls. Efforts aimed at automatic code generation for reconfigurable architectures include the works of [111], [117] and [120].

Another approach to the programming and design of reconfigurable platforms, especially for special-purpose elements, is to use techniques inspired by high-level synthesis.
Here also, loops are the target of the methods: the goal is to either generate special-purpose architectures made out of arithmetic operators or to produce parallel architectures. In both cases, the output may be either efficient special-purpose hardware for computation-intensive tasks and/or the parameters for a reconfigurable architecture. Such approaches will eventually create a bridge between compilation techniques and hardware design.

Finally, we continue to investigate automatic floating-point to fixed-point conversion with the objective of developing an open-source tool. Multimedia and signal processing are the main application fields for reconfigurable platforms. In general, these algorithms are specified using floating-point operations, but, for efficiency reasons, they are often implemented with fixed-point operations, either in software for DSP cores or as special-purpose hardware. Unfortunately, fixed-point conversion is very challenging and time-consuming, typically demanding 25 to 50% of the total design or implementation time<sup>2</sup>. Thus, tools are required to automate this conversion.

<sup>2</sup> http://www.mathworks.com/company/newsletters/digest/may04/uwb.html

In software implementations (DSP, MCU), the aim is to define an optimized fixed-point specification which minimizes the code size and the execution time for a given computation accuracy constraint. This optimization is achieved by modifying the location of the scaling operations and by selecting the data word-length according to the different data types supported by DSPs. In hardware implementations (ASIC, FPGA), the complete architecture has to be defined. An efficient implementation requires minimizing the architecture size and the power consumption. Thus, the goal of the conversion process is to minimize the operator word-length. In the fixed-point conversion process, one of the main challenges is to evaluate the accuracy of the fixed-point specification.
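
As an illustration of what "evaluating the accuracy" of a fixed-point specification can mean, the sketch below quantizes a floating-point signal to a signed fixed-point grid and measures the signal-to-quantization-noise ratio (SQNR) by simulation. The function names, the uniform test signal and the round-to-nearest with saturation policy are assumptions made for this example; they do not describe the team's tools, which the report says rely on their own evaluation techniques.

```python
import math
import random


def quantize(x: float, int_bits: int, frac_bits: int) -> float:
    """Round x to a signed fixed-point grid, saturating on overflow.

    int_bits counts the bits left of the binary point, sign bit included.
    """
    step = 2.0 ** -frac_bits
    lowest = -(2.0 ** (int_bits - 1))
    highest = 2.0 ** (int_bits - 1) - step
    rounded = round(x / step) * step
    return min(max(rounded, lowest), highest)


def sqnr_db(samples, int_bits: int, frac_bits: int) -> float:
    """Simulation-based SQNR: signal power over quantization-noise power, in dB."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = sum(
        (s - quantize(s, int_bits, frac_bits)) ** 2 for s in samples
    ) / len(samples)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
signal = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

sqnr_8 = sqnr_db(signal, int_bits=1, frac_bits=8)
sqnr_12 = sqnr_db(signal, int_bits=1, frac_bits=12)
gain_per_bit = (sqnr_12 - sqnr_8) / 4  # close to 6 dB per extra fractional bit
```

Each additional fractional bit halves the quantization step and therefore gains roughly 6 dB of SQNR, which is the trade-off a word-length optimizer balances against code size, execution time, area or power.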
For DSP software implementations, methodologies have been proposed [105], [109], [108] to achieve a floating-point to fixed-point conversion leading to ANSI-C code with integer data types. One key point is to closely link the compilation flow to the latest DSP features. For hardware implementations, the best results are obtained when the word-length optimization process is coupled with high-level synthesis [104] [96].

## 3.4. Algorithm Architecture Interaction

As CAIRN mainly targets domain-specific systems-on-chip including reconfigurable capabilities, algorithmic-level optimizations have great potential to improve the efficiency of the overall system. Based on the skills and experience in "signal processing and communications" of some CAIRN members, we conduct research on algorithmic optimization techniques under two main constraints, energy consumption and computation accuracy, and for two main application domains: fourth-generation (4G) mobile telecommunications and wireless sensor networks (WSN). These application domains are very conducive to our research activities. The high complexity of the first one and the stringent power constraint of the second one require the design of specific high-performance and energy-efficient SoCs. Sections 4.1 to 4.4 detail the application domains that we focus on.

We also work on computer arithmetic operators and representations of numbers for hardware and software implementations. We provide algorithms for evaluating operations such as addition, multiplication, multiplication by a constant, power, division, roots, (inverse) trigonometric functions, (inverse) hyperbolic functions, logarithms, exponentials, and combinations of these. For hardware implementations, we work on the reduction of delay, silicon area and power consumption. For software implementations, we focus on high-performance computing libraries on general-purpose processors (GPPs) and graphics processing units (GPUs).
We work on the use of exotic representations of numbers in specific domains such as secured implementations of cryptosystems with high-performance protection against side-channel analysis or fault attacks. # 4. Application Domains #### 4.1. Panorama Our research is based on realistic applications, in order to both discover the main needs created by these applications and to invent realistic and interesting solutions. The high complexity of the **Next-Generation (4G) Wireless Communication Systems** leads to the design of real-time high-performance specific architectures. The study of these techniques is one of the main field of applications for our research, based on our experience on WCDMA for 3G implementation. In **Wireless Sensor Networks** (WSN), where each wireless node has to operate without battery replacement for a long time, energy consumption is the most important constraint. In this domain, we mainly study energy-efficient architectures and wireless cooperative techniques for WSN. **Intelligent Transportation Systems** (ITS), and especially Automotive Systems, more and more apply technology advances. While wireless transmissions allow a car to communicate with another or even with road infrastructure, **automotive industry** can also propose driver assistance and more secure vehicles thanks to improvements in computation accuracy for embedded systems. Other important fields will also be considered: hardware cryptographic and security modules, specialized hardware systems for the filtering of the network traffic at high-speed, high-speed true-random number generation for security, content-based image retrieval and video processing. ## 4.2. 
4G Wireless Communication Systems With the advent of the next generation (4G) broadband wireless communications, the combination of MIMO (Multiple-Input Multiple-Output) wireless technology with Multi-Carrier CDMA (MC-CDMA) has been recognized as one of the most promising techniques to support high data rate and high performance. Moreover, future mobile devices will have to propose interoperability between wireless communication standards (4G, WiMax ...) and then implement MIMO pre-coding, already used by WiMax standard. Finally, in order to maximize mobile devices lifetime and guarantee quality of services to consumers, 4G systems will certainly use cooperative MIMO schemes or MIMO relays. Our research activity focuses on MIMO pre-coding and MIMO cooperative communications with the aim of algorithmic optimization and implementation prototyping. #### 4.3. Wireless Sensor Networks Sensor networks are a very dynamic domain of research due, on the one hand, to the opportunity to develop innovative applications that are linked to a specific environment, and on the other hand to the challenge of designing totally autonomous communicating objects. Cross-layer optimizations lead to energy-efficient architectures and cooperative techniques dedicated to sensor networks applications. In particular, cooperative MIMO techniques are used to decrease the energy consumption of the communications. ## 4.4. Multimedia processing In multimedia applications, audio and video processing is the major challenge embedded systems have to face. It is computationally intensive with power requirements to meet. Video or image processing at pixel level, like image filtering, edge detection and pixel correlation or at bloc level such as transforms, quantization, entropy coding and motion estimation have to be accelerated. We investigate the potential of reconfigurable architectures for the design of efficient and flexible accelerators in the context of multimedia applications. ## 5. 
Software #### 5.1. Panorama Besides the development of new reconfigurable architectures, the need for efficient compilation flow is stronger than ever. Challenges come from the high parallelism of these architectures and also from new constraints such as resource heterogeneity, memory hierarchy and power constraints and management. We aim at defining a highly effective software framework for the compilation of high-level specifications into optimized code executed on a reconfigurable hardware platform. Figure 2 shows the global framework that we are currently developing. Our approach assumes that the application is specified as a hierarchical block diagram of communicating tasks expressing data-flow or control, where each task is expressed using languages like C, Signal, Scilab or Matlab, and is then transformed into an internal representation by the compiler front-end. Then, our framework is based on applying some high-level transformations onto the internal representation. Figure 2. CAIRN's general software development framework Different internal representations are used depending on the targeted transformations or the targeted architectures. - The classical Control and Data Flow Graph (CDFG) is the main internal formalism of our framework. It is the basis for transformations like code optimizations, fixed-point transformations, instruction-set extraction or scheduling. Gateways will be provided from CDFG to other supported formalisms. - The Hierarchical Conditional Dependency Graph (HCDG) format is also used as the internal representation for pattern-based transformations. - Other internal representations like Signal Flow Graphs (SFG) and Polyhedral Reduced Dependence Graph (PRDG) will be used respectively for application accuracy estimation and loop parallelization techniques. Finally, back-end tools enable the generation of code like VHDL for the hardwired or reconfigurable blocks, C for embedded processor software, and SystemC for simulation purposes (e.g. 
fixed-point simulations). The compiler front-end, the back-end generators, the transformation toolbox as well as the different internal representations and their respective gateways are based on a single framework: the Gecos framework.

Besides CAIRN's general design workflow, and in order to promote research undertaken by CAIRN, several hardware and software prototypes are developed. Among those, several distributed software tools are presented in this report: Gecos, a flexible compilation platform; ID.Fix, an infrastructure for the automatic transformation of software code aiming at the conversion of floating-point data types into a fixed-point representation; UPaK and Durase, for the compilation and the synthesis targeting reconfigurable platforms; and Interconnect Explorer, a high-level power and delay estimation tool for on-chip interconnects.

### 5.2. Gecos

**Participants:** Steven Derrien [contact], Daniel Menard, Kevin Martin, Maxime Naullet, Antoine Floch, Antoine Morvan, Clément Guy, Amit Kumar.

The Gecos (Generic Compiler Suite) project is an open source Eclipse-based C compiler infrastructure developed in the CAIRN group since 2004 that allows for fast prototyping of complex compiler passes. Gecos was designed to address part of the shortcomings of existing C/C++ infrastructures such as SUIF and LLVM. Gecos is a 100% Java-based implementation and is based on modern software engineering practices. It uses Eclipse as its underlying infrastructure and thus benefits from the Eclipse plugin mechanism to be easily extensible. Gecos follows Model Driven Software Engineering techniques and relies on the Eclipse Modeling Framework.

The framework is open-source and is hosted on the INRIA gforge at http://gecos.gforge.inria.fr.

The Gecos infrastructure is still under very active development, and now serves as a backbone infrastructure for the work of many group members (UPaK, Durase, ID.Fix).
In 2011, the work focused on extending the loop analysis transformation framework, which now includes an OpenMP static analysis tool (developed jointly with Colorado State University) that was presented in June at the 7th International Workshop on OpenMP [39]. The software engineering challenges posed by optimizing compilers also happen to be a novel and promising application field for the MDE community, which led to a joint publication [45] with members from CSU and the Triskell EPI team at the IEEE/ACM Models conference in October 2011. This cross-fertilization between MDE and compilers is the core topic of Clément Guy's PhD thesis, supervised by members of CAIRN (S. Derrien) and Triskell (J.M. Jezequel and B. Combemale).

## 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems

**Participants:** Daniel Menard [contact], Olivier Sentieys, Romuald Rocher, Nicolas Simon, Quentin Meunier.

The different techniques proposed by the team for fixed-point conversion are implemented in the ID.Fix infrastructure. The application is described in C code using floating-point data types and pragmas that specify parameters (dynamic range, input/output word-length, delay operations) for the fixed-point conversion. This tool determines and optimizes the fixed-point specification and then generates C code using the fixed-point data types (ac\_fixed) from Mentor Graphics. The infrastructure is made up of three main modules corresponding to the fixed-point conversion (Fix.Conv), the accuracy evaluation (Acc.Eval) and the dynamic range evaluation (Dyn.Eval).

The developments carried out in 2011 yield a fixed-point conversion tool that handles functions, conditional structures and repetitive structures whose number of iterations is fixed over time. For the accuracy evaluation (Acc.Eval), conditional structures and correlation between noise sources have been considered.
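As an illustration of what a fixed-point specification implies for accuracy, the following Python sketch (not part of ID.Fix) rounds samples to a given number of fractional bits and compares the measured quantization noise power with the classical uniform-noise estimate $q^2/12$, where $q$ is the quantization step. The helper names `quantize` and `noise_power`, the pseudo-random test signal and the chosen word-lengths are assumptions made for this example; saturation and the noise models actually used by the tool are not covered.

```python
import math
import random


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (no saturation)."""
    step = 2.0 ** -frac_bits
    return math.floor(x / step + 0.5) * step


def noise_power(samples, frac_bits: int) -> float:
    """Mean squared quantization error over the given samples."""
    errors = [quantize(s, frac_bits) - s for s in samples]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical test signal: reproducible uniform samples in [-1, 1).
rng = random.Random(0)
samples = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

for frac_bits in (6, 8, 10):
    step = 2.0 ** -frac_bits
    measured = noise_power(samples, frac_bits)
    predicted = step * step / 12.0  # uniform-noise model for rounding
    print(f"{frac_bits} fractional bits: measured={measured:.3e}, "
          f"q^2/12={predicted:.3e}")
```

Under this model, each additional fractional bit divides the noise power by four, which is the kind of trade-off a word-length optimization has to weigh against implementation cost.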
For the dynamic range evaluation (Dyn.Eval), the method based on the Karhunen-Loève Expansion (KLE) has been implemented. It determines the dynamic range for a given overflow probability. The development of this tool was made possible by an INRIA post-doc in the context of the S2S4HLS project until August 2011, a University of Rennes graduate engineer from November 2011 in the context of the DEFIS ANR project, and different students during their training period.

# 5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems

Participants: Christophe Wolinski [contact], François Charot, Antoine Floch.

We are developing (in close collaboration with Lund University, Sweden and Queensland University, Australia) UPaK, the Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems [119]. The preliminary experimental results obtained by the UPaK system show that the methods employed in the system enable a high coverage of application graphs with a small number of patterns. Moreover, high application execution speed-ups are ensured, both for sequential and parallel application execution with processor extensions implementing the selected patterns.

UPaK is one of the bases for our research on compilation and synthesis for reconfigurable platforms. It is based on the HCDG representation of the Polychrony software designed at INRIA-Rennes in the project-team Espresso.

# 5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions

Participants: Christophe Wolinski [contact], François Charot, Antoine Floch.

We are developing a framework enabling the automatic synthesis of application-specific processor extensions. It uses advanced technologies, such as algorithms for graph matching and graph merging together with constraint programming methods. The framework is organized around several modules.

- CoSaP: Constraint Satisfaction Problem.
The goal of CoSaP is to decouple the statement of a constraint satisfaction problem from the solver used to solve it. The CoSaP model is an Eclipse plugin described using EMF to take advantage of the automatic code generation and of various EMF tools.
- HCDG: Hierarchical Conditional Dependency Graph. HCDG is an intermediate representation mixing control and data flow in a single acyclic representation. The control flow is represented as hierarchical guards specifying the execution or the definition conditions of nodes. It can be used in the Gecos compilation framework via a specific pass which translates a CDFG representation into an HCDG.
- Patterns: Flexible tools for the identification of computational patterns in a graph and for graph covering. These tools model the concept of a pattern in a graph and provide generic algorithms for the identification of patterns and the covering of a graph. The following sub-problems are addressed: (sub)-graph isomorphism, pattern generation under constraints, covering of a graph using a library of patterns. Most of the implemented algorithms use constraint programming and rely on the CoSaP module to solve the optimization problem.

# **5.6.** PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)

**Participants:** Olivier Sentieys [contact], Olivier Berder, Romain Fontaine, Arnaud Carer, Samuel Mouget, Steven Derrien.

PowWow is an open-source hardware and software platform designed to handle wireless sensor network (WSN) protocols and related applications. Based on an optimized preamble sampling medium access (MAC) protocol, geographical routing and a protothread library, PowWow requires a lighter hardware system than Zigbee [90] (memory usage including application is less than 10kb). Therefore, network lifetime is increased and the price per node is significantly decreased.
CAIRN's hardware platform (see Figure 3) is composed of:

- The motherboard, designed to reduce the power consumption of sensor nodes, embeds an MSP430 microcontroller and all the components needed to process the PowWow protocol except the radio chip. JTAG, RS232, and I2C interfaces are available on this board.
- The radio chip daughter board is currently based on a TI CC2420.
- The coprocessing daughter board includes a low-power FPGA which allows for hardware acceleration of some PowWow features and also includes dynamic voltage scaling features to increase power efficiency. The current version of PowWow integrates an Actel IGLOO AGL250 FPGA and a programmable DC-DC converter. We have shown that gains in energy of up to 700 can be obtained by using FPGA acceleration on functions like CRC-32 or error detection compared with a software implementation on the MSP430.

The PowWow distribution also includes a generic software architecture using event-driven programming and organized into protocol layers (PHY, MAC, LINK, NET and APP). The software is based on Contiki [100], and more precisely on the Protothread library, which provides a sequential control flow without complex state machines or full multi-threading.

Figure 3. CAIRN's PowWow motherboard with radio board connected

To optimize the network for a particular application and to define a global strategy to reduce energy, PowWow offers the following extra tools: over-the-air reprogramming (and soon reconfiguration), analytical power estimation based on software profiling and power measurements, and a dedicated network analyzer to probe and fix transmission errors in the network.

More information can be found at http://powwow.gforge.inria.fr.

# 5.7. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip

Participants: François Charot [contact], Laurent Perraudeau, Charles Wagner.
SoCLib is an open platform for virtual prototyping of multi-processor systems on chip (MP-SoC) developed in the framework of the SoCLib ANR project. The core of the platform is a library of SystemC simulation models for virtual components (IP cores), with a guaranteed path to silicon. All simulation models are written in SystemC, and can be simulated with the standard SystemC simulation environment distributed by the OSCI organization. Two types of models are available for each IP core: CABA (Cycle Accurate / Bit Accurate), and TLM-DT (Transaction Level Modeling with Distributed Time). All simulation models are distributed as free software. We have developed the simulation models of the NIOSII processor, of the Altera Avalon interconnect, and of the TMS320C62 DSP processor from Texas Instruments. More information can be found on its dedicated web page: http://www.soclib.fr.

## 5.8. OCHRE: On-Chip Randomness Extraction

Participants: Olivier Sentieys [contact], Arnaud Carer, Arnaud Tisserand.

Ochre is a set of synthesizable VHDL models for true and pseudo random number generation and hardware-accelerated statistical tests. It includes IP cores of different oscillator-based TRNGs, different PRNGs (linear feedback shift registers, cellular automata, AES) and several statistical tests (FIPS 140-2, AIS31, Diehard). This set of IPs has been used to design the Ochre V1 and V2 chips and was delivered under license to a company.

## 6. New Results

## 6.1. Dynamically and Heterogeneous Reconfigurable Platforms

#### 6.1.1. New Reconfigurable Architectures

#### 6.1.1.1. Power models of reconfigurable architectures

Participants: Robin Bonamy, Daniel Chillet, Olivier Sentieys.

Including a reconfigurable area in complex systems-on-chip is now considered an interesting solution to reduce the area of the global system and to support high performance.
However, the key challenge in the context of embedded systems is currently the power budget of the system, and designers need early estimates of the power consumption of their system. Power estimation for reconfigurable systems is a difficult problem because several parameters need to be taken into account to define an accurate model. Hardware implementation of an algorithm offers the designer different choices compared to software implementation. It is possible to vary the parallelism level or loop unrolling index, which has a direct impact on area and execution time and therefore on power and energy consumption. First, we evaluated the delay, area, power and energy impacts of loop transformations using High Level Synthesis tools. We have made several power measurements on a real FPGA platform and for different task implementations in order to build a model of energy consumption versus execution time. Work is in progress to also characterize the energy consumption of tasks by extracting the number of elementary operators used in the hardware-implemented task.

Furthermore, we also consider the opportunity offered by dynamic reconfiguration, which makes it possible to partially reconfigure a specific part of the circuit while the rest of the system is running. This opportunity has two main effects on power consumption. First, thanks to the area sharing ability, the global size of the device can be reduced and the static (leakage) power consumption can thus be reduced. Secondly, it is possible to delete the configuration of a part of the device, which reduces the dynamic power consumption when a task is no longer used. Although the cost of the reconfiguration is still significant, in some cases this technique can help reduce the power of the system. To evaluate the potential gain of dynamic reconfiguration, we have made some measurements on a Virtex 5 board. We have defined a first model of the power consumption of the reconfiguration.
This model shows that the power consumption depends not only on the bitstream file size but also on the content of the reconfiguration region [41], [42]. These experiments allow us to define energy and delay models that will be used by the operating system, which includes a power management strategy, to decide on-line which task instances must be executed to efficiently manage the available power using dynamic partial reconfiguration [82].

#### 6.1.1.2. High-level modeling of reconfigurable architectures

Participants: Robin Bonamy, Daniel Chillet, Sébastien Pillement.

To help System-on-Chip designers explore the large design space, high-level methodologies and tools are more and more often required. The exploration phase is particularly difficult when the system must satisfy a large number of constraints, such as performance, real time and power consumption. While classical multiprocessor systems-on-chip can be modeled without difficulty, dynamically reconfigurable embedded accelerators are not correctly covered by the usual modeling languages. In this context, we have extended the AADL (Architecture Analysis and Design Language) language to cover the reconfiguration aspect of today's MPSoCs [19], [40]. This work is part of a more general project, Open-People, which proposes a complete methodology for power and energy consumption analysis. The proposal is based on AADL property extensions which are applied to component models. A three-level model has been defined for every targeted FPGA. The first level defines a generic FPGA, which makes it possible to model every possible FPGA. The second level allows the specialization of the FPGA for a specific family. Finally, the third level provides the support to describe the deployment of an application on a specific FPGA circuit. To complete these levels of description, we started the development of techniques for constraint verification.
These developments are based on the OCL language, which makes it possible to extract characteristics from the AADL model, compute mathematical expressions and finally verify mathematical constraints. These verifications have been developed for power and energy consumption; they include static and dynamic power estimation and will soon include the power consumption during the dynamic reconfiguration process.

#### 6.1.1.3. Reconfiguration controller

Participants: Manh Pham, Daniel Chillet, Sébastien Pillement.

Dynamically reconfigurable architectures, which can offer high performance, are increasingly used in different domains. Unfortunately, many applications cannot benefit from this new paradigm due to the large timing overhead. Even for partial reconfiguration, modifying a small region of an FPGA takes a few milliseconds. To cope with this problem we have developed an ultra-fast power-aware reconfiguration controller (UPaRC) to boost the reconfiguration throughput up to 1.433 GB/s. UPaRC can not only enhance the system performance, but also auto-adapt to various performance and consumption conditions. This could enlarge the range of supported applications and can optimize the power-timing trade-off of the reconfiguration phase for each selected application at run-time. UPaRC is up to 45 times more energy-efficient than state-of-the-art reconfiguration controllers. This work has been accepted for publication in DATE'2012 [56].

### 6.1.2. Management of Dynamically Reconfigurable Systems

#### 6.1.2.1. Spatio-Temporal Scheduling based on Artificial Neural Networks

Participants: Antoine Eiche, Daniel Chillet, Sébastien Pillement, Olivier Sentieys.

Management of task execution on dynamically reconfigurable accelerators is known to be a difficult problem due to the large number of possibilities of task instantiations. The problem to solve can be defined as the "spatio-temporal task scheduling".
The problem becomes even more difficult to solve when the solutions must be produced during the execution of the application, i.e. on-line. In this context, new algorithms must be defined and, to solve this problem, we propose to define a neural network based on the Hopfield model. We are therefore able to address heterogeneous multiprocessor systems and to manage the reconfigurable resources embedded within MPSoC [23].

Our latest works on this topic focused on two different issues. First, we demonstrated that neural network structures used for task scheduling can continue to produce valid solutions even if one or several neurons are faulty [70]. This characteristic is very important for present and future technologies, for which fabrication process variability can increase the number of defects in the circuit. The second focus concerned the optimization of the neural network convergence by using parallel evaluation of neurons. We have shown how to define several neuron packets (from the neural network) that can be evaluated in parallel without modifying the convergence property [44], [76].

#### 6.1.2.2. Flexible Communication OS Service

Participants: Daniel Chillet, Sébastien Pillement, Ludovic Devaux.

In a multiprocessor system, efficient communication and memory management are essential to gain the advantages of parallelism. Recent developments in the partial and dynamic reconfigurable computing domain demand better ways to manage simultaneous task execution. However, the requirements are slightly different from those of traditional software-based systems. In this context, Operating System (OS) services like scheduling, placement and inter-task communication have been developed to make such platforms flexible and self-sufficient.

For task communications within flexible architectures, we defined a specific network-on-chip adapted to the dynamically and partially reconfigurable resources included in modern SoCs.
The characterization of the *DRAFT* network was completed and its integration inside reconfigurable systems on chip was realized [14].

We then focused on the run-time communication service [50] and dynamic memory management [49] in reconfigurable System-on-Chips (RSoCs). We first developed a hardware communication block and the communication schemes supported by this new OS service. The originality lies in the implementation of this service directly inside the FPGA. We then demonstrated the requirements and advantages of having a local memory task or a dynamically configurable memory task, in order to improve the effectiveness and efficiency of the proposed schemes.

#### 6.1.3. Fault-Tolerant Reconfigurable Architectures

Participants: Sébastien Pillement, Manh Pham, Stanislaw Piestrak.

In terms of complex systems implementation, reconfigurable FPGA circuits are now part of the mainstream thanks to their flexibility, performance and high number of integrated resources. FPGAs enter new fields of applications such as aeronautics, military, automotive or confined control thanks to their ability to be remotely updated. However, these fields of applications correspond to harsh environments (cosmic and ionizing radiation, electromagnetic noise) and have high fault-tolerance requirements.

We therefore propose a complete framework to design reconfigurable architectures supporting fault-tolerance mitigation schemes [58]. The proposed framework supports simulation and validation of mitigation operations, as well as the sizing of architecture resources. The physical implementation of the fault-tolerant reconfigurable platform validates the proposed model and the effectiveness of the framework. This implementation shows the potential of dynamically reconfigurable architectures for supporting fault-tolerance in embedded systems. We also worked on a new approach to include dependability in the DRAFT coarse-grained reconfigurable architecture [37].

#### 6.1.4. Low-Power Architectures

#### 6.1.4.1. Ultra Low-Power Architecture for Control-Oriented Applications in Wireless Sensor Nodes

Participants: Olivier Sentieys, Steven Derrien, Vivek D. Tovinakere, Philippe Quémerais, Romain Fontaine.

This research work aims at developing ultra low-power SoCs for wireless sensor nodes, as an alternative to existing approaches based on low-power micro-controllers. The proposed approach reduces the power consumption by using a combination of hardware specialization and power gating techniques. In particular, we use the fact that typical WSN applications are generally modeled as a set of small to medium grain tasks that are implemented on a low-power microcontroller using lightweight *thread*-like OS constructs. Rather than implementing these tasks in software, we instead propose to map each of these tasks to its own specialized hardware structure that we call a *hardware micro-task*. Such a hardware task consists of a minimalistic (and customized) data-path controlled by a finite state machine (FSM). By customizing each of these hardware implementations to its corresponding task, we expect to significantly reduce the dynamic power dissipated by the whole system. Besides, to circumvent the increase in static power caused by the possibly numerous hardware tasks implemented in the chip, we also propose to combine our approach with *power gating*, so as to supply power to a hardware task only when it needs to be executed [28].

As a proof of concept, a chip has been designed and fabricated in a 65 nm CMOS technology from STMicroelectronics using the CMP facilities. The area is about 1 mm² in a QFN52 package. The circuit is the controller part of a wireless sensor network node. It embeds an OpenMSP microcontroller core with SRAM memories for data and programs and some dedicated hardware tasks to control an external radio transceiver such as the TI CC2420 commonly used in the industry.
To reduce energy consumption, low power design techniques such as power gating were used. Two power domains are implemented: one is dedicated to the microcontroller and memories, while the goal of the second is to measure the efficiency of our hardware micro-task concept. The input-output ring around the core is divided into three parts: two parts contain digital I/O pads, each corresponding to a power domain, and the third contains analog pads to control the power gating for monitoring. Our goal is to analyze the power benefits of our approach and to compare it with classical microprocessor architectures.

Figure 4. Layout and Die of Express-Chip Test Chip

#### 6.1.4.2. Wakeup Time and Wakeup Energy Estimation in Power-Gated Logic Clusters

**Participants:** Olivier Sentieys, Vivek D. Tovinakere.

Run-time power gating for aggressive leakage reduction has brought into focus the cost of mode transition overheads due to frequent switching between sleep and active modes of circuit operation. In order to design circuits for effective power gating, logic circuits must be characterized for the overheads they present during mode transitions. We have proposed a method to determine the steady-state virtual-supply voltage in active mode and hence present a model for the virtual-supply voltage in terms of basic circuit parameters. Further, we derived expressions for the estimation of two mode transition overheads, wakeup time and wakeup energy, for a power-gated logic cluster using the proposed model. Experimental results of the application of the model to ISCAS85 benchmark circuits show that wakeup time may be estimated within an average error of 16.3% across $22\times$ variation in sleep transistor sizes and $13\times$ variation in circuit sizes, with significant speedup in computation time compared to SPICE level circuit simulations [30], [63].

#### 6.1.5. Arithmetic Operators for Cryptography

Participants: Arnaud Tisserand, Thomas Chabrier, Danuta Pamula.

#### 6.1.5.1. ECC Processor with Protections Against SCA

A dedicated processor for elliptic curve cryptography (ECC) is under development. Functional units for arithmetic operations in $\mathbb{F}_{2^m}$ and $\mathbb{F}_p$ finite fields and 160–600-bit operands have been developed for FPGA implementation. Several protection methods against side channel attacks (SCA) have been studied. The use of some number systems, especially very redundant ones, makes it possible to change the way some computations are performed and hence their effects on side channel traces. We propose in [83] a hardware implementation of the double base number system (DBNS) random recoding of secret keys. This recoding is performed on-the-fly during the ECC scalar multiplication $[k]P$. This leads to random behavior of the point operations at the side channel level.

We started a collaboration with the University of Sfax in Tunisia on the use of an ECC processor for secure communications in low-cost wireless applications. A first FPGA implementation is under development and we expect to submit our first results in 2012.

### 6.1.5.2. Arithmetic Operators for High-Performance Cryptography

In [32], we published an extended version of the work started in 2010 on fast algorithms and implementations of $\mathbb{F}_{2^m}$ finite field multiplication units in FPGA. The methods proposed and compared are based on separate multiplication and reduction steps; we analyzed various area and time dependency/efficiency/complexity tradeoffs.

With Mark Hamilton, a PhD student in the Code and Crypto group at University College Cork (UCC), we worked on fast algorithms and implementations of $\mathbb{F}_p$ finite field multipliers for some specific values of $p$. The corresponding results have been published in [46].

#### 6.1.6. SoC Modeling and Prototyping on FPGA-based Systems

Participants: François Charot, Kevin Martin, Laurent Perraudeau, Charles Wagner.
SoCLib and MutekH are two software development projects to which we contribute. SoCLib (http://www.soclib.fr) is an open platform for modeling and simulation of multiprocessor systems-on-chip (MP-SoC). MutekH (http://www.mutekh.org) is a free and portable operating system for embedded platforms, ranging from microcontrollers to multiprocessor systems. The use of the configurable and extensible simulation model of the Altera NIOSII processor of the SoCLib library and of the MutekH operating system allows us to easily deploy software applications, such as codes from the MediaBench, MiBench and Cryptographic Library benchmark sets or multithreaded applications, on monoprocessor and multiprocessor simulation platforms. These platforms are used on the one hand for the validation of the processor extensions automatically generated by our compilation tools and on the other hand for the measurement of the speedup achieved using these new extensions.

## **6.2.** Compilation and Synthesis for Reconfigurable Platform

**Participants:** Steven Derrien, Emmanuel Casseau, Daniel Menard, François Charot, Christophe Wolinski, Olivier Sentieys, Patrice Quinton.

#### 6.2.1. Polyhedral based loop transformations for High-Level synthesis

Participants: Steven Derrien, Antoine Morvan, Patrice Quinton.

After almost two decades of research effort, there now exists a large choice of robust and mature C to hardware tools that are used as production tools by world-class chip vendor companies. Although these tools dramatically slash design time, their ability to generate efficient accelerators is still limited, and they rely on the designer to expose parallelism and to use an appropriate data layout in the source program. We believe this can be overcome by tackling the problem directly at the source level, using source-to-source optimizing compilers. More precisely, our aim is to study how polyhedral based program analysis and transformation can be used to address this problem.
In the context of the PhD of Antoine Morvan, we have studied how to improve the efficiency and applicability of nested loop pipelining (also known as nested software pipelining) in C to hardware tools. Loop pipelining is a key transformation in high-level synthesis tools as it helps maximize both computational throughput and hardware utilization. Nevertheless, it loses some of its efficiency when dealing with small trip-count inner loops, as the pipeline latency overhead quickly limits its benefit. Even if it is possible to overcome this limitation by pipelining the execution of a whole loop nest, the applicability of nested loop pipelining has so far been limited to a very narrow subset of loops, namely perfectly nested loops with constant bounds. In this work, we have extended the applicability of nested-loop pipelining to imperfectly nested loops with affine dependencies. We have shown how such loop nests can be analyzed and, under certain conditions, how one can modify the source code in order to allow nested loop pipelining to be applied, using a method called polyhedral bubble insertion. Our approach has shown encouraging results and led to a publication at the IEEE International Conference on Field Programmable Technology [48] in December 2011.

#### 6.2.2. Reconfigurable Processor Extensions Generation

Participants: Christophe Wolinski, François Charot, Erwan Raffin, Kevin Martin, Antoine Floch.

During this year, we have continued our work on the generation of reconfigurable processor extensions using the constraint programming approach. Previously, we showed how all the problems ranging from instruction identification, scheduling and binding to optimized architecture synthesis can be defined and solved using the constraint programming approach. This year, a new pattern scheduling approach has been defined.
It enables concurrent match selection and parallel match scheduling on the processor and extension, assuming that the execution on an extension is not atomic. This means that the data produced by an extension need not be sent to the processor just after the end of processing. As a result, a better schedule can be obtained [71]. The efficient FPGA implementation of processing units requires optimization of hardware resources, such as registers and multiplexers. The extension synthesis defined previously has been revisited. For applications from the MediaBench, MiBench and MiCrypt benchmark sets, an improvement of 35%, after placement and routing on the Altera Stratix II FPGA, is observed.

#### 6.2.3. Run-time Reconfigurable Architecture Modeling

Participants: Christophe Wolinski, François Charot, Emmanuel Casseau, Daniel Menard, Antoine Floch, Erwan Raffin, Steven Derrien.

We have continued to work on the modeling problem of the run-time reconfigurable, operator-based, ROMA multimedia architecture. The ROMA processor is composed of a set of M coarse-grained reconfigurable operators, N data memories, a configuration memory, two interconnection networks (between operators and between operators and memories), and dedicated controllers designed for each module of the datapath. A centralized controller manages the configuration and the execution steps. The ROMA processor has three different interfaces: a data interface connected to the operator network, a control interface and a debug interface connected to the main controller. The number of operators, the number of memories and their size can be decided according to application requirements. The compilation flow of our framework relies on an abstract architecture model of the targeted ROMA architecture.

During this year we have focused on the definition of the constraint model to deploy an application graph on the pipeline architecture model. The goal here is to minimize the latency of the pipeline.
The main changes are at the operator and memory levels: the operators are pipelined and the dual-port memories behave like circular buffers. Recall that in the case of the non-pipelined model, the goal is to optimize the execution time of the application under resource constraints. We have carried out experiments to evaluate the quality of our method using different pattern libraries (patterns supported by the ROMA SWP coarse-grained reconfigurable operator, patterns extracted from the MediaBench set) [47]. In these experiments, the model has no limitation in terms of the number of operators and memories. The optimality of the solutions was proven in 93% of cases. More details can be found in [29] and in the Ph.D. thesis of Erwan Raffin [17].

In the context of the *RecMotifs* project, we have continued to work on a specific design flow integrating STMicroelectronics' compiler flow. This project also allowed us to bring significant improvements to our pattern analysis software tools. The RecMotifs flow is a pattern analysis flow for the *graphml* files generated by the ST compiler. It supports pattern description (description of *graphml* patterns that can be used in the covering pass), type extraction, pattern generation (on a *graphml* file) and covering (covering of a *graphml* file that minimizes the parallel execution time without any resource constraints). Once the pattern analysis has been applied to the *graphml* files, C code can be regenerated using GeCos.

#### 6.2.4. Floating-Point to Fixed-Point Conversion

Participants: Daniel Menard, Karthick Parashar, Olivier Sentieys, Romuald Rocher, Hai-Nam Nguyen.

For the fixed-point conversion process, different optimization algorithms have been tested. The aim is to minimize the implementation cost under an accuracy constraint.
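
As a purely illustrative sketch of this optimization problem (minimizing cost under an accuracy constraint), the following Python code applies a plain deterministic greedy heuristic. It is not one of the algorithms evaluated by the team. It assumes the common uniform quantization noise model, in which an operator with `w` fractional bits contributes `g * 2**(-2*w) / 12` to the output noise power; the gains, per-bit costs, noise budget and function names are all hypothetical.

```python
def noise_power(wordlengths, gains):
    """Output noise power under the uniform quantization noise model
    (assumed here): each operator contributes g * 2^(-2w) / 12."""
    return sum(g * 2.0 ** (-2 * w) / 12.0 for w, g in zip(wordlengths, gains))


def greedy_wordlengths(gains, costs, noise_budget, w_min=2, w_max=32):
    """Start from the smallest word-lengths and repeatedly add one bit to
    the operator with the best noise reduction per unit of added cost,
    until the accuracy constraint (noise <= noise_budget) is met."""
    w = [w_min] * len(gains)
    while noise_power(w, gains) > noise_budget:
        best, best_ratio = None, 0.0
        for i in range(len(w)):
            if w[i] >= w_max:
                continue
            # Noise removed by giving operator i one more fractional bit.
            reduction = gains[i] * (2.0 ** (-2 * w[i])
                                    - 2.0 ** (-2 * (w[i] + 1))) / 12.0
            ratio = reduction / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            raise ValueError("accuracy constraint unreachable within w_max")
        w[best] += 1
    return w


# Hypothetical example: three operators with different noise gains and costs.
gains = [1.0, 4.0, 0.25]   # noise gain from each operator to the output
costs = [1.0, 1.0, 2.0]    # implementation cost of one extra bit
budget = 1e-6              # maximum tolerated output noise power
wl = greedy_wordlengths(gains, costs, budget)
print(wl, noise_power(wl, gains))
```

A greedy search of this kind is cheap but can stop in a poor local solution, which is one reason randomized or stochastic search procedures are also considered for this problem.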
In [54], two new algorithms for the word-length optimization procedure, based on the Greedy Randomized Adaptive Search Procedure (GRASP), are proposed. Compared to existing methods, our proposal yields better results and has a complexity between that of deterministic methods and that of stochastic methods.

## 6.3. Algorithm Architecture Interaction

### 6.3.1. Flexible hardware accelerators for biocomputing applications

Participants: Steven Derrien, Naeem Abbas, Patrice Quinton.

It is widely acknowledged that FPGA-based hardware acceleration of compute-intensive bioinformatics applications can be a viable alternative to cluster-based (or grid-based) approaches, as it offers very attractive MIPS/watt figures of merit. One of the issues with this technology is that it remains somewhat difficult to use and to maintain (one is designing a circuit rather than programming a machine). Even though C-to-hardware compilation tools exist (Catapult-C, Impulse-C, etc.), a common belief is that they generally do not offer good enough performance to justify the use of such reconfigurable technology. As a matter of fact, successful hardware implementations of bio-computing algorithms are manually designed at the RTL level and are usually targeted to a specific system, with little if any performance portability among reconfigurable platforms. This research work, which is part of the ANR BioWic project, aims at providing a framework for the semi-automatic generation of high-performance hardware accelerators. In particular, we expect to widen the scope of common design constraints by focusing on system-level criteria that involve both the host machine and the accelerator (workload balancing, communication and data reuse optimizations, hardware utilization rate, etc.).
This research work builds upon the CAIRN research group expertise on automatic parallelization for application specific hardware accelerators and has been targeting mainstream bioinformatics applications (HMMER, ClustalW and BLAST). Our work in 2011 extended the experiment results obtained in 2010 and led to the submission of a paper to IEEE Trans. in Parallel and Distributed Computing (the article being in revision). We also investigated another case study based on a more classical sequence comparison algorithm for which we investigated different style of architectural partitioning. This work led to a paper published in the proceedings of the ARC International Symposium [43]. #### 6.3.2. Range Estimation and Computation Accuracy Optimization **Participants:** Daniel Menard, Karthick Parashar, Olivier Sentieys, Romuald Rocher, Pascal Scalart, Aymen Chakhari, Jean-Charles Naud, Emmanuel Casseau, Andrei Banciu. #### 6.3.2.1. Range Estimation Efficient range estimation methods are required to optimize the integer part word-length. Our previous works based on the Karhunen-Loève Expansion (KLE) have been extended in [38]. The impulse response between the input and a variable is used to propagate the KLE parameters of the inputs. Range estimation has proven to be a difficult problem for non-linear operations especially when the input data is correlated. A stochastic approach can significantly improve the results compared to the classical methods like the interval and affine arithmetic. The aim is to obtain tight intervals by adapting the bounds to a desired probability of overflows. An approach for the analysis of range uncertainties based on the Polynomial Chaos Expansion (PCE) has been developed. The PCE representation is obtained for every input variable and an analytical description of the variability of the output is determined. Furthermore, the correlation of the inputs is captured using the Nataf transform. 
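
To illustrate why a probabilistic range can be tighter than a worst-case one, the sketch below compares the interval-arithmetic bound of a small FIR filter with an empirical bound chosen for a target overflow probability. It uses plain Monte Carlo simulation, not the KLE or PCE methods discussed here; the filter coefficients, the independent uniform input model and all function names are assumptions made for the example.

```python
import math
import random


def worst_case_bound(h, x_max=1.0):
    """Interval-arithmetic bound on |y| for y[n] = sum_k h[k] * x[n-k]
    when |x| <= x_max: the L1 norm of h times x_max."""
    return x_max * sum(abs(c) for c in h)


def probabilistic_bound(h, overflow_prob, n_samples=20000, x_max=1.0, seed=0):
    """Empirical bound B such that roughly a fraction `overflow_prob` of the
    simulated outputs satisfy |y| > B (inputs assumed i.i.d. uniform)."""
    rng = random.Random(seed)
    taps = [0.0] * len(h)
    magnitudes = []
    for _ in range(n_samples):
        taps = [rng.uniform(-x_max, x_max)] + taps[:-1]
        magnitudes.append(abs(sum(c * x for c, x in zip(h, taps))))
    magnitudes.sort()
    idx = min(len(magnitudes) - 1,
              math.ceil((1.0 - overflow_prob) * len(magnitudes)))
    return magnitudes[idx]


def integer_bits(bound):
    """Smallest number m of integer bits (sign included) such that
    2^(m-1) > bound, for a strictly positive bound."""
    return max(1, math.floor(math.log2(bound)) + 2)


h = [0.5] * 8                          # hypothetical filter coefficients
wc = worst_case_bound(h)
pb = probabilistic_bound(h, overflow_prob=1e-3)
print(wc, integer_bits(wc))
print(pb, integer_bits(pb))
```

The worst-case bound never overflows but may cost extra integer bits; the probabilistic bound trades a small, controlled overflow probability for a shorter word-length.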
The range is computed using a probabilistic analysis from the probability density function (PDF).

#### 6.3.2.2. Accuracy and performance evaluation

The automation of fixed-point conversion requires generic methods to study accuracy degradation. In [51], [73], a new approach is proposed that uses analytical noise power propagation and takes conditional structures into account. These structures are generated from programming language statements such as *if-then-else* or *switch*. The proposed model takes into account two key points in fixed-point design: first, the alternative processing of noise depending on the condition; second, decision errors generated by quantization noise affecting the condition. This method is integrated in the fixed-point conversion process and uses path probabilities of execution alternatives obtained from profiling. This work extends existing analytical approaches for fixed-point conversion. Experiments show that our analytical method gives a fairly accurate noise power estimate compared to the real accuracy degradation.

An analytical approach is also studied to determine the accuracy of systems that include unsmooth operators. An unsmooth operator represents a function that is not differentiable over its whole domain of definition (for example, the sign operator). The classical model is no longer valid, since these operators introduce errors that do not respect the Widrow assumption (their values are often higher than the signal power). An approach based on the distributions of the signal and the noise is therefore proposed. It is applied to the sphere decoding algorithm. We also focus on recursive structures, where an error influences future decisions; the Decision Feedback Equalizer is thus also considered.

### 6.3.3. Reconfigurable Video Coding

Participants: Emmanuel Casseau, Olivier Sentieys, Arnaud Carer, Cécile Beaumin, Hervé Yviquel.

In the field of multimedia coding, standardization recommendations are always evolving.
To reduce design time, the Reconfigurable Video Coding (RVC) standard allows new codec algorithms to be defined from a modular library of components. The RVC dataflow-based specification formalism expressly targets multiprocessor platforms. However, software processors cannot cope with high-performance and low-power requirements. Hence, the mapping of RVC specifications onto hardware accelerators is investigated in this work, as well as the scheduling of the functional units (FUs) of the specification.

Dataflow programs, such as RVC applications, express explicit parallelism within an application. Although multi-core processors are now available everywhere, few applications are able to truly exploit their multiprocessing capabilities. We describe in [69] a scheduling strategy for executing a dataflow program on multi-core architectures using distributed schedulers and lock-free communications.

Our goal is to design an RVC-dedicated reconfigurable architecture with various resources. Our previous results led to the definition of a reconfigurable FIFO for optimizing the cost and performance of RVC dataflow specifications by taking advantage of their dynamic behavior. We are currently working with Mickael Raulet from IETR INSA Rennes and Dr. Jani Boutellier from the University of Oulu (Finland) on the execution of an RVC decoder on a network of Transport Triggered Architecture (TTA) processors (proposed by the Tampere University of Technology). Thanks to its modular structure, TTA is a convenient CPU design style for developing application-specific processors. The processors of a TTA network are connected by hardware channels, so it has many similarities with an RVC network. Hervé Yviquel is expected to spend four months at TUT in 2012 to provide a functional automated flow to design TTA-based platforms and compile RVC applications for them.

#### 6.3.4. Multi-Antenna Systems

Participants: Olivier Berder, Pascal Scalart, Quoc-Tuong Ngo.
Considering the possibility for the transmitter to get some Channel State Information (CSI) from the receiver, antenna power allocation strategies can be performed thanks to the joined optimization of linear precoder (at the transmitter) and decoder (at the receiver) according to various criteria. A new exact solution of the maximization of the minimum Euclidean distance between received symbols has been proposed for two 16-QAM modulated symbols. This precoder shows an important enhancement of this minimum distance compared to diagonal precoders, which leads to a significant BER performance improvement. This new strategy selects the best precoding matrix among eight different expressions, depending on the value of the channel angle. Selecting only two of these expressions, this precoder was then generalized to any rectangular QAM modulation [26]. Not only the minimum Euclidean distance but also the number of neighbors providing it has an important role in reducing the error probability when a Maximum Likelihood detection is considered at the receiver. Aiming at reducing this number of neighbors, a new precoder in which the rotation parameter has no influence is proposed for two independent data streams transmitted. The expression of the new precoding strategy is less complex and the space of solution is, therefore, smaller [53], [74]. In the paper [52], we proposed the general *neighbor-dmin* precoder for three independent data-streams and the simulation results also confirm a significant bit-error-rate improvement of the new precoder in comparison with other traditional precoding strategies. #### 6.3.5. Cooperative Strategies for Low-Energy Wireless Networks Participants: Olivier Berder, Le Quang Vinh Tran, Olivier Sentieys. During the last decade, many works were devoted to improving the performance of relaying techniques in ad hoc networks. 
One promising approach consists in allowing the relay nodes to cooperate, thus using spatial diversity to increase the capacity of the system. In wireless distributed networks where multiple antennas cannot be installed in one wireless node, cooperative relay and cooperative Multi-Input Multi-Output (MIMO) techniques can indeed be used to exploit spatial and temporal diversity gain in order to reduce energy consumption. Considering a system with a two-antenna source, two one-antenna relays and a one-antenna destination, a MIMO simple cooperative relay model (MSCR) and a MIMO full cooperative relay model (MFCR) are proposed and compared with the MIMO normal cooperative relay model (MNCR), in which the relays forward signals consecutively to the destination. The energy efficiency of these models is investigated using a realistic power consumption model whose parameters are extracted from the characteristics of the CC2420, a widely used, commercially available wireless sensor transceiver. For each transmission range, the optimal cooperative scheme in terms of energy efficiency is provided by simulation results [65], [78].

An analytical investigation of these cooperative protocols was also performed. A lower bound for the average symbol error probability (ASEP) of a full DSTC cooperative relaying system in a Rayleigh fading environment is provided. In the case where the Signal to Noise Ratio (SNR) of the relay-relay link is much greater than that of the source-relay link, an upper bound on the ASEP of this system is also derived. The study of the effect of the distance between the relays shows that the performance does not degrade much as long as this distance is lower than half of the source-destination distance. Moreover, we also show that, when the synchronization error range is lower than 0.5, the impact of the transmission synchronization error of the relay-destination link on the performance is not significant [64].
The energy efficiency of cooperative MIMO and relay techniques is also very useful for the Infrastructure to Vehicle (I2V) and Infrastructure to Infrastructure (I2I) communications in Intelligent Transport Systems (ITS) networks where the energy consumption of wireless nodes embedded on road infrastructure is constrained. Applications of cooperation between nodes to ITS networks are proposed and the performance and the energy consumption of cooperative relay and cooperative MIMO are investigated in comparison with the traditional multi-hop technique. The comparison between these cooperative techniques helps us to choose the optimal cooperative strategy in terms of energy consumption for energy constrained road infrastructure networks in ITS applications [27]. #### 6.3.6. Opportunistic Routing Participants: Olivier Berder, Olivier Sentieys, Ruifeng Zhang, Jean-Marie Gorce [Insa Lyon, INRIA Swing]. However, the aforementioned approaches introduce an overhead in terms of information exchange, increasing the complexity of the receivers. A simpler way of exploiting spatial diversity is referred to as opportunistic routing. In this scheme, a cluster of nodes still serves as relay candidates but only a single node in the cluster forwards the packet. This paper proposes a thorough analysis of opportunistic routing efficiency under different realistic radio channel conditions. The study aims at finding the best trade-off between two objectives: energy and latency minimizations, under a hard reliability constraint. We derive an optimal bound, namely, the Pareto front of the related optimization problem, which offers a good insight into the benefits of opportunistic routings compared with classical multi-hop routing schemes [31]. We then provided a closed-form expression of the lower bound of the energy-delay tradeoff and of energy efficiency for different channel models (additive white Gaussian noise, Rayleigh fast fading and Rayleigh block-fading) in a linear network. 
These analytical results are also verified in 2-dimensional Poisson networks using simulations. The closed-form expression provides a framework to evaluate the energy-delay performance and to optimize the parameters in physical layer, MAC layer and routing layer from the viewpoint of cross-layer design during the planning phase of a network. #### 6.3.7. Adaptive techniques for WSN power optimization Participants: Olivier Berder, Daniel Menard, Olivier Sentieys, Mahtab Alam, Trong-Nhan Le. Wireless sensor networks (WSNs) have obtained a great relevancy in civil as well as military applications such as environment sensing, real-time surveillance and habitat monitoring. It is difficult to design a node that is efficient for all of these different applications. The ideal sensor node would have to dynamically adapt its behavior to various parameters such as the data traffic, the channel conditions, the amount of harvested energy, its battery level, etc. Including the capability to scavenge energy from its environment, the design of an efficient power manager able to address both hardware and software processing seems very promising. Energy modeling is an important issue for designing and dimensioning low power wireless sensor networks (WSN). In order to help the developers to optimize the energy spent by WSN nodes, a pragmatic and precise hybrid energy model is proposed. This model considers different scenarios that occur during the communication and evaluates their energy consumption based on software profiling as well as the hardware components power profiles. The proposed model is a combination of analytical derivations and real time measurements. These experiments are particularly useful to understand the medium access control (MAC) layer mechanisms, such as wake up or data collisions for the preamble sampling category, and the energy wasted by collisions can be evaluated [18], [35]. 
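
The structure of such a scenario-based energy model can be sketched as follows. This is a minimal illustration, not the team's model: each scenario is a sequence of phases, the energy of a phase is a component power multiplied by a duration, and the expected energy weights each scenario by its probability. All phase names, power values, durations and probabilities below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One step of a communication scenario."""
    name: str
    power_mw: float      # would come from a hardware power profile
    duration_ms: float   # would come from software profiling or measurement

    @property
    def energy_uj(self) -> float:
        # mW * ms = microjoules
        return self.power_mw * self.duration_ms


def scenario_energy_uj(phases):
    """Energy of one scenario: sum of the energies of its phases."""
    return sum(p.energy_uj for p in phases)


def expected_energy_uj(scenarios):
    """Expected energy over (probability, phases) pairs."""
    if abs(sum(prob for prob, _ in scenarios) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(prob * scenario_energy_uj(phases) for prob, phases in scenarios)


# Hypothetical phases of a preamble-sampling exchange.
WAKE_UP = Phase("wake-up", power_mw=10.0, duration_ms=1.0)
TRANSMIT = Phase("transmit", power_mw=50.0, duration_ms=2.0)
RECEIVE_ACK = Phase("receive ack", power_mw=60.0, duration_ms=1.0)
ACK_TIMEOUT = Phase("ack timeout", power_mw=60.0, duration_ms=2.0)

success = [WAKE_UP, TRANSMIT, RECEIVE_ACK]
# A collision costs a failed attempt followed by a successful retry.
collision = [WAKE_UP, TRANSMIT, ACK_TIMEOUT] + success

print(scenario_energy_uj(success))
print(scenario_energy_uj(collision) - scenario_energy_uj(success))  # wasted
print(expected_energy_uj([(0.9, success), (0.1, collision)]))
```

Separating the phase durations from the component powers mirrors the hybrid nature of the model: the former can be refreshed from profiling runs, the latter from measurements on the target hardware.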
An adaptive wake-up-interval scheme for preamble sampling MAC protocols under variable traffic in WSNs is then proposed. The wake-up interval is updated based on the traffic status register (whose content depends on the presence of messages for a particular node). The results show that the sensor node adapts its wake-up interval and converges to the best trade-off value for fixed and variable traffic patterns. Two optimization parameters (the length of the traffic status register and the initial wake-up-interval value) are also tuned to achieve fast convergence for different traffic rates and variations.

A wireless body area sensor network (WBASN) demands ultra-low-power and energy-efficient protocols. The MAC layer plays a pivotal role in energy management in a WBASN; moreover, idle listening is the dominant source of energy waste in most MAC protocols. A WBASN exhibits a wide range of traffic variations, depending on the physiological data emanating from the monitored patient. In this context, we proposed a novel energy-efficient traffic-aware dynamic (TAD) MAC protocol for WBASN [36]. A comparison with other protocols for three widely used radio chips, i.e. cc2420, cc1000 and amis52100, is presented. The results show that TAD-MAC outperforms all the other protocols under fixed and variable traffic rates.

# 7. Contracts and Grants with Industry

## 7.1. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)

Participants: Sébastien Pillement, Arnaud Tisserand, Philippe Quémerais.

ARDyT (in French: Architecture Reconfigurable Dynamiquement Tolérante aux fautes) is a project on a Reliable and Reconfigurable Dynamic Architecture. It involves IRISA-Cairn (Lannion), Lab-STICC (Lorient), LIEN (Nancy) and ATMEL. The purpose of the ARDyT project is to provide a complete environment for the design of a fault-tolerant and self-adaptable platform.
To this end, a platform architecture, its programming environment and management methodologies for diagnosis, testability and reliability have to be defined and implemented. The considered techniques do not rely on hardened components, so that low-cost solutions can be designed for terrestrial and aeronautics applications. The ARDyT platform will provide a European alternative, free of ITAR import constraints, for fault-tolerant reconfigurable architectures. For more details see <a href="http://ardyt.irisa.fr">http://ardyt.irisa.fr</a>.

## 7.2. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)

Participants: Emmanuel Casseau, Steven Derrien, Sébastien Pillement.

COMPA (model oriented design of embedded and adaptive multiprocessor) is a project which involves Cairn, IETR (Institut d'Electronique et de Télécommunications de Rennes), Lab-STICC (University of Bretagne Sud), CAPS Entreprise, Modae Technologies and Texas Instruments. The goal of the project is to design adaptive multiprocessor embedded systems from dataflow models. The Reconfigurable Video Coding (RVC) standard will be targeted as the application use case. We will more specifically focus on the use of the portable and platform-independent RVC-CAL language to describe the applications. We will propose transformations to refine, optimize and translate the application model into software and hardware components. Task mapping, instruction and processor allocation, and constrained scheduling will also be investigated for runtime execution and reconfiguration.

## 7.3. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)

Participants: Olivier Sentieys, Daniel Menard, Romuald Rocher, Nicolas Simon.

DEFIS (Design of fixed-point embedded systems) is a project which involves Cairn, LIP6 (University of Paris VI), LIRMM (University of Perpignan), CEA LIST, Thales, Inpixal.
The main objectives of the project are to propose new approaches to improve the efficiency of the floating-point to fixed-point conversion process and to provide a complete design flow for fixed-point refinement of complex applications. This infrastructure will reduce the time-to-market by automating the fixed-point conversion and by mastering the trade-off between application quality and implementation cost. Moreover, this flow will guarantee and validate the numerical behavior of the resulting implementation. The proposed infrastructure will be validated on two real applications provided by the industrial partners. For more details see <a href="http://defis.lip6.fr">http://defis.lip6.fr</a>.

## 7.4. ANR ARPEGE - GRECO (2010-2013)

Participants: Olivier Sentieys, Olivier Berder, Arnaud Carer, Romain Fontaine, Trong-Nhan Le.

Sensor network technologies and the increased efficiency of photovoltaic cells show that communicating objects can now reach a power consumption low enough to foresee the development of autonomous objects. Greco (GREen wireless Communicating Objects) is a project on the design of autonomous communicating object platforms (i.e. self-powered sensor networks). The aim is to optimize the power consumption based on (i) a model of the performance and power of the required blocks (RF front-end, converters, modem, peripherals, digital architecture, OS, software, power generator, battery, etc.), (ii) heterogeneous simulation models and tools, and (iii) the use of a real-time global "Power Manager". The final validation will be performed on various case studies: a monitoring system and an audio communication system between firemen. A HW/SW prototype (based on CAIRN's PowWow platform with energy harvesting) and a simulation combining a precise model (virtual platform) of an object with a network-simulator-like environment will be developed as demonstrators.
Greco involves **Thales**, INRIA/CAIRN, CEA List, CEA Leti, Im2nP, LEAT, Insight-SiP. For more details see <a href="http://greco.irisa.fr">http://greco.irisa.fr</a>.

## 7.5. Images and Networks competitiveness cluster - 100GFlex project (2010-2013)

Participants: Olivier Sentieys, Arnaud Carer, Remi Pallas, Pascal Scalart.

Speed and flexibility are quickly increasing in metropolitan networks. In this context, 100GFlex studies the relevance of a new transmission scheme: multiband optical OFDM at very high rates (up to 100 Gbit/s). In this project we will study efficient algorithms (e.g. synchronization) and high-speed architectures for the digital signal processing of the optical transceivers. Due to the high rate of the analog signals (sampling at more than 10 Gsample/s), synchronization and processing are a real challenge. 100GFlex involves Mitsubishi-Electric R&D Center Europe, Institut Télécom, Ekinops, France Télécom, Yenista Optics, Foton and Cairn.

## 7.6. NANO2012 Program - S2S4HLS (2008-2012)

Participants: Emmanuel Casseau, Steven Derrien, Daniel Menard, Olivier Sentieys, Loic Cloatre, Amit Kumar, Antoine Morvan, Chenglong Xiao, Jean-Charles Naud.

High-level synthesis (HLS) tools are starting to be used for industrial designs. HLS is analogous to software compilation transposed to the hardware domain. From an algorithmic specification of the behavior, HLS tools automate the design process and generate a register transfer level (RTL) architecture, taking user-specified constraints into account. However, design performance still depends on the designer's skill in writing appropriate source code. The S2S4HLS (Source-to-Source for High-Level Synthesis) project intends to apply source code transformations to guide synthesis, hence leading to more efficient designs, and aims at providing a toolbox for automatic source-to-source transformations of C code.
The project is focused on three complementary goals to push the limits of existing HLS tools: loop transformations for performance optimization and better resource usage, automatic floating-point to fixed-point conversion, and synthesis of multi-mode architectures. S2S4HLS is organized into three sub-projects targeting these three objectives. The project is carried out in close collaboration with STMicroelectronics and the Compsys team at Inria Rhône-Alpes, within the overall INRIA-ST partnership agreement. It is financed by the Ministry of Industry in the Nano2012 program. Cairn is responsible for the project and involved in the three workpackages.

## 7.7. NANO2012 Program - RecMotifs (2008-2012)

Participants: François Charot, Antoine Floch, Christophe Wolinski.

The RecMotifs project aims at the generation of application-specific extensions targeting the STxP70 processor from STMicroelectronics. Cairn will study advanced algorithms for graph matching and graph merging together with constraint programming methods. The project is carried out in close collaboration with STMicroelectronics within the overall INRIA-ST partnership agreement. It is financed by the Ministry of Industry in the Nano2012 program.

## 7.8. ANR Architectures du Futur Open-People (2009-2012)

Participants: Daniel Chillet, Robin Bonamy, Olivier Sentieys.

The Open-People (Open Power and Energy Optimization PLatform and Estimator) project aims at defining a complete platform for power estimation and optimization. The platform will be composed of hardware boards to support measurements for the applications. End-users will be able to upload their applications through a web portal, and to control the power measurements of the execution of their applications on a specific electronic board. The Open-People project will also propose a complete power component model library which allows end-users to estimate the power consumption of some parts of the applications without making measurements.
This will make it possible to quickly evaluate different design choices with respect to power consumption. Finally, through the web portal <a href="http://www.open-people.fr">http://www.open-people.fr</a>, Open-People will provide software tools to apply power optimizations. In this project, the CAIRN team will develop power models for FPGA components using dynamic reconfiguration. Open-People involves LabSticc (Lorient), Trio (Nancy), CAIRN (Rennes/Lannion) and Dart (Lille/Valenciennes) teams from Inria, Leat at Nice, Thales (Colombes) and InPixal (Rennes). Cairn is in charge of power models and optimization for reconfigurable architectures.

## 7.9. ANR BioWiC (2009-2011)

Participants: Steven Derrien, Naeem Abbas, Patrice Quinton.

The increasing flow of genomic data provided by the steady improvement of new biotechnologies can no longer be efficiently exploited without a systematic *in silico* analysis. Data need to be filtered, curated, classified, annotated, validated, etc., to be actively used in a discovery process. The design of such complex pipelines of processing stages is known to be an extremely tedious task, as their designers have to deal with both specification and implementation issues. Indeed, the execution time of such *workflows* is very often a bottleneck, as huge amounts of data have to be processed. Therefore, the goal of the BioWiC (Bioinformatics Workflows for Intensive Computation) project is twofold:

- Reducing the design time of complex bioinformatics pipelines by providing a domain-specific workflow environment;
- Reducing the execution time of these workflows through the use of parallel execution on GPUs, FPGAs and clusters of PCs whenever possible.

The ANR BioWiC project is funded for 3 years and involves several institutions (INRA-MIG, Ouest Genopole, CAIRN and Symbiose project-teams at INRIA) and universities (Eliaus Laboratory at Université de Perpignan). For more details see <a href="http://biowic.inria.fr">http://biowic.inria.fr</a>.
CAIRN will provide a framework for the semi-automatic generation of flexible IP cores, by widening the scope of typical design constraints so as to integrate communication and data reuse optimizations between the host and the hardware accelerator.

## 7.10. ANR Architectures du Futur - CIFAER (2008-2011)

Participants: Sébastien Pillement, Manh Pham, Olivier Sentieys.

In various application domains, emerging requirements lead to the definition of new architectures for electronic embedded systems. In the automotive context, the investigated solutions correspond to networks of processing elements distributed in the vehicle. In this context, the research activity considered in the CIFAER (Flexible Intra-Vehicle Communications and Embedded Reconfigurable Architectures) project is the definition of an innovative embedded architecture, based on a general-purpose processor with reconfigurable processing areas and on the use of adaptable interfaces (radio and powerline communications). Efficient software layers in the associated operating system will be investigated to enable new services such as dynamic reconfiguration and task migration for error tolerance. CIFAER involves Irisa, IETR Rennes, Ireena Nantes, Atmel and Geensys. CAIRN will propose and develop the dynamically reconfigurable platform used as the test vehicle of the project. This platform will include fault-tolerant mechanisms for error mitigation.

## 7.11. ANR Architectures du Futur - FOSFOR (2008-2011)

Participants: Daniel Chillet, Sébastien Pillement, Manh Pham, Ludovic Devaux, Didier Demigny.

The Fosfor (Flexible Operating System FOr Reconfigurable platform) project aims at reconsidering the structure of the RTOS, which is generally implemented in software, centralized and static, by proposing a distributed RTOS with a homogeneous interface from the application point of view. We propose to exploit dynamic and partial reconfiguration of the reconfigurable SoC.
In this context, the tasks are statically or dynamically deployed (i.e. instantiated) on software units (general processors) or hardware units (reconfigurable areas). Flexibility of the OS will be achieved thanks to virtualization mechanisms of OS services, such that the tasks of the application are executed and communicate without prior knowledge of their assignment to software or hardware. FOSFOR involves Irisa, LEAT Nice, ETIS Cergy, Xilinx and Thales. CAIRN will propose and include in the FOSFOR OS a flexible communication infrastructure and its control management. # 8. Partnerships and Cooperations ## 8.1. Regional Initiatives Organisation by A. Tisserand of working group on cryptography and digital security between research teams from University of Brest, University of Lorient and University of Rennes. The Grappas project, funded by the *Equipe Projet Transversale* program from Université Européenne de Bretagne (UEB) aims at evaluating (and improving) the efficiency of automatic parallelization techniques for accelerating electromagnetic FDTD simulations of antennas on GPUs (Graphical Processing Units). The project is a joint project between IETR (D. Thouroude and R. Sauleau) and IRISA (S. Derrien). ## 8.2. National Initiatives The CAIRN team has currently some collaboration with the following laboratories: CEA List, SATIE ENS Cachan, LEAT Nice, Lab-Sticc (Lorient, Brest), LIRMM (Montpellier, Perpignan), ETIS Cergy, LIP6 Paris, IETR Rennes, Ireena Nantes; and with the following INRIA project-teams: Arénaire, Compsys, Swing, Symbiose, TexMex. The team participates in the activities of the following research organization of CNRS (GdR for in french "Groupe de Recherche"): - GdR SOC-SIP (System On Chip & System In Package), working groups on reconfigurable architectures, embedded software for SoC, low power issues. See <a href="http://www2.lirmm.fr/~w3mic/SOCSIP/index.php">http://www2.lirmm.fr/~w3mic/SOCSIP/index.php</a>. 
CAIRN is the leader of the group on reconfigurable architectures.
- GdR ISIS (Information Signal ImageS), working group on Algorithms Architectures Adequation.
- GdR ASR (Architectures Systèmes et Réseaux)
- GdR IM (Informatique Mathématiques), C2 working group on Codes and Cryptography

## 8.3. European Initiatives

#### 8.3.1. FP7 Projects

Program: FP7-ICT-2011-7
Project acronym: Flextiles
Duration: Oct. 2011 - Sep. 2014
Coordinator: Thales
Other partners: Thales (FR), UR1 (FR), KIT (GE), TU/e (NL), CSEM (SW), CEA LETI (FR), Sundance (UK)
Project title: Self Adaptive Heterogeneous Manycore Based on Flexible Tiles
Abstract: A major challenge in computing is to leverage multi-core technology to develop energy-efficient high performance systems. This is critical for embedded systems with a very limited energy budget as well as for supercomputers in terms of sustainability. Moreover, the efficient programming of multi-core architectures, as we move towards manycores with more than a thousand cores predicted by 2020, remains an unresolved issue. The FlexTiles project will define and develop an energy-efficient yet programmable heterogeneous manycore platform with self-adaptive capabilities. The manycore will be associated with an innovative virtualisation layer and a dedicated tool-flow to improve programming efficiency, reduce the impact on time to market and reduce the development cost by 20 to 50%. FlexTiles will raise the accessibility of the manycore technology to industry, from small SMEs to large companies, thanks to its programming efficiency and its ability to adapt to the targeted domain using embedded reconfigurable technologies.

Program: FP7-ICT-2011-7
Project acronym: Alma
Project title: Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb
Duration: Sep. 2011 - Aug. 2014
Coordinator: KIT
Other partners: KIT (GE), UR1 (FR), Recore Systems (NL), Univ.
of Peloponnese (GR), TEI-MES (GR), Intracom SA (GR), Fraunhofer (GE) Abstract: The mapping process of high performance embedded applications to today's multiprocessor system on chip devices suffers from a complex toolchain and programming process. The problem here is the expression of parallelism with a pure imperative programming language which is commonly C. This traditional approach limits the mapping, partitioning and the generation of optimized parallel code, and consequently the achievable performance and power consumption of applications from different domains. The Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb (ALMA) project aims to bridge these hurdles through the introduction and exploitation of a Scilab-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from high level of abstraction. This holistic solution of the toolchain allows the complexity of both the application and the architecture to be hidden, which leads to a better acceptance, reduced development cost and shorter time-to-market. Driven by the technology restrictions in chip design, the end of Moore's law and an unavoidable increasing request of computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies. ALMA helps to strengthen the position of the EU in the world market of multiprocessor targeted software toolchains. The challenging research will be achieved by the unique ALMA consortium which brings together industry and academia. High class partners from industry such as Recore and Intracom, will contribute their expertise in reconfigurable hardware technology for multi-core systems-on-chip, software development tools and real world applications. The academic partners will contribute their outstanding expertise in reconfigurable computing and compilation tools development. #### 8.3.2. 
Collaborations in European Programs, except FP7

Program: ITEA2
Project acronym: GEODES
Project title: Global Energy Optimization for Distributed Embedded Systems
Duration: Sep. 2008 - Aug. 2011
Coordinator: Thales
Other partners: Thales (FR, IT, NL), Sensaris (FR), CNRS (LEAT and IRISA) (FR), CETMEF/MARTEC (FR), Infineon (AU), Thomson (FR), TUV (AU), UAQ (IT), Philips (NL), Organo (AU), TI-WMC (NL)
Abstract: The GEODES project will provide the design techniques, embedded software and accompanying tools needed to face the challenge of allowing long power-autonomy of feature-rich and connected embedded systems, which are becoming pervasive and whose usage is significantly rising. It approaches this challenge by considering all system levels, and notably emphasizes the distributed system view. GEODES is an ITEA2 project which involves partners from France, Austria, Italy and the Netherlands. In GEODES, CAIRN will provide partners with the PowWow very-low-power sensor platform, including reconfigurable hardware accelerators. CAIRN will also contribute link and MAC layer strategies to a global optimization of energy, and will define and optimize advanced signal processing, error detection and correction, and medium access control (MAC) techniques in order to reduce the transmit power as well as useless listening on the communication medium. In particular, the case of cooperative strategies like cooperative MIMO or relaying techniques will be investigated.

#### 8.3.3.
Major European Organizations with which Cairn has followed Collaborations Imec (Belgium) Scenario-based fixed-point data format refinement to enable energy-scalable of Software Defined Radios (SDR) University of Erlangen-Nuremberg and Dresden University of Technology (Germany) Massively parallel embedded reconfigurable architectures and on dynamic reconfiguration optimisation in the mesh fabric University of Paderborn (Germany) Spatio-temporal scheduling for reconfigurable systems Lund University (Sweden) Constraints programming approach application in the reconfigurable data-paths synthesis flow Computer Vision and Robotic Group of the Institute for Informatics and Applications at the University of Girona (Spain) Parallel architectures for vision algorithms applied to underwater robot University of Eindhoven (Netherlands) Reconfigurable data-path synthesis University of Leiden (Netherlands) Parallel architecture synthesis Code and Cryptography group of University College Cork (Ireland) Arithmetic operators for cryptography Ecole Polytechnique Fédérale de Lausanne - EPFL (Switzerland) Optimization of systems using fixed-point arithmetic Technical University of Madrid - UPM (Spain) Optimization of systems using fixed-point arithmetic Technical University of Tampere, University of Oulu (Finland) Reconfigurable Video Coding Thomas Chabrier spent four months in the group of Prof. William P. Marnane at University College Cork, Ireland, from June. #### 8.4. International Initiatives #### 8.4.1. INRIA Associate Teams 8.4.1.1. 
LRS: Loop unRolling Stones

Title: Loop unRolling Stones: compiling in the polyhedral model
INRIA principal investigator: Steven Derrien
International Partner:
Institution: Colorado State University (United States)
Laboratory: Mélange Group
Duration: 2010 - 2012
Abstract: The goal of the team is twofold: (i) Propose new methodologies and algorithms to tackle some of the open problems in automatic parallelization and high-level hardware synthesis from nested loop specifications. In particular, we would like to address the problem of parallelization of complex bioinformatics algorithms based on sophisticated dynamic programming algorithms, for which we would like to propose efficient parallelization schemes for both FPGAs (Field Programmable Gate Arrays) and GPUs (Graphical Processing Units). (ii) Provide a common open software infrastructure based on (modern/cutting edge) software engineering techniques (Model Driven Software Development) so as to help researchers prototype new ideas and concepts in the domain of optimizing compilers. Our goal is to make our in-house software completely interoperable. As far as the second point is concerned, the CAIRN group at IRISA already has a strong commitment to using Model Driven Software Design techniques and has set up a very fruitful collaboration with the Triskell EPI in Rennes. This is not yet the case for the Mélange group; however, we expect to leverage another Associate Team (the MoCAa EA), which also involves groups from CSU (Software Insurance Lab) and IRISA (Triskell group), to strengthen the connections on the CSU side.

### 8.4.2.
INRIA International Partners

- Los Alamos National Laboratory (USA): Reconfigurable architectures for scientific processing
- LRTS laboratory, Laval University in Québec (Canada): Architectures for MIMO systems, Wireless Sensor Networks, INRIA Associate Team (2006-2008)
- LSSI laboratory, Québec University in Trois-Rivières (Canada): Design of architectures for digital filters and mobile communications
- Computer Science Department, Colorado State University in Fort-Collins (USA): Loop parallelization, development of high-level synthesis tools, INRIA Associate Team (2010-2012)
- University of Adelaide (Australia): Arithmetic operators
- University of Queensland (Australia): Reconfigurable architectures for scientific processing
- University of California, Riverside (USA): Optimized image processing applications synthesis
- VLSI CAD lab, Electrical and Computer Engineering Department, University of Massachusetts at Amherst (USA): CAD tools for arithmetic datapath synthesis and optimization
- University of Douala, University of Yaoundé and University of Dschang (Cameroon): Models and tools for parallelization, SARIMA GIS for the development of research laboratories in Mathematics and Computer Science in Africa
- ENIT, Univ. Tunis (Tunisia): Architectures for mobile communications

Steven Derrien spent two months in the group of Professor Sanjay Rajopadhye at Colorado State University, US, in May and June.

### 8.5. Exterior research visitors

- Prof. Gabriel Caffarena (University CEU-San Pablo, Madrid) for one week in June.
- PhD Student Nabil Ghanmy (University of Sfax, Tunisia) for one month in September.
- Prof. Sébastien Roy for one month and a half in June.
- Dr. Nicolas Veyrat-Charvillon (Crypto Group from the *Université Catholique de Louvain*, Belgium) for 4 days in May-June.
- PhD Student Tomofumi Yuki (Colorado State University, USA) for two months in November and December.

## 9. Dissemination

## 9.1. Animation of the Scientific Community

- O.
Berder was a member of the Program Committee of IEEE International Workshop on Cross Layer Design (IWCLD).
- F. Charot was a member of the Program Committee of IEEE ASAP Conference.
- F. Charot, O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on embedded systems architectures and associated design tools (ARCHI).
- D. Chillet was Program Co-Chair of the Conference on Design and Architectures for Signal and Image Processing (DASIP) in Tampere, Finland, Nov. 2011. He will be the General Co-Chair of DASIP 2012 in Karlsruhe, Germany.
- D. Chillet organized and managed the special session on low-power estimations and optimizations at the Patmos 2011 conference in Madrid, September 2011.
- D. Menard and O. Sentieys were guest editors of Eurasip Journal of Advances in Signal Processing, special issue on Quantization of VLSI Digital Signal Processing Systems.
- S. Pillement was a member of the Program Committee of IEEE FPL, SPL, DTIS and ERSA.
- S. Pillement and E. Casseau were members of the Program Committee of DASIP.
- P. Quinton is a member of the steering committee of the System Architecture MOdelling and Simulation (SAMOS) workshop and a member of the scientific committee of ASAP.
- O. Sentieys was a member of the technical program committee of the following conferences: IEEE/ACM DATE, IEEE ISQED, IEEE VTC, IEEE DDECS, AFRICON, WUPS, FTFC. He was Track Chair at NEWCAS. He is on the editorial board of Journal of Low Power Electronics, American Scientific Publishers, and of ISRN Sensor Networks.
- O. Sentieys is a member of the steering committee of the GDR SOC-SIP. He is the chair of the IEEE Circuits and Systems (CAS) French Chapter. In 2011, he was an expert for some scientific organizations (ANR INS, ANR blanc). He is a member of Allistene working group.
- O. Sentieys and A.
Tisserand are members of the steering committee of a CNRS spring school for graduate students on low-power design (ECOFAC).
- A. Tisserand co-organized the ARCHI 2011 école thématique CNRS architectures des systèmes matériels enfouis et méthodes de conception associées in Mont-Louis, May 2–6, 2011. Details on <a href="http://archi11.univ-perp.fr/">http://archi11.univ-perp.fr/</a>.
- A. Tisserand was a member of the technical program committee of the following conferences: IEEE ASAP 2011, Reconfig 2011, DASIP 2011, NEWCAS 2011 and SympA 2011. He is a member of the editorial board of the International Journal of High Performance Systems Architecture, Inderscience.
- C. Wolinski was a member of the technical committee of the following conferences: IEEE/ACM DATE, IEEE FPL, IEEE ASAP, Euromicro DSD, IEEE ISQED, SPL. He is a member of the Board of Directors of the Euromicro Society and an Advisor of the Métivier Foundation, France.

## 9.2. Seminars and Invitations

- O. Sentieys gave an invited talk at WPMC in November on Low-Energy Wireless Sensor Networks.
- O. Sentieys gave a tutorial on PowWow at WPMC in November.
- O. Sentieys gave an invited talk at GDR SoC-SiP in May on System-Level Synthesis of Ultra Low-Power WSN Node Controllers.
- O. Sentieys gave a tutorial at WUPS in January on Energy efficient techniques for lower layers of WSNs.
- A. Tisserand gave an invited talk at the French *journée sécurité numérique GDR SoC-SiP* in November on *hardware arithmetic operators for elliptic curve cryptography*.
- A. Tisserand gave an invited talk at the Claude Shannon Institute Workshop on Coding & Cryptography in May 2011, Cork, Ireland on *Circuits for True Random Number Generation with On-Line Quality Monitoring*.
- A. Tisserand gave an invited lecture at the CNRS ARCHI 2011 spring school on *Hardware Arithmetic Operators*.
- A. Tisserand gave a popularization talk at Fête de la science 2011 in Lannion on random number generation.

## 9.3. Teaching and Responsibilities

#### 9.3.1.
Teaching Responsibilities There is a strong teaching activity in the CAIRN team since most of the permanent members are Professors or Associate Professors. - C. Wolinski is the Director of ESIR since March 2011. - P. Quinton is the deputy-director of Ecole Normale Supérieure de Cachan, responsible of the Brittany branch of this school. - E. Casseau is the Director of Academic Studies of ENSSAT. - D. Chillet is the Head of the Electronics Engineering department of ENSSAT. - O. Sentieys is responsible of the "Embedded Systems" branch of the SISEA Master of Research (M2R). ENSSAT stands for "École Nationale Supérieure des Sciences Appliquées et de Technologie" and is an "École d'Ingénieurs" of the University of Rennes 1, located in Lannion. ISTIC is the Electrical Engineering and Computer Science Department of the University of Rennes 1. ESIR (formerly DIIC) stands for "École supérieure d'ingénieur de Rennes" and is an "École d'Ingénieurs" of the University of Rennes 1, located in Rennes. M2R stands for Master by Research, second year. - D. Chillet is member of the French National University Council since 2009 in signal processing and electronics (Conseil National des Universités en 61ème section). - D. Chillet is member of the Permanent Committee of the French National University Council since november 2011 in signal processing and electronics (Commission Permanente du Conseil National des Universités en 61ème section). ## 9.3.2. Teaching - O. Berder: introduction to signal processing, 38h, ENSSAT(L3) - O. Berder: microprocessors and digital systems, 19h, ENSSAT(L3) - O. Berder: wireless communications, 23h, ENSSAT(M2) - O. Berder: digital signal processing, 60h, ENSSAT(M1) - O. Berder: ad hoc networks, 58h, ENSSAT(M1-M2) - O. Berder: signal processing, 24h, IUT Lannion (L2) - E. Casseau: verification, 12h, Master by Research and ENSSAT(M2) - E. Casseau: high-level synthesis, 3h, ENSSAT(M2) - E. Casseau: hardware description language, 24h, ENSSAT(M1) - E. 
Casseau: design methodology of real-time systems, 24h, ENSSAT(M1)
- E. Casseau: verification, 4h, ENSEIRB (M1)
- E. Casseau: signal processing, 16h, ENSSAT(L3)
- F. Charot: specification of applications with the signal synchronous language, 24h, ESIR(M1)
- F. Charot: virtual prototyping of multiprocessor system-on-chip, 48h, ESIR(M1, M2)
- D. Chillet: advanced processors architectures, 24h, Master by Research and ENSSAT(M2)
- D. Chillet: low-power digital CMOS circuits, 6h, Telecom Bretagne and University of Occidental Brittany (UBO) (M2)
- D. Menard: embedded software for signal processing, 14h, Master by Research and ENSSAT(M2)
- D. Menard: embedded systems, 18h, ENSSAT(M1)
- D. Menard: digital signal processors, 20h, ENSSAT(M1)
- D. Menard: digital systems, 38h, ENSSAT(L3)
- D. Menard: embedded processors, 40h, ENSSAT(L3)
- S. Pillement: embedded microprocessors, 28h, ENSSAT(M2)
- S. Pillement: initiation to electronic system integration, 12h, ENSSAT(L3)
- S. Pillement: computer architecture, 61h, IUT Lannion (L2)
- S. Pillement: data acquisition, 21h, IUT Lannion (L2)
- S. Pillement: micro-controller programming, 30h, IUT Lannion (L1)
- S. Pillement: digital electronics, 43h, IUT Lannion (L1)
- R. Rocher: electricity, 16h, IUT Lannion (L1)
- R. Rocher: electronics, 56h, IUT Lannion (L1)
- R. Rocher: telecommunications, 94h, IUT Lannion (L1)
- R. Rocher: signal processing, 12h, IUT Lannion (L2)
- R. Rocher: digital communications, 56h, IUT Lannion (L2)
- O. Sentieys: methodologies for system-on-chip design, 6h, Master by Research and ENSSAT(M2)
- O. Sentieys: VLSI integrated circuit design, 66h, ENSSAT(M1)
- O. Sentieys: high-level synthesis of digital signal processors, 16h, Master by Research and ENSSAT(M2)
- A. Tisserand: GPU programming, 6h, ENSSAT(M2)
- A. Tisserand: hardware computer arithmetic operators, 6h, Master by Research, Univ. Rennes 1 (M2)
- A.
Tisserand: computer arithmetic, 12h, ENS Cachan, Antenne de Bretagne, *Magister* Computer Science and Telecommunications (L3) - C. Wolinski: architecture 1, 64h, ESIR(L3) - C. Wolinski: architecture 2, 28h, ESIR(L3) - C. Wolinski: design of Embedded Systems, 48h, ESIR(M1) - C. Wolinski: signal, image, architecture, 26h, ESIR(M1) - C. Wolinski: programmable architectures, 10h, ESIR(M1) - C. Wolinski: component and system synthesis, 10h, Master by Research (MRI ISTIC) (M2) #### 9.3.3. HDR and PhD HDR: Daniel Menard, Contributions à la conception de systèmes en virgule fixe, Habilitation à Diriger des Recherches, University of Rennes 1, Nov. 2011. HDR: Steven Derrien, Contributions à la conception d'architectures matérielles dédiées, Habilitation à Diriger des Recherches, University of Rennes 1, Dec. 2011. PhD: Erwan Raffin, Déploiement d'applications multimédia sur architecture reconfigurable à gros grain : modélisation avec la programmation par contraintes, University of Rennes 1, Jul. 2011, C. Wolinski, F. Charot. PhD: Ludovic Devaux, Réseaux d'interconnexion flexibles pour architectures reconfigurables dynamiquement, University of Rennes 1, Nov. 2011, S. Pillement, D. Demigny. PhD: Hai-Nam Nguyen, Optimisation de la précision de calcul pour la réduction d'énergie des systèmes embarqués, University of Rennes 1, Dec. 2011, D. Menard, O. Sentieys. PhD in progress: Naeem Abbas, Flexible Hardware Accelerators for Biocomputing Applications, Jan. 2009, P. Quinton, S. Derrien PhD in progress: Mahtab Alam, Power Aware Signal Processing for Reconfigurable Radios in the context of Wireless Sensor Networks, Nov. 2009, O. Sentieys, O. Berder, D. Menard PhD in progress: Andrei Banciu, New Digital Design Methodology for Multi Giga bits/s Tranceivers, Oct. 2008, E. Casseau, D. Menard PhD in progress: Karim Bigou, RNS Hardware Units for ECC, Oct. 2011, A. Tisserand PhD in progress: Robin Bonamy, Power Consumption Modelling and Optimisation for Reconfigurable Platform, Oct. 
2009, D. Chillet
PhD in progress: Franck Bucheron, Secure Virtualization for Embedded Systems, Oct. 2011, A. Tisserand
PhD in progress: Thomas Chabrier, Reconfigurable Arithmetic Units for Cryptoprocessors with Protection against Side Channel Attacks, Oct. 2009, A. Tisserand, E. Casseau
PhD in progress: Aymen Chakhari, Analytical approach for decision errors in fixed-point digital communication systems, Oct. 2010, R. Rocher, P. Scalart
PhD in progress: Antoine Eiche, Real-Time Scheduling for Heterogeneous and Reconfigurable Architectures using Neural Network Structures, Oct. 2009, D. Chillet, S. Pillement
PhD in progress: Antoine Floch, Pattern Recognition for Processor Instruction-Set Extension, Jan. 2009, C. Wolinski, F. Charot
PhD in progress: Clément Guy, Generic Definition of Domain Specific Analysis using MDE, Oct. 2010, S. Derrien, jointly with J.M. Jezequel and B. Combemale from Triskell EPI
PhD in progress: Trong-Nhan Le, Global power management system for self-powered autonomous wireless sensor nodes, Jan. 2011, O. Sentieys, O. Berder
PhD in progress: Antoine Morvan, Loop Transformations for Design Space Exploration in High-Level Synthesis, Oct. 2009, P. Quinton, S. Derrien
PhD in progress: Jean-Charles Naud, Source-to-Source Code Transformation for Fixed-Point Conversion, Oct. 2009, D. Menard
PhD in progress: Quoc-Tuong Ngo, Optimization of Precoding Strategies for Multi-User MIMO-OFDM Systems, Oct. 2008, P. Scalart, O. Berder
PhD in progress: Cécile Beaumin, Reconfigurable Architecture for High-Performance Video Transcoding, Oct. 2008, O. Sentieys, E. Casseau
PhD in progress: Karthick Parashar, System-level Approach for Implementation and Optimization of Signal Processing Applications into Fixed-Point Architectures, Oct. 2008, O. Sentieys, D. Menard
PhD in progress: Danuta Pamula, Arithmetic Operators for Cryptography, Oct. 2009, A. Tisserand
PhD in progress: Matthieu Texier, Low-Power Embedded Multi-Core Architectures for Mobile Systems, Oct. 2009, O.
Sentieys, jointly with R. David from CEA List
PhD in progress: Michel Theriault, Transmit Beam-forming for Distributed Wireless Access with Centralized Signal Processing, Oct. 2007, O. Sentieys, jointly with S. Roy from U. Laval, Canada
PhD in progress: Vivek Dwarakanath-Tovinakere, Ultra-Low Power Reconfigurable Controllers for Wireless Sensor Networks, Oct. 2009, O. Sentieys
PhD in progress: Le Quang Vinh Tran, Energy Optimisation of Cooperative Transmissions for Wireless Sensor Networks, Oct. 2009, O. Berder, O. Sentieys
PhD in progress: Ganda-Stéphane Ouedraogo, Automatic synthesis of hardware accelerator from high-level specifications in flexible radios, Oct. 2011, M. Gautier, O. Sentieys
PhD in progress: Pramod Udupa, Sampling, synchronising, digital processing and FPGA implementation of 100Gbps optical OFDM signals, Jan. 2011, O. Sentieys
PhD in progress: Chenglong Xiao, Pattern-Based Guided High-Level Synthesis, Oct. 2009, E. Casseau
PhD in progress: Hervé Yviquel, Video coding design framework based on SoC-based platforms, Oct. 2010, E. Casseau

# 10. Bibliography

# Major publications by the team in recent years

- [1] L. COLLIN, O. BERDER, P. ROSTAING, G. BUREL. *Optimal Minimum Distance Based Precoder for MIMO Spatial Multiplexing Systems*, in "IEEE Transactions on Signal Processing", March 2004, vol. 52, n<sup>o</sup> 3.
- [2] A. COURTAY, O. SENTIEYS, J. LAURENT, N. JULIEN. *High-level Interconnect Delay and Power Estimation*, in "Journal of Low Power Electronics (JOLPE)", 2008, vol. 4, n<sup>o</sup> 1, p. 21-33.
- [3] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20.
- [4] S. DERRIEN, P. QUINTON. *Parallelizing HMMER for Hardware Acceleration on FPGAs*, in "18th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2007)", Montreal, Canada, July 2007, p. 10–18, Best Paper Award.
- [5] L. IMBERT, A. PEIRERA, A. TISSERAND. A Library for Prototyping the Computer Arithmetic Level in Elliptic Curve Cryptography, in "Proc. Advanced Signal Processing Algorithms, Architectures and Implementations XVII", San Diego, California, U.S.A., F. T. LUK (editor), SPIE, August 2007, vol. 6697, n<sup>o</sup> 66970N, p. 1–9, http://dx.doi.org/10.1117/12.733652. - [6] K. KUCHCINSKI, C. WOLINSKI. Global Approach to Scheduling Complex Behaviors based on Hierarchical Conditional Dependency Graphs and Constraint Programming, in "Journal of Systems Architecture", December 2003, vol. 49, n<sup>o</sup> 12-15. - [7] D. MENARD, D. CHILLET, O. SENTIEYS. *Floating-to-fixed-point Conversion for Digital Signal Processors*, in "EURASIP Journal on Applied Signal Processing (JASP), Special Issue Design Methods for DSP Systems", 2006, vol. 2006, n<sup>o</sup> 1, p. 1–15. - [8] D. MENARD, O. SENTIEYS. *Automatic Evaluation of the Accuracy of Fixed-point Algorithms*, in "IEEE/ACM Design, Automation and Test in Europe (DATE-02)", Paris, March 2002. - [9] S. PILLEMENT, O. SENTIEYS, R. DAVID. DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency, in "EURASIP Journal on Embedded Systems (JES)", 2008, p. 1-13, Article ID 562326, 13 pages. - [10] C. PLAPOUS, C. MARRO, P. SCALART. *Improved signal-to-noise ratio estimation for speech enhancement*, in "IEEE Transactions on Speech and Audio Processing", 2006, vol. 14, n<sup>o</sup> 6. [11] A. TISSERAND. *High-Performance Hardware Operators for Polynomial Evaluation*, in "Int. J. High Performance Systems Architecture", March 2007, vol. 1, n<sup>o</sup> 1, p. 14–23, invited paper, http://dx.doi.org/10.1504/IJHPSA.2007.013288. [12] C. WOLINSKI, M. GOKHALE, K. MCCABE. *Polymorphous fabric-based systems: Model, tools, applications*, in "Journal of Systems Architecture", September 2003, vol. 49, n<sup>o</sup> 4-6. # Publications of the year #### **Doctoral Dissertations and Habilitation Theses** - [13] S. DERRIEN. 
*Contributions à la conception d'architectures matérielles dédiées*, University of Rennes 1, December 2011, Habilitation à Diriger des Recherches. - [14] L. DEVAUX. Réseaux d'interconnexion flexibles pour architectures reconfigurables dynamiquement, University of Rennes 1, November 2011. - [15] D. MENARD. *Contribution à la conception de systèmes en virgule fixe*, University of Rennes 1, November 2011, Habilitation à Diriger des Recherches. - [16] H.-N. NGUYEN. Optimisation de la précision de calcul pour la réduction d'énergie des systèmes embarqués, University of Rennes 1, December 2011. - [17] E. RAFFIN. Déploiement d'applications multimédia sur architecture reconfigurable à gros grain : modélisation avec la programmation par contraintes, University of Rennes 1, July 2011, http://tel.archives-ouvertes.fr/docs/00/64/23/30/PDF/These\_Erwan\_Raffin.pdf. ## **Articles in International Peer-Reviewed Journal** - [18] M. ALAM, O. BERDER, D. MENARD, T. ANGER, O. SENTIEYS. *A Hybrid Model for Accurate Energy Analysis of WSN nodes*, in "EURASIP Journal on Embedded Systems", January 2011, vol. 2011, n<sup>o</sup> Article ID 307079, p. 4:1–4:16, http://dx.doi.org/10.1155/2011/307079. - [19] D. BLOUIN, D. CHILLET, E. SENN, S. BILAVARN, R. BONAMY, C. SAMOYEAU. *AADL Extension to Model Classical FPGA and FPGA Embedded within a SoC*, in "International Journal of Reconfigurable Computing", 2011, vol. 2011, n<sup>o</sup> Article ID 425401, 15, http://dx.doi.org/10.1155/2011/425401. - [20] G. CAFFARENA, O. SENTIEYS, D. MENARD, J. A. LÓPEZ, D. NOVO. *Editorial: Quantization of VLSI Digital Signal Processing Systems*, in "EURASIP Journal on Advances in Signal Processing", 2011, vol. 2011, 2. - [21] G. CAFFARENA, O. SENTIEYS, D. MENARD, J. A. LÓPEZ, D. NOVO. *Editors of the Special Issue on Quantization of VLSI Digital Signal Processing Systems*, in "EURASIP Journal on Advances in Signal Processing", 2011, vol. 2011. - [22] E. CASSEAU, B. LE GAL. 
*Design of multi-mode application-specific cores based on high-level synthesis*, in "Integration, the VLSI Journal", January 2012, vol. 45, n<sup>o</sup> 1, p. 9-21 [*DOI*: 10.1016/J.VLSI.2011.07.003], http://hal.inria.fr/hal-00631007/en. - [23] D. CHILLET, A. EICHE, S. PILLEMENT, O. SENTIEYS. *Real-time scheduling on heterogeneous system-on-chip architectures using an optimised artificial neural network*, in "Journal of Systems Architecture Embedded Systems Design", April 2011, vol. 57, n<sup>o</sup> 4, p. 340-353, http://dx.doi.org/10.1016/j.sysarc.2011.01.004. - [24] B. LE GAL, E. CASSEAU. *Latency-Sensitive High-Level Synthesis for Multiple Word-Length DSP Design*, in "EURASIP Journal on Advances in Signal Processing", January 2011, vol. 2011, 11 [DOI: 10.1155/2011/927670], http://hal.inria.fr/hal-00631012/en. - [25] B. LE GAL, E. CASSEAU. Word-Length Aware DSP Hardware Design Flow Based on High-Level Synthesis, in "Journal of Signal Processing Systems", March 2011, vol. 62, p. 341–357 [DOI: 10.1007/S11265-010-0467-8], http://hal.archives-ouvertes.fr/hal-00554228/fr/. - [26] Q.-T. NGO, O. BERDER, P. SCALART. *Minimum Euclidean distance based precoders for MIMO systems using rectangular QAM modulations*, in "IEEE Transactions on Signal Processing", March 2011, http://dx.doi.org/10.1109/TSP.2011.2177972. - [27] T.-D. NGUYEN, O. BERDER, O. SENTIEYS. *Energy-Efficient Cooperative Techniques for Infrastructure-to-Vehicle Communications*, in "IEEE Transactions on Intelligent Transportation Systems", September 2011, vol. 12, no 3, p. 659 -668, http://dx.doi.org/10.1109/TITS.2011.2118754. - [28] A. PASHA, S. DERRIEN, O. SENTIEYS. System Level Synthesis for Wireless Sensor Node Controllers: A Complete Design Flow, in "ACM Transactions on Design Automation of Electronic Systems (TODAES)", 2011. - [29] E. RAFFIN, C. WOLINSKI, F. CHAROT, K. KUCHCINSKI, S. GUYETANT, S. CHEVOBBE, E. CASSEAU. 
*Scheduling, Binding and Routing System for a Run-Time Reconfigurable Operator Based Multimedia Architecture*, in "International Journal of Embedded and Real-Time Communication Systems (IJERTCS)", 2011.
- [30] V. D. TOVINAKERE, O. SENTIEYS, S. DERRIEN. *A Polynomial Based Approach to Wakeup Time and Energy Estimation in Power-Gated Logic Clusters*, in "Journal of Low Power Electronics (JOLPE)", December 2011, vol. 7, n<sup>o</sup> 4, p. 482-489.
- [31] R. ZHANG, J.-M. GORCE, O. BERDER, O. SENTIEYS. *Lower Bound of Energy-Latency Trade-off of Opportunistic Routing in Multi-hop Networks*, in "EURASIP Journal on Wireless Communications and Networking", 2011, vol. 2011, n<sup>o</sup> Article ID 265083, 17, http://www.hindawi.com/journals/wcn/2011/265083.html.

#### **Articles in National Peer-Reviewed Journal**

- [32] D. PAMULA, E. HRYNKIEWICZ, A. TISSERAND. *Analiza algorytmow mnozenia w ciele GF($2^m$)*, in "Pomiary, Automatyka, Kontrola (PAK)", 2011, vol. 57, n<sup>o</sup> 1, p. 58–60.
- [33] M. PHAM, S. PILLEMENT. *Reconfigurable ECU communications in AUTOSAR Environment*, in "Ingénieurs de l'Automobile", 2011, vol. 813.

#### **International Conferences with Proceedings**

- [34] D. ADROUCHE, R. SADOUN, S. PILLEMENT. *A design methodology for specification and performances evaluation of Network On Chip*, in "Proc. IEEE International Workshop on Reliability Aware System Design and Test", Chennai, India, January 2011, p. 65–70.
- [35] M. ALAM, O. BERDER, D. MENARD, O. SENTIEYS. *Accurate Energy Consumption Evaluation of Preamble Sampling MAC Protocols for WSN*, in "Proc. of the Workshop on Ultra-Low Power Sensor Networks (WUPS), co-located with Int. Conf. on Architecture of Computing Systems (ARCS)", Como, Italy, February 2011.
- [36] M. ALAM, O. BERDER, D. MENARD, O. SENTIEYS. *Traffic-Aware Adaptive Wake-Up-Interval for Preamble Sampling MAC Protocols of WSN*, in "Proc. of the International Workshop on Cross-Layer Design (IWCLD)", Rennes, France, December 2011.
- [37] M. M.
AZEEM, S. J. PIESTRAK, O. SENTIEYS, S. PILLEMENT. *Error recovery technique for coarse-grained reconfigurable architectures*, in "Proc. IEEE 14th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS 2011)", April 2011, p. 441-446, http://dx.doi.org/10.1109/DDECS.2011.5783133.
- [38] A. BANCIU, E. CASSEAU, D. MENARD, T. MICHEL. *Stochastic modeling for floating-point to fixed-point conversion*, in "Proc. 2011 IEEE Workshop on Signal Processing Systems (SiPS)", October 2011, p. 180-185, http://dx.doi.org/10.1109/SiPS.2011.6088971.
- [39] V. BASUPALLI, T. YUKI, S. V. RAJOPADHYE, A. MORVAN, S. DERRIEN, P. QUINTON, D. WONNACOTT. *ompVerify: Polyhedral Analysis for the OpenMP Programmer*, in "7th International Workshop on OpenMP, IWOMP", 2011, p. 37-53.
- [40] D. BLOUIN, E. SENN, R. BONAMY, D. CHILLET, S. BILAVARN, C. SAMOYEAU. *FPGA Modeling for SoC Design Exploration*, in "International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART)", London, May 2011.
- [41] R. BONAMY, D. CHILLET, S. BILAVARN, O. SENTIEYS. *Parallelism Level Impact on Energy Consumption in Reconfigurable Devices*, in "International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART)", London, May 2011.
- [42] R. BONAMY, D. CHILLET, S. BILAVARN, O. SENTIEYS. *Towards a power and energy efficient use of partial dynamic reconfiguration*, in "Proc. 6th Int. Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)", Montpellier, France, June 2011, p. 1–4, http://dx.doi.org/10.1109/ReCoSoC.2011.5981540.
- [43] A. CORNU, S. DERRIEN, D. LAVENIER. *HLS Tools for FPGA: faster development with better performances*, in "Proceedings of the 7th International Symposium on Applied Reconfigurable Computing", Belfast, United Kingdom, March 2011, vol. 6578, p. 67-78, http://hal.inria.fr/hal-00637830/en.
- [44] A. EICHE, D. CHILLET, S. PILLEMENT, O. SENTIEYS.
*Parallel Evaluation of Hopfield Neural Networks*, in "International Conference on Neural Computation Theory and Applications (NCTA)", Paris, France, October 2011.
- [45] A. FLOCH, T. YUKI, C. GUY, S. DERRIEN, B. COMBEMALE, S. RAJOPADHYE, R. B. FRANCE. *Model-Driven Engineering and Optimizing Compilers: A bridge too far ?*, in "ACM/IEEE 14th International Conference on Model Driven Engineering Languages and Systems (Models'11)", October 2011, p. 608-622, http://hal.inria.fr/inria-00613575/en.
- [46] M. HAMILTON, W. P. MARNANE, A. TISSERAND. *A Comparison on FPGA of Modular Multipliers Suitable for Elliptic Curve Cryptography over GF(p) for Specific p Values*, in "Proc. 21st International Conference on Field Programmable Logic and Applications (FPL)", Chania, Greece, IEEE, September 2011, p. 273-276, http://dx.doi.org/10.1109/FPL.2011.55.
- [47] D. MENARD, H.-N. NGUYEN, F. CHAROT, S. GUYETANT, J. GUILLOT, E. RAFFIN, E. CASSEAU. *Exploiting reconfigurable SWP operators for multimedia applications*, in "Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)", Prague, May 2011, p. 1717-1720 [DOI: 10.1109/ICASSP.2011.5946832], http://hal.inria.fr/inria-00567017/en.
- [48] A. MORVAN, S. DERRIEN, P. QUINTON. *Efficient Nested Loop Pipelining in High Level Synthesis using Polyhedral Bubble Insertion*, in "IEEE International Conference on Field-Programmable Technology (FPT'11)", New Delhi, India, December 2011, p. 1-10.
- [49] S. NARAYANAN, D. CHILLET, S. PILLEMENT, I. SOURDIS. *Hardware OS Communication Service and Dynamic Memory Management for RSoCs*, in "Proc. International Conference on ReConFigurable Computing and FPGAs (ReConFig)", Cancun, Mexico, November 2011.
- [50] S. NARAYANAN, L. DEVAUX, D. CHILLET, S. PILLEMENT, I. SOURDIS. *Communication service for hardware tasks executed on dynamic and partially reconfigurable substrate*, in "Proc.
19th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC)", Hong Kong, September 2011.
- [51] J.-C. NAUD, Q. MEUNIER, D. MENARD, O. SENTIEYS. *Fixed-point Accuracy Evaluation in the Context of Conditional Structures*, in "Proc. 19th European Signal Processing Conference (EUSIPCO)", Barcelona, Spain, September 2011.
- [52] Q.-T. NGO, O. BERDER, P. SCALART. *Neighbor-dmin Precoder for Three Data-Stream MIMO Systems*, in "19th European Signal Processing Conference (EUSIPCO)", Barcelona, Spain, August 2011, p. 81-85.
- [53] Q.-T. NGO, O. BERDER, P. SCALART. *Reducing the number of neighbors in the received constellation of dmin precoded MIMO systems*, in "Proc. of the IEEE Conference on Wireless Communications and Networking Conference (WCNC)", Cancun, Mexico, March 2011, p. 1635-1639, http://dx.doi.org/10.1109/WCNC.2011.5779380.
- [54] H.-N. NGUYEN, D. MENARD, O. SENTIEYS. *Novel Algorithms for Word-length Optimization*, in "Proc. 19th European Signal Processing Conference (EUSIPCO)", Barcelona, Spain, September 2011, http://hal.inria.fr/inria-00617718/en.
- [55] P. PATRONIK, K. BEREZOWSKI, S. J. PIESTRAK, J. BIERNAT, A. SHRIVASTAVA. *Fast and energy-efficient constant-coefficient FIR filters using residue number system*, in "Proc. of the 17th IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED'11)", 1-3 August 2011, p. 385-390, http://dx.doi.org/10.1109/ISLPED.2011.5993671.
- [56] M. PHAM, R. BONAMY, S. PILLEMENT, D. CHILLET. *Power-Aware Ultra-Rapid Reconfiguration Controller*, in "Proc. IEEE/ACM Design and Test in Europe Conference (DATE)", 2012.
- [57] M. PHAM, L. DEVAUX, S. PILLEMENT. *Re2DA: Reliable and Reconfigurable Dynamic Architectures*, in "Proc. 6th Int. Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)", June 2011, p. 1–6, http://dx.doi.org/10.1109/ReCoSoC.2011.5981519.
- [58] M. PHAM, S. PILLEMENT, S. LE NOURS, O. PASQUIER.
*A Framework for the Design of Reconfigurable Fault Tolerant Architectures*, in "Proc. Conference on Design and Architectures for Signal and Image Processing (DASIP)", Tampere, Finland, November 2011.
- [59] S. J. PIESTRAK. *Design of multi-residue generators using shared logic*, in "Proc. IEEE International Symposium on Circuits and Systems (ISCAS)", 15-18 May 2011, p. 1435-1438, http://dx.doi.org/10.1109/ISCAS.2011.5937843.
- [60] P. SCALART, L. LEPAULOUX. *Highlighting the influence of artifacts signals on the equilibrium state of the feedback structure*, in "Proc. 19th European Signal Processing Conference (EUSIPCO)", Barcelona, Spain, September 2011, http://hal.inria.fr/inria-00636259/en.
- [61] M. TEXIER, R. DAVID, K. B. CHEHIDA, O. SENTIEYS. *Graphic rendering Application Profiling on a Shared Memory MPSoC Architecture*, in "Proc. of the Conference on Design and Architectures for Signal and Image Processing (DASIP)", Tampere, Finland, November 2011.
- [62] M. TEXIER, E. PIRIOU, M. THEVENIN, R. DAVID. *Designing Processors Using MAsS, a Modular and Lightweight Instruction-level Exploration Tool*, in "Proc. of the Conference on Design and Architectures for Signal and Image Processing (DASIP)", Tampere, Finland, November 2011.
- [63] V. D. TOVINAKERE, O. SENTIEYS, S. DERRIEN. *Wakeup Time and Wakeup Energy Estimation in Power-Gated Logic Clusters*, in "Proc. of the 24th International Conference on VLSI Design", Chennai, India, January 2011, p. 340-345, http://dx.doi.org/10.1109/VLSID.2011.18.
- [64] L.-Q.-V. TRAN, O. BERDER, O. SENTIEYS. *Non-Regenerative Full Distributed Space-Time Codes in Cooperative Relaying Networks*, in "Proc. of the IEEE International Wireless Communications and Networking Conference (WCNC)", Cancun, Mexico, March 2011, p. 1529-1533, http://dx.doi.org/10.1109/WCNC.2011.5779357.
- [65] L.-Q.-V. TRAN, O. BERDER, O. SENTIEYS. *Spectral efficiency and energy efficiency of distributed space-time relaying models*, in "Proc.
of the IEEE Conference on Consumer Communications and Networking Conference (CCNC)", Las Vegas, US, January 2011, p. 1088-1092, http://dx.doi.org/10.1109/CCNC.2011.5766335.
- [66] C. XIAO, E. CASSEAU. *An efficient algorithm for custom instruction enumeration*, in "Proceedings of the 21st edition of the Great Lakes Symposium on VLSI", New York, NY, USA, GLSVLSI '11, ACM, 2011, p. 187–192, http://doi.acm.org/10.1145/1973009.1973047.
- [67] C. XIAO, E. CASSEAU. *Efficient custom instruction enumeration for extensible processors*, in "Proc. 2011 IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP)", September 2011, p. 211-214, http://dx.doi.org/10.1109/ASAP.2011.6043270.
- [68] C. XIAO, E. CASSEAU. *Efficient Maximal Convex Custom Instruction Enumeration for Extensible Processors*, in "Proc. of the Conference on Design and Architectures for Signal and Image Processing (DASIP)", Tampere, Finland, November 2011.
- [69] H. YVIQUEL, E. CASSEAU, M. WIPLIEZ, M. RAULET. *Efficient multicore scheduling of dataflow process networks*, in "Proc. 2011 IEEE Workshop on Signal Processing Systems (SiPS)", October 2011, p. 198-203, http://dx.doi.org/10.1109/SiPS.2011.6088974.

#### **National Conferences with Proceeding**

- [70] D. CHILLET, A. EICHE, S. PILLEMENT, O. SENTIEYS. *Exploitation du concept de tolérance aux fautes des réseaux de neurones pour la résolution de problèmes d'optimisation*, in "XXIIIe Colloque GRETSI Traitement du Signal et des Images", Bordeaux, France, September 2011.
- [71] A. FLOCH, F. CHAROT, S. DERRIEN, K. MARTIN, A. MORVAN, C. WOLINSKI. *Sélection d'instructions et ordonnancement parallèle simultanés pour la conception de processeurs spécialisés*, in "Symposium en Architecture de Machines (Sympa'14)", Saint-Malo, France, May 2011, http://hal.inria.fr/hal-00640999/en/.
- [72] C. GUY, S. DERRIEN, B. COMBEMALE, J.-M. JEZEQUEL.
*Vers un rapprochement de l'IDM et de la compilation*, in "Journées sur l'Ingénierie Dirigée par les Modèles", Lille, France, June 2011, http://hal.inria.fr/inria-00601670/en.
- [73] J.-C. NAUD, D. MENARD, Q. MEUNIER, O. SENTIEYS. *Evaluation de la précision en virgule fixe dans le cas des structures conditionnelles*, in "Symposium en Architecture de Machines (Sympa'14)", Saint-Malo, France, May 2011, http://hal.inria.fr/inria-00617720/en.
- [74] Q.-T. NGO, O. BERDER, P. SCALART. *Influence du nombre de symboles voisins sur les performances des systèmes MIMO précodés par le critère de la distance minimale*, in "XXIIIe Colloque GRETSI Traitement du Signal et des Images", Bordeaux, France, September 2011.
- [75] K. PARASHAR, O. SENTIEYS, D. MENARD. *Approche hiérarchique pour l'optimisation de la précision des systèmes de traitement du signal utilisant l'arithmétique virgule fixe*, in "XXIIIe Colloque GRETSI Traitement du Signal et des Images", Bordeaux, France, September 2011.
- [76] A. PASTUREL, A. EICHE, D. CHILLET, S. PILLEMENT, O. SENTIEYS. *Implémentation matérielle d'un réseau de neurones pour l'ordonnancement de tâches sur architectures multi-processeur hétérogènes*, in "Symposium en Architecture de Machines (Sympa'14)", Saint-Malo, France, May 2011.
- [77] M. PHAM, S. PILLEMENT, S. LE NOURS, O. PASQUIER. *Modélisation et implémentation de calculateurs reconfigurables tolérants aux fautes et communications flexibles intra-véhicules*, in "Symposium en Architecture de Machines (Sympa'14)", Saint-Malo, France, May 2011, p. 23–32.
- [78] L.-Q.-V. TRAN, O. BERDER, O. SENTIEYS. *Efficacités spectrale et énergétique des systèmes de relais*, in "XXIIIe Colloque GRETSI Traitement du Signal et des Images", Bordeaux, France, September 2011.

#### **Conferences without Proceedings**

- [79] D. CHILLET.
*JSimRisc: un outil pédagogique pour appréhender le fonctionnement pipeline et quelques techniques avancées mises en œuvre dans les processeurs récents*, in "Colloque sur l'Enseignement des Technologies et des Sciences de l'Information et des Systèmes (CETSIS)", Trois Rivières, Québec, October 2011.

#### **Scientific Books (or Scientific Book chapters)**

- [80] F. NOUVEL, P. TANGUY, S. PILLEMENT, M. PHAM. *Experiments of in-vehicle power line Communications*, in "Vehicular Technologies", M. ALMEIDA (editor), Intech, 2011, p. 255–278.

#### **Scientific Popularization**

- [81] A. TISSERAND. *Comment produire des nombres vraiment aléatoires?*, October 2011, Exposé, Fête de la Science, Lannion.

#### **Other Publications**

- [82] R. BONAMY, D. CHILLET, S. BILAVARN, O. SENTIEYS. *Towards a Power and energy Efficient Use of Partial Dynamic Reconfiguration*, in "GDR SoC-SiP", Lyon, France, June 2011.
- [83] T. CHABRIER, D. PAMULA, A. TISSERAND. *Hardware Random Recoding: Redundant Representations of Numbers, Side Channel Analysis, Elliptic Curve Cryptography*, April 2011, Journées Codage et Cryptographie du GDR IM.
- [84] A. CORNU, S. DERRIEN, D. LAVENIER. *How to accelerate genomic sequence alignment 4X using half an FPGA*, July 2011, http://www.eetimes.com/design/programmable-logic/4217568/How-to-accelerate-genomic-sequence-alignment-4X-using-half-an-FPGA?Ecosystem=programmable-logic.
- [85] F. NOUVEL, P. TANGUY, S. LE NOURS, S. PILLEMENT. *Architecture embarquée reconfigurable pour les communications intra-véhicule*, 2011, Séminaire du GDR SoC-SiP.
- [86] O. SENTIEYS. *System-Level Synthesis of Ultra Low-Power WSN Node Controllers*, May 2011, Séminaire du GDR SoC-SiP.
- [87] A. TISSERAND. *Circuits for True Random Number Generation with On-Line Quality Monitoring*, May 2011, Claude Shannon Institut Workshop on Coding and Cryptography.
- [88] A. TISSERAND. *Opérateurs arithmétiques matériels*, June 2011, Cours École Thématique ARCHI 2011, http://archi11.univ-perp.fr/.
# **References in notes**

- [89] A. AHMADINIA, C. BOBDA, M. BEDNARA, J. TEICH. *A new approach for on-line placement on reconfigurable devices*, in "18th International Parallel and Distributed Processing Symposium", 2004.
- [90] ZIGBEE ALLIANCE. *Zigbee specification*, ZigBee Alliance, 2005, n<sup>o</sup> ZigBee Document 053474r06, Version.
- [91] V. BAUMGARTE, G. EHLERS, F. MAY, A. NÜCKEL, M. VORBACH, M. WEINHARDT. *PACT XPP A Self-Reconfigurable Data Processing Architecture*, in "The Journal of Supercomputing", 2003, vol. 26, n<sup>o</sup> 2, p. 167–184.
- [92] C. BOBDA. *Introduction to Reconfigurable Computing: Architectures Algorithms and Applications*, Springer, 2007.
- [93] C. BOBDA, M. MAJER, D. KOCH, A. AHMADINIA, J. TEICH. *A Dynamic NoC Approach for Communication in Reconfigurable Devices*, in "Proceedings of International Conference on Field-Programmable Logic and Applications (FPL)", Antwerp, Belgium, Lecture Notes in Computer Science (LNCS), Springer, August 2004, vol. 3203, p. 1032–1036.
- [94] D. CHILLET, S. PILLEMENT, O. SENTIEYS. *A Neural Network Model for Real-Time Scheduling on Heterogeneous SoC Architectures*, in "IEEE International Joint Conference on Neural Networks, IJCNN'07", Orlando, FL, August 12-17, 2007.
- [95] K. COMPTON, S. HAUCK. *Reconfigurable computing: a survey of systems and software*, in "ACM Comput. Surv.", 2002, vol. 34, n<sup>o</sup> 2, p. 171–210, http://doi.acm.org/10.1145/508352.508353.
- [96] G. CONSTANTINIDES, P. CHEUNG, W. LUK. *Wordlength optimization for linear digital signal processing*, in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems", October 2003, vol. 22, n<sup>o</sup> 10, p. 1432-1442.
- [97] K. DANNE, R. MUHLENBERND, M. PLATZNER. *Executing hardware tasks on dynamically reconfigurable devices under real-time conditions*, in "International Conference on Field Programmable Logic and Applications", Lecture Notes in Computer Science, 2006.
- [98] R. DAVID, S. PILLEMENT, O. SENTIEYS.
*Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20.
- [99] A. DEJONGHE, B. BOUGARD, S. POLLIN, J. CRANINCKX, A. BOURDOUX, L. VAN DER PERRE, F. CATTHOOR. *Green Reconfigurable Radio Systems*, in "Signal Processing Magazine, IEEE", 2007, vol. 24, n<sup>o</sup> 3, p. 90–101.
- [100] A. DUNKELS, B. GRONVALL, T. VOIGT. *Contiki-a lightweight and flexible operating system for tiny networked sensors*, in "Proceedings of the First IEEE Workshop on Embedded Networked Sensors", 2004.
- [101] C. EBELING, D. CRONQUIST, P. FRANKLIN. *RaPiD Reconfigurable Pipelined Datapath*, in "International Workshop on Field Programmable Logic and Applications", Darmstadt, Lecture Notes in Computer Science 1142, September 1996, p. 126–135.
- [102] R. HARTENSTEIN. *A Decade of Reconfigurable Computing: A Visionary retrospective*, in "Design Automation and Test in Europe (DATE 01)", Munich, Germany, March 2001.
- [103] R. HARTENSTEIN, M. HERZ, T. HOFFMAN, U. NAGELDINGER. *Using The KressArray for Configurable Computing*, in "Configurable Computing: Technology and Applications, Proc. SPIE 3526", Bellingham, WA, November 1998, p. 150–161.
- [104] S. KIM, W. SUNG. *Word-Length Optimization for High Level Synthesis of Digital Signal Processing Systems*, in "IEEE Workshop on Signal Processing Systems", Boston, October 1998, p. 142-151.
- [105] K. KUM, J. KANG, W. SUNG. *AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors*, in "IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing", September 2000, vol. 47, n<sup>o</sup> 9, p. 840-848.
- [106] M. LEE, H. SINGH, G. LU, N. BAGHERZADEH, F. KURDAHI. *Design and Implementation of the MorphoSys Reconfigurable Computing Processor*, in "Journal of VLSI and Signal Processing-Systems for Signal, Image and Video Applications", March 2000, vol.
24, n<sup>o</sup> 2, p. 147–164.
- [107] T. MARESCAUX, V. NOLLET, J. MIGNOLET, A. BARTIC, W. MOFFAT, P. AVASARE, P. COENE, D. VERKEST, S. VERNALDE, R. LAUWEREINS. *Run-time support for heterogeneous multitasking on reconfigurable SoCs*, in "Integration, the VLSI Journal", 2004, vol. 38, p. 107–130, http://doi.acm.org/10.1145/996566.996637.
- [108] D. MENARD, D. CHILLET, F. CHAROT, O. SENTIEYS. *Automatic Floating-point to Fixed-point Conversion for DSP Code Generation*, in "International Conference on Compilers, Architectures and Synthesis for Embedded Systems 2002 (CASES 2002)", Grenoble, October 2002.
- [109] D. MENARD, D. CHILLET, O. SENTIEYS. *Floating-to-fixed-point Conversion for Digital Signal Processors*, in "EURASIP Journal on Applied Signal Processing (JASP), Special Issue Design Methods for DSP Systems", 2006, vol. 2006, n<sup>o</sup> 1.
- [110] T. MIYAMORI, K. OLUKOTUN. *REMARC: Reconfigurable Multimedia Array Coprocessor*, in "IEICE Transactions on Information and Systems E82-D", February 1999, p. 389–397.
- [111] W. NAJJAR, W. BOHM, B. DRAPER, J. HAMMES, R. RINKER, J. BEVERIDGE, M. CHAWATHE, C. ROSS. *High-Level Language Abstraction for Reconfigurable Computing*, in "Computer", 2003, vol. 36, n<sup>o</sup> 8, p. 63-69, http://doi.ieeecomputersociety.org/10.1109/MC.2003.1220583.
- [112] V. NOLLET, T. MARESCAUX, D. VERKEST, J. MIGNOLET, S. VERNALDE. *Operating-system controlled network on chip*, in "Proceedings of the 41st annual Conference on Design automation", 2004, p. 256–259, http://doi.acm.org/10.1145/996566.996637.
- [113] PHILIPS. *Silicon Hive*, Philips Inc., 2003, http://www.siliconhive.com.
- [114] J. RABAEY. *Reconfigurable Processing: The Solution to Low-Power Programmable DSP*, in "IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)", 1997, vol. 1, p. 275–278.
- [115] R. SALEH, S. WILTON, S. MIRABBASI, A. HU, M. GREENSTREET, G. LEMIEUX, P. PANDE, C. GRECU, A. IVANOV.
*System-on-chip: reuse and integration*, in "Proceedings of the IEEE", 2006, vol. 94, n<sup>o</sup> 6, p. 1050–1069.
- [116] T. TODMAN, G. CONSTANTINIDES, S. WILTON, O. MENCER, W. LUK, P. CHEUNG. *Reconfigurable computing: architectures and design methods*, in "IEE Proc.-Comput. Digit. Tech.", March 2005, vol. 152, n<sup>o</sup> 2.
- [117] G. VENKATARAMANI, W. NAJJAR, F. KURDAHI, N. BAGHERZADEH, W. BOHM, J. HAMMES. *Automatic compilation to a coarse-grained reconfigurable system-on-chip*, in "Trans. on Embedded Computing Systems", 2003, vol. 2, n<sup>o</sup> 4, p. 560–589, http://doi.acm.org/10.1145/950162.950167.
- [118] E. WAINGOLD, M. TAYLOR, D. SRIKRISHNA, V. SARKAR, W. LEE, V. LEE, J. KIM, M. FRANK, P. FINCH, R. BARUA, J. BABB, S. AMARASINGHE, A. AGARWAL. *Baring it all to software: The raw machine*, in "IEEE Computer", September 1997, vol. 30, n<sup>o</sup> 9, p. 86–93.
- [119] C. WOLINSKI, K. KUCHCINSKI, A. POSTOLA. *UPaK: Abstract Unified Pattern Based Synthesis Kernel for Hardware and Software Systems*, in "University Booth, DATE 2007", Nice, France, May 2007.
- [120] Z. A. YE, N. SHENOY, P. BANERJEE. *A C compiler for a processor with a reconfigurable functional unit*, in "Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field Programmable Gate-Arrays, FPGA '00", New York, NY, USA, ACM Press, 2000, p. 95–100, http://doi.acm.org/10.1145/329166.329187.