INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

# Project-Team Cairn

# Energy Efficient Computing Architectures

Rennes - Bretagne-Atlantique

Theme: Architecture and Compiling

## **Table of contents**

- 1. Team
- 2. Overall Objectives
  - 2.1. Overall Objectives
  - 2.2. Highlights
- 3. Scientific Foundations
  - 3.1. Panorama
  - 3.2. Dynamically and Heterogeneous Reconfigurable Platforms
  - 3.3. Compilation and Synthesis for Reconfigurable Platform
  - 3.4. Algorithm Architecture Interaction
- 4. Application Domains
  - 4.1. Panorama
  - 4.2. 4G Wireless Communication Systems
  - 4.3. Wireless Sensor Networks
  - 4.4. Automotive Systems
  - 4.5. Multimedia processing
- 5. Software
  - 5.1. Panorama
  - 5.2. Gecos
  - 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems
  - 5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems
  - 5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions
  - 5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)
  - 5.7. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip
  - 5.8. OCHRE: On-Chip Randomness Extraction
- 6. New Results
  - 6.1. Dynamically and Heterogeneous Reconfigurable Platforms
    - 6.1.1. New Reconfigurable Architectures
      - 6.1.1.1. High-level modeling of reconfigurable architectures
      - 6.1.1.2. Power models of reconfigurable architectures
      - 6.1.1.3. …
      - 6.1.1.4. …
    - 6.1.2. …
      - 6.1.2.1. …
      - 6.1.2.2. Arithmetic Operators for High-Performance Cryptography
      - 6.1.2.3. …
      - 6.1.2.4. …
    - 6.1.3. Optimization of Advanced Arithmetic Operators
    - 6.1.4. Management of Dynamically Reconfigurable Systems
      - 6.1.4.1. Spatio-Temporal Scheduling based on Artificial Neural Networks
      - 6.1.4.2. Flexible Communication Infrastructure
    - 6.1.5. …
    - 6.1.6. …
      - 6.1.6.1. Coding Techniques Improving Reliability and Power Consumption for On-Chip …
      - 6.1.6.2. Ultra Low-Power Architecture for Control-Oriented Applications in Wireless Sensor Nodes
    - 6.1.7. SoC Modeling and Prototyping on FPGA-based Systems
  - 6.2. Compilation and Synthesis for Reconfigurable Platform
    - 6.2.1. Optimized Synthesis of Processor Extensions in the *DURASE* System
    - 6.2.2. Run-time Reconfigurable Architecture Modeling
      - 6.2.2.1. Roma project
      - 6.2.2.2. RecMotifs project
      - 6.2.2.3. Floating-Point to Fixed-Point Conversion
  - 6.3. Algorithm Architecture Interaction
    - 6.3.1. Reconfigurable Video Coding
    - 6.3.2. Range Estimation and Computation Accuracy Optimization
      - 6.3.2.1. Range Estimation
      - 6.3.2.2. Performance Evaluation of Fixed-Point Systems
    - 6.3.3. Multi-Antenna Systems
    - 6.3.4. Cooperative Strategies for Low-Energy Wireless Networks
    - 6.3.5. Opportunistic Routing
    - 6.3.6. Speech enhancement and coding issues
    - 6.3.7. True Random Number Generators
    - 6.3.8. Flexible hardware accelerators for biocomputing applications
    - 6.3.9. Parallel reconfigurable architectures for LDPC decoding
- 7. Contracts and Grants with Industry
  - 7.1. GEODES (2008-2011)
  - 7.2. Nano 2012 Program - S2S4HLS (2008-2012)
  - 7.3. Nano 2012 Program - RecMotifs (2008-2012)
  - 7.4. Architectures du Futur - Open-People (2009-2011)
  - 7.5. BioWic (2009-2011)
  - 7.6. Architectures du Futur - CIFAER (2008-2011)
  - 7.7. Architectures du Futur - FOSFOR (2008-2011)
  - 7.8. Technologies Logicielles - SoCLib (2007-2010)
  - 7.9. Images et Réseaux - Transmedi@ (2008-2009)
  - 7.10. Images et Réseaux - RPS2 (2008-2010)
  - 7.11. Architectures du Futur - ROMA: Reconfigurable Operators for Multimedia Applications (2007-2010)
- 8. Other Grants and Activities
  - 8.1. National Initiatives
  - 8.2. … Initiatives
  - 8.3. International Initiatives
  - 8.4. … research visitors
- 9. …
  - 9.1. … Community Animation
  - 9.2. … Ph.D. Subjects
  - 9.3. … and Invitations
  - 9.4. … and Responsibilities
- 10. Bibliography

## 1. Team

#### **Research Scientists**

- François Charot [Research Associate (CR) Inria, Rennes]
- Steven Derrien [Associate professor, University of Rennes 1, IFSIC, on leave at Inria since Sept.
2009, Rennes]
- Olivier Sentieys [Team Leader, Professor, University of Rennes 1, ENSSAT, on leave (half time) at Inria, Lannion, HdR]
- Arnaud Tisserand [Research Associate (CR) CNRS, Lannion, HdR]

#### **Faculty Members**

- Olivier Berder [Associate professor, University of Rennes 1, ENSSAT, Lannion]
- Emmanuel Casseau [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Daniel Chillet [Associate professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Daniel Ménard [Associate professor, University of Rennes 1, ENSSAT, Lannion]
- Sébastien Pillement [Associate professor, University of Rennes 1, IUT, Lannion, HdR]
- Stanislaw Piestrak [Professor, on leave from University of Metz at Inria since Sept. 2008, Lannion, HdR]
- Patrice Quinton [Professor, Director of the Brittany branch of the ENS de Cachan, Rennes, HdR]
- Romuald Rocher [Associate Professor, University of Rennes 1, IUT, Lannion]
- Pascal Scalart [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Christophe Wolinski [Professor, University of Rennes 1, IFSIC, Rennes, HdR]

#### **Technical Staff**

- Charles Wagner [IR CNRS SED, Rennes]
- Arnaud Carer [100Gflex Project, Lannion]
- Romain Fontaine [Perecap Project since Oct. 2010, Lannion]
- Amit Kumar [Nano 2012 Project, Rennes]
- Maxime Naullet [IJD INRIA KerGekoz Project since Nov. 2010, Rennes]
- Jérémie Guidoux [Nano 2012 Project until Dec. 2010, Rennes]
- Jérôme Astier [Geodes Project until Aug. 2010, Lannion]
- Thomas Anger [Transmedi@ Project until Oct. 2010, Lannion]
- Loïc Cloatre [Nano2012 Project until Aug. 2010, Rennes]
- Florent Berthelot [RPS2 Project until Aug. 2010, Rennes]
- Renaud Santoro [POF Project with Orange Labs until Oct. 2010, Lannion]

#### **PhD Students**

- Michel Thériault [CSRNG Canada grant (co-supervision with Laval University, Québec), Lannion]
- Ludovic Devaux [University grant, Lannion]
- Antoine Eiche [University grant, Lannion]
- Quoc Tuong Ngo [University grant, Lannion]
- Cécile Beaumin-Palud [MENRT grant, Lannion]
- Andrei Banciu [CIFRE grant, STMicroelectronics, Grenoble]
- Karthick Parashar [Inria Cordi grant, Lannion]
- Antoine Floch [Inria grant, Rennes]
- Antoine Morvan [Inria grant, Rennes]
- Naeem Abbas [Inria grant, Rennes]
- Le Quang Vinh Tran [MENRT grant, Lannion]
- Chenglong Xiao [Inria grant, Lannion]
- Jean-Charles Naud [Inria grant, Lannion]
- Matthieu Texier [CEA grant, Saclay]
- Thomas Chabrier [University grant, Lannion]
- Danuta Pamula [Co-tutelle France-Poland, Lannion]
- Robin Bonamy [University grant, Lannion]
- Vivek Tovinakere-Dwarakanath [University grant, Lannion]
- Mahtab Alam [University grant, Lannion]
- Amine Didioui [CEA grant, Grenoble]
- Hervé Yviquel [MENRT grant, Lannion]
- Aymen Chakhari [University grant, Lannion]
- Nhan Le Trong [University grant, Lannion]
- Pramod P. Udupa [University grant, Lannion]
- Erwan Grace [CEA - University grant until Sept. 2010, Lannion]
- Hai-Nam Nguyen [University grant, Lannion]
- Erwan Raffin [CIFRE grant, Thomson, Rennes]
- Shafqat Khan [University grant until Sept. 2010, Lannion]
- Kevin Martin [INRIA grant until Aug. 2010, Rennes]
- Adeel Pasha [MENRT grant until Dec. 2010, Rennes]
- Manh Pham [Brittany Region - University grant until Dec. 2010, Lannion]

#### **Post-Doctoral Fellows**

- Ruifeng Zhang [GEODES Project since Apr. 2010, Lannion]
- Kevin Martin [ATER UR1 since Sept. 2010, Rennes]
- Jérémie Guillot [ROMA Project until Aug. 2010, Lannion]

#### **Administrative Assistants**

- Nadia Saint-Pierre [Assistant, University of Rennes 1, IRISA, Rennes]
- Joelle Thépault [Assistant, University of Rennes 1, Enssat, Lannion]

## 2. Overall Objectives

## 2.1. Overall Objectives

CAIRN is a common project with CNRS, University of Rennes 1 (ENSSAT Lannion and IFSIC Rennes) and ENS Cachan-Antenne de Bretagne, and is located on two sites: Rennes and Lannion. The team was created on January 1<sup>st</sup>, 2008 and is a "reconfiguration" of the former R2D2 research team from Irisa.

The scientific aim of CAIRN is to study hardware and software architectures of *Reconfigurable System-on-Chip* (RSOC), i.e. integrated chips which include reconfigurable blocks whose hardware configuration may be changed before or even during execution.

Reconfigurable systems have been considered by research in computer science and electrical engineering for about twenty years [89], [96] thanks to the possibilities opened up initially by Field Programmable Gate Array (FPGA) technology and more recently by reconfigurable processors [86], [3], [9]. In an FPGA, a particular hardware configuration is obtained by loading a bit-stream that is used to shape parameterizable blocks into specific hardware functions. In a reconfigurable processor, coarse-grained logic elements operate on word-size operands and employ reconfigurable operators as computing elements. They are generally tightly coupled with one or more processor cores and act as reconfigurable computing accelerators. Usually, the configuration streams are small enough to allow run-time (or dynamic) reconfiguration.

In a broader sense, hardware reconfiguration may happen not only in a single chip, but also in a distributed hardware system, in order to adapt this system to changing conditions. This happens, for example, on a mobile system.

Recent evolutions in technology and modern hardware systems confirm that reconfigurable chips are increasingly used in modern applications or embedded into more general System-on-Chip (SoC) [110]. Rapidly changing application standards in fields such as communications and information security call for frequent modifications of the devices.
Software updates are often not sufficient to keep devices on the market, while hardware redesigns are quite expensive. The need to continuously adapt to changing environments (e.g. cognitive radio) is another incentive to use dynamic reconfiguration at runtime. Finally, with technologies at 65 nm and below, manufacturing problems strongly influence the electrical parameters of transistors, and transient errors caused by particles or radiation will appear more and more often during execution: error detection and correction mechanisms or autonomic self-control can benefit from reconfiguration capabilities.

Standard processors or systems-on-chip make it possible to develop flexible software on fixed hardware. Reconfigurable platforms make it possible to develop *flexible software on flexible hardware*.

As the density of chips increases [109], power efficiency has become "the Grail" of chip architects: not only for portable devices but also for high-performance general-purpose processors, power (or energy) considerations are as important as the overall performance of the products. This power challenge can only be tackled by using application-specific architectures, or at least by incorporating some application-specific elements into SoCs, as ASICs (Application-Specific Integrated Circuits) are much more power-efficient than GPPs (General-Purpose Processors). The designers of SoCs thus face a very difficult challenge: trading off the flexibility of GPPs, which leads to high volume and short design time, against the efficiency of ASICs, which helps solve the power efficiency problem.

Therefore, reconfigurable architectures are often recognized to exhibit the best trade-off potential between power, performance, cost and flexibility [108], [92] because their hardware structure can be adapted to the application needs. However, reconfigurable systems raise several questions:

- What are the basic elements of a good reconfigurable system? In the early days, they were bit-level operators, and they tend to become word-level operators. There is however no agreement on the model that should be used.
- How can we reconfigure such a system quickly? When to reconfigure? What is the information needed to reconfigure?
- How can we program reconfigurable systems efficiently? We would like to have compilers, not hardware synthesizers and place-and-routers.
- In an application, what must be targeted to reconfigurable chips and what to conventional processors? More generally, how can we transform and optimize an algorithm to take advantage of the potential of reconfigurable chips?

The scientific goal of CAIRN is to contribute to answering these questions, based on our background and past experience. To this end, CAIRN intends to approach energy efficient reconfigurable architectures from three angles: the invention of **new reconfigurable platforms**, associated **design and compilation tools**, and the exploration of the **interaction between algorithms and architectures**. Power consumption and processing power are considered as the main constraints in our proposed architecture, design flow and algorithm optimizations, in order to maximize the global energy efficiency of the system.

**Wireless Communication** is our privileged field of applications. Our research includes the prototyping of parts of these applications on reconfigurable and programmable platforms. Moreover, in the framework of research and/or contractual cooperations, other **application domains** are considered: image indexing, video processing, cryptography and traffic filtering in high-speed networks.
Members of the CAIRN team have collaborations with large companies like STMicroelectronics (Grenoble), Thomson (Rennes), Thales (Paris), Alcatel (Lannion), France-Telecom Orange Labs (Lannion), Atmel (Nantes), Xilinx (USA) or SMEs like Geensys (Nantes), R-interface (Marseille), Ditocom (Rennes), Sensaris (Grenoble), Envivio (Rennes), InPixal (Rennes), Sestream (Paris), Ekinops (Lannion). They are involved in several nationally or internationally funded projects (ITEA2 Geodes, Nano2012 S2S4HLS and RECMOTIF projects, ANR funded Cifaer, Fosfor, BioWic, Open-People, Greco, Ocelot and "Pôles de compétitivité" funded 100Gflex).

## 2.2. Highlights

The team organized the 21st IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2010) in Rennes. The conference received 121 submissions from 31 countries in Africa, Asia, Europe, North America, Oceania, and South America; 32 long papers (26% acceptance rate) and 18 short papers (posters) were published. 76 participants attended ASAP 2010.

This year has seen the design and fabrication of a second integrated circuit prototype (Ochre: a circuit for On-Chip Randomness Extraction) including our architecture proposal for a hybrid random number generator (RNG). The proposed architecture monitors the TRNG quality in real time to validate the RNG. Ochre aims at validating different RNG architectures and also their tolerance to non-invasive security attacks. See Section 6.3.7.

Daniel Chillet, Sébastien Pillement and Arnaud Tisserand defended their "Habilitation à Diriger des Recherches (HDR)" thesis in 2010.

Erwan Raffin et al. received the Best Paper Award for their paper [69] E. Raffin, C. Wolinski, F. Charot, K. Kuchcinski, S. Guyetant, S. Chevobbe and E. Casseau, *Scheduling, Binding and Routing System for a Run-Time Reconfigurable Operator Based Multimedia Architecture*, in the Conference on Design and Architectures for Signal and Image Processing (DASIP), Edinburgh, UK, Oct. 2010.
This paper is one of the results disseminated from the ROMA ANR Project, coordinated by CAIRN.

The article of Naeem Abbas, Steven Derrien, Sanjay Rajopadhye and Patrice Quinton (*Accelerating HMMER on FPGA using Parallel Prefixes and Reductions*) was selected as a best paper candidate (with 6 other papers out of 47) at the IEEE 2010 International Conference on Field-Programmable Technology (FPT'10) (acceptance rate of 47 out of 170).

The official FPL-Conference (http://www.fpl.org) benchmark suite for FPGAs (http://benchmarksuite.beyer-andreas.net/list) is part of the FPL-Conference community. The main objective is to compose a set of "problems" in different categories, in order to permit rapid evaluation and comparisons between several approaches. In this context, the DRAFT network and the related DRAGOON generator have been chosen to be included in this benchmark suite. Our IP falls in the category of dynamic reconfiguration process tests and evaluations.

## 3. Scientific Foundations

## 3.1. Panorama

The development of complex applications is traditionally divided into three steps: theoretical study of the algorithms, study of the target architecture and implementation. When facing new emerging applications such as high-performance, low-power, low-cost mobile communication systems or smart sensor-based systems, it is mandatory to strengthen the design flow by a simultaneous study of both algorithmic and architectural issues<sup>1</sup>.

Figure 1 shows the global design flow that we propose to develop. It is organized in levels which refer to our three research themes: application optimization (algorithmic, fixed-point and advanced representations of numbers), platform instance optimization (hardware and middleware), and stepwise refinement and compilation of software tasks (transformations, configuration generation).
In the rest of this part, we briefly describe the challenges concerning **new reconfigurable platforms** in Section 3.2, the issues on **compiler and synthesis tools** related to these platforms in Section 3.3, and the remaining challenges in **algorithm architecture interaction** in Section 3.4.

<sup>1</sup>Often referenced as algorithm-architecture mapping or interaction.

Figure 1. CAIRN's general design flow and related research themes

## 3.2. Dynamically and Heterogeneous Reconfigurable Platforms

One available technology for building reconfigurable systems is the field-programmable gate array (FPGA), introduced to the market in the mid 1980s. Today's components feature millions of gates of programmable logic, and they are dense enough to host complete computing systems on a programmable chip. These FPGAs have been the reconfigurable computing mainstream for a couple of years and achieve flexibility by supporting gate-level reconfigurability, i.e. they can be fully optimized for any application at the bit level. However, their flexibility is achieved at a very high interconnection cost. To be configured, a large amount of data must be distributed via a slow serial programming process to all the processing and interconnection resources. Configurations must be stored in an external memory. These interconnection and configuration overheads lead to energy inefficient architectures.

To increase the optimization potential of programmable processors without the penalties of FPGAs, functional-level reconfiguration was introduced. *Reconfigurable Processors* are the most advanced class of reconfigurable architectures. The main concern of this class of architectures is to support flexibility while reducing reconfiguration overhead. Precursors of this class were the KressArray [97], RaPid [95], and RaW machines [112] which were specifically designed for streaming algorithms.
Morphosys [100], Remarc [104] or Adres [93] contain programmable ALUs with a reconfigurable interconnect. These works have led to commercial products such as the Extreme Processor Platform (XPP) [85] from PACT or Bresca [107] from Silicon Hive, designed mainly for telecommunication applications.

A strong trend towards heterogeneous reconfigurable processors can also be observed. Hybrid architectures combine standard GPP or DSP cores with arrays of *field-configurable elements*. These new reconfigurable architectures are entering the commercial market. Some of their benefits are the following: functionality on demand (set-top boxes for digital TV equipped with decoding hardware on demand), acceleration on demand (coprocessors that accelerate computationally demanding applications in multimedia and communications), and shorter time to market (products that target ASIC platforms can be released earlier using reconfigurable hardware).

Dynamic reconfiguration allows an architecture to adapt itself to various incoming tasks. This requires complex management and control which can be provided as services of a real-time operating system (RTOS) [101]: communication, memory management, task scheduling [91] [88] and task placement [83]. Such an Operating System (OS) approach has many advantages: it is a complete design framework, independent of the technology and of the hardware architecture, thus helping to drastically reduce the design time of the complete platform.

Communication in a reconfigurable platform is also a very important research subject. The role of communication resources is to support transactions between the different components of the platform, either between macro-components of the platform (main processor, dedicated modules, dynamically reconfigurable parts of the platform) or inside the elements of the reconfigurable parts themselves.
This has motivated studies on Networks on Chip for Reconfigurable SoCs [87] [106] that trade off flexibility and quality of service.

In CAIRN we mainly target reconfigurable system-on-chip (RSoC) defined as a set of computing and storing resources organized around a flexible interconnection network and integrated onto a single silicon chip (or programmable chip such as FPGAs). The architecture is specialized for an application domain, and flexibility is provided by hardware reconfiguration and software programmability. Therefore, computing resources are heterogeneous and we focus on the following:

- Reconfigurable hardware blocks with a dynamic behavior where reconfigurability can be achieved at the bit or at the operator level. Our research aims at defining new reconfigurable computing and storing resources. Since reconfiguration must occur as fast as possible (typically a few cycles), the reduction of the configuration bit-stream length is also a key issue.
- When performance and power consumption are major constraints, it is well known that optimized specialized hardware blocks (often called IPs for Intellectual Properties) are the best (and often the only) solution. As a flexible extension of specialized IPs, we study multi-mode components for a very specific set of high-complexity algorithms, without loss of performance.
- Specialized processors with a tailored instruction set still offer a viable solution to trade between energy efficiency and flexibility. They are especially interesting in the context of recent FPGA platforms where multiple processors can be easily embedded. We also focus on the automatic generation of an optimized customized instruction set and of the associated data-path and interface with an embedded processor core.

## 3.3. Compilation and Synthesis for Reconfigurable Platform

The absence of compilers is one of the major limitations for the use of reconfigurable architectures in real-life applications.
Therefore, the ability to compile and optimize code on reconfigurable hardware platforms from high-level specifications is the key for a real success story and is a hot topic in the research community. We continue our research efforts to offer **efficient tools with close links to architectures**.

Most current programming environments for reconfigurable systems consist of separate tool flows for the software and the hardware. Processor code and configuration data for the reconfigurable processing units are handcrafted and wrapped into libraries of functions. Progress beyond current practices calls for compilers capable of generating code and configurations from a high-level general-purpose programming language. Such a compiler decides which operations go into the reconfigurable processors. Loops or frequently executed code fragments are good candidates for reconfigurable platforms. For general-purpose code, this leads to several problems: it is difficult to extract sets of operations with matching granularity at a sufficient level of parallelism, and inner loops of general-purpose programs often contain excess code, i.e. code that must run on a CPU, such as exception handling, function calls or system calls. Efforts aimed at automatic code generation for reconfigurable architectures include the works of [105], [111] and [114].

Another approach to the programming and design of reconfigurable platforms, especially for special-purpose elements, is to use techniques inspired from high-level synthesis. Here also, loops are the target of the methods: the goal is to either generate special-purpose architectures made out of arithmetic operators or to produce parallel architectures. In both cases, the output may be either efficient special-purpose hardware for computation-intensive tasks and/or the parameters for a reconfigurable architecture. Such approaches will eventually create a bridge between compilation techniques and hardware design.

Finally, we continue to investigate floating-point to fixed-point automatic conversion with the objective of developing an open-source tool. Multimedia and signal processing are main application fields for reconfigurable platforms. In general, these algorithms are specified using floating-point operations, but, for efficiency reasons, they are often implemented with fixed-point operations either in software for DSP cores or as special-purpose hardware. Unfortunately, fixed-point conversion is very challenging and time-consuming, typically demanding 25 to 50% of the total design or implementation time<sup>2</sup>. Thus, tools are required to automate this conversion.

In software implementations (DSP, MCU), the aim is to define an optimized fixed-point specification which minimizes the code size and the execution time for a given computation accuracy constraint. This optimization is achieved through the modification of the scaling operation location and the selection of the data word-length according to the different data types supported by DSPs. In hardware implementations (ASIC, FPGA), the complete architecture has to be defined. An efficient implementation requires minimizing the architecture size and the power consumption. Thus, the goal of the conversion process is to minimize the operator word-length.

In the fixed-point conversion process, one of the main challenges is to evaluate the accuracy of the fixed-point specification. For DSP-software implementation, methodologies have been proposed [99], [103], [102] to achieve a floating-point to fixed-point conversion leading to an ANSI-C code with integer data types. One key is to closely link the compilation flow to the latest DSP features. For hardware implementation, the best results are obtained when the word-length optimization process is coupled with the high-level synthesis [98] [90].

<sup>2</sup>http://www.mathworks.com/company/newsletters/digest/may04/uwb.html

## 3.4. Algorithm Architecture Interaction

As CAIRN mainly targets domain-specific systems-on-chip including reconfigurable capabilities, algorithmic-level optimizations have great potential to improve the efficiency of the overall system. Based on the skills and experience in "signal processing and communications" of some CAIRN members, we conduct research on algorithmic optimization techniques under two main constraints: energy consumption and computation accuracy; and for two main application domains: fourth-generation (4G) mobile telecommunications and wireless sensor networks (WSN). These application domains are very conducive to our research activities. The high complexity of the first one and the stringent power constraint of the second one require the design of specific high-performance and energy efficient SoCs. Sections 4.1 to 4.5 detail the application domains that we focus on.

We also work on computer arithmetic operators and representations of numbers for hardware and software implementations. We provide algorithms for evaluating operations such as: addition, multiplication, multiplication by constant, power, division, roots, (inverse) trigonometric functions, (inverse) hyperbolic functions, logarithms, exponentials, and combinations. For hardware implementations, we work on the reduction of the delay, silicon area and power consumption. For software implementations, we focus on high-performance computing libraries on general purpose processors (GPPs) and graphics processing units (GPUs). We work on the use of exotic representations of numbers in specific domains such as secured implementations of cryptosystems with high-performance protection against side-channel analysis or fault attacks.

## 4. Application Domains

## 4.1. Panorama

Our research is based on realistic applications, in order to both discover the main needs created by these applications and to invent realistic and interesting solutions.

The high complexity of the **Next-Generation (4G) Wireless Communication Systems** leads to the design of real-time high-performance specific architectures. The study of these techniques is one of the main fields of application for our research, based on our experience on WCDMA for 3G implementation.

In **Wireless Sensor Networks** (WSN), where each wireless node has to operate without battery replacement for a long time, energy consumption is the most important constraint. In this domain, we mainly study energy-efficient architectures and wireless cooperative techniques for WSN.

**Intelligent Transportation Systems** (ITS), and especially Automotive Systems, increasingly apply technology advances. While wireless transmissions allow a car to communicate with another or even with road infrastructure, the **automotive industry** can also offer driver assistance and safer vehicles thanks to improvements in computation accuracy for embedded systems.

Other important fields will also be considered: specialized hardware systems for the filtering of the network traffic at high-speed, high-speed true-random number generation for security, content-based image retrieval and video processing.

## 4.2. 4G Wireless Communication Systems

With the advent of the next generation (4G) broadband wireless communications, the combination of MIMO (Multiple-Input Multiple-Output) wireless technology with Multi-Carrier CDMA (MC-CDMA) has been recognized as one of the most promising techniques to support high data rate and high performance. Moreover, future mobile devices will have to offer interoperability between wireless communication standards (4G, WiMax, ...) and thus implement MIMO pre-coding, already used by the WiMax standard.
Finally, in order to maximize the lifetime of mobile devices and guarantee quality of service to consumers, 4G systems will certainly use cooperative MIMO schemes or MIMO relays. Our research activity focuses on MIMO pre-coding and MIMO cooperative communications with the aim of algorithmic optimization and implementation prototyping.

## 4.3. Wireless Sensor Networks

Sensor networks are a very dynamic domain of research due, on the one hand, to the opportunity to develop innovative applications that are linked to a specific environment, and on the other hand to the challenge of designing totally autonomous communicating objects. Cross-layer optimizations lead to energy-efficient architectures and cooperative techniques dedicated to sensor networks applications.

## 4.4. Automotive Systems

Technology advances, for embedded devices inside vehicles or communication systems between vehicles (V2V) or with road infrastructure (V2R), make it possible to significantly improve the safety of drivers and road users. One of our goals is to propose new low-cost and energy-efficient mobile communication solutions to ease road traffic and make it safer. Considering "intelligent" road signs and vehicles, i.e. equipped with an autonomous radio communication system, drivers will be able to receive at any time various information about traffic flow or road sign identification. In particular, cooperative MIMO techniques are used to decrease the energy consumption of the communications. Other research related to automotive systems includes, for example, the design of provably accurate fixed-point controllers.

## 4.5. Multimedia processing

In multimedia applications, audio and video processing is the major challenge embedded systems have to face. It is computationally intensive with power requirements to meet.
Video or image processing at pixel level (image filtering, edge detection, pixel correlation) or at block level (transforms, quantization, entropy coding, motion estimation) has to be accelerated. We investigate the potential of reconfigurable architectures for the design of efficient and flexible accelerators in the context of multimedia applications.

## 5. Software

### 5.1. Panorama

Besides the development of new reconfigurable architectures, the need for an efficient compilation flow is stronger than ever. Challenges come from the high parallelism of these architectures and also from new constraints such as resource heterogeneity, memory hierarchy, and power constraints and management. We aim at defining a highly effective software framework for the compilation of high-level specifications into optimized code executed on a reconfigurable hardware platform. Figure 2 shows the global framework that we are currently developing.

Figure 2. CAIRN's general software development framework

Our approach assumes that the application is specified as a hierarchical block diagram of communicating tasks expressing data-flow or control, where each task is expressed using languages like C, Signal, Scilab or Matlab, and is then transformed into an internal representation by the compiler front-end. Our framework then applies high-level transformations to the internal representation. Different internal representations are used depending on the targeted transformations or the targeted architectures.

- The classical Control and Data Flow Graph (CDFG) is the main internal formalism of our framework. It is the basis for transformations like code optimizations, fixed-point transformations, instruction-set extraction or scheduling. Gateways will be provided from CDFG to other supported formalisms.
- The Hierarchical Conditional Dependency Graph (HCDG) format<sup>3</sup> will be used as the internal representation for pattern-based transformations.
- Other internal representations like Signal Flow Graphs (SFG) and Polyhedral Reduced Dependence Graphs (PRDG) will be used respectively for application accuracy estimation and loop parallelization techniques.

<sup>3</sup>As defined in the Polychrony toolset, http://www.irisa.fr/espresso/Polychrony/

Finally, back-end tools enable the generation of code such as VHDL for the hardwired or reconfigurable blocks, C for embedded processor software, and SystemC for simulation purposes (e.g. fixed-point simulations).

The compiler front-end, the back-end generators, the transformation toolbox as well as the different internal representations and their respective gateways are based on a single framework: the Gecos framework.

Besides CAIRN's general design workflow, and in order to promote research undertaken by CAIRN, several hardware and software prototypes are developed. Among those, some distributed software tools are presented in this report: Gecos, a flexible compilation platform; ID.Fix, an infrastructure for the automatic transformation of software code aiming at the conversion of floating-point data types into a fixed-point representation; UPaK and Durase, for compilation and synthesis targeting reconfigurable platforms; and Interconnect Explorer, a high-level power and delay estimation tool for on-chip interconnects.

### 5.2. Gecos

**Participants:** Steven Derrien [correspondent], Daniel Ménard, Kevin Martin, Antoine Floch, Antoine Morvan, Adeel Pasha, Patrice Quinton, Amit Kumar, Loïc Cloatre.

The Gecos (Generic Compiler Suite) project is an open source Eclipse-based C compiler infrastructure developed in the CAIRN group since 2004 that allows for fast prototyping of complex compiler passes. Gecos was designed to address some of the shortcomings of existing C/C++ infrastructures such as SUIF and LLVM. Gecos is implemented entirely in Java and follows modern software engineering practices.
It uses the Eclipse plugin infrastructure and thus benefits from its plugin mechanism to be easily extensible. To take full advantage of Model Driven Software Engineering techniques, we now also offer an EMF (Eclipse Modeling Framework) based version of the compiler intermediate representation, and plan to base all subsequent developments on MDE technologies. The Gecos infrastructure is still under very active development, and now serves as a backbone infrastructure for many group members (UPaK, Durase, ID.Fix). In 2009, work focused on retargeting the infrastructure for source-to-source transformation, in the context of the Nano2012-S2S4HLS project in collaboration with STMicroelectronics.

The Gecos compiler framework is open-source and is hosted on the INRIA gforge: http://gecos.gforge.inria.fr.

### 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems

**Participants:** Daniel Ménard [correspondent], Olivier Sentieys, Romuald Rocher, Loïc Cloatre, Jérémie Guillot.

The different techniques proposed by the team for fixed-point conversion are implemented in the ID.Fix infrastructure. The application is described in C code using floating-point data types and pragmas that specify the parameters of the fixed-point conversion (dynamic range, input/output word-length, delay operations). The tool determines and optimizes the fixed-point specification and then generates C code using the fixed-point data types (ac\_fixed) from Mentor Graphics. The infrastructure is made up of three main modules corresponding to the fixed-point conversion (Fix.Conv), the accuracy evaluation (Acc.Eval) and the dynamic range evaluation (Dyn.Eval). The development work carried out in 2010 resulted in a first functional version of the tool. A fixed-point conversion can be carried out on small examples (a single C function).
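To make the idea of fixed-point conversion concrete, here is a minimal Python sketch of what happens to a small floating-point kernel once every value is constrained to a fixed-point format. It is an illustration only and not the ID.Fix flow: the helper names `to_fixed`, `fir_float` and `fir_fixed`, the round-to-nearest and saturation choices, and the use of one common format for all signals are assumptions made for this example (the real tool optimizes word-lengths per signal and emits C code with `ac_fixed` types).

```python
def to_fixed(x, int_bits, frac_bits):
    """Quantize x to a signed fixed-point value (hypothetical helper).

    int_bits counts the integer bits including the sign bit, frac_bits the
    fractional bits. Rounds to nearest and saturates on overflow.
    """
    scale = 1 << frac_bits
    total_bits = int_bits + frac_bits
    lo = -(1 << (total_bits - 1))        # most negative integer code
    hi = (1 << (total_bits - 1)) - 1     # most positive integer code
    q = round(x * scale)                 # integer code before saturation
    q = max(lo, min(hi, q))              # saturate instead of wrapping
    return q / scale


def fir_float(samples, coeffs):
    """Floating-point reference: a single FIR output sample."""
    return sum(c * s for c, s in zip(coeffs, samples))


def fir_fixed(samples, coeffs, int_bits, frac_bits):
    """Same computation with every intermediate value quantized."""
    acc = 0.0
    for c, s in zip(coeffs, samples):
        cq = to_fixed(c, int_bits, frac_bits)
        sq = to_fixed(s, int_bits, frac_bits)
        prod = to_fixed(cq * sq, int_bits, frac_bits)
        acc = to_fixed(acc + prod, int_bits, frac_bits)
    return acc


if __name__ == "__main__":
    samples = [0.5, -0.25, 0.125]
    coeffs = [0.25, 0.5, 0.25]
    ref = fir_float(samples, coeffs)
    for frac_bits in (4, 8):
        out = fir_fixed(samples, coeffs, 2, frac_bits)
        print(frac_bits, out, abs(out - ref))
```

Comparing the fixed-point output against the floating-point reference for several word-lengths is a crude, simulation-based stand-in for the accuracy evaluation step; the dynamic range of each signal decides how many integer bits are needed to avoid saturation.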
The development of this tool was carried out by an INRIA graduate engineer in the context of the S2S4HLS project until September 2010, a CNRS graduate engineer in the context of the ROMA ANR project until August 2010, and several students during their training periods.

### 5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems

**Participants:** Christophe Wolinski [correspondent], François Charot, Kevin Martin, Antoine Floch.

We are developing, in close collaboration with Lund University (Sweden) and Queensland University (Australia), UPaK (Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems) [113]. The preliminary experimental results obtained with the UPaK system show that its methods achieve a high coverage of application graphs with a small number of patterns. Moreover, high application execution speed-ups are obtained, both for sequential and parallel application execution, with processor extensions implementing the selected patterns.

UPaK is one of the bases for our research on compilation and synthesis for reconfigurable platforms. It is based on the HCDG representation of the Polychrony software designed at INRIA-Rennes in the project-team Espresso.

### 5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions

**Participants:** Christophe Wolinski [correspondent], François Charot, Kevin Martin, Antoine Floch.

We are developing a framework enabling the automatic synthesis of application-specific processor extensions. It uses advanced technologies, such as algorithms for graph matching and graph merging together with constraint programming methods. The framework is organized around several modules.

- CoSaP: Constraint Satisfaction Problem. The goal of CoSaP is to decouple the statement of a constraint satisfaction problem from the solver used to solve it.
  The CoSaP model is an Eclipse plugin described using EMF to take advantage of the automatic code generation and of various EMF tools.
- HCDG: Hierarchical Conditional Dependency Graph. HCDG is an intermediate representation mixing control and data flow in a single acyclic representation. The control flow is represented as hierarchical guards specifying the execution or the definition conditions of nodes. It can be used in the Gecos compilation framework via a specific pass which translates a CDFG representation into an HCDG.
- Patterns: Flexible tools for the identification of computational patterns in a graph and for graph covering. These tools model the concept of a pattern in a graph and provide generic algorithms for pattern identification and graph covering. The following sub-problems are addressed: (sub)graph isomorphism, pattern generation under constraints, and covering of a graph using a library of patterns. Most of the implemented algorithms use constraint programming and rely on the CoSaP module to solve the optimization problem.

### 5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)

**Participants:** Olivier Sentieys [correspondent], Olivier Berder, Thomas Anger, Arnaud Carer, Jérôme Astier, Samuel Mouget, Adeel Pasha, Steven Derrien.

PowWow is a hardware and software platform designed to handle wireless sensor network (WSN) protocols and related applications. Based on an asynchronous rendezvous medium access control (MAC) protocol, geographical routing and the protothread library, PowWow requires a lighter hardware system than Zigbee [84] (memory usage, including the application, is less than 10kb). Therefore, network lifetime is increased and price per node is significantly decreased.
CAIRN's hardware platform (see Figure 3) is composed of:

- The motherboard, designed to reduce the power consumption of sensor nodes, which embeds an MSP430 microcontroller and all the components needed to process the PowWow protocol except the radio chip. JTAG, RS232, and I2C interfaces are available on this board.
- The radio chip daughter board, currently based on a TI CC2420.
- The coprocessing daughter board, which includes a low-power FPGA allowing hardware acceleration of some PowWow features, together with dynamic voltage scaling features to increase power efficiency. The current version of PowWow integrates an Actel IGLOO AGL250 FPGA and a programmable DC-DC converter. We have shown that gains in energy of up to 700 can be obtained by using FPGA acceleration on functions like CRC-32 or error detection, compared with a software implementation on the MSP430.

Figure 3. CAIRN's PowWow motherboard with radio board connected

The PowWow distribution also includes a generic software architecture using event-driven programming and organized into protocol layers (PHY, MAC, LINK, NET and APP). The software is based on Contiki [94], and more precisely on the Protothread library, which provides a sequential control flow without complex state machines or full multi-threading.

To optimize the network for a particular application and to define a global strategy to reduce energy, PowWow offers the following extra tools: over-the-air reprogramming (and soon reconfiguration), analytical power estimation based on software profiling and power measurements, and a dedicated network analyzer to probe and fix transmission errors in the network.

More information can be found at http://powwow.gforge.inria.fr.

### 5.7. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip

**Participants:** François Charot [correspondent], Laurent Perraudeau, Charles Wagner.
SoCLib is an open platform for virtual prototyping of multi-processor systems on chip (MP-SoC) developed in the framework of the SoCLib ANR project. The core of the platform is a library of SystemC simulation models for virtual components (IP cores), with a guaranteed path to silicon. All simulation models are written in SystemC, and can be simulated with the standard SystemC simulation environment distributed by the OSCI organization. Two types of models are available for each IP core: CABA (Cycle Accurate / Bit Accurate), and TLM-DT (Transaction Level Modeling with Distributed Time). All simulation models are distributed as free software.

We have developed the simulation models of the NIOSII processor, of the Altera Avalon interconnect, and of the TMS320C62 DSP processor from Texas Instruments. More information can be found on the dedicated web page: http://www.soclib.fr.

### 5.8. OCHRE: On-Chip Randomness Extraction

**Participants:** Olivier Sentieys [correspondent], Arnaud Carer, Arnaud Tisserand.

Ochre is a set of synthesizable VHDL models for true and pseudo random number generation and hardware-accelerated statistical tests. It includes IP cores of different oscillator-based TRNGs, different PRNGs (linear feedback shift registers, cellular automata, AES) and several statistical tests (FIPS 140-2, AIS31, Diehard). This set of IPs has been used to design the Ochre V1 and V2 chips and is delivered under the GNU GPL license.

## 6. New Results

### 6.1. Dynamically and Heterogeneous Reconfigurable Platforms

#### 6.1.1. New Reconfigurable Architectures

##### 6.1.1.1. High-level modeling of reconfigurable architectures

**Participants:** Robin Bonamy, Daniel Chillet, Sébastien Pillement.

The evolution of application complexity and System-on-Chip architectures confronts the designer of embedded systems with a very large design space. Exploring the design space to reach an efficient solution becomes very difficult, especially when the design must satisfy a large number of constraints.
This problem becomes even more difficult when the system includes a reconfigurable area to support the flexibility of the application. To help the designer explore the design space, it becomes more and more important to provide methods and tools for early estimation of the system's characteristics (performance, power). Several methods and tools have been developed for that purpose, but none of them models the reconfiguration of Systems-on-Chip. In this context, we developed a high-level model of reconfigurable circuits such as FPGAs. This model is based on AADL (Architecture Analysis and Design Language). This work is part of a more general project, Open-PEOPLE, whose main goal is to define a complete exploration flow based on AADL allowing efficient power and energy consumption analysis. This model will be used to define exploration strategies in order to provide earlier estimations of performance, area, energy and power consumption.

##### 6.1.1.2. Power models of reconfigurable architectures

**Participants:** Robin Bonamy, Daniel Chillet, Olivier Sentieys.

Including a reconfigurable area in complex systems-on-chip is now considered an interesting solution to reduce the area of the global system and to support high performance. But the key challenge in the context of embedded systems is currently the power budget of the system, and the designer needs an early estimation of the power consumption of the system. Power estimation for reconfigurable systems is a difficult problem because several parameters need to be taken into account to define an accurate model. Our first work on this subject evaluates the delay, area, power and energy impacts of loop transformations. We have made several power measurements on a real FPGA platform and for different task implementations.
These experiments allow us to define an energy and delay model which will be used by the operating system to decide on-line which task instances must be executed to efficiently manage the available power.

Furthermore, we also consider the opportunities offered by dynamic reconfiguration to reduce energy consumption. Indeed, using dynamic reconfiguration, it is now possible to partially reconfigure a specific part of the circuit while the rest of the system is running. The cost of the reconfiguration is still significant, but in some cases this technique can help reduce the power of the system. To evaluate the potential gain of dynamic reconfiguration, we have made some measurements on a Virtex 5 board. We have defined a first model of the power consumption of the reconfiguration. This model shows that the power consumption mainly depends on the bitstream file size. This model will also be included in the power management strategy of the operating system for the same goal, i.e. ensuring an efficient management of the available power.

##### 6.1.1.3. Flexible Arithmetic Operator Design

**Participants:** Emmanuel Casseau, Daniel Ménard, Shafqat Khan.

Our aim is to propose new arithmetic operators which are flexible in terms of both computation and data size. The targeted applications are in multimedia processing. Such processing handles low-precision data (typically, pixels are coded using 8, 10, 12 or 16 bits). To optimize their implementation, architectures must offer operators which support different data word-lengths. Operator efficiency can be increased using a subword parallelism (SWP) scheme. A single SWP instruction performs the same operation on multiple sets of subwords in parallel using SWP operators. In existing SWP-capable processors, the choices for subword data sizes are usually 8, 16, 32 bits etc.
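As an illustration of subword parallelism, the following Python sketch emulates a single SWP addition on subwords packed into one machine word, using the classic masking trick that keeps carries from crossing subword boundaries. It is a software model written for this example, not the team's hardware operator; the helper names `pack`, `unpack` and `swp_add` are hypothetical, and wrap-around (modular) arithmetic inside each subword is an assumption.

```python
def pack(values, subword_bits):
    """Pack a list of unsigned subwords into one integer, lane 0 in the low bits."""
    mask = (1 << subword_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * subword_bits)
    return word


def unpack(word, subword_bits, lanes):
    """Split a packed word back into its unsigned subwords."""
    mask = (1 << subword_bits) - 1
    return [(word >> (i * subword_bits)) & mask for i in range(lanes)]


def swp_add(a, b, subword_bits, lanes):
    """Add all lanes of a and b at once, wrapping around inside each lane.

    The most significant bit of every lane is cleared before the addition,
    so the sum of the low parts never carries into the next lane. The top
    bit of each lane is then rebuilt with an exclusive OR.
    """
    msb = pack([1 << (subword_bits - 1)] * lanes, subword_bits)
    low_sum = (a & ~msb) + (b & ~msb)
    return low_sum ^ ((a ^ b) & msb)


if __name__ == "__main__":
    # Four 8-bit pixels in a 32-bit word, then three 10-bit pixels in 30 bits.
    a8, b8 = pack([200, 100, 1, 255], 8), pack([100, 100, 2, 1], 8)
    print(unpack(swp_add(a8, b8, 8, 4), 8, 4))
    a10, b10 = pack([1000, 512, 3], 10), pack([100, 512, 4], 10)
    print(unpack(swp_add(a10, b10, 10, 3), 10, 3))
```

The same masking scheme works for any subword width, including the 10-bit and 12-bit pixel formats mentioned above; only the position of the lane boundaries changes, which is what a configurable datapath has to control in hardware.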
Subword sizes of 8, 16 and 32 bits are chosen because the SWP operator design is less complex, especially when the subword sizes are multiples of the smallest one. However, multimedia applications require operators which support multimedia-oriented subword sizes (8, 10, 12 and 16 bits).

Multimedia operations are based on basic operators (add, absolute value, multiply), but more complex operations are also required to increase both speed and efficiency. For instance, the $\sum |a-b|$ operation is required to compute the SAD (sum of absolute differences), and the $\sum (a \times b)$ operation is required for the multiplication-accumulation used in the DCT algorithm. To overcome the overheads of reconfiguration, such as the complexity of the interconnection network and the reconfiguration time, we designed a flexible pipelined multimedia operator which provides reconfigurability inside the operator using a configurable datapath. The operator can be configured to perform most multimedia operations on different data sizes without any reconfiguration time. This operator will be used as one computing unit inside a reconfigurable processor tailored for multimedia applications. The operator has also been designed using a redundant data representation [27] for high-speed processing.

##### 6.1.1.4. Adaptive and Multi-mode Devices

**Participants:** Emmanuel Casseau, Antoine Floch, Erwan Raffin, Daniel Ménard, Shafqat Khan, François Charot, Christophe Wolinski.

More and more devices need to continuously adapt to changing environments; that is to say, devices will have to be flexible enough to implement different algorithms at different times. Such mode switches require more than just software-based changes: they also require adaptation of the application-specific hardware components. To address this requirement, we investigate two approaches.
The first one is the design of a reconfigurable processor able to adapt its computing structure to a dedicated domain: video and image processing applications. The processor is built around a pipeline of coarse-grain reconfigurable operators exhibiting a good trade-off between performance and power consumption. Contrary to previous reconfigurable processors, flexibility is obtained not through a flexible interconnect network but through configurable domain-dedicated units. This work is done in the context of the ROMA ANR project. We particularly investigate reconfigurable operator design [27] and the compilation framework [69].

The second one is the synthesis of multi-mode architectures which do not incur any reconfiguration time penalty. Such architectures implement all required operators according to the pre-defined set of computations to be performed. In order to optimize area, these operators are shared between the set of algorithms, and some control logic steers the data to operators depending on the particular algorithm to be executed at a specific time. The approach is based on high-level synthesis [29]. Syntheses can be constrained for performance or area, and both ASIC and FPGA technologies can be targeted. Application domains are typically channel encoding, cryptography and multimedia. This work is done in collaboration with IMS Lab. (B. Le Gal).

#### 6.1.2. Arithmetic Operators for Cryptography

**Participants:** Arnaud Tisserand, Thomas Chabrier, Danuta Pamula, Stanislaw Piestrak, Andrianina Andriamanga.

##### 6.1.2.1. ECC Processor with Protections Against SCA

A dedicated processor for elliptic curve cryptography (ECC) is under development. Functional units for arithmetic operations in $\mathbb{F}_{2^m}$ and $\mathbb{F}_p$ finite fields and 160–600-bit operands have been developed for FPGA implementation. Several protection methods against side channel attacks (SCA) have been studied.
The use of some number systems, especially very redundant ones, makes it possible to change the way some computations are performed and hence their effects on side channel traces. In [36] we propose the use of the double base number system (DBNS) to randomly recode secret key digits on-the-fly during the main ECC operation: the scalar multiplication $[k]P$. The proposed method, implemented on FPGAs, leads to a totally random behavior of the point operations at the side channel level, with a speed equivalent to the best standard unprotected methods.

A long talk on *Arithmetic Level Countermeasures for ECC Coprocessor* [76] was presented at the Claude Shannon Institute Workshop on Coding and Cryptography in Cork, Ireland, May 2010.

##### 6.1.2.2. Arithmetic Operators for High-Performance Cryptography

We worked on fast algorithms and implementations of $\mathbb{F}_{2^m}$ finite field multiplication units in FPGA. We focused on methods based on separated multiplication and reduction steps and analyzed various area and time dependency/efficiency/complexity tradeoffs. The corresponding results have been presented in [55]. A journal version of this work has been accepted for future publication in a national Polish journal, "Measurement Automation and Monitoring".

Mark Hamilton, PhD student in the Code and Crypto group at University College Cork (UCC), spent five months at CAIRN-Lannion to work on fast algorithms and implementations of $\mathbb{F}_p$ finite field multipliers for some specific values of $p$. A joint publication is under preparation.

##### 6.1.2.3. ECC Protections Against Fault Injection Attacks

During the Master internship of Andrianina Andriamanga, we worked on the use of residue code (modulo $2^p - 1$ for some values of $p$) detection methods to protect ECC operations against some fault injection attacks. The corresponding results will be submitted to a conference in the beginning of 2011.

##### 6.1.2.4. Hardware Implementation of Code-Based Cryptography

A new collaboration with the CASED (Center for Advanced Security Research Darmstadt) laboratory in Germany is starting on efficient hardware implementations of new cryptographic methods based on code theory. This type of cryptographic method is robust against mathematical attacks using quantum computers.

#### 6.1.3. Optimization of Advanced Arithmetic Operators

**Participant:** Arnaud Tisserand.

A software library in SystemC was developed for the optimization and validation of fixed-point hardware arithmetic operators. The corresponding results have been presented in [72]. We use an interface to the gappa software developed by G. Melquiond to tightly bound rounding errors and verify that those bounds are below some given threshold. A SystemC description of arithmetic operations is analyzed by gappa to certify the operator accuracy. We also provide various optimization methods to reduce the size of fixed-point operators under maximal-error constraints. This avoids overestimating maximal rounding errors, as standard methods do. The library can be used to perform architecture exploration with certified accuracy.

In the collaboration with the VLSI CAD laboratory of the University of Massachusetts (UMASS), started in 2009, we continue the integration of arithmetic methods for bounding rounding errors and optimizing some basic arithmetic operators in the TDS system developed at UMASS. A joint publication is under preparation.

#### 6.1.4. Management of Dynamically Reconfigurable Systems

**Participants:** Antoine Eiche, Daniel Chillet, Sébastien Pillement, Ludovic Devaux, Olivier Sentieys.

To support the dynamic behavior of new embedded applications, heterogeneous execution resources are often included in modern SoC or MPSoC (Multi-Processor System-on-Chip) systems. The management of these resources is classically supported by an operating system (OS) that includes several specific services.
One newly needed service concerns task scheduling and placement within the reconfigurable resources. The classical temporal scheduling problem is then extended with a spatial dimension in order to manage the physical area available in the reconfigurable resource. The second impacted service is task communication management. On-line task placement makes the interconnection support difficult to predict. A flexible and dynamic interconnect medium must therefore be defined.

##### 6.1.4.1. Spatio-Temporal Scheduling based on Artificial Neural Networks

**Participants:** Antoine Eiche, Daniel Chillet, Sébastien Pillement, Olivier Sentieys.

When the dynamic and partial reconfiguration paradigm is included in a System-on-Chip platform, specific management services must be developed to support the parallel and/or sequential instantiation of multiple hardware tasks within the same piece of silicon. One of the main problems consists in defining the placement and the scheduling of the different tasks within the reconfigurable part of the system; this problem is generally called *spatio-temporal scheduling*.

Building on our experience with neural networks for temporal task scheduling, we now address the problem of task placement within a reconfigurable resource. This work considers a heterogeneous reconfigurable area where several instances of each task are defined. Our solution is based on a neural network structure specifically designed to optimize the task placement problem. The main objective of the optimization is to reduce task rejection. Our placement policy has been compared to other proposals and provides better results under identical assumptions [45].

We have also continued our work on the hardware implementation of our neural network. The temporal scheduling is now completely defined, and we plan to develop the hardware implementation of the spatial scheduling in future work.

##### 6.1.4.2. Flexible Communication Infrastructure

**Participants:** Daniel Chillet, Sébastien Pillement, Ludovic Devaux.

For task communications within flexible architectures, we defined a specific interconnection architecture adapted to the dynamically and partially reconfigurable resources included in modern SoCs. The characterization of the *DRAFT* network was completed and its integration inside reconfigurable systems on chip was realized [35]. In the framework of the FosFor project, a bridge was designed to allow the interconnection of *DRAFT* to an AHB bus, enabling communications with off-the-shelf processors like Leon3. An interconnection service, which will be part of the FosFor operating system, was also specified [43]. This service manages the communications and offers new communication strategies made possible by dynamic reconfiguration, like the creation of dynamic memory spaces by instantiating temporary memory tasks in unused logic areas.

Considering the possibilities offered by dynamic reconfiguration, a new network was designed and characterized. *R2NoC* has the particularity of using reconfigurable routers containing only communication links [44]. However, limitations imposed by industrial products prevent an efficient use of this network because of very long reconfiguration times. That is why the *Ocean* network is currently being designed.

#### 6.1.5. Fault-Tolerant Reconfigurable Systems

**Participants:** Stanislaw Piestrak, Sébastien Pillement, Manh Pham, Olivier Sentieys.

The use of reconfigurable hardware in critical applications like transportation and transaction systems is increasing rapidly. Undetected errors caused e.g. by radiation may result in fatal silent data corruption and unreproducible system crashes. Since it is virtually impossible to build devices which are free from faults, it is essential to embed some sort of fault-tolerance in such devices, which will enable them to work correctly even in the presence of faults.
Over the past decade, a lot of research has been done to develop fault-tolerant reconfigurable systems at various granularity levels, although most of it has dealt with the lowest level, such as that offered by FPGAs. In [49], we have considered the possibility of implementing low-cost hardware techniques which would make it possible to tolerate temporary faults in the data-paths of coarse-grained reconfigurable architectures. Our goal was to use less hardware overhead than the commonly used duplication or triplication methods. The proposed technique relies on concurrent error detection using a *residue code modulo 3* and on re-execution of the last operation once an error is detected. Simulation results obtained for the DART architecture developed at IRISA, with all of its data-paths protected using residue code, confirmed the hardware savings of the proposed approach over duplication. Moreover, we have also studied different strategies for fault recovery after detection.

The pervasiveness of electronic computers has led the automotive industry to face new security and performance requirements to integrate new applications in the field. Modern reconfigurable logic circuits now meet the requirements of processing performance, flexibility and industry trends on reducing product cost. We show in this work the importance of new dynamically reconfigurable architectures in the automotive field and more generally in the area of dependability. The use of dynamically reconfigurable computers can reduce the number of computers and the costs of implementation. Unfortunately, these architectures are very sensitive to radiation and therefore to errors. During this year we have enhanced our FT-DYMPSoC system [18] and fully implemented it on a commercial FPGA circuit. In order to cope with dynamic behaviors, we have proposed a NoC-based version of the system [64].
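The residue-code detection with re-execution described earlier in this section can be sketched in software as follows. This is a simplified Python model written for illustration, not the DART implementation: the names `residue3`, `checked_add` and `TransientFaultAdder` are hypothetical, only addition is covered, and unbounded integers are assumed so that word-length wrap-around does not have to be corrected in the residue.

```python
def residue3(x):
    """Residue of x modulo 3: the check symbol carried alongside the data."""
    return x % 3


def checked_add(a, b, adder, max_retries=1):
    """Add a and b on a possibly faulty adder with concurrent error detection.

    The expected residue is predicted from the operand residues alone,
    independently of the main datapath. On a mismatch the operation is
    re-executed, which masks temporary faults.
    Returns (result, number_of_re_executions).
    """
    expected = (residue3(a) + residue3(b)) % 3
    for attempt in range(max_retries + 1):
        result = adder(a, b)
        if residue3(result) == expected:
            return result, attempt
    raise RuntimeError("residue check still failing after re-execution")


class TransientFaultAdder:
    """Adder model whose first result has one bit flipped (temporary fault)."""

    def __init__(self, flip_bit):
        self.flip_bit = flip_bit
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        total = a + b
        if self.calls == 1:
            total ^= 1 << self.flip_bit   # inject a single-bit error once
        return total


if __name__ == "__main__":
    adder = TransientFaultAdder(flip_bit=5)
    print(checked_add(1234, 4321, adder))
```

A single flipped bit changes the result by plus or minus a power of two, and no power of two is divisible by 3, so every single-bit error in the result changes its residue and is detected. The checker only needs a two-bit residue datapath, which is where the hardware savings over duplication come from.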
During this implementation process, we encountered problems due to the lack of an effective method to estimate the impact of fault mitigation schemes on system performance. Thus, we have defined an analytical model to ease the evaluation of the performance/reliability trade-off when including fault-tolerance techniques in the target systems [65].

A duplex system is preferable to a triple-redundancy system in terms of required hardware overhead. On the other hand, a duplex system lacks the fault identification, localization, and hence correction capabilities which are present in a triplication system. We have therefore proposed an improvement of existing fault-tolerant schemes based on duplication by using dynamically reconfigurable architectures. For that purpose we have designed a low-overhead softcore processor system based on a lockstep scheme. The lockstep system contains a duplex copy of the processor and is able to detect errors in the dual processor thanks to a mismatch indicator. Our proposal enhances the lockstep scheme by adding a fault identification capability. A proposed configuration engine supervises the system in the background. The fault localization action detects which processor within the duplex copy is affected by the error. Afterwards, the correct output of the fault-free processor is instantly switched to the final output. This prevents erroneous results from being passed on to the environment and thus avoids the propagation of potentially catastrophic results. The operation disruption due to fault occurrence is minimized, which is a significant advantage for system safety. Moreover, the proposed configuration engine is generic and can be implemented in diverse types of systems.

#### 6.1.6. Low-Power Architectures

##### 6.1.6.1. Coding Techniques Improving Reliability and Power Consumption for On-Chip Buses

**Participants:** Olivier Sentieys, Sébastien Pillement, Stanislaw Piestrak.
Interconnects are now considered one of the bottlenecks in the design of systems-on-chip (SoC), since they introduce delay and power consumption. To deal with this issue, data coding for interconnect power and timing optimization has been introduced. Several techniques have been suggested to reduce both noise and wire power consumption in on-chip interconnections, such as bus-invert coding, low-weight coding, and reduction of the voltage swing of the signal on the wire. Unfortunately, the latter reduces the noise margin, which may increase the error rate. Recently, the Berger-invert code has been suggested to protect communication channels against all asymmetric errors and to decrease power consumption. We have not only identified some inaccuracies in the proposed approach [30], but also suggested a modified encoding scheme and a new codec design [31]. Implementation results have shown that our approach leads to significant hardware savings and results in a reduced error rate and power consumption.

#### 6.1.6.2. Ultra Low-Power Architecture for Control-Oriented Applications in Wireless Sensor Nodes

Participants: Steven Derrien, Adeel Pasha, Olivier Sentieys.

This research work aims at developing ultra low-power SoCs for wireless sensor nodes, as an alternative to existing approaches based on low-power micro-controllers such as the Texas Instruments MSP430. The proposed approach reduces power consumption by combining hardware specialization and power gating techniques. In particular, we use the fact that typical WSN applications are generally modeled as a set of small to medium grain tasks that are implemented on a low-power micro-controller using lightweight *thread*-like OS constructs. Rather than implementing these tasks in software, we propose to map each of them to its own specialized hardware structure, which we call a *hardware micro-task*.
Such a hardware task consists of a minimalistic (and customized) data-path controlled by a finite state machine (FSM). By customizing each hardware implementation to its corresponding task, we expect to significantly reduce the dynamic power dissipated by the whole system. In addition, to circumvent the increase in static power caused by the possibly numerous hardware tasks implemented on the chip, we propose to combine our approach with *power gating*, so as to supply power to a hardware task only when it needs to be executed.

The results obtained are very promising and have led to a publication at the IEEE/ACM Design Automation Conference [61]. Our work was also described in the article *Embedded systems power down* of the EDN.com magazine, which cited our presentation at DAC and the fact that *microtasking=low power*. See http://www.edn.com/article/509878-Embedded\_systems\_power\_down.php for the article.

The work done in 2010 mainly consisted in finalizing the system-level design flow for the synthesis of ultra low-power WSN node controllers. In particular, we completed the design flow that generates hardware micro-tasks from a higher-level description in ANSI-C. We have also developed a Domain Specific Language (DSL) that is used to specify the system-level model of a WSN node controller. This system-level model comprises the micro-tasks, their interaction through generated events, their hierarchies and priorities, and their shared resources. Using all this information, our design flow is able to generate the VHDL description of a hardware System Monitor (SM) that controls the activation and deactivation of the hardware micro-tasks and of the shared resources. To summarize, the whole design flow consists of two parts: (i) a C-to-VHDL flow for hardware micro-task synthesis, and (ii) a DSL-to-VHDL flow for hardware system monitor synthesis.

#### 6.1.7. SoC Modeling and Prototyping on FPGA-based Systems

Participants: François Charot, Kevin Martin, Laurent Perraudeau, Charles Wagner.

CAIRN participates in the SoCLib ANR project (see Section 7.8 for more information), whose goal is to build an open platform for modeling and simulation of multiprocessor systems-on-chip (MP-SoC). As part of our participation in this project, we have developed simulation models of the Altera NIOSII processor and of the Altera interconnect (Avalon bus). These models and their associated wrappers now allow NIOSII<sup>4</sup>-based multiprocessor systems to be modeled. A multithreaded version of an H264 video decoder has been deployed on a NIOSII-based multiprocessor SoCLib platform using the MutekH operating system developed at the LIP6 laboratory. MutekH is a set of libraries built on top of the Hexo exo-kernel, which defines the Hardware Abstraction Layer and provides both portability and support for heterogeneity. In the framework of the SoCLib project, we have ported Hexo to NIOSII-based MPSoC architectures modeled with SoCLib. This NIOSII port is integrated into the MutekH distribution (http://www.mutekh.org).

## 6.2. Compilation and Synthesis for Reconfigurable Platform

**Participants:** Steven Derrien, Emmanuel Casseau, Daniel Ménard, François Charot, Christophe Wolinski, Olivier Sentieys, Patrice Quinton.

#### 6.2.1. Optimized Synthesis of Processor Extensions in the DURASE System

Participants: Christophe Wolinski, François Charot, Erwan Raffin, Kevin Martin, Antoine Floch.

In the context of the *DURASE* system, we have focused this year on optimizing the synthesis of processor extensions. We have developed an original method based on constraint programming that enables the global minimization of the number of FPGA logic elements used to implement the processor extensions. The abstract generic architecture model of a processor extension is depicted in Figure 4.
It is composed of a processor interface (Figure 4 shows the NIOSII processor interface), a set of processing units U, a set of registers R and two sets of multiplexers, MAS and MBS. The number of registers is parametrized and each register is identified by a unique number $r_{id}$. The number and types of processing units are also parametrized.

<sup>4</sup> The NiosII processor core is a configurable processor core proposed by Altera. It is available in three versions (economic, standard, fast). A SoCLib model of the fast version was previously developed in 2008.

Figure 4. Generic architecture model.

The processing units can be heterogeneous, i.e., each unit can execute a specific set of complex operations. In general, a unit can contain a data-path that is run-time reconfigurable at the functional level. Each processing unit is identified by a unique number $u_{id}$ and can have several input and output ports (P), identified by their unique number $p_{id}$. Only two operands can be sent by a processor to an extension during one cycle, using buses MAS and MBS. This parameter is defined by the processor interface and applies, in our case, to the NIOSII processor. Only one result can be sent back by an extension to a processor during one cycle; this assumption is also specific to the NIOSII processor. Data transferred from the processor to the extension, and data passed directly (without processor intervention) between processing units, can be stored for further processing in the extension's internal register file.

In the context of this project, we first defined a constraint programming model of the generic architecture presented in Figure 4. We then built a corresponding tool, which is capable of minimizing the number of logic elements while taking the registers and multiplexers into account simultaneously. The synthesis results confirm the efficiency of our approach.
On average, a 30% improvement in the number of FPGA logic elements needed to implement the processor extensions was observed (full details are given in the Ph.D. thesis of Kevin Martin [16]).

#### 6.2.2. Run-time Reconfigurable Architecture Modeling

**Participants:** Christophe Wolinski, François Charot, Emmanuel Casseau, Daniel Ménard, Antoine Floch, Erwan Raffin, Steven Derrien.

#### 6.2.2.1. Roma project

We have continued to work on modeling the run-time partially reconfigurable ROMA processor in order to optimize the execution time of applications. The ROMA processor is composed of a set of coarse-grain reconfigurable operators, data memories, configuration memories, an operator network, a data network, a control network and a centralized controller. The centralized controller manages the configuration steps and the execution steps. The ROMA processor has three interfaces: a data interface connected to the operator network, a control interface, and a debug interface connected to the main controller. The reconfigurable operators are connected together via a dedicated network (called the operator-operator network) and to the data memories via another network (called the data memory-operator network). The local memories have their own programmable address generators. Figure 5 shows the block diagram of the ROMA processor. The main controller (Global CTRL) executes a C program defining the synchronizations between the configuration and execution sequences.

Figure 5. Architecture of ROMA processor: the control structure includes a Global CTRL and dedicated controllers designated for each module of the reconfigurable datapath. The reconfigurable datapath is composed of data memory banks, two interconnection networks and a set of coarse grain reconfigurable operators.

In order to support this kind of architecture, a new extension of the *DURASE* system was developed (Figure 6).
As shown in Figure 6, the inputs to our system are an application program written in C and an abstract generic parallel run-time reconfigurable architecture model. The outputs are the C program and the configuration information (binary files) needed to manage the run-time reconfigurable ROMA architecture. The newly developed system is part of the *DURASE* system (see Figure 6). It implements our new CP-based method, which makes it possible to model complex run-time reconfigurable architectures together with their application programs. The model can then be used to perform scheduling, binding and routing while optimizing the application's execution time. Our system also contains the target-dependent back-end compiler (in our case, supporting the ROMA architecture).

Figure 6. DURASE global design flow overview.

We have carried out extensive experiments to evaluate the quality of our newly developed system. All experiments were run on a 2 GHz Intel Core Duo under the Windows XP operating system. In our experiments, the ROMA abstract model was instantiated with 8 memories and 4 operators. All operators support the same types of computations, and the delay of a computation is the same regardless of its resource assignment. The following latencies have been assumed: $WR_{lat} = RD_{lat} = ope\_ope_{lat} = 1$. We have also assumed that all data is stored in memories before processing starts. In 78% of the cases, our system provides optimal results, confirming the high quality of our scheduling, binding and routing system [69].

#### 6.2.2.2. RecMotifs project

In the context of the *RecMotifs* project, we have continued to work on a specific design flow integrating STMicroelectronics' compiler and our development platform. We have also defined a new CP (Constraint Programming) model [75], [46] of the scheduler that is well adapted to a parallel architecture.
Our generic simplified architecture is composed of functionally reconfigurable cells implementing a set of computational patterns (selected by our system). The cells are connected directly to the processor data-path. Each cell also contains registers for local and intermediate data. The cells can communicate through a crossbar switch. The number of registers and the structure of the interconnections are application dependent. Cells can also have a local memory to store coefficients and data needed for processing. In this case, the memory has two ports, one connected to the cell and the other connected directly to the processor. Address generation can be handled by the memory address generator.

In the context of this project, the *DURASE* flow was modified. The main contribution is the new parallel architecture composed of an ASIP processor and a functionally reconfigurable cell fabric. The new design method for pattern selection also uses a new graph covering model for this architecture when scheduling instructions for parallel execution. Moreover, we model detailed architectural constraints. The presented method substantially extends the *DURASE* system, which can now be applied to generic parallel architectures.

We have carried out experiments to evaluate the speed-up that can be obtained using the NiosII processor (running at 200MHz on a Stratix2 Altera FPGA) extended with a functionally reconfigurable cell fabric. The patterns were generated under the assumption that the number of inputs cannot exceed four and the number of outputs cannot exceed two. Results obtained for selected applications from the *MediaBench*, *MiBench* and *Cryptographic Library* benchmark sets have been presented in [46]. These applications are written in C and compiled using our design flow for the ALTERA NIOSII target processor.

#### 6.2.2.3. Floating-Point to Fixed-Point Conversion

Participants: Daniel Ménard, Karthick Parashar, Olivier Sentieys, Romuald Rocher, Hai-Nam Nguyen.

In [57], a hierarchical approach has been proposed to perform word-length optimization of a complete system made up of several subsystems. At the system level, the fixed-point behavior of each subsystem is modeled by a single noise source located at the subsystem output. The aim is to find the noise power level of each noise source so as to minimize the implementation cost while maintaining the overall performance. This year, experiments have been carried out on a MIMO-OFDM receiver to demonstrate the efficiency of our approach.

For the fixed-point conversion process, different optimization algorithms have been tested. An improvement of the word-length optimization techniques based on genetic algorithms has been proposed. The quality of the solution is improved without increasing the optimization time. The execution time of this kind of algorithm is quite long, but it directly yields the Pareto curve of the cost as a function of the accuracy constraint. This curve is used, for example, in our hierarchical approach. The use of the GRASP algorithm for word-length optimization has also been proposed. Compared to genetic algorithms, this approach reduces the optimization time for a given accuracy constraint and improves the solution quality.

## 6.3. Algorithm Architecture Interaction

**Participants:** Steven Derrien, Romuald Rocher, Daniel Ménard, François Charot, Christophe Wolinski, Olivier Sentieys, Patrice Quinton.

#### 6.3.1. Reconfigurable Video Coding

Participants: Emmanuel Casseau, Olivier Sentieys, Arnaud Carer, Cecile Beaumin-Palud, Herve Yviquel.

In the field of multimedia coding, standardization recommendations are always evolving. To reduce design time, the Reconfigurable Video Coding (RVC) standard allows new codec algorithms to be defined from a modular library of components.
The RVC dataflow-based specification formalism expressly targets multiprocessor platforms. However, software processors cannot cope with high-performance and low-power requirements. Hence, this work investigates the mapping of RVC specifications onto hardware accelerators, as well as the scheduling of the functional units (FU) of the specification. The aim is to exploit as much as possible the parallelism exhibited by the specification when scheduling the tasks on the available resources. Reconfigurability will be used, and the design of an RVC-dedicated reconfigurable architecture will be studied. First results [39] have led to the definition of a reconfigurable FIFO that optimizes the cost and performance of RVC dataflow specifications by taking advantage of their dynamic behavior. This work is carried out in close collaboration with IETR Rennes.

#### 6.3.2. Range Estimation and Computation Accuracy Optimization

**Participants:** Daniel Ménard, Karthick Parashar, Olivier Sentieys, Romuald Rocher, Hai-Nam Nguyen, Emmanuel Casseau, Andrei Banciu.

#### 6.3.2.1. Range Estimation

Floating-point to fixed-point conversion is an important part of hardware design for obtaining efficient implementations. In order to optimize the integer word-length under performance constraints, the dynamic range of the variables during execution must be determined. Traditional range estimation methods based on simulation are data dependent and time consuming, whereas analytical methods such as interval and affine arithmetic give pessimistic results because they lack a statistical foundation. Recently, a novel approach based on the Karhunen-Loève Expansion (KLE) was presented for linear time-invariant (LTI) systems, offering a solid stochastic foundation. We have investigated this theory. The KLE approach is able to optimize the integer word-length so that the distortions introduced still satisfy the application's performance requirements.
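To make the contrast between simulation-based and interval-based range estimation concrete, here is a small sketch for an FIR filter. It does not implement the KLE method; the coefficients, the input range and the two's-complement convention used to derive the integer word-length are assumptions chosen for illustration.

```python
import random


def interval_range(coeffs, x_lo, x_hi):
    """Worst-case output interval of y[n] = sum(h[k] * x[n-k]) by interval arithmetic."""
    lo = hi = 0.0
    for h in coeffs:
        a, b = h * x_lo, h * x_hi
        lo += min(a, b)
        hi += max(a, b)
    return lo, hi


def simulated_range(coeffs, samples):
    """Observed output range for one particular input sequence (data dependent)."""
    history = [0.0] * len(coeffs)  # zero initial state
    y_min = y_max = 0.0
    for x in samples:
        history = [x] + history[:-1]
        y = sum(h * v for h, v in zip(coeffs, history))
        y_min, y_max = min(y_min, y), max(y_max, y)
    return y_min, y_max


def integer_word_length(max_abs):
    """Smallest m (sign bit included) such that max_abs < 2**(m-1)."""
    m = 1
    while 2 ** (m - 1) <= max_abs:
        m += 1
    return m


if __name__ == "__main__":
    h = [1.0, -2.0, 1.5, -0.5]              # hypothetical filter coefficients
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    lo, hi = interval_range(h, -1.0, 1.0)
    s_lo, s_hi = simulated_range(h, xs)
    print("interval bound:", (lo, hi), "IWL:", integer_word_length(max(-lo, hi)))
    print("simulated range:", (s_lo, s_hi), "IWL:", integer_word_length(max(-s_lo, s_hi)))
```

The interval bound is guaranteed but assumes that every input simultaneously takes its worst-case value, which is why it can be pessimistic; the simulated range is tighter but only valid for the input sequence that was used.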
The accuracy of the KLE-based estimation is, however, limited by the expansion order and by the complexity of the computations. We checked the relevance of the theory for practical implementations with an OFDM modulator as a test case [38].

#### 6.3.2.2. Performance Evaluation of Fixed-Point Systems

Existing analytical techniques for evaluating the performance of fixed-point systems are not applicable to quantization errors in the presence of un-smooth operators such as decision operators. In [58], a generalized decision operator has been defined and an analytical model for determining the probability of decision error due to quantization noise has been proposed. Nevertheless, perturbation theory cannot be used to propagate the decision error inside the system. Simulation is therefore unavoidable when evaluating the performance of fixed-point systems in the presence of un-smooth operators. In [56], a hybrid technique that can be used in place of pure simulation to accelerate the performance evaluation has been proposed. The principal idea is to simulate parts of the system only when un-smooth errors occur and to use analytical results otherwise. We applied this approach to a complex MIMO sphere decoding algorithm in collaboration with Imec (Interuniversitair Micro-Electronika Centrum), Belgium. The performance evaluation time has been reduced by several orders of magnitude compared to existing approaches based on pure fixed-point simulation.

This technique uses the single noise source model. This model attempts to capture the fixed-point behavior of any sub-system integrating smooth operators with a single noise source located at the sub-system output. In [59], an estimation of the noise frequency response has been proposed, and in [60] an estimation of the noise probability density function has been defined.

#### 6.3.3. Multi-Antenna Systems

Participants: Olivier Berder, Pascal Scalart, Quoc-Tuong Ngo.
When the transmitter can obtain some Channel State Information (CSI) from the receiver, antenna power allocation strategies can be implemented through the joint optimization of a linear precoder (at the transmitter) and decoder (at the receiver) according to various criteria. An efficient linear precoder based on the maximization of the minimum Euclidean distance between two received data vectors is proposed for MIMO spatial multiplexing systems with three data streams. In the literature, these dmin-based precoders were only derived for two data streams. Using trigonometric functions, a new virtual MIMO channel representation, described by two channel angles, allows the parameterization of the max-dmin precoder and the optimization of the distance between signal points in the received constellation. To illustrate the optimization process, a sub-optimal precoder is first derived for BPSK and QPSK modulation following the max-SNR approach, which consists in pouring power only on the most favored virtual sub-channel [52]. Based on this representation, the optimal dmin precoders are then proposed for BPSK and QPSK modulation. Simulation results over a Rayleigh fading channel demonstrate a large bit-error-rate improvement of the proposed solution in comparison with beamforming and other traditional precoding strategies. The performance improvement depends on the channel characteristics: the more dispersive the channel, the more significant the improvement.

#### 6.3.4. Cooperative Strategies for Low-Energy Wireless Networks

**Participants:** Olivier Berder, Le Quang Vinh Tran, Olivier Sentieys, Tuan-Duc Nguyen [International University - VNU. - Hochiminh City, Vietnam].

During the last decade, many works were devoted to improving the performance of relaying techniques in ad hoc networks.
One promising approach consists in allowing the relay nodes to cooperate, thus using spatial diversity to increase the capacity of the system. In wireless distributed networks where multiple antennas cannot be installed in a single wireless node, cooperative relay and cooperative Multi-Input Multi-Output (MIMO) techniques can indeed be used to exploit spatial and temporal diversity gain in order to reduce energy consumption. The performance and energy consumption of the cooperative MIMO and relay techniques are investigated over a Rayleigh fading channel. Although cooperative MIMO has been shown to outperform relaying under ideal conditions, relaying is the better solution when transmission synchronization errors occur. The comparison between these two cooperative techniques helps us choose the optimal cooperative strategy for energy-constrained WSN applications [53]. A strategy combining the two techniques is then proposed in order to exploit their advantages simultaneously [54]. Its principle is that a cooperative MIMO technique is employed at multiple relay nodes to retransmit the signal using a MISO transmission in a single transmission phase, instead of the multiple transmission phases of the traditional parallel relay technique.

The energy efficiency of cooperative MIMO and relay techniques is very useful for Infrastructure to Vehicle (I2V) and Infrastructure to Infrastructure (I2I) communications in Intelligent Transport Systems (ITS) networks, where the energy consumption of wireless nodes embedded in road infrastructure is constrained. Applications of node cooperation to ITS networks are proposed, and the performance and energy consumption of cooperative relay and cooperative MIMO are investigated in comparison with the traditional multi-hop technique.
The comparison between these cooperative techniques helps us choose the optimal cooperative strategy in terms of energy consumption for energy-constrained road infrastructure networks in ITS applications.

In this context, the impact of cooperative strategies is analyzed using a realistic power model of a real radio transceiver [73]. A system with a two-antenna source, two single-antenna relays and a single-antenna destination is considered. Three strategies combining space-time coding and relaying are presented: MIMO full cooperative relay (MFCR), MIMO simple cooperative relay (MSCR) and MIMO normal cooperative relay (MNCR). The parameters of the power consumption model are extracted from the characteristics of the CC2420, a widely used and commercially available wireless sensor transceiver. The energy consumption of the different transmission protocols is analyzed and compared to identify the optimal scheme for different ranges of transmission distance. The transmission distance threshold used to choose the optimal energy consumption model is derived. The maximum transmission distance of the three models is given, e.g. 122 m for the Alamouti scheme and 280 m for MFCR. Depending on the relative distance of the relay and on the transmission distance, the proposed scheme selection defines which model should be used to minimize the total energy consumption.

#### 6.3.5. Opportunistic Routing

Participants: Olivier Berder, Olivier Sentieys, Ruifeng Zhang, Jean-Marie Gorce [Insa Lyon, INRIA Swing].

The aforementioned cooperative approaches, however, introduce an overhead in terms of information exchange, increasing the complexity of the receivers. A simpler way of exploiting spatial diversity is referred to as opportunistic routing. In this scheme, a cluster of nodes still serves as relay candidates, but only a single node in the cluster forwards the packet.
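The forwarding principle described above can be sketched as follows. This is only an illustration: the candidate tuple layout, the independent-reception model and the rule that the successful receiver closest to the destination forwards the packet are assumptions made for this example, not details taken from [34].

```python
import random
from math import prod


def hop_success_probability(reception_probs):
    """Probability that at least one relay candidate receives the packet,
    assuming independent receptions (assumption of this sketch)."""
    return 1.0 - prod(1.0 - p for p in reception_probs)


def select_forwarder(candidates, rng):
    """Pick the single forwarding node among the candidates of a cluster.

    candidates: list of (node_id, distance_to_destination, reception_prob).
    Returns the id of the receiving candidate closest to the destination,
    or None if no candidate received the packet.
    """
    received = [c for c in candidates if rng.random() < c[2]]
    if not received:
        return None
    return min(received, key=lambda c: c[1])[0]


if __name__ == "__main__":
    cluster = [("A", 40.0, 0.3), ("B", 60.0, 0.6), ("C", 80.0, 0.9)]
    rng = random.Random(1)
    print("forwarder:", select_forwarder(cluster, rng))
    print("P(hop succeeds), cluster  :", hop_success_probability([c[2] for c in cluster]))
    print("P(hop succeeds), relay A only:", cluster[0][2])
```

With several candidates, the hop succeeds as soon as any one of them receives the packet, which is how spatial diversity is exploited without the signalling overhead of full cooperation.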
Our work provides a thorough analysis of opportunistic routing efficiency under different realistic radio channel conditions. The study aims at finding the best trade-off between two objectives, energy minimization and latency minimization, under a hard reliability constraint. We derive an optimal bound, namely the Pareto front of the related optimization problem, which offers good insight into the benefits of opportunistic routing compared with classical multi-hop routing schemes. This lower bound also provides a framework for cross-layer optimization of the parameters of the physical, MAC and routing layers during the design or planning phase of a network [34].

#### 6.3.6. Speech enhancement and coding issues

Participant: Pascal Scalart.

Microphone arrays, and more specifically beamforming methods, are enabling technologies for hands-free communication that are now viable and cost effective. By offering directional gain to improve the signal-to-noise ratio, and by taking the spatial correlation of the sound field into account to de-reverberate the desired speech signal and to reduce noise and acoustic echoes, microphone array techniques play an essential role in hands-free mobile telephony, distant-talker speech recognition, voice-controlled systems, hearing aids and audio monitoring. To tackle time-varying environments with both non-stationary signal characteristics and potentially moving sources, we worked [50] on the Generalized Sidelobe Canceller (GSC), which is an efficient implementation of adaptive beamformers. One of its main drawbacks is the self-cancellation of the desired signal caused by signal leakage into the noise reference. To cope with this problem, we proposed to exploit the ability of the crosstalk-resistant adaptive noise canceller (CTRANC) to deal with the crosstalk problem, which is in fact the same as the signal leakage problem in the GSC.
Describing the new adaptive recursive structure for the GSC, we derived a complete analysis of the CTRANC and proposed new adaptive algorithms in the frequency domain [70]. We established new results on the convergence properties and on the existence of an equilibrium point for this recursive structure, and we showed that the recursive GSC is an effective solution to the leakage problem that improves performance.

In the audio coding domain, we focused our research activity on stereo coding, which is widely used in audio applications such as streaming, broadcasting or storage. Significant progress has been made in reducing the bit rate of (joint) stereo coding, as shown by the evolution of MPEG audio standards (MP3, AAC, HE-AAC, USAC). On the other hand, in conversational applications speech coders are designed to handle mostly mono signals; stereo, when supported by the service (e.g. conferencing), is usually coded using dual mono, that is, by coding each channel separately. Recently, ITU-T has launched several standardization activities aiming at extending existing wideband (50-7000 Hz) mono coding standards to superwideband (50-14000 Hz) and stereo. Examples are G.729.1-SWB, G.718-SWB, and G.722/G.711.1-SWB. In these examples, the bit rate set for stereo does not allow dual mono coding, and joint stereo coding operating at a lower bit rate than dual mono is therefore needed. In the same spirit as the G.722/G.711.1-SWB activity, we proposed an experimental stereo extension of G.722 that follows the constraints of the stereo extension, e.g. a frame length of 5 ms and an additional bit rate of 8 or 16 kbit/s. Using a frequency-domain stereo-to-mono downmixing technique, the proposed coder [48] preserves the energy of the mono signal and avoids issues due to the complete dependency on one channel (L or R) for the phase computation.
A parametric stereo extension of G.722 at 56+8 and 64+16 kbit/s has been studied, and the quality of the proposed coder was evaluated in MUSHRA tests. The proposed stereo coder operates at a lower bit rate than G.722 dual mono, with a speech and music quality at 64+16 kbit/s that is equivalent to G.722 dual mono.

#### 6.3.7. True Random Number Generators

**Participants:** Renaud Santoro, Olivier Sentieys, Arnaud Tisserand, Philippe Quémerais, Arnaud Carer, Thomas Anger.

#### 6.3.7.1. Ochre V2: TRNG chip with on-line randomness quality monitoring

A new chip has been designed and sent to fabrication: 4 mm² in the STMicroelectronics HC-MOS9GP 130 nm CMOS technology. This circuit is a true random number generator based on several oscillator sampling architectures (the physical noise source is the jitter produced by one or several free-running oscillators). The quality of the random sequence generated by a TRNG depends on many parameters, such as noise source characteristics, implementation details and environmental parameters. A hardware unit for on-line, real-time evaluation of the quality of the TRNG output has been designed and implemented in the Ochre V2 circuit. Such monitoring of the generated random sequence is useful in critical applications such as cryptographic embedded systems, where it helps prevent a reduction in randomness quality due to environmental variations or physical attacks against the TRNG.

#### 6.3.8. Flexible hardware accelerators for biocomputing applications

Participants: Steven Derrien, Naeem Abbas, Patrice Quinton.

It is widely acknowledged that FPGA-based hardware acceleration of compute-intensive bioinformatics applications can be a viable alternative to cluster (or grid) based approaches, as it offers a very attractive MIPS/watt figure of merit. One of the issues with this technology is that it remains somewhat difficult to use and to maintain (one is designing a circuit rather than programming a machine).
Even though C-to-hardware compilation tools exist (Catapult-C, Impulse-C, etc.), a common belief is that they do not generally offer good enough performance to justify the use of such reconfigurable technology. As a matter of fact, successful hardware implementations of bio-computing algorithms are manually designed at the RTL level and usually target a specific system, with little if any performance portability among reconfigurable platforms.

Figure 7. Layout of Ochre V2 Integrated Circuit

This research work, which is part of the ANR BioWic project, aims at providing a framework for the semi-automatic generation of high-performance hardware accelerators. In particular, we expect to widen the scope of common design constraints by focusing on system-level criteria that involve both the host machine and the accelerator (workload balancing, communication and data reuse optimizations, hardware utilization rate, etc.). This research work builds upon the CAIRN research group's expertise in automatic parallelization for application-specific hardware accelerators and has been targeting mainstream bioinformatics applications (HMMer, ClustalW and BLAST).

Our work in 2010 focused on the HMMER algorithm and led to a very fruitful collaboration with Prof. Rajopadhye at CSU. In particular, we have proposed a mathematical reformulation of the HMMER algorithm (previously considered sequential) that exposes parallelism in the form of *reductions* and *prefix-scan* operations, which are very well suited to efficient hardware implementation [37].

#### 6.3.9. Parallel reconfigurable architectures for LDPC decoding

Participants: Florent Berthelot, François Charot, Charles Wagner, Christophe Wolinski.

LDPC codes are a class of error-correcting codes introduced by Gallager together with an iterative probability-based decoding algorithm.
Their performance, combined with their relatively simple decoding algorithm, makes these codes very attractive for the next generations of satellite and radio digital transmission systems. LDPC codes were chosen in the DVB-S2, 802.11n, 802.16e, 802.3an and CCSDS standards. The major problem is the huge design space, composed of many interrelated parameters, which forces drastic design trade-offs. Another important issue is the need for flexibility of the hardware solutions, which have to support all the variants of a given standard.

In the context of the RPS2 project, we have designed a partly parallel architecture suited to the decoding of LDPC codes for the DVB-S2 digital video broadcast standard [41]. A complete development flow, starting from a Matlab specification down to the back-end tools dedicated to FPGA implementation, has been defined. Algorithm analysis and bit error performance evaluation have been performed using the open source IS-CML Matlab toolbox. First, a functional DVB-S2 decoding algorithm based on the iterative belief propagation algorithm was written in C/C++, and floating-point to fixed-point conversion was studied. Then the functional model was rewritten in SystemC, using SystemC synchronous threads, with the goal of matching the defined architecture at a cycle-accurate, bit-accurate level. Finally, each SystemC thread was iteratively transformed into VHDL code. This flow allowed a better understanding of the algorithm in terms of complexity, performance and hardware implementation. We focused on the complexity-performance trade-offs due to message quantization, and we compared its effects for several algorithmic approximations used for the processing of check nodes. The decoder has been implemented on an XD2000i FPGA in-socket accelerator from XtremeData, a platform composed of a Stratix III FPGA from Altera plugged into a CPU socket.
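
The check-node approximations compared in this study are not detailed in this report. Purely as an illustration, the sketch below contrasts the exact sum-product (tanh rule) check-node update with the widely used min-sum approximation, and adds a saturating fixed-point quantizer for the messages. The function names, the 6-bit word length and the 2 fractional bits are assumptions chosen for the example; they are not the parameters of the RPS2 decoder.

```python
import math


def check_node_sum_product(llrs):
    """Exact check-node update (tanh rule).

    For each edge i, combine the incoming LLRs of all *other* edges.
    """
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for j, value in enumerate(llrs):
            if j != i:
                prod *= math.tanh(value / 2.0)
        # Clamp so that atanh stays finite for very reliable inputs.
        limit = 1.0 - 1e-12
        prod = max(-limit, min(limit, prod))
        out.append(2.0 * math.atanh(prod))
    return out


def check_node_min_sum(llrs, scale=1.0):
    """Min-sum approximation of the check-node update.

    Outgoing message = product of the signs of the other inputs times the
    smallest magnitude among them. `scale` < 1 gives normalized min-sum.
    """
    out = []
    for i in range(len(llrs)):
        others = [value for j, value in enumerate(llrs) if j != i]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        out.append(scale * sign * min(abs(value) for value in others))
    return out


def quantize(value, frac_bits=2, total_bits=6):
    """Round to a signed fixed-point grid and saturate (assumed format)."""
    step = 2.0 ** -frac_bits
    max_code = 2 ** (total_bits - 1) - 1
    code = round(value / step)
    code = max(-max_code, min(max_code, code))
    return code * step
```

Min-sum needs only comparisons and sign logic, which is why it is attractive for hardware. Its output magnitudes are never smaller than the exact ones, so a scaling factor below 1 is a common correction. Passing the messages through `quantize` before and after the update gives a simple way to observe how word length affects the result.
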
The Matlab-based simulation acceleration made it possible to study quantization effects and the error-floor effect at very low BER for the DVB-S2 check node algorithm approximations.

# 7. Contracts and Grants with Industry

## 7.1. ITEA2 - GEODES (2008-2011)

**Participants:** Olivier Sentieys, Olivier Berder, Arnaud Carer, Jérôme Astier, Thomas Anger, Vivek Tovinakere-Dwarakanath, Mahtab Alam.

The GEODES (Global Energy Optimization for Distributed Embedded Systems) project will provide the design techniques, embedded software and accompanying tools needed to face the challenge of giving long power autonomy to feature-rich, connected embedded systems, which are becoming pervasive and whose usage is rising significantly. It approaches this challenge by considering all system levels, and notably emphasizes the distributed system view. GEODES is an ITEA2 project which involves partners from France, Austria, Italy and the Netherlands: Thales (FR, IT, NL), Sensaris (FR), CNRS (LEAT and IRISA) (FR), CETMEF/MARTEC (FR), Infineon (AU), Thomson (FR), TUV (AU), UAQ (IT), Philips (NL), Organo (AU), TI-WMC (NL).

In GEODES, Cairn will provide partners with the PowWow very low-power sensor platform, including reconfigurable hardware accelerators. CAIRN will also contribute link and MAC layer strategies to a global optimization of the energy, and will define and optimize advanced signal processing, error detection and correction, and medium access control (MAC) techniques in order to reduce the transmit power as well as useless listening on the communication medium. In particular, cooperative strategies such as cooperative MIMO or relaying techniques will be investigated.

## 7.2. NANO2012 Program - S2S4HLS (2008-2012)

**Participants:** Emmanuel Casseau, Steven Derrien, Daniel Ménard, Olivier Sentieys, Loïc Cloatre, Amit Kumar, Antoine Morvan, Chenglong Xiao, Jean-Charles Naud.

High-level synthesis (HLS) tools are starting to be used for industrial designs.
HLS is analogous to software compilation transposed to the hardware domain. Starting from an algorithmic specification of the behavior, HLS tools automate the design process and generate a register transfer level (RTL) architecture, taking user-specified constraints into account. However, design performance still depends on the designer's skill in writing the appropriate source code. The S2S4HLS (Source-to-Source for High-Level Synthesis) project intends to apply source code transformations that guide synthesis and hence lead to more efficient designs, and aims at providing a toolbox for automatic source-to-source transformations of C code. The project focuses on three complementary goals to push the limits of existing HLS tools: loop transformations for performance optimization and better resource usage, automatic floating-point to fixed-point conversion, and synthesis of multi-mode architectures. S2S4HLS is organized into three sub-projects targeting these three objectives. The project is carried out in close collaboration with STMicroelectronics and the Compsys team at Inria Rhône-Alpes, within the overall INRIA-ST partnership agreement. It is financed by the Ministry of Industry in the Nano2012 program. Cairn is responsible for the project and is involved in the three work packages.

## 7.3. NANO2012 Program - RecMotifs (2008-2012)

Participants: François Charot, Antoine Floch, Jérémie Guidoux, Christophe Wolinski.

The RecMotifs project aims at the generation of application-specific extensions targeting the STxP70 processor from STMicroelectronics. Cairn will study advanced algorithms for graph matching and graph merging, together with constraint programming methods. The project is carried out in close collaboration with STMicroelectronics within the overall INRIA-ST partnership agreement. It is financed by the Ministry of Industry in the Nano2012 program.

## 7.4. ANR Architectures du Futur Open-People (2009-2011)

Participants: Daniel Chillet, Robin Bonamy, Olivier Sentieys.

The Open-People (Open Power and Energy Optimization PLatform and Estimator) project aims at defining a complete platform for low power estimation and optimization. The platform will be composed of hardware boards to support measurements for the applications. End-users will be able to upload their applications through a web portal, and to control the power measurements of the execution of their applications on a specific electronic board. The Open-People project will also propose a complete library of power models for components, which allows end-users to estimate the power consumption of some parts of their applications without making measurements. This will make it possible to quickly evaluate the different design choices with respect to power consumption. Finally, through the web portal http://www.open-people.fr, Open-People will propose software tools to apply power optimizations. In this project, the CAIRN team will develop power models for FPGA components using dynamic reconfiguration. Open-People involves the LabSticc (Lorient), Trio (Nancy), CAIRN (Rennes/Lannion) and Dart (Lille/Valenciennes) teams from Inria, Leat at Nice, Thales (Colombes) and InPixal (Rennes). Cairn is in charge of power models and optimization for reconfigurable architectures.

## 7.5. ANR BioWiC (2009-2011)

Participants: Steven Derrien, Naeem Abbas, Patrice Quinton.

The increasing flow of genomic data provided by the steady improvement of new biotechnologies can no longer be efficiently exploited without a systematic *in silico* analysis. Data need to be filtered, curated, classified, annotated, validated, etc., to be actively used in a discovery process. The design of such complex pipelines of processing stages is known to be an extremely tedious task, as their designers have to deal with both specification and implementation issues. Moreover, the execution time of such *workflows* is very often a bottleneck, as huge amounts of data have to be processed.
Therefore, the goal of the BioWiC (Bioinformatics Workflows for Intensive Computation) project is twofold:

- Reducing the design time of complex bioinformatics pipelines by providing a domain specific workflow environment;
- Reducing the execution time of these workflows through the use of parallel execution on GPUs, FPGAs and clusters of PCs whenever possible.

The ANR BioWiC project is funded for 3 years, and involves several institutions (INRA-MIG, Ouest Genopole, CAIRN and Symbiose project-teams at INRIA) and Universities (Eliaus Laboratory at Université de Perpignan). For more details see http://biowic.inria.fr. CAIRN will provide a framework to support the semi-automatic generation of flexible IP cores, by widening the scope of typical design constraints so as to integrate communication and data reuse optimizations between the host and the hardware accelerator.

## 7.6. ANR Architectures du Futur - CIFAER (2008-2011)

Participants: Sébastien Pillement, Manh Pham, Olivier Sentieys, Samuel Mouget.

In various application domains, emerging requirements lead to the definition of new architectures for electronic embedded systems. In the automotive context, the solutions investigated are networks of processing elements distributed in the vehicle. In this context, the research activity of the CIFAER (Flexible Intra-Vehicule Communications and Embedded Reconfigurable Architectures) project is the definition of an innovative embedded architecture, based on a general-purpose processor with reconfigurable processing areas and on the use of adaptable interfaces (radio and powerline communications). Efficient software layers in the associated operating system will be investigated to enable new services such as dynamic reconfiguration and task migration for error tolerance. CIFAER involves Irisa, IETR Rennes, Ireena Nantes, Atmel and Geensys.
CAIRN will propose and develop the dynamically reconfigurable platform used as the test vehicle of the project. This platform will include fault-tolerant mechanisms for error mitigation.

## 7.7. ANR Architectures du Futur - FOSFOR (2008-2011)

Participants: Daniel Chillet, Sébastien Pillement, Manh Pham, Ludovic Devaux, Didier Demigny.

The Fosfor (Flexible Operating System FOr Reconfigurable platform) project aims at reconsidering the structure of the RTOS, which is generally implemented in software, centralized and static, by proposing a distributed RTOS with a homogeneous interface from the application point of view. We propose to exploit the dynamic and partial reconfiguration of the reconfigurable SoC. In this context, the tasks are statically or dynamically deployed (i.e. instantiated) on software units (general-purpose processors) or hardware units (reconfigurable areas). Flexibility of the OS will be achieved thanks to virtualization mechanisms of OS services, such that the tasks of the application are executed and communicate without prior knowledge of their assignment to software or hardware. FOSFOR involves Irisa, LEAT Nice, ETIS Cergy, Xilinx and Thales. CAIRN will propose and include in the FOSFOR OS a flexible communication infrastructure and its control management.

## 7.8. ANR Technologies Logicielles - SoCLib (2007-2010)

Participants: François Charot, Kevin Martin, Laurent Perraudeau, Charles Wagner.

The aim of SoCLib (An Open Modeling and Simulation Platform for System-on-Chip Design) is to build an open platform for the modeling and simulation of multi-processor systems-on-chip that can be used by both universities and industrial companies. The core of the platform is a library of simulation models for virtual components (IP cores), with a guaranteed path to silicon.
The main concern of the SoCLib project is true interoperability between the IP cores: all SoCLib components are written in SystemC and comply with the VCI (Virtual Component Interface) standard communication protocol. CABA (cycle-accurate and bit-accurate) and TLMT (transaction level model with time) simulation models are proposed. For more details see http://soclib.lip6.fr. SoCLib successfully ended in September 2010.

## 7.9. Pôle Images et Réseaux - Transmedi@ (2008-2009)

Participants: Olivier Sentieys, Emmanuel Casseau, Cécile Beaumin-Palud, Arnaud Carer, Thomas Anger.

The Transmedi@ project addresses the issue of video transcoding, and more generally media processing, with very high performance for network infrastructures and high quality for broadcast equipment. The aim of Transmedi@ is to propose flexible reconfigurable co-processing architectures for the acceleration of video algorithms. In the context of network infrastructure, the platform has to be able to transcode several video streams in real time from various video formats and norms, while in the context of broadcast the main constraint comes from the high quality (HD) of the video. CAIRN is involved in the definition of this platform and will propose innovative structures for reconfigurable coarse-grain processing and for data transfer and storage in this context of video processing. Transmedi@ involves a close collaboration with Alcatel, Envivio, Telecom Bretagne and IETR/Supelec. For more details see http://transmedia.irisa.fr. Transmedi@ successfully ended in September 2010.

## 7.10. Pôle Images et Réseaux - RPS2 (2008-2010)

Participants: Florent Berthelot, François Charot, Charles Wagner, Christophe Wolinski.

The RPS2 project started in November 2008. It aims at developing an FPGA-based demonstrator of a DVB-S2 receiver targeting professional applications.
RPS2 involves three partners: Inria Rennes, Ditocom and Supelec Rennes. The contribution of CAIRN concerns the design of the hardware architecture of the FEC (Forward Error Correction) process of the DVB-S2 decoding system. This hardware architecture implements low-density parity-check (LDPC) code decoding. RPS2 successfully ended in September 2010.

## 7.11. ANR Architectures du Futur - ROMA: Reconfigurable Operators for Multimedia Applications (2007-2010)

**Participants:** Emmanuel Casseau, Antoine Floch, Shafqat Khan, Daniel Ménard, François Charot, Christophe Wolinski, Erwan Raffin, Olivier Sentieys.

ROMA is an ANR "architectures du futur" project which involves IRISA-CAIRN, CEA-LIST, CNRS-LIRMM and Thomson R&D France. The ROMA project proposes to develop both a design methodology and a reconfigurable processor able to adapt its computing structure to video and image processing applications. The processor is built around a pipeline of coarse-grain reconfigurable operators exhibiting efficient power and performance features. Flexibility is obtained through the use of mutable units. These units can be configured for the implemented function, the number representation of the data and the data bit-width. The configuration of the processor is performed dynamically throughout the application, depending on the tasks to be carried out. An improvement of at least one order of magnitude in power consumption and computing power over state-of-the-art energy-efficient reconfigurable architectures is expected. CAIRN is the leader of this project. For more details see http://roma.irisa.fr. ROMA successfully ended in September 2010.

# 8. Other Grants and Activities

## 8.1. National Initiatives

The CAIRN team currently collaborates with the following laboratories: CEA List, SATIE ENS Cachan, LEAT Nice, Lab-Sticc (Lorient, Brest), LIRMM Montpellier, ELIAUS Perpignan, ETIS Cergy, LIP6 Paris, IETR Rennes, Ireena Nantes; and with the following INRIA project-teams: Pops, Arénaire, Ares, Compsys, Espresso, Symbiose, TexMex.

### 8.1.1. Research Organization of CNRS (GDR)

The team participates in the activities of:

- GdR SOC-SIP (*System On Chip & System In Package*), working groups on reconfigurable architectures, embedded software for SoC, low power issues. See http://www.lirmm.fr/soc_sip/. CAIRN is the leader of the group on reconfigurable architectures.
- GdR ISIS (Information Signal ImageS), working group on Algorithms Architectures Adequation.
- GdR ASR (Architectures Systèmes et Réseaux)
- GdR IM (Informatique Mathématique), C2 working group on Codes and Cryptography

The Grappas project, funded by the *Equipe Projet Transversale* program of Université Européenne de Bretagne (UEB), aims at evaluating (and improving) the efficiency of automatic parallelization techniques for accelerating electromagnetic FDTD simulations of antennas on GPUs (Graphics Processing Units). The project is a joint project between IETR (D. Thouroude and R. Sauleau) and IRISA (S. Derrien).

## 8.2. European Initiatives

The CAIRN team members are involved in close international cooperation with the following laboratories and universities:

- Imec (Belgium) on scenario-based fixed-point data format refinement to enable energy scalability of Software Defined Radios (SDR);
- University of Erlangen-Nuremberg and Dresden University of Technology (Germany) on massively parallel embedded reconfigurable architectures and on dynamic reconfiguration optimisation in the mesh fabric;
- University of Paderborn (Germany) on spatio-temporal scheduling for reconfigurable systems;
- Lund University (Sweden) on the application of the constraint programming approach in the reconfigurable data-path synthesis flow;
- Computer Vision and Robotic Group of the Institute for Informatics and Applications at the University of Girona (Spain) on parallel architectures for vision algorithms applied to underwater robots;
- University of Eindhoven (Netherlands) on reconfigurable data-path synthesis;
- University of Leiden (Netherlands) on parallel architecture synthesis;
- Code and Cryptography group of University College Cork (Ireland) on arithmetic operators for cryptography.

## 8.3. International Initiatives

The CAIRN team members are involved in close international cooperation with the following laboratories and universities:

- LRTS laboratory of Laval University in Québec (Canada) on architectures for MIMO systems, with funds from FFQR and INRIA "Associated Team" (2006-2008);
- LSSI laboratory of Québec University in Trois-Rivières (Canada) on the design of architectures for digital filters and mobile communications;
- ENIT (Tunisia) on architectures for mobile communications;
- Computer Science Department of Colorado State University in Fort Collins (USA) on loop parallelization and on the development of high-level synthesis tools; this collaboration has been supported by the INRIA "équipe associée" program since January 2010;
- Los Alamos National Laboratory (USA) on optimized application specific reconfigurable architectures design;
- University of Adelaide (Australia) on arithmetic operators;
- University of Queensland (Australia) on reconfigurable architectures for scientific processing;
- University of California, Riverside (USA) on optimized image processing applications synthesis;
- VLSI CAD lab of the Electrical and Computer Engineering Department of University of Massachusetts at Amherst (USA) on CAD tools for arithmetic datapath synthesis and optimization;
- University of Douala, University of Yaoundé and University of Dschang in Cameroon on models and tools for parallelization. This cooperation takes place in the scope of the SARIMA GIS for the development of research laboratories in Mathematics and Computer Science in Africa.

## 8.4. Exterior research visitors

- Prof. Gabriel Caffarena (University CEU-San Pablo, Madrid) for one week in July.
- Dr Tuan-Duc Nguyen (International University Vietnam National Univ. Hochiminh City, Vietnam) for two months in Summer.
- Daniel Gomez-Prado (University of Massachusetts, Amherst, USA) for three weeks in January.
- Prof.
Maciej Ciesielski (University of Massachusetts, Amherst, USA) for one month in June (financial support from UR1, invited professor grant).
- Tomofumi Yuki (Colorado State University, USA) from February for 3 months.
- Mark Hamilton (2nd year PhD student at University College Cork, Ireland) for 5 months (February to June), financial support from UEB (mobility grant).
- Dr. Pierre-Louis Cayrel (Center for Advanced Security Research Darmstadt, Germany) for 2 days in November.

# 9. Dissemination

## 9.1. Scientific Community Animation

- F. Charot, O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on embedded systems architectures and associated design tools (ARCHI).
- O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on low-power design (ECOFAC).
- D. Chillet, O. Sentieys and A. Tisserand organized ECOFAC'10, the thematic school on low-power design for real-time embedded systems (*école thématique en conception faible consommation pour les systèmes embarqués temps réel*), in Plestin les Grèves (Côtes-d'Armor), from March 29th to April 2nd, 2010. Details on http://ecofac2010.irisa.fr/.
- D. Chillet is a member of the program committee of the Conference on Design and Architectures for Signal and Image Processing (DASIP).
- S. Derrien was publicity Chair for the IEEE ASAP 2010 conference that was organized in Rennes.
- S. Pillement is a member of the Program Committee of IEEE FPL, SPL, DTIS and ERSA.
- P. Quinton is a member of the steering committee of the System Architecture MOdelling and Simulation (SAMOS) workshop and a member of the scientific committee of ASAP.
- O. Sentieys is a member of the steering committee of the GDR SOC-SIP. He is the chair of the IEEE Circuits and Systems (CAS) French Chapter. In 2010, he was a member of different scientific evaluation committees (ANR Arpege, AERES) and an expert for some scientific organizations like ANR (blanc, jeune chercheur, cosinus, arpege).
He is a member of Allistene working group.
- F. Charot was Program Chair of the IEEE ASAP 2010 Conference.
- D. Ménard and O. Sentieys were guest editors of the Eurasip Journal of Advances in Signal Processing, special issue on Quantization of VLSI Digital Signal Processing Systems.
- O. Sentieys was a member of the technical program committee of the following conferences: IEEE DDECS, IEEE ISQED, IEEE VTC, DCIS, DTIS, SBCCI. He is on the editorial board of the Journal of Low Power Electronics, American Scientific Publishers. He was publicity Co-Chair for IEEE ISCAS 2010 and member of the International Advisory Committee Co-Chair in IEEE APCCAS 2010.
- A. Tisserand was a member of the technical program committee of the following conferences: IEEE ARITH 20, Reconfig2010 and SympA 2011. He is a member of the editorial board of the International Journal of High Performance Systems Architecture, Inderscience.
- C. Wolinski was a member of the technical committee of the following conferences: IEEE/ACM DATE, IEEE FPL, IEEE ASAP, Euromicro DSD, IEEE ISQED, SympA. He is a member of the Board of Directors of Euromicro Society and an Advisor of the Métivier Foundation, France. He was General Chair of the IEEE ASAP 2010 Conference.

## 9.2. Current Ph.D. Subjects

- Naeem Abbas, Flexible Hardware Accelerators for Biocomputing Applications
- Mahtab Alam, Power Aware Signal Processing for Reconfigurable Radios in the context of Wireless Sensor Networks
- Andrei Banciu, New Digital Design Methodology for Multi Giga bits/s Transceivers
- Robin Bonamy, Power Consumption Modelling and Optimisation for Reconfigurable Platform
- Thomas Chabrier, Reconfigurable Arithmetic Units for Cryptoprocessors with Protection against Side Channel Attacks.
- Antoine Eiche, Real-Time Scheduling for Heterogeneous and Reconfigurable Architectures using Neural Network Structures
- Ludovic Devaux, Flexible Interconnect Infrastructure for Dynamically Reconfigurable Architecture
- Antoine Floch, Pattern Recognition for Processor Instruction-Set Extension
- Clément Guy, Generic Definition of Domain Specific Analysis using MDE, jointly with the Triskell EPI
- Herve Yviquel, Video coding design framework based on SoC-based platforms
- Kevin Martin, Extended Instruction-Set Generation for Processors Embedded in an FPGA
- Antoine Morvan, Loop Transformations for Design Space Exploration in High-Level Synthesis
- Jean-Charles Naud, Source-to-Source Code Transformation for Fixed-Point Conversion
- Quoc-Tuong Ngo, Optimization of Precoding Strategies for Multi-User MIMO-OFDM Systems
- Hai-Nam Nguyen, Dynamic Precision Scaling for Mobile Communications
- Cécile Beaumin-Palud, Reconfigurable Architecture for High-Performance Video Transcoding
- Karthick Parashar, System-level Approach for Implementation and Optimization of Signal Processing Applications into Fixed-Point Architectures
- Danuta Pamula, Arithmetic Operators for Cryptography.
- Matthieu Texier, Low-Power Embedded Multi-Core Architectures for Mobile Systems
- Michel Theriault, Transmit Beam-forming for Distributed Wireless Access with Centralized Signal Processing
- Vivek Tovinakere-Dwarakanath, Ultra-Low Power Reconfigurable Controllers for Wireless Sensor Networks
- Le Quang Vinh Tran, Energy Optimisation of Cooperative Transmissions for Wireless Sensor Networks
- Chenglong Xiao, Pattern-Based Guided High-Level Synthesis

## 9.3. Seminars and Invitations

- D. Chillet gave a lecture at ECOFAC'2010 on *Open-People: Open Power and Energy Optimization PLatform and Estimator* in April 2010.
- D.
Chillet gave a talk on *Open-People: Open Power and Energy Optimization PLatform and Estimator* at the *International Workshop on Power and Timing Modeling, Optimization and Simulation*, Grenoble, France, in September 2010.
- O. Sentieys gave a keynote on *Challenges and Opportunities for Energy Reduction in Wireless Sensor Networks* at IET ISSC, Cork, Ireland in June 2010.
- O. Sentieys gave an invited talk at the seminar of the Electrical and Computer Engineering Department, University of Massachusetts, Amherst, U.S.A. on *A Complete Design-Flow for the Generation of Ultra Low-Power WSN Node Architectures Based on Micro-Tasking* in Nov. 2010.
- O. Sentieys gave a lecture at ECOFAC'2010 on *Energy Reduction in Wireless Sensor Networks* in March 2010.
- O. Sentieys gave a popularization talk at *Fête de la science* 2010 in Lannion on wireless sensor networks (*réseaux de capteurs sans fil*).
- A. Tisserand gave a talk at the seminar of the Electrical & Computer Engineering Department, University of Massachusetts, Amherst, U.S.A. on *Hardware Evaluation of Functions using Optimized Polynomial Approximations* in November 2010.
- A. Tisserand was invited to give a seminar at the Center for Advanced Security Research Darmstadt (CASED) in Germany, on *Secured Arithmetic Operators for Cryptography* in April 2010.
- A. Tisserand was invited to give a seminar at the Computer Science and Telecommunications Department, École Normale Supérieure de Cachan, antenne de Bretagne on *Exotic Number Systems for Hardware Arithmetic Operators* in March 2010.
- A. Tisserand gave a lecture at the CNRS ECOFAC 2010 spring school on *Introduction to Power Consumption in Digital Integrated Circuits*.
- A. Tisserand gave a popularization talk at *Fête de la science* 2010 in Lannion on electronic chips and digital security (*puces électroniques et sécurité numérique*).

## 9.4. Teaching and Responsibilities

There is a strong teaching activity in the CAIRN team since most of the permanent members are Professors or Associate Professors.

- P.
Quinton is the deputy-director of Ecole Normale Supérieure de Cachan, responsible for the Brittany branch of this school.
- E. Casseau is the Director of Academic Studies of ENSSAT since Sep. 2009.
- C. Wolinski is the Director of Academic Studies of ESIR since May 2009.
- P. Scalart was the Head of the Electronics Engineering department of ENSSAT until June 2010.
- D. Chillet is the Head of the Electronics Engineering department of ENSSAT since July 2010.
- O. Sentieys is responsible for the "Embedded Systems" branch of the SISEA Master of Research (M2R).
- D. Chillet has been a member of the French National University Council since 2009 in signal processing and electronics (Conseil National des Universités, 61st section).
- S. Derrien served on two hiring committees in 2010: University of Rennes 1 and University Claude Bernard in Lyon.
- P. Quinton, L. Perraudeau, S. Pillement, D. Chillet and C. Wolinski serve on the hiring committee of University of Rennes 1.
- E. Casseau serves on the hiring committee of University of Nice Sophia Antipolis.
- S. Pillement serves on the hiring committee of University of Cergy.
- O. Sentieys serves on the hiring committees of INSA Rennes, University of Rennes 1, Univ. Lille, Univ. of South Brittany.
- O. Berder's main teaching activities at ENSSAT are signal processing, microprocessor architecture, and wireless communications. He also teaches signal processing at IUT Lannion and mobile communications at ENI Gabès, Tunisia.
- D. Chillet teaches a course on *advanced processors architectures* in M2R/ENSSAT and on *Low-power digital CMOS circuits* at Telecom Bretagne.
- E. Casseau's main teaching activities are *signal processing*, *hardware description language* and *real time design methodology*.
- S. Pillement teaches at IUT Lannion. He also teaches a course on Network on Chip in the Master SIC at ENI Sousse, Tunisia.
- R. Rocher teaches at IUT Lannion.
- P. Quinton teaches at ENS Cachan, IFSIC and M2R.
- P.
Scalart teaches courses on signal processing at ENSSAT and M2R.
- O. Sentieys teaches at ENSSAT and M2R where he gives courses on *Methodologies for integrated system design* and signal processing. He also teaches *Digital IC: from synthesis to implementation* in the Master Microelectronics System Design and Technology at ENSICAEN.
- C. Wolinski is responsible for the following courses: Design of Embedded Systems, Signal, Image, Architectures, Advanced Architectures.

ENSSAT stands for "Ecole Nationale Supérieure des Sciences Appliquées et de Technologie" and is an "Ecole d'Ingénieurs" of the University of Rennes 1, located in Lannion. IFSIC stands for "Institut de Formation Supérieure en Informatique et Communication". ESIR (formerly DIIC) stands for "École supérieure d'ingénieur de Rennes" and is an "Ecole d'Ingénieurs" of the University of Rennes 1, located in Rennes. M2R stands for Master of Research, second year.

# 10. Bibliography

# Major publications by the team in recent years

- [1] L. COLLIN, O. BERDER, P. ROSTAING, G. BUREL. *Optimal Minimum Distance Based Precoder for MIMO Spatial Multiplexing Systems*, in "IEEE Transactions on Signal Processing", March 2004, vol. 52, no. 3.
- [2] A. COURTAY, O. SENTIEYS, J. LAURENT, N. JULIEN. *High-level Interconnect Delay and Power Estimation*, in "Journal of Low Power Electronics (JOLPE)", 2008, vol. 4, no. 1, p. 21-33.
- [3] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20.
- [4] S. DERRIEN, P. QUINTON. *Parallelizing HMMER for Hardware Acceleration on FPGAs*, in "18th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2007)", Montreal, Canada, July 2007, p. 10–18, Best Paper Award.
- [5] L. IMBERT, A. PEIRERA, A. TISSERAND.
*A Library for Prototyping the Computer Arithmetic Level in Elliptic Curve Cryptography*, in "Proc. Advanced Signal Processing Algorithms, Architectures and Implementations XVII", San Diego, California, U.S.A., F. T. LUK (editor), SPIE, August 2007, vol. 6697, no. 66970N, p. 1–9, http://dx.doi.org/10.1117/12.733652.
- [6] K. KUCHCINSKI, C. WOLINSKI. *Global Approach to Scheduling Complex Behaviors based on Hierarchical Conditional Dependency Graphs and Constraint Programming*, in "Journal of Systems Architecture", December 2003, vol. 49, no. 12-15.
- [7] D. MENARD, D. CHILLET, O. SENTIEYS. *Floating-to-fixed-point Conversion for Digital Signal Processors*, in "EURASIP Journal on Applied Signal Processing (JASP), Special Issue Design Methods for DSP Systems", 2006, vol. 2006, no. 1, p. 1–15.
- [8] D. MENARD, O. SENTIEYS. *Automatic Evaluation of the Accuracy of Fixed-point Algorithms*, in "IEEE/ACM Design, Automation and Test in Europe (DATE-02)", Paris, March 2002.
- [9] S. PILLEMENT, O. SENTIEYS, R. DAVID. *DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency*, in "EURASIP Journal on Embedded Systems (JES)", 2008, p. 1-13, Article ID 562326, 13 pages.
- [10] C. PLAPOUS, C. MARRO, P. SCALART. *Improved signal-to-noise ratio estimation for speech enhancement*, in "IEEE Transactions on Speech and Audio Processing", 2006, vol. 14, no. 6.
- [11] A. TISSERAND. *High-Performance Hardware Operators for Polynomial Evaluation*, in "Int. J. High Performance Systems Architecture", March 2007, vol. 1, no. 1, p. 14–23, invited paper, http://dx.doi.org/10.1504/IJHPSA.2007.013288.
- [12] C. WOLINSKI, M. GOKHALE, K. MCCABE. *Polymorphous fabric-based systems: Model, tools, applications*, in "Journal of Systems Architecture", September 2003, vol. 49, no. 4-6.

# **Publications of the year**

#### **Doctoral Dissertations and Habilitation Theses**

- [13] D. CHILLET.
Contribution à la gestion dynamique de ressources reconfigurables intégrées au sein d'un MPSoC, University of Rennes 1, June 2010.
- [14] E. GRACE. Hiérarchie mémoire reconfigurable faible consommation pour systèmes enfouis, University of Rennes 1 ENSSAT, October 2010.
- [15] S. KHAN. Development of high performance hardware architectures for multimedia applications, University of Rennes 1 ENSSAT, September 2010.
- [16] K. MARTIN. *Génération automatique d'extensions de jeux d'instructions de processeurs*, University of Rennes 1, September 2010, http://tel.archives-ouvertes.fr/tel-00526133/PDF/these KevinMartin.pdf.
- [17] A. PASHA. System-Level Synthesis of Ultra Low-Power Wireless Sensor Network Node Controllers: A Complete Design-Flow, University of Rennes 1 ENSSAT, December 2010.
- [18] H. M. PHAM. Apport de la reconfiguration dynamique dans les architectures tolérantes aux fautes, University of Rennes 1 ENSSAT, December 2010.
- [19] S. PILLEMENT. *Calcul reconfigurable dynamiquement : du transistor à l'application*, University of Rennes 1, October 2010.
- [20] A. TISSERAND. Étude et conception d'opérateurs arithmétiques, University of Rennes 1, July 2010.

#### **Articles in International Peer-Reviewed Journal**

- [21] M. ALAM, O. BERDER, D. MENARD, T. ANGER, O. SENTIEYS. A Hybrid Model for Accurate Energy Analysis of WSN nodes, in "EURASIP Journal on Embedded Systems", 2011, to appear.
- [22] C. ANDRIAMISAINA, P. COUSSY, E. CASSEAU, C. CHAVET. *High-Level Synthesis for Designing Multi-mode Architectures*, in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems", November 2010, vol. 29, n<sup>o</sup> 11, p. 1736-1749, http://dx.doi.org/10.1109/TCAD.2010.2062751.
- [23] D. CHILLET, S. PILLEMENT, O. SENTIEYS. *Real-Time Scheduling on Heterogeneous SoC Architectures Using Inhibitor Neurons in a Neural Network*, in "Journal of Systems Architecture", 2011, to appear.
- [24] A. COURTAY, J. LAURENT, O. SENTIEYS.
Spatial Switching data coding technique analysis and improvements for interconnect power consumption optimization, in "Journal of Low Power Electronics (JOLPE)", April 2010, vol. 6, n<sup>o</sup> 1, p. 32-43.
- [25] S. DERRIEN, P. QUINTON. *Hardware Acceleration of HMMER on FPGAs*, in "Journal of Signal Processing Systems", October 2010, vol. 58, n<sup>o</sup> 1, p. 53–67, http://dx.doi.org/10.1007/s11265-008-0262-y.
- [26] L. DEVAUX, S. B. SASSI, S. PILLEMENT, D. CHILLET, D. DEMIGNY. *Flexible interconnection network for dynamically and partially reconfigurable architectures*, in "International Journal on Reconfigurable Computing (IJRC)", 2010, vol. 2010, n<sup>o</sup> article ID 390545, 15 pages, http://dx.doi.org/10.1155/2010/390545.
- [27] S. KHAN, E. CASSEAU, D. MENARD. *High speed reconfigurable SWP operator for multimedia processing using redundant data representation*, in "Int. Journal of Information Sciences and Computer Engineering (IJISCE)", May 2010, vol. 1, p. 45-52, http://hal.inria.fr/inria-00480330/en/.
- [28] G. LE JAN, R. LE BOUQUIN-JEANNÈS, N. COSTET, N. TROLÈS, P. SCALART, D. PICHANCOURT, G. FAUCON, J.-E. GOMBERT. *Multivariate predictive model for dyslexia diagnosis*, in "Annals of Dyslexia", 2010, p. 1-20, http://dx.doi.org/10.1007/s11881-010-0038-5.
- [29] B. LE GAL, E. CASSEAU. Word-Length Aware DSP Hardware Design Flow Based on High-Level Synthesis, in "Journal of Signal Processing Systems", April 2010, vol. 2010, n<sup>o</sup> Online, p. 1–17, http://dx.doi.org/10.1007/s11265-010-0467-8.
- [30] S. PIESTRAK, S. PILLEMENT, O. SENTIEYS. Comments on 'A low-power dependable Berger code for fully asymmetric communication', in "IEEE Communications Letters", August 2010, vol. 14, n<sup>o</sup> 8, p. 761-763, http://dx.doi.org/10.1109/LCOMM.2010.08.100447.
- [31] S. PIESTRAK, S. PILLEMENT, O. SENTIEYS.
*On designing efficient codecs for bus-invert Berger code for fully asymmetric communication*, in "IEEE Transactions on Circuits and Systems II", October 2010, vol. 57, n<sup>o</sup> 10, p. 777-781, http://dx.doi.org/10.1109/TCSII.2010.2067773.
- [32] S. PILLEMENT, J. PHILIPPE, O. SENTIEYS. *Spatio-temporal Coding to Improve Speed and Noise Tolerance of On-chip Interconnect*, in "MicroElectronics Journal", 2010, vol. 41, n<sup>o</sup> 8, p. 480-486 [DOI: 10.1016/J.MEJO.2009.11.001], http://www.sciencedirect.com/science/article/B6V44-4Y0547M-1/2/4634f1a0a8cc2bd49ae33ceede13beb0.
- [33] R. ROCHER, D. MENARD, O. SENTIEYS, P. SCALART. *Accuracy Evaluation of Fixed-Point based LMS Algorithm*, in "Digital Signal Processing", May 2010, vol. 20, n<sup>o</sup> 3, p. 640-652 [DOI: 10.1016/J.DSP.2009.10.007], http://hal.inria.fr/inria-00450935/en/.
- [34] R. ZHANG, J. GORCE, O. BERDER, O. SENTIEYS. Lower Bound of Energy-Latency Trade-off of Opportunistic Routing in Multi-hop Networks, in "EURASIP Journal on Wireless Communications and Networking", 2010, to appear.

#### **Articles in National Peer-Reviewed Journal**

- [35] L. DEVAUX, S. PILLEMENT, D. CHILLET, D. DEMIGNY. *DRAFT: réseau flexible pour architecture reconfigurable dynamiquement*, in "Technique et Science Informatiques (TSI)", 2011, to appear, http://hal.inria.fr/inria-00536704/en/.

#### **Invited Conferences**

- [36] T. CHABRIER, D. PAMULA, A. TISSERAND. *Hardware implementation of DBNS recoding for ECC processor*, in "Proc. of the 44th Asilomar Conference on Signals, Systems and Computers", Pacific Grove, California, U.S.A., IEEE, November 2010.

#### **International Peer-Reviewed Conference/Proceedings**

- [37] N. ABBAS, S. DERRIEN, S. RAJOPADHYE, P. QUINTON. *Accelerating HMMER on FPGA using Parallel Prefixes and Reductions*, in "Proc. of the IEEE International Conference on Field-Programmable Technology (FPT'10)", Beijing, China, December 2010, p. 37-44, http://dx.doi.org/10.1109/FPT.2010.5681755.
- [38] A. BANCIU, E. CASSEAU, D. MENARD, T. MICHEL. A Case Study Of The Stochastic Modeling Approach For Range Estimation, in "Proc. of the Conference on Design and Architectures for Signal and Image Processing (DASIP)", Edinburgh, UK, October 2010, p. 301–308.
- [39] C. BEAUMIN, O. SENTIEYS, E. CASSEAU, A. CARER. *A Coarse-Grain Reconfigurable Hardware Architecture for RVC-CAL-based Design*, in "Proc. of the Conference on Design and Architectures for Signal and Image Processing (DASIP)", Edinburgh, UK, October 2010, p. 161–168.
- [40] O. BERDER, O. SENTIEYS. *PowWow: Power Optimized Hardware/Software Framework for Wireless Motes*, in "Proc. of the Workshop on Ultra-Low Power Sensor Networks (WUPS), co-located with Int. Conf. on Architecture of Computing Systems (ARCS 2010)", Hannover, Germany, February 2010, p. 229–233.
- [41] F. BERTHELOT, F. CHAROT, C. WAGNER, C. WOLINSKI. *Design Methodology for a High Performance Robust DVB-S2 Decoder Implementation*, in "Proc. of the 13th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2010)", Lille, France, September 2010, p. 667-674, http://hal.inria.fr/inria-00480723/en/.
- [42] L. DEVAUX, S. PILLEMENT, D. CHILLET, D. DEMIGNY. *Mesh and Fat-Tree comparison for dynamically reconfigurable applications*, in "Proc. of the Reconfigurable Communication-centric Systems on Chip (ReCoSoC'10)", Karlsruhe, Germany, May 2010.
- [43] L. DEVAUX, S. PILLEMENT, D. CHILLET, D. DEMIGNY. *Operating System Services for Reconfigurable System-on-Chip Communication*, in "Proc. of the Design of Circuits and Integrated Systems (DCIS'10)", Canary Islands, Spain, November 2010, http://hal.inria.fr/inria-00536709.
- [44] L. DEVAUX, S. PILLEMENT, D. CHILLET, D. DEMIGNY. *R2NoC: dynamically Reconfigurable Routers for flexible Networks on Chip*, in "Proc. of the International Conference on ReConFigurable Computing and FPGAs (ReConFig'10)", Cancun, Mexico, December 2010, http://hal.inria.fr/inria-00536711.
- [45] A.
EICHE, D. CHILLET, S. PILLEMENT, O. SENTIEYS. *Task placement for dynamic and partial reconfigurable region*, in "Proc. of the Conference on Design and Architectures for Signal and Image Processing (DASIP)", Edinburgh, UK, October 2010, p. 82-88, http://hal.inria.fr/inria-00536714.
- [46] A. FLOCH, C. WOLINSKI, K. KUCHCINSKI. Combined Scheduling and Instruction Selection for Processors with Reconfigurable Cell Fabric, in "21st IEEE International Conference on Application-specific Systems, Architectures and Processors, (ASAP 2010)", Rennes, France, IEEE, July 2010, http://hal.inria.fr/inria-00480680/en/.
- [47] J. FRIGO, E. RABY, S. BRENNAN, C. WOLINSKI, C. WAGNER, F. CHAROT, E. ROSTEN, V. KULATHUMANI. *Energy efficient sensor node implementations*, in "Proc. of the 18th annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'10)", New York, NY, USA, ACM, 2010, p. 37–40 [DOI: 10.1145/1723112.1723120], http://hal.inria.fr/inria-00451689/fr.
- [48] T. M. N. HOANG, S. RAGOT, B. KÖVESI, P. SCALART. Parametric stereo extension of ITU-T G.722 based on a new downmixing scheme, in "Proc. of the IEEE Multimedia Signal Processing Conference", Saint-Malo, France, October 2010, p. 188-193 [DOI: 10.1109/MMSP.2010.5662017], http://hal.inria.fr/inria-00512646/en/.
- [49] S. M. A. H. JAFRI, S. PIESTRAK, O. SENTIEYS, S. PILLEMENT. *Design of a fault-tolerant coarse-grained reconfigurable architecture: A case study*, in "Proc. of the 11th IEEE International Symposium on Quality Electronic Design (ISQED 2010)", San Diego, CA, USA, IEEE, March 2010, 6 pages.
- [50] L. LEPAULOUX, P. SCALART, C. MARRO. *Computationally efficient and robust frequency-domain GSC*, in "Proc. of the 12th IEEE International Workshop on Acoustic Echo and Noise Control", Tel-Aviv, Israel, August 2010, p. 1-4, http://hal.archives-ouvertes.fr/inria-00512654/en/.
- [51] D. MENARD, D. NOVO, R. ROCHER, F. CATTHOOR, O. SENTIEYS.
Quantization Mode Opportunities in Fixed-Point System Design, in "Proc. of the XVIII European Signal and Image Processing Conference (EUSIPCO'10)", Aalborg, Denmark, EURASIP, August 2010, p. 542-546, http://hal.inria.fr/inria-00534526/en/.
- [52] Q.-T. NGO, O. BERDER, P. SCALART. 3-D minimum Euclidean distance based sub-optimal precoder for MIMO spatial multiplexing systems, in "Proc. of the IEEE International Conference on Communications (ICC)", Cape Town, South Africa, June 2010, p. 1-5, http://dx.doi.org/10.1109/ICC.2010.5502075.
- [53] T. NGUYEN, O. BERDER, O. SENTIEYS. *Cooperative MISO and Relay Comparison in Energy Constrained WSNs*, in "Proc. of the 71st IEEE International Vehicular Technology conference (VTC)", Taipei, Taiwan, May 2010, p. 1-5, http://dx.doi.org/10.1109/VETECS.2010.5493688.
- [54] T. NGUYEN, L. MAI, O. BERDER, O. SENTIEYS. *Cooperative MIMO and Relay Association Strategy*, in "International Conferences on Advanced Technologies for Communications (ATC)", Ho Chi Minh City, Vietnam, October 2010, p. 327-330, http://dx.doi.org/10.1109/ATC.2010.5672699.
- [55] D. PAMULA, E. HRYNKIEWICZ, A. TISSERAND. *Multiplication in GF$(2^m)$: area and time dependency/efficiency/complexity analysis*, in "Proc. of the 10th International IFAC Workshop on Programmable Devices and Embedded Systems (PDeS)", Pszczyna, Poland, IFAC, October 2010.
- [56] K. PARASHAR, R. ROCHER, D. MENARD, O. SENTIEYS, D. NOVO, F. CATTHOOR. Fast performance evaluation of fixed-point systems with un-smooth operators, in "Proc. of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD)", San Jose, CA, IEEE/ACM, November 2010, p. 9-16, http://hal.inria.fr/inria-00534527/en/.
- [57] K. PARASHAR, R. ROCHER, D. MENARD, O. SENTIEYS. A Hierarchical Methodology for Word-Length Optimization of Signal Processing Systems, in "Proc. of the 23rd International Conference on VLSI Design, VLSID'10", Bangalore, India, January 2010, p.
318–323 [DOI: 10.1109/VLSI.DESIGN.2010.66], http://hal.inria.fr/inria-00432590/en/.
- [58] K. PARASHAR, R. ROCHER, D. MENARD, O. SENTIEYS. Analytical Approach for Analyzing Quantization Noise Effects on Decision Operators, in "Proc. of the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)", Dallas, Texas, USA, March 2010, p. 1554-1557 [DOI: 10.1109/ICASSP.2010.5495520], http://hal.inria.fr/inria-00534522/en/.
- [59] K. PARASHAR, R. ROCHER, D. MENARD, O. SENTIEYS. Estimating Frequency Characteristics of Quantization Noise for Performance Evaluation of Fixed Point Systems, in "Proc. of the XVIII European Signal and Image Processing Conference (EUSIPCO'10)", Aalborg, Denmark, EURASIP, August 2010, p. 552-556, http://hal.inria.fr/inria-00534524/en/.
- [60] K. PARASHAR, D. MENARD, R. ROCHER, O. SENTIEYS. *Shaping Probability Density Function of Quantization Noise in Fixed Point Systems*, in "Proc. of the 44th Annual Asilomar Conference on Signals, Systems, and Computers", Monterey, CA, November 2010, http://hal.inria.fr/inria-00534529/en/.
- [61] A. PASHA, S. DERRIEN, O. SENTIEYS. A Complete Design-Flow for the Generation of Ultra Low-Power WSN Node Architectures Based on Micro-Tasking, in "Proc. of the 47th IEEE/ACM Design Automation Conference (DAC)", Anaheim, CA, USA, June 2010, p. 693-698.
- [62] A. PASHA, S. DERRIEN, O. SENTIEYS. A Novel Approach for Ultra Low-Power WSN Node Generation, in "Proc. of the IET Irish Signals and Systems Conference (ISSC 2010)", Cork, Ireland, June 2010, p. 204-209, http://dx.doi.org/10.1049/cp.2010.0513.
- [63] A. PASHA, S. DERRIEN, O. SENTIEYS. System-Level Synthesis for Ultra Low-Power Wireless Sensor Nodes, in "Proc. of the 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD)", Lille, France, September 2010, p. 493-500, http://dx.doi.org/10.1109/DSD.2010.88.
- [64] M. PHAM, L. DEVAUX, S. PILLEMENT.
*Dynamic NOC-based MPSoC with Fault-Tolerance Support*, in "DAC Workshop on Diagnostic Services in Network-on-Chips (DSNoC)", Anaheim, USA, June 2010.
- [65] M. PHAM, S. PILLEMENT, D. DEMIGNY. *Evaluation of Fault-Mitigation Schemes for Fault-Tolerant Dynamic MPSoC*, in "Proc. of the 2010 International Conference on Field Programmable Logic and Applications (FPL)", Milano, Italy, October 2010, p. 159-162, http://hal.inria.fr/inria-00536720.
- [66] M. PHAM, S. PILLEMENT, D. DEMIGNY. *FT-DyMPSoC: Analytical Model for Fault-Tolerant Dynamic MPSoC*, in "Proc. of the 18th Int. IEEE Symposium on Field-Programmable Custom Computing Machines", Charlotte, North Carolina, May 2010, poster.
- [67] S. PIESTRAK. *Design of cost-efficient multipliers modulo* $2^a - 1$, in "Proc. of the IEEE International Symposium on Circuits and Systems (ISCAS 2010)", Paris, France, June 2010, p. 4093-4096, http://dx.doi.org/10.1109/ISCAS.2010.5537626.
- [68] S. PIESTRAK. On reducing error rate of data protected using systematic unordered codes in asymmetric channels, in "Proc. of the 13th Euromicro Conference on Digital System Design (DSD 2010)", Lille, France, September 2010, p. 133-140, http://dx.doi.org/10.1109/DSD.2010.117.
- [69] E. RAFFIN, C. WOLINSKI, F. CHAROT, K. KUCHCINSKI, S. GUYETANT, S. CHEVOBBE, E. CASSEAU. Scheduling, Binding and Routing System for a Run-Time Reconfigurable Operator Based Multimedia Architecture, in "Proc. of the Conference on Design and Architectures for Signal and Image Processing (DASIP)", Edinburgh, UK, October 2010, p. 12–19, Best Paper Award, http://hal.inria.fr/inria-00539874/PDF/dasip2010.pdf.
- [70] P. SCALART, L. LEPAULOUX. On the convergence behavior of recursive adaptive noise cancellation structure in the presence of crosstalk, in "Proc. of the IEEE International Conference on Sensor Signal Processing for Defence (SSPD)", London, UK, 2010.
- [71] M. THÉRIAULT, S. ROY, O. SENTIEYS.
Transmitter Architecture for the Evaluation of Beamforming Schemes in the IEEE 802.11n Standard, in "Proc. of the 11th annual IEEE Wireless and Microwave Technology (WAMI) Conference", Melbourne, FL, USA, April 2010, p. 1-4, http://dx.doi.org/10.1109/WAMICON.2010.5461867.
- [72] A. TISSERAND. *Towards Automatic Accuracy Validation and Optimization of Fixed-Point Hardware Descriptions in SystemC*, in "Proc. of the 14th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics (SCAN)", Lyon, France, September 2010.
- [73] L. TRAN, O. BERDER, O. SENTIEYS. *Energy Efficiency of Cooperative Strategies in Wireless Sensor Networks*, in "International Conferences on Advanced Technologies for Communications (ATC)", Ho Chi Minh City, Vietnam, October 2010, p. 29-32, http://dx.doi.org/10.1109/ATC.2010.5672727.
- [74] T.D. VIVEK, O. SENTIEYS, S. DERRIEN. Wakeup Time and Wakeup Energy Estimation in Power-Gated Logic Clusters, in "Proc. of the 24th International Conference on VLSI Design", Chennai, India, January 2011.
- [75] C. WOLINSKI, K. KUCHCINSKI, K. MARTIN, A. FLOCH, E. RAFFIN, F. CHAROT. *Graph Constraints in Embedded System Design*, in "Workshop on Combinatorial Optimization for Embedded System Design (COESD 2010)", Bologna, Italy, June 2010, http://hal.inria.fr/inria-00481135/en/.

#### **Workshops without Proceedings**

- [76] A. TISSERAND, T. CHABRIER, D. PAMULA. Arithmetic Level Countermeasures for ECC Coprocessor, in "Workshop on Coding and Cryptography", May 2010, Claude Shannon Institute Workshop on Coding and Cryptography.
- [77] C. XIAO, E. CASSEAU. *Pattern Extraction for Digital Design*, in "National Workshop of the GdR SoC-SiP (System-On-Chip & System-In-Package)", Paris, France, June 2010.

#### **Scientific Books (or Scientific Book chapters)**

- [78] D. CHILLET, S. PILLEMENT, O. SENTIEYS.
RANN: A Reconfigurable Artificial Neural Network Model for Task Scheduling on Reconfigurable System-on-Chip, in "Algorithm-Architecture Matching for Signal and Image Processing", Springer, 2010, p. 117-144.
- [79] F. NOUVEL, P. TANGUY, S. PILLEMENT, M. PHAM. Experiments of in-vehicle power line Communications, in "Vehicular Technologies", Intech, 2011, 15, Accepted for publication.

#### **Books or Proceedings Editing**

- [80] F. CHAROT, F. HANNIG, J. TEICH, C. WOLINSKI (editors). ASAP 2010: 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers (IEEE), Rennes, France, July 2010.

#### **Scientific Popularization**

- [81] O. SENTIEYS, O. BERDER. Réseaux de capteurs sans fil, October 2010, Fête de la science.
- [82] A. TISSERAND. Puces électroniques et sécurité numérique, October 2010, Fête de la science.

## References in notes

- [83] A. AHMADINIA, C. BOBDA, M. BEDNARA, J. TEICH. A new approach for on-line placement on reconfigurable devices, in "18th International Parallel and Distributed Processing Symposium, 2004.", 2004.
- [84] ZIGBEE ALLIANCE. Zigbee specification, ZigBee Alliance, 2005, n<sup>o</sup> ZigBee Document 053474r06, Version.
- [85] V. BAUMGARTE, G. EHLERS, F. MAY, A. NÜCKEL, M. VORBACH, M. WEINHARDT. *PACT XPP A Self-Reconfigurable Data Processing Architecture*, in "The Journal of Supercomputing", 2003, vol. 26, n<sup>o</sup> 2, p. 167–184.
- [86] C. BOBDA. Introduction to Reconfigurable Computing: Architectures Algorithms and Applications, Springer, 2007.
- [87] C. BOBDA, M. MAJER, D. KOCH, A. AHMADINIA, J. TEICH. A Dynamic NoC Approach for Communication in Reconfigurable Devices, in "Proceedings of International Conference on Field-Programmable Logic and Applications (FPL)", Antwerp, Belgium, Lecture Notes in Computer Science (LNCS), Springer, August 2004, vol. 3203, p. 1032–1036.
- [88] D. CHILLET, S. PILLEMENT, O. SENTIEYS.
A Neural Network Model for Real-Time Scheduling on Heterogeneous SoC Architectures, in "IEEE International Joint Conference on Neural Networks, IJCNN'07", Orlando, FL, August 12-17, 2007.
- [89] K. COMPTON, S. HAUCK. *Reconfigurable computing: a survey of systems and software*, in "ACM Comput. Surv.", 2002, vol. 34, n<sup>o</sup> 2, p. 171–210, http://doi.acm.org/10.1145/508352.508353.
- [90] G. CONSTANTINIDES, P. CHEUNG, W. LUK. Wordlength optimization for linear digital signal processing, in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems", October 2003, vol. 22, n<sup>o</sup> 10, p. 1432-1442.
- [91] K. DANNE, R. MUHLENBERND, M. PLATZNER. Executing hardware tasks on dynamically reconfigurable devices under real-time conditions, in "International Conference on Field Programmable Logic and Applications", Lecture Notes in Computer Science, 2006.
- [92] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20.
- [93] A. DEJONGHE, B. BOUGARD, S. POLLIN, J. CRANINCKX, A. BOURDOUX, L. VAN DER PERRE, F. CATTHOOR. *Green Reconfigurable Radio Systems*, in "Signal Processing Magazine, IEEE", 2007, vol. 24, n<sup>o</sup> 3, p. 90–101.
- [94] A. DUNKELS, B. GRONVALL, T. VOIGT. Contiki-a lightweight and flexible operating system for tiny networked sensors, in "Proceedings of the First IEEE Workshop on Embedded Networked Sensors", 2004.
- [95] C. EBELING, D. CRONQUIST, P. FRANKLIN. RaPiD Reconfigurable Pipelined Datapath, in "International Workshop on Field Programmable Logic and Applications", Darmstadt, Lecture notes in Computer Science 1142, September 1996, p. 126–135.
- [96] R. HARTENSTEIN. A Decade of Reconfigurable Computing: A Visionary retrospective, in "Design Automation and Test in Europe (DATE 01)", Munich, Germany, March 2001.
- [97] R. HARTENSTEIN, M. HERZ, T. HOFFMAN, U. NAGELDINGER.
Using The KressArray for Configurable Computing, in "Configurable Computing: Technology and Applications, Proc. SPIE 3526", Bellingham, WA, November 1998, p. 150–161.
- [98] S. KIM, W. SUNG. Word-Length Optimization for High Level Synthesis of Digital Signal Processing Systems, in "IEEE Workshop on Signal Processing Systems", Boston, October 1998, p. 142-151.
- [99] K. KUM, J. KANG, W. SUNG. AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors, in "IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing", September 2000, vol. 47, n<sup>o</sup> 9, p. 840-848.
- [100] M. LEE, H. SINGH, G. LU, N. BAGHERZADEH, F. KURDAHI. *Design and Implementation of the MorphoSys Reconfigurable Computing Processor*, in "Journal of VLSI and Signal Processing-Systems for Signal, Image and Video Applications", March 2000, vol. 24, n<sup>o</sup> 2, p. 147–164.
- [101] T. MARESCAUX, V. NOLLET, J. MIGNOLET, A. BARTIC, W. MOFFAT, P. AVASARE, P. COENE, D. VERKEST, S. VERNALDE, R. LAUWEREINS. *Run-time support for heterogeneous multitasking on reconfigurable SoCs*, in "Integration, the VLSI Journal", 2004, vol. 38, p. 107–130, http://doi.acm.org/10.1145/996566.996637.
- [102] D. MENARD, D. CHILLET, F. CHAROT, O. SENTIEYS. Automatic Floating-point to Fixed-point Conversion for DSP Code Generation, in "International Conference on Compilers, Architectures and Synthesis for Embedded Systems 2002 (CASES 2002)", Grenoble, October 2002.
- [103] D. MENARD, D. CHILLET, O. SENTIEYS. Floating-to-fixed-point Conversion for Digital Signal Processors, in "EURASIP Journal on Applied Signal Processing (JASP), Special Issue Design Methods for DSP Systems", 2006, vol. 2006, n<sup>o</sup> 1.
- [104] T. MIYAMORI, K. OLUKOTUN. *REMARC: Reconfigurable Multimedia Array Coprocessor*, in "IEICE Transactions on Information and Systems E82-D", February 1999, p. 389–397.
- [105] W. NAJJAR, W. BOHM, B. DRAPER, J. HAMMES, R. RINKER, J. BEVERIDGE, M.
CHAWATHE, C. ROSS. *High-Level Language Abstraction for Reconfigurable Computing*, in "Computer", 2003, vol. 36, n<sup>o</sup> 8, p. 63-69, http://doi.ieeecomputersociety.org/10.1109/MC.2003.1220583.
- [106] V. NOLLET, T. MARESCAUX, D. VERKEST, J. MIGNOLET, S. VERNALDE. *Operating-system controlled network on chip*, in "Proceedings of the 41st annual Conference on Design automation", 2004, p. 256–259, http://doi.acm.org/10.1145/996566.996637.
- [107] PHILIPS. Silicon Hive, Philips Inc., 2003, http://www.siliconhive.com.
- [108] J. RABAEY. *Reconfigurable Processing: The Solution to Low-Power Programmable DSP*, in "IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)", 1997, vol. 1, p. 275–278.
- [109] R. SALEH, S. WILTON, S. MIRABBASI, A. HU, M. GREENSTREET, G. LEMIEUX, P. PANDE, C. GRECU, A. IVANOV. *System-on-chip: reuse and integration*, in "Proceedings of the IEEE", 2006, vol. 94, n<sup>o</sup> 6, p. 1050–1069.
- [110] T. TODMAN, G. CONSTANTINIDES, S. WILTON, O. MENCER, W. LUK, P. CHEUNG. *Reconfigurable computing: architectures and design methods*, in "IEE Proc.-Comput. Digit. Tech.", March 2005, vol. 152, n<sup>o</sup> 2.
- [111] G. VENKATARAMANI, W. NAJJAR, F. KURDAHI, N. BAGHERZADEH, W. BOHM, J. HAMMES. *Automatic compilation to a coarse-grained reconfigurable system-on-chip*, in "Trans. on Embedded Computing Systems", 2003, vol. 2, n<sup>o</sup> 4, p. 560–589, http://doi.acm.org/10.1145/950162.950167.
- [112] E. WAINGOLD, M. TAYLOR, D. SRIKRISHNA, V. SARKAR, W. LEE, V. LEE, J. KIM, M. FRANK, P. FINCH, R. BARUA, J. BABB, S. AMARASINGHE, A. AGARWAL. *Baring it all to software: The raw machine*, in "IEEE Computer", September 1997, vol. 30, n<sup>o</sup> 9, p. 86–93.
- [113] C. WOLINSKI, K. KUCHCINSKI, A. POSTOLA. *UPaK: Abstract Unified Pattern Based Synthesis Kernel for Hardware and Software Systems*, in "University Booth, DATE 2007", Nice, France, May 2007.
- [114] Z. A. YE, N. SHENOY, P. BANERJEE.
A C compiler for a processor with a reconfigurable functional unit, in "Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field Programmable Gate-Arrays, FPGA '00", New York, NY, USA, ACM Press, 2000, p. 95–100, http://doi.acm.org/10.1145/329166.329187.