## **Activity Report 2013**

# **Project-Team CAIRN**

## **Energy Efficient Computing Architectures**

IN COLLABORATION WITH: Institut de recherche en informatique et systèmes aléatoires (IRISA)

RESEARCH CENTER: Rennes - Bretagne-Atlantique

THEME: Architecture, Languages and Compilation

## **Table of contents**

- 1. Members
- 2. Overall Objectives
  - 2.1. Overall Objectives
  - 2.2. Highlights of the Year
- 3. Research Program
  - 3.1. Panorama
  - 3.2. Reconfigurable Architecture Design
  - 3.3. Compilation and Synthesis for Reconfigurable Platforms
  - 3.4. Interaction between Algorithms and Architectures
- 4. Application Domains
  - 4.1. Panorama
  - 4.2. 4G Wireless Communication Systems
  - 4.3. Wireless Sensor Networks
  - 4.4. Multimedia processing
- 5. Software and Platforms
  - 5.1. Panorama
  - 5.2. Gecos
  - 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems
  - 5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems
  - 5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions
  - 5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)
  - 5.7. Ziggie: a Platform for Wireless Body Sensor Networks
  - 5.8. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip
- 6. New Results
  - 6.1. Reconfigurable Architecture Design
    - 6.1.1. Arithmetic Operators for Cryptography and Fault-Tolerance
    - 6.1.2. Reconfigurable Processor Extensions Generation
    - 6.1.3. Runtime Mapping of Hardware Accelerators on the FlexTiles 3D Self-Adaptive Heterogeneous Manycore
    - 6.1.4. Power Models of Reconfigurable Architectures
    - 6.1.5. Real-time Spatio-Temporal Task Scheduling on 3D Architecture
    - 6.1.6. Ultra-Low-Power Reconfigurable Controllers
  - 6.2. Compilation and Synthesis for Reconfigurable Platform
    - 6.2.1. Polyhedral-Based Loop Transformations for High-Level Synthesis
    - 6.2.2. Compiling for Embedded Reconfigurable Multi-Core Architectures
    - 6.2.3. Numerical Accuracy Analysis and Optimization
    - 6.2.4. Design Tools for Reconfigurable Video Coding
  - 6.3. Interaction between Algorithms and Architectures
    - 6.3.1. Design Methodologies for Software Defined Radios
    - 6.3.2. Adaptive Precision under Performance Constraints in OFDM Wireless Receivers
    - 6.3.3. MIMO Systems and Cooperative Strategies for Low-Energy Wireless Networks
    - 6.3.4. Energy Harvesting and Adaptive Wireless Sensor Networks
    - 6.3.5. Impact of RF Front-End Nonlinearity on WSN Communications
    - 6.3.6. HarvWSNet: A Co-Simulation Framework for Energy Harvesting Wireless Sensor Networks
    - 6.3.7. Synchronisation Algorithms and Parallel Architecture for Wireless and High-Rate Optical OFDM Systems
- 7. Partnerships and Cooperations
  - 7.1. National Initiatives
    - 7.1.1. ANR Blanc - PAVOIS (2012-2016)
    - 7.1.2. ANR INFRA 2011 - FAON (2012-2015)
    - 7.1.3. Equipex FIT - Future Internet (of Things)
    - 7.1.4. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)
    - 7.1.5. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)
    - 7.1.6. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)
    - 7.1.7. ANR ARPEGE - GRECO (2010-2013)
    - 7.1.8. Images and Networks competitiveness cluster - 100GFlex project (2010-2013)
  - 7.2. European Initiatives
    - 7.2.1. FP7 FLEXTILES
    - 7.2.2. FP7 ALMA
    - 7.2.3. Collaborations with Major European Organizations
  - 7.3. International Initiatives
    - 7.3.1. Inria International Partners
      - 7.3.1.1. Declared Inria International Partners
      - 7.3.1.2. Informal International Partners
    - 7.3.2. CNRS PICS - SPiNaCH (2012-2014)
  - 7.4. International Research Visitors
    - 7.4.1. Visits of International Scientists
    - 7.4.2. Internships
- 8. Dissemination
  - 8.1. Scientific Animation
  - 8.2. Seminars and Invitations
  - 8.3. Teaching - Supervision - Juries
    - 8.3.1. Teaching Responsibilities
    - 8.3.2. Teaching
    - 8.3.3. Supervision
  - 8.4. Popularization
- 9. Bibliography

**Keywords:** Hardware Accelerators, Compiling, Embedded Systems, Energy Consumption, Parallelism, Wireless Sensor Networks, Security, Signal Processing, Reconfigurable Hardware, Computer Arithmetic, System-On-Chip

CAIRN is a common project with CNRS, University of Rennes 1, and ENS Cachan-Antenne de Bretagne, and is located on two sites: Rennes and Lannion. The team was created on January 1, 2008 and is a "reconfiguration" of the former R2D2 research team from Irisa.

Creation of the Project-Team: 2009 January 01.

## 1. Members

### **Research Scientists**

- François Charot [Researcher (CR) Inria, Rennes]
- Olivier Sentieys [Team Leader, Senior Researcher (DR) Inria, Lannion, HdR]
- Arnaud Tisserand [Researcher (CR) CNRS, Lannion, HdR]

### **Faculty Members**

- Olivier Berder [Associate Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Emmanuel Casseau [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Daniel Chillet [Associate Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Antoine Courtay [Associate Professor, University of Rennes 1, ENSSAT, Lannion]
- Steven Derrien [Professor, University of Rennes 1, ISTIC, Rennes, HdR]
- Matthieu Gautier [Associate Professor, University of Rennes 1, IUT, Lannion]
- Cédric Killian [Associate Professor, University of Rennes 1, IUT, Lannion, from Sep 2013]
- Patrice Quinton [Professor, Director of ENS Rennes, Rennes, HdR]
- Romuald Rocher [Associate Professor, University of Rennes 1, IUT, Lannion]
- Pascal Scalart [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Baptiste Vrigneau [Associate Professor, University of Rennes 1, IUT, Lannion, from Sep 2013]
- Christophe Wolinski [Professor, University of Rennes 1, Director of ESIR, Rennes, HdR]

### **Engineers**

- Philippe Quémerais [Research Engineer (half time), University of Rennes 1, ENSSAT, Lannion]
- Arnaud Carer [Univ. Rennes I, 100Gflex Project, Lannion]
- Raphaël Bardoux [Univ. Rennes I, Faon Project, Lannion]
- Nicolas Simon [Univ. Rennes I, Defis Project, Lannion]
- Antoine Morvan [Univ. Rennes I, Alma Project, Rennes, from Jun 2013]
- Robin Bonamy [Univ. Rennes I, Greco and BoWI Projects, Lannion, from Jun 2013]
- Thomas Chabrier [CNRS, Ardyt Project, Lannion, from Jun 2013]
- Vaibhav Bhatnagar [Inria, SNOW Project, Lannion, until Dec 2013]
- Mickaël Le Gentil [Univ. Rennes I, BoWI Project, Lannion, until Sep 2013]
- Remi Pallas [Univ. Rennes I, POF Project, Lannion, until Dec 2013]
- Maxime Naullet [Univ. Rennes I, Alma Project, Rennes, until Jun 2013]

### **PhD Students**

- Amine Didioui [CEA Leti grant, Grenoble]
- Aymen Chakhari [Inria, Brittany Region grant, Lannion]
- Trong-Nhan Le [University grant, ANR Greco, Lannion]
- Pramod Udupa [University grant, FUI 100Gflex, Lannion]
- Ganda-Stéphane Ouedraogo [MENRT grant, Lannion]
- Karim Bigou [Inria/DGA grant, Lannion]
- Franck Bucheron [DGA, Rennes]
- Quang-Hai Khuat [University grant, Brittany Region/CG22, Lannion]
- Ali Hassan El-Moussawi [University grant, FP7 Alma, Rennes]
- Christophe Huriaux [MENRT grant, Lannion]
- Quang-Hoa Le [University grant, FP7 FlexTiles, Lannion]
- Jérémie Métairie [CNRS grant, ANR Pavois, Lannion]
- Viet-Hoa Nguyen [University grant, BoWI project, Lannion]
- Zhongwei Zheng [University grant, BoWI project, Lannion]
- Gaël Deest [MENRT grant, Rennes, from Oct 2013]
- Van Thiep Nguyen [USTH grant, Lannion, from Oct 2013]
- Rengarajan Ragavan [University grant, FP7 FlexTiles, Lannion, from Oct 2013]
- Mai-Thanh Tran [Brittany Region/CG22 University grant, Lannion, from Oct 2013]
- Xuan Chien Le [Inria, Brittany Region/LTC grant, Lannion, from Oct 2013]
- Florent Berthier [CEA Leti grant, Grenoble, from Oct 2013]
- Antoine Morvan [Inria grant, Nano2012 project, Rennes, until Jun 2013]
- Thomas Chabrier [Brittany Region/CG22 University grant, Lannion, until Jun 2013]
- Robin Bonamy [University grant, ANR OpenPeople, Lannion, until Jul 2013]
- Vivek D. Tovinakere [University grant, ITEA Geodes, Lannion, until Feb 2013]
- Mahtab Alam [University grant, ITEA Geodes, Lannion, until Jan 2013]
- Hervé Yviquel [MENRT grant, Lannion, until Jun 2013]

### **Post-Doctoral Fellows**

- Tomofumi Yuki [Inria, Rennes]
- Mythri Alle [Univ. Rennes I, Rennes]
- Nicolas Veyrat-Charvillon [CNRS, Lannion, from Oct 2013]
- Nicolas Estibals [ATER ENSSAT, Lannion, from Sep 2013]
- Aroua Briki [ATER ENSSAT, Lannion, from Sep 2013]
- Ruifeng Zhang [Univ. Rennes I, Lannion, until Apr 2013]
- Pascal Cotret [ATER ENSSAT, Lannion, until Aug 2013]
- Ammar El Falou [ATER ENSSAT, Lannion, until Aug 2013]
- Le-Quang-Vinh Tran [Univ. Rennes I, Lannion, until Aug 2013]

### **Administrative Assistants**

- Nadia Saintpierre [Assistant, Inria, Rennes]
- Angélique Le Pennec [Assistant, University of Rennes 1, Enssat, Lannion]

## 2. Overall Objectives

### 2.1. Overall Objectives

**Abstract:** The CAIRN project-team researches new architectures, algorithms and design methods for flexible and energy-efficient domain-specific systems-on-chip (SoC). As the performance and energy-efficiency requirements of SoCs keep increasing, they become difficult to meet using programmable processors alone. To address this issue, we advocate the use of reconfigurable hardware, i.e. hardware structures whose organization may change before or even during execution. Such reconfigurable SoCs offer high performance at a low energy cost, while preserving a high level of flexibility. The group studies these SoCs from three angles: (i) the invention and design of new reconfigurable platforms, with an emphasis on flexible arithmetic operator design, dynamic reconfiguration management and low power consumption; (ii) the development of their corresponding design flows (compilation and synthesis tools) to enable their automatic design from high-level specifications; (iii) the interaction between algorithms and architectures, especially for our main application domains (wireless communications, wireless sensor networks and digital security).

The scientific goal of the CAIRN group is to research new hardware architectures of Reconfigurable System-on-Chips (RSoC) along with their associated design flows. RSoC chips integrate reconfigurable blocks whose hardware structure may be adjusted before or even during program execution.
They originate from the possibilities opened up by Field Programmable Gate Array (FPGA) technology and by reconfigurable processors [89], [99]. Recent evolutions in technology and modern hardware systems confirm that reconfigurable systems are increasingly used in recent applications or embedded into more general systems-on-chip (SoC) [104]. This architectural model has received a lot of attention in academia over the last decade [94], and is now considered for industrial use.

One reason is the rapidly changing standards in communications and information security, which require frequent device modifications. In many cases, software updates are not sufficient to keep devices on the market, while hardware redesigns remain too expensive. The need to continuously adapt the system to changing environments (e.g. cognitive radio) is another incentive to use dynamic reconfiguration at runtime. Last, with technologies at 65 nm and below, manufacturing problems strongly influence the electrical parameters of transistors, and transient errors caused by particles or radiation will appear more and more often during execution: error detection and correction mechanisms or autonomic self-control can benefit from reconfiguration capabilities.

Standard processors or systems-on-chip enable flexible software on fixed hardware, whereas reconfigurable platforms make possible *flexible software on flexible hardware*.

As chip density increases [116], power efficiency has become "the Grail" of all chip architects, be they designing circuits for portable devices or for high-performance general-purpose processors. Indeed, power (or energy) constraints are now as important as performance constraints. Moreover, this power issue can often only be addressed through the use of a complete application-specific architecture, or by incorporating some application-specific components within a programmable SoC.
Designers hence face a very difficult choice between the flexibility and short design time of programmable architectures and the power efficiency of specialized architectures. In this context, reconfigurable architectures are acknowledged for providing the best trade-off between power, performance, cost and flexibility. This efficiency stems from the fact that their hardware structure can be adapted to the application requirements [115], [99].

However, designing reconfigurable systems poses several challenges: first, the definition of the architecture structure itself along with its dynamic reconfiguration capabilities, and then, its corresponding compilation/synthesis tools. The scientific goal of CAIRN is to leverage the background and past experience of its members to tackle these challenges. We therefore propose to approach energy-efficient reconfigurable architectures from three angles: (i) the invention of new reconfigurable platforms, (ii) the development of their corresponding design and compilation tools, and (iii) the exploration of the interaction between algorithms and architectures.

Wireless Communication is our privileged application domain, and it builds on our experience in 3G. Our research includes the prototyping of (subsets of) such applications on reconfigurable and programmable platforms. For this application domain, the high computational complexity of the Next-Generation (4G) Wireless Communication Systems calls for the design of highly specialized high-performance architectures. In Wireless Sensor Networks (WSN), where each wireless node is expected to operate without battery replacement for significant periods of time, energy consumption is the most important constraint. In this context, our research focuses on energy-efficient architectures and wireless cooperative techniques for WSN and wireless transmission in Intelligent Transportation Systems (ITS).
Other important fields such as automotive, digital security and multimedia processing are also considered.

Members of the CAIRN team collaborate with large companies such as STMicroelectronics (Grenoble), Technicolor (Rennes), Thales (Paris), Alcatel (Lannion), France-Telecom Orange Labs (Lannion), Atmel (Nantes) and Xilinx (USA), with SMEs such as Geensys (Nantes), R-interface (Marseille), TeamCast/Ditocom (Rennes), Sensaris (Grenoble), Envivio (Rennes), InPixal (Rennes), Sestream (Paris) and Ekinops (Lannion), and with institutes such as DGA (Rennes) and CEA (Saclay, Grenoble). They are involved in several nationally or internationally funded projects (FP7 Alma, FP7 Flextiles, Nano2012 S2S4HLS and RECMOTIF projects, ANR funded Pavois, Ardyt, Defis, Faon, Compa, Greco, Ocelot and "Images&Networks Competitiveness Cluster" funded 100Gflex).

### 2.2. Highlights of the Year

The paper [56] was nominated for the best paper award at IEEE/ACM ICCAD, one of the major events in Design Automation.

BEST PAPER AWARD:

[56] A Polynomial Time Algorithm for Solving the Word-length Optimization Problem in IEEE/ACM International Conference on Computer-Aided Design (ICCAD). K. PARASHAR, D. MENARD, O. SENTIEYS

## 3. Research Program

### 3.1. Panorama

The development of complex applications is traditionally split into three stages: a theoretical study of the algorithms, an analysis of the target architecture, and the implementation. When facing new emerging applications such as high-performance, low-power and low-cost mobile communication systems or smart sensor-based systems, it is mandatory to strengthen the design flow by a joint study of both algorithmic and architectural issues <sup>1</sup>.

<sup>1</sup>Often referred to as algorithm-architecture mapping or interaction.

Figure 1. CAIRN's general design flow and related research themes

Figure 1 shows the global design flow we propose to develop.
This flow is organized in levels which refer to our three research themes: application optimization (new algorithms, fixed-point arithmetic and advanced representations of numbers), architecture optimization (reconfigurable and specialized hardware, application-specific processors), and stepwise refinement and code generation (code transformations, hardware synthesis, compilation).

In the rest of this part, we briefly describe the challenges concerning **new reconfigurable platforms** in Section 3.2, the issues on **compiler and synthesis tools** related to these platforms in Section 3.3, and the remaining challenges in **algorithm architecture interaction** in Section 3.4.

### 3.2. Reconfigurable Architecture Design

Over the last two decades, there has been a strong push from the research community to evolve static programmable processors into run-time dynamic and partial reconfigurable (DPR) architectures. Several research groups around the world have hence proposed reconfigurable hardware systems operating at various levels of granularity. For example, functional-level reconfiguration has been proposed to increase the efficiency of programmable processors without paying the penalties of FPGAs. These coarse-grained reconfigurable architectures (CGRAs) provide operator-level configurable functional blocks and word-level datapaths. The main goal of this class of architectures is to provide flexibility while minimizing reconfiguration overhead (several recent surveys exist on this topic [119], [103], [84], [120]). Compared to fine-grained architectures, CGRAs benefit from a massive reduction in configuration memory and configuration delay, as well as a considerable reduction in routing and placement complexity. This, in turn, improves the ratio of computation volume to energy cost, even if it comes at the price of a loss of flexibility compared to bit-level operations.
Such constraints have been taken into account in the design of DART [99][12], CRIP [87], Adres [111] or others [122]. These works have led to commercial products such as the Extreme Processor Platform (XPP) [88] from PACT or Montium <sup>2</sup> from Recore systems. Another strong trend is the design of hybrid architectures which combine standard GPP or DSP cores with arrays of *configurable elements* such as the Lx [102], or of *field-configurable elements* such as the Xirisc processor [109] and, more recently, commercial platforms such as the Xilinx Zynq-7000. Some of their benefits are the following: functionality on demand (set-top boxes for digital TV equipped with decoding hardware on demand), acceleration on demand (coprocessors that accelerate computationally demanding multimedia or communications applications), and shorter time-to-market (products that target ASIC platforms can be released earlier using reconfigurable hardware).

Dynamic reconfiguration enables an architecture to adapt itself to various incoming tasks. This requires complex resource management and control, which can be provided as services by a real-time operating system (RTOS) [110]: communication, memory management, task scheduling [98], [91][1] and task placement. Such an Operating System (OS) based approach has many advantages: it provides a complete design framework that is independent of the technology and of the underlying hardware architecture, helping to drastically reduce the full platform design time. Due to the unpredictable execution of tasks, the OS must be able to allocate resources to tasks at run-time, along with mechanisms to support inter-task communication. An efficient way to support such communications is to resort to a network-on-chip [117].
The role of the communication infrastructure is then to support transactions between different components of the platform, either between macro-components (main processor, dedicated modules, dynamically reconfigurable component) or within the elements of the reconfigurable components themselves.

<sup>2</sup>http://www.recoresystems.com/technology/montium-technology/montium-architecture/

In CAIRN we mainly target reconfigurable system-on-chip (RSoC) defined as a set of computing and storing resources organized around a flexible interconnection network and integrated within a single silicon chip (or programmable chip such as FPGAs). The architecture is customized for an application domain, and the flexibility is provided by both hardware reconfiguration and software programmability. Computing resources are therefore highly heterogeneous and raise many issues that we discuss in the following:

- **Reconfigurable hardware blocks** with a dynamic behavior where reconfigurability can be achieved at the bit- or operator-level. Our research aims at defining new reconfigurable architectures including computing and memory resources. Since reconfiguration must happen as fast as possible (typically within a few cycles), reducing the configuration time overhead is also a key issue.
- When performance and power consumption are major constraints, it is acknowledged that optimized **specialized hardware blocks** (often called IPs for Intellectual Properties) are the best (and often the only) solution. Therefore, we also study architecture and tools for specialized hardware accelerators and for multi-mode components.
- Customized **processors with a specialized instruction-set** also offer a viable solution to trade between energy efficiency and flexibility. They are particularly relevant for modern FPGA platforms where many processor cores can be embedded.
For this topic, we focus on the automatic generation of heterogeneous (sequential or parallel) reconfigurable processor extensions that are tightly coupled to processor cores.

### 3.3. Compilation and Synthesis for Reconfigurable Platforms

In spite of their advantages, reconfigurable architectures lack efficient and standardized compilation and design tools. As of today, this still makes the technology impractical for large-scale industrial use. Generating and optimizing the mapping from high-level specifications to reconfigurable hardware platforms is therefore a key research issue, and the problem has received considerable interest over the last years [114], [90], [121], [124]. In the meantime, the complexity (and heterogeneity) of these platforms has also been increasing quite significantly, with complex heterogeneous multi-core architectures becoming a de facto standard. As a consequence, the focus of designers is now geared toward optimizing overall system-level performance and efficiency [105], [114], [113]. Here again, existing tools are not well suited, as they fail to provide a unified programming view of the programmable and/or reconfigurable components implemented on the platform.

In this context we have been pursuing our efforts to propose tools whose design principles are based on a tight coupling between the compiler and the target hardware architectures. We build on the expertise of the team members in High Level Synthesis (HLS) [8], ASIP optimizing compilers [15] and automatic parallelization for massively parallel specialized circuits [6]. We first study how to increase the efficiency of standard programmable processors by extending their instruction set to speed up compute-intensive kernels. Our focus is on efficient and exact algorithms for the identification, selection and scheduling of such instructions [9]. We also propose techniques to synthesize reconfigurable (or multi-mode) architectures.
We address these challenges by borrowing techniques from high-level synthesis, optimizing compilers and automatic parallelization, especially when dealing with nested loop kernels. The goal is then to derive a custom fine-grain parallel architecture and/or the configuration of a Coarse Grain Reconfigurable Architecture (CGRA). In addition, and independently of the scientific challenges mentioned above, proposing such flows also poses significant software engineering issues. As a consequence, we also study how leading-edge object-oriented software engineering techniques (Model Driven Engineering) can help the Computer Aided Design (CAD) and optimizing compiler communities prototype new research ideas.

Efficient implementation of multimedia and signal processing applications (in software for DSP cores or as special-purpose hardware) often requires, for reasons related to cost, power consumption or silicon area constraints, the use of fixed-point arithmetic, whereas the algorithms are usually specified in floating-point arithmetic. Unfortunately, fixed-point conversion is very challenging and time-consuming, typically demanding up to 50% of the total design or implementation time [92]. Thus, tools are required to automate this conversion. For hardware or software implementation, the aim is to optimize the fixed-point specification: the implementation cost is minimized under a numerical accuracy or an application performance constraint. For DSP-software implementation, methodologies have been proposed [107], [112] to achieve a conversion leading to ANSI-C code with integer data types. For hardware implementation, the best results are obtained when the word-length optimization process is coupled with high-level synthesis [106], [95].

Evaluating the effects of finite precision is one of the major, and often the most time-consuming, steps of fixed-point refinement.
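To make the cost of finite precision concrete, the following minimal sketch compares a simulation-based estimate of quantization noise power with the textbook uniform-noise model (variance $q^2/12$ for a rounding step $q$), and uses that model to pick the smallest fractional word-length meeting a noise constraint. It is an illustration only: it covers a single rounding quantizer on uniformly distributed inputs, and all function names are hypothetical rather than taken from the team's tools.

```python
import random


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def simulated_noise_power(frac_bits, n_samples=100_000, seed=0):
    """Estimate the quantization noise power by simulation.

    Inputs are drawn uniformly in [-1, 1); this input distribution is an
    assumption made for the illustration.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        err = quantize(x, frac_bits) - x
        total += err * err
    return total / n_samples


def analytical_noise_power(frac_bits):
    """Classical uniform-noise model: variance q**2 / 12 for a step q."""
    q = 2.0 ** -frac_bits
    return q * q / 12.0


def min_frac_bits(max_noise_power, max_bits=32):
    """Smallest fractional word-length whose modelled noise meets the constraint."""
    for bits in range(1, max_bits + 1):
        if analytical_noise_power(bits) <= max_noise_power:
            return bits
    raise ValueError("constraint not reachable within max_bits")


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(bits, simulated_noise_power(bits), analytical_noise_power(bits))
    print("fractional bits for noise power <= 1e-6:", min_frac_bits(1e-6))
```

The simulation needs many samples for every candidate word-length, whereas the closed-form model is evaluated in constant time, which is why analytical evaluation pays off inside an optimization loop.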
In the word-length optimization process, the numerical accuracy is evaluated each time a new word-length is tested, and thus several times per iteration of the optimization process. Classical approaches are based on fixed-point simulations [96], [118]. They lead to long evaluation times and cannot be used to explore the entire design space. Therefore, our aim is to propose closed-form expressions of the errors due to fixed-point approximations, which are used by a fast analytical framework for accuracy evaluation.

### 3.4. Interaction between Algorithms and Architectures

As CAIRN mainly targets domain-specific systems-on-chip including reconfigurable capabilities, algorithmic-level optimizations have great potential to improve the efficiency of the overall system. Based on the skills and experience of some CAIRN members in "signal processing and communications", we conduct research on algorithmic optimization techniques under two main constraints, energy consumption and computation accuracy, and for two main application domains: fourth-generation (4G) mobile communications and wireless sensor networks (WSN). These application domains are very conducive to our research activities. The high complexity of the first one and the stringent power constraint of the second one require the design of specific high-performance and energy-efficient SoCs. We also consider other applications such as video or bioinformatics, but this short state-of-the-art will be limited to wireless applications.

The radio, in both transmit and receive modes, accounts for the bulk of the total power consumption of the system. Therefore, protocol optimization is one of the main sources of significant energy reduction on the way to self-powered autonomous systems.
Reducing the power due to radio communications can be achieved through two complementary objectives: (i) minimizing the output transmit power while maintaining sufficient wireless link quality and (ii) minimizing useless wake-up and channel listening while still being reactive.

As the physical layer affects all higher layers in the protocol stack, it plays an important role in the energy-constrained design of WSNs. The question to answer can be summarized as: how much signal processing can be added to decrease the transmission energy (i.e. the output power level at the antenna) such that the global energy consumption is decreased? The temporal and spatial diversity of relay and multiple antenna techniques are very attractive due to their simplicity and their performance for wireless transmission over fading channels. Cooperative MIMO (multiple-input and multiple-output) techniques were first studied in [100], [108] and have shown their efficiency in terms of energy consumption [97]. Our research aims at finding new energy-efficient cooperative protocols associating distributed MIMO with opportunistic and/or multiple relays and considering wireless channel impairments such as transmitter desynchronisation.

Another way to reduce the energy consumption consists in decreasing the radio activity, controlled by the medium access control (MAC) layer protocols. In this regard, low duty-cycle protocols, such as preamble-sampling MAC protocols, are very efficient because they improve the lifetime of the network by reducing unnecessary energy waste [86]. As the network parameters (data rate, topology, etc.) can vary, we propose new adaptive MAC protocols to avoid overhearing and idle listening.

Finally, MIMO precoding is now recognized as a very interesting technique to enhance the data rate in wireless systems, and is already used in the WiMax standard (802.16e).
This technique can also be used to reduce transmission energy for the same transmission reliability and the same throughput requirement. One of the most efficient precoders is based on the maximization of the minimum Euclidean distance ($\max$-$d_{\min}$) between two received data vectors [93], but it is difficult to define the closed form of the optimized precoding matrix for large MIMO systems with high-order modulations. Our goal is to derive new generic precoders with simple expressions depending only on the channel angle and the modulation order.

## 4. Application Domains

### 4.1. Panorama

**Keywords:** telecommunications, wireless communications, wireless sensor networks, content-based image retrieval, video coding, intelligent transportation systems, automotive, security

Our research is based on realistic applications, in order both to discover the main needs created by these applications and to invent realistic and interesting solutions.

The high complexity of the **Next-Generation (4G) Wireless Communication Systems** leads to the design of real-time high-performance specific architectures. The study of these techniques is one of the main fields of application for our research, based on our experience with WCDMA for 3G implementation.

In **Wireless Sensor Networks** (WSN), where each wireless node has to operate without battery replacement for a long time, energy consumption is the most important constraint. In this domain, we mainly study energy-efficient architectures and wireless cooperative techniques for WSN.

**Intelligent Transportation Systems** (ITS), and especially Automotive Systems, increasingly adopt technological advances. While wireless transmissions allow a car to communicate with another car or even with road infrastructure, the **automotive industry** can also propose driver assistance and more secure vehicles thanks to improvements in computation accuracy for embedded systems.
Other important fields will also be considered: hardware cryptographic and security modules, specialized hardware systems for high-speed filtering of network traffic, high-speed true-random number generation for security, content-based image retrieval and video processing.

### 4.2. 4G Wireless Communication Systems

With the advent of the next generation (4G) broadband wireless communications, the combination of MIMO (Multiple-Input Multiple-Output) wireless technology with Multi-Carrier CDMA (MC-CDMA) has been recognized as one of the most promising techniques to support high data rate and high performance. Moreover, future mobile devices will have to offer interoperability between wireless communication standards (4G, WiMax, ...) and therefore implement MIMO pre-coding, which is already used by the WiMax standard. Finally, in order to maximize the lifetime of mobile devices and guarantee quality of service to consumers, 4G systems will certainly use cooperative MIMO schemes or MIMO relays. Our research activity focuses on MIMO pre-coding and MIMO cooperative communications with the aim of algorithmic optimization and implementation prototyping.

### 4.3. Wireless Sensor Networks

Sensor networks are a very dynamic domain of research due, on the one hand, to the opportunity to develop innovative applications that are linked to a specific environment, and on the other hand to the challenge of designing totally autonomous communicating objects. Cross-layer optimizations lead to energy-efficient architectures and cooperative techniques dedicated to sensor network applications. In particular, cooperative MIMO techniques are used to decrease the energy consumption of the communications.

### 4.4. Multimedia processing

In multimedia applications, audio and video processing is the major challenge embedded systems have to face. It is computationally intensive with power requirements to meet.
Video or image processing at pixel level, such as image filtering, edge detection and pixel correlation, or at block level, such as transforms, quantization, entropy coding and motion estimation, has to be accelerated. We investigate the potential of reconfigurable architectures for the design of efficient and flexible accelerators in the context of multimedia applications.

## 5. Software and Platforms

### 5.1. Panorama

With the ever-rising complexity of embedded applications and platforms, the need for efficient and customizable compilation flows is stronger than ever. This need for flexibility is even stronger when it comes to research compiler infrastructures that are necessary to gather quantitative evidence of the performance/energy or cost benefits obtained through the use of reconfigurable platforms. From a compiler point of view, the challenges exposed by these complex reconfigurable platforms are quite significant, since they require the compiler to extract and to expose a large amount of coarse and/or fine grain parallelism, and to take complex resource constraints into consideration while providing efficient memory hierarchy and power management.

Because they are geared toward industrial use, production compiler infrastructures do not offer the level of flexibility and productivity that is required for compiler and CAD tool prototyping. To address this issue, we have designed an extensible source-to-source compiler infrastructure that takes advantage of leading edge model-driven object-oriented software engineering principles and technologies.

Figure 2. CAIRN's general software development framework.

Figure 2 shows the global framework that is being developed in the group. Our compiler flow mixes several types of intermediate representations. The baseline representation is a simple tree-based model enriched with control flow information. This model is mainly used to support our source-to-source flow, and serves as the backbone for the infrastructure.
We use the extensibility of the framework to provide more advanced representations along with their corresponding optimizations and code generation plug-ins. For example, for our pattern selection and accuracy estimation tools, we use a data dependence graph model in all basic blocks instead of the tree model. Similarly, to enable polyhedral-based program transformations and analysis, we introduced a specific representation for affine control loops that we use to derive a Polyhedral Reduced Dependence Graph (PRDG). Our current flow assumes that the application is specified as a system level hierarchy of communicating tasks, where each task is expressed using C (or Scilab in the near future), and where the system level representation and the target platform model are defined using Domain Specific Languages (DSL).

**Gecos** (Generic Compiler Suite) is the main backbone of CAIRN's flow. It is an open source, Eclipse-based, flexible compiler infrastructure developed for fast prototyping of complex compiler passes. Gecos is implemented entirely in Java and builds on modern software engineering practices such as Eclipse plug-ins and model-driven software engineering with EMF (Eclipse Modeling Framework). As of today, our flow offers the following features:

- An automatic floating-point to fixed-point conversion flow (for HLS and embedded processors). ID.Fix is an infrastructure for the automatic transformation of software code aiming at the conversion of floating-point data types into a fixed-point representation. http://idfix.gforge.inria.fr.
- A polyhedral-based loop transformation and parallelization engine (mostly targeted at HLS). http://gecos.gforge.inria.fr. It was used for source-to-source transformations in the context of Nano2012 projects in collaboration with STMicroelectronics.
- A custom instruction extraction flow (for ASIP and dynamically reconfigurable architectures).
Durase and UPaK are developed for compilation and synthesis targeting reconfigurable platforms and for the automatic synthesis of application-specific processor extensions. They use advanced technologies, such as graph matching and graph merging together with constraint programming methods.
- Several back-ends to enable the generation of VHDL for specialized or reconfigurable IPs, and SystemC for simulation purposes (e.g. fixed-point simulations).

### 5.2. Gecos

Participants: Steven Derrien [corresponding author], Nicolas Simon, Maxime Naullet, Antoine Morvan.

Keywords: source-to-source compiler, model-driven software engineering, retargetable compilation.

The Gecos (Generic Compiler Suite) project is a source-to-source compiler infrastructure developed in the Cairn group since 2004. It was designed to enable fast prototyping of program analysis and transformation for hardware synthesis and retargetable compilation domains.

Gecos is 100% Java based and takes advantage of modern model-driven software engineering practices. It uses the Eclipse Modeling Framework (EMF) as an underlying infrastructure and leverages its features to remain easily extensible. Gecos is open-source and is hosted on the Inria gforge at http://gecos.gforge.inria.fr.

The Gecos infrastructure is still under very active development, and serves as a backbone infrastructure to projects of the group. Part of the framework is jointly developed with Colorado State University, and since 2012 it has been used in the context of the ALMA European project.

Developments in Gecos in 2013 have focused on polyhedral loop transformations and efficient SIMD code generation for fixed-point arithmetic data types as part of the ALMA project. Significant effort was also devoted to providing a coarse-grain parallelization engine targeting the data-flow actor model in the context of the COMPA ANR project.
An article describing the design choices and the main features of the framework was presented at the international workshop on Source Code Analysis and Manipulation in September 2013 [46].

### 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems

Participants: Olivier Sentieys [corresponding author], Romuald Rocher, Nicolas Simon.

Keywords: fixed-point arithmetic, source-to-source code transformation, accuracy optimization, dynamic range evaluation

The different techniques proposed by the team for fixed-point conversion are implemented in the ID.Fix infrastructure. The application is described in C code using floating-point data types and pragmas that specify the parameters of the fixed-point conversion (dynamic range, input/output word-length, delay operations). The tool determines and optimizes the fixed-point specification and then generates C code using the fixed-point data types (ac\_fixed) from Mentor Graphics. The infrastructure is made up of two main modules corresponding to the fixed-point conversion (ID.Fix-Conv) and the accuracy evaluation (ID.Fix-Eval).

The developments carried out in 2013 resulted in a fixed-point conversion tool that handles functions, conditional structures and repetitive structures whose number of iterations is fixed over time. The front-end has been modified to reduce limitations due to the syntax of the C language. The back-end can now also generate a new data type (sc\_fixed). In the context of the DEFIS ANR project, the ID.Fix tool has been reorganized to be integrated in the DEFIS toolflow.

In 2013, ID.Fix was demonstrated at the University Booth at IEEE/ACM DATE and IEEE/ACM DAC. See http://www.youtube.com/watch?v=nKYA4hezplQ

### 5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems

**Participants:** Christophe Wolinski [corresponding author], François Charot, Antoine Floc'H [former member].
Keywords: compilation for reconfigurable systems, pattern extraction, constraint-based programming.

We are developing, in close collaboration with Lund University, Sweden and Queensland University, Australia, UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems [123]. The preliminary experimental results obtained by the UPaK system show that the methods employed in the system enable a high coverage of application graphs with a small number of patterns. Moreover, high application execution speed-ups are ensured, both for sequential and parallel application execution with processor extensions implementing the selected patterns. UPaK is one of the bases for our research on compilation and synthesis for reconfigurable platforms. It is based on the HCDG representation of the Polychrony software designed at Inria-Rennes in the project-team Espresso.

### 5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions

Participants: Christophe Wolinski [corresponding author], François Charot, Antoine Floc'H.

Keywords: compilation for reconfigurable systems, instruction-set extension, pattern extraction, graph covering, constraint-based programming.

We are developing a framework enabling the automatic synthesis of application-specific processor extensions. It uses advanced technologies, such as algorithms for graph matching and graph merging together with constraint programming methods. The framework is organized around several modules.

- CoSaP: Constraint Satisfaction Problem. The goal of CoSaP is to decouple the statement of a constraint satisfaction problem from the solver used to solve it. The CoSaP model is an Eclipse plugin described using EMF to take advantage of the automatic code generation and of various EMF tools.
- HCDG: Hierarchical Conditional Dependency Graph. HCDG is an intermediate representation mixing control and data flow in a single acyclic representation.
The control flow is represented as hierarchical guards specifying the execution or the definition conditions of nodes. It can be used in the Gecos compilation framework via a specific pass which translates a CDFG representation into an HCDG.
- Patterns: Flexible tools for the identification of computational patterns in a graph and for graph covering. These tools model the concept of a pattern in a graph and provide generic algorithms for the identification of patterns and the covering of a graph. The following sub-problems are addressed: (sub)graph isomorphism, pattern generation under constraints, and covering of a graph using a library of patterns. Most of the implemented algorithms use constraint programming and rely on the CoSaP module to solve the optimization problem.

### 5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)

Participants: Olivier Sentieys [corresponding author], Olivier Berder, Arnaud Carer, Steven Derrien.

Keywords: Wireless Sensor Networks, Low Power, Preamble Sampling MAC Protocol, Hardware and Software Platform

PowWow is an open-source hardware and software platform designed to handle wireless sensor network (WSN) protocols and related applications. Based on an optimized preamble sampling medium access control (MAC) protocol, geographical routing and a protothread library, PowWow requires lighter hardware than Zigbee [85] (memory usage including the application is less than 10kb). Therefore, network lifetime is increased and price per node is significantly decreased.

CAIRN's hardware platform (see Figure 3) is composed of:

- The motherboard, designed to reduce the power consumption of sensor nodes, embeds an MSP430 microcontroller and all the components needed to process the PowWow protocol except the radio chip. JTAG, RS232, and I2C interfaces are available on this board.
- The radio chip daughter board is currently based on a TI CC2420.
- The coprocessing daughter board includes a low-power FPGA which allows for hardware acceleration of some PowWow features and also includes dynamic voltage scaling features to increase power efficiency. The current version of PowWow integrates an Actel IGLOO AGL250 FPGA and a programmable DC-DC converter. We have shown that gains in energy of up to 700 can be obtained by using FPGA acceleration on functions like CRC-32 or error detection with respect to a software implementation on the MSP430.
- Finally, a last daughter board is dedicated to energy harvesting techniques. Based on the LTC3108 energy management component from Linear Technology, the board can be configured with several types of energy storage (batteries, micro-batteries, super-capacitors) and several types of energy sources (a small solar panel to recover photovoltaic energy, a piezoelectric sensor for mechanical energy and a Peltier thermal energy sensor).

The PowWow distribution also includes a generic software architecture using event-driven programming and organized into protocol layers (PHY, MAC, LINK, NET and APP). The software is based on Contiki [101], and more precisely on the Protothread library, which provides a sequential control flow without complex state machines or full multi-threading.

To optimize the network for a particular application and to define a global strategy to reduce energy, PowWow offers the following extra tools: over-the-air reprogramming (and soon reconfiguration), analytical power estimation based on software profiling and power measurements, and a dedicated network analyzer to probe and fix transmission errors in the network.

More information can be found at http://powwow.gforge.inria.fr.

### 5.7. Ziggie: a Platform for Wireless Body Sensor Networks

**Participants:** Olivier Sentieys, Olivier Berder, Arnaud Carer, Antoine Courtay [corresponding author], Robin Bonamy.
Keywords: Wireless Body Sensor Networks, Low Power, Gesture Recognition, Localization, Hardware and Software Platform

Figure 3. CAIRN's PowWow motherboard with radio and energy-harvesting boards connected

The Zyggie sensor node has been developed in the team to create an autonomous Wireless Body Sensor Network (WBSN) capable of monitoring body movements. The Zyggie platform is part of the BoWI project funded by CominLabs. Zyggie is composed of:

- An ATMEGA128RFA1 microcontroller,
- An MPU9150 Inertial Measurement Unit (IMU),
- An RF AS193 switch with two antennas,
- An LSP331AP barometer,
- A DC/DC voltage regulator with a battery charge controller,
- A wireless inductive battery charge controller and
- Some switches and control LEDs.

Figure 4. CAIRN's Ziggie platform for WBSN

The IMU is composed of a 3-axis accelerometer, a 3-axis gyrometer and a 3-axis magnetometer. The IMU sends its data to the embedded microcontroller via the I2C protocol. We also developed our own MAC protocol for synchronization and data exchanges between nodes. The Zyggie platform is used in several PhD works for evaluating data fusion algorithms (RSSI + IMU data) (Zhongwei Zheng, UR1 and Alexis Aulery, UBS/UR1), low power computing algorithms (Alexis Aulery, UBS/UR1), wireless protocols (Viet Hoa Nguyen, UR1) and body channel characterization (Rizwan Masood, TB).

### 5.8. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip

Participants: François Charot [corresponding author], Laurent Perraudeau [external collaborator].

Keywords: SoC modeling, SystemC simulation model

SoCLib is an open platform for virtual prototyping of multi-processor systems on chip (MP-SoC) developed in the framework of the SoCLib ANR project. The core of the platform is a library of SystemC simulation models for virtual components (IP cores), with a guaranteed path to silicon.
All simulation models are written in SystemC, and can be simulated with the standard SystemC simulation environment distributed by the OSCI organization. Two types of models are available for each IP core: CABA (Cycle Accurate / Bit Accurate), and TLM-DT (Transaction Level Modeling with Distributed Time). All simulation models are distributed as free software. We have developed the simulation models of the NIOSII processor, of the Altera Avalon interconnect, and of the TMS320C62 DSP processor from Texas Instruments. More information is available on the dedicated web page: http://www.soclib.fr.

## 6. New Results

## 6.1. Reconfigurable Architecture Design

### 6.1.1. Arithmetic Operators for Cryptography and Fault-Tolerance

**Participants:** Arnaud Tisserand, Emmanuel Casseau, Thomas Chabrier, Karim Bigou, Franck Bucheron, Jérémie Métairie, Nicolas Veyrat-Charvillon, Nicolas Estibals.

Arithmetic Operators for Fast and Secure Cryptography.

Scalar recoding is popular to speed up ECC (elliptic curve cryptography) scalar multiplication: non-adjacent form, double-base number system, multi-base number system (MBNS). But fast recoding methods require pre-computations: multiples of the base point or off-line conversion. In [42], presented at ARITH, we proposed a multi-base (e.g. (2,3,5,7)) recoding method for ECC scalar multiplication based on i) a greedy algorithm starting with the least significant terms, ii) cheap divisibility tests by multi-base elements and iii) fast exact divisions by multi-base elements. Multi-base terms are obtained on-the-fly using a special recoding unit which operates in parallel to curve-level operations and at very high speed. This ensures that all recoding steps are performed fast enough to schedule the next curve-level operations without interruptions. The proposed method can be fully implemented in hardware without pre-computations. We report FPGA implementation details and very good performance compared to state-of-the-art results.
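The three ingredients listed above (greedy choice from the least significant end, divisibility tests, exact divisions) can be illustrated with a minimal software sketch. This is not the recoding unit of [42]: the greedy rule, the digit set {+1, -1}, the helper names `smoothness`, `recode` and `evaluate`, and the use of plain integers in place of curve points are all assumptions made for illustration.

```python
# Illustrative sketch only: integers stand in for curve points, and the
# greedy rule below is an assumption, not the hardware method of [42].
BASES = (2, 3, 5, 7)


def smoothness(n):
    """Count the base factors (with multiplicity) dividing n; 0 ranks best."""
    if n == 0:
        return float("inf")
    n, count = abs(n), 0
    for b in BASES:
        while n % b == 0:
            n //= b
            count += 1
    return count


def recode(k):
    """Recode a positive scalar k, least significant term first.

    Returns a list of steps: ("mul", b) records an exact division of the
    remaining scalar by base b, ("add", d) records a signed digit d in {+1, -1}.
    """
    steps = []
    while k != 0:
        divisors = [b for b in BASES if k % b == 0]  # divisibility tests
        if divisors:
            b = max(divisors)
            k //= b                                  # exact division
            steps.append(("mul", b))
        else:
            # Greedy choice: pick the digit that leaves the smoothest remainder.
            d = max((1, -1), key=lambda s: smoothness(k - s))
            k -= d
            steps.append(("add", d))
    return steps


def evaluate(steps, point=1):
    """Replay the steps in order; `weight` plays the role of [b1*b2*...]P."""
    acc, weight = 0, point
    for op, value in steps:
        if op == "mul":
            weight *= value
        else:
            acc += value * weight
    return acc
```

Replaying the steps with `evaluate(recode(k))` returns `k` again, which mirrors how a right-to-left scalar multiplication would consume the terms as they are produced, without any pre-computed table.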
A specific version of our method allows random recodings of the scalar, which can be used as a partial counter-measure against side-channel attacks. The PhD thesis defended by Thomas Chabrier [18] deals with MBNS and other types of arithmetic recodings for ECC scalar multiplication (title: "Arithmetic recodings for ECC cryptoprocessors with protections against side-channel attacks").

In [67], presented at ComPAS, we presented efficient arithmetic operators for divisibility tests and modulo operations for large operands (e.g. 160-600 bits as in cryptographic applications) and by a set of small constants such as $(2^a, 3, 5, 7, 9)$ where $1 \le a \le 12$. These operators have been validated and implemented on FPGAs.

In [39], presented at CHES, we described a new RNS modular inversion algorithm based on the extended Euclidean algorithm and the plus-minus trick. In our algorithm, comparisons over large RNS values are replaced by cheap computations modulo 4. Comparisons to an RNS version based on Fermat's little theorem were carried out. The number of elementary modular operations is significantly reduced: by a factor of 12 to 26 for multiplications and 6 to 21 for additions. Virtex 5 FPGA implementations show that, for a similar area, our plus-minus RNS modular inversion is 6 to 10 times faster. Other implementation results of RNS for ECC cryptosystems have been presented in [75] and [74].

ECC Processor with Protections Against SCA.

A dedicated processor for elliptic curve cryptography (ECC) is under development. Functional units for arithmetic operations in $GF(2^m)$ and $GF(p)$ finite fields and 160-600-bit operands have been developed for FPGA implementation. Several protection methods against side channel attacks (SCA) have been studied.
The use of some number systems, especially very redundant ones, makes it possible to change the way some computations are performed and hence their effects on side channel traces. This work is done in the PAVOIS project.

Arithmetic Operators for Fault Tolerance.

In the ARDyT project, we work on computation algorithms, representations of numbers and hardware implementations of arithmetic operators with integrated fault detection (and/or fault tolerance) capabilities. The target arithmetic operators are: adders, subtracters, multipliers (and variants of multiplications by constants, square, FMA, MAC), division, square-root, and approximations of the elementary functions. We study two approaches: residue codes and specific bit-level coding in some redundant number systems for fault detection/tolerance integration at the arithmetic operator/unit level. FPGA prototypes are under development.

### 6.1.2. Reconfigurable Processor Extensions Generation

Participants: Christophe Wolinski, François Charot.

Most proposed techniques for automatic instruction set extension dissociate the pattern selection and instruction scheduling steps. The effects of the selection on the scheduling subsequently produced by the compiler must then be predicted. This approach is suitable for specialized instructions having a one-cycle duration because the prediction will be correct in this case. However, for multi-cycle instructions, a selection that does not take scheduling into account is likely to favor instructions which will be, *a posteriori*, less interesting than others, in particular when they can be executed in parallel with the processor core. The originality of our research work is to carry out the selection and scheduling of specialized instructions in a single optimization step. This complex problem is modeled and solved using constraint programming techniques. This approach allows the features of the extensible processor to be taken into account with a high degree of flexibility.
Different architecture models can be envisioned: an extensible processor tightly coupled to a hardware extension having a minimal number of internal registers used to store intermediate results, or a VLIW-oriented extension made up of several processing units working in parallel and controlled by a specialized instruction. These techniques have been implemented in the Gecos source-to-source framework.

Novel techniques addressing the interactions between code transformation (especially of loops) and instruction set extension are under study. The idea is to automatically transform the original loop nests of a program (using the polyhedral model) to select specialized and vector instructions. These new instructions may use local memories located in the hardware extension to store intermediate data produced at a given loop iteration. Such transformations lead to patterns whose effect is to significantly reduce the pressure on the memory of the processor. An experiment on matrix multiplication (extracted from PolyBench/C, the polyhedral benchmark suite) using an Xtensa extensible and configurable processor from Tensilica shows interesting speedups. We observe a speedup of 4.3 for the transformed code compared to the initial code for matrices of size 512x512, and a speedup of 8.75 (respectively 20.15) with an extension allowing SIMD operations on vectors of 4 (respectively 16) 32-bit words.

### 6.1.3. Runtime Mapping of Hardware Accelerators on the FlexTiles 3D Self-Adaptive Heterogeneous Manycore

Participants: Olivier Sentieys, Antoine Courtay, Christophe Huriaux.

FlexTiles is a 3D stacked chip with a manycore layer and a reconfigurable layer. This heterogeneity brings a high level of flexibility in adapting the architecture to the targeted application domain for performance and energy efficiency.
A virtualisation layer on top of a kernel hides the heterogeneity and the complexity of the manycore and fine-tunes the mapping of an application at runtime. The virtualisation layer provides self-adaptation capabilities by dynamically relocating application tasks to software on the manycore or to hardware on the reconfigurable area. This self-adaptation is used to optimize load balancing, power consumption, hot spots and resilience to faulty modules. The reconfigurable technology is based on a Virtual Bit-Stream (VBS) that allows dynamic relocation of accelerators, just as software based on virtual binary code allows task relocation.

We have proposed a novel approach to hardware task relocation in an FPGA-based reconfigurable fabric, allowing offline design, routing, and unfinalized placement of hardware IPs, and dynamic placement of the corresponding bit-streams at run-time. Our proposal relies on a custom dual-context FPGA configuration memory organized as a shift register and on a dedicated bit-stream insertion controller, leading to a breakthrough in terms of adaptive capabilities of the reconfigurable hardware. We show that our custom shift-register organization across the configuration memory can, under some weak constraints, greatly reduce the overhead implied by the 1-D to 2-D mapping of the shift register onto the logic fabric.

The use of partial dynamic reconfiguration in FPGA-based systems has grown in recent years as the spectrum of applications which use this feature has increased. For these systems, it is desirable to create a series of partial bitstreams which represent tasks that can be located in multiple regions of the FPGA substrate. While the transfer of homogeneous collections of lookup-table based logic blocks from region to region has been shown to be relatively straightforward, it is more difficult to transfer partial bitstreams which contain fixed function resources, such as block RAMs and DSP blocks.
To do so, we explore adding enhancements to the FPGA architecture which allow for the migration of partial bitstreams including fixed resources from region to region, even if these fixed function resources are not located in the same position in the region. Our approach does not require significant, time-consuming place-and-route during the migration process. We quantify the cost of inserting additional routing resources into the FPGA architecture to allow for easy migration of heterogeneous, fixed function resources. Our experiments show that this flexibility can be added for a relatively low overhead and performance penalty.

As mentioned above, the Virtual Bit-Stream (VBS) is a concept of an unfinalized, pre-routed bit-stream which could be loaded almost anywhere on a custom FPGA logic fabric. Unlike classical bit-streams, the VBS is not tied to a specific location on the circuit, hence its "virtual" qualifier. The goal is to generate a single, unfinalized VBS only once for every possible location of the logic fabric in the FPGA: the time-consuming packing, place and route steps are done offline and only local routing is done at runtime, in order to ensure fast decoding time as well as low memory overhead. A European patent application for the VBS concept is pending.

### 6.1.4. Power Models of Reconfigurable Architectures

Participants: Robin Bonamy, Daniel Chillet, Olivier Sentieys.

Including a reconfigurable area in complex systems-on-chip is considered an interesting solution to reduce the area of the global system and to support high performance. But the key challenge in the context of embedded systems is currently the power budget, and designers need early estimations of the power consumption of their system. Power estimation for reconfigurable systems is a difficult issue since several parameters need to be taken into account to define an accurate model.
A first parameter concerns the choice of tasks to execute and their allocation to the computing resources. Indeed, several hardware implementations of an algorithm can be obtained and exploited by the operating system for a flexible allocation of tasks to optimize energy consumption. These different hardware implementations can be obtained by varying the parallelism level, which has a direct impact on area and execution time and therefore on power and energy consumption. To highlight this point, we made several evaluations of the delay, area, power, and energy impacts of loop transformations using High Level Synthesis tools. Real power measurements have been made on an FPGA platform and for different task implementations to build a model of energy consumption versus execution time. Furthermore, we also considered the opportunity offered by dynamic reconfiguration, which makes it possible to partially reconfigure a specific part of the circuit while the rest of the system is running. This opportunity has two main effects on power consumption. First, thanks to the area sharing ability, the global size of the device can be reduced and the static (leakage) power consumption can thus be reduced. Secondly, it is possible to delete the configuration of a part of the device, which reduces the dynamic power consumption when a task is no longer used. We analyzed the power consumption during dynamic reconfiguration on a Virtex 5 board. Three models of the partial and dynamic reconfiguration power consumption with different complexity/accuracy tradeoffs are extracted. These models are used in design space exploration to include the impact of reconfiguration on the energy consumption of a complete system. We proposed a methodology for power/energy consumption modeling and estimation in the context of heterogeneous (multi)processor(s) and dynamically reconfigurable hardware systems. We developed an algorithm to explore all task mapping possibilities for a complete application (e.g.
for H264 video coding) with the aim of extracting one of the best solutions with respect to the designer's constraints. This algorithm is a step toward defining on-line power management strategies that decide which task instances must be executed to efficiently manage the available power using dynamic partial reconfiguration. All these results are presented in Robin Bonamy's thesis [17].

### 6.1.5. Real-time Spatio-Temporal Task Scheduling on 3D Architecture

Participants: Quang-Hai Khuat, Quang-Hoa Le, Emmanuel Casseau, Antoine Courtay, Daniel Chillet.

One of the main advantages offered by a three-dimensional system-on-chip (3D SoC) is the reduction of wire length between different blocks of a system, thus improving circuit performance and alleviating power overheads of on-chip wiring. To fully exploit this advantage, efficient management of how tasks are allocated over time to the different levels of the architecture is essential.

In the context of 3D SoC, we have developed several spatio-temporal scheduling algorithms for 3D MultiProcessor Reconfigurable System-on-Chip (3DMPRSoC) architectures composed of a multiprocessor layer and an embedded Field Programmable Gate Array (eFPGA) layer with dynamic reconfiguration. These two layers are interconnected vertically by through-silicon vias (TSVs), ensuring tight coupling between software tasks on processors and associated hardware accelerators on the eFPGA. Our algorithms cope with task dependencies and try to allocate communicating tasks close to each other in order to reduce direct communication cost, thus reducing global communication cost. In the 3DMPRSoC context, our algorithms favor direct communications including: i) point-to-point communication between hardware accelerators on the eFPGA, ii) communication between software tasks through the Network-on-Chip of the multiprocessor layer, and iii) communication between software task and accelerator through TSV.
When a direct communication between two tasks cannot occur, the data are stored in a shared memory placed on the multiprocessor layer. Our work in [68] takes all types of communication into consideration and proposes a task scheduling and placement strategy that reduces the global communication cost by up to 17% compared with our previous algorithm based on Pfair. In this work, the eFPGA layer of the 3DMPRSoC is assumed to contain homogeneous partial reconfiguration regions (PRRs), and the size of a hardware accelerator is limited by the size of a PRR. To overcome this limitation, we analyzed the Vertex-List Structure (VLS) method for relocating hardware accelerators of various sizes anywhere on the eFPGA, provided resources are available. Then, we proposed the VLS-BCF algorithm [49], based on VLS, which reduces the overall communication cost significantly (up to 24%) compared to classical methods.

### 6.1.6. Ultra-Low-Power Reconfigurable Controllers

**Participants:** Vivek D. Tovinakere, Olivier Sentieys, Steven Derrien.

A key concern in the design of controllers in wireless sensor network (WSN) nodes is the flexibility to execute different control tasks for managing the resources, sensing and communication tasks of the node. In this work, low-power flexible controllers for WSN nodes based on reconfigurable microtasks are presented. A microtask is a digital control unit made up of an FSM and a datapath. Scalable architectures for reconfigurable FSMs, along with variable-precision adders in the datapath, are proposed for flexible controllers. Power gating is considered as a low-power technique for reconfigurable microtasks, exploiting coarse-grain power gating opportunities in FSMs and adders. Gate-level models are applied to analyze energy savings in logic clusters due to power gating.
Power estimation results on typical benchmark microtasks show a $2\times$ to $5\times$ improvement in energy efficiency w.r.t. a microcontroller, at a cost of $5\times$ when compared with a microtask implemented as an ASIC, which has higher NRE costs [21].

## 6.2. Compilation and Synthesis for Reconfigurable Platform

### 6.2.1. Polyhedral-Based Loop Transformations for High-Level Synthesis

**Participants:** Steven Derrien, Antoine Morvan, Patrice Quinton, Tomofumi Yuki, Mythri Alle.

After almost two decades of research effort, there now exists a large choice of robust and mature C-to-hardware tools that are used as production tools by world-class chip vendors. Although these tools dramatically reduce design time, their ability to generate efficient accelerators is still limited, and they rely on the designer to expose parallelism and to use an appropriate data layout in the source program. We believe this can be overcome by tackling the problem directly at the source level, using source-to-source optimizing compilers. More precisely, our aim is to study how polyhedral-based program analysis and transformation can be used to address this problem.

In the context of the PhD of Antoine Morvan, we have studied how to improve the efficiency and applicability of nested loop pipelining (also known as nested software pipelining) in C-to-hardware tools. Loop pipelining is a key transformation in high-level synthesis tools, as it helps maximize both computational throughput and hardware utilization. We first studied how polyhedral-based loop transformations (such as coalescing) could be used to improve the efficiency of pipelining small trip-count inner loops [27], and implemented the transformation in the Gecos source-to-source toolbox. We also proposed a technique to widen the applicability of loop pipelining to kernels exposing complex dynamic memory access patterns, for which compile-time dependency analysis techniques cannot be used efficiently.
Our approach borrows from the notion of runtime memory disambiguation used in superscalar processors to add a data dependency hazard detection mechanism to the synthesized circuits. The approach has shown promising results and led to a paper presented at the 50th ACM/IEEE Design Automation Conference [37].

In addition to our work on nested loop pipelining, we also investigated how to extend existing polyhedral code generation techniques to enable the synthesis of fast and area-efficient control logic. Our approach was implemented in the Gecos framework and presented at the Field Programmable Technology international conference in late 2013 [63].

### 6.2.2. Compiling for Embedded Reconfigurable Multi-Core Architectures

**Participants:** Steven Derrien, Olivier Sentieys, Maxime Naullet, Antoine Morvan, Tomofumi Yuki, Ali Hassan El-Moussawi.

Current and future wireless communication and video standards have huge processing power requirements, which cannot be satisfied with current embedded single-processor platforms. Most platforms therefore now integrate several processing cores within a single chip, leading to what is known as embedded multi-core platforms. This trend will continue, and embedded system designers will soon have to implement their systems on platforms comprising tens if not hundreds of high-performance processing cores. Examples of such architectures are the Xentium processor from Recore and the Kahrisma processor, a radically new concept of morphable processor from Karlsruhe Institute of Technology (KIT). This evolution will pose significant design challenges, as parallel programming is notoriously difficult, even for domain experts.
In the context of the FP7 European Project Alma (Architecture-oriented parallelization for high performance embedded Multicore systems using scilAb), we are studying how to help designers program these platforms by allowing them to start from a specification in Matlab and/or Scilab, which are widely used for prototyping image/video and wireless communication applications. Our research work in this field revolves around two topics. The first one aims at exploring how floating-point to fixed-point conversion can be performed jointly with the SIMD instruction selection stage, in order to explore the performance/accuracy trade-off in the final software implementation. The second one aims at exploring how program transformation techniques (leveraging the polyhedral model and/or based on the domain-specific semantics of Scilab built-in functions) can be used to enable an efficient coarse-grain parallelization of the target application on such multi-core machines [30].

### 6.2.3. Numerical Accuracy Analysis and Optimization

**Participants:** Olivier Sentieys, Steven Derrien, Romuald Rocher, Pascal Scalart, Tomofumi Yuki, Aymen Chakhari, Gaël Deest.

Most analytical methods for numerical accuracy evaluation use perturbation theory to provide the expression of the quantization noise at the output of a system. Existing analytical methods assume that the noise sources are uncorrelated. This assumption is no longer valid when a single datum is quantized several times. In [35], an analytical model of the correlation between quantization noises is provided. The different quantization modes are supported and the number of eliminated bits is taken into account. The expression of the power of the output quantization noise is provided when the correlation between the noise sources is considered. The proposed approach significantly improves the estimation of the output quantization noise power compared to the classical approach, with a slight increase in computation time.
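
The effect of quantizing the same datum several times can be illustrated with a small brute-force experiment. The sketch below is not the analytical model of [35]: it simply enumerates every value of the eliminated bits, assumes truncation as the quantization mode, and compares the noise variance at the output of a hypothetical adder `y = Q1(x) + Q2(x)` with and without the correlation term. All function names are illustrative.

```python
def truncation_errors(k, n_bits):
    """Errors Q(x) - x, in input LSBs, for every n_bits-wide integer x
    when the k least significant bits are eliminated by truncation."""
    return [((x >> k) << k) - x for x in range(2 ** n_bits)]


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def summed_noise_variance(k1, k2):
    """Noise variance at the output of y = Q1(x) + Q2(x), where the same
    datum x is truncated by k1 bits on one path and k2 bits on the other."""
    n_bits = max(k1, k2)  # enumerating the eliminated bits is sufficient
    e1 = truncation_errors(k1, n_bits)
    e2 = truncation_errors(k2, n_bits)
    var1 = covariance(e1, e1)
    var2 = covariance(e2, e2)
    cov12 = covariance(e1, e2)
    return {
        "uncorrelated": var1 + var2,            # classical assumption
        "correlated": var1 + var2 + 2 * cov12,  # correlation term included
        "cov": cov12,
    }


if __name__ == "__main__":
    print(summed_noise_variance(2, 4))
```

For 2 and 4 eliminated bits, the uncorrelated estimate is 22.5 squared input LSBs whereas the exhaustive value is 25.0, so ignoring the correlation underestimates the output noise in this toy case. The per-source variances match the usual discrete model, $(2^{2k}-1)/12$ squared input LSBs for $k$ eliminated bits.
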

Trading off accuracy against system cost is commonly addressed as the word-length optimization (WLO) problem. Owing to its NP-hard nature, this problem is usually solved using combinatorial heuristics. In [56], a novel approach is taken by relaxing the integer constraints on the optimization variables to obtain an alternative noise-budgeting problem. This approach uses the quantization noise power introduced into the system by the fixed-point word-lengths as optimization variables, instead of the actual integer-valued fixed-point word-lengths. The noise-budgeting problem is proved to be convex in the rounding-mode quantization case and can therefore be solved using convex optimization solvers. An algorithm with linear time complexity is provided to derive the actual fixed-point word-lengths from the noise budgets obtained by solving the convex noise-budgeting problem.

An analytical approach is studied to determine the accuracy of systems including unsmooth operators. An unsmooth operator represents a function that is not differentiable over its whole domain (for example, the sign operator). The classical model is no longer valid, since these operators introduce errors that do not respect the Widrow assumption (their values are often higher than the signal power). An approach based on the distributions of the signal and the noise was therefore proposed. We focused on recursive structures where an error influences future decisions (such as the Decision Feedback Equalizer). In that case, numerical analysis methods (e.g. the Newton-Raphson algorithm) can be used. Moreover, an upper bound on the error probability can be determined analytically [43]. We also studied the case of the Turbo Coder and Decoder to determine the data word-lengths ensuring sufficient system quality.

One of the limitations of analytical accuracy techniques is that they are based on a Signal Flow Graph (SFG) representation of the system to be analyzed.
This SFG model is currently built from a source program by flattening its whole control flow (including full loop unrolling), which raises significant scalability issues for accuracy analysis. In 2013 we started studying how numerical analysis techniques could be bridged with more compact polyhedral program representations to provide a more general and scalable framework.

### 6.2.4. Design Tools for Reconfigurable Video Coding

**Participants:** Emmanuel Casseau, Hervé Yviquel.

In the field of multimedia coding, standardization recommendations are always evolving. To reduce design time by taking advantage of available SW and HW designs, the Reconfigurable Video Coding (RVC) standard allows new codec algorithms to be defined. The application is represented by a network of interconnected components (so-called actors) defined in a modular library, and the behaviour of each actor is described in the specific RVC-CAL language. Dataflow programs, such as RVC applications, express explicit parallelism within an application. However, general-purpose processors cannot meet both the high performance and the low power consumption requirements that embedded systems face. We have investigated the mapping of RVC applications onto a dedicated multiprocessor platform. Our goal is to propose an automated co-design flow based on the RVC framework. The designer provides the application description in the RVC-CAL language, after which the co-design flow automatically generates a network of processors that can be synthesized on FPGA platforms. The processors are based on a low-complexity and configurable Transport-Triggered Architecture (TTA) processor, a Very Long Instruction Word (VLIW)-style processor. The architecture model of the platform is composed of processors with their local memories, an interconnection network and shared memories. Both shared and local memories are used to limit the traditional memory bottleneck. Processors are connected together through the shared memories.
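
As a rough illustration of how an actor-to-processor mapping interacts with this memory organisation, the sketch below classifies each actor-to-actor connection as local or shared. It assumes, beyond what is stated above, that a FIFO between two actors mapped onto the same processor can stay in that processor's local memory, while a FIFO crossing processors goes through a shared memory. The actor names and the mapping are hypothetical.

```python
def classify_fifos(connections, mapping):
    """Split actor-to-actor FIFOs into local ones (both actors on the same
    processor) and shared ones (actors on different processors)."""
    local, shared = [], []
    for src, dst in connections:
        if mapping[src] == mapping[dst]:
            local.append((src, dst))
        else:
            shared.append((src, dst))
    return local, shared


if __name__ == "__main__":
    # Hypothetical four-actor decoder pipeline.
    connections = [
        ("parser", "idct"),
        ("idct", "motion_comp"),
        ("motion_comp", "display"),
    ]
    # Hypothetical mapping of actors onto three processors.
    mapping = {"parser": 0, "idct": 1, "motion_comp": 1, "display": 2}

    local, shared = classify_fifos(connections, mapping)
    print("local FIFOs: ", local)
    print("shared FIFOs:", shared)
```

With this mapping, only the connection between `idct` and `motion_comp` stays local; the two others cross a processor boundary, which is the kind of information a mapping specification implicitly determines.
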

The design flow is implemented around two open-source toolsets: Orcc (Open RVC-CAL Compiler: http://orcc.sourceforge.net) and TCE (TTA-based Co-design Environment: http://tce.cs.tut.fi). The inputs of the design flow are the RVC application, the platform configuration (i.e. the configuration of the TTA processors and their number), and the mapping specification (i.e. the mapping of the actors onto the processors). Orcc generates a high-level description of the processors, an intermediate representation of the software code associated with each actor, and the processor interconnection requirements. TCE then uses this information to generate a complete multi-processor platform design: the VHDL descriptions of the processors, using a pre-existing database of hardware components, and the executable binary code that will execute the actors on the processors. This work is done in collaboration with Mickael Raulet from IETR INSA Rennes, and has been implemented in the Orcc open-source compiler, and with Jarmo Takala's team from Tampere University of Technology (Finland), which is involved in the TCE toolset.

## 6.3. Interaction between Algorithms and Architectures

### 6.3.1. Design Methodologies for Software Defined Radios

**Participants:** Matthieu Gautier, Olivier Sentieys, Emmanuel Casseau, Arnaud Carer, Ganda-Stéphane Ouedraogo, Mai-Thanh Tran, Vaibhav Bhatnagar.

Software Defined Radio (SDR) is a flexible signal processing architecture with reconfiguration capabilities that can adapt itself to various air interfaces. It was first introduced by Joseph Mitola as an underlying structure for Cognitive Radio (CR). FPGA (Field Programmable Gate Array) technology is expected to play a key role in the development of SDR platforms. FPGA-based SDR is quite an old paradigm, and we are confronting this challenge while leveraging the nascent High-Level Synthesis tools and languages.
Our goal is to propose methods and tools for the rapid implementation of new waveforms within this stringent flexibility paradigm. We proposed a novel design flow for FPGA-based SDR applications [38] [70]. This flow relies upon HLS principles, and its entry point is a Domain-Specific Language (DSL) which partly handles the complexity of programming an FPGA and integrates SDR features.

### 6.3.2. Adaptive Precision under Performance Constraints in OFDM Wireless Receivers

**Participants:** Olivier Sentieys, Matthieu Gautier, Fernando Cladera [Master's Student].

To cope with rapid variations of channel parameters, wireless receivers are designed with a significant performance margin so as to reach a given Bit Error Rate (BER) even under worst-case channel conditions. Significant energy savings can be obtained by varying the processing bit-width at run time, based on an estimation of the channel conditions, without compromising the BER constraints. To validate the energy savings, the energy consumption of basic operators has been obtained from real measurements for different bit-widths on an FPGA and on an ARM processor using soft SIMD. Results show that up to 66% of the dynamic energy consumption can be saved using this adaptive technique.

### 6.3.3. MIMO Systems and Cooperative Strategies for Low-Energy Wireless Networks

**Participants:** Olivier Berder, Olivier Sentieys, Pascal Scalart, Matthieu Gautier, Le-Quang-Vinh Tran, Duc-Long Nguyen [Master's Student], Ruifeng Zhang, Viet-Hoa Nguyen.

Over the past few years, the CAIRN team has developed significant expertise in multi-antenna systems, especially in linear precoding. In order to obtain an efficient, simple and general form of precoders, we considered an SNR-like matrix to approximate the minimum distance. The precoding matrix is first parameterized as the product of a diagonal power allocation matrix and an input-shaping matrix, and we demonstrated that the minimum diagonal entry of the latter is obtained when the input-shaping matrix is a DFT matrix.
The major advantage of this design is that the solution is available for all rectangular QAM modulations and for any number of datastreams [28]. In addition, the sphere decoder was applied at the receiver side instead of maximum likelihood decoding, and the performance/complexity trade-off was investigated. Some adjustments of the traditional sphere decoding algorithm were required to adapt it to precoded MIMO systems [55].

Another way to exploit MIMO diversity, especially in WSNs where only one antenna can be supported by limited-size devices, is to use space-time codes in a distributed manner. In this context, a new protocol, called the fully distributed space-time coded (FDSTC) protocol, which includes information exchange between relays, was proposed and compared with the conventional distributed space-time coded (DSTC) protocol using non-regenerative relays (NR-relays) and regenerative relays (R-relays). At the same spectral efficiency, FDSTC has better performance in terms of outage probability in high SNR regions. In terms of energy efficiency, the FDSTC protocol is shown to outperform DSTC for long-range transmissions [32].

As very few dedicated MAC protocols exist, we investigated a novel low-latency MAC protocol (ARQ-CRI) for low-power cooperative wireless sensor networks (WSNs), which preserves (in high traffic mode) or even increases (in low traffic mode) energy efficiency [54]. An energy-efficient opportunistic MAC protocol with reservation and relay candidate coordination mechanisms was also proposed, and the multi-relay transmission probability was analyzed. Simulation and experimental results on a real wireless sensor network platform in different channels demonstrated that the proposed scheme greatly reduces the multi-relay transmission probability and achieves about 84% improvement in energy efficiency compared with traditional opportunistic MAC schemes [66].

### 6.3.4. Energy Harvesting and Adaptive Wireless Sensor Networks

**Participants:** Olivier Berder, Olivier Sentieys, Arnaud Carer, Mahtab Alam, Ruifeng Zhang, Trong-Nhan Le.

As tiny sensor nodes are equipped with a limited battery, optimizing the power consumption of these devices is vital. In typical WSN platforms, the radio transceiver consumes a major proportion of the energy. The major concerns are therefore to decrease both the transmit power and the radio activity. We designed an adaptive transmit power optimization technique that is applied under varying channel conditions to reduce the energy per successfully transmitted bit. Each node locally adapts its output power according to the signal-to-noise ratio (SNR) variations (for all the neighbor nodes). We found that dynamically adapting the transmit power can reduce the energy consumption by a factor of two on average [36].

To further extend the system lifetime of WSNs, energy harvesting techniques have been considered as potential solutions for long-term operation. Instead of minimizing the consumed energy, as in the case of battery-powered systems, the harvesting node is adapted to Energy Neutral Operation (ENO) to achieve a theoretically infinite lifetime. Several types of energy sources can be used, such as light, motion or heat [51]. We even investigated the possibility for a single sediment-microbial fuel cell (MFC) to power a wireless sensor network [31]. Through experiments conducted on the PowWow platform, it was shown that the energy harvesting device adapts to the intermittent power supplied by the MFC, and that the radio transmitter is able to switch from a continuous to a degraded mode. Given the harvesting capability, we then tried to design power managers (PM) able to optimize the quality of service of the WSN while maintaining ENO.
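
To make the ENO idea concrete, the following sketch adapts a node's duty cycle so that the energy consumed over a time window tracks the energy harvested over the same window. It is a deliberately simple fixed-step rule written for illustration, not the power manager described in the cited publications; the energy figures, step size and function names are all assumptions.

```python
E_ACTIVE_J = 1.0   # hypothetical energy per window at 100 % duty cycle
E_SLEEP_J = 0.01   # hypothetical energy per window when always asleep


def consumed_energy(duty):
    """Energy consumed over one window for a given duty cycle in [0, 1]."""
    return duty * E_ACTIVE_J + (1.0 - duty) * E_SLEEP_J


def next_duty_cycle(duty, harvested_j, consumed_j,
                    step=0.01, d_min=0.01, d_max=1.0):
    """Raise the duty cycle when the last window had an energy surplus,
    lower it when it had a deficit, and clamp it to [d_min, d_max]."""
    if harvested_j > consumed_j:
        duty += step
    elif harvested_j < consumed_j:
        duty -= step
    return min(d_max, max(d_min, duty))


def simulate(harvest_per_window_j, windows=200, duty=0.5):
    """Run the rule under a constant harvest and return the final duty cycle."""
    for _ in range(windows):
        duty = next_duty_cycle(duty, harvest_per_window_j,
                               consumed_energy(duty))
    return duty


if __name__ == "__main__":
    harvest = 0.2  # hypothetical harvested energy per window, in joules
    eno_duty = (harvest - E_SLEEP_J) / (E_ACTIVE_J - E_SLEEP_J)
    print(f"ENO duty cycle: {eno_duty:.4f}, reached: {simulate(harvest):.4f}")
```

Under a constant harvest, the rule settles within one step of the duty cycle at which consumption equals harvest. A practical power manager would additionally account for the storage element and for variations of the harvested power.
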
Our PM adapts the duty cycle of the node according to estimates of the harvested energy and of the consumed energy, provided by a simple energy monitor for a supercapacitor-based WSN, in order to achieve ENO [52]. When possible, as is sometimes the case for solar or wind energy, it is also of prime interest to benefit from an accurate energy predictor to estimate the energy that can be harvested in the near future. We therefore proposed a low-complexity energy predictor using an adaptive filter [53].

Finally, with colleagues from University College Cork, we recently investigated the possibility of combining energy harvesting platforms with low-power wake-up radios. A nano-watt wake-up radio receiver (WUR) was used cooperatively with the main transceiver in order to reduce the energy wasted by idle listening in asynchronous MAC protocols, while still maintaining the same reactivity [50].

### 6.3.5. Impact of RF Front-End Nonlinearity on WSN Communications.

**Participants:** Amine Didioui, Olivier Sentieys, Carolynn Bernier [CEA Leti].

In the context of a collaboration with CEA Leti, we studied the impact of RF front-end nonlinearity on the performance of wireless sensor networks (WSN). More specifically, we investigated the problem of interference caused by intermodulation between in-band interferers. We analyzed this problem using an enhanced model of the signal-to-interference-and-noise ratio (SINR) that includes an interference term due to intermodulation. Using a WSN simulator and the selectivity and third-order input intercept point (IIP3) specifications of a radio transceiver, we have shown that the new SINR model provides helpful information for the analysis of intermodulation problems caused by in-band signals in IEEE 802.15.4 WSNs.

In [45], we presented a reconfigurable receiver model whose purpose is to enable the study of reconfiguration strategies for future energy-aware and adaptive transceivers. This model is based on the figures of merit of measured circuits.
To account for real-life RF interference mechanisms, a link quality estimator is also provided. We show that adapting the receiver performance to the channel conditions can lead to considerable power savings. The proposed models can easily be implemented in a wireless network simulator in order to validate the value of a reconfigurable architecture in real-world deployment scenarios.

### 6.3.6. HarvWSNet: A Co-Simulation Framework for Energy Harvesting Wireless Sensor Networks.

**Participants:** Amine Didioui, Olivier Sentieys, Carolynn Bernier [CEA Leti].

Recent advances in energy harvesting (EH) technologies now allow wireless sensor networks (WSNs) to extend their lifetime by scavenging the energy available in their environment. While simulation is the most widely used method to design and evaluate network protocols for WSNs, existing network simulators are not adapted to the simulation of EH-WSNs, and most of them provide only a simple linear battery model. To overcome these issues, we have proposed HarvWSNet, a co-simulation framework based on WSNet and Matlab that provides adequate tools for evaluating EH-WSN lifetime [44]. The framework allows for the simulation of multi-node network scenarios while including a detailed description of each node's energy harvesting and management subsystem and its time-varying environmental parameters. A case study based on a temperature monitoring application has demonstrated HarvWSNet's ability to predict network lifetime while minimally penalizing simulation time [40].

### 6.3.7. Synchronisation Algorithms and Parallel Architecture for Wireless and High-Rate Optical OFDM Systems

**Participants:** Pramod Udupa, Olivier Sentieys, Arnaud Carer, Pascal Scalart.

Multi-band Coherent Optical OFDM (MB CO-OFDM) is widely predicted to be one of the technologies which will empower 100 Gigabit Ethernet (100GbE) networks.
CO-OFDM uses coherent technology and advanced digital signal processing (DSP) to achieve a net data rate of 10 Gbps in a single band. This strict throughput requirement constrains the kind of signal processing algorithms and architectures that can be used to build the system. In [72], a scalable parallel architecture using radix-$2^2$ for the IFFT was proposed. The second proposal consists of a scalable parallel timing synchronization algorithm which can support very high input rates at the receiver. The MOPS count, as well as area versus throughput for the synchronization algorithm, are provided for the OFDM transceiver to show the improvements due to the proposed architecture. Architecture exploration was performed using a leading-edge high-level synthesis (HLS) tool.

A novel low-complexity parallel algorithm and its associated architecture were proposed for initial synchronization in orthogonal frequency division multiplexing (OFDM) systems. The method is hierarchical and uses auto-correlation for the first step and cross-correlation for the second step [60]. The main advantage of the proposed approach is that it reduces the computational complexity by a factor of five (80%), while achieving a mean square error (MSE) similar to that of cross-correlation based methods. The method uses block-level parallelism for the auto-correlation step, which speeds up the computation significantly. After fixed-point analysis, a parallel architecture is proposed to accelerate both the coarse and fine synchronization steps. This parallel architecture is scalable and provides a speed-up proportional to the number of parallel blocks [59].

## 7. Partnerships and Cooperations

## 7.1. National Initiatives

The CAIRN team currently collaborates with the following laboratories: CEA List, CEA Leti, LEAT Nice, Lab-Sticc (Lorient, Brest), LIRMM (Montpellier, Perpignan), LIP6 Paris, IETR Rennes, Ireena Nantes; and with the following Inria project-teams: Aric, Compsys, Socrate.

The team participates in the activities of the following CNRS research groups (GdR, for *Groupe de Recherche* in French):

- GdR SOC-SIP (*System On Chip & System In Package*), working groups on reconfigurable architectures, embedded software for SoC, and low power issues. E. Casseau is in charge of the architecture topic of the reconfigurable platform working group.
- GdR ISIS (*Information Signal ImageS*), working group on Algorithms Architectures Adequation.
- GdR ASR (*Architectures Systèmes et Réseaux*).
- GdR IM (*Informatique Mathématiques*), C2 working group on Codes and Cryptography and ARITH working group on Computer Arithmetic.

### 7.1.1. ANR Blanc - PAVOIS (2012–2016)

**Participants:** Arnaud Tisserand, Emmanuel Casseau, Romuald Rocher, Philippe Quémerais, Jérémie Métairie, Nicolas Veyrat-Charvillon, Nicolas Estibals, Thomas Chabrier, Karim Bigou.

PAVOIS (in French: *Protections Arithmétiques Vis à vis des attaques physiques pour la cryptOgraphIe basée sur les courbeS elliptiques*) is a project on Arithmetic Protections Against Physical Attacks for Elliptic Curve based Cryptography. It involves IRISA-CAIRN (Lannion) and LIRMM (Perpignan and Montpellier). This project will provide novel implementations of curve-based cryptographic algorithms on custom hardware platforms. A specific focus will be placed on trade-offs between efficiency and robustness against physical attacks. One of our goals is to theoretically study and practically measure the impact of various protection schemes on performance (speed, silicon cost and power consumption). Theoretical aspects will include an investigation of how special number representations can be used to speed up cryptographic algorithms and protect cryptographic devices from physical attacks. On the practical side, we will design innovative cryptographic hardware architectures of a specific processor, based on the theoretical advancements described above, to implement curve-based protocols.
We will target efficient and secure implementations for both FPGA and ASIC circuits. For more details see http://pavois.irisa.fr.

### 7.1.2. ANR INFRA 2011 - FAON (2012-2015)

**Participants:** Raphaël Bardoux, Arnaud Carer, Matthieu Gautier, Pascal Scalart.

The objectives of the FAON (Frequency based Access Optical Networks) project are to demonstrate the technology and feasibility of a new type of Passive Optical Network (PON) for broadband access which uses a frequency-based shared access technique known as Frequency Division Multiplexing (FDM). These goals are fully in line with the expected capacity increase in PON, which is currently forecast to go from 100 Mbps per user to 1 Gbps. For more details, see http://www.anr-faon.fr/.

FAON involves Orange Labs, CEA-LETI, University of South Brittany (Lab-STICC laboratory) and University of Rennes 1 (Foton laboratory and CAIRN team). CAIRN aims at developing a high-rate architecture at the receiver side. Specific receiver algorithms (synchronization and equalization) and their FPGA implementation are the key issues that will be addressed.

### 7.1.3. Equipex FIT - Future Internet (of Things)

**Participants:** Vaibhav Bhatnagar, Arnaud Carer, Matthieu Gautier, Ganda-Stéphane Ouedraogo, Olivier Sentieys.

FIT is one of 52 winning projects from the first wave of the French Ministry of Higher Education and Research's "Équipements d'Excellence" (Equipex) research grant programme. FIT involves UPMC, Inria, LSIIT and the Institut Mines-Telecom and runs over a nine-year period. FIT offers a federation of several independent experimental testbeds to provide a larger-scale, more diverse and higher-performance platform for conducting advanced experiments. For more details, see http://fit-equipex.fr/.
Inria (CAIRN and Socrate teams) develops the cognitive radio testbed that will provide a full experimental environment for evaluating the coexistence of and the cooperation between heterogeneous multistandard nodes. To this aim, a fully open architecture based on software defined radio nodes is being developed. CAIRN aims at proposing an FPGA-based software defined radio with high-level specifications. Cognitive radio testbed development is supported by ADT funding from Inria.

### 7.1.4. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)

**Participants:** Arnaud Tisserand, Thomas Chabrier, Philippe Quémerais.

ARDyT (in French: *Architecture Reconfigurable Dynamiquement Tolérante aux fautes*) is a project on a Reliable and Reconfigurable Dynamic Architecture. It involves IRISA-CAIRN (Lannion), Lab-STICC (Lorient), LIEN (Nancy) and ATMEL. The purpose of the ARDyT project is to provide a complete environment for the design of a fault-tolerant and self-adaptable platform. To this end, a platform architecture, its programming environment, and management methodologies for diagnosis, testability and reliability have to be defined and implemented. The considered techniques avoid the use of hardened components, so as to design low-cost solutions for terrestrial and aeronautics applications. The ARDyT platform will provide a European alternative to import ITAR constraints for fault-tolerant reconfigurable architectures. For more details see http://ardyt.irisa.fr.

### 7.1.5. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)

**Participants:** Emmanuel Casseau, Steven Derrien, Antoine Courtay, Mythri Alle.

COMPA (model oriented design of embedded and adaptive multiprocessor) is a project which involves CAIRN, IETR (Institut d'Electronique et de Télécommunications de Rennes), Lab-STICC (University of Bretagne Sud), CAPS Entreprise, and Modae Technologies.
The goal of the project is to design adaptive multiprocessor embedded systems to execute dataflow programs. The use case is the Reconfigurable Video Coding (RVC) standard. More specifically, we focus on the portable and platform-independent RVC-CAL language to describe the applications. We use transformations to refine the application model, increase its parallelism and translate it into software and hardware components. Task mapping, instruction and processor allocation, and specific scheduling are also investigated for runtime execution and reconfiguration.

### 7.1.6. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)

**Participants:** Olivier Sentieys, Daniel Menard [external collaborator], Romuald Rocher, Nicolas Simon.

DEFIS (Design of fixed-point embedded systems) is a project which involves CAIRN, LIP6 (University of Paris VI), LIRMM (University of Perpignan), CEA LIST, Thales, and Inpixal. The main objectives of the project are to propose new approaches to improve the efficiency of the floating-point to fixed-point conversion process and to provide a complete design flow for fixed-point refinement of complex applications. This infrastructure will reduce the time-to-market by automating the fixed-point conversion and by mastering the trade-off between application quality and implementation cost. Moreover, this flow will guarantee and validate the numerical behavior of the resulting implementation. The proposed infrastructure will be validated on two real applications provided by the industrial partners. For more details see http://defis.lip6.fr.

### 7.1.7. ANR ARPEGE - GRECO (2010-2013)

**Participants:** Olivier Sentieys, Olivier Berder, Arnaud Carer, Trong-Nhan Le.

Sensor network technologies and the increasing efficiency of photovoltaic cells show that it is possible to obtain communicating object solutions with low enough power consumption to foresee the development of autonomous objects.
Greco (GREen wireless Communicating Objects) is a project on the design of autonomous communicating object platforms (i.e. self-powered sensor networks). The aim is to optimize the power consumption based on (i) a model of the performance and power of the required blocks (RF front-end, converters, modem, peripherals, digital architecture, OS, software, power generator, battery, etc.), (ii) heterogeneous simulation models and tools, and (iii) the use of a real-time global "Power Manager". The final validation will be performed on various case studies: a monitoring system and an audio communication between firemen. A HW/SW prototype (based on CAIRN's PowWow platform with energy harvesting) and a simulation combining a precise model (virtual platform) of an object with a network-simulator-like environment will be developed as demonstrators. Greco involves Thales, Irisa-CAIRN, CEA List, CEA Leti, Im2nP, LEAT, Insight-SiP. For more details see http://greco.irisa.fr.

### 7.1.8. Images and Networks competitiveness cluster - 100GFlex project (2010-2013)

Participants: Olivier Sentieys, Arnaud Carer, Remi Pallas, Pascal Scalart.

Speed and flexibility are quickly increasing in metropolitan networks. In this context, 100GFlex studies the relevance of a new transmission scheme: multiband optical OFDM at very high rates (up to 100 Gbit/s). In this project we will study efficient algorithms (e.g. synchronization) and high-speed architectures for the digital signal processing of the optical transceivers. Due to the high rate of the analog signals (sampling at more than 10 Gsample/s), synchronization and processing are a real challenge. 100GFlex involves Mitsubishi-Electric R&D Center Europe, Institut Télécom, Ekinops, France Télécom, Yenista Optics, Foton and CAIRN.

### 7.2. European Initiatives

### 7.2.1. FP7 FLEXTILES

Participants: Olivier Sentieys, Emmanuel Casseau, Antoine Courtay, Daniel Chillet, Philippe Quémerais, Christophe Huriaux, Quang-Hoa Le.
- Program: FP7-ICT-2011-7
- Project acronym: FlexTiles
- Duration: Oct. 2011 - Sep. 2014
- Coordinator: Thales
- Other partners: Thales (FR), UR1 (FR), KIT (GE), TU/e (NL), CSEM (SW), CEA LETI (FR), Sundance (UK)
- Project title: Self Adaptive Heterogeneous Manycore Based on Flexible Tiles

A major challenge in computing is to leverage multi-core technology to develop energy-efficient high performance systems. This is critical for embedded systems with a very limited energy budget as well as for supercomputers in terms of sustainability. Moreover, the efficient programming of multi-core architectures, as we move towards manycores with more than a thousand cores predicted by 2020, remains an unresolved issue. The FlexTiles project will define and develop an energy-efficient yet programmable heterogeneous manycore platform with self-adaptive capabilities. The manycore will be associated with an innovative virtualisation layer and a dedicated tool-flow to improve programming efficiency, reduce the impact on time to market and reduce the development cost by 20 to 50%. FlexTiles will make manycore technology more accessible to industry (from SMEs to large companies) thanks to its programming efficiency and its ability to adapt to the targeted domain using embedded reconfigurable technologies.

### 7.2.2. FP7 ALMA

Participants: Steven Derrien, Romuald Rocher, Olivier Sentieys, Maxime Naullet, Ali Hassan El-Moussawi.

- Program: FP7-ICT-2011-7
- Project acronym: Alma
- Project title: Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb
- Duration: Sep. 2011 - Aug. 2014
- Coordinator: KIT
- Other partners: KIT (GE), UR1 (FR), Recore Systems (NL), Univ. of Peloponnese (GR), TEI-MES (GR), Intracom SA (GR), Fraunhofer (GE)

The mapping process of high performance embedded applications to today's multiprocessor system-on-chip devices suffers from a complex toolchain and programming process.
The problem lies in expressing parallelism in a purely imperative programming language, most commonly C. This traditional approach limits the mapping, partitioning and generation of optimized parallel code, and consequently the achievable performance and power consumption of applications from different domains. The Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb (ALMA) project aims to overcome these hurdles through the introduction and exploitation of a Scilab-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from high-level abstraction descriptions. This holistic toolchain hides the complexity of both the application and the architecture, which leads to better acceptance, reduced development cost and shorter time-to-market. Driven by the technology restrictions in chip design, the end of Moore's law and an unavoidably increasing demand for computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies. ALMA helps to strengthen the position of Europe in the world market of multiprocessor-targeted software toolchains. The challenging research will be carried out by the ALMA consortium, which brings together industry and academia. Industrial partners such as Recore and Intracom will contribute their expertise in reconfigurable hardware technology for multicore systems-on-chip, software development tools and real-world applications. The academic partners will contribute their expertise in reconfigurable computing and compilation tools development.

### 7.2.3. Collaborations with Major European Organizations

- Imec (Belgium): scenario-based fixed-point data format refinement to enable energy-scalable Software Defined Radios (SDR)
- Lund University (Sweden): application of constraint programming in the reconfigurable data-path synthesis flow
- Code and Cryptography group of University College Cork (Ireland): arithmetic operators for cryptography, side channel attacks for security evaluation, and WSN for health monitoring
- Ecole Polytechnique Fédérale de Lausanne - EPFL (Switzerland): optimization of systems using fixed-point arithmetic
- Technical University of Madrid - UPM (Spain): optimization of systems using fixed-point arithmetic
- Technical University of Tampere, University of Oulu (Finland): Reconfigurable Video Coding

### 7.3. International Initiatives

### 7.3.1. Inria International Partners

### 7.3.1.1. Declared Inria International Partners

- Computer Science Department, Colorado State University in Fort Collins (USA): loop parallelization, development of high-level synthesis tools, Inria Associate Team (2010-2012)
- Electrical and Computer Engineering Department, University of Massachusetts at Amherst (USA): CAD tools for arithmetic datapath synthesis and optimization

### 7.3.1.2. Informal International Partners

- LRTS laboratory, Laval University in Québec (Canada): architectures for MIMO systems, Wireless Sensor Networks, Inria Associate Team (2006-2008)
- LSSI laboratory, Québec University in Trois-Rivières (Canada): design of architectures for digital filters and mobile communications

### 7.3.2. CNRS PICS - SPiNaCH (2012 - 2014)

- Title: Secure and low-Power sensor Networks Circuits for Healthcare embedded applications
- Principal investigators: Arnaud Tisserand, Olivier Berder, Olivier Sentieys
- International Partner (Institution - Laboratory - Researcher): Code&Crypto group in University College Cork (Ireland)
- Duration: 2012 - 2014

Biomedical sensor networks may be used more and more in the future.
For instance, they allow a patient's health-care parameters to be monitored remotely at home. In this project, we plan to address two important challenges in the design of biomedical sensor networks: i) the design of low-power sensor devices for embedded autonomous systems (health monitoring, pacemakers, etc.) with long battery life; ii) confidentiality and security aspects, especially public-key cryptography processors that are robust against side-channel attacks (measurement of the computation time, the power consumption or the electromagnetic radiation of the circuit) and operate with limited power and energy resources.

### 7.4. International Research Visitors

### 7.4.1. Visits of International Scientists

- Prof. Russel Tessier (University of Massachusetts, UMass Reconfigurable Computing Group, USA) for one month in June-July (Visiting professor position from University Rennes 1).
- Prof. Liam Marnane (University College Cork, Ireland) for one month in June (Visiting professor position from University Rennes 1).
- Prof. Emanuel Popovici (University College Cork, Ireland) for two weeks in July (Visiting professor position from University Rennes 1).
- Prof. Manav Bhatnagar (Department of Electrical Engineering, Indian Institute of Technology, Delhi, India) for two weeks in December (Visiting professor position from University Rennes 1).
- Dr. Michele Magno, post-doc (University College Cork, Ireland) for one week in July (funded by the CNRS PICS SPiNaCH project).

### 7.4.2. Internships

Participant: Simara Pérez Zurita. Subject: Optimizing Computational Precision in High-level Synthesis of Signal Processing Systems: Theory and Implementation using TDS and GECOS. Date: from Oct 2012 until Aug 2013. Institution: Technical University of Kaiserslautern (Kaiserslautern, Germany).

Participant: Rengarajan Ragavan.
Subject: Reconfigurable Microtasks for Ultra-Low Power Wireless Sensor Network Nodes. Date: from Jan 2013 until Jul 2013. Institution: Linkoping University (Linkoping, Sweden).

Participant: Amith Vikram Pai. Subject: Design and Validation of a Low-Power Embedded FPGA. Date: from Jan 2013 until Jun 2013. Institution: Birla Institute of Technology and Science, Pilani (India).

## 8. Dissemination

### 8.1. Scientific Animation

- O. Berder was General Chair of the 3rd Workshop on Ultra Low Power Systems (WUPS), Prague, Czech Republic, February 2013.
- D. Chillet was a member of the technical program committees of HiPEAC RAPIDO, HiPEAC WRC, DCIS, and DASIP.
- M. Gautier was a member of the technical program committees of IEEE WCNC 2013, IEEE PIMRC 2013, IEEE Percom 2013 (Workshop on Cognitive Computing and Communications), IEEE ICCVE 2013 and IARIA COCORA 2013.
- A. Tisserand was a member of the technical program committees of the following conferences: IEEE ARITH'21, IEEE Reconfig 2013, DASIP 2013, IEEE NEWCAS 2013. He is a member of the editorial board of the International Journal of High Performance Systems Architecture, Inderscience.
- C. Wolinski was a member of the technical program committees of IEEE ASAP and DSD.
- F. Charot, O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on embedded systems architectures and associated design tools (ARCHI).
- O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on low-power design (ECOFAC).
- A. Tisserand co-organized ARCHI 2013, the CNRS thematic school "Architectures des systèmes matériels enfouis et méthodes de conception associées", March 25-29, Col-de-Porte. Details at http://tima-sls.imag.fr/archi13
- A. Tisserand was Editor of a special issue of the journal TSI (Technique et Science Informatique) [81].
- O. Sentieys was a member of the technical program committees of the following conferences: IEEE/ACM DATE, IEEE FPL, ACM/IEEE MEMOCODE, IEEE VTC, IEEE DDECS, ACM SBCCI, FTFC. He was Track Chair at IEEE NEWCAS. He is on the editorial board of the Journal of Low Power Electronics, American Scientific Publishers, and of ISRN Sensor Networks.
- O. Sentieys is a member of the steering committee of the GDR SOC-SIP. He is the chair of the IEEE Circuits and Systems (CAS) French Chapter. In 2013, he served as an expert for scientific organizations (ANR, AERES).
- O. Berder is an elected member of the IRISA Lab Council. He is the moderator of the Embedded Systems area in the Scientists Interest Group on Intelligent Transportation Systems.
- E. Casseau was an expert for the ANR INS program (Agence Nationale de la Recherche).

### 8.2. Seminars and Invitations

- A. Tisserand gave an invited lecture at the CNRS ARCHI 2013 spring school on FPGA circuits.
- A. Tisserand gave an invited lecture at the ARIC team of LIP Laboratory, ENS Lyon, on Arithmetic Level Countermeasures for ECC Cryptoprocessors Against Side Channel Attacks.

### 8.3. Teaching - Supervision - Juries

### 8.3.1. Teaching Responsibilities

There is a strong teaching activity in the CAIRN team since most of the permanent members are Professors or Associate Professors.

- C. Wolinski is the Director of ESIR.
- P. Quinton is the Director of École Normale Supérieure de Rennes.
- D. Chillet is the Director of Academic Studies of ENSSAT.
- P. Scalart is the Head of the Electronics Engineering department of ENSSAT.
- S. Derrien has been responsible for the first year of the Master of Computer Science at ISTIC since Sep. 2012.
- O. Sentieys is responsible for the "Embedded Systems" branch of the SISEA Master of Research (M2R).
- D. Chillet is co-responsible for the Embedded System speciality of the ICT Master of the University of Science and Technology of Hanoi.


ENSSAT stands for "École Nationale Supérieure des Sciences Appliquées et de Technologie" and is an "École d'Ingénieurs" of the University of Rennes 1, located in Lannion. ISTIC is the Electrical Engineering and Computer Science Department of the University of Rennes 1. ESIR stands for "École supérieure d'ingénieur de Rennes" and is an "École d'Ingénieurs" of the University of Rennes 1, located in Rennes. M2R stands for Master by Research, second year.

- D. Chillet has been a member of the French National University Council since 2009 in signal processing and electronics (Conseil National des Universités en 61e section).
- D. Chillet has been a member of the Permanent Committee of the French National University Council since November 2011 in signal processing and electronics (Commission Permanente du Conseil National des Universités en 61e section).
- A. Tisserand has been a member of the French National University Council since 2011 in computer science (Conseil National des Universités en 27e section).

### 8.3.2. Teaching

- O. Berder: introduction to signal processing, 38h, ENSSAT (L3)
- O. Berder: microprocessors and digital systems, 30h, ENSSAT (L3)
- O. Berder: wireless communications, 23h, ENSSAT (M2)
- O. Berder: ad hoc networks, 58h, ENSSAT (M1-M2)
- O. Berder: signal processing, 12h, IUT Lannion (L2)
- E. Casseau: signal processing, 16h, ENSSAT (L3)
- E. Casseau: low power design, 6h, ENSSAT (M1)
- E. Casseau: real time design methodology, 24h, ENSSAT (M1)
- E. Casseau: computer architecture, 36h, ENSSAT (M1)
- E. Casseau: system on chip and verification, 10h, Master by Research and ENSSAT (M2)
- E. Casseau: reconfigurable architectures, 25h, USTH (M2)
- S. Derrien: component and system synthesis, 16h, Research Master (MRI ISTIC) (M2)
- S. Derrien: computer architecture, 12h, ENS Cachan (L3)
- S. Derrien: introduction to operating systems, 8h, ISTIC (M1)
- F. Charot: specification of applications with the Signal synchronous language, 24h, ESIR (M1)
- F. Charot: virtual prototyping of multiprocessor system-on-chip, 24h, ESIR (M1)
- F. Charot: design of embedded systems, 28h, ESIR (M1)
- A. Courtay: Processor Architecture, 24h, ENSSAT (L3)
- A. Courtay: Digital Electronics, 32h, ENSSAT (L3)
- A. Courtay: Digital System Design, 12h, ENSSAT (L3)
- A. Courtay: Digital Electronics Communication Interfaces, 68h, ENSSAT (M1)
- A. Courtay: Processor Architecture, 25h, USTH (M1)
- D. Chillet: Basic processor architecture, 20h, ENSSAT (L1)
- D. Chillet: Design methodology of real-time systems, 32h, ENSSAT (L2)
- D. Chillet: Advanced processor architectures, 24h, ENSSAT (M2)
- D. Chillet: Multimedia processor architectures, 24h, ENSSAT (M2)
- D. Chillet: Multi-processor systems, 20h, ENSSAT (M2)
- D. Chillet: advanced processors architectures, 24h, Master by Research and ENSSAT (M2)
- D. Chillet: low-power digital CMOS circuits, 6h, Telecom Bretagne and University of Occidental Brittany (UBO) (M2)
- D. Chillet: Digital system design, 25h, University of Science and Technology of Hanoi (M1)
- D. Chillet: Advanced Multiprocessor system, 25h, University of Science and Technology of Hanoi (M2)
- M. Gautier: electronics, 42h, IUT Lannion (L1)
- M. Gautier: telecommunications, 114h, IUT Lannion (L1)
- M. Gautier: digital communications, 28h, IUT Lannion (L2)
- C. Killian: digital electronics, 74h, IUT Lannion (L1)
- C. Killian: digital electronics, 28h, IUT Lannion (L2)
- C. Killian: electricity, 60h, IUT Lannion (L1)
- C. Killian: signal processing, 40h, IUT Lannion (L2)
- R. Rocher: electricity, 16h, IUT Lannion (L1)
- R. Rocher: electronics, 44h, IUT Lannion (L1)
- R. Rocher: telecommunications, 82h, IUT Lannion (L1)
- R. Rocher: signal processing, 12h, IUT Lannion (L2)
- R. Rocher: digital communications, 48h, IUT Lannion (L2)
- P. Scalart: non-linear optimisation, 18h, Master by Research and ENSSAT (M2)
- P. Scalart: Parametric modelisation, optimal and adaptive Filters, 24h, Master by Research and ENSSAT (M2)
- P. Scalart: source coding, 14h, Master by Research and ENSSAT (M2)
- P. Scalart: cellular networks, 24h, ENSSAT (M2)
- P. Scalart: digital communication systems, 32h, ENSSAT (M1)
- P. Scalart: random signals and systems, 12h, ENSSAT (M1)
- O. Sentieys: digital signal processing, 40h, ENSSAT (M1)
- O. Sentieys: VLSI integrated circuit design, 40h, ENSSAT (M1)
- A. Tisserand: multiprocessor architectures and programming, 20h, ENSSAT and Master by Research, Univ. Rennes 1 (M2)
- A. Tisserand: hardware computer arithmetic operators, 6h, Master by Research, Univ. Rennes 1 (M2)
- C. Wolinski: architecture 1, 64h, ESIR (L3)
- C. Wolinski: architecture 2, 28h, ESIR (L3)
- C. Wolinski: design of embedded systems, 48h, ESIR (M1)
- C. Wolinski: signal, image, architecture, 26h, ESIR (M1)
- C. Wolinski: programmable architectures, 10h, ESIR (M1)
- C. Wolinski: component and system synthesis, 10h, Master by Research (MRI ISTIC) (M2)

### 8.3.3. Supervision

- PhD: Mahtab Alam, Power Aware Adaptive Techniques for Wireless Sensor Networks, Univ. Rennes 1, Jan. 2013, O. Sentieys, O. Berder, D. Menard.
- PhD: Robin Bonamy, Power Consumption Modelling and Optimisation for Heterogeneous Reconfigurable Platform, Univ. Rennes 1, Jul. 2013, D. Chillet.
- PhD: Thomas Chabrier, Arithmetic recodings for ECC cryptoprocessors with protections against side-channel attacks, Univ. Rennes 1, Jun. 2013, A. Tisserand, E. Casseau.
- PhD: Hervé Yviquel, From dataflow-based video coding tools to dedicated embedded multi-core platforms, Univ. Rennes 1, Oct. 2013, E. Casseau.
- PhD: Antoine Morvan, Polyhedral Model for High-Level Synthesis of Pipelined Architectures, Univ. Rennes 1, Jun. 2013, P. Quinton, S. Derrien.
- PhD: Vivek D. Tovinakere, Ultra-Low Power Reconfigurable Controllers for Wireless Sensor Networks, Univ. Rennes 1, Feb. 2013, O. Sentieys.
- PhD in progress: Florent Berthier, Study and Design of an Ultra Low Power Asynchronous Core for Sensor Networks, Oct. 2013, O. Sentieys, P. Vivet, E. Beigne.
- PhD in progress: Karim Bigou, RNS Hardware Units for ECC, Oct. 2011, A. Tisserand.
- PhD in progress: Franck Bucheron, Secure Virtualization for Embedded Systems, Oct. 2011, A. Tisserand.
- PhD in progress: Aymen Chakhari, Analytical approach for decision errors in fixed-point digital communication systems, Oct. 2010, R. Rocher, P. Scalart.
- PhD in progress: Gaël Deest, Computing with Errors: Error-Tolerant Machine Code Generation for Unreliable Embedded Hardware, Oct. 2013, S. Derrien, O. Sentieys.
- PhD in progress: Amine Didioui, Reconfigurable Radio Front-End for Energy-Harvesting Wireless Sensor Networks, Nov. 2010, O. Sentieys, C. Bernier.
- PhD in progress: Ali Hassan El-Moussawi, Performance/Accuracy Trade-Off in Automatic Parallelization for Embedded Many-Core Platforms, Nov. 2012, S. Derrien.
- PhD in progress: Christophe Huriaux, Embedded reconfigurable hardware accelerators with efficient dynamic reconfiguration management, Oct. 2012, O. Sentieys, A. Courtay.
- PhD in progress: Quang-Hai Khuat, Real-Time Spatio-Temporal Task Scheduling on 3D Architectures, Oct. 2011, D. Chillet.
- PhD in progress: Trong-Nhan Le, Global power management system for self-powered autonomous wireless sensor nodes, Jan. 2011, O. Sentieys, O. Berder.
- PhD in progress: Quang-Hoa Le, Virtualized dynamic reconfiguration for 3D SoC, Oct. 2012, E. Casseau, A. Courtay.
- PhD in progress: Xuan Chien Le, Indirect Monitoring in Self-Powered Wireless Sensor Networks for Smart Grid and Building Automation, Oct. 2013, O. Sentieys, O. Berder.
- PhD in progress: Jérémie Métairie, Reconfigurable Arithmetic Units for Secure Cryptoprocessors, Oct. 2012, A. Tisserand, E. Casseau.
- PhD in progress: Van Thiep Nguyen, Energy-efficient MAC protocols for cooperative strategies in Wireless Sensor Networks, Oct. 2013, O. Berder, M. Gautier.
- PhD in progress: Viet-Hoa Nguyen, Energy-efficient cooperative techniques for Wireless Body Area Sensor Networks, Nov. 2012, O. Berder, jointly with C. Langlais from Telecom Bretagne.
- PhD in progress: Ganda-Stéphane Ouedraogo, Automatic synthesis of hardware accelerator from high-level specifications in flexible radios, Oct. 2011, M. Gautier, O. Sentieys.
- PhD in progress: Rengarajan Ragavan, Ultra-Low Power Reconfigurable Architectures for Computing and Control in Wireless Sensor Networks, Oct. 2013, O. Sentieys, C. Killian.
- PhD in progress: Mai-Thanh Tran, Hardware Synthesis of Flexible and Reconfigurable Radio from High-Level Language Dedicated to Physical Layer of Wireless Systems, Oct. 2013, E. Casseau, M. Gautier.
- PhD in progress: Pramod P. Udupa, Sampling, synchronising, digital processing and FPGA implementation of 100Gbps optical OFDM signals, Jan. 2011, O. Sentieys.
- PhD in progress: Zhongwei Zheng, Short-range geolocation algorithms based on distributed multisensor processing, Nov. 2012, P. Scalart, jointly with C. Roland from Lab-STICC.

### 8.4. Popularization

A popularisation paper on energy efficiency has been published in [80].

15 members of the team participated in the national science festival (Fête de la Science) in Pleumeur-Bodou in October (demonstrations on wireless sensor networks, cryptology and digital integrated circuits).

A letter was published in Inria Emergences on "improving energy efficiency of embedded processors": http://emergences.inria.fr/lettres2013/newsletter-n28/L28\_GECOS

## 9. Bibliography

### Major publications by the team in recent years

- [1] D. CHILLET, A. EICHE, S. PILLEMENT, O. SENTIEYS. *Real-time scheduling on heterogeneous system-on-chip architectures using an optimised artificial neural network*, in "Journal of Systems Architecture Embedded Systems Design", April 2011, vol. 57, no. 4, pp. 340-353, http://dx.doi.org/10.1016/j.sysarc.2011.01.004
- [2] L. COLLIN, O. BERDER, P. ROSTAING, G. BUREL. *Optimal Minimum Distance Based Precoder for MIMO Spatial Multiplexing Systems*, in "IEEE Transactions on Signal Processing", March 2004, vol. 52, no. 3
- [3] A. COURTAY, O. SENTIEYS, J. LAURENT, N. JULIEN.
*High-level Interconnect Delay and Power Estimation*, in "Journal of Low Power Electronics (JOLPE)", 2008, vol. 4, n<sup>o</sup> 1, pp. 21-33 - [4] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20 - [5] S. DERRIEN, P. QUINTON. Parallelizing HMMER for Hardware Acceleration on FPGAs, in "18th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2007)", Montreal, Canada, July 2007, pp. 10–18, Best Paper Award - [6] S. DERRIEN, S. RAJOPADHYE, P. QUINTON, T. RISSET. 12, in "High-Level Synthesis of Loops Using the Polyhedral Model: The MMAlpha Software", P. COUSSY, A. MORAWIEC (editors), Springer Netherlands, 2008, pp. 215-230, http://dx.doi.org/10.1007/978-1-4020-8588-8 - [7] L. IMBERT, A. PEIRERA, A. TISSERAND. *A Library for Prototyping the Computer Arithmetic Level in Elliptic Curve Cryptography*, in "Proc. Advanced Signal Processing Algorithms, Architectures and Implementations XVII", San Diego, California, U.S.A., F. T. LUK (editor), SPIE, August 2007, vol. 6697, n<sup>o</sup> 66970N, pp. 1–9, http://dx.doi.org/10.1117/12.733652 - [8] B. LE GAL, E. CASSEAU, S. HUET. Dynamic Memory Access Management for High-Performance DSP Applications Using High-Level Synthesis, in "IEEE Transactions on Very Large Scale Integration Systems", November 2008, vol. 16, no 11, pp. 1454-1464 - [9] K. MARTIN, C. WOLINSKI, K. KUCHCINSKI, A. FLOCH, F. CHAROT. Constraint-Driven Instructions Selection and Application Scheduling in the DURASE system, in "Proc. of the 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors", Boston, MA, USA, IEEE Computer Society, July 2009, pp. 145-152 - [10] D. MENARD, D. CHILLET, O. SENTIEYS. 
*Floating-to-fixed-point Conversion for Digital Signal Processors*, in "EURASIP Journal on Applied Signal Processing (JASP), Special Issue Design Methods for DSP Systems", 2006, vol. 2006, n<sup>o</sup> 1, pp. 1–15 [11] D. MENARD, O. SENTIEYS. Automatic Evaluation of the Accuracy of Fixed-point Algorithms, in "IEEE/ACM Design, Automation and Test in Europe (DATE-02)", Paris, March 2002 - [12] S. PILLEMENT, O. SENTIEYS, R. DAVID. *DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency*, in "EURASIP Journal on Embedded Systems (JES)", 2008, pp. 1-13, Article ID 562326, 13 pages - [13] C. PLAPOUS, C. MARRO, P. SCALART. *Improved signal-to-noise ratio estimation for speech enhancement*, in "IEEE Transactions on Speech and Audio Processing", 2006, vol. 14, n<sup>o</sup> 6 - [14] A. TISSERAND. High-Performance Hardware Operators for Polynomial Evaluation, in "Int. J. High Performance Systems Architecture", March 2007, vol. 1, n<sup>o</sup> 1, pp. 14–23, invited paper, http://dx.doi.org/10.1504/ IJHPSA.2007.013288 - [15] C. WOLINSKI, K. KUCHCINSKI, E. RAFFIN. Automatic Design of Application-Specific Reconfigurable Processor Extensions with UPaK Synthesis Kernel, in "ACM Transactions on Design Automation of Electronic Systems", 2009, vol. 15, no 1, pp. 1–36, http://doi.acm.org/10.1145/1640457.1640458 ### **Publications of the year** ### **Doctoral Dissertations and Habilitation Theses** - [16] M. M. ALAM., Techniques adaptatives pour la gestion de l'énergie dans les réseaux capteurs sans fil, Université Rennes 1, February 2013, http://hal.inria.fr/tel-00931860 - [17] R. BONAMY., Modélisation, Exploration et Estimation de la Consommation pour les Architectures Hétérogènes Reconfigurables Dynamiquement, Université Rennes 1 and Université Rennes 1, July 2013, http://hal.inria.fr/tel-00931849 - [18] T. 
CHABRIER., *Arithmetic recodings for ECC cryptoprocessors with protections against side-channel attacks*, Université Rennes 1, June 2013, http://hal.inria.fr/tel-00910879 - [19] C. Guy., Facilités de typage pour l'ingénierie des langages, Université Rennes 1, December 2013, http://hal. inria.fr/tel-00917789 - [20] A. MORVAN., *Utilisation du modèle polyédrique pour la synthèse d'architectures pipelinées*, École normale supérieure de Cachan ENS Cachan, June 2013, http://hal.inria.fr/tel-00913692 - [21] V. TOVINAKERE DWARAKANATH., Contrôleurs reconfigurables ultra-faible consommation pour les réseaux de capteurs sans fil, Université Rennes 1, February 2013, http://hal.inria.fr/tel-00859921 - [22] H. YVIQUEL., From dataflow-based video coding tools to dedicated embedded multi-core platforms, Université Rennes 1, October 2013, http://hal.inria.fr/tel-00939346 ### **Articles in International Peer-Reviewed Journals** [23] R. BEN ATITALLAH, E. SENN, D. CHILLET, M. LANOE, D. BLOUIN. *An Efficient Framework for Power-Aware Design of Heterogeneous MPSoC*, in "IEEE Transactions on Industrial Informatics", February 2013, vol. 9, no 1, pp. 487-501 [DOI: 10.1109/TII.2012.2198657], http://hal.inria.fr/hal-00921900 - [24] R. BONAMY, S. BILAVARN, D. CHILLET, O. SENTIEYS. *Power Consumption Models for the Use of Dynamic and Partial Reconfiguration*, in "Microprocessors and Microsystems", January 2014 [DOI: 10.1016/J.MICPRO.2014.01.002], http://hal.inria.fr/hal-00941532 - [25] M. DJENDI, P. SCALART, A. GILLOIRE. Analysis of Two-Sensors Forward BSS Structure With Post-Filters in The Presence of Coherent and Incoherent Noise, in "Speech Communication", 2013, vol. 55, pp. 975-987, http://hal.inria.fr/hal-00939967 - [26] S. M. A. H. JAFRI, S. PIESTRAK, O. SENTIEYS, S. PILLEMENT. *Design of the coarse-grained reconfigurable architecture DART with on-line error detection*, in "Microprocessors and Microsystems", December 2013, MICPRO 2101 p. 
[DOI: 10.1016/J.MICPRO.2013.12.004], http://hal.inria.fr/hal-00927376 - [27] A. MORVAN, S. DERRIEN, P. QUINTON. *Polyhedral Bubble Insertion: A Method to Improve Nested Loop Pipelining for High-Level Synthesis*, in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems", February 2013, vol. 32, n<sup>o</sup> 3, pp. 339-352 [DOI: 10.1109/TCAD.2012.2228270], http://hal.inria.fr/hal-00921424 - [28] Q.-T. NGO, O. BERDER, P. SCALART. General minimum Euclidean distance-based precoder for MIMO wireless systems, in "EURASIP Journal on Advances in Signal Processing", March 2013, pp. 1-12 [DOI: 10.1186/1687-6180-2013-39], http://hal.inria.fr/hal-00932391 - [29] T. H. NGUYEN, F. GOMEZ AGIS, L. BRAMERIE, M. GAY, J.-C. SIMON, O. SENTIEYS. *Impact of Sampling-Source Extinction Ratio in Linear Optical Sampling*, in "IEEE Photonics Technology Letters", April 2013, vol. 27, no 7, pp. 663-666 [DOI: 10.1109/LPT.2013.2248353], http://hal.inria.fr/hal-00931661 - [30] T. STRIPF, O. OEY, T. BRUCKSCHLOEGLA, J. BECKER, G. RAUWERDA, K. SUNESEN, G. GOULAS, P. ALEFRAGIS, N. VOROS, S. DERRIEN, O. SENTIEYS, N. KAVVADIAS, G. DIMITROULAKOS, K. MASSELOS, D. KRITHARIDIS, N. MITAS, T. PERSCHKE. Compiling Scilab to high performance embedded multicore systems, in "Microprocessors and Microsystems", November 2013, vol. 37, no 8, pp. 1033-1049 [DOI: 10.1016/J.MICPRO.2013.07.004], http://hal.inria.fr/hal-00921437 - [31] Y. R. J. THOMAS, M. PICOT, A. CARER, O. BERDER, O. SENTIEYS, F. BARRIÈRE. *A single sediment-Microbial Fuel Cell powering a wireless telecommunication system*, in "Journal of Power Sources", November 2013, vol. 241, pp. 703-708 [DOI: 10.1016/J.JPOWSOUR.2013.05.016], http://hal.inria.fr/hal-00832354 - [32] L.-Q.-V. TRAN, O. BERDER, O. SENTIEYS. *On the performance of distributed space-time coded cooperative relay networks based on inter-relay communications*, in "EURASIP Journal on Wireless Communications and Networking", October 2013, vol. 1, pp. 
1-15, http://hal.inria.fr/hal-00931826 - [33] H. YVIQUEL, J. BOUTELLIER, M. RAULET, E. CASSEAU. Automated design of networks of Transport-Triggered Architecture processors using Dynamic Dataflow Programs, in "Signal Processing: Image Communication", September 2013, vol. 28, n<sup>o</sup> 10, pp. 1295 - 1302 [DOI: 10.1016/J.IMAGE.2013.08.013], http:// hal.inria.fr/hal-00909325 #### **Articles in National Peer-Reviewed Journals** [34] P. COTRET, G. GOGNIAT. Protection des architectures hétérogènes sur FPGA: une approche par parefeux matériels, in "Techniques de l'Ingenieur", February 2014, 10 p., Référence IN175, http://hal.inria.fr/ hal-00866646 [35] J. C. NAUD, D. MENARD, O. SENTIEYS. Évaluation de la précision en virgule fixe dans le cas des structures conditionnelles, in "Techniques et Sciences Informatiques", January 2013, vol. 32, n<sup>o</sup> 2, pp. 179-201, http://hal.inria.fr/hal-00743415 ### **International Conferences with Proceedings** - [36] M. M. ALAM, O. BERDER, D. MENARD, O. SENTIEYS. *On the Energy Savings of Adaptive Transmit Power for Wireless Sensor Networks Radio Transceivers*, in "26th International Conference on Architecture of Computing Systems (ARCS)", Prague, Czech Republic, February 2013, http://hal.inria.fr/hal-00876141 - [37] M. ALLE, A. MORVAN, S. DERRIEN. Runtime dependency analysis for loop pipelining in High-Level Synthesis, in "50th Design Automation Conference (DAC)", Austin, United States, ACM, May 2013, http://hal.inria.fr/hal-00921416 - [38] V. Bhatnagar, G. S. Ouedraogo, M. Gautier, A. Carer, O. Sentieys. *An FPGA Software Defined Radio Platform with a High-Level Synthesis Design Flow*, in "IEEE International Vehicular Technology conference (VTC-Spring13)", Germany, June 2013, 12 p., http://hal.inria.fr/hal-00833554 - [39] K. BIGOU, A. TISSERAND. *Improving Modular Inversion in RNS using the Plus-Minus Method*, in "CHES 15th Workshop on Cryptographic Hardware and Embedded Systems 2013", Santa Barbara, United States, G. 
BERTONI, J.-S. CORON (editors), Springer, May 2013, vol. 8086, pp. 233-249 [DOI: 10.1007/978-3-642-40349-1\_14], http://hal.inria.fr/hal-00825745
- [40] F. BROEKAERT, A. DIDIOUI, C. BERNIER, O. SENTIEYS. *Prototyping an Energy Harvesting Wireless Sensor Network Application Using HarvWSNet*, in "Proceedings of 26th International Conference on Architecture of Computing Systems (ARCS)", Prague, Czech Republic, 2013, pp. 1-6, http://hal.inria.fr/hal-00931782
- [41] M. CATAN, R. D. COSMO, A. EICHE, T. A. LASCU, M. LIENHARDT, J. MAURO, R. TREINEN, S. ZACCHIROLI, G. ZAVATTARO, J. ZWOLAKOWSKI. *Aeolus: Mastering the Complexity of Cloud Application Deployment*, in "ESOCC European Conference on Service-Oriented and Cloud Computing 2013", Malaga, Spain, K.-K. LAU, W. LAMERSDORF, E. PIMENTEL (editors), Lecture Notes in Computer Science, Springer, 2013, vol. 8135, pp. 1-3 [DOI: 10.1007/978-3-642-40651-5\_1], http://hal.inria.fr/hal-00909298
- [42] T. CHABRIER, A. TISSERAND. *On-the-Fly Multi-Base Recoding for ECC Scalar Multiplication without Pre-Computations*, in "ARITH 21st IEEE International Symposium on Computer Arithmetic", Austin, TX, United States, IEEE, April 2013, http://hal.inria.fr/hal-00772613
- [43] A. CHAKHARI, R. ROCHER, P. SCALART. *Analytical approach to evaluate fixed point accuracy for an iteration of decision operators*, in "2013 International Conference on Computer Applications Technology (ICCAT)", Sousse, Tunisia, January 2013, pp. 1-4 [DOI: 10.1109/ICCAT.2013.6521964], http://hal.inria.fr/hal-00920686
- [44] A. DIDIOUI, C. BERNIER, D. MORCHE, O. SENTIEYS. *HarvWSNet: A co-simulation framework for energy harvesting wireless sensor networks*, in "International Conference on Computing, Networking and Communications (ICNC)", San Diego, United States, 2013, pp. 808-812 [DOI: 10.1109/ICCNC.2013.6504192], http://hal.inria.fr/hal-00931772
- [45] A. DIDIOUI, C. BERNIER, D. MORCHE, O. SENTIEYS.
*Power reconfigurable receiver model for energy-aware applications*, in "IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS)", Columbus, United States, 2013, pp. 800-803 [DOI: 10.1109/MWSCAS.2013.6674770], http://hal.inria.fr/hal-00931775
- [46] A. FLOCH, T. YUKI, A. EL-MOUSSAWI, A. MORVAN, K. MARTIN, M. NAULLET, M. ALLE, L. L'HOURS, N. SIMON, S. DERRIEN, F. CHAROT, C. WOLINSKI, O. SENTIEYS. *GeCoS: A framework for prototyping custom hardware design flows*, in "13th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM)", Eindhoven, Netherlands, B. ADAMS, J. RILLING, F. KHOMH (editors), IEEE, September 2013, pp. 100-105 [DOI: 10.1109/SCAM.2013.6648190], http://hal.inria.fr/hal-00921370
- [47] M. GAUTIER, D. NOGUET. *Signal detection using watermark insertion*, in "IEEE International Vehicular Technology conference (VTC-Spring13)", Dresden, Germany, June 2013, 11 p., http://hal.inria.fr/hal-00833552
- [48] G. GOULAS, C. VALOUXIS, P. ALEFRAGIS, N. VOROS, O. OEY, T. STRIPF, T. BRUCKSCHLÖGL, J. BECKER, C. GOGOS, A. EL-MOUSSAWI, M. NAULLET, T. YUKI. *Coarse-Grain Optimization and Code Generation for Embedded Multicore Systems*, in "16th Euromicro Conference on Digital System Design (DSD)", Santander, Spain, September 2013, pp. 379-386 [DOI: 10.1109/DSD.2013.48], http://hal.inria.fr/hal-00921459
- [49] Q. H. KHUAT, D. CHILLET. *Communication Cost Reduction For Hardware Tasks Placed on Homogeneous Reconfigurable Resource*, in "DASIP 2013, Design and Architectures for Signal and Image Processing", Cagliari, Italy, October 2013, pp. 265-270, http://hal.inria.fr/hal-00921869
- [50] T.-N. LE, M. MAGNO, A. PEGATOQUET, O. BERDER, O. SENTIEYS, E. POPOVICI.
*Ultra Low Power Asynchronous MAC Protocol using Wake-Up Radio for Energy Neutral Wireless Sensor Networks*, in "1st International Workshop on Energy-Neutral Sensing Systems (ENSsys) organized in conjunction with 11th ACM SenSys Conference", Rome, Italy, November 2013, Paper 10 [DOI: 10.1145/2534208.2534221], http://hal.inria.fr/hal-00921329
- [51] T.-N. LE, A. PEGATOQUET, O. BERDER, O. SENTIEYS. *Multi-Source Power Manager for Super-Capacitor based Energy Harvesting Wireless Sensor Networks*, in "1st International Workshop on Energy Neutral Sensing Systems (ENSsys) organized in conjunction with 11th ACM SenSys Conference", Rome, Italy, November 2013, Paper 19 [DOI: 10.1145/2534208.2534227], http://hal.inria.fr/hal-00921320
- [52] T.-N. LE, A. PEGATOQUET, O. SENTIEYS, O. BERDER, C. BELLEUDY. *Duty-Cycle Power Manager for Thermal-Powered Wireless Sensor Networks*, in "24th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications", London, United Kingdom, September 2013, pp. 1645-1649 [DOI: 10.1109/PIMRC.2013.6666406], http://hal.inria.fr/hal-00921315
- [53] T.-N. LE, O. SENTIEYS, O. BERDER, A. PEGATOQUET, C. BELLEUDY. *Adaptive Filter for Energy Predictor in Energy Harvesting Wireless Sensor Networks*, in "26th IEEE International Conference on Architecture of Computing Systems (ARCS), 3rd Workshop on Ultra Low Power (WUPS)", Prague, Czech Republic, February 2013, pp. 1-4, http://hal.inria.fr/hal-00921309
- [54] D.-L. NGUYEN, L.-Q.-V. TRAN, O. BERDER, O. SENTIEYS. *A Low-Latency and Energy-Efficient MAC Protocol for Cooperative Wireless Sensor Networks*, in "Global Communications Conference (Globecom)", Atlanta, United States, IEEE, December 2013, http://hal.inria.fr/hal-00931828
- [55] H. NGUYEN VIET, O. BERDER, P. SCALART. *On the efficiency of sphere decoding for linearly precoded MIMO systems*, in "Wireless Communications and Networking Conference (WCNC)", Shanghai, China, IEEE, April 2013, pp.
4021-4025, http://hal.inria.fr/hal-00931835
- [56] Best Paper - K. PARASHAR, D. MENARD, O. SENTIEYS. *A Polynomial Time Algorithm for Solving the Word-length Optimization Problem*, in "IEEE/ACM International Conference on Computer-Aided Design (ICCAD)", San Diego, United States, November 2013, http://hal.inria.fr/hal-00876132
- [57] M. A. A. PASHA, S. DERRIEN, O. SENTIEYS. *Component-Level Datapath Merging in System-Level Design of Wireless Sensor Node Controllers for FPGA-Based Implementations*, in "Euromicro Conference on Digital System Design (DSD)", Santander, Spain, IEEE, September 2013, pp. 543-550 [DOI: 10.1109/DSD.2013.64], http://hal.inria.fr/hal-00921421
- [58] M. SAYED HASSAN, A. EL FALOU, C. LANGLAIS. *On the design of coded MIMO systems*, in "ICCIT: the 3rd International Conference on Communications and Information Technology", Beirut, Lebanon, 2013, pp. 335-339, http://hal.inria.fr/hal-00940407
- [59] P. UDUPA, O. SENTIEYS, P. SCALART. *A Block-Parallel Architecture for Initial and Fine Synchronization in OFDM Systems*, in "IEEE International Conference on Communications (ICC)", Budapest, Hungary, 2013, pp. 4761-4765 [DOI: 10.1109/ICC.2013.6655326], http://hal.inria.fr/hal-00931445
- [60] P. UDUPA, O. SENTIEYS, P. SCALART. *A Novel Hierarchical Low Complexity Synchronization Method for OFDM Systems*, in "2013 IEEE 77th Vehicular Technology Conference (VTC Spring)", Dresden, Germany, 2013, pp. 1-5 [DOI: 10.1109/VTCSPRING.2013.6691838], http://hal.inria.fr/hal-00931530
- [61] S. WULIANG, B. COMBEMALE, S. DERRIEN, R. FRANCE. *Using Model Types to Support Contract-Aware Model Substitutability*, in "9th European Conference on Modelling Foundations and Applications (ECMFA 2013)", Montpellier, France, P. VAN GORP, T. RITTER, L. ROSE (editors), LNCS, Springer-Verlag Berlin Heidelberg, 2013, vol. 7949, pp. 118-133 [DOI: 10.1007/978-3-642-39013-5\_9], http://hal.inria.fr/hal-00808770
- [62] C. XIAO, E. CASSEAU.
*Improving High-Level Synthesis Effectiveness Through Custom Operator Identification*, in "IEEE International Symposium on Circuits and Systems", Melbourne, Australia, June 2014, http://hal.inria.fr/hal-00931036
- [63] T. YUKI, A. MORVAN, S. DERRIEN. *Derivation of Efficient FSM from Loop Nests*, in "International Conference on Field-Programmable Technology (ICFPT)", Kyoto, Japan, IEEE, December 2013, http://hal.inria.fr/hal-00921446
- [64] H. YVIQUEL, E. CASSEAU, M. RAULET, P. JÄÄSKELÄINEN, J. TAKALA. *Towards run-time actor mapping of dynamic dataflow programs onto multi-core platforms*, in "International Symposium on Image and Signal Processing and Analysis (ISPA)", France, 2013, pp. 725-730, http://hal.inria.fr/hal-00909408
- [65] H. YVIQUEL, A. LORENCE, K. JERBI, G. COCHEREL, A. SANCHEZ, M. RAULET. *Orcc: multimedia development made easy*, in "The 21st ACM International Conference on Multimedia", France, 2013, pp. 863-866, http://hal.inria.fr/hal-00909401
- [66] R. ZHANG, O. BERDER, O. SENTIEYS. *Energy efficient reservation-based opportunistic MAC scheme in multi-hop networks*, in "International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC)", London, United Kingdom, IEEE, September 2013, pp. 1660-1665 [DOI: 10.1109/PIMRC.2013.6666409], http://hal.inria.fr/hal-00931831

### **National Conferences with Proceedings**

- [67] K. BIGOU, T. CHABRIER, A. TISSERAND. *Opérateur matériel de tests de divisibilité par des petites constantes sur de très grands entiers*, in "ComPAS'13 / SympA'15 Symposium en Architectures nouvelles de machines", Grenoble, France, January 2013, http://hal.inria.fr/hal-00772703
- [68] Q. H. KHUAT, Q. H. LE, D. CHILLET, A. COURTAY, E. CASSEAU. *Ordonnancement Spatio-Temporel 3D minimisant le coût de communications entre tâches*, in "XXIVe Colloque Gretsi Traitement du Signal et des Images", Brest, France, September 2013, pp. 1-7, http://hal.inria.fr/hal-00921867
- [69] Q. H. KHUAT, Q. H. LE, D. CHILLET, S. PILLEMENT.
*Ordonnancement spatio-temporel pour une architecture 3D composée d'une couche multiprocesseur et d'une couche ressource reconfigurables*, in "Conférence d'informatique en Parallélisme, Architecture et Système", Grenoble, France, January 2013, ComPAS'2013, http://hal.inria.fr/hal-00808396
- [70] G. S. OUEDRAOGO, M. GAUTIER, O. SENTIEYS. *Description haut niveau de formes d'ondes pour la radio logicielle sur architectures reconfigurables*, in "XXIVe Colloque Gretsi Traitement du Signal et des Images", Brest, France, September 2013, http://hal.inria.fr/hal-00863361
- [71] O. SENTIEYS, M. A. A. PASHA, S. DERRIEN. *Architectures de contrôleurs ultra-faible consommation pour noeuds de réseau de capteurs sans fil*, in "XXIVe Colloque Gretsi Traitement du Signal et des Images", Brest, France, 2013, pp. 1-4, http://hal.inria.fr/hal-00931628
- [72] P. UDUPA, O. SENTIEYS, L. BRAMERIE. *Design and Implementation of DSP algorithms for 100 Gbps Coherent Optical-OFDM (CO-OFDM) Systems*, in "XXIVe Colloque Gretsi Traitement du Signal et des Images", Brest, France, 2013, pp. 1-4, http://hal.inria.fr/hal-00931542

### **Conferences without Proceedings**

- [73] C. BELLEUDY, T.-N. LE, A. PEGATOQUET, O. SENTIEYS, O. BERDER. *Energy Monitor for Super Capacitor based Wireless Sensor Networks*, in "Colloque National du GDR SoC-SiP", Lyon, France, June 2013, http://hal.inria.fr/hal-00921284
- [74] K. BIGOU. *Avancées sur l'utilisation de la représentation RNS pour la cryptographie sur courbes elliptiques*, in "CRYPTO'PUCES 2013", Porquerolles, France, May 2013, http://hal.inria.fr/hal-00830504
- [75] K. BIGOU, A. TISSERAND. *Crypto-processeur ECC en RNS sur FPGA avec inversion modulaire rapide*, in "Colloque national du GDR SoC-SiP 2013", Lyon, France, June 2013, http://hal.inria.fr/hal-00830610
- [76] D. CHILLET. *Sensibilisation à la modélisation SART pour le développement de code temps réel*, in "CETSIS, 10ème Colloque sur l'Enseignement des
Technologies et des Sciences de l'Information et des Systèmes", Caen, France, March 2013, http://hal.inria.fr/hal-00921865
- [77] C. HURIAUX, O. SENTIEYS, A. COURTAY. *An FPGA Configuration Stream Architecture Supporting Seamless Hardware Accelerator Migration*, in "ConfigComp'2013, Workshop on Reconfigurable Computing V2.0: The Next Generation of Technology, Architectures and Design Tools, held in conjunction to the DATE 2013 conference", Grenoble, France, 2013, http://hal.inria.fr/hal-00931572
- [78] Q. H. KHUAT. *Communication Cost Reduction For Hardware Tasks Placed on Homogeneous Reconfigurable Resource*, in "GDR SoC SiP", Lyon, France, June 2013, http://hal.inria.fr/hal-00921197
- [79] G. S. OUEDRAOGO, M. GAUTIER, O. SENTIEYS. *Vers un language spécialisé pour la radio logicielle sur FPGA*, in "Colloque national du GDR SoC-SiP", Lyon, France, June 2013, 2 p., http://hal.inria.fr/hal-00922785

### **Scientific Books (or Scientific Book chapters)**

- [80] O. SENTIEYS. *Efficacite energetique : les technologies de l'information*, in "L'énergie à découvert", R. MOSSERI, C. JEANDEL (editors), CNRS Editions, 2013, pp. 229-231, http://hal.inria.fr/hal-00931675

### **Books or Proceedings Editing**

- [81] L. LAGADEC, S. PILLEMENT, A. TISSERAND (editors)., Architecture des ordinateurs, Technique et science informatique, Hermes, February 2013, vol. 32, 150 p., Special issue of the Symposium en Architecture de Machines SympA'14, http://hal.inria.fr/hal-00819668

### **Patents and standards**

- [82] M. GAUTIER, V. BERG., Procede et dispositif de détection d'une sous-bande de fréquence dans une bande de fréquence et équipement de communication comprenant un tel dispositif, 2013, n<sup>o</sup> FR20120054118 20120504, http://hal.inria.fr/hal-00939233
- [83] M. GAUTIER, D. NOGUET., Method for Identifying and Detecting a Radio Signal For a Cognitive Communication System, 2013, n<sup>o</sup> US Patent, 20130251014, http://hal.inria.fr/hal-00939225

### **References in notes**

- [84] S. HAUCK, A. DEHON (editors).
, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, Morgan Kaufmann, 2008
- [85] Z. ALLIANCE., Zigbee specification, ZigBee Alliance, 2005, n<sup>o</sup> ZigBee Document 053474r06, Version
- [86] A. BACHIR, M. DOHLER, T. WATTEYNE, K. LEUNG. *MAC Essentials for Wireless Sensor Networks*, in "Communications Surveys Tutorials, IEEE", quarter 2010, vol. 12, n<sup>o</sup> 2, pp. 222-248, http://dx.doi.org/10.1109/SURV.2010.020510.00058
- [87] F. BARAT, M. JAYAPALA, T. VANDER AA, R. LAUWEREINS, G. DECONINCK, H. CORPORAAL. *Low Power Coarse-Grained Reconfigurable Instruction Set Processor*, in "International Workshop on Field Programmable Logic and Applications", Lecture Notes in Computer Science, September 2003, pp. 230–239
- [88] V. BAUMGARTE, G. EHLERS, F. MAY, A. NÜCKEL, M. VORBACH, M. WEINHARDT. *PACT XPP A Self-Reconfigurable Data Processing Architecture*, in "The Journal of Supercomputing", 2003, vol. 26, n<sup>o</sup> 2, pp. 167–184
- [89] C. BOBDA., Introduction to Reconfigurable Computing: Architectures Algorithms and Applications, Springer, 2007
- [90] J. M. P. CARDOSO, P. C. DINIZ, M. WEINHARDT. *Compiling for reconfigurable computing: A survey*, in "ACM Comput. Surv.", June 2010, vol. 42, 13:1 p., http://doi.acm.org/10.1145/1749603.1749604
- [91] D. CHILLET, S. PILLEMENT, O. SENTIEYS. *A Neural Network Model for Real-Time Scheduling on Heterogeneous SoC Architectures*, in "IEEE International Joint Conference on Neural Networks, IJCNN'07", Orlando, FL, August, 12-17 2007
- [92] M. CLARK, M. MULLIGAN, D. JACKSON, D. LINEBARGER. *Accelerating Fixed-Point Design for MB-OFDM UWB Systems*, in "CommsDesign", 2005, http://www.commsdesign.com/showArticle.jhtml?articleID=57703818
- [93] L. COLLIN, O. BERDER, P. ROSTAING, G. BUREL. *Optimal minimum distance-based precoder for MIMO spatial multiplexing systems*, in "IEEE Transactions on Signal Processing", 2004, vol. 52, n<sup>o</sup> 3, pp. 617–627
- [94] K. COMPTON, S. HAUCK.
*Reconfigurable computing: a survey of systems and software*, in "ACM Comput. Surv.", 2002, vol. 34, n<sup>o</sup> 2, pp. 171–210, http://doi.acm.org/10.1145/508352.508353
- [95] G. CONSTANTINIDES, P. CHEUNG, W. LUK. *Wordlength optimization for linear digital signal processing*, in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems", October 2003, vol. 22, n<sup>o</sup> 10, pp. 1432-1442
- [96] M. COORS, H. KEDING, O. LUTHJE, H. MEYR. *Fast Bit-True Simulation*, in "Proc. ACM/IEEE Design Automation Conference (DAC)", Las Vegas, June 2001, pp. 708-713
- [97] S. CUI, A. GOLDSMITH, A. BAHAI. *Energy-efficiency of MIMO and cooperative MIMO techniques in sensor networks*, in "IEEE Journal on Selected Areas in Communications", 2004, vol. 22, n<sup>o</sup> 6, pp. 1089–1098
- [98] K. DANNE, R. MUHLENBERND, M. PLATZNER. *Executing hardware tasks on dynamically reconfigurable devices under real-time conditions*, in "International Conference on Field Programmable Logic and Applications", Lecture Notes in Computer Science, 2006
- [99] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20
- [100] M. DOHLER, E. LEFRANC, H. AGHVAMI. *Space-time block codes for virtual antenna arrays*, in "The 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications", 2002, vol. 1
- [101] A. DUNKELS, B. GRONVALL, T. VOIGT. *Contiki-a lightweight and flexible operating system for tiny networked sensors*, in "Proceedings of the First IEEE Workshop on Embedded Networked Sensors", 2004
- [102] P. FARABOSCHI, G. BROWN, J. FISHER, G. DESOLI. *Lx: A technology Platform for Customizable VLIW Embedded Processing*, in "ACM/IEEE Int. Symp. on Computer Architecture (ISCA 00)", Vancouver, Canada, June 2000, pp. 203–213
- [103] P. GARCIA, K. COMPTON, M. SCHULTE, E. BLEM, W. FU.
*An overview of reconfigurable hardware in embedded systems*, in "EURASIP J. Embedded Syst.", January 2006, vol. 2006, pp. 1–19
- [104] S. HAUCK, A. DEHON., Reconfigurable computing: the theory and practice of FPGA-based computation, Series on Systems on Silicon, Morgan Kaufmann, 2008
- [105] A. HORMATI, M. KUDLUR, S. MAHLKE, D. BACON, R. RABBAH. *Optimus: efficient realization of streaming applications on FPGAs*, in "Proc. Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems", New York, NY, USA, CASES'08, ACM, 2008, pp. 41–50, http://doi.acm.org/10.1145/1450095.1450105
- [106] S. KIM, W. SUNG. *Word-length optimization for high level synthesis of digital signal processing systems*, in "IEEE Workshop on Signal Processing Systems", Boston, October 1998, pp. 142-151
- [107] K. KUM, J. KANG, W. SUNG. *AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors*, in "IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing", September 2000, vol. 47, n<sup>o</sup> 9, pp. 840-848
- [108] J. LANEMAN, G. WORNELL. *Distributed space-time-coded protocols for exploiting cooperative diversity in wireless networks*, in "IEEE Transactions on Information Theory", 2003, vol. 49, n<sup>o</sup> 10, pp. 2415–2425
- [109] A. LODI, M. TOMA, F. CAMPI, A. CAPPELLI, R. CANEGALLO, R. GUERRIERI. *A VLIW Processor With Reconfigurable Instruction Set for Embedded Applications*, in "IEEE J. of Solid-State Circuits", 2003, vol. 38, n<sup>o</sup> 11, pp. 1876–1886
- [110] T. MARESCAUX, V. NOLLET, J. MIGNOLET, A. BARTIC, W. MOFFAT, P. AVASARE, P. COENE, D. VERKEST, S. VERNALDE, R. LAUWEREINS. *Run-time support for heterogeneous multitasking on reconfigurable SoCs*, in "Integration, the VLSI journal", 2004, vol. 38, pp. 107–130
- [111] B. MEI, S. VERNALDE, D. VERKEST, H. DE MAN, R. LAUWEREINS. *ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix*, in "Proc. Int. Conf.
on Field Programmable Logic and Applications", Springer, 2003, pp. 61–70
- [112] D. MENARD, D. CHILLET, F. CHAROT, O. SENTIEYS. *Automatic Floating-point to Fixed-point Conversion for DSP Code Generation*, in "IEEE/ACM Int. Conf. on Compilers, Architectures and Synthesis for Embedded Systems (CASES)", Grenoble, October 2002
- [113] H. NIKOLOV, M. THOMPSON, T. STEFANOV, A. PIMENTEL, S. POLSTRA, R. BOSE, C. ZISSULESCU, E. DEPRETTERE. *Daedalus: toward composable multimedia MP-SoC design*, in "Proc. Design Automation Conference", New York, NY, USA, DAC'08, ACM, 2008, pp. 574–579, http://doi.acm.org/10.1145/1391469.1391615
- [114] Y. PARK, H. PARK, S. MAHLKE. *CGRA express: accelerating execution using dynamic operation fusion*, in "Proc. Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems", New York, NY, USA, CASES'09, ACM, 2009, pp. 271–280, http://doi.acm.org/10.1145/1629395.1629433
- [115] J. RABAEY. *Reconfigurable Processing: The Solution to Low-Power Programmable DSP*, in "IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)", 1997, vol. 1, pp. 275–278
- [116] R. SALEH, S. WILTON, S. MIRABBASI, A. HU, M. GREENSTREET, G. LEMIEUX, P. PANDE, C. GRECU, A. IVANOV. *System-on-chip: reuse and integration*, in "Proceedings of the IEEE", 2006, vol. 94, n<sup>o</sup> 6, pp. 1050–1069
- [117] E. SALMINEN, A. KULMALA, T. D. HAMALAINEN. *Survey of Network-on-chip Proposals*, in "White Paper, OCP-IP", 2008, http://www.ocpip.org/socket/whitepapers
- [118] K. SEEHYUN, K. KUM, W. SUNG. *Fixed-point optimization utility for C and C++ based digital signal processing programs*, in "IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing", November 1998, vol. 45, n<sup>o</sup> 11, pp. 1455-1464, http://dx.doi.org/10.1109/82.735357
- [119] G. THEODORIDIS, D. SOUDRIS, S. VASSILIADIS. 2, in "A survey of coarse-grain reconfigurable architectures and CAD tools", Springer Verlag, 2007
- [120] Z. UL-ABDIN, B. SVENSSON.
*Evolution in architectures and programming methodologies of coarse-grained reconfigurable computing*, in "Microprocessors and Microsystems", 2009, vol. 33, n<sup>o</sup> 3, pp. 161-178 [DOI: 10.1016/J.MICPRO.2008.10.003], http://www.sciencedirect.com/science/article/pii/S0141933108001038
- [121] G. VENKATARAMANI, W. NAJJAR, F. KURDAHI, N. BAGHERZADEH, W. BOHM, J. HAMMES. *Automatic compilation to a coarse-grained reconfigurable system-on-chip*, in "ACM Trans. on Embedded Computing Systems", 2003, vol. 2, n<sup>o</sup> 4, pp. 560–589, http://doi.acm.org/10.1145/950162.950167
- [122] C. WOLINSKI, M. GOKHALE, K. MCCABE. *A polymorphous computing fabric*, in "Micro, IEEE", 2002, vol. 22, n<sup>o</sup> 5, pp. 56–68
- [123] C. WOLINSKI, K. KUCHCINSKI, A. POSTOLA. *UPaK: abstract unified pattern based synthesis kernel for hardware and software systems*, in "University Booth, DATE 2007", Nice, France, May 2007
- [124] Z. A. YE, N. SHENOY, P. BANERJEE. *A C compiler for a processor with a reconfigurable functional unit*, in "Proc. ACM/SIGDA Int. Symp. on Field Programmable Gate-Arrays, FPGA", New York, NY, USA, ACM Press, 2000, pp. 95–100, http://doi.acm.org/10.1145/329166.329187