# Activity Report 2012

# **Project-Team CAIRN**

# **Energy Efficient Computing Architectures**

IN COLLABORATION WITH: Institut de recherche en informatique et systèmes aléatoires (IRISA)

RESEARCH CENTER: Rennes - Bretagne-Atlantique

THEME: **Architecture and Compiling**

## **Table of contents**

- 1. Members
- 2. Overall Objectives
  - 2.1. Overall Objectives
  - 2.2. Highlights of the Year
- 3. Scientific Foundations
  - 3.1. Panorama
  - 3.2. Reconfigurable Architecture Design
  - 3.3. Compilation and Synthesis for Reconfigurable Platforms
  - 3.4. Interaction between Algorithms and Architectures
- 4. Application Domains
  - 4.1. Panorama
  - 4.2. 4G Wireless Communication Systems
  - 4.3. Wireless Sensor Networks
  - 4.4. Multimedia processing
- 5. Software
  - 5.1. Panorama
  - 5.2. Gecos
  - 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems
  - 5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems
  - 5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions
  - 5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-[…]01)
  - 5.7. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip
- 6. New Results
  - 6.1. Reconfigurable Architecture Design
    - 6.1.1. Reconfiguration Controller
    - 6.1.2. Low-Power Reconfigurable Arithmetic Operators
    - 6.1.3. Ultra-Low-Power Reconfigurable Controllers
    - 6.1.4. Models for Dynamically Reconfigurable Systems
      - 6.1.4.1. Power Models
      - 6.1.4.2. High-Level Modeling of Reconfigurable Architectures
    - 6.1.5. Fault-Tolerant Reconfigurable Architectures
    - 6.1.6. Low-Power Architectures
    - 6.1.7. Arithmetic Operators for Cryptography
      - 6.1.7.1. Arithmetic Operators for Fast and Secure Cryptography
      - 6.1.7.2. ECC Processor with Protections Against SCA
    - 6.1.8. 3D Heterogeneous SoC Design
  - 6.2. Compilation and Synthesis for Reconfigurable Platform
    - 6.2.1. […]
    - 6.2.2. […]
    - 6.2.3. […]
    - 6.2.4. […]
  - 6.3. Interaction between Algorithms and Architectures
    - 6.3.1. […]
    - 6.3.2. […]
    - 6.3.3. […]
    - 6.3.4. […] Networks
    - 6.3.5. Cooperative Strategies for Low-Energy Wireless Networks
    - 6.3.6. Opportunistic Routing
    - 6.3.7. Adaptive Techniques for WSN Power Optimization
    - 6.3.8. WSN for Health Monitoring
    - 6.3.9. Reconfigurable Video Coding
    - 6.3.10. A Low-Complexity Synchronization Method for OFDM Systems
    - 6.3.11. Flexible hardware accelerators for biocomputing applications
- 7. Partnerships and Cooperations
  - 7.1. European Initiatives
    - 7.1.1. FP7 FLEXTILES
    - 7.1.2. FP7 ALMA
    - 7.1.3. Collaborations with Major European Organizations
  - 7.2. National Initiatives
    - 7.2.1. ANR Blanc - PAVOIS (2012-2016)
    - 7.2.2. ANR INFRA 2011 - FAON (2012-2015)
    - 7.2.3. Equipex FIT - Future Internet (of Things)
    - 7.2.4. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)
    - 7.2.5. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)
    - 7.2.6. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)
    - 7.2.7. ANR ARPEGE - GRECO (2010-2013)
    - 7.2.8. S2S4HLS
    - 7.2.9. NANO2012 Program - RecMotifs (2008-2012)
    - 7.2.10. ANR Architectures du Futur Open-People (2009-2012)
    - 7.2.11. Images and Networks competitiveness cluster - 100GFlex project (2010-2013)
  - 7.3. International Initiatives
    - 7.3.1. Inria Associate Team LRS
    - 7.3.2. Inria International Partners
    - 7.3.3. CNRS PICS - SPiNaCH (2012-2014)
  - 7.4. International Research Visitors
    - 7.4.1. Visits of International Scientists
    - 7.4.2. Internships
- 8. Dissemination
  - 8.1. Scientific Animation
  - 8.2. Seminars and Invitations
  - 8.3. Teaching - Supervision - PhD Committee
    - 8.3.1. Teaching Responsibilities
    - 8.3.2. Teaching
    - 8.3.3. Supervision
  - 8.4. Popularization
- 9. Bibliography

**Keywords:** Hardware Accelerators, Compiling, Embedded Systems, Energy Consumption, Parallelism, Wireless Sensor Networks, Security, Signal Processing, Reconfigurable Hardware, Computer Arithmetic, System-On-Chip

CAIRN is a common project with CNRS, University of Rennes 1, and ENS Cachan-Antenne de Bretagne, and is located on two sites: Rennes and Lannion. The team was created on January 1<sup>st</sup>, 2008 and is a "reconfiguration" of the former R2D2 research team from Irisa.

Creation of the Project-Team: January 01, 2009.

## 1.
Members

#### **Research Scientists**

- François Charot [Research Associate (CR) Inria, Rennes]
- Steven Derrien [Professor, University of Rennes 1, ISTIC, on leave at Inria until Aug. 2012, Rennes, HdR]
- Daniel Menard [Associate Professor, University of Rennes 1, ENSSAT, Lannion, on leave at Inria until Aug. 2012, HdR]
- Olivier Sentieys [Team Leader, Professor, University of Rennes 1, ENSSAT, on secondment at Inria since Sep. 2012, Lannion, HdR]
- Arnaud Tisserand [Research Associate (CR) CNRS, Lannion, HdR]

#### **Faculty Members**

- Olivier Berder [Associate Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Emmanuel Casseau [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Daniel Chillet [Associate Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Antoine Courtay [Associate Professor, University of Rennes 1, ENSSAT, Lannion, since Sep. 2012]
- Sébastien Pillement [Associate Professor, University of Rennes 1, IUT, Lannion, until Aug. 2012, HdR]
- Matthieu Gautier [Associate Professor, University of Rennes 1, IUT, Lannion]
- Patrice Quinton [Professor, Director of the Brittany branch of ENS Cachan, Rennes, HdR]
- Romuald Rocher [Associate Professor, University of Rennes 1, IUT, Lannion]
- Pascal Scalart [Professor, University of Rennes 1, ENSSAT, Lannion, HdR]
- Christophe Wolinski [Professor, University of Rennes 1, Director of ESIR, Rennes, HdR]

#### **Engineers**

- Philippe Quémerais [Research Engineer (half time), University of Rennes 1, ENSSAT, Lannion]
- Arnaud Carer [100Gflex Project, Lannion]
- Raphaël Bardoux [FAON Project, Lannion]
- Mickaël Le Gentil [BoWI Project, Lannion]
- Remi Pallas [POF Project, Lannion]
- Nicolas Simon [DEFIS Project, Lannion]
- Maxime Naullet [IG Alma Project, Rennes]
- Vaibhav Bhatnagar [IC Inria SNOW Project, Lannion]

#### **PhD Students**

- Michel Thériault [CSRNG Canada grant (co-supervision with Laval University, Québec), Lannion]
- Antoine Morvan [Inria grant, Nano2012 project, Rennes]
- Jean-Charles Naud [Inria grant, Nano2012 project, Lannion]
- Matthieu Texier [CEA grant, Saclay]
- Thomas Chabrier [Brittany Region/CG22 University grant, Lannion]
- Robin Bonamy [University grant, ANR OpenPeople, Lannion]
- Vivek D. Tovinakere [University grant, ITEA Geodes, Lannion]
- Mahtab Alam [University grant, ITEA Geodes, Lannion]
- Amine Didioui [CEA grant, Grenoble]
- Hervé Yviquel [MENRT grant, Lannion]
- Aymen Chakhari [Brittany Region Inria grant, Lannion]
- Trong-Nhan Le [University grant, ANR Greco, Lannion]
- Pramod P. Udupa [University grant, FUI 100Gflex, Lannion]
- Ganda-Stéphane Ouedraogo [MENRT grant, Lannion]
- Karim Bigou [Inria/DGA grant, Lannion]
- Franck Bucheron [DGA grant, Rennes]
- Quang-Hai Khuat [Brittany Region/CG22 University grant, Lannion]
- Ali Hassan El Moussawi [University grant, FP7 Alma, Rennes]
- Christophe Huriaux [MENRT grant, Lannion]
- Quang-Hoa Le [University grant, FP7 FlexTiles, Lannion]
- Jérémie Métairie [CNRS grant, ANR Pavois, Lannion]
- Viet-Hoa Nguyen [University grant, BoWI project, Lannion]
- Zhongwei Zheng [University grant, BoWI project, Lannion]
- Antoine Floch [Inria grant, Nano2012 project, Rennes, until May 2012]
- Antoine Eiche [University grant, ANR Cifaer, Lannion, until May 2012]
- Andrei Banciu [CIFRE grant, STMicroelectronics, Grenoble, until Feb. 2012]
- Karthick Parashar [Inria Cordi grant, Lannion, until Aug. 2012]
- Naeem Abbas [Inria grant, ANR Biowic, Rennes, until May 2012]
- Le Quang Vinh Tran [MENRT grant, Lannion]
- Chenglong Xiao [Inria grant, Nano2012 project, Lannion, until Sep. 2012]
- Danuta Pamula [Co-tutelle France-Poland, Lannion, until Oct. 2012]

#### **Post-Doctoral Fellows**

- Tomofumi Yuki [Since Nov. 2012, Rennes]
- Ruifeng Zhang [Lannion]
- Mythri Alle [Rennes]
- Pascal Cotret [ATER ENSSAT since Sep. 2012, Lannion]
- Ammar El Falou [ATER ENSSAT since Sep. 2012, Lannion]

#### **Administrative Assistants**

- Nadia Saintpierre [Assistant, Inria, Rennes]
- Angélique Le Pennec [Assistant, University of Rennes 1, Enssat, Lannion]

# 2. Overall Objectives

## 2.1.
Overall Objectives

**Abstract:** The CAIRN project-team researches new architectures, algorithms and design methods for flexible and energy-efficient domain-specific systems-on-chip (SoC). As the performance and energy-efficiency requirements of SoCs continuously increase, they become difficult to fulfill using programmable processors alone. To address this issue, we advocate the use of reconfigurable hardware, i.e. hardware structures whose organization may change before or even during execution. Such reconfigurable SoCs offer high performance at a low energy cost, while preserving a high level of flexibility. The group studies these SoCs from three angles: (i) the invention and design of new reconfigurable platforms, with an emphasis on flexible arithmetic operator design, dynamic reconfiguration management and low power consumption; (ii) the development of their corresponding design flows (compilation and synthesis tools) to enable their automatic design from high-level specifications; (iii) the interaction between algorithms and architectures, especially for our main application domains (wireless communications, wireless sensor networks and digital security).

The scientific goal of the CAIRN group is to research new hardware architectures of Reconfigurable Systems-on-Chip (RSoC) along with their associated design flows. RSoC chips integrate reconfigurable blocks whose hardware structure may be adjusted before or even during program execution. They originate from the possibilities opened up by Field Programmable Gate Array (FPGA) technology and by reconfigurable processors [90], [100]. Recent evolutions in technology and modern hardware systems confirm that reconfigurable systems are increasingly used in recent applications or embedded into more general systems-on-chip (SoC) [105]. This architectural model has received a lot of attention in academia over the last decade [95], and is now considered for industrial use.
One reason is the rapidly changing standards in communications and information security, which require frequent device modifications. In many cases, software updates are not sufficient to keep devices on the market, while hardware redesigns remain too expensive. The need to continuously adapt the system to changing environments (e.g. cognitive radio) is another incentive to use dynamic reconfiguration at runtime. Last, with technologies at 65 nm and below, manufacturing problems strongly influence the electrical parameters of transistors, and transient errors caused by particles or radiation will also appear more and more often during execution: error detection and correction mechanisms or autonomic self-control can benefit from reconfiguration capabilities. Standard processors or systems-on-chip enable flexible software on fixed hardware, whereas reconfigurable platforms make possible *flexible software on flexible hardware*.

As chip density increases [117], power efficiency has become "the Grail" of all chip architects, be they designing circuits for portable devices or for high-performance general-purpose processors. Indeed, power (or energy) constraints are now as important as performance constraints. Moreover, this power issue can often only be addressed through the use of complete application-specific architectures, or by incorporating some application-specific components within a programmable SoC. Designers hence face a very difficult choice between the flexibility and short design time of programmable architectures and the power efficiency of specialized architectures.

In this context, reconfigurable architectures are acknowledged for providing the best trade-off between power, performance, cost and flexibility. This efficiency stems from the fact that their hardware structure can be adapted to the application requirements [116], [100].
However, designing reconfigurable systems poses several challenges: first, the definition of the architecture structure itself along with its dynamic reconfiguration capabilities, and then, its corresponding compilation/synthesis tools. The scientific goal of CAIRN is to leverage the background and past experience of its members to tackle these challenges. We therefore propose to approach energy-efficient reconfigurable architectures from three angles: (i) the invention of new reconfigurable platforms, (ii) the development of their corresponding design and compilation tools, and (iii) the exploration of the interaction between algorithms and architectures.

Wireless communication is our primary application domain, and it builds on our experience in 3G. Our research includes the prototyping of (subsets of) such applications on reconfigurable and programmable platforms. For this application domain, the high computational complexity of Next-Generation (4G) Wireless Communication Systems calls for the design of highly specialized high-performance architectures. In Wireless Sensor Networks (WSN), where each wireless node is expected to operate without battery replacement for significant periods of time, energy consumption is the most important constraint. In this context, our research focuses on energy-efficient architectures and wireless cooperative techniques for WSN and wireless transmission in Intelligent Transportation Systems (ITS). Other important fields such as automotive, digital security and multimedia processing are also considered.
Members of the CAIRN team have collaborations with large companies such as STMicroelectronics (Grenoble), Technicolor (Rennes), Thales (Paris), Alcatel (Lannion), France-Telecom Orange Labs (Lannion), Atmel (Nantes) and Xilinx (USA); with SMEs such as Geensys (Nantes), R-interface (Marseille), TeamCast/Ditocom (Rennes), Sensaris (Grenoble), Envivio (Rennes), InPixal (Rennes), Sestream (Paris) and Ekinops (Lannion); and with institutes such as DGA (Rennes) and CEA (Saclay, Grenoble). They are involved in several nationally or internationally funded projects (FP7 Alma, FP7 Flextiles, Nano2012 S2S4HLS and RECMOTIF projects, ANR-funded Pavois, Ardyt, Defis, Faon, Compa, Open-People, Greco, Ocelot, and the "Images&Networks Competitiveness Cluster"-funded 100Gflex).

## 2.2. Highlights of the Year

• Olivier Berder defended his "Habilitation à Diriger des Recherches (HDR)" thesis in 2012.

## 3. Scientific Foundations

#### 3.1. Panorama

The development of complex applications is traditionally split into three stages: a theoretical study of the algorithms, an analysis of the target architecture, and the implementation. When facing new emerging applications such as high-performance, low-power and low-cost mobile communication systems or smart sensor-based systems, it is mandatory to strengthen the design flow by a joint study of both algorithmic and architectural issues <sup>1</sup>.

Figure 1. CAIRN's general design flow and related research themes

Figure 1 shows the global design flow that we propose to develop. This flow is organized in levels which refer to our three research themes: application optimization (new algorithms, fixed-point arithmetic and advanced representations of numbers), architecture optimization (reconfigurable and specialized hardware, application-specific processors), and stepwise refinement and code generation (code transformations, hardware synthesis, compilation).

<sup>1</sup> Often referred to as algorithm-architecture mapping or interaction.
In the rest of this part, we briefly describe the challenges concerning **new reconfigurable platforms** in Section 3.2, the issues on **compiler and synthesis tools** related to these platforms in Section 3.3, and the remaining challenges in **algorithm architecture interaction** in Section 3.4.

## 3.2. Reconfigurable Architecture Design

Over the last two decades, there has been a strong push from the research community to evolve static programmable processors into run-time dynamically and partially reconfigurable (DPR) architectures. Several research groups around the world have hence proposed reconfigurable hardware systems operating at various levels of granularity. For example, functional-level reconfiguration has been proposed to increase the efficiency of programmable processors without incurring the penalties of FPGAs. These coarse-grained reconfigurable architectures (CGRAs) provide operator-level configurable functional blocks and word-level datapaths. The main goal of this class of architectures is to provide flexibility while minimizing reconfiguration overhead (several recent surveys exist on this topic [120], [104], [85], [125]). Compared to fine-grained architectures, CGRAs benefit from a massive reduction in configuration memory and configuration delay, as well as a considerable reduction in routing and placement complexity. This, in turn, improves the ratio of computation volume to energy cost, even if it comes at the price of a loss of flexibility compared to bit-level operations. Such constraints have been taken into account in the design of DART [100][12], CRIP [88], Adres [112] or others [122]. These works have led to commercial products such as the Extreme Processor Platform (XPP) [89] from PACT or Montium <sup>2</sup> from Recore systems.
Another strong trend is the design of hybrid architectures which combine standard GPP or DSP cores with arrays of *configurable elements* such as the Lx [103], or of *field-configurable elements* such as the Xirisc processor [110] and, more recently, commercial platforms such as the Xilinx Zynq-7000. Some of their benefits are the following: functionality on demand (set-top boxes for digital TV equipped with decoding hardware on demand), acceleration on demand (coprocessors that accelerate computationally demanding multimedia or communications applications), and shorter time-to-market (products that target ASIC platforms can be released earlier using reconfigurable hardware).

Dynamic reconfiguration enables an architecture to adapt itself to various incoming tasks. This requires complex resource management and control, which can be provided as services by a real-time operating system (RTOS) [111]: communication, memory management, task scheduling [99], [92][1] and task placement [19]. Such an Operating System (OS) based approach has many advantages: it provides a complete design framework that is independent of the technology and of the underlying hardware architecture, helping to drastically reduce the full platform design time. Because task execution is unpredictable, the OS must be able to allocate resources to tasks at run-time and must provide mechanisms to support inter-task communication. An efficient way to support such communications is to resort to a network-on-chip [118]. The role of the communication infrastructure is then to support transactions between different components of the platform, either between macro-components (main processor, dedicated modules, dynamically reconfigurable component) or within the elements of the reconfigurable components themselves.
In CAIRN we mainly target reconfigurable systems-on-chip (RSoC), defined as a set of computing and storing resources organized around a flexible interconnection network and integrated within a single silicon chip (or programmable chip such as an FPGA). The architecture is customized for an application domain, and the flexibility is provided by both hardware reconfiguration and software programmability. Computing resources are therefore highly heterogeneous and raise many issues that we discuss in the following:

- **Reconfigurable hardware blocks** with a dynamic behavior where reconfigurability can be achieved at the bit- or operator-level. Our research aims at defining new reconfigurable architectures including computing and memory resources. Since reconfiguration must happen as fast as possible (typically within a few cycles), reducing the configuration time overhead is also a key issue.
- When performance and power consumption are major constraints, it is acknowledged that optimized specialized hardware blocks (often called IPs for Intellectual Properties) are the best (and often the only) solution. Therefore, we also study architecture and tools for specialized hardware accelerators and for multi-mode components.
- Customized **processors with a specialized instruction-set** also offer a viable solution to trade between energy efficiency and flexibility. They are particularly relevant for modern FPGA platforms where many processor cores can be embedded. For this topic, we focus on the automatic generation of heterogeneous (sequential or parallel) reconfigurable processor extensions that are tightly coupled to processor cores.

<sup>2</sup> http://www.recoresystems.com/technology/montium-technology

## 3.3. Compilation and Synthesis for Reconfigurable Platforms

In spite of their advantages, reconfigurable architectures lack efficient and standardized compilation and design tools.
As of today, this still makes the technology impractical for large-scale industrial use. Generating and optimizing the mapping from high-level specifications to reconfigurable hardware platforms is therefore a key research issue, and the problem has received considerable interest in recent years [115], [91], [121], [124]. In the meantime, the complexity (and heterogeneity) of these platforms has also been increasing quite significantly, with complex heterogeneous multi-core architectures becoming a de facto standard. As a consequence, the focus of designers is now geared toward optimizing overall system-level performance and efficiency [106], [115], [114]. Here again, existing tools are not well suited, as they fail to provide a unified programming view of the programmable and/or reconfigurable components implemented on the platform.

In this context we have been pursuing our efforts to propose tools whose design principles are based on a tight coupling between the compiler and the target hardware architectures. We build on the expertise of the team members in High Level Synthesis (HLS) [8], ASIP optimizing compilers [15] and automatic parallelization for massively parallel specialized circuits [6]. We first study how to increase the efficiency of standard programmable processors by extending their instruction set to speed up compute-intensive kernels. Our focus is on efficient and exact algorithms for the identification, selection and scheduling of such instructions [9]. We also propose techniques to synthesize reconfigurable (or multi-mode) architectures. We address these challenges by borrowing techniques from high-level synthesis, optimizing compilers and automatic parallelization, especially when dealing with nested loop kernels. The goal is then to derive a custom fine-grain parallel architecture and/or the configuration of a Coarse Grain Reconfigurable Architecture (CGRA).
In addition, and independently of the scientific challenges mentioned above, proposing such flows also poses significant software engineering issues. As a consequence, we also study how leading-edge Object Oriented software engineering techniques (Model Driven Engineering) can help the Computer Aided Design (CAD) and optimizing compiler communities prototype new research ideas.

Efficient implementation of multimedia and signal processing applications (in software for DSP cores or as special-purpose hardware) often requires, for reasons related to cost, power consumption or silicon area constraints, the use of fixed-point arithmetic, whereas the algorithms are usually specified in floating-point arithmetic. Unfortunately, fixed-point conversion is very challenging and time-consuming, typically demanding up to 50% of the total design or implementation time [93]. Thus, tools are required to automate this conversion. For hardware or software implementation, the aim is to optimize the fixed-point specification. The implementation cost is minimized under a numerical accuracy or an application performance constraint. For DSP-software implementation, methodologies have been proposed [108], [113] to achieve a conversion leading to ANSI-C code with integer data types. For hardware implementation, the best results are obtained when the word-length optimization process is coupled with high-level synthesis [107], [96].

Evaluating the effects of finite precision is one of the major, and often the most time-consuming, steps of fixed-point refinement. Indeed, in the word-length optimization process, the numerical accuracy is evaluated each time a new word-length is tested, and thus several times per iteration of the optimization process. Classical approaches are based on fixed-point simulations [97], [119]. They lead to long evaluation times and cannot be used to explore the entire design space.
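The contrast between simulation-based and analytical accuracy evaluation can be illustrated with a minimal sketch. It is not the team's framework: the function names are hypothetical, the input is assumed to be uniformly distributed in [-1, 1], and the analytical side uses only the textbook uniform white-noise model, in which rounding to a step q = 2^(-frac_bits) produces a noise power of q²/12.

```python
import random


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def simulated_noise_power(frac_bits: int, n_samples: int = 100_000, seed: int = 0) -> float:
    """Estimate the quantization noise power by simulation (cost grows with n_samples)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)          # assumed input distribution
        err = quantize(x, frac_bits) - x    # error introduced by rounding
        total += err * err
    return total / n_samples


def analytical_noise_power(frac_bits: int) -> float:
    """Closed-form noise power under the uniform white-noise model: q**2 / 12."""
    step = 2.0 ** -frac_bits
    return step * step / 12.0


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(bits, simulated_noise_power(bits), analytical_noise_power(bits))
```

The simulated estimate has to be recomputed over many samples for every candidate word-length, whereas the closed-form value is obtained in constant time, which is what makes analytical evaluation attractive inside an optimization loop.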
Therefore, our aim is to propose closed-form expressions of errors due to fixed-point approximations that are used by a fast analytical framework for accuracy evaluation.

## 3.4. Interaction between Algorithms and Architectures

As CAIRN mainly targets domain-specific systems-on-chip including reconfigurable capabilities, algorithmic-level optimizations have great potential to improve the efficiency of the overall system. Based on the skills and experience in "signal processing and communications" of some of CAIRN's members, we conduct research on algorithmic optimization techniques under two main constraints: energy consumption and computation accuracy; and for two main application domains: fourth-generation (4G) mobile communications and wireless sensor networks (WSN). These application domains are very conducive to our research activities. The high complexity of the first one and the stringent power constraint of the second one require the design of specific high-performance and energy-efficient SoCs. We also consider other applications such as video or bioinformatics, but this short state-of-the-art will be limited to wireless applications.

The radio, in both transmit and receive modes, accounts for the bulk of the total power consumption of the system. Therefore, protocol optimization is one of the main sources of significant energy reduction on the way to self-powered autonomous systems. Reducing the power due to radio communications can be achieved through two complementary objectives: (i) minimizing the output transmit power while maintaining sufficient wireless link quality and (ii) minimizing useless wake-up and channel listening while still being reactive.

As the physical layer affects all higher layers in the protocol stack, it plays an important role in the energy-constrained design of WSNs. The question to answer can be summarized as: how much signal processing can be added to decrease the transmission energy (i.e.
the output power level at the antenna) such that the global energy consumption is decreased? The temporal and spatial diversity of relay and multiple antenna techniques are very attractive due to their simplicity and their performance for wireless transmission over fading channels. Cooperative MIMO (multiple-input and multiple-output) techniques were first studied in [101], [109] and have shown their efficiency in terms of energy consumption [98]. Our research aims at finding new energy-efficient cooperative protocols associating distributed MIMO with opportunistic and/or multiple relays and considering wireless channel impairments such as transmitter desynchronization.

Another way to reduce the energy consumption consists in decreasing the radio activity, which is controlled by the medium access control (MAC) layer protocols. In this regard, low duty-cycle protocols, such as preamble-sampling MAC protocols, are very efficient because they improve the lifetime of the network by reducing unnecessary energy waste [87]. As the network parameters (data rate, topology, etc.) can vary, we propose new adaptive MAC protocols to avoid overhearing and idle listening.

Finally, MIMO precoding is now recognized as a very interesting technique to enhance the data rate in wireless systems, and is already used in the WiMAX standard (802.16e). This technique can also be used to reduce transmission energy for the same transmission reliability and the same throughput requirement. One of the most efficient precoders is based on the maximization of the minimum Euclidean distance (max-$d_{\min}$) between two received data vectors [94], but it is difficult to define the closed form of the optimized precoding matrix for large MIMO systems with high-order modulations. Our goal is to derive new generic precoders with simple expressions depending only on the channel angle and the modulation order.

# 4. Application Domains

#### 4.1.
Panorama

**keywords:** telecommunications, wireless communications, wireless sensor networks, content-based image retrieval, video coding, intelligent transportation systems, automotive, security

Our research is based on realistic applications, in order both to discover the main needs created by these applications and to invent realistic and interesting solutions.

The high complexity of the **Next-Generation (4G) Wireless Communication Systems** leads to the design of real-time high-performance specific architectures. The study of these techniques is one of the main fields of application for our research, based on our experience with WCDMA for 3G implementation.

In **Wireless Sensor Networks** (WSN), where each wireless node has to operate without battery replacement for a long time, energy consumption is the most important constraint. In this domain, we mainly study energy-efficient architectures and wireless cooperative techniques for WSN.

**Intelligent Transportation Systems** (ITS), and especially Automotive Systems, increasingly adopt technological advances. While wireless transmissions allow a car to communicate with another car or even with road infrastructure, the **automotive industry** can also propose driver assistance and more secure vehicles thanks to improvements in computation accuracy for embedded systems.

Other important fields will also be considered: hardware cryptographic and security modules, specialized hardware systems for the filtering of network traffic at high speed, high-speed true-random number generation for security, content-based image retrieval and video processing.

## 4.2. 4G Wireless Communication Systems

With the advent of the next generation (4G) broadband wireless communications, the combination of MIMO (Multiple-Input Multiple-Output) wireless technology with Multi-Carrier CDMA (MC-CDMA) has been recognized as one of the most promising techniques to support high data rate and high performance.
Moreover, future mobile devices will have to offer interoperability between wireless communication standards (4G, WiMAX, ...) and therefore implement MIMO pre-coding, which is already used by the WiMAX standard. Finally, in order to maximize mobile device lifetime and guarantee quality of service to consumers, 4G systems will certainly use cooperative MIMO schemes or MIMO relays. Our research activity focuses on MIMO pre-coding and MIMO cooperative communications with the aim of algorithmic optimization and implementation prototyping.

## 4.3. Wireless Sensor Networks

Sensor networks are a very dynamic domain of research due, on the one hand, to the opportunity to develop innovative applications that are linked to a specific environment, and on the other hand to the challenge of designing totally autonomous communicating objects. Cross-layer optimizations lead to energy-efficient architectures and cooperative techniques dedicated to sensor network applications. In particular, cooperative MIMO techniques are used to decrease the energy consumption of the communications.

## 4.4. Multimedia processing

In multimedia applications, audio and video processing is the major challenge embedded systems have to face. It is computationally intensive and has power requirements to meet. Video or image processing at pixel level (image filtering, edge detection, pixel correlation) or at block level (transforms, quantization, entropy coding, motion estimation) has to be accelerated. We investigate the potential of reconfigurable architectures for the design of efficient and flexible accelerators in the context of multimedia applications.

# 5. Software

#### 5.1. Panorama

With the ever-rising complexity of embedded applications and platforms, the need for efficient and customizable compilation flows is stronger than ever.
This need for flexibility is even stronger when it comes to research compiler infrastructures that are necessary to gather quantitative evidence of the performance/energy or cost benefits obtained through the use of reconfigurable platforms. From a compiler point of view, the challenges exposed by these complex reconfigurable platforms are quite significant, since they require the compiler to extract and to expose a large amount of coarse- and/or fine-grain parallelism, and to take complex resource constraints into consideration while providing efficient memory hierarchy and power management.

Figure 2. CAIRN's general software development framework.

Because they are geared toward industrial use, production compiler infrastructures do not offer the level of flexibility and productivity that is required for compiler and CAD tool prototyping. To address this issue, we have designed an extensible source-to-source compiler infrastructure that takes advantage of leading-edge model-driven object-oriented software engineering principles and technologies.

Figure 2 shows the global framework that is being developed in the group. Our compiler flow mixes several types of intermediate representations. The baseline representation is a simple tree-based model enriched with control flow information. This model is mainly used to support our source-to-source flow, and serves as the backbone for the infrastructure. We use the extensibility of the framework to provide more advanced representations along with their corresponding optimizations and code generation plug-ins. For example, for our pattern selection and accuracy estimation tools, we use a data dependence graph model in all basic blocks instead of the tree model. Similarly, to enable polyhedral-based program transformations and analysis, we introduced a specific representation for affine control loops that we use to derive a Polyhedral Reduced Dependence Graph (PRDG).
Our current flow assumes that the application is specified as a system-level hierarchy of communicating tasks, where each task is expressed using C (or Scilab in the near future), and where the system-level representation and the target platform model are defined using Domain Specific Languages (DSL).

**Gecos** (Generic Compiler Suite) is the main backbone of CAIRN's flow. It is an open-source Eclipse-based flexible compiler infrastructure developed for fast prototyping of complex compiler passes. Gecos is a 100% Java-based implementation and relies on modern software engineering practices such as Eclipse plugins and model-driven software engineering with EMF (Eclipse Modeling Framework). As of today, our flow offers the following features:

- An automatic floating-point to fixed-point conversion flow (for HLS and embedded processors). ID.Fix is an infrastructure for the automatic transformation of software code aiming at the conversion of floating-point data types into a fixed-point representation. http://idfix.gforge.inria.fr.
- A polyhedral-based loop transformation and parallelization engine (mostly targeted at HLS). http://gecos.gforge.inria.fr. It was used for source-to-source transformations in the context of Nano2012 projects in collaboration with STMicroelectronics.
- A custom instruction extraction flow (for ASIP and dynamically reconfigurable architectures). Durase and UPaK are developed for the compilation and the synthesis targeting reconfigurable platforms and the automatic synthesis of application-specific processor extensions. They use advanced technologies, such as graph matching and graph merging together with constraint programming methods.
- Several back-ends to enable the generation of VHDL for specialized or reconfigurable IPs, and SystemC for simulation purposes (e.g. fixed-point simulations).

#### **5.2.
Gecos**

**Participants:** Steven Derrien [corresponding author], Nicolas Simon, Maxime Naullet, Antoine Floc'h, Antoine Morvan, Clément Guy.

Keywords: source-to-source compiler, model-driven software engineering, retargetable compilation.

The Gecos (Generic Compiler Suite) project is a source-to-source compiler infrastructure developed in the CAIRN group since 2004. It was designed to enable fast prototyping of program analyses and transformations, and it targets the hardware synthesis and retargetable compilation domains.

Gecos is 100% Java based and takes advantage of modern model-driven software engineering practices. It uses the Eclipse Modeling Framework (EMF) as an underlying infrastructure and benefits from its features to remain easily extensible. Gecos is open-source and is hosted on the Inria gforge at http://gecos.gforge.inria.fr.

The Gecos infrastructure is still under very active development, and serves as a backbone infrastructure for projects of the group (project S2S4HSL, ID.FIX). Part of the framework is jointly developed with Colorado State University, and since 2012 it has been used in the context of the ALMA European project.

Developments in Gecos in 2012 have mostly focused on the polyhedral loop transformation engine and its use for hardware synthesis. As a part of the ALMA project, significant efforts are also being made to develop a coarse-grain parallelization engine targeting a distributed memory machine model.

## 5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems

Participants: Daniel Menard, Olivier Sentieys [corresponding author], Romuald Rocher, Nicolas Simon.

Keywords: fixed-point arithmetic, source-to-source code transformation, accuracy optimization, dynamic range evaluation

The different techniques proposed by the team for fixed-point conversion are implemented on the ID.Fix infrastructure.
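As a rough illustration of what a floating-point to fixed-point conversion involves, the following minimal sketch rounds floating-point samples to a signed fixed-point format and measures the resulting quantization noise power. It is not ID.Fix code: the function names `quantize` and `noise_power`, the bit-width convention and the test signal are all assumptions made for this example. Under the usual uniform-error model for rounding, the measured noise power should be close to $q^2/12$, where $q = 2^{-f}$ is the quantization step for $f$ fractional bits.

```python
import random


def quantize(x, int_bits, frac_bits):
    """Round x to a signed fixed-point value, with saturation.

    Hypothetical convention: int_bits counts the integer bits including
    the sign bit, frac_bits counts the fractional bits.
    """
    step = 2.0 ** -frac_bits
    lowest = -(2.0 ** (int_bits - 1))
    highest = 2.0 ** (int_bits - 1) - step
    rounded = round(x / step) * step
    return min(max(rounded, lowest), highest)


def noise_power(samples, int_bits, frac_bits):
    """Mean squared error between the samples and their quantized values."""
    errors = [quantize(s, int_bits, frac_bits) - s for s in samples]
    return sum(e * e for e in errors) / len(errors)


random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
for frac_bits in (4, 8, 12):
    measured = noise_power(samples, int_bits=2, frac_bits=frac_bits)
    model = (2.0 ** -frac_bits) ** 2 / 12.0
    print(frac_bits, measured, model)
```

A word-length optimization would search over such formats for the cheapest one that keeps the noise power below an accuracy constraint; simulation as above is the slow baseline that analytical accuracy evaluation aims to avoid.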
The application is described in C code using floating-point data types and different pragmas, which specify parameters (dynamic range, input/output word-length, delay operations) for the fixed-point conversion. The tool determines and optimizes the fixed-point specification and then generates C code using fixed-point data types (ac\_fixed) from Mentor Graphics. The infrastructure is made up of two main modules corresponding to the fixed-point conversion (ID.Fix-Conv) and the accuracy evaluation (ID.Fix-Eval).

The developments carried out in 2012 allowed us to obtain a fixed-point conversion tool handling functions, conditional structures and repetitive structures having a fixed number of iterations. New optimization algorithms have been added. A simulator has been created to verify the results from our analytical approach. For the accuracy evaluation (Acc.Eval), conditional structures and correlation between noise sources have been considered. Some optimizations have been implemented to reduce the computing time, and the treatment of the division operator has been integrated. A tutorial has also been created to explain how to install and use this tool. The development of this tool has been carried out by a University of Rennes graduate engineer since November 2011 in the context of the DEFIS ANR project, and by different students during their training periods.

# **5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems**

Participants: Christophe Wolinski [corresponding author], François Charot, Antoine Floc'h.

Keywords: compilation for reconfigurable systems, pattern extraction, constraint-based programming.

We are developing (in close collaboration with Lund University, Sweden and Queensland University, Australia) UPaK, the Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems [123].
The preliminary experimental results obtained by the UPaK system show that the methods employed in the system enable a high coverage of application graphs with a small number of patterns. Moreover, high application execution speed-ups are obtained, both for sequential and parallel application execution with processor extensions implementing the selected patterns.

UPaK is one of the bases for our research on compilation and synthesis for reconfigurable platforms. It is based on the HCDG representation of the Polychrony software designed at Inria-Rennes in the project-team Espresso.

# **5.5. DURASE:** Automatic Synthesis of Application-Specific Processor Extensions

Participants: Christophe Wolinski [corresponding author], François Charot, Antoine Floc'h.

Keywords: compilation for reconfigurable systems, instruction-set extension, pattern extraction, graph covering, constraint-based programming.

We are developing a framework enabling the automatic synthesis of application-specific processor extensions. It uses advanced technologies, such as algorithms for graph matching and graph merging together with constraint programming methods. The framework is organized around several modules.

- CoSaP: Constraint Satisfaction Problem. The goal of CoSaP is to decouple the statement of a constraint satisfaction problem from the solver used to solve it. The CoSaP model is an Eclipse plugin described using EMF to take advantage of the automatic code generation and of various EMF tools.
- HCDG: Hierarchical Conditional Dependency Graph. HCDG is an intermediate representation mixing control and data flow in a single acyclic representation. The control flow is represented as hierarchical guards specifying the execution or the definition conditions of nodes. It can be used in the Gecos compilation framework via a specific pass which translates a CDFG representation into an HCDG.
- Patterns: Flexible tools for the identification of computational patterns in a graph and for graph covering.
These tools model the concept of a pattern in a graph and provide generic algorithms for the identification of patterns and the covering of a graph. The following sub-problems are addressed: (sub)graph isomorphism, pattern generation under constraints, and covering of a graph using a library of patterns. Most of the implemented algorithms use constraint programming and rely on the CoSaP module to solve the optimization problem.

# **5.6.** PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)

Participants: Olivier Sentieys [corresponding author], Olivier Berder, Arnaud Carer, Steven Derrien.

Keywords: Wireless Sensor Networks, Low Power, Preamble Sampling MAC Protocol, Hardware and Software Platform

PowWow is an open-source hardware and software platform designed to handle wireless sensor network (WSN) protocols and related applications. Based on an optimized preamble-sampling medium access (MAC) protocol, geographical routing and a protothread library, PowWow requires a lighter hardware system than Zigbee [86] to run (memory usage including the application is less than 10kb). Therefore, network lifetime is increased and price per node is significantly decreased.

CAIRN's hardware platform (see Figure 3) is composed of:

- The motherboard, designed to reduce the power consumption of sensor nodes, embeds an MSP430 microcontroller and all the components needed to process the PowWow protocol except the radio chip. JTAG, RS232, and I2C interfaces are available on this board.
- The radio chip daughter board is currently based on a TI CC2420.
- The coprocessing daughter board includes a low-power FPGA which allows for hardware acceleration of some PowWow features and also includes dynamic voltage scaling features to increase power efficiency. The current version of PowWow integrates an Actel IGLOO AGL250 FPGA and a programmable DC-DC converter.
We have shown that gains in energy of up to 700 can be obtained by using FPGA acceleration on functions like CRC-32 or error detection, compared with a software implementation on the MSP430.
- Finally, a last daughter board is dedicated to energy harvesting techniques. Based on the LTC3108 energy management component from Linear Technology, the board can be configured with several types of energy storage (batteries, micro-batteries, super-capacitors) and several types of energy sources (a small solar panel to recover photovoltaic energy, a piezoelectric sensor for mechanical energy and a Peltier thermal energy sensor).

The PowWow distribution also includes a generic software architecture using event-driven programming and organized into protocol layers (PHY, MAC, LINK, NET and APP). The software is based on Contiki [102], and more precisely on the Protothread library, which provides a sequential control flow without complex state machines or full multi-threading.

Figure 3. CAIRN's PowWow motherboard with radio and energy-harvesting boards connected

To optimize the network for a particular application and to define a global strategy to reduce energy, PowWow offers the following extra tools: over-the-air reprogramming (and soon reconfiguration), analytical power estimation based on software profiling and power measurements, and a dedicated network analyzer to probe and fix transmission errors in the network.

More information can be found at http://powwow.gforge.inria.fr.

# 5.7. SoCLib: Open Platform for Virtual Prototyping of Multi-Processors System on Chip

Participants: François Charot [corresponding author], Laurent Perraudeau.

Keywords: SoC modeling, SystemC simulation model

SoCLib is an open platform for virtual prototyping of multi-processor systems on chip (MP-SoC) developed in the framework of the SoCLib ANR project.
The core of the platform is a library of SystemC simulation models for virtual components (IP cores), with a guaranteed path to silicon. All simulation models are written in SystemC, and can be simulated with the standard SystemC simulation environment distributed by the OSCI organization. Two types of models are available for each IP core: CABA (Cycle Accurate / Bit Accurate), and TLM-DT (Transaction Level Modeling with Distributed Time). All simulation models are distributed as free software.

We have developed the simulation models of the Nios II processor, of the Altera Avalon interconnect, and of the TMS320C62 DSP processor from Texas Instruments. More information is available on the dedicated web page: http://www.soclib.fr.

# 6. New Results

## 6.1. Reconfigurable Architecture Design

## 6.1.1. Reconfiguration Controller

Participants: Robin Bonamy, Daniel Chillet, Sébastien Pillement.

Dynamically reconfigurable architectures, which can offer high performance, are increasingly used in different domains. Unfortunately, many applications cannot benefit from this paradigm because of the large timing overhead of reconfiguration. Even for partial reconfiguration, modifying a small region of an FPGA takes a few milliseconds using the 14.5 MB/s IP from Xilinx, which is based on an embedded MicroBlaze processor. To cope with this problem, we have developed an ultra-fast power-aware reconfiguration controller (UPaRC) that boosts the reconfiguration throughput up to 1.433 GB/s. UPaRC can not only enhance the system performance, but also auto-adapt to various performance and consumption conditions. This could enlarge the range of supported applications and optimize the power-timing trade-off of the reconfiguration phase for each selected application at run-time. UPaRC is up to 45 times more energy-efficient than state-of-the-art reconfiguration controllers [66].

## 6.1.2. Low-Power Reconfigurable Arithmetic Operators

Participants: Vivek D.
Tovinakere, Olivier Sentieys, Arnaud Tisserand.

Arithmetic operators with fixed input data sizes are a source of unnecessary power consumption when data of lower precision have to be processed for a significant amount of time. Configuring the arithmetic operator for lower precision when adequate, and suppressing standby power in unused logic gates of the circuit, can reduce power consumption. In this work, a logic clustering approach to partition arithmetic circuits as a function of reconfigurable input data widths is presented. Unused clusters at a specific precision are power-gated to achieve aggressive reduction of leakage power, which is a significant source of power consumption in nanoscale technologies. Application of this method to two types of 32-bit adders, reconfigurable to four data precisions in 65nm CMOS technology, shows a possible reduction in power consumption by a factor of 8 to 13 with an area overhead of 15% and 9.2% respectively. The variation of energy savings with respect to the standby time of unused logic and the frequency of precision adaptation was also analyzed.

### 6.1.3. Ultra-Low-Power Reconfigurable Controllers

Participants: Vivek D. Tovinakere, Olivier Sentieys, Steven Derrien.

Most digital systems use controllers based on a finite state machine (FSM) and datapath model. For specific control tasks, this model gives an energy-efficient ASIC-like implementation compared to a microcontroller. This is especially true when the controller is required to execute a pre-specified task flow graph consisting of several basic tasks, in applications like wireless sensor network (WSN) nodes. Previously, design flows were proposed to generate FSMs along with datapaths for tasks specified at a high level of abstraction and then combine them with a scheduler to realize the overall controller.
The generated controller was found to be more efficient than its microcontroller counterpart by over two orders of magnitude in the energy-per-operation metric, but a significant limitation of such controllers is their lack of flexibility. In this work, flexible controllers based on reconfigurable FSMs are considered, at the expense of hardware area. We propose scalable architectures for reconfigurable FSMs based on lookup tables (LUTs), whose complexity may be parameterized by a high-level specification of the number of states, primary inputs and outputs of an FSM. Power gating is used as a low-power technique to achieve aggressive leakage power reduction by shutting off power to the parts of the logic that are unused at any given time. It is well known that in nanoscale CMOS circuits, the increase in static power density as a cost far exceeds the impact of area due to increased logic integration. The feedback and feedforward structures of an FSM are exploited to reduce programmable interconnections, a key issue in reconfigurable logic like FPGAs. Power estimation results show good performance of the proposed architectures on different metrics when compared with other solutions in the design space of controllers for WSN nodes.

## 6.1.4. Models for Dynamically Reconfigurable Systems

## 6.1.4.1. Power Models

Participants: Robin Bonamy, Daniel Chillet, Olivier Sentieys.

Including a reconfigurable area in a heterogeneous system-on-chip is considered an interesting solution to reduce area and increase performance. But the key challenge in the context of embedded systems is currently the power budget of the system, and designers need early estimations of the power consumption of their system. Power estimation for reconfigurable systems is a difficult problem because several parameters need to be taken into account to define an accurate model.
In this work, we considered dynamic reconfiguration, which makes it possible to partially reconfigure a specific part of the circuit while the rest of the system is running. This technique has two main effects on power consumption. First, thanks to the area sharing ability, the global size of the device can be reduced and the static (leakage) power consumption can thus also be reduced. Secondly, it is possible to delete the configuration of a part of the device, which reduces the dynamic power consumption when a task is no longer used. We have defined several models of power consumption for the dynamic reconfiguration on a Virtex 5 board and a first model of the power consumption of the reconfiguration itself. This model shows that the power consumption depends not only on the bitstream file size but also on the content of the reconfiguration region. Finally, three models of the partial and dynamic reconfiguration with different complexity/accuracy tradeoffs are extracted [52].

## 6.1.4.2. High-Level Modeling of Reconfigurable Architectures

Participants: Robin Bonamy, Daniel Chillet.

To model complex multiprocessor SoCs, the Architecture Analysis & Design Language (AADL) has been adopted. We have proposed an extension of AADL towards reconfigurable systems to support power consumption and dynamic reconfiguration modeling. As different power/energy/time/cost tradeoffs can be achieved for a given application, we proposed to represent as Pareto frontiers the set of values of power/energy vs. execution time or cost to model the execution of an application on the reconfigurable system. These Pareto frontiers are computed from analysis functions which extract and combine component characteristics from AADL models. These functions, developed in OCL (Object Constraint Language), are well suited for design space exploration, and they can be used to extract the energy/power properties from the model in order to compute and verify the user's constraints.
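The notion of a Pareto frontier used above can be illustrated with a short sketch. It is written in Python rather than OCL, the function name `pareto_front` is hypothetical, and the (execution time, energy) design points are made-up values, not results extracted from the AADL models. A design point is kept only if no other point is at least as good on both criteria and strictly better on one of them.

```python
def pareto_front(points):
    """Return the non-dominated (time, energy) points, both to be minimized."""
    front = []
    best_energy = float("inf")
    # After sorting by time (then energy), a point is non-dominated exactly
    # when its energy is strictly lower than that of every earlier point.
    for time, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front


# Hypothetical implementations of one application: (time in ms, energy in mJ).
designs = [(10.0, 5.0), (12.0, 3.5), (11.0, 6.0),
           (15.0, 3.5), (20.0, 1.0), (10.0, 7.0)]
print(pareto_front(designs))
```

Each point on the resulting frontier corresponds to one time/energy tradeoff; a user constraint such as a deadline then amounts to selecting the lowest-energy frontier point whose time does not exceed it.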
To complete these levels of description, we started the development of techniques for constraint verification. These developments are based on the OCL language, which allows one to extract characteristics from the AADL model, compute mathematical expressions and finally verify mathematical constraints. These verifications have been developed for power and energy consumption. They include static and dynamic power estimation, the power consumption during the dynamic reconfiguration process and the reconfiguration speed. They handle all energy/power parameters related to reconfigurable architectures for an energy estimation of a complete application and heterogeneous system.

We are currently working on the link between the design space exploration explained in the previous section and the AADL models developed in collaboration with the LEAT laboratory, to be included in the Open-People Platform [27], [54], [76], [71].

#### 6.1.5. Fault-Tolerant Reconfigurable Architectures

Participants: Sébastien Pillement, Manh Pham, Stanislaw Piestrak [Univ. Metz].

For the implementation of complex systems, reconfigurable FPGA circuits are now part of the mainstream thanks to their flexibility, performance and high number of integrated resources. FPGAs are entering new fields of application such as aeronautics, military, automotive or confined control thanks to their ability to be remotely updated. However, these fields of application correspond to harsh environments (cosmic radiation, ionizing radiation, electromagnetic noise) with high fault-tolerance requirements. We proposed a complete framework to design reconfigurable architectures supporting fault-mitigation schemes. The proposed framework enables simulation and validation of mitigation operations, as well as the scaling of architecture resources. The proposed model was validated through a physical implementation of the fault-tolerant reconfigurable platform.
Results have shown the effectiveness of the framework [39] and confirmed the potential of dynamically reconfigurable architectures for supporting fault-tolerance in embedded systems.

## 6.1.6. Low-Power Architectures

#### 6.1.6.1. Wakeup Time and Wakeup Energy Estimation in Power-Gated Logic Clusters

Participants: Olivier Sentieys, Vivek D. Tovinakere.

Run-time power gating for aggressive leakage reduction has brought into focus the cost of mode transition overheads due to frequent switching between sleep and active modes of circuit operation. In order to design circuits for effective power gating, logic circuits must be characterized for the overheads they present during mode transitions. We have proposed a method to determine the steady-state virtual-supply voltage in active mode and hence present a model for the virtual-supply voltage in terms of basic circuit parameters. Further, we derived expressions for the estimation of two mode transition overheads, wakeup time and wakeup energy, for a power-gated logic cluster using the proposed model. Experimental results of the application of the model to ISCAS85 benchmark circuits show that wakeup time may be estimated within a low average error across a large variation in sleep transistor sizes and in circuit sizes, with a significant speedup in computation time compared to transistor-level circuit simulations [73].

## 6.1.7. Arithmetic Operators for Cryptography

**Participants:** Arnaud Tisserand, Emmanuel Casseau, Thomas Chabrier, Danuta Pamula, Karim Bigou, Franck Bucheron, Jérémie Métairie.

#### 6.1.7.1. Arithmetic Operators for Fast and Secure Cryptography

Electrical activity variations in a circuit are one of the sources of information leakage exploited in side channel attacks. In [65], we present $\mathbb{F}_{2^m}$ finite-field multipliers with reduced activity variations for asymmetric cryptography. Useful activity of typical multiplication algorithms is evaluated.
The results show pronounced shapes, which can be used as a small source of information leakage. We propose modified multiplication algorithms and architectures to reduce useful activity variations. Useful activity has been evaluated using accurate FPGA emulation and activity counters at every operation cycle. Measurement analysis shows that the implemented multiplication algorithms (classical, Montgomery and Mastrovito) lead to specific shapes for the curve of activity variations which may be used as a small source of information leakage for some side channel attacks. We proposed modifications of selected $\mathbb{F}_{2^m}$ multipliers to reduce this information leakage source at two levels: at the architecture level, by removing activity peaks due to control (e.g. reset at initialization), and at the algorithmic level, by modifying the shape of the activity variations curve. Because these optimizations are applied at a very low level, there is no significant area and delay overhead.

Paper [64] presents an overview of the most interesting $\mathbb{F}_{2^m}$ multiplication algorithms and proposes efficient hardware solutions applicable to elliptic curve cryptosystems. It focuses on fields of size m=233, one of the sizes recommended by NIST (National Institute of Standards and Technology). We analyze the most popular algorithms used for multiplication over finite fields, suggest efficient hardware solutions and point out the advantages and disadvantages of each algorithm. The article overviews and compares classical, Mastrovito and Montgomery multipliers. The hardware solutions presented implement modified versions of these multipliers to improve efficiency. Moreover, we try to present a fair comparison with existing solutions. The designs presented are targeted at FPGA devices.

#### 6.1.7.2. ECC Processor with Protections Against SCA

A dedicated processor for elliptic curve cryptography (ECC) is under development.
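As a software reference for the $\mathbb{F}_{2^m}$ multiplication that such operators compute, the sketch below implements the classical shift-and-add method with interleaved reduction, using Python integers as bit vectors of polynomial coefficients. It is an illustration only and does not reflect the hardware architectures of [64], [65]; the function name `gf2m_mul` and the default parameters are assumptions of this example. The reduction polynomial $x^{233} + x^{74} + 1$ is the one specified by NIST for its 233-bit binary fields.

```python
M = 233
POLY = (1 << 233) | (1 << 74) | 1  # x^233 + x^74 + 1


def gf2m_mul(a, b, m=M, poly=POLY):
    """Multiply a and b in F_2[x] / poly; operands are ints below 2**m."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in F_2[x] is a bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if (a >> m) & 1:
            a ^= poly        # reduce as soon as the degree reaches m
    return result


# x^232 * x = x^233, which reduces to x^74 + 1.
print(hex(gf2m_mul(1 << 232, 2)))
```

The loop branches on every bit of `b`, so the amount of work depends on the operand; this sketch therefore makes no attempt at side-channel resistance and only serves to check functional results.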
Functional units for arithmetic operations in $\mathbb{F}_{2^m}$ and $\mathbb{F}_p$ finite fields and 160–600-bit operands have been developed for FPGA implementation. Several protection methods against side channel attacks (SCA) have been studied. The use of some number systems, especially very redundant ones, allows one to change the way some computations are performed and thus their effects on side channel traces.

### 6.1.8. 3D Heterogeneous SoC Design

**Participants:** Quang-Hai Khuat, Hoa Le, Sébastien Pillement, Emmanuel Casseau, Antoine Courtay, Daniel Chillet, Olivier Sentieys.

A three-dimensional system-on-chip is an SoC in which two or more layers of dies are stacked vertically into a single circuit and integrated within a single package. 3D stacking is an emerging solution that provides a new dimension in performance by reducing the distances that signals need to travel between the different blocks of a system. Interconnects in future technologies are known to be a major bottleneck for performance and power. In this context, 3D implementations can help alleviate the performance and power overheads of on-chip wiring.

In the context of 3D SoC, we have developed a spatio-temporal scheduling algorithm for a 3D architecture composed of two layers: i) a homogeneous Chip MultiProcessor (CMP) layer and ii) a homogeneous embedded Field-Programmable Gate Array (eFPGA) layer, interconnected by through-silicon vias (TSVs), thus ensuring tight coupling between software tasks on processors and associated hardware accelerators on the eFPGA. We extended the Proportionate-fair (Pfair) algorithm to tackle 3D heterogeneous multiprocessors. Unlike Pfair, our algorithm copes with task dependencies and global communication cost. The communication cost is computed by summing not only the point-to-point/direct communication cost, but also the memory cost.
Our algorithm favours direct communication onto the eFPGA layer, but uses shared memory when direct communications are not possible [61], [75], [74].

## 6.2. Compilation and Synthesis for Reconfigurable Platform

**Participants:** Steven Derrien, Emmanuel Casseau, Daniel Menard, François Charot, Christophe Wolinski, Olivier Sentieys, Patrice Quinton.

## 6.2.1. Polyhedral-Based Loop Transformations for High-Level Synthesis

Participants: Steven Derrien, Antoine Morvan, Patrice Quinton.

After almost two decades of research effort, there now exists a large choice of robust and mature C to hardware tools that are used as production tools by world-class chip vendor companies. Although these tools dramatically slash design time, their ability to generate efficient accelerators is still limited, and they rely on the designer to expose parallelism and to use an appropriate data layout in the source program. We believe this can be overcome by tackling the problem directly at the source level, using source-to-source optimizing compilers. More precisely, our aim is to study how polyhedral-based program analysis and transformation can be used to address this problem.

In the context of the PhD of Antoine Morvan, we have studied how to improve the efficiency and applicability of nested loop pipelining (also known as nested software pipelining) in C to hardware tools. Loop pipelining is a key transformation in high-level synthesis tools as it helps maximize both computational throughput and hardware utilization. Nevertheless, it loses some of its benefit when dealing with small trip-count inner loops, as the pipeline latency overhead quickly limits its efficiency. Even if it is possible to overcome this limitation by pipelining the execution of a whole loop nest, the applicability of nested loop pipelining has so far been limited to a very narrow subset of loops, namely perfectly nested loops with constant bounds.
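The latency overhead on short inner loops can be quantified with a simplified cost model, which is an assumption made here for illustration and not the model used in this work: a pipeline with an initiation interval of one cycle and a depth of $d$ stages needs $n + d - 1$ cycles to execute $n$ iterations. Pipelining only the inner loop pays the fill and drain cost at every outer iteration, whereas pipelining the whole (flattened) nest pays it once. The function names and the loop sizes below are hypothetical.

```python
def inner_pipelined_cycles(outer, inner, depth):
    """Inner loop pipelined alone: the pipeline refills at each outer iteration."""
    return outer * (inner + depth - 1)


def nest_pipelined_cycles(outer, inner, depth):
    """Whole nest pipelined as one loop of outer * inner iterations."""
    return outer * inner + depth - 1


# 64 outer iterations, 12 pipeline stages, varying inner trip count.
for inner in (4, 16, 256):
    separate = inner_pipelined_cycles(64, inner, 12)
    nested = nest_pipelined_cycles(64, inner, 12)
    print(inner, separate, nested, round(separate / nested, 2))
```

Under this model the gap between the two schemes grows as the inner trip count shrinks relative to the pipeline depth, which is the situation where pipelining the whole nest is most useful.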
In this work, we have extended the applicability of nested loop pipelining to imperfectly nested loops with affine dependencies. We have shown how such loop nests can be analyzed and, under certain conditions, how the source code can be modified to allow nested loop pipelining to be applied, using a method called polyhedral bubble insertion. The approach has been implemented in the Gecos source-to-source toolbox and was validated using two leading-edge commercial HLS tools. It helps improve performance for a minor area overhead. This work was accepted in late 2012 for publication in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. In addition, the complete Gecos source-to-source toolbox was presented at the DAC university booth in June 2012.

In addition to our work on nested loop pipelining, we also started investigating how to extend existing polyhedral code generation techniques to enable the synthesis of area-efficient control logic for nested-loop hardware accelerators.

### 6.2.2. Compiling for Embedded Reconfigurable Multi-Core Architectures

**Participants:** Steven Derrien, Olivier Sentieys, Maxime Naullet.

Current and future wireless communication and video standards have huge processing power requirements, which cannot be satisfied with current embedded single-processor platforms. Most platforms therefore now integrate several processing cores within a single chip, leading to what is known as embedded multi-core platforms. This trend will continue, and embedded system designers will soon have to implement their systems on platforms comprising tens if not hundreds of high-performance processing cores. Examples of such architectures are the Xentium processor from Recore or the Kahrisma processor, a radically new concept of morphable processor from Karlsruhe Institute of Technology (KIT). This evolution will pose significant design challenges, as parallel programming is notoriously difficult, even for domain experts.
In the context of the FP7 European Project Alma (Architecture-oriented parallelization for high performance embedded Multicore systems using scilAb), we are studying how to help designers program these platforms by allowing them to start from a specification in Matlab and/or Scilab, which are widely used for prototyping image/video and wireless communication applications. Our research work in this field revolves around two topics. The first one aims at exploring how floating-point to fixed-point conversion can be performed jointly with the SIMD instruction selection stage to explore the performance/accuracy trade-off in the final software implementation. The second one aims at exploring how program transformation techniques (leveraging the polyhedral model and/or based on the domain-specific semantics of Scilab built-in functions) can be used to enable an efficient coarse-grain parallelization of the target application on such multi-core machines.

### 6.2.3. Reconfigurable Processor Extensions Generation

**Participants:** Christophe Wolinski, François Charot, Antoine Floc'h.

Most proposed techniques for automatic instruction set extension dissociate the pattern selection and instruction scheduling steps. The effects of the selection on the schedule subsequently produced by the compiler must then be predicted. This approach is suitable for specialized instructions having a one-cycle duration, because the prediction will be correct in this case. However, for multi-cycle instructions, a selection that does not take scheduling into account is likely to favour instructions which will turn out, a posteriori, to be less interesting than others, in particular when they can be executed in parallel with the processor core. The originality of our research work is to carry out specialized instruction selection and scheduling in a single optimization step. This complex problem is modeled and solved using constraint programming.
This approach allows the features of the extensible processor to be taken into account with a high degree of flexibility. Two architecture models are envisioned. The first one is an extensible processor tightly coupled to a hardware extension having internal registers used to store intermediate results. The second model is VLIW-oriented: a specialized instruction is able to configure several processing units working in parallel. Our experimental results show that these approaches are able to handle graphs of several hundred nodes in a reasonable time (less than ten seconds in most cases). The speedups obtained are particularly interesting for applications having a high degree of instruction-level parallelism. More details on the constraint programming approach applied to reconfigurable processor extension generation can be found in [32] and in the Ph.D. thesis of Antoine Floc'h [20].

During this year, we have also studied a novel technique that addresses the interactions between code optimization and instruction set extension. The idea is to automatically transform the original loop nests of a program (using the polyhedral model) to select specialized and vectorizable instructions. These instructions may use local memories of the hardware extension to store intermediate data produced at a given loop iteration. Details can be found in the Ph.D. thesis of Antoine Floc'h [20].

### 6.2.4. Custom Operator Identification for High-Level Synthesis

**Participants:** Emmanuel Casseau, François Charot, Chenglong Xiao.

In this work, our goal is to propose an automated design flow based on custom operator identification for high-level synthesis. Custom operators that can be implemented in special hardware units make it possible to improve the performance and reduce the area of the design.
The key issues involved in the design flow are the automatic enumeration and selection of custom operators from a given high-level application code, and the re-generation of the source code incorporating the selected custom operators. This new source code is then provided to the high-level synthesis tool. The application is first translated into a graph-based internal representation. The problem is then to enumerate and select the subgraphs that will be implemented as custom operators. However, enumerating all the subgraphs is a computationally difficult problem. In Xiao's PhD thesis [25] and in [42], three algorithms for the exact enumeration of subgraphs under various constraints were proposed. Compared to a well-known previously proposed algorithm, these enumeration algorithms can achieve orders of magnitude speedup. Selecting the most profitable subset from the enumerated subgraphs is also a time-consuming job. Three different selection heuristics targeting different objectives were proposed in [25]. Based on these algorithms, experimental results show that the approach achieves on average a 19% area reduction compared to a traditional high-level synthesis with the CtoS tool from Cadence. Meanwhile, the latency is reduced on average by 22%.

## 6.3. Interaction between Algorithms and Architectures

### 6.3.1. Numerical Accuracy Analysis and Optimization

**Participants:** Daniel Menard, Karthick Parashar, Olivier Sentieys, Romuald Rocher, Pascal Scalart, Aymen Chakhari, Jean-Charles Naud, Emmanuel Casseau.

Most analytical methods for numerical accuracy evaluation use perturbation theory to provide the expression of the quantization noise at the output of a system. Existing analytical methods do not consider correlation between noise sources. This assumption is no longer valid when a single datum is quantized several times. In [34], an analytical model of the correlation between quantization noises is provided.
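Why the independence assumption breaks down can be seen on a toy case. The sketch below is not the model of [34]: it assumes unsigned integers quantized by truncation, with the same datum truncated once by 4 bits and once by 5 bits, and assumes both errors reach the output with unit gain. All names are hypothetical.

```python
from statistics import fmean


def truncation_error(x, dropped_bits):
    """Error made when the low `dropped_bits` bits of x are discarded."""
    return x & ((1 << dropped_bits) - 1)


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    return fmean([(u - mean_a) * (v - mean_b) for u, v in zip(a, b)])


# Exhaustive sweep over all 10-bit inputs, so no random sampling is needed.
values = range(1 << 10)
e1 = [truncation_error(x, 4) for x in values]  # same datum, 4 bits dropped
e2 = [truncation_error(x, 5) for x in values]  # same datum, 5 bits dropped

var1, var2 = covariance(e1, e1), covariance(e2, e2)
cov12 = covariance(e1, e2)
correlation = cov12 / (var1 * var2) ** 0.5

# Variance of e1 + e2 with and without the cross term.
variance_assuming_independence = var1 + var2
variance_with_correlation = var1 + var2 + 2 * cov12

print(round(correlation, 3))
print(variance_assuming_independence, variance_with_correlation)
```

In this toy case the two errors have a correlation coefficient of about 0.5, and ignoring the cross term underestimates the variance of their sum (106.5 instead of 149).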
The different quantization modes are supported and the number of eliminated bits is taken into account. The expression of the power of the output quantization noise is provided when the correlation between the noise sources is considered. The proposed approach significantly improves the estimation of the output quantization noise power compared to the classical approach, with a slight increase in computation time.

An analytical approach is studied to determine the accuracy of systems including unsmooth operators. An unsmooth operator represents a function which is not differentiable over its whole definition interval (for example the sign operator). The classical model is no longer valid, since these operators introduce errors that do not respect the Widrow assumption (their values are often higher than the signal power). An approach based on the distribution of the signal and of the noise is therefore proposed. It is applied to the sphere decoding algorithm to determine analytically the error probability due to quantization [53]. We also focus on recursive structures, where an error influences future decisions; the Decision Feedback Equalizer is therefore also considered. In that case, numerical analysis methods (such as the Newton-Raphson algorithm) can be used. Moreover, an upper bound of the error probability can be determined analytically.

A method to determine the distribution of the noise due to quantization at the output of a system made of smooth operators has been developed [70]. It is based on the Generalized Gaussian Distribution and makes it possible to take into account all possible distributions (uniform, Gaussian, Laplacian, etc.).

### 6.3.2. Multi-Antenna Systems

**Participants:** Olivier Berder, Pascal Scalart, Quoc-Tuong Ngo, Viet-Hoa Nguyen.

Still considering the maximization of the minimum Euclidean distance, we proposed a new linear precoder obtained by observing the SNR-like precoding matrix.
An approximation of the minimum distance is derived, and its maximum value is obtained by maximizing the minimum diagonal element of the SNR-like matrix. The precoding matrix is first parameterized as the product of a diagonal power allocation matrix and an input-shaping matrix acting on the rotation and scaling of the input symbols on each virtual subchannel. We demonstrated that the minimum diagonal entry of the SNR-like matrix is obtained when the input-shaping matrix is a DFT matrix. The major advantage of this design is that the solution is available for all rectangular QAM modulations and for any number of datastreams [35], [36], [37].

To reduce the decoding complexity of linearly precoded MIMO systems, the sphere decoder was applied instead of maximum-likelihood (ML) decoding, and the performance-complexity trade-off was investigated. The sphere decoding (SD) algorithm, proposed as a sub-optimal ML decoding, considers only the subset of lattice points that fall within the sphere centered on the received point to obtain the decoded solution, thus significantly reducing the complexity. Because the structure of our precoder is complicated and strongly depends on the channel, there are cases where all the power is allocated to the best sub-channel only. Some adjustments of the traditional sphere decoding algorithm were therefore mandatory to adapt it to precoded MIMO systems.

### 6.3.3. Impact of RF Front-End Nonlinearity on WSN Communications

**Participants:** Amine Didioui, Olivier Sentieys, Carolynn Bernier [CEA Leti].

### 6.3.4. HarvWSNet: A Co-Simulation Framework for Energy Harvesting Wireless Sensor Networks

**Participants:** Amine Didioui, Olivier Sentieys, Carolynn Bernier [CEA Leti].

Recent advances in energy harvesting (EH) technologies now allow wireless sensor networks (WSNs) to extend their lifetime by scavenging the energy available in their environment.
While simulation is the most widely used method to design and evaluate network protocols for WSNs, existing network simulators are not adapted to the simulation of EH-WSNs, and most of them provide only a simple linear battery model. To overcome these issues, we have proposed HarvWSNet, a co-simulation framework based on WSNet and Matlab that provides adequate tools for evaluating EH-WSN lifetime [56]. The framework allows the simulation of multi-node network scenarios while including a detailed description of each node's energy harvesting and management subsystem and of its time-varying environmental parameters. A case study based on a temperature monitoring application has demonstrated HarvWSNet's ability to predict network lifetime while minimally penalizing simulation time.

### 6.3.5. Cooperative Strategies for Low-Energy Wireless Networks

**Participants:** Olivier Berder, Olivier Sentieys, Le-Quang-Vinh Tran, Duc-Long Nguyen.

Recently, cooperative relay techniques (e.g. repetition-based or distributed space-time code based (DSTC-based) protocols) have attracted increasing interest as one of the advanced techniques to mitigate the fading effects of the transmission channel. We proposed a novel cooperative scheme with data exchange between relays before using distributed space-time coding. This fDSTC (full Distributed Space-Time Code) protocol was compared with the conventional distributed space-time coded (cDSTC) protocol. A thorough comparison of the fDSTC and cDSTC protocols, for both non-regenerative relays (NR-relays) and regenerative relays (R-relays), was then carried out in terms of error performance, outage probability, diversity order and energy consumption, via both numerical simulations and mathematical analysis [24].

Previous works consider the energy efficiency of cooperative relay techniques under the assumption of an ideal medium access control (MAC) protocol.
However, the MAC protocol is responsible for regulating access to the shared wireless medium, and therefore has a great influence on the total energy consumption of the network. This motivated us to design a cooperative MAC protocol, RIC-MAC (Receiver Initiated Cooperative MAC), combining preamble sampling and cooperative relay techniques. The analytical results still confirm the interest of using cooperative relay techniques. However, the energy efficiency of cooperative relay systems may be affected by the MAC protocol design, the traffic load of the network and the desired latency [24].

### 6.3.6. Opportunistic Routing

**Participants:** Olivier Berder, Olivier Sentieys, Ruifeng Zhang.

However, the aforementioned approaches introduce an overhead in terms of information exchange, increasing the complexity of the receivers. A simpler way of exploiting spatial diversity is referred to as opportunistic routing. In this scheme, a cluster of nodes still serves as relay candidates, but only a single node in the cluster forwards the packet [80].

Energy efficiency and transmission delay are very important parameters for wireless multihop networks. Numerous works that study energy efficiency and delay are based on the assumption of reliable links. However, the unreliability of channels is inevitable in wireless multihop networks. We investigated the tradeoff between the energy consumption and the latency of communications in a wireless multihop network using a realistic unreliable link model [43]. This work provides a closed-form expression of the lower bound of the energy-delay tradeoff and of the energy efficiency for different channel models (additive white Gaussian noise, Rayleigh fast fading and Rayleigh block fading) in a linear network. These analytical results are also verified in 2-dimensional Poisson networks using simulations.
The closed-form expression provides a framework to evaluate the energy-delay performance and to optimize the parameters of the physical, MAC and routing layers from a cross-layer design viewpoint during the planning phase of a network.

### 6.3.7. Adaptive Techniques for WSN Power Optimization

**Participants:** Olivier Berder, Daniel Menard, Olivier Sentieys, Mahtab Alam, Trong-Nhan Le.

We proposed a self-organized asynchronous medium access control (MAC) protocol for wireless body area sensor networks (WBASN). A body sensor network exhibits a wide range of traffic variations, depending on the physiological data emanating from the monitored patient. In this context, we exploit the traffic characteristics observed at each sensor node and propose a novel technique for latency-energy optimization at the MAC layer [48], [26]. The protocol relies on the dynamic adaptation of the wake-up interval based on a traffic status register bank. The proposed technique allows the wake-up interval to converge to a steady state for variable traffic rates, which results in optimized energy consumption and reduced delay during communication. The results show that our protocol outperforms the other protocols in terms of both energy and latency under the variable traffic of a WBASN.

System lifetime is a crucial problem for Wireless Sensor Networks (WSNs), and exploiting environmental energy provides a potential solution to this problem. In self-powered systems, the Power Manager (PM) plays an important role. Instead of minimizing the energy consumption as in battery-powered systems, it makes the harvesting node converge to Energy Neutral Operation (ENO), in order to achieve a theoretically infinite lifetime and to maximize the system performance. In [62], a low-complexity PM with a Proportional Integral Derivative (PID) controller is introduced.
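As an illustration of the principle only, a discrete PID controller that adapts a node's wake-up period from its stored energy can be sketched as follows. This is not the controller of [62]: the class name, the gains, the clamping bounds and the choice of a fixed energy setpoint are all hypothetical assumptions.

```python
class PidPowerManager:
    """Toy PID power manager: maps stored energy to a wake-up period."""

    def __init__(self, target, base_period, kp, ki, kd,
                 min_period, max_period):
        self.target = target            # desired stored energy (setpoint)
        self.base_period = base_period  # period used when the error is zero
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_period, self.max_period = min_period, max_period
        self._integral = 0.0
        self._previous_error = 0.0

    def next_period(self, stored_energy):
        # Positive error: the buffer is below the setpoint, so the node
        # should wake up less often (longer period) to save energy.
        error = self.target - stored_energy
        self._integral += error
        derivative = error - self._previous_error
        self._previous_error = error
        period = (self.base_period
                  + self.kp * error
                  + self.ki * self._integral
                  + self.kd * derivative)
        # Keep the period within the range the application tolerates.
        return min(self.max_period, max(self.min_period, period))


pm = PidPowerManager(target=50.0, base_period=10.0, kp=2.0, ki=0.1, kd=0.5,
                     min_period=1.0, max_period=60.0)
for energy in (50.0, 45.0, 0.0):
    print(energy, pm.next_period(energy))
```

The controller only needs the stored energy as input, and the clamp keeps the wake-up period within application-defined bounds.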
This PM monitors the buffered energy in the storage device and performs adaptation by changing the wake-up period of the wireless node. This shows the benefit of our approach, since the impractical monitoring of harvested and consumed energy is not required, unlike in previously proposed techniques. Experiments were performed on a real WSN platform with two solar cells in an indoor environment. The PID controller provides a practical strategy for long-term operation of the node in various environmental conditions.

### 6.3.8. WSN for Health Monitoring

**Participants:** Patrice Quinton, Olivier Sentieys.

Applications of wireless sensor devices were also considered in the domain of health monitoring. Together with researchers from the CASA team of IRISA-UBS, we investigated the possibility of using ECG sensors to remotely monitor the cardiac activity of runners during a marathon race, using off-the-shelf sensing devices and a limited number of base stations deployed along the marathon route. Preliminary experiments showed that such a scenario is indeed viable, although special attention must be paid to balancing the requirements of ECG monitoring with the constraints of episodic, low-rate transmissions.

The proliferation of private, corporate and community Wi-Fi hotspots in city centers and residential areas opens up new opportunities for the collection of biomedical data produced by sensors carried by mobile non-hospitalized subjects. Using disruption-tolerant networks, it was shown that biomedical data could be recorded using nearby hotspots. A scenario involving a subject wearing an ECG-enabled sensor while walking in the streets of a residential area was reported.

This research, combined with new sensor devices developed by the BOWI project, opens up a large range of applications where high-performance sensor devices would allow health monitoring or the organization of sport events.

### 6.3.9. Reconfigurable Video Coding

**Participants:** Emmanuel Casseau, Hervé Yviquel.

In the field of multimedia coding, standardization recommendations are always evolving. To reduce design time by taking advantage of available SW and HW designs, the Reconfigurable Video Coding (RVC) standard allows new codec algorithms to be defined. The application is represented by a network of interconnected components (so-called actors) defined in a modular library, and the behaviour of each actor is described in the specific RVC-CAL language. Dataflow programs, such as RVC applications, express explicit parallelism within an application. However, general-purpose processors cannot cope with both the high performance and the low power consumption requirements that embedded systems have to face. Hence we are investigating the mapping of RVC specifications on hardware accelerators or on platforms made of many tiny cores. Our goal is to propose an automated co-design flow based on the Reconfigurable Video Coding framework. The designer provides the application description in the RVC-CAL dataflow language, after which the co-design flow automatically generates a network of processors that can be synthesized on FPGA platforms. We are currently focusing on a many-core platform based on the TTA (Transport-Triggered Architecture) processor, a Very Long Instruction Word (VLIW)-style processor. Hervé Yviquel did a 4-month stay (Spring 2012) at Tampere University of Technology, Finland, in the group of Jarmo Takala, who is developing a co-design toolset for the automated generation of TTA processors. Such a methodology permits the rapid design of a many-core signal processing system which can take advantage of all levels of parallelism. This work is done in collaboration with Mickael Raulet from IETR INSA Rennes and has been implemented in the Orcc open-source compiler. At present, the mapping of the RVC-CAL actor network is straightforward: every actor is mapped on a TTA processor, based on our collaboration with Jani Boutellier from the University of Oulu (Finland).
To reduce the area of the platform, the usage rate of the TTA processors has to be improved, i.e. several actors have to be mapped onto a single processor. This is work in progress. It requires an actor partitioning step to define the set of actors that will be executed on the same processor. Due to the dynamic behaviour of the application, we expect to use profiling to obtain feedback for the partitioning.

### 6.3.10. A Low-Complexity Synchronization Method for OFDM Systems

**Participants:** Pramod P. Udupa, Olivier Sentieys, Pascal Scalart.

A new hierarchical synchronization method was proposed for initial timing synchronization in orthogonal frequency-division multiplexing (OFDM) systems. Based on a new training symbol, a threshold-based timing metric is designed for accurate estimation of the start of the OFDM symbol in a frequency-selective channel. The threshold is defined in terms of noise distributions and false alarm, which makes it applicable independently of the type of channel. Frequency offset estimation is also performed for the proposed training symbol. The performance of the proposed timing metric is evaluated using simulation results. The proposed method achieves a low mean squared error (MSE) in timing offset estimation at a computational complexity five times lower than that of the cross-correlation based method in a frequency-selective channel. It is also computationally efficient compared to hybrid approaches for OFDM timing synchronization.

### 6.3.11. Flexible hardware accelerators for biocomputing applications

**Participants:** Steven Derrien, Naeem Abbas, Patrice Quinton.

It is widely acknowledged that FPGA-based hardware acceleration of compute-intensive bioinformatics applications can be a viable alternative to cluster (or grid) based approaches, as it offers very interesting MIPS/watt figures of merit.
One of the issues with this technology is that it remains somewhat difficult to use and to maintain (one is designing a circuit rather than programming a machine). Even though C-to-hardware compilation tools exist (Catapult-C, Impulse-C, etc.), a common belief is that they do not generally offer good enough performance to justify the use of such reconfigurable technology. As a matter of fact, successful hardware implementations of bio-computing algorithms are manually designed at RT-level and are usually targeted to a specific system, with little if any performance portability among reconfigurable platforms. This research work, funded by the ANR BioWic project, aims at providing a framework to help the semi-automatic generation of high-performance hardware accelerators. It builds upon the CAIRN research group's expertise in automatic parallelization for application-specific hardware accelerators and has been targeting mainstream bioinformatics applications (HMMER, ClustalW and BLAST). The BioWic project ended in early 2012. Naeem Abbas, a PhD student funded by the project, defended his PhD in May 2012.

# 7. Partnerships and Cooperations

## 7.1. European Initiatives

### 7.1.1. FP7 FLEXTILES

**Participants:** Olivier Sentieys, Emmanuel Casseau, Antoine Courtay, Daniel Chillet, Philippe Quémerais, Christophe Huriaux, Quang-Hoa Le.

- Program: FP7-ICT-2011-7
- Project acronym: FlexTiles
- Project title: Self Adaptive Heterogeneous Manycore Based on Flexible Tiles
- Duration: Oct. 2011 - Sep. 2014
- Coordinator: Thales
- Other partners: Thales (FR), UR1 (FR), KIT (GE), TU/e (NL), CSEM (SW), CEA LETI (FR), Sundance (UK)

A major challenge in computing is to leverage multi-core technology to develop energy-efficient high-performance systems. This is critical for embedded systems with a very limited energy budget as well as for supercomputers in terms of sustainability.
Moreover, the efficient programming of multi-core architectures, as we move towards manycores with more than a thousand cores predicted by 2020, remains an unresolved issue. The FlexTiles project will define and develop an energy-efficient yet programmable heterogeneous manycore platform with self-adaptive capabilities. The manycore will be associated with an innovative virtualisation layer and a dedicated tool-flow to improve programming efficiency, reduce the impact on time to market and reduce the development cost by 20 to 50%. FlexTiles will raise the accessibility of manycore technology to industry, from small SMEs to large companies, thanks to its programming efficiency and its ability to adapt to the targeted domain using embedded reconfigurable technologies.

### 7.1.2. FP7 ALMA

**Participants:** Steven Derrien, Romuald Rocher, Olivier Sentieys, Maxime Naullet, Ali Hassan El Moussawi.

- Program: FP7-ICT-2011-7
- Project acronym: Alma
- Project title: Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb
- Duration: Sep. 2011 - Aug. 2014
- Coordinator: KIT
- Other partners: KIT (GE), UR1 (FR), Recore Systems (NL), Univ. of Peloponnese (GR), TEI-MES (GR), Intracom SA (GR), Fraunhofer (GE)

The mapping process of high-performance embedded applications to today's multiprocessor system-on-chip devices suffers from a complex toolchain and programming process. The problem here is the expression of parallelism with a pure imperative programming language, which is commonly C. This traditional approach limits the mapping, partitioning and generation of optimized parallel code, and consequently the achievable performance and power consumption of applications from different domains.
The Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb (ALMA) project aims to overcome these hurdles through the introduction and exploitation of a Scilab-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from high-level abstraction descriptions. This holistic toolchain solution allows the complexity of both the application and the architecture to be hidden, which leads to better acceptance, reduced development cost and shorter time-to-market. Driven by the technology restrictions in chip design, the end of Moore's law and an unavoidably increasing demand for computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies. ALMA helps to strengthen the position of Europe in the world market of multiprocessor-targeted software toolchains. The challenging research will be achieved by the unique ALMA consortium, which brings together industry and academia. High-class partners from industry, such as Recore and Intracom, will contribute their expertise in reconfigurable hardware technology for multicore systems-on-chip, software development tools and real-world applications. The academic partners will contribute their outstanding expertise in reconfigurable computing and compilation tools development.

### 7.1.3. Collaborations with Major European Organizations

- Imec (Belgium): scenario-based fixed-point data format refinement to enable energy-scalable Software Defined Radios (SDR)
- Lund University (Sweden): application of the constraint programming approach in the reconfigurable data-path synthesis flow
- Code and Cryptography group of University College Cork (Ireland): arithmetic operators for cryptography and WSN for health monitoring
- Ecole Polytechnique Fédérale de Lausanne - EPFL (Switzerland): optimization of systems using fixed-point arithmetic
- Technical University of Madrid - UPM (Spain): optimization of systems using fixed-point arithmetic
- Technical University of Tampere, University of Oulu (Finland): Reconfigurable Video Coding. Hervé Yviquel spent 4 months in the group of Jarmo Takala at Tampere University of Technology, Finland, from March.

## 7.2. National Initiatives

The CAIRN team currently collaborates with the following laboratories: CEA List, SATIE ENS Cachan, LEAT Nice, Lab-Sticc (Lorient, Brest), LIRMM (Montpellier, Perpignan), ETIS Cergy, LIP6 Paris, IETR Rennes, Ireena Nantes; and with the following Inria project-teams: Aric, Compsys, Swing, Symbiose, TexMex.

The team participates in the activities of the following research groups of CNRS (GdR, for "Groupe de Recherche" in French):

- GdR SOC-SIP (*System On Chip & System In Package*), working groups on reconfigurable architectures, embedded software for SoC, low power issues. See <a href="http://www2.lirmm.fr/~w3mic/SOCSIP/index.php">http://www2.lirmm.fr/~w3mic/SOCSIP/index.php</a>. CAIRN is the leader of the group on reconfigurable architectures.
- GdR ISIS (Information Signal ImageS), working group on Algorithms Architectures Adequation.
- GdR ASR (Architectures Systèmes et Réseaux)
- GdR IM (*Informatique Mathématiques*), C2 working group on Codes and Cryptography and ARITH working group on Computer Arithmetic

### 7.2.1. ANR Blanc - PAVOIS (2012–2016)

**Participants:** Arnaud Tisserand, Emmanuel Casseau, Romuald Rocher, Philippe Quémerais, Jérémie Métairie.

PAVOIS (in French: *Protections Arithmétiques Vis à vis des attaques physiques pour la cryptOgraphIe basée sur les courbeS elliptiques*) is a project on Arithmetic Protections Against Physical Attacks for Elliptic Curve based Cryptography. It involves IRISA-CAIRN (Lannion) and LIRMM (Perpignan and Montpellier). This project will provide novel implementations of curve-based cryptographic algorithms on custom hardware platforms. A specific focus will be placed on trade-offs between efficiency and robustness against physical attacks. One of our goals is to theoretically study and practically measure the impact of various protection schemes on performance (speed, silicon cost and power consumption). Theoretical aspects will include an investigation of how special number representations can be used to speed up cryptographic algorithms and protect cryptographic devices from physical attacks. On the practical side, we will design innovative cryptographic hardware architectures of a specific processor, based on the theoretical advancements described above, to implement curve-based protocols. We will target efficient and secure implementations for both FPGA and ASIC circuits. For more details see <a href="http://pavois.irisa.fr">http://pavois.irisa.fr</a>.

### 7.2.2. ANR INFRA 2011 - FAON (2012-2015)

**Participants:** Raphaël Bardoux, Arnaud Carer, Matthieu Gautier, Pascal Scalart.

The objectives of the FAON (Frequency based Access Optical Networks) project are to demonstrate the technology and feasibility of a new type of Passive Optical Network (PON) for broadband access, which uses a frequency-based shared access technique known as Frequency Division Multiplexing (FDM). These goals are fully in line with the expected capacity increase in PON, which is currently forecast to go from 100 Mbps per user to 1 Gbps.
For more details, see <a href="http://www.anr-faon.fr/">http://www.anr-faon.fr/</a>. FAON involves Orange Labs, CEA-LETI, University of South Brittany (Lab-STICC laboratory) and University of Rennes 1 (Foton laboratory and CAIRN team). CAIRN aims at developing a high-rate architecture at the receiver side. Specific receiver algorithms (synchronization and equalization) and FPGA implementation are the key issues that will be addressed.

### 7.2.3. Equipex FIT - Future Internet (of Things)

**Participants:** Vaibhav Bhatnagar, Arnaud Carer, Matthieu Gautier, Ganda-Stéphane Ouedraogo, Olivier Sentieys.

FIT is one of 52 winning projects from the first wave of the French Ministry of Higher Education and Research's "Équipements d'Excellence" (Equipex) research grant programme. FIT involves UPMC, Inria, LSIIT and the Institut Mines-Telecom and runs over a nine-year period. FIT offers a federation of several independent experimental testbeds to provide a larger-scale, more diverse and higher-performance platform for accomplishing advanced experiments. For more details, see <a href="http://fit-equipex.fr/">http://fit-equipex.fr/</a>.

Inria (CAIRN and Socrate teams) develops the cognitive radio testbed that will provide a full experimental environment for evaluating the coexistence of and the cooperation between heterogeneous multistandard nodes. To this aim, a fully open architecture based on software defined radio nodes is being developed. CAIRN aims at proposing an FPGA-based software defined radio with high-level specifications. The development of the cognitive radio testbed is supported by ADT funding from Inria.

### 7.2.4. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)

**Participants:** Sébastien Pillement, Arnaud Tisserand, Philippe Quémerais.

ARDyT (in French: Architecture Reconfigurable Dynamiquement Tolérante aux fautes) is a project on a Reliable and Reconfigurable Dynamic Architecture. It involves IRISA-CAIRN (Lannion), Lab-STICC (Lorient), LIEN (Nancy) and ATMEL.
The purpose of the ARDyT project is to provide a complete environment for the design of a fault tolerant and self-adaptable platform. To this end, a platform architecture, its programming environment and management methodologies for diagnosis, testability and reliability have to be defined and implemented. The techniques considered avoid the use of hardened components, so as to design low-cost solutions for terrestrial and aeronautics applications. The ARDyT platform will provide a European alternative to ITAR-constrained imports for fault-tolerant reconfigurable architectures. For more details see http://ardyt.irisa.fr.

## 7.2.5. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)

**Participants:** Emmanuel Casseau, Steven Derrien, Sébastien Pillement.

COMPA (model oriented design of embedded and adaptive multiprocessor) is a project which involves CAIRN, IETR (Institut d'Electronique et de Télécommunications de Rennes), Lab-STICC (University of Bretagne Sud), CAPS Entreprise, Modae Technologies and Texas Instruments. The goal of the project is to design adaptive multiprocessor embedded systems from dataflow models. The Reconfigurable Video Coding (RVC) standard will be targeted as the application use case. We will then focus more specifically on the use of the portable and platform-independent RVC-CAL language to describe the applications. We will propose transformations in order to refine, optimize and translate the application model into software and hardware components. Task mapping, instruction and processor allocation, and constrained scheduling will also be investigated for runtime execution and reconfiguration.

## 7.2.6. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)

**Participants:** Olivier Sentieys, Daniel Menard, Romuald Rocher, Nicolas Simon.

DEFIS (Design of fixed-point embedded systems) is a project which involves CAIRN, LIP6 (University of Paris VI), LIRMM (University of Perpignan), CEA LIST, Thales and Inpixal.
The main objectives of the project are to propose new approaches to improve the efficiency of the floating-point to fixed-point conversion process and to provide a complete design flow for fixed-point refinement of complex applications. This infrastructure will reduce the time-to-market by automating the fixed-point conversion and by mastering the trade-off between application quality and implementation cost. Moreover, this flow will guarantee and validate the numerical behavior of the resulting implementation. The proposed infrastructure will be validated on two real applications provided by the industrial partners. For more details see http://defis.lip6.fr.

## 7.2.7. ANR ARPEGE - GRECO (2010-2013)

**Participants:** Olivier Sentieys, Olivier Berder, Arnaud Carer, Trong-Nhan Le.

Advances in sensor network technologies and the increased efficiency of photovoltaic cells show that communicating objects can now reach a power consumption low enough to make autonomous objects feasible. Greco (GREen wireless Communicating Objects) is a project on the design of autonomous communicating object platforms (i.e. self-powered sensor networks). The aim is to optimize the power consumption based on (i) a modeling of the performance and power of the required blocks (RF front-end, converters, modem, peripherals, digital architecture, OS, software, power generator, battery, etc.), (ii) heterogeneous simulation models and tools, and (iii) the use of a real-time global "Power Manager". The final validation will be performed on various case studies: a monitoring system and an audio communication between firemen. Two demonstrators will be developed: a HW/SW prototype (based on CAIRN's PowWow platform with energy harvesting) and a simulation combining a precise model (virtual platform) of an object with a network simulator-like environment.
Greco involves Thales, Irisa-CAIRN, CEA List, CEA Leti, Im2nP, LEAT, Insight-SiP. For more details see http://greco.irisa.fr.

## 7.2.8. NANO2012 Program - S2S4HLS (2008-2012)

**Participants:** Emmanuel Casseau, Steven Derrien, Daniel Menard, Olivier Sentieys, Antoine Morvan, Chenglong Xiao, Jean-Charles Naud.

High-level synthesis (HLS) tools are starting to be used for industrial designs. HLS is analogous to software compilation transposed to the hardware domain. From an algorithmic description of the specification, HLS tools automate the design process and generate a register transfer level (RTL) architecture that takes user-specified constraints into account. However, design performance still depends on the designer's skill in writing the appropriate source code. The S2S4HLS (Source-to-Source for High-Level Synthesis) project applies source code transformations to guide synthesis, leading to more efficient designs, and aims at providing a toolbox for automatic source-to-source transformations of C code. The project is focused on three complementary goals to push the limits of existing HLS tools: loop transformations for performance optimization and better resource usage, automatic floating-point to fixed-point conversion, and synthesis of multi-mode architectures. S2S4HLS is organized into three sub-projects targeting these three objectives.

The project is in close collaboration with STMicroelectronics and the Compsys team at Inria Rhône-Alpes, within the overall Inria-ST partnership agreement. It is financed by the Ministry of Industry in the Nano2012 program. CAIRN is responsible for the project and involved in the three workpackages.

## 7.2.9. NANO2012 Program - RecMotifs (2008-2012)

**Participants:** François Charot, Antoine Floc'h, Christophe Wolinski.

The RecMotifs project aims at the generation of application specific extensions targeting the STxP70 processor from STMicroelectronics.
CAIRN will study advanced algorithms for graph matching and graph merging, together with constraint programming methods. The project is in close collaboration with STMicroelectronics within the overall Inria-ST partnership agreement. It is financed by the Ministry of Industry in the Nano2012 program.

## 7.2.10. ANR Architectures du Futur Open-People (2009-2012)

**Participants:** Daniel Chillet, Robin Bonamy, Olivier Sentieys.

The Open-People (Open Power and Energy Optimization PLatform and Estimator) project aims at defining a complete platform for power estimation and optimization. The platform will be composed of hardware boards to support measurements for the applications. End-users will be able to upload their applications through a web portal, and to control the power measurements of the execution of their applications on a specific electronic board. The Open-People project will also propose a complete power component model library which allows end-users to estimate the power consumption of some parts of the applications without making measurements. This will allow the different design choices to be evaluated quickly with respect to power consumption. Finally, through the web portal http://www.open-people.fr, Open-People will propose software tools to apply power optimizations. In this project, the CAIRN team will develop power models for FPGA components using dynamic reconfiguration.

Open-People involves Lab-STICC (Lorient), Trio (Nancy), CAIRN (Rennes/Lannion) and Dart (Lille/Valenciennes) teams from Inria, Leat at Nice, Thales (Colombes) and InPixal (Rennes). CAIRN is in charge of power models and optimization for reconfigurable architectures.

## 7.2.11. Images and Networks competitiveness cluster - 100GFlex project (2010-2013)

**Participants:** Olivier Sentieys, Arnaud Carer, Remi Pallas, Pascal Scalart.

Speed and flexibility are quickly increasing in metropolitan networks.
In this context, 100GFlex studies the relevance of a new transmission scheme: multiband optical OFDM at very high rates (up to 100 Gbit/s). In this project we will study efficient algorithms (e.g. synchronization) and high-speed architectures for the digital signal processing of the optical transceivers. Due to the high rate of the analog signals (sampling at more than 10 Gsample/s), synchronization and processing are a real challenge. 100GFlex involves Mitsubishi-Electric R&D Center Europe, Institut Télécom, Ekinops, France Télécom, Yenista Optics, Foton and CAIRN.

## 7.3. International Initiatives

## 7.3.1. Inria Associate Team LRS

- Title: Loop unRolling Stones: compiling in the polyhedral model
- Inria principal investigator: Steven Derrien
- International Partner (Institution - Laboratory - Researcher): Colorado State University (United States) - Mélange Group
- Duration: 2010 - 2012
- See also: http://www.irisa.fr/cosi/HOMEPAGE/Derrien/EA-2010/LRS.htm

The goal of the team is twofold:

i) Propose new methodologies and algorithms to tackle some of the open problems in automatic parallelization and high level hardware synthesis from nested loop specifications. In particular, we would like to address the parallelization of complex bioinformatics algorithms based on sophisticated dynamic programming algorithms, for which we would like to propose efficient parallelization schemes for both FPGAs (Field Programmable Gate Arrays) and GPUs (Graphics Processing Units).

ii) Provide a common open software infrastructure based on modern software engineering techniques (Model Driven Software Development) to help researchers prototype new ideas and concepts in the domain of optimizing compilers. Our goal is to make our in-house software completely interoperable.

## 7.3.2.
Inria International Partners

- LRTS laboratory, Laval University in Québec (Canada), Architectures for MIMO systems, Wireless Sensor Networks, Inria Associate Team (2006-2008)
- LSSI laboratory, Québec University in Trois-Rivières (Canada), Design of architectures for digital filters and mobile communications
- Computer Science Department, Colorado State University in Fort-Collins (USA), Loop parallelization, development of high-level synthesis tools, Inria Associate Team (2010-2012)
- University of Adelaide (Australia), Arithmetic operators
- VLSI CAD lab, Electrical and Computer Engineering Department, University of Massachusetts at Amherst (USA), CAD tools for arithmetic datapath synthesis and optimization

## 7.3.3. CNRS PICS - SPiNaCH (2012 - 2014)

- Title: Secure and low-Power sensor Networks Circuits for Healthcare embedded applications
- Principal investigators: Arnaud Tisserand, Olivier Berder, Olivier Sentieys
- International Partner (Institution - Laboratory - Researcher): Code&Crypto group in University College Cork (Ireland)
- Duration: 2012 - 2014

Biomedical sensor networks may be used more and more in the future. For instance, they allow a patient's health-care parameters to be remotely monitored at home. In this project, we plan to address two important challenges in the design of biomedical sensor networks: i) the design of low-power sensor devices for embedded autonomous systems (health monitoring, pacemakers, etc.) with long battery life; ii) confidentiality and security aspects, especially public-key cryptography processors that are robust against side channel attacks (measurement of the computation time, the power consumption or the electromagnetic radiation of the circuit) and that operate with limited power and energy resources.

## 7.4. International Research Visitors

## 7.4.1. Visits of International Scientists

- Prof. Gabriel Caffarena (University CEU-San Pablo, Madrid) for one month in August-September.
- Prof.
Maciej Ciesielski (University of Massachusetts, VLSI CAD Laboratory, USA) for one month in June-July.
- Dr Muhammad Adeel Ahmed Pasha, Assistant Professor at LUMS, for a two-month stay in July-August.
- PhD Student Nabil Ghanmy (University of Sfax, Tunisia) for one month in November-December.
- PhD Student Tomofumi Yuki (Colorado State University, USA) for two months in November and December.
- Prof. Sanjay Rajopadhye (Colorado State University, USA) for one week in December.

## 7.4.2. Internships

Simara Pérez Zurita (from Oct 2012 until Aug 2013)

- Subject: Optimizing Computational Precision in High-level Synthesis of Signal Processing Systems: Theory and Implementation using TDS and GECOS
- Institution: Technical University of Kaiserslautern (Kaiserslautern, Germany)

## 8. Dissemination

## 8.1. Scientific Animation

- D. Chillet was General Co-Chair of the Conference on Design and Architectures for Signal and Image Processing (DASIP) in Karlsruhe, Germany, Oct. 2012.
- D. Chillet was the Editor of a Special Issue of International Journal of Real Time Image Processing, 2012.
- A. Tisserand was a member of the technical program committee of the following conferences: IEEE ARITH'21, IEEE Reconfig 2012, DASIP 2012, IEEE NEWCAS 2012. He is a member of the editorial board of the International Journal of High Performance Systems Architecture, Inderscience.
- M. Gautier was a member of the technical program committee of IEEE VTC-fall 2012, IEEE ICCVE 2012 and IARIA COCORA 2012.
- E. Casseau co-organized with Michael Hubner the first DASIP demo night during the Conference on Design and Architectures for Signal and Image Processing, Karlsruhe, Germany. Details on http://ecsi.org/dasip2012/demo-night.
- S. Pillement and E. Casseau were members of the Program Committee of DASIP.
- P. Quinton was the coordinator of the "équipe projet de recherche" PucesCom of the Université européenne de Bretagne.
- O.
Sentieys was a member of the technical program committee of the following conferences: IEEE/ACM DATE, IEEE ISQED, IEEE VTC, IEEE DDECS, DCIS, FTFC. He was Track Chair at NEWCAS. He is on the editorial board of Journal of Low Power Electronics, American Scientific Publishers, and of ISRN Sensor Networks.
- O. Sentieys is a member of the steering committee of the GDR SOC-SIP. He is the chair of the IEEE Circuits and Systems (CAS) French Chapter. In 2011, he was an expert for some scientific organizations (ANR INS, ANR blanc). He is a member of the Allistene working group.
- F. Charot, O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on embedded systems architectures and associated design tools (ARCHI).
- O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on low-power design (ECOFAC).
- A. Tisserand co-organized the ECOFAC 2012 *école thématique CNRS conception faible consommation pour les systèmes embarqués temps réel* in La Colle-sur-Loup, May 21–25, 2012. Details on http://leat.unice.fr/ECoFaC2012/.

## 8.2. Seminars and Invitations

- A. Tisserand gave an invited talk at the French *Rencontres Arithmétique de l'Informatique Mathématique du GT ARITH GDR IM* in June on *Circuits for True Random Number Generation with On-Line Quality Monitoring*.
- A. Tisserand gave an invited lecture at the CNRS ECOFAC 2012 spring school on *Power Consumption and Security: Attacks and Countermeasures*.
- A. Tisserand gave an invited lecture at the IRMAR (Maths) Laboratory of University of Rennes 1 on *True Random Number Generation with On-Line Quality Monitoring*.

## 8.3. Teaching - Supervision - PhD Committee

## 8.3.1. Teaching Responsibilities

There is a strong teaching activity in the CAIRN team since most of the permanent members are Professors or Associate Professors.

- C. Wolinski is the Director of ESIR.
- P.
Quinton is the deputy-director of Ecole Normale Supérieure de Cachan, responsible for the Brittany branch of this school.
- D. Chillet is the Director of Academic Studies of ENSSAT.
- P. Scalart is the Head of the Electronics Engineering department of ENSSAT.
- S. Derrien has been responsible for the first year of the master of computer science at ISTIC since Sep. 2012.
- O. Sentieys is responsible for the "Embedded Systems" branch of the SISEA Master of Research (M2R).

ENSSAT stands for "École Nationale Supérieure des Sciences Appliquées et de Technologie" and is an "École d'Ingénieurs" of the University of Rennes 1, located in Lannion. ISTIC is the Electrical Engineering and Computer Science Department of the University of Rennes 1. ESIR stands for "École supérieure d'ingénieur de Rennes" and is an "École d'Ingénieurs" of the University of Rennes 1, located in Rennes. M2R stands for Master by Research, second year.

- D. Chillet has been a member of the French National University Council since 2009 in signal processing and electronics (Conseil National des Universités en 61e section).
- D. Chillet has been a member of the Permanent Committee of the French National University Council since November 2011 in signal processing and electronics (Commission Permanente du Conseil National des Universités en 61e section).

## 8.3.2. Teaching

- O. Berder: introduction to signal processing, 38h, ENSSAT (L3)
- O. Berder: microprocessors and digital systems, 19h, ENSSAT (L3)
- O. Berder: wireless communications, 23h, ENSSAT (M2)
- O. Berder: digital signal processing, 60h, ENSSAT (M1)
- O. Berder: ad hoc networks, 58h, ENSSAT (M1-M2)
- O. Berder: signal processing, 24h, IUT Lannion (L2)
- E. Casseau: verification, 12h, Master by Research and ENSSAT (M2)
- E. Casseau: hardware description language, 20h, ENSSAT (M1)
- E. Casseau: low power design, 6h, ENSSAT (M1)
- E. Casseau: real time design methodology, 24h, ENSSAT (M1)
- E. Casseau: verification, 25h, USTH (M1)
- E.
Casseau: signal processing, 16h, ENSSAT (L3)
- S. Derrien: component and system synthesis, 16h, Research Master (MRI ISTIC) (M2)
- S. Derrien: computer architecture, 12h, ENS Cachan (L3)
- S. Derrien: introduction to operating systems, 8h, ISTIC (M1)
- F. Charot: specification of applications with the signal synchronous language, 24h, ESIR (M1)
- F. Charot: virtual prototyping of multiprocessor system-on-chip, 24h, ESIR (M1)
- F. Charot: design of embedded systems, 28h, ESIR (M1)
- D. Chillet: basic processor architecture, 20h, ENSSAT (L1)
- D. Chillet: design methodology of real-time systems, 32h, ENSSAT (L2)
- D. Chillet: advanced processor architectures, 24h, ENSSAT (M2)
- D. Chillet: multimedia processor architectures, 24h, ENSSAT (M2)
- D. Chillet: multi-processor systems, 20h, ENSSAT (M2)
- D. Chillet: advanced processors architectures, 24h, Master by Research and ENSSAT (M2)
- D. Chillet: low-power digital CMOS circuits, 6h, Telecom Bretagne and University of Occidental Brittany (UBO) (M2)
- D. Chillet: digital system design, 25h, University of Science and Technology of Hanoi (M1)
- D. Chillet: advanced multiprocessor system, 25h, University of Science and Technology of Hanoi (M2)
- M. Gautier: electronics, 42h, IUT Lannion (L1)
- M. Gautier: telecommunications, 114h, IUT Lannion (L1)
- R. Rocher: electricity, 16h, IUT Lannion (L1)
- R. Rocher: electronics, 44h, IUT Lannion (L1)
- R. Rocher: telecommunications, 82h, IUT Lannion (L1)
- R. Rocher: signal processing, 12h, IUT Lannion (L2)
- R. Rocher: digital communications, 48h, IUT Lannion (L2)
- P. Scalart: non-linear optimisation, 18h, Master by Research and ENSSAT (M2)
- P. Scalart: parametric modelisation, optimal and adaptive filters, 24h, Master by Research and ENSSAT (M2)
- P. Scalart: source coding, 14h, Master by Research and ENSSAT (M2)
- P. Scalart: cellular networks, 24h, ENSSAT (M2)
- P. Scalart: digital communication systems, 32h, ENSSAT (M1)
- P. Scalart: random signals and systems, 12h, ENSSAT (M1)
- O.
Sentieys: methodologies for system-on-chip design, 6h, Master by Research and ENSSAT (M2)
- O. Sentieys: VLSI integrated circuit design, 66h, ENSSAT (M1)
- O. Sentieys: high-level synthesis of digital signal processors, 16h, Master by Research and ENSSAT (M2)
- A. Tisserand: GPU programming, 8h, ENSSAT (M2)
- A. Tisserand: hardware computer arithmetic operators, 6h, Master by Research, Univ. Rennes 1 (M2)
- A. Tisserand: computer arithmetic, 12h, ENS Cachan, Antenne de Bretagne, *Magister* Computer Science and Telecommunications (L3)
- A. Tisserand: computer arithmetic, 16h, ENSEIRB (L3)
- C. Wolinski: architecture 1, 64h, ESIR (L3)
- C. Wolinski: architecture 2, 28h, ESIR (L3)
- C. Wolinski: design of embedded systems, 48h, ESIR (M1)
- C. Wolinski: signal, image, architecture, 26h, ESIR (M1)
- C. Wolinski: programmable architectures, 10h, ESIR (M1)
- C. Wolinski: component and system synthesis, 10h, Master by Research (MRI ISTIC) (M2)

## 8.3.3. Supervision

- HDR: Olivier Berder, Systèmes multi-antennes et optimisation énergétique des réseaux de capteurs sans fil, Habilitation à Diriger des Recherches, University of Rennes 1, Dec. 2012.
- PhD: Naeem Abbas, Acceleration of a Bioinformatics Application using High-Level Synthesis, May 2012, P. Quinton, S. Derrien.
- PhD: Andrei Banciu, A Stochastic Approach for the Range Evaluation, Feb. 2012, E. Casseau.
- PhD: Antoine Eiche, Real-Time Scheduling for Heterogeneous and Reconfigurable Architectures using Neural Network Structures, University of Rennes 1, Sep. 2012, D. Chillet.
- PhD: Antoine Floc'h, Optimizing Compilation for Processor Instruction-Set Extension, Jun. 2012, C. Wolinski, F. Charot.
- PhD: Quoc-Tuong Ngo, Generalized minimum Euclidean distance based precoders for MIMO spatial multiplexing systems, Jan. 2012, P. Scalart, O. Berder.
- PhD: Danuta Pamula, Arithmetic operators on $GF(2^m)$ for cryptographic applications: performance - power consumption - security tradeoffs, University of Rennes 1 and Silesian University of Technology, Dec. 2012, A. Tisserand.
- PhD: Le-Quang-Vinh Tran, Energy-efficient cooperative relay protocols for wireless sensor networks, Dec. 2012, O. Berder, O. Sentieys.
- PhD: Karthick Parashar, System-level approaches for fixed-point refinement of signal processing algorithms, Dec. 2012, O. Sentieys, D. Menard.
- PhD: Chenglong Xiao, Custom Operator Identification for High-level Synthesis, University of Rennes 1, Nov. 2012, E. Casseau.
- PhD in progress: Mahtab Alam, Power Aware Signal Processing for Reconfigurable Radios in the context of Wireless Sensor Networks, Nov. 2009, O. Sentieys, O. Berder, D. Menard.
- PhD in progress: Djamel Benfarhat, Design of disruption-tolerant communication protocols for mobile communicating objects in health applications, 2009, Patrice Quinton jointly with Frédéric Guidec, IRISA UBS.
- PhD in progress: Karim Bigou, RNS Hardware Units for ECC, Oct. 2011, A. Tisserand.
- PhD in progress: Robin Bonamy, Power Consumption Modelling and Optimisation for Reconfigurable Platform, Oct. 2009, D. Chillet.
- PhD in progress: Franck Bucheron, Secure Virtualization for Embedded Systems, Oct. 2011, A. Tisserand.
- PhD in progress: Thomas Chabrier, Reconfigurable Arithmetic Units for Cryptoprocessors with Protection against Side Channel Attacks, Oct. 2009, A. Tisserand, E. Casseau.
- PhD in progress: Aymen Chakhari, Analytical approach for decision errors in fixed-point digital communication systems, Oct. 2010, R. Rocher, P. Scalart.
- PhD in progress: Ali-Hassan El-Moussaw, Performance/Accuracy Trade-Off in Automatic Parallelization for Embedded Many-Core Platforms, Nov. 2012, S. Derrien.
- PhD in progress: Clément Guy, Generic Definition of Domain Specific Analysis using MDE, Oct. 2010, S. Derrien, jointly with J.M. Jezequel and B. Combemale from Triskell EPI.
- PhD in progress: Christophe Huriaux, Embedded reconfigurable hardware accelerators with efficient dynamic reconfiguration management, Oct. 2012, O. Sentieys, A. Courtay.
- PhD in progress: Quang-Hai Khuat, Real-Time Spatio-Temporal Task Scheduling on 3D Architectures, Oct. 2011, D. Chillet.
- PhD in progress: Trong-Nhan Le, Global power management system for self-powered autonomous wireless sensor nodes, Jan. 2011, O. Sentieys, O. Berder.
- PhD in progress: Quang-Hoa Le, Virtualized dynamic reconfiguration for 3D SoC, Oct. 2012, E. Casseau, A. Courtay.
- PhD in progress: Jérémie Métairie, Reconfigurable Arithmetic Units for Secure Cryptoprocessors, Oct. 2012, A. Tisserand, E. Casseau.
- PhD in progress: Antoine Morvan, Loop Transformations for Design Space Exploration in High-Level Synthesis, Oct. 2009, P. Quinton, S. Derrien.
- PhD in progress: Jean-Charles Naud, Source-to-Source Code Transformation for Fixed-Point Conversion, Oct. 2009, D. Menard, O. Sentieys.
- PhD in progress: Viet-Hoa Nguyen, Energy-efficient cooperative techniques for Wireless Body Area Sensor Networks, Nov. 2012, O. Berder, jointly with C. Langlais from Telecom Bretagne.
- PhD in progress: Matthieu Texier, Low-Power Embedded Multi-Core Architectures for Mobile Systems, Oct. 2009, O. Sentieys, jointly with R. David from CEA List.
- PhD in progress: Michel Theriault, Transmit Beam-forming for Distributed Wireless Access with Centralized Signal Processing, Oct. 2007, O. Sentieys, jointly with S. Roy from U. Laval, Canada.
- PhD in progress: Vivek D. Tovinakere, Ultra-Low Power Reconfigurable Controllers for Wireless Sensor Networks, Oct. 2009, O. Sentieys.
- PhD in progress: Ganda-Stéphane Ouedraogo, Automatic synthesis of hardware accelerator from high-level specifications in flexible radios, Oct. 2011, M. Gautier, O. Sentieys.
- PhD in progress: Pramod P. Udupa, Sampling, synchronising, digital processing and FPGA implementation of 100Gbps optical OFDM signals, Jan. 2011, O. Sentieys.
- PhD in progress: Hervé Yviquel, Video coding design framework based on SoC-based platforms, Oct. 2010, E. Casseau.
- PhD in progress: Zhongwei Zheng, Short-range geolocation algorithms based on distributed multisensor processing, Nov. 2012, P. Scalart, jointly with C. Roland from Lab-STICC.

## 8.4. Popularization

A. Tisserand gave a popularization talk at Fête de la science 2012 in Lannion on low-energy electronic circuits.

# 9. Bibliography

## Major publications by the team in recent years

- [1] D. CHILLET, A. EICHE, S. PILLEMENT, O. SENTIEYS. *Real-time scheduling on heterogeneous system-on-chip architectures using an optimised artificial neural network*, in "Journal of Systems Architecture Embedded Systems Design", April 2011, vol. 57, n<sup>o</sup> 4, p. 340-353, http://dx.doi.org/10.1016/j.sysarc.2011.01.004.
- [2] L. COLLIN, O. BERDER, P. ROSTAING, G. BUREL. *Optimal Minimum Distance Based Precoder for MIMO Spatial Multiplexing Systems*, in "IEEE Transactions on Signal Processing", March 2004, vol. 52, n<sup>o</sup> 3.
- [3] A. COURTAY, O. SENTIEYS, J. LAURENT, N. JULIEN. *High-level Interconnect Delay and Power Estimation*, in "Journal of Low Power Electronics (JOLPE)", 2008, vol. 4, n<sup>o</sup> 1, p. 21-33.
- [4] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20.
- [5] S. DERRIEN, P. QUINTON. *Parallelizing HMMER for Hardware Acceleration on FPGAs*, in "18th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2007)", Montreal, Canada, July 2007, p. 10–18, Best Paper Award.
- [6] S. DERRIEN, S. RAJOPADHYE, P. QUINTON, T. RISSET. *High-Level Synthesis of Loops Using the Polyhedral Model: The MMAlpha Software*, in "High-Level Synthesis From Algorithm to Digital Circuit", P. COUSSY, A. MORAWIEC (editors), Springer Netherlands, 2008, p.
215-230, http://dx.doi.org/10.1007/978-1-4020-8588-8.
- [7] L. IMBERT, A. PEIRERA, A. TISSERAND. *A Library for Prototyping the Computer Arithmetic Level in Elliptic Curve Cryptography*, in "Proc. Advanced Signal Processing Algorithms, Architectures and Implementations XVII", San Diego, California, U.S.A., F. T. LUK (editor), SPIE, August 2007, vol. 6697, n<sup>o</sup> 66970N, p. 1–9, http://dx.doi.org/10.1117/12.733652.
- [8] B. LE GAL, E. CASSEAU, S. HUET. *Dynamic Memory Access Management for High-Performance DSP Applications Using High-Level Synthesis*, in "IEEE Transactions on Very Large Scale Integration Systems", November 2008, vol. 16, n<sup>o</sup> 11, p. 1454-1464.
- [9] K. MARTIN, C. WOLINSKI, K. KUCHCINSKI, A. FLOCH, F. CHAROT. *Constraint-Driven Instructions Selection and Application Scheduling in the DURASE system*, in "Proc. of the 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors", Boston, MA, USA, IEEE Computer Society, July 2009, p. 145-152.
- [10] D. MENARD, D. CHILLET, O. SENTIEYS. *Floating-to-fixed-point Conversion for Digital Signal Processors*, in "EURASIP Journal on Applied Signal Processing (JASP), Special Issue Design Methods for DSP Systems", 2006, vol. 2006, n<sup>o</sup> 1, p. 1–15.
- [11] D. MENARD, O. SENTIEYS. *Automatic Evaluation of the Accuracy of Fixed-point Algorithms*, in "IEEE/ACM Design, Automation and Test in Europe (DATE-02)", Paris, March 2002.
- [12] S. PILLEMENT, O. SENTIEYS, R. DAVID. *DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency*, in "EURASIP Journal on Embedded Systems (JES)", 2008, p. 1-13, Article ID 562326, 13 pages.
- [13] C. PLAPOUS, C. MARRO, P. SCALART. *Improved signal-to-noise ratio estimation for speech enhancement*, in "IEEE Transactions on Speech and Audio Processing", 2006, vol. 14, n<sup>o</sup> 6.
- [14] A. TISSERAND. *High-Performance Hardware Operators for Polynomial Evaluation*, in "Int. J.
High Performance Systems Architecture", March 2007, vol. 1, n<sup>o</sup> 1, p. 14–23, invited paper, http://dx.doi.org/10.1504/IJHPSA.2007.013288.
- [15] C. WOLINSKI, K. KUCHCINSKI, E. RAFFIN. *Automatic Design of Application-Specific Reconfigurable Processor Extensions with UPaK Synthesis Kernel*, in "ACM Transactions on Design Automation of Electronic Systems", 2009, vol. 15, n<sup>o</sup> 1, p. 1–36, http://doi.acm.org/10.1145/1640457.1640458.

## **Publications of the year**

### **Doctoral Dissertations and Habilitation Theses**

- [16] N. ABBAS. *Acceleration of a Bioinformatics Application using High-Level Synthesis*, University of Rennes 1, May 2012.
- [17] A. BANCIU. *A Stochastic Approach for the Range Evaluation*, University of Rennes 1, February 2012.
- [18] O. BERDER. *Systèmes multi-antennes et optimisation énergétique des réseaux de capteurs sans fil*, University of Rennes 1, December 2012, Habilitation à Diriger des Recherches.
- [19] A. EICHE. *Real-Time Scheduling for Heterogeneous and Reconfigurable Architectures using Neural Network Structures*, University of Rennes 1, September 2012.
- [20] A. FLOC'H. *Compilation optimisante pour processeurs extensibles*, University of Rennes 1, June 2012, http://tel.archives-ouvertes.fr/tel-00726420.
- [21] Q.-T. NGO. *Generalized minimum Euclidean distance based precoders for MIMO spatial multiplexing systems*, University of Rennes 1, January 2012.
- [22] D. PAMULA. *Arithmetic Operators on $GF(2^m)$ for Cryptographic Applications: Performance - Power Consumption - Security Tradeoffs*, University of Rennes 1 and Silesian University of Technology, December 2012.
- [23] K. PARASHAR. *System-level approaches for fixed-point refinement of signal processing algorithms*, University of Rennes 1, December 2012.
- [24] L. Q. V. TRAN. *Energy-efficient cooperative relay protocols for wireless sensor networks*, University of Rennes 1, December 2012.
- [25] C. XIAO.
*Custom Operator Identification for High-level Synthesis*, University of Rennes 1, November 2012.

### **Articles in International Peer-Reviewed Journals**

- [26] M. M. ALAM, O. BERDER, D. MENARD, O. SENTIEYS. *TAD-MAC: traffic-aware dynamic MAC protocol for wireless body area sensor networks*, in "IEEE Journal on Emerging and Selected Topics in Circuits and Systems", March 2012, vol. 2, n<sup>o</sup> 1, p. 109-119 [*DOI*: 10.1109/JETCAS.2012.2187243], http://ieeexplore.ieee.org/xpls/abs\_all.jsp?arnumber=6163385.
- [27] R. B. ATITALLAH, E. SENN, D. CHILLET, M. LANOE, D. BLOUIN. *An Efficient Framework for Power Aware Design of Heterogeneous MPSoC*, in "IEEE Transactions on Industrial Informatics", November 2012, vol. PP, n<sup>o</sup> 99, http://dx.doi.org/10.1109/TII.2012.2198657.
- [28] G. CAFFARENA, O. SENTIEYS, D. MENARD, J.-A. LOPEZ, D. NOVO. *Quantization of VLSI digital signal processing systems*, in "EURASIP Journal on Advances in Signal Processing", February 2012, vol. 2012, p. 1-2, http://hal.inria.fr/hal-00743410.
- [29] E. CASSEAU, B. LE GAL. *Design of multi-mode application-specific cores based on high-level synthesis*, in "Integration, the VLSI Journal", 2012, vol. 45, n<sup>o</sup> 1, p. 9–21 [*DOI*: 10.1016/J.VLSI.2011.07.003], http://www.sciencedirect.com/science/article/pii/S0167926011000617.
- [30] J.-M. JÉZÉQUEL, B. COMBEMALE, S. DERRIEN, C. GUY, S. RAJOPADHYE. *Bridging the Chasm Between MDE and the World of Compilation*, in "Journal of Software and Systems Modeling (SoSyM)", October 2012, vol. 11, n<sup>o</sup> 4, p. 581-597 [*DOI*: 10.1007/s10270-012-0266-8], http://hal.inria.fr/hal-00717219.
- [31] S. KHAN, E. CASSEAU. *High-performance motion estimation operator using multimedia oriented subword parallelism*, in "Journal of Communication and Computer", 2012, vol. 9, n<sup>o</sup> 1, p. 1–14, http://www.davidpublishing.com/show.html?3593, http://hal.inria.fr/hal-00746875.
- [32] K. MARTIN, C. WOLINSKI, K. KUCHCINSKI, A. FLOCH, F. CHAROT.
Constraint Programming Approach to Reconfigurable Processor Extension Generation and Application Compilation, in "ACM Transactions on Reconfigurable Technology and Systems (TRETS)", June 2012, vol. 5, n<sup>o</sup> 2, p. 1-38, http://doi.acm.org/10.1145/2209285.2209289. - [33] D. MENARD, O. SENTIEYS, N. HERVÉ, H.-N. NGUYEN. *High-Level Synthesis under Fixed-Point Accuracy Constraint*, in "Journal of Electrical and Computer Engineering (JECE)", March 2012, vol. 2012, n<sup>o</sup> Article ID 906350, p. 1-14 [*DOI*: 10.1155/2012/906350], http://www.hindawi.com/journals/jece/2012/906350. - [34] J.-C. NAUD, D. MENARD, G. CAFFARENA, O. SENTIEYS. A Discrete Model for Correlation Between Quantization Noises, in "IEEE Transactions on Circuits and Systems. Part II, Express Briefs", December 2012, http://hal.inria.fr/hal-00743413. - [35] Q.-T. NGO, O. BERDER, P. SCALART. *Minimum Euclidean Distance Based Precoders for MIMO Systems Using Rectangular QAM Modulations*, in "IEEE Transactions on Signal Processing", March 2012, vol. 60, n<sup>o</sup> 3, p. 1527–1533 [DOI: 10.1109/TSP.2011.2177972], http://hal.inria.fr/hal-00741554. - [36] Q.-T. NGO, O. BERDER, P. SCALART. *Minimum Euclidean Distance Based Precoding for Three-Dimensional Multiple-Input Multiple-Output Spatial Multiplexing Systems*, in "IEEE Transactions on Wireless Communications", 2012, vol. 11, n<sup>o</sup> 7, p. 2486–2495, http://hal.inria.fr/hal-00741559. - [37] Q.-T. NGO, O. BERDER, P. SCALART. General minimum Euclidean distance based precoder for MIMO wireless systems, in "EURASIP Journal on Advances in Signal Processing", 2013, to appear. - [38] A. PASHA, S. DERRIEN, O. SENTIEYS. System Level Synthesis for Wireless Sensor Node Controllers: A Complete Design Flow, in "ACM Transactions on Design Automation of Electronic Systems (TODAES)", January 2012, vol. 17, n<sup>o</sup> 1, p. 2.1–2.24 [DOI: 10.1145/2071356.2071358], http://dl.acm.org/citation.cfm?id=2071358. - [39] M. PHAM, S. PILLEMENT, S. PIESTRAK.
Low Overhead Fault-Tolerance Technique for Dynamically Reconfigurable Softcore Processor, in "IEEE Transactions on Computers", 2012, vol. 99, n<sup>o</sup> PrePrints, http://dx.doi.org/10.1109/TC.2012.55. - [40] E. RAFFIN, C. WOLINSKI, F. CHAROT, E. CASSEAU, A. FLOCH, K. KUCHCINSKI, S. CHEVOBBE, S. GUYETANT. Scheduling, Binding and Routing System for a Run-Time Reconfigurable Operator Based Multimedia Architecture, in "Journal of Embedded and Real-Time Communication Systems", January 2012, vol. 3, n<sup>o</sup> 1, p. 1–30 [DOI: 10.4018/JERTCS.2012010101], http://hal.inria.fr/hal-00663458. - [41] R. ROCHER, D. MÉNARD, O. SENTIEYS, P. SCALART. Analytical Approach for Numerical Accuracy Estimation of Fixed-Point Systems Based on Smooth Operations, in "IEEE Transactions on Circuits and Systems. Part I, Regular Papers", October 2012, vol. 59, n<sup>o</sup> 10, p. 2326–2339 [DOI: 10.1109/TCSI.2012.2188938], http://hal.inria.fr/hal-00741741. - [42] C. XIAO, E. CASSEAU. Exact Custom Instruction Enumeration for Extensible Processors, in "Integration, the VLSI Journal", 2012, vol. 45, n<sup>o</sup> 2, p. 263–270 [DOI: 10.1016/J.VLSI.2011.11.011], http://www.sciencedirect.com/science/article/pii/S0167926011001003. - [43] R. ZHANG, O. BERDER, J.-M. GORCE, O. SENTIEYS. *Energy-Delay Tradeoff in Wireless Multihop Networks with Unreliable Links*, in "Ad Hoc Networks", 2012, vol. 10, n<sup>o</sup> 7, p. 1306–1321, http://hal.inria.fr/hal-00741560. ## **Articles in National Peer-Reviewed Journals** [44] J.-C. NAUD, D. MENARD, O. SENTIEYS. Évaluation de la précision en virgule fixe dans le cas des structures conditionnelles, in "Revue Techniques et Sciences Informatiques, RTSI", December 2012, http://hal.inria.fr/hal-00743415. #### **Invited Conferences** [45] E. CASSEAU. *ROMA: reconfigurable-operator based architecture for multimedia applications*, in "International Symposium on System-on-Chip (SoC)", October 2012, http://hal.inria.fr/hal-00746878. - [46] D. CHILLET.
An Overview of Design Problematics for Embedded Systems, in "15th National Symposium on Selected ICT Problems", Hanoi, Viet Nam, University of Science and Technology of Hanoi, December 2012, http://hal.inria.fr/hal-00759676. - [47] D. CHILLET. Estimation et modélisation de la consommation des architectures reconfigurables et du concept de reconfiguration dynamique, in "Colloque du GDR SoC SiP", Paris, France, June 2012. ## **International Conferences with Proceedings** - [48] M. M. ALAM, O. BERDER, D. MENARD, O. SENTIEYS. *Latency-Energy Optimized MAC Protocol for Body Sensor Networks*, in "Ninth International Conference on Wearable and Implantable Body Sensor Networks (BSN)", May 2012, p. 67–72 [*DOI*: 10.1109/BSN.2012.8], http://hal.inria.fr/hal-00741558. - [49] J. BECKER, M. HUEBNER, T. STRIPF, O. OEY, S. DERRIEN, D. MÉNARD, O. SENTIEYS, G. RAUWERDA, K. SUNESEN, D. KRITHARIDIS, C. VALOUXIS, G. GOULAS, P. ALEFRAGIS, N. S. VOROS, G. DIMITROULAKOS, N. MITAS, D. GOEHRINGER. From Scilab To High Performance Embedded Multicore Systems The ALMA Approach, in "Proc. 15th EUROMICRO Conference on Digital System Design (DSD)", Cesme, Izmir, Turkey, 2012, http://hal.inria.fr/hal-00752642. - [50] D. BENFERHAT, F. GUIDEC, P. QUINTON. Cardiac Monitoring of Marathon Runners using Disruption-Tolerant Wireless Sensors, in "6th Springer Int. Conf. on Ubiquitous Computing and Ambient Intelligence (UCAmI'12)", Vitoria-Gasteiz, Spain, LNCS, Springer, December 2012, http://hal.inria.fr/hal-00763319. - [51] D. BENFERHAT, F. GUIDEC, P. QUINTON. *Disruption-Tolerant Wireless Sensor Networking for Biomedical Monitoring in Outdoor Conditions*, in "7th ACM Int. Conf. on Body Area Networks (BodyNets'12)", Oslo, Norway, ACM, September 2012, http://hal.inria.fr/hal-00763305. - [52] R. BONAMY, D. CHILLET, S. BILAVARN, O. SENTIEYS. *Power Consumption Model for Partial Dynamic Reconfiguration*, in "Proc.
of International Conference on ReConFigurable Computing and FPGA (RECONFIG)", Cancun, Mexico, December 2012. - [53] A. CHAKHARI, K. PARASHAR, R. ROCHER, P. SCALART. Analytical approach to evaluate the effect of the spread of quantization noise through the cascade of decision operators for spherical decoding, in "Proc. of International Conference on Design and Architectures for Signal and Image Processing (DASIP)", October 2012, http://hal.inria.fr/hal-00741829. - [54] D. CHILLET, E. SENN, O. ZENDRA, C. BELLEUDY, S. BILAVARN, R. B. ATITALLAH, C. SAMOYEAU, A. FRITSCH. Open-People: Open Power and Energy Optimization PLatform and Estimator, in "Proc. of 15th Euromicro Conference on Digital System Design (DSD'2012)", Cesme, Izmir, Turkey, October 2012, p. 668-675. - [55] A. DIDIOUI, C. BERNIER, D. MORCHE, O. SENTIEYS. *Impact of RF front-end nonlinearity on WSN communications*, in "Proc. of the Ninth International Symposium on Wireless Communication Systems (ISWCS)", Paris, France, November 2012, p. 875-879 [DOI: 10.1109/ISWCS.2012.6328493]. - [56] A. DIDIOUI, C. BERNIER, D. MORCHE, O. SENTIEYS. *HarvWSNet: A Co-simulation Framework for Energy Harvesting Wireless Sensor Networks*, in "Proc. of the International Conference on Computing, Networking and Communications (ICNC)", San Diego, USA, January 2013. - [57] M. GAUTIER, V. BERG, D. NOGUET. *Wideband frequency domain detection using Teager-Kaiser energy operator*, in "Proc. of the IEEE 5th International Conference on Cognitive Radio Oriented Wireless Networks and Communications (CrownCom'12)", June 2012, p. 332–337, http://hal.inria.fr/hal-00742528. - [58] G. GOULAS, P. ALEFRAGIS, N. S. VOROS, C. VALOUXIS, C. GOGOS, N. NIKOLAOS, G. DIMITROULAKOS, K. MASSELOS, D. GOEHRINGER, S. DERRIEN, O. SENTIEYS, D. MÉNARD, M. HUEBNER, T. STRIPF, O. OEY, J. BECKER, G. RAUWERDA, K. SUNESEN, D. KRITHARIDIS, N. MITAS. From Scilab to Multicore Embedded Systems: Algorithms and Methodologies, in "Proc.
of the IEEE International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (IC-SAMOS)", Samos, Greece, 2012, http://hal.inria.fr/hal-00752615. - [59] F. GUIDEC, D. BENFERHAT, P. QUINTON. *Biomedical Monitoring of Non-Hospitalized Subjects using Disruption-Tolerant Wireless Sensors*, in "3rd International Conference on Wireless Mobile Communication and Healthcare (MobiHealth'12)", Paris, France, Springer, November 2012, http://hal.inria.fr/hal-00763316. - [60] C. GUY, B. COMBEMALE, S. DERRIEN, R. H. STEEL, J.-M. JÉZÉQUEL. On Model Subtyping, in "Proc. 8th European Conference on Modelling Foundations and Applications (ECMFA)", Kgs. Lyngby, Denmark, July 2012, http://hal.inria.fr/hal-00695034. - [61] Q. H. KHUAT, Q. H. LE, D. CHILLET, S. PILLEMENT. Spatio-Temporal Scheduling for 3D Reconfigurable & Multiprocessor Architecture, in "International Design and Test Symposium, IDT 2012", Doha, Qatar, December 2012. - [62] T.-N. LE, O. SENTIEYS, O. BERDER, A. PÉGATOQUET, C. BELLEUDY. *Power Manager with PID controller in Energy Harvesting Wireless Sensor Networks*, in "Proc. of Workshop on energy and Wireless Sensors (e-WiSe)", Besançon, France, November 2012. - [63] F. LEMONNIER, P. MILLET, G. MARCHESAN ALMEIDA, M. HUEBNER, J. BECKER, S. PILLEMENT, O. SENTIEYS, M. KOEDAM, S. SINHA, K. GOOSSENS, C. PIGUET, M. MORGAN, R. LEMAIRE. *Towards future adaptive multiprocessor systems-on-chip: an innovative approach for flexible architectures*, in "Proc. IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (IC-SAMOS)", Samos, Greece, July 2012. - [64] D. PAMULA, E. HRYNKIEWICZ, A. TISSERAND. *Analysis of GF*(2<sup>233</sup>) *Multipliers Regarding Elliptic Curve Cryptosystem Applications*, in "11th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems (PDeS)", Brno, Czech Republic, May 2012, p. 252-257. - [65] D. PAMULA, A. TISSERAND.
*GF*(2<sup>m</sup>) *Finite-Field Multipliers with Reduced Activity Variations*, in "4th International Workshop on the Arithmetic of Finite Fields", Bochum, Germany, LNCS, Springer, July 2012, vol. 7369, p. 152-167, http://dx.doi.org/10.1007/978-3-642-31662-3\_11. - [66] M. PHAM, R. BONAMY, S. PILLEMENT, D. CHILLET. *UPaRC Ultra-fast power-aware reconfiguration controller*, in "Proc. IEEE/ACM Design and Test in Europe Conference (2012)", Dresden, Germany, March 2012, p. 1373–1378. - [67] I. PRATOMO, S. PILLEMENT. *Gradient - An Adaptive Fault-tolerant Routing Algorithm for 2D Mesh Network-on-Chips*, in "Proc. of International Conference on Design and Architectures for Signal and Image Processing (DASIP)", Karlsruhe, Germany, October 2012. - [68] I. PRATOMO, S. PILLEMENT. *Impact of Design Parameters on Performance of Adaptive Network-on-Chips*, in "Proc. High Performance Computing and Simulation (HPCS)", Madrid, Spain, July 2012, p. 724–725. - [69] P. QUINTON, A.-M. CHANA, S. DERRIEN. *Efficient Hardware Implementation of Data-Flow Parallel Embedded Systems*, in "12th Int. Conf. on Embedded Computer Systems: Architectures, Modeling, and Simulation (Samos XII)", Samos, Greece, July 2012. - [70] R. ROCHER, P. SCALART. *Noise Probability Density Function in Fixed-Point Systems Based on Smooth Operators*, in "Proc. of International Conference on Design and Architectures for Signal and Image Processing (DASIP)", October 2012, http://hal.inria.fr/hal-00741824. - [71] E. SENN, D. CHILLET, O. ZENDRA, C. BELLEUDY, R. B. ATITALLAH, A. FRITSCH, C. SAMOYEAU. *Open-People: an Open Platform for Estimation and Optimizations of energy consumption*, in "Proc. of International Conference on Design and Architectures for Signal and Image Processing (DASIP)", Karlsruhe, Germany, October 2012. - [72] T. STRIPF, O. OEY, T. BRUCKSCHLOEGL, R. KOENIG, M. HUEBNER, G. GOULAS, P. ALEFRAGIS, S. NIKOLAOS, G. RAUWERDA, K. SUNESEN, S. DERRIEN, D. MENARD, O. SENTIEYS, N. KAVVADIAS, G.
DIMITROULAKOS, K. MASSELOS, D. GOEHRINGER, T. PERSCHKE, D. KRITHARIDIS, N. MITAS, J. BECKER. A Flexible Approach for Compiling Scilab to Reconfigurable Multi-Core Embedded Systems, in "Proc. International Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)", York, United Kingdom, 2012, p. 1-8, http://hal.inria.fr/hal-00752644. - [73] V. D. TOVINAKERE, O. SENTIEYS, S. DERRIEN. A Semiempirical Model for Wakeup Time Estimation in Power-Gated Logic Clusters, in "Proc. of the 49th IEEE/ACM Design Automation Conference (DAC)", San Francisco, CA, USA, June 2012, p. 48-55, http://doi.acm.org/10.1145/2228360.2228371. ### **National Conferences with Proceedings** [74] Q. H. KHUAT, Q. H. LE, D. CHILLET, S. PILLEMENT. Spatio-Temporal Scheduling for 3D Reconfigurable and Multiprocessor Architecture, in "Manifestation des Jeunes Chercheurs en Sciences et Technologies de l'Information et de la Communication (MAJECSTIC'2012)", Lille, France, October 2012. ## **Conferences without Proceedings** - [75] Q. H. KHUAT, Q. H. LE, D. CHILLET, S. PILLEMENT. Spatio-Temporal Scheduling for 3D Reconfigurable and Multiprocessor Architecture, in "Colloque du GDR SoC SiP", Paris, France, June 2012. - [76] E. SENN, C. BELLEUDY, D. CHILLET, A. FRITSCH, R. B. ATITALLAH, O. ZENDRA, C. SAMOYEAU. *Open-People: Open-Power and Energy Optimization PLatform and Estimator*, in "University Booth, Sophia-Antipolis Microelectronics Conference (SAME)", Nice, France, October 2012. #### **Scientific Books (or Scientific Book chapters)** - [77] O. SENTIEYS, O. BERDER. *Optimizing Energy Efficiency of Sensor Networks*, in "Energy Autonomous Micro and Nano Systems", M. BELLEVILLE, C. CONDEMINE (editors), Wiley-ISTE, 2012, p. 325–360, http://hal.inria.fr/hal-00742125. - [78] O. SENTIEYS, O. BERDER. *Optimisation énergétique des réseaux de capteurs*, in "Micro et Nanosystèmes autonomes en énergie", M. BELLEVILLE, C. CONDEMINE (editors), Hermes, 2012, http://hal.inria.fr/hal-00742126. - [79] O.
SENTIEYS, A. TISSERAND. *Architectures reconfigurables FPGA*, in "Technologies logicielles Architectures des systèmes", Techniques de l'Ingénieur, August 2012, n<sup>o</sup> H 1 196, p. 1-22. - [80] R. ZHANG, O. BERDER, O. SENTIEYS. *Energy-Latency Tradeoff of Opportunistic Routing*, in "Routing in Opportunistic Networks", Springer, 2012, http://hal.inria.fr/hal-00742127. ## **Research Reports** [81] J.-C. NAUD, D. MENARD. *Numerical Accuracy Evaluation for Polynomial Computation*, Inria, February 2012, 20, http://hal.inria.fr/hal-00672654. ## **Scientific Popularization** [82] A. TISSERAND. Énergie dans les puces électroniques, October 2012, Talk, Fête de la Science, Lannion. #### **Other Publications** - [83] A. TISSERAND. Circuits for True Random Number Generation with On-Line Quality Monitoring, June 2012, Rencontres Arithmétique de l'Informatique Mathématique, Invited talk. - [84] A. TISSERAND. *Power Analysis and Cryptosystem Security: Attacks and Countermeasures*, May 2012, Lecture, École Thématique ECOFAC 2012, http://leat.unice.fr/ECoFaC2012/. ## References in notes - [85] S. HAUCK, A. DEHON (editors). Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, Morgan Kaufmann, 2008. - [86] Z. ALLIANCE. Zigbee specification, ZigBee Alliance, 2005, n<sup>o</sup> ZigBee Document 053474r06, Version. - [87] A. BACHIR, M. DOHLER, T. WATTEYNE, K. LEUNG. *MAC Essentials for Wireless Sensor Networks*, in "Communications Surveys Tutorials, IEEE", quarter 2010, vol. 12, n<sup>o</sup> 2, p. 222–248, http://dx.doi.org/10.1109/SURV.2010.020510.00058. - [88] F. BARAT, M. JAYAPALA, T. VANDER AA, R. LAUWEREINS, G. DECONINCK, H. CORPORAAL. *Low Power Coarse-Grained Reconfigurable Instruction Set Processor*, in "International Workshop on Field Programmable Logic and Applications", Lecture Notes in Computer Science 2778, September 2003, p. 230–239. - [89] V. BAUMGARTE, G.
EHLERS, F. MAY, A. NÜCKEL, M. VORBACH, M. WEINHARDT. *PACT XPP A Self-Reconfigurable Data Processing Architecture*, in "The Journal of Supercomputing", 2003, vol. 26, n<sup>o</sup> 2, p. 167–184. [90] C. BOBDA. Introduction to Reconfigurable Computing: Architectures Algorithms and Applications, Springer, 2007. - [91] J. M. P. CARDOSO, P. C. DINIZ, M. WEINHARDT. *Compiling for reconfigurable computing: A survey*, in "ACM Comput. Surv.", June 2010, vol. 42, p. 13:1–13:65, http://doi.acm.org/10.1145/1749603.1749604. - [92] D. CHILLET, S. PILLEMENT, O. SENTIEYS. A Neural Network Model for Real-Time Scheduling on Heterogeneous SoC Architectures, in "IEEE International Joint Conference on Neural Networks, IJCNN'07", Orlando, FL, August, 12-17 2007. - [93] M. CLARK, M. MULLIGAN, D. JACKSON, D. LINEBARGER. Accelerating Fixed-Point Design for MB-OFDM UWB Systems, in "CommsDesign", 2005, http://www.commsdesign.com/showArticle.jhtml?articleID=57703818. - [94] L. COLLIN, O. BERDER, P. ROSTAING, G. BUREL. *Optimal minimum distance-based precoder for MIMO spatial multiplexing systems*, in "IEEE Transactions on Signal Processing", 2004, vol. 52, n<sup>o</sup> 3, p. 617–627. - [95] K. COMPTON, S. HAUCK. *Reconfigurable computing: a survey of systems and software*, in "ACM Comput. Surv.", 2002, vol. 34, n<sup>o</sup> 2, p. 171–210, http://doi.acm.org/10.1145/508352.508353. - [96] G. CONSTANTINIDES, P. CHEUNG, W. LUK. Wordlength optimization for linear digital signal processing, in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems", October 2003, vol. 22, no 10, p. 1432-1442. - [97] M. COORS, H. KEDING, O. LUTHJE, H. MEYR. *Fast Bit-True Simulation*, in "Proc. ACM/IEEE Design Automation Conference (DAC)", Las Vegas, june 2001, p. 708-713. - [98] S. Cui, A. Goldsmith, A. Bahai. *Energy-efficiency of MIMO and cooperative MIMO techniques in sensor networks*, in "IEEE Journal on Selected Areas in Communications", 2004, vol. 22, n<sup>o</sup> 6, p. 1089–1098. 
- [99] K. DANNE, R. MUHLENBERND, M. PLATZNER. Executing hardware tasks on dynamically reconfigurable devices under real-time conditions, in "International Conference on Field Programmable Logic and Applications", Lecture Notes in Computer Science, 2006. - [100] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Energy-Efficient Reconfigurable Processors*, in "Low Power Electronics Design", C. PIGUET (editor), Computer Engineering, Vol 1, CRC Press, August 2004, chap. 20. - [101] M. DOHLER, E. LEFRANC, H. AGHVAMI. *Space-time block codes for virtual antenna arrays*, in "The 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications", 2002, vol. 1. - [102] A. DUNKELS, B. GRONVALL, T. VOIGT. Contiki-a lightweight and flexible operating system for tiny networked sensors, in "Proceedings of the First IEEE Workshop on Embedded Networked Sensors", 2004. - [103] P. FARABOSHI, G. BROWN, J. FISHER, G. DESOLI. *Lx: A technology Platform for Customizable VLIW Embedded Processing*, in "ACM/IEEE Int. Symp. on Computer Architecture (ISCA 00)", Vancouver, Canada, June 2000, p. 203–213. - [104] P. GARCIA, K. COMPTON, M. SCHULTE, E. BLEM, W. Fu. An overview of reconfigurable hardware in embedded systems, in "EURASIP J. Embedded Syst.", January 2006, vol. 2006, p. 1–19. - [105] S. HAUCK, A. DEHON. Reconfigurable computing: the theory and practice of FPGA-based computation, Series on Systems on Silicon, Morgan Kaufmann, 2008. - [106] A. HORMATI, M. KUDLUR, S. MAHLKE, D. BACON, R. RABBAH. Optimus: efficient realization of streaming applications on FPGAs, in "Proc. Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems", New York, NY, USA, CASES'08, ACM, 2008, p. 41–50, http://doi.acm.org/10.1145/1450095.1450105. - [107] S. KIM, W. SUNG. Word-length optimization for high level synthesis of digital signal processing systems, in "IEEE Workshop on Signal Processing Systems", Boston, October 1998, p. 142-151. - [108] K. Kum, J. Kang, W. Sung.
*AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors*, in "IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing", September 2000, vol. 47, n<sup>o</sup> 9, p. 840-848. - [109] J. LANEMAN, G. WORNELL. Distributed space-time-coded protocols for exploiting cooperative diversity in wireless networks, in "IEEE Transactions on Information Theory", 2003, vol. 49, n<sup>o</sup> 10, p. 2415–2425. - [110] A. LODI, M. TOMA, F. CAMPI, A. CAPPELLI, R. CANEGALLO, R. GUERRIERI. A VLIW Processor With Reconfigurable Instruction Set for Embedded Applications, in "IEEE J. of Solid-State Circuits", 2003, vol. 38, n<sup>o</sup> 11, p. 1876–1886. - [111] T. MARESCAUX, V. NOLLET, J. MIGNOLET, A. BARTICA, W. MOFFATA, P. AVASAREA, P. COENEA, D. VERKEST, S. VERNALDE, R. LAUWEREINS. *Run-time support for heterogeneous multitasking on reconfigurable SoCs*, in "Integration, the VLSI Journal", 2004, vol. 38, p. 107–130. - [112] B. MEI, S. VERNALDE, D. VERKEST, H. DE MAN, R. LAUWEREINS. *ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix*, in "Proc. Int. Conf. on Field Programmable Logic and Applications", Springer, 2003, p. 61–70. - [113] D. MENARD, D. CHILLET, F. CHAROT, O. SENTIEYS. *Automatic Floating-point to Fixed-point Conversion for DSP Code Generation*, in "IEEE/ACM Int. Conf. on Compilers, Architectures and Synthesis for Embedded Systems (CASES)", Grenoble, October 2002. - [114] H. NIKOLOV, M. THOMPSON, T. STEFANOV, A. PIMENTEL, S. POLSTRA, R. BOSE, C. ZISSULESCU, E. DEPRETTERE. *Daedalus: toward composable multimedia MP-SoC design*, in "Proc. Design Automation Conference", New York, NY, USA, DAC'08, ACM, 2008, p. 574–579, http://doi.acm.org/10.1145/1391469.1391615. - [115] Y. PARK, H. PARK, S. MAHLKE. *CGRA express: accelerating execution using dynamic operation fusion*, in "Proc. Int. Conf.
on Compilers, Architecture, and Synthesis for Embedded Systems", New York, NY, USA, CASES'09, ACM, 2009, p. 271–280, http://doi.acm.org/10.1145/1629395.1629433. - [116] J. RABAEY. *Reconfigurable Processing: The Solution to Low-Power Programmable DSP*, in "IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)", 1997, vol. 1, p. 275–278. [117] R. SALEH, S. WILTON, S. MIRABBASI, A. HU, M. GREENSTREET, G. LEMIEUX, P. PANDE, C. GRECU, A. IVANOV. *System-on-chip: reuse and integration*, in "Proceedings of the IEEE", 2006, vol. 94, n<sup>o</sup> 6, p. 1050–1069. - [118] E. SALMINEN, A. KULMALA, T. D. HAMALAINEN. *Survey of Network-on-chip Proposals*, in "White Paper, OCP-IP", 2008, http://www.ocpip.org/socket/whitepapers. - [119] K. SEEHYUN, K. KUM, W. SUNG. *Fixed-point optimization utility for C and C++ based digital signal processing programs*, in "IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing", nov 1998, vol. 45, n<sup>o</sup> 11, p. 1455 -1464, http://dx.doi.org/10.1109/82.735357. - [120] G. THEODORIDIS, D. SOUDRIS, S. VASSILIADIS. A survey of coarse-grain reconfigurable architectures and CAD tools, in "Fine- and coarse-grain reconfigurable computing", Springer Verlag, 2007. - [121] G. VENKATARAMANI, W. NAJJAR, F. KURDAHI, N. BAGHERZADEH, W. BOHM, J. HAMMES. *Automatic compilation to a coarse-grained reconfigurable system-on-chip*, in "ACM Trans. on Embedded Computing Systems", 2003, vol. 2, n<sup>o</sup> 4, p. 560–589, http://doi.acm.org/10.1145/950162.950167. - [122] C. WOLINSKI, M. GOKHALE, K. MCCAVE. A polymorphous computing fabric, in "Micro, IEEE", 2002, vol. 22, no 5, p. 56–68. - [123] C. WOLINSKI, K. KUCHCINSKI, A. POSTOLA. *UPaK: abstract unified pattern based synthesis kernel for hardware and software systems*, in "University Booth, DATE 2007", Nice, France, May 2007. - [124] Z. A. YE, N. SHENOY, P. BANEIJEE. A C compiler for a processor with a reconfigurable functional unit, in "Proc. 
ACM/SIGDA Int. Symp. on Field Programmable Gate-Arrays, FPGA", New York, NY, USA, ACM Press, 2000, p. 95–100, http://doi.acm.org/10.1145/329166.329187. - [125] Z. UL-ABDIN, B. SVENSSON. Evolution in architectures and programming methodologies of coarse-grained reconfigurable computing, in "Microprocessors and Microsystems", 2009, vol. 33, n<sup>o</sup> 3, p. 161–178 [DOI: 10.1016/J.MICPRO.2008.10.003], http://www.sciencedirect.com/science/article/pii/S0141933108001038.