INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

# Team R2D2

# Reconfigurable and Retargetable Digital Devices

Rennes - Bretagne Atlantique

# **Table of contents**

| 1. | Team |
|----|------|
| 2. | Overall Objectives |
| | 2.1. Introduction |
| | 2.2. Directions |
| | 2.2.1. New Architectures and Technologies |
| | 2.2.2. Modeling, Synthesis and Compilation Targeting Reconfigurable Platforms |
| | 2.2.3. Study of Applications |
| 3. | Scientific Foundations |
| | 3.1. Panorama |
| | 3.2. New Architectures and Technologies |
| | 3.2.1. New Reconfigurable Architectures |
| | 3.2.2. Network on Chip Design |
| | 3.2.3. Wireless Sensor Networks |
| | 3.3. Modeling, Synthesis and Compilation for Reconfigurable Platforms |
| | 3.3.1. Dedicated Hardware Accelerator Synthesis |
| | 3.3.2. Processor Modeling and Flexible Compilation |
| | 3.3.3. Floating-Point to Fixed-Point Conversion |
| 4. | Application Domains |
| | 4.1. Panorama |
| | 4.2. Mobile Communications |
| 5. | Software |
| | 5.1. Panorama |
| | 5.2. PolyLib |
| | 5.3. MMAlpha |
| | 5.4. BSS, BOOST |
| | 5.5. Gecos |
| | 5.6. FWRToolbox |
| 6. | New Results |
| | 6.1. New Architectures and Technologies |
| | 6.1.1. New Organization of Reconfigurable Structures |
| | 6.1.1.1. Reconfigurable Architecture Description Language |
| | 6.1.1.2. Multi-Mode Architecture Design |
| | 6.1.1.3. Memory Hierarchy in Specialized SoC |
| | 6.1.2. Efficient Coding or Modulation Schemes for On-Chip Interconnection Networks |
| | 6.1.3. Wireless Sensor Networks |
| | 6.1.3.1. Energy-Efficiency Optimization for Cooperative MIMO Schemes in Wireless Sensor Networks |
| | 6.1.3.2. Power Optimization of Channel Codec Adapted for Wireless Sensor Networks |
| | 6.2. Modeling, Synthesis and Compilation for Reconfigurable Platforms |
| | 6.2.1. Synthesis and Compilation Techniques |
| | 6.2.1.1. Derivation of Efficient Architectures for Regular Arrays |
| | 6.2.1.2. Automatic Synthesis of Optimized Application-Dependent Reconfigurable Systems |
| | 6.2.1.3. Run-Time Reconfigurable Architecture Modeling |
| | 6.2.1.4. Specialized Microcontroller Synthesis on FPGA |
| | 6.2.2. Floating-Point to Fixed-Point Transformation |
| | 6.2.2.1. Floating-Point to Fixed-Point Conversion Methodology for FPGA |
| | 6.2.2.2. Analytical Accuracy Evaluation in Fixed-Point Systems |
| | 6.2.2.3. Accuracy Constraint Determination for Fixed-Point Systems |
| | 6.2.2.4. Optimal Fixed-Point Implementation of Filter/Controller |
| | 6.2.3. Specialized SoC Architecture Modeling |
| | 6.2.3.1. System Modelling for Dynamically Reconfigurable Architectures |
| | 6.2.3.2. SoC Modeling and Prototyping on FPGA-Based Systems |
| | 6.3. Study of Applications |
| | 6.3.1. Radio-Communication Systems |
| | 6.3.1.1. MIMO Mobile Communication Systems Prototyping |
| | 6.3.1.2. Transmit Beamforming for Distributed Wireless Access with Centralized Signal Processing |
| | 6.3.1.3. Parallel Reconfigurable Architectures for LDPC Decoding |
| | 6.3.2. Content-Based Image Retrieval Hardware Acceleration |
| | 6.3.3. Hardware Acceleration of Bioinformatics Applications |
| | 6.3.4. Intrusion Detection System in Hardware |
| | 6.3.5. Accelerating Statistical Test for Real-Time Estimation of Randomness |
| | 6.3.6. Noise Reduction in Speech Processing |
| | 6.3.7. Intelligent Transport System |
| 7. | Contracts and Grants with Industry |
| | 7.1. ANR RNRT SVP (2006-2008) |
| | 7.2. ANR Architectures du Futur - ROMA: Reconfigurable Operators for Multimedia Applications (2007-2010) |
| | 7.3. ANR Technologies Logicielles - SocLib (2007-2009) |
| | 7.4. Contract with Thomson (2006-2009) |
| 8. | Other Grants and Activities |
| | 8.1. Regional Actions |
| | 8.1.1. Fastnet: Fast Adaptive Secure Technology for high-speed NETwork (2005-2007) |
| | 8.1.2. CAPTIV (2006-2008) |
| | 8.1.3. PucesCom-Santé (2006-2008) |
| | 8.2. National Actions |
| | 8.3. International Bilateral Relations |
| | 8.3.1. Europe |
| | 8.3.1.1. Comap Project: Collaboration with Germany |
| | 8.3.1.2. Other European Collaboration |
| | 8.3.2. Africa |
| | 8.3.3. North America |
| | 8.4. Visiting Scientists |
| 9. | Dissemination |
| | 9.2. Seminars and Invitations |
| 10. | Bibliography |

# 1. Team

The R2D2 team is located on two sites: Rennes and Lannion. R2D2 is a joint team with CNRS and the University of Rennes 1 (Ifsic and Enssat). The team was created on May 1st, 2002.

#### Head of team

- François Charot [ CR INRIA ]
- Olivier Sentieys [ Professor, University of Rennes 1, Enssat, HdR ]

#### Administrative assistant

- Céline Ammoniaux [ until 4/5/07, TR CNRS ]
- Isabelle Leca [ since 4/6/07, TR INRIA ]
- Orlane Kuligowski [ until 08/31/07, Enssat, University ]
- Joelle Thépault [ since 09/1/07, Enssat, University ]

#### Research scientist

- Sébastien Pillement [ Associate professor, IUT Lannion, on secondment at INRIA since 09/01/06 ]

#### Faculty members

- Olivier Berder [ Associate professor, Enssat ]
- Emmanuel Casseau [ Professor, Enssat, HdR ]
- Daniel Chillet [ Associate professor, Enssat ]
- Steven Derrien [ Associate professor, Ifsic ]
- Daniel Menard [ Associate professor, Enssat ]
- Patrice Quinton [ Professor, Director of the Brittany branch of the ENS de Cachan, HdR ]
- Pascal Scalart [ Professor, Enssat, HdR ]
- Christophe Wolinski [ Professor, Ifsic, HdR ]

#### Associated Faculty members

- Didier Demigny [ Professor, Director of IUT Lannion, HdR ]
- Hélène Dubois [ Associate professor, Enssat ]
- Michel Guitton [ Associate professor, Enssat ]
- Laurent Perraudeau [ Associate professor, Ifsic ]

#### Technical staff

- Charles Wagner [ IR CNRS ASCII ]
- Philippe Quemerais [ IR Enssat, part time ]
- Ludovic L'Hours [ ANR SVP Project since 11/06/06, Irisa-Rennes ]
- Antoine Floch [ ANR ROMA Project since 10/01/07, Irisa-Rennes ]
- Jérôme Astier [ Captiv Project since 10/15/07, Irisa-Lannion ]
- Mohamad Diab [ IA INRIA since 10/15/07, Irisa-Lannion ]

#### Post-doctoral fellows

- Florent Berthelot [ Post-doc since 09/01/07, Irisa-Rennes ]
- Thibault Hilaire [ Lecturer since 09/01/07, IUT, Irisa-Lannion ]
- Romuald Rocher [ Lecturer since 09/01/07, IUT, Irisa-Lannion ]
- Baptiste Vrigneau [ Lecturer since 09/01/07, IUT, Irisa-Lannion ]

#### Visiting scientist

- Sébastien Roy [ Professor, Laval University, Québec, Canada, Visiting Professor from 6/23/07 to 8/6/07, Enssat ]

#### PhD students

- Georges Adouko [ INRIA grant, Irisa-Rennes ]
- Faten Benabdallah [ Tunisian grant, Irisa-Lannion ]
- Anne-Marie Chana [ SARIMA grant, co-supervision with Yaoundé I University - Cameroon, Irisa-Rennes ]
- Antoine Courtay [ Brittany Region grant, co-supervision with LESTER Lorient, Irisa-Lannion ]
- Mohamed Djendi [ France Telecom grant, Irisa-Lannion ]
- Erwan Grace [ CEA - University grant, Irisa-Lannion ]
- Shafqat Khan [ University grant, Irisa-Lannion ]
- Julien Lallet [ University grant, Irisa-Lannion ]
- Ludovick Lepauloux [ France Telecom grant, Irisa-Lannion ]
- Kevin Martin [ INRIA grant, Irisa-Rennes ]
- Tuan-Duc Nguyen [ University grant, Irisa-Lannion ]
- Hai-Nam Nguyen [ University grant, Irisa-Lannion ]
- Auguste Noumsi [ SARIMA grant, co-supervision with Yaoundé I University - Cameroon, Irisa-Rennes ]
- Adeel Pasha [ MENRT grant, Irisa-Rennes ]
- Manh Pham [ Brittany Region - University grant, Irisa-Lannion ]
- Erwan Raffin [ CIFRE grant, THOMSON, Irisa-Rennes ]
- Renaud Santoro [ MENRT grant, co-supervision with Laval University - Québec, Irisa-Lannion ]
- Taofik Saïdi [ University grant, co-supervision with Laval University - Québec, Irisa-Lannion ]
- Michel Thériault [ CSRNG Canada grant, co-supervision with Laval University - Québec, Irisa-Lannion ]
- David Virette [ France Telecom grant, Irisa-Lannion ]

#### Master students

- Adeel Pasha [ Master on embedded systems (Nice), Irisa-Lannion ]
- Umer Farooq [ Master on embedded systems (Nice), Irisa-Lannion ]
- Shafqat Khan [ Master on embedded systems (Nice), Irisa-Lannion ]
- Antoine Floch [ Master on computer science (Rennes), Irisa-Rennes ]

# 2. Overall Objectives

#### 2.1. Introduction

The problems tackled by the R2D2 team relate to the design of specialized systems on reconfigurable platforms. A hardware platform is a structure of integrated circuits (ICs) containing a set of programmable components (general-purpose or specific processor cores), memories and, generally, specialized components. Such a platform can be seen as an integrated architecture scheme common to numerous algorithms belonging to a given application domain. It is the answer of embedded system designers to the increasing difficulty of implementing their applications [69]. One can therefore expect that, in the future, most of the ICs needed to design a complex system will be derived from an existing platform. This design approach is an alternative to IP-based (*Intellectual Property*) design, in which the system is built by assembling separately designed components.

A reconfigurable platform includes a set of reconfigurable components (blocks of reconfigurable logic, reconfigurable datapaths, flexible communication networks). In terms of area and power consumption, reconfigurable resources use silicon more efficiently than programmable processors or specialized components.

Future platforms will be highly parallel, heterogeneous, programmable and reconfigurable. Parallelism is the only way to reach the performance level required by future applications. Heterogeneity results from the observation that an efficient design is often composed of several subsystems with well-differentiated computation requirements. Programmability avoids freezing the functionalities. Finally, reconfigurability combines the speed of specialized solutions with the flexibility of traditional programmable components.
Our scientific objectives are to take advantage of various methods (very high-level synthesis, behavioral synthesis, flexible compilation, floating-point to fixed-point conversion, etc.), each of which contributes, with its own specificities, to the design of part of a specialized system. The underlying models and techniques make it possible to use estimators, which guide implementation choices with precise knowledge of the performance, complexity and power consumption of the system.

#### 2.2. Directions

Research undertaken within the R2D2 team aims at facilitating the design of reconfigurable hardware systems by proposing architecture models and associated design methodologies that favor the match between the algorithms of the applications and the architectures supporting their implementation. The team's work follows three main directions.

#### 2.2.1. New Architectures and Technologies

Our studies, motivated by the constraints of high performance, flexibility and low power consumption, focus on the following topics:

- new organizations of reconfigurable structures offering the speed of specialized solutions and the flexibility of traditional programmable components for application areas such as mobile telecommunications;
- the application of advanced mobile telecommunication techniques to the design of Networks-on-Chip (NoC);
- architectures for low-power sensor networks.

#### 2.2.2. Modeling, Synthesis and Compilation Targeting Reconfigurable Platforms

Implementing an application on a reconfigurable platform requires a large set of techniques. Successive refinements lead to the choice of how the various parts of the application are mapped onto the components of the platform. Our studies focus on the following aspects: SoC modeling, synthesis of dedicated hardware accelerators, processor modeling and flexible compilation, and floating-point to fixed-point conversion.

#### 2.2.3. Study of Applications

Our main field of applications is third- and fourth-generation mobile telecommunications. Other application domains are also considered: cryptography and traffic filtering in high-speed networks, image indexing, speech processing and bioinformatics. Our research includes the prototyping of applications on reconfigurable and programmable platforms.

# 3. Scientific Foundations

#### 3.1. Panorama

R2D2 research activities stem from two scientific communities working on closely related areas of hardware system design: the first deals with methods and tools for specialized architecture design, the second with signal processing and dedicated circuit architectures. We first outline the evolution of specialized architectures, then present some foundations of our research.

#### 3.2. New Architectures and Technologies

**Keywords:** *Network-on-chip*, *SoC*, *computation grain*, *low-power consumption*, *multiple-valued logic*, *reconfigurable architecture*, *sensor network*.

By the end of the decade, IC technology should make it possible to fabricate chips with a billion transistors, instead of a few tens of millions today, as illustrated by the document published by the SIA<sup>1</sup> (*Semiconductor Industry Association*). The hardware of future equipment will be miniaturized (one now usually speaks of Systems-on-Chip, or SoCs) and will combine highly heterogeneous architectures that include dedicated hardware accelerators.

<sup>1</sup> http://www.itrs.net/Links/2005ITRS/Home2005.htm

Although electronic CAD tools and the associated design methodologies have progressed in recent years, designing new ICs has not become easier. On the contrary, the gap between the capacity offered by IC technology and the potential of current design tools (the famous *technology gap*) has never been so large. A fairly fundamental change in the way circuits are designed is needed.
This technological evolution affects the architecture of manufactured ICs. Over the years, a migration can be observed: from ASICs to SoCs and, in the near future, to reconfigurable programmable platforms.

- ASICs were prevalent between 1980 and 1995; they are now only used as particular blocks in more complex heterogeneous systems.
- The first SoCs were designed around 1995. Thanks to the increasing density of chips, a complex SoC usually integrates one or more processor cores (general-purpose processors or digital signal processors), memory blocks (RAM, ROM, flash memory, EPROM, etc.) and many different interfaces needed for the system to work correctly. SoCs combine hardware and software components. Their design relies on synthesis and place-and-route tools and on libraries of reusable components.
- In the near future, SoCs will evolve into reconfigurable platforms. A reconfigurable platform targets a set of algorithms or applications that generally belong to the same application field. Design tools and methodologies must therefore make it possible to derive a specialized architecture from this base architecture [79]. Such platforms will satisfy the needs of a broader spectrum of applications.

Flexibility, high performance and energy efficiency are critical issues for embedded applications, particularly for mobile applications. These three constraints are taken into account in our architecture studies.

#### 3.2.1. New Reconfigurable Architectures

Recent years have seen the emergence of new reconfigurable architectures [65], which offer an alternative to the traditional performance/flexibility compromise that governs the choice between purely hardware (ASIC) and purely software (programmable processor) solutions. Computation grain and reconfiguration schemes are open research topics in this field.
Many reconfigurable architectures are based on FPGA-type circuits, and most of them, such as GARP [67], NAPA [78] and Chimaera [66], integrate a traditional programmable processor in charge of sequencing the processing on the reconfigurable block.

The concept of computation grain is an interesting and significant research subject. Most FPGA circuits are fine-grained, since they can be reconfigured at the bit level, in contrast with programmable processors, which manipulate words (32-bit words for many of them). When the application does not require bit-level reconfiguration, coarse-grained structures must be built from the elementary blocks of the reconfigurable structure, which results in circuit overhead. To limit this overhead, new coarse-grained reconfigurable architectures have been proposed, in which the elementary blocks are arithmetic logic units, multipliers, memories, etc. Architectures such as Piperench [62] and RaPiD [56] can be reconfigured at the operator and functional levels, respectively. Other examples include Matrix [58] at MIT and MorphoSys [73] at the University of California at Irvine. Commercial realizations include the array of reconfigurable arithmetic logic units of Elixent, the former Chameleon processor and the XPP processors of PACT<sup>2</sup>.

The Pleiades [76] project is an architectural platform that supports several computation grains (logic operations are handled as efficiently as arithmetic operations) and is designed to consume a minimum of energy whatever the required performance level. However, this platform is not very flexible, because its static reconfiguration limits it to certain application fields, speech coding having been the application studied.

<sup>2</sup> http://www.pactcorp.com/

#### 3.2.2. Network on Chip Design

The rapid growth of device densities on silicon has made it possible to design a SoC from validated IP blocks. The traditional interconnection resource on a SoC is a shared bus. Increasing the number of blocks in a SoC exposes the limitations of the bus solution, among them increasing noise sensitivity and the poor scalability of the interconnection scheme. To precisely control the electrical and scalability parameters [57] of the interconnect, on-chip communications have to be organized.

A new paradigm is emerging to address the interconnect issue [55]. The Network-on-Chip (NoC) concept proposes to use well-defined network layers to build the interconnection scheme. It separates the communication process into different layers associated with services (for example error detection or correction, routing or packetizing). A NoC is dedicated to the reliable and efficient routing of information grouped in packets (carrying redundancy information, routing information, etc.). Since the voltage swing on wires is expected to decrease in the next few years, the reliability of the physical layer will decrease as well. The challenge is to provide a reliable, efficient and low-power link that meets the requirements of future SoCs.

#### 3.2.3. Wireless Sensor Networks

Wireless sensor networks are groups of sensors interconnected through wireless links. The aim of these networks is to collect information from the monitored area and to relay it through the network. Sensor networks raise new challenges in wireless communications [83]. First, the autonomy, or lifetime, of a sensor network must be very high, since the sensors can be embedded in concrete, in the soil or even in the body of living beings, where replacing the batteries is difficult or impossible. Energy-scavenging techniques can be used for that purpose.
Second, these networks have to be self-organized, since they must cope with local sensor failures, for example when some sensors run out of power. Another important specificity is that the data rate needed by the applications should be quite low, since data do not have to be sent continuously, but only when changes occur. Many applications have been proposed in miscellaneous domains, e.g. agriculture, buildings, bridges, transport, military applications, enemy monitoring, chemical and bacteriological monitoring, and emergency response after earthquakes.

Many wireless systems already exist and are commercially successful. Their specifications have generally been developed to maximize spectral efficiency. In sensor networks, energy is more critical than the available spectrum, so for this kind of application power efficiency rather than spectral efficiency should be maximized.

A communication system can be described functionally by dividing the processing into layers. The OSI (Open Systems Interconnection) model describes seven processing layers. The problem is that a communication system cannot be designed efficiently layer by layer, because the layers are coupled to each other: separate optimizations of each layer are not sufficient. This is why designing a power-efficient system must take this coupling into account through cross-layer optimizations [61]. For that reason, it is better to consider few layers. Our approach splits the protocol stack into only two layers. The higher-level part covers the functions of the OSI application, presentation, session, transport and network layers. The lower-level part covers the functions of the OSI data-link and physical layers. The lower-level part considers a transmission between two neighboring nodes and has to optimize the communication from this point of view.
The higher-level part considers a transmission between applications that are generally distant, assuming that the lower-level communications are energy-efficient. This fragmentation has already been used in [81]; it is justified by the fact that networking issues are coupled together only in the higher-layer part, while channel management issues are coupled only in the lower-layer part.

#### 3.3. Modeling, Synthesis and Compilation for Reconfigurable Platforms

**Keywords:** ASIP, IC, architecture description language, data coding, design methodology, fixed-point arithmetic, flexible compilation, high-level synthesis, parallel architecture, precision, retargetable compilation, specialized processor.

#### 3.3.1. Dedicated Hardware Accelerator Synthesis

Although the architecture of ICs is evolving towards increasingly programmable and reconfigurable solutions, future silicon systems will continue to integrate specialized hardware components, whose design relies on synthesis techniques. Today, circuit synthesis starts from high-level specifications. Specifying programs that carry out regular computations as recurrence equations allows powerful static analyses and program transformations for the derivation of regular architectures. Our research is based on the polyhedral model, which is well suited to expressing the computation parts of applications and which allows systems of recurrence equations to be expressed and manipulated. There exist many academic prototype environments for the automatic synthesis of specialized architectures from high-level specifications, for example Diastol, Presage, Hifi, Cathedral, Sade, PEI and MMAlpha.
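The recurrence-equation view can be illustrated with a minimal Python sketch (plain Python, not Alpha or MMAlpha syntax). A FIR filter, chosen here purely as an assumed example, is written as a system of uniform recurrence equations over the polyhedral domain {(i, j) | 0 ≤ i < M, 0 ≤ j < N}. The sketch checks that the affine schedule t(i, j) = i + j respects every dependence vector, then evaluates the equations in schedule order. All names are hypothetical.

```python
from itertools import product

# Hypothetical example: y[i] = sum_{j=0}^{N-1} w[j] * x[i - j], written as
# uniform recurrence equations over D = {(i, j) | 0 <= i < M, 0 <= j < N}:
#   Y(i, j) = Y(i, j-1) + W(i, j) * X(i, j)   dependence vector (0, 1)
#   W(i, j) = W(i-1, j)                        dependence vector (1, 0)
#   X(i, j) = X(i-1, j-1)                      dependence vector (1, 1)
DEPENDENCES = [(0, 1), (1, 0), (1, 1)]


def schedule(i, j):
    """Candidate linear schedule t(i, j) = i + j."""
    return i + j


def is_valid_schedule(deps):
    # For a linear schedule, t(z) - t(z - d) = t(d); the schedule is valid
    # if every dependence vector d satisfies t(d) >= 1.
    return all(schedule(*d) >= 1 for d in deps)


def fir_by_recurrences(x, w):
    """Evaluate the recurrences point by point in increasing order of t."""
    m, n = len(x), len(w)
    Y, W, X = {}, {}, {}
    points = sorted(product(range(m), range(n)), key=lambda p: schedule(*p))
    for i, j in points:
        # Boundary conditions on the faces of the domain.
        W[i, j] = W[i - 1, j] if i > 0 else w[j]
        if i > 0 and j > 0:
            X[i, j] = X[i - 1, j - 1]
        else:
            X[i, j] = x[i - j] if i - j >= 0 else 0
        previous = Y[i, j - 1] if j > 0 else 0
        Y[i, j] = previous + W[i, j] * X[i, j]
    return [Y[i, n - 1] for i in range(m)]


def fir_direct(x, w):
    """Reference: the filter computed directly from its definition."""
    return [sum(w[j] * x[i - j] for j in range(len(w)) if i - j >= 0)
            for i in range(len(x))]


assert is_valid_schedule(DEPENDENCES)
print(fir_by_recurrences([1, 2, 3, 4, 5], [1, 0, -1]))  # [1, 2, 2, 2, 2]
```

Points that share the same value of t(i, j) carry no dependence between them, so in a systolic interpretation each of them could be mapped to a different processing element and computed in the same clock cycle.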
Tools performing high-level synthesis from the C language now exist on the market: tools based on SystemC<sup>3</sup>, such as *CoCentric SystemC Compiler*<sup>4</sup> from Synopsys and *A|RT Builder* from Adelante Technologies/Frontier Design, and tools based on C and its extensions, such as the *Celoxica DK1 Design Suite*<sup>5</sup> from Celoxica. Few tools rely on true parallelization, but many research projects explore this approach: Flex<sup>6</sup> and Raw<sup>7</sup> at MIT, Piperench<sup>8</sup> at Carnegie-Mellon, Garp<sup>9</sup> at Berkeley, Pico [80] at HP Labs Palo Alto and Compaan<sup>10</sup> in Leiden. Among these tools, Alpha [6] and MMAlpha, initially developed in the Cosi project-team and derived from Diastol, constitute today a practical environment for manipulating recurrence equations and for the high-level synthesis of dedicated hardware accelerators. We continue to develop MMAlpha, in close cooperation with the CompSys team (LIP, ENS Lyon).

#### 3.3.2. Processor Modeling and Flexible Compilation

Hardware description languages such as VHDL or Verilog are widely used to model and simulate processors, but mainly for hardware design. SoC design requires methodologies and tools for exploring the architecture design space. This exploration requires architecture description languages (ADLs) suited to the specification of SoC architecture models. Very early in the design process, ADLs play a role in validating SoC architectures and in automatically generating the software development tools needed for the software and hardware design of the architecture. Most existing architecture description languages target processor architecture specification and favor either synthesis, compiler generation or simulator generation. None of the existing languages is really oriented towards architectural exploration.
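The idea that a single architecture description can drive the generation of several development tools can be sketched in a few lines of Python. The description below is a hypothetical behavioral model of a toy accumulator machine; it does not follow the syntax of nML, ISDL, LISA, Expression, MDES or ARMOR, and the generator functions are illustrative assumptions.

```python
# Hypothetical behavioral description of a toy accumulator machine: each
# instruction is described only by its effect on the architectural state.
# No real ADL syntax is implied.
TOY_ISA = {
    "registers": ["acc", "pc"],
    "instructions": {
        "LOADI": lambda state, imm: state.update(acc=imm),
        "ADDI": lambda state, imm: state.update(acc=state["acc"] + imm),
        "MULI": lambda state, imm: state.update(acc=state["acc"] * imm),
    },
}


def generate_simulator(isa):
    """Derive an instruction-set simulator from the description."""
    semantics = isa["instructions"]

    def simulate(program):
        state = {reg: 0 for reg in isa["registers"]}
        while state["pc"] < len(program):
            opcode, operand = program[state["pc"]]
            if opcode not in semantics:
                raise ValueError(f"unknown opcode {opcode!r}")
            semantics[opcode](state, operand)
            state["pc"] += 1
        return state

    return simulate


def generate_assembler(isa):
    """Derive a trivial assembler: opcodes are numbered in declaration order."""
    encoding = {name: code for code, name in enumerate(isa["instructions"])}

    def assemble(program):
        return [(encoding[opcode], operand) for opcode, operand in program]

    return assemble


simulate = generate_simulator(TOY_ISA)
assemble = generate_assembler(TOY_ISA)
program = [("LOADI", 3), ("ADDI", 4), ("MULI", 5)]
print(simulate(program))  # {'acc': 35, 'pc': 3}
print(assemble(program))  # [(0, 3), (1, 4), (2, 5)]
```

Adding an instruction to `TOY_ISA` updates both generated tools at once, which is the kind of consistency that makes description-driven tool generation attractive for design space exploration.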
Among the architecture description languages mainly oriented towards processor hardware synthesis, one can cite Mimola, developed at the University of Dortmund and used to describe target machines in the MSSQ and Record [72] compilers. Mimola is very close to hardware description languages such as VHDL or Verilog. A Mimola description can be used for synthesis, simulation and code generation, after extraction of the instruction set. Among the languages mainly oriented towards compilation, one can cite nML, designed at the University of Berlin, ISDL, proposed by MIT, MDES, developed at the University of Illinois, and Expression, developed at the University of California at Irvine.

<sup>3</sup> http://www.systemc.org

<sup>4</sup> http://www.synopsys.com/products/cocentric\_studio/

<sup>5</sup> http://www.celoxica.com/products/tools/dk.asp

<sup>6</sup> http://flex-compiler.lcs.mit.edu

<sup>7</sup> http://cag.lcs.mit.edu/raw

<sup>8</sup> http://www.ece.cmu.edu/research/piperench/

<sup>9</sup> http://brass.cs.berkeley.edu/garp.html

<sup>10</sup> http://www.liacs.nl/~cserc/compaan/index.html

Among the languages mainly oriented towards simulation, one can cite LISA [75], developed at the University of Aachen. LISA allows the generation of cycle-accurate simulators for DSP processors; both the structure and the behavior can be modeled. Existing architecture description languages can also be classified according to their modeling level: behavioral or structural. Mimola is a structural-level language, nML and ISDL are behavioral-level languages, and LISA, Expression and MDES mix the two levels of modeling. There is no standard for architecture description languages. The ARMOR language, developed in the Cosi project-team, constitutes a practical approach for the modeling of complex architectures.
It is suited to architectural exploration and to the automatic generation of software development tools (compiler, simulator, processor design tools, etc.).

#### 3.3.3. Floating-Point to Fixed-Point Conversion

Most digital signal processing algorithms are specified with floating-point data types but are finally implemented on fixed-point architectures (e.g. a DSP (Digital Signal Processor), an ASIP (Application-Specific Instruction-set Processor), an ASIC or an FPGA) to satisfy the cost and power consumption constraints of embedded systems. Converting from floating-point to fixed-point arithmetic is a tedious and error-prone task when carried out manually. Indeed, experiments [63] have shown that the time devoted to this conversion step is significant: manual conversion can represent up to 30% of the total time needed to implement the algorithm. Moreover, time-to-market constraints push towards high-level development tools that automate some of these tasks.

The existing methodologies for automatic fixed-point data coding [70], [82] transform a floating-point data representation into a fixed-point representation without taking into account the architecture of the target processor. However, analyzing the influence of the architecture on the computation accuracy and on the various phases of code generation shows the need to take the architecture features into account and to couple the data coding and code generation processes, in order to obtain a high-quality implementation in terms of computation accuracy and execution time.

Data coding optimization must be carried out under an accuracy constraint, so it is necessary to determine the signal-to-quantization-noise ratio (SQNR) of the application. SQNR determination methods [68] are generally based on simulation.
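Simulation-based SQNR evaluation can be sketched in Python. The sketch assumes two's-complement fixed-point numbers with round-to-nearest and saturation, and a hypothetical three-tap FIR filter with coefficients [0.25, 0.5, 0.25]; the SQNR is estimated by running the floating-point reference and the fixed-point version on the same random input. It illustrates the general principle only and is not the team's conversion methodology.

```python
import math
import random


def to_fixed(value, frac_bits, int_bits):
    """Quantize `value` to a two's-complement fixed-point number with
    `int_bits` integer bits (sign bit included) and `frac_bits` fractional
    bits, using round-to-nearest and saturation on overflow."""
    step = 2.0 ** -frac_bits
    lowest = -(2.0 ** (int_bits - 1))
    highest = 2.0 ** (int_bits - 1) - step
    return min(max(round(value / step) * step, lowest), highest)


def fir(x, h, quantize=lambda v: v):
    """Direct-form FIR filter; `quantize` is applied to inputs, coefficients,
    products and the accumulator (identity = floating-point reference)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                product = quantize(quantize(coeff) * quantize(x[n - k]))
                acc = quantize(acc + product)
        y.append(acc)
    return y


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB, estimated by simulation."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return 10.0 * math.log10(signal_power / noise_power)


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
h = [0.25, 0.5, 0.25]  # hypothetical low-pass coefficients
reference = fir(x, h)
for frac_bits in (6, 10, 14):
    y = fir(x, h, lambda v, f=frac_bits: to_fixed(v, f, int_bits=4))
    print(f"{frac_bits} fractional bits: SQNR = {sqnr_db(reference, y):.1f} dB")
```

Each candidate word-length thus costs a complete simulation over all input samples; without overflow, each additional fractional bit improves the SQNR by roughly 6 dB.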
However, when used within data coding optimization, such simulation-based methods lead to an iterative process and hence to long optimization times. Analytical techniques offer new perspectives for accuracy evaluation [7].

# 4. Application Domains

#### 4.1. Panorama

Third- and fourth-generation mobile telecommunications is our main field of applications. Within research and/or contractual cooperations, other application domains are considered: image indexing, traffic filtering in high-speed networks, and speech processing.

#### 4.2. Mobile Communications

Future generations of telecommunications constitute a prime field of applications for IC designers because of the diversity of the constraints to be satisfied. A very high level of performance (more than 12 billion operations per second) is required as a result of combining multimedia capabilities with access techniques such as WCDMA (known as 3G). Flexibility and programmability are needed to support the whole spectrum of algorithms integrated into present-generation standards (GSM, DEC, IS-95) and their evolutions.

From the hardware architecture point of view, next-generation systems will have to deal successively with very different applications. Indeed, the common tasks of a third-generation communication chain manipulate data of variable size depending on the *distance* separating the task from the transmitter or the receiver: application tasks handle coarse-grained data such as images, whereas transmission tasks operate on bit-level data. Because of the wide spectrum of applications in future telecommunication standards, the processing applied to these data will also be very diverse, resulting in very different computation patterns. Moreover, time-to-market constraints require development tools that are both portable and effective.
This application domain is particularly challenging, since current architectural solutions do not offer practical options for energy-aware products (below 500 mW peak).

# 5. Software

#### 5.1. Panorama

Research undertaken by R2D2 addresses software and hardware tools for the design of hardware systems. To promote the studied techniques, several software prototypes have been developed (Polylib, MMAlpha, BSS, ARMOR/CALIFE). Among them, four distributed software packages are presented: Polylib, an *open source* library for computations on polyhedra; MMAlpha, for high-level synthesis; BSS, a platform for circuit design; and Gecos, a flexible compilation platform.

#### 5.2. PolyLib

**Keywords:** ASIC, CAD, architecture synthesis, data parallelism, functional programming, polyhedral computation.

Participants: Patrice Quinton [correspondant], Tanguy Risset [CompSys, INRIA Rhône-Alpes].

The Polylib library, developed in C, is an *open source* library for computations on convex polyhedra. It was initially developed by Hervé Le Verge and Doran Wilde at INRIA Rennes. It is now maintained and developed together with LIP (ENS Lyon) and ICPS (University of Strasbourg). Such a library is needed to handle the domains of recurrence equations and the index spaces described by nested loops. The library is currently used (independently of MMAlpha) by several research organizations in England, the United States, the Netherlands, and France. For more information, see http://www.irisa.fr/cosi/polylib/user/ or contact Patrice Quinton.

#### 5.3. MMAlpha

**Keywords:** ASIC, CAD, architecture synthesis, data parallelism, functional programming.

Participants: Patrice Quinton [correspondant], Tanguy Risset [CompSys, INRIA Rhône-Alpes].

MMAlpha implements transformations on the Alpha language. The Alpha language was proposed by Christophe Mauras [6].
MMAlpha is implemented in the Mathematica language (hence its name) and is built on the Polylib library: Alpha program transformations combine Mathematica code with Polylib computations. The principle is to derive an architecture, or sequential or parallel code, from an algorithmic specification of a problem. These transformations are semi-automatic, i.e. the actions to be performed are indicated by the user, but the transformation itself is carried out by MMAlpha. Automatic transformations are also available and, in some cases, provide satisfactory results. The design methodology is inherited from systolic array synthesis. This field is studied from a theoretical point of view, and the results of this research are implemented and experimented with in the MMAlpha software. The software makes it possible to test various existing synthesis strategies, to study various parallelization options, and to generate an architectural description of a circuit in the AlpHard format (a subset of the Alpha language). The interface between MMAlpha and logic synthesis tools is provided by a translation into VHDL. The software has been the implementation platform of many theses carried out at IRISA, and it is used by several research teams in the framework of collaborations with R2D2. It is one of the few tools that make it possible to describe an algorithm and its hardware implementation in the same language and to derive this implementation through proven transformations. For more information, contact Patrice Quinton.

#### 5.4. BSS, BOOST

**Keywords:** architecture synthesis, circuit design, low-power consumption, placement.

Participants: Daniel Chillet [correspondant], Sébastien Pillement, Olivier Sentieys.
The BSS (*Breizh Synthesis System*) software platform for circuit design provides a set of tools for capturing the application description (in VHDL or in C), compilation, simulation and architecture synthesis. The platform is currently composed of the following modules.

- A set of programs (C and VHDL compilers, selection, scheduling, code generation) for circuit synthesis.
- Graphical interfaces, *PUDesigner* and *GFDesigner*, for visualizing and handling data flow graphs and architectures.
- A tool for power estimation at the architectural level, *PowerCheck*, operating on the architectures generated by synthesis. It also takes as input a parameter file that characterizes the circuit technology and the physical capacitances of the chips. The signal can be specified in two ways: either by its probabilities according to a model (white noise, DBT), or as a file of vectors from which the probabilistic characteristics are extracted. As output, PowerCheck provides a report giving the average power dissipated by each part of the control and processing units, as well as the power dissipated cycle by cycle by the various modules.
- A tool for area and interconnection delay estimation, *Jfloorplanner*, operating at the architectural level. Its input is a *netlist* generated by BSS, which contains all the information about the components and their interconnections. The tool gives estimates of the final floorplan area, of the interconnection lengths, and of the interconnection delays associated with these lengths. A display of the estimated floorplan is available and can be used to carry out the place-and-route step quickly with standard CAD tools.
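PowerCheck's internal model is not detailed here. As a hedged illustration only, the sketch below shows the textbook way of turning a file of input vectors into a switching activity and then into an average dynamic power, P = ½ · α · W · C · V² · f, for a W-bit bus. The trace, the per-wire capacitance, the supply voltage and the clock frequency are hypothetical values, not BSS technology parameters.

```python
from typing import Sequence

def toggle_rate(vectors: Sequence[int], width: int) -> float:
    """Average number of bit toggles per cycle and per wire, from consecutive input vectors."""
    mask = (1 << width) - 1
    toggles = sum(bin((a ^ b) & mask).count("1") for a, b in zip(vectors, vectors[1:]))
    return toggles / ((len(vectors) - 1) * width)

def dynamic_power(activity: float, width: int, c_wire: float, vdd: float, freq: float) -> float:
    """Average dynamic power (W): each toggle dissipates C * Vdd**2 / 2 on average."""
    return 0.5 * activity * width * c_wire * vdd ** 2 * freq

# Hypothetical 8-bit input trace (the "file of vectors") and technology parameters.
trace = [0x00, 0xFF, 0x0F, 0x0E, 0x0E, 0xF0]
alpha = toggle_rate(trace, width=8)
print(f"toggle rate = {alpha:.3f}")
print(f"P = {dynamic_power(alpha, 8, c_wire=50e-15, vdd=1.2, freq=100e6) * 1e6:.2f} uW")
```

A probabilistic signal model (white noise, DBT) would replace the measured `toggle_rate` by an activity computed from the model parameters; the power formula stays the same.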
BOOST (*Breizh Object Oriented Synthesis Tools*) is an evolution of the BSS platform whose main objective is to ease the integration of new modules into the synthesis flow. A global XML description defines the list of modules and their installation location and, for each module, an XML description specifies how the module must be described to be included in the BOOST platform. Several simple synthesis steps have already been included in BOOST. The platform was used as a demonstrator for the OSGAR project during the RNTL days in October 2004 in Rennes. BOOST is developed in Java and can be installed on Solaris, Windows or Linux platforms. For more information, contact Daniel Chillet.

#### 5.5. Gecos

**Keywords:** *Eclipse, Flexible Compilation, OSGi.*

**Participant:** Ludovic L'Hours [correspondant].

Gecos can be seen as an evolution of CALIFE (Vincent Messé and François Charot) in which a compilation flow is built from simple transformation tasks. In Gecos, tasks are assembled with a simple script language: variables carry data (intermediate representation, profiling data, etc.) and functions call transformations. This simple language makes it easy to create or customize compilation flows. Gecos is built on OSGi plugins and the Eclipse extension framework, which ease the installation and development of new transformation and analysis tasks. The platform is under active development, but it already contains many transformations of a standard modern compiler (C front-end, SSA transformation, code selector, register allocator, etc.). Work is currently under way to use Gecos as a bridge between other compilation or synthesis activities: UPaK, FloatToFix and MMAlpha. More information is available on its dedicated web page <a href="http://gecos.gforge.inria.fr">http://gecos.gforge.inria.fr</a>.

#### 5.6. FWRToolbox

**Keywords:** *Matlab toolbox, fixed-point, linear filters/controllers implementation.*

Participant: Thibault Hilaire [correspondant].
The FWRToolbox is an open-source Matlab toolbox used to analyze the Finite Word Length (FWL) effects of digital filter/controller implementations and to find "optimal" realizations (according to open-loop/closed-loop sensitivity measures, roundoff noise analysis, etc.). It can automatically generate C, Matlab or VHDL fixed-point code. More information is available on its dedicated web page http://gforge.inria.fr/projects/fwrtoolbox/.

# 6. New Results

### 6.1. New Architectures and Technologies

**Keywords:** CDMA, Network-on-Chip, NoC, SoC, System-on-Chip, computation grain, low-power consumption, reconfigurable architecture, sensor network.

#### 6.1.1. New Organization of Reconfigurable Structures

6.1.1.1. Reconfigurable Architecture Description Language

Participants: Julien Lallet, Sébastien Pillement, Olivier Sentieys.

Our research aims at developing methods for the definition of platform-based dynamically reconfigurable architectures. The method makes it easy to develop a new dynamically reconfigurable architecture from computing resources and generic interconnection schemes, to explore its performance and energy efficiency, and to validate it by simulation at different abstraction levels. The architecture is defined with a high-level architecture description language based on the MAML language developed at the University of Erlangen-Nuremberg. The first part of this work made it possible to interconnect different kinds of computing resources (configurable logic blocks, reconfigurable functional units or processors) and to produce the reconfiguration resources needed to make the reconfiguration process homogeneous [39]. Different architecture paradigms (FPGAs, reconfigurable datapaths such as DART [4], or Massively Parallel Processor Architectures such as WPPA developed at U. Erlangen-Nuremberg [39]) can thus be quickly explored for a given application.
The second part of this work concerns the generation of the configuration controller. After analyzing the MAML specification of the architecture and of the reconfiguration resources produced, the tool generates a configuration controller that manages the configuration data of the architecture. Preemption is supported: the controller can interrupt a task, save the context of the current configuration, and run another task. The proposed reconfiguration paradigm for computing and interconnect resources has been optimized for very fast reconfiguration, which is essential to meet the timing constraints of today's applications. Reconfiguration is achieved in one cycle thanks to two distributed configuration memory banks and a scan-path-based loading of the configuration bitstream. A WCDMA receiver has been implemented on various architectures generated by our tool, showing the efficiency of our methodology for reconfigurable systems. The tool automatically generates synthesis and simulation models from the MAML specification.

#### 6.1.1.2. Multi-Mode Architecture Design

Participants: Emmanuel Casseau, Shafqat Khan.

In a mobile society, more and more devices need to adapt continuously to changing environments. Such mode switches can be handled smoothly in software on a general-purpose or digital signal processor. However, hardware components are better able to cope with throughput and power constraints. Reconfigurable hardware technologies offer partial run-time reconfiguration, but they require long reconfiguration times when applications change rapidly, and they are usually not power-efficient enough for mobile devices. We are currently developing a methodology to implement systems with multiple configurations (or modes) in a single circuit using conventional hardware technologies.
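The area argument for merging modes into one circuit can be illustrated with operator counts. Because only one mode is active at a time, a merged datapath needs, for each operator type, only the maximum number of operators required by any single mode, whereas separate per-mode datapaths need the sum. The sketch below computes both figures from hypothetical per-mode allocations; it ignores the multiplexers and control logic that sharing adds, which a real high-level synthesis flow has to account for.

```python
from collections import Counter

def separate_allocation(modes):
    """Operators needed if each mode gets its own dedicated datapath (sum over modes)."""
    total = Counter()
    for ops in modes.values():
        total.update(ops)
    return dict(total)

def shared_allocation(modes):
    """Operators needed if mutually exclusive modes share operators (max over modes)."""
    shared = Counter()
    for ops in modes.values():
        for op, n in ops.items():
            shared[op] = max(shared[op], n)
    return dict(shared)

# Hypothetical per-mode operator allocations (e.g. obtained after scheduling each mode).
modes = {
    "mode_A": {"mul": 4, "add": 3},
    "mode_B": {"mul": 2, "add": 5, "sub": 1},
}
print(separate_allocation(modes))
print(shared_allocation(modes))
```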
One of the goals of multi-mode system design is to minimize area by effectively reusing resources across configurations. High-level synthesis (HLS) is an automated process that generates a register transfer level (RTL) architecture from an algorithmic specification and user-defined constraints; its scheduling and binding steps perform resource sharing efficiently. The main idea of this study is thus to use such algorithms in the synthesis of multi-mode systems. In [28], [26], multi-throughput architecture design has been investigated<sup>11</sup>. Configurations with different kinds of constraints (timing, resources) are targeted in [40], [27]<sup>12</sup>. The two approaches mainly differ in the way the *data-flow graphs* are handled: sequentially in the first case, and all together, whatever the configuration, in the second.

#### 6.1.1.3. Memory Hierarchy in Specialized SoC

Participants: Daniel Chillet, Olivier Sentieys, Erwan Grace.

For several years, the memory area of SoC architectures has increased strongly. Today, circuit designers define SoCs with an ever-increasing number of memory banks to store large amounts of data. These banks are organized into a multi-level embedded memory hierarchy to ensure high performance. However, because of the low activity of memories and their large number of transistors, memory power consumption, and especially static power consumption, represents a major part of the global SoC power. In this context, we have defined a reconfigurable memory hierarchy model suited to SoCs. It is a multi-bank, three-level memory hierarchy in which the banks are interconnected through a reconfigurable network to optimize their performance. This hierarchy can therefore be tuned to the needs of each application. Moreover, a Dynamic Voltage Scaling (DVS) technique is integrated to save more power while respecting data access constraints.
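To show how DVS trades speed for power under a data access constraint, the minimal sketch below picks, from a table of hypothetical voltage/frequency operating points, the slowest one that still serves a given number of accesses before a deadline, then compares its dynamic access energy (proportional to C·V²) with that of the fastest point. The operating points, the effective capacitance and the one-access-per-cycle assumption are illustrative, not parameters of the hierarchy described above, and static power is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    vdd: float   # supply voltage (V)
    freq: float  # clock frequency (Hz)

# Hypothetical operating points for a memory bank, sorted by increasing frequency.
POINTS = [OperatingPoint(0.8, 50e6), OperatingPoint(1.0, 100e6), OperatingPoint(1.2, 200e6)]

def pick_point(accesses: int, deadline_s: float) -> OperatingPoint:
    """Slowest (lowest-voltage) point serving `accesses` (one per cycle) before the deadline."""
    for p in POINTS:
        if accesses / p.freq <= deadline_s:
            return p
    raise ValueError("constraint cannot be met at any operating point")

def dynamic_energy(accesses: int, p: OperatingPoint, c_eff: float = 1e-12) -> float:
    """Dynamic energy (J) with an assumed effective switched capacitance per access."""
    return accesses * c_eff * p.vdd ** 2

n, deadline = 10_000, 150e-6
p = pick_point(n, deadline)
print(p, dynamic_energy(n, p) / dynamic_energy(n, POINTS[-1]))
```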
So far, we have defined all the basic elements of this architecture, and we are implementing our proposal on an FPGA. Memory hierarchy reconfiguration is managed by a specific controller<sup>13</sup>. Future work will define a methodology allowing the designer to explore the design space.

<sup>11</sup>This work is done through a collaboration with the *LESTER Lab.*, *Université de Bretagne Sud* (P. Coussy (As. Pr.), C. Andriamisaina (PhD)).

<sup>12</sup>This work is done through a collaboration with the *IMS Lab.*, *CNRS UMR 52* (B. Le Gal (As. Pr.)).

<sup>13</sup>This work is done through a collaboration with the CEA List, Saclay.

#### 6.1.2. Efficient Coding or Modulation Schemes for On-Chip Interconnection Networks

Participants: Sébastien Pillement, Olivier Sentieys.

We have introduced a new coding scheme that simultaneously addresses several issues of interconnection design. It accelerates data transfer on a bus or a network-on-chip by removing the worst-case patterns that cause crosstalk. This is achieved by skewing the odd and even signals of the link. The implementation of this scheme is very simple and area-efficient. It improves bandwidth by a factor of more than 2.3 on a metal-2 bus in UMC $0.13\mu m$ CMOS technology, with the same number of wires as a shielded bus. Furthermore, propagation delay is well controlled, since the technique used against crosstalk removes all transition patterns but two. The scheme also greatly improves noise tolerance by combining two error-detecting codes, at the expense of a small number of additional wires: the first code uses temporal redundancy and the second is a parity-based scheme. This property enables us to lower the power supply voltage in order to reduce power consumption.

#### 6.1.3. Wireless Sensor Networks

**Participants:** Olivier Berder, François Charot, Ludovic L'Hours, Patrice Quinton, Olivier Sentieys, Charles Wagner, Tuan-Duc Nguyen, Adeel Pasha.

Research in the field of wireless sensor networks (WSN) is currently undergoing an important revolution, opening prospects for significant impact in many applications (safety, health, environment, food safety, manufacturing, telecom and robotics). In the context of research at R2D2, we are working on a prototype sensor network whose main objective is energy reduction. The aim of our research is mainly to study the relationship between algorithms, architectures and energy efficiency in the context of wireless sensor networks. R2D2 initiated an RNRT project named SVP (for SurVeiller et Prévenir), together with several companies and teams: CEA LETI, Thales, INRIA, LPBEM, AphyCare, ANACT, Lip6 and Institut Maupertuis. This project aims at developing platforms for sensor network applications. R2D2 is also involved in a regional research project named CAPTIV, in which applications of sensor networks to the automotive domain are studied. Note that part of this research activity has been driven by the strong links between R2D2 and the AphyCare<sup>14</sup> company, a spin-off of the R2D2 team, whose activity aims at developing wireless sensor nodes for the care of elderly people.

6.1.3.1. Energy-Efficiency Optimization for Cooperative MIMO Schemes in Wireless Sensor Networks

**Participants:** Olivier Berder, Olivier Sentieys, Tuan-Duc Nguyen.

For radio transmission over a fading channel, space-time diversity Multi-Input Multi-Output (MIMO) techniques need less transmission energy than SISO (Single-Input Single-Output) techniques for the same performance. The energy efficiency of MIMO transmission is particularly useful for Wireless Sensor Networks (WSN), where energy consumption is the most important criterion.
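To make space-time diversity concrete, the sketch below simulates the Alamouti 2x1 space-time block code, a classical transmit-diversity scheme, over a flat Rayleigh fading channel with BPSK symbols, and compares its bit error rate with SISO transmission at the same total transmit power. It is a textbook illustration, not the cooperative combination technique of [42], [43], and it assumes perfect channel knowledge and perfect synchronization at the receiver, which is precisely what cooperative MIMO between distributed nodes cannot guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)

def rayleigh(n):
    """Unit-power flat Rayleigh fading coefficients (complex Gaussian)."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

def awgn(n, n0):
    """Complex white Gaussian noise of variance n0 per sample."""
    return np.sqrt(n0 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

def bpsk(n):
    """Random BPSK symbols in {-1, +1} (unit energy)."""
    return 1.0 - 2.0 * rng.integers(0, 2, n)

def ber_siso(n_bits, snr_db):
    n0 = 10 ** (-snr_db / 10)
    s, h = bpsk(n_bits), rayleigh(n_bits)
    y = h * s + awgn(n_bits, n0)
    s_hat = np.sign(np.real(np.conj(h) * y))          # coherent detection with known h
    return np.mean(s_hat != s)

def ber_alamouti(n_bits, snr_db):
    n0 = 10 ** (-snr_db / 10)
    m = n_bits // 2
    s1, s2 = bpsk(m), bpsk(m)
    h1, h2 = rayleigh(m), rayleigh(m)                 # constant over the two time slots
    a = 1 / np.sqrt(2)                                # total power split over 2 antennas
    r1 = a * (h1 * s1 + h2 * s2) + awgn(m, n0)                      # slot 1 sends ( s1,  s2 )
    r2 = a * (-h1 * np.conj(s2) + h2 * np.conj(s1)) + awgn(m, n0)   # slot 2 sends (-s2*, s1*)
    z1 = np.conj(h1) * r1 + h2 * np.conj(r2)          # combining yields a*(|h1|^2+|h2|^2)*s1 + noise
    z2 = np.conj(h2) * r1 - h1 * np.conj(r2)          # and a*(|h1|^2+|h2|^2)*s2 + noise
    errors = np.sum(np.sign(np.real(z1)) != s1) + np.sum(np.sign(np.real(z2)) != s2)
    return errors / (2 * m)

for snr in (0, 5, 10, 15):
    print(f"{snr:2d} dB  SISO {ber_siso(100_000, snr):.4f}  Alamouti 2x1 {ber_alamouti(100_000, snr):.4f}")
```

For a target error rate, the diversity scheme needs a lower SNR, hence less transmit energy, which is the advantage cooperative MIMO tries to retain with single-antenna nodes.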
However, applying multi-antenna techniques directly to WSN is impractical because of the limited physical size of sensor nodes, which can typically support only a single antenna. Fortunately, individual nodes can cooperate at both the transmission and reception sides to reduce the total energy consumption. Unlike classical MIMO systems, a cooperative MIMO system suffers from de-synchronization between the distributed nodes at the transmission side and from additive noise at the reception side, both of which increase the transmission energy consumption. At the cooperative transmission side, we investigated the effect of transmission synchronization errors on the performance of cooperative MIMO systems and proposed a new space-time combination technique, which tolerates transmission synchronization errors better than the traditional combination technique while keeping a low complexity [42], [43]. At the reception side, we considered the effect of cooperative techniques on the performance of the cooperative MIMO system and proposed two new strategies that are more energy-efficient and perform better than other recent techniques. The significant advantages of these new cooperative strategies in terms of performance and energy consumption were demonstrated by theoretical calculations and numerical simulations.

6.1.3.2. Power Optimization of Channel Codec Adapted for Wireless Sensor Networks

**Participants:** Adeel Pasha, Olivier Sentieys.

The purpose of this work was to define a channel coder and decoder architecture specialized for the WSN context. Error-correcting codes make it possible to significantly decrease the radio transmit power of the network nodes and thus to save a large part of the energy.
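As a reference point for what a Viterbi branch-metric unit computes, the sketch below implements a textbook rate-1/2, constraint-length-3 convolutional code (generators 7 and 5 in octal, an assumption chosen for brevity) with a hard-decision Viterbi decoder whose branch metric is the conventional Hamming distance. It does not reproduce the low-power branch-metric approach of [53].

```python
from itertools import product

K = 3                           # constraint length (assumed)
G = (0b111, 0b101)              # generator polynomials 7 and 5 (octal), rate 1/2
N_STATES = 1 << (K - 1)

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def encode(bits):
    """Convolutional encoder; appends K-1 zero tail bits to return to the zero state."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        reg = (b << (K - 1)) | state          # newest bit in the most significant position
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out

def branch_metric(received, expected):
    """Conventional hard-decision branch metric: Hamming distance."""
    return sum(r != e for r, e in zip(received, expected))

def viterbi_decode(symbols):
    INF = float("inf")
    metrics = [0] + [INF] * (N_STATES - 1)    # decoding starts in the zero state
    paths = [[] for _ in range(N_STATES)]
    for t in range(0, len(symbols), 2):
        rx = symbols[t:t + 2]
        new_metrics, new_paths = [INF] * N_STATES, [None] * N_STATES
        for state, b in product(range(N_STATES), (0, 1)):
            if metrics[state] == INF:
                continue
            reg = (b << (K - 1)) | state
            nxt = reg >> 1
            m = metrics[state] + branch_metric(rx, [parity(reg & g) for g in G])
            if m < new_metrics[nxt]:          # add-compare-select
                new_metrics[nxt], new_paths[nxt] = m, paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[0][:-(K - 1)]                # tail bits force the zero state; drop them

msg = [1, 0, 1, 1, 0, 0, 1, 0]
rx = encode(msg)
rx[3] ^= 1                                    # one channel bit error
print(viterbi_decode(rx) == msg)
```

The branch metric is evaluated for every state and input bit at each step, which is why its implementation weighs heavily on decoder power and area.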
We have used convolutional codes for error correction in WSN and defined a new low-power approach for computing the branch metrics (BM) in the Viterbi decoder, which saves nearly 43% power and 46% area compared to the traditional approach [53].

<sup>14</sup>http://www.aphycare.com/

### 6.2. Modeling, Synthesis and Compilation for Reconfigurable Platforms

**Keywords:** architecture modeling, ASIP design, architecture synthesis, communication, fixed-point arithmetic, flexible compilation, reconfigurable system, scheduler, synthesis, system on-chip.

#### 6.2.1. Synthesis and Compilation Techniques

6.2.1.1. Derivation of Efficient Architectures for Regular Arrays

**Participants:** Steven Derrien, Tanguy Risset [CompSys INRIA Rhône-Alpes], Anne-Marie Chana, Auguste Noumsi, Patrice Quinton.

Our research aims at developing methods and tools to synthesize parallel architectures for data-intensive applications expressed in the Alpha applicative language. These methods are implemented in the MMAlpha software. The Alpha language allows systems to be modeled with structured descriptions: some components can be represented separately and later instantiated as elementary blocks in a larger application. In many applications these blocks have different clock rates; this is the case, for example, in the WCDMA (Wideband Code Division Multiple Access) air interface. We are now able to represent multi-rate systems in Alpha by adding special components that model up- and down-samplers, and we have extended the structured scheduler of MMAlpha to find the rates of all elementary blocks as well as the detailed schedule of each block. This activity, started during the thesis of Madeleine Nyamsi in 2005, is being pursued through research cooperations with Laval University in Québec City and UQTR (Québec) for the modelling of MIMO communication schemes, and with Colorado State University in Fort Collins.
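The rate-inference idea can be pictured on a small, hypothetical chain of blocks: an up-sampler by L multiplies the rate of what follows by L and a down-sampler by M divides it by M, so the rate of every block relative to the input is a rational number. The sketch below propagates these relative rates with exact fractions and derives, for a hypothetical input period, the period at which each block fires. It only illustrates rate inference on a linear chain; it is not MMAlpha's structured scheduler, which works on Alpha systems and also produces detailed per-block schedules.

```python
from fractions import Fraction

# Hypothetical multi-rate chain: (block name, rate change factor applied by the block).
# An up-sampler by L has factor L, a down-sampler by M has factor 1/M, other blocks 1.
CHAIN = [
    ("input", Fraction(1)),
    ("upsampler_x4", Fraction(4)),
    ("filter_a", Fraction(1)),
    ("downsampler_2", Fraction(1, 2)),
    ("filter_b", Fraction(1)),
]

def block_rates(chain):
    """Output rate of each block, relative to the input sample rate."""
    rates, r = {}, Fraction(1)
    for name, factor in chain:
        r *= factor
        rates[name] = r
    return rates

def block_periods(chain, input_period):
    """Firing period of each block (in clock cycles) for a given input period."""
    return {name: input_period / rate for name, rate in block_rates(chain).items()}

print(block_rates(CHAIN))
print(block_periods(CHAIN, input_period=8))
```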
More generally, our current research aims at exploiting the full potential of Alpha structuring in order to describe and implement complex systems, including interface generation and resource-constrained usage. In this respect, applications (MIMO, content-based image retrieval and bioinformatics) have been a powerful driver of our research and have led to interesting results regarding the applications themselves [33], [29], [18].

6.2.1.2. Automatic Synthesis of Optimized Application-Dependent Reconfigurable Systems

Participants: Christophe Wolinski, François Charot, Erwan Raffin.

This year we started working on the automatic selection of application-dependent processor extensions and on the scheduling of applications on these new architectures. In our approach, these extensions are implemented as specialized sequential or parallel instructions. They correspond to the most frequently occurring computational patterns identified in the application graph, or to other interesting patterns, and are finally selected during mapping and scheduling. Our methods can handle both time-constrained and resource-constrained scheduling. We have developed (in collaboration with C. Kuchcinski from Lund University, Sweden) the UPaK system (*Abstract Unified Pattern Based Synthesis Kernel for Hardware and Software Systems*), which can be used for this purpose [51], [50]. The experimental results obtained with UPaK show that the method provides high coverage of application graphs with a small number of patterns and ensures high application execution speed-ups, both for sequential and parallel execution with processor extensions implementing the selected patterns. The UPaK system is, however, more general. It has a number of features that make it superior in the context of HW/SW compilation for heterogeneous (reconfigurable) systems and create an opportunity to develop competitive commercial synthesis tools.
These features are the following:

- computation patterns represent instructions in software, while in hardware they define components; this is the basis of a unified representation for hardware/software compilation,
- computation patterns represent different configurations that can be mapped onto the same reconfigurable architecture,
- computation patterns are sub-graphs of the HCDG (*Hierarchical Conditional Dependency Graph*), and graph matching can be used both for their identification and for mapping the HCDG onto a hardware and/or software implementation,
- patterns in our approach form a hierarchy that helps to handle design complexity.

We have also started working on the optimized synthesis of automatically identified computational patterns in order to synthesize the corresponding run-time reconfigurable cells.

#### 6.2.1.3. Run-Time Reconfigurable Architecture Modeling

Participants: Christophe Wolinski, François Charot, Antoine Floc'h, Erwan Raffin.

We have started working on modeling run-time reconfigurable architectures in order to optimize application execution. We used a constraint programming approach to model hardware resources, communications and memories. The architecture is linear and similar to the DART cluster [25]. The first results are presented in Antoine Floc'h's master report [52].

#### 6.2.1.4. Specialized Microcontroller Synthesis on FPGA

Participants: Ludovic L'Hours, Patrice Quinton, Steven Derrien.

This research aims at developing techniques to synthesize specialized microcontrollers on FPGA. The targeted applications are described in a high-level language such as C and are dominated by control (peripheral control drivers, packet processing, etc.), such as the operating system embedded in the RDisk machine [64]. The main goal is to obtain small circuits with reasonable performance.
Traditional architecture synthesis approaches generally aim at maximizing performance by analyzing and parallelizing the data flow; they are not suited to applications with complex control flow, whose intrinsically sequential nature naturally calls for software compilation techniques. We designed a microcontroller synthesis technique based on the extraction of a specialized instruction set from a given application. The design of this instruction set is mainly driven by application profiling information, but also by different estimators such as pattern complexity or the number of bus accesses. The microcontroller is then derived from this instruction set using VHDL templates: the targeted architecture is currently a RISC processor, but other kinds of architectures, such as VLIW, could be considered. Compared to fixed-instruction-set microprocessors, we managed to reduce both the code size and the processor size by a factor of 2, for the same range of performance [71]. All these algorithms were integrated into a generic compilation platform called Gecos. This platform, under constant development, is freely available (<a href="http://gecos.gforge.inria.fr">http://gecos.gforge.inria.fr</a>). Investigations are currently under way to use this methodology to generate processors for very constrained devices, such as sensor network nodes.

#### 6.2.2. Floating-Point to Fixed-Point Transformation

6.2.2.1. Floating-Point to Fixed-Point Conversion Methodology for FPGA

Participants: Daniel Menard, Romuald Rocher, Olivier Sentieys.

A new methodology to implement floating-point applications on an FPGA using fixed-point arithmetic has been proposed [17], [35]. The user specifies the application time and accuracy constraints (the latter expressed as the minimum output Signal to Quantization Noise Ratio). The methodology then converts the application into fixed-point.
Our approach aims at determining the fixed-point specification that minimizes the architecture cost while providing sufficient computation accuracy, as expressed by the accuracy constraint. The fixed-point conversion process must determine, for every datum, a word-length and a binary-point position. It consists of three main tasks. The first step evaluates the dynamic range of the data. These results are used in the second step to determine the binary-point positions. The third step sets the data word-lengths so that the architecture cost is minimized and the accuracy constraint is satisfied. The accuracy is evaluated with an analytical method, which dramatically reduces the optimization time compared to simulation-based methods. To generate an optimized architecture, operator word-length optimization and the synthesis process are coupled [7]: an iterative process alternates between high-level synthesis and operator word-length optimization to improve both of these interdependent processes. This coupling reduces the number of arithmetic operators, since operators with smaller word-lengths have a reduced latency. Compared to classical implementations based on a uniform word-length, our approach reduces the architecture cost by 20% to 40% [13].

#### 6.2.2.2. Analytical Accuracy Evaluation in Fixed-Point Systems

Participants: Daniel Menard, Romuald Rocher, Pascal Scalart, Olivier Sentieys.

An important part of the floating-point to fixed-point conversion process is the fixed-point accuracy evaluation. The accuracy is evaluated through the Signal to Quantization Noise Ratio (SQNR). A general method based on an analytical approach has been proposed [45], [46]. This method is valid for all quantization laws (truncation and rounding) and for all systems composed of arithmetic operations. The proposed technique is based on a matrix model that simplifies the expressions for transform algorithms such as the FFT or the DCT.
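A minimal sketch of the analytical idea, assuming the classical additive white noise model of quantization (each rounding to a step q = 2^-f adds an independent, zero-mean noise of variance q²/12): for an FIR filter whose input and products are rounded to f fractional bits, the output noise power is q²/12 · (Σ h_k² + N), with N the number of taps. The filter taps and the signal are hypothetical, coefficient quantization is not modeled, and this single-filter formula is an illustration, not the general matrix model of [45], [46]. The simulated noise power is printed for comparison.

```python
import numpy as np

def q(v, frac_bits):
    """Round to a fixed-point grid of step 2**-frac_bits (overflow is not modeled)."""
    s = 2.0 ** frac_bits
    return np.round(v * s) / s

def fir_fixed(x, h, frac_bits):
    """FIR filter ('valid' part) with a rounded input and each product rounded before accumulation."""
    xq = q(x, frac_bits)
    taps, n = len(h), len(x) - len(h) + 1
    y = np.zeros(n)
    for k, hk in enumerate(h):
        start = taps - 1 - k                  # output m uses input sample m + taps - 1 - k
        y += q(hk * xq[start:start + n], frac_bits)
    return y

def noise_power_analytical(h, frac_bits):
    """q^2/12 per rounding: input noise shaped by sum(h_k^2), plus one source per product."""
    var = (2.0 ** -frac_bits) ** 2 / 12
    return var * (np.sum(np.square(h)) + len(h))

rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, 200_000)           # hypothetical input signal
h = np.array([0.11, 0.23, 0.31, 0.23, 0.11])  # hypothetical taps (kept in full precision)
f = 10
err = np.convolve(x, h, mode="valid") - fir_fixed(x, h, f)
print("simulated :", np.mean(err ** 2))
print("analytical:", noise_power_analytical(h, f))
```

Once the expression is derived, evaluating it for a new word-length costs a few arithmetic operations instead of a new simulation.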
For recursive systems, the method unrolls the recurrence. The complexity of our approach has been determined and, to reduce it, a linear prediction model has been developed. This model accelerates recurrence unrolling by approximating the recurrence terms included in the analytical expression of the output quantization noise. The model has been evaluated and compared, in terms of accuracy and computing time, on different applications such as the Least Mean Square (LMS) and Affine Projection Algorithm (APA) adaptive filters. This approach leads to accurate noise power estimates. Model execution times have been evaluated with Matlab: the linear prediction approach dramatically reduces the time needed to compute the noise power expression. The analytical expression of the output noise is used in the floating-point to fixed-point conversion process to optimize the data word-lengths under an accuracy constraint (minimal SQNR value). Our approach reduces the optimization time compared to fixed-point simulation-based approaches after only a few iterations. These results show the interest of our methodology for reducing fixed-point system development time. They are described in detail in the Ph.D. thesis of Romuald Rocher [14].

#### 6.2.2.3. Accuracy Constraint Determination for Fixed-Point Systems

Participants: Daniel Menard, Olivier Sentieys.

Fixed-point arithmetic introduces an unavoidable quantization error, which modifies the application functionalities and degrades its performance. A minimal computation accuracy must therefore be guaranteed to maintain the application performance. In the fixed-point conversion process, the fixed-point specification is optimized so that the implementation cost is minimized as long as the application performance requirements are fulfilled.
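The word-length optimization step can be sketched as a greedy search: start from a minimal number of fractional bits for each quantization source, then repeatedly add one bit where it reduces the output noise power most per unit of cost, until the analytical noise power meets the budget derived from the SQNR constraint. The noise model (source i contributes g_i · 2^(-2 f_i) / 12 to the output noise power), the gains, the costs and the constraint below are hypothetical, and this generic greedy heuristic is not necessarily the optimization algorithm used in the team's tool flow.

```python
def output_noise(frac_bits, gains):
    """Output noise power: source i adds gains[i] * q_i**2 / 12 with q_i = 2**-frac_bits[i]."""
    return sum(g * (2.0 ** -f) ** 2 / 12 for f, g in zip(frac_bits, gains))

def greedy_wordlengths(gains, costs, max_noise, start_bits=4, max_bits=32):
    """Add one fractional bit at a time where it buys the most noise reduction per unit cost."""
    bits = [start_bits] * len(gains)
    while output_noise(bits, gains) > max_noise:
        current = output_noise(bits, gains)
        best, best_ratio = None, -1.0
        for i in range(len(bits)):
            if bits[i] >= max_bits:
                continue
            trial = bits.copy()
            trial[i] += 1
            ratio = (current - output_noise(trial, gains)) / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            raise ValueError("accuracy constraint unreachable within max_bits")
        bits[best] += 1
    return bits

# Hypothetical example: three rounding sources with noise gains g_i and per-bit costs c_i.
gains, costs = [1.0, 0.2, 5.0], [1.0, 1.0, 2.0]
signal_power, sqnr_min_db = 0.1, 60.0
max_noise = signal_power * 10 ** (-sqnr_min_db / 10)   # SQNR constraint turned into a noise budget
bits = greedy_wordlengths(gains, costs, max_noise)
print(bits, output_noise(bits, gains))
```

Because every candidate is scored with the analytical noise expression, each iteration is cheap; with a simulation-based evaluation, each of these trials would require a full fixed-point simulation.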
Nevertheless, performance degradations are not analyzed directly in the conversion process; instead, an intermediate metric is used to measure the computation accuracy. The global conversion method is decomposed into two main steps: first, a computation accuracy constraint is determined from the application performance requirements; second, the architecture cost is minimized under this accuracy constraint during the fixed-point conversion process. Determining the minimal value of the accuracy metric is a difficult problem, and this value cannot be defined directly: the accuracy constraint has to be linked to the quality evaluation and performance of the application. A method has been proposed to determine the accuracy constraint from the application performance. In our approach, the metric used to evaluate computation accuracy is the quantization noise power. The accuracy constraint is determined by modelling the fixed-point system behavior as an infinite-precision version of the system plus a single noise source located at the system output. A noise model based on uniform and Gaussian noise components has been proposed [41]. This noise model makes it possible to predict the performance degradations due to fixed-point arithmetic and is used to determine the initial value of the accuracy constraint. An iterative process then adjusts the accuracy constraint according to the actually measured performance. Our approach has been tested and validated on two applications: an MP3 coder and a WCDMA receiver.

#### 6.2.2.4. Optimal Fixed-Point Implementation of Filter/Controller

**Participants:** Thibault Hilaire, Daniel Menard, Olivier Sentieys.

This work mainly focuses on the hardware implementation of linear time-invariant filters or controllers and on the analysis of Finite Word Length effects (principally fixed-point arithmetic, with some extensions to floating-point arithmetic).
The digital implementation (in embedded devices such as DSPs, FPGAs, ASICs, etc.) leads to a numerical (and also temporal) degradation of the filter or controller (performance, characteristics, etc.), due to the quantization of the involved coefficients (parametric errors) and to the roundoff noise (numerical noise) in the computations. These degradations depend on the structure of the algorithm used (direct forms, cascade/parallel decompositions, state-space forms, algorithms with shift, delta or modified-delta operators, etc.), on the software methods and on the hardware capabilities. The search for an optimal implementation (in a sense to be defined) of a particular filter or controller is a very difficult problem that is studied in different ways by several communities (control, signal processing, architecture, etc.). Instead of considering C code or a particular graph description of the algorithm used to numerically realize the filter or controller, this work is based on an algebraic description of the realization (a matrix description within the "specialized implicit state-space framework") [22]. The first part of this work is based on T. Hilaire's thesis and is largely independent of hardware considerations. Several Finite Word Length criteria have been developed (transfer function sensitivity, pole sensitivity, stability-related sensitivity in a closed-loop context, etc.), and optimal realizations can be found with respect to them. This work was then extended with hardware considerations and the possibility of multiple-wordlength realizations. The roundoff analysis was carried out in both open-loop and closed-loop contexts [37]. Other criteria, such as silicon area or power consumption, could then be used for the optimal realization design. From the definition of the filter or controller (i.e.
the transfer function), it is possible to choose among several realization structures (state-space, delta-operator, rho-operator, etc.), find the optimal one (according to one or more FWL measures) and generate the equivalent C, MATLAB or VHDL fixed-point code. The FWR Toolbox (for Matlab) was built to achieve this 'optimal' fixed-point implementation (see section 5.6).

#### 6.2.3. Specialized SoC Architecture Modeling

#### 6.2.3.1. System Modelling for Dynamically Reconfigurable Architectures

Participants: Daniel Chillet, Sébastien Pillement, Umer Farooq, Didier Demigny, Olivier Sentieys.

To ensure efficient execution of application tasks on SoC architectures, designers include heterogeneous execution resources on the same chip. The overall platform (hardware support and tasks) is therefore managed by an operating system (OS). With the introduction of flexible/reconfigurable resources in SoC architectures, some OS services must be adapted. For example, the task scheduling service has to take the availability of the reconfigurable resource into account in order to instantiate tasks on it. To evaluate the impact of reconfigurable architectures on OS services, we have defined a SystemC model of the complete environment in the context of the OverSoC project (see section 8.2.1), within which we developed the model of the reconfigurable part of the system. For the scheduling service, we first defined an Artificial Neural Network (ANN) ensuring the spatial and temporal placement of tasks within a heterogeneous multi-processor SoC [30], [16]. This year, we extended this first ANN proposal to take reconfigurability into account. We thus defined a new structure, called Reconfigurable ANN (RANN), which substantially reduces the number of neurons [31], [32]. This model can handle any number of tasks that can be instantiated on the resources.
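The team's ANN and RANN formulations are given in [30], [16], [31], [32] and are not reproduced here. As a hedged illustration of how a network of binary neurons can settle into a valid placement, the sketch below uses the classical "k-out-of-N" design rule for Hopfield networks on a square grid, where neuron $v_{t,r} = 1$ means that task $t$ is placed on resource $r$. Requiring exactly one active neuron per row and per column gives a weight of $-2$ between neurons sharing a row or a column and a bias of $2$; asynchronous updates then converge to a one-to-one assignment. Timing, execution costs and reconfiguration constraints, which a real scheduler needs, are omitted, and `hopfield_assign` is a hypothetical name.

```python
import random


def hopfield_assign(n: int, seed: int = 0, max_sweeps: int = 1000):
    """Binary Hopfield network with n*n neurons v[t][r] (task t on resource r).
    Combining the 1-out-of-n rule on every row and every column gives weight
    -2 between neurons sharing a row or a column and a bias of 2 per neuron."""
    rng = random.Random(seed)
    v = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    cells = [(t, r) for t in range(n) for r in range(n)]
    for _ in range(max_sweeps):
        changed = False
        rng.shuffle(cells)  # asynchronous update in random order
        for t, r in cells:
            row_others = sum(v[t]) - v[t][r]
            col_others = sum(v[i][r] for i in range(n)) - v[t][r]
            net = 2 - 2 * row_others - 2 * col_others  # weighted input + bias
            new = 1 if net > 0 else 0
            if new != v[t][r]:
                v[t][r] = new
                changed = True
        if not changed:  # stable state reached
            return v
    raise RuntimeError("network did not settle")


placement = hopfield_assign(4, seed=1)
```

Because the weights are symmetric and there are no self-connections, no asynchronous update increases the network energy, so the loop stops; any stable state of this square network is a permutation matrix. The network needs $n^2$ neurons and many connections, which illustrates why reducing the number of neurons, as the RANN structure does, matters for a hardware implementation.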
A mathematical formulation of this RANN was proposed, and a simulation tool was developed. A correct schedule is obtained within a small number of iterations and with a very small number of neurons. To complete this study, we started the hardware implementation of the neural network, which should provide efficient and reactive on-line scheduling. Our first results show that further optimizations can be applied to limit the implementation cost of the neurons and the number of neuron connections.

On-line scheduling introduces dynamic behavior in task placement within a reconfigurable resource, and thus requires a flexible communication service to ensure data exchange. In this context, we study specific interconnect structures well suited to dynamically reconfigurable chips. We defined a first hierarchical interconnect infrastructure and built a VHDL implementation of it. The next step of this work consists in defining a global controller that simultaneously manages the scheduling and communication services.

#### 6.2.3.2. SoC Modeling and Prototyping on FPGA-Based Systems

Participants: François Charot, Kevin Martin, Laurent Perraudeau, Charles Wagner.

As part of our participation in the SocLib project (see Section 7.3), we have designed a model of the configurable NiosII processor core from Altera (the fast version is currently supported). The NiosII processor core comes in three variants (economy, standard, fast) and is configurable (pipeline depth, cache size, support for custom instructions, etc.). The developed model, which is in fact an instruction-set simulator, can be used in CABA or TLMT SocLib simulation platforms. The goal of this work is to establish a link between a SocLib simulation platform and its prototyping on an FPGA system. To this end, a VCI model of the Altera Avalon bus and its associated wrappers are currently being designed.

### 6.3. Study of Applications

**Keywords:** WCDMA, biomedical, image indexing, intrusion detection in hardware, mobile telecommunication, speech processing.

Applications stemming from third-generation radio-communication systems are good candidates for the study of hardware systems mixing programmable parts executing software code and specialized modules dedicated to accelerating the time-consuming parts of applications. Data filtering, cryptography, traffic filtering in high-speed networks and speech processing are also under consideration.

#### 6.3.1. Radio-Communication Systems

#### 6.3.1.1. MIMO Mobile Communication Systems Prototyping

Participants: Taofik Saïdi, Baptiste Vrigneau, Sébastien Roy, Olivier Sentieys, Olivier Berder.

In the context of wireless communications, using more than one antenna at both the transmitter and the receiver improves the spectral efficiency of data transmission. The high complexity of MIMO (Multiple Input Multiple Output) and multi-antenna algorithms calls for specific real-time, high-performance architectures. A flexible real-time MIMO prototype has been designed to operate under the WCDMA (Wideband Code Division Multiple Access) third-generation cellular standard. It can be used for uplink (HSUPA) and downlink (HSDPA) communications. The circuit is characterized by a scalable and flexible parallel-pipeline architecture [48]. The system is implemented on SignalMaster rapid prototyping platforms from Lyrtech Inc. for real-time measurements. This work is done in collaboration with Lyrtech Inc. and with the LRTS laboratory of Laval University in Québec, Canada. When the transmitter can obtain some Channel State Information (CSI) from the receiver, power allocation across antennas can be performed through the joint optimization of a linear precoder (at the transmitter) and decoder (at the receiver).
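All the precoders studied in this work start from the SVD diagonalization of the channel. A minimal NumPy sketch, assuming a 4x4 i.i.d. Rayleigh channel known at both ends (full CSI): precoding with $V$ and decoding with $U^H$ turns $H = U \Sigma V^H$ into independent subchannels whose gains are the singular values. The criterion-specific power allocation (minimum BER, Euclidean distance) and the CORDIC-based hardware computation of the SVD are not modelled; `np.linalg.svd` only serves as a floating-point reference.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx = 4, 4
# Assumed i.i.d. Rayleigh channel (complex Gaussian entries, unit variance).
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)

# H = U diag(sv) V^H; NumPy returns V^H directly.
U, sv, Vh = np.linalg.svd(H)
F = Vh.conj().T  # linear precoder at the transmitter (requires CSI)
G = U.conj().T   # linear decoder at the receiver

# The precoded-and-decoded channel G H F is diagonal: independent
# subchannels whose gains are the singular values, sorted in decreasing order.
effective = G @ H @ F
```

A precoder design criterion then only has to decide how to distribute the transmit power over these parallel subchannels.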
Several criteria are used to design these precoders, such as the minimum Bit Error Rate or the minimum Euclidean distance. For example, the upcoming 802.11n standard includes beamforming precoding as an option. While many architectures have been proposed for classical MIMO decoding, there is no prototype of precoding schemes, despite the large improvements in BER and capacity they can achieve. Although their optimization criteria differ, all the studied precoders operate on a channel diagonalized by the singular value decomposition (SVD), which is therefore the key mathematical operation. The CORDIC (COordinate Rotation DIgital Computer) algorithm allows the design of an efficient hardware architecture computing the SVD on a programmable component, and implementation results have been obtained for the joint optimization of the SVD and the precoding.

#### 6.3.1.2. Transmit Beamforming for Distributed Wireless Access with Centralized Signal Processing

Participants: Michel Thériault, Sébastien Roy, Olivier Sentieys.

Exploiting the full potential of MIMO systems is a difficult problem. Although there are relatively simple methods to exploit antenna diversity at the receiver, where the channel can be estimated without too much difficulty from the received information, estimating the channel at the transmitter (forward channel) is much more complex. In point-to-point MIMO, estimating the forward channel is optional, as it does not yield significant performance gains. In multi-user MIMO, however, estimating the forward channel is necessary to obtain similar capacities in the uplink and the downlink. The proposed research is based on a wireless local area network (WLAN) using OFDM modulation in the unlicensed 5 GHz frequency bands. This network consists of several access points, each made up of an antenna array.
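A minimal sketch of why the forward channel must be known in multi-user MIMO, using zero-forcing transmit beamforming as a standard example (not necessarily one of the algorithms studied in this thesis). Assumptions: $K$ single-antenna users served by $M \ge K$ transmit antennas, a narrowband (single OFDM subcarrier) i.i.d. Rayleigh channel $H$ perfectly known at the transmitter, no noise and no transmit power normalization. The precoder $W = H^H (H H^H)^{-1}$ removes inter-user interference, which cannot be done without knowing $H$.

```python
import numpy as np


def zero_forcing_precoder(H: np.ndarray) -> np.ndarray:
    """Right pseudo-inverse W = H^H (H H^H)^-1 of a K x M downlink channel
    (K single-antenna users, M >= K transmit antennas), so that H @ W = I."""
    Hh = H.conj().T
    return Hh @ np.linalg.inv(H @ Hh)


rng = np.random.default_rng(0)
K, M = 3, 6  # users, transmit antennas (spread over the access points)
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
W = zero_forcing_precoder(H)

s = np.array([1 + 1j, -1 + 1j, 1 - 1j]) / np.sqrt(2)  # one QPSK symbol per user
x = W @ s  # signals sent on the M antennas (beamformed)
y = H @ x  # noiseless received samples: each user sees only its own symbol
```

In a centralized architecture such as the one described here, the $M$ antennas may belong to different access points, since the precoder is computed jointly by the central processing unit.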
Unlike most WLANs in use today, the signal processing is not performed individually at each access point but at a central processing unit, each access point being linked to it by high-bandwidth links. The research for this project is divided into two main stages: developing transmission algorithms adapted to the problem at hand, then implementing those algorithms on VLSI chip-sets. The transmission strategies studied are based on transmit beamforming, which consists of linearly combining the signals transmitted on each antenna of an array so that a distinct beam is formed for each user. This maximizes the signal power received by each user while minimizing the interference from other users' communications. A simulation model of the system is being developed in Matlab. It will be used to compare the algorithms considered in this thesis and to select one (or several) of them for implementation on VLSI chip-sets.

#### 6.3.1.3. Parallel Reconfigurable Architectures for LDPC Decoding

Participants: François Charot, Christophe Wolinski.

LDPC codes are a class of error-correcting codes introduced by Gallager [60], with an iterative probability-based decoding algorithm. Their performance, combined with a relatively simple decoding algorithm, makes these codes very attractive for the next generations of satellite and radio digital transmission systems. LDPC codes have been chosen for the DVB-S2, 802.11n, 802.16e and 802.3an standards. The decoding of LDPC codes is an iterative process. For the 802.16e standard, about 3,000 messages are processed and reordered in each of the 30 iterations of the decoding process. The number of messages is much higher for DVB-S2 (of the order of 300,000 messages).
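Flooding min-sum decoding is a common hardware-friendly approximation of belief propagation; the sketch below shows the two kinds of computation that the bitnode and checknode processing units perform. It is not the team's decoder: the 3x6 parity-check matrix is a toy example (real 802.16e or DVB-S2 matrices are far larger and structured), messages are floating-point, and no scheduling or memory-bank mapping is modelled. LLRs are positive when bit 0 is more likely, and `min_sum_decode` is a hypothetical name.

```python
import numpy as np


def min_sum_decode(H, llr, max_iter=30):
    """Flooding min-sum decoding. H: (m, n) 0/1 parity-check matrix,
    llr: channel LLRs. Returns (hard decisions, converged flag)."""
    m, n = H.shape
    rows = [np.flatnonzero(H[i]) for i in range(m)]
    c2v = np.zeros((m, n))  # check-to-bit messages
    bits = (llr < 0).astype(int)
    for _ in range(max_iter):
        # Bitnode update: total belief minus the message from the target check.
        total = llr + c2v.sum(axis=0)
        v2c = np.where(H == 1, total - c2v, 0.0)
        # Checknode update: product of signs and minimum magnitude of the others.
        for i, cols in enumerate(rows):
            msgs = v2c[i, cols]
            for k, j in enumerate(cols):
                others = np.delete(msgs, k)
                c2v[i, j] = np.prod(np.sign(others)) * np.min(np.abs(others))
        total = llr + c2v.sum(axis=0)
        bits = (total < 0).astype(int)
        if not np.any(H @ bits % 2):  # all parity checks satisfied
            return bits, True
    return bits, False


H_toy = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0, 1]])
# All-zero codeword received with bit 2 weakly in error.
llr = np.array([2.0, 2.0, -0.5, 2.0, 2.0, 2.0])
bits, ok = min_sum_decode(H_toy, llr)
```

Here a hard decision on the channel values alone would get bit 2 wrong; one iteration of message passing corrects it.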
These huge data processing and storage requirements are a real challenge for the hardware realization of the decoder, which has to meet a specified throughput (30 Mbit/s for 802.16e and 255 Mbit/s for base station applications in the case of DVB-S2). One major problem is the huge design space, composed of many interrelated parameters, which forces drastic design trade-offs. Another important issue is the need for flexible hardware solutions able to support all the variants of a given standard. We have defined a generic architecture template composed of several processing modules and a set of interconnection buses for inter-module communication. Each module includes two processing units (called *bitnode* and *checknode* processing units) and a set of memory banks. The number of modules, the number of interconnection buses, and the size and number of memory banks are standard-dependent. The decoder relies on an appropriate distribution of the input data block over the different memory banks and on a schedule of the computations obtained with constraint-programming-based optimization tools. This architecture template has been instantiated for the 802.16e standard and validated with a cycle-accurate and bit-accurate SystemC simulation model.

#### 6.3.2. Content-Based Image Retrieval Hardware Acceleration

Participants: Steven Derrien, Auguste Noumsi, Patrice Quinton, Laurent Amsaleg [Texmex].

Content-Based Image Retrieval (CBIR) is a technique for retrieving the images of a database that are (at least partly) similar to a given reference image. CBIR is drawing increasing interest due to its potential application to problems such as image copyright enforcement. Indeed, the widespread use of the Internet has resulted in a huge increase in the multimedia content available on the Web, especially images.
Checking copyright is therefore a concern for image owners, who must be able to identify undue use of their images. This identification process relies on precise and fast image comparison algorithms: since the Internet is a rapidly changing medium, such algorithms need to be run on a daily basis. Although accurate search techniques based on local image descriptors exist, they suffer from very long execution times (retrieving an image from a 30,000-image database requires about 1,500 seconds on a standard workstation). To make these techniques attractive, we have been working on accelerating CBIR with dedicated hardware architectures, the target machines being the RDISK cluster [10] and the ReMIX machine [18]. Among other results, we have designed a highly efficient hardware accelerator for the CBIR application targeting the ReMIX machine [29], [18], for which a speedup factor of 40 has been measured experimentally.

#### 6.3.3. Hardware Acceleration of Bioinformatics Applications

Participants: Steven Derrien, Patrice Quinton.

Over the last few years, FPGA-based accelerators have proved to be a very attractive solution for implementing many of the most compute-intensive bio-computing algorithms. FPGA implementations of the Smith-Waterman algorithm, of the *BLAST* and *ClustalW* software, and of Weighted Finite Automaton algorithms have exhibited impressive speedup factors, making them a viable alternative to expensive supercomputing infrastructures such as vector computers or PC clusters. Among other approaches, profile-based hidden Markov models (HMMs) have recently been used by biologists to predict the structure and function of a protein directly from its amino-acid sequence. The approach consists of building and providing probabilistic models of protein sequences that share similar structures or functions.
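HMM-based sequence scoring relies on dynamic-programming recurrences such as the Viterbi algorithm. Below is a minimal log-space Viterbi sketch on a toy two-state HMM; this is a textbook example, not a protein profile HMM (there are no match, insert or delete states and no feedback loop). Such a recurrence over successive symbols is the kind of loop nest that space-time mappings can parallelize; that mapping is not shown, and the model parameters are arbitrary.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path and its log-probability, computed with
    log-probabilities to avoid underflow on long sequences."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        back.append(ptr)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):  # backtrack from the final state
        path.append(ptr[path[-1]])
    return path[::-1], score[last]


states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, logp = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
```

The inner `max` over predecessor states at each step is the data dependence that a hardware mapping must respect; in a profile HMM the same pattern is repeated over many more states per step.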
Today, there are several software implementations of this model, the HMMER package being one of the most widely used. We have studied the possibility of accelerating the most time-consuming routine of the HMMER tool with reconfigurable hardware, using linear space-time mappings based on the so-called *polyhedral model*. This led to a flexible parallel architecture template that handles the feedback loop present in most HMM models. Preliminary results [33] indicate that the resulting architecture could provide speedup factors between 10 and 50, depending on the query HMM model and on the target FPGA technology.

#### 6.3.4. Intrusion Detection System in Hardware

Participants: Georges Adouko, François Charot, Christophe Wolinski.

The dynamic nature of security systems, whose anti-intrusion mechanisms (filtering at the packet, connection and application levels) evolve according to protection modes and levels, is to our knowledge a challenge out of reach of classical technologies based on general-purpose or network processors. Security requirements in high-speed networks (from 10 to 40 Gigabit/s) call for the filtering rules to be implemented in appropriate hardware structures, which must handle a large variety of complex processing while guaranteeing quality of service. Today, only dedicated solutions can overcome the bottleneck caused by this implementation complexity, at the price of an obvious lack of flexibility and no ability to evolve. The aim of our research (Fastnet PRIR project) is the design of specialized hardware systems for high-speed network traffic filtering. This year, we have investigated the design of string matching engines based on a set of small multi-character state machines running in parallel, each state machine being in charge of a subset of the strings to be processed.
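A software sketch of the organization described above: the rule set is split into subsets, each subset is compiled into its own small Aho-Corasick automaton (our choice of algorithm for illustration; the report does not name one), and every automaton scans the same stream. In hardware, the automata would run in parallel and could consume 2 or 4 characters per cycle; this Python version processes one character at a time, sequentially. The patterns and the round-robin split are arbitrary examples, and the function names are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """A small multi-pattern string-matching automaton."""

    def __init__(self, patterns):
        self.goto = [{}]    # goto[state][char] -> next state
        self.fail = [0]     # failure links
        self.out = [set()]  # patterns recognized in each state
        for p in patterns:
            state = 0
            for ch in p:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].add(p)
        # Breadth-first computation of failure links (depth-1 states fail to 0).
        queue = deque(self.goto[0].values())
        while queue:
            r = queue.popleft()
            for ch, u in self.goto[r].items():
                queue.append(u)
                f = self.fail[r]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[u] = self.goto[f].get(ch, 0)
                self.out[u] |= self.out[self.fail[u]]

    def search(self, text):
        """Yield (start_index, pattern) for every occurrence in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.out[state]:
                yield i - len(p) + 1, p


def build_engines(patterns, n_engines):
    """Split the rule set round-robin into n_engines subsets, one automaton each."""
    return [AhoCorasick(patterns[k::n_engines]) for k in range(n_engines)]


def scan(engines, text):
    """Every engine scans the same stream (in parallel, in hardware)."""
    hits = set()
    for engine in engines:
        hits.update(engine.search(text))
    return sorted(hits)


rules = ["he", "she", "his", "hers", "attack", "tack"]
matches = scan(build_engines(rules, 3), "ushers attack")
```

Splitting the rule set keeps each automaton small enough to fit in distributed memory blocks, at the cost of replicating the input stream to every engine.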
The state machines are sized according to the capacities of FPGA components (size of the distributed memory blocks and number of logic elements). We have measured the size of these state machines for different sizes of string subsets, and considered processing 1, 2 and 4 characters per cycle. To reduce memory occupancy and balance the cost between logic and memory resources, we have worked on the encoding of the state machines. First results show that the targeted throughputs of 10 to 40 Gigabit/s could be achieved.

#### 6.3.5. Accelerating Statistical Tests for Real-Time Estimation of Randomness

Participants: Renaud Santoro, Olivier Sentieys, Sébastien Roy.

Random number generators (RNGs) are necessary in many applications such as cryptography, communication, VLSI testing or probabilistic algorithms. RNG randomness is estimated using a battery of statistical tests. Several such batteries are reported in the literature, including Diehard [74], NIST [77], FIPS 140-2 [59], AIS 20 [54] and AIS 31. The number of hardware applications requiring an RNG is continuously increasing, especially in embedded cryptographic circuits, where security partly relies on the randomness quality of keys generated by an RNG. In hardware, RNG randomness can be influenced by external noise (power supply noise, temperature) and by chip activity. This dependence is a weakness that can make an attacker's task easier. We have proposed on-chip monitoring of the RNG using an efficient hardware battery of statistical tests. This year, we have investigated the hardware acceleration of some statistical tests, to improve their efficiency and to detect RNG failures. A battery of statistical tests has been selected for its efficient implementation, with negligible area and power consumption.
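The six tests selected by the team are not listed here, so as a stand-in the sketch below implements two tests from FIPS 140-2 [59] on a 20,000-bit sample: the monobit test (count of ones) and the poker test (distribution of 4-bit values). Both reduce to counters and a final comparison, which hints at why such tests can have a small hardware footprint. The thresholds are those of FIPS 140-2; the function names are hypothetical.

```python
def monobit_test(bits):
    """FIPS 140-2 monobit test on 20,000 bits: the number of ones X
    must satisfy 9725 < X < 10275."""
    assert len(bits) == 20000
    x = sum(bits)
    return 9725 < x < 10275


def poker_test(bits):
    """FIPS 140-2 poker test: split into 5000 4-bit segments, count the
    occurrences f[i] of each of the 16 values, compute
    X = 16/5000 * sum(f[i]^2) - 5000, and pass if 2.16 < X < 46.17."""
    assert len(bits) == 20000
    counts = [0] * 16  # one counter per 4-bit value
    for k in range(0, 20000, 4):
        nibble = bits[k] << 3 | bits[k + 1] << 2 | bits[k + 2] << 1 | bits[k + 3]
        counts[nibble] += 1
    x = 16 / 5000 * sum(c * c for c in counts) - 5000
    return 2.16 < x < 46.17


constant = [0] * 20000
alternating = [0, 1] * 10000
```

A constant stream fails the monobit test, while the perfectly balanced alternating stream 0101... passes it but fails the poker test, since every 4-bit segment is identical.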
Performance and cost figures for hardware implementations on FPGA and VLSI targets show that statistical tests can easily be implemented in low-cost embedded security circuits, enhancing on-line monitoring of RNG randomness to prevent RNG failures. Implementation results for the six statistical tests show a significant improvement over published work. This year, we have also used these techniques to search for the optimal rule for five-neighbor cellular automata random number generators [47]. Cellular Automata (CA) are recognized as efficient solutions for high-rate random number generators. Nevertheless, good randomness requires a correct choice of the CA rule and of the number of neighbor cells. If a cell has N neighbors, $2^{2^N}$ rules are possible, and finding the optimal rule is a time-consuming task. To reduce its complexity, the search is usually performed with genetic algorithms, which are not guaranteed to find the best rule. In our work, we have used an exhaustive search to find the optimal rule for five-neighbor CA. This search is made possible by implementing the statistical tests and entropy measures in an FPGA, which makes it fast and efficient. This study also shows that increasing the number of neighbors in a CA increases the entropy of high-rate pseudo-random number generators.

#### 6.3.6. Noise Reduction in Speech Processing

Participants: Pascal Scalart, Mohamed Djendi.

Blind source separation (BSS) refers to the problem of recovering signals from several observed linear mixtures. The strength of the BSS model is that only mutual statistical independence between the source signals is assumed; no a priori information about the characteristics of the source signals, the mixing matrix or the arrangement of the sensors is needed.
BSS can therefore be applied to a variety of situations, such as the separation of simultaneous speakers, the analysis of biomedical signals obtained by EEG, or the separation of several received signals in wireless telecommunications. In our work, we consider an extension of the so-called convolutive mixture model in which only two observations are available, produced by two spatially localized (point) sources and corrupted by additive uncorrelated noise sources. The first point source is a speech signal; the second can represent either car engine noise or far-end speech that we want to cancel. The additive noise components represent the non-coherent part of the diffuse acoustic (background) noise in the vicinity of the microphones. To reduce the influence of the noise, adaptive noise cancellers based on the source separation principle are usually composed of two (or four) adaptive filters arranged in a forward or backward symmetric structure. The forward structure can be regarded as an extension of the basic structure of an Adaptive Noise Canceller. The first step of the work was to investigate the feed-forward implementation of a noise cancelling system based on a blind source separation structure, focusing on the adaptation of the separation structure. In this scheme, we proposed a new algorithm, called the double fast Newton transversal filter (DFNTF) algorithm, to adapt the two adaptive filters of the noise cancelling system. We have shown that the DFNTF algorithm outperforms the double NLMS algorithm. We have also shown that controlling the convergence of the adaptive filters with a voice activity detector allows full cancellation of the coherent part of the noise components. To study the separation problem when two closely spaced microphones are used, a new model has been proposed; simulations with different sensor spacings show that it is physically consistent.
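The forward BSS structure extends the basic adaptive noise canceller; the sketch below shows only that basic building block, adapted with NLMS (the algorithm underlying the double NLMS baseline, here in single-filter form). Assumptions: a noise-only reference input, a primary input containing that noise filtered by an unknown 4-tap acoustic path, no speech and no cross-talk. The DFNTF algorithm and the symmetric two-filter structure are not reproduced, and `nlms_anc` and the path coefficients are hypothetical.

```python
import numpy as np


def nlms_anc(primary, reference, n_taps=8, mu=0.5, eps=1e-8):
    """Basic adaptive noise canceller: an NLMS FIR filter estimates the noise
    in `primary` from `reference`; the error e[n] is the cleaned output."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)  # most recent reference samples, newest first
    out = np.empty(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - w @ buf                 # residual after cancellation
        w += mu * e * buf / (eps + buf @ buf)    # normalized LMS update
        out[n] = e
    return out, w


rng = np.random.default_rng(1)
noise = rng.standard_normal(5000)              # reference microphone: noise only
noise_path = np.array([0.8, -0.4, 0.2, 0.1])   # hypothetical acoustic path
primary = np.convolve(noise, noise_path)[:5000]
out, w = nlms_anc(primary, noise)
```

With no speech present, the error output converges to zero and the filter weights converge to the acoustic path. With speech in the primary input and noise leaking into the reference, this simple canceller distorts the speech, which is what motivates the symmetric separation structures and post-filters discussed in this section.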
Since the feed-forward implementation of a noise cancelling system may introduce distortion on the BSS outputs, further processing is needed. Moreover, this distortion increases as the microphones get closer together. In this context, we have shown theoretically that the distortions can be corrected by a post-filtering stage applied to the outputs of the BSS structure. The second part of the work concentrates on post-filter implementation, a problem for which, to our knowledge, no satisfactory solution has been proposed in the literature, especially when closely spaced microphones are used. In [34], we proposed two new frequency-domain methods to compute the post-filters in order to compensate for the spectral distortion caused by the forward BSS structure. The first method is based on an open-loop amplitude equalization structure. The second one uses a frequency-domain adaptive filter to recursively estimate the post-filter. In this second method, we introduced a variable step size that depends on the signal-to-noise ratio to cope with the various situations encountered in practice. This modification ensures robust convergence of the adaptive equalizer even in critical situations. All the simulations demonstrate the good behavior of the two proposed methods, with a slight advantage for the second one. Even more striking improvements in cepstral distance were obtained with both methods when very short impulse responses are used in the mixing model, which results in very high spectral distortion at the output of the separation structure. The post-filters may, however, amplify the non-coherent noise; this effect is mitigated by the fact that uncorrelated noise is generally lower with closely spaced microphones. Further work will address this problem.

#### 6.3.7. Intelligent Transport System

Participants: Olivier Berder, Daniel Ménard, Olivier Sentieys, Tuan-Duc Nguyen.

Transportation systems play a critical role in virtually all facets of modern life, and significant challenges remain to further improve the efficiency and safety of current systems. The Brittany Region Council and the Côtes d'Armor Department Council are currently investing in this research area and recently created a Scientific Interest Group on Intelligent Transportation Systems (ITS), headquartered at ENSSAT, Lannion. Our research team actively participates in this new activity, especially in projects concerning the deployment of new energy-efficient architectures for ITS. R2D2 is the leader of the regional research program CAPTIV, which aims to propose new low-cost and energy-efficient mobile communication solutions to make road traffic smoother and safer. With "intelligent" road signs and vehicles, i.e. equipped with an autonomous radio communication system, drivers will be able to receive at any time information about traffic flow or road sign identification. To reduce deployment cost and increase the lifetime of the whole system, Multi-Input Multi-Output (MIMO) signal processing techniques are used. Such techniques can dramatically increase the capacity of mobile communication systems or the quality of the transmission, thanks to the well-known space-time codes. From another point of view, MIMO systems can significantly reduce the energy consumed by communications in ad hoc networks. Considering each crossroads as a communication node, cooperation between road signs enables energy-efficient communication between crossroads. Supported by the Scientific Interest Group GIS ITS-Bretagne and by industrial leaders in the ITS domain, and bringing together major research laboratories of the region, CAPTIV is a strongly application-oriented program.
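The space-time codes mentioned above are exemplified by Alamouti's two-antenna code, shown here as a minimal NumPy sketch. In a cooperative setting, the two transmit antennas could belong to two neighboring road-sign nodes. This mapping, a flat channel constant over two symbol periods, perfect channel knowledge at the receiver and the absence of noise are all assumptions for illustration, not details of CAPTIV; the function names are hypothetical.

```python
import numpy as np


def alamouti_encode(s1: complex, s2: complex) -> np.ndarray:
    """Alamouti code matrix: rows are time slots, columns are transmit antennas."""
    return np.array([[s1, s2],
                     [-np.conj(s2), np.conj(s1)]])


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining at a single receive antenna with known channel gains
    h1, h2 (assumed constant over both slots). Returns estimates of s1 and s2
    scaled by |h1|^2 + |h2|^2, i.e. with full two-branch diversity."""
    s1_hat = np.conj(h1) * r1 + h2 * np.conj(r2)
    s2_hat = np.conj(h2) * r1 - h1 * np.conj(r2)
    return s1_hat, s2_hat


s1, s2 = (1 + 1j) / np.sqrt(2), (1 - 1j) / np.sqrt(2)  # two QPSK symbols
h1, h2 = 0.3 - 0.8j, -1.1 + 0.2j                       # assumed flat channel gains
X = alamouti_encode(s1, s2)
r1, r2 = X @ np.array([h1, h2])                         # received in slots 1 and 2
s1_hat, s2_hat = alamouti_combine(r1, r2, h1, h2)
gain = abs(h1) ** 2 + abs(h2) ** 2
```

The receiver recovers each symbol with a gain equal to the sum of both channel powers using only linear operations, which keeps the decoder cheap enough for low-energy nodes.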
A first prototype of such a communicating crossroads will be presented on the Route du Futur in Saint-Brieuc (a portion of road devoted to ITS experiments).

# 7. Contracts and Grants with Industry

### 7.1. ANR RNRT SVP (2006-2008)

**Participants:** Olivier Berder, François Charot, Ludovic L'Hours, Olivier Sentieys, Patrice Quinton, Charles Wagner.

The main goal of the ANR SVP project (http://svp.irisa.fr), SurVeiller et Prévenir (Monitor and Prevent), is to study, experiment with and build an ambient integrated architectural framework dedicated to the design and deployment of services on a dynamic sensor network. The proposed framework will consist of a system architecture that meets the objective of ease of use while taking into account, and adapting to, all the specific characteristics of wireless sensor nodes, such as drastic resource constraints. Since we are convinced that technology alone is not enough to spread and promote advanced research, we emphasize the societal aspects of the project by also taking the end user into account. The second main objective of the SVP project is to deploy real applications in situ in order to adapt off-the-shelf technology to real-world conditions. The first application will consist of deploying a sensor network that records the physical activity of schoolchildren in order to study and prevent childhood obesity. In the second application, a sensor network will be deployed in a harbor area to warn workers when there is a risk of accident and to help locate containers and optimize their management.

### 7.2. ANR Architectures du Futur - ROMA: Reconfigurable Operators for Multimedia Applications (2007-2010)

**Participants:** Emmanuel Casseau, Shafqat Khan, Daniel Ménard, François Charot, Christophe Wolinski, Erwan Raffin, Olivier Sentieys.

ROMA (http://roma.irisa.fr) is an ANR "Architectures du Futur" project, contracted in January 2007 for 3 years.
It involves IRISA-R2D2 as prime contractor, CEA-LIST, CNRS-LIRMM and THOMSON R&D France. The ROMA project proposes to develop both a design methodology and a reconfigurable processor able to adapt its computing structure to video and image processing applications. The processor is built around a pipeline of coarse-grain reconfigurable operators with efficient power and performance characteristics. Flexibility is obtained through mutable units, which can be configured for the function they implement, the code used to represent data, and the data bit-width. The processor is reconfigured dynamically throughout the application, depending on the tasks to be carried out. An improvement of at least one order of magnitude in power consumption and computing power is expected with respect to state-of-the-art energy-efficient reconfigurable architectures. R2D2 is the leader of this project.

### 7.3. ANR Technologies Logicielles - SocLib (2007-2009)

Participants: François Charot, Kevin Martin, Laurent Perraudeau, Charles Wagner.

The goal of the SocLib project (http://www.soclib.fr) is to build an open platform for the modelling and simulation of multi-processor systems on chip that can be used by both universities and industrial companies. The core of the platform is a library of simulation models for virtual components (IP cores), with a guaranteed path to silicon. The main concern of the SocLib project is true interoperability between IP cores: all SocLib components are written in SystemC and comply with the VCI (Virtual Component Interface) communication protocol standard. CABA (cycle-accurate and bit-accurate) and TLMT (transaction level model with time) simulation models are provided.

### 7.4. Contract with Thomson (2006-2009)

Participants: François Charot, Christophe Wolinski.

The Ph.D. thesis of E. Raffin is supported by a CIFRE grant in the framework of a contract between R2D2 and Thomson.

# 8. Other Grants and Activities

### 8.1. Regional Actions

#### 8.1.1. Fastnet: Fast Adaptive Secure Technology for high-speed NETwork (2005-2007)

Participants: Georges Adouko, François Charot.

The Fastnet project was contracted in March 2005. It is funded by the Brittany Region and involves ENST Bretagne. It tackles the problem of high-rate filtering using architectures based on reconfigurable components, which allow specific filtering algorithms to be implemented at the hardware level and thus exhibit a high degree of parallelism.

#### 8.1.2. CAPTIV (2006-2008)

Participants: Olivier Berder, Tuan-Duc Nguyen, Olivier Sentieys.

The CAPTIV project (http://captiv.irisa.fr) was contracted in January 2006. It is funded by the Brittany Region and involves the ENST Bretagne and IETR laboratories. The scientific objective of this research program is the study and realization of low-cost, low-energy communication systems between vehicles and road infrastructure (e.g. traffic signs).

#### 8.1.3. PucesCom-Santé (2006-2008)

Participants: François Charot, Patrice Quinton, Olivier Sentieys, Charles Wagner.

The "PucesCom-Santé" project was contracted in January 2006. It is funded by the Brittany Region. PucesCom-Santé is managed by the Brittany branch of the ENS de Cachan and involves several laboratories of the Brittany region: LPBEM (Rennes 2), IRISA, IETR and LTSI. The project concerns the use of a biometric data sensor network to monitor the physical activity and energy expenditure of a population.

### 8.2. National Actions

The R2D2 team participates in the activities of:

- GdR SOC-SIP (System On Chip System In Package).
- GdR-PRC ISIS (Information Signal ImageS), working group GT7 Algorithms Architectures Adequation.
- GdR ASR (Architectures Systèmes et Réseaux): R2D2 is a member of the RECAP<sup>15</sup> group.

#### 8.2.1. OverSoc (2005-2008)

Participants: Daniel Chillet, Sébastien Pillement.
OveRSoC is an ANR project which has been contracted in december 2005 for 3 years. The project objective is to develop such global exploration methodology to evaluate and validate the interactions between an embedded RTOS and a Reconfigurable SoC (RSoc) platform. The OveRSoC project aims also at furnishing SoC designers with a framework for choosing the right RTOS services architecture according to a particular reconfigurable SoC platform. This project involves Architecture team of Etis (UMR 8051) and the Lisif laboratory (EA 2385). #### 8.3. International Bilateral Relations #### 8.3.1. Europe 8.3.1.1. Comap Project: Collaboration with Germany Participants: Julien Lallet, Sébastien Pillement, Olivier Sentieys. <sup>15</sup> http://www2.lifl.fr/sensor/ The CoMap project (https://comap.enstb.org/) deals with the systematic mapping, evaluation, and exploration of massively parallel processor architectures that are designed for special purpose applications in the world of embedded computers. This is a French-German project with several teams from the both countries: - Hardware-Software-Co-Design, Department of Computer Science, University of Erlangen-Nuremberg - Laboratory of Circuits and Systems, Department of Electrical Engineering and Information Technology, Dresden University of Technology - Architecture and Systems, LESTER, Université de Bretagne Occidentale - R2D2 research team from IRISA, Université de Rennes - High Performance Computing and Architecture, ENST Bretagne The investigated class of computer architectures can be described by massively parallel networked processing elements that, using today's hardware technology, may be implemented on a single chip (SoC - System on a Chip). #### 8.3.1.2. Other European Collaboration R2D2 cooperates with Lund University (Sweden) on Constraints Programming approach application in the reconfigurable data-paths synthesis flow. 
R2D2 cooperates with the University of Girona in Spain (Computer Vision and Robotics Group of the Institute for Informatics and Applications) on parallel architectures for vision algorithms applied to underwater robots.

R2D2 cooperates with James Whidborne from Cranfield University on optimal finite-wordlength and finite-precision controller implementations and on low-complexity controllers.

#### 8.3.2. Africa

R2D2 cooperates with ENIT in Tunis on mobile telecommunication architectures.

R2D2 cooperates with the University of Douala, the University of Yaoundé and the University of Dschang in Cameroon on models and tools for parallelization. This cooperation takes place within the SARIMA GIS, which supports the development of research laboratories in mathematics and computer science in Africa.

#### 8.3.3. North America

R2D2 maintains relations with the Computer Science Department of Colorado State University in Fort Collins on the development of MMAlpha.

R2D2 cooperates with the LSSI laboratory of Trois-Rivières University in Québec on the design of filter architectures.

R2D2 cooperates with Los Alamos National Laboratory (USA) on optimized implementations of reconfigurable architectures for low-level image processing.

R2D2 cooperates with the University of California, Riverside, on the synthesis of optimized image processing applications.

R2D2 cooperates with the LRTS laboratory of Laval University in Québec on architectures for MIMO systems.

#### 8.4. Visiting Scientists

- Sébastien Roy (Laval University, Canada), from June 21, 2007, for 6 weeks.
- Michel Thériault (Laval University, Canada), from September 1, 2007, for 8 months.

# 9. Dissemination

#### 9.1. Scientific Community Animation

- S. Derrien and P. Quinton received the Best Paper Award at ASAP'07 for [33].
- O. Sentieys is a steering committee member of the SOC-SIP Expert Group of the STIC department of the CNRS. He chairs the French chapter of the IEEE Circuits and Systems (CAS) Society. He has been a member of the French National University Council (Conseil National des Universités, 61st section) in signal processing and electronics since 2000. In 2007, he was a member of the technical program committees of the following conferences: IEEE DDECS, IEEE ISQED, DCIS, DTIS, FTFC, GRETSI. He is on the editorial board of the Journal of Low Power Electronics (American Scientific Publishers). He was a reviewer for IEEE/ACM DAC, IEEE VTC, IEEE Trans. on VLSI, IEEE Trans. on CAS-II, Signal Processing, Journal of Real-Time Image Processing and TSI.
- D. Chillet is a member of the organizing committee of the Workshop on Design and Architectures for Signal and Image Processing (DASIP 2007) and a program committee member of Majecstic.
- P. Quinton is a member of the steering committee of the System Architecture MOdelling and Simulation (SAMOS) workshop. He was a member of the scientific committees of ASAP'07 and MAJECSTIC'07.
- P. Scalart is a reviewer for IEEE Trans. on Signal Processing, IEEE Trans. on Speech & Audio Processing, IEEE Signal Processing Letters and Speech Communication.
- C. Wolinski was a member of the technical committees of the following conferences: DATE, DSD, ISQED. He is a member of the Board of Directors of the Euromicro Society.

#### 9.2. Seminars and Invitations

- Ch. Wolinski, K. Kuchcinski and A. Postola presented "UPaK: Abstract Unified Pattern Based Synthesis Kernel for Hardware and Software Systems" at the University Booth, DATE 2007, Nice, France.
- Ch. Wolinski was an invited speaker at Cortina Systems, Ottawa, Canada, in July 2007, where he gave a presentation on "UPaK system".
- Ch. Wolinski was an invited speaker at Los Alamos National Laboratory, USA, in July 2007, where he gave a presentation on "Abstract Unified Pattern Based Synthesis Kernel for Hardware and Software Systems".
- Sébastien Pillement was an invited speaker at Laval University, Québec, Canada, in November 2007, where he gave a presentation on "Placement dynamique de tâches sur architecture reconfigurable dynamiquement" (dynamic task placement on dynamically reconfigurable architectures).
- Sébastien Pillement presented "Vers un langage de description d'architectures reconfigurables" (towards a description language for reconfigurable architectures) at the GDR CNRS SoC-SiP in March 2007.
- Olivier Sentieys was an invited speaker at Laval University, Québec, Canada, in November 2007, where he gave a presentation on "Gestion de l'énergie dans les réseaux de senseurs: interaction algorithmes et plateforme" (energy management in sensor networks: interaction between algorithms and platform).
- Olivier Sentieys was an invited speaker at the thematic school "École thématique ARCHI07, Architectures des systèmes matériels enfouis et méthodes de conception associées" (architectures of embedded hardware systems and associated design methods) in March 2007, where he gave a presentation on "Digital Signal Processors".
- Olivier Sentieys presented "Gestion de l'énergie dans les réseaux de capteurs: interaction algorithmes et plateforme" (energy management in sensor networks: interaction between algorithms and platform) at the GDR CNRS SoC-SiP in March 2007.

#### 9.3. Teaching and Responsibilities

- O. Berder teaches a course on processor architectures and signal processing at Enssat.
- D. Chillet teaches courses on *advanced processor architectures* in the Master STIR and on *low-power digital CMOS circuits* at ENST Bretagne.
- E. Casseau's main teaching activities at Enssat are *signal processing* and *hardware description languages*. He also teaches *SoC design methodologies* at the ENST Bretagne engineering school.
- L. Perraudeau is responsible for a course on object-oriented languages in the DESS Isa (Computer Science and its Applications) of the University of Rennes 1. He teaches integrated circuit design in the second year of DIIC, and teaches in the Licence d'informatique and in the DEUG Sciences (SM and STPI tracks).
- P. Quinton is deputy director of the École Normale Supérieure de Cachan, in charge of its Brittany branch.
- P. Scalart is the head of the electronics engineering department at Enssat, where he teaches courses on signal processing.
- O. Sentieys is responsible for a signal and architecture module of the Master STI of the University of Rennes 1 and for the DRT in electronics at Enssat. He teaches at Enssat and gives courses on *Methodologies for integrated system design* in the Master STI and on *Digital IC: from synthesis to implementation* in the Executive Master in Microelectronics System Design and Technology (EnsiCaen and Philips/NXP).
- C. Wolinski is responsible for the Computer Organization and Architecture branch of DIIC and for the following courses: CSE "Design of Embedded Systems" (DIIC), SIA "Signal, Image, Architectures" (DIIC) and XAA "Advanced Architectures" (ENSC).

# 10. Bibliography

### Major publications by the team in recent years

- [1] M. CARTRON. *Vers une plate-forme efficace en énergie pour les réseaux de capteurs sans fil*, Ph.D. Thesis, University of Rennes 1, ENSSAT, December 2006.
- [2] F. CHAROT, G. LE FOL, P. LEMONNIER, C. WAGNER, C. BOUVILLE, R. BARZIC. *Towards Hardware Building Blocks for Software-Only Real Time Video Processing: the MOVIE Approach*, in "IEEE Transactions on Circuits and Systems for Video Technology", vol. 9, n<sup>o</sup> 6, September 1999.
- [3] R. DAVID, D. CHILLET, S. PILLEMENT, O. SENTIEYS. *A Dynamically Reconfigurable Architecture for Low-Power Multimedia Terminals*, in "SOC Design Methodologies", Kluwer Academic Publishers, 2002, p. 51–62.
- [4] R. DAVID. *Architecture reconfigurable dynamiquement pour applications mobiles*, Thèse de Doctorat, Université de Rennes, July 2003.
- [5] K. KUCHCINSKI, C. WOLINSKI. *Global Approach to Scheduling Complex Behaviors based on Hierarchical Conditional Dependency Graphs and Constraint Programming*, in "Journal of Systems Architecture", vol. 49, n<sup>o</sup> 12-15, December 2003.
- [6] C. MAURAS. *Alpha: un langage équationnel pour la conception et la programmation d'architectures parallèles synchrones*, Thèse de doctorat, Université de Rennes 1, December 1989.
- [7] D. MENARD, D. CHILLET, O. SENTIEYS. *Floating-to-fixed-point Conversion for Digital Signal Processors*, in "EURASIP Journal on Applied Signal Processing (JASP), Special Issue Design Methods for DSP Systems", vol. 2006, n<sup>o</sup> 1, 2006, p. 1–15.
- [8] V. MESSÉ. *Production de compilateurs flexibles pour la conception de processeurs programmables spécialisés*, Thèse de doctorat, Université de Rennes 1, March 1999.
- [9] D. MÉNARD. *Méthodologie de compilation d'algorithmes de traitement du signal en précision infinie pour les processeurs en virgule fixe*, Thèse de doctorat, Université de Rennes 1, December 2002.
- [10] A. NOUMSI, S. DERRIEN, P. QUINTON. *Acceleration of a content-based image-retrieval application on the RDISK cluster*, in "20th International Parallel and Distributed Processing Symposium (IPDPS 2006)", April 2006, p. 25-29.
- [11] P. QUINTON, V. V. DONGEN. *The mapping of linear recurrence equations on regular arrays*, in "Journal of VLSI Signal Processing", vol. 1, 1989, p. 93-113.
- [12] P. QUINTON, Y. ROBERT. *Systolic Algorithms and Architectures*, Prentice Hall and Masson, 1989.
- [13] R. ROCHER, D. MENARD, N. HERVÉ, O. SENTIEYS. *Fixed-Point Configurable Hardware Components*, in "EURASIP Journal on Embedded Systems (JES)", vol. 2006, n<sup>o</sup> 1, 2006, Article ID 23197, 13 pages.
- [14] R. ROCHER. *Evaluation analytique de la précision des systèmes en virgule fixe*, Ph.D. Thesis, University of Rennes 1, ENSSAT, December 2006.
- [15] C. WOLINSKI, M. GOKHALE, K. MCCABE. *Polymorphous fabric-based systems: Model, tools, applications*, in "Journal of Systems Architecture", vol. 49, n<sup>o</sup> 4-6, September 2003.

### Year Publications

#### Doctoral dissertations and Habilitation theses

- [16] I. BENKERMI. *Modèle et algorithme d'ordonnancement pour architectures reconfigurables dynamiquement*, Ph.D. Thesis, University of Rennes 1, ENSSAT, IRISA, January 2007.
- [17] N. HERVÉ. *Contributions à la synthèse d'architecture virgule fixe à largeurs multiples*, Ph.D. Thesis, University of Rennes 1, ENSSAT, IRISA, March 2007.

#### Articles in refereed journals and book chapters

- [18] R. CHIKHI, S. DERRIEN, A. NOUMSI, P. QUINTON. *Combining Flash Memory and FPGAs to Efficiently Implement a Massively Parallel Algorithm for Content-Based Image Retrieval*, in "International Journal of Electronics", to appear, 2007.
- [19] D. CHILLET, R. DAVID, E. GRACE, O. SENTIEYS. *Hiérarchie mémoire reconfigurable: vers une structure de stockage faible consommation*, in "Technique et Science Informatiques", to appear, 2007.
- [20] P. COUSSY, E. CASSEAU, P. BOMEL, A. BAGANNE, E. MARTIN. *Constrained algorithmic IP design for system-on-chip*, in "Integration, the VLSI Journal, issue on Systems-on-Chip: Design and Test", vol. 40, n<sup>o</sup> 2, February 2007, p. 94–105.
- [21] T. HILAIRE, P. CHEVREL, J. CLAUZEL. *Low parametric sensitivity realization design for FWL implementation of MIMO controllers: Theory and application to the active control of vehicle longitudinal oscillations*, in "International Journal of Tomography and Statistics", vol. 6, 2007, p. 128–133.
- [22] T. HILAIRE, P. CHEVREL, J. WHIDBORNE. *A Unifying Framework for Finite Wordlength Realizations*, in "IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications", vol. 54, n<sup>o</sup> 8, August 2007, p. 1765–1774.
- [23] B. LE GAL, E. CASSEAU, S. HUET. *Dynamic Memory Access Management for High-Performance DSP Applications Using High-Level Synthesis*, in "IEEE Transactions on Very Large Scale Integration Systems", to appear, 2007.
- [24] S. PILLEMENT, R. DAVID. *Architectures reconfigurable faible consommation réalité ou prospective ?*, in "Technique et Science Informatiques, numéro spécial SoC", vol. 26, 2007, p. 595–622.
- [25] S. PILLEMENT, O. SENTIEYS, R. DAVID. *DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency*, in "EURASIP Journal on Embedded Systems (JES)", Article ID 562326, 13 pages, 2008, p. 1-13.
#### Publications in Conferences and Workshops

- [26] C. ANDRIAMISAINA, E. CASSEAU, P. COUSSY. *Synthesis of Multimode digital signal processing systems*, in "NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2007), Edinburgh, Scotland", August 2007, p. 318–325.
- [27] E. CASSEAU, S. KHAN, B. LE GAL, W. AUBRY. *Multimode architecture design*, in "Workshop on Design and Architectures for Signal and Image Processing (DASIP'07), Grenoble, France", November 2007.
- [28] C. CHAVET, C. ANDRIAMISAINA, P. COUSSY, E. CASSEAU, E. JUIN, P. URARD, E. MARTIN. *A design flow dedicated to multi-mode architectures for DSP applications*, in "IEEE/ACM International Conference on Computer-Aided Design (ICCAD07), San Jose, CA", November 2007.
- [29] R. CHIKHI, S. DERRIEN, A. NOUMSI, P. QUINTON. *Combining Flash Memory and FPGAs to Efficiently Implement a Massively Parallel Algorithm for Content-Based Image Retrieval*, in "Proceedings of the International Workshop on Applied Reconfigurable Computing (ARC 2007), Mangaratiba, Brazil", P. DINIZ, E. MARQUES, K. BERTELS, M. FERNANDES, J. CARDOSO (editors), Lecture Notes in Computer Science (LNCS), vol. 4419, Springer-Verlag, March 2007, p. 247–258.
- [30] D. CHILLET, I. BENKERMI, S. PILLEMENT, O. SENTIEYS. *Hardware Task Scheduling for Heterogeneous SoC Architectures*, in "15th European Signal Processing Conference (EUSIPCO'07), Poznan, Poland", September 2007.
- [31] D. CHILLET, S. PILLEMENT, O. SENTIEYS. *A Neural Network Model for Real-Time Scheduling on Heterogeneous SoC Architectures*, in "IEEE International Joint Conference on Neural Networks, IJCNN'07, Orlando, FL", August 12-17, 2007, p. 102–107.
- [32] D. CHILLET, S. PILLEMENT, O. SENTIEYS. *Vers une implémentation matérielle d'un réseau de neurones pour le service d'ordonnancement des tâches au sein d'un SoC*, in "GRETSI'07, Troyes, France", 2007.
- [33] S. DERRIEN, P. QUINTON. *Parallelizing HMMER for Hardware Acceleration on FPGAs*, in "18th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2007), Montreal, Canada", Best Paper Award, July 2007, p. 10–18.
- [34] M. DJENDI, A. GILLOIRE, P. SCALART. *New frequency domain post-filters for noise cancellation using two closely spaced microphones*, in "15th European Signal Processing Conference (EUSIPCO'07), Poznan, Poland", September 2007.
- [35] N. HERVÉ, D. MÉNARD, O. SENTIEYS. *About the importance of operation grouping procedures for multiple word-length architecture optimizations*, in "Proceedings of the International Workshop on Applied Reconfigurable Computing (ARC 2007), Mangaratiba, Brazil", P. DINIZ, E. MARQUES, K. BERTELS, M. FERNANDES, J. CARDOSO (editors), Lecture Notes in Computer Science (LNCS), vol. 4419, Springer-Verlag, March 2007, p. 191–200.
- [36] T. HILAIRE, P. CHEVREL, J. WHIDBORNE. *Low Parametric Closed-Loop Sensitivity Realizations using Fixed-Point and Floating-Point Arithmetic*, in "Proc. European Control Conference (ECC'07)", July 2007.
- [37] T. HILAIRE, D. MENARD, O. SENTIEYS. *Roundoff Noise Analysis of Finite Wordlength Realizations with the Implicit State-Space Framework*, in "15th European Signal Processing Conference (EUSIPCO'07), Poznan, Poland", September 2007.
- [38] S. HUET, S. LENOURS, O. PASQUIER, E. CASSEAU. *Granularity Issues in Transaction Level Modelling Digital Signal Processing Applications*, in "Forum on specification and Design Languages (FDL 2007), Barcelona, Spain", September 2007.
- [39] A. KUPRIYANOV, F. HANNIG, D. KISSLER, J. TEICH, J. LALLET, O. SENTIEYS, S. PILLEMENT. *Modeling of Interconnection Networks in Massively Parallel Processor Architectures*, in "Proceedings of 20th International Conference on Architecture of Computing Systems, Zurich, Switzerland", Lecture Notes in Computer Science (LNCS), vol. 4415, Springer-Verlag, March 2007, p. 268–282.
- [40] B. LE GAL, L. BOSSUET, S. KHAN, E. CASSEAU. *HLS Design Flow for Multimode IP Generation Under Multiple Constraints*, in "IEEE Conference on Electronics, Circuits and Systems (ICECS 2007)", December 2007.
- [41] D. MENARD, R. SERIZEL, R. ROCHER, O. SENTIEYS. *Noise model for Accuracy Constraint Determination in Fixed-Point Systems*, in "Workshop on Design and Architectures for Signal and Image Processing (DASIP'07), Grenoble, France", November 2007.
- [42] T. NGUYEN, O. BERDER, O. SENTIEYS. *Cooperative MIMO Schemes Optimal Selection for Wireless Sensor Networks*, in "Proceedings of IEEE 65th Vehicular Technology Conference, VTC2007-Spring, Dublin, Ireland", April 2007, p. 85–89.
- [43] T. NGUYEN, O. BERDER, O. SENTIEYS. *Energy-efficiency optimization for cooperative MIMO schemes in wireless sensor networks*, in "IRAMUS Thematic Informational Workshop, Val Thorens, France", January 2007.
- [44] T. NGUYEN, O. BERDER, O. SENTIEYS. *Optimisation énergétique des transmissions MIMO coopératives pour les réseaux de capteurs sans fil*, in "GRETSI'07, Troyes, France", 2007.
- [45] R. ROCHER, D. MENARD, O. SENTIEYS, P. SCALART. *Analytical accuracy evaluation of Fixed-Point Systems*, in "15th European Signal Processing Conference (EUSIPCO'07), Poznan, Poland", September 2007.
- [46] R. ROCHER, D. MENARD, O. SENTIEYS, P. SCALART. *Evaluation analytique de la précision des systèmes en virgule fixe*, in "GRETSI'07, Troyes, France", 2007.
- [47] R. SANTORO, S. ROY, O. SENTIEYS. *Search for Optimal Five-Neighbor FPGA-Based Cellular Automata Random Number Generators*, in "International Symposium on Signals, Systems and Electronics (ISSSE'07), Montréal, Canada", 2007, p. 343–346.
- [48] T. SAÏDI, S. ROY, O. SENTIEYS. *A testbed for evaluation of MIMO WCDMA architectures*, in "International Symposium on Signals, Systems and Electronics (ISSSE'07), Montréal, Canada", July 2007.
- [49] O. SENTIEYS, O. BERDER, P. QUEMERAIS, M. CARTRON. *Wake-up Interval Optimization for Sensor Networks with Rendez-vous Schemes*, in "Workshop on Design and Architectures for Signal and Image Processing (DASIP'07), Grenoble, France", November 2007.
- [50] C. WOLINSKI, K. KUCHCINSKI. *Computation Patterns Identification for Instruction Set Extensions Implemented as Reconfigurable Hardware*, in "Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA 2007), Las Vegas, USA", June 2007.
- [51] C. WOLINSKI, K. KUCHCINSKI. *Identification of Application Specific Instructions Based on Subgraph Isomorphism Constraints*, in "18th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2007), Montreal, Canada", July 2007.

#### Miscellaneous

- [52] A. FLOC'H. *Compilation pour architectures reconfigurables*, Technical report, University of Rennes, June 2007.
- [53] A. PASHA. *Power Optimization of Channel Coder and Decoder Adapted for Wireless Sensor Networks*, Technical report, University of Nice-Sophia Antipolis, June 2007.

### References in notes

- [54] AIS. *Application Notes and Interpretation of the Scheme (AIS)*, 1999.
- [55] L. BENINI, G. D. MICHELI. *Networks on Chips: a New SoC Paradigm*, in "IEEE Computer", vol. 35, n<sup>o</sup> 1, January 2002, p. 70–78.
- [56] D. C. CRONQUIST, P. FRANKLIN, C. FISHER, M. FIGUEROA, C. EBELING. *Architecture Design of Reconfigurable Pipelined Datapath*, in "Advanced Research in VLSI", 1999.
- [57] W. DALLY, B. TOWLES. *Route Packets, Not Wires: on-chip Interconnection Networks*, in "Proceedings of the 38th Design Automation Conference", June 2001.
- [58] A. DEHON. *Reconfigurable Architecture for General-Purpose Computing*, Ph.D. Thesis, MIT, 1996.
- [59] FIPS. *Security Requirements for Cryptographic Modules*, FIPS PUB 140-2, 1999, http://csrc.nist.gov/publications/fips/fips140-2/fips1402.pdf.
- [60] R. GALLAGER. *Low-density parity-check codes*, in "IRE Trans. Inform. Theory", vol. 8, Jan. 1962, p. 21-28.
- [61] A. GOLDSMITH, S. WICKER. *Design Challenges for Energy-Constrained Ad Hoc Wireless Networks*, in "IEEE Wireless Communications", vol. 9, n<sup>o</sup> 4, August 2002, p. 8–27.
- [62] S. C. GOLDSTEIN, H. SCHMIT, M. BUDIU, S. CADAMBI, M. MOE, R. R. TAYLOR. *PipeRench: A Reconfigurable Architecture and Compiler*, in "IEEE Computer", April 2000.
- [63] T. GRÖTKER, E. MULTHAUP, O. MAUSS. *Evaluation of HW/SW Tradeoffs Using Behavioral Synthesis*, in "ICSPAT'96, Boston", October 1996.
- [64] S. GUYETANT, M. GIRAUD, L. L'HOURS, S. DERRIEN, S. RUBINI, D. LAVENIER, F. RAIMBAULT. *Cluster of Reconfigurable Nodes for Scanning Large Genomic Banks*, in "Parallel Computing", vol. 31, n<sup>o</sup> 1, 2005, p. 73–96.
- [65] R. HARTENSTEIN. *A Decade of Reconfigurable Computing: A Visionary Retrospective*, in "Design Automation and Test in Europe (DATE)", 2001.
- [66] S. HAUCK, T. FRY, M. HOSLER, J. KAO. *The Chimera Reconfigurable Functional Unit*, in "IEEE Symposium on FPGAs for Custom Computing Machines", 1997.
- [67] J. HAUSER, J. WAWRZYNEK. *GARP: A MIPS processor with a reconfigurable coprocessor*, in "IEEE Symposium on FPGAs for Custom Computing Machines", June 1997.
- [68] H. KEDING, M. COORS, O. LUTHJE, H. MEYR. *Fast Bit True Simulation*, in "Design Automation Conference 2001 (DAC 2001), Las Vegas", June 2001.
- [69] K. KEUTZER, S. MALIK, R. NEWTON, J. RABAEY, A. SANGIOVANNI-VINCENTELLI. *System-Level Design: Orthogonalization of Concerns and Platform-Based Design*, in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems", vol. 19, n<sup>o</sup> 12, December 2000.
- [70] K. KUM, J. KANG, W. SUNG. *AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors*, in "IEEE Transactions on Circuits and Systems II", vol. 47, September 2000, p. 840-848.
- [71] L. L'HOURS. *Generating Efficient Custom FPGA Soft-Cores for Control-Dominated Applications*, in "Proceedings of the 16th IEEE International Conference on Application-Specific Systems, Architectures, and Processors", S. VASSILIADIS, N. DIMOPOULOS, S. RAJOPADHYE (editors), IEEE Computer Society, July 2005, p. 127–133.
- [72] R. LEUPERS. *Retargetable Code Generation for Digital Signal Processors*, Kluwer Academic Publishers, 1997.
- [73] G. LU, H. SINGH, M. LEE, N. BAGHERZADEH, F. KURDAHI, E. FILHO. *The MorphoSys Parallel Reconfigurable System*, in "Euro-Par'99, LNCS 1685", 1999.
- [74] G. MARSAGLIA. *Diehard: A Battery of Tests of Randomness*, Technical report, Florida State University, Tallahassee, FL, USA, 1996, http://stat.fsu.edu/pub/diehard/.
- [75] S. PEES, A. HOFFMANN, V. ZIVOJNOVIC, H. MEYR. *LISA Machine Description Language for Cycle-Accurate Models of Programmable DSP Architectures*, in "DAC 1999", June 1999.
- [76] J. RABAEY. *A low-energy heterogeneous reconfigurable DSP IC*, in "Design Automation Conference (DAC)", June 2000.
- [77] A. RUKHIN, J. SOTO, J. NECHVATAL, M. SMID, D. BANKS. *A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications*, in "NIST Special Publication 800-22 (Computer Security)", 2001.
- [78] C. RUPP, M. LANDGUTH, T. GRAVERICK, E. GOMERSALL, H. HOLT. *The NAPA Adaptive Processing Architecture*, in "IEEE Symposium on FPGAs for Custom Computing Machines", April 1998.
- [79] A. SANGIOVANNI-VINCENTELLI, G. MARTIN. *Platform-Based Design and Software Design Methodology for Embedded Systems*, in "IEEE Design and Test of Computers", November 2001.
- [80] R. SCHREIBER, S. ADITYA, S. MAHLE, V. KATHAIL, B. RAU, D. CRONQUIST, M. SIVARAMAN. *PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators*, Technical report, n<sup>o</sup> HPL-2001-249, HP Laboratories Palo Alto, October 2001.
- [81] M. SRIVASTAVA. *Power-aware Communication Systems*, in "Power-aware Design Methodologies", M. PEDRAM, J. RABAEY (editors), chap. 11, Kluwer Academic Publishers, 2002, p. 297–334.
- [82] M. WILLEMS, V. BURSGENS, H. KEDING, H. MEYR. *System Level Fixed-Point Design Based On An Interpolative Approach*, in "Design Automation Conference (DAC-97)", 1997.
- [83] J. J. DA SILVA, J. SHAMBERGER, J. AMMER, C. GUO, S. LI, R. SHAH, T. TUAN, M. SHEETS, J. RABAEY, B. NIKOLIC, A. SANGIOVANNI-VINCENTELLI, P. WRIGHT. *Design Methodology for PicoRadio Networks*, in "Design, Automation and Test in Europe Conference", IEEE/ACM, 2001.