INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

# Team R2D2

# Reconfigurable and Retargetable Digital Devices

# Rennes

# **Table of contents**

1. Team
2. Overall Objectives
   - 2.1. Introduction
   - 2.2. New architectures and technologies
   - 2.3. Synthesis of dedicated hardware accelerators
   - 2.4. Exploration, estimation, prototyping for the design of silicon systems
3. Scientific Foundations
   - 3.1. Panorama
   - 3.2. Regular program parallelization and dedicated hardware accelerator synthesis
   - 3.3. Processor modeling and flexible compilation
   - 3.4. Fixed-point implementation of algorithms
   - 3.5. New reconfigurable architectures
4. Application Domains
   - 4.1. Panorama
   - 4.2. Mobile telecommunications
   - 4.3. Adaptive filtering
   - 4.4. AES ciphering algorithm
5. Software
   - 5.1. Panorama
   - 5.2. PolyLib
   - 5.3. MMAlpha
   - 5.4. BSS
6. New Results
   - 6.1. New architectures and technologies
     - 6.1.1. DART architecture
       - 6.1.1.1. Functional-level reconfigurable architecture
       - 6.1.1.2. Reconfigurable arithmetic units
       - 6.1.1.3. Study and technological validation of a new reconfigurable architecture
     - 6.1.2. Network-on-Chip
     - 6.1.3. Innovative architectures for low-power sensor networks
     - 6.1.4. Multiple-Valued Logic architectures and circuits
   - 6.2. Synthesis of parallel and dedicated accelerators
     - 6.2.1. Synthesis of multi-dimensional time architectures
     - 6.2.2. Expression re-use in reduction expressions
   - 6.3. Exploration, estimation, and prototyping for the design of silicon systems
     - 6.3.1. Extension to the ARMOR architecture description language
     - 6.3.2. CALIFE compilation flow
     - 6.3.3. Methodology for automatic floating-point to fixed point conversion
     - 6.3.4. Software environment for reconfigurable platforms
       - 6.3.4.1. Compiler framework for DART
       - 6.3.4.2. Real-time schedulers
       - 6.3.4.3. Operating systems developments
     - 6.3.5. SystemC modeling of processors
     - 6.3.6. Memory hierarchy
   - 6.4. Study of applications
     - 6.4.1. 3G mobile application prototyping
     - 6.4.2. Multi-user detection
     - 6.4.3. Image indexing
     - 6.4.4. Cryptographic applications
     - 6.4.5. Biomedical multi-sensor
     - 6.4.6.
7. Contracts and Grants with Industry
   - 7.1. A Hades: high-speed and secure networks (2002-2003)
   - 7.2. Γ Ozone (2002-2004)
   - 7.3. IRASE: reconfigurability and VLIW processors in parallel heterogeneous architectures
   - 7.4. Architectures based on multiple-valued logic for telecommunication applications (2001-2004)
   - 7.5. GAR (2003-2005)
8. Other Grants and Activities
   - 8.1. National initiatives
     - 8.1.1. RDISK: Reconfigurable DISK
     - 8.1.2. ReMiX: Reconfigurable Memory for Indexing Huge Amount of Data
   - 8.2. International bilateral relations
     - 8.2.1. Europe
     - 8.2.2. Africa
     - 8.2.3. North America
   - 8.3. Visiting scientists
9. Dissemination
   - 9.1. Activities in the scientific community
   - 9.2. Teaching and responsibilities
10. Bibliography

# 1. Team

The team R2D2 is located on two sites: Rennes and Lannion.

#### Head of team

- François Charot [CR INRIA]
- Olivier Sentieys [Professor, university of Rennes 1, Enssat]

#### Administrative assistant

- Lydie Letort [TR INRIA]

#### Staff member, CNRS

- Charles Wagner [IR CNRS ASCII]

#### Staff member, University of Rennes 1

- Michel Aline [ATER until 08/31/03, Enssat]
- Daniel Chillet [Associate professor, Enssat]
- Steven Derrien [ATER until 08/31/03, Associate professor from 09/01/03, Ifsic]
- Hélène Dubois [Associate professor, Enssat]
- Anne-Claire Guillou [ATER until 08/31/03, Ifsic]
- Michel Guitton [Associate professor, Enssat]
- Daniel Menard [Associate professor from 09/01/03, Enssat]
- Laurent Perraudeau [Associate professor, Ifsic]
- Sébastien Pillement [Associate professor, IUT Lannion]
- Patrice Quinton [Professor, Ifsic]
- Taofik Saïdi [IR until 09/30/03, Enssat]
- Pascal Scalart [Professor from 02/01/03, Enssat]
- Christophe Wolinski [Professor from 09/01/03, Ifsic]

#### Ph.D. students

- Imène Benkermi [University grant, Irisa-Lannion]
- Mickaël Cartron [Brittany Region grant from 10/01/03, Irisa-Lannion]
- Stéphane Chevobbe [CEA grant]
- Raphaël David [MENRT grant until 08/31/03, Irisa-Lannion]
- Gautam Gupta [CIES grant, co-supervision with Colorado State University, Irisa-Rennes]
- Nicolas Hervé [University grant, Irisa-Lannion]
- Ekué Kinvi-Boh [MENRT grant, Irisa-Lannion]
- Ludovic L'Hours [MENRT grant, Irisa-Rennes]
- Madeleine Nyamsi [INRIA grant, Irisa-Rennes]
- Jean-Marc Philippe [Brittany Region - University grant, Irisa-Lannion]
- Romuald Rocher [MENRT grant from 10/01/03, Irisa-Lannion]
- Taofik Saïdi [University grant from 11/01/03, Irisa-Lannion]

# 2. Overall Objectives

#### 2.1. Introduction

The problems tackled by the team R2D2 relate to the design of specialized systems on reconfigurable platforms.
A hardware platform is a structure of integrated circuits containing a set of programmable components (general-purpose or specific processor cores), memories and, generally, specialized components. Such a platform can be seen as an integrated architecture scheme, common to numerous algorithms from the same application domain. This notion is the answer given by designers of embedded systems to the increasing difficulty of implementing their applications [50]. One can consequently imagine that, in the future, most of the integrated circuits needed to design a complex system will be derived from an existing platform. This design approach is an alternative to the IP-based (*Intellectual Property*) design approach, in which the system is built by assembling separately designed components.

A reconfigurable platform is a platform that includes a set of reconfigurable components (blocks of reconfigurable logic, reconfigurable data-paths, flexible communication networks). The reconfigurable resources enable a far more efficient use of silicon (in area and power) than programmable processors or specialized components.

There is no doubt that future platforms will be highly parallel, heterogeneous, programmable and reconfigurable. Parallelism is the only way of reaching the performance level required by future applications. Heterogeneity results from the observation that several subsystems, characterized by well-differentiated computation requirements, have to coexist in an efficient design. Programmability avoids freezing the functionalities. Finally, reconfigurability combines the speed of specialized solutions with the flexibility of traditional programmable components.

The scientific objectives of the team R2D2 are characterized by the following elements.

- A methodological framework.
We seek to take advantage of various methods (very high-level synthesis, behavioral synthesis, flexible compilation, generation of fixed-point code, etc.), each contributing its own strengths to the design of one part of the system. The models and the underlying techniques allow the use of estimators, which inform implementation choices with precise knowledge of the system's performance, complexity and power consumption.
- Privileged fields of application: third- and fourth-generation mobile telecommunications, cryptography, image indexing, biomedical and speech processing.

Research undertaken within the team R2D2 aims at facilitating the design of reconfigurable hardware systems by proposing architecture models and associated design methodologies that favor a good match between the algorithms of the applications and the architectures supporting their implementation.

#### 2.2. New architectures and technologies

**Key words:** reconfigurable architecture, grain of calculation, low-power consumption architecture, Network-on-chip, NoC, sensor network, multiple-valued logic, SoC.

By the end of the decade, integrated-circuit technology should allow a billion transistors to be integrated on a chip, instead of a few tens of millions today, as illustrated by the documents published by the SIA<sup>1</sup>. The hardware systems underlying future equipment will be miniaturized (one now usually speaks of System-on-Chip, SoC) while combining highly heterogeneous architectures that include dedicated hardware accelerators. Even though electronic CAD tools and the associated design methodologies have progressed considerably in recent years, designing new integrated circuits is not any easier today. On the contrary, the distance between the capacities offered by the technology and the potential of current design tools, the famous *technology gap*, has never been as large.
A rather fundamental change in the way circuits are designed seems to be taking shape.

This evolution of the technology has an impact on the architectures of integrated circuits. Over the years, a migration of architectures can be observed: from ASIC towards SoC and, in the immediate future, towards reconfigurable programmable platforms.

- ASICs were prevalent between 1980 and 1995; they are now used only as particular blocks in more complex heterogeneous systems.
- The first SoCs were designed around 1995. Thanks to the increasing density of chips, a complex SoC usually integrates one or more processor cores (general purpose processor or digital signal processor), memory blocks (RAM, ROM, flash memory, EPROM, etc.), as well as many different interfaces needed for the system to work correctly. They combine hardware and software components. Their design rests on the use of synthesis and place-and-route tools, and on libraries of reusable components.
- In the near future, SoCs will evolve into platforms: integrated architecture structures common to a set of algorithms or applications belonging to the same application field. The design tools and methodologies must thus make it possible to design specialized architectures starting from this basic architecture [60]. Platforms will satisfy the needs of a broader spectrum of applications, at the price of a reduction in the variety of designed circuits.

<sup>1</sup>Semiconductors Industry Association: http://public.itrs.net/Files/2002Update/

The research projects that we carry out relate to the study of new organizations of reconfigurable structures that offer the speed of specialized solutions and the flexibility of traditional programmable components, with regard to application areas from the field of telecommunications. These new organizations constitute one of the components of the reconfigurable platforms.
The work concerns in particular the definition of an architectural model, called DART, that combines high performance and low power consumption. This architecture targets multimedia terminals: it must be able to deal with applications that differ widely in granularity, computation and timing constraints.

The second aspect concerns network-on-chip (NoC). As the number of resources in a SoC (i.e. processors or IP blocks) grows, its design becomes more and more difficult. Nowadays, one of the most important issues is the design of the communication network. Our research deals with the application of advanced mobile telecommunication techniques to NoC design. Our goal is to design an efficient interconnection scheme that enables computation resources to work as fast as possible, while being energy efficient and more robust against noise.

The third aspect concerns architectures for low-power sensor networks. Recent evolutions in wireless devices allow us to imagine dense wireless sensor networks. The aim of these sensor networks would be to collect information from an area and to relay it through the network. On the one hand, the communications must be wireless, in order to keep the cost of the network low and the installation simple enough to be practical. On the other hand, the sensor networks must have a high energy autonomy. These two conflicting conditions show the energy challenge raised by sensor networks. In addition, sensor networks are ad-hoc networks and must self-organize.

Moreover, our work concerns the study of Multiple-Valued Logic (MVL) circuits and architectures. Nowadays, digital systems are exclusively based on a binary representation of numbers and computations. It has been shown that using a higher number of logical states can make it possible to optimize the efficiency of processing, interconnections and memory elements [44].
These techniques contribute in particular to improving bandwidth during the transmission or storage of data, to performing complex functions in less time and with lower power consumption than binary logic, and to the use of complex encoding functions.

#### 2.3. Synthesis of dedicated hardware accelerators

**Key words:** high-level synthesis, CAD, parallel architecture, integrated circuit, design methodology.

Although the architecture of integrated circuits is evolving towards increasingly programmable and reconfigurable solutions, future silicon systems will continue to integrate specialized hardware components. The design of such components rests on the use of synthesis techniques.

In circuit design, synthesis is the process that transforms a behavioral description into an architectural description that implements it. Current design tools generally include various synthesis software. In recent years, the tendency of the tools has been to raise the abstraction level of the specifications. Synthesis that derives an architecture starting from a program is traditionally called very high-level synthesis.

The methodologies studied in the team are based on the polyhedral model, which is well suited to expressing the computation parts of applications and which allows systems of recurrence equations to be expressed and manipulated. This model is the base of the principal formalisms proposed for the synthesis of regular architectures, formalisms to which the Alpha language belongs [54].

The research projects that we carry out aim at the design of parallel hardware accelerators dedicated to the computation-intensive parts of applications. These hardware accelerators constitute one of the components of the reconfigurable platforms. The methodologies studied take into account the design of the interfaces needed for the control, the initialization and the effective data supply of the hardware accelerator.
This work is done in close cooperation with the team CompSys (LIP, ENS Lyon).

#### 2.4. Exploration, estimation, prototyping for the design of silicon systems

**Key words:** architecture synthesis, retargetable compilation, architecture modeling, ASIP design, fixed-point arithmetic.

Implementing an application on a reconfigurable platform requires a set of techniques (architecture synthesis, flexible compilation, fixed-point code generation, profiling, etc.) which, by successive refinements, contribute to the implementation choices for the various parts of the application on the components of the platform. Our research activities aim at setting up methodologies that allow the various parts of the application to be implemented on the various components of the platform.

The efficient implementation of an algorithm on a specialized processor, such as a DSP (Digital Signal Processor) or an ASIP (Application Specific Instruction-set Processor), or on a hardware structure, such as an ASIC or an FPGA (Field Programmable Gate Array), requires the use of fixed-point arithmetic for reasons of cost, power consumption or silicon area, whereas algorithms are usually specified in floating-point arithmetic. We develop a methodology to transform an algorithm specified in floating-point into a fixed-point specification. The methodology must determine the optimal coding of the data, so as to maximize precision and minimize execution time and code size.

The modeling of hardware systems is a key aspect of the architecture design space exploration process. Our work relates to the study of architecture description languages (ADL).
These languages play a significant role in the development of architectures, in their dimensioning with respect to the constraints of the target applications, and in the support of the associated software development environments (compiler, simulator, processor design tools, etc.). We develop a flexible compilation infrastructure, called CALIFE, which allows the prototyping of compilers for specialized ASIP processors modeled using the ARMOR description language. Some of these tools form the basis of the software environment studied for programming reconfigurable platforms.

# 3. Scientific Foundations

#### 3.1. Panorama

R2D2 research activities build on work from two scientific communities whose competences are complementary for the design of hardware systems: the first relates to design methods and tools for specialized architectures, and the second concerns signal processing and dedicated circuit architectures. We briefly present some bases of our research: the principles and challenges of parallelizing regular programs and designing dedicated hardware accelerators, an outline of the techniques used to model specialized processors, the problem of implementing algorithms specified in floating-point on fixed-point architectures, and the challenges related to the study of new reconfigurable architectures.

#### 3.2. Regular program parallelization and dedicated hardware accelerator synthesis

**Key words:** high-level synthesis, parallel architecture, integrated circuit, design methodology.

Today, circuit synthesis starts from high-level specifications. Specifying programs that carry out regular computations as recurrence equations allows powerful static analyses and program transformations for the derivation of regular architectures [11].
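
As an illustration of what a regular computation written as recurrence equations looks like, the sketch below evaluates a FIR filter as a recurrence over the two-dimensional domain {(i, k) | 0 ≤ i < N, 0 ≤ k < K}. It is plain Python, not Alpha syntax, and the function names are hypothetical; it only shows that each point of the domain depends on a neighbouring point through a fixed offset, which is the property that the analyses mentioned above rely on.

```python
def fir_recurrence(x, w):
    """Evaluate y[i] = sum_k w[k] * x[i-k] as a recurrence over the
    domain {(i, k) | 0 <= i < len(x), 0 <= k < len(w)}."""
    n, taps = len(x), len(w)
    # acc[i][k] holds the partial sum at domain point (i, k).
    acc = [[0] * taps for _ in range(n)]
    for i in range(n):
        for k in range(taps):
            prev = acc[i][k - 1] if k > 0 else 0      # acc[i, k-1], 0 on the boundary
            sample = x[i - k] if i - k >= 0 else 0    # x[i-k], 0 before the first sample
            acc[i][k] = prev + w[k] * sample
    # The output is read on the last column of the domain.
    return [acc[i][taps - 1] for i in range(n)]


def fir_direct(x, w):
    """Reference implementation: direct convolution sum."""
    return [sum(w[k] * x[i - k] for k in range(len(w)) if i - k >= 0)
            for i in range(len(x))]


print(fir_recurrence([1, 2, 3, 4], [1, 1]))
```

Every value `acc[i][k]` depends only on `acc[i][k-1]`, a constant offset in the domain, so the loop nest could be scheduled and mapped onto a regular array of cells rather than executed sequentially as here.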
The base of our research is the polyhedral model, which is well suited to expressing the computation parts of applications and which allows systems of recurrence equations to be expressed and manipulated.

There exist many academic prototype environments for the automatic synthesis of specialized architectures from high-level specifications: for example, Diastol, Presage, Hifi, Cathedral, Sade, PEI and MMAlpha. Tools performing high-level synthesis from the C language now exist on the market: tools based on SystemC<sup>2</sup>, such as *CoCentric SystemC Compiler*<sup>3</sup> from Synopsys and A|RT Builder<sup>4</sup> from Adelante Technologies/Frontier Design, and tools based on C and its extensions, such as *Celoxica DK1 Design Suite* from Celoxica. Few tools rest on true parallelization, but many research projects explore this approach: Flex<sup>5</sup> and Raw<sup>6</sup> at MIT, Piperench at Carnegie-Mellon, Garp<sup>7</sup> at Berkeley, Pico [61] at HP Labs Palo Alto, and Compaan<sup>8</sup> in Leiden. Alpha [5] and MMAlpha, developed in the project-team Cosi, evolved from Diastol and today constitute a practical environment for manipulating recurrence equations and for the high-level synthesis of dedicated hardware accelerators.

#### 3.3. Processor modeling and flexible compilation

**Key words:** architecture description language, ASIP, specialized processor, retargetable compilation, flexible compilation.

Hardware description languages like VHDL or Verilog are widely used to model and simulate processors, but mainly with hardware design in mind. The design of SoC requires methodologies and tools for exploring the architecture design space. This exploration relies on architecture description languages (ADL), adapted to the specification of SoC architecture models.
Very early in the design process, they play a role both in the validation of SoC architectures and in the automatic generation of the software development tools needed for the software and hardware design of the architecture.

The majority of existing architecture description languages aim at specifying processor architectures, favoring either synthesis, the generation of compilers, or the generation of simulators, but very seldom all three. None of the existing languages is really directed towards architectural exploration.

Among the architecture description languages mainly directed towards processor hardware synthesis, one can cite Mimola, developed at the university of Dortmund and used to describe target machines in the MSSQ and Record [52] compilers. Mimola is very close to hardware description languages like VHDL or Verilog. A Mimola description can be used for synthesis, simulation, and code generation, after extraction of the instruction set.

Among the architecture description languages mainly directed towards compilation, one can cite nML, designed at the university of Berlin; ISDL, proposed by MIT; MDES, developed at the university of Illinois; and Expression, developed at the University of California at Irvine.

Among the architecture description languages mainly directed towards simulation, one can cite LISA [56], developed at the university of Aachen. LISA allows the generation of cycle-accurate simulators for DSP processors. Both the structure and the behavior can be modeled.

The existing architecture description languages can be classified according to the modeling level: behavioral or structural. A language like Mimola is structural, languages like nML and ISDL are behavioral, and LISA, Expression and MDES mix the two levels of modeling.

<sup>2</sup>http://www.systemc.org

<sup>3</sup>http://www.synopsys.com/products/cocentric\_studio/

<sup>4</sup>http://www.adelantetech.com/en/html/algemeen/Products/ARTproducts/Builder/Builder.asp

<sup>5</sup>http://flex-compiler.lcs.mit.edu

<sup>6</sup>http://cag.lcs.mit.edu/raw

<sup>7</sup>http://brass.cs.berkeley.edu/garp.html

<sup>8</sup>http://www.liacs.nl/~cserc/compaan/index.html

There is no standard for architecture description languages, as this panorama shows. The ARMOR language, developed in the project-team Cosi, seems to us, because of its modularity and concision, well suited to modeling complex hardware-system architectures, with respect both to architectural exploration and to the automatic generation of development tools.

#### 3.4. Fixed-point implementation of algorithms

**Key words:** fixed-point arithmetic, data coding, precision.

Algorithms, as generally proposed by their authors and by the standardization committees in charge of disseminating them, usually handle data in floating-point format. Implementing them on hardware systems that support fixed-point arithmetic requires converting the floating-point description of the algorithm into a fixed-point specification in order to satisfy the cost, power consumption and size constraints of the applications. This conversion is tedious and error-prone when carried out manually. Indeed, experiments [46] showed that the time devoted to this conversion step is significant: manual conversion can represent up to 30% of the total time needed to implement the algorithm. Note in addition that time-to-market constraints require high-level development tools that automate certain tasks.
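
To make the conversion problem concrete, the sketch below quantizes floating-point samples to a signed fixed-point format and measures, by simulation, the resulting signal-to-quantization-noise ratio. It is a minimal illustration, not the team's methodology: the 16-bit word length, the choice of fractional bits, the test signal and all function names are assumptions made for the example.

```python
import math


def to_fixed(value, frac_bits, word_bits=16):
    """Round a float to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(fixed, frac_bits):
    """Interpret a fixed-point integer as a real value."""
    return fixed / (1 << frac_bits)


def sqnr_db(signal, frac_bits, word_bits=16):
    """Signal-to-quantization-noise ratio in dB, measured by simulation."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum(
        (s - to_float(to_fixed(s, frac_bits, word_bits), frac_bits)) ** 2
        for s in signal
    )
    if noise_power == 0:
        return math.inf  # every sample is exactly representable
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: a sine wave that stays inside [-1, 1).
samples = [0.9 * math.sin(2 * math.pi * 0.013 * n) for n in range(1000)]
for frac_bits in (7, 11, 15):
    print(frac_bits, "fractional bits:", round(sqnr_db(samples, frac_bits), 1), "dB")
```

Choosing the number of fractional bits is the data-coding decision: more fractional bits improve precision but leave fewer integer bits before saturation, which is why the choice has to be made per variable and checked against a precision constraint.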
The existing methodologies for automatic fixed-point data coding [51][62] transform a floating-point data representation into a fixed-point representation without taking the architecture of the target processor into account. However, analyzing the influence of the architecture on computation precision and on the various phases of code generation shows the need to take the architecture features into account and to couple the coding and code generation processes, in order to obtain a high-quality implementation in terms of computation precision and execution time. Data coding optimization must be carried out under a precision constraint, and it is thus necessary to determine the signal-to-quantization-noise ratio (SQNR) of the application. SQNR determination methods [49] are generally based on simulation. Within a data coding optimization, however, these methods require an iterative process that leads to long optimization times. We are therefore studying analytical methods for determining the SQNR.

#### 3.5. New reconfigurable architectures

**Key words:** reconfigurable architecture, grain of computation, low-power, high-performance.

Recent years have seen the emergence of new reconfigurable architectures, which offer a new alternative to the traditional performance/flexibility compromise that conditions the choice between purely hardware (ASIC) and purely software (programmable processor) solutions. Nevertheless, in spite of the great number of research activities on reconfigurable architectures [47], none of them aims to combine the three main constraints of high performance, low power consumption and flexibility.

As an example, the Pleiades [57] project is an architectural platform that supports several grains of computation (logic operations are handled as efficiently as arithmetic operations) and is designed to consume a minimum of energy whatever the required level of performance.
However, this platform cannot support the full set of constraints discussed above, because its reconfiguration is static, which limits it to certain application fields; speech coding was its case study. The dynamic reconfiguration proposed by the Chameleon processor<sup>9</sup> provides enough flexibility to meet the needs of an application field, while allowing the management of enough computation resources to process a complete communication chain. Despite that, the lack of control over its power consumption prohibits its use within a portable multimedia terminal.

In addition to these two examples, many reconfigurable architectures are based on FPGA-type circuits, and the majority of them, such as GARP [48], NAPA [58] and Chimaera [59], integrate a traditional programmable processor in charge of sequencing the processing on the reconfigurable block. Other architectures, such as Piperench [45] or RaPiD [42], can be reconfigured at a higher level, respectively at the operator and functional level.

<sup>9</sup>http://www.chameleonsystems.com

The concept of grain of computation indeed constitutes an interesting and significant research subject. The majority of FPGA circuits are "fine grain", since they can be reconfigured at the bit level, which contrasts with the way programmable processors handle words (32-bit words for many of them). When bit-level reconfiguration is not required by the application, coarse-grain structures must be built from the elementary blocks of the reconfigurable structure, which results in a cost overhead for the circuit. To limit this overhead, new coarse-grain reconfigurable architectures are proposed. Their elementary blocks correspond to arithmetic logic units, multipliers, memories, etc.
In addition to Piperench and RaPiD already mentioned, the Matrix [43] architecture at MIT and MorphoSys [53] at the University of California at Irvine can be cited. Commercial realizations include the array of reconfigurable arithmetic logic units from Elixent<sup>10</sup> and the XPP processors from PACT. Many of these architectures handle reconfiguration much more efficiently than FPGA circuits do, without however satisfying the performance/flexibility/power consumption compromise mentioned above.

In short, descriptions of projects on reconfigurable architectures usually put forward the objectives of "high performance", "flexibility" or "low power consumption", but these are never combined within the same study. The DART architecture aims at satisfying all of these constraints.

# 4. Application Domains

#### 4.1. Panorama

The privileged field of application is third- and fourth-generation mobile telecommunications. The other application domains considered are adaptive filtering, image indexing, data ciphering (AES algorithm) for cryptography, and biomedical and speech processing.

#### 4.2. Mobile telecommunications

The future generations of telecommunications constitute a privileged field of application for integrated circuit designers because of the diversity of the constraints to satisfy. In addition to the very high level of performance (above 12 billion operations per second) resulting from the combination of multimedia capabilities and access techniques such as WCDMA that these systems, known as 3G, will have to support, there is the need to support all the algorithms integrated into the standards of present generations (GSM, DECT, IS-95) and their evolutions. From the point of view of hardware architectures, next-generation systems will have to process very different applications in succession.
Indeed, the tasks of a third-generation communication chain handle data whose size varies with the "distance" separating the task from the transmitter or the receiver: application tasks handle coarse-grain data such as images, whereas the tasks that access the transmission medium work on data coded at the bit level. Because of the breadth of the application spectrum covered by future telecommunication standards, the processing applied to these data will also be very diverse, which will result in very different calculation patterns. Even if each of these constraints can be supported individually, the problem becomes much harder when they are combined, all the more so because time-to-market constraints call for development tools that are both portable and efficient. When, in addition, the result must consume very little power (less than 500 mW peak), the problem cannot be solved with current architectural solutions alone.

<sup>10</sup>http://www.elixent.com/

#### 4.3. Adaptive filtering

Many applications related to telecommunications require the use of parallel circuits. Adaptive filtering algorithms, in particular, lend themselves well to this type of study. In collaboration with the University of Trois-Rivières in Québec, a collaboration supported over time by various contracts, filter architectures are studied and synthesized for implementation in reconfigurable circuits or integrated circuits. Two adaptive filtering algorithms are studied: a delayed least-mean-squares (DLMS) algorithm, in which the coefficient adaptation is delayed, and a filtering algorithm with adaptation by neural network. In both cases, the research consists in synthesizing an architecture with the MMAlpha software and comparing it with a version designed with standard tools.
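As a rough illustration of delayed adaptation, the following sketch identifies an unknown FIR filter with a delayed LMS update, in which the coefficient correction uses the error and the input vector from `delay` samples earlier. The function name, step size, delay and test signal are illustrative assumptions; this is plain Python and does not represent the architectures synthesized with MMAlpha.

```python
import random

def dlms_identify(x, d, n_taps, mu, delay):
    """Delayed LMS: w(n+1) = w(n) + mu * e(n-delay) * x_vec(n-delay)."""
    w = [0.0] * n_taps
    history = []  # (error, input vector) pairs waiting for their delayed use
    for n in range(len(x)):
        # Input vector [x(n), x(n-1), ...], zero-padded before the start.
        x_vec = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(wk * xk for wk, xk in zip(w, x_vec))
        history.append((d[n] - y, x_vec))
        if len(history) > delay:
            e_old, x_old = history.pop(0)
            w = [wk + mu * e_old * xk for wk, xk in zip(w, x_old)]
    return w

rng = random.Random(0)
unknown = [0.5, -0.3, 0.2]  # hypothetical system to identify
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(unknown[k] * x[n - k] for k in range(3) if n - k >= 0)
     for n in range(len(x))]
w = dlms_identify(x, d, n_taps=3, mu=0.05, delay=2)
```

With `delay=0` the function reduces to the standard LMS update. The delay is what allows the update path to be pipelined in hardware, at the price of a tighter bound on the step size.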
The synthesis highlights limitations of the MMAlpha software, which leads us to extend its functionality and thus to improve the synthesis techniques.

#### 4.4. AES ciphering algorithm

High-speed and secure networks require parallel circuits that allow an efficient implementation of encryption algorithms. Architectures implementing the AES encryption algorithm are studied with the goal of implementing them on reconfigurable circuits. The work consists in studying the synthesis of architectures starting from SystemC, a modeling language based on C++ and developed by the main actors of the CAD tools community to become a standard language in the field of hardware/software codesign. SystemC allows both high-level (functional) and low-level (register-transfer level) descriptions of an application.

# 5. Software

#### 5.1. Panorama

**Key words:** *library*, *polyhedral computation*.

The research undertaken by R2D2 concerns software and hardware tools for the design of hardware systems. In order to promote the techniques studied, several software prototypes are developed (Polylib, MMAlpha, BSS, ARMOR/CALIFE). Among those, three distributed packages are presented: Polylib, an *open source* library for computing on polyhedra, MMAlpha for high-level synthesis, and BSS, a platform for circuit design.

#### 5.2. PolyLib

Participants: Patrice Quinton [contact], Tanguy Risset [CompSys, INRIA Rhône-Alpes].

**Key words:** circuit design, architecture synthesis, low-power consumption, CAD, ASIC, data parallelism, automatic parallelization, library, polyhedral computation.

Polylib, written in C, is an *open source* library for computing on convex polyhedra. It was initially developed by Hervé Le Verge and Doran Wilde at INRIA Rennes. It is now maintained and developed together with the LIP (ENS Lyon) and the ICPS of the University of Strasbourg.
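To give an idea of the kind of object such a library manipulates, the sketch below represents a convex polyhedron as a list of affine inequalities and enumerates its integer points, which correspond to the iterations of a loop nest. It is written in plain Python for illustration only: the function names and the constraint encoding are assumptions and are not the Polylib API.

```python
from itertools import product

def in_polyhedron(point, constraints):
    """True if every affine constraint a.x + c >= 0 holds at the point."""
    return all(sum(a * x for a, x in zip(coeffs, point)) + const >= 0
               for coeffs, const in constraints)

def integer_points(constraints, bounding_box):
    """Enumerate the integer points of the polyhedron inside a bounding box."""
    ranges = [range(lo, hi + 1) for lo, hi in bounding_box]
    return [p for p in product(*ranges) if in_polyhedron(p, constraints)]

N = 4
# Triangular domain {(i, j) | 0 <= j <= i <= N-1}, i.e. the index space of
#   for i in range(N): for j in range(i + 1): ...
triangle = [((0, 1), 0),        # j >= 0
            ((1, -1), 0),       # i - j >= 0
            ((-1, 0), N - 1)]   # N - 1 - i >= 0
points = integer_points(triangle, [(0, N - 1), (0, N - 1)])
```

A real polyhedral library works symbolically on the constraints (intersection, image, emptiness test) instead of enumerating points, which is what makes it usable for parametric domains.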
Such a library is justified by the need to manipulate the domains used in recurrence equations or the index spaces described by nested loops. It is currently used (independently of MMAlpha) by several research organizations (in England, the United States, the Netherlands and France). For more information, see http://www.irisa.fr/polylib or contact Patrice Quinton.

#### 5.3. MMAlpha

**Participants:** Anne-Claire Guillou, Patrice Quinton [contact], Tanguy Risset [CompSys, INRIA Rhône-Alpes].

**Key words:** architecture synthesis, CAD, ASIC, functional programming, data parallelism.

MMAlpha is a software package that implements transformations on the Alpha language. The Alpha language was proposed by Christophe Mauras in his 1989 thesis. The implementation is written in the Mathematica language (hence the name MMAlpha) and is built on the Polylib library. Alpha program transformations are implemented using the capabilities of Mathematica and of the Polylib library. These transformations are used to derive an architecture, sequential code or parallel code from the algorithmic specification of a computation. The transformations are semi-automatic: the user indicates which actions to perform, but the transformation itself is carried out by MMAlpha. A fully automatic derivation is possible, but experience shows that the design space is so large that the result is seldom satisfactory.

The design methodology is inherited from systolic array synthesis. This field has been studied from a theoretical point of view, and the MMAlpha environment makes it possible to test the various existing synthesis strategies, to study different parallelization options and to generate an architectural description of a circuit in the AlpHard format (a subset of the Alpha language). Communication with logic synthesis tools relies on a translation from the AlpHard format to VHDL.
The software has served as the implementation support for many theses carried out at Irisa. It is used by several research teams within the framework of collaborations with R2D2. It is one of the few tools that make it possible to describe an algorithm and its hardware implementation in the same language and to derive this implementation through proven transformations. For more information, see http://www.irisa.fr/R2D2/ALPHA/ or contact Patrice Quinton.

#### 5.4. BSS

Participants: Daniel Chillet [contact], Sébastien Pillement, Olivier Sentieys.

**Key words:** circuit design, architecture synthesis, low-power consumption, placement.

The BSS (*Breizh Synthesis System*) software platform for circuit design provides a set of tools for capturing the application description (in VHDL or in C), and for compilation, simulation and architecture synthesis. The platform is developed so that the tools are accessible over the Internet. It is currently composed of the following modules.

- A set of programs (C and VHDL compilers, selection, scheduling, code generation) allowing the synthesis of circuits.
- Graphic interfaces, *PUDesigner* and *GFDesigner*, for visualizing and manipulating the data flow graphs and architectures.
- A tool for power estimation at the architectural level, PowerCheck, which operates on the architectures generated by the synthesis. It also takes as input a parameter file that characterizes the technology of the circuit and the physical capacitances of the chips. The signal can be specified in two ways: either by its probabilities according to a model (white noise, DBT), or as a file of vectors from which the probabilistic characteristics are extracted. As output, PowerCheck provides a report giving the average power dissipated by each part of the control and processing units. PowerCheck also gives the power dissipated cycle by cycle by the various modules.
- A tool for area and interconnection delay estimation, *Jfloorplanner*, operating at the architectural level. Its input is a *netlist* generated by BSS, which contains all the information related to the components and their interconnections. The tool provides estimates of the final area of the floorplan, of the length of the interconnections and of the interconnection delays related to these lengths. A display of the estimated floorplan is available and can be used to speed up the place and route step with standard CAD tools.

For more information, contact Daniel Chillet.

# 6. New Results

#### 6.1. New architectures and technologies

Our research aims at studying new architectures contributing to System-on-Chip design. Current work concerns the following aspects:

- new organization of reconfigurable structures (DART architecture) and their associated reconfigurable arithmetic units,
- network-on-chip,
- innovative architectures for low-power sensor networks,
- multiple-valued logic circuits.

**Key words:** reconfigurable architecture, grain of calculation, low-power consumption architecture, Network-on-Chip, NoC, CDMA, sensor network, multiple-valued logic, MVL, System-on-Chip, SoC.

#### 6.1.1. DART architecture

#### 6.1.1.1. Functional-level reconfigurable architecture

Participants: Daniel Chillet, Raphaël David, Sébastien Pillement, Olivier Sentieys.

Combining flexibility with high performance and energy efficiency is a critical issue for embedded applications. The definition of the DART architecture results from the state of the art in reconfigurable architectures and from an analysis of the next-generation telecommunication application domain. DART is a hierarchical architecture supporting different levels of parallelism. To exploit task parallelism, DART is divided into clusters. Distinct tasks can be processed concurrently by different clusters, since each cluster has its own control and storage resources.
At the system level, tasks are distributed to clusters by a controller. This controller supports the real-time operating system, which assigns tasks to clusters according to urgency and resource availability constraints. The system level of DART also includes shared memories (data, configuration) and a communication unit that interfaces it with external components through a standard bus.

Each DART cluster integrates two types of processing primitives: several Reconfigurable Data-Paths (RDPs) used for arithmetic processing, and an FPGA core to process data at the bit level. The RDPs are reconfigurable at the functional level, in order to optimize the interconnections between arithmetic operators according to the calculation pattern. The FPGA core is reconfigurable at the gate level, in order to support bit-level parallelism efficiently.

One of the main features of DART is its support for two RDP reconfiguration modes. During regular processing, the RDPs are dynamically reconfigured to match the calculation pattern. This reconfiguration, called here *hardware reconfiguration*, may take a few cycles, but the resulting configuration is used for a long period of time. During irregular processing, on the contrary, the calculation pattern changes very often. In that case, the reconfiguration time has to be minimized, and the RDP structure is modified through *software reconfiguration*.

Another important feature of a DART cluster is that it exploits redundancy in the data-paths to minimize the volume of configuration data. Since specifying the same configuration several times can be considered a waste of energy, we introduced a concept called *Single Configuration Multiple Data* (SCMD).

These innovative reconfiguration schemes make it possible to address high-performance, flexibility and low-energy constraints at the same time. A computation power of 6.2 GOPS combined with an energy efficiency of 40 MOPS/mW demonstrates the potential of DART in the context of multimedia mobile computing applications.
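As a back-of-the-envelope check, and assuming that the two figures refer to the same operating point (which is not stated explicitly), the implied power consumption is

$$P \approx \frac{6.2 \times 10^{3}\ \text{MOPS}}{40\ \text{MOPS/mW}} = 155\ \text{mW},$$

which is below the 500 mW peak budget quoted for mobile terminals in Section 4.2.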
This work, presented in Raphaël David's PhD thesis [12], was conducted in collaboration with STMicroelectronics (PHRASE project) and received funding from the French ministry of industry.

#### 6.1.1.2. Reconfigurable arithmetic units

Participants: Raphaël David, Sébastien Pillement, Taofik Saïdi, Olivier Sentieys.

The RDPs of the DART architecture are organized around functional units and memories interconnected by a fully-connected network. The functional units are dynamically reconfigurable. Each arithmetic unit supports a variety of operations, which gives some flexibility in the calculation patterns that can be mapped onto an RDP. The design of the RDP unit combines high performance and low power through the use of the most efficient arithmetic algorithms.

#### 6.1.1.3. Study and technological validation of a new reconfigurable architecture

Participants: Stéphane Chevobbe, Olivier Sentieys.

This work is done at the CEA/LCEI in collaboration with R2D2. New architectural paradigms are needed to exploit the capabilities of future CMOS processes (ultimate CMOS) and to cope with their constraints. The work consists in proposing an implementation of RAMPASS [27], a reconfigurable asynchronous architecture developed at CEA (LCEI, Laboratory on Embedded Computer and Images). The structure is based on the replication of a single block, with the copies connected by a network. The main goal of the work is to find a well-suited network that can connect any cell to any other, regardless of how the network is being used.

#### 6.1.2. Network-on-Chip

Participants: Jean-Marc Philippe, Sébastien Pillement, Olivier Sentieys.

Two different interconnection concepts are being studied. The first one uses the CDMA (Code Division Multiple Access) physical-level access technique to transmit messages over the network and thus to protect them against noise.
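The principle of code-division access on a shared medium can be sketched as follows. Each sender multiplies its symbols by an orthogonal Walsh code, the medium adds the contributions, and each receiver recovers its message by correlation. The ideal adder used as bus model, the synchronous senders and all names are simplifying assumptions for illustration; they do not reproduce the SPICE and Matlab models of the study.

```python
def walsh_codes(order):
    """Rows of a 2**order x 2**order Hadamard matrix with entries +1/-1."""
    codes = [[1]]
    for _ in range(order):
        codes = ([row + row for row in codes] +
                 [row + [-c for c in row] for row in codes])
    return codes

def spread(bits, code):
    """Map bits {0,1} to symbols {-1,+1} and multiply each one by the code."""
    return [(2 * b - 1) * c for b in bits for c in code]

def despread(bus, code):
    """Correlate each code-length window of the bus signal with the code."""
    n = len(code)
    sums = [sum(s * c for s, c in zip(bus[i:i + n], code))
            for i in range(0, len(bus), n)]
    return [1 if s > 0 else 0 for s in sums]

codes = walsh_codes(2)  # four orthogonal codes of length 4
messages = {1: [1, 0, 1], 2: [0, 0, 1], 3: [1, 1, 0]}  # code index -> bits
signals = [spread(bits, codes[k]) for k, bits in messages.items()]
bus = [sum(chips) for chips in zip(*signals)]  # the shared medium adds chips
received = {k: despread(bus, codes[k]) for k in messages}
```

Because the codes are mutually orthogonal, the correlation with one code cancels the contributions of the other senders, so `received` equals `messages` in this noise-free setting.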
We implemented an accurate bus model using the SPICE simulator, to model the behavior of multiple wires and crosstalk effects, as well as a transmission model using Matlab, to measure the efficiency of this technique. These models form the basis for architecture exploration and future developments. Several interconnection topologies have been studied, and the improvement brought by CDMA over classical TDMA (Time Division Multiple Access) approaches for on-chip communications has been demonstrated.

The second field of interest deals with the design of communication channels for SoCs integrating a large number of IPs. Current solutions have drawbacks that prevent their use in every kind of design. We study a heterogeneous, hierarchical design methodology to build the interconnection network that best matches the application requirements. The design of the network itself is currently being studied.

#### 6.1.3. Innovative architectures for low-power sensor networks

Participants: Mickaël Cartron, Olivier Sentieys.

A simulation program for a wireless communication system between sensors was developed. This program allows us to study both the performance and the power consumption of the nodes of the system. Several scenarios are considered, reflecting the characteristics of the environment and the performance needs. With this information, we can find an optimal operating point across the whole network protocol stack, for a given minimum performance and a given application scenario. Once this point is determined, new and very flexible architectures have to be devised for the processing. The simulation program also showed that, for low data transmission rates, static power consumption in digital processing dominates dynamic power consumption, contrary to systems with high activity. We will focus on this point to design efficient architectures for these sensor networks.

#### 6.1.4. Multiple-Valued Logic architectures and circuits

Participants: Michel Aline, Ekué Kinvi-Boh, Olivier Sentieys.

The performance of integrated systems is limited by complex wiring (a large share of the chip's performance budget is consumed by interconnections), large propagation delays and high power consumption. With Multiple-Valued Logic (MVL), the number of interconnections can be reduced, and with it the power consumption caused by the high switching activity on each node of a circuit. A new concept for MVL design is considered here. The SUpplementary Symmetrical LOgic Circuit structure (SUS-LOC) is a promising new approach for the implementation of MVL functions in voltage mode. It combines low energy consumption with a speed equivalent to that of binary CMOS structures. A library of basic MVL logic, memory and arithmetic cells has been designed, characterized and compared with classical binary CMOS implementations [32].

The aim is to design a ternary DSP core and to measure its performance. For this DSP core we consider the arithmetic processing unit (shifter, registers, multiplier, multiplexers, adder and ALU), the SRAM memory and the interconnection buses. All components have been analyzed, and performance comparisons have been made between radix-2 and radix-3 logic [31]. Estimation results on sub-modules show that, overall, an energy reduction of more than 50% can be achieved in memory and arithmetic structures.

A collaboration with the SOI group at the Catholic University of Louvain-La-Neuve (UCL) is underway and will make it possible to implement ternary functions in a $2~\mu m$ SOI CMOS process using the SUS-LOC concepts, with the first aim of validating our experimental estimations. A 64-tert SRAM and a 4-tert adder have been designed and fabricated at UCL. These two circuits are the very first full-ternary circuits ever fabricated. They will be tested at the beginning of 2004 using specifically fabricated test equipment.

#### 6.2. Synthesis of parallel and dedicated accelerators

Our research aims at developing methods and tools to synthesize parallel architectures for data-intensive applications expressed in the Alpha applicative language. These methods are implemented in the MMAlpha software.

#### 6.2.1. Synthesis of multi-dimensional time architectures

Participants: Anne-Claire Guillou, Patrice Quinton.

**Key words:** architecture synthesis, CAD, ASIC, FPGA.

This research is developed jointly with Tanguy Risset from the INRIA Rhône-Alpes CompSys team. Alpha programs can be scheduled by means of multi-dimensional timing functions: each calculation is assigned a time vector whose components represent different units of time, e.g. hours, minutes, seconds, etc. Generating architectures for multi-dimensional scheduled programs requires solving two problems: first, generating controllers for multi-dimensional clocks; second, synthesizing memories to store data that are kept alive for long periods of time (e.g., hours or minutes). Techniques to solve these problems were studied and presented in [29] as well as in Anne-Claire Guillou's PhD thesis [13].

#### 6.2.2. Expression re-use in reduction expressions

Participants: Gautam Gupta, Patrice Quinton.

Reduction operators are available in Alpha to express computations such as sums or products and, more generally, reductions by any associative operator. In complex expressions, reductions may share sub-expressions. Detecting common sub-expressions and rewriting programs so that they are computed only once can often reduce the complexity of the program. Such techniques could be useful to analyze and accelerate complex loop nests. This problem is currently being investigated by Gautam Gupta in close collaboration with Sanjay Rajopadhye from Colorado State University in Fort Collins.

#### 6.3. Exploration, estimation, and prototyping for the design of silicon systems

**Key words:** architecture synthesis, flexible compilation, architecture modeling, ASIP design, fixed-point arithmetic.

Implementing an application on a reconfigurable platform requires a whole set of techniques (architecture synthesis, flexible compilation, fixed-point code generation, profiling, etc.) which, by successive refinements, contribute to the implementation choices for the various parts of the application on the components of the platform. Our research activities aim at setting up methodologies for implementing the various parts of an application on the various components of a platform. Current work concerns the following aspects:

- extensions to the ARMOR architecture description language,
- the CALIFE compilation flow,
- a methodology for automatic floating-point to fixed-point conversion,
- the software environment for reconfigurable platforms,
- the SystemC modeling of processors,
- the memory hierarchy in specialized SoC.

#### 6.3.1. Extension to the ARMOR architecture description language

Participants: François Charot, Ludovic L'Hours, Madeleine Nyamsi.

Our research aims at developing methods to model programmable processors through their instruction sets, and tools to derive software development environments from these processor models. A processor description in ARMOR is a grammar, each derivation of which is a possible behavior of the instruction set. ARMOR thus describes the behavior of the instruction set, including its semantics, timing information, resource usage and the possibilities of instruction-level parallelism. In order to model the control of the processor, we added extensions to the ARMOR language. We introduced rules for modeling the pipeline mechanisms (pipeline resources, interconnections) and for specifying the instruction encoding.

#### 6.3.2. CALIFE compilation flow

Participants: François Charot, Ludovic L'Hours.

CALIFE is an experimental platform for retargetable code generation, designed to satisfy the constraints of architecture exploration and of the concurrent design of a specialized processor and its compiler. In order to allow CALIFE compilation flows to be specified conveniently, and to guarantee full retargetability of the CALIFE environment, some extensions have been made. A new architecture of compilation passes has been designed. It allows a compilation flow to be created at run time from an XML description file, and uses the factory concept to instantiate new passes. Passes can be built into the CALIFE environment or dynamically loaded from a module, which allows the user to create new compilation passes and to use them without changing any part of the environment.

#### 6.3.3. Methodology for automatic floating-point to fixed-point conversion

Participants: Nicolas Hervé, Daniel Menard, Taofik Saïdi, Olivier Sentieys.

In our previous work, a methodology for implementing floating-point algorithms on fixed-point DSP processors under an accuracy constraint was proposed [4], and a tool was developed for the DSP case. Our work now focuses on extending this methodology to hardware implementations, and more particularly to FPGA architectures. This kind of architecture offers more flexibility for word-length optimization, since the architecture has to be synthesized. This work is part of the OSGAR RNTL project, whose aim is to develop high-level tools for designing FPGAs. The first part of this work is the creation of a library of basic arithmetic units including adders, subtracters, multipliers and comparators. All the components are characterized in terms of delay, power consumption and area (number of elementary cells, CLBs) for all operand word-lengths.
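Word-length selection under an accuracy constraint can be illustrated with a small sketch: a signal is quantized to signed fixed-point with a given word length, the signal-to-quantization-noise ratio (SQNR) is measured by simulation, and the smallest word length that meets a target is retained. The function names, the test signal and the 60 dB target are illustrative assumptions; the methodology described here relies on analytical accuracy evaluation and on characterized hardware costs, which this sketch does not model.

```python
import math

def quantize(value, word_length, frac_bits):
    """Round to a signed fixed-point value, with saturation."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_length - 1)), (1 << (word_length - 1)) - 1
    code = max(lo, min(hi, round(value * scale)))
    return code / scale

def sqnr_db(signal, word_length):
    """Signal-to-quantization-noise ratio for data in [-1, 1)."""
    frac_bits = word_length - 1  # one sign bit, the rest fractional
    noise = [s - quantize(s, word_length, frac_bits) for s in signal]
    p_sig = sum(s * s for s in signal)
    p_noise = sum(e * e for e in noise)
    return float("inf") if p_noise == 0 else 10 * math.log10(p_sig / p_noise)

def smallest_word_length(signal, target_db, max_bits=32):
    """Smallest word length (in bits) that meets the accuracy constraint."""
    for wl in range(2, max_bits + 1):
        if sqnr_db(signal, wl) >= target_db:
            return wl
    return None

signal = [0.9 * math.sin(2 * math.pi * 0.01 * n + 0.3) for n in range(1000)]
wl = smallest_word_length(signal, target_db=60.0)
```

Each additional bit improves the SQNR by roughly 6 dB, so the search stops after a few iterations. In a real flow, the cost of each candidate would come from a characterized component library instead of the word length alone.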
This characterization is obtained through synthesis with the Synplify software and produces an XML database file, which will be used by the high-level synthesis and word-length optimization processes. A methodology has been proposed and implemented for the case where all the data word-lengths are set to the same value. This assumption dramatically reduces the search space and allows a very simple optimization technique to be used to minimize the sum of all design data word-lengths. The next step will be to minimize a cost function expressed in terms of resource usage, speed or power consumption, and to couple the high-level synthesis and word-length optimization processes in order to obtain a more efficient solution.

An important part of the floating-point to fixed-point conversion process is the evaluation of the accuracy of the fixed-point specification. The goal is therefore to extend our previous work in order to obtain an analytical accuracy evaluation method for all kinds of systems. Adaptive systems, in particular, are under consideration. A study of the (N)LMS (Normalized Least Mean-Square) adaptive filter has been conducted. Previously published models are valid only for a convergent quantization law. We therefore propose a simpler model that is valid for different quantization laws. The quality of the model has been evaluated by comparing our estimates with results obtained by simulation. Future work will focus on extending this model to other adaptive filter structures and on generalizing the approach to all kinds of linear and non-linear systems.

#### 6.3.4. Software environment for reconfigurable platforms

#### 6.3.4.1. Compiler framework for DART

Participants: François Charot, Daniel Chillet, Raphaël David, Sébastien Pillement, Olivier Sentieys.

An efficient development flow is the key to exploiting the computation power of functional-level reconfigurable architectures (e.g. DART). Hence, a compilation framework has been defined.
It is based on the joint use of a front-end that transforms and optimizes the specification, a retargetable compiler and an architectural synthesis tool. The development flow allows users to describe their applications in the C language. This high-level description is first translated into a Control and Data Flow Graph (CDFG), on which automatic transformations (loop unrolling, loop kernel extraction, etc.) are applied to optimize the execution time. After these transformations, the code is partitioned into regular code, irregular code and data manipulations, and compilation and architectural synthesis then translate the high-level description of the application into binary executable code for DART. Finally, a cycle-accurate, bit-accurate simulator developed in SystemC is used to validate the implementation and to evaluate its performance and energy consumption.

#### 6.3.4.2. Real-time schedulers

Participants: Imène Benkermi, Sébastien Pillement, Olivier Sentieys.

This research is related to the software environment of a hardware platform dedicated to multimedia applications, consisting of general-purpose processors and reconfigurable or specialized accelerators, including the DART reconfigurable architecture. The work concerns the design of an on-line scheduler able to distribute the task set over the different computing units while meeting the real-time constraints of the tasks and taking into account the heterogeneity of the units, i.e. the fact that a task may have different execution times depending on the unit on which it is executed. We proposed an extension to heterogeneous architectures of an approximate method for optimization problems based on neural networks. The Hopfield model is used. A network construction rule, which allows the network to converge to a stable state meeting the constraints imposed by the task and architecture models, was proposed [21].
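The idea of a Hopfield network whose stable states satisfy a constraint can be sketched on a deliberately reduced problem: one binary neuron per (task, unit) pair, and weights chosen so that the energy is minimal exactly when every task is assigned to one unit. This is a toy illustration with assumed names and a single constraint; it is not the construction rule of [21] and models neither timing constraints nor heterogeneous execution times.

```python
import random

def build_network(n_tasks, n_units, penalty=1.0):
    """Weights/biases whose energy is penalty * sum_t (sum_u x[t,u] - 1)^2,
    up to an additive constant."""
    size = n_tasks * n_units
    weights = [[0.0] * size for _ in range(size)]
    for t in range(n_tasks):
        for u in range(n_units):
            for v in range(n_units):
                if u != v:  # neurons of the same task inhibit each other
                    weights[t * n_units + u][t * n_units + v] = -2.0 * penalty
    biases = [penalty] * size
    return weights, biases

def energy(state, weights, biases):
    """Hopfield energy E = -1/2 x^T W x - b^T x for 0/1 neurons."""
    n = len(state)
    quad = sum(weights[i][j] * state[i] * state[j]
               for i in range(n) for j in range(n))
    return -0.5 * quad - sum(b * x for b, x in zip(biases, state))

def run(state, weights, biases, rng):
    """Asynchronous threshold updates until no neuron changes."""
    state = list(state)
    changed = True
    while changed:
        changed = False
        for i in rng.sample(range(len(state)), len(state)):
            net = sum(w * x for w, x in zip(weights[i], state)) + biases[i]
            new = 1 if net > 0 else 0
            if new != state[i]:
                state[i], changed = new, True
    return state

rng = random.Random(1)
n_tasks, n_units = 4, 3
W, b = build_network(n_tasks, n_units)
start = [rng.randint(0, 1) for _ in range(n_tasks * n_units)]
final = run(start, W, b, rng)
assignment = [final[t * n_units:(t + 1) * n_units].index(1)
              for t in range(n_tasks)]
```

Because the weights are symmetric with a zero diagonal, every asynchronous update lowers the energy, so the network always settles in a state where each task has exactly one active neuron. Which unit is chosen depends on the initial state and the update order, which is why such methods are approximate.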
In order to validate the method, simulations on complex multimedia applications are under way.

#### 6.3.4.3. Operating systems developments

Participants: Daniel Chillet, Sébastien Pillement, Olivier Sentieys.

In the context of the DART development, we studied the interaction between the operating system and reconfigurable platforms that include a general-purpose processor (such as an ARM or Lx processor) and reconfigurable modules. The chosen operating system is a reduced version of Linux (uClinux). uClinux is about half the size of Linux, and its main drawback is that it cannot handle virtual memory. The integration model of DART is based on a master/slave scheme, in which the processor runs the operating system and interacts with the reconfigurable coprocessor (e.g. DART) as a peripheral. Exchanges between the processor and DART go through shared memory. The application is specified using Posix threads. Communications rely on a generic technique that can be instantiated with several data structures. A DART driver has been included in the uClinux kernel, together with the definition of dynamic memory space management. An MPEG-2 decoder has been decomposed into several tasks using Posix threads to validate this framework. A patch has been added to the *armulator* (GNU gdb debugger for ARM processors) to simulate the application running on the platform. Future work will consist in prototyping this platform (ARM, DART, uClinux) on Altera's Excalibur component.

#### 6.3.5. SystemC modeling of processors

Participants: François Charot, Daniel Chillet, Sébastien Pillement, Olivier Sentieys.

Within the context of our participation in the CNRS SocLib team, we have developed a cycle-accurate, bit-accurate SystemC model of a VLIW processor. The processor core is based on four functional units (2 ALUs, 1 multiplier unit and 1 memory access unit).
A communication unit supporting the interface to the VCI (Virtual Component Interface) bus protocol has been added. The VCI wrapper is based on the peripheral and basic VCI protocols. Future work will focus on the Advanced VCI protocol.

#### 6.3.6. Memory hierarchy

Participant: Daniel Chillet.

We have defined an architecture model for the memory hierarchy in a specialized SoC, and a methodology to find the appropriate placement of data in the hierarchy. The methodology is built on iterative algorithms that first evaluate the placement for a 2-level hierarchy (fast scratch-pad memory and slow global memory), with the overall objective of limiting power consumption. The algorithm then evaluates whether increasing the number of memories in the hierarchy would further reduce the energy. A memory library has been characterized in energy and access time for read and write accesses in a $0.13~\mu m$ technology. Experiments on embedded multimedia applications have demonstrated that a memory hierarchy can drastically reduce energy consumption.

#### 6.4. Study of applications

The fields of application are third-generation mobile telecommunications, adaptive filtering, image indexing, ciphering, biomedical and speech processing.

**Key words:** mobile telecommunication, WCDMA, image indexing, cryptographic applications, biomedical, speech processing.

#### 6.4.1. 3G mobile application prototyping

**Participants:** François Charot, Michel Guitton, Madeleine Nyamsi, Sébastien Pillement, Taofik Saïdi, Pascal Scalart, Olivier Sentieys, Charles Wagner.

Applications stemming from third-generation radio-communication systems are good candidates for the study of hardware systems that mix programmable parts executing software code with specialized modules dedicated to accelerating the time-consuming parts of applications. WCDMA is typically considered one of the most critical applications of next-generation telecommunication systems.
The main idea of WCDMA (Wideband Code Division Multiple Access) is to share the communication medium between several users by scrambling each user's symbols with a pseudo-noise code. This access technique adapts the signal to the communication medium by spreading its spectrum. The complexity of the algorithms requires architectural exploration to evaluate the performance needs. Programmable devices (ST200/Lx and C64x DSP) and reconfigurable architectures (Virtex Xcv2kE FPGA, DART architecture) have been studied. The performance and power consumption of these implementations have been analyzed [36][37].

A WCDMA communication chain (transmitter and receiver) was specified using Matlab. This chain will be the reference version for future studies. An implementation of this chain using Simulink is currently being developed, with the aim of studying its prototyping on the SignalMaster hardware platform (which combines a DSP processor and an FPGA circuit). Furthermore, a real-time WCDMA transmitter/receiver has been implemented on an FPGA integrated in the RC1000 card from Celoxica.

#### 6.4.2. Multi-user detection

Participant: Patrice Quinton.

Together with Daniel Massicotte (University of Québec at Trois-Rivières) and Tanguy Risset (INRIA Rhône-Alpes CompSys), we studied chip architectures for Wideband CDMA receivers for 3G wireless communications. Our first application is an implementation of multi-user detection based on a linear MMSE adaptive filter, whereas the second application concerns a Parallel Interference Cancellation (PIC) algorithm. Using MMAlpha, we were able to produce a complete specification for two architectures in the first case and for one architecture in the second case. In all cases, we were able to write the application in a structured manner, to partially check the functionality of the code, to generate a C program, to schedule the code, and to generate a synthesizable VHDL program for a significant part of the architecture.
As a result, a first estimation of the architecture could be derived.

#### 6.4.3. Image indexing

Participant: Patrice Quinton.

Image indexing is an interesting application for parallel architecture synthesis, as it is based on a very simple and extremely time-consuming algorithm. The kernel of an image indexing algorithm was expressed in Alpha, and the automatic synthesis of an architecture is currently being studied together with Auguste Noumsi from the University of Douala (Cameroon).

#### 6.4.4. Cryptographic applications

Participants: François Charot, Charles Wagner.

The fine-grained structure of Field Programmable Gate Arrays (FPGAs) is well suited to the efficient hardware implementation of ciphering algorithms. We studied parallel architectures for the Advanced Encryption Standard (AES) algorithm and implemented them on FPGA. Two architectures were studied and presented in [25] and [26]. They showed that a throughput of more than 10 Gbit/s can be achieved with full pipelining.

#### 6.4.5. Biomedical multi-sensor

Participants: Pascal Scalart, Olivier Sentieys.

A smart multi-sensor kernel has been developed in cooperation with the company Aphycare Technologies [24]. It consists of a wrist strap topped by a small case integrating a digital signal processor (DSP), several sensors and a radio communication link. The signals delivered by the sensors are processed by the DSP to estimate physiological parameters. Combinations of these parameters allow us to define a set of physiological states and to determine which one the patient is currently in.

#### 6.4.6. Speech processing

Participant: Pascal Scalart.

The problem of single-microphone speech enhancement in noisy environments is studied in cooperation with the FT R&D laboratories in Lannion. Common short-time noise reduction techniques proposed in the literature are expressed as spectral gains depending on the a priori signal-to-noise ratio (SNR).
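As an illustration of a spectral gain driven by the a priori signal-to-noise ratio, the sketch below uses the Wiener gain $G = \xi / (1 + \xi)$, which is one common choice in the literature. The report does not say which gain rule the team uses, so the Wiener rule and the names `wiener_gain` and `apply_gain` are assumptions made for this example only.

```python
def wiener_gain(xi):
    """Wiener spectral gain for one frequency bin.

    xi is the a priori SNR on a linear (not dB) scale.
    Assumption: the Wiener rule stands in for whichever gain is actually used.
    """
    if xi < 0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


def apply_gain(noisy_spectrum, a_priori_snr):
    """Attenuate each bin of a noisy short-time spectrum by its gain.

    noisy_spectrum: complex spectral coefficients of one frame.
    a_priori_snr:   one linear-scale a priori SNR value per bin.
    """
    if len(noisy_spectrum) != len(a_priori_snr):
        raise ValueError("one SNR value is required per frequency bin")
    return [wiener_gain(xi) * y for xi, y in zip(a_priori_snr, noisy_spectrum)]
```

A bin with a high a priori SNR is passed almost unchanged (the gain tends to 1), while a bin dominated by noise is strongly attenuated (the gain tends to 0), so the quality of the a priori SNR estimate directly determines the quality of the enhanced speech.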
In the classical decision-directed approach, the a priori SNR depends on the speech spectrum estimate of the previous frame. As a consequence, the gain function matches the previous frame rather than the current one, which degrades the noise reduction performance. To avoid this problem, we have developed a new method, called the two-step noise reduction technique, which removes this mismatch while maintaining the benefits of the decision-directed approach. This algorithm has been selected by the ETSI Aurora working group to design an advanced algorithm for distributed speech recognition (DSR) in noisy environments.

# 7. Contracts and Grants with Industry

#### 7.1. PEA Hades: high-speed and secure networks (2002-2003)

Participants: François Charot, Ludovic L'Hours, Charles Wagner.

The work completed by the team concerns the implementation of security algorithms (cryptographic algorithms) and their adaptation to high-speed networks, with a particular focus on integrating and optimizing ciphering algorithms: meeting the high-throughput requirements (gigabit/s) of encryption algorithms (AES [55]), and an exploratory study of terabit/s rates. The second aspect being studied, in the context of designing a cipher component for high-speed and secure networks, is the influence of the ciphering part on security protocols (IPsec).

#### 7.2. IST Ozone (2002-2004)

Participants: François Charot, Madeleine Nyamsi, Patrice Quinton, Charles Wagner.

The IST Ozone project (*New Technologies and services for emerging nomadic societies*) began in November 2001. It gathers the following partners: Philips Electronics (Netherlands), Imec (Belgium), Epictoid (Netherlands), Eindhoven University of Technology (Netherlands), INRIA, Thomson Multimedia (France). The Ozone project aims at investigating, defining, implementing and integrating a generic platform for ambient intelligence applications.
This project aims at making users' interactions with equipment and applications more user-friendly, in order to enable new services of better quality. One of the research directions concerns the hardware architectures on which ambient intelligence can be implemented. The team's work concerns the integration of the MMAlpha and CALIFE tools and their use for compiling the computation-intensive parts of applications.

#### 7.3. PHRASE: reconfigurability and VLIW processors in parallel heterogeneous architectures

Participants: Daniel Chillet, Raphaël David, Sébastien Pillement, Olivier Sentieys.

PHRASE has been supported by contracts from the Ministry for Industry and Finance (MEFI/STSI) since 1999. The work consists of studies aimed at defining a new generation of completely programmable and reconfigurable integrated circuits. The studies are undertaken within the framework of a cooperation between STMicroelectronics, the AS team of the University of Western Brittany, and R2D2.

#### 7.4. Architectures based on multiple-valued logic for telecommunication applications (2001-2004)

Participants: Daniel Chillet, Hélène Dubois, Ekué Kinvi-Boh, Sébastien Pillement, Olivier Sentieys.

Until the development of the SUS-LOC technology by E.D. Olson, multiple-valued logic (MVL) techniques were not realizable in practice and remained purely theoretical. This technology makes it possible to implement any MVL function with a complexity equivalent to that of CMOS technology in the binary case, and it uses standard circuit foundries. The international patent protecting it was ranked as one of the most innovative of recent years by the US Patent Office. An active research collaboration was established with EDO LLC, the American company founded by D. Olson, for the study and development of new systems using MVL, in particular in the field of digital signal processors (DSP) for telecommunications.

#### 7.5.
OSGAR (2003-2005)

Participants: Daniel Chillet, Nicolas Hervé, Daniel Menard, Sébastien Pillement, Olivier Sentieys.

OSGAR is an RNTL project, gathering the following partners: CEA-list, TNI-Valiosys, the University of Western Brittany, and R2D2. This project aims at studying and developing high-level synthesis tools able, starting from C code, semi-formal specifications or object code, to carry out an automatic migration towards one or more reconfigurable circuits. The team's work covers the following points:

- the adaptation of the tools of the BSS circuit-design software platform to reconfigurable architectures, in order to automatically take into account the data coding and the size of the operators;
- the modeling of reconfigurable architectures from the point of view of the developed software tools. The objective is to integrate power-consumption aspects into the models, so as to be able to provide estimates of the power dissipated during prototyping;
- the validation by the implementation of two applications (image processing, WCDMA) on the various architectures considered in the project.

# 8. Other Grants and Activities

#### 8.1. National initiatives

R2D2 members take part in the specific actions of the SoC Pluridisciplinary Thematic Network (RTP SoC of the CNRS): « Dynamically reconfigurable architectures » (AS27), « Specification and design of an IP block library » (AS28), « Operating system and multiprocessor architectures » (AS29), and also in the following other actions: « software radio » (AS37), « computer arithmetic » (AS78) and « compilation for embedded systems » (AS82).

The R2D2 team participates in the activities of two multi-laboratory teams of RTP SoC: Pomard and SocLib.

The R2D2 team also participates in the activities of:

- GdR-PRC ISIS (Information Signal ImageS), working group GT7 « Algorithms Architectures Adequation ».
- GdR-PRC ARP (Architectures Réseaux et Parallélisme), working group « Specialized architectures ».

#### 8.1.1. RDISK: Reconfigurable DISK

Participants: Steven Derrien, Ludovic L'Hours.

The Reconfigurable DISK is a joint project (January 2002 - December 2003) between the Symbiose and R2D2 teams, funded by the French Ministry of Research. Its goal is to develop a specialized architecture following the « smart disk » concept. The idea is to attach reconfigurable computation capabilities close to the disk in order to provide on-the-fly data filtering and thus speed up the scanning of large databases. The target application field is genomic data extraction, and a 48-disk system is currently being assembled for experimentation.

The R2D2 team is involved in the design, implementation and validation of the RDISK Programmable System-on-Chip, which is based upon a Xilinx FPGA. The goal was to provide a small-footprint SoC (in terms of resource usage). Among other tasks, this project included the design of a SoC bus arbiter, of a high-performance SDRAM controller and of an ATA/IDE hard drive controller. A significant amount of work has also been done on the design of a lightweight operating system layer, whose purpose is to handle the RDISK dynamic reconfiguration capability and to provide simple communication primitives between the host and the boards. All these contributions have been successfully tested on the system, and now serve as a framework for all other RDISK project participants.

#### 8.1.2. ReMiX: Reconfigurable Memory for Indexing Huge Amount of Data

Participants: François Charot, Steven Derrien.

Indexing is a well-known technique that accelerates searches within large volumes of data, such as those needed by applications in genomics and in content-based image or text retrieval.
The ReMiX project proposes the design of a dedicated and very large RAM index memory (several hundred gigabytes, distributed among a cluster of PCs), large enough to hold huge indexes entirely in main memory, avoiding the use of any disk. In addition, the index memory uses reconfigurable hardware resources to tailor the memory management, at the hardware level, to the specific properties of the indexing schemes. It also makes it possible to implement algorithms that exhibit potential parallelism. This three-year project (October 2003 - September 2006), coordinated by the Symbiose project, is funded by the French ministry (ACI Data Mass program). The R2D2 team is involved in the design of the hardware platform.

#### 8.2. International bilateral relations

#### 8.2.1. Europe

R2D2 cooperates with the University of Leiden in the Netherlands (Ed Deprettere) on parallel architecture synthesis.

R2D2 cooperates with UCL at Louvain-La-Neuve on the topic of ternary technology integrated circuits. A prototype circuit is under development with the SOI technology of the micro-electronics laboratory (DICE of UCL).

#### 8.2.2. Africa

Within the framework of the Cari project and the FICU program of the University Agency for Francophonie, R2D2 cooperates with the University of Yaoundé (Cameroon) on systolic algorithms.

R2D2 cooperates with ENIT in Tunis on the topic of mobile telecommunication architectures.

R2D2 cooperates with the University of Antananarivo in Madagascar for the training of faculty members.

#### 8.2.3. North America

R2D2 cooperates with the LSSI laboratory of Trois-Rivières University in Québec on the design of architectures for filters.

R2D2 maintains relations with the computer science department of Colorado State University in Fort Collins on the development of MMAlpha.

R2D2 cooperates with the LRTS laboratory of Laval University in Québec on the topic of architectures for MIMO systems.

#### 8.3. Visiting scientists

- M.
Mensi (ISET Tunisia), A. Bouallegue (ENIT Tunisia), A. Ben Rabaa (ENIT Tunisia), R. Bouallegue (SupCom Tunisia), from 12/1/2003 until 12/7/2003.
- E. Damergi (ENIT Tunisia) from 9/1/2003 for 10 weeks.
- I. Viorela (University of Girona, Spain) from 02/15/03 for 3 months.
- S. Mathieu and S. Roy (University of Laval, Québec) from 10/20/2003 for 2 weeks.
- P. Bouchard (University of Laval, Québec) from 11/15/2003 for 3 weeks.
- S. Piestrak (Wroclaw University, Poland) from 6/6/2003 for 1 month and from 11/15/2003 for 1 month.

# 9. Dissemination

#### 9.1. Activities in the scientific community

- F. Charot is a steering committee member of the SoC Pluridisciplinary Thematic Network (RTP SoC), set up at the STIC department of the CNRS.
- F. Charot managed the archi03 school (embedded systems architecture and associated design tools) in April 2003 in Roscoff.
- S. Pillement served as a technical program committee member for the 13th International Conference on Field Programmable Logic and Applications (FPL 2003).
- P. Quinton was invited to the Workshop on System Architecture MOdelling and Simulation, in July 2003 in Samos, Greece.
- C. Wolinski was a session chair at DAC'2003. He gave seminars at Colorado State University, Fort Collins, USA (March 2003) on « New Architecture for Reconfigurable Computing System », at the University of Eindhoven, Netherlands (May 2003) on « Design Tool for a Heterogeneous Fabric Generation » and at VERIMAG, France (May 2003) on « New Architecture for Reconfigurable Computing System ». He was a speaker at the REASON Summer School on FPGA-based and Re-configurable Systems, Ljubljana, August 2003.

#### 9.2. Teaching and responsibilities

- F. Charot is responsible for a course on « Applications of architecture in telecommunications » in DIIC ARC, Ifsic.
- D. Chillet teaches a course on « advanced processors architectures » in Master STIR.
- H. Dubois is the associate academic director at Enssat.
- M. Guitton is in charge of communication at Enssat.
- L.
Perraudeau is the associate director of Ifsic (responsible for the budget and the equipment). He is responsible for a course on object-oriented languages in the DESS Isa (Computer science and its applications) of the University of Rennes 1, teaches the design of integrated circuits in DIIC (second year), and teaches in Licence d'informatique and in Deug Sciences, mention SM and STPI.
- P. Quinton is the director of Ifsic. He is responsible for the parallel algorithmic course (Alpa module) in the Master in computer science of the University of Rennes 1, and teaches in Deug Sciences, mention SM and STPI, and in DIIC (second and third year).
- O. Sentieys is responsible for a signal and architecture module of the Master STIR of the University of Rennes 1 and of the DRT in electronics of Enssat. He teaches at Enssat and gives courses on « Methodologies for integrated system design » in Master STIR and on « Low-power digital CMOS circuits » at Enst de Bretagne.

Graduate student interns: F. Ben Abdallah (ENIT, Tunisia), M. Cartron (Enssat, France), I. Chaieb (ENIT, Tunisia), A.M. Chana (University of Yaoundé, Cameroon), S. Cochard (University of Savoie, France), N. Larhiq (University of Grenoble, France), V. Letourneux (Enssat, France), J. Morel (Ifsic, France), R. Rocher (Enssat, France), L. Vitte (Enssat, France).

# 10. Bibliography

#### Major publications by the team in recent years

- [1] F. CHAROT, G. LE FOL, P. LEMONNIER, C. WAGNER, C. BOUVILLE, R. BARZIC. *Towards Hardware Building Blocks for Software-Only Real Time Video Processing: the MOVIE Approach.* in « IEEE Transactions on Circuits and Systems for Video Technology », number 6, volume 9, September, 1999.
- [2] R. DAVID, D. CHILLET, S. PILLEMENT, O. SENTIEYS. SOC Design Methodologies. Kluwer Academic Publishers, 2002, chapter A Dynamically Reconfigurable Architecture for Low-Power Multimedia Terminals, pages 51–62.
- [3] J. DIGUET, D. CHILLET, O. SENTIEYS. A Framework for High Level Estimations of Signal Processing Implementations.
in « Journal of VLSI System for Signal, Image and Video Technology », number 3, volume 25, July, 2000.
- [4] D. MÉNARD. Méthodologie de compilation d'algorithmes de traitement du signal en précision infinie pour les processeurs en virgule fixe. Thèse de doctorat, Université de Rennes 1, December, 2002.
- [5] C. MAURAS. Alpha: un langage équationnel pour la conception et la programmation d'architectures parallèles synchrones. Thèse de doctorat, Université de Rennes 1, December, 1989.
- [6] D. MENARD, O. SENTIEYS. *Automatic Evaluation of the Accuracy of Fixed-point Algorithms*. in « IEEE/ACM Design, Automation and Test in Europe (DATE-02) », Paris, March, 2002.
- [7] V. MESSÉ. *Production de compilateurs flexibles pour la conception de processeurs programmables spécialisés.* Thèse de doctorat, Université de Rennes 1, March, 1999.
- [8] P. QUINTON, V. V. DONGEN. *The mapping of linear recurrence equations on regular arrays.* in « Journal of VLSI Signal Processing », volume 1, 1989, pages 93-113.
- [9] P. QUINTON, Y. ROBERT. Systolic Algorithms and Architectures. Prentice Hall and Masson, 1989.
- [10] S. V. RAJOPADHYE, S. PURUSHOTHAMAN, R. M. FUJIMOTO. On Synthesizing Systolic Arrays from Recurrence Equations with Linear Dependencies. in « Proceedings, Sixth Conference on Foundations of Software Technology and Theoretical Computer Science », Springer Verlag, LNCS 241, pages 488-503, New Delhi, India, December, 1986.
- [11] F. DUPONT DE DINECHIN. Systèmes structurés d'équations récurrentes : mise en œuvre dans le langage Alpha et applications. Thèse de doctorat, université de Rennes I, January, 1997.

#### Doctoral dissertations and "Habilitation" theses

- [12] R. DAVID. Architecture reconfigurable dynamiquement pour applications mobiles. Thèse de Doctorat, Université de Rennes, July, 2003.
- [13] A.-C. GUILLOU. Synthèse architecturale basée sur le modèle polyédrique : validation et extensions de la méthodologie MMAlpha.
Thèse de Doctorat, Université de Rennes, December, 2003.

#### Articles in refereed journals and book chapters

- [14] R. DAVID, D. LAVENIER, S. PILLEMENT. *Du microprocesseur au circuit FPGA : une analyse sous l'angle de la reconfiguration.* in « Technique et Science Informatiques », 2003, to appear.
- [15] R. DAVID, S. PILLEMENT, O. SENTIEYS. *Low Power Electronics Design*. CRC Press, 2003, chapter Energy-Efficient Reconfigurable Processors, to appear.
- [16] S. DERRIEN, A.-C. GUILLOU, P. QUINTON, T. RISSET, C. WAGNER. *Automatic Synthesis of Efficient Interfaces for Compiled Regular Architectures*. S. BHATTACHARYYA, E. DEPRETTERE, J. TEICH, editors, in « Domain-Specific Processors: Systems, Architectures, Modeling, and Simulation », Marcel Dekker, 2003, chapter 10, to appear.
- [17] M. GOKHALE, J. FRIGO, K. MCCABE, J. THEILER, C. WOLINSKI, D. LAVENIER. *Experience with a Hybrid Processor: K-Means Clustering.* in « Special Issues of the Journal of Supercomputing », number 4, volume 24, 2003.
- [18] K. KUCHCINSKI, C. WOLINSKI. Global Approach to Scheduling Complex Behaviors based on Hierarchical Conditional Dependency Graphs and Constraint Programming. in « Journal of Systems Architecture », 2003, to appear.
- [19] D. MENARD, T. SAÏDI, D. CHILLET, O. SENTIEYS. *Implantation d'algorithmes spécifiés en virgule flottante dans les DSP virgule fixe.* in « Technique et Science Informatiques », number 2, volume 22, 2003, pages 783-809.
- [20] C. WOLINSKI, M. GOKHALE, K. MCCABE. *Polymorphous fabric-based systems: Model, tools, applications.* in « Journal of Systems Architecture », number 4-6, volume 49, September, 2003.

#### **Publications in Conferences and Workshops**

- [21] I. BENKERMI, S. PILLEMENT, O. SENTIEYS. Application des réseaux de neurones à l'ordonnancement de tâches temps réel sur une architecture multiprocesseurs hétérogènes. in « SYMPosium en Architectures nouvelles de machines, SympA'2003 », pages 372–379, La Colle sur Loup, France, October, 2003.
- [22] F.
BOUTEILLE, P. SCALART, B. KOVESI. *Packet loss concealment using audio morphing*. in « Speech processing, Transmission and Quality aspects (STQ) Workshop Compensating for Packet Loss in Real-Time Applications », February, 2003. - [23] S. Bruno, P. Scalart. Application des techniques de démodulation AM/FM à l'estimation du pouls et de la respiration à partir de signaux biomédicaux. in « Colloque GRETSI sur le traitement du signal et des images 2003 », Paris, France, September, 2003. - [24] S. Bruno, P. Scalart. Smart Multisensor Kernel. in « Smart Objects Conference SOC'2003 », May, 2003. - [25] F. CHAROT, E. YAHYA. Fully-Modular Partially-Pipelined AES Implementation in Counter Mode using ALTERA FPGA. in « Proceedings of 2003 International Conference on Electronic Sciences, Information Technology and Telecommunication », Susa, Tunisia, March, 2003. - [26] F. CHAROT, E. YAHYA, C. WAGNER. *Efficient Modular-Pipelined AES Implementation in Counter Mode on ALTERA FPGA*. in « Proceedings of FPL 03 », LNCS 2778, pages 282–291, Lisbon, Portugal, September, 2003. [27] S. CHEVOBBE, N. VENTROUX, F. BLANC, T. COLLETTE. RAMPASS: Reconfigurable And Advanced Multi-Processing Architecture for future Silicon System. in « Proceedings of Samos III Workshop », July, 2003. - [28] H. GNABA, M. JAIDANE-SAIDANE, P. SCALART. *Introduction of the CELP structure of the GSM coder in the acoustic echo canceller for the GSM network.* in « 8th European Conference on Speech Communication and Technology », September, 2003. - [29] A.-C. GUILLOU, P. QUINTON, T. RISSET. *Hardware Synthesis for Multi-Dimensional Time*. in « Proceedings of ASAP 03 », IEEE Press, pages 40–51, The Hague, The Netherlands, June, 2003. - [30] E. Kinvi-Boh, M. Aline, O. Sentieys. *Conception d'un processeur ternaire à faible énergie.* in « Colloque Faible Tension Faible Consommation (FTFC'03) », May, 2003. - [31] E. KINVI-BOH, M. ALINE, O. SENTIEYS, E. OLSON. 
*Design and Characterization of a Low-Power Ternary DSP.* in « International Signal Processing Conference (ISPC'03) », Dallas, US, April, 2003. - [32] E. KINVI-BOH, M. ALINE, O. SENTIEYS, E. OLSON. *MVL Circuit Design and Characterization using SUS-LOC structure*. in « IEEE International Symposium on Multiple-Valued Logic (ISMVL'03) », Tokyo, Japan, May, 2003. - [33] D. LAVENIER, S. GUYÉTANT, S. DERRIEN, S. RUBINI. A reconfigurable parallel disk system for filtering genomic banks. in « Engineering of Reconfigurable Systems and Algorithms », 2003. - [34] D. MASSICOTTE, P. QUINTON, A. O. DAHMANE, T. RISSET. Fast Exploration of Parallel Architectures for Multi-User Detection Algorithms: A Case Study. in « Proceedings of Samos III Workshop », July, 2003. - [35] D. MENARD, M. GUITTON, R. DAVID, S. PILLEMENT, O. SENTIEYS. Évaluation comparative de platesformes reconfigurables et programmables pour les télécommunications de 3ème génération. in « Colloque GRETSI sur le traitement du signal et des images 2003 », Paris, France, September, 2003. - [36] D. MENARD, M. GUITTON, S. PILLEMENT, O. SENTIEYS. *Design and Implementation of WCDMA Platforms: Challenges and Trade-offs.* in « International Signal Processing Conference (ISPC'03) », Dallas, US, April, 2003. - [37] D. MENARD, M. GUITTON, P. QUEMERAIS, O. SENTIEYS. *Efficient Implementation of a WCMA Rake Receiver on the TMS320C64x*. in « 37th Asilomar Conference on Signals, Systems and Computers », Monterey, US, November, 2003. - [38] S. PILLEMENT, R. DAVID, O. SENTIEYS. *Architectures reconfigurables : opportunités pour la faible consommation*. in « Colloque Faible Tension Faible Consommation (FTFC'03) », May, 2003. - [39] C. WOLINSKI, M. GOKHALE, K. MCCABE. Fabric-Based Systems: Model, Tools, Applications. in « IEEE Symposium on Field-Programmable Custom Computing Machines », April, 2003. - [40] C. WOLINSKI, M. GOKHALE, K. MCCABE. Rapid Construction of Reconfigurable Computing Fabrics for Systems on a Programmable Chip. 
in « IEEE HPCA/SSRS '03 », February, 2003.
- [41] C. WOLINSKI, F. TROUW, M. GOKHALE. A Preliminary Study of Molecular Dynamics on Reconfigurable Computers. in « ERSA'03 », February, 2003.

#### Bibliography in notes

- [42] D. C. CRONQUIST, P. FRANKLIN, C. FISHER, M. FIGUEROA, C. EBELING. Architecture Design of Reconfigurable Pipelined Datapath. in « Advanced Research in VLSI », 1999.
- [43] A. DEHON. Reconfigurable Architecture for General-Purpose Computing. Ph. D. Thesis, MIT, 1996.
- [44] G. EPSTEIN. Multiple-Valued Logic Design: An introduction. Institute of Physics Publishing, Bristol, 1993.
- [45] S. C. GOLDSTEIN, H. SCHMIT, M. BUDIU, S. CADAMBI, M. MOE, R. R. TAYLOR. *PipeRench: A Reconfigurable Architecture and Compiler.* in « IEEE Computer », April, 2000.
- [46] T. GRÖTKER, E. MULTHAUP, O. MAUSS. *Evaluation of HW/SW Tradeoffs Using Behavioral Synthesis*. in « ICSPAT'96 », Boston, October, 1996.
- [47] R. HARTENSTEIN. A Decade of Reconfigurable Computing: A Visionary Retrospective. in « Design Automation and Test in Europe (DATE) », 2001.
- [48] J. HAUSER, J. WAWRZYNEK. *GARP: A MIPS processor with a reconfigurable coprocessor.* in « IEEE Symposium on FPGAs for Custom Computing Machines », June, 1997.
- [49] H. KEDING, M. COORS, O. LUTHJE, H. MEYR. *Fast Bit True Simulation*. in « Design Automation Conference 2001 (DAC 2001) », Las Vegas, June, 2001.
- [50] K. KEUTZER, S. MALIK, R. NEWTON, J. RABAEY, A. SANGIOVANNI-VINCENTELLI. System Level Design: Orthogonalization of Concerns and Platform-based Design. in « IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems », number 12, volume 19, December, 2000.
- [51] K. KUM, J. KANG, W. SUNG. AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors. in « IEEE Transactions on Circuits and Systems II », volume 47, September, 2000, pages 840-848.
- [52] R. LEUPERS. Retargetable Code Generation for Digital Signal Processors.
Kluwer Academic Publishers, 1997.
- [53] G. LU, H. SINGH, M. LEE, N. BAGHERZADEH, F. KURDAHI, E. FILHO. *The MorphoSys Parallel Reconfigurable System.* in « Euro-Par'99, LNCS 1685 », 1999.
- [54] C. MAURAS. Alpha: un langage équationnel pour la conception et la programmation d'architectures parallèles synchrones. Ph. D. Thesis, Université de Rennes 1, December, 1989.
- [55] J. NECHVATAL, E. BARKER, L. BASSHAM, W. BURR, M. DWORKIN, J. FOTI, E. ROBACK. Report on the development of the Advanced Encryption Standard (AES). Technical report, National Institute of Standards and Technology, October, 2000.
- [56] S. PEES, A. HOFFMANN, V. ZIVOJNOVIC, H. MEYR. LISA Machine Description Language for Cycle-Accurate Models of Programmable DSP Architectures. in « DAC 1999 », June, 1999.
- [57] J. RABAEY. A low-energy heterogeneous reconfigurable DSP IC. in « Design Automation Conference (DAC) », June, 2000.
- [58] C. RUPP, M. LANDGUTH, T. GRAVERICK, E. GOMERSALL, H. HOLT. *The NAPA Adaptive Processing Architecture*. in « IEEE Symposium on FPGAs for Custom Computing Machines », April, 1998.
- [59] M. H. S. HAUCK, J. KAO. *The Chimera Reconfigurable Functional Unit.* in « IEEE Symposium on FPGAs for Custom Computing Machines », 1997.
- [60] A. SANGIOVANNI-VINCENTELLI, G. MARTIN. *Platform-Based Design and Software Design Methodology for Embedded Systems*. in « IEEE Design and Test of Computers », November, 2001.
- [61] R. SCHREIBER, S. ADITYA, S. MAHLE, V. KATHAIL, B. RAU, D. CRONQUIST, M. SIVARAMAN. *PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators*. Technical report, number HPL-2001-249, HP Laboratories Palo Alto, October, 2001.
- [62] M. WILLEMS, V. BURSGENS, H. KEDING, H. MEYR. System Level Fixed-Point Design Based On An Interpolative Approach. in « Design Automation Conference (DAC-97) », 1997.