# **Activity Report 2011**

# **Project-Team DART**

Contributions of the Data Parallelism to Real Time

RESEARCH CENTER Lille - Nord Europe

**THEME** **Embedded and Real Time Systems**

# **Table of contents**

- 1. Members
- 2. Overall Objectives
  - 2.1. Introduction
  - 2.2. Highlights of the Year
- 3. Scientific Foundations
  - 3.1. Introduction
  - 3.2. Co-modeling for HP-SoC design
    - 3.2.1. Foundations
      - 3.2.1.1. System-on-Chip Design
      - 3.2.1.2. Model-driven engineering
      - 3.2.1.3. Models of computation
    - 3.2.2. Contributions of the team
      - 3.2.2.1. High-level modeling in Gaspard2
      - 3.2.2.2. Intermediate concept modeling and transformations
      - 3.2.2.3. An operational semantics for RSM
      - 3.2.2.4. Clock-based modeling of embedded system behavior
      - 3.2.2.5. High-level modeling and exploration of non functional properties
      - 3.2.2.6. HPF towards Marte
      - 3.2.2.7. MARTE extensions for reconfigurable based systems
      - 3.2.2.8. Traceability
      - 3.2.2.9. Transformation migration after metamodel evolution
      - 3.2.2.10. Model transformation towards SystemC-PA
      - 3.2.2.11. Gaspard2 for avionic hybrid test platform design
  - 3.3. Model-based optimization and compilation techniques
    - 3.3.1. Foundations
      - 3.3.1.1. […]
      - 3.3.1.2. Transformation and traceability
    - 3.3.2. Contributions of the team
      - 3.3.2.1. Data-parallel code transformations
      - 3.3.2.2. Multi-objective hierarchical scheduling heuristics
      - 3.3.2.3. Transformation techniques
      - 3.3.2.4. Traceability
      - 3.3.2.5. Verifying conformance and semantics-preserving model transformations
      - 3.3.2.6. Modeling for GPU
      - 3.3.2.7. Clock-based design space exploration for SoCs
      - 3.3.2.8. Optimized code generation from UML/MARTE models
      - 3.3.2.9. Architecture exploration based on meta-heuristics
      - 3.3.2.10. Architecture exploration for efficient data transfer and storage
      - 3.3.2.11. Multi-objective mapping and scheduling heuristics
      - 3.3.2.12. GPGPU code production
      - 3.3.2.13. From MARTE to OpenCL
      - 3.3.2.14. […] specific languages
      - 3.3.2.15. Electromagnetic modeling
  - 3.4. HP-SoC simulation, verification and synthesis
    - 3.4.1. Foundations
      - 3.4.1.1. Abstraction levels and Transaction Level Modeling
      - 3.4.1.2. Dynamic reconfiguration - FPGA
      - 3.4.1.3. Verification
    - 3.4.2. Contributions of the team
      - 3.4.2.1. Co-simulation in SystemC
      - 3.4.2.2. Model transformation towards Pthreads
      - 3.4.2.3. Gaspardlib extensions
      - 3.4.2.4. Partial and Dynamic Reconfiguration (PDR) implementations
      - 3.4.2.5. IP based configurable massively parallel processing SoC
      - 3.4.2.6. Caches in MPSoCs
      - 3.4.2.7. Verification
      - 3.4.2.8. System Level Power Modeling
      - 3.4.2.9. Energy consumption driven dynamic reconfigurable execution model
      - 3.4.2.10. Partial dynamic reconfiguration
      - 3.4.2.11. Network on Chip synthesis
      - 3.4.2.12. IP based configurable massively parallel processing SoC
  - […] Formal Methods for General and Domain-Specific Languages
- 4. […]
  - 4.1. Gaspard 2
  - 4.2. Papyrus
  - 4.3. […]iven Factory
  - 4.4. […]MEGSI
- 5. New Results
  - 5.1. Co-Modeling for HP-SoC with MARTE
    - 5.1.1. Diagonal mesh modeling with MARTE
    - 5.1.2. MARTE extension for reconfigurable hardware models
    - 5.1.3. Comparison of SAC and ArrayOL for parallelism expression
    - 5.1.4. Gaspard Modeling Improvements
  - 5.2. Formal Methods for general-purpose and domain-specific languages
    - 5.2.1. Formal Semantics for Domain Specific Modeling Languages
    - 5.2.2. A new abstraction for signal programs, and improvement of the compilation process
    - 5.2.3. Using bounded model checking to focus fixpoint iterations
    - 5.2.4. Formal definition of a compiler for the Kermeta metamodeling language in K
    - 5.2.5. A generic approach and tool for tracing executions back to a DSML's operational semantics
  - 5.3. […]ion and compilation techniques
    - 5.3.1. […]rated Code Optimization
    - 5.3.2. Methodology to generate OpenCL code from MARTE models
    - 5.3.3. […]ling into Models
    - 5.3.4. […] Analysis of Polychronous Specifications with SMT Theory
    - 5.3.5. Programming functional and real-time aspects simultaneously
    - 5.3.6. […]ning Localized Model Transformation
  - 5.4. […] Computing on SoC
    - 5.4.1. […]ect and Energy-Efficient Design of a Multimedia Application on SoC
    - 5.4.2. Design Space Exploration for Efficient Data Intensive Computing on SoCs
    - 5.4.3. Power Estimation
  - 5.5. […] reconfiguration for HP-SoC
    - 5.5.1. Context switching for volatile IP
    - 5.5.2. Generic broadcast network for HP-SoC architecture
    - 5.5.3. Distributed control for dynamic reconfiguration
    - 5.5.4. […]nic test bench on heterogeneous reconfigurable platform
  - 5.6. […]on case-studies
    - 5.6.1. Experimentations for electromagnetism simulations
    - 5.6.2. H.264 modeling on NoC, implementation and synthesis
    - 5.6.3. […]xellience
- 6. Contracts and Grants with Industry
  - 6.1. ANR Famous
  - 6.2. The ANR Open-People project
  - 6.3. INRIA Euromed 3+3
  - 6.4. STIC INRIA - Tunisia program
  - 6.5. STIC INRIA - Algeria contract
  - 6.6. Nano 2012 ID-TLM
  - 6.7. Collaboration with CEA List
  - 6.8. Collaboration with SME Ecreall
  - 6.9. Collaboration EADS IW, and Eurocopter
- 7. Partnerships and Cooperations
  - 7.1. International Initiatives
    - 7.1.1. Collaboration with Colombia
    - 7.1.2. Collaboration with Romania
    - 7.1.3. Visits of International Scientists
  - 7.2. European Initiatives
    - 7.2.1. Collaboration with Belgium
    - 7.2.2. Collaboration with England
    - 7.2.3. Participation In European Programs
  - 7.3. National Initiatives
    - 7.3.1. Within Inria
    - 7.3.2. […]
- 8. Dissemination
  - 8.1. Animation of the scientific community
  - 8.2. Teaching
    - 8.2.1. Undergraduate
    - 8.2.2. Graduate
    - 8.2.3. Post-Graduate
- 9. Bibliography

**Keywords:** Synchronous Languages, Programming Languages, Compiling, Processors, Embedded Systems, Software Engineering

DaRT is a common project with the University of Lille 1, Science and Technologies, via the Laboratory of Fundamental Computer Science of Lille (LIFL, associated to the CNRS as UMR 8022).

Beginning of the Team: 01/01/2012.

# 1. Members

#### **Research Scientists**

- Abdoulaye Gamatié [Research scientist, UMR CNRS 8022]
- Vlad Rusu [Research scientist, HdR]

#### **Faculty Members**

- Jean-Luc Dekeyser [Team Leader, Professor, USTL, HdR]
- Pierre Boulet [Vice-Team Leader, Professor, USTL, HdR]
- Cédric Dumoulin [Associate professor, USTL]
- Anne Etien [Associate professor, USTL]
- Frédéric Guyomarch [Associate professor, USTL]
- Philippe Marquet [Associate professor, USTL]
- Samy Meftali [Associate professor, USTL, HdR]
- Laure Gonnord [Associate professor, USTL]
- Julien Forget [Associate professor, USTL]

#### **External Collaborators**

- Rabie Ben Atitallah [Associate professor, UVHC]
- Smaïl Niar [Professor, UVHC, HdR]

#### **Engineers**

- Thomas Legrand [Software Engineer, INRIA]
- Alexis Muller [Transfer and innovation Engineer, INRIA]
- Emmanuel Leguy [Research Engineer, USTL]
- Rahma Yangui [Software Engineer, INRIA]

#### **PhD Students**

- Wendell Rodrigues [VALEO and Regional grant]
- Chiraz Trabelsi [INRIA grant]
- Adolf Abdallah [ATER until March 2011]
- Vincent Aranega [French Ministry grant and ATER]
- Sana Cherif [ANR Famous]
- Hana Krichen [co-advising USTL-ENIS Sfax]
- Majdi Elhaji [co-advising USTL-Univ Monastir]
- Amen Souissi [CIFRE Ecreall]
- Georges Afonso [CIFRE EADS-Eurocopter]
- Santosh-Kumar Rethinagiri [ANR Open-people, UVHC]
- Paméla Wattebled [ANR Famous, UBS]
- Amine El Kouhen [CEA grant, Univ Lille1]
- Andrei Arusoaie [co-advising Univ. of Iasi (Romania), since Oct. 2011]

#### **Post-Doctoral Fellow**

- Christophe Calvès [PostDoc, INRIA]

#### **Administrative Assistant**

- Karine Lewandowski [INRIA]

# 2. Overall Objectives

## 2.1. Introduction

For the last few years, we have been witnessing the emergence of the "design gap". This gap is caused by the exponential growth of the integration rate of transistors on chips and the comparatively slower growth of the productivity of integrated circuit designers. It is now impractical to fill a chip with custom designed logic.
One has to reuse existing design parts or fill the chip area with memory (a good example of this evolution is multicore processors, which include several existing processing cores instead of making a single core more complex). This evolution is clearly attested by the International Technology Roadmap for Semiconductors.

At the same time, the computing power requirements of intensive signal processing applications such as video processing, voice recognition, telecommunications, radar or sonar are steadily increasing (several hundred Gops for low power embedded systems in a few years). New algorithms and new technologies introduce dynamically reconfigurable systems on chip into the design flow.

If design productivity does not increase dramatically, the limiting factor of the growth of the semiconductor industry will not be the physical limits of ever finer fabrication processes but economics! Indeed, system design teams are asked to build more complex systems faster, cheaper and bug free, while decreasing power consumption.

We propose in the DaRT project to contribute to the improvement of the productivity of electronic embedded system design teams. We structure our approach around a few key ideas:

- Promote the use of parallelism to help reduce the power consumption while improving the performance.
- Use *MDE* (*Model Driven Engineering*) to separate the concerns into different models, which allows these models to be reused and keeps them human readable.
- Propose an environment starting at the highest level of abstraction, namely the *system modeling* level.
- Automate code production by the use of (semi-)automatic model transformations to build correct-by-construction code.
- Develop *simulation techniques* at precise abstraction levels (functional, transactional or register transfer levels) to check the design as early as possible.
- Prototype the resulting embedded systems on FPGAs and dynamically reconfigurable FPGAs.
- Promote *strong semantics* in the application model to allow verification, unambiguous design and automatic code generation.
- Focus on a *limited application domain*, intensive signal processing applications. This restriction allows us to push our developments further without having to deal with the wide variety of applications.

All these ideas are implemented in a prototype co-design environment based on a model driven engineering approach, Gaspard. This open source platform is our test bench and is freely available.

To help the designer, such an environment should make it possible to evaluate several architectural solutions as well as several application specifications with regard to their performance and cost. The ability to estimate metrics from SystemC simulations and the refactoring algorithm defined for transforming loops for particular multiprocessors are the first steps towards exploration. An automatic exploration system based on multi-objective methods has to transform the SoC description (size, network, memory, association). The space of solutions is huge, and a fast simulation in SystemC at a high abstraction level is a good way to prune this space quickly. After that, a precise simulation at low level in SystemC or even in VHDL (synthesizable VHDL) can start to refine the solution. Code production also targets GPGPUs, using the OpenCL language as an intermediate target.

The main technologies we promote are UML 2 [66] and the MARTE profile, MDE [121] and Eclipse EMF [63] for the modeling and model handling; Array-OL [90], [91], [83], [81] and synchronous languages [79] as computation models with strong semantics for verification; SystemC [67] for the simulation; OpenMP for the shared memory parallel execution; OpenCL for the massively parallel GPU; VHDL for the synthesis; and Java to code our prototypes.

## 2.2. Highlights of the Year

The International Conference on Computer Design (ICCD) encompasses a wide range of topics in the research, design, and implementation of computer systems and their components. ICCD's multi-disciplinary emphasis provides an ideal environment for developers and researchers to discuss practical and theoretical work covering system and computer architecture, verification and test, design and technology, and tools and methodologies. This conference has existed for 30 years. The paper "Hybrid System Level Power Consumption Estimation for FPGA-Based MPSoC" by the PhD student Santosh Rethinagiri received the 2011 best paper award.

BEST PAPER AWARD: [43] Hybrid System Level Power Consumption Estimation for FPGA-Based MPSoC in 29th IEEE International Conference on Computer Design ICCD 2011. SANTHOSH KUMAR RETHINAGIRI, R. BEN ATITALLAH, S. NIAR, E. SENN, J.-L. DEKEYSER.

# 3. Scientific Foundations

## 3.1. Introduction

The main research topic of the DaRT project-team concerns the hardware/software codesign of embedded systems with high performance processing units like DSP or SIMD processors. A special focus is put on multiprocessor architectures on a single chip (System-on-Chip). The contribution of DaRT is organized around the following items:

- Co-modeling for High Performance SoC design: We define our own metamodels to specify application, architecture, and (software/hardware) association. These metamodels present new characteristics such as high level data parallel constructions, iterative dependency expression, data flow and control flow mixing, and hierarchical and repetitive application and architecture models. All these metamodels are implemented with respect to the MARTE standard profile of the OMG, which is dedicated to the modeling of embedded and real-time systems.
- Model-based optimization and compilation techniques: We develop automatic transformations of data parallel constructions.
They are used to map and to schedule an application on a particular architecture. This architecture is by nature heterogeneous, and appropriate techniques used in the high performance community can be adapted. We developed new heuristics to minimize the power consumption. This new objective requires multi-criteria optimization techniques to achieve the mapping and the scheduling.
- SoC simulation, verification and synthesis: We develop a SystemC based simulation environment at different abstraction levels for accurate performance estimation and for fast simulation. To evaluate an architecture and the applications mapped on it, we simulate the result of the SoC design in SystemC at different abstraction levels. This simulation allows us to verify the adequacy of the mapping and the schedule, e.g., communication delay, load balancing, memory allocation. We also support IP (Intellectual Property) integration with different levels of specification. In addition, we use formal verification techniques to ensure the correctness of the designed systems, relying in particular on the synchronous approach. Finally, we transform MARTE models of data intensive algorithms into VHDL, in order to synthesize a hardware implementation.

## 3.2. Co-modeling for HP-SoC design

The main research objective is to build a set of metamodels (application, hardware architecture, association, deployment and platform specific metamodels) to support a design flow for SoC design. We use an MDE (Model Driven Engineering) based approach.

### 3.2.1. Foundations

#### 3.2.1.1. System-on-Chip Design

A SoC (System-on-Chip) can be considered as a particular case of embedded system. SoC design covers many different viewpoints, including the application modeling by the aggregation of functional components, the assembly of existing physical components, the verification and the simulation of the modeled system, and the synthesis of a complete end-product integrated into a single chip.
Model driven engineering is well suited to dealing with these multiple abstraction levels. Indeed, a model allows several viewpoints on information defined only once, and the links or transformation rules between the abstraction levels permit the reuse of the concepts for a different purpose.

#### 3.2.1.2. Model-driven engineering

Model Driven Engineering (MDE) [121] is now recognized as a good approach for dealing with System-on-Chip design issues such as the quick evolution of architectures or ever-growing complexity. MDE relies on the model paradigm, where a model represents an abstract view of reality. The abstraction mechanism avoids dealing with details and eases reusability.

A common MDE development process is to start from a high level of abstraction and to reach a targeted model by going through intermediate levels of abstraction. Usually, high level models contain only domain specific concepts, while technological concepts are introduced progressively in the intermediate levels. The targeted levels are used for different purposes: code generation, simulation, verification, or as inputs to produce other models, etc. The clear separation between the high level models and the technological models makes it easy to switch to a new technology while reusing the previous high level designs. Transformations make it possible to go from one model at a given abstraction level to another model at another level, and to keep the different models synchronized.

In an MDE approach, a SoC designer can use the same language to design the application and the architecture. Indeed, MDE is based on proven standards: UML 2 [65] for modeling, the MOF (Meta Object Facility [110]) for metamodel expression and QVT [111] for transformation specifications. Some profiles, i.e. UML extensions, have been defined in order to express the specificities of a particular domain. In the context of embedded systems, the MARTE profile, to which we contribute, follows the OMG standardization process.
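
The refinement process described above can be illustrated with a minimal sketch. It is not Gaspard code: the `Task`, `HighLevelModel` and `LoopModel` classes and the two functions are hypothetical names chosen for illustration, and the only real-world element is the OpenMP `parallel for` directive. The sketch shows one technology-free model being reused unchanged for two targets, with technological concepts appearing only in the intermediate model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Domain-level concept: a named computation repeated `repetitions` times."""
    name: str
    repetitions: int


@dataclass(frozen=True)
class HighLevelModel:
    """Technology-free application model (hypothetical, for illustration)."""
    tasks: Tuple[Task, ...]


@dataclass(frozen=True)
class LoopModel:
    """Intermediate model: technological concepts (loops, directives) appear here."""
    loops: Tuple[Tuple[str, int, Optional[str]], ...]


def to_loop_model(model: HighLevelModel, directive: Optional[str]) -> LoopModel:
    """Model-to-model transformation: each repeated task becomes a loop."""
    return LoopModel(tuple((t.name, t.repetitions, directive) for t in model.tasks))


def generate_code(loop_model: LoopModel) -> str:
    """Model-to-text transformation: emit C-like loops, one per task."""
    lines = []
    for name, count, directive in loop_model.loops:
        if directive:
            lines.append(directive)
        lines.append(f"for (int i = 0; i < {count}; i++) {name}(i);")
    return "\n".join(lines)


# The same high level design is reused for two different target technologies.
app = HighLevelModel((Task("filter", 64), Task("fft", 8)))
sequential = generate_code(to_loop_model(app, None))
parallel = generate_code(to_loop_model(app, "#pragma omp parallel for"))
print(parallel)
```

Switching technology only changes the argument passed to the intermediate transformation; the high level model `app` is left untouched, which is the property the paragraph above attributes to the separation between high level and technological models.
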
#### 3.2.1.3. Models of computation

We briefly present our reference models of computation, which consist of the Array-OL language and the synchronous model. The former allows us to express the parallelism in applications, while the latter favors the formal validation of the design.

**Array-OL.** The Array-OL language [90], [91], [83], [81] is a mixed graphical-textual specification language dedicated to expressing multidimensional intensive signal processing applications. It focuses on expressing all the potential parallelism in the applications by providing concepts to express data-parallel access in multidimensional arrays by regular tilings. It is a single assignment first-order functional language whose data structures are multidimensional arrays with potentially cyclic access.

**The synchronous model.** The synchronous approach [79] proposes formal concepts that favor the trusted design of embedded real-time systems. Its basic assumption is that computation and communication are instantaneous (referred to as the "synchrony hypothesis"). The execution of a system is seen through the chronology and simultaneity of observed events. This is the main difference from approaches where the system execution is instead considered under its chronometric aspect (i.e., duration plays a significant role). There are different synchronous languages with strong mathematical foundations. These languages are associated with tool-sets that have been successfully used in several critical domains, e.g. avionics and nuclear power plants.

In the context of the DaRT project, we consider declarative languages such as Lustre [85] and Signal [104] to model various refinements of Array-OL descriptions in order to deal with the control aspect as well as the temporal aspect present in target applications. The first aspect is typically addressed by using concepts such as mode automata, which are proposed as an extension mechanism in synchronous declarative languages.
The second aspect is studied by considering temporal projections of array dimensions in synchronous languages, based on the notion of clock. The resulting synchronous models are analyzable using the formal techniques and tools provided by the synchronous technology.

### 3.2.2. Contributions of the team

Our proposal is partially based upon the concepts of the "Y-chart" [97]. MDE helps express the model transformations that correspond to successive refinements between the abstraction levels.

Metamodeling brings a set of tools which enable us to specify our application and hardware architecture models using UML tools, to reuse functional and physical IPs, to ensure refinements between abstraction levels via mapping rules, to initiate interoperability between the different abstraction levels used in the same codesign, and to ensure openness to other tools, such as verification tools, through the use of standards.

The application and the hardware architecture are modeled separately using similar concepts inspired by Array-OL to express the parallelism. The placement and scheduling of the application on the hardware architecture is then expressed in an association model.

All the previously defined models (application, architecture and association) are platform independent, and they conform to the MARTE OMG profile (Figure 1). No component is associated with an execution, simulation or synthesis technology. Such an association targets a given technology (OpenMP, OpenCL, SystemC/PA, VHDL, etc.). Once all the components are associated with some IPs of the Gaspardlib library, the deployment is fully realized. This result can be transformed into models at further abstraction levels via model transformations (Figure 2). The simulation results can lead to a refinement of the initial application, hardware architecture, association and deployment models. We propose a methodology to work with all these different models. The design steps are:

1. Separation of application and hardware architecture modeling.
2. Association with semi-automatic mapping and scheduling.
3. Selection of IPs from libraries for each element of the application/architecture models, to achieve the deployment.
4. Automatic generation of the various platform specific simulation or execution models.
5. Automatic simulation or execution code generation with calls to the IPs.
6. Refinement at the highest level, taking the simulation results into account.

#### 3.2.2.1. High-level modeling in Gaspard2

In Gaspard2, models are described using the recent OMG standard MARTE profile combined with a few native UML concepts and some extensions. The new release of Gaspard2 uses different packages of MARTE for UML modeling. The Hardware Resource Model (HRM) concepts of MARTE make it possible to describe the hardware part of a system. The Repetitive Structure Modeling (RSM) concepts allow one to describe repetitive structures (the DaRT team was the main contributor to the definition of this MARTE package). Finally, the Generic Component Modeling (GCM) concepts are used as the base for component modeling.

The above concepts are expressive enough to permit the modeling of different aspects of an embedded system:

- functionality (or applicative part): the focus is mainly put on the expression of data dependencies between components in order to describe an algorithm. Here, the manipulated data are mainly multidimensional arrays. Furthermore, a form of reactive control can be described in modeled applications via the notion of execution modes. This last aspect is modeled with the help of some native UML notions in addition to MARTE.
- hardware architecture: similar mechanisms are also used here to describe regular architectures in a compact way. Regular parallel computation units are more and more present in embedded systems, especially in SoCs. HRM is fully used to model these concepts. Some extensions are proposed for NoC design and FPGA specifications.
GPUs have a particular memory hierarchy. In order to model the memory details, we extend the MARTE metamodel to describe low level characteristics of the memory.
- association of functionality with hardware architecture: the main issues concern the allocation of the applicative part of a system onto the available computation resources, and the scheduling. Here also, the allocation model takes advantage of the repetitive and hierarchical representation offered by MARTE to enable the association at different granularity levels, in a factorized way.

In addition to the above usual design aspects, Gaspard2 also defines a notion of *deployment* specification (see Figure 1) in order to select compilable IPs from libraries; at this point, code can be produced from the models. The corresponding package defines concepts that (i) make it possible to relate a MARTE representation of an elementary component (a box with ports) to a text-based code (an Intellectual Property (IP), or a function with arguments), and (ii) allow one to inform the Gaspard2 transformations of specific behaviors of each component (such as average execution time, power consumption...) in order to generate a high abstraction level simulation that is consistent with the real system. Recently, this package was extended to design reconfigurable systems using dynamic deployment.

Figure 1. Overview of the design concepts.

#### 3.2.2.2. Intermediate concept modeling and transformations

Gaspard2 targets different technologies for various purposes: formal verification, high-performance computing, simulation and hardware synthesis (Figure 1). This is achieved via model transformations that lead, through intermediate representations, to the final target representations. These intermediate representations rely on the following metamodels:

- A metamodel for procedural languages with OpenMP (OpenMP in Figure 1). It is inspired by the ANSI C and Fortran grammars and extended with OpenMP statements [68]. The aim of this metamodel is to use the same model to represent Fortran and C code.
Thus, from an OpenMP model, it is possible to generate OpenMP/Fortran or OpenMP/C. The generated code includes parallelism directives and control loops to distribute task (IP code) repetitions over processors [124].
- A VHDL metamodel (VHDL in Figure 1). It gathers the necessary concepts to describe hardware accelerators at the RTL (Register Transfer Level), which allows the hardware execution of applications. This metamodel introduces, *e.g.*, the notions of *clock* and *register* in order to manipulate some of the usual hardware design concepts. It is precise enough to enable the generation of synthesizable HDL code [103].
- The SystemC and Pthread metamodels, which were both redefined to implement a multi-threaded execution model. They are described in the "New results" part.
- A synchronous metamodel (Synchronous Equational). It was used to benefit from the verification tools of synchronous languages. It is not yet maintained in the new release of Gaspard2.

**The transformation scheme.** In order to target these metamodels, several transformations have been developed (Figure 2). MartePortInstance introduces into the MARTE metamodel the concept of PortInstance, which corresponds to an instance of a port associated with a part. The ExplicitAllocation transformation makes explicit the association of each application part with the processing units, according to the association of other elements in the application hierarchy. The LinkTopologyTask transformation replaces the connectors between a component and an inner repeated part by a task managing the data (TilerTask). The scheduling of the application tasks is decomposed into three transformations: Synchronisation, which associates with each application component a local graph of tasks corresponding to its parts; GlobalSynchronization, which computes a global graph of tasks for the complete application from the local graphs of tasks; and Scheduling, which schedules the tasks from the global graph.
TilerMapping maps the TilerTasks onto processors. The management of the data in memory is performed through two transformations: MemoryMapping maps the data into memory, i.e. creates the variables and allocates address spaces, and AddressComputation computes addresses for each variable. Finally, some transformations are dedicated to targets: Functional introduces the concepts relative to procedural languages; pThread transforms MARTE elementary tasks into threads and the connectors into buffers; SystemC translates the MARTE architecture into concepts of the SystemC language.

#### 3.2.2.3. An operational semantics for RSM

The Repetitive Structure Modeling (RSM) package of the UML MARTE profile is used to describe repetitive computations and topologies (e.g., data-parallel algorithms, grids of processing units) in an embedded system. In Gaspard2, the concepts provided by this package are of prime importance for the specification of data-intensive applications. A formal semantics [82] has previously been defined for the Array-OL language, which is the basis for the definition of RSM. We proposed a new formal semantics for RSM which, unlike [82], is operational. Execution semantics are rarely taken into account in the definition of UML profiles. This raises several serious correctness issues about the manipulation of models defined with these profiles. The aim of our new semantics [100] is to address this need by helping to understand the behavior and execution of models specified with RSM concepts in UML MARTE.

#### 3.2.2.4. Clock-based modeling of embedded system behavior

The concepts defined in the RSM package of MARTE allow one to suitably describe data intensive algorithms [70] [69]. In order to add more details about the system functional behavior, logical clocks are associated with components to describe the expected rates at which data should be processed. The Time subprofile of MARTE is used to model this rate information.
It offers rich expressivity for describing both logical and physical time aspects [74]. The rate constraints are expressed using the CCSL package of MARTE in the form of clock constraints. We refer to these clock constraints as functional clock properties.

Figure 2. Overview of the transformation chains.

The physical resources that implement the data intensive algorithms are specified in MARTE. For each resource, hardware IPs are deployed in order to refine the models towards a specific technology. At this level, we extract information concerning the speed of each processor, represented by its frequency. We synthesize new clocks that represent the periods of the clock cycles for each processor involved in the execution. All clocks are related to an ideal clock. The instants of the ideal clock occur often enough to capture any instant of the processor clocks. We refer to these clock specifications as physical clock properties.

Since application functionality and hardware architecture are modeled independently in Gaspard2, the allocation phase bridges these two different views in order to map functionality onto the associated physical resources. In terms of clocks, this allocation is expressed as the mapping of functional clock properties onto physical clock properties, according to a particular mapping algorithm. The result of such an allocation is a new set of clocks reflecting the simulation of the temporal behavior of the system during execution. We refer to these clock descriptions as simulation clock properties. They can be used for relevant system analyses.

#### 3.2.2.5. High-level modeling and exploration of non functional properties

We have proposed an approach for high-level modeling and exploration of non functional properties.
Our work proposed a Model Driven Engineering (MDE)-based approach to integrate non functional requirements for systems on chip, and defined metamodels that allow the integration of external optimization tools in the Gaspard2 environment. The designer creates the application and architecture models at a high level, then decides how to allocate application functions onto hardware components. This decision depends essentially on the non functional properties of both the software and hardware components, so these requirements must be expressed. The proposed methodology uses models enriched with non functional properties to drive the optimization of resource allocation.

#### 3.2.2.6. HPF towards Marte

Concerning the expressive power of the MARTE RSM subprofile that we have defined, we have studied its data and computation distribution capabilities. We have proved that the MARTE «distribute» stereotype is at least as expressive as the well-known High Performance Fortran (HPF) data distribution. The proof is constructive: starting from an ALIGN and a DISTRIBUTE HPF directive, we build a MARTE «distribute».

#### 3.2.2.7. MARTE extensions for reconfigurable based systems

Reconfigurable FPGA-based System-on-Chip (SoC) architectures are increasingly becoming the preferred solution for implementing modern embedded systems. However, due to the tremendous amount of hardware resources available in these systems, new design methodologies and tools are required to reduce their design complexity. In previous work, we provided an initial contribution to the modeling of these systems by extending the MARTE profile to incorporate significant design criteria such as power consumption. In its current version, MARTE lacks dynamic reconfiguration concepts, even though they are necessary to model and implement rapid prototypes of complex systems.
Our objective is to define all the concepts needed for dynamic reconfiguration, such as configuration latency and number of resources. Afterwards, these concepts will be integrated into MARTE to obtain an extended and complete profile, which can be called Reconfigurable MARTE (RecoMARTE). Our current proposals allow us to model fine-grain reconfigurable FPGA architectures with an initial extension of the MARTE profile that models dynamic reconfiguration at a high level of description. Since a controller is essential for managing a dynamically reconfigurable region, we modeled a state machine at a high abstraction level using UML state machine diagrams. This state machine is responsible for switching between the available configurations. As future work, we will analyze the Xilinx reconfigurable design flow from design partitioning to the bitstream generation stage. This is a starting point for understanding how to generate configuration files. Then, we will extract relevant data to define our own design flow.

#### 3.2.2.8. Traceability

We use the transformation mechanism to assist a tester in the mutation analysis process dedicated to model transformations. Mutation analysis aims to qualify a set of test models. More precisely, errors are voluntarily injected into the transformation, and the ability of the test model set to highlight these errors is analyzed. If the number of highlighted errors is too low, *i.e.* if the test model set is not sufficiently qualified, new models have to be added in order to raise the quality of the set [108]. Our approach relies on the hypothesis that it is easier to modify an existing model than to create a new one from scratch. The local trace, coupled with a mutation matrix, helps the tester identify adequate test models and the relevant parts to modify in order to improve the test data set. We propose a semi-automated approach that can automatically generate new test models in some cases and efficiently assist the testers in other cases [77].
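The qualification loop described above can be sketched in a few lines. This is only an illustration under assumptions, not the team's tooling: the layout of the mutation matrix (one row per test model, one boolean per mutant), the function names and the 0.8 threshold are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mutation matrix: for each test model, one flag per mutant
# telling whether that model highlights (kills) the injected error.
MutationMatrix = Dict[str, List[bool]]


def killed_flags(matrix: MutationMatrix) -> List[bool]:
    """Return, per mutant, whether at least one test model kills it."""
    rows = list(matrix.values())
    if not rows:
        return []
    return [any(column) for column in zip(*rows)]


def mutation_score(matrix: MutationMatrix) -> float:
    """Fraction of mutants killed by the whole test model set."""
    flags = killed_flags(matrix)
    return sum(flags) / len(flags) if flags else 0.0


def live_mutants(matrix: MutationMatrix) -> List[int]:
    """Indices of the mutants that no test model kills yet."""
    return [i for i, killed in enumerate(killed_flags(matrix)) if not killed]


def is_qualified(matrix: MutationMatrix, threshold: float = 0.8) -> bool:
    """Assumed criterion: qualified when the score reaches the threshold."""
    return mutation_score(matrix) >= threshold
```

With such a matrix, the live mutants point to the errors that the current set misses, which is where an existing test model would be modified or a new one added.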
#### 3.2.2.9. Transformation migration after metamodel evolution

Metamodels evolve for several reasons, such as design refinement and changes in software requirements. When this happens, transformations defined in terms of those metamodels may become inconsistent, and migration becomes necessary. Due to the lack of methodological support, transformation migration is mostly ad hoc and performed manually. Besides, the growing complexity and size of transformations make this task difficult and error prone. We have started work in this area. On the one hand, we specify transformation consistency by defining the relationship between a transformation and its metamodels, which we call domain conformance. On the other hand, we propose a transformation migration process that describes the set of tasks to be completed in order to re-establish consistency after metamodel evolution [107], [116].

#### 3.2.2.10. Model transformation towards SystemC-PA

The buffered strategy developed for the transformation chain towards pThread has been kept to simulate the behavior of the application for the SystemC-PA simulation target. Mapped tasks are associated with threads, which are themselves run on SystemC processing modules. Most of the thread contents (concepts, transformation and code generator) were reused and coupled with the SystemC contents dedicated to the architecture. A new model transformation has been developed to map the threads related to the application onto the SystemC elements related to the architecture. The data accesses in the new SystemC-PA target are triggered when the buffers (Pthread mechanisms) are requested. Those accesses are forwarded to the architecture through the TLM2 communication channels of the processors running the thread. The resulting transformation chain is available in the on-line Gaspard version (http://www.gaspard2.org).

### 3.2.3. Gaspard2 for avionic hybrid test platform design

The emergence and maturity of FPGA circuits for distributed and reconfigurable architectures offer the opportunity to explore real time problems in the field of avionic systems. The FPGA is becoming a de facto major processing element, on a par with general-purpose CPUs. As of now, the FPGA is widely used as an I/O component that connects the real equipment to the host CPU. Among the main features mapped onto the FPGA in the original architecture are the fast serial link and RAM IPs (Intellectual Property), which are needed to ensure communication between CPU and FPGA. Additionally, the Base Time IP is needed for global system synchronization. This minimal FPGA-based configuration can be duplicated several times and connected together to build a bigger test system or a complete simulator.

Eurocopter's expectation for the above-described architecture is to prototype some models that are eligible to be relocated into the FPGA. The objective is to increase the performance of these models and to reduce communication latencies by embedding the different parts in the same chip. To do so, we studied in this first year a real avionic test loop in order to extract the complex models that will be implemented in the FPGA. Different hardware model configurations have been explored to reach an optimal, well-balanced global system using the Xilinx ML403 Virtex-4 board. Different tradeoffs between performance and FPGA resource occupation were obtained. Later, these results will be used to dynamically adapt the operation of the system according to the available resources and performance requirements.

As a second part, we used the MARTE profile to represent a hybrid system (CPU/FPGA). In the MARTE specification, an application is a set of tasks connected through ports. Tasks are considered as mathematical functions reading data from their input ports and writing data on their output ports.
This specification has been used to model the avionic test loop. In addition, MARTE allows the hardware architecture to be described in a structural way. Typical components such as HwProcessor, HwFPGA and HwRAM can be specified with their non-functional properties. We used this subset of MARTE to represent a hybrid multiprocessor architecture, whose main components are the Xeon-X3370 processor (multicore CPU) and the Xilinx Virtex-4 FPGA. Furthermore, MARTE provides the Allocate concept as well as Distribute, a concept specially crafted for repetitive structures. The latter gives a way to express the regular distribution of tasks onto a set of processors or FPGA resources. The mapping step relies on two types of distribution (timeScheduling and spatialDistibution) depending on the target hardware platform (CPU/FPGA). The different models of our avionic test loop can be mapped onto the host multicore processor, the embedded processor (Microblaze) or the hardware resources in the FPGA.

## 3.3. Model-based optimization and compilation techniques

### 3.3.1. Foundations

#### 3.3.1.1. Optimization for parallelism

We study optimization techniques to produce "good" schedules and mappings of a given application onto a hardware SoC architecture. These heuristic techniques aim at fulfilling the requirements of the application, whether they are real time, memory usage or power consumption constraints. They are thus multi-objective and target heterogeneous architectures. We aim to take advantage of the parallelism (both data parallelism and task parallelism) expressed in the application models in order to build efficient heuristics.
Our application model has some good properties that can be exploited by the compiler: it expresses all the potential parallelism of the application; it is an expression of data dependencies, so no dependence analysis is needed; it is in single assignment form; and it unifies the temporal and spatial dimensions of the arrays. This gives the optimizing compiler all the information it needs, in a readily usable form.

#### 3.3.1.2. Transformation and traceability

Model-to-model transformations are at the heart of the MDE approach. Anyone wishing to use MDE in a project sooner or later faces the question: how should the model transformations be performed? The standardization process of Query View Transformation (QVT) [111] was the opportunity for the development of transformation engines such as Viatra, Moflon or Sitra. However, since the standard was published, only a few tools, such as ATL<sup>1</sup> (a dedicated transformation tool) or Kermeta<sup>2</sup> (a general-purpose tool with facilities to manipulate models), are powerful enough to execute large and complex transformations such as those of the Gaspard2 framework. None of these engines is fully compliant with the QVT standard. To address this issue, new engines relying on a subset of the standard have recently emerged, such as QVTO<sup>3</sup> and smartQVT. These engines implement the QVT Operational language.

Traceability may be used for different purposes such as understanding, capturing, tracking and verifying software artifacts during the development life cycle [98]. A main principle of MDE is that everything is a model, so trace information is mainly stored as models. Some solutions keep the trace information in the initial source or target models [125]. The major drawbacks of this solution are that it pollutes the models with additional information and that it requires adapting the metamodels to take traceability into account.
Using a separate trace model with a specific semantics has the advantage of keeping trace information independent of the initial models [102].

<sup>1</sup> http://www.eclipse.org/m2m/atl

<sup>2</sup> http://www.kermeta.org

<sup>3</sup> http://www.eclipse.org/m2m/qvto/doc

### 3.3.2. Contributions of the team

#### 3.3.2.1. Data-parallel code transformations

We have studied Array-OL to Array-OL code transformations [83], [122], [93], [92], [94], [101]. Array-OL allows a powerful expression of the data access patterns in such applications and a complete expression of parallelism. It is at the heart of our metamodel of application, hardware architecture and association. The code transformations that have been proposed are related to loop fusion, loop distribution or tiling, but they take into account the particularities of the application domain, such as the presence of modulo operators to deal with cyclic frequency domains or cyclic space dimensions (for example, hydrophones around a submarine).

We pursue the study of such transformations with two objectives:

- Propose strategies for using such transformations in order to optimize criteria such as memory usage, minimization of redundant computations or adaptation to a target hardware architecture.
- Extend their application domain to our more general application model (instead of just Array-OL).

In 2009, the study of the interaction between the high-level data-parallel transformations and the inter-repetition dependencies (allowing the specification of uniform dependencies) was completed.
Because the ODT formalism behind the Array-OL transformations cannot express dependencies between the elements of the same multidimensional space, we proposed and proved an algorithm to take the uniform dependencies into account: starting from the hierarchical distribution of repetitions before and after a transformation, it computes the new uniform dependencies that express exactly the same dependencies as before the transformation. It all comes down to solving a system of (in)equations, interpreting the solutions and translating them into new uniform dependencies. The algorithm was implemented and integrated into the refactoring toolbox, and enables the use of the transformations on models containing inter-repetition dependencies.

In order to validate the theoretical work on high-level Array-OL refactoring based on the data-parallel transformations, we worked with Eric Lenormand and Michel Barreteau from THALES Research & Technology on a study of optimization techniques in the context of an industrial radar application. We have proposed a strategy for using the refactoring toolbox to help explore the design space, illustrated on the radar application modeled with the Modeling and Analysis of Real-time and Embedded systems (MARTE) UML profile.

#### 3.3.2.2. Multi-objective hierarchical scheduling heuristics

When dealing with complex heterogeneous hardware architectures, scheduling heuristics usually take a task dependence graph as input. Both our application and hardware architecture models are hierarchical and allow repetitive expressions. We propose a Globally Irregular, Locally Regular (GILR) combination of heuristics to take advantage of both task and data parallelism [105], and have started evaluating multi-objective evolutionary meta-heuristics in this context.
These evolutionary meta-heuristics deal with the irregular (task parallelism) part of the design [80], while we have proposed a heuristic to deal with the regular part (data parallelism) [106]. Furthermore, local optimizations (contained inside a hierarchical level) decrease the communication overhead and allow a more efficient usage of the memory hierarchy. We aim at combining the data-parallel code transformations presented above with the GILR heuristics in order to deal efficiently with the data parallelism of the application by using the repetitive parts of the hardware architecture.

The introduction of uniform inter-repetition dependencies in the data-parallel tasks of Gaspard2 has had several consequences. Besides the modification of the refactoring (see section 3.3.2.1), we have studied the compilation of such tasks. This compilation involves the scheduling of such repetitions on repetitive grids of processors and the code generation. This scheduling problem is NP-complete, and we have proposed a heuristic based on automatic parallelization techniques to compute a good schedule (efficient both in time and code size) when all loop bounds and processor array shapes are known.

#### 3.3.2.3. Transformation techniques

In the previous version of Gaspard2, model transformations were complex and monolithic, and thus hard to evolve, reuse and maintain. We therefore proposed to decompose complex transformations into smaller ones that work jointly to build a single output model [96]. These transformations involve different parts of the same input metamodel (e.g. the MARTE metamodel); their application field is localized. The localization of a transformation is ensured by defining the intermediary metamodels as deltas. A delta metamodel contains only the few concepts involved in the transformation (i.e. modified or read). The specification of the transformations only uses the concepts of these deltas.
We defined the Extend operator to build the complete metamodel from the delta, and transposed the corresponding transformations. The complete metamodel corresponds to the merge of the delta with the MARTE metamodel or an intermediary metamodel. The transformation then becomes the chaining of metamodel shifts and the localized transformation. This way of defining model transformations has been used in the Gaspard2 environment. It brought better modularity and thus better reusability across the various chains.

#### 3.3.2.4. Traceability

Our traceability solution relies on two metamodels, the Local and the Global Trace metamodels. The former is used to capture the traces between the inputs and the outputs of one transformation. The Global Trace metamodel is used to link Local Traces according to the transformation chain. The local trace also offers an alternative "view" to the common traceability mechanism, one that does not refer to the execution trace of the transformation engine. It can be used whatever the transformation language and can easily complement an existing traceability mechanism by providing finer-grained traceability [75].

Furthermore, based on our trace metamodels, we developed algorithms to ease the debugging of model transformations. Using the trace, locating an error is made easier by reducing the search space to the sequence of transformation rule calls [76].

#### 3.3.2.5. Verifying conformance and semantics-preserving model transformations

We give formal executable semantics to the notions of *conformance* and of *semantics-preserving model transformations* in the model-driven engineering framework [119]. Our approach consists in translating models and meta-models (possibly enriched with OCL invariants) into specifications in *Membership Equational Logic*, an expressive logic implemented in the Maude tool.
Conformance between a model and a meta-model is represented by the validity of a certain *theory interpretation* of the specification representing the meta-model in the specification representing the model. Model transformations between origin and destination meta-models are mappings between the sets of models that conform to those meta-models, respectively, and can be represented by rewrite rules in *Rewriting Logic*, a superset of Membership Equational Logic also implemented in Maude. When the meta-models involved in a transformation are endowed with dynamic semantics, the transformations between them are also typically required to preserve those semantic aspects. We propose to represent the notion of dynamic semantics preservation by means of *algebraic simulations* expressed in Membership Equational Logic. Maude can then be used to automatically verify conformance, and to automatically verify dynamic semantics preservation up to a bounded number of steps of the dynamic semantics. This work leads to better understood meta-models and models, and to model transformations containing fewer errors.

#### 3.3.2.6. Modeling for GPU

The model described in UML with the MARTE profile goes through a chain of several inout transformations that add and/or transform elements in the model. To add memory allocation concepts to the model, a QVT transformation based on the «Memory Allocation Metamodel» provides information to facilitate and optimize the code generation. Then a model-to-text transformation generates the C code for the GPU architecture. Before the standard is released, Acceleo is appropriate for extracting many aspects of the application and architecture models and transforming them into CUDA (.cu, .cpp, .c, .h, Makefile) and OpenCL (.cl, .cpp, .c, .h, Makefile) files.
Code generation must take into account intrinsic characteristics of GPUs such as data distribution, contiguous memory allocation, kernels and host programs, blocks of threads, barriers and atomic functions.

#### 3.3.2.7. Clock-based design space exploration for SoCs

We have previously proposed an abstract clock-based modeling of the behavior of data-intensive SoCs within the Gaspard2 framework [70] [69]. Both application functionality and hardware architecture are characterized in terms of clocks. Their allocation is then expressed as a projection of functional clock properties onto physical clock properties, according to a mapping choice. The result of such an allocation is a new set of clocks reflecting the simulation of the temporal behavior of the system during execution.

This year, this approach has been applied to the design of an H.264 encoder on a multiprocessor hardware architecture using the standard MARTE profile [71]. The resulting model has been analyzed by considering abstract clocks. In particular, it has been shown that such clocks help tackle design space exploration issues through a relevant modeling of different hardware/software mappings. The trade-off between processor frequency scaling, system functional properties and energy consumption has been addressed through different hardware IP choices. This has been achieved by qualitative reasoning on traces resulting from the scheduling of logical clocks, which capture functional properties, on physical clocks derived from processor frequencies.

#### 3.3.2.8. Optimized code generation from UML/MARTE models

Starting from the observation that some semantics (and thus some optimization possibilities) are lost when generating code in a programming language from a UML/MARTE model, a thesis co-directed with CEA LIST contributes an optimization at the model level, followed by a translation to the GENERIC intermediate representation of the gcc compilation framework in order to allow more optimization. For the moment, the focus is on code size optimization.

#### 3.3.2.9. Architecture exploration based on meta-heuristics

Some progress has been made on the use of meta-heuristics for multi-objective mapping and scheduling. In collaboration with the Dolphin project-team of INRIA Lille - Nord Europe and LIFL, we have modeled the association process of Gaspard2 as an optimization problem in order to solve it with a heuristic based on a genetic algorithm, implemented in the ParadisEO optimization framework. This new heuristic is currently being integrated into the Gaspard2 tool. Another work, comparing heuristics based on the particle swarm and genetic algorithm meta-heuristics, has been proposed in collaboration with the computer science laboratory of Oran, Algeria, as a continuation of our collaboration.

#### 3.3.2.10. Architecture exploration for efficient data transfer and storage

A major point in embedded system design today is the optimization of communication structures, memory hierarchy and global synchronizations. Such an optimization is a time-consuming and error-prone process that requires a suitable automatic approach. We proposed an electronic system level framework to explore the data transfer and storage micro-architecture and the synchronization of iterative data-parallel applications [88]. The aim is to define a methodology that can be a front-end for loop-based high-level synthesis or interconnect hardware IPs in order to build memory-centric MPSoCs.
In Gaspard2, this will make it possible to assess various mappings of Array-OL models onto different kinds of target architectures. Our solution starts from a canonical Array-OL representation and applies a set of transformations in order to infer an application-specific architecture that masks the data transfer time with the computation time. A customizable model of the target architecture, including FIFO queues and a double buffering mechanism, is proposed. The mapping of a given image processing application onto this architecture is performed through a flow of Array-OL transformations aimed at improving the parallelism level and reducing the size of the internal memories used. A method based on an integer partition is considered to reduce the space of explored transformations.

#### 3.3.2.11. Multi-objective mapping and scheduling heuristics

Mohamed Akli Redjedal, master's student at univ. Lille 1, was co-directed with Laetitia Jourdan from the Dolphin project-team of INRIA Lille - Nord Europe and LIFL. His work consisted in modeling the association process of Gaspard2 as an optimization problem in order to solve it with a heuristic based on a genetic algorithm. He modeled this multi-objective mapping and scheduling problem, and proposed a heuristic and its implementation in the ParadisEO optimization framework. A first-year master's student from the univ. of Brussels worked for 6 weeks on the model-driven export from Gaspard2 to the optimization heuristics proposed by Mohamed Redjedal.

#### 3.3.2.12. GPGPU code production

Solving large, sparse systems of linear equations « Ax=b » is a bottleneck in sequential code executing on a CPU. To solve a system arising from Maxwell's equations discretized with the Finite Element Method (FEM), a version of the conjugate gradient iterative method was implemented in both CUDA and OpenCL. The aim is to accelerate and verify the parallel code on GPUs. The first results showed a speedup of around 6 times over sequential code on the CPU.
Another approach uses an algorithm that explores the sparse matrix storage format (by rows and by columns). It did not increase the speedup, but it makes it possible to evaluate the impact of memory accesses.

#### 3.3.2.13. From MARTE to OpenCL

We have proposed an MDE approach to generate OpenCL code. From an abstract model defined using UML/MARTE, we generate compilable OpenCL code and then a functional executable application. As an MDE approach, it additionally provides a tool for project reuse and fast development by users who are not necessarily experts. This approach is an effective operational code generator for the newly released OpenCL standard. Further, although the experimental examples use a single device (one GPU), this approach provides resources to model applications running on multiple devices (homogeneously configured).

Moreover, we provide two main contributions for modeling with the MARTE UML profile: on the one hand, an approach to model simple aspects of distributed memory, i.e. communication and memory allocations; on the other hand, an approach for modeling the platform and execution models of OpenCL. During the development of the transformation chain, a hybrid metamodel was proposed for specifying CPU and GPU programming models. This allows the generation of other target languages that conform to the same memory, platform and execution models as OpenCL, such as CUDA. Based on other model-to-text templates, future work will exploit this multi-language aspect. Additionally, intelligent transformations can determine optimization levels in data communication and data access. Several studies show that these optimizations remarkably increase application performance.

#### 3.3.2.14. Formal techniques for construction, compilation and analysis of domain-specific languages

The increasing complexity of software development requires rigorously defined *domain specific modelling languages* (DSML).
Model-driven engineering (MDE) allows users to define their language's syntax in terms of *metamodels*. Several approaches for defining the operational semantics of DSMLs have also been proposed [123], [89], [73], [84], [115]. We have also proposed one such approach, based on representing models and metamodels as algebraic specifications, and operational semantics as rewrite rules over those specifications [95], [120]. These approaches allow, in principle, for model execution and for formal analyses of the DSML. However, most of the time, the executions and analyses are performed via transformations to other languages: code generation for the former, translation to the input language of a model checker for the latter. The consequence is that the results (e.g., a program crash log, or a counterexample returned by a model checker) may not be straightforward to interpret for the users of a DSML. We have proposed in [118] a formal and operational framework for tracing such results back to the original DSML's syntax and operational semantics, and have illustrated it on SPEM, a language for timed process management.

### 3.3.3. Electromagnetic modeling

The Finite Integration Technique (F.I.T) is used to compute the electromagnetic phenomena. This technique is efficient if the mesh is made of regular hexahedra. Moreover, the matrix system obtained from a regular mesh can be exploited by the parallel direct solver. In fact, by reordering the unknowns with the nested dissection method, it is possible to construct the lower triangular matrix directly on many processors without assembling the matrix system. During this year, we have used our parallel direct solver as a preconditioner for a sparse linear system coming from a FEM problem, with good efficiency.

## 3.4. HP-SoC simulation, verification and synthesis

Many simulations at different levels of abstraction are key to the efficient design of embedded systems.
The different levels include a functional (and possibly distributed) validation of the application, a functional validation of the application and architecture co-model, and a validation of a heterogeneous specification of an embedded system (a specification integrating modules provided at different abstraction levels).

SoCs are more and more complex and integrate software parts as well as specific hardware parts (IPs, Intellectual Properties). Generally, before obtaining a SoC on silicon, a system is specified at several abstraction levels. Any system design flow consists in refining, more or less automatically, each model to obtain another, starting from a functional model to reach a Register Transfer Level model. One of the biggest design challenges is the development of a robust, low-cost and fast simulation tool for system verification and simulation.

The DaRT project is concerned with the simulation at different levels of abstraction (SystemC, VHDL) of the application/architecture co-model and of the mapping/schedule produced by the optimization phase.

### 3.4.1. Foundations

#### 3.4.1.1. Abstraction levels and Transaction Level Modeling

Currently, Transaction Level Modeling (TLM) is being used in the industry to solve a variety of practical problems during the design, development and deployment of electronic systems. The TLM 2.0 standard appeared in the last few years. It consists in describing systems according to the specifications of the TLM abstraction levels. At these levels, function calls simulate the behavior of the communications between architecture components. Nowadays, this modeling style is widely used for verification, and it is starting to be used for design at many major electronic companies. Recently, many actions and challenges have been started in order to help spread TLM. Thus, several teams are working to provide designers with standard TLM APIs and guidelines, TLM platform IPs and tool support.
SystemC is the first system description language to adopt the TLM specifications. Thus, several standardization APIs have been proposed to the OSCI by all the major EDA and IP vendors. This standardization effort is now being generalized by the OSCI / OCP-IP TLM standardization alliance, to build on a common TLM API foundation. One of the most important TLM API proposals is the one from Cadence, distributed to OSCI and OCP-IP. It is intended as a common foundation for OSCI and OCP-IP, allowing protocol-specific APIs (e.g. AMBA, OCP) and describing a wide range of abstraction levels for fast and efficient simulations.

In order to keep our design flow coherent, we chose to use two significant simulation levels, PVT (Programmer's View Timed) and CABA (Cycle Accurate Bit Accurate), each of which has its own advantages. The main objectives of the PVT level are the fast verification of system functionalities and the monitoring of contentions in the interconnection network. Complementary to this level, the CABA level is used to accurately estimate the execution time and power consumption. At the PVT level, details related to the computation and communication resources are omitted. The software application is executed by an instruction-accurate Instruction Set Simulator. Transactions are performed through channels instead of signals. At the CABA level, hardware components are implemented at the cycle-accurate level for both processing and communication parts. The communication protocol and arbitration strategy are specified as well.

Simulation at the PVT level permits a rapid exploration of a large solution space by eliminating uninteresting regions from the design space exploration (DSE) process. The solutions selected at this level are then forwarded to a new exploration at the CABA level. At each level, the exploration is based on performance and power estimation tools that we developed. Code generation at both levels needs parameter specifications for execution time, power estimation and platform configurations. These parameters are specified at the deployment phase.
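The two-level exploration described above can be pictured as a coarse filter followed by a fine ranking. The sketch below is only an illustration under assumptions: the `Candidate` fields, the estimator functions `pvt_estimate` and `caba_estimate` and their cost formulas are invented placeholders, and do not reflect the actual estimation tools of the environment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical platform configuration explored during DSE."""
    processors: int
    frequency_mhz: int


def pvt_estimate(c: Candidate) -> float:
    """Placeholder fast estimate of execution time (arbitrary units)."""
    return 1000.0 / (c.processors * c.frequency_mhz)


def caba_estimate(c: Candidate) -> float:
    """Placeholder accurate estimate: adds an assumed contention penalty."""
    return pvt_estimate(c) * (1.0 + 0.05 * (c.processors - 1))


def explore(
    candidates: List[Candidate],
    fast: Callable[[Candidate], float],
    accurate: Callable[[Candidate], float],
    keep: int,
) -> Candidate:
    """Prune with the fast estimator, then rank survivors accurately."""
    if keep < 1 or not candidates:
        raise ValueError("need at least one candidate and keep >= 1")
    # Coarse level: cheap evaluation of every point, keep the best few.
    survivors = sorted(candidates, key=fast)[:keep]
    # Fine level: expensive evaluation restricted to the survivors.
    return min(survivors, key=accurate)
```

The design choice illustrated here is that the expensive estimator is called only `keep` times, whatever the size of the initial solution space.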
Given the benefits of TLM, we defined a TLM metamodel as a top-level entry point for automatic transformations to both simulation and synthesis platforms. Our TLM metamodel contains the main concepts needed for verification and design following the Cadence API proposal. However, as we are targeting multi-language simulation platforms, the metamodel is completely independent of the SystemC syntax. It is composed mainly of two parts: architecture and application. This clear separation between SW and HW parts permits easy extensions and updates of the metamodel.

- The architecture part contains all the concepts necessary to describe the HW elements of systems at TLM levels.
- The SW part is mainly composed of computation tasks. They should be hierarchical and repetitive. A set of parameters can be attached to each task in order to specify the scheduling, depending on the computation model used.

Thus this metamodel keeps the hierarchies and repetitions of both the application and the architecture. This makes it possible to keep benefiting from data parallelism as far as possible in the design flow (simulation and synthesis). In fact, the designer can choose to eliminate hierarchies when transforming the TLM model into a simulation model, and to keep them when transforming it into a synthesis model.

# 3.4.1.2. Dynamic reconfiguration - FPGA

Current FPGAs support Partial Dynamic Reconfiguration, which allows part of the FPGA to be reconfigured on the fly, hence introducing the idea of virtual hardware. Partial reconfiguration allows mutually exclusive tasks to be swapped depending on user requirements and Quality of Service needs. Such a technology makes it possible to optimize the energy consumption and the area of the system. It also allows very flexible systems, adaptable to large application classes.

#### 3.4.1.3. Verification

Our privileged basis for verification is the reactive synchronous domain.
Over the last two decades, several formal verification technologies have been provided by a very active research community in this domain. Among the available tools, we can mention efficient compilers that go beyond usual compilers in that they address more static analysis issues. There are also various model-checkers that use both symbolic and non-symbolic representations. Some of these model-checkers offer facilities that go beyond verification by enabling the synthesis of (discrete) controllers. Finally, these synchronous technologies give the opportunity, in some cases, to perform a functional simulation of the described systems.

### 3.4.2. Contributions of the team

The results of the DaRT simulation package mainly concern the PVT and CABA levels. We also propose techniques to interact with IPs specified at other levels of abstraction (mainly RTL).

#### 3.4.2.1. Co-simulation in SystemC

From the association model, the Gaspard2 environment is able to automatically produce SystemC simulation code. MDE techniques provide the transformation of the association model into the SystemC model. During this transformation, the data-parallel components are unrolled and the data dependencies between elementary tasks become calls to synchronization primitives relying on a buffered strategy. The SoC architecture is produced from the architecture model coupled with a ready-to-use component library. A processing module in SystemC simulates the behavior of the tasks mapped to a particular processor. Other modules contain the data-parallel structures and are able to answer any read/write request. The communications between tasks, and between tasks and memories, are simulated via communication modules in SystemC. These last modules produce interesting results concerning the simultaneous network conflicts and the capacity of this network for this application. A transformation chain within Gaspard2 ensures the code generation from the input model.
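To make the buffered strategy more concrete, here is a minimal sketch in which a data dependency between two tasks becomes blocking accesses to a bounded buffer. It uses Python threads and `queue.Queue` as stand-ins for the generated SystemC processes and synchronization primitives; the task bodies, the end-of-stream marker and the function names are assumptions of this example, not the code produced by Gaspard2.

```python
import queue
import threading


def producer(out_buffer, repetitions):
    """Elementary task unrolled over its repetitions: emits one value per iteration."""
    for i in range(repetitions):
        out_buffer.put(i * i)   # blocks while the bounded buffer is full
    out_buffer.put(None)        # end-of-stream marker (an assumption of this sketch)


def consumer(in_buffer, results):
    """Dependent task: the data dependency becomes a blocking read on the buffer."""
    while True:
        item = in_buffer.get()  # blocks while the buffer is empty
        if item is None:
            break
        results.append(item + 1)


def run(repetitions=8, buffer_size=2):
    """Run both tasks concurrently, connected by a buffer of the given size."""
    buffer = queue.Queue(maxsize=buffer_size)
    results = []
    threads = [
        threading.Thread(target=producer, args=(buffer, repetitions)),
        threading.Thread(target=consumer, args=(buffer, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = run()
```

The buffer size bounds the memory used by the communication: a larger buffer lets the producer run further ahead of the consumer, a smaller one forces the two tasks to alternate more tightly.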
The produced simulation code is based on the assembly of SystemC IPs. These IPs are available in the Gaspard2 library at both the TLM and CABA levels. They represent all the usual architecture components such as processors (ARM, MIPS, etc.), memories, caches, buses and NoCs.

#### 3.4.2.2. Model transformation towards Pthreads

The strategy in the previous version of the Gaspard2 framework imposed a global synchronization mechanism between all the tasks of the application. This mechanism does not allow an optimal execution to be reached. We have investigated a new strategy to overcome this problem, based on fine-grain synchronizations between the different tasks of the modeled application. For this new strategy, we use the pthread API. Each task of the UML application model is transformed into a thread. The data exchanges between the tasks are ensured by a buffer-based strategy. The best compromise between the memory used and the performance can be reached by adjusting the size of each buffer. Moreover, we have developed this strategy to facilitate its use in simulation targets such as SystemC-PA. The transformation chain towards Pthreads thus made it possible to optimize the global synchronization mechanism between all the tasks of the application provided by the previous version of Gaspard2.

### 3.4.2.3. Gaspardlib extensions

The chain towards SystemC code allows simulations at the TLM-PA level. Regarding the architecture design, the process acts as a connector between existing SystemC modules. They correspond to basic components such as memories, processors and caches. They are gathered in the Gaspardlib to be included or linked at the code compilation step. On the one hand, both application and architecture IPs have been modeled using UML so that the available components can easily be dragged and dropped into the user's model. On the other hand, we aimed at providing the most flexible design for the SystemC architecture.
The Gaspardlib allows high interoperability of our SystemC components with any other SystemC architecture. Consequently, additional SystemC modules have been integrated to extend the Gaspardlib. They come from other free simulation environments: ReSP, SocLib, Unisim.

### 3.4.2.4. Partial and Dynamic Reconfiguration (PDR) implementations

The current Gaspard2 model transformation chain to Register Transfer Level (RTL) generates two key aspects of a partially dynamically reconfigurable system: the dynamically reconfigurable region, and the code for the reconfiguration manager that carries out the switch between the different configurations of this dynamic region. For this, the MARTE metamodel has been extended to integrate concepts of UML state machines and collaborations, which help in creating mode automata semantics at the high abstraction levels. Integrating these concepts in the extended MARTE metamodel helps in the respective model-to-model transformations. Moreover, the high-level application model has several building blocks: the elementary components, each associated with several available intellectual properties (IPs). The current deployment level has also been extended to integrate the notion of "configurations", which are unique global implementations of the application functionality, each configuration comprising a different combination of IPs related to the elementary components. Using a combination of the deployment level and the introduced control semantics, a designer can change the configuration related to an application, leading to different results in terms of consumed FPGA resources, reconfiguration times, etc.

We incorporate two model-to-model transformations in our flow. The first is the UML2MARTE transformation, with integrated state machine and configuration concepts. This transformation results in an intermediate MARTE model, which is converted into an RTL model by the MARTE2RTL transformation.
The application model is converted into several implementations of a dynamically reconfigurable hardware accelerator, along with the source code for the configuration switch. Finally, the design flow has been validated through the construction of a dynamically reconfigurable delay estimation correlation module that is part of a complex anti-collision radar detection system, in collaboration with IEMN Valenciennes. The simulation results from the different configurations correspond to an initial MATLAB result, validating the different configurations. Additionally, changing the IPs related to a key elementary component in the module resulted in different reconfiguration times, which supports the methodology.

### 3.4.2.5. IP based configurable massively parallel processing SoC

A methodology and a tool chain to design and build IP-based configurable massively parallel architectures is proposed. The defined architecture is named mppSoC, massively parallel processing System on Chip. It is a SIMD architecture composed of a number of processor elements (the PEs) working in perfect synchronization. A small amount of local and private memory is attached to each PE. Every PE is potentially connected to its neighbors via a regular network. Furthermore, each PE is connected to an entry of mpNoC, a massively parallel Network on Chip that potentially connects each PE to any other, performing efficient irregular communications. The whole system is controlled by an Array Controller Unit (ACU). Our objective is then to propose a methodology to produce FPGA implementations of the mppSoC architecture. The whole mppSoC architecture with its various components is implemented following an IP-based design methodology. An implementation on FPGA, ALTERA StratixII 2s180, is proposed as a proof of feasibility. The architecture consists of general IPs (processor IPs, memory IPs, etc.) and specific IPs supplied with the mppSoC system (control IPs, etc.). Specific IPs are used as glue to build the architecture.
General IPs present a defined interface which must be respected by designers who want to produce their own IP. For this kind of IP we provide a library to ease their design. The designed architecture is configurable and parametric. In fact, to construct an mppSoC system, we assemble IPs to generate an FPGA configuration. The designer has to make different choices. First, the designer determines the different components of the architecture, for example whether it contains an irregular communication network with a defined interconnection router, a neighborhood network, or both. Since we propose a parametric architecture, the designer also has to choose some architectural parameters such as the number of PEs, the memory size and the topology of the neighborhood network if it exists. After fixing the architecture, the designer then chooses the basic IPs to be used, such as the processor IP, the interconnection network IP, etc. In this way, the user can choose the most appropriate mppSoC configuration for the application's needs.

To evaluate the proposed design methodology, we have implemented architectures of different sizes with various configurations. We have also tested some examples of data-parallel applications such as FIR, reduction, matrix multiplication, image rotation and 2D convolution. Through simulation results we can choose the most appropriate mppSoC configuration with respect to the performance metrics: execution time, FPGA resources and energy consumption. As a result, we have proposed an IP-based methodology for the construction of mppSoC systems, helping the designer to choose the best configuration for a given application. It is a first step towards mppSoC architecture exploration. Ongoing work aims at integrating mppSoC in a real application such as a video processing framework. Future work will aim at improving the proposed IP assembly methodology to construct mppSoC systems.
Our ultimate goal is to provide a complete tool to generate an mppSoC configuration, in order to help the designer in a semi-automatic architecture exploration for a given application.

#### 3.4.2.6. Caches in MPSoCs

In Multi-Processor System-on-Chip (MPSoC) architectures using shared memory, caches have an important impact on performance and energy consumption levels. When the executed application exhibits a high degree of reference locality, caches may reduce the number of shared-memory accesses and data transfers on the interconnection network. Hence, execution time and energy consumption can be greatly reduced. However, caches in MPSoC architectures raise the data coherency problem. In this context, most of the existing solutions are based either on data invalidation or on data update protocols. These protocols do not consider changes in the application behavior. This work presents a new hybrid cache-coherency protocol that is able to dynamically adapt its functioning mode according to the application needs. An original architecture which facilitates the implementation of this protocol in Network-on-Chip based MPSoC architectures has been proposed. The performance of the proposed protocol, in terms of speed-up factor and energy reduction gain, has been evaluated using a Cycle Accurate Bit Accurate (CABA) simulation platform. Experimental results, compared with other existing solutions, show that this protocol may give significant reductions in execution time and energy consumption.

# 3.4.2.7. Verification

Guaranteeing the correctness of systems is a highly important issue in the Gaspard2 design methodology. This is required at least for their validation. In order to provide the designer with the required means to cope with validation, we propose to bridge the gap between the Gaspard2 design approach and validation techniques for SoCs by using the synchronous approach and test-based techniques.
We have already defined a synchronous dataflow equational model of Gaspard2 specification concepts. The resulting model can then be used to address various correctness issues: causality analysis, which makes it possible to detect erroneous data dependencies (i.e., those which lead to cycles) in specifications; clock synchronizability analysis, when such a system model is to be considered on a deployment platform; etc.

Starting from the simulation clock properties of an embedded system (as described previously), we start an analysis of the system behavior. On the one hand, we verify whether or not the functional clock constraints specified by the designer in the application specification are met during the system execution on the considered physical resources. When these constraints are not met, the simulation clock traces can be used to reason about and find solutions that satisfy the constraints. For instance, this may amount to decreasing the speed of processors that compute data very fast or increasing the speed of processors that compute data very slowly. Any such modification of the processor speeds should always respect the functional constraints imposed by the designer. It appears in the simulation clock traces through new physical clock properties determined from the suitable processor frequencies. Another example of a solution may consist in delaying the first activation of a faster processor until an adequate time to begin the execution. Such an activation delay could be seen as minimizing the voltage/frequency.

The team's examples have highlighted some needs for a better numeric verification of synchronous programs, and we also work on improving the precision of the Signal analysis.

### 3.4.2.8. System Level Power Modeling

Due to the ongoing nano-miniaturization in chip production, power consumption is becoming a critical metric in embedded system design.
In current industrial and academic practices, power estimation using low-level CAD tools is still widely adopted. These low-level tools are, however, ill-suited to managing the architecture of modern complex embedded systems. System-level power estimation is considered a vital premise to cope with the critical design constraints. The keywords of our contribution are hybridization and decorrelation between abstraction levels. The hybridization is applied here at two levels: the granularity of the activities used to develop the power models on one side, and the considered abstraction level on the other side. While most studies focus on power estimation for a given abstraction level without overcoming the speed/accuracy trade-off, the idea is to build a hybrid power estimation tool that gathers different abstraction levels of the system to capture strictly the relevant data at each step of the power estimation process. Thus, designers build their systems by instantiating different hardware and software IPs (Intellectual Property) from existing libraries. The granularity of the power models used should be coherent with the design approach.

In this work, we develop a hybrid system-level power estimator for embedded systems. First, power models relying on the Functional Level Power Analysis (FLPA) methodology are developed. Secondly, we integrate the whole system into a fast simulation framework in order to obtain the system's power consumption data. The combination of the above parts yields a relatively fast and accurate power estimation. Our experiments, performed on an explicit embedded platform, show that the obtained power estimates are within 1% of the measurements made on the real system. In our work, we further extend the usage of higher abstraction levels to speed up the estimation with the help of multi-granularity of input data and phase sampling of the application.
In the end, the proposed power estimation is 21 times faster than the detailed simulation, with a marginal error of 1.5%.

### 3.4.2.9. Energy consumption driven dynamic reconfigurable execution model

As a continuation of our work on energy consumption estimation for Systems on Chip (SoC) at the Cycle Accurate Level using SystemC simulation, the aim of our current work is to ensure the adaptivity of SoCs to run-time changes of some operating conditions, such as consumption constraints. This adaptivity is based on the reconfigurability of SoCs implemented on FPGAs. Here, the energy consumption estimation is no longer done during simulation but during the execution of the application on the FPGA. In order to adapt to run-time changes, the system architecture has to be changed accordingly. A possible change can be, for example, to change the degree of parallelism or to change a processing algorithm in order to consume less energy. The decision to reconfigure is taken after a negotiation between consumption monitors integrated in the system. These monitors are OCP-compliant, which allows them to be easily integrated and reused for different architectures thanks to the genericity and parametrability of this standard communication protocol.

Up to now, we have started implementing simple systems on FPGA supporting dynamic reconfiguration, taking the user inputs as a criterion of reconfiguration. We also implemented some interface adapters in order to facilitate the future integration of the OCP monitors in the system. As future work, we intend to integrate the energy consumption as a criterion of reconfiguration using monitors. These monitors are supposed to take reconfiguration decisions after negotiating among themselves. Therefore, we started by studying the negotiation used in software systems such as multi-agent systems. We will adapt this to our hardware architecture on FPGA.

#### 3.4.2.10. Partial dynamic reconfiguration

Partial dynamic reconfiguration modeling [114], [113] makes it possible to generate two key aspects of a partially dynamically reconfigurable system from high-level modeled specifications: the dynamically reconfigurable region, and the code for the reconfiguration manager that carries out the switch between the different configurations of this dynamic region. Once these aspects are generated using the model transformations, it is possible to use commercial simulation and synthesis tools to implement dynamic reconfiguration in state-of-the-art FPGAs [114]. Currently the intermediate model transformation chain is being updated to make use of the newly introduced intermediate metamodels and model transformations developed by the DaRT team, in order to provide a uniform design flow. Similarly, optimizations related to RTL code generation using Acceleo are also continuing.

However, the MARTE-compliant high-level specifications lack the means to express architectural details at high abstraction levels. For this reason, an initial exploratory analysis was carried out in [86] that expands the MARTE hardware concepts to include aspects of reconfigurable architectures, and introduces aspects such as power consumption in these high-level models. These works can be described as an initial contribution to the ANR FAMOUS project. Similarly, MARTE has recently introduced the notion of 'configurations', similar to those introduced in [114]. These concepts make it possible to express system configurations in MARTE UML models, but lack guidelines and precise semantics. An overview of these concepts was presented in [112], which highlights some of the shortcomings of the present concepts and provides an alternative, as described in [114].

#### 3.4.2.11. Network on Chip synthesis

The study of Networks on Chip (NoC) is a research field that primarily addresses the global communication in Systems-on-Chip (SoC).
The selected topology and the routing algorithm play a prime role in the performance of NoC architectures. In order to handle the design complexity and meet the tight time-to-market constraints, it is important to automate most of these NoC design phases. The use of MARTE in modeling such architectures may provide designers with a set of high-level concepts to obtain compact and reusable models quickly. Thus we defined a new methodology for modeling concepts of NoC-based architectures. It aims to improve the effectiveness of the MARTE standard by clarifying some notations and extending some definitions in the standard, in order to allow the modeling of complex NoC architectures.

#### 3.4.2.12. IP based configurable massively parallel processing SoC

Our mppSoC project proposed a methodology and a tool chain to design and build IP-based configurable massively parallel architectures. An mppSoC architecture is a SIMD architecture composed of a number of processor elements, the PEs, working in perfect synchronization. Each PE is potentially connected to its neighbors via a regular network. Furthermore, each PE is connected to an entry of mpNoC, a massively parallel Network on Chip that performs efficient irregular communications. The whole system is controlled by an Array Controller Unit, the ACU. The mppSoC project aims at the design and implementation of a given mppSoC architecture to fit the requirements of a given application. The mppSoC architecture model is thus configurable and parametrizable, and our chain produces FPGA implementations of the mppSoC architecture.

Our latest contributions define a model-driven generation chain integrated in the Gaspard environment. An mppSoC UML model is defined using the MARTE profile. From this model, our chain allows the generation of the corresponding synthesizable mppSoC VHDL code, which can be directly simulated or prototyped on FPGA.
Targeting the DE2-70 FPGA board, we have been able to validate some mppSoC configurations running signal processing applications [ref]. This last work concludes Mouna Baklouti's PhD thesis [ref].

# 3.5. Formal Methods for General and Domain-Specific Languages

We are working on developing and applying formal methods to the definition, analysis, and transformation of languages. These languages include general ones like C, Domain-Specific ones (DSLs) such as Kermeta [109], Signal [99], and VHDL, and Domain-Specific modelling ones (DSMLs) such as xSPEM [78]. We use rewriting techniques embodied in the K [117] and Maude [87] semantic frameworks, abstract interpretation techniques, techniques inspired from program transformation and compilation, and refinement techniques. We often use Model-Driven Engineering (MDE) as a *lingua franca* and we believe it is a useful vessel for bringing formal methods into software engineering practice.

We fruitfully collaborate with colleagues within Inria (the Triskell team at Inria Rennes-Bretagne Atlantique and the Compsys team at Inria Grenoble Rhône-Alpes), with colleagues outside Inria (David Monniaux at Verimag, Grenoble), and with foreign colleagues (the K-framework team, located both in Iaşi, Romania and in Urbana-Champaign, USA; the University of Aleppo, Syria). We organise events (two workshops and one summer school in 2011), supervise PhD students (one started in the Fall 2001, co-supervised with the K team) and interns, and participate in PhD committees (two in 2011) and in teaching. We have obtained financial support outside Inria from the University of Lille.

# 4. Software

# **4.1. Gaspard 2**

Participants: Jean-Luc Dekeyser [correspondent], the whole DaRT team.

Gaspard2 is an Integrated Development Environment (IDE) for SoC visual co-modeling. It allows or will allow modeling, simulation, testing and code generation of SoC applications and hardware architectures.
Its purpose is to provide a single environment for all the SoC development processes:

- High level modeling of applications and hardware architectures
- Application and hardware architecture association (mapping and scheduling)
- Application refactoring
- Deployment specification
- Model to model transformation (to automatically produce models for several target platforms)
- Code generation
- Simulation
- Reification of any stage of the development

The Gaspard2 tool is based on the Eclipse [62] IDE. A set of plugins provides the different functionalities. Gaspard2 provides an internal engine to execute transformation chains. This engine is able to run either QVT (OMG standard) or Java transformations. It is also able to run model-to-text transformations based on Acceleo [64]. The Gaspard2 engine is defined to execute models conforming to an internal transformation chain metamodel. A GUI has been developed to specify transformation chain models by drawing them.

For the final user, application, hardware architecture, association, deployment and technology models are specified and manipulated by the developer through UML diagrams, and saved by the UML tool in an XMI file format. Gaspard2 manipulates these models through repositories (Java interfaces and implementations) automatically generated from the Ecore specification. Several transformation chains are provided with Gaspard2 to target, from UML models, several execution or simulation platforms (OpenMP, OpenCL, Pthread, SystemC, VHDL, ...). The input language is based on the MARTE UML profile.

A tool to generate SIMD configurations derived from the mppSoC model was developed. It automatically generates the VHDL code from a specification modeled at a high abstraction level (UML model using the MARTE profile), based on the mppSoC IP library. The developed tool helps the user choose a SIMD configuration adapted to the application's needs. It has been integrated in the Gaspard environment.
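The idea of a transformation chain executed by an engine, as described above, can be sketched in a few lines. This is a deliberately simplified illustration: models are plain Python dictionaries, and the three steps (`flatten_repetitions`, `map_to_threads`, `generate_text`) are hypothetical transformations invented for the example; they are not the QVT, Java or Acceleo transformations shipped with Gaspard2.

```python
def flatten_repetitions(model):
    """Model-to-model step (hypothetical): expand each repeated task into instances."""
    tasks = []
    for task in model["tasks"]:
        for index in range(task.get("repetitions", 1)):
            tasks.append({"name": f"{task['name']}_{index}", "kernel": task["kernel"]})
    return {"name": model["name"], "tasks": tasks}


def map_to_threads(model):
    """Model-to-model step (hypothetical): annotate every task instance with a thread id."""
    tasks = [dict(task, thread=i) for i, task in enumerate(model["tasks"])]
    return {"name": model["name"], "tasks": tasks}


def generate_text(model):
    """Model-to-text step (hypothetical): emit one line of pseudo-code per task."""
    lines = [f"// application {model['name']}"]
    for task in model["tasks"]:
        lines.append(f"spawn(thread={task['thread']}, kernel={task['kernel']});")
    return "\n".join(lines)


def run_chain(model, chain):
    """Apply each transformation in order; the output of one step feeds the next."""
    for step in chain:
        model = step(model)
    return model


source_model = {
    "name": "filter",
    "tasks": [{"name": "fir", "kernel": "fir_kernel", "repetitions": 2}],
}
code = run_chain(source_model, [flatten_repetitions, map_to_threads, generate_text])
```

Because the chain is itself plain data (a list of steps), targeting another platform amounts to swapping the last steps while reusing the earlier ones.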
**Gaspard2 as an educational resource.** The Gaspard2 platform was one of the topics taught in the courses on embedded systems at Telecom Lille and in a Master 2 (TNSI) lecture, "Design tools for embedded systems", at the University of Valenciennes. These lectures focused, respectively, on the possibility of generating several targets from a subset of the MARTE profile and on the ability to target system-on-chip architectures at the TLM level. Furthermore, the model-driven engineering characteristics of Gaspard2 are covered in detail in the Software engineering lecture at Polytech Lille and in the research Master at the University of Lille.

- See also the web page http://www.gaspard2.org/
- Inria software evaluation: A-2, SO-4, SM-2, EM-1, SDL-2, DA-4, CD-4, MS-4, TPM-4
- Version: 2.1.0

# 4.2. Papyrus

Participants: Cédric Dumoulin [correspondent], Amine El Kouhen, Rahma Yangui.

- See also the web page http://www.eclipse.org/papyrus/
- Software data: plugins number > 150, lines number > 1 million
- Inria software evaluation: A-5, SO-4, SM-4, EM-4, SDL-5, DA-4, CD-4, MS-4, TPM-3
- Version: 0.9.0

# 4.3. Model Driven Factory

Participants: Alexis Muller, Anne Etien [correspondent], Thomas Legrand.

MDFactory is a Model Driven Engineering environment to design, develop and run software production chains. This tool supports our approach based on localized transformations and our Extend operator [96]. It provides a graphical editor to build such production chains by drag and drop from a reusable transformation library. MDFactory is based on the Eclipse platform and the Eclipse Modeling Framework (EMF). It is used to build the Gaspard2 integrated transformation chains. This software will be transferred to the start-up company Axellience.

- Software data: plugins number around 75
- Evaluation of the software: A-4, SO-4, SM-2, EM-3, SDL-3, DA-4, CD-3, MS-2, TPM-2
- Version: 1.0

### 4.4. OMEGSI

Participant: Amen Souissi [correspondent].

OMEGSI is an integrated development environment (IDE) for collaborative portals. It allows business process-centered modeling, process simulation, process optimization and full code generation for collaborative portals. The OMEGSI tool is based on the Eclipse IDE. A set of plugins provides the different functionalities. OMEGSI provides an internal engine to execute interactive transformation strategies. This engine (TranS) is written as a QVT transformation and is able to run any transformation type (QVT, Java, Acceleo...). Currently one transformation strategy is provided with OMEGSI to target, from a UML model, the Dolmen execution platform. The input language is based on the MACOP (Modeling and Analysis of Collaborative Portal) UML profile. The fully functional OMEGSI Beginning version is still available on the Ecreall website.

- See also the web page http://omegsi.ecreall.com/
- Inria software evaluation: A-3, SO-3, SM-1, EM-2, SDL-4, DA-4, CD-4, MS-4, TPM-4
- Version:

# 5. New Results

# 5.1. Co-Modeling for HP-SoC with MARTE

# 5.1.1. Diagonal mesh modeling with MARTE

As a continuation of this work on modeling at system level, a methodology for modeling concepts of NoC-based architectures is proposed, especially the modeling of all kinds of topologies (regular, irregular or hierarchical) and routing algorithms. This contribution includes VHDL code generation. In addition, we proposed a VLSI implementation of a new NoC topology called diagonal mesh, which is designed to offer a good tradeoff between hardware cost and theoretical quality of service (QoS). This NoC is based on a new router architecture called FeRoNoC (Flexible, extensible Router NoC).

## 5.1.2. MARTE extension for reconfigurable hardware models

Reconfigurable Systems-on-Chip (RSoC), mainly FPGAs, offer several advantages such as flexibility, adaptivity and especially their capability to switch between several implementations at run-time, i.e., PDR.
The PDR feature requires multiple run-time changes in the RSoC, such as:

- QoS factors: changes in the executed functionalities due to designer requirements, or changes due to resource constraints of the targeted hardware/platforms.
- Changes due to other environmental criteria such as communication quality, time and area consumed for reconfiguration, and energy consumption.

In previous work [86], we provided an initial contribution to the modeling of these systems by extending the UML MARTE profile to incorporate significant design criteria such as power consumption. Furthermore, the high flexibility of RSoCs implies a high design complexity for the control of such systems. This makes designing a robust control for managing reconfiguration a laborious task. In [25], we present a high-level design approach using UML MARTE for modeling dynamic reconfiguration controllers. Our proposed controller is based on distributed monitoring of run-time changes and distributed decision making. Our approach increases flexibility and design reusability compared to a centralized solution.

Indeed, in its current version, the UML MARTE profile lacks dynamic reconfiguration concepts and requirements for the reconfiguration control mechanism, even though the latter are necessary to model and implement rapid prototypes of complex systems. We can only model, at high abstraction levels, a state machine which is responsible for switching between the available configurations. So we define a new design methodology using the proposed version of RecoMARTE (extended MARTE) to model PDR concepts at different abstraction levels, mainly architecture (structural and physical models) and allocation (software to hardware allocation (Sw/Hw Allocate) and hardware to hardware allocation (Hw/Hw Allocate)). We also define the necessary requirements for the reconfiguration control mechanism in order to manage reconfiguration at every design level.
In addition, our solution makes it possible to describe global contracts and constraints for combining automata. As future work, we plan to carry out model transformations to enable automatic code generation of configuration files. The code can then be used as input for commercial tools for final FPGA synthesis.

### 5.1.3. Comparison of SaC and ArrayOL for parallelism expression

In this joint work with the University of Hertfordshire, we compare and analyse two schemes for generating GPU code from high-level array descriptions. One of them is a transformation mechanism for mapping an image/signal processing domain-specific language, ArrayOL, to OpenCL. The other one is a transformation route for mapping a high-level general-purpose array processing language, Single Assignment C (SaC), to CUDA. Using a real-world image processing application as a running example, we demonstrate that, despite being general purpose, the array processing language can be used to specify complex array access patterns generically. The performance of the generated CUDA code is comparable to that of the OpenCL code created from the domain-specific language.

### 5.1.4. Gaspard Modeling Improvements

Gaspard2 is the IDE proposed by the DaRT team. Its usage can be painful for beginners as well as for experts. We try to improve the usage of Gaspard in different ways:

- By allowing modifications at any model level, and propagating the modifications to the higher and lower models (Amen Souissi).
- By providing missing diagrams in Papyrus (Amine El Kouen).
- By customizing the Gaspard User Interface (UI). Modeling in Gaspard is done with the Papyrus Modeler. We participate in the Papyrus development, which allows us to propose some customization tools. These tools are used to provide a modeling UI better suited to embedded system co-modeling. This work is done by Rahma Yangui (INRIA engineer).
- By dynamically adapting the UML modeler environment according to the steps of the modeling process (Amine El Kouen's thesis).
  This guides users through their development process and offers a simplified UI oriented to the current development step.

We have also migrated from Papyrus I to Papyrus Eclipse.

## 5.2. Formal Methods for general-purpose and domain-specific languages

### 5.2.1. Formal Semantics for Domain Specific Modeling Languages

Domain-Specific Modelling Languages (DSMLs) are languages dedicated to modelling in specific application areas. Recently, the design of DSMLs has become widely accessible to engineers trained in the basics of Model-Driven Engineering (MDE): one designs a metamodel for the language's abstract syntax; then, the language's operational semantics is expressed using model transformations over the metamodel.

The democratisation of DSML design catalysed by MDE is likely to give birth to numerous languages. One can also reasonably expect that there will be numerous errors in those languages. Indeed, getting a language right (especially its operational semantics) is hard, regardless of whether the language is defined in the modern MDE framework or in more traditional ones.

Formal approaches can benefit language designers by helping them avoid or detect errors. But, in order to be accepted by non-expert users, formal approaches have to operate in the background of a familiar language design process, such as the MDE-based one mentioned above.

In 2011 we migrated from the approach of [21], which uses the general Maude semantic framework, to the K semantic framework, which is more specific to language definition, in order to formalise the basic MDE ingredients used in DSML definition: models, metamodels, and model transformations. We have implemented a prototype tool that takes as input any DSML described using MDE, and generates formal K definitions for the language's syntax, static semantics, and operational semantics. Since the definitions are executable, we get execution and formal verification engines for free [44].
A subproject of this work has been a formal definition of a substantial fragment of the OCL language [45].

### 5.2.2. A new abstraction for Signal programs, and improvement of the compilation process

In this work we propose a sound abstraction for an efficient static analysis of synchronous programs describing multi-clock embedded systems in Signal. This abstraction combines the Boolean theory and numeric interval approximation to adequately address clock relations defined as combinations of logical and numerical expressions. Through a few examples, we show how the proposed solution is used to determine the absence of reaction, captured by empty clocks; mutual exclusion, captured by two or more clocks whose associated signals never occur at the same time; or hierarchical control of component activations via clock inclusion. We also show that this analysis improves the quality of the code generated automatically by the Signal compiler, e.g., code with a smaller footprint, or code executed more efficiently thanks to optimizations enabled by the new abstraction [38].

### 5.2.3. Using bounded model checking to focus fixpoint iterations

Two classical sources of imprecision in static analysis by abstract interpretation are widening and merge operations. Merge operations can be done away with by distinguishing paths, as in trace partitioning, at the expense of enumerating an exponential number of paths. In this work, we describe how to avoid such systematic exploration by focusing on a single path at a time, designated by SMT-solving. Our method combines well with acceleration techniques, thus doing away with widenings as well in some cases. We illustrate it over the well-known domain of convex polyhedra [40].

### 5.2.4. A formal definition of a compiler for the Kermeta metamodeling language in K

Kermeta [109] is a DSL designed as a kernel for metamodel engineering. It unifies metamodeling, constraints, semantics and transformation features into a statically typed language.
It is object-oriented and allows for metamodeling features such as attributes, associations, and multiplicities. It also includes design-by-contract, aspect-oriented features, and genericity. This makes Kermeta a large and complex language: indeed, combining all these features into one language may easily lead to inconsistencies.

Christophe's postdoctoral work, starting in September 2010, has been to formally specify Kermeta. He did so via a specification of a compiler for Kermeta in K [117]. K formal specifications are executable; hence, Christophe's compiler can be used to actually compile Kermeta programs. The compiler is completely self-contained and generates bytecode for an abstract machine that is also formally specified in K.

This work led to the discovery of several errors and inconsistencies in Kermeta's manual and existing interpreter. The errors have been reported to the Kermeta designers (Triskell project-team at Inria Rennes-Bretagne Atlantique), who, as it turns out, are also writing a compiler for Kermeta in the traditional, informal way. We are planning to let them benefit from the experience we gained in formal compilation.

### 5.2.5. A generic approach and tool for tracing executions back to a DSML's operational semantics

Model-driven engineering allows users to define abstract syntaxes for their own DSMLs in terms of metamodels. Several approaches for defining operational semantics for DSMLs have also been proposed. These approaches allow, in principle, for model execution and for formal analyses of the DSMLs. However, most of the time, the executions and analyses are performed via transformations to other languages: code generation for execution, and translation to the input language of a model checker for analysis. The consequence is that the results (e.g., a program crash log, or a counterexample returned by a model checker) may not be straightforward to interpret by the users of a DSML.
We propose a formal and operational framework for tracing such results back to the original DSML's syntax and operational semantics. We implement the approach in a generic tool written in Kermeta, and illustrate it on the xSPEM language, a timed language for expressing the execution of activities constrained by time, resources, and precedences [31].

## 5.3. Optimization and compilation techniques

### 5.3.1. Generated Code Optimization

Performing a model-to-source transformation, whereby a high-level language is mapped to CUDA or OpenCL, is an attractive option. In particular, it makes it possible to harness the power of GPUs without any expertise in GPGPU programming. In this work, we add a new compilation option to the Gaspard2 transformation chain UML2OpenCL to detect shareable data zones. The tilers from ArrayOL, which express the data parallelism of repetitive tasks, are analyzed at compile time to create areas of shared data. Identifying these areas is crucial because it allows us to load data into shared memory areas that have high throughput. Consequently, automatically generated programs are expected to reach a performance comparable to that of well-written manual programs.

### 5.3.2. Methodology to generate OpenCL code from MARTE models

In order to reduce design complexity, we propose an approach to generate code for the OpenCL API, an open standard for parallel programming of heterogeneous systems. This approach is based on Model Driven Engineering (MDE) and on the Modeling and Analysis of Real-Time and Embedded Systems (MARTE) standard proposed by the Object Management Group (OMG). The aim is to provide non-specialists in parallel programming with the means to implement their applications. Moreover, concepts like reuse and platform independence are present. Once we have designed an application and an execution platform architecture, we can reuse the same project to add more functionalities and/or change the target architecture.
Consequently, this approach helps industry meet time-to-market constraints. The resulting code, for the host and the compute devices, consists of compilable source files that satisfy the specifications defined at design time.

### 5.3.3. Profiling into Models

Regarding model fine-tuning, we propose integrating software-profiling results into higher-level specification models [56]. The aim is to optimize the models and, consequently, the generated code. The model optimization approach relies on the Gaspard2 branch dedicated to code generation for OpenCL and GPUs [58]. We offer software execution feedback, based on model transformation traceability [75], to model designers. This feedback enables designers to tune their models in order to improve software performance even if they do not have in-depth knowledge of the running platform (GPU). First, the code is generated from an initial model using Gaspard2. The resulting code is then executed within an existing profiling environment. Afterwards, profiling results are delivered directly to the designer as annotations in the model.

Basically, we move two types of information up to the model, using traceability. The first type comes directly from the profiler, e.g. processor occupancy, and is attached to specific regions in the model, highlighting the regions that require tuning. The second type corresponds to the results of an expert system analysis that we provide. Information of this second type is delivered to designers as advice in the model annotations. The expert system generates this advice from platform features and running results. For example, it can suggest changing the shape of a task in order to optimize processor occupancy. The more we feed the knowledge base and engine of the expert system, the better the advice it is able to give.

The model optimization relies on the hypothesis that the high-level models are error free.
Since these models are complex, it is difficult for designers to get them right the first time. We propose a new approach enabling model designers to debug their models. For this purpose, we offer quick and automatic code instrumentation to the model designer. As for model optimisation, we take advantage of model transformation traceability to keep the link between models and software execution and to provide execution information feedback. Hence, the information produced in the running environment during software execution is moved up directly onto the models, allowing model designers to verify the behavior of their software directly on the high-level models.

### 5.3.4. Static Analysis of Polychronous Specifications with SMT Theory

As opposed to single-clocked synchronous programming paradigms, the polychronous formalism allows the specification of concurrent data flow computations on signals such that various data flows can evolve asynchronously with respect to each other. We formulated the clock analysis in Signal compilation [38] and the detection of false loops in MRICDF as a decision problem in Satisfiability Modulo Theory (SMT) [30] [59]. Due to recent interest in SMT solvers, a number of efficient solvers are available which offer greater expressiveness in dealing with non-Boolean constraints and allow us to discern false loops from realizable causalities in reasonable computation time. We demonstrated that several polychronous specifications, rejected by current compilers because they are unable to identify only true causal loops, can be synthesized as correct sequential embedded software.

### 5.3.5. Programming functional and real-time aspects simultaneously

An embedded system is usually required to respect real-time constraints related to physical constraints, either those of its environment or those of the physical devices it controls.
First, it is often multi-periodic since its devices have different physical characteristics and must therefore be controlled at different rates. Second, the system must respect deadline constraints, which may correspond for instance to a maximum end-to-end latency requirement between observations (inputs) and the corresponding reactions (outputs). A correct implementation must respect all the real-time constraints and must also be functionally deterministic, meaning that the outputs of the system are always the same for a given sequence of inputs. Current practice often deals with these two aspects separately, while our objective is to deal with them simultaneously.

To this end, we must first introduce real-time primitives at the programming language level. We continued previous work on the PRELUDE language [19], which provides such primitives in a synchronous data-flow language. We produced a complete end-to-end framework for the design and the implementation of embedded systems on a symmetric multicore: the PRELUDE-SCHEDMCORE toolset [32]. We recently started a Master research project to study how real-time aspects could be introduced in more traditional programming paradigms with the Scala language.

The PRELUDE compiler translates a program into a set of dependent periodic tasks. We proposed a new dynamic priority-based scheduling policy capable of dealing with the extended precedence constraints (constraints between tasks of different periods) of such systems in [36], [48].

Finally, as PRELUDE's semantics formally defines both the functional and the temporal behaviour of a system, we studied temporal formal verification in [46].

### 5.3.6. Chaining Localized Model Transformation

Usually, two transformations can only be chained if the output metamodel of the first one is included in the input metamodel of the second one.
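The usual inclusion condition can be sketched in a few lines of Python, under the simplifying assumption that a metamodel is reduced to the set of its concept names. The function names and the concept names used in the demo are hypothetical and only serve as an illustration.

```python
def can_chain(first_output: set, second_input: set) -> bool:
    """Classical chaining condition: the output metamodel of the first
    transformation must be included in the input metamodel of the second."""
    return first_output <= second_input


def missing_concepts(first_output: set, second_input: set) -> set:
    """Concepts produced by the first transformation that the second
    transformation cannot accept."""
    return first_output - second_input


def demo():
    # Hypothetical metamodels, reduced to sets of concept names.
    t1_output = {"Task", "Port", "Connector"}
    t2_input = {"Task", "Port", "Connector", "Processor"}
    t3_input = {"Task", "Port"}
    return (
        can_chain(t1_output, t2_input),        # every concept is accepted
        can_chain(t1_output, t3_input),        # "Connector" is not accepted
        missing_concepts(t1_output, t3_input),
    )
```

Reducing a metamodel to a set of names ignores attributes, associations and typing, so this is a far coarser check than a real type analysis.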
This compliance issue forces designers to build either tailored fine-grain model transformations for a dedicated chain or large and complex transformations. In both cases, transformations are not reusable and hardly maintainable. In order to solve this problem, we have introduced localized transformations, which apply to a (typically very small) subset of the input metamodel of a transformation. Each localized transformation is designed and implemented to accomplish a specific transformation task, and involves and is applicable to only a few concepts. Contrary to traditional transformations, the input and output metamodels of these transformations are not disjoint, so new chaining constraints have to be defined. We have thus defined new chaining constraints based on a type analysis to specify when two transformations can be chained in one, both or any order [96]. In some cases, this analysis concludes that the transformations can be chained in both orders but, for some input models, the two output models resulting from the two chainings are not the same. We have introduced an intermediary abstraction level, independent of any transformation language, that focuses on read, modified, created and deleted metaelements. We are pursuing our investigations with this new abstraction level.

## 5.4. Green computing on SoC

### 5.4.1. Correct and Energy-Efficient Design of a Multimedia Application on SoC

We studied the design and analysis of multimedia applications such as the JPEG encoder on multiprocessor architectures [55] [24] [13]. A model-based approach was adopted by using the UML MARTE specifications [54]. An abstract clock analysis has been proposed to deal with the correctness of system behaviors and to find the most suitable execution platform configurations regarding performance and energy consumption. Our approach offers a rapid and reliable design space analysis, which is crucial when implementing complex systems [37].

### 5.4.2. Design Space Exploration for Efficient Data Intensive Computing on SoCs

Finding efficient implementations of data intensive applications, such as radar/sonar signal and image processing, on a system-on-chip is a very challenging problem due to the increasing complexity and performance requirements of such applications. One major issue is the optimization of the data transfer and storage microarchitecture, which is crucial in this context. We proposed a comprehensive method to explore the mapping of high-level representations of applications onto a customizable hardware accelerator [52]. The high-level representation is given in a language named Array-OL. The customizable architecture uses FIFO queues and a double buffering mechanism to mask the latency of data transfers and external memory accesses. The mapping of a high-level representation onto a given architecture is achieved by applying loop transformations in Array-OL. A method based on integer partition is used to reduce the space of explored solutions. Our proposition aims at facilitating the inference of adequate hardware realizations for data intensive applications. It is illustrated on a case study consisting of the implementation of a hydrophone monitoring application.

### 5.4.3. Power Estimation

Within the context of the OPEN-PEOPLE project, we aim at addressing the power estimation challenges of embedded system design with a new approach, combining Functional Level Power Analysis (FLPA) with advanced SystemC Transaction Level Modeling (TLM) simulation techniques, in order to formally prove qualitative and quantitative properties of the final system power estimation. This approach requires the construction of power models from FLPA for different embedded boards (FPGA and ASIC) and the building of a system-level simulation environment for the analysis of the power models and the proof of properties of the simulated system [42].
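As a rough illustration of how a functional-level power model can be fed by activity figures collected during simulation, here is a small Python sketch. It assumes a linear model, which is an illustrative simplification and not the model built in the project; the counter names, coefficients and static power value are made up for the example.

```python
def activity_rates(counters: dict, cycles: int) -> dict:
    """Turn raw event counters (as a simulator could report them) into
    per-cycle activity rates."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return {name: count / cycles for name, count in counters.items()}


def estimate_power_mw(rates: dict, coefficients_mw: dict, static_mw: float) -> float:
    """Assumed linear power model: static power plus one weighted term per
    activity rate. Coefficients would normally come from measurements."""
    dynamic = sum(
        coefficients_mw[name] * rates.get(name, 0.0) for name in coefficients_mw
    )
    return static_mw + dynamic


def demo() -> float:
    # Hypothetical counters collected over 1000 simulated cycles.
    counters = {"instructions": 800, "cache_misses": 50}
    rates = activity_rates(counters, cycles=1000)
    # Hypothetical coefficients, in mW per unit of activity rate.
    coefficients = {"instructions": 200.0, "cache_misses": 400.0}
    return estimate_power_mw(rates, coefficients, static_mw=100.0)
```

The split between the two functions mirrors the coupling described in the text: the simulator supplies activities, the power model turns them into an estimate.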
As a main contribution, we propose a new hybrid system-level power consumption estimation methodology for complex embedded systems [41]. A key word in our contribution is hybridization between abstraction levels. Almost all previous studies focus on power estimation at a given abstraction level, without overcoming the wall of the speed/accuracy trade-off. The idea here is to build a hybrid power estimation tool that combines Functional Level Power Analysis (FLPA) for hardware power modeling and the Transaction Level Modeling (TLM) simulation technique for rapid system prototyping and fast power estimation. Basically, FLPA is used for processor power modeling. In the frame of this work, it will be extended to cover the other hardware components used in a MultiProcessor System-on-Chip (MPSoC), such as the memory and the reconfigurable logic. After that, we go further in terms of scalability to target heterogeneous multiprocessor architectures. The functional power estimation part is coupled with a fast SystemC simulator in order to obtain the micro-architectural activities needed by the power models, which allows us to reach a better trade-off between accuracy and speed [43].

## 5.5. Dynamic reconfiguration for HP-SoC

### 5.5.1. Context switching for volatile IP

Dynamic reconfigurations require configuration decisions from smart controllers. Such a decision implies saving the context of an existing IP or switching from one IP to another (loading a new bitstream). The store/restore operations can be managed by the operating system or by a dedicated hardware component. In this work, a new model for hardware IP context storage and management is proposed. The approach is based on a flexible hardware wrapper which can make an IP reconfigurable. These wrappers contain a naming system supporting efficient runtime context switching.

### 5.5.2. A generic broadcast network for HP-SoC architecture

The **hNoC** model proposes a specific network on chip dedicated to the massively parallel architecture SCAC. This model is composed of a huge number of complex routers, called node elements (the **NE**s), communicating and working in perfect synchronization. Each NE is potentially connected to its neighbors via a regular connection. Furthermore, each NE is connected to a heterogeneous set of computing groups (clusters) that allow asynchronous processing. Each group includes a combination of programmable processors, the PEs (software processing units), and specialized hardware accelerators (hardware processing units) to perform the critical tasks that demand the most performance. The whole system is controlled by a Network Controller Unit, the **NCU**. The NCU and the PEs are implemented with the Forth processor.

The aim of our work is to design a new kind of communication network model for the SCAC architecture in order to address, first, the overlapping of communications with computations and, second, a significant increase of external performance in terms of throughput. The difficulty in designing hNoC lies in finding a compromise between optimal broadcast quality, high bandwidth and great flexibility of use, while reducing power consumption and silicon area.

Our first contributions defined a broadcast-with-mask model integrated in the hNoC communication network of the SCAC architecture. This model is based on **subnetting** the network of processing nodes, which separates the control of communication from processing. Our model was implemented in synthesizable **VHDL** code that was simulated and targeted at a Xilinx Virtex6 (XC6VLX240T) board.

### 5.5.3. Distributed control for dynamic reconfiguration

The aim of our current work is to propose a distributed approach for reconfiguration control on FPGAs.
The main reason for choosing a distributed control approach is that, with the ever growing complexity and size of modern reconfigurable systems, the traditional centralized approach is no longer efficient. Instead, distributed control has many advantages in terms of performance and design efficiency. Indeed, compared to centralized control, distributed control avoids communication bottlenecks and increases parallelism, which yields better performance, a critical issue especially for high-performance applications. At the design level, distributed control also has many advantages. It decreases the design complexity of the control by dividing the intelligence between the controllers, which allows a shorter design time and an easier verification. It also facilitates the reuse of controllers instead of redesigning a centralized controller for each system, which also brings higher scalability to adapt to the growing size of modern SoCs.

Our approach to reconfiguration control is event-driven: events come from a variety of sources in order to ensure a high adaptivity of the reconfigurable systems. Reconfiguration can be triggered by a user input, a change in environmental conditions (e.g. changes in lighting conditions) or a change in performance or power consumption requirements, etc. Therefore, we propose a modular structure for each controller, supporting three major tasks: monitoring, decision making and reconfiguration realization. In order to respect the global constraints of the system, the controllers communicate their decisions to each other in order to handle cooperation and conflicts.

In [25], we proposed a high-level design of our approach using Model Driven Engineering, aiming to combine the advantages of distributed control with those of high-level design in order to decrease design complexity and automate code generation, thus increasing design productivity.
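The modular controller structure (monitoring, decision making, reconfiguration realization, plus communication of decisions to other controllers) can be sketched as follows in Python. This is a software-only illustration under simplifying assumptions, not the FPGA implementation: the event names, the configuration names and the notification scheme are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """One local controller of a reconfigurable region (illustrative only)."""

    name: str
    config: str = "normal"
    peers: list = field(default_factory=list)
    # Last decision received from each peer, kept for cooperation.
    peer_decisions: dict = field(default_factory=dict)

    def monitor(self, event: str):
        """Monitoring: keep only the events this sketch knows about."""
        known = ("power_budget_low", "power_budget_ok")
        return event if event in known else None

    def decide(self, observation) -> str:
        """Decision making: map an observation to a target configuration."""
        if observation == "power_budget_low":
            return "low_power"
        if observation == "power_budget_ok":
            return "normal"
        return self.config  # nothing relevant observed: keep configuration

    def reconfigure(self, target: str) -> None:
        """Reconfiguration realization (here, just a state update)."""
        self.config = target

    def notify(self, sender: str, decision: str) -> None:
        """Record a decision communicated by another controller."""
        self.peer_decisions[sender] = decision

    def handle(self, event: str) -> str:
        """Full event-driven cycle: monitor, decide, reconfigure, communicate."""
        target = self.decide(self.monitor(event))
        if target != self.config:
            self.reconfigure(target)
            for peer in self.peers:
                peer.notify(self.name, target)
        return self.config


def demo():
    a, b = Controller("a"), Controller("b")
    a.peers.append(b)
    a.handle("power_budget_low")
    return a.config, b.config, b.peer_decisions
```

A fuller sketch would let each controller use `peer_decisions` to resolve conflicts against a global constraint; here the decisions are only recorded.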
At the physical level, the distributed control has been implemented for simple applications in order to test the different modules of the controllers (monitoring, decision making, communication between controllers). As future work, we plan to implement the distributed control for more complex applications in order to highlight the advantages of our approach and study its limits.

### 5.5.4. Avionic test bench on heterogeneous reconfigurable platform

The main goal of this thesis is to design the next generation of Eurocopter avionic test benches. For the past 20 years, test systems have always been considered a must in the avionic development cycle. In early 2008, the Eurocopter research department undertook an in-depth reflection on the role of Pro-Active Test Systems [15]. Hitherto, test systems were based on specific real-time CPU boards that run proprietary real-time operating systems and are plugged with Input/Output (I/O) boards to communicate with the equipment under test. In current industrial practice, the widespread VME CPU boards are widely used. Due to present test system performance requirements, an increase in computation rates is needed, but it can no longer be delivered by the VME CPU boards. Furthermore, this solution is considered a technology that is expensive to maintain.

To overcome these drawbacks, the use of multicore hosts (PC or workstation) allows an immediate increase in computation capacity. An important outcome of this transition is the removal of the obsolete CPU boards. However, this solution cannot guarantee the real-time criteria during the execution of concurrent tasks, due to the lack of an appropriate Operating System (OS) environment. In addition, this solution brings new communication latencies between the CPUs and the I/O boards plugged into the VME backplane. In this work, our proposal is to take advantage of the newly available hardware computing resource (FPGA) and to build hybrid avionic test systems [27].
Indeed, FPGA technology could offer computation rates up to 10x higher than CPUs. It could implement heavy models in hardware, with management of the degree of parallelism to meet the real-time constraints of the application.

The main challenge of hybrid (CPU/FPGA) architectures concerns the programming model and the design methodology. We need to deal with the heterogeneity of both hardware and software parts in order to obtain fast system prototyping. In current industrial practice, manual coding is still widely adopted in the development of hybrid architectures, which is clearly not suited to managing the intrinsic complexity of these systems. For designers, this approach is very tedious, error-prone and expensive. In the first part of our work we emphasized the use of Model Driven Engineering (MDE) for heterogeneous systems in order to reduce the design complexity of CPU-FPGA architectures [72].

In our ReCoSoC paper, we focused first on the prototyping environment and the related development tools used to map existing software onto CPU-FPGA architectures by detecting all data dependencies and extracting the degree of parallelism. Moreover, we presented communication solutions, comparing fast links such as Ethernet and PCIe. Secondly, we considered multi-core optimizations in different environments, such as Linux with open-source real-time patches (Xenomai) and processor-affinity capabilities.

Then, we presented in [28] a new generation of adaptive and generic avionic test benches using FPGA reconfigurability capabilities. Indeed, nowadays, each Eurocopter test bench is related to a specific embedded part and a specific aircraft. Proposing such a generic architecture will reduce the helicopter design cycle significantly by testing different embedded systems at the same time.

## 5.6. Application case-studies

### 5.6.1. Experimentations for electromagnetism simulations

Electrical and electronic engineering has used parallel programming to solve its large-scale complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, we used our approach, based on Model Driven Engineering (MDE) and the MARTE profile, to generate code for a sparse solver and achieve a good speed-up. Moreover, thanks to model reuse, we can add or change functionalities or the target architecture and still keep good scalability.

### 5.6.2. H.264 modeling on NoC, implementation and synthesis

In addition, the H.264 coder is modeled using the MARTE profile and a hardware description for all components is proposed. H.264, particularly its Motion Estimation (ME) stage, adopts many new features to increase the coding performance, such as block matching algorithms, motion vector prediction, variable block size motion estimation (VBSME), etc. However, the use of VBSME in the MPEG4-AVC/H.264 standard leads to a higher computational complexity and a higher data dependence, which makes the hardware implementation very complex. The aim of our work is to propose a VLSI architecture for full-search VBSME (FSVBSME). This contribution allows reusing the computation results of smaller sub-blocks, sharing the sub-block comparators and offering low power consumption.

## 5.7. Axellience

Based on the good results of the localized transformations coupled with MDFactory in Gaspard in terms of reusability, modifiability and understandability, Alexis Muller (Expert Engineer) decided to study the opportunity to create a start-up company from this work. Given his past experience in the domain of information systems and the maturity of model usage in enterprises, the idea was to target the automatic generation of information systems from UML models and no longer to address embedded systems.
Together with Thomas Legrand (Software Engineer), who joined him, he developed new localized transformations and new chains. The results and the first feedback from enterprises in the domain of information systems are very encouraging. Furthermore, the Axellience project has won the national competition supporting the creation of innovative companies (Oseo). A technological transfer concerning MDFactory is foreseen between the DaRT team and the Axellience project in order to create the start-up company at the very beginning of 2012. Close collaboration between the company and the DaRT team should continue through the work on localized transformations.

# 6. Contracts and Grants with Industry

## 6.1. ANR Famous

Collaboration with INRIA Rhône-Alpes, Université de Bretagne Sud, Université de Bourgogne and the SME SODIUS.

The FAMOUS project aims at introducing a complete methodology that takes the reconfigurability of the hardware as an essential design concept and proposes the necessary mechanisms to fully exploit those capabilities at runtime. The project covers research in system models, compile-time and run-time methods, and analysis and verification techniques. These tools will provide high-quality designs with improved designer productivity, while guaranteeing consistency with the initial requirements for adaptability and the final implementation.

Thus FAMOUS is a research project with an immediate industrial impact. Indeed, it will make reconfigurable systems design easier and faster. The tool obtained in this project is expected to be used by both company designers and academic researchers, especially for the design of modern application-specific systems such as smart cameras and image and video processing. FAMOUS tools will be based on standards that are well established in the design community. In fact, modeling will start from a very high abstraction level using an extended version of MARTE.
Simulation and synthesizable models will be obtained by automatic model-to-model transformations, using an MDE approach. These techniques will contribute to drastically shortening time-to-market. FAMOUS is a basic research project: most of the partners are academic, and its main objective is to explore novel design methodologies and to target modern embedded system architectures. The FAMOUS project is funded by the French Agence Nationale de la Recherche (ANR). It was also labeled by the Media & Network cluster in 2009. The resources involved amount to 408 person-months from five partners: the public research labs LIFL INRIA (Lille), Lab-STICC (Lorient), INRIA Rhône-Alpes (Grenoble) and LE2I University of Bourgogne (Dijon), and the SME Sodius SAS (Nantes). It started in December 2009 and will last 48 months.

## 6.2. The ANR Open-People project

**Partners:** Université de Bretagne Sud (UBS) Lab-STICC, INRIA Nancy Grand Est, INRIA Lille Nord Europe, Université de Rennes 1 (UR1), Université de Nice Sophia Antipolis (UNSA), THALES Communications (Colombes), InPixal (Rennes)

The Open-PEOPLE (Open Power and Energy Optimization PLatform and Estimator) project is a national project funded by the ANR (Agence Nationale de la Recherche), the French National Research Agency. The objective of Open-PEOPLE is to provide a platform for estimating and optimizing power and energy consumption. Users will be able to estimate the consumption of an application deployed on a hardware architecture chosen from a set of parametric reference architectures. The components used in the targeted architecture will be chosen from a library of hardware and software components. Some of these components will be parametric (such as reconfigurable processors or ASIPs) to further enlarge the design space for exploration. The library will be extensible: users will be able to add new components as both applications and technology evolve.
Open-PEOPLE is definitely an open project. The software platform for conducting estimation and optimization will be accessible through an Internet portal. This software platform will be coupled to an automated hardware platform for physical measurements. The measurements needed to build models for new components to be added to the library will be remotely controlled through the software platform. A library of benchmarks will be proposed to help build models for new components and architectures.

### 6.3. INRIA Euromed 3+3

This project involves the DaRT team of Inria Lille Nord-Europe, the École Supérieure d'Informatique d'Alger, the Université de Monastir and the University of Las Palmas. It aims at studying efficient architectures for modern embedded systems. To this end, we defined, modeled and designed NoC-based systems using MDE approaches. The modeling of image processing applications (H.264) and their efficient mapping onto the developed architectures are also a key issue in this project.

### 6.4. STIC INRIA - Tunisia program

We have been co-advising two PhD students and several Master students in collaboration with the team of Prof. Mohamed Abid at CES-ENIS in Sfax and Prof. Abderrazak Jemai at INSAT in Tunis. This collaboration is supported by the STIC Inria-Tunisia program, which aims at promoting the design of metamodels, transformation tools and techniques for the implementation of reconfigurable systems-on-chip. The resulting co-design environment will be validated on embedded systems dedicated to automotive safety, and more specifically on the design of cruise control systems integrating anti-collision radars. Several successful student exchanges have taken place since 2006 between DaRT, INSAT and CES-ENIS.

# 6.5. Contrat STIC INRIA - Algérie

This project, involving the DaRT and Dolphin teams of Inria Lille Nord-Europe and the Laboratoire d'Informatique d'Oran of the University of Oran, Algeria, aims at studying the architecture exploration phase of embedded system design. It started in 2011 and should end in 2012. It funds exchanges between the two countries. A first "magistère" was defended in 2011 (Abdelkader Aroui).

### 6.6. Nano 2012 ID-TLM

This project, involving the DaRT team of Inria Lille Nord-Europe, the Aoste team of Inria Sophia-Antipolis Méditerranée and ST Microelectronics, studies formal models of computation and model driven engineering to help design embedded systems. It started in 2009 and runs for four years.

### 6.7. Collaboration with CEA List

A PhD thesis (Asma Charfi) is co-advised between our team and the CEA List on optimized code generation from MARTE models. The idea is that some information is lost when code is generated from a high-level model. The compiler then tries to recover this lost information in order to optimize the code. If these optimizations were handled at the model level, the compiler would have a simpler task and we could expect improved performance. This thesis was defended in December 2011.

A new PhD thesis (Amine El Kouhen) is co-advised between our team and the CEA List on the adaptation of UML tools to the domain and to the design process. The idea is to provide customers with a UML tool adapted to their work, here the design of embedded systems. The tool is customized with the help of models or profiles.

DaRT and the CEA List also collaborate on the Papyrus UML project.

#### 6.8. Collaboration with SME Ecreall

The transformation chain used in the DaRT embedded system modeling approach involves several automatically generated models. This work aims at allowing one of the generated models to be modified directly, with the modification propagated in both directions to the other models of the chain.
Not all changes can be propagated, so part of this work will be to identify those that can. This work is done in collaboration with Ecreall (http://www.ecreall.com/), a small company that develops collaborative web portals. The first step of the work was to align the MDE practice of the company with the DaRT practice. A first transformation chain has been developed. It makes it possible to model collaborative portals, transform them into intermediate models, and then generate the code for a targeted technology (dolmen). This work will reuse results from the traceability work.

### 6.9. Collaboration EADS IW, and Eurocopter

The subject deals with the design of dynamically reconfigurable systems for avionic test applications. It is motivated by the need for methodologies and tools for the design of high-performance applications on dynamically reconfigurable computing systems. A complete methodology takes the reconfigurability of the hardware as an essential design concept and proposes the necessary mechanisms to fully exploit this capability at runtime. A set of tools must provide high-quality designs with improved designer productivity, while guaranteeing consistency with the initial requirements for adaptability and with the final implementation. This methodology allows designers to easily implement a system specification on a platform that includes general purpose processors dynamically combined with multiple accelerators running on an FPGA.

# 7. Partnerships and Cooperations

# 7.1. International Initiatives

### 7.1.1. Collaboration with Colombia

The collaboration with the Universidad de los Andes in Bogota, and more precisely with the software engineering team directed by Rubby Casallas, is still running. Anne Etien is co-supervising the master thesis of David Mendez. Furthermore, joint work with the University of York on model driven engineering and evolution is also in progress.

#### 7.1.2. Collaboration with Romania

We collaborate with the University of Iaşi (Romania) on the formal definition of DSMLs in the K framework.

### 7.1.3. Visits of International Scientists

BADRI NARAYANAN RAVI (from Jan 2011 until Jun 2011)

- Subject: Unified environment for estimating and optimizing the power consumption of mobile embedded systems
- Institution: Chalmers University of Technology (Sweden)

### 7.2. European Initiatives

### 7.2.1. Collaboration with Belgium

The collaboration with the Université Libre de Bruxelles (Flemish part, VUB), and more specifically with the Software Languages Lab, is still running. It concerns the chaining of localized transformations.

#### 7.2.2. Collaboration with England

The collaboration initiated the previous year with the Sosym team of the University of York in England is continuing. We are working together on model driven engineering, evolution and genericity. Some papers have been jointly written and are under submission.

### 7.2.3. Participation In European Programs

Pierre Boulet is a member of the HiPEAC network of excellence.

#### 7.3. National Initiatives

#### 7.3.1. Within Inria

We collaborate with colleagues within Inria:

- the Triskell team at Inria Rennes-Bretagne Atlantique, on the analysis of DSMLs and on the formal definition of Kermeta;
- the Compsys team at Inria Grenoble Rhône-Alpes / Lyon, on termination and more generally on the analysis of general C programs.

#### 7.3.2. Other National Collaborations

We collaborate with David Monniaux (Verimag, Grenoble) on improving the global precision of fixpoint computations via the use of SMT solving.

We also collaborate with the L2EP (Université de Lille1) within the MEDEE research cluster, especially in its first action: the industrialization of Code\_CARMEL. Code\_CARMEL is a software package for electromagnetic field simulation.

We collaborate with Xavier Blanc (LaBRI) on the definition of constraints for chaining model transformations.

# 8. Dissemination

### 8.1. Animation of the scientific community

Cédric Dumoulin was a reviewer for the following journals and conferences:

- ACM Computing Surveys, http://csur.acm.org/
- MSR 11, http://www.lifl.fr/msr11/
- SOSYM 11, http://www.sosym.org/

Abdoulaye Gamatié gave an invited talk at LCTES'11 on a Model Driven Design Framework for Massively Parallel Embedded Systems. He was on the program committees of the LCTES'2011, MSR'2011 (PC Chair), ES-Lsyn'2011, ACCA'2011 and M-BED'2011 conferences. He was also local organizing chair of the MSR'2011 conference. He was a member of a PhD thesis committee in 2011. Finally, he was a member of the following selection committees in 2011: PhD candidates and post-doctorates at INRIA Lille - Nord Europe, and Associate Professor at IUT A - Université Paul Sabatier in Toulouse.

Anne Etien was a member of the program committees of ECMFA 2011, Inforsid 2011 and the Model and Evolution Workshop held jointly with the Models conference. She was also a reviewer for the international journal Sosym.

Pierre Boulet chaired and organized M-BED'11, a workshop co-located with DATE'11, Grenoble, France, March 18, 2011 (http://www.ecsi.org/m-bed-2011). He was a member of the program committee of RAPIDO'11. He also reviewed papers for the Journal of Systems Architecture, for Design Automation for Embedded Systems and for the Software and System Modeling (SoSyM) journal. He was also a referee for three PhDs and two "habilitations à diriger des recherches" in 2011.

Laure Gonnord organised the workshop "Analyse to compile, compile to analyse" (http://acca2011.imag.fr/) in April 2011 in Chamonix.

Vlad Rusu co-organised the 2nd Int. Workshop on Algebraic Methods for Model-based Software Engineering (http://www.lcc.uma.es/~duran/AMMSE11/) in Zürich, as well as the Ecole Jeunes Chercheurs en Programmation (http://ejcp2011.inria.fr/index2011.htm) in Rennes and Dinard, in June 2011. Vlad Rusu was a referee for the PhDs of Machiel van der Bijl (Univ.
Twente, the Netherlands) on Compositionality and Refinement on Model-Based Testing, and Adrián Riesco (Univ. Madrid) on Debugging and Heterogeneous Verifications in Maude.

Frédéric Guyomarch gave an invited talk at the celebration of LERMA's 20th anniversary on the FIT solver for the simulation of electromagnetic problems.

We participated in the organisation of the "Journées sur l'Ingénierie Dirigée par les Modèles" (IDM Days), co-located with the "Journées du GDR Génie de la Programmation et du Logiciel" (GDR GPL Days) (http://rmod.lille.inria.fr/idm-gpl/pier).

DaRT project members co-organized (in cooperation with CEA and Politecnico di Milano) the 3rd Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (Rapido'2011, http://www2.lifl.fr/rapido/Rapido/Home.html) on Saturday 22 January 2011 in Heraklion, Crete, Greece. The workshop was held in conjunction with the 6th International Conference on High-Performance and Embedded Architectures and Compilers of the HiPEAC FP7 network of excellence (NoE HiPEAC). More than 30 researchers attended. Four invited speakers gave keynotes, and 8 papers were selected and presented at the workshop on different aspects of the design and simulation of embedded systems.

### 8.2. Teaching

#### Licence:

- Licence: Introduction to computer architecture, 44h, L1, Université de Lille 1, France (Frédéric Guyomarch)
- Licence: Mathematics for informatics, 64h, L2, Université de Lille 1, France (Frédéric Guyomarch)
- Licence: Data structures, 44h, L1, Université de Lille 1, France (Frédéric Guyomarch)
- Licence: Algorithmics and programming in Java, 64h, L1, Université de Lille 1, France (Frédéric Guyomarch)
- Licence: System Programming, 60h, L3, Université Lille 1, France (Philippe Marquet)
- Licence: Algorithmics and programming: an introduction, 30h, L3, Polytech'Lille, France (Julien Forget)
- Licence: Languages and translators, 14h, L3, Polytech'Lille, France (Julien Forget)
- Licence: Advanced programming, 10h, L3, Polytech'Lille, France (Julien Forget)
- Licence: Algorithmics, 24h, L3, Polytech'Lille, France (Julien Forget)
- Licence: Introduction to Databases, 50h, L3, Université Lille 1 (EPU), France (Anne Etien)

#### Master:

- Master: Advanced Object Conception, 69h, M1, Université Lille 1, France (Cédric Dumoulin)
- Master: New Technologies for the Web, 75h, M2, Université Lille 1, France (Cédric Dumoulin)
- Master: Embedded System Design, 18h, M2, Télécom Lille 1, France (Abdoulaye Gamatié and Pierre Boulet)
- Master: Distributed Systems, 18h, M2, Université Lille 1, France (Pierre Boulet)
- Master: UML profiles for Embedded Systems, 3h, M1, University of Oran, Algeria (Pierre Boulet)
- Master: Design of Operating System, 42h, M1, Université Lille 1, France (Philippe Marquet)
- Master: Parallel and Distributed Programming, 24h, M1, Université Lille 1, France (Philippe Marquet)
- Master: Introduction to Innovation and Research, 15h, M2, Université Lille 1, France (Philippe Marquet)
- Master: Operating systems, 40h, M1, Polytech'Lille, France (Julien Forget)
- Master: Advanced operating system programming, 8h, M1, Polytech'Lille, France (Julien Forget)
- Master: Fundamentals of computer science, 8h, M1, Polytech'Lille, France (Julien Forget)
- Master: Distributed systems, 8h,
M2, Polytech'Lille, France (Julien Forget)
- Master: Databases, 20h, M1, Université Lille 1 (EPU), France (Anne Etien)
- Master: Software Engineering, 20h, M2, Université Lille 1 (EPU), France (Anne Etien)

### 8.2.1. Undergraduate

Laure Gonnord teaches several basic courses such as algorithmics, programming in C, compilation and an introduction to theoretical computer science for engineers.

Laure Gonnord co-supervised, with Christophe Alias (Compsys), Guillaume Andrieu's undergraduate internship on modular termination of C programs.

### 8.2.2. Graduate

Vlad Rusu co-supervises the PhD thesis of Andrei Arusoaie with Dorel Lucanu (University of Iasi, Romania) on the design and analysis of DSMLs using the K framework (since October 2011).

#### 8.2.3. Post-Graduate

Vlad Rusu is the supervisor of the postdoctoral research of Christophe Calvès on the formal definition of Kermeta in the K framework (since September 2010).

Abdoulaye Gamatié supervised Sarra Boumedien, a Master's student from Sfax (Tunisia), from March to July, for her master's thesis on Clock-based Design of a Multimedia Application on SoCs.

PhD & HdR:

- PhD in progress: Amen Souissi, "Propagation bidirectionnelle des modifications effectuées sur un modèle appartenant à une chaîne de transformation de modèles", 01/10/2009, Pierre Boulet and Cédric Dumoulin
- PhD in progress: Amine El Kouhen, "Méta-modèle d'adaptation des outils de modélisation", January 2011, Cédric Dumoulin (LIFL), Sébastien Gérard (CEA), Pierre Boulet (LIFL)

Vincent Aranega defended his PhD thesis entitled "Traçabilité pour la mise au point de modèles et la correction de transformations" in November 2011.

# 9. Bibliography

# Major publications by the team in recent years

- [1] A. C. ALJUNDI, J.-L. DEKEYSER, M. T. KECHADI, I. D. SCHERSON. *A universal performance factor for multi-criteria evaluation of multistage interconnection networks*, in "Future Generation Comp. Syst.", 2006, vol.
22, no. 7, p. 794-804.
- [2] A. CUCCURU, J.-L. DEKEYSER, P. MARQUET, P. BOULET. *Towards UML 2 extensions for compact modeling of regular complex topologies*, in "MODELS/UML 2005, ACM/IEEE 8th international conference on model driven engineering languages and systems", Montego Bay, Jamaica, October 2005.
- [3] A. W. DE OLIVEIRA RODRIGUES, F. GUYOMARC'H, J.-L. DEKEYSER, Y. LE MENACH. *Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines*, in "Compumag 2011", Sydney, Australia, July 2011, http://hal.inria.fr/inria-00605645/en.
- [4] A. GAMATIÉ, É. RUTTEN, H. YU, P. BOULET, J.-L. DEKEYSER. *Synchronous Modeling and Analysis of Data Intensive Applications*, in "EURASIP Journal on Embedded Systems", July 2008, vol. 2008, Article ID 561863.
- [5] A. GAMATIÉ. *Design of Streaming Applications on MPSoCs using Abstract Clocks*, in "Design, Automation and Test in Europe Conference (DATE'2012)", Dresden, Germany, 2012, http://hal.inria.fr/hal-00647480/en/.
- [6] A. GAMATIÉ, S. LE BEUX, É. PIEL, R. BEN ATITALLAH, A. ETIEN, P. MARQUET, J.-L. DEKEYSER. *A Model Driven Design Framework for Massively Parallel Embedded Systems*, in "ACM Transactions on Embedded Computing Systems (TECS)", 2011, vol. 10, no. 4, http://hal.inria.fr/inria-00637595/en.
- [7] C. GLITIA, P. BOULET, E. LENORMAND, M. BARRETEAU. *Repetitive model refactoring strategy for the design space exploration of intensive signal processing applications*, in "Journal of Systems Architecture", January 2011, vol. 57, no. 9, p. 815-829 [DOI: 10.1016/J.SYSARC.2010.12.002], http://hal.inria.fr/inria-00605069/en.
- [8] S. LE BEUX, P. MARQUET, J.-L. DEKEYSER. *A Model Driven Co-Design Approach for High Performance Embedded Systems Dedicated to Transport*, in "Studies in Informatics and Control Journal", 2008, vol. 2008, no. 4.
- [9] P. MARQUET, S. DUQUENNOY, S. LE BEUX, S. MEFTALI, J.-L. DEKEYSER. *Massively Parallel Processing on a Chip*, in "ACM Int'l Conf.
on Computing Frontiers", Ischia, Italy, May 2007.
- [10] SANTHOSH KUMAR RETHINAGIRI, R. BEN ATITALLAH, S. NIAR, E. SENN, J.-L. DEKEYSER. *Hybrid System Level Power Consumption Estimation for FPGA-Based MPSoC*, in "29th IEEE International Conference on Computer Design ICCD 2011", October 2011.
- [11] V. RUSU. *Embedding Domain-Specific Modelling Languages in Maude Specifications*, in "ACM SIGSOFT Software Engineering Notes", January 2011, vol. 36, no. 1, Extended version accepted in the Systems and Software Engineering Journal. [DOI: 10.1145/1921532.1921557], http://hal.inria.fr/inria-00527859/en/.
- [12] C. TRABELSI, S. MEFTALI, R. BEN ATITALLAH, A. JEMAI, J.-L. DEKEYSER, S. NIAR. *An MDE Approach for Energy Consumption Estimation in MPSoC Design*, in "2nd Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools", Pisa, Italy, Jan 2010, 6 p., http://hal.inria.fr/inria-00486200/en.

### **Publications of the year**

#### **Doctoral Dissertations and Habilitation Theses**

- [13] A. ABDALLAH. *Conception de SoC à Base d'Horloges Abstraites: Vers l'Exploration d'Architectures en MARTE*, Université des Sciences et Technologie de Lille - Lille I, March 2011, http://hal.inria.fr/tel-00597031/en.
- [14] V. ARANEGA. *Traçabilité pour la mise au point de modèles et la correction de transformations*, Université des Sciences et Technologie de Lille - Lille I, November 2011, http://hal.inria.fr/tel-00597031/en.

### **Articles in International Peer-Reviewed Journal**

- [15] G. AFONSO, N. BELANGER. Making a MARTE, March 2011.
- [16] Y. AYDI, M. BAKLOUTI, J.-L. DEKEYSER, M. ABID. *A Multi-Level Design Methodology of Multistage Interconnection Network for MPSOCs*, in "International Journal of Computer Applications in Technology (IJCAT)", 2011, vol. 42, no. 1-2, http://hal.inria.fr/inria-00563733/en.
- [17] A. GAMATIÉ, S. LE BEUX, É. PIEL, R. BEN ATITALLAH, A. ETIEN, P. MARQUET, J.-L. DEKEYSER.
*A Model Driven Design Framework for Massively Parallel Embedded Systems*, in "ACM Transactions on Embedded Computing Systems (TECS)", 2011, vol. 10, no. 4, http://hal.inria.fr/inria-00637595/en.
- [18] C. GLITIA, P. BOULET, E. LENORMAND, M. BARRETEAU. *Repetitive model refactoring strategy for the design space exploration of intensive signal processing applications*, in "Journal of Systems Architecture", January 2011, vol. 57, no. 9, p. 815-829 [DOI: 10.1016/J.SYSARC.2010.12.002], http://hal.inria.fr/inria-00605069/en.
- [19] C. PAGETTI, J. FORGET, F. BONIOL, M. CORDOVILLA, D. LESENS. *Multi-task implementation of multi-periodic synchronous programs*, in "Discrete Event Dynamic Systems", 2011, vol. 21, no. 3, p. 307-338, http://hal.inria.fr/inria-00638936/en/.
- [20] L. ROSE, E. GUERRA, J. DE LARA, A. ETIEN, D. KOLOVOS, R. F. PAIGE. *Generic Model Management*, in "Software and Systems Modeling", 2011, in press, http://hal.inria.fr/inria-00635200/en.
- [21] V. RUSU. *Embedding Domain-Specific Modelling Languages in Maude Specifications*, in "ACM SIGSOFT Software Engineering Notes", January 2011, vol. 36, no. 1, Extended version accepted in the Systems and Software Engineering Journal. [DOI: 10.1145/1921532.1921557], http://hal.inria.fr/inria-00527859/en/.
- [22] A. SOUISSI, P. BOULET, C. DUMOULIN, M. LAUNAY. *Modélisation centrée sur les processus métier pour la génération complète de portails collaboratifs*, in "Technique et Science Informatiques (TSI)", November 2011, http://hal.inria.fr/inria-00638298/en.
- [23] C. TRABELSI, R. BEN ATITALLAH, S. MEFTALI, J.-L. DEKEYSER, A. JEMAI. *A Model-Driven Approach for Hybrid Power Estimation in Embedded Systems Design*, in "Eurasip Journal on Embedded Systems", April 2011, http://hal.inria.fr/inria-00584360/en.

### **Articles in National Peer-Reviewed Journal**

- [24] A. ABDALLAH, A. GAMATIÉ, J.-L. DEKEYSER.
*Modélisation UML/MARTE de SoC et analyse temporelle basée sur l'approche synchrone*, in "RSTI - TSI - 30/2011. Architecture des ordinateurs", 2011, vol. 30, p. 1089-1114, http://hal.inria.fr/inria-00637009/en.

#### **Invited Conferences**

- [25] S. CHERIF, C. TRABELSI, S. MEFTALI, J.-L. DEKEYSER. *High Level Design of adaptive distributed controller for Partial Dynamic reconfiguration in FPGA*, in "Conference on Design and Architectures for Signal and Image Processing", Tampere, Finland, September 2011, http://hal.inria.fr/inria-00609122/en.
- [26] M. ELHAJI, B. ATTIA, A. ZITOUNI, R. TOURKI, S. MEFTALI, J.-L. DEKEYSER. *FERONOC: Flexible and extensible router implementation for diagonal mesh topology*, in "Conference on Design and Architectures for Signal and Image Processing", Tampere, Finland, September 2011, http://hal.inria.fr/inria-00609117/en.

### **International Conferences with Proceedings**

- [27] G. AFONSO, R. BEN ATITALLAH, N. BELANGER, M. RUBIO, J.-L. DEKEYSER, A. LOYER. *A prototyping environment for high performance reconfigurable computing*, in "6th International Workshop on Reconfigurable Communication-centric Systems-on-Chip", Montpellier, France, June 2011.
- [28] G. AFONSO, R. BEN ATITALLAH, N. BELANGER, M. RUBIO, J.-L. DEKEYSER, S. STILKERICH. *Toward Generic and Adaptive Avionic Test Systems*, in "NASA/ESA Conference on Adaptive Hardware and Systems", San Diego, USA, June 2011.
- [29] G. AFONSO, R. BEN ATITALLAH, J.-L. DEKEYSER. *A Design Environment for Reconfigurable Computing Systems*, in "Systems-on-Chip System-in-Package", Lyon, France, June 2011.
- [30] B. ANTHONY JOSE, A. GAMATIÉ, J. OUY, S. KUMAR SHUKLA. *SMT based false causal loop detection during code synthesis from Polychronous specifications*, in "9th IEEE/ACM International Conference on Formal Methods and Models for Codesign (MEMOCODE)", Cambridge, United Kingdom, 2011, http://hal.inria.fr/inria-00637574/en.
- [31] B.
COMBEMALE, L. GONNORD, V. RUSU. *A Generic Tool for Tracing Executions Back to a DSML's Operational Semantics*, in "Seventh European Conference on Modelling Foundations and Applications", Birmingham, United Kingdom, Lecture Notes in Computer Science, Springer Verlag, June 2011, vol. 6698, p. 35-51, http://hal.inria.fr/hal-00593425/en.
- [32] M. CORDOVILLA, F. BONIOL, J. FORGET, E. NOULARD, C. PAGETTI. *Developing critical embedded systems on multicore architectures: the Prelude-SchedMCore toolset*, in "19th International Conference on Real-Time and Network Systems", Nantes, France, Irccyn, September 2011, http://hal.inria.fr/inria-00618587/en.
- [33] A. W. DE OLIVEIRA RODRIGUES, F. GUYOMARC'H, J.-L. DEKEYSER. *Programming Massively Parallel Architectures using MARTE: a Case Study*, in "2nd Workshop on Model Based Engineering for Embedded Systems Design (M-BED 2011) on Date Conference 2011", Grenoble, France, March 2011, http://hal.inria.fr/inria-00578646/en.
- [34] A. W. DE OLIVEIRA RODRIGUES, F. GUYOMARC'H, J.-L. DEKEYSER, Y. LE MENACH. *Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines*, in "Compumag 2011", Sydney, Australia, July 2011, http://hal.inria.fr/inria-00605645/en.
- [35] M. ELHAJI, P. BOULET, R. TOURKI, A. ZITOUNI, J.-L. DEKEYSER, S. MEFTALI. *Modeling Networks-on-Chip at System Level with the MARTE UML profile*, in "M-BED'2011", Grenoble, France, March 2011, http://hal.inria.fr/inria-00569077/en.
- [36] J. FORGET, E. GROLLEAU, C. PAGETTI, P. RICHARD. *Dynamic Priority Scheduling of Periodic Tasks with Extended Precedences*, in "IEEE 16th Conference on Emerging Technologies Factory Automation (ETFA)", Toulouse, France, September 2011 [DOI: 10.1109/ETFA.2011.6059015], http://hal.inria.fr/inria-00638941/en/.
- [37] A. GAMATIÉ. *Design of Streaming Applications on MPSoCs using Abstract Clocks*, in "Design, Automation and Test in Europe Conference (DATE'2012)", Dresden, Germany, 2012, http://hal.inria.fr/hal-00647480/en/.
- [38] A. GAMATIÉ, L. GONNORD. *Static analysis of synchronous programs in signal for efficient design of multiclocked embedded systems*, in "ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, LCTES 2011", Chicago, IL, United States, 2011, p. 71-80, http://hal.inria.fr/inria-00586137/en.
- [39] J. GUO, A. W. DE OLIVEIRA RODRIGUES, J. THIYAGALINGAM, F. GUYOMARC'H, P. BOULET, S.-B. SCHOLZ. *Harnessing the Power of GPUs without Losing Abstractions in SaC and ArrayOL: A Comparative Study*, in "HIPS 2011, 16th International Workshop on High-Level Parallel Programming Models and Supportive Environments", Anchorage (Alaska), United States, May 2011, http://hal.inria.fr/inria-00569100/en.
- [40] D. MONNIAUX, L. GONNORD. *Using Bounded Model Checking to Focus Fixpoint Iterations*, in "Static analysis", Venezia, Italy, E. YAHAV (editor), Lecture Notes in Computer Science, Springer, 2011, vol. 6887, p. 369-385 [DOI: 10.1007/978-3-642-23702-7\_27], http://hal.archives-ouvertes.fr/hal-00600087/en/.
- [41] SANTHOSH KUMAR RETHINAGIRI, R. BEN ATITALLAH, J.-L. DEKEYSER. *A System Level Power Consumption Estimation for MPSoC*, in "International Symposium on System-on-Chip 2011", October 2011.
- [42] SANTHOSH KUMAR RETHINAGIRI, R. BEN ATITALLAH, S. NIAR, E. SENN, J.-L. DEKEYSER. *Fast and Accurate Hybrid Power Estimation Methodology for Embedded Systems*, in "Conference on Design and Architectures for Signal and Image Processing (DASIP)", November 2011.
- [43] Best Paper. SANTHOSH KUMAR RETHINAGIRI, R. BEN ATITALLAH, S. NIAR, E. SENN, J.-L. DEKEYSER. *Hybrid System Level Power Consumption Estimation for FPGA-Based MPSoC*, in "29th IEEE International Conference on Computer Design ICCD 2011", October 2011.
- [44] V. RUSU, D. LUCANU. *A K-Based Formal Framework for Domain-Specific Modelling Languages*, in "Formal Verification of Object-Oriented Systems", Torino, Italy, October 2011, http://hal.inria.fr/inria-00637099/en.
- [45] V. RUSU, D. LUCANU.
*K Semantics for OCL: a Proposal for a Formal Definition for OCL*, in "2nd International K Workshop", Cheile Gradistei (Brasov), Romania, August 2011, http://hal.inria.fr/hal-00641199/en/.
- [46] R. WISS, F. BONIOL, C. PAGETTI, J. FORGET. *Calcul et vérification de propriété de latence sur une spécification fonctionnelle synchrone multi-périodique*, in "Approches Formelles dans l'Assistance au Développement de Logiciels (AFADL)", Grenoble, 2012, to appear.

### **National Conferences with Proceeding**

- [47] A. W. DE OLIVEIRA RODRIGUES, F. GUYOMARC'H, J.-L. DEKEYSER. *A Modeling Approach based on UML/MARTE for GPU Architecture*, in "Symposium en Architectures nouvelles de machines (SympA'14)", Saint Malo, France, May 2011, http://hal.inria.fr/inria-00593863/en.

### **Conferences without Proceedings**

- [48] J. FORGET, E. GROLLEAU, C. PAGETTI. *Ordonnancement de tâches périodiques avec précédences étendues sans sémaphores*, in "ROADEF 2011", Saint-Étienne, France, École Nationale Supérieure des Mines de Saint-Étienne, March 2011, http://hal.inria.fr/inria-00563798/en.
- [49] SANTHOSH KUMAR RETHINAGIRI, R. BEN ATITALLAH, S. NIAR, E. SENN, J.-L. DEKEYSER. *An Effective Approach for Power Consumption Modeling of Complex Processor*, in "GDR SOC-SIP", June 2011.

### Scientific Books (or Scientific Book chapters)

- [50] V. ARANEGA, J.-M. MOTTU, A. ETIEN, J.-L. DEKEYSER. *Using Trace to Situate Errors in Model Transformations*, in "Software and Data Technologies", Communications in Computer and Information Science, Springer Berlin Heidelberg, April 2011, vol. 50 [DOI: 10.1007/978-3-642-20116-5\_11], http://hal.inria.fr/inria-00589253/en.
- [51] Y. AYDI, M. BAKLOUTI, P. MARQUET, J.-L. DEKEYSER, M. ABID. *A Design Methodology of MIN-Based Network for MPPSoC on Reconfigurable Architecture*, in "Reconfigurable Embedded Control Systems: Applications for Flexibility and Agility", M. KHALGUI, H.-M. HANISCH (editors), IGI-Global, 2011, p. 209-234, http://hal.inria.fr/inria-00563719/en.
- [52] R. CORVINO, A. GAMATIÉ, P. BOULET. *Design Space Exploration for Efficient Data Intensive Computing on SoCs*, in "Handbook of Data Intensive Computing", B. FURHT, A. ESCALANTE (editors), Springer, 2011, http://hal.inria.fr/inria-00637012/en.
- [53] J.-L. DEKEYSER, A. GAMATIÉ, S. MEFTALI, I. R. QUADRI. *Models for Co-Design of Heterogeneous Dynamically Reconfigurable SoCs*, in "Heterogeneous Embedded Systems Design Theory and Practice", Springer, 2012, 26, http://hal.inria.fr/inria-00525023/en.
- [54] A. GAMATIÉ. *Specification of Data Intensive Applications with Data Dependency and Abstract Clocks*, in "Handbook of Data Intensive Computing", B. FURHT, A. ESCALANTE (editors), Springer, 2011, http://hal.inria.fr/inria-00637011/en/.

### **Research Reports**

- [55] A. ABDALLAH, A. GAMATIÉ, R. BEN ATITALLAH, J.-L. DEKEYSER. *Correct and Energy-Efficient Design of a Multimedia Application on SoCs*, INRIA, August 2011, no. RR-7715, http://hal.inria.fr/inria-00616223/en.
- [56] A. W. DE OLIVEIRA RODRIGUES, V. ARANEGA, A. ETIEN, F. GUYOMARC'H, J.-L. DEKEYSER. *Enabling Traceability in an MDE Approach to Improve Performance of GPU Applications*, INRIA, August 2011, no. RR-7720, http://hal.inria.fr/inria-00617912/PDF/RR-7720.pdf.
- [57] A. W. DE OLIVEIRA RODRIGUES, V. ARANEGA, A. ETIEN, F. GUYOMARC'H, J.-L. DEKEYSER. *Enabling Traceability in an MDE Approach to Improve Performance of GPU Applications*, INRIA, August 2011, no. RR-7720, http://hal.inria.fr/inria-00617912/en.
- [58] A. W. DE OLIVEIRA RODRIGUES, F. GUYOMARC'H, J.-L. DEKEYSER. *An MDE Approach for Automatic Code Generation from MARTE to OpenCL*, INRIA, February 2011, no. RR-7525, http://hal.inria.fr/inria-00563411/en.
- [59] B. A. JOSE, A. GAMATIÉ, M. KRACHT, S. K. SHUKLA. *Improved False Causal Loop Detection in Polychronous Specification of Embedded Software*, INRIA, 2011, http://hal.inria.fr/inria-00637582/en.

#### **Scientific Popularization**

- [60] P. BOULET.
*Modélisation et analyse de systèmes embarqués ou temps-réel avec le profil UML MARTE*, in "Techniques de l'Ingénieur", February 2011, n<sup>o</sup> IN120, http://hal.inria.fr/inria-00587386/en.

### **Other Publications**

- [61] A. W. DE OLIVEIRA RODRIGUES, F. GUYOMARC'H, J.-L. DEKEYSER. *Using ArrayOL to Identify Potentially Shareable Data in Thread Work-Groups of GPUs*, March 2011, Poster, http://hal.inria.fr/inria-00594304/en/.

### **References in notes**

- [62] The Eclipse Project, 2003, http://www.eclipse.org.
- [63] EMF Eclipse Modeling Framework, 2007, http://www.eclipse.org/modeling/emf.
- [64] Acceleo, 2009, http://www.acceleo.org.
- [65] OBJECT MANAGEMENT GROUP, INC. (editor). *U2 Partners' (UML 2.0): Superstructure, 2nd revised submission*, January 2003, http://www.omg.org/cgi-bin/doc?ptc/03-01-02.
- [66] OBJECT MANAGEMENT GROUP, INC. (editor). *(UML 2.0): Superstructure Draft Adopted Specification*, July 2003, http://www.omg.org/cgi-bin/doc?ptc/03-07-06.
- [67] SystemC, 2002, http://www.systemc.org/.
- [68] OpenMP Application Programme Interface, May 2005, http://www.openmp.org/drupal/mp-documents/spec25.pdf.
- [69] A. ABDALLAH, A. GAMATIÉ, J.-L. DEKEYSER. *Modélisation UML/MARTE de SoC et analyse temporelle basée sur l'approche synchrone*, in "SYMPosium en Architecture de machines (SympA'13)", Toulouse, France, September 2009.
- [70] A. ABDALLAH, A. GAMATIÉ, J.-L. DEKEYSER. *Model-Driven Design of Embedded Multimedia Applications on SoCs*, in "12th Euromicro Conference on Digital System Design (DSD2009)", Patras, Greece, August 2009.
- [71] A. ABDALLAH, A. GAMATIÉ, J.-L. DEKEYSER. *Correct and Energy-Efficient Design of SoCs: the H.264 Encoder Case Study*, in "International Symposium on System-on-Chip (SoC'2010)", Tampere, Finland, 2010, http://hal.inria.fr/inria-00522792/en.
- [72] G. AFONSO, R. B. ATITALLAH, N. BELANGER, M. RUBIO, J.-L. DEKEYSER.
*An Efficient Design Methodology for Hybrid Avionic Test Systems*, in "IEEE Conference on Emerging Technologies and Factory Automation (ETFA)", Bilbao, Spain, Sep 2010.
- [73] A. AGRAWAL. *Graph Rewriting And Transformation (GReAT): A Solution For The Model Integrated Computing Bottleneck*, in "18th IEEE International Conference on Automated Software Engineering (ASE'03)", 2003, p. 364-368.
- [74] C. ANDRÉ, F. MALLET, R. DE SIMONE. *Time Modeling in MARTE*, in "ECSI Forum on specification & Design Languages (FDL)", Barcelona, Spain, ECSI, 2007, p. 268-273, http://hal.inria.fr/inria-00204481/en/.
- [75] V. ARANEGA, A. ETIEN, J.-L. DEKEYSER. *Using an Alternative Trace for QVT*, in "Workshop on Multi-Paradigm Modeling", Oslo, Norway, Oct 2010, http://hal.inria.fr/inria-00524153/en.
- [76] V. ARANEGA, J.-M. MOTTU, A. ETIEN, J.-L. DEKEYSER. *Traceability Mechanism for Error Localization in Model Transformation*, in "ICSOFT", Bulgaria, July 2009.
- [77] V. ARANEGA, J.-M. MOTTU, A. ETIEN, J.-L. DEKEYSER. *Using Traceability to Enhance Mutation Analysis Dedicated to Model Transformation*, in "Workshop on Model driven Engineering Verification and Validation", Oslo, Norway, Oct 2010, http://hal.inria.fr/inria-00524150/en.
- [78] R. BENDRAOU, B. COMBEMALE, X. CRÉGUT, M.-P. GERVAIS. *Definition of an Executable SPEM 2.0*, in "APSEC", IEEE Computer Society, 2007, p. 390-397.
- [79] A. BENVENISTE, P. CASPI, S. EDWARDS, N. HALBWACHS, P. LE GUERNIC, R. DE SIMONE. *The Synchronous Languages Twelve Years Later*, in "Proceedings of the IEEE", January 2003, vol. 91, n<sup>o</sup> 1, p. 64-83.
- [80] A. E. H. BENYAMINA, P. BOULET. *Multi-objective Mapping for NoC Architecture*, in "Journal of Digital Information Management", December 2007, vol. 5, n<sup>o</sup> 6, p. 378–384.
- [81] P. BOULET. *Array-OL Revisited, Multidimensional Intensive Signal Processing Specification*, INRIA, February 2007, n<sup>o</sup> RR-6113, http://hal.inria.fr/inria-00128840/en.
- [82] P. BOULET.
*Formal Semantics of Array-OL, a Domain Specific Language for Intensive Multidimensional Signal Processing*, INRIA, March 2008, n<sup>o</sup> RR-6467, http://hal.inria.fr/inria-00261178/en/.
- [83] P. BOULET, J.-L. DEKEYSER, J.-L. LEVAIRE, P. MARQUET, J. SOULA, A. DEMEURE. *Visual Data-parallel Programming for Signal Processing Applications*, in "9th Euromicro Workshop on Parallel and Distributed Processing, PDP 2001", Mantova, Italy, February 2001, p. 105–112.
- [84] T. BUCHMANN, A. DOTOR, S. UHRIG, B. WESTFECHTEL. *Model-Driven Software Development with Graph Transformations: A Comparative Case Study*, in "Applications of Graph Transformations with Industrial Relevance, Third International Symposium (AGTIVE'07)", 2007, p. 345-360.
- [85] P. CASPI, D. PILAUD, N. HALBWACHS, J.A. PLAICE. *Lustre: a declarative language for real-time programming*, in "Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (POPL'87)", ACM Press, 1987, p. 178-188.
- [86] S. CHERIF, I. R. QUADRI, S. MEFTALI, J.-L. DEKEYSER. *Modeling reconfigurable Systems-on-Chips with UML MARTE profile: an exploratory analysis*, in "13th Euromicro Conference on Digital System Design (DSD 2010)", Lille, France, Sep 2010, http://hal.inria.fr/inria-00525004/en.
- [87] M. CLAVEL, F. DURÁN, S. EKER, P. LINCOLN, N. MARTÍ-OLIET, J. MESEGUER, C. L. TALCOTT. *All About Maude, A High-Performance Logical Framework*, Lecture Notes in Computer Science, Springer, 2007, vol. 4350.
- [88] R. CORVINO, A. GAMATIÉ, P. BOULET. *Architecture Exploration for Efficient Data Transfer and Storage in Data-Parallel Applications*, in "Euro-Par 2010 Parallel Processing", P. D'AMBRA, M. GUARRACINO, D. TALIA (editors), Springer Berlin / Heidelberg, 2010, vol. 6271, p. 101–116, http://hal.inria.fr/inria-00522786/en.
- [89] G. CSERTÁN, G. HUSZERL, I. MAJZIK, Z. PAP, A. PATARICZA, D. VARRÓ.
*VIATRA Visual Automated Transformations for Formal Verification and Validation of UML Models*, in "17th IEEE International Conference on Automated Software Engineering (ASE'02)", 2002, p. 267-270.
- [90] A. DEMEURE, A. LAFAGE, E. BOUTILLON, D. ROZZONELLI, J.-C. DUFOURD, J.-L. MARRO. *Array-OL : Proposition d'un Formalisme Tableau pour le Traitement de Signal Multi-Dimensionnel*, in "Gretsi", Juan-Les-Pins, France, September 1995.
- [91] A. DEMEURE, Y. DEL GALLO. *An Array Approach for Signal Processing Design*, in "Sophia-Antipolis conference on Micro-Electronics (SAME 98)", France, October 1998.
- [92] P. DUMONT, P. BOULET. *Transformations de code Array-OL: implémentation de la fusion de deux tâches*, Laboratoire d'Informatique fondamentale de Lille et Thales Communications, October 2003.
- [93] P. DUMONT. *Étude des Transformations d'un Code Array-OL dans Gaspard*, Laboratoire d'informatique fondamentale de Lille, Université des sciences et technologies de Lille, France, September 2002, n<sup>o</sup> 02-11, http://www.lifl.fr/west/publi/Dumo02rr11.ps.gz.
- [94] P. DUMONT. *Spécification multidimensionnelle pour le traitement du signal systématique*, Laboratoire d'informatique fondamentale de Lille, Université des sciences et technologies de Lille, 2005, (In French).
- [95] M. EGEA, V. RUSU. *Formal executable semantics for conformance in the MDE framework*, in "Innovations in Software and Systems Engineering", 2010, http://hal.inria.fr/inria-00527502/en.
- [96] A. ETIEN, A. MULLER, T. LEGRAND, X. BLANC. *Combining Independent Model Transformations*, in "ACM Symposium On Applied Computing (SAC)", Sierre, Switzerland, Mar 2010, http://hal.inria.fr/inria-00516708/en.
- [97] D. D. GAJSKI, R. KUHN. *Guest Editor Introduction: New VLSI-Tools*, in "IEEE Computer", December 1983, vol. 16, n<sup>o</sup> 12, p. 11-14.
- [98] I. GALVÃO, A. GOKNIL.
*Survey of Traceability Approaches in Model-Driven Engineering*, in "IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007)", October 2007, p. 313-326.
- [99] A. GAMATIÉ. *Designing Embedded Systems with the SIGNAL Programming Language*, Springer, 2010.
- [100] A. GAMATIÉ, V. RUSU, É. RUTTEN. *Operational Semantics of the Marte Repetitive Structure Modeling Concepts for Data-Parallel Applications Design*, in "9th International Symposium on Parallel and Distributed Computing (ISPDC'2010)", Istanbul, Turkey, 2010, http://hal.inria.fr/inria-00522787/en.
- [101] C. GLITIA. *Code transformations for systematic signal processing and memory size optimizations*, Université des sciences et technologies de Lille, 2006.
- [102] F. JOUAULT. *Loosely Coupled Traceability for ATL*, in "European Conference on Model Driven Architecture (ECMDA) workshop on traceability", 2005, p. 29–37.
- [103] S. LE BEUX, P. MARQUET, J.-L. DEKEYSER. *Model Driven Engineering Benefits for High Level Synthesis*, INRIA, 2008, n<sup>o</sup> 6615, http://hal.inria.fr/inria-00311300/en/.
- [104] P. LE GUERNIC, J.-P. TALPIN, J.-C. LE LANN. *Polychrony for System Design*, in "Journal for Circuits, Systems and Computers", April 2003, vol. 12, n<sup>o</sup> 3, p. 261–304.
- [105] A. MEENA, P. BOULET. *Model Driven Scheduling Framework for Multiprocessor SoC Design*, in "Workshop on Scheduling for Parallel Computing, SPC 2005", Poznan, Poland, September 2005, http://www.cs.put.poznan.pl/mdrozdowski/spc-ppam05/.
- [106] A. MEENA. *Allocation, Assignation et ordonnancement pour les systèmes sur puce multi-processeurs*, Université des Sciences et Technologies de Lille, December 2006.
- [107] D. MENDEZ, A. ETIEN, A. MULLER, R. CASALLAS. *Towards Transformation Migration After Metamodel Evolution*, in "Model and Evolution Workshop", Oslo, Norway, Oct 2010, http://hal.inria.fr/inria-00524145/en.
- [108] J.-M. MOTTU, B. BAUDRY, Y. LE TRAON.
*Mutation Analysis Testing for Model Transformations*, in "proceedings of the European Conference on Model Driven Architecture (ECMDA 06)", Bilbao, Spain, July 2006.
- [109] P.-A. MULLER, F. FLEUREY, J.-M. JÉZÉQUEL. *Weaving Executability into Object-Oriented Metalanguages*, in "MoDELS", Lecture Notes in Computer Science, Springer, 2005, vol. 3713, p. 264-278.
- [110] OBJECT MANAGEMENT GROUP, INC. Meta Object Facility (MOF) Core Specification, Version 2.0, January 2006, http://www.omg.org.
- [111] OBJECT MANAGEMENT GROUP, INC. MOF Query / Views / Transformations, July 2007, OMG paper, http://www.omg.org.
- [112] I. R. QUADRI, A. GAMATIÉ, P. BOULET, J.-L. DEKEYSER. *Modeling of Configurations for Embedded System Implementations in MARTE*, in "1st workshop on Model Based Engineering for Embedded Systems Design Design, Automation and Test in Europe (DATE 2010)", Dresden, Germany, Mar 2010, http://hal.inria.fr/inria-00486845/en.
- [113] I. R. QUADRI, S. MEFTALI, J.-L. DEKEYSER. *Designing dynamically reconfigurable SoCs: From UML MARTE models to automatic code generation*, in "Conference on Design and Architectures for Signal and Image Processing (DASIP 2010)", Edinburgh, United Kingdom, Oct 2010, http://hal.inria.fr/inria-00525003/en.
- [114] I. R. QUADRI. *MARTE based model driven design methodology for targeting dynamically reconfigurable FPGA based SoCs*, Université des Sciences et Technologie de Lille Lille I, Apr 2010, http://hal.inria.fr/tel-00486483/en.
- [115] J. RIVERA, E. GUERRA, J. DE LARA, A. VALLECILLO. *Analyzing Rule-Based Behavioral Semantics of Visual Modeling Languages with Maude*, in "Proc. 1st International Conference on Software Language Engineering (SLE'08)", LNCS, 2008, vol. 5452, p. 54-73.
- [116] L. ROSE, A. ETIEN, D. MENDEZ, D. KOLOVOS, F. POLACK, R. F. PAIGE. *Comparing Model-Metamodel and Transformation-Metamodel Co-evolution*, in "Model and Evolution Workshop", Oslo, Norway, Oct 2010, http://hal.inria.fr/inria-00524314/en.
- [117] G. ROSU, T.-F.
SERBANUTA. *An Overview of the K Semantic Framework*, in "Journal of Logic and Algebraic Programming", 2010, vol. 79, n<sup>o</sup> 6, p. 397–434, http://dx.doi.org/10.1016/j.jlap.2010.03.012.
- [118] V. RUSU, L. GONNORD, B. COMBEMALE. *Formally Tracing Executions From an Analysis Tool Back to a Domain Specific Modeling Language's Operational Semantics*, INRIA, Oct 2010, n<sup>o</sup> RR-7423, http://hal.inria.fr/inria-00526561/en.
- [119] V. RUSU. *Formal executable semantics for conformance in the MDE framework*, in "3rd Int. UML and Formal Methods Workshop", December 2009.
- [120] V. RUSU. *Embedding Domain-Specific Modelling Languages in Maude Specifications*, in "ACM SIGSOFT Software Engineering Notes", 2010, http://hal.inria.fr/inria-00527859/en.
- [121] D. SCHMIDT. *Model-Driven Engineering*, in "IEEE Computer", February 2006, vol. 39, n<sup>o</sup> 2, p. 41-47.
- [122] J. SOULA. *Principe de Compilation d'un Langage de Traitement de Signal*, Laboratoire d'informatique fondamentale de Lille, Université des sciences et technologies de Lille, December 2001, (In French).
- [123] G. TAENTZER. *AGG: A Graph Transformation Environment for Modeling and Validation of Software*, in "Applications of Graph Transformations with Industrial Relevance, Second International Workshop (AGTIVE'03)", 2003, p. 446-453.
- [124] J. TAILLARD, F. GUYOMARC'H, J.-L. DEKEYSER. *A Graphical Framework for High Performance Computing using an MDE Approach*, in "16th Euromicro International Conference on Parallel, Distributed and network-based Processing", Toulouse, France, February 2008, to appear.
- [125] Y. VELEGRAKIS, R. J. MILLER, J. MYLOPOULOS. *Representing and Querying Data transformations*, in "International conference on Data Engineering (ICDE)", April 2005, p. 81-92.