Section: New Results
Sequence assembly, alignment and comparison
Participants : Dominique Lavenier [contact] , Claire Lemaitre, Pierre Peterlongo, Fabrice Legeai, Guillaume Chapuis, Rayan Chikhi, Nicolas Maillet, Delphine Naquin, Raluca Uricaru, Pavlos Antoniou, Thomas Derrien.
-
Hardware accelerator: Designing FPGA-based accelerators is a difficult and time-consuming task that can be eased by high-level synthesis tools. A C-to-hardware methodology has been used to develop an efficient systolic array for the genomic sequence alignment problem [42], [25]. [Online publication 1: http://www.eetimes.com/design/programmable-logic/4217568/How-to-accelerate-genomic-sequence-alignment-4X-using-half-an-FPGA?Ecosystem=programmable-logic] [Online publication 2: http://www.springerlink.com/content/37l00567qm18h146/]
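As an illustration of why sequence alignment maps well onto a systolic array, here is a minimal software sketch of Smith-Waterman local alignment scoring filled one anti-diagonal at a time. It is not the design described in [42] or [25]; the function name `sw_score_antidiagonal`, the linear gap model, and the scoring values are assumptions chosen for the example.

```python
def sw_score_antidiagonal(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, filled one anti-diagonal at a time.

    Hypothetical illustration: linear gap penalty, arbitrary scores.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Cells sharing the same i + j depend only on the two previous
    # anti-diagonals, so they are independent of each other. A systolic
    # array can exploit this by giving each cell of an anti-diagonal
    # to its own processing element.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best
```

In this sketch the inner loop runs sequentially; in hardware, the iterations of that loop are the ones that could proceed in parallel.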
-
De novo assembly of NGS data: A novel framework has been introduced for de novo assembly of next-generation sequencing data. The new paired string graphs and localized assembly models are implemented in the Monument assembler [24]. [Online publication: http://www.springerlink.com/content/f5g305j5k73x3k14/]
-
International competition of de novo genome assembly: The Symbiose team (IRISA/CNRS/ENS Cachan Brittany) participated in this competition [9]. [Online publication: http://genome.cshlp.org/content/early/2011/09/16/gr.126599.111.abstract]
-
Indexation of NGS data: A novel data structure is described for indexing NGS data. The structure is coupled with filtering algorithms that enable memory-efficient and parallel indexing [23].
-
Breakpoints in genomes: We analysed the correlation between 3D chromatin interaction data and breakpoint regions resulting from evolutionary rearrangements in the human genome. We found that two loci distant in the human genome but adjacent in the mouse genome are significantly more often observed in close proximity in the human nucleus than expected [21]. [Online publication: http://www.biomedcentral.com/1471-2164/12/303]
-
Repeat detection: A tool has been presented for detecting long similar fragments that occur two or more times in a set of biological sequences. A filter is applied as a preprocessing step, and the information it gathers is reused in the subsequent inference phase [26]. [Online publication: http://www.stringology.org/event/2011/p08.html]
-
Targeted assembly of NGS data: Mapsembler is an iterative targeted assembler which processes large datasets of reads on commodity hardware. Mapsembler checks for the presence of given regions of interest in the reads and reconstructs their neighborhood, either as a plain sequence (consensus) or as a graph (full sequence structure) [39].
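The idea of targeted assembly can be illustrated with a toy sketch: check whether a region of interest shares a k-mer with the reads and, if so, extend it base by base while the reads support a single continuation. This is not Mapsembler's algorithm; the names `kmers` and `extend_right`, the k-mer based presence test, and the stop-at-branch rule are assumptions made for the example, and only the right-hand extension in consensus mode is shown.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def extend_right(starter, reads, k=5, max_len=1000):
    """Toy targeted extension (hypothetical, not Mapsembler itself).

    Returns None if the starter shares no k-mer with the reads,
    otherwise the starter extended to the right until the reads
    offer zero or several possible next bases.
    """
    # Map each (k-1)-mer to the set of bases that follow it in a read.
    nxt = defaultdict(set)
    read_kmers = set()
    for r in reads:
        for km in kmers(r, k):
            read_kmers.add(km)
            nxt[km[:-1]].add(km[-1])

    # Simplified presence check: at least one shared k-mer.
    if not any(km in read_kmers for km in kmers(starter, k)):
        return None

    seq = starter
    seen = set()  # contexts already used, to avoid looping on repeats
    while len(seq) < max_len:
        ctx = seq[-(k - 1):]
        choices = nxt.get(ctx, set())
        if len(choices) != 1 or ctx in seen:
            break  # dead end, branching point, or cycle
        seen.add(ctx)
        seq += next(iter(choices))
    return seq
```

For example, `extend_right("ACGTAC", ["ACGTACCGGT", "ACCGGTTAGC"])` walks across the overlap between the two reads. A graph output would instead record every alternative at a branching point rather than stopping there.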
-
Transcriptome assembly and annotation: We established and analyzed two catalogues of transcripts by assembling EST sequences, and annotated them functionally using the Gene Ontology, for the following two species: Spodoptera littoralis [15] and Cabomba [20].
-
Substitution matrices: A general and simple methodology has been proposed to build new matrices fitted to the specific compositional bias of proteins. It was then applied to the large-scale comparison of AT-rich Mollicute genomes [16]. [Online publication: http://www.biomedcentral.com/1471-2105/12/457/]
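To make the notion of a matrix fitted to a given composition concrete, the following sketch applies the standard log-odds construction, score(a, b) = scale * log2(q_ab / (p_a * p_b)), to pairs of aligned residues taken from the sequences of interest. It is not necessarily the procedure of [16]; the function name `log_odds_matrix`, the `scale` value, and the toy input are assumptions made for the example.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0):
    """Build a symmetric log-odds score table (hypothetical sketch).

    aligned_pairs: iterable of (residue, residue) tuples observed in
    trusted alignments of the target sequences.
    """
    pair_counts = Counter()
    for x, y in aligned_pairs:
        # Count both orientations so the table is symmetric.
        pair_counts[(x, y)] += 1
        pair_counts[(y, x)] += 1

    total = sum(pair_counts.values())
    # q[(a, b)]: observed frequency of the aligned pair (a, b)
    q = {ab: c / total for ab, c in pair_counts.items()}

    # p[a]: background frequency of a, taken as the marginal of q,
    # so any compositional bias in the input is reflected here.
    p = Counter()
    for (a, _), f in q.items():
        p[a] += f

    return {
        (a, b): round(scale * math.log2(f / (p[a] * p[b])))
        for (a, b), f in q.items()
    }
```

Because the background frequencies are estimated from the same data as the pair frequencies, a matrix built this way from biased sequences scores pairs relative to that bias rather than to a generic protein composition. Pairs never observed in the input get no entry in this sketch.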