Section: New Results
Bio-medicine and biotechnology
Participants: Pascal Durrens [correspondent], David Sherman.
Genome assembly for bio-medicine
We assembled the genome of Clavispora lusitaniae (aka Candida lusitaniae). Yeasts of the genus Candida are opportunistic human pathogens in immunocompromised patients and are associated with a high mortality rate. Although Candida albicans is the major pathogen, related species are increasingly isolated, such as Clavispora lusitaniae, which causes candidaemia in newborns and in onco-hematology patients.
Although the genome of a Clavispora lusitaniae strain isolated from a patient (ATCC 42720) had already been sequenced by the Broad Institute, we assembled the genome of the wild-type reference strain (CBS 6936), because patient isolates tend to harbor genome modifications. The assembly was computed from Illumina reads at 30X coverage, using the MINIA assembler developed by the Inria GENSCALE team. We also searched for single nucleotide polymorphisms (SNPs) in the reads of 3 hypovirulent mutants impaired in the beta-oxidation metabolic pathway. Some of the detected SNPs are now undergoing experimental validation, and we plan to publish a Genome Announcement for the CBS 6936 genome.
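MINIA belongs to the family of de Bruijn graph assemblers: reads are cut into k-mers, rare k-mers (likely sequencing errors) are discarded, and contigs are read off as non-branching paths of the graph whose nodes are (k-1)-mers. The toy sketch below illustrates only that principle. It is not MINIA's implementation: it ignores reverse complements, the Bloom-filter graph representation and the graph simplifications a real assembler performs, and the function names `count_kmers`, `build_graph` and `assemble` are hypothetical.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(solid_kmers):
    """Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix."""
    out_edges = defaultdict(list)
    in_deg = Counter()
    for kmer in solid_kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        out_edges[prefix].append(suffix)
        in_deg[suffix] += 1
    return out_edges, in_deg


def assemble(reads, k, min_count=2):
    """Return unitigs: maximal non-branching paths of the de Bruijn graph."""
    counts = count_kmers(reads, k)
    # Discard rare k-mers, which mostly stem from sequencing errors.
    solid = [kmer for kmer, c in counts.items() if c >= min_count]
    out_edges, in_deg = build_graph(solid)

    def simple(node):
        # A node with exactly one incoming and one outgoing edge lies inside a unitig.
        return in_deg[node] == 1 and len(out_edges.get(node, [])) == 1

    nodes = set(out_edges) | set(in_deg)
    contigs = []
    for start in nodes:
        if simple(start):
            continue  # interior node: reached while extending from a branch point
        for nxt in out_edges.get(start, []):
            contig, node = start + nxt[-1], nxt
            while simple(node):
                node = out_edges[node][0]
                contig += node[-1]
            contigs.append(contig)
    # Isolated cycles made only of simple nodes are not reported in this sketch.
    return sorted(contigs)


reads = ["ATGGCGTA", "GCGTACCT", "GTACCTAG"]
print(assemble(reads, k=4, min_count=1))
```

With `min_count=1` the three overlapping reads collapse into the single contig `ATGGCGTACCTAG`; raising `min_count` shows how coverage matters, since k-mers seen only once are treated as errors and the contig shrinks to the region covered by several reads.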
Transcriptome assembly for bio-technology
We assembled transcriptomes from different tissues of the African oil palm tree Elaeis guineensis. The goal of this project is twofold: (i) to select the most relevant genes involved in oil synthesis, in order to express some of them heterologously in a cultivated recipient plant such as tobacco. In preliminary experiments, heterologous expression of 2-3 key genes/factors yielded oil amounting to 15% of dry weight. New expression technology allows 15-20 genes to be expressed simultaneously, so identifying the best candidates for co-expression will enable efficient heterologous oil synthesis. (ii) To identify gene polymorphisms in a panel of 25 wild-type isolates and 5 production lineages of Elaeis guineensis, in relation to oil yield under different environmental conditions. Among these 30 Elaeis guineensis strains, not only does oil quantity vary greatly (1-12 tons/ha/year), but the relative amount of unsaturated fatty acids also spans a wide range (15-55% of dry weight). Identifying polymorphisms will pave the way for genome-wide association studies (GWAS) aimed at improving the oil resource.
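The simplest form of the association analysis mentioned above tests each polymorphic marker separately for correlation between genotype and phenotype (here, oil yield). The sketch below is that textbook single-marker scan, not the project's planned GWAS design. It assumes genotypes coded as allele dosages 0/1/2 and uses the t-statistic of a Pearson correlation; a real analysis on a panel mixing wild isolates and production lineages would also need to correct for population structure and relatedness (for instance with mixed models) and for multiple testing. The function names and the toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait: no information
    return sxy / math.sqrt(sxx * syy)


def single_marker_scan(genotypes, phenotype):
    """Rank markers by |t| of the genotype/phenotype correlation.

    genotypes: dict marker -> list of allele dosages (0, 1 or 2), one per individual
    phenotype: list of trait values in the same individual order
    """
    n = len(phenotype)
    results = []
    for marker, dosages in genotypes.items():
        r = pearson_r(dosages, phenotype)
        # t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom
        t = math.inf if abs(r) >= 1.0 else r * math.sqrt((n - 2) / (1 - r * r))
        results.append((marker, r, t))
    results.sort(key=lambda item: abs(item[2]), reverse=True)
    return results


# Hypothetical toy panel of 6 individuals (not project data).
yields = [2.0, 4.0, 6.5, 3.0, 5.0, 7.0]
markers = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [1, 0, 1, 2, 2, 0],
    "snp_c": [1, 1, 1, 1, 1, 1],
}
for marker, r, t in single_marker_scan(markers, yields):
    print(f"{marker}\tr={r:.3f}\tt={t:.2f}")
```

On this toy panel `snp_a` ranks first (r close to 0.97), while the monomorphic `snp_c` carries no information and gets t = 0.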
As a first step, we assembled transcriptomes from ca. 300 million reads obtained from 3 tissues (leaf, mesocarp, kernel) of a single tree, using the state-of-the-art assembler TRINITY. The software parameters were tuned on the Inria PLAFRIM computing platform. About 20% of the assembled sequences turned out to be tissue-specific. The protein sequences deduced from the assembled transcripts formed a protein repertoire, which was annotated using related sequences available in public databases. These transcript and protein sets will serve as a framework for the polymorphism studies.
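The report does not say how the protein sequences were deduced from the transcripts. One common approach, sketched below under that assumption, is to translate each transcript in all six reading frames with the standard genetic code (NCBI translation table 1) and keep the longest open reading frame starting with Met. Dedicated tools such as TransDecoder add coding-potential scoring on top of this. The function name `longest_orf_protein` and the `min_aa` length threshold are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf_protein(transcript, min_aa=30):
    """Longest Met-initiated ORF over the six frames, or '' if shorter than min_aa.

    Transcript orientation is treated as unknown, so both strands are scanned.
    A stretch running off the 3' end without a stop codon is kept, since
    assembled transcripts may be incomplete.
    """
    seq = transcript.upper()
    best = ""
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            # Codons containing ambiguous bases (e.g. N) translate to 'X'.
            protein = "".join(
                CODON_TABLE.get(strand[i:i + 3], "X")
                for i in range(frame, len(strand) - 2, 3)
            )
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_aa else ""


print(longest_orf_protein("CCATGGCTTGGTAAGG", min_aa=3))
```

In the short example, only the third forward frame contains an ATG followed by a stop codon (ATG GCT TGG TAA), so the deduced peptide is `MAW`; with the default threshold such a short ORF would be rejected and an empty string returned.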