Section: Research Program
Overview
Fundamental questions in the life sciences can now be addressed at an unprecedented scale through the combination of high-throughput experimental techniques and advanced computational methods from computer science. The field of computational biology, or bioinformatics, has grown out of close collaboration between biologists and computer scientists working towards an understanding of living organisms as systems. One of the key challenges in systems biology is understanding how the static information recorded in the genome is interpreted to produce dynamic systems of cooperating and competing biomolecules.
Magnome addresses this challenge through the development of informatic techniques for understanding the structure and history of eukaryotic genomes: algorithms for genome analysis, data models for knowledge representation, stochastic hierarchical models for the behavior of complex systems, and data mining and classification. Our work is in methods and algorithms for:
- Genome annotation for complete genomes, performing syntactic analyses to identify genes, and semantic analyses to map biological meaning to groups of genes [22], [5], [9], [10], [32], [33].
- Integration of heterogeneous data, to build complete knowledge bases for storing and mining information from various sources, and for unambiguously exchanging this information between knowledge bases [1], [3], [25], [27], [21].
- Ancestor reconstruction using optimization techniques, to provide plausible scenarios of the history of genome evolution [10], [7], [28], [34].
- Classification and logical inference, to reliably identify similarities between groups of genetic elements, and infer rules through deduction and induction [8], [6], [9].
- Hierarchical and comparative modeling, to build mathematical models of the behavior of complex biological systems, in particular through combination, reuse, and specialization of existing continuous and discrete models [24], [20], [12].
The hundred- to thousand-fold decrease in sequencing costs seen in the past few years presents significant challenges for data management and large-scale data mining. Magnome's methods specifically address “scaling out,” where capacity is added by installing additional computation nodes rather than by adding more resources to existing hardware. Scaling out adds capacity and redundancy to the resource, and thus fault tolerance, by enforcing data redundancy between nodes and by reassigning computations to existing nodes as needed.
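The two mechanisms named above, data redundancy between nodes and reassignment of computations, can be illustrated with a minimal Python sketch. It is not Magnome's implementation: the function names `place_replicas` and `assign_tasks`, the node and block labels, the round-robin placement, and the replication factor of 2 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def place_replicas(blocks, nodes, replicas=2):
    """Assign each data block to `replicas` distinct nodes, round-robin.

    Hypothetical placement policy, chosen only to keep the sketch short.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replicas)]
    return placement


def assign_tasks(placement, live_nodes):
    """Give each block's computation to the least-loaded live node holding a copy."""
    load = defaultdict(int)
    assignment = {}
    for block, holders in placement.items():
        candidates = [n for n in holders if n in live_nodes]
        if not candidates:
            raise RuntimeError(f"all replicas of {block} are lost")
        chosen = min(candidates, key=lambda n: load[n])
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


# Illustrative labels only (assumed, not from the source).
nodes = ["n0", "n1", "n2"]
blocks = ["chr1", "chr2", "chr3", "chr4"]

placement = place_replicas(blocks, nodes)
before = assign_tasks(placement, set(nodes))
# Simulate the failure of node n1: its computations move to surviving replicas.
after = assign_tasks(placement, set(nodes) - {"n1"})
```

Because every block is held by two nodes, losing a single node leaves each block with at least one live copy, so its computation can be moved without transferring data. Adding a node to `nodes` increases both storage capacity and the number of candidates for each task, which is the sense in which scaling out adds capacity and fault tolerance together.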