ABSTRACT
Phenotype prediction is at the center of many questions in biology. Prediction is often achieved by determining statistical associations between genetic and phenotypic variation, ignoring the exact processes that cause the phenotype. Here, we present a framework based on genome-scale metabolic reconstructions to reveal the mechanisms behind the associations. We calculated a polygenic score (PGS) that identifies a set of enzymes as predictors of growth, the phenotype. This set arises from the synergy of the functional mode of metabolism in a particular setting and its evolutionary history, and is suitable to infer the phenotype across a variety of conditions. We also find that there is optimal genetic variation for predictability and demonstrate how the linear PGS can still explain phenotypes generated by the underlying nonlinear biochemistry. Therefore, the explicit model interprets the black box statistical associations of the genotype-to-phenotype map and helps to discover what limits the prediction in metabolism.
Subjects
Biological Evolution, Genome, Genotype, Phenotype, Multifactorial Inheritance
ABSTRACT
The fitness cost of complex pleiotropic mutations is generally difficult to assess. On the one hand, it is necessary to identify which molecular properties are directly altered by the mutation. On the other, this alteration modifies the activity of many genetic targets with uncertain consequences. Here, we examine the possibility of addressing these challenges by identifying unique predictors of these costs. To this aim, we consider mutations in the RNA polymerase (RNAP) in Escherichia coli as a model of complex mutations. Changes in RNAP modify the global program of transcriptional regulation, with many consequences. Among these is the difficulty of decoupling the direct effect of the mutation from the response of the whole system to it, a problem that we solve quantitatively with data from a set of constitutive genes, those on which the global program acts most directly. We provide a statistical framework that incorporates the direct effects and other molecular variables linked to this program as predictors, which reveals that some genes are more suitable for determining costs than others. Therefore, we not only identify which molecular properties best anticipate fitness, but also present the paradoxical result that, despite pleiotropy, specific genes serve as more solid predictors. These results have implications for the understanding of the architecture of robustness in biological systems.
Subjects
DNA-Directed RNA Polymerases, Escherichia coli, DNA-Directed RNA Polymerases/genetics, Escherichia coli/genetics, Mutation
ABSTRACT
The ecological role of microorganisms is of utmost importance due to their multiple interactions with the environment. However, assessing the contribution of individual taxonomic groups has proven difficult despite the availability of high-throughput data, hindering our understanding of such complex systems. Here, we propose a quantitative definition of guild that is readily applicable to metagenomic data. Our framework focuses on the functional character of protein sequences, as well as their diversifying nature. First, we discriminate functional sequences from the whole sequence space corresponding to a gene annotation to then quantify their contribution to the guild composition across environments. In addition, we identify and distinguish functional implementations, which are sequence spaces that have different ways of carrying out the function. In contrast, we found that orthology delineation did not consistently align with ecologically (or functionally) distinct implementations of the function. We demonstrate the value of our approach with two case studies: the ammonia oxidation and polyamine uptake guilds from the Malaspina circumnavigation cruise, revealing novel ecological dynamics of the latter in marine ecosystems. Thus, the quantification of guilds helps us to assess the functional role of different taxonomic groups, with profound implications for the study of microbial communities.
ABSTRACT
The optimization of genetically engineered biological constructs is a key step to deliver high-impact biotechnological applications. The use of high-throughput DNA assembly methods allows the construction of enough genotypic variants to successfully cover the target design space. This, however, entails extra workload for researchers during the screening stage of candidate variants. Despite the existence of commercial colony pickers, their high price excludes small research laboratories and budget-constrained institutions from accessing such extensive screening capability. In this work we present COPICK, a technical solution to automate colony picking on the open-source Opentrons OT-2 liquid handler. COPICK relies on a mounted camera to capture images of regular Petri dishes and detect microbial colonies for automated screening. COPICK's software can then automatically select the best colonies according to different criteria (size, color and fluorescence) and execute a protocol to pick them for further analysis. Benchmark tests performed for E. coli and P. putida colonies deliver a raw picking performance over pickable colonies of 82% with an accuracy of 73.4% at an estimated rate of 240 colonies/h. These results validate the utility of COPICK, and highlight the importance of ongoing technical improvements in open-source laboratory equipment to support smaller research teams.
ABSTRACT
Bacterial gene expression depends on the allocation of limited transcriptional resources given a particular growth rate and growth condition. Early studies of a few genes suggested that this global regulation generates a unifying hyperbolic expression pattern. Here, we developed a large-scale method that generalizes these experiments to quantify the response to growth of over 700 genes that a priori do not exhibit any specific control. We distinguish a core subset following a promoter-specific hyperbolic response. Within this group, we sort genes with regard to their responsiveness to the global regulatory program to show that those with a particularly sensitive linear response are located near the origin of replication. We then find evidence that this genomic architecture is biologically significant by examining position conservation of E. coli genes in 100 bacteria. The response to the transcriptional resources of the cell results in an additional feature contributing to bacterial genome organization.
ABSTRACT
A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper.
ABSTRACT
The forcing that environmental variation exerts on populations causes continuous changes with only two possible evolutionary outcomes: adaptation or extinction. Here we address this topic by studying the transient dynamics of populations on complex fitness landscapes. There are three important features of realistic landscapes of relevance in the evolutionary process: fitness landscapes are rough but correlated, their fitness values depend on the current environment, and many (often most) genotypes do not yield viable phenotypes. We capture these properties by defining time-varying, holey, NK fitness landscapes. We show that the structure of the space of genotypes so generated is that of a network of networks: in a sufficiently holey landscape, populations are temporarily stuck in local networks of genotypes. Sudden jumps to neighbouring networks through narrow adaptive pathways (connector links) are possible, though strong enough local trapping may also cause decays in population growth and eventual extinction. A combination of analytical and numerical techniques to characterize complex networks and population dynamics on such networks allows us to derive several quantitative relationships between the topology of the space of genotypes and the fate of evolving populations.
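As an illustration of the landscape construction described in the abstract above, the following is a minimal sketch of a static, holey NK landscape and of the networks of viable genotypes it induces. It is not the authors' implementation. The function names, the cyclic-neighbour epistasis scheme, the uniform random choice of holes and the `hole_fraction` parameter are all assumptions made for this example, and the time dependence of the landscape is left out.

```python
import itertools
import random


def make_holey_nk(n, k, hole_fraction, seed=0):
    """Build a toy holey NK landscape on binary genotypes of length n.

    Assumptions of this sketch (not taken from the paper): each locus
    interacts with its k cyclic right-hand neighbours, and holes (inviable
    genotypes) are chosen uniformly at random.
    Returns (fitness function, set of viable genotypes).
    """
    rng = random.Random(seed)
    # One lookup table per locus with 2**(k+1) random contributions in [0, 1).
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]
    genotypes = list(itertools.product((0, 1), repeat=n))
    holes = set(rng.sample(genotypes, int(hole_fraction * len(genotypes))))
    viable = set(genotypes) - holes

    def fitness(g):
        if g in holes:
            return 0.0  # inviable genotype
        total = 0.0
        for i in range(n):
            idx = 0
            # Encode the states of locus i and its k neighbours as an integer.
            for j in range(k + 1):
                idx = 2 * idx + g[(i + j) % n]
            total += tables[i][idx]
        return total / n

    return fitness, viable


def viable_networks(viable):
    """Split viable genotypes into networks connected by single-bit mutations."""
    seen, networks = set(), []
    for start in viable:
        if start in seen:
            continue
        seen.add(start)
        stack, network = [start], []
        while stack:
            g = stack.pop()
            network.append(g)
            for i in range(len(g)):
                neighbour = g[:i] + (1 - g[i],) + g[i + 1:]
                if neighbour in viable and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        networks.append(network)
    return networks


if __name__ == "__main__":
    fitness, viable = make_holey_nk(n=8, k=2, hole_fraction=0.5, seed=1)
    networks = viable_networks(viable)
    print(len(viable), "viable genotypes in", len(networks), "networks")
```

Raising `hole_fraction` in this toy model tends to fragment the viable set into more and smaller networks. That loosely mirrors the local trapping the abstract describes, although the connector links and population dynamics studied in the paper are not modelled here.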