Results 1 - 20 of 68
1.
BMC Bioinformatics ; 25(1): 195, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760692

ABSTRACT

BACKGROUND: Pathogenic infections pose a significant threat to global health, affecting millions of people every year and presenting substantial challenges to healthcare systems worldwide. Efficient and timely testing plays a critical role in disease control and transmission prevention. Group testing is a well-established method for reducing the number of tests needed to screen large populations when the disease prevalence is low. However, it does not fully utilize the quantitative information provided by qPCR methods, nor is it able to accommodate a wide range of pathogen loads. RESULTS: To address these issues, we introduce a novel adaptive semi-quantitative group testing (SQGT) scheme to efficiently screen populations via two-stage qPCR testing. The SQGT method quantizes cycle threshold (Ct) values into multiple bins, leveraging the information from the first stage of screening to improve the detection sensitivity. Dynamic Ct threshold adjustments mitigate dilution effects and enhance test accuracy. Comparisons with traditional binary outcome GT methods show that SQGT reduces the number of tests by 24% on the only complete real-world qPCR group testing dataset from Israel, while maintaining a negligible false negative rate. CONCLUSION: In conclusion, our adaptive SQGT approach, utilizing qPCR data and dynamic threshold adjustments, offers a promising solution for efficient population screening. With a reduction in the number of tests and minimal false negatives, SQGT holds potential to enhance disease control and testing strategies on a global scale.
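
For context on why pooling saves tests at low prevalence, here is a minimal sketch of the classical two-stage binary scheme (Dorfman pooling) that the abstract uses as its baseline. It is not the SQGT method itself. The function names and the assumption of independent infections with perfect test accuracy are illustrative choices, not taken from the paper.

```python
def dorfman_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per person for classical two-stage binary pooling.

    Assumes independent infections and a perfectly accurate test
    (simplifying assumptions made for this sketch only).
    """
    if pool_size < 2:
        return 1.0  # no pooling: every person is tested individually
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test is shared by the group; a positive pool triggers
    # one individual retest per member.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence: float, max_pool: int = 32) -> int:
    """Pool size (1..max_pool) that minimizes expected tests per person."""
    return min(
        range(1, max_pool + 1),
        key=lambda n: dorfman_tests_per_person(prevalence, n),
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.5):
        n = best_pool_size(p)
        print(p, n, round(dorfman_tests_per_person(p, n), 4))
```

At high prevalence the best pool size collapses to 1, which is the regime where group testing stops paying off.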


Subjects
Real-Time Polymerase Chain Reaction , Real-Time Polymerase Chain Reaction/methods , Humans
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34524425

ABSTRACT

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.


Subjects
Neoplasms , Algorithms , Cell Line , Humans , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer
3.
PLoS Comput Biol ; 19(11): e1011563, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37971967

ABSTRACT

The mRNA levels of all genes in a genome are a critical piece of information defining the overall state of the cell in a given environmental condition. Being able to reconstruct such condition-specific expression in fungal genomes is particularly important to metabolically engineer these organisms to produce desired chemicals in industrially scalable conditions. Most previous deep learning approaches focused on predicting the average expression levels of a gene based on its promoter sequence, ignoring its variation across different conditions. Here we present FUN-PROSE, a deep learning model trained to predict differential expression of individual genes across various conditions using their promoter sequences and expression levels of all transcription factors. We train and test our model on three fungal species and get the correlation between predicted and observed condition-specific gene expression as high as 0.85. We then interpret our model to extract promoter sequence motifs responsible for variable expression of individual genes. We also carried out input feature importance analysis to connect individual transcription factors to their gene targets. A sizeable fraction of both sequence motifs and TF-gene interactions learned by our model agree with previously known biological information, while the rest corresponds to either novel biological facts or indirect correlations.


Subjects
Deep Learning , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Computational Biology , Transcription Factors/genetics , Transcription Factors/metabolism , Gene Expression
4.
Proc Natl Acad Sci U S A ; 118(8)2021 02 23.
Article in English | MEDLINE | ID: mdl-33593911

ABSTRACT

The central question in the origin of life is to understand how structure can emerge from randomness. The Eigen theory of replication states, for sequences that are copied one base at a time, that the replication fidelity has to surpass an error threshold to avoid that replicated specific sequences become random because of the incorporated replication errors [M. Eigen, Naturwissenschaften 58 (10), 465-523 (1971)]. Here, we showed that linking short oligomers from a random sequence pool in a templated ligation reaction reduced the sequence space of product strands. We started from 12-mer oligonucleotides with two bases in all possible combinations and triggered enzymatic ligation under temperature cycles. Surprisingly, we found the robust creation of long, highly structured sequences with low entropy. At the ligation site, complementary and alternating sequence patterns developed. However, between the ligation sites, we found either an A-rich or a T-rich sequence within a single oligonucleotide. Our modeling suggests that avoidance of hairpins was the likely cause for these two complementary sequence pools. What emerged was a network of complementary sequences that acted both as templates and substrates of the reaction. This self-selecting ligation reaction could be restarted by only a few majority sequences. The findings showed that replication by random templated ligation from a random sequence input will lead to a highly structured, long, and nonrandom sequence pool. This is a favorable starting point for a subsequent Darwinian evolution searching for higher catalytic functions in an RNA world scenario.


Subjects
Evolution, Molecular , Nucleic Acid Conformation , Oligonucleotides/chemistry , Origin of Life , Templates, Genetic , DNA-Directed DNA Polymerase/metabolism
5.
Proc Natl Acad Sci U S A ; 118(17)2021 04 27.
Article in English | MEDLINE | ID: mdl-33833080

ABSTRACT

Epidemics generally spread through a succession of waves that reflect factors on multiple timescales. On short timescales, superspreading events lead to burstiness and overdispersion, whereas long-term persistent heterogeneity in susceptibility is expected to lead to a reduction in both the infection peak and the herd immunity threshold (HIT). Here, we develop a general approach to encompass both timescales, including time variations in individual social activity, and demonstrate how to incorporate them phenomenologically into a wide class of epidemiological models through reparameterization. We derive a nonlinear dependence of the effective reproduction number [Formula: see text] on the susceptible population fraction S. We show that a state of transient collective immunity (TCI) emerges well below the HIT during early, high-paced stages of the epidemic. However, this is a fragile state that wanes over time due to changing levels of social activity, and so the infection peak is not an indication of long-lasting herd immunity: Subsequent waves may emerge due to behavioral changes in the population, driven by, for example, seasonal factors. Transient and long-term levels of heterogeneity are estimated using empirical data from the COVID-19 epidemic and from real-life face-to-face contact networks. These results suggest that the hardest hit areas, such as New York City, have achieved TCI following the first wave of the epidemic, but likely remain below the long-term HIT. Thus, in contrast to some previous claims, these regions can still experience subsequent waves.
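
The abstract does not reproduce the functional form of the nonlinear dependence (the formula is elided in the source). As a rough sketch only, the code below assumes a power law, R_e = R_0 * S**lam, where `lam` is a hypothetical heterogeneity exponent introduced here for illustration. With lam = 1 this reduces to the classical homogeneous result HIT = 1 - 1/R_0.

```python
def effective_reproduction_number(r0: float, s: float, lam: float) -> float:
    """R_e under an ASSUMED power-law dependence on the susceptible fraction s."""
    return r0 * s ** lam


def herd_immunity_threshold(r0: float, lam: float) -> float:
    """Fraction that must leave the susceptible pool for R_e to reach 1.

    Solves r0 * s**lam = 1 for s and returns 1 - s.
    """
    s_critical = r0 ** (-1.0 / lam)
    return 1.0 - s_critical


if __name__ == "__main__":
    # Illustrative numbers only, not estimates from the paper.
    print(herd_immunity_threshold(2.0, 1.0))  # homogeneous case
    print(herd_immunity_threshold(2.0, 3.0))  # heterogeneous case, lower threshold
```

Under this assumed form, a larger exponent lowers the threshold, which matches the qualitative statement that heterogeneity in susceptibility reduces the HIT.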


Subjects
COVID-19 , Epidemics , Immunity, Herd , Models, Immunological , SARS-CoV-2/immunology , COVID-19/epidemiology , COVID-19/immunology , COVID-19/transmission , Humans , United States/epidemiology
6.
PLoS Comput Biol ; 18(12): e1010244, 2022 12.
Article in English | MEDLINE | ID: mdl-36574450

ABSTRACT

Recent observations have revealed that closely related strains of the same microbial species can stably coexist in natural and laboratory settings subject to boom and bust dynamics and serial dilutions, respectively. However, the possible mechanisms enabling the coexistence of only a handful of strains, but not more, have thus far remained unknown. Here, using a consumer-resource model of microbial ecosystems, we propose that by differentiating along Monod parameters characterizing microbial growth rates in high and low nutrient conditions, strains can coexist in patterns similar to those observed. In our model, boom and bust environments create satellite niches due to resource concentrations varying in time. These satellite niches can be occupied by closely related strains, thereby enabling their coexistence. We demonstrate that this result is valid even in complex environments consisting of multiple resources and species. In these complex communities, each species partitions resources differently and creates separate sets of satellite niches for their own strains. While there is no theoretical limit to the number of coexisting strains, in our simulations, we always find between 1 and 3 strains coexisting, consistent with known experiments and observations.
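
To make the role of the Monod parameters concrete, here is a small sketch of the standard Monod growth law, g(c) = g_max * c / (K + c), for two hypothetical strains. One grows faster at high nutrient concentration and the other at low concentration. The parameter values are invented for illustration and are not from the paper.

```python
def monod_growth(c: float, g_max: float, k: float) -> float:
    """Monod growth rate at nutrient concentration c."""
    return g_max * c / (k + c)


def crossover_concentration(g1: float, k1: float, g2: float, k2: float) -> float:
    """Concentration at which two Monod curves intersect (assumes g1 != g2).

    Derived from g1 * c / (k1 + c) = g2 * c / (k2 + c) for c > 0.
    """
    return (g2 * k1 - g1 * k2) / (g1 - g2)


if __name__ == "__main__":
    boom_specialist = (1.0, 1.0)   # (g_max, K): fast when nutrients are plentiful
    bust_specialist = (0.6, 0.1)   # slower maximum rate, better at scavenging
    for c in (0.05, 1.25, 100.0):
        print(c, monod_growth(c, *boom_specialist), monod_growth(c, *bust_specialist))
```

In a boom and bust cycle the concentration sweeps across the crossover point, so each strain has a time window in which it grows faster. That is one simple way to picture how a time-varying resource can open an extra niche.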


Subjects
Ecosystem , Microbiota
7.
PLoS Comput Biol ; 16(8): e1008135, 2020 08.
Article in English | MEDLINE | ID: mdl-32810127

ABSTRACT

Social interaction between microbes can be described at many levels of details: from the biochemistry of cell-cell interactions to the ecological dynamics of populations. Choosing an appropriate level to model microbial communities without losing generality remains a challenge. Here we show that modeling cross-feeding interactions at an intermediate level between genome-scale metabolic models of individual species and consumer-resource models of ecosystems is suitable to experimental data. We applied our modeling framework to three published examples of multi-strain Escherichia coli communities with increasing complexity: uni-, bi-, and multi-directional cross-feeding of either substitutable metabolic byproducts or essential nutrients. The intermediate-scale model accurately fit empirical data and quantified metabolic exchange rates that are hard to measure experimentally, even for a complex community of 14 amino acid auxotrophies. By studying the conditions of species coexistence, the ecological outcomes of cross-feeding interactions, and each community's robustness to perturbations, we extracted new quantitative insights from these three published experimental datasets. Our analysis provides a foundation to quantify cross-feeding interactions from experimental data, and highlights the importance of metabolic exchanges in the dynamics and stability of microbial communities.


Subjects
Microbiota , Bacteria/classification , Bacteria/metabolism , Models, Biological
8.
PLoS Comput Biol ; 15(12): e1007524, 2019 12.
Article in English | MEDLINE | ID: mdl-31856158

ABSTRACT

The human gut microbiome is a complex ecosystem, in which hundreds of microbial species and metabolites coexist, in part due to an extensive network of cross-feeding interactions. However, both the large-scale trophic organization of this ecosystem, and its effects on the underlying metabolic flow, remain unexplored. Here, using a simplified model, we provide quantitative support for a multi-level trophic organization of the human gut microbiome, where microbes consume and secrete metabolites in multiple iterative steps. Using a manually-curated set of metabolic interactions between microbes, our model suggests about four trophic levels, each characterized by a high level-to-level metabolic transfer of byproducts. It also quantitatively predicts the typical metabolic environment of the gut (fecal metabolome) in approximate agreement with the real data. To understand the consequences of this trophic organization, we quantify the metabolic flow and biomass distribution, and explore patterns of microbial and metabolic diversity in different levels. The hierarchical trophic organization suggested by our model can help mechanistically establish causal links between the abundances of microbes and metabolites in the human gut.


Subjects
Gastrointestinal Microbiome/physiology , Models, Biological , Biomass , Computational Biology , Computer Simulation , Ecosystem , Humans , Metabolome , Microbial Interactions , Systems Biology
9.
Nucleic Acids Res ; 45(13): 7615-7622, 2017 Jul 27.
Article in English | MEDLINE | ID: mdl-28605556

ABSTRACT

Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes.
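
A family-specific scaling law of the form n = a * N**beta can be estimated by ordinary least squares on log-transformed counts. The sketch below shows that generic procedure on synthetic numbers. The function name and the data are made up for illustration, and the paper's own fitting and classification procedure may differ.

```python
import math


def fit_power_law(genome_sizes, family_counts):
    """Least-squares fit of log(count) = log(a) + beta * log(size).

    Returns (a, beta). All inputs must be strictly positive.
    """
    xs = [math.log(x) for x in genome_sizes]
    ys = [math.log(y) for y in family_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    a = math.exp(mean_y - beta * mean_x)
    return a, beta


if __name__ == "__main__":
    # Synthetic example: counts generated exactly as 0.001 * N**2.
    sizes = [1000, 2000, 4000, 8000]
    counts = [0.001 * n ** 2 for n in sizes]
    print(fit_power_law(sizes, counts))
```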


Subjects
Evolution, Molecular , Genome, Bacterial , Bacterial Proteins/chemistry , Bacterial Proteins/classification , Bacterial Proteins/genetics , Genome Size , Protein Domains/genetics , Proteome/chemistry , Proteome/classification , Proteome/genetics , Transcription Factors/chemistry , Transcription Factors/classification , Transcription Factors/genetics
10.
BMC Bioinformatics ; 19(Suppl 18): 486, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577754

ABSTRACT

BACKGROUND: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity. RESULTS: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance. While our best result is achieved with a combination of molecular feature types (gene expression, microRNA and proteome), we show that most of the predictive power comes from drug descriptors. To further demonstrate value in detecting anticancer therapy, we rank the drug pairs for each cell line based on model predicted combination effect and recover 80% of the top pairs with enhanced activity. CONCLUSIONS: We present promising results in applying deep learning to predicting combinational drug response. Our feature analysis indicates screening data involving more cell lines are needed for the models to make better use of molecular features.


Subjects
Deep Learning/trends , Drug Evaluation, Preclinical/methods , Cell Line, Tumor , Humans , National Cancer Institute (U.S.) , Neural Networks, Computer , United States
11.
Phys Rev Lett ; 120(15): 158102, 2018 Apr 13.
Article in English | MEDLINE | ID: mdl-29756882

ABSTRACT

Microbial ecosystems are remarkably diverse, stable, and usually consist of a mixture of core and peripheral species. Here we propose a conceptual model exhibiting all these emergent properties in quantitative agreement with real ecosystem data, specifically species abundance and prevalence distributions. Resource competition and metabolic commensalism drive the stochastic ecosystem assembly in our model. We demonstrate that even when supplied with just one resource, ecosystems can exhibit high diversity, increasing stability, and partial reproducibility between samples.


Subjects
Ecosystem , Microbiota , Models, Biological , Stochastic Processes
12.
J Chem Phys ; 149(13): 134901, 2018 Oct 07.
Article in English | MEDLINE | ID: mdl-30292218

ABSTRACT

Reduction of information entropy along with ever-increasing complexity is among the key signatures of life. Understanding the onset of such behavior in the early prebiotic world is essential for solving the problem of the origin of life. Here we study a general problem of heteropolymers capable of template-assisted ligation based on Watson-Crick-like hybridization. The system is driven off-equilibrium by cyclic changes in the environment. We model the dynamics of 2-mers, i.e., sequential pairs of specific monomers within the heteropolymer population. While the possible number of them is Z^2 (where Z is the number of monomer types), we observe that most of the 2-mers get extinct, leaving no more than 2Z survivors. This leads to a dramatic reduction of the information entropy in the sequence space. Our numerical results are supported by a general mathematical analysis of the competition of growing polymers for constituent monomers. This natural-selection-like process ultimately results in a limited subset of polymer sequences. Importantly, the set of surviving sequences depends on initial concentrations of monomers and remains exponentially large (2^L, down from Z^L, for length L) in each realization. Thus, an inhomogeneity in initial conditions allows for a massively parallel search of the sequence space for biologically functional polymers, such as ribozymes. We also propose potential experimental implementations of our model in the contexts of either biopolymers or artificial nano-structures.
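
The counting in this abstract can be checked with a few lines of arithmetic. The sketch below tabulates the quantities it mentions for given Z and L. Expressing the reduction in bits assumes all sequences within each set are equally likely, which is an assumption made here for illustration rather than a claim from the paper.

```python
import math


def sequence_space_summary(z: int, length: int) -> dict:
    """Counts quoted in the abstract for z monomer types and chains of given length."""
    return {
        "possible_2mers": z ** 2,            # all ordered monomer pairs
        "max_surviving_2mers": 2 * z,        # upper bound reported in the abstract
        "all_sequences": z ** length,        # unrestricted sequence space
        "surviving_sequences": 2 ** length,  # size of the surviving set
        # log2(z**length) - log2(2**length), assuming uniform weights
        "entropy_reduction_bits": length * (math.log2(z) - 1.0),
    }


if __name__ == "__main__":
    print(sequence_space_summary(4, 10))
```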


Subjects
Biopolymers/chemistry , Entropy , Origin of Life , Algorithms , Catalysis , Computer Graphics , Computer Simulation , Models, Chemical , Nucleic Acid Hybridization , Polymerization
13.
Proc Natl Acad Sci U S A ; 112(29): 9070-5, 2015 Jul 21.
Article in English | MEDLINE | ID: mdl-26153419

ABSTRACT

An approximation to the ∼4-Mbp basic genome shared by 32 strains of Escherichia coli representing six evolutionary groups has been derived and analyzed computationally. A multiple alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ∼90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single base-pair mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome pairs have one or two recombinant transfers of length ∼40-115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome pairs (0.4-1% SNPs) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kilobase pairs. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. Most recombinant transfers seem likely to be due to generalized transduction by coevolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.
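
The figure of 496 genome pairs follows from choosing 2 of the 32 strains. A short check, using hypothetical strain labels:

```python
from itertools import combinations
from math import comb


def genome_pairs(strains):
    """All unordered pairs of distinct strains."""
    return list(combinations(strains, 2))


if __name__ == "__main__":
    strains = [f"strain_{i:02d}" for i in range(1, 33)]  # placeholder names
    pairs = genome_pairs(strains)
    print(len(pairs), comb(32, 2))
```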


Subjects
Escherichia coli/genetics , Genome, Bacterial , Recombination, Genetic/genetics , Transformation, Genetic , Bacteriophages/genetics , Base Pairing/genetics , Biological Evolution , Clone Cells , Escherichia coli/virology , Genetic Vectors , Models, Genetic , Molecular Sequence Annotation , Mosaicism , Phylogeny , Polymorphism, Single Nucleotide/genetics , Restriction Mapping , Transduction, Genetic
14.
Plant J ; 86(6): 472-80, 2016 06.
Article in English | MEDLINE | ID: mdl-27015116

ABSTRACT

Transcriptome data sets from thousands of samples of the model plant Arabidopsis thaliana have been collectively generated by multiple individual labs. Although integration and meta-analysis of these samples has become routine in the plant research community, it is often hampered by a lack of metadata or differences in annotation styles of different labs. In this study, we carefully selected and integrated 6057 Arabidopsis microarray expression samples from 304 experiments deposited to the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI). Metadata such as tissue type, growth conditions and developmental stage were manually curated for each sample. We then studied the global expression landscape of the integrated data set and found that samples of the same tissue tend to be more similar to each other than to samples of other tissues, even in different growth conditions or developmental stages. Root has the most distinct transcriptome, compared with aerial tissues, but the transcriptome of cultured root is more similar to the transcriptome of aerial tissues, as the cultured root samples lost their cellular identity. Using a simple computational classification method, we showed that the tissue type of a sample can be successfully predicted based on its expression profile, opening the door for automatic metadata extraction and facilitating the re-use of plant transcriptome data. As a proof of principle, we applied our automated annotation pipeline to 708 RNA-seq samples from public repositories and verified the accuracy of our predictions with sample metadata provided by the authors.


Subjects
Arabidopsis/genetics , Oligonucleotide Array Sequence Analysis/methods , Arabidopsis Proteins/genetics , Gene Expression Regulation, Plant/genetics
15.
PLoS Comput Biol ; 12(10): e1005146, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27760135

ABSTRACT

Gene expression is controlled by the combinatorial effects of regulatory factors from different biological subsystems such as general transcription factors (TFs), cellular growth factors and microRNAs. A subsystem's gene expression may be controlled by its internal regulatory factors, exclusively, or by external subsystems, or by both. It is thus useful to distinguish the degree to which a subsystem is regulated internally or externally, e.g., how non-conserved, species-specific TFs affect the expression of conserved, cross-species genes during evolution. We developed a computational method (DREISS, dreiss.gerteinlab.org) for analyzing the Dynamics of gene expression driven by Regulatory networks, both External and Internal based on State Space models. Given a subsystem, the "state" and "control" in the model refer to its own (internal) and another subsystem's (external) gene expression levels. The state at a given time is determined by the state and control at a previous time. Because typical time-series data do not have enough samples to fully estimate the model's parameters, DREISS uses dimensionality reduction, and identifies canonical temporal expression trajectories (e.g., degradation, growth and oscillation) representing the regulatory effects emanating from various subsystems. To demonstrate capabilities of DREISS, we study the regulatory effects of evolutionarily conserved vs. divergent TFs across distant species. In particular, we applied DREISS to the time-series gene expression datasets of C. elegans and D. melanogaster during their embryonic development. We analyzed the expression dynamics of the conserved, orthologous genes (orthologs), seeing the degree to which these can be accounted for by orthologous (internal) versus species-specific (external) TFs. We found that between two species, the orthologs have matched, internally driven expression patterns but very different externally driven ones. This is particularly true for genes with evolutionarily ancient functions (e.g., the ribosomal proteins), in contrast to those with more recently evolved functions (e.g., cell-cell communication). This suggests that despite striking morphological differences, some fundamental embryonic-developmental processes are still controlled by ancient regulatory systems.
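
The statement that the state at a given time is determined by the state and control at a previous time can be illustrated with a discrete-time linear state-space update, x[t+1] = A x[t] + B u[t]. The sketch below assumes that linear form. The matrices A (internal effects) and B (external effects) and all values are hypothetical, and this is not the DREISS implementation, which additionally applies dimensionality reduction.

```python
def step(a_matrix, b_matrix, state, control):
    """One update: next_state = A @ state + B @ control, using plain lists."""
    return [
        sum(a * x for a, x in zip(row_a, state))
        + sum(b * u for b, u in zip(row_b, control))
        for row_a, row_b in zip(a_matrix, b_matrix)
    ]


def simulate(a_matrix, b_matrix, initial_state, controls):
    """Trajectory of states, starting from initial_state, one entry per time point."""
    states = [list(initial_state)]
    for control in controls:
        states.append(step(a_matrix, b_matrix, states[-1], control))
    return states


if __name__ == "__main__":
    # One internal gene with decay factor 0.5, driven by one external regulator.
    trajectory = simulate([[0.5]], [[1.0]], [0.0], [[1.0], [1.0], [1.0]])
    print(trajectory)
```

Separating A from B is what lets the internally driven and externally driven contributions to a trajectory be examined independently.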


Subjects
Algorithms , Gene Expression Regulation/physiology , Gene Regulatory Networks/physiology , Models, Biological , Proteome/metabolism , Software , Animals , Computer Simulation , Feedback, Physiological/physiology , Humans
16.
PLoS Comput Biol ; 11(9): e1004440, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26367172

ABSTRACT

Populations of species in ecosystems are often constrained by availability of resources within their environment. In effect this means that growth of one population needs to be balanced by a comparable reduction in populations of others. In neutral models of biodiversity all populations are assumed to change incrementally due to stochastic births and deaths of individuals. Here we propose and model another redistribution mechanism driven by abrupt and severe reduction in size of the population of a single species freeing up resources for the remaining ones. This mechanism may be relevant, e.g., for communities of bacteria, with strain-specific collapses caused, e.g., by invading bacteriophages, or for other ecosystems where infectious diseases play an important role. The emergent dynamics of our system is characterized by cyclic "diversity waves" triggered by collapses of globally dominating populations. The population diversity peaks at the beginning of each wave and exponentially decreases afterwards. Species abundances have a bimodal time-aggregated distribution, with the lower peak formed by populations of recently collapsed or newly introduced species and the upper peak formed by species that have not yet collapsed in the current wave. In most waves both upper and lower peaks are composed of several smaller peaks. This self-organized hierarchical peak structure has a long-term memory transmitted across several waves. It gives rise to a scale-free tail of the time-aggregated population distribution with a universal exponent of 1.7. We show that diversity wave dynamics is robust with respect to variations in the rules of our model such as diffusion between multiple environments, species-specific growth and extinction rates, and bet-hedging strategies.


Subjects
Computational Biology/methods , Models, Theoretical , Population Dynamics , Bacteria , Bacteriophages , Biodiversity , Economics , Ecosystem , Genetic Speciation
17.
Proc Natl Acad Sci U S A ; 110(15): 6235-9, 2013 Apr 09.
Article in English | MEDLINE | ID: mdl-23530195

ABSTRACT

Bacterial genomes and large-scale computer software projects both consist of a large number of components (genes or software packages) connected via a network of mutual dependencies. Components can be easily added or removed from individual systems, and their use frequencies vary over many orders of magnitude. We study this frequency distribution in genomes of ∼500 bacterial species and in over 2 million Linux computers and find that in both cases it is described by the same scale-free power-law distribution with an additional peak near the tail of the distribution corresponding to nearly universal components. We argue that the existence of a power law distribution of frequencies of components is a general property of any modular system with a multilayered dependency network. We demonstrate that the frequency of a component is positively correlated with its dependency degree given by the total number of upstream components whose operation directly or indirectly depends on the selected component. The observed frequency/dependency degree distributions are reproduced in a simple mathematically tractable model introduced and analyzed in this study.
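
The dependency degree described here, the number of components that rely directly or indirectly on a given component, can be computed by a breadth-first search over reversed dependency edges. The sketch below uses a toy package graph with invented names. It illustrates the definition only and is not the authors' model.

```python
from collections import deque


def dependency_degree(requires, component):
    """Count components that directly or indirectly depend on `component`.

    `requires` maps each component to the components it needs to operate.
    """
    # Reverse the edges: for each component, who requires it?
    dependents = {}
    for name, needed in requires.items():
        for dep in needed:
            dependents.setdefault(dep, set()).add(name)

    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen and parent != component:
                seen.add(parent)
                queue.append(parent)
    return len(seen)


if __name__ == "__main__":
    # Toy dependency network (hypothetical packages).
    requires = {
        "app": ["python"],
        "python": ["libc"],
        "bash": ["libc"],
        "libc": [],
    }
    for name in requires:
        print(name, dependency_degree(requires, name))
```

In this toy graph the base library has the highest dependency degree, in line with the reported positive correlation between how widely a component is used and how much depends on it.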


Subjects
Genes, Bacterial/genetics , Software , Systems Biology/methods , Computer Simulation , Databases, Factual , Databases, Genetic , Gene Frequency , Genome, Bacterial , Models, Biological , Probability , Programming Languages
18.
J Chem Phys ; 143(4): 045102, 2015 Jul 28.
Article in English | MEDLINE | ID: mdl-26233165

ABSTRACT

Self-replicating systems based on information-coding polymers are of crucial importance in biology. They also recently emerged as a paradigm in material design on nano- and micro-scales. We present a general theoretical and numerical analysis of the problem of spontaneous emergence of autocatalysis for heteropolymers capable of template-assisted ligation driven by cyclic changes in the environment. Our central result is the existence of the first order transition between the regime dominated by free monomers and that with a self-sustaining population of sufficiently long chains. We provide a simple, mathematically tractable model supported by numerical simulations, which predicts the distribution of chain lengths and the onset of autocatalysis in terms of the overall monomer concentration and two fundamental rate constants. Another key result of our study is the emergence of the kinetically limited optimal overlap length between a template and each of its two substrates. The template-assisted ligation allows for heritable transmission of the information encoded in chain sequences thus opening up the possibility of long-term memory and evolvability in such systems.


Subjects
Biopolymers/chemistry , Macromolecular Substances/chemistry , Models, Theoretical , Catalysis , Computer Simulation , Kinetics
19.
PLoS Comput Biol ; 9(4): e1003023, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23592969

ABSTRACT

In addition to their biological function, protein complexes reduce the exposure of the constituent proteins to the risk of undesired oligomerization by reducing the concentration of the free monomeric state. We interpret this reduced risk as a stabilization of the functional state of the protein. We estimate that protein-protein interactions can account for ~2-4 k_B T of additional stabilization, a substantial contribution to intrinsic stability. We hypothesize that proteins in the interaction network act as evolutionary capacitors, which allow their binding partners to explore regions of the sequence space which correspond to less stable proteins. In the interaction network of baker's yeast, we find that statistically proteins that receive higher energetic benefits from the interaction network are more likely to misfold. A simplified fitness landscape wherein the fitness of an organism is inversely proportional to the total concentration of unfolded proteins provides an evolutionary justification for the proposed trends. We conclude by outlining clear biophysical experiments to test our predictions.


Subjects
Computational Biology/methods , Evolution, Molecular , Fungal Proteins/chemistry , Protein Interaction Mapping/methods , Protein Interaction Maps , Proteins/chemistry , Cytoplasm/chemistry , HSP90 Heat-Shock Proteins/chemistry , Protein Binding , Protein Conformation , Protein Denaturation , Protein Folding , Saccharomyces cerevisiae/chemistry , Thermodynamics
20.
Proc Natl Acad Sci U S A ; 108(10): 4258-63, 2011 Mar 08.
Article in English | MEDLINE | ID: mdl-21368118

ABSTRACT

How do living cells achieve sufficient abundances of functional protein complexes while minimizing promiscuous nonfunctional interactions? Here we study this problem using a first-principle model of the cell whose phenotypic traits are directly determined from its genome through biophysical properties of protein structures and binding interactions in a crowded cellular environment. The model cell includes three independent prototypical pathways, whose topologies of protein-protein interaction (PPI) subnetworks are different, but whose contributions to the cell fitness are equal. Model cells evolve through genotypic mutations and phenotypic protein copy number variations. We found a strong relationship between evolved physical-chemical properties of protein interactions and their abundances due to a "frustration" effect: Strengthening of functional interactions brings about hydrophobic interfaces, which make proteins prone to promiscuous binding. The balancing act is achieved by lowering concentrations of hub proteins while raising solubilities and abundances of functional monomers. On the basis of these principles we generated and analyzed a possible realization of the proteome-wide PPI network in yeast. In this simulation we found that high-throughput affinity capture-mass spectroscopy experiments can detect functional interactions with high fidelity only for high-abundance proteins while missing most interactions for low-abundance proteins.


Subjects
Proteins/metabolism , Mutation , Protein Binding , Proteins/genetics