1.
BMC Genomics ; 23(1): 408, 2022 May 30.
Article in English | MEDLINE | ID: mdl-35637464

ABSTRACT

BACKGROUND: Codon usage bias (CUB), the non-uniform usage of synonymous codons, occurs across all domains of life. Adaptive CUB is hypothesized to result from various selective pressures, including selection for efficient ribosome elongation, accurate translation, mRNA secondary structure, and/or protein folding. Given the critical link between protein folding and protein function, numerous studies have analyzed the relationship between codon usage and protein structure. The results from these studies have often been contradictory, likely reflecting the differing methods used for measuring codon usage and the failure to appropriately control for confounding factors, such as differences in amino acid usage between protein structures and changes in the frequency of different structures with gene expression. RESULTS: Here we take an explicit population genetics approach to quantify codon-specific shifts in natural selection related to protein structure in S. cerevisiae and E. coli. Unlike other metrics of codon usage, our approach explicitly separates the effects of natural selection, scaled by gene expression, and mutation bias while naturally accounting for a region's amino acid usage. Bayesian model comparisons suggest selection on codon usage varies only slightly between helix, sheet, and coil secondary structures and, similarly, between structured and intrinsically-disordered regions. Similarly, in contrast to previous findings, we find selection on codon usage only varies slightly at the termini of helices in E. coli. Using simulated data, we show that the previous finding that "non-optimal" codons are enriched at the beginning of helices in S. cerevisiae was due to a failure to control for various confounding factors (e.g., amino acid biases and gene expression) rather than selection to modulate cotranslational folding.
CONCLUSIONS: Our results reveal a weak relationship between codon usage and protein structure, indicating that differences in selection on codon usage between structures are slight. In addition to the magnitude of differences in selection between protein structures being slight, the observed shifts appear to be idiosyncratic and largely codon-specific rather than systematic reversals in the nature of selection. Overall, our work demonstrates the statistical power and benefits of studying selective shifts on codon usage or other genomic features from an explicitly evolutionary approach. Limitations of this approach and future potential research avenues are discussed.


Subject(s)
Codon Usage; Saccharomyces cerevisiae; Amino Acids/genetics; Bayes Theorem; Codon/genetics; Escherichia coli/genetics; Genetics, Population; Saccharomyces cerevisiae/genetics; Selection, Genetic
2.
Mol Biol Evol ; 38(4): 1641-1652, 2021 04 13.
Article in English | MEDLINE | ID: mdl-33306127

ABSTRACT

Ultraconserved elements (UCEs) are stretches of hundreds of nucleotides with highly conserved cores flanked by variable regions. Although the selective forces responsible for the preservation of UCEs are unknown, they are nonetheless believed to contain phylogenetically meaningful information from deep to shallow divergence events. Phylogenetic applications of UCEs assume the same degree of rate heterogeneity applies across the entire locus, including variable flanking regions. We present a Wright-Fisher model of selection on nucleotides (SelON) which includes the effects of mutation, drift, and spatially varying, stabilizing selection for an optimal nucleotide sequence. The SelON model assumes the strength of stabilizing selection follows a position-dependent Gaussian function whose exact shape can vary between UCEs. We evaluate SelON by comparing its performance to a simpler and spatially invariant GTR+Γ model using an empirical data set of 400 vertebrate UCEs used to determine the phylogenetic position of turtles. We observe much improvement in model fit of SelON over the GTR+Γ model, and support for turtles as sister to lepidosaurs. Overall, the UCE-specific parameters SelON estimates provide a compact way of quantifying the strength and variation in selection within and across UCEs. SelON can also be extended to include more realistic mapping functions between sequence and stabilizing selection as well as allow for greater levels of rate heterogeneity. By more explicitly modeling the nature of selection on UCEs, SelON and similar approaches can be used to better understand the biological mechanisms responsible for their preservation across highly divergent taxa and long evolutionary time scales.


Subject(s)
Models, Genetic; Selection, Genetic; Base Sequence; Conserved Sequence; Phylogeny
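
The position-dependent Gaussian strength of stabilizing selection described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the functional form, so the expression below and the parameter names `center`, `width`, and `peak` are assumptions made for illustration; this is not the SelON implementation.

```python
import math

def selection_strength(position, center, width, peak):
    """Hypothetical Gaussian profile of selection strength along a locus.

    Strength equals `peak` at `center` (the conserved core) and decays
    symmetrically toward the variable flanks; `width` sets the spread.
    """
    z = (position - center) / width
    return peak * math.exp(-0.5 * z * z)

# Illustrative profile for a 101-site element whose core sits at site 50.
profile = [selection_strength(i, center=50, width=10.0, peak=1.0)
           for i in range(101)]
```

A different `center`, `width`, or `peak` per element would mimic the idea that the shape of the profile can vary between UCEs.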
3.
BMC Evol Biol ; 20(1): 109, 2020 08 26.
Article in English | MEDLINE | ID: mdl-32842959

ABSTRACT

BACKGROUND: For decades, codon usage has been used as a measure of adaptation for translational efficiency and translation accuracy of a gene's coding sequence. These patterns of codon usage reflect both the selective and mutational environment in which the coding sequences evolved. Over this same period, gene transfer between lineages has become widely recognized as an important biological phenomenon. Nevertheless, most studies of codon usage implicitly assume that all genes within a genome evolved under the same selective and mutational environment, an assumption violated when introgression occurs. In order to better understand the effects of introgression on codon usage patterns and vice versa, we examine the patterns of codon usage in Lachancea kluyveri, a yeast which has experienced a large introgression. We quantify the effects of mutation bias and selection for translation efficiency on the codon usage pattern of the endogenous and introgressed exogenous genes using a Bayesian mixture model, ROC SEMPPR, which is built on mechanistic assumptions about protein synthesis and grounded in population genetics. RESULTS: We find substantial differences in codon usage between the endogenous and exogenous genes, and show that these differences can be largely attributed to differences in mutation bias favoring A/T ending codons in the endogenous genes while favoring C/G ending codons in the exogenous genes. Recognizing the two different signatures of mutation bias and selection improves our ability to predict protein synthesis rate by 42% and allowed us to accurately assess the decaying signal of endogenous codon mutation and preferences. 
In addition, using our estimates of mutation bias and selection, we identify Eremothecium gossypii as the closest relative to the exogenous genes, providing an alternative hypothesis about the origin of the exogenous genes, estimate that the introgression occurred ∼6×10^8 generations ago, and estimate its historic and current selection against mismatched codon usage. CONCLUSIONS: Our work illustrates how mechanistic, population genetic models like ROC SEMPPR can separate the effects of mutation and selection on codon usage and provide quantitative estimates from sequence data.


Subject(s)
Codon Usage; Genetics, Population; Models, Genetic; Saccharomycetales/genetics; Selection, Genetic; Bayes Theorem; Mutation
4.
BMC Genomics ; 21(1): 370, 2020 May 20.
Article in English | MEDLINE | ID: mdl-32434474

ABSTRACT

BACKGROUND: Researchers often measure changes in gene expression across conditions to better understand the shared functional roles and regulatory mechanisms of different genes. Analogous to this is comparing gene expression across species, which can improve our understanding of the evolutionary processes shaping the evolution of both individual genes and functional pathways. One area of interest is determining genes showing signals of coevolution, which can also indicate potential functional similarity, analogous to co-expression analysis often performed across conditions for a single species. However, as with any trait, comparing gene expression across species can be confounded by the non-independence of species due to shared ancestry, making standard hypothesis testing inappropriate. RESULTS: We compared RNA-Seq data across 18 fungal species using a multivariate Brownian Motion phylogenetic comparative method (PCM), which allowed us to quantify coevolution between protein pairs while directly accounting for the shared ancestry of the species. Our work indicates that proteins which physically interact show stronger signals of coevolution than randomly generated pairs. Interactions with stronger empirical and computational evidence also show stronger signals of coevolution. We examined the effects of number of protein interactions and gene expression levels on coevolution, finding both factors are overall poor predictors of the strength of coevolution between a protein pair. Simulations further demonstrate the potential issues of analyzing gene expression coevolution without accounting for shared ancestry in a standard hypothesis testing framework.
Furthermore, our simulations indicate that using a randomly generated null distribution to determine statistical significance when detecting coevolving genes with phylogenetically uncorrected correlations, as has previously been done, is less accurate than PCMs, although it is a significant improvement over standard hypothesis testing. These methods are further improved by using a phylogenetically corrected correlation metric. CONCLUSIONS: Our work highlights potential benefits of using PCMs to detect gene expression coevolution from high-throughput omics scale data. This framework can be built upon to investigate other evolutionary hypotheses, such as changes in transcription regulatory mechanisms across species.


Subject(s)
Evolution, Molecular; Fungal Proteins/genetics; Fungi/genetics; Gene Expression; Fungal Proteins/metabolism; Fungi/classification; Fungi/metabolism; Models, Genetic; Phenotype; Phylogeny; Protein Binding
5.
Mol Biol Evol ; 36(4): 834-851, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30521036

ABSTRACT

We present a new phylogenetic approach, selection on amino acids and codons (SelAC), whose substitution rates are based on a nested model linking protein expression to population genetics. Unlike simpler codon models that assume a single substitution matrix for all sites, our model more realistically represents the evolution of protein-coding DNA under the assumption of consistent, stabilizing selection using a cost-benefit approach. This cost-benefit approach allows us to generate a set of 20 optimal amino acid-specific matrix families using just a handful of parameters and naturally links the strength of stabilizing selection to protein synthesis levels, which we can estimate. Using a yeast data set of 100 orthologs for 6 taxa, we find SelAC fits the data much better than popular models by 10^4-10^5 Akaike information criterion units adjusted for small sample bias. Our results also indicate that nested, mechanistic models better predict observed data patterns, highlighting the improvement in biological realism in amino acid sequence evolution that our model provides. Additional parameters estimated by SelAC indicate that a large amount of nonphylogenetic, but biologically meaningful, information can be inferred from existing data. For example, SelAC prediction of gene-specific protein synthesis rates correlates well with both empirical (r=0.33-0.48) and other theoretical predictions (r=0.45-0.64) for multiple yeast species. SelAC also provides estimates of the optimal amino acid at each site. Finally, because SelAC is a nested approach based on clearly stated biological assumptions, future modifications, such as including shifts in the optimal amino acid sequence within or across lineages, are possible.


Subject(s)
Amino Acid Substitution; Genetic Techniques; Models, Genetic; Phylogeny; Selection, Genetic; Genetics, Population/methods
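
The model comparison in this abstract relies on the Akaike information criterion adjusted for small sample bias (AICc). As a rough illustration of how such scores are computed, the sketch below uses the standard formulas AIC = 2k - 2 ln L and AICc = AIC + 2k(k+1)/(n-k-1). The function names and the numbers in the example are hypothetical and are not taken from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def aicc(log_likelihood, n_params, n_obs):
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    denom = n_obs - n_params - 1
    if denom <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    correction = (2 * n_params * (n_params + 1)) / denom
    return aic(log_likelihood, n_params) + correction

# Hypothetical comparison of two fitted models on the same data set.
simple_model = aicc(log_likelihood=-1250.0, n_params=10, n_obs=500)
rich_model = aicc(log_likelihood=-1100.0, n_params=40, n_obs=500)
delta = simple_model - rich_model  # positive values favor the richer model
```

Only differences between scores computed on the same data are meaningful; the absolute values carry no interpretation on their own.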
6.
Bioinformatics ; 34(14): 2496-2498, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29522124

ABSTRACT

Summary: AnaCoDa is an R package for estimating biologically relevant parameters of mixture models, such as selection against translation inefficiency, non-sense errors and ribosome pausing time, from genomic and high throughput datasets. AnaCoDa provides an adaptive Bayesian MCMC algorithm, fully implemented in C++ for high performance with an ergonomic R interface to improve usability. AnaCoDa employs a generic object-oriented design to allow users to extend the framework and implement their own models. Current models implemented in AnaCoDa can accurately estimate biologically relevant parameters given either protein coding sequences or ribosome foot-printing data. Optionally, AnaCoDa can utilize additional data sources, such as gene expression measurements, to aid model fitting and parameter estimation. By utilizing a hierarchical object structure, some parameters can vary between sets of genes while others can be shared. Genes may be assigned to clusters or membership may be estimated by AnaCoDa. This flexibility allows users to estimate the same model parameter under different biological conditions and categorize genes into different sets based on shared model properties embedded within the data. AnaCoDa also allows users to generate simulated data which can be used to aid model development and model analysis as well as evaluate model adequacy. Finally, AnaCoDa contains a set of visualization routines and the ability to revisit or re-initiate previous model fitting, providing researchers with a well rounded easy to use framework to analyze genome scale data. Availability and implementation: AnaCoDa is freely available under the Mozilla Public License 2.0 on CRAN (https://cran.r-project.org/web/packages/AnaCoDa/).


Subject(s)
Codon; Genomics/methods; Models, Genetic; Sequence Analysis, DNA/methods; Software; Algorithms; Bayes Theorem
7.
Mol Phylogenet Evol ; 94(Pt A): 290-7, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26358614

ABSTRACT

The quality of phylogenetic inference made from protein-coding genes depends, in part, on the realism with which the codon substitution process is modeled. Here we propose a new mechanistic model that combines the standard M0 substitution model of Yang (1997) with a simplified model from Gilchrist (2007) that includes selection on synonymous substitutions as a function of codon-specific nonsense error rates. We tested the newly proposed model by applying it to 104 protein-coding genes in brewer's yeast, and compared the fit of the new model to the standard M0 model and to the mutation-selection model of Yang and Nielsen (2008) using the AIC. Our new model provided significantly better fit in approximately 85% of the cases considered for the basic M0 model and in approximately 25% of the cases for the M0 model with estimated codon frequencies, but only in a few cases when the mutation-selection model was considered. However, our model includes a parameter that can be interpreted as a measure of the rate of protein production, and the estimates of this parameter were highly correlated with an independent measure of protein production for the yeast genes considered here. Finally, we found that in some cases the new model led to the preference of a different phylogeny for a subset of the genes considered, indicating that substitution model choice may have an impact on the estimated phylogeny.


Subject(s)
Codon/genetics; Genetic Code; Models, Genetic; Selection, Genetic; Genes, Fungal; Nucleotides/genetics; Phylogeny; Point Mutation; Saccharomyces cerevisiae/classification; Saccharomyces cerevisiae Proteins
8.
PLoS Comput Biol ; 9(11): e1003283, 2013.
Article in English | MEDLINE | ID: mdl-24244117

ABSTRACT

Toxoplasma gondii establishes a chronic infection by forming cysts preferentially in the brain. This chronic infection is one of the most common parasitic infections in humans and can be reactivated to develop life-threatening toxoplasmic encephalitis in immunocompromised patients. Host-pathogen interactions during the chronic infection include growth of the cysts and their removal by both natural rupture and elimination by the immune system. Analyzing these interactions is important for understanding the pathogenesis of this common infection. We developed a differential equation framework of cyst growth and employed Akaike Information Criteria (AIC) to determine the growth and removal functions that best describe the distribution of cyst sizes measured from the brains of chronically infected mice. The AIC strongly support models in which T. gondii cysts grow at a constant rate such that the per capita growth rate of the parasite is inversely proportional to the number of parasites within a cyst, suggesting finely-regulated asynchronous replication of the parasites. Our analyses were also able to reject the models where cyst removal rate increases linearly or quadratically in association with increase in cyst size. The modeling and analysis framework may provide a useful tool for understanding the pathogenesis of infections with other cyst producing parasites.


Subject(s)
Cysts/parasitology; Host-Pathogen Interactions/physiology; Models, Biological; Models, Statistical; Toxoplasma/growth & development; Animals; Brain/parasitology; Computational Biology; Female; Mice; Toxoplasma/pathogenicity
9.
Proc Natl Acad Sci U S A ; 108(25): 10231-6, 2011 Jun 21.
Article in English | MEDLINE | ID: mdl-21646514

ABSTRACT

The genetic code is redundant with most amino acids using multiple codons. In many organisms, codon usage is biased toward particular codons. Understanding the adaptive and nonadaptive forces driving the evolution of codon usage bias (CUB) has been an area of intense focus and debate in the fields of molecular and evolutionary biology. However, their relative importance in shaping genomic patterns of CUB remains unsolved. Using a nested model of protein translation and population genetics, we show that observed gene level variation of CUB in Saccharomyces cerevisiae can be explained almost entirely by selection for efficient ribosomal usage, genetic drift, and biased mutation. The correlation between observed codon counts within individual genes and our model predictions is 0.96. Although a variety of factors shape patterns of CUB at the level of individual sites within genes, our results suggest that selection for efficient ribosome usage is a central force in shaping codon usage at the genomic scale. In addition, our model allows direct estimation of codon-specific mutation rates and elongation times and can be readily applied to any organism with high-throughput expression datasets. More generally, we have developed a natural framework for integrating models of molecular processes with population genetics models to quantitatively estimate parameters underlying fundamental biological processes, such as protein translation.


Subject(s)
Codon; Genetic Code; Genetic Drift; Mutation; Protein Biosynthesis; Biological Evolution; Genome, Fungal; Models, Genetic; Saccharomyces cerevisiae/genetics
10.
Sci Rep ; 14(1): 12983, 2024 06 06.
Article in English | MEDLINE | ID: mdl-38839808

ABSTRACT

Some of the most metabolically diverse species of bacteria (e.g., Actinobacteria) have higher GC content in their DNA, differ substantially in codon usage, and have distinct protein folding environments compared to tractable expression hosts like Escherichia coli. Consequently, expressing biosynthetic gene clusters (BGCs) from these bacteria in E. coli often results in a myriad of unpredictable issues with regard to protein expression and folding, delaying the biochemical characterization of new natural products. Current strategies to achieve soluble, active expression of these enzymes in tractable hosts can be a lengthy trial-and-error process. Cell-free expression (CFE) has emerged as a valuable expression platform and as a testbed for rapidly prototyping expression parameters. Here, we use a type III polyketide synthase from Streptomyces griseus, RppA, which catalyzes the formation of the red pigment flaviolin, as a reporter to investigate BGC refactoring techniques. We applied a library of constructs with different combinations of promoters and rppA coding sequences to investigate the synergies between promoter and codon usage. Subsequently, we assess the utility of cell-free systems for prototyping these refactoring tactics prior to their implementation in cells. Overall, codon harmonization improves natural product synthesis more than traditional codon optimization across cell-free and cellular environments. More importantly, the choice of coding sequences and promoters impacts protein expression synergistically, which should be considered for future efforts to use CFE for high-yield protein expression. The promoter strategy when applied to RppA was not completely correlated with that observed with GFP, indicating that different promoter strategies should be applied for different proteins. In vivo experiments suggest that there is correlation, but not complete alignment, between cell-free and in vivo expression.
Refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs, which advances CFE as a tool for natural product research.


Subject(s)
Cell-Free System; Promoter Regions, Genetic; Streptomyces griseus/enzymology; Streptomyces griseus/genetics; Streptomyces griseus/metabolism; Escherichia coli/genetics; Escherichia coli/metabolism; Multigene Family; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Polyketide Synthases/genetics; Polyketide Synthases/metabolism; Codon/genetics; Acyltransferases
11.
PLoS Genet ; 6(9): e1001128, 2010 Sep 16.
Article in English | MEDLINE | ID: mdl-20862306

ABSTRACT

Despite the fact that tRNA abundances are thought to play a major role in determining translation error rates, their distribution across the genetic code and the resulting implications have received little attention. In general, studies of codon usage bias (CUB) assume that codons with higher tRNA abundance have lower missense error rates. Using a model of protein translation based on tRNA competition and intra-ribosomal kinetics, we show that this assumption can be violated when tRNA abundances are positively correlated across the genetic code. Examining the distribution of tRNA abundances across 73 bacterial genomes from 20 different genera, we find a consistent positive correlation between tRNA abundances across the genetic code. This work challenges one of the fundamental assumptions made in over 30 years of research on CUB that codons with higher tRNA abundances have lower missense error rates and that missense errors are the primary selective force responsible for CUB.


Subject(s)
Codon/genetics; Evolution, Molecular; Protein Biosynthesis/genetics; RNA, Transfer/genetics; Bias; Escherichia coli/genetics; Genetic Variation; Genome, Bacterial/genetics; Models, Genetic; Prokaryotic Cells/metabolism; Species Specificity
12.
bioRxiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38077034

ABSTRACT

Some of the most metabolically diverse species of bacteria (e.g., Actinobacteria) have higher GC content in their DNA, differ substantially in codon usage, and have distinct protein folding environments compared to tractable expression hosts like Escherichia coli. Consequently, expressing biosynthetic gene clusters (BGCs) from these bacteria in E. coli frequently results in a myriad of unpredictable issues with protein expression and folding, delaying the biochemical characterization of new natural products. Current strategies to achieve soluble, active expression of these enzymes in tractable hosts, such as BGC refactoring, can be a lengthy trial-and-error process. Cell-free expression (CFE) has emerged as 1) a valuable expression platform for enzymes that are challenging to synthesize in vivo, and as 2) a testbed for rapid prototyping that can improve cellular expression. Here, we use a type III polyketide synthase from Streptomyces griseus, RppA, which catalyzes the formation of the red pigment flaviolin, as a reporter to investigate BGC refactoring techniques. We synergistically tune promoter and codon usage to improve flaviolin production from cell-free expressed RppA. We then assess the utility of cell-free systems for prototyping these refactoring tactics prior to their implementation in cells. Overall, codon harmonization improves natural product synthesis more than traditional codon optimization across cell-free and cellular environments. Refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs. By showing the coordinators between CFE versus in vivo expression, this work advances CFE as a tool for natural product research.

13.
RNA ; 16(4): 748-61, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20179149

ABSTRACT

Upstream open reading frames (uORFs) are protein coding elements in the 5' leader of messenger RNAs. uORFs generally inhibit translation of the main ORF because ribosomes that perform translation elongation suffer either permanent or conditional loss of reinitiation competence. After conditional loss, reinitiation competence may be regained by, at the minimum, reacquisition of a fresh methionyl-tRNA. The conserved h subunit of Arabidopsis eukaryotic initiation factor 3 (eIF3) mitigates the inhibitory effects of certain uORFs. Here, we define more precisely how this occurs, by combining gene expression data from mutated 5' leaders of Arabidopsis AtbZip11 (At4g34590) and yeast GCN4 with a computational model of translation initiation in wild-type and eif3h mutant plants. Of the four phylogenetically conserved uORFs in AtbZip11, three are inhibitory to translation, while one is anti-inhibitory. The mutation in eIF3h has no major effect on uORF start codon recognition. Instead, eIF3h supports efficient reinitiation after uORF translation. Modeling suggested that the permanent loss of reinitiation competence during uORF translation occurs at a faster rate in the mutant than in the wild type. Thus, eIF3h ensures that a fraction of uORF-translating ribosomes retain their competence to resume scanning. Experiments using the yeast GCN4 leader provided no evidence that eIF3h fosters tRNA reacquisition. Together, these results attribute a specific molecular function in translation initiation to an individual eIF3 subunit in a multicellular eukaryote.


Subject(s)
5' Untranslated Regions; Eukaryotic Initiation Factor-3/metabolism; Open Reading Frames; Peptide Chain Initiation, Translational; Protein Subunits/metabolism; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Basic-Leucine Zipper Transcription Factors/genetics; Basic-Leucine Zipper Transcription Factors/metabolism; Codon, Initiator; Eukaryotic Initiation Factor-3/genetics; Mutation; Protein Biosynthesis; Protein Subunits/genetics; RNA, Messenger/metabolism; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism
14.
J Immunol ; 185(10): 5751-61, 2010 Nov 15.
Article in English | MEDLINE | ID: mdl-20937846

ABSTRACT

Fucosyltransferase-IV and -VII double knockout (FtDKO) mice reveal profound impairment in T cell trafficking to lymph nodes (LNs) due to an inability to synthesize selectin ligands. We observed an increase in the proportion of memory/effector (CD44(high)) T cells in LNs of FtDKO mice. We infected FtDKO mice with lymphocytic choriomeningitis virus to generate and track Ag-specific CD44(high)CD8 T cells in secondary lymphoid organs. Although frequencies were similar, total Ag-specific effector CD44(high)CD8 T cells were significantly reduced in LNs, but not blood, of FtDKO mice at day 8. In contrast, frequencies of Ag-specific memory CD44(high)CD8 T cells were up to 8-fold higher in LNs of FtDKO mice at day 60. Because wild-type mice treated with anti-CD62L also showed increased frequencies of CD44(high) T cells in LNs, we hypothesized that memory T cells were preferentially retained in, or preferentially migrated to, FtDKO LNs. We analyzed T cell entry and egress in LNs using adoptive transfer of bona fide naive or memory T cells. Memory T cells were not retained longer in LNs compared with naive T cells; however, T cell exit slowed significantly as T cell numbers declined. Memory T cells were profoundly impaired in entering LNs of FtDKO mice; however, memory T cells exhibited greater homeostatic proliferation in FtDKO mice. These results suggest that memory T cells are enriched in LNs with T cell deficits by several mechanisms, including longer T cell retention and increased homeostatic proliferation.


Subject(s)
CD8-Positive T-Lymphocytes/cytology; Chemotaxis, Leukocyte/immunology; Immunologic Memory; Lymph Nodes/cytology; Selectins/immunology; T-Lymphocyte Subsets/cytology; Animals; CD8-Positive T-Lymphocytes/immunology; Cell Proliferation; Cell Separation; Flow Cytometry; Fucosyltransferases/deficiency; Hyaluronan Receptors/immunology; Ligands; Lymph Nodes/immunology; Mice; Mice, Inbred C57BL; Mice, Knockout; T-Lymphocyte Subsets/immunology
15.
BMC Bioinformatics ; 11: 72, 2010 Feb 03.
Article in English | MEDLINE | ID: mdl-20128916

ABSTRACT

BACKGROUND: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow for multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3' most tag but will discard a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used leading to increased statistical power. RESULTS: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression. A description is given as to how these algorithms can be applied in that context. CONCLUSIONS: Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and negative findings due to variation in tag formation probabilities across samples when testing for differential expression.


Subject(s)
Bayes Theorem; Bias; Gene Expression Profiling; RNA, Messenger/genetics
16.
Biochim Biophys Acta Biomembr ; 1860(12): 2479-2485, 2018 12.
Article in English | MEDLINE | ID: mdl-30279149

ABSTRACT

The Sec secretion pathway is found across all domains of life. A critical feature of Sec secreted proteins is the signal peptide, a short peptide with distinct physicochemical properties located at the N-terminus of the protein. Previous work indicates signal peptides are biased towards translationally inefficient codons, which is hypothesized to be an adaptation driven by selection to improve the efficacy and efficiency of the protein secretion mechanisms. We investigate codon usage in the signal peptides of E. coli using the Codon Adaptation Index (CAI), the tRNA Adaptation Index (tAI), and the ribosomal overhead cost formulation of the stochastic evolutionary model of protein production rates (ROC-SEMPPR). Comparisons between signal peptides and the 5'-ends of cytoplasmic proteins using CAI and tAI are consistent with a preference for inefficient codons in signal peptides. Simulations reveal these differences are due to amino acid usage and gene expression; the differences disappear when both factors are accounted for. In contrast, ROC-SEMPPR, a mechanistic population genetics model capable of separating the effects of selection and mutation bias, shows that codon usage bias (CUB) of the signal peptides is indistinguishable from that of the 5'-ends of cytoplasmic proteins. Additionally, we find CUB at the 5'-ends is weaker than in later segments of the gene. Results illustrate the value in using models grounded in population genetics to interpret genetic data. We show that failure to account for mutation bias and the effects of gene expression on the efficacy of selection against translation inefficiency can lead to a misinterpretation of codon usage patterns.


Subject(s)
Amino Acids/metabolism; Codon; Escherichia coli/genetics; Gene Expression; Protein Sorting Signals/genetics; Genes, Bacterial; Mutation; Protein Biosynthesis; RNA, Transfer/genetics
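For readers unfamiliar with the Codon Adaptation Index named in the abstract above, the following is a minimal sketch of its usual definition: each codon's weight is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of those weights. The reference counts and the two-amino-acid codon table are invented for illustration and are not E. coli data; the function names are hypothetical.

```python
import math

# Hypothetical reference-set codon counts (NOT real E. coli data),
# grouped by amino acid so weights are computed among synonymous codons.
REFERENCE_COUNTS = {
    "K": {"AAA": 80, "AAG": 20},
    "E": {"GAA": 60, "GAG": 40},
}


def relative_adaptiveness(reference_counts):
    """Weight of each codon = its count / count of the most used synonym."""
    weights = {}
    for codons in reference_counts.values():
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights


def cai(sequence, weights):
    """Geometric mean of codon weights over the codons of `sequence`.

    Codons missing from `weights` are skipped; a trailing partial codon
    is ignored.
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons with known weights")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(REFERENCE_COUNTS)
```

Because the weights depend only on observed codon counts, an index of this kind does not by itself distinguish mutation bias from selection, which is the kind of confounding the abstract warns about.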
17.
BMC Bioinformatics ; 8: 403, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17945026

ABSTRACT

BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome. RESULTS: Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA. CONCLUSION: With a mechanistic, model-based approach for SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE-based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.


Subject(s)
Expressed Sequence Tags; Gene Expression Profiling/methods; Models, Genetic; Sequence Analysis, DNA/methods; Transcription Factors/genetics; Bayes Theorem; Computer Simulation; Data Interpretation, Statistical; Databases, Genetic; Reproducibility of Results; Sensitivity and Specificity
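The abstract's statement that tag formation depends on the cleavage efficiency of the anchoring enzyme and on the number of cleavage sites can be illustrated with a simple geometric sketch. It assumes, as a simplification that is not taken from the paper, that every AE site is cut independently with the same probability p and that the observed tag comes from the 3'-most site that was actually cut. The function name and parameters are hypothetical.

```python
def tag_site_probabilities(n_sites, p):
    """Probability that the tag originates from each AE site.

    Sites are numbered from the 3' end (index 0 is the 3'-most site).
    Assumes independent cleavage with equal efficiency p at every site
    (an illustrative simplification). Returns (per_site, no_tag), where
    no_tag is the probability that no site is cut at all.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    # Site k yields the tag when it is cut and every site 3' of it is not.
    per_site = [p * (1.0 - p) ** k for k in range(n_sites)]
    no_tag = (1.0 - p) ** n_sites
    return per_site, no_tag
```

Under these assumptions, with p = 0.8 and three sites, roughly 80% of transcripts yield the 3'-most tag, 16% the second tag, 3.2% the third, and 0.8% yield no tag, so tag counts from different sites of the same transcript are not equally likely.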
18.
Evolution ; 60(5): 970-9, 2006 May.
Article in English | MEDLINE | ID: mdl-16817537

ABSTRACT

Filamentous fungi are ubiquitous and ecologically important organisms with rich and varied life histories; however, there is no consensus on how to identify or measure their fitness. In the first part of this study we adapt a general epidemiological model to identify the appropriate fitness metric for a saprophytic filamentous fungus. We find that fungal fitness is inversely proportional to the equilibrium density of uncolonized fungal resource patches, which, in turn, is a function of the expected spore production of a fungus. In the second part of this study we use a simple life history model of the same fungus within a resource patch to show that a bang-bang resource allocation strategy maximizes the expected spore production, a critical fitness component. Unlike bang-bang strategies identified in other life-history studies, we find that the optimal allocation strategy for saprophytes does not entail the use of all of the resources within a patch.


Subject(s)
Biological Evolution; Fungi/growth & development; Fungi/pathogenicity; Models, Biological; Population Density; Population Dynamics
19.
Genome Biol Evol ; 7(6): 1559-79, 2015 May 14.
Article in English | MEDLINE | ID: mdl-25977456

ABSTRACT

Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome scale measurements of gene expression. Here, we demonstrate that it is possible to extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and mRNA and ribosome profiling footprint-based estimates of gene expression ([Formula: see text]) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid's "optimal" codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome scale patterns of codon usage, and that accessing this information requires not gene expression measurements but carefully formulated, biologically interpretable models.


Subject(s)
Codon; Evolution, Molecular; Genomics/methods; Mutation; Protein Biosynthesis; Selection, Genetic; Gene Expression; Models, Genetic; Saccharomyces cerevisiae/genetics
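As a rough illustration of how a model of this kind can separate mutation bias from selection, the sketch below uses a multinomial-logit form in which the probability of a synonymous codon is proportional to exp(-dM - dEta * phi), where dM is a mutation-bias term, dEta a selection term, and phi the gene's expression level. This functional form, the function name, and every parameter value are assumptions made for illustration; the paper itself should be consulted for the actual model definition.

```python
import math


def codon_probabilities(delta_m, delta_eta, phi):
    """Synonymous codon probabilities under an assumed logit form.

    P(codon) is proportional to exp(-delta_m[codon] - delta_eta[codon] * phi).
    delta_m: mutation-bias terms; delta_eta: selection terms; phi: expression.
    """
    scores = {c: -delta_m[c] - delta_eta[c] * phi for c in delta_m}
    shift = max(scores.values())  # subtract the max for numerical stability
    expd = {c: math.exp(s - shift) for c, s in scores.items()}
    total = sum(expd.values())
    return {c: v / total for c, v in expd.items()}


# Hypothetical two-codon amino acid: codon "B" is favored by mutation
# (negative delta_m) but disfavored by selection (positive delta_eta).
DELTA_M = {"A": 0.0, "B": -1.0}
DELTA_ETA = {"A": 0.0, "B": 0.5}
```

With these made-up values, codon "B" is the more frequent codon at low expression, the two codons are equally likely at phi = 2, and codon "A" dominates only at high expression, which shows how the most frequent codon need not be the one favored by selection.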
20.
Evolution ; 57(10): 2216-25, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14628910

ABSTRACT

We present moments and likelihood methods that estimate a DNA substitution rate from a group of closely related sister species pairs separated at an assumed time, and we test these methods with simulations. The methods also estimate ancestral population size and can test whether there is a significant difference among the ancestral population sizes of the sister species pairs. Estimates presented in the literature often ignore the ancestral coalescent prior to speciation and therefore should be biased upward. The simulations show that both methods yield accurate estimates given sample sizes of five or more species pairs and that better likelihood estimates are obtained if there is no significant difference among ancestral population sizes. The model presented here indicates that the larger than expected variation found in multitaxa datasets can be explained by variation in the ancestral coalescence and the Poisson mutation process. In this context, observed variation can often be accounted for by variation in ancestral population sizes rather than invoking variation in other parameters, such as divergence time or mutation rate. The methods are applied to data from two groups of species pairs (sea urchins and Alpheus snapping shrimp) that are thought to have separated by the rise of Panama three million years ago.


Subject(s)
Evolution, Molecular; Geography; Models, Genetic; Phylogeny; Animals; Decapoda/genetics; Likelihood Functions; Mutation; Population Density; Sea Urchins/genetics
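To make the idea of a moments estimator concrete, the sketch below works through one simple version. It assumes that all species pairs share one divergence time T (in generations) and one ancestral diploid population size N, that the ancestral coalescence time is exponential with mean 2N generations, and that substitutions accumulate as a Poisson process at rate mu per site per generation over L sites. Under those assumptions the substitution count between the two species of a pair has mean m = 2*mu*L*(T + 2N) and variance m + (4*mu*L*N)^2, which can be solved for mu and N. This is an illustrative derivation and not necessarily the estimator used in the paper; the function name and inputs are hypothetical.

```python
import math
from statistics import mean, variance


def moments_estimate(counts, n_sites, divergence_time):
    """Method-of-moments estimates of (mu, N) from per-pair substitution counts.

    counts: substitutions observed between the two species of each pair.
    n_sites: number of sites compared (L).
    divergence_time: assumed separation time T, in generations.
    """
    if len(counts) < 2:
        raise ValueError("need at least two species pairs")
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    # Variance beyond the Poisson expectation is attributed to the
    # ancestral coalescent; clamp at zero if the data are under-dispersed.
    a = math.sqrt(max(v - m, 0.0))  # estimates 4 * mu * L * N
    mu = (m - a) / (2.0 * n_sites * divergence_time)
    if mu <= 0.0:
        raise ValueError("variance too large for this simple model")
    n_ancestral = a / (4.0 * mu * n_sites)
    return mu, n_ancestral
```

Ignoring the ancestral term (setting a to zero) would attribute all of the mean divergence to the time since separation and so give a larger rate estimate, consistent with the abstract's remark that such estimates should be biased upward.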