1.
Bioinformatics ; 37(20): 3654-3656, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33904572

ABSTRACT

MOTIVATION: Structure-conditioned information statistics have proven useful to predict and visualize tRNA Class-Informative Features (CIFs) and their evolutionary divergences. Although permutation P-values can quantify the significance of CIF divergences between two taxa, their naive Monte Carlo approximation is slow and inaccurate. The Peaks-over-Threshold approach of Knijnenburg et al. (2009) promises improvements to both speed and accuracy of permutation P-values, but has no publicly available API. RESULTS: We present tRNA Structure-Function Mapper (tSFM) v1.0, an open-source, multi-threaded application that efficiently computes, visualizes and assesses significance of single- and paired-site CIFs and their evolutionary divergences for any RNA, protein, gene or genomic element sequence family. Multiple estimators of permutation P-values for CIF evolutionary divergences are provided along with confidence intervals. tSFM is implemented in Python 3 with compiled C extensions and is freely available through GitHub (https://github.com/tlawrence3/tSFM) and PyPI. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available on GitHub at https://github.com/tlawrence3/tSFM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
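
As an illustration of the naive Monte Carlo approximation mentioned in this abstract, the following Python sketch estimates a permutation P-value for a generic two-group statistic. It is not tSFM's code or API: the function name, the difference-in-means statistic, and the add-one correction are assumptions chosen for illustration, and neither the CIF divergence statistics nor the Peaks-over-Threshold estimator is reproduced here.

```python
import random


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Naive Monte Carlo permutation P-value (illustrative, hypothetical helper).

    The test statistic is the absolute difference in group means, which is
    an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    exceed = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        stat = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if stat >= observed:
            exceed += 1

    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)
```

With n permutations, the smallest value this estimator can return is 1/(n + 1), so resolving very small P-values requires very many permutations. That limitation is one way to read the abstract's remark that the naive approximation is slow and inaccurate.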

2.
J Mol Evol ; 89(1-2): 103-116, 2021 02.
Article in English | MEDLINE | ID: mdl-33528599

ABSTRACT

The evolution of tRNA multigene families remains poorly understood, exhibiting unusual phenomena such as functional conversions of tRNA genes through anticodon shift substitutions. We improved FlyBase tRNA gene annotations from twelve Drosophila species, incorporating previously identified ortholog sets to compare substitution rates across tRNA bodies at single-site and base-pair resolution. All rapidly evolving sites fell within the same metal ion-binding pocket that lies at the interface of the two major stacked helical domains. We applied our tRNA Structure-Function Mapper (tSFM) method independently to each Drosophila species and one outgroup species Musca domestica and found that, although predicted tRNA structure-function maps are generally highly conserved in flies, one tRNA Class-Informative Feature (CIF) within the rapidly evolving ion-binding pocket, Cytosine 17 (C17), ancestrally informative for lysylation identity, independently gained asparaginylation identity and substituted in parallel across tRNA(Asn) paralogs at least once, possibly multiple times, during evolution of the genus. In D. melanogaster, most tRNA(Lys) and tRNA(Asn) genes are co-arrayed in one large heterologous gene cluster, suggesting that heterologous gene conversion as well as structural similarities of tRNA-binding interfaces in the closely related asparaginyl-tRNA synthetase (AsnRS) and lysyl-tRNA synthetase (LysRS) proteins may have played a role in these changes. A previously identified Asn-to-Lys anticodon shift substitution in D. ananassae may have arisen to compensate for the convergent and parallel gains of C17 in tRNA(Asn) paralogs in that lineage. Our results underscore the functional and evolutionary relevance of our tRNA structure-function map predictions and illuminate multiple genomic and structural factors contributing to rapid, parallel and compensatory evolution of tRNA multigene families.


Subject(s)
Drosophila melanogaster, Transfer RNA, Animals, Anticodon/genetics, Drosophila melanogaster/genetics, Insect Genome, Transfer RNA/genetics
3.
BMC Bioinformatics ; 20(1): 434, 2019 Aug 22.
Article in English | MEDLINE | ID: mdl-31438847

ABSTRACT

BACKGROUND: The epidermal growth factor receptor (EGFR) is a major regulator of proliferation in tumor cells. Elevated expression levels of EGFR are associated with prognosis and clinical outcomes of patients in a variety of tumor types. There are at least four splice variants of the mRNA encoding four protein isoforms of EGFR in humans, named I through IV. EGFR isoform I is the full-length protein, whereas isoforms II-IV are shorter protein isoforms. Nevertheless, all EGFR isoforms bind the epidermal growth factor (EGF). Although EGFR is an essential target of long-established and successful tumor therapeutics, the exact function and biomarker potential of alternative EGFR isoforms II-IV are unclear, motivating more in-depth analyses. Hence, we analyzed transcriptome data from glioblastoma cell line SF767 to predict target genes regulated by EGFR isoforms II-IV, but not by EGFR isoform I nor other receptors such as HER2, HER3, or HER4. RESULTS: We analyzed the differential expression of potential target genes in a glioblastoma cell line in two nested RNAi experimental conditions and one negative control, contrasting expression with EGF stimulation against expression without EGF stimulation. In one RNAi experiment, we selectively knocked down EGFR splice variant I, while in the other we knocked down all four EGFR splice variants, so the associated effects of EGFR II-IV knock-down can only be inferred indirectly. For this type of nested experimental design, we developed a two-step bioinformatics approach based on the Bayesian Information Criterion for predicting putative target genes of EGFR isoforms II-IV. Finally, we experimentally validated a set of six putative target genes, and we found that qPCR validations confirmed the predictions in all cases. 
CONCLUSIONS: By performing RNAi experiments for three poorly investigated EGFR isoforms, we were able to successfully predict 1140 putative target genes specifically regulated by EGFR isoforms II-IV using the developed Bayesian Gene Selection Criterion (BGSC) approach. This approach can readily be applied to data from other nested experimental designs, and we provide an R implementation that is easily adaptable to similar data or experimental designs, together with all raw datasets used in this study, in the BGSC repository, https://github.com/GrosseLab/BGSC.
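
To make the role of the Bayesian Information Criterion concrete, here is a minimal Python sketch of BIC-based model comparison for a single gene. It is not the BGSC method or its R implementation: the two Gaussian models (one shared mean versus two group means), the function names, and the parameter counts are assumptions for illustration only.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood given residuals about fitted means."""
    n = len(residuals)
    # Maximum-likelihood variance; floored to avoid log(0) on perfect fits.
    variance = max(sum(r * r for r in residuals) / n, 1e-12)
    return -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)


def bic(log_likelihood, n_params, n_obs):
    """BIC = k * ln(n) - 2 * ln(L); lower values indicate a preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def compare_models(group_a, group_b):
    """Return (BIC of one-mean model, BIC of two-mean model) for two groups."""
    values = list(group_a) + list(group_b)
    n = len(values)

    grand_mean = sum(values) / n
    residuals_one = [v - grand_mean for v in values]

    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    residuals_two = [v - mean_a for v in group_a] + [v - mean_b for v in group_b]

    # Parameter counts: means plus one shared variance.
    bic_one = bic(gaussian_log_likelihood(residuals_one), 2, n)
    bic_two = bic(gaussian_log_likelihood(residuals_two), 3, n)
    return bic_one, bic_two
```

In this sketch a gene would be flagged as responding to a condition when the two-mean model has the lower BIC. The extra mean parameter is penalized by ln(n), so small differences between groups do not favor the richer model.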


Subject(s)
Alternative Splicing/genetics, Computational Biology/methods, ErbB Receptors/genetics, Glioblastoma/genetics, Bayes Theorem, Tumor Cell Line, ErbB Receptors/metabolism, Humans, Probability, Protein Isoforms/genetics, Protein Isoforms/metabolism, RNA Interference, Small Interfering RNA/metabolism, Signal Transduction
4.
BMC Evol Biol ; 19(1): 224, 2019 12 09.
Article in English | MEDLINE | ID: mdl-31818253

ABSTRACT

BACKGROUND: Eukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies are contradictory on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data. RESULTS: Using Posterior Predictive Analysis, we show that recently applied evolutionary models poorly fit three phylogenomic datasets curated from cyanobacteria and plastid genomes because of heterogeneities in both substitution processes across sites and of compositions across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted function-informative features in tRNA gene complements. Classification of cyanobacterial genomes with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-cell, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data. CONCLUSIONS: Phylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneities and compositional heterogeneities across lineages. These aspects of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneities. 
However, the combination of our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies.


Subject(s)
Cyanobacteria/genetics, Eukaryota/cytology, Eukaryota/genetics, Plastids/genetics, Biological Evolution, Biological Models, Photosynthesis, Phylogeny, Transfer RNA, Symbiosis
5.
Theor Popul Biol ; 129: 68-80, 2019 10.
Article in English | MEDLINE | ID: mdl-31042487

ABSTRACT

Advances in structural biology of aminoacyl-tRNA synthetases (aaRSs) have revealed incredible diversity in how aaRSs bind their tRNA substrates. The causes of this diversity remain mysterious. We developed a new class of highly rugged fitness landscape models called match landscapes, through which genes encode the assortative interactions of their gene products through the complementarity and identifiability of their structural features. We used results from coding theory to prove bounds and equalities on fitness in match landscapes assuming additive interaction energies, macroscopic aminoacylation kinetics including proofreading, site-specific modifiers of interaction, and selection for translational accuracy in multiple, perfectly encoded site-types. Using genotypes based on extended Hamming codes we show that over a wide array of interface sizes and numbers of encoded cognate pairs, selection for translational accuracy alone is insufficient to displace the tRNA-binding interfaces of aaRSs. Yet, under combined selection for translational accuracy and rate, site-specific modifiers are selected to adaptively displace the tRNA-binding interfaces of non-cognate aaRS-tRNA pairs. We describe a remarkable correspondence between the lengths of perfect RNA (quaternary) codes and the modal sizes of small non-coding RNA families.
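
The abstract does not say which code lengths are compared with RNA family sizes. As a hedged illustration from standard coding theory, the sketch below lists the parameters of Hamming codes over a four-letter (quaternary) alphabet and checks that they meet the sphere-packing bound with equality, which is what makes a single-error-correcting code perfect. The function names are hypothetical and the code is not taken from the paper.

```python
def hamming_parameters(q, r):
    """Length n and dimension k of the q-ary Hamming code with r check symbols."""
    n = (q**r - 1) // (q - 1)
    k = n - r
    return n, k


def is_perfect_single_error_code(q, n, k):
    """True if radius-1 Hamming balls around the q**k codewords tile the space."""
    ball_size = 1 + n * (q - 1)  # the codeword itself plus all single-symbol errors
    return q**k * ball_size == q**n


if __name__ == "__main__":
    # Quaternary alphabet, e.g. the four RNA bases A, C, G, U.
    for r in range(2, 5):
        n, k = hamming_parameters(4, r)
        print(r, n, k, is_perfect_single_error_code(4, n, k))
    # prints:
    # 2 5 3 True
    # 3 21 18 True
    # 4 85 81 True
```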


Subject(s)
Amino Acyl-tRNA Synthetases/genetics, Genetic Fitness/genetics, Transfer RNA/genetics, Humans, Metagenomics, Genetic Models, Statistical Models
6.
BMC Genomics ; 17(1): 1003, 2016 12 08.
Article in English | MEDLINE | ID: mdl-27927177

ABSTRACT

BACKGROUND: While the CCA sequence at the mature 3' end of tRNAs is conserved and critical for translational function, a genetic template for this sequence is not always contained in tRNA genes. In eukaryotes and Archaea, the CCA ends of tRNAs are synthesized post-transcriptionally by CCA-adding enzymes. In Bacteria, tRNA genes template CCA sporadically. RESULTS: In order to understand the variation in how prokaryotic tRNA genes template CCA, we re-annotated tRNA genes in tRNAdb-CE database version 0.8. Among 132,129 prokaryotic tRNA genes, initiator tRNA genes template CCA at the highest average frequency (74.1%) over all functional classes except selenocysteine and pyrrolysine tRNA genes (88.1% and 100% respectively). Across bacterial phyla and a wide range of genome sizes, many lineages exist in which predominantly initiator tRNA genes template CCA. Convergent and parallel retention of CCA templating in initiator tRNA genes evolved in independent histories of reductive genome evolution in Bacteria. Also, in a majority of cyanobacterial and actinobacterial genera, predominantly initiator tRNA genes template CCA. We also found that a surprising fraction of archaeal tRNA genes template CCA. CONCLUSIONS: We suggest that cotranscriptional synthesis of initiator tRNA CCA 3' ends can complement inefficient processing of initiator tRNA precursors, "bootstrap" rapid initiation of protein synthesis from a non-growing state, or contribute to an increase in cellular growth rates by reducing overheads of mass and energy to maintain nonfunctional tRNA precursor pools. More generally, CCA templating in structurally non-conforming tRNA genes can afford cells robustness and greater plasticity to respond rapidly to environmental changes and stimuli.


Subject(s)
Bacteria/genetics, RNA Precursors/metabolism, Anticodon, Archaea/genetics, Base Pairing, Base Sequence, Genetic Databases, Archaeal Genes, Bacterial Genes, RNA Precursors/chemistry, Methionine Transfer RNA/chemistry, Methionine Transfer RNA/metabolism
7.
PLoS Comput Biol ; 10(2): e1003454, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24586126

ABSTRACT

Molecular phylogenetics and phylogenomics are subject to noise from horizontal gene transfer (HGT) and bias from convergence in macromolecular compositions. Extensive variation in size, structure and base composition of alphaproteobacterial genomes has complicated their phylogenomics, sparking controversy over the origins and closest relatives of the SAR11 strains. SAR11 are highly abundant, cosmopolitan aquatic Alphaproteobacteria with streamlined, A+T-biased genomes. A dominant view holds that SAR11 are monophyletic and related to both Rickettsiales and the ancestor of mitochondria. Other studies dispute this, finding evidence of a polyphyletic origin of SAR11 with most strains distantly related to Rickettsiales. Although careful evolutionary modeling can reduce bias and noise in phylogenomic inference, entirely different approaches may be useful to extract robust phylogenetic signals from genomes. Here we develop simple phyloclassifiers from bioinformatically derived tRNA Class-Informative Features (CIFs), features predicted to target tRNAs for specific interactions within the tRNA interaction network. Our tRNA CIF-based model robustly and accurately classifies alphaproteobacterial genomes into one of seven undisputed monophyletic orders or families, despite great variability in tRNA gene complement sizes and base compositions. Our model robustly rejects monophyly of SAR11, classifying all but one strain as Rhizobiales with strong statistical support. Yet remarkably, conventional phylogenetic analysis of tRNAs classifies all SAR11 strains identically as Rickettsiales. We attribute this discrepancy to convergence of SAR11 and Rickettsiales tRNA base compositions. Thus, tRNA CIFs appear more robust to compositional convergence than tRNA sequences generally. Our results suggest that tRNA-CIF-based phyloclassification is robust to HGT of components of the tRNA interaction network, such as aminoacyl-tRNA synthetases. 
We explain why tRNAs are especially advantageous for prediction of traits governing macromolecular interactions from genomic data, and why such traits may be advantageous in the search for robust signals to address difficult problems in classification and phylogeny.


Subject(s)
Alphaproteobacteria/classification, Alphaproteobacteria/genetics, Bacterial RNA/genetics, Transfer RNA/genetics, Bacterial Proteins/genetics, Computational Biology, Molecular Evolution, Gene Regulatory Networks, Horizontal Gene Transfer, Bacterial Genome, Genetic Models, Phylogeny, Rhodospirillales/classification, Rhodospirillales/genetics
8.
Curr Protoc ; 4(5): e1046, 2024 May.
Article in English | MEDLINE | ID: mdl-38717471

ABSTRACT

Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants from short-read sequencing data aligned to a reference genome, including single nucleotide polymorphisms (SNPs) and structural variations (SVs). We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms without benchmarked variants, which are traditionally used for distinguishing sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP-SVant workflow is flexible, with user options to trade off accuracy for sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK) and predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and it culminates in variant annotation using custom scripts. A key utility of SNP-SVant is its scalability. Variant calling is a computationally expensive procedure, and thus, SNP-SVant uses a workflow management system with intermediary checkpoint steps to ensure efficient use of resources by minimizing redundant computations and omitting steps where dependent files are available. SNP-SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA format outputs to ensure compatibility with downstream tools to calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide-ranging view of genomic alterations in an organism of interest. 
Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Predicting single nucleotide polymorphisms and structural variations
Support Protocol 1: Downloading publicly available sequencing data
Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer
Support Protocol 3: Converting between VCF and aligned FASTA formats


Subject(s)
Single Nucleotide Polymorphism, Software, Workflow, Single Nucleotide Polymorphism/genetics, Computational Biology/methods, Genomics/methods, Molecular Sequence Annotation/methods, Whole Genome Sequencing/methods
9.
Nature ; 450(7167): 203-18, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17994087

ABSTRACT

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.


Subject(s)
Drosophila/classification, Drosophila/genetics, Molecular Evolution, Insect Genes/genetics, Insect Genome/genetics, Genomics, Phylogeny, Animals, Codon/genetics, DNA Transposable Elements/genetics, Drosophila/immunology, Drosophila/metabolism, Drosophila Proteins/genetics, Gene Order/genetics, Mitochondrial Genome/genetics, Immunity/genetics, Multigene Family/genetics, Untranslated RNA/genetics, Reproduction/genetics, Sequence Alignment, DNA Sequence Analysis, Synteny/genetics
10.
PLoS Negl Trop Dis ; 14(2): e0007983, 2020 02.
Article in English | MEDLINE | ID: mdl-32106219

ABSTRACT

The development of chemotherapies against eukaryotic pathogens is especially challenging because of both the evolutionary conservation of drug targets between host and parasite, and the evolution of strain-dependent drug resistance. There is a strong need for new nontoxic drugs with broad-spectrum activity against trypanosome parasites such as Leishmania and Trypanosoma. A relatively untested approach is to target macromolecular interactions in parasites rather than small molecular interactions, under the hypothesis that the features specifying macromolecular interactions diverge more rapidly through coevolution. We computed tRNA Class-Informative Features in humans and independently in eight distinct clades of trypanosomes, identifying parasite-specific informative features, including base pairs and base mis-pairs, that are broadly conserved over approximately 250 million years of trypanosome evolution. Validating these observations, we demonstrated biochemically that tRNA:aminoacyl-tRNA synthetase (aaRS) interactions are a promising target for anti-trypanosomal drug discovery. From a marine natural products extract library, we identified several fractions with inhibitory activity toward Leishmania major alanyl-tRNA synthetase (AlaRS) but no activity against the human homolog. These marine natural products extracts showed cross-reactivity towards Trypanosoma cruzi AlaRS indicating the broad-spectrum potential of our network predictions. We also identified Leishmania major threonyl-tRNA synthetase (ThrRS) inhibitors from the same library. We discuss why chemotherapies targeting multiple aaRSs should be less prone to the evolution of resistance than monotherapeutic or synergistic combination chemotherapies targeting only one aaRS.


Subject(s)
Alanine-tRNA Ligase/antagonists & inhibitors, Antiprotozoal Agents/pharmacology, Enzyme Inhibitors/pharmacology, Leishmania/enzymology, Protozoan Proteins/antagonists & inhibitors, Threonine-tRNA Ligase/antagonists & inhibitors, Trypanosoma/drug effects, Alanine-tRNA Ligase/genetics, Alanine-tRNA Ligase/metabolism, Antiprotozoal Agents/chemistry, Enzyme Inhibitors/chemistry, Humans, Leishmania/drug effects, Leishmania/genetics, Leishmaniasis/parasitology, Protozoan Proteins/genetics, Protozoan Proteins/metabolism, Threonine-tRNA Ligase/genetics, Threonine-tRNA Ligase/metabolism, Trypanosoma/enzymology, Trypanosoma/genetics, Trypanosomiasis/parasitology
11.
BMC Bioinformatics ; 10: 271, 2009 Aug 28.
Article in English | MEDLINE | ID: mdl-19715597

ABSTRACT

BACKGROUND: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase sigma-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. RESULTS: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis sigma66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase sigma66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability. 
CONCLUSION: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase sigma-factor/DNA binding collaboratively, contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis sigma66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes.


Subject(s)
Bacterial Proteins/genetics, Chlamydia trachomatis/genetics, Computational Biology/methods, Markov Chains, Genetic Promoter Regions, Sigma Factor/chemistry, Sigma Factor/genetics, Algorithms, Bacterial Proteins/chemistry, Biophysics/methods, Bacterial Genes, Bacterial Genome
12.
Proteins ; 77(3): 499-508, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19507241

ABSTRACT

Protein structures change during evolution in response to mutations. Here, we analyze the mapping between sequence and structure in a set of structurally aligned protein domains. To avoid artifacts, we restricted our attention only to the core components of these structures. We found that on average, using different measures of structural change, protein cores evolve linearly with evolutionary distance (amino acid substitutions per site). This is true irrespective of which measure of structural change we used, whether RMSD or discrete structural descriptors for secondary structure, accessibility, or contacts. This linear response allows us to quantify the claim that structure is more conserved than sequence. Using structural alphabets of similar cardinality to the sequence alphabet, structural cores evolve three to ten times more slowly than sequences. Although we observed an average linear response, we found a wide variance. Different domain families varied fivefold in structural response to evolution. An attempt to categorically analyze this variance among subgroups by structural and functional category revealed only one statistically significant trend. This trend can be explained by the fact that beta-sheets change faster than alpha-helices, most likely because they are shorter and because change occurs at the ends of secondary structure elements.


Subject(s)
Computational Biology/methods, Proteins/chemistry, Amino Acids/chemistry, Conserved Sequence, Protein Databases, Molecular Evolution, Molecular Conformation, Mutation, Protein Conformation, Protein Secondary Structure, Protein Tertiary Structure, Proteomics/methods, Regression Analysis, Sequence Alignment
13.
Nucleic Acids Res ; 35(Web Server issue): W350-3, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17591612

ABSTRACT

We previously published an automated statistical classifier of tRNA function called TFAM. Unlike tRNA gene-finders, TFAM uses information from the total sequences of tRNAs and not just their anticodons to predict their function. Therefore TFAM has an advantage in predicting initiator tRNAs, the amino acid charging identity of nonstandard tRNAs such as suppressors, and the former identity of pseudo-tRNAs. In addition, TFAM predictions are robust to sequencing errors and useful for the statistical analysis of tRNA sequence, function and evolution. Earlier versions of TFAM required a complicated installation and running procedure, and only bacterial tRNA identity models were provided. Here we describe a new version of TFAM with both a Web Server interface and simplified standalone installation. New TFAM models are available, including a proteobacterial model for the bacterial lysylated isoleucine tRNAs, making it now possible for TFAM to correctly classify all tRNA genes for some bacterial taxa. First-draft eukaryotic and archaeal models are also provided, making prediction of initiator tRNA genes easily accessible to any researcher or genome sequencing effort. The TFAM Web Server is available at http://tfam.lcb.uu.se.


Subject(s)
Alphaproteobacteria/genetics, Computational Biology/methods, Molecular Evolution, Horizontal Gene Transfer, Statistical Models, Alphaproteobacteria/classification, Alphaproteobacteria/enzymology, Animals, Bacterial DNA/classification, Nucleic Acid Databases, Drosophila melanogaster/genetics, Genetic Variation, Internet, Phylogeny, Transfer RNA/classification, Transfer RNA/genetics, Software, User-Computer Interface
14.
Nucleic Acids Res ; 34(3): 905-16, 2006.
Article in English | MEDLINE | ID: mdl-16473848

ABSTRACT

Sequence logos are stacked bar graphs that generalize the notion of consensus sequence. They employ entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily nor do they display under-represented features. We introduce two extensions to address these needs: function logos and inverse logos. Function logos display subfunctions that are over-represented among sequences carrying a specific feature. Inverse logos generalize both sequence logos and function logos by displaying under-represented, rather than over-represented, features or functions in structural alignments. To make inverse logos, a compositional inverse is applied to the feature or function frequency distributions before logo construction, where a compositional inverse is a mathematical transform that makes common features or functions rare and vice versa. We applied these methods to a database of structurally aligned bacterial tDNAs to create highly condensed, birds-eye views of potentially all so-called identity determinants and antideterminants that confer specific amino acid charging or initiator function on tRNAs in bacteria. We recovered both known and a few potentially novel identity elements. Function logos and inverse logos are useful tools for exploratory bioinformatic analysis of structure-function relationships in sequence families and superfamilies.
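
The entropy statistic behind an ordinary sequence logo can be sketched as follows. This is a minimal illustration of the standard per-column information measure (maximum entropy minus observed entropy, with letter heights proportional to frequency), assuming an RNA alphabet and no small-sample correction. It does not implement the function logos or inverse logos introduced in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGU"):
    """Information content (bits) and letter heights for one alignment column."""
    counts = Counter(symbol for symbol in column if symbol in alphabet)
    total = sum(counts.values())
    if total == 0:
        # Column holds only gaps or unknown symbols.
        return 0.0, {}

    # Shannon entropy of the observed symbol frequencies.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Information is the reduction from the maximum possible entropy.
    information = math.log2(len(alphabet)) - entropy
    # Each letter's height is its frequency times the column information.
    heights = {symbol: (n / total) * information for symbol, n in counts.items()}
    return information, heights


def logo_heights(alignment, alphabet="ACGU"):
    """Apply column_information to every column of equal-length sequences."""
    return [column_information(column, alphabet) for column in zip(*alignment)]
```

A fully conserved column reaches the maximum of 2 bits for a four-letter alphabet, while a column with all four bases at equal frequency carries 0 bits and would be invisible in the logo.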


Subject(s)
Bacterial RNA/classification, Transfer RNA/classification, DNA Sequence Analysis/methods, Base Sequence, Consensus Sequence, Bacterial DNA/chemistry, Bacterial DNA/classification, Statistical Data Interpretation, Entropy, Bacterial RNA/genetics, Transfer RNA/genetics, Sequence Alignment
15.
Nucleic Acids Res ; 34(3): 893-904, 2006.
Article in English | MEDLINE | ID: mdl-16473847

ABSTRACT

We present TFAM, an automated, statistical method to classify the identity of tRNAs. TFAM, currently optimized for bacteria, classifies initiator tRNAs and predicts the charging identity of both typical and atypical tRNAs such as suppressors with high confidence. We show statistical evidence for extensive variation in tRNA identity determinants among bacterial genomes due to variation in overall tDNA base content. With TFAM we have detected the first case of eukaryotic-like tRNA identity rules in bacteria. An alpha-proteobacterial clade encompassing Rhizobiales, Caulobacter crescentus and Silicibacter pomeroyi, unlike a sister clade containing the Rickettsiales, Zymomonas mobilis and Gluconobacter oxydans, uses the eukaryotic identity element A73 instead of the highly conserved prokaryotic element C73. We confirm divergence of bacterial histidylation rules by demonstrating perfect covariation of alpha-proteobacterial tRNA(His) acceptor stems and residues in the motif IIb tRNA-binding pocket of their histidyl-tRNA synthetases (HisRS). Phylogenomic analysis supports lateral transfer of a eukaryotic-like HisRS into the alpha-proteobacteria followed by in situ adaptation of the bacterial tDNA(His) and identity rule divergence. Our results demonstrate that TFAM is an effective tool for the bioinformatics, comparative genomics and evolutionary study of tRNA identity.


Subject(s)
Alphaproteobacteria/genetics , Molecular Evolution , Horizontal Gene Transfer , Histidine-tRNA Ligase/genetics , Statistical Models , Histidine Transfer RNA/genetics , Alphaproteobacteria/classification , Alphaproteobacteria/enzymology , Bacterial DNA/classification , Nucleic Acid Databases , Genetic Variation , Bacterial Genome , Genomics , Histidine-tRNA Ligase/classification , Phylogeny , Transfer RNA/classification , Transfer RNA/genetics , Histidine Transfer RNA/chemistry , Histidine Transfer RNA/classification , Methionine Transfer RNA/classification
16.
J Bacteriol ; 189(24): 8993-9000, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17951392

ABSTRACT

Expression of minigenes encoding tetra- or pentapeptides MXLX or MXLXV (E peptides), where X is a nonpolar amino acid, renders cells erythromycin resistant whereas expression of minigenes encoding tripeptide MXL does not. By using a 3A' reporter gene system beginning with an E-peptide-encoding sequence, we asked whether the codons UGG and GGG, which are known to promote peptidyl-tRNA drop-off at early positions in mRNA, would result in a phenotype of erythromycin resistance if located after this sequence. We find that UGG or GGG, at either position +4 or +5, without a following stop codon, is associated with an erythromycin resistance phenotype upon gene induction. Our results suggest that, while a stop codon at +4 gives a tripeptide product (MIL) and erythromycin sensitivity, UGG or GGG codons at the same position give a tetrapeptide product (MILW or MILG) and phenotype of erythromycin resistance. Thus, the drop-off event on GGG or UGG codons occurs after incorporation of the corresponding amino acid into the growing peptide chain. Drop-off gives rise to a peptidyl-tRNA where the peptide moiety functionally mimics a minigene peptide product of the type previously associated with erythromycin resistance. Several genes in Escherichia coli fulfill the requirements of high mRNA expression and an E-peptide sequence followed by UGG or GGG at position +4 or +5 and should potentially be able to give an erythromycin resistance phenotype.


Subject(s)
Anti-Bacterial Agents/pharmacology , Codon/genetics , Bacterial Drug Resistance , Erythromycin/pharmacology , Escherichia coli/drug effects , Protein Biosynthesis , Aminoacyl Transfer RNA/metabolism , Reporter Genes , Oligopeptides/biosynthesis , Staphylococcal Protein A/biosynthesis , Staphylococcal Protein A/genetics
17.
Biochimie ; 89(10): 1276-88, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17889982

ABSTRACT

There are at least 21 subfunctional classes of tRNAs in most cells that, despite a very highly conserved and compact common structure, must interact specifically with different cliques of proteins or cause grave organismal consequences. Protein recognition of specific tRNA substrates is achieved in part through class-restricted tRNA features called tRNA identity determinants. In earlier work we used TFAM, a statistical classifier of tRNA function, to show evidence of unexpectedly large diversity among bacteria in tRNA identity determinants. We also created a data reduction technique called function logos to visualize identity determinants for a given taxon. Here we show evidence that determinants for lysylated isoleucine tRNAs are not the same in Proteobacteria as in other bacterial groups including the Cyanobacteria. Consistent with this, the lysylating biosynthetic enzyme TilS lacks a C-terminal domain in Cyanobacteria that is present in Proteobacteria. We present here, using function logos, a map estimating all potential identity determinants generally operational in Cyanobacteria and Proteobacteria. To further isolate the differences in potential tRNA identity determinants between Proteobacteria and Cyanobacteria, we created two new data reduction visualizations to contrast sequence and function logos between two taxa. One, called Information Difference logos (ID logos), shows the evolutionary gain or retention of functional information associated with features in one lineage. The other, Kullback-Leibler divergence Difference logos (KLD logos), shows recruitments or shifts in the functional associations of features, especially those informative in both lineages. We used these new logos to specifically isolate and visualize the differences in potential tRNA identity determinants between Proteobacteria and Cyanobacteria. Our graphical results point to numerous differences in potential tRNA identity determinants between these groups.
Although more differences in general are explained by shifts in functional association rather than gains or losses, the apparent identity differences in lysylated isoleucine tRNAs appear to have evolved through both mechanisms.
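
The contrast between information differences and Kullback-Leibler divergences drawn in this abstract can be sketched in Python as follows. This is a simplified illustration rather than the published estimators: it assumes that the information of a feature is the background entropy of the functional classes minus their entropy among sequences carrying the feature, and the class order and all frequencies are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information(posterior, background):
    """Functional information of a feature (assumed form for this sketch)."""
    return entropy(background) - entropy(posterior)


def information_difference(post_a, bg_a, post_b, bg_b):
    """Gain (positive) or loss (negative) of information in taxon A relative to B."""
    return information(post_a, bg_a) - information(post_b, bg_b)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be positive where p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical class frequencies (order: Ile, Lys, Met) among tRNAs that
# carry the same feature in two taxa, with uniform class backgrounds.
background = [1 / 3, 1 / 3, 1 / 3]
taxon_a = [0.90, 0.05, 0.05]  # feature mostly marks the first class
taxon_b = [0.05, 0.90, 0.05]  # equally informative, but marks another class

gain = information_difference(taxon_a, background, taxon_b, background)
shift = kl_divergence(taxon_a, taxon_b)
```

In this example the feature is equally informative in both taxa, so the information difference is essentially zero, while the divergence is large. That is the kind of shift in functional association that an information difference alone would miss.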


Subject(s)
Algorithms , Cyanobacteria/genetics , Proteobacteria/genetics , Transfer RNA/genetics , Aminoacyl-tRNA Synthetases/metabolism , Cyanobacteria/metabolism , Proteobacteria/metabolism , Aminoacyl Transfer RNA/genetics
18.
Genome Biol Evol ; 9(7): 1971-1977, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28810711

ABSTRACT

Candida albicans is the most common cause of life-threatening fungal infections in humans, especially in immunocompromised individuals. Crucial to its success as an opportunistic pathogen is the considerable dynamism of its genome, which readily undergoes genetic changes generating new phenotypes and shaping the evolution of new strains. Candida africana is an intriguing C. albicans biovariant strain that exhibits remarkable genetic and phenotypic differences when compared with standard C. albicans isolates. Candida africana is well-known for its low degree of virulence compared with C. albicans and for its inability to produce chlamydospores that C. albicans, characteristically, produces under certain environmental conditions. Chlamydospores are large, spherical structures, whose biological function is still unknown. For this reason, we have sequenced, assembled, and annotated the whole transcriptomes obtained from an efficient C. albicans chlamydospore-producing clinical strain (GE1), compared with the natural chlamydospore-negative C. africana clinical strain (CBS 11016). The transcriptomes of both C. albicans (GE1) and C. africana (CBS 11016) clinical strains, grown under chlamydospore-inducing conditions, were sequenced and assembled into 7,442 (GE1 strain) and 8,370 (CBS 11016 strain) high quality transcripts, respectively. The release of the first assembly of the C. africana transcriptome will allow future comparative studies to better understand the biology and evolution of this important human fungal pathogen.


Subject(s)
Candida albicans/genetics , High-Throughput Nucleotide Sequencing/methods , RNA Sequence Analysis/methods , Fungal Spores/genetics , Transcriptome , Candida albicans/classification , Fungal Gene Expression Regulation , Species Specificity
19.
J Mol Biol ; 351(1): 9-15, 2005 Aug 05.
Article in English | MEDLINE | ID: mdl-16002088

ABSTRACT

Based on a computational analysis of the 5' regions of tRNA-encoding genes, the average length of the 5' leaders in tRNA precursors in Escherichia coli appears to be 17-18 residues. An in vivo assay based on tRNA nonsense suppression was developed and used to investigate the function of the 5' leader of the tRNA precursors on tRNA processing and bacterial growth. Our data indicate that the 5' leader influences bacterial growth but is surprisingly not absolutely necessary for growth. These findings are consistent with previous in vitro data where it was demonstrated that the 5' leader plays a role in the interaction with RNase P, the endoribonuclease responsible for removing the 5' leader in the cell. We discuss the plausible role of the 5' leader in processing and tRNA gene expression.


Subject(s)
Escherichia coli/growth & development , Escherichia coli/genetics , RNA Precursors , Post-Transcriptional RNA Processing , Transfer RNA/genetics , Base Sequence , Cell Proliferation , Molecular Models , Nucleic Acid Conformation , Phenylalanine Transfer RNA , Ribonuclease P/metabolism
20.
Genetics ; 165(4): 1761-77, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14704164

ABSTRACT

DNA polymerase alpha is the most highly scrambled gene known in stichotrichous ciliates. In its hereditary micronuclear form, it is broken into >40 pieces on two loci at least 3 kb apart. Scrambled genes must be reassembled through developmental DNA rearrangements to yield functioning macronuclear genes, but the mechanism and accuracy of this process are unknown. We describe the first analysis of DNA polymorphism in the macronuclear version of any scrambled gene. Six functional haplotypes obtained from five Eurasian strains of Stylonychia lemnae were highly polymorphic compared to Drosophila genes. Another incompletely unscrambled haplotype was interrupted by frameshift and nonsense mutations but contained more silent mutations than expected by allelic inactivation. In our sample, nucleotide diversity and recombination signals were unexpectedly high within a region encompassing the boundary of the two micronuclear loci. From this and other evidence we infer that both members of a long repeat at the ends of the loci provide alternative substrates for unscrambling in this region. Incongruent genealogies and recombination patterns were also consistent with separation of the two loci by a large genetic distance. Our results suggest that ciliate developmental DNA rearrangements may be more probabilistic and error prone than previously appreciated and constitute a potential source of macronuclear variation. From this perspective we introduce the nonsense-suppression hypothesis for the evolution of ciliate altered genetic codes. We also introduce methods and software to calculate the likelihood of hemizygosity in ciliate haplotype samples and to correct for multiple comparisons in sliding-window analyses of Tajima's D.
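
For readers unfamiliar with the sliding-window statistic named at the end of this abstract, the following Python sketch computes Tajima's D with the standard formula in windows along a set of aligned haplotypes. It does not reproduce the authors' software or their correction for multiple comparisons, which the abstract does not describe. The function names, window settings and example haplotypes are hypothetical, and gap or missing-data characters are simply treated as additional states.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for equal-length aligned haplotypes; None when undefined."""
    n = len(haplotypes)
    if n < 4:  # the variance term vanishes for fewer than four sequences
        return None
    seg_sites = sum(1 for col in zip(*haplotypes) if len(set(col)) > 1)
    if seg_sites == 0:
        return None
    pairs = list(combinations(haplotypes, 2))
    # Mean number of pairwise differences (pi).
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / math.sqrt(variance)


def sliding_tajimas_d(haplotypes, window, step):
    """Return (start, D) for each window of the given width along the alignment."""
    length = len(haplotypes[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [h[start:start + window] for h in haplotypes]
        results.append((start, tajimas_d(chunk)))
    return results


# Hypothetical toy haplotypes: two sites at intermediate frequency, then none.
example = ["AAAA", "AAAA", "TTAA", "TTAA"]
windows = sliding_tajimas_d(example, window=2, step=2)
```

Because overlapping or adjacent windows yield many correlated tests, a real analysis would still need a multiple-comparison correction of the kind the abstract mentions before any single window is interpreted.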


Subject(s)
Ciliophora/genetics , DNA Polymerase I/genetics , Mutation/genetics , Genetic Polymorphism/genetics , Genetic Recombination , Animals , Base Sequence , Ciliophora/enzymology , Computational Biology , Protozoan DNA/genetics , Protozoan DNA/metabolism , Molecular Evolution , Haplotypes/genetics , Molecular Sequence Data , Mosaicism , Nucleic Acid Sequence Homology