1.
Bioinformatics ; 37(20): 3654-3656, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33904572

ABSTRACT

MOTIVATION: Structure-conditioned information statistics have proven useful to predict and visualize tRNA Class-Informative Features (CIFs) and their evolutionary divergences. Although permutation P-values can quantify the significance of CIF divergences between two taxa, their naive Monte Carlo approximation is slow and inaccurate. The Peaks-over-Threshold approach of Knijnenburg et al. (2009) promises improvements to both speed and accuracy of permutation P-values, but has no publicly available API. RESULTS: We present tRNA Structure-Function Mapper (tSFM) v1.0, an open-source, multi-threaded application that efficiently computes, visualizes and assesses significance of single- and paired-site CIFs and their evolutionary divergences for any RNA, protein, gene or genomic element sequence family. Multiple estimators of permutation P-values for CIF evolutionary divergences are provided along with confidence intervals. tSFM is implemented in Python 3 with compiled C extensions and is freely available through GitHub (https://github.com/tlawrence3/tSFM) and PyPI. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available on GitHub at https://github.com/tlawrence3/tSFM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
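
As a minimal sketch of the naive Monte Carlo baseline that the abstract contrasts with the Peaks-over-Threshold approach, the following Python estimates a permutation P-value for a two-sample statistic. This is not tSFM's code or API: the function names, the absolute mean difference statistic, and the (b + 1)/(m + 1) estimator are illustrative assumptions.

```python
import random


def abs_mean_diff(a, b):
    """Illustrative test statistic: absolute difference of sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(x, y, statistic=abs_mean_diff, n_perm=999, seed=0):
    """Naive Monte Carlo estimate of a one-sided permutation P-value.

    Labels are shuffled n_perm times; the estimate is (b + 1) / (n_perm + 1),
    where b counts shuffles whose statistic is at least the observed one.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

With m permutations this estimator cannot report a value below 1/(m + 1), so resolving very small P-values requires very many permutations. That cost is one plausible reading of why the abstract calls the naive approximation slow.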

2.
J Mol Evol ; 89(1-2): 103-116, 2021 02.
Article in English | MEDLINE | ID: mdl-33528599

ABSTRACT

The evolution of tRNA multigene families remains poorly understood and exhibits unusual phenomena such as functional conversions of tRNA genes through anticodon shift substitutions. We improved FlyBase tRNA gene annotations from twelve Drosophila species, incorporating previously identified ortholog sets to compare substitution rates across tRNA bodies at single-site and base-pair resolution. All rapidly evolving sites fell within the same metal ion-binding pocket that lies at the interface of the two major stacked helical domains. We applied our tRNA Structure-Function Mapper (tSFM) method independently to each Drosophila species and one outgroup species, Musca domestica, and found that, although predicted tRNA structure-function maps are generally highly conserved in flies, one tRNA Class-Informative Feature (CIF) within the rapidly evolving ion-binding pocket, Cytosine 17 (C17), which is ancestrally informative for lysylation identity, independently gained asparaginylation identity and substituted in parallel across tRNA(Asn) paralogs at least once, possibly multiple times, during evolution of the genus. In D. melanogaster, most tRNA(Lys) and tRNA(Asn) genes are co-arrayed in one large heterologous gene cluster, suggesting that heterologous gene conversion as well as structural similarities of tRNA-binding interfaces in the closely related asparaginyl-tRNA synthetase (AsnRS) and lysyl-tRNA synthetase (LysRS) proteins may have played a role in these changes. A previously identified Asn-to-Lys anticodon shift substitution in D. ananassae may have arisen to compensate for the convergent and parallel gains of C17 in tRNA(Asn) paralogs in that lineage. Our results underscore the functional and evolutionary relevance of our tRNA structure-function map predictions and illuminate multiple genomic and structural factors contributing to rapid, parallel and compensatory evolution of tRNA multigene families.


Subjects
Drosophila melanogaster; RNA, Transfer; Animals; Anticodon/genetics; Drosophila melanogaster/genetics; Genome, Insect; RNA, Transfer/genetics
3.
BMC Bioinformatics ; 20(1): 434, 2019 Aug 22.
Article in English | MEDLINE | ID: mdl-31438847

ABSTRACT

BACKGROUND: The epidermal growth factor receptor (EGFR) is a major regulator of proliferation in tumor cells. Elevated expression levels of EGFR are associated with prognosis and clinical outcomes of patients in a variety of tumor types. There are at least four splice variants of the mRNA encoding four protein isoforms of EGFR in humans, named I through IV. EGFR isoform I is the full-length protein, whereas isoforms II-IV are shorter protein isoforms. Nevertheless, all EGFR isoforms bind the epidermal growth factor (EGF). Although EGFR is an essential target of long-established and successful tumor therapeutics, the exact function and biomarker potential of alternative EGFR isoforms II-IV are unclear, motivating more in-depth analyses. Hence, we analyzed transcriptome data from glioblastoma cell line SF767 to predict target genes regulated by EGFR isoforms II-IV, but not by EGFR isoform I nor other receptors such as HER2, HER3, or HER4. RESULTS: We analyzed the differential expression of potential target genes in a glioblastoma cell line in two nested RNAi experimental conditions and one negative control, contrasting expression with EGF stimulation against expression without EGF stimulation. In one RNAi experiment, we selectively knocked down EGFR splice variant I, while in the other we knocked down all four EGFR splice variants, so the associated effects of EGFR II-IV knock-down can only be inferred indirectly. For this type of nested experimental design, we developed a two-step bioinformatics approach based on the Bayesian Information Criterion for predicting putative target genes of EGFR isoforms II-IV. Finally, we experimentally validated a set of six putative target genes, and we found that qPCR validations confirmed the predictions in all cases. 
CONCLUSIONS: By performing RNAi experiments for three poorly investigated EGFR isoforms, we were able to successfully predict 1140 putative target genes specifically regulated by EGFR isoforms II-IV using the developed Bayesian Gene Selection Criterion (BGSC) approach. This approach is readily applicable to data from other nested experimental designs. We provide an implementation in R that is easily adaptable to similar data or experimental designs, together with all raw datasets used in this study, in the BGSC repository, https://github.com/GrosseLab/BGSC.
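
The Bayesian Information Criterion underlying the approach can be illustrated with a generic model comparison. The sketch below is not the BGSC procedure and is unrelated to the authors' R implementation. It assumes a Gaussian model with a shared variance, and the function names are hypothetical. It asks only whether a gene's expression values are better described by one common mean or by separate means for two conditions.

```python
import math


def gaussian_bic(groups):
    """BIC of a Gaussian model with one mean per group and a shared variance.

    BIC = k * ln(n) - 2 * ln(L), evaluated at the maximum-likelihood fit.
    Assumes the residual sum of squares is strictly positive.
    """
    n = sum(len(g) for g in groups)
    rss = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        rss += sum((v - mean) ** 2 for v in g)
    variance = rss / n  # maximum-likelihood variance estimate
    log_lik = -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)
    n_params = len(groups) + 1  # one mean per group plus the variance
    return n_params * math.log(n) - 2.0 * log_lik


def prefers_group_effect(control, treated):
    """True if the two-mean model has the lower (better) BIC."""
    shared = gaussian_bic([list(control) + list(treated)])
    separate = gaussian_bic([list(control), list(treated)])
    return separate < shared
```

The extra mean in the two-group model costs ln(n) in the penalty term, so it is preferred only when the gain in fit exceeds that cost.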


Subjects
Alternative Splicing/genetics; Computational Biology/methods; ErbB Receptors/genetics; Glioblastoma/genetics; Bayes Theorem; Cell Line, Tumor; ErbB Receptors/metabolism; Humans; Probability; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA Interference; RNA, Small Interfering/metabolism; Signal Transduction
4.
BMC Evol Biol ; 19(1): 224, 2019 12 09.
Article in English | MEDLINE | ID: mdl-31818253

ABSTRACT

BACKGROUND: Eukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies disagree on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data. RESULTS: Using Posterior Predictive Analysis, we show that recently applied evolutionary models poorly fit three phylogenomic datasets curated from cyanobacteria and plastid genomes because of heterogeneities both in substitution processes across sites and in compositions across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted function-informative features in tRNA gene complements. Classification of cyanobacterial genomes with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-cell, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data. CONCLUSIONS: Phylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneities and compositional heterogeneities across lineages. These aspects of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneities. 
However, the combination of our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies.


Subjects
Cyanobacteria/genetics; Eukaryota/cytology; Eukaryota/genetics; Plastids/genetics; Biological Evolution; Models, Biological; Photosynthesis; Phylogeny; RNA, Transfer; Symbiosis
5.
Theor Popul Biol ; 129: 68-80, 2019 10.
Article in English | MEDLINE | ID: mdl-31042487

ABSTRACT

Advances in structural biology of aminoacyl-tRNA synthetases (aaRSs) have revealed incredible diversity in how aaRSs bind their tRNA substrates. The causes of this diversity remain mysterious. We developed a new class of highly rugged fitness landscape models called match landscapes, through which genes encode the assortative interactions of their gene products through the complementarity and identifiability of their structural features. We used results from coding theory to prove bounds and equalities on fitness in match landscapes assuming additive interaction energies, macroscopic aminoacylation kinetics including proofreading, site-specific modifiers of interaction, and selection for translational accuracy in multiple, perfectly encoded site-types. Using genotypes based on extended Hamming codes we show that over a wide array of interface sizes and numbers of encoded cognate pairs, selection for translational accuracy alone is insufficient to displace the tRNA-binding interfaces of aaRSs. Yet, under combined selection for translational accuracy and rate, site-specific modifiers are selected to adaptively displace the tRNA-binding interfaces of non-cognate aaRS-tRNA pairs. We describe a remarkable correspondence between the lengths of perfect RNA (quaternary) codes and the modal sizes of small non-coding RNA families.
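
For orientation on the code lengths mentioned in the last sentence, the lengths of perfect q-ary Hamming codes follow n = (q^r - 1)/(q - 1), which for a quaternary (RNA) alphabet gives 1, 5, 21, 85, 341, and so on. The sketch below computes these lengths and checks perfection through the sphere-packing equality. It is a generic coding-theory illustration with hypothetical function names, not the match-landscape model of the paper.

```python
from math import comb


def hamming_length(q, r):
    """Length n of the q-ary Hamming code with r check symbols."""
    return (q ** r - 1) // (q - 1)


def is_perfect(q, n, k, t=1):
    """True if a q-ary code of length n and dimension k that corrects
    t errors meets the sphere-packing bound with equality."""
    ball = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return q ** k * ball == q ** n


# Quaternary alphabet (q = 4), as for RNA sequences.
quaternary_lengths = [hamming_length(4, r) for r in range(1, 6)]
```

Extended Hamming codes, which the abstract uses for genotypes, add one overall parity symbol and are no longer perfect. For example, `is_perfect(2, 8, 4)` is false for the binary [8, 4] extended code.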


Subjects
Amino Acyl-tRNA Synthetases/genetics; Genetic Fitness/genetics; RNA, Transfer/genetics; Humans; Metagenomics; Models, Genetic; Models, Statistical
6.
BMC Genomics ; 17(1): 1003, 2016 12 08.
Article in English | MEDLINE | ID: mdl-27927177

ABSTRACT

BACKGROUND: While the CCA sequence at the mature 3' end of tRNAs is conserved and critical for translational function, a genetic template for this sequence is not always contained in tRNA genes. In eukaryotes and Archaea, the CCA ends of tRNAs are synthesized post-transcriptionally by CCA-adding enzymes. In Bacteria, tRNA genes template CCA sporadically. RESULTS: In order to understand the variation in how prokaryotic tRNA genes template CCA, we re-annotated tRNA genes in tRNAdb-CE database version 0.8. Among 132,129 prokaryotic tRNA genes, initiator tRNA genes template CCA at the highest average frequency (74.1%) over all functional classes except selenocysteine and pyrrolysine tRNA genes (88.1% and 100% respectively). Across bacterial phyla and a wide range of genome sizes, many lineages exist in which predominantly initiator tRNA genes template CCA. Convergent and parallel retention of CCA templating in initiator tRNA genes evolved in independent histories of reductive genome evolution in Bacteria. Also, in a majority of cyanobacterial and actinobacterial genera, predominantly initiator tRNA genes template CCA. We also found that a surprising fraction of archaeal tRNA genes template CCA. CONCLUSIONS: We suggest that cotranscriptional synthesis of initiator tRNA CCA 3' ends can complement inefficient processing of initiator tRNA precursors, "bootstrap" rapid initiation of protein synthesis from a non-growing state, or contribute to an increase in cellular growth rates by reducing overheads of mass and energy to maintain nonfunctional tRNA precursor pools. More generally, CCA templating in structurally non-conforming tRNA genes can afford cells robustness and greater plasticity to respond rapidly to environmental changes and stimuli.


Subjects
Bacteria/genetics; RNA Precursors/metabolism; Anticodon; Archaea/genetics; Base Pairing; Base Sequence; Databases, Genetic; Genes, Archaeal; Genes, Bacterial; RNA Precursors/chemistry; RNA, Transfer, Met/chemistry; RNA, Transfer, Met/metabolism
7.
PLoS Comput Biol ; 10(2): e1003454, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24586126

ABSTRACT

Molecular phylogenetics and phylogenomics are subject to noise from horizontal gene transfer (HGT) and bias from convergence in macromolecular compositions. Extensive variation in size, structure and base composition of alphaproteobacterial genomes has complicated their phylogenomics, sparking controversy over the origins and closest relatives of the SAR11 strains. SAR11 are highly abundant, cosmopolitan aquatic Alphaproteobacteria with streamlined, A+T-biased genomes. A dominant view holds that SAR11 are monophyletic and related to both Rickettsiales and the ancestor of mitochondria. Other studies dispute this, finding evidence of a polyphyletic origin of SAR11 with most strains distantly related to Rickettsiales. Although careful evolutionary modeling can reduce bias and noise in phylogenomic inference, entirely different approaches may be useful to extract robust phylogenetic signals from genomes. Here we develop simple phyloclassifiers from bioinformatically derived tRNA Class-Informative Features (CIFs), features predicted to target tRNAs for specific interactions within the tRNA interaction network. Our tRNA CIF-based model robustly and accurately classifies alphaproteobacterial genomes into one of seven undisputed monophyletic orders or families, despite great variability in tRNA gene complement sizes and base compositions. Our model robustly rejects monophyly of SAR11, classifying all but one strain as Rhizobiales with strong statistical support. Yet remarkably, conventional phylogenetic analysis of tRNAs classifies all SAR11 strains identically as Rickettsiales. We attribute this discrepancy to convergence of SAR11 and Rickettsiales tRNA base compositions. Thus, tRNA CIFs appear more robust to compositional convergence than tRNA sequences generally. Our results suggest that tRNA-CIF-based phyloclassification is robust to HGT of components of the tRNA interaction network, such as aminoacyl-tRNA synthetases. 
We explain why tRNAs are especially advantageous for prediction of traits governing macromolecular interactions from genomic data, and why such traits may be advantageous in the search for robust signals to address difficult problems in classification and phylogeny.


Subjects
Alphaproteobacteria/classification; Alphaproteobacteria/genetics; RNA, Bacterial/genetics; RNA, Transfer/genetics; Bacterial Proteins/genetics; Computational Biology; Evolution, Molecular; Gene Regulatory Networks; Gene Transfer, Horizontal; Genome, Bacterial; Models, Genetic; Phylogeny; Rhodospirillales/classification; Rhodospirillales/genetics
8.
Curr Protoc ; 4(5): e1046, 2024 May.
Article in English | MEDLINE | ID: mdl-38717471

ABSTRACT

Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants from short-read sequencing data aligned to a reference genome, including single nucleotide polymorphisms (SNPs) and structural variations (SVs). We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms without benchmarked variants, which are traditionally used for distinguishing sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP-SVant workflow is flexible, with user options to trade off accuracy against sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK) and predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and it culminates in variant annotation using custom scripts. A key utility of SNP-SVant is its scalability. Variant calling is a computationally expensive procedure, and thus, SNP-SVant uses a workflow management system with intermediary checkpoint steps to ensure efficient use of resources by minimizing redundant computations and omitting steps where dependent files are available. SNP-SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA format outputs to ensure compatibility with downstream tools to calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide-ranging view of genomic alterations in an organism of interest. Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Predicting single nucleotide polymorphisms and structural variations
Support Protocol 1: Downloading publicly available sequencing data
Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer
Support Protocol 3: Converting between VCF and aligned FASTA formats


Subjects
Polymorphism, Single Nucleotide; Software; Workflow; Polymorphism, Single Nucleotide/genetics; Computational Biology/methods; Genomics/methods; Molecular Sequence Annotation/methods; Whole Genome Sequencing/methods
9.
Nature ; 450(7167): 203-18, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17994087

ABSTRACT

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.


Subjects
Drosophila/classification; Drosophila/genetics; Evolution, Molecular; Genes, Insect/genetics; Genome, Insect/genetics; Genomics; Phylogeny; Animals; Codon/genetics; DNA Transposable Elements/genetics; Drosophila/immunology; Drosophila/metabolism; Drosophila Proteins/genetics; Gene Order/genetics; Genome, Mitochondrial/genetics; Immunity/genetics; Multigene Family/genetics; RNA, Untranslated/genetics; Reproduction/genetics; Sequence Alignment; Sequence Analysis, DNA; Synteny/genetics
10.
PLoS Negl Trop Dis ; 14(2): e0007983, 2020 02.
Article in English | MEDLINE | ID: mdl-32106219

ABSTRACT

The development of chemotherapies against eukaryotic pathogens is especially challenging because of both the evolutionary conservation of drug targets between host and parasite, and the evolution of strain-dependent drug resistance. There is a strong need for new nontoxic drugs with broad-spectrum activity against trypanosome parasites such as Leishmania and Trypanosoma. A relatively untested approach is to target macromolecular interactions in parasites rather than small molecular interactions, under the hypothesis that the features specifying macromolecular interactions diverge more rapidly through coevolution. We computed tRNA Class-Informative Features in humans and independently in eight distinct clades of trypanosomes, identifying parasite-specific informative features, including base pairs and base mis-pairs, that are broadly conserved over approximately 250 million years of trypanosome evolution. Validating these observations, we demonstrated biochemically that tRNA:aminoacyl-tRNA synthetase (aaRS) interactions are a promising target for anti-trypanosomal drug discovery. From a marine natural products extract library, we identified several fractions with inhibitory activity toward Leishmania major alanyl-tRNA synthetase (AlaRS) but no activity against the human homolog. These marine natural products extracts showed cross-reactivity towards Trypanosoma cruzi AlaRS indicating the broad-spectrum potential of our network predictions. We also identified Leishmania major threonyl-tRNA synthetase (ThrRS) inhibitors from the same library. We discuss why chemotherapies targeting multiple aaRSs should be less prone to the evolution of resistance than monotherapeutic or synergistic combination chemotherapies targeting only one aaRS.


Subjects
Alanine-tRNA Ligase/antagonists & inhibitors; Antiprotozoal Agents/pharmacology; Enzyme Inhibitors/pharmacology; Leishmania/enzymology; Protozoan Proteins/antagonists & inhibitors; Threonine-tRNA Ligase/antagonists & inhibitors; Trypanosoma/drug effects; Alanine-tRNA Ligase/genetics; Alanine-tRNA Ligase/metabolism; Antiprotozoal Agents/chemistry; Enzyme Inhibitors/chemistry; Humans; Leishmania/drug effects; Leishmania/genetics; Leishmaniasis/parasitology; Protozoan Proteins/genetics; Protozoan Proteins/metabolism; Threonine-tRNA Ligase/genetics; Threonine-tRNA Ligase/metabolism; Trypanosoma/enzymology; Trypanosoma/genetics; Trypanosomiasis/parasitology
11.
BMC Bioinformatics ; 10: 271, 2009 Aug 28.
Article in English | MEDLINE | ID: mdl-19715597

ABSTRACT

BACKGROUND: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase sigma-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. RESULTS: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis sigma66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase sigma66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability. 
CONCLUSION: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase sigma-factor/DNA binding collaboratively, contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis sigma66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes.


Subjects
Bacterial Proteins/genetics; Chlamydia trachomatis/genetics; Computational Biology/methods; Markov Chains; Promoter Regions, Genetic; Sigma Factor/chemistry; Sigma Factor/genetics; Algorithms; Bacterial Proteins/chemistry; Biophysics/methods; Genes, Bacterial; Genome, Bacterial
12.
Proteins ; 77(3): 499-508, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19507241

ABSTRACT

Protein structures change during evolution in response to mutations. Here, we analyze the mapping between sequence and structure in a set of structurally aligned protein domains. To avoid artifacts, we restricted our attention only to the core components of these structures. We found that on average, using different measures of structural change, protein cores evolve linearly with evolutionary distance (amino acid substitutions per site). This is true irrespective of which measure of structural change we used, whether RMSD or discrete structural descriptors for secondary structure, accessibility, or contacts. This linear response allows us to quantify the claim that structure is more conserved than sequence. Using structural alphabets of similar cardinality to the sequence alphabet, structural cores evolve three to ten times more slowly than sequences. Although we observed an average linear response, we found a wide variance. Different domain families varied fivefold in structural response to evolution. An attempt to categorically analyze this variance among subgroups by structural and functional category revealed only one statistically significant trend. This trend can be explained by the fact that beta-sheets change faster than alpha-helices, most likely because they are shorter and because change occurs at the ends of secondary structure elements.


Subjects
Computational Biology/methods; Proteins/chemistry; Amino Acids/chemistry; Conserved Sequence; Databases, Protein; Evolution, Molecular; Molecular Conformation; Mutation; Protein Conformation; Protein Structure, Secondary; Protein Structure, Tertiary; Proteomics/methods; Regression Analysis; Sequence Alignment
13.
Nucleic Acids Res ; 35(Web Server issue): W350-3, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17591612

ABSTRACT

We previously published an automated statistical classifier of tRNA function called TFAM. Unlike tRNA gene-finders, TFAM uses information from the total sequences of tRNAs and not just their anticodons to predict their function. Therefore TFAM has an advantage in predicting initiator tRNAs, the amino acid charging identity of nonstandard tRNAs such as suppressors, and the former identity of pseudo-tRNAs. In addition, TFAM predictions are robust to sequencing errors and useful for the statistical analysis of tRNA sequence, function and evolution. Earlier versions of TFAM required a complicated installation and running procedure, and only bacterial tRNA identity models were provided. Here we describe a new version of TFAM with both a Web Server interface and simplified standalone installation. New TFAM models are available, including a proteobacterial model for the bacterial lysylated isoleucine tRNAs, making it now possible for TFAM to correctly classify all tRNA genes for some bacterial taxa. First-draft eukaryotic and archaeal models are also provided, making prediction of initiator tRNA genes easily accessible to any researcher or genome sequencing effort. The TFAM Web Server is available at http://tfam.lcb.uu.se.


Subjects
Alphaproteobacteria/genetics; Computational Biology/methods; Evolution, Molecular; Gene Transfer, Horizontal; Models, Statistical; Alphaproteobacteria/classification; Alphaproteobacteria/enzymology; Animals; DNA, Bacterial/classification; Databases, Nucleic Acid; Drosophila melanogaster/genetics; Genetic Variation; Internet; Phylogeny; RNA, Transfer/classification; RNA, Transfer/genetics; Software; User-Computer Interface
14.
Nucleic Acids Res ; 34(3): 905-16, 2006.
Article in English | MEDLINE | ID: mdl-16473848

ABSTRACT

Sequence logos are stacked bar graphs that generalize the notion of consensus sequence. They employ entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily nor do they display under-represented features. We introduce two extensions to address these needs: function logos and inverse logos. Function logos display subfunctions that are over-represented among sequences carrying a specific feature. Inverse logos generalize both sequence logos and function logos by displaying under-represented, rather than over-represented, features or functions in structural alignments. To make inverse logos, a compositional inverse is applied to the feature or function frequency distributions before logo construction, where a compositional inverse is a mathematical transform that makes common features or functions rare and vice versa. We applied these methods to a database of structurally aligned bacterial tDNAs to create highly condensed, birds-eye views of potentially all so-called identity determinants and antideterminants that confer specific amino acid charging or initiator function on tRNAs in bacteria. We recovered both known and a few potentially novel identity elements. Function logos and inverse logos are useful tools for exploratory bioinformatic analysis of structure-function relationships in sequence families and superfamilies.
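
The entropy statistic behind a conventional sequence logo can be sketched for a single alignment column. The code below is a simplified illustration, not the authors' function-logo or inverse-logo software. It assumes a four-letter nucleotide alphabet, ignores gaps and other symbols, requires at least one alphabet symbol in the column, and omits the small-sample correction that logo programs commonly apply. The function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content in bits: log2(|alphabet|) minus Shannon entropy."""
    counts = Counter(column)
    total = sum(counts[s] for s in alphabet)  # symbols outside alphabet ignored
    entropy = 0.0
    for s in alphabet:
        if counts[s]:
            f = counts[s] / total
            entropy -= f * math.log2(f)
    return math.log2(len(alphabet)) - entropy


def letter_heights(column, alphabet="ACGT"):
    """Height of each letter in the stack: frequency times column information."""
    info = column_information(column, alphabet)
    counts = Counter(column)
    total = sum(counts[s] for s in alphabet)
    return {s: info * counts[s] / total for s in alphabet if counts[s]}
```

A fully conserved column such as `"AAAA"` reaches the 2-bit maximum, while a uniform column such as `"ACGT"` carries no information and would be drawn with zero height.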


Subjects
RNA, Bacterial/classification; RNA, Transfer/classification; Sequence Analysis, DNA/methods; Base Sequence; Consensus Sequence; DNA, Bacterial/chemistry; DNA, Bacterial/classification; Data Interpretation, Statistical; Entropy; RNA, Bacterial/genetics; RNA, Transfer/genetics; Sequence Alignment
15.
Nucleic Acids Res ; 34(3): 893-904, 2006.
Article in English | MEDLINE | ID: mdl-16473847

ABSTRACT

We present TFAM, an automated, statistical method to classify the identity of tRNAs. TFAM, currently optimized for bacteria, classifies initiator tRNAs and predicts the charging identity of both typical and atypical tRNAs such as suppressors with high confidence. We show statistical evidence for extensive variation in tRNA identity determinants among bacterial genomes due to variation in overall tDNA base content. With TFAM we have detected the first case of eukaryotic-like tRNA identity rules in bacteria. An alpha-proteobacterial clade encompassing Rhizobiales, Caulobacter crescentus and Silicibacter pomeroyi, unlike a sister clade containing the Rickettsiales, Zymomonas mobilis and Gluconobacter oxydans, uses the eukaryotic identity element A73 instead of the highly conserved prokaryotic element C73. We confirm divergence of bacterial histidylation rules by demonstrating perfect covariation of alpha-proteobacterial tRNA(His) acceptor stems and residues in the motif IIb tRNA-binding pocket of their histidyl-tRNA synthetases (HisRS). Phylogenomic analysis supports lateral transfer of a eukaryotic-like HisRS into the alpha-proteobacteria followed by in situ adaptation of the bacterial tDNA(His) and identity rule divergence. Our results demonstrate that TFAM is an effective tool for the bioinformatics, comparative genomics and evolutionary study of tRNA identity.


Subjects
Alphaproteobacteria/genetics, Molecular Evolution, Horizontal Gene Transfer, Histidine-tRNA Ligase/genetics, Statistical Models, Histidine Transfer RNA/genetics, Alphaproteobacteria/classification, Alphaproteobacteria/enzymology, Bacterial DNA/classification, Nucleic Acid Databases, Genetic Variation, Bacterial Genome, Genomics, Histidine-tRNA Ligase/classification, Phylogeny, Transfer RNA/classification, Transfer RNA/genetics, Histidine Transfer RNA/chemistry, Histidine Transfer RNA/classification, Methionine Transfer RNA/classification
16.
J Bacteriol ; 189(24): 8993-9000, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17951392

ABSTRACT

Expression of minigenes encoding tetra- or pentapeptides MXLX or MXLXV (E peptides), where X is a nonpolar amino acid, renders cells erythromycin resistant whereas expression of minigenes encoding tripeptide MXL does not. By using a 3A' reporter gene system beginning with an E-peptide-encoding sequence, we asked whether the codons UGG and GGG, which are known to promote peptidyl-tRNA drop-off at early positions in mRNA, would result in a phenotype of erythromycin resistance if located after this sequence. We find that UGG or GGG, at either position +4 or +5, without a following stop codon, is associated with an erythromycin resistance phenotype upon gene induction. Our results suggest that, while a stop codon at +4 gives a tripeptide product (MIL) and erythromycin sensitivity, UGG or GGG codons at the same position give a tetrapeptide product (MILW or MILG) and phenotype of erythromycin resistance. Thus, the drop-off event on GGG or UGG codons occurs after incorporation of the corresponding amino acid into the growing peptide chain. Drop-off gives rise to a peptidyl-tRNA where the peptide moiety functionally mimics a minigene peptide product of the type previously associated with erythromycin resistance. Several genes in Escherichia coli fulfill the requirements of high mRNA expression and an E-peptide sequence followed by UGG or GGG at position +4 or +5 and should potentially be able to give an erythromycin resistance phenotype.


Subjects
Anti-Bacterial Agents/pharmacology, Codon/genetics, Bacterial Drug Resistance, Erythromycin/pharmacology, Escherichia coli/drug effects, Protein Biosynthesis, Aminoacyl Transfer RNA/metabolism, Reporter Genes, Oligopeptides/biosynthesis, Staphylococcal Protein A/biosynthesis, Staphylococcal Protein A/genetics
17.
Biochimie ; 89(10): 1276-88, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17889982

ABSTRACT

There are at least 21 subfunctional classes of tRNAs in most cells that, despite a very highly conserved and compact common structure, must interact specifically with different cliques of proteins or cause grave organismal consequences. Protein recognition of specific tRNA substrates is achieved in part through class-restricted tRNA features called tRNA identity determinants. In earlier work we used TFAM, a statistical classifier of tRNA function, to show evidence of unexpectedly large diversity among bacteria in tRNA identity determinants. We also created a data reduction technique called function logos to visualize identity determinants for a given taxon. Here we show evidence that determinants for lysylated isoleucine tRNAs are not the same in Proteobacteria as in other bacterial groups including the Cyanobacteria. Consistent with this, the lysylating biosynthetic enzyme TilS lacks a C-terminal domain in Cyanobacteria that is present in Proteobacteria. We present here, using function logos, a map estimating all potential identity determinants generally operational in Cyanobacteria and Proteobacteria. To further isolate the differences in potential tRNA identity determinants between Proteobacteria and Cyanobacteria, we created two new data reduction visualizations to contrast sequence and function logos between two taxa. One, called Information Difference logos (ID logos), shows the evolutionary gain or retention of functional information associated to features in one lineage. The other, Kullback-Leibler divergence Difference logos (KLD logos), shows recruitments or shifts in the functional associations of features, especially those informative in both lineages. We used these new logos to specifically isolate and visualize the differences in potential tRNA identity determinants between Proteobacteria and Cyanobacteria. Our graphical results point to numerous differences in potential tRNA identity determinants between these groups. Although more differences in general are explained by shifts in functional association rather than gains or losses, the apparent identity differences in lysylated isoleucine tRNAs appear to have evolved through both mechanisms.
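The idea of a shift in functional association can be illustrated with a Kullback-Leibler divergence between the functional-class distributions of one feature in two taxa. The sketch below is an assumption-laden illustration rather than the authors' KLD logo definition: the function names, the pseudocount and the class counts are all hypothetical and are not data from the study.

```python
import math


def normalize(counts, classes, pseudocount=0.5):
    """Turn class counts into a smoothed probability distribution.

    The pseudocount (an assumption) keeps every class probability above zero.
    """
    smoothed = {c: counts.get(c, 0) + pseudocount for c in classes}
    total = sum(smoothed.values())
    return {c: v / total for c, v in smoothed.items()}


def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits over shared classes."""
    return sum(p[c] * math.log2(p[c] / q[c]) for c in p if p[c] > 0)


# Hypothetical counts of tRNA genes carrying one feature, by functional class.
classes = ["Lys", "Asn", "Ile"]
taxon_a = normalize({"Lys": 18, "Asn": 1, "Ile": 1}, classes)
taxon_b = normalize({"Lys": 2, "Asn": 15, "Ile": 3}, classes)

# A large value suggests the feature is associated with different classes
# in the two taxa; a value near zero suggests the association is shared.
shift = kl_divergence_bits(taxon_a, taxon_b)
```

The divergence is asymmetric, so `kl_divergence_bits(taxon_a, taxon_b)` and `kl_divergence_bits(taxon_b, taxon_a)` generally differ; which taxon serves as the reference is a design choice.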


Subjects
Algorithms, Cyanobacteria/genetics, Proteobacteria/genetics, Transfer RNA/genetics, Amino Acyl-tRNA Synthetases/metabolism, Cyanobacteria/metabolism, Proteobacteria/metabolism, Aminoacyl Transfer RNA/genetics
18.
Genome Biol Evol ; 9(7): 1971-1977, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28810711

ABSTRACT

Candida albicans is the most common cause of life-threatening fungal infections in humans, especially in immunocompromised individuals. Crucial to its success as an opportunistic pathogen is the considerable dynamism of its genome, which readily undergoes genetic changes generating new phenotypes and shaping the evolution of new strains. Candida africana is an intriguing C. albicans biovariant strain that exhibits remarkable genetic and phenotypic differences when compared with standard C. albicans isolates. Candida africana is well-known for its low degree of virulence compared with C. albicans and for its inability to produce chlamydospores that C. albicans, characteristically, produces under certain environmental conditions. Chlamydospores are large, spherical structures, whose biological function is still unknown. For this reason, we have sequenced, assembled, and annotated the whole transcriptomes obtained from an efficient C. albicans chlamydospore-producing clinical strain (GE1), compared with the natural chlamydospore-negative C. africana clinical strain (CBS 11016). The transcriptomes of both C. albicans (GE1) and C. africana (CBS 11016) clinical strains, grown under chlamydospore-inducing conditions, were sequenced and assembled into 7,442 (GE1 strain) and 8,370 (CBS 11016 strain) high quality transcripts, respectively. The release of the first assembly of the C. africana transcriptome will allow future comparative studies to better understand the biology and evolution of this important human fungal pathogen.


Subjects
Candida albicans/genetics, High-Throughput Nucleotide Sequencing/methods, RNA Sequence Analysis/methods, Fungal Spores/genetics, Transcriptome, Candida albicans/classification, Fungal Gene Expression Regulation, Species Specificity
19.
J Mol Biol ; 351(1): 9-15, 2005 Aug 05.
Article in English | MEDLINE | ID: mdl-16002088

ABSTRACT

Based on a computational analysis of the 5' regions of tRNA-encoding genes, the average length of the 5' leaders in tRNA precursors in Escherichia coli appears to be 17-18 residues. An in vivo assay based on tRNA nonsense suppression was developed and used to investigate the function of the 5' leader of the tRNA precursors on tRNA processing and bacterial growth. Our data indicate that the 5' leader influences bacterial growth but is surprisingly not absolutely necessary for growth. These findings are consistent with previous in vitro data where it was demonstrated that the 5' leader plays a role in the interaction with RNase P, the endoribonuclease responsible for removing the 5' leader in the cell. We discuss the plausible role of the 5' leader in processing and tRNA gene expression.


Subjects
Escherichia coli/growth & development, Escherichia coli/genetics, RNA Precursors, Post-Transcriptional RNA Processing, Transfer RNA/genetics, Base Sequence, Cell Proliferation, Molecular Models, Nucleic Acid Conformation, Phenylalanine Transfer RNA, Ribonuclease P/metabolism
20.
Genetics ; 165(4): 1761-77, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14704164

ABSTRACT

DNA polymerase alpha is the most highly scrambled gene known in stichotrichous ciliates. In its hereditary micronuclear form, it is broken into >40 pieces on two loci at least 3 kb apart. Scrambled genes must be reassembled through developmental DNA rearrangements to yield functioning macronuclear genes, but the mechanism and accuracy of this process are unknown. We describe the first analysis of DNA polymorphism in the macronuclear version of any scrambled gene. Six functional haplotypes obtained from five Eurasian strains of Stylonychia lemnae were highly polymorphic compared to Drosophila genes. Another incompletely unscrambled haplotype was interrupted by frameshift and nonsense mutations but contained more silent mutations than expected by allelic inactivation. In our sample, nucleotide diversity and recombination signals were unexpectedly high within a region encompassing the boundary of the two micronuclear loci. From this and other evidence we infer that both members of a long repeat at the ends of the loci provide alternative substrates for unscrambling in this region. Incongruent genealogies and recombination patterns were also consistent with separation of the two loci by a large genetic distance. Our results suggest that ciliate developmental DNA rearrangements may be more probabilistic and error prone than previously appreciated and constitute a potential source of macronuclear variation. From this perspective we introduce the nonsense-suppression hypothesis for the evolution of ciliate altered genetic codes. We also introduce methods and software to calculate the likelihood of hemizygosity in ciliate haplotype samples and to correct for multiple comparisons in sliding-window analyses of Tajima's D.


Subjects
Ciliophora/genetics, DNA Polymerase I/genetics, Mutation/genetics, Genetic Polymorphism/genetics, Genetic Recombination, Animals, Base Sequence, Ciliophora/enzymology, Computational Biology, Protozoan DNA/genetics, Protozoan DNA/metabolism, Molecular Evolution, Haplotypes/genetics, Molecular Sequence Data, Mosaicism, Nucleic Acid Sequence Homology