Results 1 - 8 of 8
1.
BMC Evol Biol ; 14: 207, 2014 Oct 09.
Article in English | MEDLINE | ID: mdl-25293379

ABSTRACT

BACKGROUND: Phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. RESULTS: A total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. CONCLUSIONS: Our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
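
The parsimony score and retention index mentioned in the abstract can be illustrated for a single pathway treated as a binary presence/absence character. The following is a minimal sketch, assuming a rooted binary tree written as nested tuples and the standard Fitch algorithm with Farris's retention index; the abstract does not say which implementation the authors used, and the function names and the toy tree are hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of state changes (Fitch parsimony) for one character.

    tree:   nested 2-tuples with leaf names (str) at the tips (assumed format).
    states: dict mapping leaf name -> 0 or 1 (pathway absent or present).
    """
    def walk(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = walk(node[0])
        right_set, right_cost = walk(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: one change is required on this subtree.
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]


def retention_index(tree, states):
    """Farris retention index (g - s) / (g - m) for a binary character."""
    values = list(states.values())
    ones = sum(values)
    zeros = len(values) - ones
    g = min(ones, zeros)      # most steps the character could need on any tree
    m = 1 if g > 0 else 0     # fewest steps possible on any tree
    if g == m:
        return None           # index is undefined (uninformative character)
    s = fitch_score(tree, states)
    return (g - s) / (g - m)


# Hypothetical eight-genome tree and two toy pathway distributions.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
clade_specific = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "G": 0, "H": 0}
scattered = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0, "G": 1, "H": 0}
```

In this toy case the clade-specific pathway needs a single change and retains fully on the tree, while the scattered pathway needs four changes and has a retention index of zero, which mirrors the contrast between low and high parsimony scores drawn in the conclusions.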


Subjects
Archaea/genetics , Bacteria/genetics , Molecular Sequence Annotation/methods , Databases, Protein , Genome, Archaeal , Genome, Bacterial , Phylogeny
2.
Anal Chem ; 86(19): 9496-503, 2014 Oct 07.
Article in English | MEDLINE | ID: mdl-25157598

ABSTRACT

A database searching approach can be used for metabolite identification in metabolomics by matching measured tandem mass spectra (MS/MS) against the predicted fragments of metabolites in a database. Here, we present the open-source MIDAS algorithm (Metabolite Identification via Database Searching). To evaluate a metabolite-spectrum match (MSM), MIDAS first enumerates possible fragments from a metabolite by systematic bond dissociation, then calculates the plausibility of the fragments based on their fragmentation pathways, and finally scores the MSM to assess how well the experimental MS/MS spectrum from collision-induced dissociation (CID) is explained by the metabolite's predicted CID MS/MS spectrum. MIDAS was designed to search high-resolution tandem mass spectra acquired on time-of-flight or Orbitrap mass spectrometers against a metabolite database in an automated and high-throughput manner. The accuracy of metabolite identification by MIDAS was benchmarked using four sets of standard tandem mass spectra from MassBank. On average, for 77% of original spectra and 84% of composite spectra, MIDAS correctly ranked the true compounds as the first MSMs out of all MetaCyc metabolites as decoys. MIDAS correctly identified 46% more original spectra and 59% more composite spectra at the first MSMs than an existing database-searching algorithm, MetFrag. MIDAS was showcased by searching a published real-world measurement of a metabolome from Synechococcus sp. PCC 7002 against the MetaCyc metabolite database. MIDAS identified many metabolites missed in the previous study. MIDAS identifications should be considered only as candidate metabolites, which need to be confirmed using standard compounds. To facilitate manual validation, MIDAS provides annotated spectra for MSMs and labels observed mass spectral peaks with predicted fragments. The database searching and manual validation can be performed online at http://midas.omicsbio.org.


Subjects
Algorithms , Metabolome , Metabolomics/statistics & numerical data , Models, Statistical , Benchmarking , Databases, Factual , Metabolomics/methods , Research Design , Synechococcus/chemistry , Synechococcus/metabolism , Tandem Mass Spectrometry
3.
Funct Integr Genomics ; 10(1): 97-110, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19802638

ABSTRACT

Bacteria of the genus Shewanella can thrive in different environments and demonstrate significant variability in their metabolic and ecophysiological capabilities, including cold and salt tolerance. Genomic characteristics underlying this variability across species are largely unknown. In this study, we address the problem by a comparison of the physiological, metabolic, and genomic characteristics of 19 sequenced Shewanella species. We have employed two novel approaches based on association of a phenotypic trait with the number of the trait-specific protein families (Pfam domains) and on the conservation of synteny (order in the genome) of the trait-related genes. Our first approach is top-down and involves experimental evaluation and quantification of the species' cold tolerance followed by identification of the correlated Pfam domains and genes with a conserved synteny. The second, a bottom-up approach, predicts novel phenotypes of the species by calculating profiles of each Pfam domain among their genomes, followed by pairwise correlation of the profiles and their network clustering. Using the first approach, we found a link between cold and salt tolerance of the species and the presence in the genome of a Na(+)/H(+) antiporter gene cluster. Other cold-tolerance-related genes include peptidases, chemotaxis sensory transducer proteins, a cysteine exporter, and helicases. Using the bottom-up approach, we found several novel phenotypes in the newly sequenced Shewanella species, including degradation of aromatic compounds by an aerobic hybrid pathway in Shewanella woodyi, degradation of ethanolamine by Shewanella benthica, and propanediol degradation by Shewanella putrefaciens CN32 and Shewanella sp. W3-18-1.
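
The top-down step, associating a measured trait with per-genome Pfam domain counts, can be sketched as a simple ranking by correlation. This is only an illustration under stated assumptions: the abstract does not name the statistic used, so Pearson correlation is assumed here, and the function names, domain labels, and numbers are hypothetical placeholders rather than data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no association signal
    return cov / sqrt(var_x * var_y)


def rank_domains(trait, domain_counts):
    """Rank domains by absolute correlation between copy number and trait.

    trait:         one trait value per species (same species order throughout).
    domain_counts: dict mapping domain label -> list of per-species counts.
    """
    scores = {name: pearson(trait, counts) for name, counts in domain_counts.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical trait values for four species and three placeholder domains.
trait = [1.0, 2.0, 3.0, 4.0]
domain_counts = {
    "domain_A": [1, 2, 3, 4],
    "domain_B": [2, 2, 2, 2],
    "domain_C": [4, 3, 1, 2],
}
ranked = rank_domains(trait, domain_counts)
```

Domains at the top of such a ranking would be candidates for the synteny check described in the abstract; a real analysis would also need to account for multiple testing and for the phylogenetic relatedness of the species.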


Subjects
Adaptation, Physiological/genetics , Bacterial Proteins/genetics , Cold Temperature , Multigene Family/genetics , Shewanella/genetics , Synteny/genetics , Bacterial Proteins/chemistry , Genes, Bacterial/genetics , Genetic Loci/genetics , Phenotype , Protein Structure, Tertiary , Salt Tolerance/genetics , Sequence Analysis, DNA , Species Specificity
4.
Stand Genomic Sci ; 9: 20, 2014.
Article in English | MEDLINE | ID: mdl-25780509

ABSTRACT

BACKGROUND: More than 80% of the microbial genomes in GenBank are of 'draft' quality (12,553 draft vs. 2,679 finished, as of October, 2013). We have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. RESULTS: Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. CONCLUSIONS: The score can be used to set thresholds for screening data when analyzing "all published genomes" and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an 'A' (codons ending with a 'U') are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.

5.
Database (Oxford) ; 2010: baq012, 2010 Jul 06.
Article in English | MEDLINE | ID: mdl-20627862

ABSTRACT

Shewanellae are facultative gamma-proteobacteria whose remarkable respiratory versatility has resulted in interest in their utility for bioremediation of heavy metals and radionuclides and for energy generation in microbial fuel cells. Extensive experimental efforts over the last several years and the availability of 21 sequenced Shewanella genomes made it possible to collect and integrate a wealth of information on the genus into one public resource providing new avenues for making biological discoveries and for developing a system level understanding of the cellular processes. The Shewanella knowledgebase was established in 2005 to provide a framework for integrated genome-based studies on Shewanella ecophysiology. The present version of the knowledgebase provides access to a diverse set of experimental and genomic data along with tools for curation of genome annotations and visualization and integration of genomic data with experimental data. As a demonstration of the utility of this resource, we examined a single microarray data set from Shewanella oneidensis MR-1 for new insights into regulatory processes. The integrated analysis of the data predicted a new type of bacterial transcriptional regulation involving co-transcription of the intergenic region with the downstream gene and suggested a biological role for co-transcription that likely prevents the binding of a regulator of the upstream gene to the regulator binding site located in the intergenic region. Database URL: http://shewanella-knowledgebase.org:8080/Shewanella/ or http://spruce.ornl.gov:8080/Shewanella/


Subjects
DNA, Bacterial/genetics , DNA, Intergenic/genetics , Knowledge Bases , Shewanella/genetics , Base Sequence , Databases, Genetic , Ecosystem , Gene Silencing , Genome, Bacterial , Molecular Sequence Data , Sequence Alignment , Shewanella/physiology , Transcription, Genetic
6.
Bioinformation ; 4(4): 169-72, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-20198195

ABSTRACT

UNLABELLED: Shewanella oneidensis MR-1 is an important model organism for environmental research as it has an exceptional metabolic and respiratory versatility regulated by a complex regulatory network. We have developed a database to collect experimental and computational data relating to regulation of gene and protein expression, and a visualization environment that enables integration of these data types. The regulatory information in the database includes predictions of DNA regulator binding sites, sigma factor binding sites, transcription units, operons, promoters, and RNA regulators including non-coding RNAs, riboswitches, and different types of terminators. AVAILABILITY: http://shewanella-knowledgebase.org:8080/Shewanella/gbrowserLanding.jsp.

7.
Anal Chem ; 78(20): 7110-20, 2006 Oct 15.
Article in English | MEDLINE | ID: mdl-17037910

ABSTRACT

The abundance ratio between the light and heavy isotopologues of an isotopically labeled peptide can be estimated from their selected ion chromatograms. However, quantitative shotgun proteomics measurements yield selected ion chromatograms at highly variable signal-to-noise ratios for tens of thousands of peptides. This challenge calls for algorithms that not only robustly estimate the abundance ratios of different peptides but also rigorously score each abundance ratio for the expected estimation bias and variability. Scoring of the abundance ratios, much like scoring of sequence assignment for tandem mass spectra by peptide identification algorithms, enables filtering of unreliable peptide quantification and use of formal statistical inference in the subsequent protein abundance ratio estimation. In this study, a parallel paired covariance algorithm is used for robust peak detection in selected ion chromatograms. A peak profile is generated for each peptide, which is a scatterplot of ion intensities measured for the two isotopologues within their chromatographic peaks. Principal component analysis of the peak profile is proposed to estimate the peptide abundance ratio and to score the estimation with the signal-to-noise ratio of the peak profile (profile signal-to-noise ratio). We demonstrate that the profile signal-to-noise ratio is inversely correlated with the variability and bias of peptide abundance ratio estimation.
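
The idea of reading an abundance ratio off the peak profile by principal component analysis can be sketched for a single peptide. This is a minimal illustration, not the published implementation: it assumes mean-centered PCA on the two intensity series, takes the slope of the first principal component as the heavy-to-light ratio, and uses the square root of the eigenvalue ratio as a stand-in for the profile signal-to-noise ratio, which the abstract does not define. The function name and the intensities are hypothetical.

```python
from math import sqrt, inf


def profile_ratio(light, heavy):
    """Estimate the heavy/light ratio and a signal-to-noise score by 2-D PCA.

    light, heavy: ion intensities of the two isotopologues at the same scans.
    Returns (ratio, snr); snr is inf when the points are exactly collinear.
    """
    n = len(light)
    if n < 2 or n != len(heavy):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_l, mean_h = sum(light) / n, sum(heavy) / n
    # Entries of the 2x2 sample covariance matrix of the peak profile.
    s_ll = sum((a - mean_l) ** 2 for a in light) / (n - 1)
    s_hh = sum((b - mean_h) ** 2 for b in heavy) / (n - 1)
    s_lh = sum((a - mean_l) * (b - mean_h) for a, b in zip(light, heavy)) / (n - 1)
    if s_lh <= 0:
        raise ValueError("isotopologue intensities do not co-vary positively")
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    root = sqrt((s_ll - s_hh) ** 2 + 4 * s_lh ** 2)
    lam1 = (s_ll + s_hh + root) / 2
    lam2 = max((s_ll + s_hh - root) / 2, 0.0)
    ratio = (lam1 - s_ll) / s_lh          # slope of the first principal axis
    snr = sqrt(lam1 / lam2) if lam2 > 0 else inf
    return ratio, snr


# Hypothetical chromatographic peak: heavy is about twice the light signal.
clean = profile_ratio([1, 2, 4, 2, 1], [2, 4, 8, 4, 2])
noisy = profile_ratio([1, 2, 4, 2, 1], [2.1, 3.9, 8.2, 4.1, 1.8])
```

A tight, elongated scatter gives a large score and a slope close to the true ratio, while a diffuse scatter lowers the score, which is the behaviour the abstract relies on when filtering unreliable peptide quantifications.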


Subjects
Peptides/analysis , Proteomics/methods , Algorithms , Amino Acid Sequence , Bias , Chromatography , Hot Temperature , Ions , Molecular Sequence Data , Peptides/chemistry , Reproducibility of Results , Rhodopseudomonas , Tandem Mass Spectrometry
8.
Anal Chem ; 78(20): 7121-31, 2006 Oct 15.
Article in English | MEDLINE | ID: mdl-17037911

ABSTRACT

A profile likelihood algorithm is proposed for quantitative shotgun proteomics to infer the abundance ratios of proteins from the abundance ratios of isotopically labeled peptides derived from proteolysis. Previously, we have shown that the estimation variability and bias of peptide abundance ratios can be predicted from their profile signal-to-noise ratios. Given multiple quantified peptides for a protein, the profile likelihood algorithm probabilistically weighs the peptide abundance ratios by their inferred estimation variability, accounts for their expected estimation bias, and suppresses contribution from outliers. This algorithm yields maximum likelihood point estimation and profile likelihood confidence interval estimation of protein abundance ratios. This point estimator is more accurate than an estimator based on the average of peptide abundance ratios. The confidence interval estimation provides an "error bar" for each protein abundance ratio that reflects its estimation precision and statistical uncertainty. The accuracy of the point estimation and the precision and confidence level of the interval estimation were benchmarked with standard mixtures of isotopically labeled proteomes. The profile likelihood algorithm was integrated into a quantitative proteomics program, called ProRata, freely available at www.MSProRata.org.
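
Why weighting peptides by their estimation variability beats a plain average can be shown with a deliberately simplified stand-in. The sketch below is not the ProRata profile likelihood algorithm: it assumes independent Gaussian errors with known variances on peptide log2 ratios, in which case the maximum likelihood estimate is the inverse-variance weighted mean, and it ignores the bias correction and outlier suppression described in the abstract. The function name and the numbers are hypothetical.

```python
from math import sqrt


def combine_log_ratios(log_ratios, variances, z=1.96):
    """Inverse-variance weighted mean of peptide log ratios with an interval.

    log_ratios: peptide log2 abundance ratios for one protein.
    variances:  assumed-known error variance of each log ratio.
    z:          normal quantile for the interval (1.96 for about 95%).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, log_ratios)) / total
    half_width = z * sqrt(1.0 / total)
    return estimate, (estimate - half_width, estimate + half_width)


# Two well-measured peptides and one noisy peptide (hypothetical values).
log_ratios = [1.0, 1.0, 3.0]
variances = [0.1, 0.1, 10.0]
estimate, interval = combine_log_ratios(log_ratios, variances)
plain_mean = sum(log_ratios) / len(log_ratios)
```

Here the noisy peptide pulls the plain average well away from the two reliable measurements, while the weighted estimate stays close to them and the interval plays the role of the "error bar" on the protein ratio.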


Subjects
Bacterial Proteins/analysis , Proteomics/methods , Software Design , Software , Algorithms , Bias , Confidence Intervals , Hot Temperature , Proteome , Rhodopseudomonas