1.
Environ Microbiome ; 15(1): 11, 2020 May 18.
Article in English | MEDLINE | ID: mdl-33902725

ABSTRACT

BACKGROUND: Sequencing of 16S rRNA genes has become a powerful technique to study microbial communities and their responses towards changing environmental conditions in various ecosystems. Several tools have been developed for the prediction of functional profiles from 16S rRNA gene sequencing data, because numerous questions in ecosystem ecology require knowledge of community functions in addition to taxonomic composition. However, the accuracy of these tools relies on functional information derived from genomes available in public databases, which are often not representative of the microorganisms present in the studied ecosystem. In addition, there is also a lack of tools to predict functional gene redundancy in microbial communities. RESULTS: To address these challenges, we developed Tax4Fun2, an R package for the prediction of functional profiles and functional gene redundancies of prokaryotic communities from 16S rRNA gene sequences. We demonstrate that functional profiles predicted by Tax4Fun2 are highly correlated to functional profiles derived from metagenomes of the same samples. We further show that Tax4Fun2 has higher accuracies than PICRUSt and Tax4Fun. By incorporating user-defined, habitat-specific genomic information, the accuracy and robustness of predicted functional profiles is substantially enhanced. In addition, functional gene redundancies predicted with Tax4Fun2 are highly correlated to functional gene redundancies determined for simulated microbial communities. CONCLUSIONS: Tax4Fun2 provides researchers with a unique tool to predict and investigate functional profiles of prokaryotic communities based on 16S rRNA gene sequencing data. It is easy-to-use, platform-independent and highly memory-efficient, thus enabling researchers without extensive bioinformatics knowledge or access to high-performance clusters to predict functional profiles. 
Another unique feature of Tax4Fun2 is that it allows researchers to calculate the redundancy of specific functions, which is a potentially important measure of how resilient a community will be to environmental perturbation. Tax4Fun2 is implemented in R and freely available at https://github.com/bwemheu/Tax4Fun2.

2.
PeerJ ; 5: e3859, 2017.
Article in English | MEDLINE | ID: mdl-29062598

ABSTRACT

BACKGROUND: Differential expression analysis on the basis of RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for a reliable detection of transcriptional differences. Until now it has not been clear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics. METHODS: We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across different samples. As a main consequence the correct normalization of metatranscriptomic count data under this model requires the taxonomic separation of the data into organism-specific bins. Then the taxon-specific scaling of organism profiles yields a valid normalization and allows us to recombine the scaled profiles into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. For taxon-specific scaling and recombination of scaled counts we provide a simple R script. RESULTS: When applying transcriptomic tools for differential expression analysis directly to metatranscriptomic data with an organism-independent (global) scaling of counts the resulting differences may be difficult to interpret. The differences may correspond to changing functional profiles of the contributing organisms but may also result from a variation of taxonomic abundances. Taxon-specific scaling eliminates this variation and therefore the resulting differences actually reflect a different behavior of organisms under changing conditions. In simulation studies we show that the divergence between results from global and taxon-specific scaling can be drastic. In particular, the variation of organism abundances can imply a considerable increase of significant differences with global scaling. 
Also, on real metatranscriptomic data, the predictions from taxon-specific and global scaling can differ widely. Our studies indicate that in real data applications performed with global scaling it might be impossible to distinguish between differential expression in terms of transcriptomic changes and differential composition in terms of changing taxonomic proportions. CONCLUSIONS: As in transcriptomics, a proper normalization of count data is also essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. The application of taxon-specific scaling consequently removes taxonomic composition variations from functional profiles and therefore provides a clear interpretation of the observed functional differences.
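
The contrast between global and taxon-specific scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the R script the authors provide: the function names, the nested-dictionary layout (taxon bin, then gene, then per-sample counts), and the choice of plain library-size scaling to the mean bin size are all assumptions made for the example.

```python
def scale_profile(profile):
    """Scale a count table (gene -> per-sample counts) to a common library size."""
    n_samples = len(next(iter(profile.values())))
    sizes = [sum(row[j] for row in profile.values()) for j in range(n_samples)]
    target = sum(sizes) / n_samples  # assumed target: mean library size
    factors = [target / s if s > 0 else 0.0 for s in sizes]
    return {gene: [c * f for c, f in zip(row, factors)]
            for gene, row in profile.items()}


def pool(tables):
    """Sum several gene -> per-sample tables into one combined table."""
    combined = {}
    for table in tables:
        for gene, row in table.items():
            acc = combined.setdefault(gene, [0.0] * len(row))
            for j, c in enumerate(row):
                acc[j] += c
    return combined


def global_scaling(bins):
    """Organism-independent scaling: pool all taxa first, then scale once."""
    return scale_profile(pool(bins.values()))


def taxon_specific_scaling(bins):
    """Scale each taxon bin separately, then recombine the scaled profiles."""
    return pool(scale_profile(profile) for profile in bins.values())


# Hypothetical data: taxon_A doubles in abundance between the two samples,
# but neither taxon changes its relative expression of geneX and geneY.
bins = {
    "taxon_A": {"geneX": [10, 20], "geneY": [10, 20]},
    "taxon_B": {"geneX": [30, 30], "geneY": [10, 10]},
}
print(taxon_specific_scaling(bins))
print(global_scaling(bins))
```

With these made-up counts, taxon-specific scaling yields identical geneX values in both samples, whereas global scaling shows an apparent difference that stems only from the change in taxon_A's abundance.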

3.
Nat Methods ; 14(11): 1063-1071, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28967888

ABSTRACT

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.


Subjects
Metagenomics , Software , Algorithms , Benchmarking , DNA Sequence Analysis
4.
Bioinformatics ; 31(17): 2882-4, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25957349

ABSTRACT

MOTIVATION: The characterization of phylogenetic and functional diversity is a key element in the analysis of microbial communities. Amplicon-based sequencing of marker genes, such as 16S rRNA, is a powerful tool for assessing and comparing the structure of microbial communities at a high phylogenetic resolution. Because 16S rRNA sequencing is more cost-effective than whole metagenome shotgun sequencing, marker gene analysis is frequently used for broad studies that involve a large number of different samples. However, in comparison to shotgun sequencing approaches, insights into the functional capabilities of the community get lost when restricting the analysis to taxonomic assignment of 16S rRNA data. RESULTS: Tax4Fun is a software package that predicts the functional capabilities of microbial communities based on 16S rRNA datasets. We evaluated Tax4Fun on a range of paired metagenome/16S rRNA datasets to assess its performance. Our results indicate that Tax4Fun provides a good approximation to functional profiles obtained from metagenomic shotgun sequencing approaches. AVAILABILITY AND IMPLEMENTATION: Tax4Fun is an open-source R package and applicable to output as obtained from the SILVAngs web server or the application of QIIME with a SILVA database extension. Tax4Fun is freely available for download at http://tax4fun.gobics.de/. CONTACT: kasshau@gwdg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
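
The general idea of predicting a functional profile from taxonomic abundances can be illustrated with a small Python sketch. It is not the Tax4Fun implementation: it simply assumes that each taxon has a precomputed reference functional profile and forms an abundance-weighted average, ignoring taxa without a reference. All names and numbers are hypothetical.

```python
def predict_functional_profile(taxon_abundance, reference_profiles):
    """Abundance-weighted average of reference functional profiles.

    taxon_abundance: taxon -> 16S-derived abundance (counts or fractions)
    reference_profiles: taxon -> {function id -> relative frequency}
    Taxa without a reference profile are ignored (an assumption of this sketch).
    """
    covered = sum(a for t, a in taxon_abundance.items() if t in reference_profiles)
    if covered == 0:
        return {}
    predicted = {}
    for taxon, abundance in taxon_abundance.items():
        reference = reference_profiles.get(taxon)
        if reference is None:
            continue
        weight = abundance / covered
        for function, frequency in reference.items():
            predicted[function] = predicted.get(function, 0.0) + weight * frequency
    return predicted


# Hypothetical input: one taxon has no reference genome and is skipped.
abundance = {"taxon_1": 30, "taxon_2": 10, "unclassified": 60}
references = {
    "taxon_1": {"K00001": 0.5, "K00002": 0.5},
    "taxon_2": {"K00001": 1.0},
}
profile = predict_functional_profile(abundance, references)
print(profile)
```

The share of abundance that has no reference profile (60% in this toy input) is a useful quantity to report alongside any such prediction, since it indicates how much of the community the profile does not describe.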


Subjects
Bacteria/genetics , Computational Biology/methods , Bacterial Genes/genetics , Metagenomics/methods , 16S Ribosomal RNA/genetics , RNA Sequence Analysis/methods , Software , Bacteria/classification , Factual Databases , Nucleic Acid Databases , Genetic Markers , Metagenome , Phylogeny , Bacterial RNA/genetics
5.
Metabolomics ; 11(3): 764-777, 2015.
Article in English | MEDLINE | ID: mdl-25972773

ABSTRACT

A central aim in the evaluation of non-targeted metabolomics data is the detection of intensity patterns that differ between experimental conditions as well as the identification of the underlying metabolites and their association with metabolic pathways. In this context, the identification of metabolites based on non-targeted mass spectrometry data is a major bottleneck. In many applications, this identification needs to be guided by expert knowledge and interactive tools for exploratory data analysis can significantly support this process. Additionally, the integration of data from other omics platforms, such as DNA microarray-based transcriptomics, can provide valuable hints and thereby facilitate the identification of metabolites via the reconstruction of related metabolic pathways. We here introduce the MarVis-Pathway tool, which allows the user to identify metabolites by annotation of pathways from cross-omics data. The analysis is supported by an extensive framework for pathway enrichment and meta-analysis. The tool allows the mapping of data set features by ID, name, and accurate mass, and can incorporate information from adduct and isotope correction of mass spectrometry data. MarVis-Pathway was integrated in the MarVis-Suite (http://marvis.gobics.de), which features the seamless, highly interactive filtering, combination, clustering, and visualization of omics data sets. The functionality of the new software tool is illustrated using combined mass spectrometry and DNA microarray data. This application confirms jasmonate biosynthesis as an important metabolic pathway that is upregulated during the wound response of Arabidopsis plants.

6.
Bioinformatics ; 31(9): 1382-8, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25540185

ABSTRACT

MOTIVATION: With rapidly increasing volumes of biological sequence data the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics. RESULTS: The ultrafast protein classification (UProC) toolbox implements a novel algorithm ('Mosaic Matching') for large-scale sequence analysis. UProC is three orders of magnitude faster than profile-based methods and in a metagenome simulation study achieved up to 80% higher sensitivity on unassembled 100 bp reads. AVAILABILITY AND IMPLEMENTATION: UProC is available as open-source software at https://github.com/gobics/uproc. Precompiled databases (Pfam) are linked on the UProC homepage: http://uproc.gobics.de/. CONTACT: peter@gobics.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metagenomics/methods , Tertiary Protein Structure , Software , Algorithms , Metagenome , Open Reading Frames
7.
BMC Genomics ; 15: 1003, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409897

ABSTRACT

BACKGROUND: The annotation of biomolecular functions is an essential step in the analysis of newly sequenced organisms. Usually, the functions are inferred from predicted genes on the genome using homology search techniques. A high quality genomic sequence is an important prerequisite which, however, is difficult to achieve for certain organisms, such as hybrids or organisms with a large genome. For functional analysis it is also possible to use a de novo transcriptome assembly but the computational requirements can be demanding. Up to now, it is unclear how much of the functional repertoire of an organism can be reliably predicted from unassembled RNA-seq short reads alone. RESULTS: We have conducted a study to investigate to what degree it is possible to reconstruct the functional profile of an organism from unassembled transcriptome data. We simulated the de novo prediction of biomolecular functions for Arabidopsis thaliana using a comprehensive RNA-seq data set. We evaluated the prediction performance using several homology search methods in combination with different evidence measures. For the decision on the presence or absence of a particular function under noisy conditions we propose a statistical mixture model enabling unsupervised estimation of a detection threshold. Our results indicate that the prediction of the biomolecular functions from the KEGG database is possible with a high sensitivity up to 94 percent. In this setting, the application of the mixture model for automatic threshold calibration allowed the reduction of the falsely predicted functions down to 4 percent. Furthermore, we found that our statistical approach even outperforms the prediction from a de novo transcriptome assembly. CONCLUSION: The analysis of an organism's transcriptome can provide a solid basis for the prediction of biomolecular functions. 
Using RNA-seq short reads directly, the functional profile of an organism can be reconstructed in a computationally efficient way to provide a draft annotation in cases where the classical genome-based approaches cannot be applied.
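
The idea of deriving a detection threshold from a mixture model without labeled data can be sketched as follows. The abstract does not specify the model in detail, so this example assumes a two-component one-dimensional Gaussian mixture fitted by EM, with one component for absent ("noise") functions and one for present functions; the threshold is taken where both weighted densities are equal. Function names and scores are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=200):
    """Fit a two-component 1D Gaussian mixture by EM (needs at least 2 scores).

    Returns (weights, means, sigmas); index 0 is the low-score component.
    """
    xs = sorted(scores)
    half = len(xs) // 2
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of the high-score component for each score
        resp = []
        for x in xs:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations
        n1 = sum(resp)
        n0 = len(xs) - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component vanished; keep the last valid estimate
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        var0 = sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sigma = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        weight = [n0 / len(xs), n1 / len(xs)]
    return weight, mu, sigma


def detection_threshold(weight, mu, sigma, steps=60):
    """Bisect between the two means for the point of equal weighted density."""
    lo, hi = mu[0], mu[1]
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        p0 = weight[0] * normal_pdf(mid, mu[0], sigma[0])
        p1 = weight[1] * normal_pdf(mid, mu[1], sigma[1])
        if p1 > p0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


# Hypothetical evidence scores: a low "noise" group and a high "present" group.
scores = [0.0, 0.5, 1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 9.5, 10.0]
weight, mu, sigma = fit_two_gaussians(scores)
threshold = detection_threshold(weight, mu, sigma)
print(threshold)
```

Functions whose evidence score exceeds the threshold would be called present. In practice the scores may need a transformation (for example a logarithm) before a Gaussian assumption is reasonable.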


Subjects
Arabidopsis/genetics , Genetic Databases , RNA Sequence Analysis/methods , Calibration , Genetic Models , Normal Distribution , Transcriptome/genetics
8.
Int J Mol Sci ; 15(7): 12364-78, 2014 Jul 14.
Article in English | MEDLINE | ID: mdl-25026170

ABSTRACT

The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited to the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.
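
A profile-based k-nearest-neighbor search of the kind described above can be sketched in Python. The sketch assumes that each metagenome is represented by a feature-count vector, normalizes it to relative frequencies, and uses Euclidean distance; the abstract reports that several profiling methods and distance measures behave similarly, so these are illustrative choices, and all names and values are hypothetical.

```python
import math


def to_relative(profile):
    """Convert raw feature counts to relative frequencies."""
    total = sum(profile)
    return [v / total for v in profile] if total > 0 else list(profile)


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbors(query, database, k=3, distance=euclidean):
    """Return the k database entries closest to the query profile.

    database: list of (name, habitat, profile) tuples.
    """
    q = to_relative(query)
    ranked = sorted(database, key=lambda entry: distance(q, to_relative(entry[2])))
    return ranked[:k]


def habitat_agreement(query_habitat, neighbors):
    """Fraction of neighbors that carry the same habitat label as the query."""
    same = sum(1 for _, habitat, _ in neighbors if habitat == query_habitat)
    return same / len(neighbors)


# Hypothetical three-feature profiles for five database metagenomes.
database = [
    ("soil_1", "soil", [50, 30, 20]),
    ("soil_2", "soil", [45, 35, 20]),
    ("gut_1", "gut", [10, 20, 70]),
    ("gut_2", "gut", [5, 25, 70]),
    ("marine_1", "marine", [30, 60, 10]),
]
neighbors = nearest_neighbors([48, 32, 20], database, k=2)
print([name for name, _, _ in neighbors], habitat_agreement("soil", neighbors))
```

Averaging `habitat_agreement` over many labeled queries gives a neighborhood accuracy comparable in spirit to the "nine out of ten" figure quoted in the abstract.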


Subjects
Genomics/methods , Metagenome , DNA Sequence Analysis/methods , Human Genome , Humans , Microbiota/genetics
9.
PeerJ ; 2: e239, 2014.
Article in English | MEDLINE | ID: mdl-24688832

ABSTRACT

State of the art high-throughput technologies allow comprehensive experimental studies of organism metabolism and induce the need for a convenient presentation of large heterogeneous datasets. Especially, the combined analysis and visualization of data from different high-throughput technologies remains a key challenge in bioinformatics. We present here the MarVis-Graph software for integrative analysis of metabolic and transcriptomic data. All experimental data is investigated in terms of the full metabolic network obtained from a reference database. The reactions of the network are scored based on the associated data, and sub-networks, according to connected high-scoring reactions, are identified. Finally, MarVis-Graph scores the detected sub-networks, evaluates them by means of a random permutation test and presents them as a ranked list. Furthermore, MarVis-Graph features an interactive network visualization that provides researchers with a convenient view on the results. The key advantage of MarVis-Graph is the analysis of reactions detached from their pathways so that it is possible to identify new pathways or to connect known pathways by previously unrelated reactions. The MarVis-Graph software is freely available for academic use and can be downloaded at: http://marvis.gobics.de/marvis-graph.

10.
PLoS One ; 9(2): e89297, 2014.
Article in English | MEDLINE | ID: mdl-24586671

ABSTRACT

A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.
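
How dependence between data sets changes a combined p-value can be illustrated with Stouffer's Z method. The abstract does not name the combination rule used in the framework, so Stouffer's method is an assumption of this sketch, as are the function name and the correlation matrices, which are simply supplied by hand rather than estimated from nonsignificant pathways.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def combine_pvalues(pvalues, correlation=None):
    """Combine one-sided p-values (each strictly between 0 and 1) with Stouffer's Z.

    correlation: optional n x n correlation matrix of the underlying z-scores.
    Without it the data sets are treated as independent.
    """
    n = len(pvalues)
    z = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    if correlation is None:
        variance = float(n)
    else:
        # Variance of a sum of unit-variance z-scores is the sum of all
        # entries of their correlation matrix (assumed positive here).
        variance = sum(correlation[i][j] for i in range(n) for j in range(n))
    combined_z = sum(z) / variance ** 0.5
    return 1.0 - _STD_NORMAL.cdf(combined_z)


independent = combine_pvalues([0.05, 0.05])
dependent = combine_pvalues([0.05, 0.05], [[1.0, 1.0], [1.0, 1.0]])
partial = combine_pvalues([0.05, 0.05], [[1.0, 0.5], [0.5, 1.0]])
print(independent, partial, dependent)
```

Two independent results at p = 0.05 combine to roughly p = 0.01, whereas two perfectly correlated results carry no more evidence than one and stay at p = 0.05. Ignoring positive dependence therefore overstates significance, which is why a covariance estimate matters for data sets obtained from the same samples.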


Subjects
Oligonucleotide Array Sequence Analysis , Genetic Databases , Humans , Metabolomics , Systems Biology/methods
11.
Microb Ecol ; 67(4): 919-30, 2014 May.
Article in English | MEDLINE | ID: mdl-24553913

ABSTRACT

Soil microorganisms play an essential role in sustaining biogeochemical processes and cycling of nutrients across different land use types. To gain insights into microbial gene transcription in forest and grassland soil, we isolated mRNA from 32 sampling sites. After sequencing of generated complementary DNA (cDNA), a total of 5,824,229 sequences could be further analyzed. We were able to assign nonribosomal cDNA sequences to all three domains of life. A dominance of bacterial sequences, which were affiliated to 25 different phyla, was found. Bacterial groups capable of aromatic compound degradation such as Phenylobacterium and Burkholderia were detected in significantly higher relative abundance in forest soil than in grassland soil. Accordingly, KEGG pathway categories related to degradation of aromatic ring-containing molecules (e.g., benzoate degradation) were identified in high abundance within forest soil-derived metatranscriptomic datasets. The impact of land use type forest on community composition and activity is evidently to a high degree caused by the presence of wood breakdown products. Correspondingly, bacterial groups known to be involved in lignin degradation and containing ligninolytic genes such as Burkholderia, Bradyrhizobium, and Azospirillum exhibited increased transcriptional activity in forest soil. Higher solar radiation in grassland presumably induced increased transcription of photosynthesis-related genes within this land use type. This is in accordance with high abundance of photosynthetic organisms and plant-infecting viruses in grassland.


Subjects
Forests , Microbiota , Soil Microbiology , Transcriptome , Archaea/classification , Archaea/genetics , Archaea/isolation & purification , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Eukaryota/classification , Eukaryota/genetics , Eukaryota/isolation & purification , Grassland , Molecular Sequence Data , Phylogeny , Messenger RNA/genetics , DNA Sequence Analysis
12.
Bioinformatics ; 29(8): 973-80, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23418187

ABSTRACT

MOTIVATION: Metagenome analysis requires tools that can estimate the taxonomic abundances in anonymous sequence data over the whole range of biological entities. Because there is usually no prior knowledge about the data composition, not only all domains of life but also viruses have to be included in taxonomic profiling. Such a full-range approach, however, is difficult to realize owing to the limited coverage of available reference data. In particular, archaea and viruses are generally not well represented by current genome databases. RESULTS: We introduce a novel approach to taxonomic profiling of metagenomes that is based on mixture model analysis of protein signatures. Our results on simulated and real data reveal the difficulties of the existing methods when measuring archaeal or viral abundances and show the overall good profiling performance of the protein-based mixture model. As an application example, we provide a large-scale analysis of data from the Human Microbiome Project. This demonstrates the utility of our method as a first instance profiling tool for a fast estimate of the community structure. AVAILABILITY: http://gobics.de/TaxyPro. SUPPLEMENTARY INFORMATION: Supplementary Material is available at Bioinformatics online.


Subjects
Metagenomics/methods , Tertiary Protein Structure , Archaeal DNA/analysis , Viral DNA/analysis , Humans , Metagenome , Phylogeny
13.
J Biomed Biotechnol ; 2012: 263910, 2012.
Article in English | MEDLINE | ID: mdl-22550397

ABSTRACT

Statistical ranking, filtering, adduct detection, isotope correction, and molecular formula calculation are essential tasks in processing mass spectrometry data in metabolomics studies. In order to obtain high-quality data sets, a framework which incorporates all these methods is required. We present the MarVis-Filter software, which provides well-established and specialized methods for processing mass spectrometry data. For the task of ranking and filtering multivariate intensity profiles, MarVis-Filter provides the ANOVA and Kruskal-Wallis tests with adjustment for multiple hypothesis testing. Adduct and isotope correction are based on a novel algorithm which takes the similarity of intensity profiles into account and allows user-defined ionization rules. The molecular formula calculation utilizes the results of the adduct and isotope correction. For a comprehensive analysis, MarVis-Filter provides an interactive interface to combine data sets deriving from positive and negative ionization mode. The software is exemplarily applied in a metabolic case study, where octadecanoids could be identified as markers for wounding in plants.
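
Adjustment for multiple hypothesis testing, as mentioned above for the ANOVA and Kruskal-Wallis rankings, can be illustrated with the Benjamini-Hochberg procedure. The abstract does not say which adjustment MarVis-Filter applies, so the choice of Benjamini-Hochberg here is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-feature p-values from a ranking test.
raw = [0.01, 0.04, 0.03, 0.2]
adjusted = benjamini_hochberg(raw)
kept = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, kept)
```

With a cutoff of 0.05 on the adjusted values, only the first feature passes in this toy example, although three of the four raw p-values are below 0.05.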


Subjects
Algorithms , Computational Biology/methods , Mass Spectrometry/methods , Metabolomics/methods , Software , Arabidopsis/metabolism , Carbon Isotopes , Cyclopentanes/metabolism , Factual Databases , Metabolome , Biological Models , Oxylipins/metabolism
14.
Stat Appl Genet Mol Biol ; 11(1): Article 1, 2012 Jan 06.
Article in English | MEDLINE | ID: mdl-22499688

ABSTRACT

Profile Hidden Markov Models (pHMMs) are widely used to model nucleotide or protein sequence families. In many applications, a sequence family classified into several subfamilies is given and each subfamily is modeled separately by one pHMM. A major drawback of this approach is the difficulty of coping with subfamilies composed of very few sequences. Correct subtyping of human immunodeficiency virus-1 (HIV-1) sequences is one of the most crucial bioinformatic tasks affected by this problem of small subfamilies, i.e., HIV-1 subtypes with a small number of known sequences. To deal with small samples for particular subfamilies of HIV-1, we employ a machine learning approach. More precisely, we make use of an existing HMM architecture and its associated inference engine, while replacing the unsupervised estimation of emission probabilities by a supervised method. For that purpose, we use regularized linear discriminant learning together with a balancing scheme to account for the widely varying sample size. After training the multiclass linear discriminants, the corresponding weights are transformed to valid probabilities using a softmax function. We apply this modified algorithm to classify HIV-1 sequence data (in the form of partial-length HIV-1 sequences and semi-artificial recombinants) and show that the performance of pHMMs can be significantly improved by the proposed technique.
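
The final step described above, turning discriminant weights into valid probabilities with a softmax function, can be shown in a few lines of Python. The weight values are invented for the example; only the softmax transformation itself is taken from the abstract.

```python
import math


def softmax(weights):
    """Map arbitrary real-valued weights to probabilities that sum to one."""
    shift = max(weights)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(w - shift) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical discriminant weights for the four nucleotides at one position.
weights = {"A": 2.0, "C": -1.0, "G": 0.5, "T": 0.0}
emission = dict(zip(weights, softmax(list(weights.values()))))
print(emission)
```

Because softmax is monotone, the symbol with the largest discriminant weight also receives the largest emission probability, and every probability is strictly positive.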


Subjects
Algorithms , HIV-1/genetics , Markov Chains , Artificial Intelligence , Factual Databases , Humans , Automated Pattern Recognition/methods
15.
Nature ; 478(7369): 395-8, 2011 Oct 05.
Article in English | MEDLINE | ID: mdl-21976020

ABSTRACT

Maize smut caused by the fungus Ustilago maydis is a widespread disease characterized by the development of large plant tumours. U. maydis is a biotrophic pathogen that requires living plant tissue for its development and establishes an intimate interaction zone between fungal hyphae and the plant plasma membrane. U. maydis actively suppresses plant defence responses by secreted protein effectors. Its effector repertoire comprises at least 386 genes mostly encoding proteins of unknown function and expressed exclusively during the biotrophic stage. The U. maydis secretome also contains about 150 proteins with probable roles in fungal nutrition, fungal cell wall modification and host penetration as well as proteins unlikely to act in the fungal-host interface like a chorismate mutase. Chorismate mutases are key enzymes of the shikimate pathway and catalyse the conversion of chorismate to prephenate, the precursor for tyrosine and phenylalanine synthesis. Root-knot nematodes inject a secreted chorismate mutase into plant cells likely to affect development. Here we show that the chorismate mutase Cmu1 secreted by U. maydis is a virulence factor. The enzyme is taken up by plant cells, can spread to neighbouring cells and changes the metabolic status of these cells through metabolic priming. Secreted chorismate mutases are found in many plant-associated microbes and might serve as general tools for host manipulation.


Subjects
Chorismate Mutase/metabolism , Ustilago/enzymology , Ustilago/pathogenicity , Virulence Factors/metabolism , Zea mays/metabolism , Zea mays/microbiology , Cytoplasm/enzymology , Plant Gene Expression Regulation , Genetic Complementation Test , Host-Pathogen Interactions , Metabolome , Biological Models , Plant Proteins/metabolism , Plastids/enzymology , Protein Multimerization , Saccharomyces cerevisiae/genetics , Salicylic Acid/metabolism , Two-Hybrid System Techniques , Virulence Factors/genetics
16.
Nucleic Acids Res ; 39(Web Server issue): W518-23, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21622656

ABSTRACT

Analyzing the functional potential of newly sequenced genomes and metagenomes has become a common task in biomedical and biological research. With the advent of high-throughput sequencing technologies comparative metagenomics opens the way to elucidate the genetically determined similarities and differences of complex microbial communities. We developed the web server 'CoMet' (http://comet.gobics.de), which provides an easy-to-use comparative metagenomics platform that is well suited to the analysis of large collections of metagenomic short read data. CoMet combines the ORF finding and subsequent assignment of protein sequences to Pfam domain families with a comparative statistical analysis. Besides comprehensive tabular data files, the CoMet server also provides visually interpretable output in terms of hierarchical clustering and multi-dimensional scaling plots and thus allows a quick overview of a given set of metagenomic samples.


Subjects
Metagenomics/methods , Software , Cluster Analysis , Statistical Data Interpretation , Internet , Metagenome , Tertiary Protein Structure , DNA Sequence Analysis , Protein Sequence Analysis
17.
Bioinformatics ; 27(12): 1618-24, 2011 Jun 15.
Article in English | MEDLINE | ID: mdl-21546400

ABSTRACT

MOTIVATION: Inferring the taxonomic profile of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in metagenomics. Because existing methods for taxonomic profiling of metagenomes are all based on the assignment of fragmentary sequences to phylogenetic categories, the accuracy of results largely depends on fragment length. This dependence complicates comparative analysis of data originating from different sequencing platforms or resulting from different preprocessing pipelines. RESULTS: We here introduce a new method for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. Our results indicate that the mixture-based profiles compare well with taxonomic profiles obtained with other methods. However, in contrast to the existing methods, our approach shows a nearly constant profiling accuracy across all kinds of read lengths and it operates at an unrivaled speed. AVAILABILITY: A platform-independent implementation of the mixture modeling approach is available in terms of a MATLAB/Octave toolbox at http://gobics.de/peter/taxy. In addition, a prototypical implementation within an easy-to-use interactive tool for Windows can be downloaded.
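
The mixture idea described above treats the overall oligonucleotide distribution of a sample as a weighted combination of reference distributions, so that the weights form the taxonomic profile and no individual read has to be classified. The Python sketch below estimates such weights with an EM update for mixture proportions. This estimator is chosen here for brevity and is an assumption, since the abstract does not describe the fitting procedure of the MATLAB/Octave toolbox; the reference distributions and counts are hypothetical.

```python
def estimate_mixture_weights(sample_counts, reference_profiles, iterations=500):
    """Estimate mixture proportions of fixed reference distributions by EM.

    sample_counts: k-mer counts observed in the sample (one entry per k-mer)
    reference_profiles: organism -> k-mer relative frequencies (same order)
    """
    names = list(reference_profiles)
    total = float(sum(sample_counts))
    weights = {name: 1.0 / len(names) for name in names}
    for _ in range(iterations):
        # Expected k-mer frequencies under the current weights
        mixed = [sum(weights[n] * reference_profiles[n][k] for n in names)
                 for k in range(len(sample_counts))]
        updated = {}
        for n in names:
            # Share of the observed counts attributed to organism n
            share = sum(c * reference_profiles[n][k] / mixed[k]
                        for k, c in enumerate(sample_counts) if mixed[k] > 0)
            updated[n] = weights[n] * share / total
        weights = updated
    return weights


# Hypothetical reference distributions over three k-mers, and a sample that
# was constructed as a 70/30 mixture of the two organisms.
references = {
    "organism_A": [0.6, 0.3, 0.1],
    "organism_B": [0.1, 0.3, 0.6],
}
sample = [450, 300, 250]
estimated = estimate_mixture_weights(sample, references)
print(estimated)
```

Because only the pooled k-mer counts enter the estimate, the result does not depend on how the sequence was split into reads, which is consistent with the abstract's point about accuracy being nearly independent of read length.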


Subjects
Metagenomics/methods , Phylogeny , Algorithms , Metagenome , Genetic Models , DNA Sequence Analysis
18.
Plant Cell ; 23(4): 1556-72, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21487095

ABSTRACT

In the postgenomic era, accurate prediction tools are essential for identification of the proteomes of cell organelles. Prediction methods have been developed for peroxisome-targeted proteins in animals and fungi but are missing specifically for plants. For development of a predictor for plant proteins carrying peroxisome targeting signals type 1 (PTS1), we assembled more than 2500 homologous plant sequences, mainly from EST databases. We applied a discriminative machine learning approach to derive two different prediction methods, both of which showed high prediction accuracy and recognized specific targeting-enhancing patterns in the regions upstream of the PTS1 tripeptides. Upon application of these methods to the Arabidopsis thaliana genome, 392 gene models were predicted to be peroxisome targeted. These predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. The prediction methods were able to correctly infer novel PTS1 tripeptides, which even included novel residues. Twenty-three newly predicted PTS1 tripeptides were experimentally confirmed, and a high variability of the plant PTS1 motif was discovered. These prediction methods will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants.


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Artificial Intelligence , Computational Biology/methods , Peroxisomes/metabolism , Protein Sorting Signals , Amino Acid Sequence , Arabidopsis/genetics , Arabidopsis Proteins/chemistry , Databases, Protein , Genome, Plant/genetics , Models, Biological , Molecular Sequence Data , Peptides , Protein Transport , Reproducibility of Results , Subcellular Fractions/metabolism
19.
Mol Microbiol ; 78(4): 964-79, 2010 Nov.
Article in English | MEDLINE | ID: mdl-21062371

ABSTRACT

The COP9 signalosome complex (CSN) is a crucial regulator of ubiquitin ligases. Defects in CSN result in embryonic impairment and death in higher eukaryotes, whereas the filamentous fungus Aspergillus nidulans survives without CSN, but is unable to complete sexual development. We investigated overall impact of CSN activity on A. nidulans cells by combined transcriptome, proteome and metabolome analysis. Absence of csn5/csnE affects transcription of at least 15% of genes during development, including numerous oxidoreductases. csnE deletion leads to changes in the fungal proteome indicating impaired redox regulation and hypersensitivity to oxidative stress. CSN promotes the formation of asexual spores by regulating developmental hormones produced by PpoA and PpoC dioxygenases. We identify more than 100 metabolites, including orsellinic acid derivatives, accumulating preferentially in the csnE mutant. We also show that CSN is required to activate glucanases and other cell wall recycling enzymes during development. These findings suggest a dual role for CSN during development: it is required early for protection against oxidative stress and hormone regulation and is later essential for control of the secondary metabolism and cell wall rearrangement.


Subjects
Aspergillus nidulans/growth & development , Aspergillus nidulans/metabolism , Cell Wall/metabolism , Gene Expression Regulation, Fungal , Hormones/metabolism , Multiprotein Complexes/metabolism , Oxidative Stress , Peptide Hydrolases/metabolism , Signal Transduction , Aspergillus nidulans/genetics , COP9 Signalosome Complex , Fungal Proteins/genetics , Gene Deletion , Gene Expression Profiling , Metabolome , Multiprotein Complexes/genetics , Peptide Hydrolases/genetics , Proteome
20.
BMC Bioinformatics ; 11: 481, 2010 Sep 24.
Article in English | MEDLINE | ID: mdl-20868492

ABSTRACT

BACKGROUND: Establishing the relationship between an organism's genome sequence and its phenotype is a fundamental challenge that remains largely unsolved. Accurately predicting microbial phenotypes solely based on genomic features will allow us to infer relevant phenotypic characteristics when the availability of a genome sequence precedes experimental characterization, a scenario that is favored by the advent of novel high-throughput and single cell sequencing techniques. RESULTS: We present a novel approach to predict the phenotype of prokaryotes directly from their protein domain frequencies. Our discriminative machine learning approach provides high prediction accuracy of relevant phenotypes such as motility, oxygen requirement or spore formation. Moreover, the set of discriminative domains provides biological insight into the underlying phenotype-genotype relationship and enables deriving hypotheses on the possible functions of uncharacterized domains. CONCLUSIONS: Fast and accurate prediction of microbial phenotypes based on genomic protein domain content is feasible and has the potential to provide novel biological insights. First results of a systematic check for annotation errors indicate that our approach may also be applied to semi-automatic correction and completion of the existing phenotype annotation.
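
As a rough illustration of predicting a phenotype from protein domain frequencies, the sketch below trains a small L2-regularized logistic regression by gradient descent. The abstract does not name the classifier the authors used, so logistic regression is a stand-in for "discriminative machine learning", and the domain names and frequency values are invented toy data, not results from the paper.

```python
import math


def train_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit weights w and bias b of a logistic model by batch gradient descent.

    X is a list of domain-frequency vectors, y a list of 0/1 phenotype labels.
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # The L2 penalty shrinks the weights; the bias is left unpenalized.
        w = [wi - lr * (g / m + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x):
    """Return 1 if the model predicts the phenotype is present, else 0."""
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


# Hypothetical domains and made-up relative frequencies per genome.
DOMAINS = ["flagellin_like", "chemotaxis_like", "housekeeping_like"]
X_TOY = [
    [0.40, 0.30, 0.30], [0.50, 0.20, 0.30], [0.35, 0.35, 0.30],  # motile
    [0.05, 0.05, 0.90], [0.10, 0.00, 0.90], [0.00, 0.10, 0.90],  # non-motile
]
Y_TOY = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(X_TOY, Y_TOY)
    # Large positive weights mark domains associated with the phenotype.
    for name, weight in sorted(zip(DOMAINS, w), key=lambda t: -t[1]):
        print(name, round(weight, 2))
```

Inspecting the learned weights mirrors, in a very simplified way, the abstract's point that the set of discriminative domains can hint at the underlying phenotype-genotype relationship.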


Subjects
Bacterial Proteins/chemistry , Phenotype , Algorithms , Genome, Archaeal , Genome, Bacterial , Molecular Sequence Annotation , Protein Structure, Tertiary