Results 1 - 10 of 10
1.
PeerJ ; 7: e7359, 2019.
Article in English | MEDLINE | ID: mdl-31388474

ABSTRACT

We previously reported on MetaBAT, an automated metagenome binning software tool to reconstruct single genomes from microbial communities for subsequent analyses of uncultivated microbial species. MetaBAT has become one of the most popular binning tools largely due to its computational efficiency and ease of use, especially in binning experiments with a large number of samples and a large assembly. MetaBAT requires users to choose parameters to fine-tune its sensitivity and specificity. If those parameters are not chosen properly, binning accuracy can suffer, especially on assemblies of poor quality. Here, we developed MetaBAT 2 to overcome this problem. MetaBAT 2 uses a new adaptive binning algorithm to eliminate manual parameter tuning. We also performed extensive software engineering optimization to increase both computational and memory efficiency. Comparing MetaBAT 2 to alternative software tools on over 100 real world metagenome assemblies shows superior accuracy and computing speed. Binning a typical metagenome assembly takes only a few minutes on a single commodity workstation. We therefore recommend the community adopts MetaBAT 2 for their metagenome binning experiments. MetaBAT 2 is open source software and available at https://bitbucket.org/berkeleylab/metabat.

2.
PLoS Genet ; 12(2): e1005854, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26870957

ABSTRACT

DNA methylation acts in concert with restriction enzymes to protect the integrity of prokaryotic genomes. Studies in a limited number of organisms suggest that methylation also contributes to prokaryotic genome regulation, but the prevalence and properties of such non-restriction-associated methylation systems remain poorly understood. Here, we used single molecule, real-time sequencing to map DNA modifications including m6A, m4C, and m5C across the genomes of 230 diverse bacterial and archaeal species. We observed DNA methylation in nearly all (93%) organisms examined, and identified a total of 834 distinct reproducibly methylated motifs. This data enabled annotation of the DNA binding specificities of 620 DNA Methyltransferases (MTases), doubling known specificities for previously hard to study Type I, IIG and III MTases, and revealing their extraordinary diversity. Strikingly, 48% of organisms harbor active Type II MTases with no apparent cognate restriction enzyme. These active 'orphan' MTases are present in diverse bacterial and archaeal phyla and show motif specificities and methylation patterns consistent with functions in gene regulation and DNA replication. Our results reveal the pervasive presence of DNA methylation throughout the prokaryotic kingdoms, as well as the diversity of sequence specificities and potential functions of DNA methylation systems.


Subjects
Epigenomics , Prokaryotic Cells/metabolism , Conserved Sequence , DNA Methylation/genetics , DNA Replication/genetics , DNA Restriction-Modification Enzymes/classification , DNA Restriction-Modification Enzymes/metabolism , Evolution, Molecular , Gene Expression Regulation , Genome , Methyltransferases/metabolism , Molecular Sequence Annotation , Multigene Family , Nucleotide Motifs/genetics , Phylogeny , Substrate Specificity
3.
BMC Genomics ; 16: 924, 2015 Nov 11.
Article in English | MEDLINE | ID: mdl-26560100

ABSTRACT

BACKGROUND: The increased multi-omics information on carefully phenotyped patients in studies of complex diseases requires novel methods for data integration. Unlike continuous intensity measurements from most omics data sets, phenome data contain clinical variables that are binary, ordinal and categorical. RESULTS: In this paper we introduce an integrative phenotyping framework (iPF) for disease subtype discovery. A feature topology plot was developed for effective dimension reduction and visualization of multi-omics data. The approach is free of model assumptions and robust to noisy or missing data. We developed a workflow to integrate homogeneous patient clustering from different omics data in an agglomerative manner and then visualized heterogeneous clustering of pairwise omics sources. We applied the framework to two batches of lung samples obtained from patients diagnosed with chronic obstructive lung disease (COPD) or interstitial lung disease (ILD) with well-characterized clinical (phenomic) data, mRNA and microRNA expression profiles. Application of iPF to the first training batch identified clusters of patients consisting of homogeneous disease phenotypes as well as clusters with intermediate disease characteristics. Analysis of the second batch revealed a similar data structure, confirming the presence of intermediate clusters. Genes in the intermediate clusters were enriched with inflammatory and immune functional annotations, suggesting that they represent mechanistically distinct disease subphenotypes that may respond to immunomodulatory therapies. The iPF software package and all source code are publicly available. CONCLUSIONS: Identification of subclusters with distinct clinical and biomolecular characteristics suggests that integration of phenomic and other omics information could lead to identification of novel mechanism-based disease sub-phenotypes.


Subjects
Computational Biology/methods , Phenotype , Algorithms , Cluster Analysis , Computer Simulation , Datasets as Topic , Discriminant Analysis , Genomics/methods , Humans , Lung Diseases/etiology , Lung Diseases/metabolism , Molecular Sequence Annotation , Workflow
4.
J Clin Invest ; 125(12): 4699-713, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26571397

ABSTRACT

Parasitic helminth worms, such as Schistosoma mansoni, are endemic in regions with a high prevalence of tuberculosis (TB) among the population. Human studies suggest that helminth coinfections contribute to increased TB susceptibility and increased rates of TB reactivation. Prevailing models suggest that T helper type 2 (Th2) responses induced by helminth infection impair Th1 immune responses and thereby limit Mycobacterium tuberculosis (Mtb) control. Using a pulmonary mouse model of Mtb infection, we demonstrated that S. mansoni coinfection or immunization with S. mansoni egg antigens can reversibly impair Mtb-specific T cell responses without affecting macrophage-mediated Mtb control. Instead, S. mansoni infection resulted in accumulation of high arginase-1-expressing macrophages in the lung, which formed type 2 granulomas and exacerbated inflammation in Mtb-infected mice. Treatment of coinfected animals with an antihelminthic improved Mtb-specific Th1 responses and reduced disease severity. In a genetically diverse mouse population infected with Mtb, enhanced arginase-1 activity was associated with increased lung inflammation. Moreover, in patients with pulmonary TB, lung damage correlated with increased serum activity of arginase-1, which was elevated in TB patients coinfected with helminths. Together, our data indicate that helminth coinfection induces arginase-1-expressing type 2 granulomas, thereby increasing inflammation and TB disease severity. These results also provide insight into the mechanisms by which helminth coinfections drive increased susceptibility, disease progression, and severity in TB.


Subjects
Arginase/blood , Lung/metabolism , Macrophages/enzymology , Mycobacterium tuberculosis , Schistosoma mansoni , Schistosomiasis mansoni/blood , Tuberculosis, Pulmonary/blood , Animals , Female , Granuloma/enzymology , Granuloma/microbiology , Granuloma/parasitology , Granuloma/pathology , Humans , Lung/microbiology , Lung/parasitology , Lung/pathology , Macrophages/pathology , Male , Mice , Mice, Transgenic , Schistosomiasis mansoni/microbiology , Schistosomiasis mansoni/pathology , Tuberculosis, Pulmonary/parasitology , Tuberculosis, Pulmonary/pathology
5.
PeerJ ; 3: e1165, 2015.
Article in English | MEDLINE | ID: mdl-26336640

ABSTRACT

Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
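
The tetranucleotide frequency mentioned in this abstract can be illustrated with a short sketch. This is not MetaBAT's code: the function name `tetranucleotide_frequencies` is hypothetical, reverse-complement merging is omitted, and the empirical probabilistic distances described in the abstract are not reproduced. It only shows how a contig can be summarized as the relative frequencies of its 256 possible 4-mers.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 DNA 4-mers.

    Hypothetical helper for illustration; not part of MetaBAT.
    """
    seq = sequence.upper()
    # All 4-mers over the unambiguous DNA alphabet.
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are excluded from the total.
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


freqs = tetranucleotide_frequencies("ACGTACGTAC")
```

Contigs from the same genome tend to have similar frequency profiles, which is why such a vector is useful as one input to binning alongside abundance.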

6.
BMC Bioinformatics ; 15: 346, 2014 Nov 05.
Article in English | MEDLINE | ID: mdl-25371041

ABSTRACT

BACKGROUND: In modern biomedical research of complex diseases, a large number of demographic and clinical variables, herein called phenomic data, are often collected and missing values (MVs) are inevitable in the data collection process. Since many downstream statistical and bioinformatics methods require a complete data matrix, imputation is a common and practical solution. In high-throughput experiments such as microarray experiments, continuous intensities are measured and many mature missing value imputation methods have been developed and widely applied. Large phenomic data, however, contain continuous, nominal, binary and ordinal data types, which void application of most methods. Though several methods have been developed in the past few years, not a single complete guideline is proposed with respect to phenomic missing data imputation. RESULTS: In this paper, we investigated existing imputation methods for phenomic data, proposed a self-training selection (STS) scheme to select the best imputation method and provided a practical guideline for general applications. We introduced a novel concept of "imputability measure" (IM) to identify missing values that are fundamentally inadequate to impute. In addition, we also developed four variations of K-nearest-neighbor (KNN) methods and compared them with two existing methods, multivariate imputation by chained equations (MICE) and missForest. The four variations are imputation by variables (KNN-V), by subjects (KNN-S), their weighted hybrid (KNN-H) and an adaptively weighted hybrid (KNN-A). We performed simulations and applied different imputation methods and the STS scheme to three lung disease phenomic datasets to evaluate the methods. An R package "phenomeImpute" is made publicly available.
CONCLUSIONS: Simulations and applications to real datasets showed that MICE often did not perform well; KNN-A, KNN-H and random forest were among the top performers although no method universally performed the best. Imputation of missing values with low imputability measures increased imputation errors greatly and could potentially deteriorate downstream analyses. The STS scheme was accurate in selecting the optimal method by evaluating methods in a second layer of missingness simulation. All source files for the simulation and the real data analyses are available on the author's publication website.
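
To make the idea of imputation by subjects concrete, here is a minimal sketch of nearest-neighbor imputation across rows. It is not the phenomeImpute implementation: the function name `knn_impute_by_subject`, the distance (root mean squared difference over variables observed in both subjects) and the unweighted mean are all assumptions, and only continuous variables are handled, whereas the paper addresses mixed data types.

```python
import math


def knn_impute_by_subject(rows, k=2):
    """Fill None entries with the mean of the k nearest subjects (rows).

    Illustrative assumption: distance is the root mean squared difference
    over the variables observed in both subjects.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbor must be a different subject with variable j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            # Leave the entry as None when no usable neighbor exists.
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
]
result = knn_impute_by_subject(data, k=2)
```

In this toy matrix the missing value in the second subject is filled from the first and third subjects, which are far closer than the fourth.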


Subjects
Epidemiologic Methods , Software , Algorithms , Cluster Analysis , Computational Biology , Computer Simulation , Datasets as Topic , Humans , Research Design
7.
Bioinformatics ; 28(19): 2534-6, 2012 Oct 01.
Article in English | MEDLINE | ID: mdl-22863766

ABSTRACT

SUMMARY: With the rapid advances and prevalence of high-throughput genomic technologies, integrating information from multiple relevant genomic studies has brought new challenges. Microarray meta-analysis has become a frequently used tool in biomedical research. Little effort, however, has been made to develop a systematic pipeline and user-friendly software. In this article, we present MetaOmics, a suite of three R packages (MetaQC, MetaDE and MetaPath) for quality control, differentially expressed gene identification and enriched pathway detection for microarray meta-analysis. MetaQC provides a quantitative and objective tool to assist study inclusion/exclusion criteria for meta-analysis. MetaDE and MetaPath were developed for candidate marker and pathway detection, which provide choices of marker detection, meta-analysis and pathway analysis methods. The system allows flexible input of experimental data, clinical outcome (case-control, multi-class, continuous or survival) and pathway databases. It allows missing values in experimental data and utilizes multi-core parallel computing for fast implementation. It generates informative summary output and visualization plots, operates on different operating systems and can be expanded to include new algorithms or combine different types of genomic data. This software suite provides a comprehensive tool to conveniently implement and compare various genomic meta-analysis pipelines. AVAILABILITY: http://www.biostat.pitt.edu/bioinfo/software.htm CONTACT: ctseng@pitt.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling/methods , Genomics/methods , Microarray Analysis/methods , Software , Algorithms , Computational Biology/methods , Humans , Male , Meta-Analysis as Topic , Prostatic Neoplasms/genetics , Quality Control
8.
Nucleic Acids Res ; 40(2): e15, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22116060

ABSTRACT

Genomic meta-analysis to combine relevant and homogeneous studies has been widely applied, but the quality control (QC) and objective inclusion/exclusion criteria have been largely overlooked. Currently, the inclusion/exclusion criteria mostly depend on ad-hoc expert opinion or naïve thresholds on sample size or platform. There are pressing needs to develop a systematic QC methodology as the decision of study inclusion greatly impacts the final meta-analysis outcome. In this article, we propose six quantitative quality control measures, covering internal homogeneity of coexpression structure among studies, external consistency of coexpression pattern with pathway database, and accuracy and consistency of differentially expressed gene detection or enriched pathway identification. Each quality control index is defined as the minus log-transformed P value from formal hypothesis testing. Principal component analysis biplots and a standardized mean rank are applied to assist visualization and decision. We applied the proposed method to 4 large-scale examples, combining 7 brain cancer, 9 prostate cancer, 8 idiopathic pulmonary fibrosis and 17 major depressive disorder studies, respectively. The identified problematic studies were further scrutinized for potential technical or biological causes of their lower quality to determine their exclusion from meta-analysis. The applications and simulation results support a systematic quality assessment framework for genomic meta-analysis.
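
The scoring idea in this abstract, a quality index built from minus log transformed P values and summarized by ranking studies, can be sketched as follows. This is not the MetaQC implementation: the abstract does not state the log base (base 10 is assumed here), the exact standardization of the mean rank is not given (a plain mean of per-measure ranks is used), ties are not handled, and the names `minus_log10` and `mean_ranks` are hypothetical.

```python
import math


def minus_log10(p_value, floor=1e-300):
    """Minus log10 of a P value; the floor avoids log(0). Base 10 is an assumption."""
    return -math.log10(max(p_value, floor))


def mean_ranks(scores_by_study):
    """Average each study's rank across QC measures (rank 1 = highest score).

    Simplified stand-in for the standardized mean rank; ties are broken
    by input order rather than averaged.
    """
    studies = list(scores_by_study)
    n_measures = len(next(iter(scores_by_study.values())))
    totals = {s: 0.0 for s in studies}
    for m in range(n_measures):
        ordered = sorted(studies, key=lambda s: scores_by_study[s][m], reverse=True)
        for rank, s in enumerate(ordered, start=1):
            totals[s] += rank
    return {s: totals[s] / n_measures for s in studies}


# Made-up P values for three studies on two QC measures.
p_values = {"A": [1e-8, 1e-5], "B": [1e-3, 1e-6], "C": [0.2, 0.5]}
scores = {s: [minus_log10(p) for p in ps] for s, ps in p_values.items()}
ranks = mean_ranks(scores)
```

A study whose mean rank is consistently worse than the others, like study C in this made-up example, would be a candidate for closer scrutiny before inclusion.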


Subjects
Gene Expression Profiling/standards , Meta-Analysis as Topic , Oligonucleotide Array Sequence Analysis/standards , Brain Neoplasms/genetics , Depressive Disorder/genetics , Genomics , Humans , Idiopathic Pulmonary Fibrosis/genetics , Male , Principal Component Analysis , Prostatic Neoplasms/genetics , Quality Control
9.
PLoS One ; 6(1): e16161, 2011 Jan 13.
Article in English | MEDLINE | ID: mdl-21249199

ABSTRACT

Tuberculosis (TB) is caused by the intracellular bacterium Mycobacterium tuberculosis, and kills more than 1.5 million people every year worldwide. Immunity to TB is associated with the accumulation of IFNγ-producing T helper cell type 1 (Th1) in the lungs, activation of M. tuberculosis-infected macrophages and control of bacterial growth. However, very little is known regarding the early immune responses that mediate accumulation of activated Th1 cells in the M. tuberculosis-infected lungs. To define the induction of early immune mediators in the M. tuberculosis-infected lung, we performed mRNA profiling studies and characterized immune cells in M. tuberculosis-infected lungs at early stages of infection in the mouse model. Our data show that induction of mRNAs involved in the recognition of pathogens, expression of inflammatory cytokines, activation of APCs and generation of Th1 responses occurs between day 15 and day 21 post infection. The induction of these mRNAs coincides with cellular accumulation of Th1 cells and activation of myeloid cells in M. tuberculosis-infected lungs. Strikingly, we show the induction of mRNAs associated with Gr1+ cells, namely neutrophils and inflammatory monocytes, takes place on day 12 and coincides with cellular accumulation of Gr1+ cells in M. tuberculosis-infected lungs. Interestingly, in vivo depletion of Gr1+ neutrophils between days 10 and 15 results in decreased accumulation of Th1 cells on day 21 in M. tuberculosis-infected lungs without impacting overall protective outcomes. These data suggest that the recruitment of Gr1+ neutrophils is an early event that leads to production of chemokines that regulate the accumulation of Th1 cells in the M. tuberculosis-infected lungs.


Subjects
Lung/immunology , Tuberculosis, Pulmonary/immunology , Animals , Cytokines/biosynthesis , Disease Models, Animal , Gene Expression Profiling , Immunity , Lung/microbiology , Lung/pathology , Mice , Mycobacterium tuberculosis , Neutrophil Infiltration , RNA, Messenger/analysis , Th1 Cells/immunology , Time Factors , Tuberculosis, Pulmonary/pathology
10.
Bioinformatics ; 27(1): 78-86, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-21045072

ABSTRACT

MOTIVATION: Microarray experiments frequently produce multiple missing values (MVs) due to flaws such as dust, scratches, insufficient resolution or hybridization errors on the chips. Unfortunately, many downstream algorithms require a complete data matrix. The motivation of this work is to determine the impact of MV imputation on downstream analysis, and whether ranking of imputation methods by imputation accuracy correlates well with the biological impact of the imputation. METHODS: Using eight datasets for differential expression (DE) and classification analysis and eight datasets for gene clustering, we demonstrate the biological impact of missing-value imputation on statistical downstream analyses, including three commonly employed DE methods, four classifiers and three gene-clustering methods. Correlation between the rankings of imputation methods based on three root-mean squared error (RMSE) measures and the rankings based on the downstream analysis methods was used to investigate which RMSE measure was most consistent with the biological impact measures, and which downstream analysis methods were the most sensitive to the choice of imputation procedure. RESULTS: DE was the most sensitive to the choice of imputation procedure, while classification was the least sensitive and clustering was intermediate between the two. The logged RMSE (LRMSE) measure had the highest correlation with the imputation rankings based on the DE results, indicating that the LRMSE is the best representative surrogate among the three RMSE-based measures. Bayesian principal component analysis and least squares adaptive appeared to be the best performing methods in the empirical downstream evaluation.
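
As a rough illustration of scoring imputation accuracy, the sketch below computes a plain root-mean-squared error over the entries that were masked as missing. The abstract does not define its three RMSE-based measures (including LRMSE), so this is only the generic quantity; the function name `imputation_rmse` and the example values are hypothetical.

```python
import math


def imputation_rmse(true_values, imputed_values, missing_mask):
    """RMSE between true and imputed values, restricted to masked entries.

    Generic sketch; not one of the paper's specific RMSE measures.
    """
    squared_errors = [
        (t - v) ** 2
        for t, v, was_missing in zip(true_values, imputed_values, missing_mask)
        if was_missing
    ]
    if not squared_errors:
        raise ValueError("no masked entries to evaluate")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two of four values were masked and then imputed.
rmse = imputation_rmse(
    true_values=[1.0, 2.0, 3.0, 4.0],
    imputed_values=[1.0, 2.5, 3.0, 3.5],
    missing_mask=[False, True, False, True],
)
```

Ranking imputation methods by such an error and comparing that ranking with their effect on downstream analyses is the kind of comparison the abstract describes.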


Subjects
Gene Expression Profiling/methods , Cluster Analysis , Oligonucleotide Array Sequence Analysis/methods , Reproducibility of Results