Search | BVS Infectious and Parasitic Diseases

1.

Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes.

Armstrong, George; Cantrell, Kalen; Huang, Shi; McDonald, Daniel; Haiminen, Niina; Carrieri, Anna Paola; Zhu, Qiyun; Gonzalez, Antonio; McGrath, Imran; Beck, Kristen L; Hakim, Daniel; Havulinna, Aki S; Méric, Guillaume; Niiranen, Teemu; Lahti, Leo; Salomaa, Veikko; Jain, Mohit; Inouye, Michael; Swafford, Austin D; Kim, Ho-Cheol; Parida, Laxmi; Vázquez-Baeza, Yoshiki; Knight, Rob.

Genome Res ; 31(11): 2131-2137, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34479875

ABSTRACT

The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to effectively scale to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required, resulting in more accessible software with a reduced carbon footprint, as compared to previous approaches. The new algorithm produces identical results to the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.
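For orientation, Faith's PD for a single sample is the total branch length of the part of the phylogeny that connects the taxa observed in that sample. The following is a minimal Python sketch of that definition only. It is a naive per-sample traversal, not the stacked algorithm (SFPhD) described in the abstract, and it assumes the common convention of counting branches all the way up to the root. The `parent` and `length` dictionaries and the toy tree are hypothetical.

```python
def faiths_pd(parent, length, observed_tips):
    """Sum the lengths of all branches on paths from observed tips to the root.

    parent: maps each non-root node to its parent node (hypothetical encoding).
    length: maps each non-root node to the length of the branch above it.
    observed_tips: tips with non-zero abundance in the sample.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk upward until the root (no parent) or an already counted branch.
        while node in parent and node not in visited:
            visited.add(node)
            total += length[node]
            node = parent[node]
    return total


# Toy tree: root has children A and I; I has children B and C.
parent = {"A": "root", "I": "root", "B": "I", "C": "I"}
length = {"A": 1.0, "I": 0.5, "B": 1.0, "C": 2.0}

pd_bc = faiths_pd(parent, length, ["B", "C"])  # branches B, C and shared branch I
pd_a = faiths_pd(parent, length, ["A"])        # branch A only
```

Because the shared branch `I` is counted once, two closely related taxa add less diversity than two distant ones, which is the phylogenetic aspect the abstract refers to.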

Subjects

Microbiota, Microbiota/genetics, Phylogeny

2.

Challenges in benchmarking metagenomic profilers.

Sun, Zheng; Huang, Shi; Zhang, Meng; Zhu, Qiyun; Haiminen, Niina; Carrieri, Anna Paola; Vázquez-Baeza, Yoshiki; Parida, Laxmi; Kim, Ho-Cheol; Knight, Rob; Liu, Yang-Yu.

Nat Methods ; 18(6): 618-626, 2021 Jun.

Article in English | MEDLINE | ID: mdl-33986544

ABSTRACT

Accurate microbial identification and abundance estimation are crucial for metagenomics analysis. Various methods for classification of metagenomic data and estimation of taxonomic profiles, broadly referred to as metagenomic profilers, have been developed. Nevertheless, benchmarking of metagenomic profilers remains challenging because some tools are designed to report relative sequence abundance while others report relative taxonomic abundance. Here we show how misleading conclusions can be drawn by neglecting this distinction between relative abundance types when benchmarking metagenomic profilers. Moreover, we show compelling evidence that interchanging sequence abundance and taxonomic abundance will influence both per-sample summary statistics and cross-sample comparisons. We suggest that the microbiome research community pay attention to potentially misleading biological conclusions arising from this issue when benchmarking metagenomic profilers, by carefully considering the type of abundance data that were analyzed and interpreted and clearly stating the strategy used for metagenomic profiling.
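The distinction between the two abundance types can be illustrated numerically. The sketch below assumes one common way of relating them, which the abstract itself does not spell out: each taxon's share of reads is divided by its genome length and the results are renormalized. The function name, taxa, and genome lengths are hypothetical, and other corrections (for example ploidy or copy number) are ignored.

```python
def sequence_to_taxonomic_abundance(seq_abundance, genome_length):
    """Convert relative sequence abundance to relative taxonomic abundance.

    Assumption: reads are sampled in proportion to genome length, so dividing
    by genome length gives a quantity proportional to the number of cells.
    """
    weights = {t: seq_abundance[t] / genome_length[t] for t in seq_abundance}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Two hypothetical taxa with equal read share but different genome sizes.
seq_abundance = {"taxon_small": 0.5, "taxon_large": 0.5}
genome_length = {"taxon_small": 2_000_000, "taxon_large": 6_000_000}

tax_abundance = sequence_to_taxonomic_abundance(seq_abundance, genome_length)
```

In this toy case the taxon with the smaller genome accounts for three quarters of the cells despite contributing only half of the reads, so a profiler scored against the wrong abundance type would appear inaccurate.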

Subjects

Benchmarking/methods, Metagenomics, Computational Biology/methods, Gene Expression Profiling, Microbiota/genetics, DNA Sequence Analysis/methods

3.

Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data.

Jiang, Lingjing; Haiminen, Niina; Carrieri, Anna-Paola; Huang, Shi; Vázquez-Baeza, Yoshiki; Parida, Laxmi; Kim, Ho-Cheol; Swafford, Austin D; Knight, Rob; Natarajan, Loki.

Biometrics ; 78(3): 1155-1167, 2022 Sep.

Article in English | MEDLINE | ID: mdl-33914902

ABSTRACT

Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above data characteristics, but almost all methods were evaluated based on performance of model predictions. However, little attention has been paid to address a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods often control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on the prediction accuracy. If tiny changes to the data would lead to large changes in the chosen feature subset, then many selected features are likely to be a data artifact rather than real biological signal. This crucial need of identifying relevant and reproducible features motivated the reproducibility evaluation criterion such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with proposed reproducibility criterion Stability in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method.
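The idea of measuring how much a selected feature subset changes under data perturbation can be sketched as follows. This is not the Stability estimator used in the paper; as a simpler stand-in, it averages the pairwise Jaccard similarity of the feature sets chosen on different resamples. The function names and the example feature sets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(selected_sets):
    """Mean pairwise Jaccard similarity across repeated feature selections."""
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        raise ValueError("need at least two selected feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Feature sets chosen by some selector on three hypothetical bootstrap resamples.
runs = [{"otu1", "otu2"}, {"otu1", "otu2"}, {"otu2", "otu3"}]
score = selection_stability(runs)
```

A score near 1 means the selector returns nearly the same features on every resample, while a score near 0 means the selected features depend heavily on which samples happened to be drawn.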

Subjects

Microbiota, Algorithms, Reproducibility of Results

4.

Haplotype assembly of autotetraploid potato using integer linear programing.

Siragusa, Enrico; Haiminen, Niina; Finkers, Richard; Visser, Richard; Parida, Laxmi.

Bioinformatics ; 35(18): 3279-3286, 2019 Sep 15.

Article in English | MEDLINE | ID: mdl-30689725

ABSTRACT

SUMMARY: Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short-sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally by means of a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a sensible improvement over the state of the art. AVAILABILITY AND IMPLEMENTATION: Executables for Linux at http://github.com/ComputationalGenomics/HaplotypeAssembler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Solanum tuberosum, Algorithms, Haplotypes, Linear Programming, DNA Sequence Analysis, Software

5.

Signal enrichment with strain-level resolution in metagenomes using topological data analysis.

Guzmán-Sáenz, Aldo; Haiminen, Niina; Basu, Saugata; Parida, Laxmi.

BMC Genomics ; 20(Suppl 2): 194, 2019 Apr 04.

Article in English | MEDLINE | ID: mdl-30967115

ABSTRACT

BACKGROUND: A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means for investigating the community of the constituent microorganisms. One of the challenges is in distinguishing between similar organisms due to rampant multiple possible assignments of sequencing reads, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. RESULTS: Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a Barycentric subdivision complex and the other a Cech complex. The Barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms while the Cech complex takes simply the number of reads into account to map the problem to homology computation. Using simulated genome mixtures we show not just enrichment of signal but also microbe identification with strain-level resolution. CONCLUSIONS: In particular, in the most refractory of cases where alternative algorithms that exploit unique reads (i.e., mapped to unique organisms) fail, we show that the TDA approach continues to show consistent performance. The Cech model that uses less information is equally effective, suggesting that even partial information when augmented with the appropriate structure is quite powerful.

Subjects

Algorithms, Bacteria/classification, Bacteria/genetics, Data Analysis, Metagenome, Metagenomics/methods, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing

6.

Transcriptome characterization and differentially expressed genes under flooding and drought stress in the biomass grasses Phalaris arundinacea and Dactylis glomerata.

Klaas, Manfred; Haiminen, Niina; Grant, Jim; Cormican, Paul; Finnan, John; Arojju, Sai Krishna; Utro, Filippo; Vellani, Tia; Parida, Laxmi; Barth, Susanne.

Ann Bot ; 124(4): 717-730, 2019 Oct 29.

Article in English | MEDLINE | ID: mdl-31241131

ABSTRACT

BACKGROUND AND AIMS: Perennial grasses are a global resource as forage, and for alternative uses in bioenergy and as raw materials for the processing industry. Marginal lands can be valuable for perennial biomass grass production, if perennial biomass grasses can cope with adverse abiotic environmental stresses such as drought and waterlogging. METHODS: In this study, two perennial grass species, reed canary grass (Phalaris arundinacea) and cocksfoot (Dactylis glomerata) were subjected to drought and waterlogging stress to study their responses for insights to improving environmental stress tolerance. Physiological responses were recorded, reference transcriptomes established and differential gene expression investigated between control and stress conditions. We applied a robust non-parametric method, RoDEO, based on rank ordering of transcripts to investigate differential gene expression. Furthermore, we extended and validated vRoDEO for comparing samples with varying sequencing depths. KEY RESULTS: This allowed us to identify expressed genes under drought and waterlogging whilst using only a limited number of RNA sequencing experiments. Validating the methodology, several differentially expressed candidate genes involved in the stage 3 step-wise scheme in detoxification and degradation of xenobiotics were recovered, while several novel stress-related genes classified as of unknown function were discovered. CONCLUSIONS: Reed canary grass is a species coping particularly well with flooding conditions, but this study adds novel information on how its transcriptome reacts under drought stress. We built extensive transcriptomes for the two investigated C3 species cocksfoot and reed canary grass under both extremes of water stress to provide a clear comparison amongst the two species to broaden our horizon for comparative studies, but further confirmation of the data would be ideal to obtain a more detailed picture.

Subjects

Droughts, Phalaris, Biomass, Dactylis, Physiological Stress, Transcriptome

7.

SimBA: simulation algorithm to fit extant-population distributions.

Parida, Laxmi; Haiminen, Niina.

BMC Bioinformatics ; 16: 82, 2015 Mar 14.

Article in English | MEDLINE | ID: mdl-25886895

ABSTRACT

BACKGROUND: Simulation of populations with specified characteristics such as allele frequencies, linkage disequilibrium etc., is an integral component of many studies, including in-silico breeding optimization. Since the accuracy and sensitivity of population simulation is critical to the quality of the output of the applications that use them, accurate algorithms are required to provide a strong foundation to the methods in these studies. RESULTS: In this paper we present SimBA (Simulation using Best-fit Algorithm) a non-generative approach, based on a combination of stochastic techniques and discrete methods. We optimize a hill climbing algorithm and extend the framework to include multiple subpopulation structures. Additionally, we show that SimBA is very sensitive to the input specifications, i.e., very similar but distinct input characteristics result in distinct outputs with high fidelity to the specified distributions. This property of the simulation is not explicitly modeled or studied by previous methods. CONCLUSIONS: We show that SimBA outperforms the existing population simulation methods, both in terms of accuracy as well as time-efficiency. Not only does it construct populations that meet the input specifications more stringently than other published methods, SimBA is also easy to use. It does not require explicit parameter adaptations or calibrations. Also, it can work with input specified as distributions, without an exemplar matrix or population as required by some methods. SimBA is available at http://researcher.ibm.com/project/5669 .

Subjects

Algorithms, Computer Simulation, Population Genetics, Theoretical Models, Population Dynamics, Conservation of Natural Resources, Gene Frequency, Humans, Linkage Disequilibrium

8.

Haplotype assembly of autotetraploid potato using integer linear programing.

Siragusa, Enrico; Haiminen, Niina; Finkers, Richard; Visser, Richard; Parida, Laxmi.

Bioinformatics ; 35(21): 4534, 2019 Nov 01.

Article in English | MEDLINE | ID: mdl-31280288

9.

Comparative exomics of Phalaris cultivars under salt stress.

Haiminen, Niina; Klaas, Manfred; Zhou, Zeyu; Utro, Filippo; Cormican, Paul; Didion, Thomas; Jensen, Christian; Mason, Christopher E; Barth, Susanne; Parida, Laxmi.

BMC Genomics ; 15 Suppl 6: S18, 2014.

Article in English | MEDLINE | ID: mdl-25573273

ABSTRACT

BACKGROUND: Reed canary grass (Phalaris arundinacea) is an economically important forage and bioenergy grass of the temperate regions of the world. Despite its economic importance, it is lacking in public genomic data. We explore comparative exomics of the grass cultivars in the context of response to salt exposure. The limited data set poses challenges to the computational pipeline. METHODS: As a prerequisite for the comparative study, we generate the Phalaris reference transcriptome sequence, one of the first steps in addressing the issue of paucity of processed genomic data in this species. In addition, the differential expression (DE) and active-but-stable genes for salt stress conditions were analyzed by a novel method that was experimentally verified on human RNA-seq data. For the comparative exomics, we focus on the DE and stable genic regions, with respect to salt stress, of the genome. RESULTS AND CONCLUSIONS: In our comparative study, we find that phylogeny of the DE and stable genic regions of the Phalaris cultivars are distinct. At the same time we find the phylogeny of the entire expressed reference transcriptome matches the phylogeny of only the stable genes. Thus the behavior of the different cultivars is distinguished by the salt stress response. This is also reflected in the genomic distinctions in the DE genic regions. These observations have important implications in the choice of cultivars, and their breeding, for bio-energy fuels. Further, we identified genes that are representative of DE under salt stress and could provide vital clues in our understanding of the stress handling mechanisms in general.

Subjects

Exome, Genomics/methods, Phalaris/genetics, Salt Tolerance/genetics, Physiological Stress/genetics, Algorithms, Gene Expression Profiling, Plant Gene Expression Regulation, High-Throughput Nucleotide Sequencing, Phenotype, Transcriptome

10.

GenomicTools: a computational platform for developing high-throughput analytics in genomics.

Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo.

Bioinformatics ; 28(2): 282-3, 2012 Jan 15.

Article in English | MEDLINE | ID: mdl-22113082

ABSTRACT

MOTIVATION: Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. RESULTS: We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. AVAILABILITY: The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.

Subjects

Genomics/methods, Software, Computational Biology/methods, Human Genome, High-Throughput Nucleotide Sequencing, Humans

11.

iXora: exact haplotype inferencing and trait association.

Utro, Filippo; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar E; Royaert, Stefan; Schnell, Raymond J; Motamayor, Juan Carlos; Kuhn, David N; Parida, Laxmi.

BMC Genet ; 14: 48, 2013 Jun 06.

Article in English | MEDLINE | ID: mdl-23742238

ABSTRACT

BACKGROUND: We address the task of extracting accurate haplotypes from genotype data of individuals of large F1 populations for mapping studies. While methods for inferring parental haplotype assignments on large F1 populations exist in theory, these approaches do not work in practice at high levels of accuracy. RESULTS: We have designed iXora (Identifying crossovers and recombining alleles), a robust method for extracting reliable haplotypes of a mapping population, as well as parental haplotypes, that runs in linear time. Each allele in the progeny is assigned not just to a parent, but more precisely to a haplotype inherited from the parent. iXora shows an improvement of at least 15% in accuracy over similar systems in literature. Furthermore, iXora provides an easy-to-use, comprehensive environment for association studies and hypothesis checking in populations of related individuals. CONCLUSIONS: iXora provides detailed resolution in parental inheritance, along with the capability of handling very large populations, which allows for accurate haplotype extraction and trait association. iXora is available for non-commercial use from http://researcher.ibm.com/project/3430.

Subjects

Haplotypes, Quantitative Trait Loci, Genetic Crossing Over, Humans, Genetic Recombination

12.

Phylogeny-Aware Analysis of Metagenome Community Ecology Based on Matched Reference Genomes while Bypassing Taxonomy.

Zhu, Qiyun; Huang, Shi; Gonzalez, Antonio; McGrath, Imran; McDonald, Daniel; Haiminen, Niina; Armstrong, George; Vázquez-Baeza, Yoshiki; Yu, Julian; Kuczynski, Justin; Sepich-Poore, Gregory D; Swafford, Austin D; Das, Promi; Shaffer, Justin P; Lejzerowicz, Franck; Belda-Ferre, Pedro; Havulinna, Aki S; Méric, Guillaume; Niiranen, Teemu; Lahti, Leo; Salomaa, Veikko; Kim, Ho-Cheol; Jain, Mohit; Inouye, Michael; Gilbert, Jack A; Knight, Rob.

mSystems ; 7(2): e0016722, 2022 Apr 26.

Article in English | MEDLINE | ID: mdl-35369727

ABSTRACT

We introduce the operational genomic unit (OGU) method, a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance, and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome data sets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project data set and more accurate prediction of human age by the gut microbiomes of Finnish individuals included in the FINRISK 2002 cohort. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate adoption of the OGU method in future metagenomics studies. IMPORTANCE: Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. Current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution. To solve these challenges, we introduce operational genomic units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition and (ii) permitting use of phylogeny-aware tools. Our analysis of real-world data sets shows that it is advantageous over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGUs as an effective practice in metagenomic studies.

Subjects

Metagenome, Microbiota, Humans, Phylogeny, 16S Ribosomal RNA/genetics, Ecology

13.

Standardized multi-omics of Earth's microbiomes reveals microbial and metabolite diversity.

Shaffer, Justin P; Nothias, Louis-Félix; Thompson, Luke R; Sanders, Jon G; Salido, Rodolfo A; Couvillion, Sneha P; Brejnrod, Asker D; Lejzerowicz, Franck; Haiminen, Niina; Huang, Shi; Lutz, Holly L; Zhu, Qiyun; Martino, Cameron; Morton, James T; Karthikeyan, Smruthi; Nothias-Esposito, Mélissa; Dührkop, Kai; Böcker, Sebastian; Kim, Hyun Woo; Aksenov, Alexander A; Bittremieux, Wout; Minich, Jeremiah J; Marotz, Clarisse; Bryant, MacKenzie M; Sanders, Karenina; Schwartz, Tara; Humphrey, Greg; Vásquez-Baeza, Yoshiki; Tripathi, Anupriya; Parida, Laxmi; Carrieri, Anna Paola; Beck, Kristen L; Das, Promi; González, Antonio; McDonald, Daniel; Ladau, Joshua; Karst, Søren M; Albertsen, Mads; Ackermann, Gail; DeReus, Jeff; Thomas, Torsten; Petras, Daniel; Shade, Ashley; Stegen, James; Song, Se Jin; Metz, Thomas O; Swafford, Austin D; Dorrestein, Pieter C; Jansson, Janet K; Gilbert, Jack A.

Nat Microbiol ; 7(12): 2128-2150, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36443458

ABSTRACT

Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography mass spectrometry). We used standardized protocols and analytical methods to characterize microbial communities, focusing on relationships and co-occurrences of microbially related metabolites and microbial taxa across environments, thus allowing us to explore diversity at extraordinary scale. In addition to a reference database for metagenomic and metabolomic data, we provide a framework for incorporating additional studies, enabling the expansion of existing knowledge in the form of an evolving community resource. We demonstrate the utility of this database by testing the hypothesis that every microbe and metabolite is everywhere but the environment selects. Our results show that metabolite diversity exhibits turnover and nestedness related to both microbial communities and the environment, whereas the relative abundances of microbially related metabolites vary and co-occur with specific microbial consortia in a habitat-specific manner. We additionally show the power of certain chemistry, in particular terpenoids, in distinguishing Earth's environments (for example, terrestrial plant surfaces and soils, freshwater and marine animal stool), as well as that of certain microbes including Conexibacter woesei (terrestrial soils), Haloquadratum walsbyi (marine deposits) and Pantoea dispersa (terrestrial plant detritus). 
This Resource provides insight into the taxa and metabolites within microbial communities from diverse habitats across Earth, informing both microbial and chemical ecology, and provides a foundation and methods for multi-omics microbiome studies of hosts and the environment.

Subjects

Microbiota, Animals, Microbiota/genetics, Metagenome, Metagenomics, Planet Earth, Soil

14.

Randomization techniques for assessing the significance of gene periodicity results.

Kallio, Aleksi; Vuokko, Niko; Ojala, Markus; Haiminen, Niina; Mannila, Heikki.

BMC Bioinformatics ; 12: 330, 2011 Aug 09.

Article in English | MEDLINE | ID: mdl-21827656

ABSTRACT

BACKGROUND: Modern high-throughput measurement technologies such as DNA microarrays and next generation sequencers produce extensive datasets. With large datasets the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns, such as clusters, regulatory networks, or time series periodicity. Study of periodic gene expression is an interesting research question that also is a good example of challenges involved in the analysis of high-throughput data in general. Unlike for classical statistical tests, the distribution of test statistic for data mining methods cannot be derived analytically. RESULTS: We describe the randomization based approach to significance testing, and show how it can be applied to detect periodically expressed genes. We present four randomization methods, three of which have previously been used for gene cycle data. We propose a new method for testing significance of periodicity in gene expression short time series data, such as from gene cycle and circadian clock studies. We argue that the underlying assumptions behind existing significance testing approaches are problematic and some of them unrealistic. We analyze the theoretical properties of the existing and proposed methods, showing how our method can be robustly used to detect genes with exceptionally high periodicity. We also demonstrate the large differences in the number of significant results depending on the chosen randomization methods and parameters of the testing framework. By reanalyzing gene cycle data from various sources, we show how previous estimates on the number of gene cycle controlled genes are not supported by the data. Our randomization approach combined with widely adopted Benjamini-Hochberg multiple testing method yields better predictive power and produces more accurate null distributions than previous methods.
CONCLUSIONS: Existing methods for testing significance of periodic gene expression patterns are simplistic and optimistic. Our testing framework allows strict levels of statistical significance with more realistic underlying assumptions, without losing predictive power. As DNA microarrays have now become mainstream and new high-throughput methods are rapidly being adopted, we argue that not only there will be need for data mining methods capable of coping with immense datasets, but there will also be need for solid methods for significance testing.
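The general shape of randomization-based significance testing followed by Benjamini-Hochberg correction can be sketched in Python. This is a generic illustration, not the randomization method proposed in the paper: it assumes that a periodicity score has already been computed for the observed series and for a set of randomized series, and the function names and numbers are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of randomized scores at least as large as the observed score.

    The +1 terms count the observed score itself, so the p-value is never 0.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# One gene: observed score 5.0 against three randomized scores.
p_single = empirical_p_value(5.0, [1.0, 2.0, 6.0])

# Four genes with hypothetical empirical p-values.
calls = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.05)
```

The choice of how the randomized series are generated determines the null distribution, which is why the abstract reports large differences in the number of significant genes across randomization methods.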

Subjects

Data Mining/methods, Gene Expression Regulation, Periodicity, Circadian Clocks, Cluster Analysis, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis

15.

Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes.

Haiminen, Niina; Feltus, F Alex; Parida, Laxmi.

BMC Genomics ; 12: 194, 2011 Apr 15.

Article in English | MEDLINE | ID: mdl-21496274

ABSTRACT

BACKGROUND: We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. RESULTS: The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. CONCLUSIONS: BAC pooling works better than WGS, however, both require a physical map to order the scaffolds. Pool sizes up to 12Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.

Subjects

Bacterial Artificial Chromosomes/genetics, Genomics/methods, Arabidopsis/genetics, Base Pairing, Plant Genome/genetics, Genomic Library, Genomics/standards, Reference Standards, DNA Sequence Analysis

16.

Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes.

Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos.

BMC Genomics ; 12: 379, 2011 Jul 27.

Article in English | MEDLINE | ID: mdl-21794110

ABSTRACT

BACKGROUND: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. RESULTS: This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. CONCLUSIONS: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.

Subjects

Cacao/genetics , Bacterial Artificial Chromosomes , Plant Genome , Quantitative Trait Loci , Genomic Library , Sequence Alignment , DNA Sequence Analysis

17.

Functional profiling of COVID-19 respiratory tract microbiomes.

Haiminen, Niina; Utro, Filippo; Seabolt, Ed; Parida, Laxmi.

Sci Rep ; 11(1): 6433, 2021 Mar 19.

Article in English | MEDLINE | ID: mdl-33742096

ABSTRACT

In response to the ongoing global pandemic, characterizing the molecular-level host interactions of the new coronavirus SARS-CoV-2 responsible for COVID-19 has been at the center of unprecedented scientific focus. However, when the virus enters the body it also interacts with the micro-organisms already inhabiting the host. Understanding the virus-host-microbiome interactions can yield additional insights into the biological processes perturbed by viral invasion. Alterations in the gut microbiome species and metabolites have been noted during respiratory viral infections, possibly impacting the lungs via gut-lung microbiome crosstalk. To better characterize microbial functions in the lower respiratory tract during COVID-19 infection, we carry out a functional analysis of previously published metatranscriptome sequencing data of bronchoalveolar lavage fluid from eight COVID-19 cases, twenty-five community-acquired pneumonia patients, and twenty healthy controls. The functional profiles resulting from comparing the sequences against annotated microbial protein domains clearly separate the cohorts. By examining the associated metabolic pathways, distinguishing functional signatures in COVID-19 respiratory tract microbiomes are identified, including decreased potential for lipid metabolism and glycan biosynthesis and metabolism pathways, and increased potential for carbohydrate metabolism pathways. The results include overlap between previous studies on COVID-19 microbiomes, including decrease in the glycosaminoglycan degradation pathway and increase in carbohydrate metabolism. The results also suggest novel connections to consider, possibly specific to the lower respiratory tract microbiome, calling for further research on microbial functions and host-microbiome interactions during SARS-CoV-2 infection.

Subjects

COVID-19/microbiology , Microbial Interactions , Microbiota , Respiratory System/microbiology , SARS-CoV-2/physiology , Bronchoalveolar Lavage Fluid/microbiology , Humans , Lung/microbiology

18.

Re-purposing software for functional characterization of the microbiome.

Gardiner, Laura-Jayne; Haiminen, Niina; Utro, Filippo; Parida, Laxmi; Seabolt, Ed; Krishna, Ritesh; Kaufman, James H.

Microbiome ; 9(1): 4, 2021 Jan 09.

Article in English | MEDLINE | ID: mdl-33422152

ABSTRACT

BACKGROUND: Widespread bioinformatic resource development generates a constantly evolving and abundant landscape of workflows and software. For analysis of the microbiome, workflows typically begin with taxonomic classification of the microorganisms that are present in a given environment. Additional investigation is then required to uncover the functionality of the microbial community, in order to characterize its currently or potentially active biological processes. Such functional analysis of metagenomic data can be computationally demanding for high-throughput sequencing experiments. Instead, we can directly compare sequencing reads to a functionally annotated database. However, since reads frequently match multiple sequences equally well, analyses benefit from a hierarchical annotation tree, e.g. for taxonomic classification where reads are assigned to the lowest taxonomic unit. RESULTS: To facilitate functional microbiome analysis, we re-purpose well-known taxonomic classification tools to allow us to perform direct functional sequencing read classification with the added benefit of a functional hierarchy. To enable this, we develop and present a tree-shaped functional hierarchy representing the molecular function subset of the Gene Ontology annotation structure. We use this functional hierarchy to replace the standard phylogenetic taxonomy used by the classification tools and assign query sequences accurately to the lowest possible molecular function in the tree. We demonstrate this with simulated and experimental datasets, where we reveal new biological insights. CONCLUSIONS: We demonstrate that improved functional classification of metagenomic sequencing reads is possible by re-purposing a range of taxonomic classification tools that are already well-established, in conjunction with either protein or nucleotide reference databases. 
We leverage the advances in speed, accuracy and efficiency that have been made for taxonomic classification and translate these benefits for the rapid functional classification of microbiomes. While we focus on a specific set of commonly used methods, the functional annotation approach has broad applicability across other sequence classification tools. We hope that re-purposing becomes a routine consideration during bioinformatic resource development.

Subjects

Classification/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing , Metagenome/genetics , Metagenomics/methods , Microbiota/genetics , Software , Phylogeny

19.

DNA Extraction and Host Depletion Methods Significantly Impact and Potentially Bias Bacterial Detection in a Biological Fluid.

Ganda, Erika; Beck, Kristen L; Haiminen, Niina; Silverman, Justin D; Kawas, Ban; Cronk, Brittany D; Anderson, Renee R; Goodman, Laura B; Wiedmann, Martin.

mSystems ; 6(3): e0061921, 2021 Jun 29.

Article in English | MEDLINE | ID: mdl-34128697

ABSTRACT

Untargeted sequencing of nucleic acids present in food can inform the detection of food safety and origin, as well as product tampering and mislabeling issues. The application of such technologies to food analysis may reveal valuable insights that are simply unobtainable by targeted testing, leading to the efforts of applying such technologies in the food industry. However, before these approaches can be applied, it is imperative to verify that the most appropriate methods are used at every step of the process: gathering of primary material, laboratory methods, data analysis, and interpretation. The focus of this study is on gathering the primary material, in this case, DNA. We used bovine milk as a model to (i) evaluate commercially available kits for their ability to extract nucleic acids from inoculated bovine milk, (ii) evaluate host DNA depletion methods for use with milk, and (iii) develop and evaluate a selective lysis-propidium monoazide (PMA)-based protocol for host DNA depletion in milk. Our results suggest that magnetically based nucleic acid extraction methods are best for nucleic acid isolation of bovine milk. Removal of host DNA remains a challenge for untargeted sequencing of milk, highlighting the finding that the individual matrix characteristics should always be considered in food testing. Some reported methods introduce bias against specific types of microbes, which may be particularly problematic in food safety, where the detection of Gram-negative pathogens and hygiene indicators is essential. Continuous efforts are needed to develop and validate new approaches for untargeted metagenomics in samples with large amounts of DNA from a single host. IMPORTANCE Tracking the bacterial communities present in our food has the potential to inform food safety and product origin. 
To do so, the entire genetic material present in a sample is extracted using chemical methods or commercially available kits and sequenced using next-generation platforms to provide a snapshot of the microbial composition. Because the genetic material of higher organisms present in food (e.g., cow in milk or beef, wheat in flour) is around 1,000 times larger than the bacterial content, challenges exist in gathering the information of interest. Additionally, specific bacterial characteristics can make them easier or harder to detect, adding another layer of complexity to this issue. In this study, we demonstrate the impact of using different methods for the ability to detect specific bacteria and highlight the need to ensure that the most appropriate methods are being used for each particular sample.

20.

Monitoring the microbiome for food safety and quality using deep shotgun sequencing.

Beck, Kristen L; Haiminen, Niina; Chambliss, David; Edlund, Stefan; Kunitomi, Mark; Huang, B Carol; Kong, Nguyet; Ganesan, Balasubramanian; Baker, Robert; Markwell, Peter; Kawas, Ban; Davis, Matthew; Prill, Robert J; Krishnareddy, Harsha; Seabolt, Ed; Marlowe, Carl H; Pierre, Sophie; Quintanar, André; Parida, Laxmi; Dubois, Geraud; Kaufman, James; Weimer, Bart C.

NPJ Sci Food ; 5(1): 3, 2021 Feb 08.

Article in English | MEDLINE | ID: mdl-33558514

ABSTRACT

In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.
