ABSTRACT
Profile hidden Markov models (pHMMs) achieve high sensitivity in remote homology search, making them popular choices for detecting novel or highly diverged viruses in metagenomic data. However, existing pHMM databases differ in design focus, making it difficult for users to decide which one to use. In this review, we provide a thorough evaluation and comparison of multiple commonly used pHMM databases for viral sequence discovery in metagenomic data. We characterized the databases by comparing their sizes, their taxonomic coverage, and the properties of their models using quantitative metrics. Subsequently, we assessed their performance in virus identification across multiple application scenarios, using both simulated and real metagenomic data. We aim to offer researchers a thorough and critical assessment of the strengths and limitations of different databases. Furthermore, based on the experimental results obtained from the simulated and real metagenomic data, we provide practical suggestions for users to optimize their use of pHMM databases, thus enhancing the quality and reliability of their findings in the field of viral metagenomics.
Subject(s)
Markov Chains , Metagenomics , Viruses , Metagenomics/methods , Viruses/genetics , Viruses/classification , Genetic Databases , Humans , Computational Biology/methods , Algorithms
ABSTRACT
BACKGROUND: Metagenomic sequencing is an unbiased approach that can potentially detect all known and unidentified strains in pathogen detection. Recently, nanopore sequencing has emerged as a highly promising tool for rapid pathogen detection owing to its fast turnaround time. However, identifying pathogens within species is nontrivial for nanopore sequencing data because of the high sequencing error rate. RESULTS: We developed the core gene alleles metagenome strain identification (cgMSI) tool, which uses a two-stage maximum a posteriori probability estimation method to detect pathogens at strain level from nanopore metagenomic sequencing data at low computational cost. The cgMSI tool can accurately identify strains and estimate relative abundance at 1× coverage. CONCLUSIONS: We developed cgMSI for nanopore metagenomic pathogen detection within species. cgMSI is available at https://github.com/ZHU-XU-xmu/cgMSI.
Subject(s)
Nanopore Sequencing , Nanopores , Metagenome , Alleles , Metagenomics/methods , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels to reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER.
Subject(s)
Deep Learning , Metagenomics/methods , RNA Viruses/genetics , Classification , RNA Viruses/classification
ABSTRACT
The advent of high-throughput technologies, such as 16S rDNA sequencing, has significantly contributed to expanding our knowledge of the microbiota composition of the genital tract during infections such as Chlamydia trachomatis. The growing body of metagenomic data can be further exploited to provide a functional characterization of microbial communities via several powerful computational approaches. Therefore, in this study, we investigated the predicted metabolic pathways of the cervicovaginal microbiota associated with C. trachomatis genital infection in relation to the different Community State Types (CSTs), via PICRUSt2 analysis. Our results showed a richer and more diverse mix of predicted metabolic pathways in women with a CST-IV microbiota than in all the other CSTs, independently of infection status. C. trachomatis genital infection further modified the metabolic profiles in women with a CST-IV microbiota and was characterized by increased prevalence of the pathways for the biosynthesis of precursor metabolites and energy, biogenic amino acids, nucleotides, and tetrahydrofolate. Overall, predicted metabolic pathways might represent the starting point for more precisely designed future metabolomic studies, aiming to investigate the actual metabolic pathways characterizing C. trachomatis genital infection in the cervicovaginal microenvironment.
Subject(s)
Chlamydia Infections , Microbiota , Female , Humans , Chlamydia trachomatis , Vagina/metabolism , Chlamydia Infections/epidemiology , Metabolic Networks and Pathways
ABSTRACT
One goal of human microbiome studies is to relate host traits with human microbiome compositions. The analysis of microbial community sequencing data presents great statistical challenges, especially when the samples have different library sizes and the data are overdispersed with many zeros. To address these challenges, we introduce a new statistical framework, called predictive analysis in metagenomics via inverse regression (PAMIR), to analyze microbiome sequencing data. Within this framework, an inverse regression model is developed for overdispersed microbiota counts given the trait, and then a prediction rule is constructed by taking advantage of the dimension-reduction structure in the model. An efficient Monte Carlo expectation-maximization algorithm is proposed for maximum likelihood estimation. The method is further generalized to accommodate other types of covariates. We demonstrate the advantages of PAMIR through simulations and two real data examples.
Subject(s)
Microbiota/genetics , Sequence Analysis , Algorithms , Bacteria/genetics , Humans , Likelihood Functions , Monte Carlo Method , Regression Analysis
ABSTRACT
Phenotypic characteristics, while complementing 16S rRNA gene sequencing in identifying bacteria, become decisive in resolving conflicts in which a given DNA sequence shows equal percent similarity to more than one classified microorganism. "Phenotypic light" may also indicate the right direction when a new species' 16S rDNA sequence is under consideration. In fact, 16S rRNA gene sequences indicate either that a novel species has been isolated or that the test organism has been identified. In each case, additional tests are required to resolve different issues. Predictions of microbial phenotypes from metagenomic data depend heavily on our knowledge of expressed genes. Thus, a renaissance of microbial phenotypic characterization is likely to emerge on a par with genotypic signatures. The interplay of these and other complementary levels of analysis is likely to lead to DNA barcoding for microorganisms, as it has provided efficient methods for species-level identification of animals and plants. In this review, an attempt has been made to convey to the reader the importance of the interplay of genotypic and phenotypic characteristics of bacteria for the development of comprehensive and more stable classification schemes. It is expected that future valid classification schemes will be based on the phenetic relationships of microorganisms.
Subject(s)
Bacteria/classification , Bacteria/genetics , Bacterial Typing Techniques/standards , DNA Sequence Analysis/standards , Bacteria/chemistry , Bacteria/isolation & purification , Taxonomic DNA Barcoding/standards , Bacterial DNA/genetics , Bacterial Genome/genetics , Genotype , Metagenomics/standards , Phenotype , 16S Ribosomal RNA/genetics
ABSTRACT
Metagenomics has been greatly accelerated by the development of next-generation sequencing (NGS) technologies, which allow scientists to discover and describe novel microorganisms without the need for conventional culture techniques. Examining integrative bioinformatics methods used in viral interaction research, this study highlights metagenomic data from various contexts. Accurate viral identification depends on high-purity genetic material extraction, appropriate NGS platform selection, and sophisticated bioinformatics tools like VirPipe and VirFinder. The efficiency and precision of metagenomic analysis are further improved with the advent of AI-based techniques. The diversity and dynamics of viral communities are demonstrated by case studies from a variety of environments, emphasizing the seasonal and geographical variations that influence viral populations. In addition to speeding up the discovery of new viruses, metagenomics offers thorough understanding of virus-host interactions and their ecological effects. This review provides a promising framework for comprehending the complexity of viral communities and their interactions with hosts, highlighting the transformational potential of metagenomics and bioinformatics in viral research.
Subject(s)
Computational Biology , Data Mining , High-Throughput Nucleotide Sequencing , Metagenome , Metagenomics , Viruses , Metagenomics/methods , Viruses/genetics , Viruses/classification , Computational Biology/methods , Humans , Animals
ABSTRACT
In this study, we employed high-throughput metagenomic data to assemble the mitochondrial genome (mitogenome) of the European greenfinch (Chloris chloris; Linnaeus 1758). The circular mitogenome was 16,813 base pairs (bp) in length, containing 13 protein-coding genes (PCGs), 22 tRNA genes, 2 rRNA genes, and 1 control region. The base composition of the mitogenome is 30.6% A, 30.7% C, 14.2% G, and 24.5% T, resulting in a GC content of 44.9%. Phylogenetic analysis, based on the concatenation of the 13 mitochondrial PCGs from 32 related species of the order Passeriformes, indicated a closer relationship between C. chloris and C. sinica. Moreover, the genus Chloris was closely related to the genera Serinus, Crithagra, Carduelis, and Acanthis. These mitogenomic data of C. chloris will not only be helpful for species identification but will also facilitate our understanding of the evolutionary relationships among species in the genus Chloris, which experienced rapid radiation.
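The GC content quoted above follows directly from the reported base composition. As a minimal arithmetic check (the dictionary and function names are illustrative, not taken from the study):

```python
# Base composition reported in the abstract, in percent.
composition = {"A": 30.6, "C": 30.7, "G": 14.2, "T": 24.5}


def gc_content(comp):
    """Return GC content (%) as the sum of the G and C percentages."""
    return comp["G"] + comp["C"]


total = sum(composition.values())  # the four bases should account for 100%
gc = gc_content(composition)
print(f"total = {total:.1f}%, GC = {gc:.1f}%")
```

The four percentages sum to 100%, and G plus C gives the stated 44.9%.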
ABSTRACT
With the development of sequencing technology and analytic tools, studying within-species variations enhances the understanding of microbial biological processes. Nevertheless, most existing methods designed for strain-level analysis lack the capability to concurrently assess both strain proportions and genome-wide single nucleotide variants (SNVs) across longitudinal metagenomic samples. In this study, we introduce LongStrain, an integrated pipeline for the analysis of large-scale metagenomic data from individuals with longitudinal or repeated samples. In LongStrain, we first utilize two efficient tools, Kraken2 and Bowtie2, for the taxonomic classification and alignment of sequencing reads, respectively. Subsequently, we propose to jointly model strain proportions and shared haplotypes across samples within individuals. This approach specifically targets tracking a primary strain and a secondary strain for each subject, providing their respective proportions and SNVs as output. With extensive simulation studies of a microbial community and single species, our results demonstrate that LongStrain is superior to two genotyping methods and two deconvolution methods across a majority of scenarios. Furthermore, we illustrate the potential applications of LongStrain in the real data analysis of The Environmental Determinants of Diabetes in the Young study and a gastric intestinal metaplasia microbiome study. In summary, the proposed analytic pipeline demonstrates marked statistical efficiency over the same type of methods and has great potential in understanding the genomic variants and dynamic changes at strain level. LongStrain and its tutorial are freely available online at https://github.com/BoyanZhou/LongStrain. IMPORTANCE: The advancement in DNA-sequencing technology has enabled the high-resolution identification of microorganisms in microbial communities. 
Since different microbial strains within a species may exhibit extreme phenotypic variability (e.g., nutrition metabolism, antibiotic resistance, and pathogen virulence), investigating within-species variations holds great scientific promise for understanding the underlying mechanisms of microbial biological processes. To fully utilize the shared genomic variants across longitudinal metagenomic samples collected in microbiome studies, we develop an integrated analytic pipeline (LongStrain) for longitudinal metagenomic data. It concurrently leverages the information on proportions of mapped reads for individual strains and genome-wide SNVs to enhance the efficiency and accuracy of strain identification. Our method helps to understand strains' dynamic changes and their association with genome-wide variants. Given the fast-growing number of longitudinal studies of microbial communities, LongStrain, which streamlines analyses of large-scale raw sequencing data, should be of great value to microbiome research communities.
Subject(s)
Metagenomics , Metagenomics/methods , Humans , Longitudinal Studies , Software , Microbiota/genetics , Single Nucleotide Polymorphism , High-Throughput Nucleotide Sequencing , Metagenome , Gastrointestinal Microbiome/genetics , Bacteria/genetics , Bacteria/classification , Bacteria/isolation & purification , Computational Biology/methods , DNA Sequence Analysis/methods
ABSTRACT
Tracking nitrogen pollution sources is crucial for the effective management of water quality; however, it is a challenging task because of the complex contamination scenarios in freshwater systems. Variations in contamination patterns can induce quick responses in aquatic microorganisms, making them sensitive indicators of pollution origins. In this study, the soil and water assessment tool (SWAT), accompanied by a detailed pollution source database, was used to detect the main nitrogen pollution sources in each sub-basin of the Liuyang River watershed. Each sub-basin was thus assigned to a known class according to SWAT outputs: point source pollution-dominated area, crop cultivation pollution-dominated area, or septic tank pollution-dominated area. Based on these outputs, a random forest (RF) model was developed to predict the main pollution sources from different river ecosystems using a series of input variable groups (e.g., natural macroscopic characteristics, river physicochemical properties, 16S rRNA microbial taxonomic composition, microbial metagenomic data containing taxonomic and functional information, and their combination). The accuracy and the Kappa coefficient were used as the performance metrics for the RF model. Across all input variable groups, the prediction performance of the RF model improved significantly when metagenomic indices were used as inputs. Among the metagenomic data-based models, the combination of the taxonomic information with functional information of all the species achieved the highest accuracy (0.84) and an increased median Kappa coefficient (0.70). Feature importance analysis was used to identify key features that could serve as indicators for sudden pollution accidents and contribute to the overall function of the river system. 
The bacteria Rhabdochromatium marinum, Frankia, Actinomycetia, and Competibacteraceae were the most important taxa, with mean decrease Gini indices of 0.0023, 0.0021, 0.0019, and 0.0018, respectively, although their relative abundances ranged only from 0.0004% to 0.1%. Among the top 30 important variables, functional variables constituted more than half, demonstrating the remarkable variation in microbial functions among sites with distinct pollution sources and the key role of functionality in predicting pollution sources. Many functional indicators related to the metabolism of Mycobacterium tuberculosis, such as K24693, K25621, K16048, and K14952, emerged as significantly important factors in distinguishing nitrogen pollution origins. Given the shortage of pollution source data in developing regions, the suggested approach offers an economical, quick, and accurate way to locate the origins of water nitrogen pollution using the metagenomic data of microbial communities.
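The two performance metrics named in this abstract, accuracy and the Kappa coefficient, can be illustrated with a small sketch. It assumes the Kappa coefficient is Cohen's kappa, the usual choice for classifier evaluation; the class labels and predictions below are invented toy values, not data from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical sub-basin classes (toy data for illustration only).
y_true = ["point", "point", "crop", "crop", "septic", "septic"]
y_pred = ["point", "point", "crop", "septic", "septic", "septic"]

print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Kappa is lower than accuracy here because part of the observed agreement is expected by chance; the function is undefined when chance agreement equals 1 (for example, when every sample has the same true and predicted class).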
Subject(s)
Microbiota , Chemical Water Pollutants , Nitrogen/analysis , Rivers/chemistry , 16S Ribosomal RNA , Water Pollution/analysis , Environmental Monitoring , China , Chemical Water Pollutants/analysis
ABSTRACT
Polycyclic aromatic hydrocarbons (PAHs) are highly toxic, carcinogenic substances. On soils contaminated with PAHs, crop cultivation, animal husbandry, and even the survival of soil microflora are greatly perturbed, depending on the degree of contamination. Most microorganisms cannot tolerate PAH-contaminated soils; however, some microbial strains can adapt to these harsh conditions and survive on contaminated soils. Analysis of the metagenomes of contaminated environmental samples may lead to the discovery of PAH-degrading enzymes suitable for green biotechnology methodologies ranging from biocatalysis to pollution control. In the present study, our goal was to apply a metagenomic data search to identify novel enzymes that are efficient in the remediation of PAH-contaminated soils. The metagenomic hits were further analyzed using a set of bioinformatics tools to select protein sequences predicted to encode well-folded soluble enzymes. Three novel enzymes (two dioxygenases and one peroxidase) were cloned and used in soil remediation microcosm experiments. The experimental design of the present study aimed at evaluating the effectiveness of the novel enzymes on short-term PAH degradation in the soil microcosm model. The novel enzymes were found to be efficient for degradation of naphthalene and phenanthrene. Adding the inorganic oxidant CaO2 further increased the degrading potential of the novel enzymes for anthracene and pyrene. We conclude that metagenome mining paired with bioinformatic predictions, structural modelling, and functional assays constitutes a powerful approach towards novel enzymes for soil remediation.
Subject(s)
Environmental Biodegradation , Metagenomics , Polycyclic Aromatic Hydrocarbons , Soil Microbiology , Soil Pollutants , Metagenomics/methods , Polycyclic Aromatic Hydrocarbons/metabolism , Soil Pollutants/metabolism , Soil/chemistry , Dioxygenases/metabolism , Dioxygenases/genetics , Dioxygenases/chemistry , Phenanthrenes/metabolism , Naphthalenes/metabolism , Metagenome
ABSTRACT
BACKGROUND: Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition in host-associated or environmental samples. Although there is a plethora of composition analysis tools, they are not optimized to address the challenges in strain-level analysis: highly similar strain genomes and the presence of multiple strains under one species in a sample. Thus, this work aims to provide a high-resolution and more accurate strain-level analysis tool for short reads. RESULTS: In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mer indexing structure to strike a balance between strain identification accuracy and computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing datasets and benchmarked it against popular strain-level analysis tools including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1 score by 20% in identifying multiple strains at the strain level. CONCLUSIONS: By using a novel k-mer indexing structure, StrainScan is able to provide strain-level analysis with higher resolution than existing tools, enabling it to return more informative strain composition analysis in one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input, and its source code is freely available at https://github.com/liaoherui/StrainScan.
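To make the idea of k-mer indexing for strain identification concrete, the sketch below builds a flat inverted index from k-mers to reference strains and scores a read by the strain-unique k-mers it contains. This is a deliberately simplified illustration and not StrainScan's tree-based structure; the strain names, sequences, and function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(strains, k):
    """Map each k-mer to the set of reference strains that contain it."""
    index = defaultdict(set)
    for name, genome in strains.items():
        for km in kmers(genome, k):
            index[km].add(name)
    return index


def score_read(read, index, k):
    """Count, per strain, the read's k-mers that occur in that strain only."""
    hits = defaultdict(int)
    for km in kmers(read, k):
        owners = index.get(km, set())
        if len(owners) == 1:  # shared k-mers cannot separate similar strains
            hits[next(iter(owners))] += 1
    return dict(hits)


# Toy reference "genomes" that share a prefix and differ at the 3' end.
strains = {"strain_A": "ACGTACGTTT", "strain_B": "ACGTACGAAA"}
index = build_index(strains, k=5)
scores = score_read("GTACGTTT", index, k=5)
print(scores)
```

In this toy case only the k-mers covering the divergent end are informative, which is the difficulty the abstract describes for highly similar strain genomes.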
Subject(s)
Microbiota , Microbiota/genetics , Metagenome/genetics , Metagenomics , Software
ABSTRACT
Metagenomic datasets of host-associated microbial communities often contain host DNA that is usually discarded because the amount of data is too low for accurate host genetic analyses. However, genotype imputation can be employed to reconstruct host genotypes if a reference panel is available. Here, a two-step imputation strategy is tested with four types of reference panels, built using different strategies, on low-depth host genome data (≈2× coverage) recovered from intestinal samples of two chicken genetic lines. First, imputation accuracy is evaluated in 12 samples for which both low- and high-depth sequencing data are available, obtaining high imputation accuracies for all tested panels (>0.90). Second, the impact of reference panel choice on population genetics statistics for 100 chickens is assessed, with all four panels yielding comparable results. In light of the observations, the feasibility and application of the applied imputation strategy are discussed for different species with regard to the host DNA proportion, genomic diversity, and availability of a reference panel. This method enables leveraging hitherto discarded host DNA to gain insights into the genetic structure of host populations and, in doing so, facilitates the implementation of hologenomic approaches that jointly analyze host and microbial genomic data.
ABSTRACT
Bacteriophages, which are viruses infecting bacteria, are the most ubiquitous and diverse entities in the biosphere. There is accumulating evidence revealing their important roles in shaping the structure of various microbiomes. Thanks to (viral) metagenomic sequencing, a large number of new bacteriophages have been discovered. However, in the absence of a standard and automatic virus classification pipeline, the taxonomic characterization of new viruses seriously lags behind the sequencing efforts. In particular, according to the latest version of ICTV, several large phage families in the previous classification system have been removed. Therefore, a comprehensive review and comparison of taxonomic classification tools under the new standard is needed to establish the state of the art. In this work, we retrained and tested four recently published tools on newly labeled databases. We demonstrated their utilities and tested them on multiple datasets, including RefSeq, short contigs, simulated metagenomic datasets, and low-similarity datasets. This study provides a comprehensive review of phage family classification in different scenarios and practical guidance for choosing appropriate taxonomic classification pipelines. To the best of our knowledge, this is the first review conducted under the new ICTV classification framework. The results show that the new family classification framework overall leads to better conserved groups and thus makes family-level classification more feasible.
ABSTRACT
The Lanna region, the main part of northern Thailand, is a place of ethnic diversity. In this study, we investigated phak-gard-dong (PGD), or pickled mustard green (Brassica juncea L. Czern.), for its beneficial bacteria content and analysed the variations in bacterial composition among the PGD of three different ethnolinguistic groups, the Karen, Lawa, and Shan. DNA was extracted from the PGD pickling brine, and 16S rRNA gene Illumina sequencing was performed. Metagenomic data were analysed, and the results demonstrated that the dominant bacterial genera were Weissella (54.2%, 65.0%, and 10.0%) and Lactobacillus (17.5%, 5.6%, and 79.1%) in the PGD of the Karen, Lawa, and Shan, respectively. Pediococcus was found only in the PGD of the Karen and Shan. Bacterial communities in the PGD of the Lawa were distinct from those of the other ethnic groups, in both alpha and beta diversity, as well as in the predicted functions of the bacterial communities. In addition, overall network analysis results were correlated with bacterial proportions in every ethnic PGD. We suggest that all ethnic PGDs have the potential to be a good source of beneficial bacteria, warranting their conservation and further development into health food products.
ABSTRACT
The vaginal microbiota is less complex than the gut microbiota, and the colonization of Lactobacillus in the female vagina is considered to be critical for reproductive health. Oral probiotics have been suggested as a promising means to modulate vaginal homeostasis in the general population. In this study, 60 Chinese women were followed for over a year before, during, and after treatment with the probiotics Lactobacillus rhamnosus GR-1 and Lactobacillus reuteri RC-14. Shotgun metagenomic data of 1334 samples from multiple body sites did not support a colonization route of the probiotics from the oral cavity to the intestinal tract and then to the vagina. Our analyses enable the classification of the cervicovaginal microbiome into a stable state and a state of dysbiosis. The microbiome in the stable group steadily maintained a relatively high abundance of Lactobacilli over one year, which was not affected by probiotic intake, whereas in the dysbiosis group, the microbiota was more diverse and changed markedly over time. Data from a subset of the dysbiosis group suggest that this subgroup possibly benefited from supplementation with the probiotics, indicating that probiotic supplementation can be prescribed for women in a subclinical microbiome setting of dysbiosis, providing opportunities for targeted and personalized microbiome reconstitution.
Subject(s)
Microbiota
ABSTRACT
Characterizing the metabolic functions of the gut microbiome in health and disease is pivotal for translating alterations in microbial composition into clinical insights. Two major analysis paradigms have been used to explore the metabolic functions of the microbiome but not systematically integrated with each other: statistical screening approaches, such as metabolome-microbiome association studies, and computational approaches, such as constraint-based metabolic modeling. To combine the strengths of the two analysis paradigms, we herein introduce a set of theoretical concepts allowing for the population statistical treatment of constraint-based microbial community models. To demonstrate the utility of the theoretical framework, we applied it to a public metagenomic dataset consisting of 365 colorectal cancer (CRC) cases and 251 healthy controls, shining a light on the metabolic role of Fusobacterium spp. in CRC. We found that (1) glutarate production capability was significantly enriched in CRC microbiomes and mechanistically linked to lysine fermentation in Fusobacterium spp., (2) acetate and butyrate production potentials were lowered in CRC, and (3) Fusobacterium spp. presence had large negative ecological effects on community butyrate production in CRC cases and healthy controls. Validating the model predictions against fecal metabolomics, the in silico frameworks correctly predicted in vivo species metabolite correlations with high accuracy. In conclusion, highlighting the value of combining statistical association studies with in silico modeling, this study provides insights into the metabolic role of Fusobacterium spp. in the gut, while providing a proof of concept for the validity of constraint-based microbial community modeling.
Subject(s)
Bacteria/metabolism , Butyrates/metabolism , Feces/microbiology , Fusobacterium/metabolism , Gastrointestinal Microbiome , Aged , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Case-Control Studies , Colorectal Neoplasms/microbiology , Feces/chemistry , Female , Fusobacterium/genetics , Fusobacterium/isolation & purification , Humans , Male , Metabolomics , Middle Aged
ABSTRACT
Microbes play important roles in human health and disease. The interaction between microbes and hosts is a reciprocal relationship, which remains largely under-explored. Current computational resources lack manually and consistently curated data to connect metagenomic data to pathogenic microbes, microbial core genes, and disease phenotypes. We developed the MicroPhenoDB database by manually curating and consistently integrating microbe-disease association data. MicroPhenoDB provides 5677 non-redundant associations between 1781 microbes and 542 human disease phenotypes across more than 22 human body sites. MicroPhenoDB also provides 696,934 relationships between 27,277 unique clade-specific core genes and 685 microbes. Disease phenotypes are classified and described using the Experimental Factor Ontology (EFO). A refined score model was developed to prioritize the associations based on evidential metrics. The sequence search option in MicroPhenoDB enables rapid identification of existing pathogenic microbes in samples without running the usual metagenomic data processing and assembly. MicroPhenoDB offers data browsing, searching, and visualization through user-friendly web interfaces and web service application programming interfaces. MicroPhenoDB is the first database platform to detail the relationships between pathogenic microbes, core genes, and disease phenotypes. It will accelerate metagenomic data analysis and assist studies in decoding microbes related to human diseases. MicroPhenoDB is available through http://www.liwzlab.cn/microphenodb and http://lilab2.sysu.edu.cn/microphenodb.
Subject(s)
Metagenome , Metagenomics , Microbial Genes , Humans , Phenotype , Software
ABSTRACT
The phytoplanktonic production and prokaryotic consumption of organic matter significantly contribute to marine carbon cycling. Organic matter released from phytoplankton via three processes (exudation of living cells, cell disruption through grazing, and viral lysis) shows distinct chemical properties. We herein investigated the effects of phytoplanktonic whole-cell fractions (WF) (representing cell disruption by grazing) and extracellular fractions (EF) (representing exudates) prepared from Heterosigma akashiwo, a bloom-forming Raphidophyceae, on prokaryotic communities using culture-based experiments. We analyzed prokaryotic community changes for two weeks. The shift in cell abundance by both treatments showed similar dynamics, reaching the first peak (~4.1×10⁶ cells mL⁻¹) on day 3 and second peak (~1.1×10⁶ cells mL⁻¹) on day 13. We classified the sequences obtained into operational taxonomic units (OTUs). A Bray-Curtis dissimilarity analysis revealed that the OTU-level community structure changed distinctively with the two treatments. Ten and 13 OTUs were specifically abundant in the WF and EF treatments, respectively. These OTUs were assigned as heterotrophic bacteria mainly belonging to the Alteromonadales (Gammaproteobacteria) and Bacteroidetes clades and showed successive dynamics following the addition of organic matter. We also analyzed the dynamics of these OTUs in the ocean using publicly available metagenomic data from a natural coastal bloom in Monterey Bay, USA. At least two WF treatment OTUs showed co-occurrence with H. akashiwo, indicating that the blooms of H. akashiwo also affect these OTUs in the ocean. The present results strongly suggest that the thriving and dead cells of uninfected phytoplankton differentially influence the marine prokaryotic community.
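The Bray-Curtis dissimilarity used here to compare OTU-level community structure is the sum of absolute abundance differences divided by the total abundance of both samples. A minimal sketch with invented OTU counts (the study's actual counts are not given in the abstract):

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length.

    0 means identical composition; 1 means the samples share no OTUs.
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Hypothetical counts for three OTUs in two treatments (toy values).
wf_counts = [10, 0, 5]
ef_counts = [5, 5, 5]

print(bray_curtis(wf_counts, ef_counts))
```

The measure is undefined when both samples are empty, so real pipelines usually drop samples with zero total counts first.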
Subject(s)
Exudates and Transudates/metabolism , Microbiota , Phytoplankton/metabolism , Seawater/microbiology , Bacteria/classification , Bacteria/genetics , Bacteria/growth & development , Bacteria/metabolism , Biodiversity , Eutrophication , Phylogeny , 16S Ribosomal RNA/genetics , Stramenopiles/metabolism
ABSTRACT
We present a new algorithm to cluster high-dimensional sequence data and its application to the field of metagenomics, which aims at reconstructing individual genomes from a mixture of genomes sampled from an environmental site, without any prior knowledge of reference data (genomes) or the shape of clusters. Such problems typically cannot be solved directly with classical approaches seeking to estimate the density of clusters, for example, using the shared nearest neighbors (SNN) rule, due to the prohibitive size of contemporary sequence datasets. We explore here a new approach based on combining the SNN rule with the concept of locality sensitive hashing (LSH). The proposed method, called LSH-SNN, works by randomly splitting the input data into smaller-sized subsets (buckets) and employing the SNN rule on each of these buckets. Links can be created among neighbors sharing a sufficient number of elements, hence allowing clusters to be grown from linked elements. LSH-SNN can scale up to larger datasets consisting of millions of sequences, while achieving high accuracy across a variety of sample sizes and complexities.
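The workflow described above (hash sequences into buckets, apply the SNN rule inside each bucket, then grow clusters from linked elements) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: sequences are represented as k-mer sets, the bucket key is the lexicographically smallest k-mer (a deterministic stand-in for a random min-wise hash), similarity is Jaccard, and two reads are linked when they are mutual nearest neighbours sharing at least `min_shared` neighbours. All function names, parameters, and reads are hypothetical, and every read is assumed to be at least k bases long.

```python
from collections import defaultdict
from itertools import combinations


def kmer_set(seq, k=4):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two non-empty k-mer sets."""
    return len(a & b) / len(a | b)


def lsh_buckets(profiles):
    """Group read indices by their smallest k-mer (toy min-wise hash)."""
    buckets = defaultdict(list)
    for idx, profile in enumerate(profiles):
        buckets[min(profile)].append(idx)
    return buckets


def snn_links(members, profiles, n_neighbors=2, min_shared=1):
    """Yield pairs in one bucket that are mutual neighbours and share
    at least min_shared nearest neighbours."""
    nbrs = {}
    for i in members:
        others = sorted((j for j in members if j != i),
                        key=lambda j: jaccard(profiles[i], profiles[j]),
                        reverse=True)
        nbrs[i] = set(others[:n_neighbors])
    for i, j in combinations(members, 2):
        if i in nbrs[j] and j in nbrs[i] and len(nbrs[i] & nbrs[j]) >= min_shared:
            yield i, j


def lsh_snn(seqs, k=4, n_neighbors=2, min_shared=1):
    """Cluster sequences: bucket by hash, link by SNN, merge linked reads."""
    profiles = [kmer_set(s, k) for s in seqs]
    parent = list(range(len(seqs)))  # union-find forest over read indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in lsh_buckets(profiles).values():
        for i, j in snn_links(members, profiles, n_neighbors, min_shared):
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Toy reads drawn from two made-up "genomes".
reads = ["ACGTACGTAC", "ACGTACGTAG", "CGTACGTACG",
         "TTGGCCTTGG", "TGGCCTTGGC", "TTGGCCTTGA"]
clusters = lsh_snn(reads)
print(clusters)
```

Because neighbour searches are confined to a bucket, the quadratic similarity computation runs on small subsets instead of the full dataset, which is the scaling argument the abstract makes. A single hash can split similar reads across buckets, so practical LSH schemes typically repeat the bucketing with several hash functions.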