1.
Article in English | MEDLINE | ID: mdl-37022754

ABSTRACT

A strictly anaerobic hyperthermophilic archaeon, designated strain IOH2T, was isolated from a deep-sea hydrothermal vent (Onnuri vent field) area on the Central Indian Ocean Ridge. Strain IOH2T showed high 16S rRNA gene sequence similarity to Thermococcus sibiricus MM 739T (99.42 %), Thermococcus alcaliphilus DSM 10322T (99.28 %), Thermococcus aegaeus P5T (99.21 %), Thermococcus litoralis DSM 5473T (99.13 %), 'Thermococcus bergensis' T7324T (99.13 %), Thermococcus aggregans TYT (98.92 %) and Thermococcus prieurii Bio-pl-0405IT2T (98.01 %), with all other strains showing lower than 98 % similarity. The average nucleotide identity and in silico DNA-DNA hybridization values were highest between strain IOH2T and T. sibiricus MM 739T (79.33 and 15.00 %, respectively); these values are much lower than the species delineation cut-offs. Cells of strain IOH2T were coccoid, 1.0-1.2 µm in diameter and had no flagella. Growth ranges were 60-85 °C (optimum at 80 °C), pH 4.5-8.5 (optimum at pH 6.3) and 2.0-6.0 % (optimum at 4.0 %) NaCl. Growth of strain IOH2T was enhanced by starch, glucose, maltodextrin and pyruvate as carbon sources, and by elemental sulphur as an electron acceptor. Genome analysis of strain IOH2T predicted arginine biosynthesis-related genes, and growth of strain IOH2T without arginine was confirmed. The genome of strain IOH2T was assembled as a circular chromosome of 1 946 249 bp with 2096 predicted genes. The DNA G+C content was 39.44 mol%. Based on the results of physiological and phylogenetic analyses, Thermococcus argininiproducens sp. nov. is proposed with type strain IOH2T (=MCCC 4K00089T=KCTC 25190T).


Subject(s)
Thermococcus , Thermococcus/genetics , Seawater , Base Composition , Phylogeny , RNA, Ribosomal, 16S/genetics , Indian Ocean , DNA, Bacterial/genetics , Fatty Acids/chemistry , Sequence Analysis, DNA , Bacterial Typing Techniques
2.
NAR Genom Bioinform ; 5(1): lqad023, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36915411

ABSTRACT

Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92-97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.
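
The recall and precision figures quoted in the abstract above follow the usual definitions: recall is the fraction of reference genes that were recovered, and precision is the fraction of predictions that are correct. The sketch below only illustrates those definitions; it is not part of iMPP, and the function name and gene identifiers are hypothetical.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of a prediction set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical gene identifiers, for illustration only.
reference_genes = {"g1", "g2", "g3", "g4"}
predicted_genes = {"g1", "g2", "g3", "x9"}

precision, recall = precision_recall(predicted_genes, reference_genes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four predictions are correct and three of the four reference genes are recovered, so both values are 0.75.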

3.
J Microbiol ; 60(9): 916-927, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35913594

ABSTRACT

Siboglinid tubeworms thrive in hydrothermal vent and seep habitats via a symbiotic relationship with chemosynthetic bacteria. Difficulties in culturing tubeworms and their symbionts in a laboratory setting have hindered the study of host-microbe interactions. As a result, released symbiont genomes are fragmented, which limits the available genomic data and affects subsequent analyses. Here, we present a complete genome of a gammaproteobacterial endosymbiont from the tubeworm Lamellibrachia satsuma collected from a seep in Kagoshima Bay, assembled using a hybrid approach that combines sequences generated on the Illumina and Oxford Nanopore platforms. The genome consists of a single circular chromosome with an assembly size of 4,323,754 bp and a GC content of 53.9% with 3,624 protein-coding genes. The genome is of high quality and contains no assembly gaps, while the completeness and contamination are 99.33% and 2.73%, respectively. Comparative genome analysis revealed a total of 1,724 gene clusters shared among the vent and seep tubeworm symbionts, while 294 genes, including transposons and genes for defense mechanisms and inorganic ion transport, were found exclusively in L. satsuma symbionts. This complete endosymbiont genome assembly will be valuable for comparative studies, particularly with other tubeworm symbiont genomes as well as with other chemosynthetic microbial communities.


Subject(s)
Hydrothermal Vents , Microbiota , Polychaeta , Animals , Bacteria/genetics , Hydrothermal Vents/microbiology , Polychaeta/genetics , Polychaeta/microbiology , Symbiosis
4.
Mitochondrial DNA B Resour ; 7(4): 640-641, 2022.
Article in English | MEDLINE | ID: mdl-35425856

ABSTRACT

Fungal species in the genus Trichoderma are widely used for industrial enzyme production and as biocontrol agents. In this study, we report the complete mitochondrial genome of a marine-derived Trichoderma simmonsii strain GH-Sj1, which belongs to the Harzianum clade of Trichoderma. GH-Sj1 was isolated from an edible sea alga Saccharina japonica collected from the southern coast of Korea. This newly assembled circular molecule is 28,668 bp in length and consists of 15 protein-coding genes, 26 transfer RNA genes, and two ribosomal RNA genes. Phylogenetic analysis using the maximum likelihood method shows that T. simmonsii GH-Sj1 is closely related to Trichoderma harzianum and Trichoderma lixii. To the best of our knowledge, this is the first characterization of a marine-derived mitogenome within the genus Trichoderma.

5.
Gigascience ; 11(1), 2022 01 12.
Article in English | MEDLINE | ID: mdl-35022698

ABSTRACT

BACKGROUND: The shuttles hoppfish, Periophthalmus modestus, is a mudskipper; mudskippers form the largest group of amphibious teleost fishes and are uniquely adapted to live on mudflats. Because mudskippers can survive on land for extended periods by breathing through their skin and through the lining of the mouth and throat, they have been evaluated as a model for the evolutionary sea-land transition of Devonian protoamphibians, ancestors of all present tetrapods. RESULTS: A total of 39.6, 80.2, 52.9, and 33.3 Gb of Illumina, Pacific Biosciences, 10X linked, and Hi-C data, respectively, were assembled into 1,419 scaffolds with an N50 length of 33 Mb and a BUSCO score of 96.6%. The assembly covered 117% of the estimated genome size (729 Mb) and included 23 pseudo-chromosomes anchored by a Hi-C contact map, which corresponded to the top 23 longest scaffolds above 20 Mb and close to the estimated one. Of the genome, 43.8% consisted of various repetitive elements such as DNAs, tandem repeats, long interspersed nuclear elements, and simple repeats. Ab initio and homology-based gene prediction identified 30,505 genes, of which 94% had homology to the 14 Actinopterygii transcriptomes and 89% and 85% to Pfam families and InterPro domains, respectively. Comparative genomics with 15 Actinopterygii species identified 59,448 gene families, of which 12% were found only in P. modestus. CONCLUSIONS: We present the first high-quality genome assembly and gene annotation of the shuttles hoppfish. It will provide a valuable resource for further studies on sea-land transition, bimodal respiration, nitrogen excretion, osmoregulation, thermoregulation, vision, and mechanoreception.


Subject(s)
Chromosomes , Genome , Animals , Chromosomes/genetics , Genomics , Molecular Sequence Annotation , Repetitive Sequences, Nucleic Acid
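
The N50 length reported in the abstract above is a standard assembly contiguity statistic: the scaffold length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of that definition follows; the function name and the toy scaffold lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

Here the total is 290, so half is 145; the two longest scaffolds (80 + 70 = 150) already cover it, which makes the N50 equal to 70.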
6.
BMC Genomics ; 22(1): 830, 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34789157

ABSTRACT

BACKGROUND: Trichoderma is a genus of fungi in the family Hypocreaceae and includes species known to produce enzymes with commercial use. They are largely found in soil and terrestrial plants. Recently, Trichoderma simmonsii isolated from decaying bark and decorticated wood was newly identified in the Harzianum clade of Trichoderma. Due to a wide range of applications in agriculture and other industries, genomes of at least 12 Trichoderma spp. have been studied. Moreover, antifungal and enzymatic activities have been extensively characterized in Trichoderma spp. However, the genomic information and bioactivities of T. simmonsii from a particular marine-derived isolate remain largely unknown. While screening for asparaginase-producing fungi, we observed that T. simmonsii GH-Sj1 strain isolated from edible kelp produced asparaginase. In this study, we report a draft genome of T. simmonsii GH-Sj1 using Illumina and Oxford Nanopore technologies. Furthermore, to facilitate biotechnological applications of this species, RNA-sequencing was performed to elucidate the transcriptional profile of T. simmonsii GH-Sj1 in response to asparaginase-rich conditions. RESULTS: We generated ~14 Gb of sequencing data assembled in a ~40 Mb genome. The T. simmonsii GH-Sj1 genome consisted of seven telomere-to-telomere scaffolds with no sequencing gaps, where the N50 length was 6.4 Mb. The total number of protein-coding genes was 13,120, constituting ~99% of the genome. The genome harbored 176 tRNAs, which cover the full set of 20 amino acids. In addition, it had an rRNA repeat region consisting of seven repeats of the 18S-ITS1-5.8S-ITS2-26S cluster. The T. simmonsii genome also harbored 7 putative asparaginase-encoding genes with potential medical applications. Using RNA-sequencing analysis, we found that 3 of the 7 putative genes were significantly upregulated under asparaginase-rich conditions. CONCLUSIONS: The genome and transcriptome of T. simmonsii GH-Sj1 established in the current work represent valuable resources for future comparative studies on fungal genomes and asparaginase production.


Subject(s)
Trichoderma , Asparaginase , Genome , Hypocreales , Telomere , Trichoderma/genetics
7.
Sci Data ; 7(1): 85, 2020 03 09.
Article in English | MEDLINE | ID: mdl-32152293

ABSTRACT

Crustacean amphipods are important trophic links between primary producers and higher consumers. Although most amphipods occur in or around aquatic environments, the family Talitridae is the only family found in terrestrial and semi-terrestrial habitats. The sand-hopper Trinorchestia longiramus is a talitrid species often found in the sandy beaches of South Korea. In this study, we present the first draft genome assembly and annotation of this species. We generated ~380.3 Gb of sequencing data assembled in a 0.89 Gb draft genome. Annotation analysis estimated 26,080 protein-coding genes, with 89.9% genome completeness. Comparison with other amphipods showed that T. longiramus has 327 unique orthologous gene clusters, many of which are expanded gene families responsible for cellular transport of toxic substances, homeostatic processes, and ionic and osmotic stress tolerance. This first talitrid genome will be useful for further understanding the mechanisms of adaptation in terrestrial environments, the effects of heavy metal toxicity, as well as for studies of comparative genomic variation across amphipods.


Subject(s)
Amphipoda/genetics , Genome , Animals , Ecosystem , Genomics , Molecular Sequence Annotation , Multigene Family
9.
BMC Bioinformatics ; 20(Suppl 11): 276, 2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31167633

ABSTRACT

BACKGROUND: A crucial task in metagenomic analysis is to annotate the function and taxonomy of the sequencing reads generated from a microbiome sample. In general, the reads can either be assembled into contigs and searched against reference databases, or individually searched without assembly. The first approach may suffer from fragmentary and incomplete assembly, while the second is hampered by the reduced functional signal contained in the short reads. To tackle these issues, we have previously developed GRASP (Guided Reference-based Assembly of Short Peptides), which accepts a reference protein sequence as input and aims to assemble its homologs from a database containing fragmentary protein sequences. In addition to being a gene-centric assembly tool, GRASP also serves as a homolog search tool when using the assembled protein sequences as templates to recruit reads. GRASP has a significantly improved recall rate (60-80% vs. 30-40%) compared to other homolog search tools such as BLAST. However, GRASP is both time- and space-consuming. Subsequently, we developed GRASPx, which is 30X faster than GRASP. Here, we present a completely redesigned algorithm, GRASP2, for this computational problem. RESULTS: GRASP2 utilizes the Burrows-Wheeler Transform (BWT) and FM-index to perform assembly graph generation, and reduces the search space by employing a fast ungapped alignment strategy as a filter. GRASP2 also explicitly generates candidate paths prior to alignment, which effectively uncouples the iterative access of the assembly graph and alignment matrix. This strategy makes the execution of the program more efficient on current computer architectures, and contributes to GRASP2's speedup. GRASP2 is 8-fold faster than GRASPx (and 250-fold faster than GRASP) and uses 8-fold less memory while maintaining the original high recall rate of GRASP. GRASP2 reaches ~80% recall rate compared to ~40% for BLAST, both at a high precision level (>95%). With such high performance, GRASP2 is only ~3X slower than BLASTP. CONCLUSION: GRASP2 is a high-performance gene-centric and homolog search tool with significant speedup compared to its predecessors, which makes GRASP2 a useful tool for metagenomics data analysis. GRASP2 is implemented in C++ and is freely available from http://www.sourceforge.net/projects/grasp2 .


Subject(s)
Genes , Metagenomics/methods , Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Software , Algorithms , Aquatic Organisms/genetics , Microbiota/genetics , ROC Curve , Time Factors
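
The abstract above names the Burrows-Wheeler Transform and the FM-index as the data structures behind GRASP2's graph generation. The following is a minimal, textbook-style sketch of those two structures and of FM-index backward search on a toy string. It is not GRASP2's implementation (which is in C++ and is not described at this level in the abstract); all function names are hypothetical, and the naive rotation sort is only suitable for small inputs.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform of text; a '$' sentinel is appended first."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_string):
    """Return (first_occurrence, occ) tables used by backward search."""
    counts = Counter(bwt_string)
    first_occurrence, total = {}, 0
    for symbol in sorted(counts):
        # Row of the first suffix that starts with `symbol`.
        first_occurrence[symbol] = total
        total += counts[symbol]
    # occ[symbol][i] = number of `symbol` characters in bwt_string[:i].
    occ = {symbol: [0] * (len(bwt_string) + 1) for symbol in counts}
    for i, current in enumerate(bwt_string):
        for symbol in occ:
            occ[symbol][i + 1] = occ[symbol][i] + (1 if symbol == current else 0)
    return first_occurrence, occ


def count_occurrences(pattern, first_occurrence, occ, n):
    """Count exact matches of pattern by FM-index backward search."""
    low, high = 0, n  # half-open range of sorted-suffix rows
    for symbol in reversed(pattern):
        if symbol not in first_occurrence:
            return 0
        low = first_occurrence[symbol] + occ[symbol][low]
        high = first_occurrence[symbol] + occ[symbol][high]
        if low >= high:
            return 0
    return high - low


# Toy sequence, for illustration only.
transformed = bwt("ACGTACGA")
first_occurrence, occ = build_fm_index(transformed)
print(transformed)
print(count_occurrences("ACG", first_occurrence, occ, len(transformed)))
```

Backward search processes the pattern from its last character to its first, narrowing a range of sorted suffixes at each step, so the cost of a query depends on the pattern length rather than on the size of the indexed text.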
10.
Mitochondrial DNA B Resour ; 4(2): 2104-2105, 2019 Jul 10.
Article in English | MEDLINE | ID: mdl-33365428

ABSTRACT

The complete mitochondrial genome of sand-hopper Trinorchestia longiramus was analyzed in this study, which is the first for the genus within the family Talitridae. The mitogenome sequence is 15,401 bp in length containing two ribosomal RNA genes, 22 transfer RNA genes, 13 protein-coding genes, and a control region as found in most amphipods. The gene order showed that T. longiramus has a unique control region location compared to other amphipods. Phylogenetic analysis using the maximum likelihood method positioned T. longiramus within the monophyletic clades of the family Talitridae.

11.
Microbiome ; 6(1): 217, 2018 12 06.
Article in English | MEDLINE | ID: mdl-30522530

ABSTRACT

BACKGROUND: Dental plaque is composed of hundreds of bacterial taxonomic units and represents one of the most diverse and stable microbial ecosystems associated with the human body. The taxonomic composition and functional capacity of mature plaque are gradually shaped during several stages of community assembly via processes such as co-aggregation, competition for space and resources, and by bacterially produced reactive agents. Knowledge on the dynamics of assembly within complex communities is very limited and derives mainly from studies of a limited number of bacterial species. To fill current knowledge gaps, we applied parallel metagenomic and metatranscriptomic analyses during assembly and maturation of an in vitro oral biofilm. This model system has previously demonstrated remarkable reproducibility in taxonomic composition across replicate samples during maturation. RESULTS: Time course analysis of the biofilm maturation was performed by parallel sampling every 2-3 h for 24 h for both DNA and RNA. Metagenomic analyses revealed that community taxonomy changed most dramatically between three and six hours of growth when pH dropped from 6.5 to 5.5. By applying comparative metatranscriptome analysis we could identify major shifts in overall community activities between six and nine hours of growth when pH dropped below 5.5, as 29,015 genes were significantly up- or down-regulated. Several of the differentially expressed genes showed unique activities for individual bacterial genomes and were associated with pyruvate and lactate metabolism, two-component signaling pathways, production of antibacterial molecules, iron sequestration, pH neutralization, protein hydrolysis, and surface attachment. Our analysis also revealed several mechanisms responsible for the niche expansion of the cariogenic pathogen Lactobacillus fermentum. CONCLUSION: It is widely accepted that acidic conditions in dental plaque cause a net loss of enamel from teeth. Here, as pH drops from below 5.5 to 4.7, we observe blooms of cariogenic lactobacilli, and a transition point of many bacterial gene expression activities within the community. To our knowledge, this represents the first study of the assembly and maturation of a complex oral bacterial biofilm community that addresses gene level functional responses over time.


Subject(s)
Bacteria/classification , Dental Plaque/microbiology , Gene Expression Profiling/methods , Metagenomics/methods , Mouth/microbiology , Adult , Bacteria/genetics , Bacteria/growth & development , Bacterial Proteins/genetics , Biofilms , DNA, Bacterial/genetics , DNA, Ribosomal/genetics , Gene Expression Regulation, Bacterial , Humans , Hydrogen-Ion Concentration , Metabolic Networks and Pathways , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA/methods
12.
BMC Bioinformatics ; 17 Suppl 8: 283, 2016 Aug 31.
Article in English | MEDLINE | ID: mdl-27585568

ABSTRACT

BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves a significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism. RESULTS: The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates. CONCLUSIONS: In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/ .


Subject(s)
Algorithms , Databases, Protein , Metagenome , Metagenomics/methods , Peptides/chemistry , Sequence Alignment/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Computer Simulation , Humans
13.
PLoS Comput Biol ; 12(7): e1004991, 2016 07.
Article in English | MEDLINE | ID: mdl-27400380

ABSTRACT

Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hampers downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow for accurate modeling of position specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates a banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM-GRASPx was used to reconstruct comprehensive sets of complete or near-complete protein and nucleotide sequences for the query protein families. HMM-GRASPx is freely available online from http://sourceforge.net/projects/hmm-graspx.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Metagenomics/methods , Proteins/analysis , Proteins/genetics , Algorithms , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/genetics , Bacteria/metabolism , Computer Simulation , Databases, Genetic , Drug Resistance, Bacterial/genetics , Humans , Metagenome/genetics , Models, Theoretical , Proteins/metabolism , Saliva/chemistry , Saliva/metabolism , Transcriptome/genetics
14.
Proc Natl Acad Sci U S A ; 112(24): 7569-74, 2015 Jun 16.
Article in English | MEDLINE | ID: mdl-26034276

ABSTRACT

One major challenge to studying the human microbiome and its associated diseases is the lack of effective tools to achieve targeted modulation of individual species and study their ecological function within multispecies communities. Here, we show that C16G2, a specifically targeted antimicrobial peptide, was able to selectively kill the cariogenic pathogen Streptococcus mutans with high efficacy within a human saliva-derived in vitro oral multispecies community. Importantly, a significant shift in the overall microbial structure of the C16G2-treated community was revealed after a 24-h recovery period: several bacterial species with metabolic dependency or physical interactions with S. mutans suffered drastic reduction in their abundance, whereas S. mutans' natural competitors, including health-associated Streptococci, became dominant. This study demonstrates the use of targeted antimicrobials to modulate the microbiome structure, allowing insights into the key community role of specific bacterial species, and also indicates the therapeutic potential of C16G2 to achieve a healthy oral microbiome.


Subject(s)
Antimicrobial Cationic Peptides/pharmacology , Microbiota/drug effects , Streptococcus mutans/drug effects , Streptococcus mutans/physiology , Adult , Anti-Bacterial Agents/pharmacology , Biofilms/drug effects , Biofilms/growth & development , Dental Caries/microbiology , Humans , Microbial Sensitivity Tests , Mouth/microbiology , Saliva/microbiology , Streptococcus mutans/pathogenicity
15.
ISME J ; 9(12): 2605-19, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26023872

ABSTRACT

Dental caries, one of the most globally widespread infectious diseases, is intimately linked to pH dynamics. In supragingival plaque, after the addition of a carbohydrate source, bacterial metabolism decreases the pH, which then subsequently recovers. Molecular mechanisms supporting this important homeostasis are poorly characterized, in part because there are hundreds of active species in dental plaque. Only a few mechanisms (for example, lactate fermentation, the arginine deiminase system) have been identified and studied in detail. Here, we conducted what is, to our knowledge, the first full transcriptome and metabolome analysis of a diverse oral plaque community by using a functionally and taxonomically robust in vitro model system of more than 100 species. Differential gene expression analyses from the complete transcriptome of 14 key community members revealed highly varied regulation of both known and previously unassociated pH-neutralizing pathways as a response to the pH drop. Unique expression and metabolite signatures from 400 detected metabolites were found for each stage along the pH curve, suggesting it may be possible to define healthy and diseased states of activity. Importantly, for the maintenance of healthy plaque pH, gene transcription activity of known and previously unrecognized pH-neutralizing pathways was associated with the genera Lactobacillus, Veillonella and Streptococcus during the pH recovery phase. Our in vitro study provides a baseline for defining healthy and disease-like states and highlights the power of moving beyond single and dual species applications to capture key players and their orchestrated metabolic activities within a complex human oral microbiome model.


Subject(s)
Bacteria/metabolism , Carbohydrate Metabolism , Microbiota , Mouth/microbiology , Adult , Bacteria/chemistry , Bacteria/classification , Bacteria/genetics , Dental Caries/microbiology , Dental Plaque/microbiology , Female , Humans , Hydrogen-Ion Concentration , Male , Mouth/chemistry
16.
Bioinformatics ; 31(11): 1833-5, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25637561

ABSTRACT

The determination of protein sequences from a metagenomic dataset enables the study of metabolism and functional roles of the organisms that are present in the sampled microbial community. We had previously introduced an algorithm and software for the accurate reconstruction of protein sequences from short peptides identified on nucleotide reads in a metagenomic dataset. Here, we present significant computational improvements to the short peptide assembly algorithm that make it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads, while maintaining accuracy. The improved computational efficiency is achieved using a suffix array data structure that allows for fast querying during the assembly process, and a significant redesign of assembly steps that enables multi-threaded execution. AVAILABILITY AND IMPLEMENTATION: The program is available under the GPLv3 license from sourceforge.net/projects/spa-assembler.


Subject(s)
Metagenomics/methods , Peptides/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms
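
The abstract above attributes the speedup partly to a suffix array that supports fast querying. As a rough illustration of why that helps, the sketch below builds a suffix array for a toy peptide string and locates every occurrence of a query by binary search. It is an assumption-laden teaching example, not the assembler's actual code: the function names are hypothetical, and the naive construction is quadratic in the worst case, which a production tool would avoid.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order."""
    # Naive construction: fine for a toy example, too slow for real datasets.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions where pattern occurs in text."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    low, high = 0, len(suffix_array)
    while low < high:
        mid = (low + high) // 2
        if prefix(mid) < pattern:
            low = mid + 1
        else:
            high = mid
    first = low

    # Upper bound: first suffix whose m-character prefix is > pattern.
    high = len(suffix_array)
    while low < high:
        mid = (low + high) // 2
        if prefix(mid) <= pattern:
            low = mid + 1
        else:
            high = mid
    return sorted(suffix_array[first:low])


# Toy peptide string, for illustration only.
peptides = "MKVLAMKVT"
suffix_array = build_suffix_array(peptides)
print(find_occurrences(peptides, suffix_array, "MKV"))
```

Because all suffixes sharing a prefix sit next to each other in the sorted order, each query needs only a logarithmic number of string comparisons instead of a scan over the whole text.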
17.
Proc Natl Acad Sci U S A ; 112(4): 1173-8, 2015 Jan 27.
Article in English | MEDLINE | ID: mdl-25587132

ABSTRACT

Thaumarchaeota are among the most abundant microbial cells in the ocean, but difficulty in cultivating marine Thaumarchaeota has hindered investigation into the physiological and evolutionary basis of their success. We report here a closed genome assembled from a highly enriched culture of the ammonia-oxidizing pelagic thaumarchaeon CN25, originating from the open ocean. The CN25 genome exhibits strong evidence of genome streamlining, including a 1.23-Mbp genome, a high coding density, and a low number of paralogous genes. Proteomic analysis recovered nearly 70% of the predicted proteins encoded by the genome, demonstrating that a high fraction of the genome is translated. In contrast to other minimal marine microbes that acquire, rather than synthesize, cofactors, CN25 encodes and expresses near-complete biosynthetic pathways for multiple vitamins. Metagenomic fragment recruitment indicated the presence of DNA sequences >90% identical to the CN25 genome throughout the oligotrophic ocean. We propose the provisional name "Candidatus Nitrosopelagicus brevis" str. CN25 for this minimalist marine thaumarchaeon and suggest it as a potential model system for understanding archaeal adaptation to the open ocean.


Subject(s)
Archaea , Archaeal Proteins , Gene Expression Regulation, Archaeal/physiology , Proteome , Proteomics , Water Microbiology , Amino Acid Sequence , Archaea/classification , Archaea/genetics , Archaea/metabolism , Archaeal Proteins/biosynthesis , Archaeal Proteins/genetics , Metagenomics , Molecular Sequence Data , Oceans and Seas , Proteome/biosynthesis , Proteome/genetics
18.
Nucleic Acids Res ; 43(3): e18, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25414351

ABSTRACT

Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release.


Subject(s)
Peptides/chemistry , Algorithms , Databases, Protein , Metagenome , Peptides/genetics
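
The idea in the abstract above, of recruiting partial-length database sequences with a reference and assembling the overlapping ones at the same time, can be illustrated with a toy sketch. This is not the GRASP algorithm itself: the exact k-mer matching, the greedy suffix-prefix extension, and every name here (`first_hit`, `overlap_len`, `assemble_from_reference`, the example sequences) are assumptions made purely for illustration.

```python
def first_hit(seq, reference, k):
    """Smallest reference position matched exactly by any k-mer of seq, or None."""
    hits = [reference.find(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    hits = [h for h in hits if h >= 0]
    return min(hits) if hits else None


def overlap_len(left, right, min_overlap):
    """Longest suffix of `left` that equals a prefix of `right` (0 if too short)."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def assemble_from_reference(reference, fragments, k=4, min_overlap=3):
    """Recruit fragments sharing a k-mer with the reference, then extend greedily."""
    # Search step (assumed): keep only fragments with an exact k-mer hit.
    recruited = [f for f in fragments if first_hit(f, reference, k) is not None]
    if not recruited:
        return "", recruited
    # Seed with the fragment that hits the reference furthest to the left.
    seed = min(recruited, key=lambda f: first_hit(f, reference, k))
    contig = seed
    unused = [f for f in recruited if f != seed]
    # Assembly step (assumed): repeatedly append the best-overlapping fragment.
    while True:
        best = max(unused, key=lambda f: overlap_len(contig, f, min_overlap),
                   default=None)
        if best is None:
            break
        n = overlap_len(contig, best, min_overlap)
        if n == 0:
            break
        contig += best[n:]
        unused.remove(best)
    return contig, recruited


# Hypothetical reference protein and partial-length "metagenomic" fragments.
reference = "MKTAYIAKQRQISFVKSH"
fragments = ["QRQISFVK", "MKTAYIAK", "YIAKQRQI", "GGGGWWWW"]
contig, recruited = assemble_from_reference(reference, fragments)
print(contig)
```

A real homology search would tolerate substitutions and gaps rather than require exact k-mer matches; the sketch only shows why assembling overlapping fragments can recover a longer match than any single fragment provides.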
19.
Proc Natl Acad Sci U S A ; 110(26): E2390-9, 2013 Jun 25.
Article in English | MEDLINE | ID: mdl-23754396

ABSTRACT

The "dark matter of life" describes microbes and even entire divisions of bacterial phyla that have evaded cultivation and have yet to be sequenced. We present a genome from the globally distributed but elusive candidate phylum TM6 and uncover its metabolic potential. TM6 was detected in a biofilm from a sink drain within a hospital restroom by analyzing cells using a highly automated single-cell genomics platform. We developed an approach for increasing throughput and effectively improving the likelihood of sampling rare events based on forming small random pools of single-flow-sorted cells, amplifying their DNA by multiple displacement amplification and sequencing all cells in the pool, creating a "mini-metagenome." A recently developed single-cell assembler, SPAdes, in combination with contig binning methods, allowed the reconstruction of genomes from these mini-metagenomes. A total of 1.07 Mb was recovered in seven contigs for this member of TM6 (JCVI TM6SC1), estimated to represent 90% of its genome. High nucleotide identity between a total of three TM6 genome drafts generated from pools that were independently captured, amplified, and assembled provided strong confirmation of a correct genomic sequence. TM6 is likely a Gram-negative organism and possibly a symbiont of an unknown host (nonfree living) in part based on its small genome, low-GC content, and lack of biosynthesis pathways for most amino acids and vitamins. Phylogenomic analysis of conserved single-copy genes confirms that TM6SC1 is a deeply branching phylum.


Subject(s)
Biofilms , Hospitals , Metagenome , Sanitary Engineering , Water Microbiology , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , DNA, Bacterial/genetics , DNA, Bacterial/isolation & purification , DNA, Bacterial/metabolism , Evolution, Molecular , Genome, Bacterial , Humans , Metabolic Networks and Pathways , Metagenomics/methods , Molecular Sequence Data , Phylogeny , Water Supply
20.
Nucleic Acids Res ; 41(8): e91, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23435317

ABSTRACT

The metagenomic paradigm allows for an understanding of the metabolic and functional potential of microbes in a community via a study of their proteins. The substrate for protein identification is either the set of individual nucleotide reads generated from metagenomic samples or the set of contig sequences produced by assembling these reads. However, a read-based strategy using reads generated by next-generation sequencing (NGS) technologies results in an overwhelming majority of partial-length protein predictions. A nucleotide assembly-based strategy does not fare much better, as metagenomic assemblies are typically fragmented and also leave a large fraction of reads unassembled. Here, we present a method for reconstructing complete protein sequences directly from NGS metagenomic data. Our framework is based on a novel short peptide assembler (SPA) that assembles protein sequences from their constituent peptide fragments identified on short reads. The SPA algorithm is based on informed traversals of a de Bruijn graph, defined on an amino acid alphabet, to identify probable paths that correspond to proteins. Using large simulated and real metagenomic data sets, we show that our method outperforms the alternate approach of identifying genes on nucleotide sequence assemblies and generates longer protein sequences that can be more effectively analysed.


Subject(s)
Algorithms , Metagenomics/methods , Sequence Analysis, Protein/methods , High-Throughput Nucleotide Sequencing , Peptides/chemistry , Sensitivity and Specificity
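
As a rough illustration of the de Bruijn graph on an amino acid alphabet mentioned in the abstract above, the sketch below builds such a graph from short peptide fragments and walks it. It is a minimal sketch, not the SPA implementation: the abstract does not specify how its "informed traversals" work, so the greedy rule used here (follow the most frequently observed edge) is an assumed stand-in, and the function names and example peptides are hypothetical.

```python
from collections import defaultdict


def build_peptide_de_bruijn(peptides, k):
    """Nodes are (k-1)-mers of amino acids; each observed k-mer adds one edge."""
    graph = defaultdict(lambda: defaultdict(int))
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1  # edge weight = times the k-mer was seen
    return graph


def greedy_path(graph, start, max_steps=1000):
    """Follow the best-supported outgoing edge until a dead end or a revisit."""
    path = start
    node = start
    seen = {start}
    for _ in range(max_steps):
        successors = graph.get(node)
        if not successors:
            break
        # Highest edge count wins; ties are broken alphabetically for determinism.
        nxt = min(successors, key=lambda n: (-successors[n], n))
        if nxt in seen:
            break  # stop rather than loop through a repeat
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Hypothetical peptide fragments, as might be predicted on overlapping short reads.
fragments = ["MKTAYIAK", "AYIAKQRQ", "AKQRQISF"]
graph = build_peptide_de_bruijn(fragments, k=4)
protein = greedy_path(graph, start="MKT")
print(protein)
```

With these three overlapping fragments the walk from `MKT` spells out a sequence longer than any single fragment, which is the effect the abstract describes. Real data contain sequencing errors and branching paths, which is where a more informed traversal than this greedy rule becomes necessary.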