Results 1 - 20 of 645
1.
Comput Struct Biotechnol J ; 23: 2011-2033, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38765606

ABSTRACT

The fields of Metagenomics and Metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.

2.
mBio ; 15(4): e0018124, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38477597

ABSTRACT

A comprehensive microbial surveillance was conducted at NASA's Mars 2020 spacecraft assembly facility (SAF), where whole-genome sequencing (WGS) of 110 bacterial strains was performed. One isolate, designated 179-BFC-A-HST, exhibited less than 80% average nucleotide identity (ANI) to known species, suggesting a novel organism. This strain demonstrated high-level resistance [minimum inhibitory concentration (MIC) >256 mg/L] to third-generation cephalosporins, including ceftazidime, cefpodoxime, combination ceftazidime/avibactam, and the fourth-generation cephalosporin cefepime. The results of a comparative genomic analysis revealed that 179-BFC-A-HST is most closely related to Virgibacillus halophilus 5B73CT, sharing an ANI of 78.7% and a digital DNA-DNA hybridization (dDDH) value of 23.5%, while their 16S rRNA gene sequences shared 97.7% nucleotide identity. Based on these results and the recent recognition that the genus Virgibacillus is polyphyletic, strain 179-BFC-A-HST is proposed as a novel species of a novel genus, Tigheibacillus jepli gen. nov., sp. nov. (type strain 179-BFC-A-HST = DSM 115946T = NRRL B-65666T), and its closest neighbor, V. halophilus, is proposed to be reassigned to this genus as Tigheibacillus halophilus comb. nov. (type strain 5B73CT = DSM 21623T = JCM 21758T = KCTC 13935T). It was also necessary to reclassify its second closest neighbor, Virgibacillus soli, as a member of a novel genus Paracerasibacillus, reflecting its phylogenetic position relative to the genus Cerasibacillus, for which we propose Paracerasibacillus soli comb. nov. (type strain CC-YMP-6T = DSM 22952T = CCM 7714T). Within Amphibacillaceae (n = 64), P. soli exhibited 11 antibiotic resistance genes (ARG), while T. jepli encoded 3, lacking any known β-lactamases, suggesting resistance from variant penicillin-binding proteins, disrupting cephalosporin efficacy. P. soli was highly resistant to azithromycin (MIC >64 mg/L) yet susceptible to cephalosporins and penicillins.
IMPORTANCE: The significance of this research extends to understanding microbial survival and adaptation in oligotrophic environments, such as those found in SAF. Whole-genome sequencing of several strains isolated from Mars 2020 mission assembly cleanroom facilities, including the discovery of the novel species Tigheibacillus jepli, highlights the resilience and antimicrobial resistance (AMR) in clinically relevant antibiotic classes of microbes in nutrient-scarce settings. The study also redefines the taxonomic classifications within the Amphibacillaceae family, aligning genetic identities with phylogenetic data. Investigating ARG and virulence factors (VF) across these strains illuminates the microbial capability for resistance under resource-limited conditions while emphasizing the role of human-associated VF in microbial survival, informing sterilization practices and microbial management in similar oligotrophic settings beyond spacecraft assembly cleanrooms such as pharmaceutical and medical industry cleanrooms.


Subject(s)
Ceftazidime , Fatty Acids , Humans , Fatty Acids/analysis , Phylogeny , RNA, Ribosomal, 16S/genetics , Base Composition , Nucleic Acid Hybridization , Spores/chemistry , Nucleotides , DNA , DNA, Bacterial/genetics , DNA, Bacterial/chemistry , Sequence Analysis, DNA , Bacterial Typing Techniques
3.
Microbiol Resour Announc ; 13(3): e0098023, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38329355

ABSTRACT

We present six whole community shotgun metagenomic sequencing data sets of two types of biological soil crusts sampled at the ecotone of the Mojave Desert and Colorado Desert in California. These data will help us understand the diversity and function of biocrust microbial communities, which are essential for desert ecosystems.

4.
Microbiol Resour Announc ; 13(2): e0108023, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38189307

ABSTRACT

We present eight metatranscriptomic datasets of light algal and cyanolichen biological soil crusts from the Mojave Desert in response to wetting. These data will help us understand gene expression patterns in desert biocrust microbial communities after they have been reactivated by the addition of water.

5.
Nucleic Acids Res ; 52(D1): D502-D512, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37811892

ABSTRACT

The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families, whose members have no hits to proteins of reference genomes or Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, as well as 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families significantly expand (more than double) the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB.


Subject(s)
Databases, Protein , Metagenome , Proteins , Amino Acid Sequence , Databases, Factual , Ecosystem , Proteins/chemistry , Geography
6.
Nucleic Acids Res ; 52(D1): D164-D173, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37930866

ABSTRACT

Plasmids are mobile genetic elements found in many clades of Archaea and Bacteria. They drive horizontal gene transfer, impacting ecological and evolutionary processes within microbial communities, and hold substantial importance in human health and biotechnology. To support plasmid research and provide scientists with data on an unprecedented diversity of plasmid sequences, we introduce the IMG/PR database, a new resource encompassing 699 973 plasmid sequences derived from genomes, metagenomes and metatranscriptomes. IMG/PR is the first database to provide data on plasmids that were systematically identified from diverse microbiome samples. IMG/PR plasmids are associated with rich metadata that includes geographical and ecosystem information, host taxonomy, similarity to other plasmids, functional annotation, and the presence of genes involved in conjugation and antibiotic resistance. The database offers diverse methods for exploring its extensive plasmid collection, enabling users to navigate plasmids through metadata-centric queries, plasmid comparisons and BLAST searches. The web interface for IMG/PR is accessible at https://img.jgi.doe.gov/pr. Plasmid metadata and sequences can be downloaded from https://genome.jgi.doe.gov/portal/IMG_PR.


Subject(s)
Metagenome , Microbiota , Humans , Metadata , Software , Databases, Genetic , Plasmids/genetics
7.
Comput Struct Biotechnol J ; 21: 5630-5639, 2023.
Article in English | MEDLINE | ID: mdl-38047235

ABSTRACT

Structured RNAs play crucial roles in viruses, exerting influence over both viral and host gene expression. However, the extensive diversity of structured RNAs and their ability to act in cis or trans positions pose challenges for predicting and assigning their functions. While comparative genomics approaches have successfully predicted candidate structured RNAs in microbes on a large scale, similar efforts for viruses have been lacking. In this study, we screened over 5 million DNA and RNA viral sequences, resulting in the prediction of 10,006 novel candidate structured RNAs. These predictions are widely distributed across taxonomy and ecosystem. We found transcriptional evidence for 206 of these candidate structured RNAs in the human fecal microbiome. These candidate RNAs exhibited evidence of nucleotide covariation, indicative of selective pressure maintaining the predicted secondary structures. Our analysis revealed a diverse repertoire of candidate structured RNAs, encompassing a substantial number of putative tRNAs or tRNA-like structures, Rho-independent transcription terminators, and potentially cis-regulatory structures consistently positioned upstream of genes. In summary, our findings shed light on the extensive diversity of structured RNAs in viruses, offering a valuable resource for further investigations into their functional roles and implications in viral gene expression and paving the way for a deeper understanding of the intricate interplay between viruses and their hosts at the molecular level.

8.
Int J Syst Evol Microbiol ; 73(12), 2023 Dec.
Article in English | MEDLINE | ID: mdl-38108591

ABSTRACT

In this study, a Gram-stain-positive, non-motile, oxidase- and catalase-negative, rod-shaped, bacterial strain (SG_E_30_P1T) that formed light yellow colonies was isolated from a groundwater sample of Sztaravoda spring, Hungary. Based on 16S rRNA phylogenetic and phylogenomic analyses, the strain was found to form a distinct lineage within the family Microbacteriaceae. Its closest relatives in terms of near full-length 16S rRNA gene sequences are Salinibacterium hongtaonis MH299814 (97.72 % sequence similarity) and Leifsonia psychrotolerans GQ406810 (97.57 %). The novel strain grows optimally at 20-28 °C, at neutral pH and in the presence of NaCl (1-2 w/v%). Strain SG_E_30_P1T contains MK-7 and B-type peptidoglycan with diaminobutyrate as the diagnostic amino acid. The major cellular fatty acids are anteiso-C15 : 0, iso-C16 : 0 and iso-C14 : 0, and the polar lipid profile is composed of diphosphatidylglycerol and phosphatidylglycerol, as well as an unidentified aminoglycolipid, aminophospholipid and some unidentified phospholipids. The assembled draft genome is a contig with a total length of 2 897 968 bp and a DNA G+C content of 65.5 mol%. Amino acid identity values with its closest relatives with sequenced genomes of <62.54 %, as well as other genome distance results, indicate that this bacterium represents a novel genus within the family Microbacteriaceae. We suggest that SG_E_30_P1T (=DSM 111415T=NCAIM B.02656T) represents the type strain of a novel genus and species for which the name Antiquaquibacter oligotrophicus gen. nov., sp. nov. is proposed.


Subject(s)
Actinomycetales , Groundwater , Phylogeny , RNA, Ribosomal, 16S/genetics , Base Composition , Fatty Acids/chemistry , Sequence Analysis, DNA , DNA, Bacterial/genetics , Bacterial Typing Techniques , Bacteria , Amino Acids
9.
Nature ; 622(7983): 594-602, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821698

ABSTRACT

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities1,2. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database3. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.


Subject(s)
Metagenome , Metagenomics , Microbiology , Proteins , Cluster Analysis , Metagenome/genetics , Metagenomics/methods , Proteins/chemistry , Proteins/classification , Proteins/genetics , Databases, Protein , Protein Conformation
10.
Nat Biotechnol ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735266

ABSTRACT

Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.

11.
Front Bioinform ; 3: 1157956, 2023.
Article in English | MEDLINE | ID: mdl-36959975

ABSTRACT

Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.

12.
Front Microbiol ; 14: 1082107, 2023.
Article in English | MEDLINE | ID: mdl-36925474

ABSTRACT

Integrated virus genomes (prophages) are commonly found in sequenced bacterial genomes but have rarely been described in detail for rhizobial genomes. Cupriavidus taiwanensis STM 6018 is a rhizobial Betaproteobacteria strain that was isolated in 2006 from a root nodule of a Mimosa pudica host in French Guiana, South America. Here we describe features of the genome of STM 6018, focusing on the characterization of two different types of prophages that have been identified in its genome. The draft genome of STM 6018 is 6,553,639 bp, and consists of 80 scaffolds, containing 5,864 protein-coding genes and 61 RNA genes. STM 6018 contains all the nodulation and nitrogen fixation gene clusters common to symbiotic Cupriavidus species; sharing >99.97% bp identity homology to the nod/nif/noeM gene clusters from C. taiwanensis LMG19424T and "Cupriavidus neocalidonicus" STM 6070. The STM 6018 genome contains the genomes of two prophages: one complete Mu-like capsular phage and one filamentous phage, which integrates into a putative dif site. This is the first characterization of a filamentous phage found within the genome of a rhizobial strain. Further examination of sequenced rhizobial genomes identified filamentous prophage sequences in several Beta-rhizobial strains but not in any Alphaproteobacterial rhizobia.

13.
Database (Oxford) ; 2023, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36794865

ABSTRACT

The power of next-generation sequencing has resulted in an explosive growth in the number of projects aiming to understand the metagenomic diversity of complex microbial environments. The interdisciplinary nature of this microbiome research community, along with the absence of reporting standards for microbiome data and samples, poses a significant challenge for follow-up studies. Commonly used names of metagenomes and metatranscriptomes in public databases currently lack the essential information necessary to accurately describe and classify the underlying samples, which makes a comparative analysis difficult to conduct and often results in misclassified sequences in data repositories. The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute has been at the forefront of addressing this challenge by developing a standardized nomenclature system for naming microbiome samples. GOLD, currently marking its twenty-fifth anniversary, continues to enrich the research community with hundreds of thousands of metagenomes and metatranscriptomes with well-curated and easy-to-understand names. Through this manuscript, we describe the overall naming process that can be easily adopted by researchers worldwide. Additionally, we propose the use of this naming system as a best practice for the scientific community to facilitate better interoperability and reusability of microbiome data.


Subject(s)
Microbiota , Software , Microbiota/genetics , Metagenome/genetics , Metagenomics/methods , Data Management
14.
Cell ; 186(3): 646-661.e4, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36696902

ABSTRACT

Viroids and viroid-like covalently closed circular (ccc) RNAs are minimal replicators that typically encode no proteins and hijack cellular enzymes for replication. The extent and diversity of viroid-like agents are poorly understood. We developed a computational pipeline to identify viroid-like cccRNAs and applied it to 5,131 metatranscriptomes and 1,344 plant transcriptomes. The search yielded 11,378 viroid-like cccRNAs spanning 4,409 species-level clusters, a 5-fold increase compared to the previously identified viroid-like elements. Within this diverse collection, we discovered numerous putative viroids, satellite RNAs, retrozymes, and ribozy-like viruses. Diverse ribozyme combinations and unusual ribozymes within the cccRNAs were identified. Self-cleaving ribozymes were identified in ambiviruses, some mito-like viruses and capsid-encoding satellite virus-like cccRNAs. The broad presence of viroid-like cccRNAs in diverse transcriptomes and ecosystems implies that their host range is far broader than currently known, and matches to CRISPR spacers suggest that some cccRNAs replicate in prokaryotes.


Subject(s)
RNA, Catalytic , Viroids , RNA, Circular/metabolism , Viroids/genetics , Viroids/metabolism , RNA, Catalytic/genetics , RNA, Viral/genetics , RNA, Viral/metabolism , Ecosystem , Plant Diseases
15.
ISME J ; 17(3): 354-370, 2023 03.
Article in English | MEDLINE | ID: mdl-36536072

ABSTRACT

The substrates of the Brazilian campos rupestres, a grassland ecosystem, have extremely low concentrations of phosphorus and nitrogen, imposing restrictions to plant growth. Despite that, this ecosystem harbors almost 15% of the Brazilian plant diversity, raising the question of how plants acquire nutrients in such a harsh environment. Here, we set out to uncover the taxonomic profile, the compositional and functional differences and similarities, and the nutrient turnover potential of microbial communities associated with two plant species of the campos rupestres-dominant family Velloziaceae that grow over distinct substrates (soil and rock). Using amplicon sequencing data, we show that, despite the pronounced composition differentiation, the plant-associated soil and rock communities share a core of highly efficient colonizers that tends to be highly abundant and is enriched in 21 bacterial families. Functional investigation of metagenomes and 522 metagenome-assembled genomes revealed that the microorganisms found associated with plant roots are enriched in genes involved in organic compound intake, and phosphorus and nitrogen turnover. We show that the potential for phosphorus transport, mineralization, and solubilization is mostly found within bacterial families of the shared microbiome, such as Xanthobacteraceae and Bryobacteraceae. We also detected the full repertoire of nitrogen cycle-related genes and discovered a lineage of Isosphaeraceae that acquired nitrogen-fixing potential via horizontal gene transfer and might be also involved in nitrification via a metabolic handoff association with Binataceae. We highlight that plant-associated microbial populations in the campos rupestres harbor a genetic repertoire with potential to increase nutrient availability and that the microbiomes of biodiversity hotspots can reveal novel mechanisms of nutrient turnover.


Subject(s)
Ecosystem , Microbiota , Brazil , Soil Microbiology , Biodiversity , Bacteria/genetics , Bacteria/metabolism , Plants/metabolism , Soil/chemistry , Phosphorus/metabolism , Nitrogen/metabolism
16.
Nucleic Acids Res ; 51(D1): D733-D743, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36399502

ABSTRACT

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad of ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotations are complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.


Subject(s)
Databases, Genetic , Genome, Viral , Metadata , Metagenomics , Software
17.
Nucleic Acids Res ; 51(D1): D957-D963, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36318257

ABSTRACT

The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute (DOE-JGI) continues to maintain its role as one of the flagship genomic metadata repositories of the world. The ever-increasing number of projects and metadata are freely available to the user community world-wide. GOLD's metadata is consumed by scientists and remains an important source for large-scale comparative genomics analysis initiatives. Encouraged by this active user engagement and growth, GOLD has continued to add new components and capabilities. The new features such as a public Application Programming Interface (API) and Ecosystem landing page as well as the growth of different entities in this current GOLD v.9 edition are described in detail in this manuscript.


Subject(s)
Databases, Genetic , Genomics , Genome , Software
18.
Nucleic Acids Res ; 51(D1): D723-D732, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36382399

ABSTRACT

The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) at the Department of Energy (DOE) Joint Genome Institute (JGI) continues to provide support for users to perform comparative analysis of isolate and single cell genomes, metagenomes, and metatranscriptomes. In addition to datasets produced by the JGI, IMG v.7 also includes datasets imported from public sources such as NCBI Genbank, SRA, and the DOE National Microbiome Data Collaborative (NMDC), or submitted by external users. In the past couple of years, we have continued our effort to help the user community by improving the annotation pipeline, upgrading the contents with new reference database versions, and adding new analysis functionalities such as advanced scaffold search, Average Nucleotide Identity (ANI) for high-quality metagenome bins, new cassette search, improved gene neighborhood display, and improvements to metatranscriptome data display and analysis. We also extended the collaboration and integration efforts with other DOE-funded projects such as NMDC and DOE Biology Knowledgebase (KBase).


Subject(s)
Data Management , Genomics , Genome, Bacterial , Software , Genome, Archaeal , Databases, Genetic , Metagenome
19.
Microbiol Resour Announc ; 11(11): e0062022, 2022 Nov 17.
Article in English | MEDLINE | ID: mdl-36259954

ABSTRACT

We report here the genome sequences of three Aquimarina megaterium strains isolated from the octocoral Eunicella labiata. We reveal a coding potential for versatile carbon metabolism and biosynthesis of natural products belonging to the polyketide, nonribosomal peptide, and terpene compound classes.
