Results 1 - 20 of 457
1.
Cell ; 185(21): 4023-4037.e18, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36174579

ABSTRACT

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.


Subject(s)
Bacteriophages , RNA Viruses , Bacteriophages/genetics , DNA-Directed RNA Polymerases/genetics , Genome, Viral , Phylogeny , RNA , RNA Viruses/genetics , RNA-Dependent RNA Polymerase/genetics , Virome
2.
Cell ; 178(5): 1245-1259.e14, 2019 Aug 22.
Article in English | MEDLINE | ID: mdl-31402174

ABSTRACT

Small proteins are traditionally overlooked due to computational and experimental difficulties in detecting them. To systematically identify small proteins, we carried out a comparative genomics study on 1,773 human-associated metagenomes from four different body sites. We describe >4,000 conserved protein families, the majority of which are novel; ∼30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain and almost half are not represented in reference genomes. We identify putative housekeeping, mammalian-specific, defense-related, and protein families that are likely to be horizontally transferred. We provide evidence of transcription and translation for a subset of these families. Our study suggests that small proteins are highly abundant and those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.


Subject(s)
Microbiota , Proteins/metabolism , Amino Acid Sequence , Cell Communication , Host-Pathogen Interactions , Humans , Metagenome , Open Reading Frames/genetics , Proteins/chemistry , Ribosomal Proteins/chemistry , Ribosomal Proteins/metabolism , Sequence Alignment
3.
Nature ; 622(7983): 594-602, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821698

ABSTRACT

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.


Subject(s)
Metagenome , Metagenomics , Microbiology , Proteins , Cluster Analysis , Metagenome/genetics , Metagenomics/methods , Proteins/chemistry , Proteins/classification , Proteins/genetics , Databases, Protein , Protein Conformation
4.
Mol Cell ; 79(3): 416-424.e5, 2020 Aug 06.
Article in English | MEDLINE | ID: mdl-32645367

ABSTRACT

CRISPR-Cas12c/d proteins share limited homology with Cas12a and Cas9 bacterial CRISPR RNA (crRNA)-guided nucleases used widely for genome editing and DNA detection. However, Cas12c (C2c3)- and Cas12d (CasY)-catalyzed DNA cleavage and genome editing activities have not been directly observed. We show here that a short-complementarity untranslated RNA (scoutRNA), together with crRNA, is required for Cas12d-catalyzed DNA cutting. The scoutRNA differs in secondary structure from previously described tracrRNAs used by CRISPR-Cas9 and some Cas12 enzymes, and in Cas12d-containing systems, scoutRNA includes a conserved five-nucleotide sequence that is essential for activity. In addition to supporting crRNA-directed DNA recognition, biochemical and cell-based experiments establish scoutRNA as an essential cofactor for Cas12c-catalyzed pre-crRNA maturation. These results define scoutRNA as a third type of transcript encoded by a subset of CRISPR-Cas genomic loci and explain how Cas12c/d systems avoid requirements for host factors including ribonuclease III for bacterial RNA-mediated adaptive immunity.


Subject(s)
Bacteria/genetics , Bacterial Proteins/genetics , CRISPR-Cas Systems , Endodeoxyribonucleases/genetics , Genome, Bacterial/immunology , RNA, Bacterial/genetics , RNA, Small Untranslated/genetics , Bacteria/classification , Bacteria/immunology , Bacteria/metabolism , Bacterial Proteins/metabolism , Base Sequence , Clustered Regularly Interspaced Short Palindromic Repeats , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Endodeoxyribonucleases/metabolism , Escherichia coli/genetics , Escherichia coli/immunology , Escherichia coli/metabolism , Nucleic Acid Conformation , Phylogeny , RNA, Bacterial/chemistry , RNA, Bacterial/metabolism , RNA, Guide, Kinetoplastida/genetics , RNA, Guide, Kinetoplastida/metabolism , RNA, Small Untranslated/chemistry , RNA, Small Untranslated/metabolism , Sequence Alignment , Sequence Homology, Nucleic Acid
5.
Mol Cell ; 73(4): 727-737.e3, 2019 Feb 21.
Article in English | MEDLINE | ID: mdl-30709710

ABSTRACT

CRISPR-Cas immunity requires integration of short, foreign DNA fragments into the host genome at the CRISPR locus, a site consisting of alternating repeat sequences and foreign-derived spacers. In most CRISPR systems, the proteins Cas1 and Cas2 form the integration complex and are both essential for DNA acquisition. Most type V-C and V-D systems lack the cas2 gene and have unusually short CRISPR repeats and spacers. Here, we show that a mini-integrase comprising the type V-C Cas1 protein alone catalyzes DNA integration with a preference for short (17- to 19-base-pair) DNA fragments. The mini-integrase has weak specificity for the CRISPR array. We present evidence that the Cas1 proteins form a tetramer for integration. Our findings support a model of a minimal integrase with an internal ruler mechanism that favors shorter repeats and spacers. This minimal integrase may represent the function of the ancestral Cas1 prior to Cas2 adoption.


Subject(s)
CRISPR-Associated Proteins/genetics , CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats , DNA, Bacterial/genetics , Endodeoxyribonucleases/genetics , Endonucleases/genetics , Escherichia coli Proteins/genetics , Escherichia coli/genetics , Gene Editing/methods , Integrases/genetics , Base Pairing , CRISPR-Associated Proteins/metabolism , DNA, Bacterial/metabolism , Endodeoxyribonucleases/metabolism , Endonucleases/metabolism , Escherichia coli/enzymology , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Integrases/metabolism , Nucleotide Motifs , Substrate Specificity
6.
Nature ; 578(7795): 432-436, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31968354

ABSTRACT

Our current knowledge about nucleocytoplasmic large DNA viruses (NCLDVs) is largely derived from viral isolates that are co-cultivated with protists and algae. Here we reconstructed 2,074 NCLDV genomes from sampling sites across the globe by building on the rapidly increasing amount of publicly available metagenome data. This led to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. Analysis of 58,023 major capsid proteins from large and giant viruses using metagenomic data revealed the global distribution patterns and cosmopolitan nature of these viruses. The discovered viral genomes encoded a wide range of proteins with putative roles in photosynthesis and diverse substrate transport processes, indicating that host reprogramming is probably a common strategy in the NCLDVs. Furthermore, inferences of horizontal gene transfer connected viral lineages to diverse eukaryotic hosts. We anticipate that the global diversity of NCLDVs that we describe here will establish giant viruses, which are associated with most major eukaryotic lineages, as important players in ecosystems across Earth's biomes.


Subject(s)
Biodiversity , DNA Viruses/classification , DNA Viruses/genetics , Eukaryotic Cells/metabolism , Eukaryotic Cells/virology , Host Microbial Interactions/genetics , Metagenomics , Animals , Capsid Proteins/genetics , Gene Transfer, Horizontal , Genome, Viral/genetics , Giant Viruses/classification , Giant Viruses/genetics , Phylogeny
7.
Nucleic Acids Res ; 52(D1): D502-D512, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37811892

ABSTRACT

The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families, whose members have no hits to proteins of reference genomes or Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, as well as 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families significantly expand (more than double) the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB.


Subject(s)
Databases, Protein , Metagenome , Proteins , Amino Acid Sequence , Databases, Factual , Ecosystem , Proteins/chemistry , Geography
8.
Nucleic Acids Res ; 52(D1): D164-D173, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37930866

ABSTRACT

Plasmids are mobile genetic elements found in many clades of Archaea and Bacteria. They drive horizontal gene transfer, impacting ecological and evolutionary processes within microbial communities, and hold substantial importance in human health and biotechnology. To support plasmid research and provide scientists with data of an unprecedented diversity of plasmid sequences, we introduce the IMG/PR database, a new resource encompassing 699 973 plasmid sequences derived from genomes, metagenomes and metatranscriptomes. IMG/PR is the first database to provide data on plasmids that were systematically identified from diverse microbiome samples. IMG/PR plasmids are associated with rich metadata that includes geographical and ecosystem information, host taxonomy, similarity to other plasmids, functional annotation, and the presence of genes involved in conjugation and antibiotic resistance. The database offers diverse methods for exploring its extensive plasmid collection, enabling users to navigate plasmids through metadata-centric queries, plasmid comparisons and BLAST searches. The web interface for IMG/PR is accessible at https://img.jgi.doe.gov/pr. Plasmid metadata and sequences can be downloaded from https://genome.jgi.doe.gov/portal/IMG_PR.


Subject(s)
Metagenome , Microbiota , Humans , Metadata , Software , Databases, Genetic , Plasmids/genetics
10.
Nature ; 568(7753): 505-510, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30867587

ABSTRACT

The genome sequences of many species of the human gut microbiome remain unknown, largely owing to challenges in cultivating microorganisms under laboratory conditions. Here we address this problem by reconstructing 60,664 draft prokaryotic genomes from 3,810 faecal metagenomes, from geographically and phenotypically diverse humans. These genomes provide reference points for 2,058 newly identified species-level operational taxonomic units (OTUs), which represents a 50% increase over the previously known phylogenetic diversity of sequenced gut bacteria. On average, the newly identified OTUs comprise 33% of richness and 28% of species abundance per individual, and are enriched in humans from rural populations. A meta-analysis of clinical gut-microbiome studies pinpointed numerous disease associations for the newly identified OTUs, which have the potential to improve predictive models. Finally, our analysis revealed that uncultured gut species have undergone genome reduction that has resulted in the loss of certain biosynthetic pathways, which may offer clues for improving cultivation strategies in the future.


Subject(s)
Bacteria/classification , Bacteria/genetics , Gastrointestinal Microbiome/genetics , Genome, Bacterial/genetics , Metagenome/genetics , Bacteria/growth & development , Bacteria/isolation & purification , Bacterial Physiological Phenomena/genetics , Biosynthetic Pathways/genetics , Disease , Feces/microbiology , Gastrointestinal Microbiome/physiology , Genomics , Geographic Mapping , Humans , Phylogeny , Rural Population , Species Specificity
11.
Nucleic Acids Res ; 51(D1): D957-D963, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36318257

ABSTRACT

The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute (DOE-JGI) continues to maintain its role as one of the flagship genomic metadata repositories of the world. The ever-increasing number of projects and metadata are freely available to the user community world-wide. GOLD's metadata is consumed by scientists and remains an important source for large-scale comparative genomics analysis initiatives. Encouraged by this active user engagement and growth, GOLD has continued to add new components and capabilities. The new features such as a public Application Programming Interface (API) and Ecosystem landing page as well as the growth of different entities in this current GOLD v.9 edition are described in detail in this manuscript.


Subject(s)
Databases, Genetic , Genomics , Genome , Software
12.
Nucleic Acids Res ; 51(D1): D723-D732, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36382399

ABSTRACT

The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) at the Department of Energy (DOE) Joint Genome Institute (JGI) continues to provide support for users to perform comparative analysis of isolate and single cell genomes, metagenomes, and metatranscriptomes. In addition to datasets produced by the JGI, IMG v.7 also includes datasets imported from public sources such as NCBI Genbank, SRA, and the DOE National Microbiome Data Collaborative (NMDC), or submitted by external users. In the past couple of years, we have continued our effort to help the user community by improving the annotation pipeline, upgrading the contents with new reference database versions, and adding new analysis functionalities such as advanced scaffold search, Average Nucleotide Identity (ANI) for high-quality metagenome bins, new cassette search, improved gene neighborhood display, and improvements to metatranscriptome data display and analysis. We also extended the collaboration and integration efforts with other DOE-funded projects such as NMDC and DOE Biology Knowledgebase (KBase).


Subject(s)
Data Management , Genomics , Genome, Bacterial , Software , Genome, Archaeal , Databases, Genetic , Metagenome
13.
Nucleic Acids Res ; 51(D1): D733-D743, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36399502

ABSTRACT

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad of ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotations are complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.


Subject(s)
Databases, Genetic , Genome, Viral , Metadata , Metagenomics , Software
14.
Int J Syst Evol Microbiol ; 73(12), 2023 Dec.
Article in English | MEDLINE | ID: mdl-38108591

ABSTRACT

In this study, a Gram-stain-positive, non-motile, oxidase- and catalase-negative, rod-shaped, bacterial strain (SG_E_30_P1T) that formed light yellow colonies was isolated from a groundwater sample of Sztaravoda spring, Hungary. Based on 16S rRNA phylogenetic and phylogenomic analyses, the strain was found to form a distinct lineage within the family Microbacteriaceae. Its closest relatives in terms of near full-length 16S rRNA gene sequences are Salinibacterium hongtaonis MH299814 (97.72 % sequence similarity) and Leifsonia psychrotolerans GQ406810 (97.57 %). The novel strain grows optimally at 20-28 °C, at neutral pH and in the presence of NaCl (1-2 %, w/v). Strain SG_E_30_P1T contains MK-7 and B-type peptidoglycan with diaminobutyrate as the diagnostic amino acid. The major cellular fatty acids are anteiso-C15 : 0, iso-C16 : 0 and iso-C14 : 0, and the polar lipid profile is composed of diphosphatidylglycerol and phosphatidylglycerol, as well as an unidentified aminoglycolipid, aminophospholipid and some unidentified phospholipids. The assembled draft genome is a contig with a total length of 2 897 968 bp and a DNA G+C content of 65.5 mol%. Amino acid identity values with its closest relatives with sequenced genomes of <62.54 %, as well as other genome distance results, indicate that this bacterium represents a novel genus within the family Microbacteriaceae. We suggest that SG_E_30_P1T (=DSM 111415T=NCAIM B.02656T) represents the type strain of a novel genus and species for which the name Antiquaquibacter oligotrophicus gen. nov., sp. nov. is proposed.


Subject(s)
Actinomycetales , Groundwater , Phylogeny , RNA, Ribosomal, 16S/genetics , Base Composition , Fatty Acids/chemistry , Sequence Analysis, DNA , DNA, Bacterial/genetics , Bacterial Typing Techniques , Bacteria , Amino Acids
15.
Nucleic Acids Res ; 49(D1): D723-D733, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33152092

ABSTRACT

The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) is a manually curated, daily updated collection of genome projects and their metadata accumulated from around the world. The current version of the database includes over 1.17 million entries organized broadly into Studies (45 770), Organisms (387 382) or Biosamples (101 207), Sequencing Projects (355 364) and Analysis Projects (283 481). These four levels contain over 600 metadata fields, which include 76 controlled vocabulary (CV) tables containing 3873 terms. GOLD provides an interactive web user interface for browsing and searching by a wide range of project and metadata fields. Users can enter details about their own projects in GOLD, which acts as a gatekeeper to ensure that metadata is accurately documented before submitting sequence information to the Integrated Microbial Genomes (IMG) system for analysis. In order to maintain a reference dataset for use by members of the scientific community, GOLD also imports projects from public repositories such as GenBank and SRA. The current status of the database, along with recent updates and improvements, is described in this manuscript.


Subject(s)
Databases, Genetic , Genome , Ecosystem , Gene Ontology , Search Engine , Sequence Analysis, DNA
16.
Nucleic Acids Res ; 49(D1): D764-D775, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33137183

ABSTRACT

Viruses are integral components of all ecosystems and microbiomes on Earth. Through pervasive infections of their cellular hosts, viruses can reshape microbial community structure and drive global nutrient cycling. Over the past decade, viral sequences identified from genomes and metagenomes have provided an unprecedented view of viral genome diversity in nature. Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. Here, we present the third version of IMG/VR, composed of 18 373 cultivated and 2 314 329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. These clustered into 935 362 viral Operational Taxonomic Units (vOTUs), including 188 930 with two or more members. UViGs in IMG/VR are now reported as single viral contigs, integrated proviruses or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity. IMG/VR v3 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.


Subject(s)
Databases, Genetic , Ecosystem , Evolution, Molecular , Genome, Viral , Viruses/genetics , Base Sequence , Cluster Analysis , Geography , Molecular Sequence Annotation , Sequence Homology, Nucleic Acid , User-Computer Interface
17.
Nucleic Acids Res ; 49(D1): D751-D763, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33119741

ABSTRACT

The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) contains annotated isolate genome and metagenome datasets sequenced at the DOE's Joint Genome Institute (JGI), submitted by external users, or imported from public sources such as NCBI. IMG v 6.0 includes advanced search functions and a new tool for statistical analysis of mixed sets of genomes and metagenome bins. The new IMG web user interface also has a new Help page with additional documentation and webinar tutorials to help users better understand how to use various IMG functions and tools for their research. New datasets have been processed with the prokaryotic annotation pipeline v.5, which includes extended protein family assignments.


Subject(s)
Data Analysis , Data Management , Databases, Genetic , Genome, Archaeal , Genome, Microbial , Metagenome , RNA, Ribosomal, 16S/genetics , Search Engine
18.
Bioinformatics ; 37(13): 1805-1813, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33471063

ABSTRACT

MOTIVATION: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets. RESULTS: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. On the RefSeq database, VPF-Class reported an accuracy of nearly 100% in classifying dsDNA, ssDNA and retroviruses at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. On the prophage dataset, the accuracy in host prediction was 86%, considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified. AVAILABILITY AND IMPLEMENTATION: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
Int J Syst Evol Microbiol ; 72(12), 2022 Dec.
Article in English | MEDLINE | ID: mdl-36748409

ABSTRACT

Bacterial strain A52C2T was isolated from the endophytic microbial community of a Pinus pinaster tree trunk and characterized. Strain A52C2T stained Gram-negative and formed rod-shaped cells that grew optimally at 30 °C and at pH 6.0-7.0. The G+C content of the DNA was 65.1 mol %. The respiratory quinone was ubiquinone 10, and the major fatty acids were cyclo-C19:0 ω8c and C18:0, representing 70.1 % of the total fatty acids. Phylogenetic analyses based on the 16S rRNA gene sequences placed strain A52C2T in a distinct lineage within the order Hyphomicrobiales, family Pleomorphomonadaceae. The 16S rRNA gene sequence similarities of A52C2T to those of Mongoliimonas terrestris and Oharaeibacter diazotrophicus were 93.15 and 93.2 %, respectively. The draft genome sequence of strain A52C2T comprises 4 196 045 bases with a 195-fold mapped coverage of the genome. The assembled genome consists of 43 contigs of more than 1 000 bp (N50 contig size was 209 720 bp). The genome encodes 4033 putative coding sequences. The phylogenetic, phenotypic and chemotaxonomic data showed that strain A52C2T (=UCCCB 130T=CECT 8949T=LMG 29042T) represents the type strain of a novel species and genus, for which we propose the name Faunimonas pinastri gen. nov., sp. nov.


Subject(s)
Alphaproteobacteria , Pinus , Fatty Acids/chemistry , Phospholipids/chemistry , Endophytes , Pinus/microbiology , Phylogeny , RNA, Ribosomal, 16S/genetics , DNA, Bacterial/genetics , Base Composition , Sequence Analysis, DNA , Bacterial Typing Techniques
20.
RNA Biol ; 19(1): 678-685, 2022.
Article in English | MEDLINE | ID: mdl-35491944

ABSTRACT

Noncoding RNAs with secondary structures play important roles in CRISPR-Cas systems. Many of these structures likely remain undiscovered. We used a large-scale comparative genomics approach to predict 156 novel candidate structured RNAs from 36,111 CRISPR-Cas systems. A number of these were found to overlap with coding genes, including palindromic candidates that overlapped with a variety of Cas genes in type I and III systems. Among these 156 candidates, we identified 46 new models of CRISPR direct repeats and 1 tracrRNA. This tracrRNA model occasionally overlapped with predicted cas9 coding regions, emphasizing the importance of expanding our search windows for novel structured RNAs in coding regions. We also demonstrated that the antirepeat sequence in this tracrRNA model can be used to accurately assign thousands of predicted CRISPR arrays to type II-C systems. This study highlights the importance of unbiased identification of candidate structured RNAs across CRISPR-Cas systems.


Subject(s)
CRISPR-Cas Systems , RNA , Genomics , Operon , RNA/genetics , Repetitive Sequences, Nucleic Acid