ABSTRACT
Small proteins are traditionally overlooked due to computational and experimental difficulties in detecting them. To systematically identify small proteins, we carried out a comparative genomics study on 1,773 human-associated metagenomes from four different body sites. We describe >4,000 conserved protein families, the majority of which are novel; ~30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain and almost half are not represented in reference genomes. We identify putative housekeeping, mammalian-specific, and defense-related protein families, as well as protein families that are likely to be horizontally transferred. We provide evidence of transcription and translation for a subset of these families. Our study suggests that small proteins are highly abundant and that those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.
Subject(s)
Microbiota , Proteins/metabolism , Amino Acid Sequence , Cell Communication , Host-Pathogen Interactions , Humans , Metagenome , Open Reading Frames/genetics , Proteins/chemistry , Ribosomal Proteins/chemistry , Ribosomal Proteins/metabolism , Sequence Alignment
ABSTRACT
Microbial source tracking (MST) identifies sources of fecal contamination in the environment using host-associated fecal markers. While numerous bacterial MST markers are available, there are few such viral markers. Here, we designed and tested novel viral MST markers based on tomato brown rugose fruit virus (ToBRFV) genomes. We assembled eight nearly complete genomes of ToBRFV from wastewater and stool samples from the San Francisco Bay Area in the United States. Next, we developed two novel probe-based reverse transcription-PCR (RT-PCR) assays based on conserved regions of the ToBRFV genome and tested the markers' sensitivities and specificities using human and non-human animal stool as well as wastewater. The ToBRFV markers are sensitive and specific; in human stool and wastewater, they are more prevalent and abundant than a commonly used viral marker, the pepper mild mottle virus (PMMoV) coat protein (CP) gene. We used the assays to detect fecal contamination in urban stormwater samples and found that the ToBRFV markers matched cross-assembly phage (crAssphage), an established viral MST marker, in prevalence across samples. Taken together, these results indicate that ToBRFV is a promising viral human-associated MST marker. IMPORTANCE Human exposure to fecal contamination in the environment can cause transmission of infectious diseases. Microbial source tracking (MST) can identify sources of fecal contamination so that contamination can be remediated and human exposures can be reduced. MST requires the use of host-associated MST markers. Here, we designed and tested novel MST markers from genomes of tomato brown rugose fruit virus (ToBRFV). The markers are sensitive and specific to human stool and highly abundant in human stool and wastewater samples.
Subject(s)
Solanum lycopersicum , Wastewater , Animals , Fruit , Biomarkers , Feces/microbiology , Environmental Monitoring/methods
ABSTRACT
Noncoding RNAs with secondary structures play important roles in CRISPR-Cas systems. Many of these structures likely remain undiscovered. We used a large-scale comparative genomics approach to predict 156 novel candidate structured RNAs from 36,111 CRISPR-Cas systems. A number of these were found to overlap with coding genes, including palindromic candidates that overlapped with a variety of Cas genes in type I and III systems. Among these 156 candidates, we identified 46 new models of CRISPR direct repeats and 1 tracrRNA. This tracrRNA model occasionally overlapped with predicted cas9 coding regions, emphasizing the importance of expanding our search windows for novel structured RNAs in coding regions. We also demonstrated that the antirepeat sequence in this tracrRNA model can be used to accurately assign thousands of predicted CRISPR arrays to type II-C systems. This study highlights the importance of unbiased identification of candidate structured RNAs across CRISPR-Cas systems.
Subject(s)
CRISPR-Cas Systems , RNA , Genomics , Operon , RNA/genetics , Repetitive Sequences, Nucleic Acid
ABSTRACT
In Drosophila, myoblast fusion is a conserved process in which founder cells (FCs) and fusion competent myoblasts (FCMs) fuse to form a syncytial muscle fiber. Mutants for the myogenic regulator Myocyte enhancer factor-2 (MEF2) show a failure of myoblast fusion, indicating that MEF2 regulates the fusion process. Indeed, chromatin immunoprecipitation studies show that several genes involved in myoblast fusion are bound by MEF2 during embryogenesis. Of these, the MARVEL domain gene singles bar (sing) is down-regulated in MEF2 knockdown pupae and has five consensus MEF2 binding sites within a 9000-bp region. To determine if MEF2 is an essential and direct regulator of sing during pupal muscle development, we identified a 315-bp myoblast enhancer of sing. This enhancer was active during myoblast fusion, and mutation of two MEF2 sites significantly decreased enhancer activity. We show that lack of sing expression resulted in adult lethality and muscle loss, due to a failure of fusion during the pupal stage. Additionally, we sought to determine if sing was required in either FCs or FCMs to support fusion. Interestingly, knockdown of sing in either population did not significantly affect fusion; however, knockdown in both FCs and FCMs resulted in muscles with significantly reduced nuclei numbers, provisionally indicating that sing function is required in either cell type, but not both. Finally, we found that MEF2 regulated sing expression at the embryonic stage through the same 315-bp enhancer, indicating that sing is a MEF2 target at both critical stages of myoblast fusion. Our studies define for the first time how MEF2 directly controls fusion at multiple stages of the life cycle, and provide further evidence that the mechanisms of fusion characterized in Drosophila embryos are also used in the formation of the more complex adult muscles.
Subject(s)
Drosophila Proteins/genetics , Drosophila/embryology , Membrane Proteins/genetics , Myoblasts/cytology , Myogenic Regulatory Factors/genetics , Transcriptional Activation/genetics , Animals , Animals, Genetically Modified , Base Sequence , Binding Sites/genetics , Cell Fusion , Cell Nucleus/genetics , Chromatin Immunoprecipitation , DNA-Binding Proteins/biosynthesis , DNA-Binding Proteins/genetics , Electrophoretic Mobility Shift Assay , Gene Expression Regulation, Developmental , Giant Cells/cytology , MARVEL Domain-Containing Proteins/biosynthesis , MARVEL Domain-Containing Proteins/genetics , Molecular Sequence Data , Muscle Development/genetics , Muscle Fibers, Skeletal , RNA Interference , RNA, Small Interfering
ABSTRACT
Structured RNAs play crucial roles in viruses, exerting influence over both viral and host gene expression. However, the extensive diversity of structured RNAs and their ability to act in cis or trans positions pose challenges for predicting and assigning their functions. While comparative genomics approaches have successfully predicted candidate structured RNAs in microbes on a large scale, similar efforts for viruses have been lacking. In this study, we screened over 5 million DNA and RNA viral sequences, resulting in the prediction of 10,006 novel candidate structured RNAs. These predictions are widely distributed across taxonomy and ecosystem. We found transcriptional evidence for 206 of these candidate structured RNAs in the human fecal microbiome. These candidate RNAs exhibited evidence of nucleotide covariation, indicative of selective pressure maintaining the predicted secondary structures. Our analysis revealed a diverse repertoire of candidate structured RNAs, encompassing a substantial number of putative tRNAs or tRNA-like structures, Rho-independent transcription terminators, and potentially cis-regulatory structures consistently positioned upstream of genes. In summary, our findings shed light on the extensive diversity of structured RNAs in viruses, offering a valuable resource for further investigations into their functional roles and implications in viral gene expression, and paving the way for a deeper understanding of the intricate interplay between viruses and their hosts at the molecular level.
ABSTRACT
Microbial source tracking (MST) identifies sources of fecal contamination in the environment using fecal host-associated markers. While there are numerous bacterial MST markers, there are few viral markers. Here we design and test novel viral MST markers based on tomato brown rugose fruit virus (ToBRFV) genomes. We assembled eight nearly complete genomes of ToBRFV from wastewater and stool samples from the San Francisco Bay Area in the United States of America. Next, we developed two novel probe-based RT-PCR assays based on conserved regions of the ToBRFV genome, and tested the markers' sensitivities and specificities using human and non-human animal stool as well as wastewater. The ToBRFV markers are sensitive and specific; in human stool and wastewater, they are more prevalent and abundant than a currently used marker, the pepper mild mottle virus (PMMoV) coat protein (CP) gene. We applied the assays to detect fecal contamination in urban stormwater samples and found that the ToBRFV markers matched cross-assembly phage (crAssphage), an established viral MST marker, in prevalence across samples. Taken together, these results indicate that ToBRFV is a promising viral human-associated MST marker. Importance: Human exposure to fecal contamination in the environment can cause transmission of infectious diseases. Microbial source tracking (MST) can identify sources of fecal contamination so that contamination can be remediated and human exposures can be reduced. MST requires the use of fecal host-associated MST markers. Here we design and test novel MST markers from genomes of tomato brown rugose fruit virus (ToBRFV). The markers are sensitive and specific to human stool, and highly abundant in human stool and wastewater samples.
ABSTRACT
Small genes (<150 nucleotides) have been systematically overlooked in phage genomes. We employ a large-scale comparative genomics approach to predict >40,000 small-gene families in ~2.3 million phage genome contigs. We find that small genes in phage genomes are approximately 3-fold more prevalent than in host prokaryotic genomes. Our approach enriches for small genes that are translated in microbiomes, suggesting the small genes identified are coding. More than 9,000 families encode potentially secreted or transmembrane proteins, more than 5,000 families encode predicted anti-CRISPR proteins, and more than 500 families encode predicted antimicrobial proteins. By combining homology and genomic-neighborhood analyses, we reveal substantial novelty and diversity within phage biology, including small phage genes found in multiple host phyla, small genes encoding proteins that play essential roles in host infection, and small genes that share genomic neighborhoods and whose encoded proteins may share related functions.
Subject(s)
Bacteriophages , Microbiota , Bacteriophages/genetics , Genome, Viral/genetics , Genomics , Phylogeny
ABSTRACT
BACKGROUND: Structured RNAs play varied bioregulatory roles within microbes. To date, hundreds of candidate structured RNAs have been predicted using informatic approaches that search for motif structures in genomic sequence data. The human microbiome contains thousands of species and strains of microbes. Yet, much of the metagenomic data from the human microbiome remains unmined for structured RNA motifs, primarily due to computational limitations. RESULTS: We sought to apply a large-scale, comparative genomics approach to these organisms to identify candidate structured RNAs. With a carefully constructed, though computationally intensive, automated analysis, we identify 3161 conserved candidate structured RNAs in intergenic regions, as well as 2022 additional candidate structured RNAs that may overlap coding regions. We validate the RNA expression of 177 of these candidate structures by analyzing small fragment RNA-seq data from four human fecal samples. CONCLUSIONS: This approach identifies a wide variety of candidate structured RNAs, including tmRNAs, antitoxins, and likely ribosomal protein leaders, from a wide variety of taxa. Overall, our pipeline enables conservative predictions of thousands of novel candidate structured RNAs from human microbiomes.
Subject(s)
Genomics , Metagenome , Metagenomics , Microbiota , Computational Biology/methods , Gastrointestinal Microbiome , Genomics/methods , Humans , Metagenomics/methods , Nucleic Acid Conformation , RNA , Workflow
ABSTRACT
Ribosome profiling enables sequencing of ribosome-bound fragments of RNA, revealing which transcripts are being translated as well as the position of ribosomes along mRNAs. Although ribosome profiling has been applied to cultured bacterial isolates, its application to uncultured, mixed communities has been challenging. We present MetaRibo-Seq, a protocol that enables the application of ribosome profiling directly to the human fecal microbiome. MetaRibo-Seq is a benchmarked method that includes several modifications to existing ribosome profiling protocols, specifically addressing challenges involving fecal sample storage, purity and input requirements. We also provide a computational workflow to quality control and trim reads, de novo assemble a reference metagenome with metagenomic reads, align MetaRibo-Seq reads to the reference, and assess MetaRibo-Seq library quality (https://github.com/bhattlab/bhattlab_workflows/tree/master/metariboseq). This MetaRibo-Seq protocol enables researchers in standard molecular biology laboratories to study translation in the fecal microbiome in ~5 d.
Subject(s)
Microbiota , Ribosomes , Feces , Humans , Sequence Analysis, RNA
ABSTRACT
Ribosome profiling (Ribo-Seq) is a powerful method to study translation in bacteria. However, Ribo-Seq signal can be observed across RNAs that one would not expect to be bound by ribosomes. For example, Escherichia coli Ribo-Seq libraries also capture reads from most noncoding RNAs (ncRNAs). While some of these ncRNAs may overlap coding regions, this alone does not explain the majority of observed signal across ncRNAs. These fragments of ncRNAs in Ribo-Seq data pass all size selection steps of the Ribo-Seq protocol and survive hours of micrococcal nuclease (MNase) treatment. In this work, we specifically focus on Ribo-Seq signal across ncRNAs and provide evidence to suggest that RNA structure, as opposed to ribosome binding, protects them from degradation and allows them to persist in the Ribo-Seq sequencing library preparation. By inspecting these "contaminant reads" in bacterial Ribo-Seq, we show that data previously disregarded in bacterial Ribo-Seq experiments may, in fact, be used to gain partial information regarding the in vivo secondary structure of ncRNAs. IMPORTANCE Structured ncRNAs are pivotal mediators of bioregulation in bacteria, and their functions are often reliant on their specific structures. Here, we first inspect Ribo-Seq reads across noncoding regions, identifying contaminant reads in these libraries. We observe that contaminant reads in bacterial Ribo-Seq experiments that are often disregarded, in fact, strongly overlap with structured regions of ncRNAs. We then perform several bioinformatic analyses to determine why these contaminant reads may persist in Ribo-Seq libraries. Finally, we highlight some structured RNA contaminants in Ribo-Seq and support the hypothesis that structures in the RNA protect them from MNase digestion. We conclude that researchers should be cautious when interpreting Ribo-Seq signal as coding without considering signal distribution. These findings also may enable us to partially resolve RNA structures, identify novel structured RNAs, and elucidate RNA structure-function relationships in bacteria at a large scale and in vivo through the reanalysis of existing Ribo-Seq data sets.
Subject(s)
Escherichia coli/genetics , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Ribosomes/genetics , Computational Biology , Gene Expression Profiling , Sequence Analysis, RNA
ABSTRACT
No method exists to measure large-scale translation of genes in uncultured organisms in microbiomes. To overcome this limitation, we develop MetaRibo-Seq, a method for simultaneous ribosome profiling of tens to hundreds of organisms in microbiome samples. MetaRibo-Seq was benchmarked against gold-standard Ribo-Seq in a mock microbial community and applied to five different human fecal samples. Unlike RNA-Seq, Ribo-Seq signal of a predicted gene suggests it encodes a translated protein. We demonstrate two applications of this technique: First, MetaRibo-Seq identifies small genes, whose identification until now has been challenging. For example, MetaRibo-Seq identifies 2,091 translated, previously unannotated small protein families from five fecal samples, more than doubling the number of small proteins predicted to exist in this niche. Second, the combined application of RNA-Seq and MetaRibo-Seq identifies differences in the translation of transcripts. In summary, MetaRibo-Seq enables comprehensive translational profiling in microbiomes and identifies previously unannotated small proteins.
Subject(s)
Microbiota/genetics , RNA-Seq/methods , Metagenomics , Protein Biosynthesis/genetics
ABSTRACT
In this issue of Cell Host & Microbe, Caballero et al. (2017) define a precise, limited consortium of commensal bacteria that restores resistance to colonization by clinically vexing vancomycin-resistant Enterococcus species.
Subject(s)
Bacteria/growth & development , Microbial Consortia/physiology , Symbiosis/physiology , Vancomycin-Resistant Enterococci/growth & development , Colony Count, Microbial , Drug Resistance, Multiple, Bacterial , Gram-Positive Bacterial Infections/microbiology , Vancomycin Resistance
ABSTRACT
Background. Five neuroinvasive Bacillus cereus infections (4 fatal) occurred in hospitalized patients with acute myelogenous leukemia (AML) during a 9-month period, prompting an investigation by infection control and public health officials. Methods. Medical records of case-patients were reviewed and a matched case-control study was performed. Infection control practices were observed. Multiple environmental, food, and medication samples common to AML patients were cultured. Multilocus sequence typing was performed for case and environmental B cereus isolates. Results. All 5 case-patients received chemotherapy and had early-onset neutropenic fevers that resolved with empiric antibiotics. Fever recurred at a median of 17 days (range, 9-20) with headaches and abrupt neurological deterioration. Case-patients had B cereus identified in central nervous system (CNS) samples by (1) polymerase chain reaction or culture or (2) bacilli seen on CNS pathology stains with high-grade B cereus bacteremia. Two case-patients also had colonic ulcers with abundant bacilli on autopsy. No infection control breaches were observed. On case-control analysis, bananas were the only significant exposure shared by all 5 case-patients (odds ratio, 9.3; P = .04). Five environmental or food isolates tested positive for B cereus, including a homogenized banana peel isolate and the shelf of a kitchen cart where bananas were stored. Multilocus sequence typing confirmed that all case and environmental strains were genetically distinct. Multilocus sequence typing-based phylogenetic analysis revealed that the organisms clustered in 2 separate clades. Conclusions. The investigation of this neuroinvasive B cereus cluster did not identify a single point source but was suggestive of a possible dietary exposure. Our experience underscores the potential virulence of B cereus in immunocompromised hosts.