ABSTRACT
Urbanizing global populations spend over 90% of their time indoors, where microbiome abundance and diversity are low. Chronic exposure to microbiomes with low abundance and diversity has demonstrated negative long-term impacts on human health. Sequencing-based analyses of environmental nucleic acids are critical to understanding the impact of the indoor microbiome on human health; however, low DNA yields indoors, alongside sample collection and processing inconsistencies, currently challenge study replicability. This study presents a comparative assessment of a novel, passive, easily replicable sampling strategy using polydimethylsiloxane (PDMS) sheets alongside a representative swab-based collection protocol. Deployable, customizable PDMS films designed for whole-sample insertion into standardized extraction kits demonstrated 43% higher DNA yields per sample, and 76% higher yields per cm² of sampler, than swab-based protocols. These results indicate that this accessible, scalable method enables sufficient DNA collection to comprehensively evaluate indoor microbiome exposures and potential human health impacts using smaller, more space-efficient samplers, representing an attractive alternative to swab-based collection. In addition, this process reduces the manual steps required for microbiome sampling, which could address inter-study variability, transform the current microbiome sampling paradigm, and ultimately benefit the replicability and accessibility of microbiome exposure studies.
Subjects
Microbiota , Microbiota/genetics , Humans , Specimen Handling/methods , Dimethylpolysiloxanes/chemistry , DNA, Bacterial/genetics , DNA, Bacterial/analysis , DNA, Bacterial/isolation & purification , Environmental Monitoring/methods , Environmental Microbiology
ABSTRACT
BACKGROUND: Over half of the world's population lives in urban areas with, according to the United Nations, nearly 70% expected to live in cities by 2050. Our cities are built by and for humans, but are also complex, adaptive biological systems involving a diversity of other living species. The majority of these species are invisible and constitute the city's microbiome. Our design decisions for the built environment shape these invisible populations, and as inhabitants we interact with them on a constant basis. A growing body of evidence shows us that human health and well-being are dependent on these interactions. Indeed, multicellular organisms owe meaningful aspects of their development and phenotype to interactions with the microorganisms (bacteria or fungi) with which they live in continual exchange and symbiosis. Therefore, it is meaningful to establish microbial maps of the cities we inhabit. While the processing and sequencing of environmental microbiome samples can be high-throughput, gathering samples is still labor- and time-intensive, and can require mobilizing large numbers of volunteers to get a snapshot of the microbial landscape of a city. RESULTS: Here we postulate that honeybees may be effective collaborators in gathering samples of urban microbiota, as they forage daily within a 2-mile radius of their hive. We describe the results of a pilot study conducted with three rooftop beehives in Brooklyn, NY, where we evaluated the potential of various hive materials (honey, debris, hive swabs, bee bodies) to reveal information as to the surrounding metagenomic landscape, and where we conclude that the bee debris are the richest substrate. Based on these results, we profiled 4 additional cities through collected hive debris: Sydney, Melbourne, Venice and Tokyo. We show that each city displays a unique metagenomic profile as seen by honeybees. These profiles yield information relevant to hive health such as known bee symbionts and pathogens.
Additionally, we show that this method can be used for human pathogen surveillance, with a proof-of-concept example in which we recover the majority of virulence factor genes for Rickettsia felis, a pathogen known to be responsible for "cat scratch fever". CONCLUSIONS: We show that this method yields information relevant to hive health and human health, providing a strategy to monitor environmental microbiomes on a city scale. Here we present the results of this study, and discuss them in terms of architectural implications, as well as the potential of this method for epidemic surveillance.
ABSTRACT
The recent increase in publicly available metagenomic datasets with geospatial metadata has made it possible to determine location-specific microbial fingerprints from around the world. Such fingerprints can be useful for comparing microbial niches for environmental research, as well as for applications within forensic science and public health. To determine the regional specificity for environmental metagenomes, we examined 4305 shotgun-sequenced samples from the MetaSUB Consortium dataset, the most extensive public collection of urban microbiomes, spanning 60 different cities, 30 countries, and 6 continents. We were able to identify city-specific microbial fingerprints using supervised machine learning (SML) on the taxonomic classifications, and we also compared the performance of ten SML classifiers. We then further evaluated the five algorithms with the highest accuracy, with city and continental accuracy ranging from 85-89% and 90-94%, respectively. Thereafter, we used these results to develop Cassandra, a random-forest-based classifier that identifies bioindicator species to aid in fingerprinting and can infer higher-order microbial interactions at each site. We further tested the Cassandra algorithm on the Tara Oceans dataset, the largest collection of marine-based microbial genomes, where it classified the oceanic sample locations with 83% accuracy. These results and code show the utility of SML methods and Cassandra to identify bioindicator species across both oceanic and urban environments, which can help guide ongoing efforts in biotracing, environmental monitoring, and microbial forensics (MF).
Subjects
Metagenomics , Microbiota , Metagenomics/methods , Metagenome , Microbiota/genetics , Supervised Machine Learning , Cities
ABSTRACT
Flooding is expected to increase due to intensification of extreme precipitation events, sea-level rise, and urbanization. Low-cost water level sensors have the ability to fill a critical data gap on the presence, depth, and duration of street-level floods by measuring flood profiles (i.e., flood stage hydrographs) in real-time with a time interval on the order of minutes. Hyperlocal flood data collected by low-cost sensors have many use cases for a variety of stakeholders including municipal agencies, community members, and researchers. Here we outline examples of potential uses of flood sensor data before, during, and after flood events, based on dialog with stakeholders in New York City. These uses include inputs to predictive flood models, generation of real-time flood alerts for community members and emergency response teams, storm recovery assistance and cataloging of storm impacts, and informing infrastructure design and investment for long-term flood resilience project planning.
Subjects
Floods , Urbanization
ABSTRACT
Following publication of the original article [1], the authors would like to highlight the following two corrections.
ABSTRACT
BACKGROUND: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. RESULTS: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. CONCLUSIONS: This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
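The evaluation metrics and two of the filtering strategies named in this abstract (abundance filtering and tool intersection) can be sketched in a few lines. This is only an illustrative sketch: the species lists, abundances, tool names, and the 5% threshold below are hypothetical toy values, not data or parameters from the study, and the helper functions are not part of any benchmarked tool.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) of a predicted species set against a known truth set."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


def abundance_filter(profile, min_fraction):
    """Keep only taxa whose relative abundance is at least min_fraction."""
    return {taxon for taxon, frac in profile.items() if frac >= min_fraction}


# Hypothetical truth set and classifier outputs (taxon -> relative abundance).
truth = {"Escherichia coli", "Bacillus subtilis", "Staphylococcus aureus"}
tool_a = {
    "Escherichia coli": 0.50,
    "Bacillus subtilis": 0.30,
    "Staphylococcus aureus": 0.15,
    "Shigella flexneri": 0.04,   # false positive at low abundance
    "Bacillus cereus": 0.01,     # false positive at low abundance
}
tool_b = {
    "Escherichia coli": 0.55,
    "Bacillus subtilis": 0.25,
    "Staphylococcus aureus": 0.15,
    "Staphylococcus epidermidis": 0.05,  # false positive unique to tool B
}

raw_a = precision_recall(tool_a, truth)
filtered_a = precision_recall(abundance_filter(tool_a, 0.05), truth)
intersected = precision_recall(set(tool_a) & set(tool_b), truth)

print("tool A, unfiltered:      ", raw_a)
print("tool A, abundance >= 5%: ", filtered_a)
print("tool A and B intersected:", intersected)
```

In this toy case both strategies remove every false positive without losing recall. The abstract reports that on real environmental samples these strategies were often insufficient to eliminate false positives completely, so the clean result here reflects the idealized inputs, not expected performance.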
Subjects
Benchmarking/methods , Contig Mapping/methods , DNA Barcoding, Taxonomic/methods , Metagenome , Sequence Analysis, DNA/methods , Software , Benchmarking/standards , Contig Mapping/standards , DNA Barcoding, Taxonomic/standards , Humans , Microbiota , Phylogeny , Sequence Analysis, DNA/standards
ABSTRACT
Genomic stability depends on faithful genome replication. This is achieved by the concerted activity of thousands of DNA replication origins (ORIs) scattered throughout the genome. The DNA and chromatin features determining ORI specification are not presently known. We have generated a high-resolution genome-wide map of 3230 ORIs in cultured Arabidopsis thaliana cells. Here, we focused on defining the features associated with ORIs in heterochromatin. In pericentromeric gene-poor domains, ORIs associate almost exclusively with the retrotransposon class of transposable elements (TEs), in particular of the Gypsy family. ORI activity in retrotransposons occurs independently of TE expression and while maintaining high levels of H3K9me2 and H3K27me1, typical marks of repressed heterochromatin. ORI-TEs largely colocalize with chromatin signatures defining GC-rich heterochromatin. Importantly, TEs with active ORIs have a higher local GC content than TEs lacking them. Our results lead us to conclude that ORI colocalization with retrotransposons is determined by their transposition mechanism based on transcription, and a specific chromatin landscape. Our detailed analysis of ORIs responsible for heterochromatin replication has implications for the mechanisms of ORI specification in other multicellular organisms in which retrotransposons are major components of heterochromatin and of the entire genome.
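The comparison of local GC content between TEs with and without ORIs rests on a simple quantity, the fraction of G and C bases in a sequence window. A minimal sketch follows; the window size, step, and example sequence are hypothetical and chosen for illustration only, and this is not the authors' analysis pipeline.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """GC content of each full-length window, sliding along seq by step bases."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical 8-bp sequence with a GC-rich half and an AT-rich half.
example = "GGCCAATT"
print(gc_content(example))         # whole-sequence GC fraction
print(windowed_gc(example, 4, 4))  # per-window GC fraction
```

Comparing the window values, not just the whole-sequence average, is what makes a "local" GC difference visible: the example averages 50% GC overall while its two halves differ completely.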
Subjects
Arabidopsis/genetics , DNA Replication , Heterochromatin/genetics , Replication Origin/genetics , Retroelements/genetics , Arabidopsis/cytology , Arabidopsis/metabolism , Cell Line , Chromatin/genetics , Chromatin/metabolism , Chromosome Mapping , DNA, Plant/genetics , DNA, Plant/metabolism , GC Rich Sequence/genetics , Genome, Plant/genetics , Heterochromatin/metabolism , Histones/metabolism , Lysine/metabolism , Methylation , Microscopy, Confocal , Reverse Transcriptase Polymerase Chain Reaction , Transcription, Genetic
ABSTRACT
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Subjects
Benchmarking , Genome, Human , Exome , Genomics , Humans , INDEL Mutation
ABSTRACT
The panoply of microorganisms and other species present in our environment influence human health and disease, especially in cities, but have not been profiled with metagenomics at a city-wide scale. We sequenced DNA from surfaces across the entire New York City (NYC) subway system, the Gowanus Canal, and public parks. Nearly half of the DNA (48%) does not match any known organism; identified organisms spanned 1,688 bacterial, viral, archaeal, and eukaryotic taxa, which were enriched for harmless genera associated with skin (e.g., Acinetobacter). Predicted ancestry of human DNA left on subway surfaces can recapitulate U.S. Census demographic data, and bacterial signatures can reveal a station's history, such as marine-associated bacteria in a hurricane-flooded station. Some evidence of pathogens was found (Bacillus anthracis), but a lack of reported cases in NYC suggests that the pathogens represent a normal, urban microbiome. This baseline metagenomic map of NYC could help long-term disease surveillance, bioterrorism threat mitigation, and health management in the built environment of cities.
ABSTRACT
BACKGROUND: Transposable elements are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to different diseases including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. RESULTS: Here we present Jitterbug, a novel tool that identifies transposable element insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by frequently applied alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and able to recall transposon insertions with a very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes, and validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. CONCLUSIONS: We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows for estimating the cancer cell fraction of clones containing a somatic TE insertion. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold standard dataset for benchmarking structural variant prediction tools.
Subjects
Computational Biology/methods , DNA Transposable Elements , Genomics/methods , Germ Cells/metabolism , Mutagenesis, Insertional , Algorithms , Arabidopsis/genetics , Computer Simulation , Genome, Human , Homozygote , Humans , Neoplasms/genetics , Polymorphism, Genetic , Reproducibility of Results , Software
ABSTRACT
Transposons are mobile genetic elements that are found in nearly all organisms, including humans. Mobilization of DNA transposons by transposase enzymes can cause genomic rearrangements, but our knowledge of human genes derived from transposases is limited. In this study, we find that the protein encoded by human PGBD5, the most evolutionarily conserved transposable element-derived gene in vertebrates, can induce stereotypical cut-and-paste DNA transposition in human cells. Genomic integration activity of PGBD5 requires distinct aspartic acid residues in its transposase domain, and specific DNA sequences containing inverted terminal repeats with similarity to piggyBac transposons. DNA transposition catalyzed by PGBD5 in human cells occurs genome-wide, with precise transposon excision and preference for insertion at TTAA sites. The apparent conservation of DNA transposition activity by PGBD5 suggests that genomic remodeling contributes to its biological function.
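The insertion preference for TTAA sites described above can be illustrated by locating candidate target sites in a sequence. The sketch below is an assumption-laden illustration, not the authors' method: the function name and example sequence are hypothetical. Because TTAA is its own reverse complement, scanning one strand is enough to find every site.

```python
def find_ttaa_sites(sequence):
    """Return 0-based start positions of every TTAA tetranucleotide in sequence."""
    seq = sequence.upper()
    sites = []
    start = seq.find("TTAA")
    while start != -1:
        sites.append(start)
        # Advance by one base so that any overlapping matches are not skipped.
        start = seq.find("TTAA", start + 1)
    return sites


# Hypothetical 17-bp example sequence containing three TTAA sites.
example = "GATTAACCTTAATTAAG"
print(find_ttaa_sites(example))
```

Counting such sites in a genomic window gives the number of candidate targets, which is the natural denominator when asking whether observed insertions are enriched at TTAA relative to chance.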
Subjects
DNA Transposable Elements , Recombination, Genetic , Transposases/metabolism , Humans , Substrate Specificity
ABSTRACT
The availability of extensive databases of crop genome sequences should allow analysis of crop variability at an unprecedented scale, which should have an important impact in plant breeding. However, up to now the analysis of genetic variability at the whole-genome scale has been mainly restricted to single nucleotide polymorphisms (SNPs). This is a strong limitation, as structural variation (SV) and transposon insertion polymorphisms are frequent in plant species and have had an important mutational role in crop domestication and breeding. Here, we present the first comprehensive analysis of melon genetic diversity, which includes a detailed analysis of SNPs, SV, and transposon insertion polymorphisms. The variability found among seven melon varieties, representing the species diversity and including wild accessions and highly bred lines, is relatively high, due in part to the marked divergence of some lineages. The diversity is distributed nonuniformly across the genome, being lower at the extremes of the chromosomes and higher in the pericentromeric regions, which is compatible with the effect of purifying selection and recombination forces over functional regions. Additionally, this variability is greatly reduced among elite varieties, probably due to selection during breeding. We have found some chromosomal regions showing a high differentiation of the elite varieties versus the rest, which could be considered as strongly selected candidate regions. Our data also suggest that transposons and SV may be at the origin of an important fraction of the variability in melon, which highlights the importance of analyzing all types of genetic variability to understand crop genome evolution.
Subjects
Cucurbitaceae/genetics , DNA Transposable Elements/genetics , Evolution, Molecular , Genome, Plant , Mutagenesis, Insertional/genetics , Polymorphism, Single Nucleotide/genetics , Cucumis sativus/genetics , Gene Deletion , Genetic Loci , Nucleotides/genetics , Phylogeny , Selection, Genetic
ABSTRACT
Transposable elements (TEs) are major players in genome evolution. The effects of their movement vary from gene knockouts to more subtle effects such as changes in gene expression. It has recently been shown that TEs may contain transcription factor binding sites (TFBSs), and it has been proposed that they may rewire new genes into existing transcriptional networks. However, little is known about the dynamics of this process and its effect on transcription factor binding. Here we show that TEs have extensively amplified the number of sequences that match the E2F TFBS during Brassica speciation, and, as a result, as many as 85% of the sequences that fit the E2F TFBS consensus are within TEs in some Brassica species. We show that these sequences found within TEs bind E2Fa in vivo, which indicates a direct effect of these TEs on E2F-mediated gene regulation. Our results suggest that the TEs located close to genes may directly participate in gene promoters, whereas those located far from genes may have an indirect effect by diluting the effective amount of E2F protein able to bind to its cognate promoters. These results illustrate an extreme case of the effect of TEs in TFBS evolution, and suggest a singular way by which they affect host genes by modulating essential transcriptional networks.
Subjects
Brassica/genetics , DNA Transposable Elements/genetics , E2F Transcription Factors/genetics , Gene Expression Regulation, Plant , Genome, Plant/genetics , Base Sequence , Binding Sites , Evolution, Molecular , Gene Amplification , Genetic Speciation , Inverted Repeat Sequences/genetics , Molecular Sequence Data , Plant Proteins/genetics , Promoter Regions, Genetic/genetics
ABSTRACT
In the large Cucurbitaceae genus Cucumis, cucumber (C. sativus) is the only species with 2n = 2x = 14 chromosomes. The majority of the remaining species, including melon (C. melo) and the sister species of cucumber, C. hystrix, have 2n = 2x = 24 chromosomes, implying a reduction from n = 12 to n = 7. To understand the underlying mechanisms, we investigated chromosome synteny among cucumber, C. hystrix and melon using integrated and complementary approaches. We identified 14 inversions and a C. hystrix lineage-specific reciprocal inversion between C. hystrix and melon. The results reveal the location and orientation of 53 C. hystrix syntenic blocks on the seven cucumber chromosomes, and allow us to infer at least 59 chromosome rearrangement events that led to the seven cucumber chromosomes, including five fusions, four translocations, and 50 inversions. The 12 inferred chromosomes (AK1-AK12) of an ancestor similar to melon and C. hystrix had strikingly different evolutionary fates, with cucumber chromosome C1 apparently resulting from insertion of chromosome AK12 into the centromeric region of translocated AK2/AK8, cucumber chromosome C3 originating from a Robertsonian-like translocation between AK4 and AK6, and cucumber chromosome C5 originating from fusion of AK9 and AK10. Chromosomes C2, C4 and C6 were the result of complex reshuffling of syntenic blocks from three (AK3, AK5 and AK11), three (AK5, AK7 and AK8) and five (AK2, AK3, AK5, AK8 and AK11) ancestral chromosomes, respectively, through 33 fusion, translocation and inversion events. Previous results (Huang, S., Li, R., Zhang, Z. et al., , Nat. Genet. 41, 1275-1281; Li, D., Cuevas, H.E., Yang, L., Li, Y., Garcia-Mas, J., Zalapa, J., Staub, J.E., Luan, F., Reddy, U., He, X., Gong, Z., Weng, Y. 2011a, BMC Genomics, 12, 396) showing that cucumber C7 stayed largely intact during the entire evolution of Cucumis are supported. 
Results from this study allow a fine-scale understanding of the mechanisms of dysploid chromosome reduction that has not been achieved previously.
Subjects
Chromosomes, Plant/genetics , Cucumis/genetics , Genome, Plant/genetics , Synteny/genetics , Chromosome Mapping , Cucumis/cytology , Gene Rearrangement , High-Throughput Nucleotide Sequencing , In Situ Hybridization, Fluorescence , Models, Genetic , Phylogeny , Ploidies , Sequence Analysis, DNA , Species Specificity
ABSTRACT
We report the genome sequence of melon, an important horticultural crop worldwide. We assembled 375 Mb of the double-haploid line DHL92, representing 83.3% of the estimated melon genome. We predicted 27,427 protein-coding genes, which we analyzed by reconstructing 22,218 phylogenetic trees, allowing mapping of the orthology and paralogy relationships of sequenced plant genomes. We observed the absence of recent whole-genome duplications in the melon lineage since the ancient eudicot triplication, and our data suggest that transposon amplification may in part explain the increased size of the melon genome compared with the close relative cucumber. A low number of nucleotide-binding site-leucine-rich repeat disease resistance genes were annotated, suggesting the existence of specific defense mechanisms in this species. The DHL92 genome was compared with that of its parental lines allowing the quantification of sequence variability in the species. The use of the genome sequence in future investigations will facilitate the understanding of evolution of cucurbits and the improvement of breeding strategies.
Subjects
Biological Evolution , Cucumis melo/genetics , Genome, Plant/genetics , Phylogeny , Base Sequence , Chromosome Mapping , Chromosomes, Artificial, Bacterial/genetics , DNA Transposable Elements/genetics , Disease Resistance/genetics , Genes, Duplicate/genetics , Genes, Plant/genetics , Genomics/methods , Likelihood Functions , Models, Genetic , Molecular Sequence Annotation , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA
ABSTRACT
Retrotransposons' high capacity for mutagenesis is a threat that genomes need to control tightly. Transcriptional gene silencing is a general and highly effective control of retrotransposon expression. Yet, some retrotransposons manage to transpose and proliferate in plant genomes, suggesting that, as shown for plant viruses, retrotransposons can escape silencing. However, no evidence of retrotransposon silencing escape has been reported. Here we analyze the silencing control of the tobacco Tnt1 retrotransposon and report that even though constructs driven by the Tnt1 promoter become silenced when stably integrated in tobacco, the endogenous Tnt1 elements remain active. Silencing of Tnt1-containing transgenes correlates with high DNA methylation and the inability to incorporate H2A.Z into their promoters, whereas the endogenous Tnt1 elements remain partially methylated at asymmetrical positions and incorporate H2A.Z upon induction. Our results show that the promoter of Tnt1 is a target of silencing in tobacco, but also that endogenous Tnt1 elements can escape this control and be expressed in their natural host.
Subjects
Gene Silencing , Nicotiana/genetics , Retroelements/genetics , Chromatin/metabolism , DNA Methylation , Epigenesis, Genetic , Gene Expression Regulation , Gene Order , Histones/metabolism , Methyltransferases/metabolism , Promoter Regions, Genetic , Stress, Physiological/genetics , Nicotiana/metabolism , Transcription, Genetic
ABSTRACT
BACKGROUND: Cucumis melo (melon) belongs to the Cucurbitaceae family, whose economic importance among horticulture crops is second only to Solanaceae. Melon has a high intra-specific genetic variation, morphologic diversity and a small genome size (454 Mb), which make it suitable for a great variety of molecular and genetic studies. A number of genetic and genomic resources have already been developed, such as several genetic maps, BAC genomic libraries, a BAC-based physical map and EST collections. Sequence information would be invaluable to complete the picture of the melon genomic landscape, furthering our understanding of this species' evolution from its relatives and providing an important genetic tool. However, to this day there is little sequence data available; only a few melon genes and genomic regions are deposited in public databases. The development of massively parallel sequencing methods makes it possible to envisage new strategies to obtain long fragments of genomic sequence at higher speed and lower cost than previous Sanger-based methods. RESULTS: In order to gain insight into the structure of a significant portion of the melon genome, we set out to perform massive sequencing of pools of BAC clones. For this, a set of 57 BAC clones from a double haploid line was sequenced in two pools with the 454 system using both shotgun and paired-end approaches. The final assembly consists of an estimated 95% of the actual size of the melon BAC clones, with most likely complete sequences for 50 of the BACs, and a total sequence coverage of 39x. The accuracy of the assembly was assessed by comparing the previously available Sanger sequence of one of the BACs against its 454 sequence, and the polymorphisms found involved only 1.7 differences every 10,000 bp that were localized in 15 homopolymeric regions and two dinucleotide tandem repeats. Overall, the study provides approximately 6.7 Mb or 1.5% of the melon genome.
The analysis of this new data has allowed us to gain further insight into characteristics of the melon genome such as gene density, average protein length, or microsatellite and transposon content. The annotation of the BAC sequences revealed a high degree of collinearity and protein sequence identity between melon and its close relative Cucumis sativus (cucumber). Transposon content analysis of the syntenic regions suggests that transposition activity after the split of both cucurbit species has been low in cucumber but very high in melon. CONCLUSIONS: The results presented here show that the strategy followed, which combines shotgun and BAC-end sequencing together with anchored marker information, is an excellent method for sequencing specific genomic regions, especially from relatively compact genomes such as that of melon. However, in agreement with other results, this map-based, BAC approach is confirmed to be an expensive way of sequencing a whole plant genome. Our results also provide a partial description of the melon genome's structure. Namely, our analysis shows that the melon genome is highly collinear with the smaller one of cucumber, the size difference being mainly due to the expansion of intergenic regions and proliferation of transposable elements.
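The headline figures in this abstract can be checked with simple arithmetic. The short sketch below uses only numbers quoted in the abstract (454 Mb genome size, 6.7 Mb sequenced, 1.7 differences per 10,000 bp); the variable names are illustrative, and reading the difference rate as a per-base identity is an interpretation, not a value the authors report.

```python
genome_size_mb = 454.0   # estimated melon genome size quoted in the abstract
sequenced_mb = 6.7       # total sequence provided by the BAC pools

# Fraction of the genome covered by the new sequence.
genome_fraction = sequenced_mb / genome_size_mb

# 454-versus-Sanger discrepancy rate, expressed per base and as percent identity.
differences_per_bp = 1.7 / 10_000
percent_identity = (1 - differences_per_bp) * 100

print(f"{genome_fraction * 100:.1f}% of the genome")
print(f"{percent_identity:.3f}% identity to the Sanger sequence")
```

The first value reproduces the "1.5% of the melon genome" stated in the abstract.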