ABSTRACT
SUMMARY: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget supports slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices from other sequence-based genomics assays such as ATAC-seq and ChIP-seq. AVAILABILITY AND IMPLEMENTATION: https://ga4gh-rnaseq.github.io/schema/docs/index.html.
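The core operation the summary describes is slicing a features × samples matrix down to the requested rows and columns. The sketch below illustrates that idea in plain Python, assuming a small in-memory dense matrix. The class, method, feature, and sample names are hypothetical. They do not reproduce RNAget's endpoints or query parameters, which are defined in the specification at the URL above.

```python
# Hypothetical illustration of matrix slicing; this is NOT the RNAget API itself.
from typing import List, Optional, Sequence


class ExpressionMatrix:
    """Dense quantification matrix: rows are features, columns are samples."""

    def __init__(self, features: Sequence[str], samples: Sequence[str],
                 values: Sequence[Sequence[float]]) -> None:
        if len(values) != len(features):
            raise ValueError("need exactly one row of values per feature")
        if any(len(row) != len(samples) for row in values):
            raise ValueError("each row needs exactly one value per sample")
        self.features: List[str] = list(features)
        self.samples: List[str] = list(samples)
        self.values: List[List[float]] = [list(row) for row in values]

    def slice(self, features: Optional[Sequence[str]] = None,
              samples: Optional[Sequence[str]] = None) -> "ExpressionMatrix":
        """Return the sub-matrix for the requested IDs (None keeps a whole axis).

        Output order follows the request; an unknown ID raises KeyError.
        """
        keep_f = self.features if features is None else list(features)
        keep_s = self.samples if samples is None else list(samples)
        row_of = {f: i for i, f in enumerate(self.features)}
        col_of = {s: j for j, s in enumerate(self.samples)}
        sub = [[self.values[row_of[f]][col_of[s]] for s in keep_s] for f in keep_f]
        return ExpressionMatrix(keep_f, keep_s, sub)


if __name__ == "__main__":
    # Made-up feature and sample IDs, for illustration only.
    m = ExpressionMatrix(["GENE_A", "GENE_B", "GENE_C"], ["s1", "s2"],
                         [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    sub = m.slice(features=["GENE_C", "GENE_A"], samples=["s2"])
    print(sub.features, sub.samples, sub.values)
```

Returning only the requested slice is what lets a client work with a large shared matrix without downloading all of it.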
Subject(s)
RNA; Software; Genomics; Genome; Sequence Analysis, RNA
ABSTRACT
Coevolution between transposable elements (TEs) and their hosts can be antagonistic, where TEs evolve to avoid silencing and the host responds by reestablishing TE suppression, or mutualistic, where TEs are co-opted to benefit their host. The TART-A TE functions as an important component of Drosophila telomeres but has also reportedly inserted into the Drosophila melanogaster nuclear export factor gene nxf2. We find that, rather than inserting into nxf2, TART-A has actually captured a portion of nxf2 sequence. We show that TART-A produces abundant Piwi-interacting small RNAs (piRNAs), some of which are antisense to the nxf2 transcript, and that the TART-like region of nxf2 is evolving rapidly. Furthermore, in D. melanogaster, TART-A is present at higher copy numbers, and nxf2 shows reduced expression, compared to the closely related species Drosophila simulans. We propose that capturing nxf2 sequence allowed TART-A to target the nxf2 gene for piRNA-mediated repression and that these two elements are engaged in antagonistic coevolution even though TART-A serves a critical role for its host genome.
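The claim that some TART-A piRNAs are antisense to nxf2 depends on strand orientation. A small RNA is antisense to a transcript when its reverse complement occurs in that transcript. The minimal sketch below shows that test. It assumes exact matches only; real piRNA analyses use aligners that tolerate mismatches. All sequences are made up.

```python
# Minimal strand-orientation check; exact matching only (an assumption).
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Upper-case a DNA or RNA sequence and write U as T."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement, returned as DNA."""
    return to_dna(seq).translate(_COMPLEMENT)[::-1]


def orientation(read: str, transcript: str) -> str:
    """Classify an exact match of `read` against `transcript` by strand."""
    target = to_dna(transcript)
    sense = to_dna(read) in target
    antisense = reverse_complement(read) in target
    if sense and antisense:
        return "both"  # e.g. a palindromic read
    if sense:
        return "sense"
    if antisense:
        return "antisense"
    return "none"


def tally(reads: Iterable[str], transcript: str) -> Counter:
    """Count reads in each orientation class."""
    return Counter(orientation(r, transcript) for r in reads)


if __name__ == "__main__":
    transcript = "ATGGCCATTGTAATGGGCCGC"  # hypothetical transcript
    reads = ["GCCAUUGUA", "AUUACAAUG", "GGGGGGGG"]  # hypothetical small RNAs
    print(tally(reads, transcript))
```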
Subject(s)
DNA Transposable Elements/genetics; Drosophila Proteins/genetics; RNA, Small Interfering/genetics; Animals; Drosophila Proteins/metabolism; Drosophila melanogaster; Evolution, Molecular; Long Interspersed Nucleotide Elements; Nucleocytoplasmic Transport Proteins/genetics; Nucleocytoplasmic Transport Proteins/metabolism; RNA, Small Interfering/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Telomere/genetics; Telomere/metabolism
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
Subject(s)
DNA/genetics; Databases, Genetic; Genome, Human; Software; Animals; Genomics; Humans; Mice
ABSTRACT
The use of host nutrients to support pathogen growth is central to disease. We addressed the relationship between metabolism and trophic behavior by comparing metabolic gene expression during potato tuber colonization by two oomycetes, the hemibiotroph Phytophthora infestans and the necrotroph Pythium ultimum. Genes for several pathways, including amino acid, nucleotide, and cofactor biosynthesis, were expressed more highly by Ph. infestans during its biotrophic stage than by Py. ultimum. In contrast, Py. ultimum had higher expression of genes for metabolizing compounds that are normally sequestered within plant cells but released to the pathogen upon plant cell lysis, such as starch and triacylglycerides. The transcription pattern of metabolic genes in Ph. infestans during late infection became more like that of Py. ultimum, consistent with the former's transition to necrotrophy. Interspecific variation in metabolic gene content was limited but included the presence of γ-amylase only in Py. ultimum. The pathogens were also found to employ strikingly distinct strategies for using nitrate. Measurements of mRNA, ¹⁵N labeling studies, enzyme assays, and immunoblotting indicated that the assimilation pathway in Ph. infestans was nitrate-insensitive but induced during amino acid and ammonium starvation. In contrast, the pathway was nitrate-induced but not amino acid-repressed in Py. ultimum. The lack of amino acid repression in Py. ultimum appears to be due to the absence of a transcription factor common to fungi and Phytophthora that acts as a nitrogen metabolite repressor. Evidence for functional diversification in nitrate reductase protein was also observed: its temperature optimum was adapted to each organism's growth range, and its Km was much lower in Py. ultimum. In summary, we observed divergence in patterns of gene expression, gene content, and enzyme function which contribute to the fitness of each species in its niche.
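The abstract reports a much lower Km for nitrate reductase in Py. ultimum. The abstract does not state which kinetic model was fitted. Assuming Michaelis-Menten kinetics, Km is the substrate concentration that gives half the maximal rate. A lower Km therefore means the enzyme runs closer to full speed at low nitrate concentrations:

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad
v\Big|_{[S] = K_m} = \frac{V_{\max}\,K_m}{2K_m} = \frac{V_{\max}}{2}, \qquad
v \approx \frac{V_{\max}}{K_m}\,[S] \quad \text{for } [S] \ll K_m
```

In the low-substrate regime the rate scales with V_max/K_m. Other things being equal, a lower K_m therefore favors nitrate reduction when nitrate is scarce.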
Subject(s)
Fungal Proteins/genetics; Glucan 1,4-alpha-Glucosidase/metabolism; Nutrients/metabolism; Phytophthora/genetics; Plant Diseases/parasitology; Plant Tubers/metabolism; Solanum tuberosum/metabolism; Adaptation, Physiological; Evolution, Molecular; Fungal Proteins/metabolism; Gene Expression Profiling; Gene Expression Regulation, Fungal; Host-Parasite Interactions/genetics; Phytophthora/classification; Phytophthora/physiology; Plant Diseases/genetics; Plant Tubers/growth & development; Plant Tubers/parasitology; Solanum tuberosum/growth & development; Solanum tuberosum/parasitology
ABSTRACT
Flagellated spores play important roles in the infection of plants and animals by many eukaryotic microbes. The oomycete Phytophthora infestans, which causes potato blight, expresses two phosphagen kinases (PKs). These enzymes store energy in taurocyamine and are hypothesized to resolve spatial and temporal imbalances between rates of ATP production and use in zoospores. A dimeric PK is found at low levels in vegetative mycelia but at high levels in ungerminated sporangia and zoospores. In contrast, a monomeric PK protein is present at similar levels in all tissues, although it is transcribed primarily in mycelia. Subcellular localization studies indicate that the monomeric PK is mitochondrial. The dimeric PK, however, is cytoplasmic in mycelia and sporangia but is retargeted to flagellar axonemes during zoosporogenesis. This supports a model in which PKs shuttle energy from mitochondria to and through flagella. Metabolite analysis indicates that deployment of the flagellar PK is coordinated with a large increase in taurocyamine, synthesized by sporulation-induced enzymes that were lost during the evolution of zoospore-lacking oomycetes. Thus, PK function is enabled by coordination of the transcriptional, metabolic, and protein targeting machinery during the life cycle. Since plants lack PKs, the enzymes may be useful targets for inhibitors of oomycete plant pathogens.
Subject(s)
Flagella/enzymology; Gene Expression Regulation/physiology; Phosphotransferases/metabolism; Phytophthora infestans/enzymology; Spores/enzymology; Adenosine Triphosphate/metabolism; Animals; Cytoplasm/enzymology; Solanum lycopersicum/genetics; Solanum lycopersicum/parasitology; Mitochondria/metabolism; Phosphotransferases/genetics; Phytophthora infestans/genetics; Sporangia/enzymology; Taurine/analogs & derivatives; Taurine/metabolism
ABSTRACT
BACKGROUND: The quick and accurate identification of viruses is essential for plant disease management. Next-generation sequencing (NGS) technology may allow the discovery, detection, and identification of plant pathogens. This study adopted RNA-sequencing (RNA-Seq) technology to explore the viruses in three potato plants (S3, S4, and S6) growing under field conditions. RESULTS: Viruses known to infect potato, such as alfalfa mosaic virus (AMV), potato leafroll virus (PLRV), and potato virus Y (PVY), were identified using bioinformatics programs and validated using RT-PCR. The presence of these potato viruses was also confirmed by visual inspection of host symptoms. In addition, the nearly complete genome of PLRV and the complete or partial genome sequences of multipartite virus segments were identified. Besides the three major potato viruses that BLASTn analysis detected in our samples, BLASTx analysis showed that some reads derive from other potato viruses, such as potato virus V (PVV), Andean potato latent virus (APLV), and tomato chlorosis virus (ToCV), which are not frequently reported in potato field screenings in Egypt. Other microbial agents, such as bacteria and fungi, were also identified in the examined sample sequences. Some mycovirus sequences derived from ourmia-like viruses and Alternaria alternata chrysovirus were also identified in sample S4, confirming the complexity of the potato microbiome under field conditions. CONCLUSION: NGS quickly and accurately identifies potato plant viruses under field conditions. Implementing this technology on a larger scale is recommended to explore potato fields and imported plants, where symptoms may be absent, nonspecific, or only triggered under certain conditions.
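The abstract does not say how the BLASTn and BLASTx hits were summarized. One common approach, sketched here as an assumption, works on BLAST's tabular output (-outfmt 6, whose 12 default columns are listed in the code). It keeps each read's best hit by bit score after an e-value cutoff and then counts reads per subject sequence. The cutoff and the subject IDs are illustrative, not the study's settings.

```python
# Hypothetical summary of BLAST tabular (-outfmt 6) hits: best hit per read, then counts.
import csv
from collections import Counter
from typing import Dict, Iterable, Tuple

# The 12 default -outfmt 6 columns, in order.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Map each read to its highest-bitscore subject among hits passing the e-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        score = float(rec["bitscore"])
        read = rec["qseqid"]
        if read not in best or score > best[read][1]:
            best[read] = (rec["sseqid"], score)
    return best


def reads_per_subject(lines: Iterable[str], max_evalue: float = 1e-5) -> Counter:
    """Count reads assigned (by best hit) to each subject sequence."""
    return Counter(subject for subject, _ in best_hits(lines, max_evalue).values())


if __name__ == "__main__":
    # Made-up hits; subject IDs are placeholders, not real accessions.
    demo = [
        "read1\tPLRV_ref\t98.5\t150\t2\t0\t1\t150\t100\t249\t1e-70\t270",
        "read1\tPVY_ref\t85.0\t150\t20\t1\t1\t150\t300\t449\t1e-30\t150",
        "read2\tPVY_ref\t99.0\t140\t1\t0\t1\t140\t10\t149\t1e-65\t255",
        "read3\tAMV_ref\t70.0\t40\t12\t0\t1\t40\t5\t44\t0.5\t30",
    ]
    print(reads_per_subject(demo))
```

The same parser applies to BLASTn and BLASTx output, because both produce the same tabular columns.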
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines to promote data provenance and reproducibility and to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access for a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.
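Running a WDL pipeline through Cromwell in single-workflow mode pairs the workflow file with a JSON file of inputs. The sketch below writes an inputs file and assembles the "java -jar cromwell.jar run <workflow>.wdl --inputs <inputs>.json" command without executing it. The jar path, workflow file name, and input keys are hypothetical placeholders; each ENCODE pipeline documents its own inputs.

```python
# Hypothetical sketch: prepare (but do not launch) a Cromwell run-mode invocation.
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_inputs(path: Path, inputs: Dict[str, object]) -> Path:
    """Write workflow inputs as JSON; keys follow the '<workflow>.<input>' convention."""
    path.write_text(json.dumps(inputs, indent=2))
    return path


def cromwell_run_command(cromwell_jar: str, wdl: str, inputs_json: str) -> List[str]:
    """Argument list for Cromwell's single-workflow 'run' mode."""
    return ["java", "-jar", cromwell_jar, "run", wdl, "--inputs", inputs_json]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical workflow and input names, for illustration only.
        inputs = write_inputs(Path(tmp) / "inputs.json",
                              {"my_pipeline.fastq": "sample_R1.fastq.gz",
                               "my_pipeline.genome_tsv": "genome.tsv"})
        cmd = cromwell_run_command("cromwell.jar", "my_pipeline.wdl", str(inputs))
        print(" ".join(cmd))
        # To execute for real (requires Java and a Cromwell jar):
        # import subprocess; subprocess.run(cmd, check=True)
```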
ABSTRACT
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
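The triplet idea keys each transcript by its transcript start site (TSS), exon junction chain (EC), and transcript end site (TES). Counting the distinct values of each component per gene gives a gene-level triplet that can be placed on a 2-simplex. The sketch below is a simplified illustration. It assumes plain proportions of the three counts as simplex coordinates, which may differ from the exact coordinates used in the study. The coordinates in the example are made up.

```python
# Simplified, hypothetical triplet/simplex sketch; not the study's exact formulas.
from dataclasses import dataclass
from typing import Iterable, Tuple

Junctions = Tuple[Tuple[int, int], ...]  # (donor, acceptor) coordinate pairs


@dataclass(frozen=True)
class Transcript:
    tss: int              # transcript start site (hypothetical coordinate)
    junctions: Junctions  # exon junction chain
    tes: int              # transcript end site


def gene_triplet(transcripts: Iterable[Transcript]) -> Tuple[int, int, int]:
    """Count distinct TSSs, exon junction chains, and TESs for one gene."""
    ts = list(transcripts)
    return (len({t.tss for t in ts}),
            len({t.junctions for t in ts}),
            len({t.tes for t in ts}))


def simplex_point(triplet: Tuple[int, int, int]) -> Tuple[float, float, float]:
    """Normalize the counts so they sum to 1 (a point on the 2-simplex)."""
    total = sum(triplet)
    if total == 0:
        raise ValueError("a gene needs at least one transcript")
    n_tss, n_ec, n_tes = triplet
    return (n_tss / total, n_ec / total, n_tes / total)


if __name__ == "__main__":
    chain = ((100, 200), (300, 400))
    # Three made-up isoforms that differ only in where they start.
    gene = [Transcript(10, chain, 900), Transcript(50, chain, 900),
            Transcript(80, chain, 900)]
    trip = gene_triplet(gene)
    print(trip, simplex_point(trip))
```

In this toy case the point lies closest to the TSS corner, the kind of promoter-driven bias the abstract describes. A real analysis would first cluster nearby start and end sites rather than treating every coordinate as distinct.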
ABSTRACT
The oomycete Phytophthora infestans, the causal agent of potato and tomato blight, expresses two extracellular invertases. Unlike typical fungal invertases, the P. infestans genes are not sucrose induced or glucose repressed but instead appear to be under developmental control. Transcript levels of both genes were very low in mycelia harvested from artificial medium but high in preinfection stages (sporangia, zoospores, and germinated cysts), high during biotrophic growth in leaves and tubers, and low during necrotrophy. Genome-wide analyses of metabolic enzymes and effectors indicated that this expression profile was fairly unusual, matched only by a few other enzymes, such as carbonic anhydrases and a few RXLR effectors. Genes for other metabolic enzymes were typically downregulated in the preinfection stages. Overall metabolic gene expression during the necrotrophic stage of infection clustered with artificial medium, while the biotrophic phase formed a separate cluster. Confocal microscopy of transformants expressing green fluorescent protein (GFP) fusions indicated that invertase protein resided primarily in haustoria during infection. This localization was not attributable to haustorium-specific promoter activity. Instead, the N-terminal regions of proteins containing signal peptides were sufficient to deliver proteins to haustoria. Invertase expression during leaf infection was linked to a decline in apoplastic sucrose, consistent with a role of the enzymes in plant pathogenesis. This was also suggested by the discovery that invertase genes occur across multiple orders of oomycetes but not in most animal pathogens or a mycoparasite.
IMPORTANCE: Oomycetes cause hundreds of diseases in economically and environmentally significant plants. How these microbes acquire host nutrients is not well understood. Many oomycetes insert specialized hyphae called haustoria into plant cells, but unlike fungal haustoria, their role in nutrition has remained unproven.
The discovery that Phytophthora invertases localize to haustoria provides the first strong evidence that these structures participate in feeding. Since regions of proteins containing signal peptides targeted proteins to the haustorium-plant interface, haustoria appear to be the primary machinery for secreting proteins during biotrophic pathogenesis. Although oomycete invertases were acquired laterally from fungi, their expression patterns have adapted to the Phytophthora lifestyle by abandoning substrate-level regulation in favor of developmental control, allowing the enzymes to be produced in anticipation of plant colonization. This study highlights how a widely distributed hydrolytic enzyme has evolved new behaviors in oomycetes.
Subject(s)
Hyphae/enzymology; Phytophthora infestans/enzymology; Phytophthora infestans/genetics; Solanum lycopersicum/microbiology; beta-Fructofuranosidase/genetics; Gene Expression Profiling; Genome-Wide Association Study; Plant Diseases/microbiology; Plant Leaves/microbiology; Solanum tuberosum/microbiology
ABSTRACT
Importance: No previous studies have shown that acute inhalation of thirdhand smoke (THS) activates stress and survival pathways in the human nasal epithelium. Objective: To evaluate gene expression in the nasal epithelium of nonsmoking women following acute inhalation of clean air and THS. Design, Setting, and Participants: Nasal epithelium samples were obtained from participants in a randomized clinical trial (2011-2015) on the health effects of inhaled THS. In a crossover design, participants were exposed, head only, to THS and to conditioned, filtered air in a laboratory setting. The order of exposures was randomized and exposures were separated by at least 21 days. Ribonucleic acid was obtained from a subset of 4 healthy, nonsmoking women. Exposures: By chance, women in the subset were randomized to receive clean air exposure first and THS exposure second. Exposures lasted 3 hours. Main Outcomes and Measures: Differentially expressed genes were identified using RNA sequencing with a false-discovery rate less than 0.1. Results: Participants were 4 healthy, nonsmoking women aged 27 to 49 years (mean [SD] age, 42 [10.2] years) with no chronic diseases. A total of 389 differentially expressed genes were identified in nasal epithelium exposed to THS, while only 2 genes, which were not studied further, were affected by clean air. Enriched gene ontology terms associated with stress-induced mitochondrial hyperfusion were identified, such as respiratory electron transport chain (q = 2.84 × 10⁻³) and mitochondrial inner membrane (q = 7.21 × 10⁻⁶). Reactome pathway analysis identified terms associated with upregulation of DNA repair mechanisms, such as nucleotide excision repair (q = 1.05 × 10⁻²).
Enrichment analyses using ingenuity pathway analysis identified canonical pathways related to stress-induced mitochondrial hyperfusion (eg, increased oxidative phosphorylation) (P = .001), oxidative stress (eg, glutathione depletion phase II reactions) (P = .04), and cell survival (z score = 5.026). Conclusions and Relevance: This study found that acute inhalation of THS caused cell stress that led to the activation of survival pathways. Some responses were consistent with stress-induced mitochondrial hyperfusion and similar to those demonstrated previously in vitro. These data may be valuable to physicians treating patients exposed to THS and may aid in formulating regulations for the remediation of THS-contaminated environments.
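The study calls genes differentially expressed at a false-discovery rate below 0.1 but does not name the correction method. The Benjamini-Hochberg procedure is a common choice in RNA-seq tools, so the sketch below assumes it. It computes BH-adjusted p-values (q-values) and keeps genes with q < 0.1. Gene names and p-values are made up.

```python
# Benjamini-Hochberg adjustment (assumed method) and FDR-based gene selection.
from typing import Dict, List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(pvals: Dict[str, float], fdr: float = 0.1) -> List[str]:
    """Genes whose BH-adjusted p-value is below the FDR threshold."""
    genes = list(pvals)
    qvals = bh_adjust([pvals[g] for g in genes])
    return [g for g, q in zip(genes, qvals) if q < fdr]


if __name__ == "__main__":
    # Made-up p-values for four hypothetical genes.
    pvals = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.20}
    print(significant_genes(pvals))
```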
Subject(s)
Air Pollutants/adverse effects; Nasal Mucosa/physiology; Smoke/adverse effects; Transcriptome/physiology; Adult; Cell Death/physiology; Cell Survival/physiology; Cross-Over Studies; DNA Repair/physiology; Environmental Exposure/adverse effects; Female; Gene Expression/physiology; Healthy Volunteers; Humans; Middle Aged; Stress, Physiological/physiology; Tobacco Smoke Pollution/adverse effects
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human- and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via a REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors.
Basic Protocol: Query the portal
Support Protocol 1: Batch downloading
Support Protocol 2: Using the cart to download files
Support Protocol 3: Visualize data
Alternate Protocol: Query building and programmatic access
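For programmatic access, a minimal sketch using only the Python standard library follows. It assumes the portal's /search/ endpoint accepts a type= filter, facet filters, format=json, and limit as query parameters. The facet names and values (assay_title, status) are illustrative, so check the portal's REST API documentation before relying on specific fields. The request is built but only sent if fetch() is called.

```python
# Hypothetical sketch of an ENCODE portal search request (standard library only).
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PORTAL = "https://www.encodeproject.org"


def build_search_url(filters: List[Tuple[str, str]], limit: str = "25") -> str:
    """Search URL asking for JSON; repeated keys are kept as separate filters."""
    params = list(filters) + [("format", "json"), ("limit", limit)]
    return f"{PORTAL}/search/?{urlencode(params)}"


def make_request(url: str) -> Request:
    """Build (but do not send) a GET request that asks for JSON."""
    return Request(url, headers={"Accept": "application/json"})


def fetch(url: str) -> dict:
    """Send the request and parse the JSON body; requires network access."""
    with urlopen(make_request(url)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Example facet values are illustrative assumptions.
    url = build_search_url([("type", "Experiment"),
                            ("assay_title", "TF ChIP-seq"),
                            ("status", "released")])
    print(url)
    # results = fetch(url)  # hits are expected under the "@graph" key (verify in the docs)
```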