The ENCODE Uniform Analysis Pipelines.

Hitz, Benjamin C; Lee, Jin-Wook; Jolanki, Otto; Kagda, Meenakshi S; Graham, Keenan; Sud, Paul; Gabdank, Idan; Strattan, J Seth; Sloan, Cricket A; Dreszer, Timothy; Rowe, Laurence D; Podduturi, Nikhil R; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Ho, Marcus; Miyasato, Stuart; Simison, Matt; Tanaka, Forrest; Luo, Yunhai; Whaling, Ian; Hong, Eurie L; Lee, Brian T; Sandstrom, Richard; Rynes, Eric; Nelson, Jemma; Nishida, Andrew; Ingersoll, Alyssa; Buckley, Michael; Frerker, Mark; Kim, Daniel S; Boley, Nathan; Trout, Diane; Dobin, Alex; Rahmanian, Sorena; Wyman, Dana; Balderrama-Gutierrez, Gabriela; Reese, Fairlie; Durand, Neva C; Dudchenko, Olga; Weisz, David; Rao, Suhas S P; Blackburn, Alyssa; Gkountaroulis, Dimos; Sadr, Mahdi; Olshansky, Moshe; Eliaz, Yossi; Nguyen, Dat; Bochkov, Ivan; Shamim, Muhammad Saad.

Res Sq ; 2023 Jul 19.

Article En | MEDLINE | ID: mdl-37503119

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity.

Reese, Fairlie; Williams, Brian; Balderrama-Gutierrez, Gabriela; Wyman, Dana; Çelik, Muhammed Hasan; Rebboah, Elisabeth; Rezaie, Narges; Trout, Diane; Razavi-Mohseni, Milad; Jiang, Yunzhe; Borsari, Beatrice; Morabito, Samuel; Liang, Heidi Yahan; McGill, Cassandra J; Rahmanian, Sorena; Sakr, Jasmine; Jiang, Shan; Zeng, Weihua; Carvalho, Klebea; Weimer, Annika K; Dionne, Louise A; McShane, Ariel; Bedi, Karan; Elhajjajy, Shaimae I; Upchurch, Sean; Jou, Jennifer; Youngworth, Ingrid; Gabdank, Idan; Sud, Paul; Jolanki, Otto; Strattan, J Seth; Kagda, Meenakshi S; Snyder, Michael P; Hitz, Ben C; Moore, Jill E; Weng, Zhiping; Bennett, David; Reinholdt, Laura; Ljungman, Mats; Beer, Michael A; Gerstein, Mark B; Pachter, Lior; Guigó, Roderic; Wold, Barbara J; Mortazavi, Ali.

bioRxiv ; 2023 May 16.

Article En | MEDLINE | ID: mdl-37292896

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
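The triplet idea in the abstract above can be illustrated with a small sketch. It assumes a hypothetical input layout of (gene, TSS id, exon junction chain id, TES id) tuples and uses the simplest possible normalization: count the distinct TSSs, exon junction chains, and TESs per gene and divide by their sum, so that each gene becomes a point on a three-way simplex. The function name, the input layout, and the normalization are assumptions made for illustration; the abstract does not state how the authors compute their simplex coordinates.

```python
from collections import defaultdict


def gene_simplex_coords(transcripts):
    """Map each gene to (tss, ec, tes) proportions that sum to 1.

    `transcripts` is an iterable of (gene, tss_id, ec_id, tes_id) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    # For every gene, collect the distinct IDs seen in each triplet slot.
    seen = defaultdict(lambda: (set(), set(), set()))
    for gene, tss, ec, tes in transcripts:
        tss_set, ec_set, tes_set = seen[gene]
        tss_set.add(tss)
        ec_set.add(ec)
        tes_set.add(tes)

    coords = {}
    for gene, sets in seen.items():
        counts = [len(s) for s in sets]
        total = sum(counts)  # at least 3, since every slot has one ID
        # Assumed normalization: plain proportions of the three counts.
        coords[gene] = tuple(c / total for c in counts)
    return coords


# Three transcripts of one gene: 3 TSSs, 2 junction chains, 1 TES.
example = [
    ("GENE_A", 1, 1, 1),
    ("GENE_A", 2, 1, 1),
    ("GENE_A", 3, 2, 1),
]
coords = gene_simplex_coords(example)
```

In this toy case the largest coordinate is the TSS one, which under these assumptions would be read as a gene whose transcript diversity comes mainly from promoter choice.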

The ENCODE Uniform Analysis Pipelines.

Hitz, Benjamin C; Lee, Jin-Wook; Jolanki, Otto; Kagda, Meenakshi S; Graham, Keenan; Sud, Paul; Gabdank, Idan; Strattan, J Seth; Sloan, Cricket A; Dreszer, Timothy; Rowe, Laurence D; Podduturi, Nikhil R; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Ho, Marcus; Miyasato, Stuart; Simison, Matt; Tanaka, Forrest; Luo, Yunhai; Whaling, Ian; Hong, Eurie L; Lee, Brian T; Sandstrom, Richard; Rynes, Eric; Nelson, Jemma; Nishida, Andrew; Ingersoll, Alyssa; Buckley, Michael; Frerker, Mark; Kim, Daniel S; Boley, Nathan; Trout, Diane; Dobin, Alex; Rahmanian, Sorena; Wyman, Dana; Balderrama-Gutierrez, Gabriela; Reese, Fairlie; Durand, Neva C; Dudchenko, Olga; Weisz, David; Rao, Suhas S P; Blackburn, Alyssa; Gkountaroulis, Dimos; Sadr, Mahdi; Olshansky, Moshe; Eliaz, Yossi; Nguyen, Dat; Bochkov, Ivan; Shamim, Muhammad Saad.

bioRxiv ; 2023 Apr 06.

Article En | MEDLINE | ID: mdl-37066421

New developments on the Encyclopedia of DNA Elements (ENCODE) data portal.

Luo, Yunhai; Hitz, Benjamin C; Gabdank, Idan; Hilton, Jason A; Kagda, Meenakshi S; Lam, Bonita; Myers, Zachary; Sud, Paul; Jou, Jennifer; Lin, Khine; Baymuradov, Ulugbek K; Graham, Keenan; Litton, Casey; Miyasato, Stuart R; Strattan, J Seth; Jolanki, Otto; Lee, Jin-Wook; Tanaka, Forrest Y; Adenekan, Philip; O'Neill, Emma; Cherry, J Michael.

Nucleic Acids Res ; 48(D1): D882-D889, 2020 01 08.

Article En | MEDLINE | ID: mdl-31713622

The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.

DNA/genetics , Databases, Genetic , Genome, Human , Software , Animals , Genomics , Humans , Mice

The ENCODE Portal as an Epigenomics Resource.

Jou, Jennifer; Gabdank, Idan; Luo, Yunhai; Lin, Khine; Sud, Paul; Myers, Zachary; Hilton, Jason A; Kagda, Meenakshi S; Lam, Bonita; O'Neill, Emma; Adenekan, Philip; Graham, Keenan; Baymuradov, Ulugbek K; Miyasato, Stuart R; Strattan, J Seth; Jolanki, Otto; Lee, Jin-Wook; Litton, Casey; Tanaka, Forrest Y; Hitz, Benjamin C; Cherry, J Michael.

Curr Protoc Bioinformatics ; 68(1): e89, 2019 12.

Article En | MEDLINE | ID: mdl-31751002

The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal. Support Protocol 1: Batch downloading. Support Protocol 2: Using the cart to download files. Support Protocol 3: Visualize data. Alternate Protocol: Query building and programmatic access.
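As a rough illustration of the programmatic access mentioned in the abstract above, the sketch below only builds a search URL; it does not contact the portal. The host name comes from the portal address cited elsewhere in this listing. The `/search/` path and the `type`, `assay_title`, `limit`, and `format` query parameters are assumptions based on how the portal's REST interface is commonly used, and should be checked against the portal's own documentation. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Portal address as cited in this listing.
PORTAL = "https://www.encodeproject.org"


def build_search_url(filters, limit=25):
    """Return a search URL asking for experiments in JSON form.

    `filters` maps metadata field names to values. The field names and
    the query parameters used here are assumptions, not confirmed API.
    """
    params = [("type", "Experiment")]
    # Sort the filters so the same input always yields the same URL.
    params += sorted(filters.items())
    params += [("limit", str(limit)), ("format", "json")]
    return f"{PORTAL}/search/?{urlencode(params)}"


url = build_search_url({"assay_title": "ATAC-seq"}, limit=5)
print(url)
```

A URL built this way could then be fetched with any HTTP client and the response parsed as JSON. Keeping URL construction separate from the network call makes the query logic easy to check offline.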

Chromatin/metabolism , DNA/genetics , Databases, Genetic , Epigenomics/methods , Animals , DNA Methylation , Genome, Human , Humans , Internet , Metadata , Mice , Software

Gold-Embedded Hollow Silica Nanogolf Balls for Imaging and Photothermal Therapy.

Janetanakit, Woraphong; Wang, Liping; Santacruz-Gomez, Karla; Landon, Preston B; Sud, Paul L; Patel, Nirav; Jang, Grace; Jain, Malvika; Yepremyan, Alice; Kazmi, Sami A; Ban, Deependra K; Zhang, Feng; Lal, Ratnesh.

ACS Appl Mater Interfaces ; 9(33): 27533-27543, 2017 Aug 23.

Article En | MEDLINE | ID: mdl-28752765

Hybrid nanocarriers with multifunctional properties have wide therapeutic and diagnostic applications. We have constructed hollow silica nanogolf balls (HGBs) and gold-embedded hollow silica nanogolf balls (Au@SiO2 HGBs) using the layer-by-layer approach on a symmetric polystyrene (PS) Janus template; the template consists of smaller PS spheres attached to an oppositely charged large PS core. ζ potential measurement supports the electric force-based template-assisted synthesis mechanism. Electron microscopy, UV-vis, and near-infrared (NIR) spectroscopy show that HGBs or Au@SiO2 HGBs are composed of a porous silica shell with an optional dense layer of gold nanoparticles embedded in the silica shell. To visualize their cellular uptake and imaging potential, Au@SiO2 HGBs were loaded with quantum dots (QDs). Confocal fluorescent microscopy and atomic force microscopy imaging show reliable endocytosis of QD-loaded Au@SiO2 HGBs in adherent HeLa cells and circulating red blood cells (RBCs). Surface-enhanced Raman spectroscopy of Au@SiO2 HGBs in RBCs shows enhanced intensity of the Raman signal specific to the RBCs' membrane-specific spectral markers. Au@SiO2 HGBs show localized surface plasmon resonance and heat-induced HeLa cell death in the NIR range. These hybrid golf ball nanocarriers would have broad applications in personalized nanomedicine ranging from in vivo imaging to photothermal therapy.

Gold/chemistry , HeLa Cells , Humans , Metal Nanoparticles , Silicon Dioxide , Spectrum Analysis, Raman

Dual-Functionalized Theranostic Nanocarriers.

Mo, Alexander H; Zhang, Chen; Landon, Preston B; Janetanakit, Woraphong; Hwang, Michael T; Santacruz Gomez, Karla; Colburn, David A; Dossou, Samuel M; Lu, Tianyi; Cao, Yue; Sant, Vrinda; Sud, Paul L; Akkiraju, Siddhartha; Shubayev, Veronica I; Glinsky, Gennadi; Lal, Ratnesh.

ACS Appl Mater Interfaces ; 8(23): 14740-6, 2016 Jun 15.

Article En | MEDLINE | ID: mdl-27144808

Nanocarriers with the ability to spatially organize chemically distinct multiple bioactive moieties will have wide combinatory therapeutic and diagnostic (theranostic) applications. We have designed dual-functionalized, 100 nm to 1 µm sized scalable nanocarriers comprising a silica golf ball with amine or quaternary ammonium functional groups located in its pits and hydroxyl groups located on its nonpit surface. These functionalized golf balls selectively captured 10-40 nm charged gold nanoparticles (GNPs) into their pits. The selective capture of GNPs in the golf ball pits is visualized by scanning electron microscopy. ζ potential measurements and analytical modeling indicate that the GNP capture involves its proximity to and the electric charge on the surface of the golf balls. Potential applications of these dual-functionalized carriers include distinct attachment of multiple agents for multifunctional theranostic applications, selective scavenging, and clearance of harmful substances.

Theranostic Nanomedicine/methods , Gold/chemistry , Metal Nanoparticles/chemistry , Metal Nanoparticles/ultrastructure , Microscopy, Electron, Scanning , Silicon Dioxide