Search | VHL CLAP/SMR-PAHO/WHO

1.

hipFG: high-throughput harmonization and integration pipeline for functional genomics data.

Cifello, Jeffrey; Kuksa, Pavel P; Saravanan, Naveensri; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee.

Bioinformatics ; 39(11)2023 11 01.

Article in English | MEDLINE | ID: mdl-37947320

ABSTRACT

SUMMARY: Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG (Harmonization and Integration Pipeline for Functional Genomics), an automatically customized pipeline for efficient and scalable normalization of heterogeneous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g. chromatin interactions, genomic intervals, quantitative trait loci). AVAILABILITY AND IMPLEMENTATION: hipFG is freely available at https://bitbucket.org/wanglab-upenn/hipFG. A Docker container is available at https://hub.docker.com/r/wanglab/hipfg.

Subjects

Genome-Wide Association Study , Software , Genomics , Chromatin , Quantitative Trait Loci

2.

NIAGADS Alzheimer's GenomicsDB: A resource for exploring Alzheimer's disease genetic and genomic knowledge.

Greenfest-Allen, Emily; Valladares, Otto; Kuksa, Pavel P; Gangadharan, Prabhakaran; Lee, Wan-Ping; Cifello, Jeffrey; Katanic, Zivadin; Kuzma, Amanda B; Wheeler, Nicholas; Bush, William S; Leung, Yuk Yee; Schellenberg, Gerard; Stoeckert, Christian J; Wang, Li-San.

Alzheimers Dement ; 20(2): 1123-1136, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37881831

ABSTRACT

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.

Subjects

Alzheimer Disease , United States , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , National Institute on Aging (U.S.) , Genomics , Databases, Factual , Genetic Predisposition to Disease/genetics

3.

SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.

Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San.

Bioinformatics ; 36(12): 3879-3881, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32330239

ABSTRACT

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that SparkINFERNO is more than 60 times more efficient and scales with data size and amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome-Wide Association Study , Quantitative Trait Loci , Algorithms , Genomics , Software

4.

DASHR 2.0: integrated database of human small non-coding RNA genes and mature products.

Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Zivadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee.

Bioinformatics ; 35(6): 1033-1039, 2019 03 15.

Article in English | MEDLINE | ID: mdl-30668832

ABSTRACT

MOTIVATION: Small non-coding RNAs (sncRNAs, <100 nts) are highly abundant RNAs that regulate diverse and often tissue-specific cellular processes by associating with transcription factor complexes or binding to mRNAs. While thousands of sncRNA genes exist in the human genome, no single resource provides searchable, unified annotation, expression and processing information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. RESULTS: Our goal is to establish a complete catalog of annotation, expression, processing, conservation, tissue-specificity and other biological features for all human sncRNA genes and mature products derived from all major RNA classes. DASHR (Database of small human non-coding RNAs) v2.0 database is the first that integrates human sncRNA gene and mature products profiles obtained from multiple RNA-seq protocols. Altogether, 185 tissues/cell types and sncRNA annotations and >800 curated experiments from ENCODE and GEO/SRA across multiple RNA-seq protocols for both GRCh38/hg38 and GRCh37/hg19 assemblies are integrated in DASHR. Moreover, DASHR is the first to contain both known and novel, previously un-annotated sncRNA loci identified by unsupervised segmentation (13 times more loci with 1 678 800 total). Additionally, DASHR v2.0 adds >3 200 000 annotations for non-small RNA genes and other genomic features (long-noncoding RNAs, mRNAs, promoters, repeats). Furthermore, DASHR v2.0 introduces an enhanced user interface, interactive experiment-by-locus table view, sncRNA locus sorting and filtering by biological features. All annotation and expression information is directly downloadable and accessible as UCSC genome browser tracks. AVAILABILITY AND IMPLEMENTATION: DASHR v2.0 is freely available at https://lisanwanglab.org/DASHRv2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

RNA, Small Untranslated/supply & distribution , Databases, Nucleic Acid , Genomics , Humans , RNA, Long Noncoding , Sequence Analysis, RNA , Software

5.

VCPA: genomic variant calling pipeline and data management tool for Alzheimer's Disease Sequencing Project.

Leung, Yuk Yee; Valladares, Otto; Chou, Yi-Fan; Lin, Han-Jen; Kuzma, Amanda B; Cantwell, Laura; Qu, Liming; Gangadharan, Prabhakaran; Salerno, William J; Schellenberg, Gerard D; Wang, Li-San.

Bioinformatics ; 35(10): 1768-1770, 2019 05 15.

Article in English | MEDLINE | ID: mdl-30351394

ABSTRACT

SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration. AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Alzheimer Disease , Data Management , Genomics , High-Throughput Nucleotide Sequencing , Humans , Software

6.

SPAR: small RNA-seq portal for analysis of sequencing experiments.

Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Zivadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee.

Nucleic Acids Res ; 46(W1): W36-W42, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29733404

ABSTRACT

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.

Subjects

Computational Biology/trends , RNA, Small Untranslated/genetics , RNA/genetics , Software , Animals , Genomics , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis, RNA/instrumentation , Transcriptome/genetics

7.

INFERNO: inferring the molecular mechanisms of noncoding genetic variants.

Amlie-Wolf, Alexandre; Tang, Mitchell; Mlynarski, Elisabeth E; Kuksa, Pavel P; Valladares, Otto; Katanic, Zivadin; Tsuang, Debby; Brown, Christopher D; Schellenberg, Gerard D; Wang, Li-San.

Nucleic Acids Res ; 46(17): 8740-8753, 2018 09 28.

Article in English | MEDLINE | ID: mdl-30113658

ABSTRACT

The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, affecting regulatory elements including transcriptional enhancers. However, characterizing their effects requires the integration of GWAS results with context-specific regulatory activity and linkage disequilibrium annotations to identify causal variants underlying noncoding association signals and the regulatory elements, tissue contexts, and target genes they affect. We propose INFERNO, a novel method which integrates hundreds of functional genomics datasets spanning enhancer activity, transcription factor binding sites, and expression quantitative trait loci with GWAS summary statistics. INFERNO includes novel statistical methods to quantify empirical enrichments of tissue-specific enhancer overlap and to identify co-regulatory networks of dysregulated long noncoding RNAs (lncRNAs). We applied INFERNO to two large GWAS studies. For schizophrenia (36,989 cases, 113,075 controls), INFERNO identified putatively causal variants affecting brain enhancers for known schizophrenia-related genes. For inflammatory bowel disease (IBD) (12,882 cases, 21,770 controls), INFERNO found enrichments of immune and digestive enhancers and lncRNAs involved in regulation of the adaptive immune response. In summary, INFERNO comprehensively infers the molecular mechanisms of causal noncoding variants, providing a sensitive hypothesis generation method for post-GWAS analysis. The software is available as an open source pipeline and a web server.

Subjects

Enhancer Elements, Genetic , Genome, Human , Inflammatory Bowel Diseases/genetics , RNA, Long Noncoding/genetics , Schizophrenia/genetics , Software , Adaptive Immunity , Case-Control Studies , Female , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Inflammatory Bowel Diseases/immunology , Inflammatory Bowel Diseases/physiopathology , Internet , Linkage Disequilibrium , Male , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , RNA, Long Noncoding/immunology , Schizophrenia/immunology , Schizophrenia/physiopathology

8.

Patterns and rates of exonic de novo mutations in autism spectrum disorders.

Neale, Benjamin M; Kou, Yan; Liu, Li; Ma'ayan, Avi; Samocha, Kaitlin E; Sabo, Aniko; Lin, Chiao-Feng; Stevens, Christine; Wang, Li-San; Makarov, Vladimir; Polak, Paz; Yoon, Seungtai; Maguire, Jared; Crawford, Emily L; Campbell, Nicholas G; Geller, Evan T; Valladares, Otto; Schafer, Chad; Liu, Han; Zhao, Tuo; Cai, Guiqing; Lihm, Jayon; Dannenfelser, Ruth; Jabado, Omar; Peralta, Zuleyma; Nagaswamy, Uma; Muzny, Donna; Reid, Jeffrey G; Newsham, Irene; Wu, Yuanqing; Lewis, Lora; Han, Yi; Voight, Benjamin F; Lim, Elaine; Rossin, Elizabeth; Kirby, Andrew; Flannick, Jason; Fromer, Menachem; Shakir, Khalid; Fennell, Tim; Garimella, Kiran; Banks, Eric; Poplin, Ryan; Gabriel, Stacey; DePristo, Mark; Wimbish, Jack R; Boone, Braden E; Levy, Shawn E; Betancur, Catalina; Sunyaev, Shamil.

Nature ; 485(7397): 242-5, 2012 Apr 04.

Article in English | MEDLINE | ID: mdl-22495311

ABSTRACT

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, are consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increases risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.

Subjects

Autistic Disorder/genetics , DNA-Binding Proteins/genetics , Exons/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Transcription Factors/genetics , Case-Control Studies , Exome/genetics , Family Health , Humans , Models, Genetic , Multifactorial Inheritance/genetics , Phenotype , Poisson Distribution , Protein Interaction Maps

9.

DASHR: database of small human noncoding RNAs.

Leung, Yuk Yee; Kuksa, Pavel P; Amlie-Wolf, Alexandre; Valladares, Otto; Ungar, Lyle H; Kannan, Sampath; Gregory, Brian D; Wang, Li-San.

Nucleic Acids Res ; 44(D1): D216-22, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26553799

ABSTRACT

Small non-coding RNAs (sncRNAs) are highly abundant RNAs, typically <100 nucleotides long, that act as key regulators of diverse cellular processes. Although thousands of sncRNA genes are known to exist in the human genome, no single database provides searchable, unified annotation, and expression information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. Here, we present the Database of small human noncoding RNAs (DASHR). DASHR contains the most comprehensive information to date on human sncRNA genes and mature sncRNA products. DASHR provides a simple user interface for researchers to view sequence and secondary structure, compare expression levels, and evidence of specific processing across all sncRNA genes and mature sncRNA products in various human tissues. DASHR annotation and expression data covers all major classes of sncRNAs including microRNAs (miRNAs), Piwi-interacting (piRNAs), small nuclear, nucleolar, cytoplasmic (sn-, sno-, scRNAs, respectively), transfer (tRNAs), and ribosomal RNAs (rRNAs). Currently, DASHR (v1.0) integrates 187 smRNA high-throughput sequencing (smRNA-seq) datasets with over 2.5 billion reads and annotation data from multiple public sources. DASHR contains annotations for ~48,000 human sncRNA genes and mature sncRNA products, 82% of which are expressed in one or more of the curated tissues. DASHR is available at http://lisanwanglab.org/DASHR.

Subjects

Databases, Nucleic Acid , RNA, Small Untranslated/metabolism , Humans , Molecular Sequence Annotation , RNA Processing, Post-Transcriptional , RNA, Small Untranslated/chemistry , RNA, Small Untranslated/genetics

10.

HIPPIE: a high-throughput identification pipeline for promoter interacting enhancer elements.

Hwang, Yih-Chii; Lin, Chiao-Feng; Valladares, Otto; Malamon, John; Kuksa, Pavel P; Zheng, Qi; Gregory, Brian D; Wang, Li-San.

Bioinformatics ; 31(8): 1290-2, 2015 Apr 15.

Article in English | MEDLINE | ID: mdl-25480377

ABSTRACT

UNLABELLED: We implemented a high-throughput identification pipeline for promoter interacting enhancer elements to streamline the workflow from mapping raw Hi-C reads, identifying DNA-DNA interacting fragments with high confidence and quality control, detecting histone modifications and DNase hypersensitive enrichments in putative enhancer elements, to ultimately extracting possible intra- and inter-chromosomal enhancer-target gene relationships. AVAILABILITY AND IMPLEMENTATION: This software package is designed to run on high-performance computing clusters with Oracle Grid Engine. The source code is freely available under the MIT license for academic and nonprofit use. The source code and instructions are available at the Wang lab website (http://wanglab.pcbi.upenn.edu/hippie/). It is also provided as an Amazon Machine Image to be used directly on Amazon Cloud with minimal installation. CONTACT: lswang@mail.med.upenn.edu or bdgregor@sas.upenn.edu. SUPPLEMENTARY INFORMATION: Supplementary Material is available at Bioinformatics online.

Subjects

DNA/genetics , DNA/metabolism , Enhancer Elements, Genetic/genetics , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA/methods , Humans , Programming Languages

11.

Global and local ancestry in African-Americans: Implications for Alzheimer's disease risk.

Hohman, Timothy J; Cooke-Bailey, Jessica N; Reitz, Christiane; Jun, Gyungah; Naj, Adam; Beecham, Gary W; Liu, Zhi; Carney, Regina M; Vance, Jeffrey M; Cuccaro, Michael L; Rajbhandary, Ruchita; Vardarajan, Badri Narayan; Wang, Li-San; Valladares, Otto; Lin, Chiao-Feng; Larson, Eric B; Graff-Radford, Neill R; Evans, Denis; De Jager, Philip L; Crane, Paul K; Buxbaum, Joseph D; Murrell, Jill R; Raj, Towfique; Ertekin-Taner, Nilufer; Logue, Mark W; Baldwin, Clinton T; Green, Robert C; Barnes, Lisa L; Cantwell, Laura B; Fallin, M Daniele; Go, Rodney C P; Griffith, Patrick; Obisesan, Thomas O; Manly, Jennifer J; Lunetta, Kathryn L; Kamboh, M Ilyas; Lopez, Oscar L; Bennett, David A; Hardy, John; Hendrie, Hugh C; Hall, Kathleen S; Goate, Alison M; Lang, Rosalyn; Byrd, Goldie S; Kukull, Walter A; Foroud, Tatiana M; Farrer, Lindsay A; Martin, Eden R; Pericak-Vance, Margaret A; Schellenberg, Gerard D.

Alzheimers Dement ; 12(3): 233-43, 2016 Mar.

Article in English | MEDLINE | ID: mdl-26092349

ABSTRACT

INTRODUCTION: African-American (AA) individuals have a higher risk for late-onset Alzheimer's disease (LOAD) than Americans of primarily European ancestry (EA). Recently, the largest genome-wide association study in AAs to date confirmed that six of the Alzheimer's disease (AD)-related genetic variants originally discovered in EA cohorts are also risk variants in AA; however, the risk attributable to many of the loci (e.g., APOE, ABCA7) differed substantially from previous studies in EA. There likely are risk variants of higher frequency in AAs that have not been discovered. METHODS: We performed a comprehensive analysis of genetically determined local and global ancestry in AAs with regard to LOAD status. RESULTS: Compared to controls, LOAD cases showed higher levels of African ancestry, both globally and at several LOAD relevant loci, which explained risk for AD beyond global differences. DISCUSSION: Exploratory post hoc analyses highlight regions with greatest differences in ancestry as potential candidate regions for future genetic analyses.

Subjects

Alzheimer Disease/ethnology , Alzheimer Disease/genetics , Genetic Predisposition to Disease/genetics , ATP-Binding Cassette Transporters/genetics , Black or African American/genetics , Aged , Aged, 80 and over , Alzheimer Disease/epidemiology , Apolipoproteins E/genetics , Chi-Square Distribution , Chromosome Aberrations , Cohort Studies , Female , Genetic Association Studies , Genotype , Humans , Male , Polymorphism, Single Nucleotide/genetics , Sialic Acid Binding Ig-like Lectin 3/genetics

12.

HAMR: high-throughput annotation of modified ribonucleotides.

Ryvkin, Paul; Leung, Yuk Yee; Silverman, Ian M; Childress, Micah; Valladares, Otto; Dragomir, Isabelle; Gregory, Brian D; Wang, Li-San.

RNA ; 19(12): 1684-92, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24149843

ABSTRACT

RNA is often altered post-transcriptionally by the covalent modification of particular nucleotides; these modifications are known to modulate the structure and activity of their host RNAs. The recent discovery that an RNA methyl-6 adenosine demethylase (FTO) is a risk gene in obesity has brought to light the significance of RNA modifications to human biology. These noncanonical nucleotides, when converted to cDNA in the course of RNA sequencing, can produce sequence patterns that are distinguishable from simple base-calling errors. To determine whether these modifications can be detected in RNA sequencing data, we developed a method that can not only locate these modifications transcriptome-wide with single nucleotide resolution, but can also differentiate between different classes of modifications. Using small RNA-seq data we were able to detect 92% of all known human tRNA modification sites that are predicted to affect RT activity. We also found that different modifications produce distinct patterns of cDNA sequence, allowing us to differentiate between two classes of adenosine and two classes of guanine modifications with 98% and 79% accuracy, respectively. To show the robustness of this method to sample preparation and sequencing methods, as well as to organismal diversity, we applied it to a publicly available yeast data set and achieved similar levels of accuracy. We also experimentally validated two novel and one known 3-methylcytosine (3mC) sites predicted by HAMR in human tRNAs. Researchers can now use our method to identify and characterize RNA modifications using only RNA-seq data, both retrospectively and when asking questions specifically about modified RNA.

Subjects

Molecular Sequence Annotation/methods , RNA Processing, Post-Transcriptional , RNA, Transfer/genetics , Software , Female , HEK293 Cells , Humans , Male , RNA/genetics , RNA/metabolism , RNA, Transfer/metabolism , Saccharomyces cerevisiae/genetics , Sequence Alignment , Sequence Analysis, RNA

13.

VCPA: genomic variant calling pipeline and data management tool for Alzheimer's Disease Sequencing Project.

Leung, Yuk Yee; Valladares, Otto; Chou, Yi-Fan; Lin, Han-Jen; Kuzma, Amanda B; Cantwell, Laura; Qu, Liming; Gangadharan, Prabhakaran; Salerno, William J; Schellenberg, Gerard D; Wang, Li-San.

Bioinformatics ; 35(11): 1985, 2019 06 01.

Article in English | MEDLINE | ID: mdl-31004159

14.

The role of TREM2 R47H as a risk factor for Alzheimer's disease, frontotemporal lobar degeneration, amyotrophic lateral sclerosis, and Parkinson's disease.

Lill, Christina M; Rengmark, Aina; Pihlstrøm, Lasse; Fogh, Isabella; Shatunov, Aleksey; Sleiman, Patrick M; Wang, Li-San; Liu, Tian; Lassen, Christina F; Meissner, Esther; Alexopoulos, Panos; Calvo, Andrea; Chio, Adriano; Dizdar, Nil; Faltraco, Frank; Forsgren, Lars; Kirchheiner, Julia; Kurz, Alexander; Larsen, Jan P; Liebsch, Maria; Linder, Jan; Morrison, Karen E; Nissbrandt, Hans; Otto, Markus; Pahnke, Jens; Partch, Amanda; Restagno, Gabriella; Rujescu, Dan; Schnack, Cathrin; Shaw, Christopher E; Shaw, Pamela J; Tumani, Hayrettin; Tysnes, Ole-Bjørn; Valladares, Otto; Silani, Vincenzo; van den Berg, Leonard H; van Rheenen, Wouter; Veldink, Jan H; Lindenberger, Ulman; Steinhagen-Thiessen, Elisabeth; Teipel, Stefan; Perneczky, Robert; Hakonarson, Hakon; Hampel, Harald; von Arnim, Christine A F; Olsen, Jørgen H; Van Deerlin, Vivianna M; Al-Chalabi, Ammar; Toft, Mathias; Ritz, Beate.

Alzheimers Dement ; 11(12): 1407-1416, 2015 Dec.

Article in English | MEDLINE | ID: mdl-25936935

ABSTRACT

A rare variant in TREM2 (p.R47H, rs75932628) was recently reported to increase the risk of Alzheimer's disease (AD) and, subsequently, other neurodegenerative diseases, i.e. frontotemporal lobar degeneration (FTLD), amyotrophic lateral sclerosis (ALS), and Parkinson's disease (PD). Here we comprehensively assessed TREM2 rs75932628 for association with these diseases in a total of 19,940 previously untyped subjects of European descent. These data were combined with those from 28 published data sets by meta-analysis. Furthermore, we tested whether rs75932628 shows association with amyloid beta (Aβ42) and total-tau protein levels in the cerebrospinal fluid (CSF) of 828 individuals with AD or mild cognitive impairment. Our data show that rs75932628 is highly significantly associated with the risk of AD across 24,086 AD cases and 148,993 controls of European descent (odds ratio or OR = 2.71, P = 4.67 × 10^-25). No consistent evidence for association was found between this marker and the risk of FTLD (OR = 2.24, P = .0113 across 2673 cases/9283 controls), PD (OR = 1.36, P = .0767 across 8311 cases/79,938 controls) and ALS (OR = 1.41, P = .198 across 5544 cases/7072 controls). Furthermore, carriers of the rs75932628 risk allele showed significantly increased levels of CSF-total-tau (P = .0110) but not Aβ42, suggesting that TREM2's role in AD may involve tau dysfunction.

Subjects

Alzheimer Disease/genetics , Genetic Predisposition to Disease , Membrane Glycoproteins/genetics , Neurodegenerative Diseases/genetics , Receptors, Immunologic/genetics , Aged , Alleles , Amyotrophic Lateral Sclerosis/genetics , Case-Control Studies , Female , Frontotemporal Lobar Degeneration/genetics , Genotype , Humans , Male , Middle Aged , Parkinson Disease/genetics , Quantitative Trait Loci , Risk Factors , White People , tau Proteins/cerebrospinal fluid

15.

DRAW+SneakPeek: analysis workflow and quality metric management for DNA-seq experiments.

Lin, Chiao-Feng; Valladares, Otto; Childress, D Micah; Klevak, Egor; Geller, Evan T; Hwang, Yih-Chii; Tsai, Ellen A; Schellenberg, Gerard D; Wang, Li-San.

Bioinformatics ; 29(19): 2498-500, 2013 Oct 01.

Article in English | MEDLINE | ID: mdl-23943636

ABSTRACT

SUMMARY: We report our new DRAW+SneakPeek software for DNA-seq analysis. DNA resequencing analysis workflow (DRAW) automates the workflow of processing raw sequence reads including quality control, read alignment and variant calling on high-performance computing facilities such as Amazon elastic compute cloud. SneakPeek provides an effective interface for reviewing dozens of quality metrics reported by DRAW, so users can assess the quality of data and diagnose problems in their sequencing procedures. Both DRAW and SneakPeek are freely available under the MIT license, and are available as Amazon machine images to be used directly on Amazon cloud with minimal installation. AVAILABILITY: DRAW+SneakPeek is released under the MIT license and is available for academic and nonprofit use for free. The information about source code, Amazon machine images and instructions on how to install and run DRAW+SneakPeek locally and on Amazon elastic compute cloud is available at the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/) and Wang lab Web site (http://wanglab.pcbi.upenn.edu/).

Subjects

Biometry/methods , DNA/analysis , Sequence Analysis, DNA/methods , Software Design , Internet , Programming Languages

16.

ABCC9 gene polymorphism is associated with hippocampal sclerosis of aging pathology.

Nelson, Peter T; Estus, Steven; Abner, Erin L; Parikh, Ishita; Malik, Manasi; Neltner, Janna H; Ighodaro, Eseosa; Wang, Wang-Xia; Wilfred, Bernard R; Wang, Li-San; Kukull, Walter A; Nandakumar, Kannabiran; Farman, Mark L; Poon, Wayne W; Corrada, Maria M; Kawas, Claudia H; Cribbs, David H; Bennett, David A; Schneider, Julie A; Larson, Eric B; Crane, Paul K; Valladares, Otto; Schmitt, Frederick A; Kryscio, Richard J; Jicha, Gregory A; Smith, Charles D; Scheff, Stephen W; Sonnen, Joshua A; Haines, Jonathan L; Pericak-Vance, Margaret A; Mayeux, Richard; Farrer, Lindsay A; Van Eldik, Linda J; Horbinski, Craig; Green, Robert C; Gearing, Marla; Poon, Leonard W; Kramer, Patricia L; Woltjer, Randall L; Montine, Thomas J; Partch, Amanda B; Rajic, Alexander J; Richmire, KatieRose; Monsell, Sarah E; Schellenberg, Gerard D; Fardo, David W.

Acta Neuropathol ; 127(6): 825-43, 2014.

Article in English | MEDLINE | ID: mdl-24770881

ABSTRACT

Hippocampal sclerosis of aging (HS-Aging) is a high-morbidity brain disease in the elderly but risk factors are largely unknown. We report the first genome-wide association study (GWAS) with HS-Aging pathology as an endophenotype. In collaboration with the Alzheimer's Disease Genetics Consortium, data were analyzed from large autopsy cohorts: (#1) National Alzheimer's Coordinating Center (NACC); (#2) Rush University Religious Orders Study and Memory and Aging Project; (#3) Group Health Research Institute Adult Changes in Thought study; (#4) University of California at Irvine 90+ Study; and (#5) University of Kentucky Alzheimer's Disease Center. Altogether, 363 HS-Aging cases and 2,303 controls, all pathologically confirmed, provided statistical power to test for risk alleles with large effect size. A two-tier study design included GWAS from cohorts #1-3 (Stage I) to identify promising SNP candidates, followed by focused evaluation of particular SNPs in cohorts #4-5 (Stage II). Polymorphism in the ATP-binding cassette, sub-family C member 9 (ABCC9) gene, also known as sulfonylurea receptor 2, was associated with HS-Aging pathology. In the meta-analyzed Stage I GWAS, ABCC9 polymorphisms yielded the lowest p values, and factoring in the Stage II results, the meta-analyzed risk SNP (rs704178:G) attained genome-wide statistical significance (p = 1.4 × 10^-9), with odds ratio (OR) of 2.13 (recessive mode of inheritance). For SNPs previously linked to hippocampal sclerosis, meta-analyses of Stage I results show OR = 1.16 for rs5848 (GRN) and OR = 1.22 for rs1990622 (TMEM106B), with the risk alleles as previously described. Sulfonylureas, a widely prescribed drug class used to treat diabetes, also modify human ABCC9 protein function. A subsample of patients from the NACC database (n = 624) were identified who were older than age 85 at death with known drug history. Controlling for important confounders such as diabetes itself, exposure to a sulfonylurea drug was associated with risk for HS-Aging pathology (p = 0.03). Thus, we describe a novel and targetable dementia risk factor.

Subjects

Aging/genetics; Aging/pathology; Hippocampus/pathology; Polymorphism, Single Nucleotide; Sulfonylurea Receptors/genetics; Aged, 80 and over; Aging/drug effects; Cohort Studies; Databases as Topic; Endophenotypes; Genome-Wide Association Study; Hippocampus/drug effects; Humans; Sclerosis/genetics; Sclerosis/pathology; Sulfonylurea Compounds/adverse effects; Sulfonylurea Compounds/therapeutic use

17.

SAVoR: a server for sequencing annotation and visualization of RNA structures.

Li, Fan; Ryvkin, Paul; Childress, Daniel M; Valladares, Otto; Gregory, Brian D; Wang, Li-San.

Nucleic Acids Res ; 40(Web Server issue): W59-64, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22492627

ABSTRACT

RNA secondary structure is required for the proper regulation of the cellular transcriptome. This is because the functionality, processing, localization and stability of RNAs are all dependent on the folding of these molecules into intricate structures through specific base pairing interactions encoded in their primary nucleotide sequences. Thus, as the number of RNA sequencing (RNA-seq) data sets and the variety of protocols for this technology grow rapidly, it is becoming increasingly pertinent to develop tools that can analyze and visualize this sequence data in the context of RNA secondary structure. Here, we present Sequencing Annotation and Visualization of RNA structures (SAVoR), a web server, which seamlessly links RNA structure predictions with sequencing data and genomic annotations to produce highly informative and annotated models of RNA secondary structure. SAVoR accepts read alignment data from RNA-seq experiments and computes a series of per-base values such as read abundance and sequence variant frequency. These values can then be visualized on a customizable secondary structure model. SAVoR is freely available at http://tesla.pcbi.upenn.edu/savor.
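As an illustration of the kind of per-base value the abstract mentions, the following minimal sketch computes read abundance (coverage) at each position of a sequence from a list of aligned read intervals. It is not SAVoR's implementation: the function name `per_base_read_abundance`, the 0-based half-open interval convention, and the example reads are all assumptions made for this sketch.

```python
from typing import Iterable, List, Tuple


def per_base_read_abundance(
    reads: Iterable[Tuple[int, int]], length: int
) -> List[int]:
    """Count how many reads cover each base of a sequence of `length` bases.

    Each read is a (start, end) pair in 0-based, half-open coordinates
    (an assumed convention for this sketch, not taken from SAVoR).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        # Clip reads that extend beyond the sequence boundaries.
        start = max(start, 0)
        end = min(end, length)
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1

    # A running sum over the difference array gives per-base coverage.
    coverage = []
    running = 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage


# Hypothetical reads aligned to an 8-base sequence.
example_reads = [(0, 4), (2, 6), (5, 8)]
print(per_base_read_abundance(example_reads, 8))
```

The difference-array approach runs in time proportional to the number of reads plus the sequence length, which is why it is a common choice for coverage tracks. The resulting list could then be mapped onto the positions of a secondary structure model.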

Subjects

RNA/chemistry; Software; Internet; Models, Molecular; Molecular Sequence Annotation; Nucleic Acid Conformation; Sequence Analysis, RNA

18.

A comparative study of structural variant calling in WGS from Alzheimer's disease families.

Malamon, John S; Farrell, John J; Xia, Li Charlie; Dombroski, Beth A; Das, Rueben G; Way, Jessica; Kuzma, Amanda B; Valladares, Otto; Leung, Yuk Yee; Scanlon, Allison J; Lopez, Irving Antonio Barrera; Brehony, Jack; Worley, Kim C; Zhang, Nancy R; Wang, Li-San; Farrer, Lindsay A; Schellenberg, Gerard D; Lee, Wan-Ping; Vardarajan, Badri N.

Life Sci Alliance ; 7(5)2024 May.

Article in English | MEDLINE | ID: mdl-38418088

ABSTRACT

Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.
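The overall validation rate quoted in the abstract can be reproduced from the two counts it reports. The short sketch below does only that arithmetic; the helper name `validation_rate` is hypothetical and is not part of the authors' protocol.

```python
def validation_rate(validated: int, tested: int) -> float:
    """Return the percentage of tested calls that were validated."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return 100.0 * validated / tested


# Counts reported in the abstract: 114 of 146 deletions validated by Sanger sequencing.
rate = validation_rate(114, 146)
print(f"{rate:.1f}%")  # prints "78.1%"
```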

Subjects

Alzheimer Disease; Humans; Alzheimer Disease/genetics; Whole Genome Sequencing/methods

19.

Human whole-exome genotype data for Alzheimer's disease.

Leung, Yuk Yee; Naj, Adam C; Chou, Yi-Fan; Valladares, Otto; Schmidt, Michael; Hamilton-Nelson, Kara; Wheeler, Nicholas; Lin, Honghuang; Gangadharan, Prabhakaran; Qu, Liming; Clark, Kaylyn; Kuzma, Amanda B; Lee, Wan-Ping; Cantwell, Laura; Nicaretta, Heather; Haines, Jonathan; Farrer, Lindsay; Seshadri, Sudha; Brkanac, Zoran; Cruchaga, Carlos; Pericak-Vance, Margaret; Mayeux, Richard P; Bush, William S; Destefano, Anita; Martin, Eden; Schellenberg, Gerard D; Wang, Li-San.

Nat Commun ; 15(1): 684, 2024 Jan 23.

Article in English | MEDLINE | ID: mdl-38263370

ABSTRACT

The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The joint-genotype-called variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% of the variants have CADD > 30, indicating they are of high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.

Subjects

Alzheimer Disease; Humans; Exome; Computational Biology; Data Accuracy; Genotype

20.

Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis.

Zheng, Qi; Ryvkin, Paul; Li, Fan; Dragomir, Isabelle; Valladares, Otto; Yang, Jamie; Cao, Kajia; Wang, Li-San; Gregory, Brian D.

PLoS Genet ; 6(9): e1001141, 2010 Sep 30.

Article in English | MEDLINE | ID: mdl-20941385

ABSTRACT

The functional structure of all biologically active molecules is dependent on intra- and inter-molecular interactions. This is especially evident for RNA molecules whose functionality, maturation, and regulation require formation of correct secondary structure through encoded base-pairing interactions. Unfortunately, intra- and inter-molecular base-pairing information is lacking for most RNAs. Here, we marry classical nuclease-based structure mapping techniques with high-throughput sequencing technology to interrogate all base-paired RNA in Arabidopsis thaliana and identify ~200 new small (sm)RNA-producing substrates of RNA-DEPENDENT RNA POLYMERASE6. Our comprehensive analysis of paired RNAs reveals conserved functionality within introns and both 5' and 3' untranslated regions (UTRs) of mRNAs, as well as a novel population of functional RNAs, many of which are the precursors of smRNAs. Finally, we identify intra-molecular base-pairing interactions to produce a genome-wide collection of RNA secondary structure models. Although our methodology reveals the pairing status of RNA molecules in the absence of cellular proteins, previous studies have demonstrated that structural information obtained for RNAs in solution accurately reflects their structure in ribonucleoprotein complexes. Furthermore, our identification of RNA-DEPENDENT RNA POLYMERASE6 substrates and conserved functional RNA domains within introns and both 5' and 3' untranslated regions (UTRs) of mRNAs using this approach strongly suggests that RNA molecules are correctly folded into their secondary structure in solution. Overall, our findings highlight the importance of base-paired RNAs in eukaryotes and present an approach that should be widely applicable for the analysis of this key structural feature of RNA.

Subjects

Arabidopsis/genetics; Base Pairing/genetics; Genome, Plant/genetics; RNA, Double-Stranded/genetics; RNA, Plant/genetics; Sequence Analysis, RNA/methods; Arabidopsis Proteins/metabolism; Conserved Sequence/genetics; Gene Expression Profiling; Gene Expression Regulation, Plant; Genomics; Introns/genetics; RNA, Messenger/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA, Plant/chemistry; RNA, Small Nuclear/chemistry; RNA, Small Nuclear/genetics; RNA-Dependent RNA Polymerase/metabolism; Substrate Specificity; Untranslated Regions/genetics
