1.
Life Sci Alliance ; 7(5), 2024 May.
Article in English | MEDLINE | ID: mdl-38418088

ABSTRACT

Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.


Subject(s)
Alzheimer Disease; Humans; Alzheimer Disease/genetics; Whole Genome Sequencing/methods
2.
Nat Commun ; 15(1): 684, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263370

ABSTRACT

The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The joint-genotype called variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% of the variants have CADD > 30, indicating they are of high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.


Subject(s)
Alzheimer Disease; Humans; Exome; Computational Biology; Data Accuracy; Genotype
3.
Alzheimers Dement ; 20(2): 1123-1136, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37881831

ABSTRACT

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.


Subject(s)
Alzheimer Disease; United States; Humans; Alzheimer Disease/genetics; Genome-Wide Association Study; National Institute on Aging (U.S.); Genomics; Databases, Factual; Genetic Predisposition to Disease/genetics
4.
Bioinformatics ; 39(11), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37947320

ABSTRACT

SUMMARY: Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG (Harmonization and Integration Pipeline for Functional Genomics), an automatically customized pipeline for efficient and scalable normalization of heterogeneous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g. chromatin interactions, genomic intervals, quantitative trait loci). AVAILABILITY AND IMPLEMENTATION: hipFG is freely available at https://bitbucket.org/wanglab-upenn/hipFG. A Docker container is available at https://hub.docker.com/r/wanglab/hipfg.


Subject(s)
Genome-Wide Association Study; Software; Genomics; Chromatin; Quantitative Trait Loci
5.
medRxiv ; 2023 Jul 08.
Article in English | MEDLINE | ID: mdl-37461624

ABSTRACT

Limited ancestral diversity has impaired our ability to detect risk variants more prevalent in non-European ancestry groups in genome-wide association studies (GWAS). We constructed and analyzed a multi-ancestry GWAS dataset in the Alzheimer's Disease (AD) Genetics Consortium (ADGC) to test for novel shared and ancestry-specific AD susceptibility loci and evaluate underlying genetic architecture in 37,382 non-Hispanic White (NHW), 6,728 African American, 8,899 Hispanic (HIS), and 3,232 East Asian individuals, performing within-ancestry fixed-effects meta-analysis followed by a cross-ancestry random-effects meta-analysis. We identified 13 loci with cross-ancestry associations, including known loci at/near CR1, BIN1, TREM2, CD2AP, PTK2B, CLU, SHARPIN, MS4A6A, PICALM, ABCA7, and APOE, and two novel loci not previously reported at 11p12 (LRRC4C) and 12q24.13 (LHX5-AS1). Reflecting the power of diverse ancestry in GWAS, we observed the SHARPIN locus using 7.1% of the sample size of the original discovering single-ancestry GWAS (n = 788,989). We additionally identified three genome-wide significant (GWS) ancestry-specific loci at/near PTPRK (P = 2.4×10⁻⁸) and GRB14 (P = 1.7×10⁻⁸) in HIS, and KIAA0825 (P = 2.9×10⁻⁸) in NHW. Pathway analysis implicated multiple amyloid regulation pathways (strongest with adjusted P = 1.6×10⁻⁴) and the classical complement pathway (adjusted P = 1.3×10⁻³). Genes at/near our novel loci have known roles in neuronal development (LRRC4C, LHX5-AS1, and PTPRK) and insulin receptor activity regulation (GRB14). These findings provide compelling support for using traditionally-underrepresented populations for gene discovery, even with smaller sample sizes.
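
The within-ancestry step named in this abstract is a fixed-effects meta-analysis. A minimal sketch of the standard inverse-variance-weighted estimator follows. The function name and the per-cohort effect sizes and standard errors are hypothetical, chosen only for illustration; they are not taken from the study. The last lines recompute the sample-size ratio from the counts quoted in the abstract.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects pooling.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   per-cohort standard errors
    Returns (pooled_beta, pooled_se, z_score).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical inputs (assumption): two cohorts with equal precision.
example_betas = [0.10, 0.30]
example_ses = [0.10, 0.10]
pooled_beta, pooled_se, z_score = fixed_effects_meta(example_betas, example_ses)

# Sample sizes as quoted in the abstract.
ancestry_counts = {
    "NHW": 37382,
    "African American": 6728,
    "HIS": 8899,
    "East Asian": 3232,
}
total_n = sum(ancestry_counts.values())
fraction_of_original = total_n / 788989  # original single-ancestry GWAS size

print(f"pooled beta={pooled_beta:.3f}, se={pooled_se:.4f}, z={z_score:.2f}")
print(f"total n={total_n}, fraction of original GWAS={fraction_of_original:.1%}")
```

With equal standard errors the pooled estimate is simply the mean of the cohort estimates; less precise cohorts would be down-weighted. The cross-ancestry random-effects step adds a between-group variance term and is not shown here.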

6.
bioRxiv ; 2023 Apr 25.
Article in English | MEDLINE | ID: mdl-37162864

ABSTRACT

Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG, an automatically customized pipeline for efficient and scalable normalization of heterogeneous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g., chromatin interactions, genomic intervals, quantitative trait loci).

7.
NAR Genom Bioinform ; 4(1): lqab123, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35047815

ABSTRACT

Querying massive functional genomic and annotation data collections, linking and summarizing the query results across data sources/data types are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) ability to analyze and annotate user's experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10⁹ hg19 FILER records shows FILER is highly scalable, with a sub-linear 32-fold increase in querying time when increasing the number of queries 1000-fold from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
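
The "sub-linear" claim in this abstract can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that query time follows a power law in the number of query intervals (time ∝ n^k); the abstract does not state a scaling model, and the function name is hypothetical. An exponent below 1 means sub-linear growth.

```python
import math


def scaling_exponent(query_fold, time_fold):
    """Exponent k such that time_fold == query_fold ** k (power-law assumption)."""
    return math.log(time_fold) / math.log(query_fold)


# Figures quoted in the abstract: 1000-fold more queries, 32-fold more time.
exponent = scaling_exponent(query_fold=1000, time_fold=32)
print(f"implied scaling exponent: {exponent:.2f}")
```

Under that assumption the implied exponent is about one half, so doubling the number of query intervals would raise query time by roughly 40% rather than 100%.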

8.
J Alzheimers Dis ; 86(1): 461-477, 2022.
Article in English | MEDLINE | ID: mdl-35068457

ABSTRACT

BACKGROUND: Recent Alzheimer's disease (AD) genetics findings from genome-wide association studies (GWAS) span progressively larger and more diverse populations and outcomes. Currently, there is no up-to-date resource providing harmonized and searchable information on all AD genetic associations found by GWAS, nor linking the reported genetic variants and genes with functional and genomic annotations. OBJECTIVE: Create an integrated/harmonized, and literature-derived collection of population-specific AD genetic associations. METHODS: We developed the Alzheimer's Disease Variant Portal (ADVP), an extensive collection of associations curated from >200 GWAS publications from Alzheimer's Disease Genetics Consortium and other consortia. Genetic associations were systematically extracted, harmonized, and annotated from both the genome-wide significant and suggestive loci reported in these publications. To ensure consistent representation of AD genetic findings, all the extracted genetic association information was harmonized across specifically designed publication, variant, and association categories. RESULTS: ADVP V1.0 (February 2021) catalogs 6,990 associations related to disease-risk, expression quantitative traits, endophenotypes, or neuropathology. This extensive harmonization effort led to a catalog containing >900 loci, >1,800 variants, >80 cohorts, and 8 populations. Besides, ADVP provides investigators with a seamless integration of genomic and publicly available functional annotations across multiple databases per harmonized variant and gene records, thus facilitating further understanding and analyses of these genetics findings. CONCLUSION: ADVP is a valuable resource for investigators to quickly and systematically explore high-confidence AD genetic findings and provides insights into population-specific AD genetic architecture. ADVP is continually maintained and enhanced by NIAGADS and is freely accessible at https://advp.niagads.org.


Subject(s)
Alzheimer Disease; Genome-Wide Association Study; Alzheimer Disease/genetics; Endophenotypes; Genetic Predisposition to Disease/genetics; Humans; Polymorphism, Single Nucleotide
9.
Transl Psychiatry ; 11(1): 618, 2021 12 06.
Article in English | MEDLINE | ID: mdl-34873149

ABSTRACT

Late-onset Alzheimer disease (LOAD) is highly polygenic, with a heritability estimated between 40 and 80%, yet risk variants identified in genome-wide studies explain only ~8% of phenotypic variance. Due to its increased power and interpretability, genetically regulated expression (GReX) analysis is an emerging approach to investigate the genetic mechanisms of complex diseases. Here, we conducted GReX analysis within and across 51 tissues on 39 LOAD GWAS data sets comprising 58,713 cases and controls from the Alzheimer's Disease Genetics Consortium (ADGC) and the International Genomics of Alzheimer's Project (IGAP). Meta-analysis across studies identified 216 unique significant genes, including 72 with no previously reported LOAD GWAS associations. Cross-brain-tissue and cross-GTEx models revealed eight additional genes significantly associated with LOAD. Conditional analysis of previously reported loci using established LOAD-risk variants identified eight genes reaching genome-wide significance independent of known signals. Moreover, the proportion of SNP-based heritability is highly enriched in genes identified by GReX analysis. In summary, GReX-based meta-analysis in LOAD identifies 216 genes (including 72 novel genes), illuminating the role of gene regulatory models in LOAD.


Subject(s)
Alzheimer Disease; Alzheimer Disease/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Multifactorial Inheritance; Polymorphism, Single Nucleotide
10.
Front Genet ; 12: 752390, 2021.
Article in English | MEDLINE | ID: mdl-34804120

ABSTRACT

Alzheimer's Disease (AD) is a progressive neurologic disease and the most common form of dementia. While the causes of AD are not completely understood, genetics plays a key role in the etiology of AD, and thus finding genetic factors holds the potential to uncover novel AD mechanisms. For this study, we focus on copy number variation (CNV) detection and burden analysis. Leveraging whole-genome sequence (WGS) data released by Alzheimer's Disease Sequencing Project (ADSP), we developed a scalable bioinformatics pipeline to identify CNVs. This pipeline was applied to 1,737 AD cases and 2,063 cognitively normal controls. As a result, we observed 237,306 and 42,767 deletions and duplications, respectively, with an average of 2,255 deletions and 1,820 duplications per subject. The burden tests show that Non-Hispanic-White cases on average have 16 more duplications than controls do (p-value 2e-6), and Hispanic cases have larger deletions than controls do (p-value 6.8e-5).

11.
NAR Genom Bioinform ; 2(2): lqaa022, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32270138

ABSTRACT

Most regulatory chromatin interactions are mediated by various transcription factors (TFs) and involve physically interacting elements such as enhancers, insulators or promoters. To map these elements and interactions at a fine scale, we developed HIPPIE2 that analyzes raw reads from high-throughput chromosome conformation (Hi-C) experiments to identify precise loci of DNA physically interacting regions (PIRs). Unlike standard genome binning approaches (e.g. 10-kb to 1-Mb bins), HIPPIE2 dynamically infers the physical locations of PIRs using the distribution of restriction sites to increase analysis precision and resolution. We applied HIPPIE2 to in situ Hi-C datasets across six human cell lines (GM12878, IMR90, K562, HMEC, HUVEC, NHEK) with matched ENCODE/Roadmap functional genomic data. HIPPIE2 detected 1042 738 distinct PIRs, with high resolution (average PIR length of 1006 bp) and high reproducibility (92.3% in GM12878). PIRs are enriched for epigenetic marks (H3K27ac, H3K4me1) and open chromatin, suggesting active regulatory roles. HIPPIE2 identified 2.8 million significant PIR-PIR interactions, 27.2% of which were enriched for TF binding sites. 50 608 interactions were enhancer-promoter interactions and were enriched for 33 TFs, including known DNA looping/long-range mediators. These findings demonstrate that the novel dynamic approach of HIPPIE2 (https://bitbucket.com/wanglab-upenn/HIPPIE2) enables the characterization of chromatin and regulatory interactions with high resolution and reproducibility.
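
This abstract contrasts fixed-size genome bins with boundaries derived from restriction sites. The sketch below illustrates only that general idea: it splits a toy sequence into fragments at each occurrence of a recognition motif. It is not the HIPPIE2 algorithm. The motif `GATC` (the MboI site, cut before the G) is an assumption, since the abstract does not name the enzyme, and the function names and toy sequence are hypothetical.

```python
def restriction_sites(sequence, motif="GATC"):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def restriction_fragments(sequence, motif="GATC"):
    """Split the sequence into half-open (start, end) fragments at each cut site.

    Assumes the enzyme cuts immediately before the motif, so each cut
    position equals the motif start.
    """
    boundaries = [0] + restriction_sites(sequence, motif) + [len(sequence)]
    return [(a, b) for a, b in zip(boundaries, boundaries[1:]) if b > a]


toy_sequence = "AAGATCTTTTGATCCC"  # hypothetical 16-bp example
sites = restriction_sites(toy_sequence)
fragments = restriction_fragments(toy_sequence)
print(sites)
print(fragments)
```

Fragment boundaries obtained this way follow the sequence itself, so their lengths vary with local site density instead of being fixed at, say, 10 kb.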

12.
Bioinformatics ; 36(12): 3879-3881, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32330239

ABSTRACT

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that SparkINFERNO is more than 60 times more efficient and scales with data size and amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study; Quantitative Trait Loci; Algorithms; Genomics; Software
15.
Nat Genet ; 51(3): 414-430, 2019 03.
Article in English | MEDLINE | ID: mdl-30820047

ABSTRACT

Risk for late-onset Alzheimer's disease (LOAD), the most prevalent dementia, is partially driven by genetics. To identify LOAD risk loci, we performed a large genome-wide association meta-analysis of clinically diagnosed LOAD (94,437 individuals). We confirm 20 previous LOAD risk loci and identify five new genome-wide loci (IQCK, ACE, ADAM10, ADAMTS1, and WWOX), two of which (ADAM10, ACE) were identified in a recent genome-wide association (GWAS)-by-familial-proxy of Alzheimer's or dementia. Fine-mapping of the human leukocyte antigen (HLA) region confirms the neurological and immune-mediated disease haplotype HLA-DR15 as a risk factor for LOAD. Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer's disease but also with LOAD. Analyses of risk genes and pathways show enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified. We also identify important genetic correlations between LOAD and traits such as family history of dementia and education.


Subject(s)
Alzheimer Disease/genetics; Amyloid beta-Peptides/genetics; Genetic Loci/genetics; Genetic Predisposition to Disease/genetics; Immunity/genetics; Lipids/genetics; tau Proteins/genetics; Aged; Case-Control Studies; Female; Genetic Testing/methods; Genome-Wide Association Study/methods; Haplotypes/genetics; Humans; Lipid Metabolism/genetics; Male
16.
Bioinformatics ; 35(6): 1033-1039, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30668832

ABSTRACT

MOTIVATION: Small non-coding RNAs (sncRNAs, <100 nts) are highly abundant RNAs that regulate diverse and often tissue-specific cellular processes by associating with transcription factor complexes or binding to mRNAs. While thousands of sncRNA genes exist in the human genome, no single resource provides searchable, unified annotation, expression and processing information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. RESULTS: Our goal is to establish a complete catalog of annotation, expression, processing, conservation, tissue-specificity and other biological features for all human sncRNA genes and mature products derived from all major RNA classes. DASHR (Database of small human non-coding RNAs) v2.0 database is the first that integrates human sncRNA gene and mature products profiles obtained from multiple RNA-seq protocols. Altogether, 185 tissues/cell types and sncRNA annotations and >800 curated experiments from ENCODE and GEO/SRA across multiple RNA-seq protocols for both GRCh38/hg38 and GRCh37/hg19 assemblies are integrated in DASHR. Moreover, DASHR is the first to contain both known and novel, previously un-annotated sncRNA loci identified by unsupervised segmentation (13 times more loci with 1 678 800 total). Additionally, DASHR v2.0 adds >3 200 000 annotations for non-small RNA genes and other genomic features (long-noncoding RNAs, mRNAs, promoters, repeats). Furthermore, DASHR v2.0 introduces an enhanced user interface, interactive experiment-by-locus table view, sncRNA locus sorting and filtering by biological features. All annotation and expression information is directly downloadable and accessible as UCSC genome browser tracks. AVAILABILITY AND IMPLEMENTATION: DASHR v2.0 is freely available at https://lisanwanglab.org/DASHRv2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA, Small Untranslated/supply & distribution; Databases, Nucleic Acid; Genomics; Humans; RNA, Long Noncoding; Sequence Analysis, RNA; Software
17.
Bioinformatics ; 35(10): 1768-1770, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30351394

ABSTRACT

SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration. AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alzheimer Disease; Data Management; Genomics; High-Throughput Nucleotide Sequencing; Humans; Software
18.
Nucleic Acids Res ; 46(17): 8740-8753, 2018 09 28.
Article in English | MEDLINE | ID: mdl-30113658

ABSTRACT

The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, affecting regulatory elements including transcriptional enhancers. However, characterizing their effects requires the integration of GWAS results with context-specific regulatory activity and linkage disequilibrium annotations to identify causal variants underlying noncoding association signals and the regulatory elements, tissue contexts, and target genes they affect. We propose INFERNO, a novel method which integrates hundreds of functional genomics datasets spanning enhancer activity, transcription factor binding sites, and expression quantitative trait loci with GWAS summary statistics. INFERNO includes novel statistical methods to quantify empirical enrichments of tissue-specific enhancer overlap and to identify co-regulatory networks of dysregulated long noncoding RNAs (lncRNAs). We applied INFERNO to two large GWAS studies. For schizophrenia (36,989 cases, 113,075 controls), INFERNO identified putatively causal variants affecting brain enhancers for known schizophrenia-related genes. For inflammatory bowel disease (IBD) (12,882 cases, 21,770 controls), INFERNO found enrichments of immune and digestive enhancers and lncRNAs involved in regulation of the adaptive immune response. In summary, INFERNO comprehensively infers the molecular mechanisms of causal noncoding variants, providing a sensitive hypothesis generation method for post-GWAS analysis. The software is available as an open source pipeline and a web server.


Subject(s)
Enhancer Elements, Genetic; Genome, Human; Inflammatory Bowel Diseases/genetics; RNA, Long Noncoding/genetics; Schizophrenia/genetics; Software; Adaptive Immunity; Case-Control Studies; Female; Genetic Markers; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Inflammatory Bowel Diseases/immunology; Inflammatory Bowel Diseases/physiopathology; Internet; Linkage Disequilibrium; Male; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci; RNA, Long Noncoding/immunology; Schizophrenia/immunology; Schizophrenia/physiopathology
19.
Nucleic Acids Res ; 46(W1): W36-W42, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29733404

ABSTRACT

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.


Subject(s)
Computational Biology/trends; RNA, Small Untranslated/genetics; RNA/genetics; Software; Animals; Genomics; High-Throughput Nucleotide Sequencing/instrumentation; Humans; Internet; Mice; Molecular Sequence Annotation; Sequence Analysis, RNA/instrumentation; Transcriptome/genetics
20.
Nat Genet ; 49(9): 1373-1384, 2017 09.
Article in English | MEDLINE | ID: mdl-28714976

ABSTRACT

We identified rare coding variants associated with Alzheimer's disease in a three-stage case-control study of 85,133 subjects. In stage 1, we genotyped 34,174 samples using a whole-exome microarray. In stage 2, we tested associated variants (P < 1 × 10⁻⁴) in 35,962 independent samples using de novo genotyping and imputed genotypes. In stage 3, we used an additional 14,997 samples to test the most significant stage 2 associations (P < 5 × 10⁻⁸) using imputed genotypes. We observed three new genome-wide significant nonsynonymous variants associated with Alzheimer's disease: a protective variant in PLCG2 (rs72824905: p.Pro522Arg, P = 5.38 × 10⁻¹⁰, odds ratio (OR) = 0.68, minor allele frequency (MAF)_cases = 0.0059, MAF_controls = 0.0093), a risk variant in ABI3 (rs616338: p.Ser209Phe, P = 4.56 × 10⁻¹⁰, OR = 1.43, MAF_cases = 0.011, MAF_controls = 0.008), and a new genome-wide significant variant in TREM2 (rs143332484: p.Arg62His, P = 1.55 × 10⁻¹⁴, OR = 1.67, MAF_cases = 0.0143, MAF_controls = 0.0089), a known susceptibility gene for Alzheimer's disease. These protein-altering changes are in genes highly expressed in microglia and highlight an immune-related protein-protein interaction network enriched for previously identified risk genes in Alzheimer's disease. These genetic findings provide additional evidence that the microglia-mediated innate immune response contributes directly to the development of Alzheimer's disease.
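
The minor allele frequencies quoted in this abstract allow a rough cross-check of the reported odds ratios. The sketch below computes a crude allelic odds ratio from case and control allele frequencies. This is a simplification and an assumption of this example, not the study's method: the reported values presumably come from a multi-stage, covariate-adjusted analysis, so the crude figures are expected to agree only approximately. The function and dictionary names are hypothetical.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Crude allelic odds ratio from minor allele frequencies."""
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# (MAF in cases, MAF in controls, OR reported in the abstract)
reported = {
    "PLCG2 p.Pro522Arg": (0.0059, 0.0093, 0.68),
    "ABI3 p.Ser209Phe": (0.011, 0.008, 1.43),
    "TREM2 p.Arg62His": (0.0143, 0.0089, 1.67),
}

crude = {
    name: allelic_odds_ratio(maf_cases, maf_controls)
    for name, (maf_cases, maf_controls, _) in reported.items()
}

for name, (_, _, reported_or) in reported.items():
    print(f"{name}: crude OR = {crude[name]:.2f}, reported OR = {reported_or:.2f}")
```

The crude values fall on the same side of 1 as the reported odds ratios (below 1 for the protective PLCG2 variant, above 1 for ABI3 and TREM2) and differ from them by less than 0.1.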


Subject(s)
Adaptor Proteins, Signal Transducing/genetics; Alzheimer Disease/genetics; Immunity, Innate/genetics; Membrane Glycoproteins/genetics; Microglia/metabolism; Phospholipase C gamma/genetics; Polymorphism, Single Nucleotide; Receptors, Immunologic/genetics; Amino Acid Sequence; Case-Control Studies; Exome/genetics; Gene Expression Profiling; Gene Frequency; Genetic Predisposition to Disease/genetics; Genotype; Humans; Linkage Disequilibrium; Odds Ratio; Protein Interaction Maps/genetics; Sequence Homology, Amino Acid