1.
Nat Commun ; 15(1): 684, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263370

ABSTRACT

The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The joint-genotype-called variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. Of these, 96.82% are high quality and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% have CADD scores > 30, indicating high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.
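The abstract states that the joint-called VCF keeps only positions within the union of the capture kits. The following is a minimal sketch of what such a union filter could look like, not the project's actual implementation. The function names, the dictionary layout, and the example coordinates are all hypothetical, and intervals are assumed to be 0-based and half-open, as in BED files.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed 0-based, half-open [start, end)
KitTargets = Dict[str, List[Interval]]  # chromosome -> target intervals


def union_intervals(kits: List[KitTargets]) -> KitTargets:
    """Pool the targets of every kit and merge overlapping or touching intervals."""
    pooled: KitTargets = {}
    for kit in kits:
        for chrom, intervals in kit.items():
            pooled.setdefault(chrom, []).extend(intervals)

    merged: KitTargets = {}
    for chrom, intervals in pooled.items():
        out: List[Interval] = []
        for start, end in sorted(intervals):
            if out and start <= out[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_union(union: KitTargets, chrom: str, pos: int) -> bool:
    """Return True if the 0-based position lies inside any merged interval."""
    intervals = union.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


# Hypothetical targets for two capture kits (made-up coordinates).
kit_a = {"chr1": [(100, 200), (500, 600)]}
kit_b = {"chr1": [(150, 250)], "chr2": [(10, 20)]}
union = union_intervals([kit_a, kit_b])
```

With these made-up inputs, a variant at `chr1:225` would be kept because it falls inside the merged interval, while one at `chr1:300` would be dropped. Merging first and then using binary search keeps each lookup logarithmic in the number of merged intervals.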


Subject(s)
Alzheimer Disease , Humans , Exome , Computational Biology , Data Accuracy , Genotype
2.
Alzheimers Dement ; 20(2): 1123-1136, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37881831

ABSTRACT

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.


Subject(s)
Alzheimer Disease , United States , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , National Institute on Aging (U.S.) , Genomics , Databases, Factual , Genetic Predisposition to Disease/genetics
3.
NAR Genom Bioinform ; 4(1): lqab123, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35047815

ABSTRACT

Querying massive functional genomic and annotation data collections, linking and summarizing the query results across data sources/data types are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) ability to analyze and annotate user's experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10^9 hg19 FILER records shows FILER is highly scalable, with a sub-linear 32-fold increase in querying time when increasing the number of queries 1000-fold from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
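To make the "sub-linear" claim concrete, the reported benchmark figures can be turned into a rough scaling exponent. This sketch assumes query time follows a simple power law, time ∝ n^k. That model is an assumption made here for illustration and is not stated in the abstract. The function name is hypothetical.

```python
import math


def scaling_exponent(workload_ratio: float, time_ratio: float) -> float:
    """Exponent k in time ∝ n**k, given how much workload and time each grew."""
    return math.log(time_ratio) / math.log(workload_ratio)


# Figures from the abstract: 1000-fold more queries, 32-fold more time.
exponent = scaling_exponent(workload_ratio=1000, time_ratio=32)
print(f"estimated scaling exponent: {exponent:.2f}")  # prints 0.50
```

An exponent of 1 would mean linear scaling. Under the power-law assumption, a value near 0.5 suggests that query time grew roughly with the square root of the number of query intervals over this range.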

4.
Bioinformatics ; 36(12): 3879-3881, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32330239

ABSTRACT

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that SparkINFERNO is more than 60 times more efficient and scales with data size and amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Algorithms , Genomics , Software
6.
Bioinformatics ; 35(10): 1768-1770, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30351394

ABSTRACT

SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration. AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alzheimer Disease , Data Management , Genomics , High-Throughput Nucleotide Sequencing , Humans , Software
7.
JAMA Neurol ; 74(10): 1178-1189, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28846757

ABSTRACT

Importance: It is unclear whether female carriers of the apolipoprotein E (APOE) ε4 allele are at greater risk of developing Alzheimer disease (AD) than male carriers, and the sex-dependent association of mild cognitive impairment (MCI) and APOE has not been established. Objective: To determine how sex and APOE genotype affect the risks for developing MCI and AD. Data Sources: Twenty-seven independent research studies in the Global Alzheimer's Association Interactive Network with data on nearly 58 000 participants. Study Selection: Non-Hispanic white individuals with clinical diagnostic and APOE genotype data. Data Extraction and Synthesis: Homogeneous data sets were pooled in case-control analyses, and logistic regression models were used to compute risks. Main Outcomes and Measures: Age-adjusted odds ratios (ORs) and 95% confidence intervals for developing MCI and AD were calculated for men and women across APOE genotypes. Results: Participants were men and women between ages 55 and 85 years. Across data sets most participants were white, and for many participants, racial/ethnic information was either not collected or not known. Men (OR, 3.09; 95% CI, 2.79-3.42) and women (OR, 3.31; 95% CI, 3.03-3.61) with the APOE ε3/ε4 genotype from ages 55 to 85 years did not show a difference in AD risk; however, women had an increased risk compared with men between the ages of 65 and 75 years (women, OR, 4.37; 95% CI, 3.82-5.00; men, OR, 3.14; 95% CI, 2.68-3.67; P = .002). Men with APOE ε3/ε4 had an increased risk of AD compared with men with APOE ε3/ε3. The APOE ε2/ε3 genotype conferred a protective effect on women (OR, 0.51; 95% CI, 0.43-0.61) decreasing their risk of AD more (P = .01) than men (OR, 0.71; 95% CI, 0.60-0.85).
There was no difference between men with APOE ε3/ε4 (OR, 1.55; 95% CI, 1.36-1.76) and women (OR, 1.60; 95% CI, 1.43-1.81) in their risk of developing MCI between the ages of 55 and 85 years, but women had an increased risk between 55 and 70 years (women, OR, 1.43; 95% CI, 1.19-1.73; men, OR, 1.07; 95% CI, 0.87-1.30; P = .05). There were no significant differences between men and women in their risks for converting from MCI to AD between the ages of 55 and 85 years. Individuals with APOE ε4/ε4 showed increased risks vs individuals with ε3/ε4, but no significant differences between men and women with ε4/ε4 were seen. Conclusions and Relevance: Contrary to long-standing views, men and women with the APOE ε3/ε4 genotype have nearly the same odds of developing AD from age 55 to 85 years, but women have an increased risk at younger ages.
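The sex comparisons above rest on odds ratios reported with 95% confidence intervals. As a rough illustration of how two such estimates can be compared, the sketch below recovers an approximate standard error from each reported interval and applies a two-sided z-test to the difference in log odds ratios. This is an approximation under stated assumptions (Wald-type intervals, independent male and female estimates). It is not the logistic regression analysis the authors performed, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile assumed to underlie the reported 95% CIs


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float):
    """Return log(OR) and an approximate standard error recovered from the CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(odds_ratio), se


def compare_odds_ratios(a, b):
    """Two-sided z-test for the difference of two independent log odds ratios."""
    log_a, se_a = log_or_and_se(*a)
    log_b, se_b = log_or_and_se(*b)
    z = (log_a - log_b) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Reported AD estimates for APOE ε3/ε4 carriers aged 65 to 75 years.
women = (4.37, 3.82, 5.00)
men = (3.14, 2.68, 3.67)
z, p = compare_odds_ratios(women, men)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For the 65 to 75 year estimates this approximation gives a p-value of about 0.002, in line with the reported P = .002, although agreement is not guaranteed for the other comparisons because the published values come from adjusted models.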


Subject(s)
Alzheimer Disease/genetics , Apolipoproteins E/genetics , Sex Characteristics , Aged , Aged, 80 and over , Alzheimer Disease/epidemiology , Case-Control Studies , Databases, Factual/statistics & numerical data , Female , Humans , Logistic Models , Male , Middle Aged , Risk Factors