Results 1 - 20 of 39
1.
Life Sci Alliance ; 7(5)2024 May.
Article in English | MEDLINE | ID: mdl-38418088

ABSTRACT

Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.
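The overall validation rate is a simple proportion (114 / 146 ≈ 78.1%), and the caller comparison stratifies deletions into three size ranges. A minimal Python sketch of that bookkeeping; the function names are hypothetical and are not part of Scalpel or Parliament, and the bin edges are simply the ranges quoted above.

```python
def validation_rate(validated: int, tested: int) -> float:
    """Percentage of tested deletions confirmed by Sanger sequencing, to one decimal."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return round(100.0 * validated / tested, 1)


def size_bin(length_bp: int) -> str:
    """Assign a deletion length to the size ranges used to compare the two callers."""
    if length_bp <= 100:
        return "<=100 bp"
    if length_bp <= 900:
        return "101-900 bp"
    return ">900 bp"


print(validation_rate(114, 146))  # 78.1
print(size_bin(450))              # 101-900 bp
```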


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/genetics , Whole Genome Sequencing/methods
2.
medRxiv ; 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38293024

ABSTRACT

The prevalence of dementia among South Asians across India is approximately 7.4% in those 60 years and older, yet little is known about genetic risk factors for dementia in this population. Most known risk loci for Alzheimer's disease (AD) have been identified in studies of individuals of European ancestry (EA), and their relevance in South Asians is unknown. Using whole-genome sequence data from 2680 participants from the Diagnostic Assessment of Dementia for the Longitudinal Aging Study of India (LASI-DAD), we performed a gene-based analysis of 84 genes previously associated with AD in EA. We investigated associations with the Hindi Mental State Examination (HMSE) score and factor scores for general cognitive function and five cognitive domains. For each gene, we examined missense/loss-of-function (LoF) variants and brain-specific promoter/enhancer variants separately, both with and without incorporating additional annotation weights (e.g., deleteriousness, conservation scores), using the variant-Set Test for Association using Annotation infoRmation (STAAR). In the missense/LoF analysis without annotation weights and controlling for age, sex, state/territory, and genetic ancestry, three genes were associated with at least one measure of cognitive function (FDR q<0.1). APOE was associated with four measures of cognitive function, PICALM was associated with HMSE score, and TSPOAP1 was associated with executive function. The most strongly associated variants in each gene were rs429358 (APOE ε4), rs779406084 (PICALM), and rs9913145 (TSPOAP1). rs779406084 is a rare missense mutation that is more prevalent in LASI-DAD than in EA (minor allele frequency=0.075% vs. 0.0015%); the other two are common variants. No genes in the brain-specific promoter/enhancer analysis met the criteria for significance. Results with and without annotation weights were similar.
Missense/LoF variants in some genes previously associated with AD in EA are associated with measures of cognitive function in South Asians from India. Analyzing genome sequence data allows identification of potential novel causal variants enriched in South Asians.
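Gene-level FDR thresholds such as q < 0.1 are commonly applied through the Benjamini-Hochberg procedure. The abstract does not say which FDR method was used, so the sketch below assumes Benjamini-Hochberg; `bh_qvalues` and `significant` are hypothetical helper names, and any p-values passed to them here are illustrative rather than study results.

```python
def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant(pvalues: list[float], q_threshold: float = 0.1) -> list[int]:
    """Indices of tests (e.g. genes) whose BH q-value falls below the threshold."""
    return [i for i, q in enumerate(bh_qvalues(pvalues)) if q < q_threshold]
```

Enforcing the running minimum from the largest p-value downward is what keeps the q-values monotone in p, so a gene with a smaller p-value never receives a larger q-value.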

3.
Nat Commun ; 15(1): 684, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263370

ABSTRACT

The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge for joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The jointly genotyped variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for batch effects arising from the use of different capture kits across studies. We identified 8.2 million autosomal variants. Of these, 96.82% are high-quality and are located in 28,579 Ensembl transcripts. Overall, 41% of the variants are intronic and 1.8% have a CADD score > 30, indicating high predicted pathogenicity. Here we show that our new strategy can generate high-quality data from these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.
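Keeping only positions inside the union of capture kits is an interval-union and membership problem. A minimal single-chromosome Python sketch, assuming each kit is supplied as half-open [start, end) target intervals; real pipelines would read per-chromosome BED files, and this is not the ADSP processing code.

```python
from bisect import bisect_right


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge half-open [start, end) intervals into a sorted, non-overlapping union."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def positions_in_union(positions: list[int],
                       kits: list[list[tuple[int, int]]]) -> list[int]:
    """Keep only positions covered by at least one capture kit's targets."""
    union = merge_intervals([iv for kit in kits for iv in kit])
    starts = [s for s, _ in union]
    kept = []
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last merged interval starting at or before pos
        if i >= 0 and pos < union[i][1]:
            kept.append(pos)
    return kept
```

Merging first means each membership test is a single binary search, independent of how many kits contributed targets.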


Subject(s)
Alzheimer Disease , Humans , Exome , Computational Biology , Data Accuracy , Genotype
4.
BMC Genomics ; 25(1): 115, 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38279154

ABSTRACT

BACKGROUND: Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent to which STRs contribute to disease is likely underestimated because of the challenges of calling these variants in short-read next-generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully addresses all of the complexities associated with this variant class. RESULTS: Here we introduce LUSTR, which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing for customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without disease. CONCLUSIONS: LUSTR offers a powerful and user-friendly approach that allows for the identification of STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.
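At its simplest, sizing an STR from a read means counting consecutive copies of a repeat motif. The sketch below only illustrates that idea on an exact-match sequence; it ignores sequencing errors, interrupted repeats, and reads that do not span the locus, which are among the complexities tools such as LUSTR are built to handle, and it is not LUSTR's algorithm. The function name is hypothetical.

```python
def longest_repeat_run(seq: str, motif: str) -> int:
    """Largest number of consecutive, in-frame, exact copies of motif in seq."""
    k = len(motif)
    best = 0
    # Try every reading frame, since the repeat may start at any offset.
    for offset in range(k):
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption resets the run
    return best


print(longest_repeat_run("ACAGCAGCAGCAGTT", "CAG"))  # 4
```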


Subject(s)
Genome, Human , Microsatellite Repeats , Humans , Microsatellite Repeats/genetics , Germ Cells , High-Throughput Nucleotide Sequencing
5.
Alzheimers Dement ; 20(2): 1123-1136, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37881831

ABSTRACT

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.


Subject(s)
Alzheimer Disease , United States , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , National Institute on Aging (U.S.) , Genomics , Databases, Factual , Genetic Predisposition to Disease/genetics
6.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37947320

ABSTRACT

SUMMARY: Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG (Harmonization and Integration Pipeline for Functional Genomics), an automatically customized pipeline for efficient and scalable normalization of heterogeneous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g. chromatin interactions, genomic intervals, quantitative trait loci). AVAILABILITY AND IMPLEMENTATION: hipFG is freely available at https://bitbucket.org/wanglab-upenn/hipFG. A Docker container is available at https://hub.docker.com/r/wanglab/hipfg.


Subject(s)
Genome-Wide Association Study , Software , Genomics , Chromatin , Quantitative Trait Loci
7.
Res Sq ; 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37886469

ABSTRACT

Structural variations (SVs) are important contributors to the genetics of human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. We analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (N = 16,905) and identified 400,234 (168,223 high-quality) SVs. Laboratory validation yielded a sensitivity of 82% (85% for high-quality SVs). We found a significant burden of deletions and duplications in AD cases, particularly for singletons and homozygous events. In AD genes, we observed ultra-rare SVs associated with the disease, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1. Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, exemplified by a 5-kb deletion in complete LD with rs143080277 in NCK2. We also identified 16 SVs associated with AD and 13 SVs linked to AD-related pathological/cognitive endophenotypes. This study highlights the pivotal role of SVs in shaping our understanding of AD genetics.

8.
medRxiv ; 2023 Sep 02.
Article in English | MEDLINE | ID: mdl-37693521

ABSTRACT

Alzheimer's disease (AD) is a common disorder of the elderly that is both highly heritable and genetically heterogeneous. Here, we investigated the association between AD and both common variants and aggregates of rare coding and noncoding variants in 13,371 individuals of diverse ancestry with whole-genome sequence (WGS) data. Pooled-population analyses identified genetic variants in or near APOE, BIN1, and LINC00320 significantly associated with AD (p < 5×10⁻⁸). Population-specific analyses identified a haplotype on chromosome 14 including PSEN1 associated with AD in Hispanics, further supported by aggregate testing of rare coding and noncoding variants in this region. Finally, we observed suggestive associations (p < 5×10⁻⁵) of aggregates of rare coding variants in ABCA7 among non-Hispanic Whites (p=5.4×10⁻⁶), and of rare noncoding variants in the promoter of TOMM40, distinct from APOE, in pooled-population analyses (p=7.2×10⁻⁸). Complementary pooled-population and population-specific analyses offered unique insights into the genetic architecture of AD.

9.
medRxiv ; 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37745545

ABSTRACT

Structural variations (SVs) are important contributors to the genetics of numerous human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. Here, we analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP, N=16,905 subjects) and identified 400,234 (168,223 high-quality) SVs. We found a significant burden of deletions and duplications in AD cases (OR=1.05, P=0.03), particularly for singletons (OR=1.12, P=0.0002) and homozygous events (OR=1.10, P<0.0004). In AD genes, ultra-rare SVs, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1, were associated with AD (SKAT-O P=0.004). Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, e.g., a deletion (chr2:105731359-105736864) in complete LD (R²=0.99) with rs143080277 (chr2:105749599) in NCK2. We also identified 16 SVs associated with AD and 13 SVs associated with AD-related pathological/cognitive endophenotypes. Our findings demonstrate the broad impact of SVs on AD genetics.
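One common way to quantify LD between an SV and a nearby SNP is the squared correlation of their genotype dosages across individuals. The abstract does not state how R² was estimated (for example, from unphased genotypes or phased haplotypes), so the sketch below assumes the genotype-dosage version; `ld_r2` is a hypothetical helper.

```python
from math import fsum


def ld_r2(g1: list[float], g2: list[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    m1, m2 = fsum(g1) / n, fsum(g2) / n
    cov = fsum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = fsum((a - m1) ** 2 for a in g1)
    var2 = fsum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("monomorphic variant: r^2 is undefined")
    return cov * cov / (var1 * var2)
```

Squaring the correlation makes r² insensitive to which allele is coded as the alternate, so a deletion tagged by the SNP's reference allele still yields r² near 1.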

10.
medRxiv ; 2023 Jul 08.
Article in English | MEDLINE | ID: mdl-37461624

ABSTRACT

Limited ancestral diversity has impaired our ability to detect risk variants more prevalent in non-European ancestry groups in genome-wide association studies (GWAS). We constructed and analyzed a multi-ancestry GWAS dataset in the Alzheimer's Disease (AD) Genetics Consortium (ADGC) to test for novel shared and ancestry-specific AD susceptibility loci and evaluate the underlying genetic architecture in 37,382 non-Hispanic White (NHW), 6,728 African American, 8,899 Hispanic (HIS), and 3,232 East Asian individuals, performing within-ancestry fixed-effects meta-analysis followed by a cross-ancestry random-effects meta-analysis. We identified 13 loci with cross-ancestry associations, including known loci at/near CR1, BIN1, TREM2, CD2AP, PTK2B, CLU, SHARPIN, MS4A6A, PICALM, ABCA7, and APOE, and two novel loci not previously reported at 11p12 (LRRC4C) and 12q24.13 (LHX5-AS1). Reflecting the power of diverse ancestry in GWAS, we observed the SHARPIN locus using 7.1% of the sample size of the original discovering single-ancestry GWAS (n=788,989). We additionally identified three genome-wide significant (GWS) ancestry-specific loci at/near PTPRK (P = 2.4×10⁻⁸) and GRB14 (P = 1.7×10⁻⁸) in HIS, and KIAA0825 (P = 2.9×10⁻⁸) in NHW. Pathway analysis implicated multiple amyloid regulation pathways (strongest with adjusted P = 1.6×10⁻⁴) and the classical complement pathway (adjusted P = 1.3×10⁻³). Genes at/near our novel loci have known roles in neuronal development (LRRC4C, LHX5-AS1, and PTPRK) and insulin receptor activity regulation (GRB14). These findings provide compelling support for using traditionally underrepresented populations for gene discovery, even with smaller sample sizes.
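Both stages of this design (fixed effects within ancestry, random effects across ancestries) rest on inverse-variance pooling of per-study effect sizes. A minimal Python sketch; the random-effects step assumes the DerSimonian-Laird estimator of the between-ancestry variance τ², which the abstract does not specify, and the helper names are hypothetical.

```python
from math import erfc, sqrt


def _pool(betas: list[float], variances: list[float]) -> tuple[float, float, float]:
    """Inverse-variance pooling: returns (beta, se, two-sided p-value)."""
    weights = [1.0 / v for v in variances]
    wsum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / wsum
    se = sqrt(1.0 / wsum)
    p = erfc(abs(beta / se) / sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


def fixed_effects(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Fixed-effects meta-analysis, e.g. across cohorts of one ancestry."""
    return _pool(betas, [s * s for s in ses])


def random_effects_dl(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """DerSimonian-Laird random effects: adds between-study variance tau^2."""
    variances = [s * s for s in ses]
    w = [1.0 / v for v in variances]
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    return _pool(betas, [v + tau2 for v in variances])
```

When the ancestry-specific estimates agree, τ² is zero and the random-effects result equals the fixed-effects one; heterogeneity across ancestries widens the pooled standard error instead of being ignored.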

11.
bioRxiv ; 2023 Apr 25.
Article in English | MEDLINE | ID: mdl-37162864

ABSTRACT

Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG, an automatically customized pipeline for efficient and scalable normalization of heterogeneous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g., chromatin interactions, genomic intervals, quantitative trait loci).

12.
Alzheimers Res Ther ; 14(1): 194, 2022 12 26.
Article in English | MEDLINE | ID: mdl-36572909

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) shares risk factors with cardiovascular disease (CVD), and dysregulated cholesterol metabolism is a mechanism common to both diseases. Cholesterol efflux capacity (CEC) is an ex vivo metric of plasma high-density lipoprotein (HDL) function and inversely predicts incident CVD independently of other risk factors. Cholesterol pools in the central nervous system (CNS) are largely separate from those in blood, and CNS cholesterol excess may promote neurodegeneration. CEC of cerebrospinal fluid (CSF) may be a useful measure of CNS cholesterol trafficking. We hypothesized that subjects with AD and mild cognitive impairment (MCI) would have reduced CSF CEC compared with cognitively normal (CN) subjects and that the CSF apolipoproteins apoA-I, apoJ, and apoE might be associated with CSF CEC. METHODS: We retrieved CSF and same-day ethylenediaminetetraacetic acid (EDTA) plasma from 108 subjects (40 AD, 18 MCI, and 50 CN) from the Center for Neurodegenerative Disease Research biobank at the Perelman School of Medicine, University of Pennsylvania. For CSF CEC assays, we used N9 mouse microglial cells and SH-SY5Y human neuroblastoma cells; the corresponding plasma assay used J774 cells. Cells were labeled with [³H]-cholesterol for 24 h, had ABCA1 expression upregulated for 6 h, were exposed to 33 µL of CSF, and were then incubated for 2.5 h. CEC was quantified as the percentage of [³H]-cholesterol counts in the medium relative to total counts (medium + cells), normalized to a pool sample. ApoA-I, ApoJ, ApoE, and cholesterol were also measured in CSF. RESULTS: CSF CEC was significantly lower in MCI than in controls and was poorly correlated with plasma CEC. CSF levels of ApoJ/Clusterin were also significantly lower in MCI and were significantly associated with CSF CEC. While CSF ApoA-I was also associated with CSF CEC, CSF ApoE had no association with CSF CEC. CSF CEC was significantly and positively associated with CSF Aβ.
Taken together, these results suggest that ApoJ/Clusterin may be an important determinant of CSF CEC, which in turn could mitigate the risk of MCI and AD by promoting cellular efflux of cholesterol or other lipids. In contrast, CSF ApoE does not appear to play a role in determining CSF CEC.
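The CEC readout described above is a ratio of ratios. A minimal Python sketch, assuming raw [³H] counts per well and that "normalized to a pool sample" means dividing by the pool's percent efflux measured alongside the samples; background (no-acceptor) subtraction, which many efflux assays apply, is not mentioned in the abstract and is omitted. The function names and counts are hypothetical.

```python
def percent_efflux(medium_counts: float, cell_counts: float) -> float:
    """[3H]-cholesterol counts in the medium as a percentage of medium + cells."""
    total = medium_counts + cell_counts
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * medium_counts / total


def normalized_cec(sample_medium: float, sample_cells: float,
                   pool_medium: float, pool_cells: float) -> float:
    """Sample percent efflux relative to the pooled reference (pool = 1.0)."""
    return (percent_efflux(sample_medium, sample_cells)
            / percent_efflux(pool_medium, pool_cells))


print(normalized_cec(300, 700, 200, 800))  # 1.5
```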


Subject(s)
Alzheimer Disease , Cardiovascular Diseases , Neuroblastoma , Neurodegenerative Diseases , Humans , Mice , Animals , Clusterin , Alzheimer Disease/cerebrospinal fluid , Apolipoprotein A-I , Apolipoproteins E/cerebrospinal fluid , Cholesterol
13.
Bioinformatics ; 38(19): 4530-4536, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35980155

ABSTRACT

MOTIVATION: Cell-type deconvolution of bulk tissue RNA sequencing (RNA-seq) data is an important step toward understanding the variations in cell-type composition among disease conditions. Owing to recent advances in single-cell RNA sequencing (scRNA-seq) and the availability of large amounts of bulk RNA-seq data in disease-relevant tissues, various deconvolution methods have been developed. However, the performance of existing methods heavily relies on the quality of information provided by external data sources, such as the selection of scRNA-seq data as a reference and prior biological information. RESULTS: We present the Integrated and Robust Deconvolution (InteRD) algorithm to infer cell-type proportions from target bulk RNA-seq data. Owing to the innovative use of penalized regression with a new evaluation criterion for deconvolution, InteRD has three primary advantages. First, it is able to effectively integrate deconvolution results from multiple scRNA-seq datasets. Second, InteRD calibrates estimates from reference-based deconvolution by taking into account extra biological information as priors. Third, the proposed algorithm is robust to inaccurate external information imposed in the deconvolution system. Extensive numerical evaluations and real-data applications demonstrate that InteRD yields more accurate and robust cell-type proportion estimates that agree well with known biology. AVAILABILITY AND IMPLEMENTATION: The proposed InteRD framework is implemented in R and the package is available at https://cran.r-project.org/web/packages/InteRD/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
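InteRD itself (penalized regression that integrates several scRNA-seq references with biological priors) is implemented in the R package linked above. The sketch below swaps in only the basic reference-based step such methods build on: non-negative least squares of one bulk profile against a single genes × cell-types signature matrix, solved by projected gradient descent and rescaled to proportions. It is an illustration under those assumptions, not InteRD's algorithm, and `deconvolve_nnls` is a hypothetical name.

```python
def deconvolve_nnls(signature: list[list[float]], bulk: list[float],
                    iters: int = 2000) -> list[float]:
    """Estimate cell-type proportions by non-negative least squares.

    signature: genes x cell types reference expression matrix S.
    bulk: expression b of the same genes in one bulk sample.
    Minimizes 0.5 * ||S p - b||^2 subject to p >= 0 by projected gradient.
    """
    n_genes, k = len(signature), len(signature[0])
    # Step 1/||S||_F^2 is safe: ||S||_F^2 bounds the largest eigenvalue of S^T S.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / k] * k
    for _ in range(iters):
        resid = [sum(row[j] * p[j] for j in range(k)) - b
                 for row, b in zip(signature, bulk)]
        grad = [sum(signature[g][j] * resid[g] for g in range(n_genes))
                for j in range(k)]
        p = [max(0.0, p[j] - step * grad[j]) for j in range(k)]  # project onto p >= 0
    total = sum(p)
    return [x / total for x in p] if total > 0 else p  # rescale to proportions
```

Because the estimate depends entirely on the chosen signature matrix, a poorly matched reference biases it directly; that sensitivity to external information is the problem the abstract says InteRD is designed to mitigate.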


Subject(s)
RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Software , Sequence Analysis, RNA/methods
14.
Hum Mol Genet ; 31(R1): R62-R72, 2022 10 20.
Article in English | MEDLINE | ID: mdl-35943817

ABSTRACT

Non-coding genetic variants outside of protein-coding genome regions play an important role in genetic and epigenetic regulation. Understanding their roles has become increasingly important, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the catalog of both common and rare non-coding variants, which are typically not detected in more limited GWAS approaches, and offers unique opportunities for investigating them. However, the sheer size and breadth of WGS data introduce additional data analysis and interpretation challenges for predicting functional impact. This review focuses on recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions that will enhance the data and tools necessary for effective functional analysis of variants identified by WGS, to improve our understanding of disease etiology.


Subject(s)
Epigenesis, Genetic , Genome-Wide Association Study , Whole Genome Sequencing , Genomics
15.
J Alzheimers Dis ; 89(1): 1-12, 2022.
Article in English | MEDLINE | ID: mdl-35848019

ABSTRACT

The success of genome-wide association studies (GWAS) completed in the last 15 years has reinforced a key fact: polygenic architecture makes a substantial contribution to variation in susceptibility to complex diseases, including Alzheimer's disease. One straightforward way to capture this architecture and predict which individuals in a population are most at risk is to calculate a polygenic risk score (PRS). This score aggregates the risk conferred across multiple genetic variants, ultimately representing an individual's predicted genetic susceptibility to a disease. PRS have received increasing attention after being successfully applied to complex traits, which has renewed attention to new methods that improve the accuracy of risk prediction. While these applications are informative, their utility is far from equitable: most PRS models are built from samples composed heavily, if not entirely, of individuals of European descent. This raises health equity concerns if PRS are applied inaccurately to other population groups, and health disparity concerns if we fail to use them at all. In this review, we examine methods for calculating PRS and some of their previous uses in disease prediction. We also advocate, with supporting scientific evidence, for the inclusion of data from diverse populations in existing and future studies of population risk via PRS.
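In its basic form, a PRS is a weighted sum, PRS_j = Σ_i β_i g_ij, where g_ij is individual j's dosage (0, 1, or 2) of the effect allele at variant i and β_i is that variant's GWAS effect size (for example, a log odds ratio). A minimal Python sketch; the variant IDs and weights in any usage are placeholders, and the clumping, p-value thresholding, or shrinkage steps that real PRS methods add are not shown.

```python
def polygenic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over the variants in the score."""
    missing = [v for v in weights if v not in dosages]
    if missing:
        # Silently treating missing genotypes as 0 would bias the score downward.
        raise KeyError(f"no dosage for: {', '.join(sorted(missing))}")
    return sum(beta * dosages[v] for v, beta in weights.items())
```

Because the weights come from the discovery GWAS, a score trained mostly on one ancestry carries that population's effect sizes and LD patterns into every individual it is applied to, which is the equity concern raised above.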


Subject(s)
Alzheimer Disease , Genome-Wide Association Study , Alzheimer Disease/genetics , Genetic Predisposition to Disease/genetics , Humans , Multifactorial Inheritance/genetics , Risk Factors
16.
Alzheimers Dement ; 2022 Jun 30.
Article in English | MEDLINE | ID: mdl-35770850

ABSTRACT

INTRODUCTION: Variants in the tau gene (MAPT) region are associated with breast cancer in women and Alzheimer's disease (AD) among persons lacking apolipoprotein E ε4 (ε4-). METHODS: To identify novel genes associated with tau-related pathology, we conducted two genome-wide association studies (GWAS) for AD, one among 10,340 ε4- women in the Alzheimer's Disease Genetics Consortium (ADGC) and another in 31 members (22 women) of a consanguineous Hutterite kindred. RESULTS: We identified novel associations of AD with MGMT variants in the ADGC (rs12775171, odds ratio [OR] = 1.4, P = 4.9 × 10⁻⁸) and Hutterite (rs12256016 and rs2803456, OR = 2.0, P = 1.9 × 10⁻¹⁴) datasets. Multi-omics analyses showed that the most significant and largest number of associations among the single nucleotide polymorphisms (SNPs), DNA-methylated CpGs, MGMT expression, and AD-related neuropathological traits were observed among women. Furthermore, promoter capture Hi-C analyses revealed long-range interactions of the MGMT promoter with MGMT SNPs and CpG sites. DISCUSSION: These findings suggest that epigenetically regulated MGMT expression is involved in AD pathogenesis, especially in women.

17.
Alzheimers Dement ; 18(12): 2458-2467, 2022 12.
Article in English | MEDLINE | ID: mdl-35258170

ABSTRACT

INTRODUCTION: Progranulin (GRN) mutations occur in frontotemporal lobar degeneration (FTLD) and in Alzheimer's disease (AD), often with TDP-43 pathology. METHODS: We determined the frequency of rs5848 and rare, pathogenic GRN mutations in two autopsy cohorts and one family cohort. We compared Braak stage, β-amyloid load, hyperphosphorylated tau (PHFtau) tangle density, and TDP-43 pathology in GRN carriers and non-carriers. RESULTS: Pathogenic GRN mutations were more frequent in all cohorts than in the Genome Aggregation Database (gnomAD), but there was no evidence for association with AD. Pathogenic GRN carriers had significantly higher PHFtau tangle density after adjusting for age, sex, and APOE ε4 genotype. AD patients with rs5848 had higher frequencies of hippocampal sclerosis and TDP-43 deposits. Twenty-two rare, pathogenic GRN variants were observed in the family cohort. DISCUSSION: GRN mutations in clinical and neuropathological AD increase the burden of tau-related brain pathology but show no specific association with β-amyloid load or AD.


Subject(s)
Alzheimer Disease , Frontotemporal Lobar Degeneration , Humans , Progranulins/genetics , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Intercellular Signaling Peptides and Proteins/genetics , Mutation/genetics , Frontotemporal Lobar Degeneration/genetics , DNA-Binding Proteins/genetics
18.
Genome Res ; 32(4): 778-790, 2022 04.
Article in English | MEDLINE | ID: mdl-35210353

ABSTRACT

More than 90% of genetic variants are rare in most modern sequencing studies, such as the Alzheimer's Disease Sequencing Project (ADSP) whole-exome sequencing (WES) data. Furthermore, 54% of the rare variants in ADSP WES are singletons. However, both single-variant and unit-based tests are limited in their statistical power to detect an association between rare variants and phenotypes. To make the best use of rare missense variants and investigate their biological effect, we examine their association with phenotypes in the context of protein structures. We developed a protein structure-based approach, protein optimized kernel evaluation of missense nucleotides (POKEMON), which evaluates rare missense variants based on their spatial distribution within a protein rather than their allele frequency. The hypothesis behind this test is that the three-dimensional spatial distribution of variants within a protein structure provides functional context to power an association test. POKEMON identified three candidate genes (TREM2, SORL1, and EXOC3L4) and another suggestive gene from the ADSP WES data. For TREM2 and SORL1, two known Alzheimer's disease (AD) genes, the signal from the spatial cluster remains stable even when known AD risk variants are excluded, indicating the presence of additional low-frequency risk variants within these genes. EXOC3L4 is a novel AD risk gene that has a cluster of variants, primarily shared by case subjects, around the Sec6 domain. This cluster was also validated in an independent replication data set and a validation data set with a larger sample size.


Subject(s)
Alzheimer Disease , Alzheimer Disease/genetics , Gene Frequency , Genetic Predisposition to Disease , Humans , LDL-Receptor Related Proteins/genetics , LDL-Receptor Related Proteins/metabolism , Membrane Transport Proteins/genetics , Mutation, Missense , Phenotype , Exome Sequencing
19.
NAR Genom Bioinform ; 4(1): lqab123, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35047815

ABSTRACT

Querying massive functional genomic and annotation data collections, linking and summarizing the query results across data sources/data types are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) the ability to analyze and annotate the user's experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10⁹ hg19 FILER records shows FILER is highly scalable, with a sub-linear 32-fold increase in querying time when increasing the number of queries 1000-fold from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
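The abstract does not describe FILER's internal indexing. As an assumed illustration of what a single genomic interval query involves, the sketch below answers overlap queries against one chromosome's half-open [start, end) records using a sorted array and binary search; `IntervalIndex` is a hypothetical class, not FILER's API.

```python
from bisect import bisect_left


class IntervalIndex:
    """Overlap queries on half-open [start, end) intervals of one chromosome."""

    def __init__(self, intervals: list[tuple[int, int]]):
        self.intervals = sorted(intervals)
        self.starts = [s for s, _ in self.intervals]
        # The longest record bounds how far left an overlapping start can lie.
        self.max_len = max((e - s for s, e in self.intervals), default=0)

    def overlaps(self, qstart: int, qend: int) -> list[tuple[int, int]]:
        """Records with start < qend and end > qstart."""
        lo = bisect_left(self.starts, qstart - self.max_len)
        hi = bisect_left(self.starts, qend)
        return [iv for iv in self.intervals[lo:hi] if iv[1] > qstart]


index = IntervalIndex([(100, 200), (150, 400), (500, 600)])
print(index.overlaps(200, 500))  # [(150, 400)]
```

Each query costs two binary searches plus a scan of the candidate window, which stays small unless a few very long records inflate `max_len`; dedicated genomic indexes avoid that worst case.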

20.
J Alzheimers Dis ; 86(1): 461-477, 2022.
Article in English | MEDLINE | ID: mdl-35068457

ABSTRACT

BACKGROUND: Recent Alzheimer's disease (AD) genetics findings from genome-wide association studies (GWAS) span progressively larger and more diverse populations and outcomes. Currently, there is no up-to-date resource that provides harmonized and searchable information on all AD genetic associations found by GWAS or links the reported genetic variants and genes with functional and genomic annotations. OBJECTIVE: To create an integrated, harmonized, literature-derived collection of population-specific AD genetic associations. METHODS: We developed the Alzheimer's Disease Variant Portal (ADVP), an extensive collection of associations curated from >200 GWAS publications from the Alzheimer's Disease Genetics Consortium and other consortia. Genetic associations were systematically extracted, harmonized, and annotated from both the genome-wide significant and suggestive loci reported in these publications. To ensure consistent representation of AD genetic findings, all the extracted genetic association information was harmonized across specifically designed publication, variant, and association categories. RESULTS: ADVP V1.0 (February 2021) catalogs 6,990 associations related to disease risk, expression quantitative traits, endophenotypes, or neuropathology. This extensive harmonization effort led to a catalog containing >900 loci, >1,800 variants, >80 cohorts, and 8 populations. In addition, ADVP provides investigators with a seamless integration of genomic and publicly available functional annotations across multiple databases for each harmonized variant and gene record, thus facilitating further understanding and analyses of these genetic findings. CONCLUSION: ADVP is a valuable resource for investigators to quickly and systematically explore high-confidence AD genetic findings and provides insights into population-specific AD genetic architecture. ADVP is continually maintained and enhanced by NIAGADS and is freely accessible at https://advp.niagads.org.


Subject(s)
Alzheimer Disease , Genome-Wide Association Study , Alzheimer Disease/genetics , Endophenotypes , Genetic Predisposition to Disease/genetics , Humans , Polymorphism, Single Nucleotide