Results 1 - 20 of 79
1.
Cell ; 173(7): 1692-1704.e11, 2018 06 14.
Article in English | MEDLINE | ID: mdl-29779949

ABSTRACT

Heritability is essential for understanding the biological causes of disease but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHRs) passively capture a wide range of clinically relevant data and provide a resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified 7.4 million familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with the literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a validation of the use of EHRs for genetics and disease research.
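The abstract does not spell out the heritability estimator. As a minimal, hypothetical illustration of how relative pairs recovered from emergency-contact data carry heritability information, the sketch below uses the textbook parent-offspring regression for a continuous trait. This is an assumption made here for illustration: the study estimated heritability for disease phenotypes across families, and its model is not reproduced. Under a purely additive model with no shared environment, the slope of offspring phenotype on one parent's phenotype is h²/2.

```python
from statistics import mean


def parent_offspring_h2(parent: list[float], offspring: list[float]) -> float:
    """Narrow-sense heritability from single-parent/offspring pairs.

    Assumes an additive model with no shared environment, where the
    regression slope of offspring on one parent equals h^2 / 2.
    """
    if len(parent) != len(offspring) or len(parent) < 2:
        raise ValueError("need at least two paired observations")
    mp, mo = mean(parent), mean(offspring)
    cov = sum((p - mp) * (o - mo) for p, o in zip(parent, offspring))
    var = sum((p - mp) ** 2 for p in parent)
    return 2.0 * cov / var  # twice the regression slope


# Toy data constructed with a slope of 0.25.
parents = [1.0, 2.0, 3.0, 4.0]
children = [0.25 * p + 1.0 for p in parents]
print(parent_offspring_h2(parents, children))  # 0.5
```

Shared household environment inflates this simple estimate, which is one reason family-based EHR estimates need careful modeling.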


Subject(s)
Electronic Health Records , Genetic Diseases, Inborn/genetics , Algorithms , Databases, Factual , Family Relations , Genetic Diseases, Inborn/pathology , Genotype , Humans , Pedigree , Phenotype , Quantitative Trait, Heritable
2.
Am J Hum Genet ; 111(7): 1448-1461, 2024 07 11.
Article in English | MEDLINE | ID: mdl-38821058

ABSTRACT

Both trio and population designs are popular study designs for identifying risk genetic variants in genome-wide association studies (GWASs). The trio design, as a family-based design, is robust to confounding due to population structure, whereas the population design is often more powerful due to larger sample sizes. Here, we propose KnockoffHybrid, a knockoff-based statistical method for hybrid analysis of both the trio and population designs. KnockoffHybrid provides a unified framework that brings together the advantages of both designs and produces powerful hybrid analysis while controlling the false discovery rate (FDR) in the presence of linkage disequilibrium and population structure. Furthermore, KnockoffHybrid has the flexibility to leverage different types of summary statistics for hybrid analyses, including expression quantitative trait loci (eQTL) and GWAS summary statistics. We demonstrate in simulations that KnockoffHybrid offers power gains over non-hybrid methods for the trio and population designs with the same number of cases while controlling the FDR with complex correlation among variants and population structure among subjects. In hybrid analyses of three trio cohorts for autism spectrum disorders (ASDs) from the Autism Speaks MSSNG, Autism Sequencing Consortium, and Autism Genome Project with GWAS summary statistics from the iPSYCH project and eQTL summary statistics from the MetaBrain project, KnockoffHybrid outperforms conventional methods by replicating several known risk genes for ASDs and identifying additional associations with variants in other genes, including the PRAME family genes involved in axon guidance and which may act as common targets for human speech/language evolution and related disorders.


Subject(s)
Autism Spectrum Disorder , Genome-Wide Association Study , Linkage Disequilibrium , Quantitative Trait Loci , Genome-Wide Association Study/methods , Humans , Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Computer Simulation , Models, Genetic
3.
Am J Hum Genet ; 109(10): 1761-1776, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36150388

ABSTRACT

Family-based designs can eliminate confounding due to population substructure and can distinguish direct from indirect genetic effects, but these designs are underpowered due to limited sample sizes. Here, we propose KnockoffTrio, a statistical method to identify putative causal genetic variants for father-mother-child trio design built upon a recently developed knockoff framework in statistics. KnockoffTrio controls the false discovery rate (FDR) in the presence of arbitrary correlations among tests and is less conservative and thus more powerful than the conventional methods that control the family-wise error rate via Bonferroni correction. Furthermore, KnockoffTrio is not restricted to family-based association tests and can be used in conjunction with more powerful, potentially nonlinear models to improve the power of standard family-based tests. We show, using empirical simulations, that KnockoffTrio can prioritize causal variants over associations due to linkage disequilibrium and can provide protection against confounding due to population stratification. In applications to 14,200 trios from three study cohorts for autism spectrum disorders (ASDs), including AGP, SPARK, and SSC, we show that KnockoffTrio can identify multiple significant associations that are missed by conventional tests applied to the same data. In particular, we replicate known ASD association signals with variants in several genes such as MACROD2, NRXN1, PRKAR1B, CADM2, PCDH9, and DOCK4 and identify additional associations with variants in other genes including ARHGEF10, SLC28A1, ZNF589, and HINT1 at FDR 10%.
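KnockoffTrio's FDR guarantee comes from a data-dependent selection threshold rather than a fixed per-test cutoff such as Bonferroni's α/m. The sketch below shows the generic knockoff+ selection rule from the original knockoff filter (Barber and Candès, 2015), assuming the per-variant statistics W_j have already been computed. How KnockoffTrio builds knockoff genotypes and W_j from trio data is not shown, and the W values are hypothetical.

```python
import math


def knockoff_plus_threshold(w: list[float], q: float) -> float:
    """Knockoff+ threshold for feature statistics W_j at target FDR q.

    A large positive W_j favours the original variant over its knockoff;
    the count of W_j <= -t estimates false discoveries among W_j >= t.
    Returns math.inf when no threshold meets the target (nothing selected).
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        est_false = 1 + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if est_false / n_selected <= q:
            return t
    return math.inf


def knockoff_select(w: list[float], q: float) -> list[int]:
    """Indices of variants selected at target FDR q."""
    tau = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= tau]


# Hypothetical W statistics for six variants.
w_stats = [5.0, 4.0, 3.0, 2.0, 1.0, -0.5]
print(knockoff_select(w_stats, q=0.2))  # [0, 1, 2, 3, 4]
print(knockoff_select(w_stats, q=0.1))  # []
```

The "+1" in the numerator is what makes the rule conservative enough to control FDR in finite samples; with only a handful of variants, a strict target such as 10% can select nothing.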


Subject(s)
Autism Spectrum Disorder , Genome-Wide Association Study , Autism Spectrum Disorder/genetics , Causality , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Nerve Tissue Proteins/genetics
4.
Am J Hum Genet ; 109(3): 446-456, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35216679

ABSTRACT

Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.
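To make "joint posterior functional probabilities" concrete, the sketch below applies Bayes' rule over joint latent classes (one binary indicator per functional dimension) for a single variant. It is a deliberately simplified stand-in, not MACIE: the two dimensions, class means, equal priors and conditionally independent Gaussian annotations are assumptions chosen for illustration, whereas MACIE estimates its parameters from data with a multivariate mixed model.

```python
import math
from itertools import product


def log_normal_pdf(x: float, mu: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def joint_posterior(scores, priors, means, sd=1.0):
    """Posterior probability of each joint functional class for one variant.

    scores: observed annotation values for the variant
    priors: {class tuple: prior probability}
    means:  {class tuple: expected annotation values in that class}
    Annotations are treated as independent Gaussians within a class.
    """
    log_post = {
        cls: math.log(prior)
        + sum(log_normal_pdf(x, mu, sd) for x, mu in zip(scores, means[cls]))
        for cls, prior in priors.items()
    }
    top = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {cls: math.exp(v - top) for cls, v in log_post.items()}
    total = sum(unnorm.values())
    return {cls: v / total for cls, v in unnorm.items()}


def marginal(post, dim):
    """Posterior probability that the variant is functional along one dimension."""
    return sum(p for cls, p in post.items() if cls[dim] == 1)


# Two hypothetical dimensions, e.g. (protein function, regulatory).
classes = list(product((0, 1), repeat=2))
priors = {c: 0.25 for c in classes}
means = {c: [3.0 * c[0], 3.0 * c[1]] for c in classes}
post = joint_posterior([3.0, 0.0], priors, means)
print(max(post, key=post.get), round(marginal(post, 0), 3))
```

Reporting the full joint posterior, rather than one score, keeps the distinction between "functional in both senses" and "functional in one sense only".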


Subject(s)
Genome, Human , Genome-Wide Association Study , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Probability
5.
Am J Hum Genet ; 108(12): 2336-2353, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34767756

ABSTRACT

Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches cannot analyze millions of rare genetic variants in biobank-scale whole-genome sequencing and whole-genome imputed datasets. We propose a scalable knockoff-based method for the analysis of common and rare variants across the genome, KnockoffScreen-AL, that is applicable to biobank-scale studies with hundreds of thousands of samples and millions of genetic variants. The application of KnockoffScreen-AL to the analysis of Alzheimer disease (AD) in 388,051 WG-imputed samples from the UK Biobank resulted in 31 significant loci, including 14 loci that are missed by conventional association tests on these data. We perform replication studies in an independent meta-analysis of clinically diagnosed AD with 94,437 samples, and additionally leverage single-cell RNA-sequencing data with 143,793 single-nucleus transcriptomes from 17 control subjects and AD-affected individuals, and proteomics data from 735 control subjects and affected individuals with AD and related disorders to validate the genes at these significant loci. These multi-omics analyses show that 79.1% of the proximal genes at these loci and 76.2% of the genes at loci identified only by KnockoffScreen-AL exhibit at least suggestive signal (p < 0.05) in the scRNA-seq or proteomics analyses. We highlight a potentially causal gene in AD progression, EGFR, that shows significant differences in expression and protein levels between AD-affected individuals and healthy control subjects.


Subject(s)
Alzheimer Disease/genetics , Biological Specimen Banks , Gene Knockout Techniques , Genes, erbB-1 , Genetic Variation , Genome-Wide Association Study , Humans , RNA-Seq , Transcriptome , Whole Genome Sequencing
6.
Proc Natl Acad Sci U S A ; 118(47), 2021 11 23.
Article in English | MEDLINE | ID: mdl-34799441

ABSTRACT

Gene-based tests are valuable techniques for identifying genetic factors in complex traits. Here, we propose a gene-based testing framework that incorporates data on long-range chromatin interactions, several recent technical advances for region-based tests, and leverages the knockoff framework for synthetic genotype generation for improved gene discovery. Through simulations and applications to genome-wide association studies (GWAS) and whole-genome sequencing data for multiple diseases and traits, we show that the proposed test increases the power over state-of-the-art gene-based tests in the literature, identifies genes that replicate in larger studies, and can provide a more narrow focus on the possible causal genes at a locus by reducing the confounding effect of linkage disequilibrium. Furthermore, our results show that incorporating genetic variation in distal regulatory elements tends to improve power over conventional tests. Results for UK Biobank and BioBank Japan traits are also available in a publicly accessible database that allows researchers to query gene-based results in an easy fashion.
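The abstract does not name the region-based techniques it combines. One widely used way to merge p-values from correlated tests over a gene and its distal regulatory elements is the Cauchy combination test (Liu and Xie, 2020); the sketch below assumes that choice purely for illustration, and the example p-values are hypothetical.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine (possibly correlated) p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi) with weights normalised to sum to 1;
    the combined p-value is 0.5 - arctan(T) / pi.
    """
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = sum(weights)
    t = sum(w / total * math.tan((0.5 - p) * math.pi) for p, w in zip(pvalues, weights))
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values from tests on a gene body and two linked enhancers.
print(cauchy_combination([0.2, 0.003, 0.6]))
```

The combined statistic has an approximately Cauchy tail even when the component tests are dependent, which suits overlapping windows that share variants; small component p-values dominate the sum.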


Subject(s)
Chromatin , Genetic Testing/methods , Genotype , Genome-Wide Association Study/methods , Humans , Japan , Linkage Disequilibrium , Lung , Models, Genetic , Phenotype , Quantitative Trait Loci , Whole Genome Sequencing/methods
7.
PLoS Genet ; 17(8): e1009713, 2021 08.
Article in English | MEDLINE | ID: mdl-34460823

ABSTRACT

Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.


Subject(s)
Computational Biology/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Cluster Analysis , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Phenotype
8.
J Am Soc Nephrol ; 34(4): 607-618, 2023 04 01.
Article in English | MEDLINE | ID: mdl-36302597

ABSTRACT

SIGNIFICANCE STATEMENT: Pathogenic structural genetic variants, also known as genomic disorders, have been associated with pediatric CKD. This study extends those results across the lifespan, with genomic disorders enriched in both pediatric and adult patients compared with controls. In the Chronic Renal Insufficiency Cohort study, genomic disorders were also associated with lower serum magnesium, lower educational performance, and a higher risk of death. A phenome-wide association study confirmed the link between kidney disease and genomic disorders in an unbiased way. Systematic detection of genomic disorders can provide a molecular diagnosis and refine prediction of risk and prognosis. BACKGROUND: Genomic disorders (GDs) are associated with many comorbid outcomes, including CKD. Identification of GDs has diagnostic utility. METHODS: We examined the prevalence of GDs among participants in the Chronic Kidney Disease in Children (CKiD) cohort II (n = 248), Chronic Renal Insufficiency Cohort (CRIC) study (n = 3375), Columbia University CKD Biobank (CU-CKD; n = 1986), and the Family Investigation of Nephropathy and Diabetes (FIND; n = 1318) compared with 30,746 controls. We also performed a phenome-wide association analysis (PheWAS) of GDs in the electronic MEdical Records and GEnomics (eMERGE; n = 11,146) cohort. RESULTS: We found that nine out of 248 (3.6%) CKiD II participants carried a GD, replicating prior findings in pediatric CKD. We also identified GDs in 72 out of 6679 (1.1%) adult patients with CKD in the CRIC, CU-CKD, and FIND cohorts, compared with 199 out of 30,746 (0.65%) controls (OR, 1.7; 95% CI, 1.3 to 2.2). Among adults with CKD, we found recurrent GDs at the 1q21.1, 16p11.2, 17q12, and 22q11.2 loci. The 17q12 GD (diagnostic of renal cyst and diabetes syndrome) was the most frequent, present in 1 in 252 patients with CKD and diabetes. In the PheWAS, dialysis and neuropsychiatric phenotypes were the top associations with GDs.
In CRIC participants, GDs were associated with lower serum magnesium, lower educational achievement, and higher mortality risk. CONCLUSION: Undiagnosed GDs are detected both in children and adults with CKD. Identification of GDs in these patients can enable a precise genetic diagnosis, inform prognosis, and help stratify risk in clinical studies. GDs could also provide a molecular explanation for nephropathy and comorbidities, such as poorer neurocognition for a subset of patients.
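The abstract reports the carrier counts behind the adult case-control comparison. As a worked check, a standard odds ratio with a Woolf (log-scale) confidence interval computed from those counts reproduces the reported OR of 1.7 (95% CI, 1.3 to 2.2). The abstract does not state which interval method the authors used, so this is a consistency check rather than their analysis.

```python
import math


def odds_ratio_woolf(case_carriers, n_cases, ctrl_carriers, n_controls, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = ctrl_carriers, n_controls - ctrl_carriers
    odds_ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Adult CKD cases (CRIC, CU-CKD, FIND) vs controls, counts from the abstract.
or_, lo, hi = odds_ratio_woolf(72, 6679, 199, 30746)
print(f"OR {or_:.1f} (95% CI {lo:.1f} to {hi:.1f})")  # OR 1.7 (95% CI 1.3 to 2.2)
```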


Subject(s)
Longevity , Renal Insufficiency, Chronic , Humans , Cohort Studies , Prospective Studies , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/genetics , Renal Insufficiency, Chronic/complications , Genomics , Disease Progression , Risk Factors
9.
Am J Hum Genet ; 106(4): 513-524, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32243819

ABSTRACT

The identification of functional regions in the noncoding human genome is difficult but critical in order to gain understanding of the role noncoding variation plays in gene regulation in human health and disease. We describe here a co-localization approach that aims to identify constrained sequences that co-localize with tissue- or cell-type-specific regulatory regions, and we show that the resulting score is particularly well suited for the identification of rare regulatory variants. For 127 tissues and cell types in the ENCODE/Roadmap Epigenomics Project, we provide catalogs of putative tissue- or cell-type-specific regulatory regions under sequence constraint. We use the newly developed co-localization score for brain tissues to score de novo mutations in whole genomes from 1,902 individuals affected with autism spectrum disorder (ASD) and their unaffected siblings in the Simons Simplex Collection. We show that noncoding de novo mutations near genes co-expressed in midfetal brain with high-confidence ASD risk genes and near FMRP gene targets are more likely to fall in co-localized regions if they occur in ASD probands than if they occur in their unaffected siblings. We also observed a similar enrichment for mutations near lincRNAs, previously shown to co-express with ASD risk genes. Additionally, we provide strong evidence that prioritized de novo mutations in autism probands point to a small set of well-known ASD genes, the disruption of which produces relevant mouse phenotypes such as abnormal social investigation and abnormal discrimination/associative learning, unlike the de novo mutations in unaffected siblings. The genome-wide co-localization results are available online.


Subject(s)
Gene Expression Regulation/genetics , Genome, Human/genetics , Autism Spectrum Disorder/genetics , Epigenomics/methods , Humans , Mutation/genetics , Phenotype , Siblings , Whole Genome Sequencing/methods
10.
Genet Med ; 25(12): 100983, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37746849

ABSTRACT

PURPOSE: Previous work identified rare variants in DSTYK associated with human congenital anomalies of the kidney and urinary tract (CAKUT). Here, we present a series of mouse and human studies to clarify the association, penetrance, and expressivity of DSTYK variants. METHODS: We phenotypically characterized Dstyk knockout mice on 3 separate inbred backgrounds and re-analyzed the original family segregating the DSTYK c.654+1G>A splice-site variant (referred to as "SSV" below). DSTYK loss-of-function (LOF) variants and SSVs were annotated in individuals with CAKUT, epilepsy, or amyotrophic lateral sclerosis vs controls. A phenome-wide association study was also performed using United Kingdom Biobank (UKBB) data. RESULTS: Results demonstrate ∼20% to 25% penetrance of obstructive uropathy, at least, in C57BL/6J and FVB/NJ Dstyk-/- mice. Phenotypic penetrance increased to ∼40% in C3H/HeJ mutants, with mild-to-moderate severity. Re-analysis of the original family segregating the rare SSV showed low penetrance (43.8%) and no alternative genetic causes for CAKUT. The burden of LOF DSTYK variants showed a significant excess for CAKUT and epilepsy vs controls, and an exploratory phenome-wide association study supported an association with neurological disorders. CONCLUSION: These data support causality for DSTYK LOF variants and highlight the need for large-scale sequencing studies (here >200,000 cases) to accurately assess the causality of genes and variants for lowly penetrant traits with common population prevalence.


Subject(s)
Epilepsy , Urinary Tract , Urogenital Abnormalities , Animals , Mice , Humans , Penetrance , Mice, Inbred C3H , Mice, Inbred C57BL , Urogenital Abnormalities/genetics , Kidney/abnormalities , Risk Factors , Epilepsy/genetics , Receptor-Interacting Protein Serine-Threonine Kinases/genetics
11.
N Engl J Med ; 380(20): 1918-1928, 2019 05 16.
Article in English | MEDLINE | ID: mdl-31091373

ABSTRACT

BACKGROUND: In the context of kidney transplantation, genomic incompatibilities between donor and recipient may lead to allosensitization against new antigens. We hypothesized that recessive inheritance of gene-disrupting variants may represent a risk factor for allograft rejection. METHODS: We performed a two-stage genetic association study of kidney allograft rejection. In the first stage, we performed a recessive association screen of 50 common gene-intersecting deletion polymorphisms in a cohort of kidney transplant recipients. In the second stage, we replicated our findings in three independent cohorts of donor-recipient pairs. We defined genomic collision as a specific donor-recipient genotype combination in which a recipient who was homozygous for a gene-intersecting deletion received a transplant from a nonhomozygous donor. Identification of alloantibodies was performed with the use of protein arrays, enzyme-linked immunosorbent assays, and Western blot analyses. RESULTS: In the discovery cohort, which included 705 recipients, we found a significant association with allograft rejection at the LIMS1 locus represented by rs893403 (hazard ratio with the risk genotype vs. nonrisk genotypes, 1.84; 95% confidence interval [CI], 1.35 to 2.50; P = 9.8×10⁻⁵). This effect was replicated under the genomic-collision model in three independent cohorts involving a total of 2004 donor-recipient pairs (hazard ratio, 1.55; 95% CI, 1.25 to 1.93; P = 6.5×10⁻⁵). In the combined analysis (discovery cohort plus replication cohorts), the risk genotype was associated with a higher risk of rejection than the nonrisk genotype (hazard ratio, 1.63; 95% CI, 1.37 to 1.95; P = 4.7×10⁻⁸). We identified a specific antibody response against LIMS1, a kidney-expressed protein encoded within the collision locus. The response involved predominantly IgG2 and IgG3 antibody subclasses. CONCLUSIONS: We found that the LIMS1 locus appeared to encode a minor histocompatibility antigen.
Genomic collision at this locus was associated with rejection of the kidney allograft and with production of anti-LIMS1 IgG2 and IgG3. (Funded by the Columbia University Transplant Center and others.).
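The genomic-collision model defined above is a simple rule on donor and recipient genotypes at one deletion. A minimal sketch, assuming genotypes are coded as the number of deletion alleles (a coding chosen here for illustration):

```python
def genomic_collision(recipient_copies: int, donor_copies: int) -> bool:
    """Genomic collision at a gene-intersecting deletion.

    Genotypes are coded as the number of deletion alleles (0, 1 or 2).
    A collision is a homozygous-deletion recipient (2) receiving a kidney
    from a donor who is not homozygous for the deletion (0 or 1), so the
    donor organ may express a protein the recipient lacks.
    """
    for copies in (recipient_copies, donor_copies):
        if copies not in (0, 1, 2):
            raise ValueError("deletion allele count must be 0, 1 or 2")
    return recipient_copies == 2 and donor_copies < 2


pairs = [(2, 0), (2, 1), (2, 2), (1, 0), (0, 2)]
print([genomic_collision(r, d) for r, d in pairs])  # [True, True, False, False, False]
```

In an association analysis this binary indicator would serve as the exposure in place of the recipient-only recessive genotype; that modeling step is not shown.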


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , DNA Copy Number Variations , Graft Rejection/genetics , Kidney Transplantation , LIM Domain Proteins/genetics , Adaptor Proteins, Signal Transducing/immunology , Cohort Studies , Genetic Association Studies , Genotype , HLA Antigens/genetics , Histocompatibility Testing , Humans , Immunoglobulin G/blood , LIM Domain Proteins/immunology , Membrane Proteins/genetics , Membrane Proteins/immunology , Polymorphism, Single Nucleotide , Tissue Donors
12.
Bioinformatics ; 2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33515242

ABSTRACT

MOTIVATION: Predicting regulatory effects of genetic variants is a challenging but important problem in functional genomics. Given the relatively low sensitivity of functional assays, and the pervasiveness of class imbalance in functional genomic data, popular statistical prediction models can sharply underestimate the probability of a regulatory effect. We describe here the presence-only model (PO-EN), a type of semi-supervised model, to predict regulatory effects of genetic variants at sequence-level resolution in a context of interest by integrating a large number of epigenetic features and massively parallel reporter assays (MPRAs). RESULTS: Using experimental data from a variety of MPRAs we show that the presence-only model produces better calibrated predicted probabilities and has increased accuracy relative to state-of-the-art prediction models. Furthermore, we show that the predictions based on pre-trained PO-EN models are useful for prioritizing functional variants among candidate eQTLs and significant SNPs at GWAS loci. In particular, for the costimulatory locus, associated with multiple autoimmune diseases, we show evidence of a regulatory variant residing in an enhancer 24.4 kb downstream of CTLA4, with evidence from capture Hi-C of interaction with CTLA4. Furthermore, the risk allele of the regulatory variant is on the same risk increasing haplotype as a functional coding variant in exon 1 of CTLA4, suggesting that the regulatory variant acts jointly with the coding variant leading to increased risk to disease. AVAILABILITY: The presence-only model is implemented in the R package 'PO.EN', freely available on CRAN. A vignette describing a detailed demonstration of using the proposed PO-EN model can be found on github at https://github.com/Iuliana-Ionita-Laza/PO.EN/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
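The point that standard classifiers underestimate the probability of a regulatory effect can be illustrated with the classic positive-unlabeled correction of Elkan and Noto (2008). If only a fraction c of truly functional variants are ever labeled positive by an assay, and labeling is random among them (the "selected completely at random" assumption), then P(functional | x) = P(labeled | x) / c. This is a different, simpler technique than PO-EN, which is distributed as the R package 'PO.EN' and is not reproduced here; the scores below are hypothetical.

```python
def pu_rescale(scores, labeled_positive_scores):
    """Convert P(labeled | x) into P(functional | x) under the SCAR assumption.

    scores: classifier outputs g(x), estimating P(labeled | x), for variants of interest
    labeled_positive_scores: g(x) on held-out variants known to be functional
    c = mean g over known positives estimates P(labeled | functional).
    """
    c = sum(labeled_positive_scores) / len(labeled_positive_scores)
    if not 0.0 < c <= 1.0:
        raise ValueError("estimated labeling frequency must be in (0, 1]")
    return [min(1.0, g / c) for g in scores]


# Hypothetical scores: only about half of true regulatory variants register as hits.
print(pu_rescale([0.1, 0.3, 0.6], [0.5, 0.45, 0.55]))
```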

13.
Am J Hum Genet ; 102(5): 920-942, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727691

ABSTRACT

We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).
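Heritability enrichment of a functional component is commonly defined (for example in stratified LD score regression; the abstract does not name the estimator, so this definition is an assumption) as the share of SNP-heritability attributed to the component divided by the share of variants it contains:

```latex
\mathrm{Enrichment}(C) = \frac{h^2_C / h^2_g}{M_C / M}
```

Here h²_C is the SNP-heritability attributed to variants in component C, h²_g the total SNP-heritability, M_C the number of variants in C and M the total number of variants. For example, a component covering 5% of variants that accounts for 25% of h²_g has an enrichment of 0.25 / 0.05 = 5; values above 1 indicate more heritability than the component's size alone would predict.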


Subject(s)
Algorithms , DNA, Intergenic/genetics , Genetic Variation , Models, Genetic , Organ Specificity/genetics , Genome-Wide Association Study , Humans , Linkage Disequilibrium/genetics , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Probability , Quantitative Trait Loci/genetics , Reproducibility of Results , Twins/genetics
14.
Am J Hum Genet ; 102(6): 1031-1047, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29754769

ABSTRACT

Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWASs) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is challenging, however, because the functional significance of non-coding mutations is difficult to predict. We propose a statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, to learn from data which annotations are informative of pathogenic mutations, and to combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ∼300 autism-affected family trios across five studies and discovered several autism risk genes. The software is freely available for all research uses.
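The gene-level evidence that TADA-type methods accumulate can be illustrated with a stripped-down Poisson likelihood ratio: under the null, a gene's de novo count follows its background mutation rate, and under the alternative that rate is multiplied by a relative risk γ. The sketch fixes γ and ignores annotations, both simplifications assumed here; the original TADA treats γ as uncertain through a prior, and TADA-A additionally learns annotation effects, neither of which is reproduced. The mutation rate, trio count and γ are hypothetical.

```python
import math


def dnm_bayes_factor(dnm_count: int, n_trios: int, mu: float, gamma: float) -> float:
    """Bayes factor for a gene's de novo mutation count at a fixed relative risk.

    H0: count ~ Poisson(2 * n_trios * mu)          (background mutation rate)
    H1: count ~ Poisson(2 * n_trios * mu * gamma)  (rate inflated by gamma)
    The Poisson likelihood ratio simplifies to gamma**k * exp(-lam0 * (gamma - 1)).
    """
    lam0 = 2 * n_trios * mu  # expected de novo count under the null
    return gamma ** dnm_count * math.exp(-lam0 * (gamma - 1))


# Hypothetical gene: mu = 1e-5 per haploid genome, 300 trios, gamma = 20.
for k in (0, 1, 2):
    print(k, round(dnm_bayes_factor(k, 300, 1e-5, 20.0), 2))
```

Each additional de novo hit multiplies the evidence by γ, while the exponential term penalizes genes with high background mutation rates.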


Subject(s)
Chromosome Mapping , Genetic Predisposition to Disease , Mutation/genetics , Statistics as Topic , Whole Genome Sequencing , Autistic Disorder/genetics , Calibration , Enhancer Elements, Genetic/genetics , Humans , Molecular Sequence Annotation , Mutation Rate , RNA Splicing/genetics , Risk Factors , Exome Sequencing
15.
Am J Hum Genet ; 101(3): 340-352, 2017 Sep 07.
Article in English | MEDLINE | ID: mdl-28844485

ABSTRACT

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests.
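The abstract describes weighting existing burden tests by functional annotations and building the tests from summary statistics. A minimal sketch of a single-annotation weighted burden statistic computed from per-variant score statistics U and their covariance V; the notation and example numbers are assumptions for illustration, and dispersion tests and the paper's combination across annotations are not shown.

```python
import math


def weighted_burden_test(u, cov, w):
    """Weighted burden test computed from per-variant score statistics.

    u:   score statistics U_j for the variants in a region
    cov: covariance matrix of U (list of lists), e.g. from meta-analysis summaries
    w:   per-variant weights, for instance a functional annotation score
    Under H0, T = (w'U)^2 / (w'Vw) is chi-square with 1 degree of freedom.
    """
    m = len(u)
    num = sum(w[j] * u[j] for j in range(m)) ** 2
    den = sum(w[i] * cov[i][j] * w[j] for i in range(m) for j in range(m))
    t = num / den
    p = math.erfc(math.sqrt(t / 2))  # P(chi2_1 > t) = P(|Z| > sqrt(t))
    return t, p


# Hypothetical summary statistics for three noncoding variants.
u = [2.1, 0.3, 1.8]
cov = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 1.0]]
for name, w in {"flat": [1.0, 1.0, 1.0], "annotation": [0.9, 0.1, 0.8]}.items():
    t, p = weighted_burden_test(u, cov, w)
    print(f"{name}: T = {t:.2f}, p = {p:.3g}")
```

Because the test needs only U and V, it can be assembled from study-level summaries, which is what allows meta-analysis without individual-level data. Running it once per weight vector also shows the risk the abstract describes: an annotation that down-weights the true signal loses power.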


Subject(s)
Genome, Human , Genome-Wide Association Study/methods , Metabolomics , Molecular Sequence Annotation/methods , Polymorphism, Single Nucleotide , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , Cardiovascular Diseases/pathology , Computer Simulation , Genetic Predisposition to Disease , Humans , Lipids/analysis , Organic Cation Transport Proteins/genetics , Phenotype
16.
Am J Hum Genet ; 101(5): 789-802, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-29100090

ABSTRACT

Renal agenesis and hypodysplasia (RHD) are major causes of pediatric chronic kidney disease and are highly genetically heterogeneous. We conducted whole-exome sequencing in 202 case subjects with RHD and identified diagnostic mutations in genes known to be associated with RHD in 7/202 case subjects. In an additional affected individual with RHD and a congenital heart defect, we found a homozygous loss-of-function (LOF) variant in SLIT3, recapitulating phenotypes reported with Slit3 inactivation in the mouse. To identify genes associated with RHD, we performed an exome-wide association study with 195 unresolved case subjects and 6,905 control subjects. The top signal resided in GREB1L, a gene implicated previously in Hoxb1 and Shha signaling in zebrafish. The significance of the association, which was p = 2.0 × 10⁻⁵ for novel LOF, increased to p = 4.1 × 10⁻⁶ for LOF and deleterious missense variants combined, and augmented further after accounting for segregation and de novo inheritance of rare variants (joint p = 2.3 × 10⁻⁷). Finally, CRISPR/Cas9 disruption or knockdown of greb1l in zebrafish caused specific pronephric defects, which were rescued by wild-type human GREB1L mRNA, but not mRNA containing alleles identified in case subjects. Together, our study provides insight into the genetic landscape of kidney malformations in humans, presents multiple candidates, and identifies SLIT3 and GREB1L as genes implicated in the pathogenesis of RHD.


Subject(s)
Congenital Abnormalities/genetics , Exome/genetics , Kidney Diseases/congenital , Kidney/abnormalities , Mutation/genetics , Neoplasm Proteins/genetics , Alleles , Animals , Case-Control Studies , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Female , Genetic Heterogeneity , Genome-Wide Association Study/methods , Genotype , Heredity/genetics , Homozygote , Humans , Kidney Diseases/genetics , Male , Membrane Proteins/genetics , Mice , Phenotype , RNA, Long Noncoding/genetics , Urinary Tract/abnormalities , Urogenital Abnormalities/genetics , Zebrafish
17.
PLoS Genet ; 13(2): e1006609, 2017 02.
Article in English | MEDLINE | ID: mdl-28187132

ABSTRACT

Aberrant O-glycosylation of serum immunoglobulin A1 (IgA1) represents a heritable pathogenic defect in IgA nephropathy, the most common form of glomerulonephritis worldwide, but specific genetic factors involved in its determination are not known. We performed a quantitative GWAS for serum levels of galactose-deficient IgA1 (Gd-IgA1) in 2,633 subjects of European and East Asian ancestry and discovered two genome-wide significant loci, in C1GALT1 (rs13226913, P = 3.2 × 10⁻¹¹) and C1GALT1C1 (rs5910940, P = 2.7 × 10⁻⁸). These genes encode molecular partners essential for enzymatic O-glycosylation of IgA1. We demonstrated that these two loci explain approximately 7% of variability in circulating Gd-IgA1 in Europeans, but only 2% in East Asians. Notably, the Gd-IgA1-increasing allele of rs13226913 is common in Europeans, but rare in East Asians. Moreover, rs13226913 represents a strong cis-eQTL for C1GALT1 that encodes the key enzyme responsible for the transfer of galactose to O-linked glycans on IgA1. By in vitro siRNA knock-down studies, we confirmed that mRNA levels of both C1GALT1 and C1GALT1C1 determine the rate of secretion of Gd-IgA1 in IgA1-producing cells. Our findings provide novel insights into the genetic regulation of O-glycosylation and are relevant not only to IgA nephropathy, but also to other complex traits associated with O-glycosylation defects, including inflammatory bowel disease, hematologic disease, and cancer.
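A quantitative GWAS of this type typically regresses the trait (here, serum Gd-IgA1 level) on each SNP's allele dosage. The sketch below assumes that additive model and omits covariates and ancestry adjustment; it fits a single SNP by ordinary least squares and reports the fraction of trait variance that SNP explains (R²), the single-locus analogue of "explain approximately 7% of variability". The function name and data are hypothetical.

```python
from statistics import fmean


def single_snp_regression(dosage: list[float], trait: list[float]) -> tuple[float, float, float]:
    """Additive model trait = intercept + beta * dosage, fit by ordinary least squares.

    dosage: 0/1/2 copies of the effect allele per subject.
    Returns (intercept, beta, r_squared).
    """
    x_bar, y_bar = fmean(dosage), fmean(trait)
    sxx = sum((x - x_bar) ** 2 for x in dosage)
    syy = sum((y - y_bar) ** 2 for y in trait)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosage, trait))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    # With a single predictor, R^2 equals the squared Pearson correlation.
    r_squared = sxy * sxy / (sxx * syy)
    return intercept, beta, r_squared
```

Jointly attributing variance to two loci, as in the abstract, would require a multiple regression on both dosages rather than summing single-SNP R² values.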


Subject(s)
Galactosyltransferases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Glomerulonephritis, IGA/genetics , Molecular Chaperones/genetics , Polymorphism, Single Nucleotide , Alleles , Asian People/genetics , Cell Line , Cohort Studies , Galactose/deficiency , Gene Expression Regulation , Gene Frequency , Gene Regulatory Networks , Genetic Predisposition to Disease/ethnology , Genotype , Glomerulonephritis, IGA/blood , Glomerulonephritis, IGA/ethnology , Glycosylation , Humans , Immunoglobulin A/blood , Models, Genetic , Nerve Tissue Proteins/genetics , Phenotype , RNA Interference , Reverse Transcriptase Polymerase Chain Reaction , Signal Transduction/genetics , Ubiquitin-Protein Ligases/genetics , White People/genetics
18.
Genet Epidemiol ; 41(8): 801-810, 2017 12.
Article in English | MEDLINE | ID: mdl-29076270

ABSTRACT

Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene-based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region, as opposed to a one-at-a-time single-variant analysis. In addition, in longitudinal studies, statistical power to detect rare disease-susceptibility variants can be improved by jointly testing repeatedly measured outcomes, which better describe the temporal development of the trait of interest. However, standard sandwich- or model-based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated or inflated type I error rates without further corrections. In this paper, we develop a group of tests for rare-variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of the within-subject correlation, but also markedly improved for variants of extreme rarity in studies with small or moderate sample sizes. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite-sample adjustment. We illustrate our methods using data from the Multi-Ethnic Study of Atherosclerosis, exploring the association of repeated measures of blood pressure with rare and common variants based on exome sequencing data from 6,361 individuals.
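Gene-based tests of the kind described here begin by collapsing the rare variants in a region into one burden score per subject. The sketch below is a deliberately simplified stand-in, not the authors' method: it summarizes each subject's repeated measures by their mean (ignoring any within-subject correlation structure) and uses a permutation p-value instead of the perturbation-based finite-sample adjustment. Weights, data, and function names are hypothetical.

```python
import random
from statistics import fmean


def burden_scores(genotypes: list[list[int]], weights: list[float]) -> list[float]:
    """Weighted sum of minor-allele counts per subject (genotypes[i][j] = count of variant j)."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def score_statistic(burden: list[float], outcomes: list[list[float]]) -> float:
    """Squared score-type statistic; each subject's repeated measures are averaged first."""
    y = [fmean(obs) for obs in outcomes]
    y_bar = fmean(y)
    return sum(b * (yi - y_bar) for b, yi in zip(burden, y)) ** 2


def permutation_pvalue(burden: list[float], outcomes: list[list[float]],
                       n_perm: int = 2000, seed: int = 0) -> float:
    """Shuffle whole outcome trajectories across subjects (keeping each subject's
    repeated measures together) and compare against the observed statistic."""
    rng = random.Random(seed)
    observed = score_statistic(burden, outcomes)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_statistic(burden, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Permuting entire trajectories, rather than individual measurements, preserves each subject's repeated-measures structure under the null hypothesis.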


Subject(s)
Atherosclerosis/genetics , Models, Genetic , Atherosclerosis/ethnology , Atherosclerosis/pathology , Blood Pressure/genetics , DNA-Binding Proteins/genetics , Ethnicity/genetics , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide
19.
Bioinformatics ; 33(14): 2123-2130, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28334222

ABSTRACT

MOTIVATION: Over the past decade, there has been a remarkable improvement in our understanding of the role of genetic variation in complex human diseases, especially via genome-wide association studies. However, the underlying molecular mechanisms are still poorly characterized, impeding the development of therapeutic interventions. Identifying genetic variants that influence the expression level of a gene, i.e. expression quantitative trait loci (eQTLs), can help us understand how genetic variants influence traits at the molecular level. While most eQTL studies focus on identifying mean effects on gene expression using linear regression, evidence suggests that genetic variation can impact the entire distribution of the expression level. Motivated by the potential higher order associations, several studies investigated variance eQTLs. RESULTS: In this paper, we develop a Quantile Rank-score based test (QRank), which provides an easy way to identify eQTLs that are associated with the conditional quantile functions of gene expression. We have applied the proposed QRank to the Genotype-Tissue Expression project, an international tissue bank for studying the relationship between genetic variation and gene expression in human tissues, and found that the proposed QRank complements the existing methods, and identifies new eQTLs with heterogeneous effects across different quantile levels. Notably, we show that the eQTLs identified by QRank but missed by linear regression are associated with greater enrichment in genome-wide significant SNPs from the GWAS catalog, and are also more likely to be tissue specific than eQTLs identified by linear regression. AVAILABILITY AND IMPLEMENTATION: An R package is available on R CRAN at https://cran.r-project.org/web/packages/QRank . CONTACT: xs2148@cumc.columbia.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
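The idea behind a quantile rank-score test can be illustrated at a single quantile level tau with no covariates. Fit the null model (here just the empirical tau-quantile of expression), score each sample as tau minus an indicator that it lies below that quantile, and test whether those scores correlate with genotype. This is a simplified sketch under those assumptions. The QRank package combines several quantile levels and adjusts for covariates, and its R interface is not reproduced here. The function name is hypothetical.

```python
import math


def rank_score_test(genotype: list[float], expression: list[float],
                    tau: float = 0.5) -> tuple[float, float]:
    """Quantile rank-score test at one quantile level tau, without covariates.

    Returns (z, two-sided normal p-value).
    """
    n = len(expression)
    # Null model: intercept-only quantile regression, whose fit is the empirical tau-quantile.
    q = sorted(expression)[max(0, math.ceil(tau * n) - 1)]
    # Quantile scores psi_tau(residual) = tau - 1{residual < 0}.
    scores = [tau - (1.0 if y < q else 0.0) for y in expression]
    # Genotype residualized on the null-model design (intercept only, so just centered).
    x_bar = sum(genotype) / n
    centered = [x - x_bar for x in genotype]
    s = sum(c * a for c, a in zip(centered, scores))
    variance = tau * (1 - tau) * sum(c * c for c in centered)
    z = s / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

Running the test at several levels (for example tau = 0.25, 0.5, 0.75) shows how an eQTL whose effect is concentrated in one tail of the expression distribution can be detected at some quantiles but not at others.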


Subject(s)
Gene Expression Profiling/methods , Genetic Variation , Genome-Wide Association Study/methods , Quantitative Trait Loci , Software , Computational Biology/methods , Computer Simulation , Humans
20.
PLoS Genet ; 10(12): e1004729, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25502226

ABSTRACT

Pinpointing the small number of causal variants among the abundant naturally occurring genetic variation is a difficult challenge, but a crucial one for understanding precise molecular mechanisms of disease and follow-up functional studies. We propose and investigate two complementary statistical approaches for identification of rare causal variants in sequencing studies: a backward elimination procedure based on groupwise association tests, and a hierarchical approach that can integrate sequencing data with diverse functional and evolutionary conservation annotations for individual variants. Using simulations, we show that incorporation of multiple bioinformatic predictors of deleteriousness, such as PolyPhen-2, SIFT and GERP++ scores, can improve the power to discover truly causal variants. As proof of principle, we apply the proposed methods to VPS13B, a gene mutated in the rare neurodevelopmental disorder called Cohen syndrome, and recently reported with recessive variants in autism. We identify a small set of promising candidates for causal variants, including two loss-of-function variants and a rare, homozygous probably-damaging variant that could contribute to autism risk.
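The backward elimination idea in this abstract can be sketched generically: start from all variants in the gene, repeatedly drop the variant whose removal most lowers a groupwise association p-value, and stop when no removal helps. Here the groupwise test is an assumed carrier-based two-proportion z-test, not necessarily the test the authors used, and the variant IDs and genotype data are hypothetical. Because the same data drive both selection and testing, the final p-value is optimistic and should not be read as a calibrated significance level.

```python
import math
from typing import Callable


def collapsing_pvalue(subset: frozenset[str], cases: list[set[str]],
                      controls: list[set[str]]) -> float:
    """Two-proportion z-test on the fraction of subjects carrying any variant in `subset`.
    cases/controls: for each subject, the set of variant IDs that subject carries."""
    x1 = sum(1 for g in cases if g & subset)
    x2 = sum(1 for g in controls if g & subset)
    n1, n2 = len(cases), len(controls)
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def backward_eliminate(variants: set[str],
                       group_pvalue: Callable[[frozenset[str]], float]) -> tuple[frozenset[str], float]:
    """Greedily remove variants while doing so lowers the group p-value."""
    current = frozenset(variants)
    best_p = group_pvalue(current)
    while len(current) > 1:
        # Try dropping each variant in turn and keep the single best removal.
        p, dropped = min((group_pvalue(current - {v}), v) for v in current)
        if p >= best_p:
            break
        current, best_p = current - {dropped}, p
    return current, best_p
```

In the paper this step is complemented by a hierarchical approach that integrates functional annotations such as PolyPhen-2, SIFT and GERP++ scores; the greedy loop above uses genotypes alone.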


Subject(s)
Autistic Disorder/genetics , Evolution, Molecular , Fingers/abnormalities , Genetic Variation , Intellectual Disability/genetics , Microcephaly/genetics , Muscle Hypotonia/genetics , Myopia/genetics , Obesity/genetics , Vesicular Transport Proteins/genetics , Angiopoietin-Like Protein 4 , Angiopoietins/genetics , Autistic Disorder/diagnosis , Computational Biology , Computer Simulation , Developmental Disabilities/diagnosis , Developmental Disabilities/genetics , Gene Frequency , Genome-Wide Association Study , Humans , Intellectual Disability/diagnosis , Microcephaly/diagnosis , Models, Genetic , Muscle Hypotonia/diagnosis , Myopia/diagnosis , Obesity/diagnosis , Retinal Degeneration , Software