Results 1 - 20 of 78
1.
Cell ; 173(7): 1692-1704.e11, 2018 06 14.
Article in English | MEDLINE | ID: mdl-29779949

ABSTRACT

Heritability is essential for understanding the biological causes of disease but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHRs) passively capture a wide range of clinically relevant data and provide a resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified 7.4 million familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with the literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a validation of the use of EHRs for genetics and disease research.


Subjects
Electronic Health Records , Genetic Diseases, Inborn/genetics , Algorithms , Databases, Factual , Family Relations , Genetic Diseases, Inborn/pathology , Genotype , Humans , Pedigree , Phenotype , Quantitative Trait, Heritable
2.
Am J Hum Genet ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38821058

ABSTRACT

Both trio and population designs are popular study designs for identifying risk genetic variants in genome-wide association studies (GWASs). The trio design, as a family-based design, is robust to confounding due to population structure, whereas the population design is often more powerful due to larger sample sizes. Here, we propose KnockoffHybrid, a knockoff-based statistical method for hybrid analysis of both the trio and population designs. KnockoffHybrid provides a unified framework that brings together the advantages of both designs and produces powerful hybrid analysis while controlling the false discovery rate (FDR) in the presence of linkage disequilibrium and population structure. Furthermore, KnockoffHybrid has the flexibility to leverage different types of summary statistics for hybrid analyses, including expression quantitative trait loci (eQTL) and GWAS summary statistics. We demonstrate in simulations that KnockoffHybrid offers power gains over non-hybrid methods for the trio and population designs with the same number of cases while controlling the FDR with complex correlation among variants and population structure among subjects. In hybrid analyses of three trio cohorts for autism spectrum disorders (ASDs) from the Autism Speaks MSSNG, Autism Sequencing Consortium, and Autism Genome Project with GWAS summary statistics from the iPSYCH project and eQTL summary statistics from the MetaBrain project, KnockoffHybrid outperforms conventional methods by replicating several known risk genes for ASDs and identifying additional associations with variants in other genes, including the PRAME family genes involved in axon guidance and which may act as common targets for human speech/language evolution and related disorders.

3.
Am J Hum Genet ; 109(10): 1761-1776, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36150388

ABSTRACT

Family-based designs can eliminate confounding due to population substructure and can distinguish direct from indirect genetic effects, but these designs are underpowered due to limited sample sizes. Here, we propose KnockoffTrio, a statistical method to identify putative causal genetic variants for father-mother-child trio design built upon a recently developed knockoff framework in statistics. KnockoffTrio controls the false discovery rate (FDR) in the presence of arbitrary correlations among tests and is less conservative and thus more powerful than the conventional methods that control the family-wise error rate via Bonferroni correction. Furthermore, KnockoffTrio is not restricted to family-based association tests and can be used in conjunction with more powerful, potentially nonlinear models to improve the power of standard family-based tests. We show, using empirical simulations, that KnockoffTrio can prioritize causal variants over associations due to linkage disequilibrium and can provide protection against confounding due to population stratification. In applications to 14,200 trios from three study cohorts for autism spectrum disorders (ASDs), including AGP, SPARK, and SSC, we show that KnockoffTrio can identify multiple significant associations that are missed by conventional tests applied to the same data. In particular, we replicate known ASD association signals with variants in several genes such as MACROD2, NRXN1, PRKAR1B, CADM2, PCDH9, and DOCK4 and identify additional associations with variants in other genes including ARHGEF10, SLC28A1, ZNF589, and HINT1 at FDR 10%.
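
The FDR control mentioned in this abstract rests on the knockoff filter's data-dependent selection threshold. The following is a minimal Python sketch of the generic knockoff+ selection step as commonly described in the knockoff literature; it is not KnockoffTrio's own code. The function names and the toy feature statistics `w` are hypothetical and chosen only for illustration, and the sketch assumes the statistics have already been computed so that large positive values favor the original variant over its knockoff copy.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q,
    or infinity when no candidate threshold qualifies."""
    # Candidate thresholds are the distinct non-zero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # estimates the number of false discoveries
        positives = sum(1 for x in w if x >= t)   # number of features that would be selected
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Toy statistics (hypothetical values): ten clearly positive, one negative, one weak.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.2, 1.0, 0.8, -0.5, 0.3]
selected = knockoff_select(w, q=0.10)  # target FDR of 10%, as in the abstract
```

With a stricter target such as `q=0.05`, the same toy input yields no selections, because the "+1" in the numerator requires at least 1/q candidate features before anything can be reported.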


Subjects
Autism Spectrum Disorder , Genome-Wide Association Study , Autism Spectrum Disorder/genetics , Causality , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Nerve Tissue Proteins/genetics
4.
Am J Hum Genet ; 109(3): 446-456, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35216679

ABSTRACT

Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.


Subjects
Genome, Human , Genome-Wide Association Study , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Probability
5.
Am J Hum Genet ; 108(12): 2336-2353, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34767756

ABSTRACT

Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches cannot analyze millions of rare genetic variants in biobank-scale whole-genome sequencing and whole-genome imputed datasets. We propose a scalable knockoff-based method for the analysis of common and rare variants across the genome, KnockoffScreen-AL, that is applicable to biobank-scale studies with hundreds of thousands of samples and millions of genetic variants. The application of KnockoffScreen-AL to the analysis of Alzheimer disease (AD) in 388,051 WG-imputed samples from the UK Biobank resulted in 31 significant loci, including 14 loci that are missed by conventional association tests on these data. We perform replication studies in an independent meta-analysis of clinically diagnosed AD with 94,437 samples, and additionally leverage single-cell RNA-sequencing data with 143,793 single-nucleus transcriptomes from 17 control subjects and AD-affected individuals, and proteomics data from 735 control subjects and affected individuals with AD and related disorders to validate the genes at these significant loci. These multi-omics analyses show that 79.1% of the proximal genes at these loci and 76.2% of the genes at loci identified only by KnockoffScreen-AL exhibit at least suggestive signal (p < 0.05) in the scRNA-seq or proteomics analyses. We highlight a potentially causal gene in AD progression, EGFR, that shows significant differences in expression and protein levels between AD-affected individuals and healthy control subjects.


Subjects
Alzheimer Disease/genetics , Biological Specimen Banks , Gene Knockout Techniques , Genes, erbB-1 , Genetic Variation , Genome-Wide Association Study , Humans , RNA-Seq , Transcriptome , Whole Genome Sequencing
6.
Proc Natl Acad Sci U S A ; 118(47), 2021 11 23.
Article in English | MEDLINE | ID: mdl-34799441

ABSTRACT

Gene-based tests are valuable techniques for identifying genetic factors in complex traits. Here, we propose a gene-based testing framework that incorporates data on long-range chromatin interactions, several recent technical advances for region-based tests, and leverages the knockoff framework for synthetic genotype generation for improved gene discovery. Through simulations and applications to genome-wide association studies (GWAS) and whole-genome sequencing data for multiple diseases and traits, we show that the proposed test increases the power over state-of-the-art gene-based tests in the literature, identifies genes that replicate in larger studies, and can provide a more narrow focus on the possible causal genes at a locus by reducing the confounding effect of linkage disequilibrium. Furthermore, our results show that incorporating genetic variation in distal regulatory elements tends to improve power over conventional tests. Results for UK Biobank and BioBank Japan traits are also available in a publicly accessible database that allows researchers to query gene-based results in an easy fashion.


Subjects
Chromatin , Genetic Testing/methods , Genotype , Genome-Wide Association Study/methods , Humans , Japan , Linkage Disequilibrium , Lung , Models, Genetic , Phenotype , Quantitative Trait Loci , Whole Genome Sequencing/methods
7.
PLoS Genet ; 17(8): e1009713, 2021 08.
Article in English | MEDLINE | ID: mdl-34460823

ABSTRACT

Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.


Subjects
Computational Biology/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Cluster Analysis , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Phenotype
8.
J Am Soc Nephrol ; 34(4): 607-618, 2023 04 01.
Article in English | MEDLINE | ID: mdl-36302597

ABSTRACT

SIGNIFICANCE STATEMENT: Pathogenic structural genetic variants, also known as genomic disorders, have been associated with pediatric CKD. This study extends those results across the lifespan, with genomic disorders enriched in both pediatric and adult patients compared with controls. In the Chronic Renal Insufficiency Cohort study, genomic disorders were also associated with lower serum Mg, lower educational performance, and a higher risk of death. A phenome-wide association study confirmed the link between kidney disease and genomic disorders in an unbiased way. Systematic detection of genomic disorders can provide a molecular diagnosis and refine prediction of risk and prognosis. BACKGROUND: Genomic disorders (GDs) are associated with many comorbid outcomes, including CKD. Identification of GDs has diagnostic utility. METHODS: We examined the prevalence of GDs among participants in the Chronic Kidney Disease in Children (CKiD) cohort II (n = 248), Chronic Renal Insufficiency Cohort (CRIC) study (n = 3375), Columbia University CKD Biobank (CU-CKD; n = 1986), and the Family Investigation of Nephropathy and Diabetes (FIND; n = 1318) compared with 30,746 controls. We also performed a phenome-wide association analysis (PheWAS) of GDs in the electronic MEdical Records and GEnomics (eMERGE; n = 11,146) cohort. RESULTS: We found nine out of 248 (3.6%) CKiD II participants carried a GD, replicating prior findings in pediatric CKD. We also identified GDs in 72 out of 6679 (1.1%) adult patients with CKD in the CRIC, CU-CKD, and FIND cohorts, compared with 199 out of 30,746 (0.65%) GDs in controls (OR, 1.7; 95% CI, 1.3 to 2.2). Among adults with CKD, we found recurrent GDs at the 1q21.1, 16p11.2, 17q12, and 22q11.2 loci. The 17q12 GD (diagnostic of renal cyst and diabetes syndrome) was most frequent, present in 1:252 patients with CKD and diabetes. In the PheWAS, dialysis and neuropsychiatric phenotypes were the top associations with GDs. In CRIC participants, GDs were associated with lower serum magnesium, lower educational achievement, and higher mortality risk. CONCLUSION: Undiagnosed GDs are detected both in children and adults with CKD. Identification of GDs in these patients can enable a precise genetic diagnosis, inform prognosis, and help stratify risk in clinical studies. GDs could also provide a molecular explanation for nephropathy and comorbidities, such as poorer neurocognition for a subset of patients.
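
The odds ratio reported in the RESULTS of this abstract can be approximately reproduced from the carrier counts it gives (72 of 6679 adult patients with CKD vs 199 of 30,746 controls). The Python sketch below uses the unadjusted 2×2 odds ratio with a Wald (log-scale) confidence interval; this is an assumption made for illustration, since the abstract does not say which estimator the authors used, and the function name is hypothetical.

```python
import math


def odds_ratio_wald(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval on the log scale.

    Assumption: a plain 2x2 table; the published analysis may have used
    a different (for example exact or adjusted) method.
    """
    a = case_carriers
    b = case_total - case_carriers          # cases without a GD
    c = control_carriers
    d = control_total - control_carriers    # controls without a GD
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract above.
or_, lo, hi = odds_ratio_wald(72, 6679, 199, 30746)
print(f"OR {or_:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

Under these assumptions the rounded values agree with the figures quoted in the abstract (OR, 1.7; 95% CI, 1.3 to 2.2).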


Subjects
Longevity , Renal Insufficiency, Chronic , Humans , Cohort Studies , Prospective Studies , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/genetics , Renal Insufficiency, Chronic/complications , Genomics , Disease Progression , Risk Factors
9.
Am J Hum Genet ; 106(4): 513-524, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32243819

ABSTRACT

The identification of functional regions in the noncoding human genome is difficult but critical in order to gain understanding of the role noncoding variation plays in gene regulation in human health and disease. We describe here a co-localization approach that aims to identify constrained sequences that co-localize with tissue- or cell-type-specific regulatory regions, and we show that the resulting score is particularly well suited for the identification of rare regulatory variants. For 127 tissues and cell types in the ENCODE/Roadmap Epigenomics Project, we provide catalogs of putative tissue- or cell-type-specific regulatory regions under sequence constraint. We use the newly developed co-localization score for brain tissues to score de novo mutations in whole genomes from 1,902 individuals affected with autism spectrum disorder (ASD) and their unaffected siblings in the Simons Simplex Collection. We show that noncoding de novo mutations near genes co-expressed in midfetal brain with high confidence ASD risk genes, and near FMRP gene targets are more likely to be in co-localized regions if they occur in ASD probands versus in their unaffected siblings. We also observed a similar enrichment for mutations near lincRNAs, previously shown to co-express with ASD risk genes. Additionally, we provide strong evidence that prioritized de novo mutations in autism probands point to a small set of well-known ASD genes, the disruption of which produces relevant mouse phenotypes such as abnormal social investigation and abnormal discrimination/associative learning, unlike the de novo mutations in unaffected siblings. The genome-wide co-localization results are available online.


Subjects
Gene Expression Regulation/genetics , Genome, Human/genetics , Autism Spectrum Disorder/genetics , Epigenomics/methods , Humans , Mutation/genetics , Phenotype , Siblings , Whole Genome Sequencing/methods
10.
Genet Med ; 25(12): 100983, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37746849

ABSTRACT

PURPOSE: Previous work identified rare variants in DSTYK associated with human congenital anomalies of the kidney and urinary tract (CAKUT). Here, we present a series of mouse and human studies to clarify the association, penetrance, and expressivity of DSTYK variants. METHODS: We phenotypically characterized Dstyk knockout mice of 3 separate inbred backgrounds and re-analyzed the original family segregating the DSTYK c.654+1G>A splice-site variant (referred to as "SSV" below). DSTYK loss of function (LOF) and SSVs were annotated in individuals with CAKUT, epilepsy, or amyotrophic lateral sclerosis vs controls. A phenome-wide association study analysis was also performed using United Kingdom Biobank (UKBB) data. RESULTS: Results demonstrate ∼20% to 25% penetrance of obstructive uropathy, at least, in C57BL/6J and FVB/NJ Dstyk-/- mice. Phenotypic penetrance increased to ∼40% in C3H/HeJ mutants, with mild-to-moderate severity. Re-analysis of the original family segregating the rare SSV showed low penetrance (43.8%) and no alternative genetic causes for CAKUT. The burden of LOF DSTYK variants showed a significant excess for CAKUT and epilepsy vs controls, and an exploratory phenome-wide association study supported association with neurological disorders. CONCLUSION: These data support causality for DSTYK LOF variants and highlight the need for large-scale sequencing studies (here >200,000 cases) to accurately assess causality for genes and variants in lowly penetrant traits with common population prevalence.


Subjects
Epilepsy , Urinary Tract , Urogenital Abnormalities , Animals , Mice , Humans , Penetrance , Mice, Inbred C3H , Mice, Inbred C57BL , Urogenital Abnormalities/genetics , Kidney/abnormalities , Risk Factors , Epilepsy/genetics , Receptor-Interacting Protein Serine-Threonine Kinases/genetics
11.
N Engl J Med ; 380(20): 1918-1928, 2019 05 16.
Article in English | MEDLINE | ID: mdl-31091373

ABSTRACT

BACKGROUND: In the context of kidney transplantation, genomic incompatibilities between donor and recipient may lead to allosensitization against new antigens. We hypothesized that recessive inheritance of gene-disrupting variants may represent a risk factor for allograft rejection. METHODS: We performed a two-stage genetic association study of kidney allograft rejection. In the first stage, we performed a recessive association screen of 50 common gene-intersecting deletion polymorphisms in a cohort of kidney transplant recipients. In the second stage, we replicated our findings in three independent cohorts of donor-recipient pairs. We defined genomic collision as a specific donor-recipient genotype combination in which a recipient who was homozygous for a gene-intersecting deletion received a transplant from a nonhomozygous donor. Identification of alloantibodies was performed with the use of protein arrays, enzyme-linked immunosorbent assays, and Western blot analyses. RESULTS: In the discovery cohort, which included 705 recipients, we found a significant association with allograft rejection at the LIMS1 locus represented by rs893403 (hazard ratio with the risk genotype vs. nonrisk genotypes, 1.84; 95% confidence interval [CI], 1.35 to 2.50; P = 9.8×10⁻⁵). This effect was replicated under the genomic-collision model in three independent cohorts involving a total of 2004 donor-recipient pairs (hazard ratio, 1.55; 95% CI, 1.25 to 1.93; P = 6.5×10⁻⁵). In the combined analysis (discovery cohort plus replication cohorts), the risk genotype was associated with a higher risk of rejection than the nonrisk genotype (hazard ratio, 1.63; 95% CI, 1.37 to 1.95; P = 4.7×10⁻⁸). We identified a specific antibody response against LIMS1, a kidney-expressed protein encoded within the collision locus. The response involved predominantly IgG2 and IgG3 antibody subclasses. CONCLUSIONS: We found that the LIMS1 locus appeared to encode a minor histocompatibility antigen. Genomic collision at this locus was associated with rejection of the kidney allograft and with production of anti-LIMS1 IgG2 and IgG3. (Funded by the Columbia University Transplant Center and others.).
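
The genomic-collision definition given in the METHODS of this abstract can be written as a simple predicate on a donor-recipient pair. The Python sketch below assumes, for illustration only, that each genotype is encoded as the number of deletion alleles carried (0, 1, or 2); the encoding and the function name are hypothetical and do not come from the study.

```python
def is_genomic_collision(recipient_deletion_alleles, donor_deletion_alleles):
    """True when the recipient is homozygous for the deletion (2 copies)
    and the donor is not (0 or 1 copy), following the abstract's definition.

    Assumed encoding: count of deletion alleles at the locus, 0, 1, or 2.
    """
    for count in (recipient_deletion_alleles, donor_deletion_alleles):
        if count not in (0, 1, 2):
            raise ValueError("deletion allele count must be 0, 1, or 2")
    return recipient_deletion_alleles == 2 and donor_deletion_alleles < 2


# Hypothetical donor-recipient pairs as (recipient, donor) allele counts.
pairs = [(2, 0), (2, 1), (2, 2), (1, 0), (0, 2)]
collisions = [is_genomic_collision(r, d) for r, d in pairs]
```

Only the first two pairs count as collisions here: a homozygous recipient paired with a homozygous donor does not, because the donor kidney would not be expected to express the product of the deleted region either.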


Subjects
Adaptor Proteins, Signal Transducing/genetics , DNA Copy Number Variations , Graft Rejection/genetics , Kidney Transplantation , LIM Domain Proteins/genetics , Adaptor Proteins, Signal Transducing/immunology , Cohort Studies , Genetic Association Studies , Genotype , HLA Antigens/genetics , Histocompatibility Testing , Humans , Immunoglobulin G/blood , LIM Domain Proteins/immunology , Membrane Proteins/genetics , Membrane Proteins/immunology , Polymorphism, Single Nucleotide , Tissue Donors
12.
Bioinformatics ; 2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33515242

ABSTRACT

MOTIVATION: Predicting regulatory effects of genetic variants is a challenging but important problem in functional genomics. Given the relatively low sensitivity of functional assays, and the pervasiveness of class imbalance in functional genomic data, popular statistical prediction models can sharply underestimate the probability of a regulatory effect. We describe here the presence-only model (PO-EN), a type of semi-supervised model, to predict regulatory effects of genetic variants at sequence-level resolution in a context of interest by integrating a large number of epigenetic features and massively parallel reporter assays (MPRAs). RESULTS: Using experimental data from a variety of MPRAs we show that the presence-only model produces better calibrated predicted probabilities and has increased accuracy relative to state-of-the-art prediction models. Furthermore, we show that the predictions based on pre-trained PO-EN models are useful for prioritizing functional variants among candidate eQTLs and significant SNPs at GWAS loci. In particular, for the costimulatory locus, associated with multiple autoimmune diseases, we show evidence of a regulatory variant residing in an enhancer 24.4 kb downstream of CTLA4, with evidence from capture Hi-C of interaction with CTLA4. Furthermore, the risk allele of the regulatory variant is on the same risk increasing haplotype as a functional coding variant in exon 1 of CTLA4, suggesting that the regulatory variant acts jointly with the coding variant leading to increased risk to disease. AVAILABILITY: The presence-only model is implemented in the R package 'PO.EN', freely available on CRAN. A vignette describing a detailed demonstration of using the proposed PO-EN model can be found on github at https://github.com/Iuliana-Ionita-Laza/PO.EN/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

13.
Am J Hum Genet ; 102(5): 920-942, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727691

ABSTRACT

We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).


Subjects
Algorithms , DNA, Intergenic/genetics , Genetic Variation , Models, Genetic , Organ Specificity/genetics , Genome-Wide Association Study , Humans , Linkage Disequilibrium/genetics , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Probability , Quantitative Trait Loci/genetics , Reproducibility of Results , Twins/genetics
14.
Am J Hum Genet ; 102(6): 1031-1047, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29754769

ABSTRACT

Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWASs) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is challenging, however, because the functional significance of non-coding mutations is difficult to predict. We propose a statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, to learn from data which annotations are informative of pathogenic mutations, and to combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ∼300 autism-affected family trios across five studies and discovered several autism risk genes. The software is freely available for all research uses.


Subjects
Chromosome Mapping , Genetic Predisposition to Disease , Mutation/genetics , Statistics as Topic , Whole Genome Sequencing , Autistic Disorder/genetics , Calibration , Enhancer Elements, Genetic/genetics , Humans , Molecular Sequence Annotation , Mutation Rate , RNA Splicing/genetics , Risk Factors , Exome Sequencing
15.
Am J Hum Genet ; 101(3): 340-352, 2017 Sep 07.
Article in English | MEDLINE | ID: mdl-28844485

ABSTRACT

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests.


Subjects
Genome, Human , Genome-Wide Association Study/methods , Metabolomics , Molecular Sequence Annotation/methods , Polymorphism, Single Nucleotide , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , Cardiovascular Diseases/pathology , Computer Simulation , Genetic Predisposition to Disease , Humans , Lipids/analysis , Organic Cation Transport Proteins/genetics , Phenotype
16.
Am J Hum Genet ; 101(5): 789-802, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-29100090

ABSTRACT

Renal agenesis and hypodysplasia (RHD) are major causes of pediatric chronic kidney disease and are highly genetically heterogeneous. We conducted whole-exome sequencing in 202 case subjects with RHD and identified diagnostic mutations in genes known to be associated with RHD in 7/202 case subjects. In an additional affected individual with RHD and a congenital heart defect, we found a homozygous loss-of-function (LOF) variant in SLIT3, recapitulating phenotypes reported with Slit3 inactivation in the mouse. To identify genes associated with RHD, we performed an exome-wide association study with 195 unresolved case subjects and 6,905 control subjects. The top signal resided in GREB1L, a gene implicated previously in Hoxb1 and Shha signaling in zebrafish. The significance of the association, which was p = 2.0 × 10⁻⁵ for novel LOF, increased to p = 4.1 × 10⁻⁶ for LOF and deleterious missense variants combined, and augmented further after accounting for segregation and de novo inheritance of rare variants (joint p = 2.3 × 10⁻⁷). Finally, CRISPR/Cas9 disruption or knockdown of greb1l in zebrafish caused specific pronephric defects, which were rescued by wild-type human GREB1L mRNA, but not mRNA containing alleles identified in case subjects. Together, our study provides insight into the genetic landscape of kidney malformations in humans, presents multiple candidates, and identifies SLIT3 and GREB1L as genes implicated in the pathogenesis of RHD.


Subjects
Congenital Abnormalities/genetics , Exome/genetics , Kidney Diseases/congenital , Kidney/abnormalities , Mutation/genetics , Neoplasm Proteins/genetics , Alleles , Animals , Case-Control Studies , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Female , Genetic Heterogeneity , Genome-Wide Association Study/methods , Genotype , Heredity/genetics , Homozygote , Humans , Kidney Diseases/genetics , Male , Membrane Proteins/genetics , Mice , Phenotype , RNA, Long Noncoding/genetics , Urinary Tract/abnormalities , Urogenital Abnormalities/genetics , Zebrafish
17.
PLoS Genet ; 13(2): e1006609, 2017 02.
Article in English | MEDLINE | ID: mdl-28187132

ABSTRACT

Aberrant O-glycosylation of serum immunoglobulin A1 (IgA1) represents a heritable pathogenic defect in IgA nephropathy, the most common form of glomerulonephritis worldwide, but specific genetic factors involved in its determination are not known. We performed a quantitative GWAS for serum levels of galactose-deficient IgA1 (Gd-IgA1) in 2,633 subjects of European and East Asian ancestry and discovered two genome-wide significant loci, in C1GALT1 (rs13226913, P = 3.2 × 10⁻¹¹) and C1GALT1C1 (rs5910940, P = 2.7 × 10⁻⁸). These genes encode molecular partners essential for enzymatic O-glycosylation of IgA1. We demonstrated that these two loci explain approximately 7% of variability in circulating Gd-IgA1 in Europeans, but only 2% in East Asians. Notably, the Gd-IgA1-increasing allele of rs13226913 is common in Europeans, but rare in East Asians. Moreover, rs13226913 represents a strong cis-eQTL for C1GALT1 that encodes the key enzyme responsible for the transfer of galactose to O-linked glycans on IgA1. By in vitro siRNA knock-down studies, we confirmed that mRNA levels of both C1GALT1 and C1GALT1C1 determine the rate of secretion of Gd-IgA1 in IgA1-producing cells. Our findings provide novel insights into the genetic regulation of O-glycosylation and are relevant not only to IgA nephropathy, but also to other complex traits associated with O-glycosylation defects, including inflammatory bowel disease, hematologic disease, and cancer.


Subjects
Galactosyltransferases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Glomerulonephritis, IGA/genetics , Molecular Chaperones/genetics , Polymorphism, Single Nucleotide , Alleles , Asian People/genetics , Cell Line , Cohort Studies , Galactose/deficiency , Gene Expression Regulation , Gene Frequency , Gene Regulatory Networks , Genetic Predisposition to Disease/ethnology , Genotype , Glomerulonephritis, IGA/blood , Glomerulonephritis, IGA/ethnology , Glycosylation , Humans , Immunoglobulin A/blood , Models, Genetic , Nerve Tissue Proteins/genetics , Phenotype , RNA Interference , Reverse Transcriptase Polymerase Chain Reaction , Signal Transduction/genetics , Ubiquitin-Protein Ligases/genetics , White People/genetics
18.
Genet Epidemiol ; 41(8): 801-810, 2017 12.
Article in English | MEDLINE | ID: mdl-29076270

ABSTRACT

Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene-based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one-at-a-time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, the usual sandwich/model-based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated or inflated type I error rates without further corrections. In this paper, we develop a group of tests for rare-variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within-subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi-Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.
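
The aggregation idea mentioned in this abstract (collapsing the variants in a gene rather than testing them one at a time) can be illustrated with a minimal sketch. This is not the authors' method: it shows only a simple burden score and omits the repeated-measures modeling and the perturbation-based inference the abstract describes. The function name `burden_scores`, the 0/1/2 minor-allele coding, and the toy genotype matrix are assumptions made for illustration.

```python
def burden_scores(genotypes, maf_threshold=0.01):
    """Collapse rare variants in one gene into a per-individual burden score.

    genotypes: list of rows, one per individual; each row holds minor-allele
    counts (0, 1, or 2) for the variants in the gene (assumed coding).
    Returns one score per individual: the number of minor alleles carried
    at variants whose sample allele frequency is below maf_threshold.
    """
    n = len(genotypes)
    if n == 0:
        return []
    n_variants = len(genotypes[0])
    rare = []
    for j in range(n_variants):
        # Sample allele frequency: allele count over 2n chromosomes.
        freq = sum(row[j] for row in genotypes) / (2 * n)
        if 0 < freq < maf_threshold:
            rare.append(j)
    # Sum minor-allele counts over the rare variants only.
    return [sum(row[j] for j in rare) for row in genotypes]


# Hypothetical toy data: 4 individuals, 3 variants.
toy = [
    [0, 1, 2],
    [0, 0, 1],
    [1, 0, 2],
    [0, 0, 1],
]
print(burden_scores(toy, maf_threshold=0.2))  # [1, 0, 1, 0]
```

In the toy data the third variant has allele frequency 0.75 and is excluded, so each score counts alleles at the first two variants only. The resulting score could then be used as a single predictor in whatever regression model suits the outcome.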


Subjects
Atherosclerosis/genetics , Models, Genetic , Atherosclerosis/ethnology , Atherosclerosis/pathology , Blood Pressure/genetics , DNA-Binding Proteins/genetics , Ethnicity/genetics , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide
19.
Bioinformatics ; 33(14): 2123-2130, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28334222

ABSTRACT

MOTIVATION: Over the past decade, there has been a remarkable improvement in our understanding of the role of genetic variation in complex human diseases, especially via genome-wide association studies. However, the underlying molecular mechanisms are still poorly characterized, impeding the development of therapeutic interventions. Identifying genetic variants that influence the expression level of a gene, i.e. expression quantitative trait loci (eQTLs), can help us understand how genetic variants influence traits at the molecular level. While most eQTL studies focus on identifying mean effects on gene expression using linear regression, evidence suggests that genetic variation can impact the entire distribution of the expression level. Motivated by the potential higher order associations, several studies investigated variance eQTLs. RESULTS: In this paper, we develop a Quantile Rank-score based test (QRank), which provides an easy way to identify eQTLs that are associated with the conditional quantile functions of gene expression. We have applied the proposed QRank to the Genotype-Tissue Expression project, an international tissue bank for studying the relationship between genetic variation and gene expression in human tissues, and found that the proposed QRank complements the existing methods, and identifies new eQTLs with heterogeneous effects across different quantile levels. Notably, we show that the eQTLs identified by QRank but missed by linear regression are associated with greater enrichment in genome-wide significant SNPs from the GWAS catalog, and are also more likely to be tissue specific than eQTLs identified by linear regression. AVAILABILITY AND IMPLEMENTATION: An R package is available on R CRAN at https://cran.r-project.org/web/packages/QRank . CONTACT: xs2148@cumc.columbia.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
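
To make the idea of a quantile rank-score test concrete, here is a minimal sketch of the textbook single-quantile version with no covariates. It is not the QRank package and does not reproduce its interface or any multi-quantile composite test; the function names, the sample-quantile convention, and the toy data are assumptions for illustration. Under the null, the statistic is compared with a chi-square distribution with one degree of freedom.

```python
import math


def sample_quantile(values, tau):
    """Return the tau-th sample quantile as an order statistic (assumed convention)."""
    s = sorted(values)
    k = max(0, math.ceil(tau * len(s)) - 1)
    return s[k]


def quantile_rank_score_test(expression, genotype, tau=0.5):
    """Rank-score test of genotype effect on the tau-th quantile of expression.

    Null model: intercept only, so the fitted quantile is the sample quantile.
    Returns (statistic, p_value), with the p-value from chi-square(1).
    """
    n = len(expression)
    q = sample_quantile(expression, tau)
    x_bar = sum(genotype) / n
    centered = [x - x_bar for x in genotype]
    # Score function psi_tau(r) = tau - 1{r < 0} evaluated at null residuals.
    psi = [tau - (1.0 if y - q < 0 else 0.0) for y in expression]
    score = sum(c * p for c, p in zip(centered, psi)) / math.sqrt(n)
    variance = tau * (1 - tau) * sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("genotype has no variation")
    stat = score * score / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data: expression increases with carrier status.
y = [1, 2, 3, 4, 5, 6, 7, 8]
x = [0, 0, 0, 0, 1, 1, 1, 1]
stat, p = quantile_rank_score_test(y, x, tau=0.5)
print(round(stat, 3), round(p, 4))
```

Running the test at several values of `tau` shows whether a variant's effect differs across the expression distribution, which is the kind of heterogeneity the abstract says linear regression on the mean can miss.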


Subjects
Gene Expression Profiling/methods , Genetic Variation , Genome-Wide Association Study/methods , Quantitative Trait Loci , Software , Computational Biology/methods , Computer Simulation , Humans
20.
PLoS Genet ; 10(12): e1004729, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25502226

ABSTRACT

Pinpointing the small number of causal variants among the abundant naturally occurring genetic variation is a difficult challenge, but a crucial one for understanding precise molecular mechanisms of disease and follow-up functional studies. We propose and investigate two complementary statistical approaches for identification of rare causal variants in sequencing studies: a backward elimination procedure based on groupwise association tests, and a hierarchical approach that can integrate sequencing data with diverse functional and evolutionary conservation annotations for individual variants. Using simulations, we show that incorporation of multiple bioinformatic predictors of deleteriousness, such as PolyPhen-2, SIFT and GERP++ scores, can improve the power to discover truly causal variants. As proof of principle, we apply the proposed methods to VPS13B, a gene mutated in the rare neurodevelopmental disorder called Cohen syndrome, and recently reported with recessive variants in autism. We identify a small set of promising candidates for causal variants, including two loss-of-function variants and a rare, homozygous probably-damaging variant that could contribute to autism risk.


Subjects
Autistic Disorder/genetics , Evolution, Molecular , Fingers/abnormalities , Genetic Variation , Intellectual Disability/genetics , Microcephaly/genetics , Muscle Hypotonia/genetics , Myopia/genetics , Obesity/genetics , Vesicular Transport Proteins/genetics , Angiopoietin-Like Protein 4 , Angiopoietins/genetics , Autistic Disorder/diagnosis , Computational Biology , Computer Simulation , Developmental Disabilities/diagnosis , Developmental Disabilities/genetics , Gene Frequency , Genome-Wide Association Study , Humans , Intellectual Disability/diagnosis , Microcephaly/diagnosis , Models, Genetic , Muscle Hypotonia/diagnosis , Myopia/diagnosis , Obesity/diagnosis , Retinal Degeneration , Software