1.
Genet Epidemiol ; 43(7): 800-814, 2019 10.
Article in English | MEDLINE | ID: mdl-31433078

ABSTRACT

The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare variants and for the combined effects of both common and rare variants. To achieve robust power, within Meta-MultiSKAT, we developed fast and accurate omnibus tests combining different models of genetic effects, functional genomic annotations, multiple correlated phenotypes, and heterogeneity across studies. In addition, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type-I error rate at the exome-wide level of 2.5 × 10⁻⁶. Further simulations under different models of association show that Meta-MultiSKAT can improve the power of detection from 23% to 38% on average over single phenotype-based meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in the meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.
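The exome-wide level of 2.5 × 10⁻⁶ is commonly motivated as a Bonferroni correction of a family-wise α = 0.05 across roughly 20,000 protein-coding genes. The abstract does not state this derivation, so the gene count below is an assumption. A minimal Python sketch of the arithmetic:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls family-wise error at `alpha`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Assumption: about 20,000 genes (regions) tested exome-wide.
threshold = bonferroni_threshold(0.05, 20_000)
```

A gene-level p-value would then be called exome-wide significant only if it falls below `threshold`.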


Subjects
Genetic Association Studies , Meta-Analysis as Topic , Gene Frequency/genetics , Genotype , Humans , Italy , Leukocytes/metabolism , Models, Genetic , Mutation/genetics , Phenotype
2.
PLoS Genet ; 15(6): e1008202, 2019 06.
Article in English | MEDLINE | ID: mdl-31194742

ABSTRACT

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
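In its simplest form, a PRS of the kind described above is a weighted sum of a person's allele dosages, with per-variant weights taken from published effect estimates (for example, GWAS log odds ratios). The sketch below assumes only that additive form; the paper compares several construction strategies, which may involve variant selection or shrinkage steps not shown here. The variant IDs and weights are hypothetical.

```python
from typing import Mapping

def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Additive PRS: sum over variants of weight * allele dosage (0 to 2).

    Variants absent from `dosages` are skipped in this sketch; real
    pipelines usually impute or mean-fill missing genotypes instead.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)

# Hypothetical variants and weights (e.g., log odds ratios from a GWAS).
weights = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}
patient = {"rsA": 2, "rsB": 1, "rsC": 0}
score = polygenic_risk_score(patient, weights)
```

Scores computed this way for every genotyped patient could then serve as the exposure in a PheWAS, one regression per EHR-derived phenotype.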


Subjects
Genetic Predisposition to Disease , Genomics , Multifactorial Inheritance/genetics , Skin Neoplasms/genetics , Biological Specimen Banks , Electronic Health Records , Genome-Wide Association Study , Genotype , Humans , Michigan/epidemiology , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , Skin Neoplasms/pathology , United Kingdom/epidemiology
3.
Pac Symp Biocomput ; 24: 391-402, 2019.
Article in English | MEDLINE | ID: mdl-30963077

ABSTRACT

As genetic sequencing becomes less expensive and data sets linking genetic data and medical records (e.g., biobanks) become larger and more common, data privacy and computational challenges become more pressing to address in order to realize the benefits of these datasets. One possibility for alleviating these issues is the use of already-computed summary statistics (e.g., slopes and standard errors from a regression model of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges concerning access to these data could be bypassed. In this paper we explore the possibility of using summary statistics from simple linear models of phenotype on genotype in order to make inferences about more complex phenotypes (those that are derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope for linear regressions when combining phenotypes. Derived equations are validated via simulation and tested on a real data set exploring the genetics of fatty acids.
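For the simplest derived phenotype, a linear combination Z = c1·Y1 + c2·Y2 regressed on the same genotype G in the same individuals, the slope and intercept follow directly from the component regressions, because least-squares estimates are linear in the response. The standard error additionally depends on the covariance between Y1 and Y2, which is where the paper's exact formulas are needed; this sketch does not reproduce them, and it does not cover non-linear derived phenotypes. All data are made up.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up genotypes (0/1/2) and two phenotypes measured on the same people.
g = [0, 1, 2, 1, 0, 2]
y1 = [1.0, 1.4, 2.1, 1.2, 0.9, 2.3]
y2 = [3.0, 2.8, 2.5, 2.9, 3.2, 2.4]
c1, c2 = 1.0, -1.0  # derived phenotype Z = c1*Y1 + c2*Y2 (here Y1 - Y2)

int1, slope1 = ols(g, y1)
int2, slope2 = ols(g, y2)
int_z, slope_z = ols(g, [c1 * u + c2 * v for u, v in zip(y1, y2)])

# The fit for Z equals the same linear combination of the component fits.
assert abs(slope_z - (c1 * slope1 + c2 * slope2)) < 1e-12
assert abs(int_z - (c1 * int1 + c2 * int2)) < 1e-12
```

In a summary-statistics setting, only `slope1`, `int1`, `slope2`, and `int2` (plus the quantities needed for standard errors) would be shared, not the individual-level vectors.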

4.
Nat Commun ; 10(1): 1847, 2019 04 23.
Article in English | MEDLINE | ID: mdl-31015462

ABSTRACT

Chronic kidney disease (CKD) is a growing health burden currently affecting 10-15% of adults worldwide. Estimated glomerular filtration rate (eGFR) as a marker of kidney function is commonly used to diagnose CKD. We analyze eGFR data from the Nord-Trøndelag Health Study and Michigan Genomics Initiative and perform a GWAS meta-analysis with public summary statistics, more than doubling the sample size of previous meta-analyses. We identify 147 loci (53 novel) associated with eGFR, including genes involved in transcriptional regulation, kidney development, cellular signaling, metabolism, and solute transport. Additionally, sex-stratified analysis identifies one locus with more significant effects in women than men. Using genetic risk scores constructed from these eGFR meta-analysis results, we show that associated variants are generally predictive of CKD with only modest improvements in detection compared with other known clinical risk factors. Collectively, these results yield additional insight into the genetic factors underlying kidney function and progression to CKD.
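The abstract does not say which meta-analysis model was used. A common default for GWAS meta-analysis of published summary statistics is fixed-effects inverse-variance weighting, sketched below for a single variant under that assumption. The per-study estimates are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one variant.

    Returns (pooled beta, pooled SE, two-sided p-value from a normal z-test).
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p

# Hypothetical per-study estimates for one variant (e.g., effect on log eGFR).
beta, se, p = ivw_meta([0.010, 0.014, 0.008], [0.004, 0.005, 0.003])
```

More precise studies (smaller SE) dominate the pooled estimate, which is why adding large cohorts increases power.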


Subjects
Genetic Loci , Genome-Wide Association Study , Glomerular Filtration Rate/genetics , Renal Insufficiency, Chronic/genetics , Female , Global Burden of Disease , Humans , Kidney/physiopathology , Male , Prognosis , Renal Insufficiency, Chronic/diagnosis , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/physiopathology , Risk Assessment/methods , Risk Factors , Sex Factors
5.
Nat Commun ; 9(1): 3753, 2018 09 14.
Article in English | MEDLINE | ID: mdl-30218074

ABSTRACT

A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.

6.
Am J Hum Genet ; 102(6): 1048-1061, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29779563

ABSTRACT

Health systems are stewards of patient electronic health record (EHR) data with extraordinarily rich depth and breadth, reflecting thousands of diagnoses and exposures. Measures of genomic variation integrated with EHRs offer a potential strategy to accurately stratify patients for risk profiling and discover new relationships between diagnoses and genomes. The objective of this study was to evaluate whether polygenic risk scores (PRS) for common cancers are associated with multiple phenotypes in a phenome-wide association study (PheWAS) conducted in 28,260 unrelated, genotyped patients of recent European ancestry who consented to participate in the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine. PRS for 12 cancer traits were calculated using summary statistics from the NHGRI-EBI catalog. A total of 1,711 synthetic case-control studies was used for PheWAS analyses. There were 13,490 (47.7%) patients with at least one cancer diagnosis in this study sample. PRS exhibited strong association for several cancer traits they were designed for, including female breast cancer, prostate cancer, melanoma, basal cell carcinoma, squamous cell carcinoma, and thyroid cancer. Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses. To differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles, the idea of "exclusion PRS PheWAS" was introduced. Further analysis of temporal order of the diagnoses improved our understanding of these secondary associations. This comprehensive PheWAS used PRS instead of a single variant.

7.
Proc Natl Acad Sci U S A ; 115(2): 379-384, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29279374

ABSTRACT

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.


Subjects
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Mexican Americans/genetics , Diabetes Mellitus, Type 2/ethnology , Diabetes Mellitus, Type 2/pathology , Family Health , Female , Gene Frequency , Genetic Predisposition to Disease/ethnology , Genome-Wide Association Study/methods , Genotype , Humans , Male , Pedigree , Phenotype , Quantitative Trait Loci/genetics , Whole Genome Sequencing/methods
8.
Hum Mol Genet ; 26(21): 4301-4313, 2017 11 01.
Article in English | MEDLINE | ID: mdl-28973304

ABSTRACT

Psoriasis is a common inflammatory skin disorder for which multiple genetic susceptibility loci have been identified, but few resolved to specific functional variants. In this study, we sought to identify common and rare psoriasis-associated gene-centric variation. Using exome arrays we genotyped four independent cohorts, totalling 11 861 psoriasis cases and 28 610 controls, aggregating the dataset through statistical meta-analysis. Single variant analysis detected a previously unreported risk locus at TNFSF15 (rs6478108; P = 1.50 × 10⁻⁸, OR = 1.10), and association of common protein-altering variants at 11 loci previously implicated in psoriasis susceptibility. We validate previous reports of protective low-frequency protein-altering variants within IFIH1 (encoding an innate antiviral receptor) and TYK2 (encoding a Janus kinase), in each case establishing a further series of protective rare variants (minor allele frequency < 0.01) via gene-wide aggregation testing (IFIH1: pburden = 2.53 × 10⁻⁷, OR = 0.707; TYK2: pburden = 6.17 × 10⁻⁴, OR = 0.744). Both genes play significant roles in type I interferon (IFN) production and signalling. Several of the protective rare and low-frequency variants in IFIH1 and TYK2 disrupt conserved protein domains, highlighting potential mechanisms through which their effect may be exerted.
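Gene-wide aggregation (burden) testing, as used above for IFIH1 and TYK2, collapses the qualifying rare variants in a gene into a single per-individual score, which is then tested against case-control status. The sketch below builds such a score using the minor allele frequency cutoff of 0.01 quoted in the abstract; the exact variant filters and weighting the authors used are not given here, genotypes are assumed to be coded as minor-allele counts, and the data are made up.

```python
def burden_scores(genotypes, maf_cutoff=0.01):
    """Count qualifying rare minor alleles per individual.

    genotypes: one list per individual of minor-allele counts (0/1/2),
    one column per variant in the gene. A variant qualifies when its
    sample minor allele frequency is above 0 and strictly below `maf_cutoff`.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    qualifying = []
    for j in range(n_variants):
        freq = sum(row[j] for row in genotypes) / (2 * n)
        if 0 < freq < maf_cutoff:
            qualifying.append(j)
    return [sum(row[j] for j in qualifying) for row in genotypes]

# 100 hypothetical individuals, two variants: variant 0 is rare, variant 1 common.
geno = [[0, 1] for _ in range(100)]
geno[3][0] = 1
scores = burden_scores(geno)
```

The resulting scores could then enter a logistic regression on case-control status, whose exponentiated coefficient gives a gene-level odds ratio of the kind reported above.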


Subjects
Psoriasis/genetics , Tumor Necrosis Factor Ligand Superfamily Member 15/genetics , Alleles , Case-Control Studies , Cohort Studies , Exome , Female , Gene Frequency/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Genotype , Humans , Interferon-Induced Helicase, IFIH1/genetics , Interferon-Induced Helicase, IFIH1/metabolism , Male , Polymorphism, Single Nucleotide/genetics , Psoriasis/physiopathology , Risk Factors , TYK2 Kinase/genetics , TYK2 Kinase/metabolism , Tumor Necrosis Factor Ligand Superfamily Member 15/metabolism , Whole Exome Sequencing
9.
Med Care ; 55(9): 864-870, 2017 09.
Article in English | MEDLINE | ID: mdl-28763374

ABSTRACT

BACKGROUND: Accurately estimating cardiovascular risk is fundamental to good decision-making in cardiovascular disease (CVD) prevention, but risk scores developed in one population often perform poorly in dissimilar populations. We sought to examine whether a large integrated health system can use its electronic health data to better predict individual patients' risk of developing CVD. METHODS: We created a cohort using all patients aged 45-80 who used Department of Veterans Affairs (VA) ambulatory care services in 2006 with no history of CVD, heart failure, or loop diuretic use. Our outcome variable was new-onset CVD in 2007-2011. We then developed a series of recalibrated scores, including a fully refit "VA Risk Score-CVD (VARS-CVD)." We tested the different scores using standard measures of prediction quality. RESULTS: For the 1,512,092 patients in the study, the Atherosclerotic cardiovascular disease risk score had similar discrimination to the VARS-CVD (c-statistic of 0.66 in men and 0.73 in women), but the Atherosclerotic cardiovascular disease model had poor calibration, predicting 63% more events than observed. Calibration was excellent in the fully recalibrated VARS-CVD tool, but the simpler techniques tested proved less reliable. CONCLUSIONS: We found that local electronic health record data can be used to estimate CVD risk better than an established risk score based on research populations. Recalibration improved estimates dramatically, and the type of recalibration was important. Such tools can also easily be integrated into a health system's electronic health record and can be more readily updated.
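Calibration-in-the-large can be summarized as the ratio of expected to observed events: "predicting 63% more events than observed" corresponds to an expected/observed (E/O) ratio of about 1.63. The sketch below computes that ratio and applies the crudest possible recalibration, dividing every predicted risk by E/O. This is an illustrative assumption, not the authors' method; the abstract reports that simpler recalibration techniques proved less reliable than the fully refit VARS-CVD. The cohort is hypothetical.

```python
def expected_observed_ratio(predicted_risks, outcomes):
    """Sum of predicted risks (expected events) divided by observed events."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(predicted_risks) / observed

def rescale(predicted_risks, ratio):
    """Crude calibration-in-the-large: divide each risk by E/O, capped at 1."""
    return [min(1.0, r / ratio) for r in predicted_risks]

# Hypothetical cohort: four patients' predicted risks and observed outcomes.
risks = [0.2, 0.4, 0.6, 0.4]
events = [0, 1, 0, 0]
eo = expected_observed_ratio(risks, events)  # 1.6 expected vs 1 observed
adjusted = rescale(risks, eo)
```

Uniform rescaling fixes the overall event count but cannot correct miscalibration that differs across risk strata, which is one reason a full refit can perform better.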


Subjects
Cardiovascular Diseases/epidemiology , Electronic Health Records/statistics & numerical data , Health Status Indicators , Age Distribution , Aged , Atherosclerosis/epidemiology , Female , Humans , Male , Middle Aged , Risk Assessment , Risk Factors , Sex Distribution , Socioeconomic Factors , United States , United States Department of Veterans Affairs
10.
Ann Rheum Dis ; 76(7): 1321-1324, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28501801

ABSTRACT

OBJECTIVES: Psoriatic arthritis (PsA) is an inflammatory arthritis associated with psoriasis. While many common risk alleles have been reported for association with PsA as well as psoriasis, few rare coding alleles have yet been identified. METHODS: To identify rare coding variation associated with PsA risk or protection, we genotyped 41 267 variants with the exome chip and investigated association within an initial cohort of 1980 PsA cases and 5913 controls. Genotype data for an independent cohort of 2234 PsA cases and 5708 controls was also made available, allowing for a meta-analysis to be performed with the discovery dataset. RESULTS: We identified an association with the rare variant rs35667974 (p=2.39 × 10⁻⁶, OR=0.47), encoding an Ile923Val amino acid change in the IFIH1 gene protein product. The association was reproduced in our independent cohort, which reached a high level of significance on meta-analysis with the discovery and replication datasets (p=4.67 × 10⁻¹⁰). We identified a strong association with IFIH1 when performing multiple-variant analysis (p=6.77 × 10⁻⁶), and found evidence of independent effects between the rare allele and the common PsA variant at the same locus. CONCLUSION: For the first time, we report a rare coding allele in IFIH1 to be protective for PsA. This rare allele has also been identified to have the same direction of effect on type I diabetes and psoriasis. While this association further supports existing evidence for IFIH1 as a causal gene for PsA, mechanistic studies will need to be pursued to confirm that IFIH1 is indeed causal.


Subjects
Arthritis, Psoriatic/genetics , Interferon-Induced Helicase, IFIH1/genetics , Alleles , Case-Control Studies , Genetic Predisposition to Disease , Genotype , Humans , Logistic Models , Polymorphism, Single Nucleotide , Principal Component Analysis , Protective Factors
11.
Nat Commun ; 8: 15382, 2017 05 24.
Article in English | MEDLINE | ID: mdl-28537254

ABSTRACT

Psoriasis is a complex disease of skin with a prevalence of about 2%. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for psoriasis to date, including data from eight different Caucasian cohorts, with a combined effective sample size >39,000 individuals. We identified 16 additional psoriasis susceptibility loci achieving genome-wide significance, increasing the number of identified loci to 63 for European-origin individuals. Functional analysis highlighted the roles of interferon signalling and the NFκB cascade, and we showed that the psoriasis signals are enriched in regulatory elements from different T cells (CD8+ T-cells and CD4+ T-cells including TH0, TH1 and TH17). The identified loci explain ∼28% of the genetic heritability and generate a discriminatory genetic risk score (AUC=0.76 in our sample) that is significantly correlated with age at onset (p=2 × 10⁻⁸⁹). This study provides a comprehensive layout for the genetic architecture of common variants for psoriasis.
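An AUC such as the 0.76 reported above can be computed without drawing the ROC curve: it equals the probability that a randomly chosen case has a higher risk score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). A minimal O(n·m) sketch with made-up scores:

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties = 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical genetic risk scores for three cases and three controls.
value = auc([0.9, 0.7, 0.5], [0.6, 0.4, 0.5])
```

For large samples, a rank-based implementation (sorting all scores once) gives the same value in O((n + m) log(n + m)) time.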


Subjects
European Continental Ancestry Group/genetics , Genetic Loci/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Psoriasis/genetics , Age of Onset , Gene Regulatory Networks/genetics , Gene Regulatory Networks/immunology , Humans , Interferons/immunology , Interferons/metabolism , NF-kappa B/immunology , NF-kappa B/metabolism , Polymorphism, Single Nucleotide , Protein Interaction Maps/genetics , Protein Interaction Maps/immunology , Psoriasis/immunology , Signal Transduction/genetics , Signal Transduction/immunology , T-Lymphocytes/immunology , T-Lymphocytes/metabolism
12.
Stat Med ; 36(13): 2148-2160, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28245528

ABSTRACT

Creating accurate risk prediction models from Big Data resources such as Electronic Health Records (EHRs) is a critical step toward achieving precision medicine. A major challenge in developing these tools is accounting for imperfect aspects of EHR data, particularly the potential for misclassified outcomes. Misclassification, the swapping of case and control outcome labels, is well known to bias effect size estimates for regression prediction models. In this paper, we study the effect of misclassification on accuracy assessment for risk prediction models and find that it leads to bias in the area under the curve (AUC) metric from standard ROC analysis. The extent of the bias is determined by the false positive and false negative misclassification rates as well as disease prevalence. Notably, we show that simply correcting for misclassification while building the prediction model is not sufficient to remove the bias in AUC. We therefore introduce an intuitive misclassification-adjusted ROC procedure that accounts for uncertainty in observed outcomes and produces bias-corrected estimates of the true AUC. The method requires that misclassification rates are either known or can be estimated, quantities typically required for the modeling step. The computational simplicity of our method is a key advantage, making it ideal for efficiently comparing multiple prediction models on very large datasets. Finally, we apply the correction method to a hospitalization prediction model from a cohort of over 1 million patients from the Veterans Health Administration's EHR. Implementations of the ROC correction are provided for Stata and R. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
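The abstract notes that the size of the AUC bias depends on the false positive and false negative misclassification rates and on disease prevalence. One way to see how these three quantities interact is Bayes' rule: together they determine how likely an observed case (or control) label is to be correct. The sketch below computes those probabilities under the rate definitions stated in its docstring, which are assumptions. It illustrates the ingredients only and is not the authors' misclassification-adjusted ROC procedure, whose details are not given here.

```python
def label_reliability(prevalence, fp_rate, fn_rate):
    """Probability that observed case and control labels are correct.

    Assumed definitions: fp_rate = P(labelled case | true control),
    fn_rate = P(labelled control | true case), prevalence = P(true case).
    Returns (P(true case | labelled case), P(true control | labelled control)).
    """
    p = prevalence
    labelled_case = (1 - fn_rate) * p + fp_rate * (1 - p)
    labelled_control = fn_rate * p + (1 - fp_rate) * (1 - p)
    ppv = (1 - fn_rate) * p / labelled_case
    npv = (1 - fp_rate) * (1 - p) / labelled_control
    return ppv, npv

# Hypothetical rates for a rare outcome.
ppv, npv = label_reliability(prevalence=0.05, fp_rate=0.01, fn_rate=0.10)
```

With these hypothetical inputs, about 17% of observed cases are actually controls even though only 1% of true controls are mislabelled, because controls vastly outnumber cases at 5% prevalence.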


Subjects
Models, Statistical , ROC Curve , Area Under Curve , Bias , Electronic Health Records , Hospitalization/statistics & numerical data , Humans , Risk Assessment/methods , United States , United States Department of Veterans Affairs/statistics & numerical data
13.
Pharmacogenet Genomics ; 27(3): 89-100, 2017 03.
Article in English | MEDLINE | ID: mdl-27984508

ABSTRACT

OBJECTIVE: Proteins involved in absorption, distribution, metabolism, and excretion (ADME) play a critical role in drug pharmacokinetics. The type and frequency of genetic variation in the ADME genes differ among populations. The aim of this study was to systematically investigate common and rare ADME coding variation in diverse ethnic populations by exome sequencing. MATERIALS AND METHODS: Data derived from commercial exome capture arrays and next-generation sequencing were used to characterize coding variation in 298 ADME genes in 251 Northeast Asians and 1181 individuals from the 1000 Genomes Project. RESULTS: Approximately 75% of the ADME coding sequence was captured at high quality across the joint samples harboring more than 8000 variants, with 49% of individuals carrying at least one 'knockout' allele. ADME genes carried 50% more nonsynonymous variation than non-ADME genes (P=8.2×10) and showed significantly greater levels of population differentiation (P=7.6×10). Out of the 2135 variants identified that were predicted to be deleterious, 633 were not on commercially available ADME or general-purpose genotyping arrays. Forty deleterious variants within important ADME genes, with frequencies of at least 2% in at least one population, were identified as candidates for future pharmacogenetic studies. CONCLUSION: Exome sequencing was effective in accurately genotyping most ADME variants important for pharmacogenetic research, in addition to identifying rare or potentially de novo coding variants that may be clinically meaningful. Furthermore, as a class, ADME genes are more variable and less sensitive to purifying selection than non-ADME genes.


Subjects
High-Throughput Nucleotide Sequencing/methods , Oligonucleotide Array Sequence Analysis/methods , Population Groups/genetics , Sequence Analysis, DNA/methods , Exome , Genetic Variation , Genetics, Population , Humans , Male , Polymorphism, Single Nucleotide , Population Groups/ethnology , Principal Component Analysis
14.
Genet Epidemiol ; 39(4): 227-38, 2015 May.
Article in English | MEDLINE | ID: mdl-25740221

ABSTRACT

Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.


Subjects
Genetic Association Studies/methods , Genetic Variation/genetics , Models, Genetic , Software , Apolipoprotein C-III/genetics , Cholesterol Ester Transfer Proteins/genetics , Cholesterol, HDL/genetics , Computer Simulation , Exome/genetics , Family , Genotype , Humans , Lipase/genetics , Lipoprotein Lipase/genetics , Phenotype
15.
Eur J Hum Genet ; 22(9): 1137-44, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24398795

ABSTRACT

There is substantial interest in the role of rare genetic variants in the etiology of complex human diseases. Several gene-based tests have been developed to simultaneously analyze multiple rare variants for association with phenotypic traits. The tests can largely be partitioned into two classes - 'burden' tests and 'joint' tests - based on how they accumulate evidence of association across sites. We used the empirical joint site frequency spectra of rare, nonsynonymous variation from a large multi-population sequencing study to explore the effect of realistic rare variant population structure on gene-based tests. We observed an important difference between the two test classes: their susceptibility to population stratification. Focusing on European samples, we found that joint tests, which allow variants to have opposite directions of effect, consistently showed higher levels of P-value inflation than burden tests. We determined that the differential stratification was caused by two specific patterns in the interpopulation distribution of rare variants, each correlating with inflation in one of the test classes. The pattern that inflates joint tests is more prevalent in real data, explaining the higher levels of inflation in these tests. Furthermore, we show that the different sources of inflation between tests lead to heterogeneous responses to genomic control correction and the number of variants analyzed. Our results indicate that care must be taken when interpreting joint and burden analyses of the same set of rare variants, in particular, to avoid mistaking inflated P-values in joint tests for stronger signals of true associations.
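Genomic control, mentioned above as a correction to which the two test classes respond differently, estimates an inflation factor λ as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.4549), then divides every statistic by λ when λ > 1. A minimal sketch, assuming 1-df chi-square statistics; the statistics themselves are made up.

```python
import statistics

# Median of a chi-square(1) variable: square of the standard normal 75th percentile.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2

def genomic_control(chi2_stats):
    """Return (lambda_gc, corrected statistics) for 1-df chi-square tests."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    factor = max(lam, 1.0)  # conventionally only deflate, never inflate
    return lam, [x / factor for x in chi2_stats]

# Hypothetical gene-level test statistics.
lam, corrected = genomic_control([0.5, 1.2, 0.3, 2.0, 0.9])
```

Because the correction is a single global divisor, it can under- or over-correct when inflation arises from patterns that affect only some genes or some test types, consistent with the heterogeneous responses described above.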


Subjects
European Continental Ancestry Group/genetics , Gene Frequency , Genetic Testing/methods , Models, Genetic , Data Interpretation, Statistical , Genetic Testing/standards , Humans , Polymorphism, Genetic
16.
Nat Genet ; 46(2): 200-4, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24336170

ABSTRACT

The majority of reported complex disease associations for common genetic variants have been identified through meta-analysis, a powerful approach that enables the use of large sample sizes while protecting against common artifacts due to population structure and repeated small-sample analyses sharing individual-level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the focus of analysis. Here we propose and evaluate new approaches for performing meta-analysis of rare variant association tests, including burden tests, weighted burden tests, variable-threshold tests and tests that allow variants with opposite effects to be grouped together. We show that our approach retains useful features from single-variant meta-analysis approaches and demonstrate its use in a study of blood lipid levels in ∼18,500 individuals genotyped with exome arrays.
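The abstract gives no formulas, but one standard way to meta-analyze a burden test without sharing individual-level data is for each study to share its per-variant score statistics U and their covariance matrix V. The pooled burden statistic is then (wᵀU)² / (wᵀVw), with U and V summed across studies, compared to a chi-square distribution with 1 degree of freedom. The sketch below assumes that score-statistic framework and equal variant weights by default; it is not presented as the authors' exact implementation, and all numbers are made up.

```python
import math

def meta_burden_test(score_vectors, cov_matrices, weights=None):
    """Burden test from per-study score statistics.

    score_vectors: one list U_k (per-variant scores) per study.
    cov_matrices: one matrix V_k (list of lists) per study, matching U_k.
    Returns (chi-square statistic with 1 df, p-value).
    """
    m = len(score_vectors[0])
    w = weights or [1.0] * m
    # Pool across studies by summing scores and covariances.
    u = [sum(uk[i] for uk in score_vectors) for i in range(m)]
    v = [[sum(vk[i][j] for vk in cov_matrices) for j in range(m)] for i in range(m)]
    num = sum(wi * ui for wi, ui in zip(w, u)) ** 2
    den = sum(w[i] * v[i][j] * w[j] for i in range(m) for j in range(m))
    stat = num / den
    # Upper tail of chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Two hypothetical studies, two variants each.
stat, p = meta_burden_test(
    [[2.0, 1.0], [1.0, 0.0]],
    [[[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
)
```

Only U and V leave each study, which is what lets gene-level tests be combined while keeping individual genotypes local.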


Subjects
Genetic Association Studies/methods , Genetic Variation , Lipids/genetics , Meta-Analysis as Topic , Research Design , Data Interpretation, Statistical , Exome/genetics , Genetics, Population , Genotype , Humans , Lipids/blood , Models, Genetic , Monte Carlo Method
17.
Genome Res ; 23(12): 1974-84, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23990608

ABSTRACT

Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human-chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤ 10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.


Subjects
Genetic Variation , Genome, Human , Point Mutation , Animals , Base Composition , Evolution, Molecular , Gene Conversion , Gene Frequency , Genomics , Humans , Logistic Models , Models, Genetic , Mutation Rate , Pan troglodytes/genetics , Phylogeny , Recombination, Genetic , Selection, Genetic
18.
Genet Epidemiol ; 37(4): 345-57, 2013 May.
Article in English | MEDLINE | ID: mdl-23526307

ABSTRACT

The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.
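Following the description above, the case and control rare-allele counts at each site in a region can be treated as two vectors: a length test compares their Euclidean norms, while a joint test is also sensitive to the angle between them. The sketch below computes only those two geometric quantities for made-up counts; how the paper normalizes the vectors (for example, by sample size) and converts them into test statistics is not specified here, so no p-values are computed.

```python
import math

def length_and_angle(case_counts, control_counts):
    """Euclidean lengths of two count vectors and the angle (radians) between them."""
    len_case = math.sqrt(sum(c * c for c in case_counts))
    len_ctrl = math.sqrt(sum(c * c for c in control_counts))
    if len_case == 0 or len_ctrl == 0:
        raise ValueError("angle is undefined for a zero vector")
    dot = sum(a * b for a, b in zip(case_counts, control_counts))
    cos_theta = dot / (len_case * len_ctrl)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    angle = math.acos(max(-1.0, min(1.0, cos_theta)))
    return len_case, len_ctrl, angle

# Hypothetical rare-allele counts at three sites in equal-sized samples.
len_case, len_ctrl, angle = length_and_angle([3, 0, 4], [0, 5, 0])
```

In this example both vectors have length 5 but are orthogonal: cases and controls carry rare alleles at entirely different sites. A pure length test sees no difference, whereas a joint test that also uses the angle could detect one.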


Subjects
Genetic Variation , Genome-Wide Association Study , Models, Theoretical , Algorithms , Alleles , Gene Frequency , Genetic Predisposition to Disease , Humans , Models, Genetic , Models, Statistical
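The length and joint tests described in the abstract above can be sketched in Python as follows. The sketch assumes a 0/1/2 rare-allele genotype matrix with equal numbers of cases and controls (so the two vector lengths are directly comparable), uses the Euclidean distance ‖c − u‖ as one possible joint statistic (it responds to differences in either length or angle, since ‖c − u‖² = ‖c‖² + ‖u‖² − 2‖c‖‖u‖cos θ), and assesses significance by permuting case/control labels. All function names are hypothetical; this is an illustration of the geometry, not a reimplementation of any specific published test.

```python
import math
import random


def count_vector(genotypes, labels, group):
    """Sum rare-allele counts per site over individuals whose label equals `group` (1 = case, 0 = control)."""
    totals = [0] * len(genotypes[0])
    for row, label in zip(genotypes, labels):
        if label == group:
            for j, g in enumerate(row):
                totals[j] += g
    return totals


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def length_stat(c, u):
    """Length test: absolute difference in Euclidean lengths of case and control vectors."""
    return abs(norm(c) - norm(u))


def joint_stat(c, u):
    """One possible joint statistic: distance between the vectors, sensitive to length and angle."""
    return norm([a - b for a, b in zip(c, u)])


def angle_between(c, u):
    """Angle (radians) between case and control vectors; both must be nonzero."""
    cos_theta = sum(a * b for a, b in zip(c, u)) / (norm(c) * norm(u))
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error


def permutation_pvalue(genotypes, labels, stat, n_perm=2000, seed=0):
    """Permutation p-value for `stat`, obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = stat(count_vector(genotypes, labels, 1), count_vector(genotypes, labels, 0))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        s = stat(count_vector(genotypes, shuffled, 1), count_vector(genotypes, shuffled, 0))
        if s >= observed - 1e-12:  # count permuted statistics at least as extreme
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 4 cases then 4 controls, 3 rare-variant sites (made up for illustration).
genotypes = [
    [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_pvalue(genotypes, labels, length_stat))
print(permutation_pvalue(genotypes, labels, joint_stat))
```

For example, the vectors [3, 0, 4] and [0, 0, 5] have equal lengths, so `length_stat` returns 0, while `joint_stat` returns √10 because the vectors point in different directions; this is the kind of case the abstract has in mind when it contrasts length tests with joint tests.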
19.
Genetics ; 191(4): 1239-55, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22595242

ABSTRACT

The potential for imputed genotypes to enhance an analysis of genetic data depends largely on the accuracy of imputation, which in turn depends on properties of the reference panel of template haplotypes used to perform the imputation. To provide a basis for exploring how properties of the reference panel affect imputation accuracy theoretically rather than with computationally intensive imputation experiments, we introduce a coalescent model that considers imputation accuracy in terms of population-genetic parameters. Our model allows us to investigate sampling designs in the frequently occurring scenario in which imputation targets and templates are sampled from different populations. In particular, we derive expressions for expected imputation accuracy as a function of reference panel size and divergence time between the reference and target populations. We find that a modestly sized "internal" reference panel from the same population as a target haplotype yields, on average, greater imputation accuracy than a larger "external" panel from a different population, even if the divergence time between the two populations is small. The improvement in accuracy for the internal panel increases with increasing divergence time between the target and reference populations. Thus, in humans, our model predicts that imputation accuracy can be improved by generating small population-specific custom reference panels to augment existing collections such as those of the HapMap or 1000 Genomes Projects. Our approach can be extended to understand additional factors that affect imputation accuracy in complex population-genetic settings, and the results can ultimately facilitate improvements in imputation study designs.


Subjects
Genome-Wide Association Study , Genotype , Models, Genetic , Algorithms , Computer Simulation , Haplotypes , Humans , Population Density
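To illustrate the intuition behind the internal-versus-external comparison in the abstract above, here is a small Monte Carlo sketch of a standard Kingman coalescent under a simple isolation (split) model. It uses the time until the target haplotype's lineage first coalesces with any reference lineage as a stand-in for how similar the closest template is (shorter time, closer template). That proxy, the isolation model with equal population sizes, and the function names are assumptions made for illustration; they are not the accuracy expressions derived in the paper. Time is in coalescent units, and `tau` is the divergence time (`tau = 0` corresponds to an internal panel).

```python
import random


def target_coalescence_time(n_ref, tau, rng):
    """Waiting time (coalescent units) until the target lineage first coalesces with a reference lineage.

    Assumed isolation model: the target haplotype's population split from the
    reference population `tau` time units ago (tau = 0 means an internal panel).
    Before tau, only the n_ref reference lineages can coalesce with one another.
    """
    t = 0.0
    k_ref = n_ref
    if tau > 0:
        while k_ref > 1:
            dt = rng.expovariate(k_ref * (k_ref - 1) / 2)
            if t + dt >= tau:
                break  # memorylessness lets us restart the clock at tau
            t += dt
            k_ref -= 1
        t = tau
    # After tau (or from time 0 for an internal panel) the target joins the pool.
    k = k_ref + 1
    while True:
        t += rng.expovariate(k * (k - 1) / 2)
        # Each of the k(k-1)/2 pairs is equally likely; k-1 of them include the target.
        if rng.random() < 2.0 / k:
            return t
        k -= 1


def mean_time(n_ref, tau=0.0, reps=5000, seed=1):
    """Monte Carlo average of target_coalescence_time."""
    rng = random.Random(seed)
    return sum(target_coalescence_time(n_ref, tau, rng) for _ in range(reps)) / reps


print(mean_time(10))                        # small internal panel
print(mean_time(200, tau=0.5, reps=2000))   # larger, diverged external panel
```

With no divergence, the expected waiting time is 2/(n + 1) for a panel of n reference haplotypes, so it shrinks as the panel grows. With divergence, every waiting time is at least `tau` however large the external panel is, which is one way to see why a modest internal panel can outperform a much larger external one under this proxy.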
20.
Science ; 337(6090): 100-4, 2012 Jul 06.
Article in English | MEDLINE | ID: mdl-22604722

ABSTRACT

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.


Subjects
Disease/genetics , Genetic Variation , Genome, Human , African Americans/genetics , Asian Continental Ancestry Group , European Continental Ancestry Group/genetics , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Geography , High-Throughput Nucleotide Sequencing , Humans , Molecular Targeted Therapy , Multifactorial Inheritance , Mutation Rate , Pharmacogenetics , Phenotype , Polymorphism, Single Nucleotide , Population Growth , Sample Size , Selection, Genetic
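The density figure quoted in the abstract above is simple arithmetic (bases sequenced divided by the number of variant sites), and frequency classes can be assigned from minor allele counts across the 2 × 14,002 sampled chromosomes. The Python sketch below uses hypothetical class boundaries (singleton, minor allele frequency below 0.5%, and everything else as common) and made-up counts; the abstract does not state the classes or totals the authors used, and the function names are illustrative.

```python
def bases_per_variant(n_variant_sites, bases_sequenced):
    """Average spacing: one variant site every `bases_sequenced / n_variant_sites` bases."""
    return bases_sequenced / n_variant_sites


def frequency_classes(allele_counts, n_chromosomes, rare_maf=0.005):
    """Tally variant sites into hypothetical frequency classes by minor allele count/frequency."""
    classes = {"singleton": 0, "rare": 0, "common": 0}
    for ac in allele_counts:
        mac = min(ac, n_chromosomes - ac)  # minor allele count
        if mac == 0:
            continue  # not polymorphic in this sample
        if mac == 1:
            classes["singleton"] += 1
        elif mac / n_chromosomes < rare_maf:
            classes["rare"] += 1
        else:
            classes["common"] += 1
    return classes


n_chrom = 2 * 14_002  # diploid sample size from the abstract
print(frequency_classes([1, 1, 2, 50, 200, 5000, 28_003], n_chrom))  # made-up allele counts
print(bases_per_variant(1_000, 17_000))  # made-up inputs giving one variant every 17 bases
```

Working from the minor allele count (rather than the alternate allele count) means a site carried by all but one chromosome is still classed as a singleton.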