1.
Genet Epidemiol ; 2019 Aug 26.
Article in English | MEDLINE | ID: mdl-31452258

ABSTRACT

Array genotyping is a cost-effective and widely used tool that enables assessment of up to millions of genetic markers in hundreds of thousands of individuals. Genotyping array data are typically highly accurate but sensitive to mixing of DNA samples from multiple individuals before or during genotyping. Contaminated samples can lead to genotyping errors and consequently cause false positive signals or reduce power of association analyses. Here, we propose a new method to identify contaminated samples and the sources of contamination within a genotyping batch. Through analysis of array intensity and genotype data from intentionally mixed samples and 22,366 samples of the Michigan Genomics Initiative, an ongoing biobank-based study, we show that our method can reliably estimate contamination. We also show that identifying sources of contamination can implicate problematic sample processing steps and guide process improvements. Compared to existing methods, our approach can estimate the proportion of contaminating DNA more accurately, eliminate the need for external databases of allele frequencies, and provide contamination estimates that are more robust to the ancestral origin of the contaminating sample.

3.
Genet Epidemiol ; 43(7): 800-814, 2019 10.
Article in English | MEDLINE | ID: mdl-31433078

ABSTRACT

The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare-variants and the combined effects of both common and rare-variants. To achieve robust power, within Meta-MultiSKAT, we developed fast and accurate omnibus tests combining different models of genetic effects, functional genomic annotations, multiple correlated phenotypes, and heterogeneity across studies. In addition, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type-I error rate at the exome-wide level of 2.5 × 10⁻⁶. Further simulations under different models of association show that Meta-MultiSKAT can improve the power of detection from 23% to 38% on average over single phenotype-based meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in the meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.
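The exome-wide significance level quoted in this abstract matches a Bonferroni correction of a 0.05 family-wise error rate over roughly 20,000 gene-based tests. Both the 0.05 level and the 20,000 gene count are conventional values assumed here for illustration; the abstract states only the resulting threshold. A minimal sketch of that arithmetic:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed values: 0.05 family-wise level, about 20,000 gene-based tests.
exome_wide_threshold = bonferroni_threshold(0.05, 20_000)
print(f"exome-wide threshold: {exome_wide_threshold:.1e}")
```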

4.
Am J Hum Genet ; 105(1): 65-77, 2019 Jul 03.
Article in English | MEDLINE | ID: mdl-31204010

ABSTRACT

The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education. Health history and daily tracking surveys are administered through a Facebook application, and participants who complete a minimum number of surveys are mailed a saliva sample kit ("spit kit") to collect DNA for genotyping. As of March 2019, we engaged >80,000 individuals, sent spit kits to >32,000 individuals who met minimum participation requirements, and collected >27,000 spit kits. Participants come from all 50 states and include a diversity of ancestral backgrounds. Rates of important chronic health indicators are consistent with those estimated for the general U.S. population using more traditional study designs. However, our sample is younger and contains a greater percentage of females than the general population. As one means of verifying data quality, we have replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation. The flexible framework of the web application makes it relatively simple to add new questionnaires and for other researchers to collaborate. We anticipate that the study sample will continue to grow and that future analyses may further capitalize on the strengths of the longitudinal data in combination with genetic information.

5.
PLoS Genet ; 15(6): e1008202, 2019 06.
Article in English | MEDLINE | ID: mdl-31194742

ABSTRACT

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
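In its simplest form, a PRS of the kind described in this abstract is a weighted sum of a person's risk-allele dosages, with weights taken from published effect estimates. The sketch below assumes that basic additive form; the function name, the three variants, and their weights are hypothetical and do not come from the paper, which compares several construction strategies.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant weights (for example, log odds ratios from a GWAS).
weights = [0.10, -0.05, 0.20]
# Hypothetical genotype of one individual at the same three variants.
dosages = [2, 1, 0]

score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.2f}")
```

Scores computed this way are only comparable between individuals when the same variants and weights are used for everyone.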

6.
Nat Genet ; 51(6): 1067, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31068672

ABSTRACT

In the version of this article initially published, in Supplementary Data 5, the logFC, FC, P value and adjusted P value for advanced AMD versus control (DE 4/1) without age correction did not correspond to the correct gene IDs. The errors have been corrected in the HTML version of the article.

7.
Nat Commun ; 10(1): 1847, 2019 04 23.
Article in English | MEDLINE | ID: mdl-31015462

ABSTRACT

Chronic kidney disease (CKD) is a growing health burden currently affecting 10-15% of adults worldwide. Estimated glomerular filtration rate (eGFR) as a marker of kidney function is commonly used to diagnose CKD. We analyze eGFR data from the Nord-Trøndelag Health Study and Michigan Genomics Initiative and perform a GWAS meta-analysis with public summary statistics, more than doubling the sample size of previous meta-analyses. We identify 147 loci (53 novel) associated with eGFR, including genes involved in transcriptional regulation, kidney development, cellular signaling, metabolism, and solute transport. Additionally, sex-stratified analysis identifies one locus with more significant effects in women than men. Using genetic risk scores constructed from these eGFR meta-analysis results, we show that associated variants are generally predictive of CKD with only modest improvements in detection compared with other known clinical risk factors. Collectively, these results yield additional insight into the genetic factors underlying kidney function and progression to CKD.


Subjects
Genetic Loci , Genome-Wide Association Study , Glomerular Filtration Rate/genetics , Chronic Renal Insufficiency/genetics , Female , Global Burden of Disease , Humans , Kidney/physiopathology , Male , Prognosis , Chronic Renal Insufficiency/diagnosis , Chronic Renal Insufficiency/epidemiology , Chronic Renal Insufficiency/physiopathology , Risk Assessment/methods , Risk Factors , Sex Factors
8.
Nat Genet ; 51(4): 606-610, 2019 04.
Article in English | MEDLINE | ID: mdl-30742112

ABSTRACT

Genome-wide association studies (GWAS) have identified genetic variants at 34 loci contributing to age-related macular degeneration (AMD) [1-3]. We generated transcriptional profiles of postmortem retinas from 453 controls and cases at distinct stages of AMD and integrated retinal transcriptomes, covering 13,662 protein-coding and 1,462 noncoding genes, with genotypes at more than 9 million common SNPs for expression quantitative trait loci (eQTL) analysis of a tissue not included in Genotype-Tissue Expression (GTEx) and other large datasets [4,5]. Cis-eQTL analysis identified 10,474 genes under genetic regulation, including 4,541 eQTLs detected only in the retina. Integrated analysis of AMD-GWAS with eQTLs ascertained likely target genes at six reported loci. Using transcriptome-wide association analysis (TWAS), we identified three additional genes, RLBP1, HIC1 and PARP12, after Bonferroni correction. Our studies expand the genetic landscape of AMD and establish the Eye Genotype Expression (EyeGEx) database as a resource for post-GWAS interpretation of multifactorial ocular traits.


Subjects
Genetic Predisposition to Disease/genetics , Macular Degeneration/genetics , Quantitative Trait Loci/genetics , Transcriptome/genetics , Case-Control Studies , Gene Expression Profiling/methods , Gene Expression Regulation/genetics , Genome-Wide Association Study/methods , Genotype , Humans , Phenotype , Single Nucleotide Polymorphism/genetics , Retina/physiopathology
9.
Nat Neurosci ; 22(3): 503, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30622366

ABSTRACT

The author list was in the wrong order in the HTML version of the original article and in the HTML version of the original correction notice. This has been corrected to show the 23andMe Research Team as the fourth author and Abraham A. Palmer as the last author in both places.

10.
Genet Epidemiol ; 43(1): 112-117, 2019 02.
Article in English | MEDLINE | ID: mdl-30565766

ABSTRACT

It is unclear whether insertions and deletions (indels) are more likely to influence complex traits than abundant single-nucleotide polymorphisms (SNPs). We sought to understand which category of variation is more likely to impact health. Using the SardiNIA study as an exemplar, we characterized 478,876 common indels and 8,246,244 common SNPs in up to 5,949 well-phenotyped individuals from an isolated valley in Sardinia. We assessed association between 120 traits, resulting in 89 nonoverlapping-associated loci. We evaluated whether indels were enriched among credible sets of potential causal variants. These credible sets included 1,319 SNPs and 88 indels. We did not find indels to be significantly enriched. Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel with causality and mechanism previously demonstrated (rs200748895:TGCTG/T) had a 0.999 posterior probability. Overall, our results show a very modest and nonsignificant enrichment for common indels in associated loci.
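The counts reported in this abstract allow a rough back-of-the-envelope check of the "very modest" enrichment. The sketch below simply compares the share of indels inside the credible sets with their share among all common variants. This naive ratio is an illustration only and is not the enrichment test used by the authors, which the abstract does not describe.

```python
# Counts taken from the abstract.
indels_in_credible_sets = 88
snps_in_credible_sets = 1_319
common_indels = 478_876
common_snps = 8_246_244


def share(part: int, rest: int) -> float:
    """Fraction that `part` makes up of the combined total `part + rest`."""
    return part / (part + rest)


credible_share = share(indels_in_credible_sets, snps_in_credible_sets)
background_share = share(common_indels, common_snps)
fold_enrichment = credible_share / background_share

print(f"indel share in credible sets: {credible_share:.2%}")
print(f"indel share among all common variants: {background_share:.2%}")
print(f"naive fold enrichment: {fold_enrichment:.2f}")
```

Under this naive comparison indels make up a slightly larger share of the credible sets than of the background, which agrees in direction with the abstract's conclusion.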


Subjects
INDEL Mutation/genetics , Single Nucleotide Polymorphism/genetics , Genetic Loci , Humans , Italy , Molecular Sequence Annotation
12.
Nat Commun ; 9(1): 4178, 2018 10 09.
Article in English | MEDLINE | ID: mdl-30301895

ABSTRACT

Psoriatic arthritis (PsA) is a complex chronic musculoskeletal condition that occurs in ~30% of psoriasis patients. Currently, no systematic strategy is available that utilizes the differences in genetic architecture between PsA and cutaneous-only psoriasis (PsC) to assess PsA risk before symptoms appear. Here, we introduce a computational pipeline for predicting PsA among psoriasis patients using data from six cohorts with >7000 genotyped PsA and PsC patients. We identify 9 new loci for psoriasis or its subtypes and achieve 0.82 area under the receiver operator curve in distinguishing PsA vs. PsC when using 200 genetic markers. Among the top 5% of our PsA prediction we achieve >90% precision with 100% specificity and 16% recall for predicting PsA among psoriatic patients, using conditional inference forest or shrinkage discriminant analysis. Combining statistical and machine-learning techniques, we show that the underlying genetic differences between psoriasis subtypes can be used for individualized subtype risk assessment.
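The precision, specificity, and recall figures in this abstract all derive from the same confusion matrix, and they can look very different from one another when only the most confident predictions are called positive. The counts below are hypothetical, chosen only to reproduce a similar pattern (high precision, near-perfect specificity, low recall); they are not data from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, recall (sensitivity), and specificity from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),    # of predicted PsA, how many truly are PsA
        "recall": tp / (tp + fn),       # of true PsA, how many were predicted
        "specificity": tn / (tn + fp),  # of true PsC, how many were left unflagged
    }


# Hypothetical cohort: 100 PsA and 300 PsC patients, with only the
# highest-scoring patients predicted as PsA.
metrics = classification_metrics(tp=16, fp=1, tn=299, fn=84)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because very few patients are flagged, a single false positive barely moves specificity while most true cases remain undetected, which is why recall stays low.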

13.
Nat Commun ; 9(1): 4038, 2018 10 02.
Article in English | MEDLINE | ID: mdl-30279509

ABSTRACT

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.

14.
Nat Genet ; 50(10): 1426-1434, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30224645

ABSTRACT

The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of complex disease traits and, based on ancient DNA studies of mainland Europe, Sardinia is hypothesized to be a unique refuge for early Neolithic ancestry. To provide new insights on the genetic history of this flagship population, we analyzed 3,514 whole-genome sequenced individuals from Sardinia. Sardinian samples show elevated levels of shared ancestry with Basque individuals, especially samples from the more historically isolated regions of Sardinia. Our analysis also uniquely illuminates how levels of genetic similarity with mainland ancient DNA samples vary subtly across the island. Together, our results indicate that within-island substructure and sex-biased processes have substantially impacted the genetic history of Sardinia. These results give new insight into the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.

15.
Nat Genet ; 50(9): 1335-1341, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30104761

ABSTRACT

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for >1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
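To illustrate why a saddlepoint approximation helps with unbalanced case-control data, the sketch below applies the standard Barndorff-Nielsen tail formula to a score statistic S = Σ gᵢ(Yᵢ − μᵢ) under a simplified null in which the Yᵢ are independent Bernoulli(μᵢ) outcomes. This is not SAIGE's implementation: it ignores the mixed model, relatedness, and covariates, assumes non-negative genotypes, and all function names and example values are hypothetical.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def spa_upper_tail(g, mu, s: float) -> float:
    """Saddlepoint estimate of P(S >= s) for S = sum g_i * (Y_i - mu_i), s > 0."""
    mean = sum(gi * m for gi, m in zip(g, mu))
    s_max = sum(gi * (1.0 - m) for gi, m in zip(g, mu))  # assumes all g_i >= 0
    if not 0.0 < s < s_max:
        raise ValueError("s must lie strictly between 0 and the maximum of S")

    def k0(t):  # cumulant generating function of S
        return sum(math.log(1.0 - m + m * math.exp(gi * t))
                   for gi, m in zip(g, mu)) - t * mean

    def k1(t):  # first derivative of k0
        return sum(gi * m * math.exp(gi * t) / (1.0 - m + m * math.exp(gi * t))
                   for gi, m in zip(g, mu)) - mean

    def k2(t):  # second derivative of k0
        return sum(gi * gi * m * (1.0 - m) * math.exp(gi * t)
                   / (1.0 - m + m * math.exp(gi * t)) ** 2
                   for gi, m in zip(g, mu))

    # Solve k1(t) = s by bisection; k1 is increasing and k1(0) = 0 < s.
    lo, hi = 0.0, 1.0
    while k1(hi) < s:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if k1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    w = math.sqrt(2.0 * (t * s - k0(t)))
    v = t * math.sqrt(k2(t))
    return normal_upper_tail(w + math.log(v / w) / w)


# Hypothetical unbalanced setting: 100 carriers, each with a 1% chance of being a case.
g = [1.0] * 100
mu = [0.01] * 100
s = 4.0  # five observed cases among carriers versus one expected

spa_p = spa_upper_tail(g, mu, s)
variance = sum(gi * gi * m * (1.0 - m) for gi, m in zip(g, mu))
normal_p = normal_upper_tail(s / math.sqrt(variance))
print(f"saddlepoint p = {spa_p:.2e}, normal approximation p = {normal_p:.2e}")
```

In this skewed setting the normal approximation gives a much smaller tail probability than the saddlepoint estimate, which is the kind of miscalibration that inflates type I error when cases are rare.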

16.
Nat Genet ; 50(9): 1234-1239, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30061737

ABSTRACT

To identify genetic variation underlying atrial fibrillation, the most common cardiac arrhythmia, we performed a genome-wide association study of >1,000,000 people, including 60,620 atrial fibrillation cases and 970,216 controls. We identified 142 independent risk variants at 111 loci and prioritized 151 functional candidate genes likely to be involved in atrial fibrillation. Many of the identified risk variants fall near genes where more deleterious mutations have been reported to cause serious heart defects in humans (GATA4, MYH6, NKX2-5, PITX2, TBX5) [1], or near genes important for striated muscle function and integrity (for example, CFL2, MYH7, PKP2, RBM20, SGCG, SSPN). Pathway and functional enrichment analyses also suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an 'atrial cardiomyopathy' [2], either during fetal heart development or as a response to stress in the adult heart.

17.
PLoS Genet ; 14(7): e1007452, 2018 07.
Article in English | MEDLINE | ID: mdl-30016313

ABSTRACT

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.

18.
Am J Hum Genet ; 102(6): 1048-1061, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29779563

ABSTRACT

Health systems are stewards of patient electronic health record (EHR) data with extraordinarily rich depth and breadth, reflecting thousands of diagnoses and exposures. Measures of genomic variation integrated with EHRs offer a potential strategy to accurately stratify patients for risk profiling and discover new relationships between diagnoses and genomes. The objective of this study was to evaluate whether polygenic risk scores (PRS) for common cancers are associated with multiple phenotypes in a phenome-wide association study (PheWAS) conducted in 28,260 unrelated, genotyped patients of recent European ancestry who consented to participate in the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine. PRS for 12 cancer traits were calculated using summary statistics from the NHGRI-EBI catalog. A total of 1,711 synthetic case-control studies was used for PheWAS analyses. There were 13,490 (47.7%) patients with at least one cancer diagnosis in this study sample. PRS exhibited strong association for several cancer traits they were designed for, including female breast cancer, prostate cancer, melanoma, basal cell carcinoma, squamous cell carcinoma, and thyroid cancer. Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses. To differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles, the idea of "exclusion PRS PheWAS" was introduced. Further analysis of temporal order of the diagnoses improved our understanding of these secondary associations. This comprehensive PheWAS used PRS instead of a single variant.

19.
Nat Neurosci ; 21(7): 1018, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29752479

ABSTRACT

In the version of this article initially published, the consortium authorship was not presented correctly. The 23andMe Research Team was listed as the last author, rather than the fourth, and a line directing readers to the Supplementary Note for a list of members did appear but was not directly associated with the consortium name. Also, the Supplementary Note description stated that both member names and affiliations were included; in fact, only names are given. Finally, the URL for S-PrediXcan was given in the Methods as https://github.com/hakyimlab/S-PrediXcan; the correct URL is https://github.com/hakyimlab/MetaXcan. The errors have been corrected in the HTML and PDF versions of the article.

20.
Annu Rev Genomics Hum Genet ; 19: 73-96, 2018 Aug 31.
Article in English | MEDLINE | ID: mdl-29799802

ABSTRACT

Genotype imputation has become a standard tool in genome-wide association studies because it enables researchers to inexpensively approximate whole-genome sequence data from genome-wide single-nucleotide polymorphism array data. Genotype imputation increases statistical power, facilitates fine mapping of causal variants, and plays a key role in meta-analyses of genome-wide association studies. Only variants that were previously observed in a reference panel of sequenced individuals can be imputed. However, the rapid increase in the number of deeply sequenced individuals will soon make it possible to assemble enormous reference panels that greatly increase the number of imputable variants. In this review, we present an overview of genotype imputation and describe the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals.
