1.
Hum Mol Genet ; 33(4): 374-385, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37934784

ABSTRACT

Genome-wide association studies have contributed extensively to the discovery of disease-associated common variants. However, the genetic contribution to complex traits is still largely difficult to interpret. We report a genome-wide association study of 2394 cases and 2393 controls for age-related macular degeneration (AMD) via whole-genome sequencing, with 46.9 million genetic variants. Our study reveals significant single-variant association signals at four loci and independent gene-based signals in CFH, C2, C3, and NRTN. Using data from the Exome Aggregation Consortium (ExAC) for a gene-based test, we demonstrate an enrichment of predicted rare loss-of-function variants in CFH, CFI, and an as-yet unreported gene in AMD, ORMDL2. Our method of using a large variant list without individual-level genotypes as an external reference provides a flexible and convenient approach to leverage the publicly available variant datasets to augment the search for rare variant associations, which can explain additional disease risk in AMD.


Subject(s)
Genome-Wide Association Study , Macular Degeneration , Humans , Genome-Wide Association Study/methods , Macular Degeneration/genetics , Genotype , Genetic Testing , Whole Genome Sequencing , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease , Complement Factor H/genetics
2.
PLoS Genet ; 19(12): e1010907, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38113267

ABSTRACT

OBJECTIVE: To overcome the limitations associated with the collection and curation of COVID-19 outcome data in biobanks, this study proposes the use of polygenic risk scores (PRS) as reliable proxies of COVID-19 severity across three large biobanks: the Michigan Genomics Initiative (MGI), UK Biobank (UKB), and NIH All of Us. The goal is to identify associations between pre-existing conditions and COVID-19 severity. METHODS: Drawing on a sample of more than 500,000 individuals from the three biobanks, we conducted a phenome-wide association study (PheWAS) to identify associations between a PRS for COVID-19 severity, derived from a genome-wide association study on COVID-19 hospitalization, and clinical pre-existing, pre-pandemic phenotypes. We performed cohort-specific PRS PheWAS and a subsequent fixed-effects meta-analysis. RESULTS: The current study uncovered 23 pre-existing conditions significantly associated with the COVID-19 severity PRS in cohort-specific analyses, of which 21 were observed in the UKB cohort and two in the MGI cohort. The meta-analysis yielded 27 significant phenotypes predominantly related to obesity, metabolic disorders, and cardiovascular conditions. After adjusting for body mass index, several clinical phenotypes, such as hypercholesterolemia and gastrointestinal disorders, remained associated with an increased risk of hospitalization following COVID-19 infection. CONCLUSION: By employing PRS as a proxy for COVID-19 severity, we corroborated known risk factors and identified novel associations between pre-existing clinical phenotypes and COVID-19 severity. Our study highlights the potential value of using PRS when actual outcome data may be limited or inadequate for robust analyses.


Subject(s)
COVID-19 , Population Health , Humans , Genome-Wide Association Study , Genetic Risk Score , COVID-19/genetics , Biological Specimen Banks , Preexisting Condition Coverage , Risk Factors , Genetic Predisposition to Disease
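The cohort-specific PheWAS results in record 2 were combined by fixed-effects meta-analysis. Below is a minimal sketch of standard inverse-variance weighting. It assumes each cohort supplies an effect estimate (e.g. a log odds ratio per PRS unit) and its standard error. The function name is hypothetical, and this is not the authors' pipeline.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need matching, non-empty inputs")
    weights = [1.0 / se ** 2 for se in ses]          # w_k = 1 / SE_k^2
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p value
    return pooled, pooled_se, p
```

Consider two cohorts with estimates 0.2 and 0.4 and equal standard errors of 0.1. The pooled estimate is their simple average, 0.3, with a standard error of √0.005 ≈ 0.071.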
3.
Am J Hum Genet ; 109(11): 1998-2008, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36240765

ABSTRACT

As most existing genome-wide association studies (GWASs) were conducted in European-ancestry cohorts, and as the existing polygenic risk score (PRS) models have limited transferability across ancestry groups, PRS research on non-European-ancestry groups needs to make efficient use of available data until we attain large sample sizes across all ancestry groups. Here we propose a PRS method using transfer learning techniques. Our approach, TL-PRS, uses gradient descent to fine-tune the baseline PRS model from an ancestry group with large sample GWASs to the dataset of target ancestry. In our application of constructing PRS for seven quantitative and two dichotomous traits for 10,285 individuals of South Asian ancestry and 8,168 individuals of African ancestry in UK Biobank, TL-PRS using PRS-CS as a baseline method obtained 25% average relative improvement for South Asian samples and 29% for African samples compared to the standard PRS-CS method in terms of predicted R². Our approach increases the transferability of PRSs across ancestries and thereby helps reduce existing inequities in genetics research.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide/genetics , Risk Factors , Machine Learning
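Record 3 describes TL-PRS, which starts from a baseline PRS model and fine-tunes it by gradient descent. The sketch below illustrates that idea under a simplifying assumption: it works from an individual-level standardized genotype matrix `X` and phenotype vector `y` from the target-ancestry sample. It is not the authors' implementation, which may use different inputs, objectives, and stopping rules. The function name and defaults are hypothetical.

```python
import numpy as np


def fine_tune_prs(X, y, w_baseline, learning_rate=0.1, n_steps=50):
    """Fine-tune baseline PRS weights on target-ancestry data by gradient descent.

    Minimizes 0.5 * mean((y - X @ w) ** 2), starting from the baseline weights.
    Running only a limited number of steps keeps the result close to the
    baseline model, a simple form of early stopping.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.array(w_baseline, dtype=float)      # copy: the baseline stays unchanged
    n = X.shape[0]
    for _ in range(n_steps):
        residual = y - X @ w
        gradient = -(X.T @ residual) / n       # gradient of half the mean squared error
        w -= learning_rate * gradient
    return w
```

In practice, `n_steps` and `learning_rate` would be tuned on held-out target-ancestry data. Both defaults here are arbitrary.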
4.
Am J Hum Genet ; 109(10): 1742-1760, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36152628

ABSTRACT

Complex traits are influenced by genetic risk factors, lifestyle, and environmental variables, so-called exposures. Some exposures, e.g., smoking or lipid levels, have common genetic modifiers identified in genome-wide association studies. Because direct measurements are often infeasible, exposure polygenic risk scores (ExPRSs) offer an alternative way to study the influence of exposures on various phenotypes. Here, we collected publicly available summary statistics for 28 exposures and applied four common PRS methods to generate ExPRSs in two large biobanks: the Michigan Genomics Initiative and the UK Biobank. We established ExPRSs for 27 exposures and demonstrated their applicability in phenome-wide association studies and as predictors for common chronic conditions. In particular, for several chronic conditions, adding multiple ExPRSs improved prediction compared with models that included only traditional, disease-focused PRSs. To facilitate follow-up studies, we share all ExPRS constructs and generated results via an online repository called ExPRSweb.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Lipids , Multifactorial Inheritance/genetics , Risk Factors
5.
BMC Bioinformatics ; 25(1): 65, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38336614

ABSTRACT

BACKGROUND: Genetic variants can contribute differently to trait heritability depending on their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. The effects at the annotation-group level may also be sparse. However, few PRS methods incorporate both annotation information and shrinkage on effect sizes. We propose a PRS method, PRSbils, which uses functional annotation information with a bilevel continuous shrinkage prior to accommodate varying genetic architectures at both the variant-specific level and the functional annotation level. RESULTS: We conducted simulation studies and investigated the predictive performance in settings with different genetic architectures. When there was relatively large variability in group-wise heritability contribution, the proposed method achieved on average 8.0% higher AUC than the benchmark method PRS-CS. The proposed method also outperformed PRS-CS in settings with different overlapping patterns of annotation groups, with on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real-world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI), and the Korean Genome and Epidemiology Study (KoGES)) with two sources of annotations (ANNOVAR, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG)), and demonstrated that the proposed method has the potential to improve predictive performance by incorporating functional annotations.
CONCLUSIONS: By utilizing a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve the performance of genetic risk prediction. The software is available at https://github.com/styvon/PRSbils .


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Bayes Theorem , Genome-Wide Association Study/methods , Multifactorial Inheritance , Software , Risk Factors
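PRSbils in record 5 was compared with PRS-CS by AUC. The sketch below shows what that metric measures: it computes AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name is hypothetical. The pairwise loop favors clarity over speed.

```python
def auc(scores, labels):
    """AUC as P(score of a random case > score of a random control), ties = 1/2.

    Direct pairwise comparison, O(n_cases * n_controls).
    """
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```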
6.
Genet Epidemiol ; 47(3): 231-248, 2023 04.
Article in English | MEDLINE | ID: mdl-36739617

ABSTRACT

Linkage analysis, a class of methods for detecting co-segregation of genomic segments and traits in families, was used to map disease-causing genes for decades before genotyping arrays and dense SNP genotyping enabled genome-wide association studies in population samples. Population samples often contain related individuals, but the segregation of alleles within families is rarely used because traditional linkage methods are computationally inefficient for larger datasets. Here, we describe Population Linkage, a novel application of Haseman-Elston regression as a method of moments estimator of variance components and their standard errors. We achieve additional computational efficiency by using modern methods for IBD segment detection and variance component estimation, by preprocessing input data efficiently, and by minimizing redundant numerical calculations. We also refined variance component models to account for the biases in population-scale methods for IBD segment detection. We ran Population Linkage on four blood lipid traits in over 70,000 individuals from the HUNT and SardiNIA studies, successfully detecting 25 known genetic signals. One notable linkage signal that appeared in both studies was for low-density lipoprotein (LDL) cholesterol levels in the region near the gene APOE (LOD = 29.3, variance explained = 4.1%). This region contains the missense variants rs7412 and rs429358, which together define the ε2, ε3, and ε4 alleles and account for 2.4% and 0.8%, respectively, of the variation in circulating LDL cholesterol. Our results show the potential for linkage analysis and other large-scale applications of method of moments variance components estimation.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Phenotype , Cholesterol, LDL/genetics , Genetic Linkage , Apolipoproteins E/genetics
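Population Linkage in record 6 builds on Haseman-Elston regression. The classic single-locus form regresses the squared trait difference of each relative pair on the pair's estimated IBD sharing at the locus. Under an additive model, the slope is -2 times the variance attributable to the locus. The sketch below implements that textbook version only, not the paper's refined variance-component model or its IBD-bias corrections. The function name is hypothetical.

```python
import numpy as np


def haseman_elston(sq_diffs, ibd_proportions):
    """Classic Haseman-Elston regression for one locus.

    sq_diffs:        (y_i - y_j) ** 2 for each relative pair
    ibd_proportions: estimated proportion of alleles shared IBD at the locus
    Returns (intercept, slope, locus_variance); under an additive model,
    slope = -2 * locus_variance.
    """
    y = np.asarray(sq_diffs, dtype=float)
    pi = np.asarray(ibd_proportions, dtype=float)
    design = np.column_stack([np.ones_like(pi), pi])   # intercept + IBD sharing
    (intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    return intercept, slope, -slope / 2.0
```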
7.
Am J Hum Genet ; 108(4): 669-681, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33730541

ABSTRACT

Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have limited power to detect gene set-phenotype association when a small fraction of the genes are associated with the phenotype and cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that have the maximal evidence of association and can best account for the gene set association. Using pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, we find that GAUSS effectively controls type 1 error rate and has greater power than several existing methods, particularly when a small proportion of genes account for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. We found that GAUSS is scalable and identified 13,466 phenotype and gene set association pairs. Within these gene sets, we identify an average of 17.2 (max = 405) genes that underlie these gene set associations.


Subject(s)
Biological Specimen Banks , Data Interpretation, Statistical , Databases, Genetic , Datasets as Topic , Genome-Wide Association Study/methods , Phenotype , ATP-Binding Cassette Transporters/genetics , Computer Simulation , Gene Expression/genetics , Humans , Research Design , Time Factors , United Kingdom , Web Browser
8.
PLoS Genet ; 17(9): e1009670, 2021 09.
Article in English | MEDLINE | ID: mdl-34529658

ABSTRACT

Polygenic risk scores (PRS) can provide useful information for personalized risk stratification and disease risk assessment, especially when combined with non-genetic risk factors. However, their construction depends on the availability of summary statistics from genome-wide association studies (GWAS) independent of the target sample. It has been reported that, for best compatibility, the GWAS and the target sample should match in ancestry. Yet GWAS, especially in the field of cancer, often lack diversity and are dominated by European-ancestry samples. This bias is a limiting factor in PRS research. Using electronic health records and genetic data from the UK Biobank, we contrast the utility of breast and prostate cancer PRS derived from external European-ancestry-based GWAS across African, East Asian, European, and South Asian ancestry groups. We highlight differences in the PRS distributions of these groups that are amplified when PRS methods condense hundreds of thousands of variants into a single score. While European-GWAS-derived PRS were not directly transferable across ancestries on an absolute scale, we establish their predictive potential when considering them separately within each group. For example, within each ancestry group, the top 10% of the breast cancer PRS distribution showed a significant enrichment of breast cancer cases compared with the bottom 90% (odds ratio of 2.81 [95% CI: 2.69, 2.93] in European, 2.88 [1.85, 4.48] in African, 2.60 [1.25, 5.40] in East Asian, and 2.33 [1.55, 3.51] in South Asian individuals). Our findings highlight a compromise solution for PRS research to compensate for the lack of diversity in well-powered European GWAS efforts while recruitment of diverse participants in the field catches up.


Subject(s)
Breast Neoplasms/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance , Female , Genome-Wide Association Study , Humans
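The key step in record 8's within-group analysis is to rank individuals by PRS inside each ancestry group and then compare the top 10% with the bottom 90%. The sketch below is minimal and unadjusted. It computes the odds ratio and a Woolf (log-scale) 95% CI from the resulting 2×2 table. The study's own models may include covariates, and the function name is hypothetical.

```python
import math


def odds_ratio_top_vs_rest(scores, is_case, top_fraction=0.10):
    """Odds ratio (Woolf 95% CI) for being a case in the top score fraction
    versus the remainder, computed within a single group."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_top = max(1, int(round(top_fraction * n)))
    top = set(order[:n_top])
    a = sum(1 for i in top if is_case[i])                          # top, case
    b = n_top - a                                                  # top, control
    c = sum(1 for i in range(n) if i not in top and is_case[i])    # rest, case
    d = (n - n_top) - c                                            # rest, control
    if min(a, b, c, d) == 0:
        raise ValueError("empty 2x2 cell; Woolf CI undefined")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, (math.exp(math.log(or_) - 1.96 * se_log),
                 math.exp(math.log(or_) + 1.96 * se_log))
```

The function is called once per ancestry group rather than on the pooled sample. This reflects the abstract's point that PRS distributions differ between groups.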
9.
Hum Mol Genet ; 30(21): 2027-2039, 2021 10 13.
Article in English | MEDLINE | ID: mdl-33961016

ABSTRACT

Circulating cardiac troponin proteins are associated with structural heart disease and predict incident cardiovascular disease in the general population. However, the genetic contribution to cardiac troponin I (cTnI) concentrations and its causal effect on cardiovascular phenotypes are unclear. We combine data from two large population-based studies, the Trøndelag Health Study and the Generation Scotland Scottish Family Health Study, and perform a genome-wide association study of high-sensitivity cTnI concentrations with 48 115 individuals. We further use two-sample Mendelian randomization to investigate the causal effects of circulating cTnI on acute myocardial infarction (AMI) and heart failure (HF). We identified 12 genetic loci (8 novel) associated with cTnI concentrations. Associated protein-altering variants highlighted putative functional genes: CAND2, HABP2, ANO5, APOH, FHOD3, TNFAIP2, KLKB1 and LMAN1. Phenome-wide association tests in 1688 phecodes and 83 continuous traits in UK Biobank showed associations between a genetic risk score for cTnI and cardiac arrhythmias, metabolic and anthropometric measures. Using two-sample Mendelian randomization, we confirmed the non-causal role of cTnI in AMI (5948 cases, 355 246 controls). We found indications for a causal role of cTnI in HF (47 309 cases and 930 014 controls), but this was not supported by secondary analyses using left ventricular mass as outcome (18 257 individuals). Our findings clarify the biology underlying the heritable contribution to circulating cTnI and support cTnI as a non-causal biomarker for AMI in the general population. Using genetically informed methods for causal inference helps inform the role and value of measuring cTnI in the general population.


Subject(s)
Biomarkers , Genetics, Population , Genome-Wide Association Study , Troponin I/genetics , Alleles , Chromosome Mapping , Gene Expression , Genetic Variation , Mendelian Randomization Analysis , Organ Specificity , Quantitative Trait Loci , Troponin T/genetics
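The two-sample Mendelian randomization analyses in record 9 combine per-variant associations with the exposure (cTnI) and with the outcome (AMI or HF). The sketch below shows the inverse-variance weighted (IVW) estimator, one common choice for such analyses. The abstract does not specify which estimators were used, and the function name is hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted two-sample Mendelian randomization estimate.

    Equivalent to a weighted regression of beta_outcome on beta_exposure
    through the origin with weights 1 / se_outcome**2 (first-order weights).
    Returns (causal_estimate, standard_error).
    """
    num = sum(bx * by / s ** 2
              for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)
```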
10.
Am J Hum Genet ; 107(2): 222-233, 2020 08 06.
Article in English | MEDLINE | ID: mdl-32589924

ABSTRACT

With increasing biobanking efforts connecting electronic health records and national registries to germline genetics, time-to-event analysis has attracted increasing attention in genetic studies of human diseases. In time-to-event analysis, the Cox proportional hazards (PH) regression model is one of the most widely used approaches. However, existing methods and tools are not scalable to large biobanks with hundreds of thousands of samples and endpoints, and they are not accurate when testing low-frequency and rare variants. Here, we propose a scalable and accurate method, SPACox (a saddlepoint approximation implementation based on the Cox PH regression model), that is applicable to genome-wide time-to-event analysis. SPACox requires fitting a Cox PH regression model only once across the genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76-252 times faster than other existing alternatives, such as gwasurvivr, 185-511 times faster than the standard Wald test, and more than 6,000 times faster than the Firth correction, and that it can control type I error rates at the genome-wide significance level regardless of minor allele frequencies. Through the analysis of UK Biobank inpatient data of 282,871 white British European ancestry samples, we show that SPACox can efficiently analyze large sample sizes and accurately control type I error rates. We identified 611 loci associated with time-to-event phenotypes of 12 common diseases, of which 38 loci would have been missed within a logistic regression framework with a binary phenotype defined as event occurrence status during the follow-up period.


Subject(s)
Genome-Wide Association Study/methods , Biological Specimen Banks , Case-Control Studies , Data Analysis , Gene Frequency/genetics , Humans , Logistic Models , Phenotype , Proportional Hazards Models , Sample Size , United Kingdom , White People/genetics
11.
Am J Hum Genet ; 106(1): 3-12, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31866045

ABSTRACT

In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, and this can cause inflation of type I error rates. Recently, a saddlepoint approximation (SPA)-based single-variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT- and SKAT-O-type region-based tests in which the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times the exome-wide significance level α = 2.5 × 10⁻⁶). Additionally, the proposed method has computation time similar to that of the unadjusted approaches and is scalable for large sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10⁻⁷, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, which can help facilitate the identification of the genetic basis of complex diseases.


Subject(s)
Biological Specimen Banks , Exome Sequencing/methods , Exome/genetics , Genome-Wide Association Study , Phenomics , Polymorphism, Single Nucleotide , Case-Control Studies , Computer Simulation , Humans , Numerical Analysis, Computer-Assisted , Phenotype , United Kingdom
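The single-variant SPA calibration mentioned in record 11 can be illustrated as follows. Under the null, the score statistic is S = Σ g_i (y_i - μ_i) with Bernoulli outcomes. Its cumulant generating function K(t) is known in closed form, and the Lugannani-Rice formula turns it into a tail probability. This is a minimal sketch that assumes fixed null means 0 < μ_i < 1 and nonnegative genotype codes g_i. It is not the paper's SKAT/SKAT-O region-based procedure or its efficient resampling, and the function names are hypothetical.

```python
import math


def _cgf(t, g, mu):
    """K(t), K'(t), K''(t) of S = sum_i g_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i)."""
    k = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        e = mi * math.exp(gi * t)
        d = 1.0 - mi + e
        k += math.log(d) - gi * mi * t
        k1 += gi * e / d - gi * mi
        k2 += gi * gi * e * (1.0 - mi) / (d * d)
    return k, k1, k2


def spa_upper_tail(s, g, mu, tol=1e-10):
    """Lugannani-Rice saddlepoint approximation to P(S >= s) under the null."""
    # Bracket the saddlepoint t_hat solving K'(t) = s (K' is increasing in t).
    lo, hi = -1.0, 1.0
    while _cgf(lo, g, mu)[1] > s:
        lo *= 2.0
        if lo < -300:
            raise ValueError("s is outside the support of S")
    while _cgf(hi, g, mu)[1] < s:
        hi *= 2.0
        if hi > 300:
            raise ValueError("s is outside the support of S")
    # Bisection for t_hat.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _cgf(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k, _, k2 = _cgf(t, g, mu)
    if abs(t) < 1e-6:
        # Near the mean the formula is numerically unstable; use the normal limit.
        return 0.5 * math.erfc(s / math.sqrt(2.0 * k2))
    w = math.copysign(math.sqrt(max(2.0 * (t * s - k), 0.0)), t)
    v = t * math.sqrt(k2)
    return 0.5 * math.erfc((w + math.log(v / w) / w) / math.sqrt(2.0))
```

When cases are rare (small μ_i), the null distribution of S is strongly right-skewed. A normal approximation then misstates tail probabilities, and this skewness is what the saddlepoint calibration is meant to handle.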
12.
Am J Hum Genet ; 107(5): 815-836, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32991828

ABSTRACT

To facilitate scientific collaboration on polygenic risk score (PRS) research, we created an extensive PRS online repository for 35 common cancer traits integrating freely available genome-wide association studies (GWASs) summary statistics from three sources: published GWASs, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWASs. Our framework condenses these summary statistics into PRSs using various approaches such as linkage disequilibrium pruning/p value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRSs in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRSs. We expect this integrated platform to accelerate PRS-related cancer research.


Subject(s)
Biological Specimen Banks/statistics & numerical data , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Multifactorial Inheritance , Neoplasms/genetics , Adult , Aged , Female , Genome-Wide Association Study , Humans , Internet , Linkage Disequilibrium , Male , Middle Aged , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/epidemiology , Phenotype , Quantitative Trait, Heritable , Risk Factors , United Kingdom/epidemiology , United States/epidemiology
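Among other approaches, Cancer-PRSweb in record 12 builds PRSs by LD pruning/p value thresholding. The sketch below illustrates that idea. It assumes per-variant betas and p values, a pairwise LD r² matrix from a reference panel, and individual allele dosages. The greedy clumping rule and the default cutoffs (p ≤ 5 × 10⁻⁸, r² > 0.1) are illustrative assumptions, not the platform's settings, and the function name is hypothetical.

```python
import numpy as np


def clump_and_threshold_prs(genotypes, betas, pvalues, r2,
                            p_threshold=5e-8, r2_threshold=0.1):
    """Greedy LD clumping plus p value thresholding, then a weighted sum.

    genotypes: (n_individuals, n_variants) allele dosages
    betas, pvalues: GWAS summary statistics per variant
    r2: (n_variants, n_variants) pairwise LD r^2 from a reference panel
    Returns (scores, selected_variant_indices).
    """
    pvalues = np.asarray(pvalues, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    removed = np.zeros(len(pvalues), dtype=bool)
    selected = []
    for j in np.argsort(pvalues, kind="stable"):   # most significant variant first
        if pvalues[j] > p_threshold:
            break                                  # all remaining variants are less significant
        if removed[j]:
            continue                               # already tagged by a stronger index variant
        selected.append(int(j))
        removed |= r2[j] > r2_threshold            # drop variants in LD with the index variant
        removed[j] = True
    weights = np.zeros(len(pvalues))
    weights[selected] = np.asarray(betas, dtype=float)[selected]
    scores = np.asarray(genotypes, dtype=float) @ weights   # sum of beta * dosage
    return scores, selected
```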
13.
PLoS Genet ; 16(11): e1009077, 2020 11.
Article in English | MEDLINE | ID: mdl-33175840

ABSTRACT

Phenotypes extracted from Electronic Health Records (EHRs) are increasingly prevalent in genetic studies. EHRs contain hundreds of distinct clinical laboratory test results, providing a trove of health data beyond diagnoses. Such lab data is complex and lacks a ubiquitous coding scheme, making it more challenging than diagnosis data. Here we describe the first large-scale cross-health-system genome-wide association study (GWAS) of EHR-based quantitative laboratory-derived phenotypes. We meta-analyzed 70 lab traits matched between the BioVU cohort from the Vanderbilt University Health System and the Michigan Genomics Initiative (MGI) cohort from Michigan Medicine. We show high replication of known associations for these traits, validating EHR-based measurements as high-quality phenotypes for genetic analysis. Notably, our analysis provides the first replication for 699 previous GWAS associations across 46 different traits. We discovered 31 novel associations at genome-wide significance for 22 distinct traits, including the first reported associations for two lab-based traits. We replicated 22 of these novel associations in an independent tranche of BioVU samples. The summary statistics for all association tests are freely available to benefit other researchers. Finally, we performed mirrored analyses in BioVU and MGI to assess competing analytic practices for EHR lab traits. We find that using the mean of all available lab measurements provides a robust summary value, but alternate summarizations can improve power in certain circumstances. This study provides a proof-of-principle for cross-health-system GWAS and is a framework for future studies of quantitative EHR lab traits.


Subject(s)
Electronic Health Records/statistics & numerical data , Genetic Association Studies/methods , Genome-Wide Association Study/methods , Biological Specimen Banks , Cohort Studies , Electronic Health Records/trends , Genomics , Humans , Michigan , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable
14.
PLoS Genet ; 16(6): e1008725, 2020 06.
Article in English | MEDLINE | ID: mdl-32603359

ABSTRACT

Risk factors that contribute to inter-individual differences in the age-of-onset of allergic diseases are poorly understood. The aim of this study was to identify genetic risk variants associated with the age at which symptoms of allergic disease first develop, considering information from asthma, hay fever and eczema. Self-reported age-of-onset information was available for 117,130 genotyped individuals of European ancestry from the UK Biobank study. For each individual, we identified the earliest age at which asthma, hay fever and/or eczema was first diagnosed and performed a genome-wide association study (GWAS) of this combined age-of-onset phenotype. We identified 50 variants with a significant independent association (P < 3 × 10⁻⁸) with age-of-onset. Forty-five variants had comparable effects on the onset of the three individual diseases and 38 were also associated with allergic disease case-control status in an independent study (n = 222,484). We observed a strong negative genetic correlation between age-of-onset and case-control status of allergic disease (rg = -0.63, P = 4.5 × 10⁻⁶¹), indicating that cases with early disease onset have a greater burden of allergy risk alleles than those with late disease onset. Subsequently, a multivariate GWAS of age-of-onset and case-control status identified a further 26 associations that were missed by the univariate analyses of age-of-onset or case-control status only. Collectively, of the 76 variants identified, 18 represent novel associations for allergic disease. We identified 81 likely target genes of the 76 associated variants based on information from expression quantitative trait loci (eQTL) and non-synonymous variants, of which we highlight ADAM15, FOSL2, TRIM8, BMPR2, CD200R1, PRKCQ, NOD2, SMAD4, ABCA7 and UBE2L3. Our results support the notion that early and late onset allergic disease have partly distinct genetic architectures, potentially explaining known differences in pathophysiology between individuals.


Subject(s)
Asthma/genetics , Eczema/genetics , Polymorphism, Single Nucleotide , Rhinitis, Allergic, Seasonal/genetics , Adolescent , Adult , Age of Onset , Aged , Asthma/pathology , Child , Eczema/pathology , Female , Genetic Loci , Genome-Wide Association Study/methods , Humans , Male , Middle Aged , Rhinitis, Allergic, Seasonal/pathology
15.
J Infect Dis ; 226(9): 1593-1607, 2022 11 01.
Article in English | MEDLINE | ID: mdl-35429399

ABSTRACT

BACKGROUND: This study aims to examine the worldwide prevalence of post-coronavirus disease 2019 (COVID-19) condition, through a systematic review and meta-analysis. METHODS: PubMed, Embase, and iSearch were searched on July 5, 2021 with verification extending to March 13, 2022. Using a random-effects framework with DerSimonian-Laird estimator, we meta-analyzed post-COVID-19 condition prevalence at 28+ days from infection. RESULTS: Fifty studies were included, and 41 were meta-analyzed. Global estimated pooled prevalence of post-COVID-19 condition was 0.43 (95% confidence interval [CI], .39-.46). Hospitalized and nonhospitalized patients had estimates of 0.54 (95% CI, .44-.63) and 0.34 (95% CI, .25-.46), respectively. Regional prevalence estimates were Asia (0.51; 95% CI, .37-.65), Europe (0.44; 95% CI, .32-.56), and United States of America (0.31; 95% CI, .21-.43). Global prevalence for 30, 60, 90, and 120 days after infection were estimated to be 0.37 (95% CI, .26-.49), 0.25 (95% CI, .15-.38), 0.32 (95% CI, .14-.57), and 0.49 (95% CI, .40-.59), respectively. Fatigue was the most common symptom reported with a prevalence of 0.23 (95% CI, .17-.30), followed by memory problems (0.14; 95% CI, .10-.19). CONCLUSIONS: This study finds post-COVID-19 condition prevalence is substantial; the health effects of COVID-19 seem to be prolonged and can exert stress on the healthcare system.


Subject(s)
COVID-19 , Coronavirus Infections , Pneumonia, Viral , Humans , Pneumonia, Viral/epidemiology , Coronavirus Infections/epidemiology , Pandemics , Prevalence , Post-Acute COVID-19 Syndrome
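The pooled prevalence estimates in record 15 come from a DerSimonian-Laird random-effects model. The sketch below assumes each study reports a count of affected participants and a sample size. Pooling on the logit scale with delta-method variances is an assumption here, since the abstract does not state the transformation. The function names are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooling with the DerSimonian-Laird estimator of tau^2.

    Returns (pooled_estimate, pooled_standard_error, tau2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    tau2 = 0.0
    if k > 1:
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
        c = sw - sum(wi * wi for wi in w) / sw
        tau2 = max(0.0, (q - (k - 1)) / c)                 # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def pooled_prevalence(cases, totals):
    """Pool study prevalences on the logit scale; assumes 0 < cases < total."""
    logits, variances = [], []
    for x, n in zip(cases, totals):
        p = x / n
        logits.append(math.log(p / (1.0 - p)))
        variances.append(1.0 / x + 1.0 / (n - x))          # delta-method var of logit(p)
    est, se, _ = dersimonian_laird(logits, variances)

    def expit(z):
        return 1.0 / (1.0 + math.exp(-z))

    return expit(est), (expit(est - 1.96 * se), expit(est + 1.96 * se))
```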
16.
Am J Hum Genet ; 105(6): 1182-1192, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31735295

ABSTRACT

The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G × E) effects. Compared with marginal genetic association studies, G × E analysis requires larger samples and detailed measurements of environmental exposures, and this limits the possible discoveries. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK-Biobank, can be ideal resources for identifying G × E effects. However, due to the large computation cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G × E analysis), that is applicable for genome-wide scale phenome-wide G × E studies. SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis in order to reduce computation cost, and SPAGE uses a saddlepoint approximation (SPA) to calibrate the test statistics for analysis of phenotypes with unbalanced case-control ratios. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than the Firth's test, and SPAGE can control type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK-Biobank data of 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.


Subject(s)
Biological Specimen Banks , Gene-Environment Interaction , Genetic Diseases, Inborn/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Case-Control Studies , Female , Genetic Diseases, Inborn/epidemiology , Humans , Logistic Models , Male , Phenomics , Phenotype , United Kingdom/epidemiology
17.
Am J Hum Genet ; 105(1): 65-77, 2019 07 03.
Article in English | MEDLINE | ID: mdl-31204010

ABSTRACT

The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education. Health history and daily tracking surveys are administered through a Facebook application, and participants who complete a minimum number of surveys are mailed a saliva sample kit ("spit kit") to collect DNA for genotyping. As of March 2019, we engaged >80,000 individuals, sent spit kits to >32,000 individuals who met minimum participation requirements, and collected >27,000 spit kits. Participants come from all 50 states and include a diversity of ancestral backgrounds. Rates of important chronic health indicators are consistent with those estimated for the general U.S. population using more traditional study designs. However, our sample is younger and contains a greater percentage of females than the general population. As one means of verifying data quality, we have replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation. The flexible framework of the web application makes it relatively simple to add new questionnaires and for other researchers to collaborate. We anticipate that the study sample will continue to grow and that future analyses may further capitalize on the strengths of the longitudinal data in combination with genetic information.


Subject(s)
Genes/genetics , Genetic Markers , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Research Design , Social Media , Adolescent , Adult , Diabetes Mellitus/diagnosis , Diabetes Mellitus/genetics , Female , Humans , Hypertension/diagnosis , Hypertension/genetics , Male , Middle Aged , Public Health , Surveys and Questionnaires , Young Adult
18.
J Biomed Inform ; 136: 104237, 2022 12.
Article in English | MEDLINE | ID: mdl-36283580

ABSTRACT

BACKGROUND: Post COVID-19 condition (PCC) is known to affect a large proportion of COVID-19 survivors. Robust study designs and methods are needed to understand post-COVID-19 diagnosis patterns in all survivors, not just those clinically diagnosed with PCC. METHODS: We applied a case-crossover Phenome-Wide Association Study (PheWAS) in a retrospective cohort of COVID-19 survivors. Using conditional logistic regression, we compared the occurrences of 1,671 diagnosis-based phenotype codes (PheCodes) in the pre- and post-COVID-19 infection periods of the same individual. We studied how this pattern varied by COVID-19 severity and vaccination status, and we compared it with test-negative controls and with test-negative but flu-positive controls. RESULTS: In 44,198 SARS-CoV-2-positive patients, we found enrichment in respiratory, circulatory, and mental health disorders after COVID-19 infection. Top hits included anxiety disorder (p = 2.8e-109, OR = 1.7 [95% CI: 1.6-1.8]), cardiac dysrhythmias (p = 4.9e-87, OR = 1.7 [95% CI: 1.6-1.8]), and respiratory failure, insufficiency, arrest (p = 5.2e-75, OR = 2.9 [95% CI: 2.6-3.3]). In severe patients, we found stronger associations with respiratory and circulatory disorders than in mild/moderate patients. In fully vaccinated patients, mental health and chronic circulatory diseases rose to the top of the association list, similar to the mild/moderate cohort. Both control groups (test negative; test negative and flu positive) showed a pattern of hits different from that of SARS-CoV-2-positive patients. CONCLUSIONS: Patients experience myriad symptoms more than 28 days after SARS-CoV-2 infection, especially respiratory, circulatory, and mental health disorders. Our case-crossover PheWAS approach controls for time-invariant within-person confounders. Comparison with test-negative and test-negative but flu-positive patients using a similar design helped identify enrichment specific to COVID-19. This design may be applied to other emerging diseases with long-lasting effects. Given the potential for bias from observational data, these results should be considered exploratory. As we look into the future, we must be aware of COVID-19 survivors' healthcare needs.


Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , SARS-CoV-2 , COVID-19 Testing , Retrospective Studies , Case-Control Studies
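A simple case of this case-crossover design shows how it works. The following conditions are assumptions, not details from the abstract:
- one PheCode is analyzed at a time;
- there are no covariates;
- each person contributes one pre-infection and one post-infection window.

Under these conditions, conditional logistic regression treats each person as a stratum with one "case" (post) window and one "control" (pre) window. The conditional MLE of the odds ratio then reduces to the ratio of discordant pairs:

OR = (people with the code only after infection) / (people with it only before)

with SE(log OR) = sqrt(1/n_post + 1/n_pre). A minimal Python sketch with a two-sided Wald p-value and 95% CI follows; the function name is hypothetical.

```python
import math


def case_crossover_or(pre, post, z_crit=1.96):
    """Conditional OR for one PheCode from paired windows (1 = code recorded in that window).

    With one binary exposure and 1:1 matched windows per person, the conditional
    logistic regression MLE equals the ratio of discordant pairs.
    """
    post_only = sum(1 for a, b in zip(pre, post) if b and not a)
    pre_only = sum(1 for a, b in zip(pre, post) if a and not b)
    if post_only == 0 or pre_only == 0:
        raise ValueError("need discordant pairs in both directions")
    log_or = math.log(post_only / pre_only)
    se = math.sqrt(1.0 / post_only + 1.0 / pre_only)
    p = math.erfc(abs(log_or / se) / math.sqrt(2.0))  # two-sided Wald p-value
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return math.exp(log_or), ci, p
```

For example, 30 post-only and 10 pre-only individuals give OR = 3. People with the code in both windows, or in neither, drop out of the conditional likelihood. This is how the design cancels time-invariant personal confounders.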
19.
PLoS Genet ; 15(6): e1008202, 2019 06.
Article in English | MEDLINE | ID: mdl-31194742

ABSTRACT

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.


Subject(s)
Genetic Predisposition to Disease , Genomics , Multifactorial Inheritance/genetics , Skin Neoplasms/genetics , Biological Specimen Banks , Electronic Health Records , Genome-Wide Association Study , Genotype , Humans , Michigan/epidemiology , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , Skin Neoplasms/pathology , United Kingdom/epidemiology
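In its simplest form, a PRS for individual i is a weighted sum of effect-allele dosages, PRS_i = Σ_j β_j G_ij, with weights β_j taken from GWAS summary statistics. The Python sketch below assumes dosages and weights keyed by hypothetical variant IDs. It skips variants missing for an individual, whereas real pipelines may impute or mean-fill them. It then standardizes scores within the cohort so that, for example, PheWAS associations can be expressed per standard deviation of PRS. It does not implement any of the specific construction strategies the paper compares.

```python
import statistics


def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) over variants present in both inputs."""
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


def standardize(scores):
    """Center and scale raw PRS values within the cohort (sample standard deviation)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


# Hypothetical variant IDs and weights, for illustration only.
weights = {"var1": 0.20, "var2": -0.10, "var3": 0.05}
cohort = [
    {"var1": 2, "var2": 1, "var3": 0},
    {"var1": 1, "var2": 0, "var3": 2},
    {"var1": 0, "var2": 2},  # var3 not genotyped for this person, so it is skipped
]
raw = [polygenic_risk_score(person, weights) for person in cohort]
z = standardize(raw)
```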
20.
Gut ; 2021 Apr 22.
Article in English | MEDLINE | ID: mdl-33888516

ABSTRACT

OBJECTIVE: Haemorrhoidal disease (HEM) affects a large and silently suffering fraction of the population, but its aetiology, including suspected genetic predisposition, is poorly understood. We report the first genome-wide association study (GWAS) meta-analysis to date to identify genetic risk factors for HEM. DESIGN: We conducted a GWAS meta-analysis of 218 920 patients with HEM and 725 213 controls of European ancestry. Using GWAS summary statistics, we performed multiple genetic correlation analyses between HEM and other traits, calculated HEM polygenic risk scores (PRS) and evaluated their translational potential in independent datasets. Using functional annotation of GWAS results, we identified HEM candidate genes, whose differential expression and coexpression in HEM tissues were evaluated using RNA-seq analyses. The localisation of expressed proteins at selected loci was investigated by immunohistochemistry. RESULTS: We demonstrate modest heritability and genetic correlation of HEM with several other diseases from the GI, neuroaffective and cardiovascular domains. HEM PRS, validated in 180 435 individuals from independent datasets, allowed the identification of those at risk and correlated with younger age of onset and recurrent surgery. We identified 102 independent HEM risk loci harbouring genes whose expression is enriched in blood vessels and GI tissues, and in pathways associated with smooth muscles, epithelial and endothelial development and morphogenesis. Network transcriptomic analyses highlighted HEM gene coexpression modules that are relevant to the development and integrity of the musculoskeletal and epidermal systems, and the organisation of the extracellular matrix. CONCLUSION: HEM has a genetic component that predisposes to smooth muscle, epithelial and connective tissue dysfunction.
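The abstract does not state which meta-analysis model was used. A common choice for GWAS meta-analysis is the fixed-effect inverse-variance-weighted estimator, sketched below in Python for a single variant. This is an assumption, not the authors' method. Each study k contributes its effect estimate β_k with weight w_k = 1/SE_k². The pooled SE is sqrt(1/Σ w_k), and the p-value comes from a two-sided Wald z test. The function name is hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant across studies."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided Wald p-value
    return beta, se, p
```

Studies with smaller standard errors, typically the larger cohorts, pull the pooled estimate toward their own effect size.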
