Results 1 - 16 of 16
1.
Nat Commun ; 15(1): 5357, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918381

ABSTRACT

Large national-level electronic health record (EHR) datasets offer new opportunities for disentangling the role of genes and environment through deep phenotype information and approximate pedigree structures. Here we use the approximate geographical locations of patients as a proxy for spatially correlated community-level environmental risk factors. We develop a spatial mixed linear effect (SMILE) model that incorporates both genetics and environmental contribution. We extract EHR and geographical locations from 257,620 nuclear families and compile 1083 disease outcome measurements from the MarketScan dataset. We augment the EHR with publicly available environmental data, including levels of particulate matter 2.5 (PM2.5), nitrogen dioxide (NO2), climate, and sociodemographic data. We refine the estimates of genetic heritability and quantify community-level environmental contributions. We also use wind speed and direction as instrumental variables to assess the causal effects of air pollution. In total, we find PM2.5 or NO2 have statistically significant causal effects on 135 diseases, including respiratory, musculoskeletal, digestive, metabolic, and sleep disorders, where PM2.5 and NO2 tend to affect biologically distinct disease categories. These analyses showcase several robust strategies for jointly modeling genetic and environmental effects on disease risk using large EHR datasets and will benefit upcoming biobank studies in the era of precision medicine.
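The instrumental-variable idea mentioned in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' SMILE analysis: the variable names, effect sizes, and the single-instrument Wald estimator are all hypothetical choices made for illustration. A simulated instrument `z` stands in for wind, `x` for pollutant exposure, and `u` for an unobserved confounder.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Simulate an instrument z, a confounded exposure x, and an outcome y.

    All coefficients are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument: affects y only through x
        ui = rng.gauss(0, 1)  # unobserved confounder of x and y
        xi = 0.8 * zi + ui + rng.gauss(0, 1)
        yi = true_effect * xi + 2.0 * ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()

# Naive regression slope of y on x: biased upward by the confounder u.
ols_estimate = cov(x, y) / cov(x, x)

# Wald (single-instrument) estimator: ratio of the two covariances with z.
iv_estimate = cov(z, y) / cov(z, x)

print(f"naive slope: {ols_estimate:.2f}, IV estimate: {iv_estimate:.2f}")
```

In this simulation the naive slope overstates the effect because `u` drives both exposure and outcome, while the instrument-based ratio stays close to the simulated value of 0.5. The estimate is only valid if the instrument influences the outcome solely through the exposure.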


Subject(s)
Air Pollution , Nitrogen Dioxide , Particulate Matter , Humans , Air Pollution/adverse effects , Particulate Matter/adverse effects , Nitrogen Dioxide/adverse effects , Nitrogen Dioxide/analysis , Risk Factors , Environmental Exposure/adverse effects , Male , Female , Electronic Health Records , Air Pollutants/adverse effects , Air Pollutants/analysis , Air Pollutants/toxicity , Genetic Predisposition to Disease , Gene-Environment Interaction , Middle Aged , Adult
2.
Nat Commun ; 15(1): 4260, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38769300

ABSTRACT

Transcriptome-wide association study (TWAS) is a popular approach to dissect the functional consequence of disease associated non-coding variants. Most existing TWAS use bulk tissues and may not have the resolution to reveal cell-type specific target genes. Single-cell expression quantitative trait loci (sc-eQTL) datasets are emerging. The largest bulk- and sc-eQTL datasets are most conveniently available as summary statistics, but have not been broadly utilized in TWAS. Here, we present a new method EXPRESSO (EXpression PREdiction with Summary Statistics Only), to analyze sc-eQTL summary statistics, which also integrates 3D genomic data and epigenomic annotation to prioritize causal variants. EXPRESSO substantially improves existing methods. We apply EXPRESSO to analyze multi-ancestry GWAS datasets for 14 autoimmune diseases. EXPRESSO uniquely identifies 958 novel gene x trait associations, which is 26% more than the second-best method. Among them, 492 are unique to cell type level analysis and missed by TWAS using whole blood. We also develop a cell type aware drug repurposing pipeline, which leverages EXPRESSO results to identify drug compounds that can reverse disease gene expressions in relevant cell types. Our results point to multiple drugs with therapeutic potentials, including metformin for type 1 diabetes, and vitamin K for ulcerative colitis.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Genome-Wide Association Study/methods , Genetic Predisposition to Disease/genetics , Transcriptome/genetics , Autoimmune Diseases/genetics , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Gene Expression Profiling/methods
3.
Adv Nutr ; 15(5): 100217, 2024 05.
Article in English | MEDLINE | ID: mdl-38579971

ABSTRACT

Despite the widely recommended usage of partially hydrolyzed formula (PHF) or extensively hydrolyzed formula (EHF) of milk protein for preventing allergic diseases (ADs), clinical studies have been inconclusive regarding their efficacy compared with that of cow's milk formula (CMF) or breast milk (BM). We aimed to systematically evaluate the effects of PHF or EHF compared with those of CMF or BM on risk of ADs (cow's milk allergy, allergic rhinitis, eczema, asthma, wheeze, food allergy, and sensitization) in children. We searched PubMed, Embase, Cochrane Library, and Web of Science for clinical trials published from inception to 21 October 2022. We used the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to grade the strength of evidence. Overall, 24 trials (10,950 infants) were included, 17 of which specifically included high-risk infants. The GRADE certainty was low for the evidence that, compared with infants fed CMF, infants fed EHF early had a lower risk of cow's milk allergy at age 0-2 y [relative risk (RR): 0.62; 95% CI: 0.39, 0.99]. Moderate evidence supported that PHF and EHF reduced risk of eczema in children aged younger or older than 2 y, respectively (RR: 0.71; 95% CI: 0.52, 0.96; and RR: 0.79; 95% CI: 0.67, 0.94, respectively). We also identified moderate systematic evidence indicating that PHF reduced risk of wheeze at age 0-2 y compared with CMF (RR: 0.50; 95% CI: 0.29, 0.85), but PHF and EHF increased the risk compared with BM (RR: 1.61; 95% CI: 1.11, 2.31; and RR: 1.64; 95% CI: 1.26, 2.14). Neither PHF nor EHF had significant effects on other ADs in children of any age. In conclusion, compared with CMF, PHF and EHF had different preventive effects on cow's milk allergy, eczema, and wheeze. Compared with BM, both PHF and EHF may increase risk of wheeze but not other ADs. Given that most trials included only high-risk infants, more research on non-high-risk infants is warranted before any generalization is attempted.
This protocol was registered at PROSPERO as CRD42022320787.
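As a reading aid for the relative risks quoted above, the sketch below recovers an approximate standard error and two-sided p-value from a reported ratio and its 95% CI. It assumes the interval was built symmetrically on the log scale with a normal critical value of 1.96, which is common practice but is not stated in the abstract. The helper name `se_from_ci` is hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error of log(RR) from a 95% CI.

    Assumes the CI is symmetric on the log scale.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


# Values quoted in the abstract for EHF vs CMF and cow's milk allergy.
rr, ci_low, ci_high = 0.62, 0.39, 0.99

se_log_rr = se_from_ci(ci_low, ci_high)
z_stat = math.log(rr) / se_log_rr
# Two-sided p-value under a standard normal reference distribution.
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"SE(log RR) = {se_log_rr:.3f}, z = {z_stat:.2f}, p = {p_value:.3f}")
```

An upper bound of 0.99 sits just below 1, so the implied p-value falls only slightly under 0.05, in line with the low GRADE rating reported for this comparison.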


Subject(s)
Infant Formula , Milk Hypersensitivity , Milk Proteins , Humans , Infant , Milk Proteins/administration & dosage , Infant Formula/chemistry , Milk Hypersensitivity/prevention & control , Infant, Newborn , Animals , Milk , Child, Preschool , Cattle , Clinical Trials as Topic , Protein Hydrolysates/administration & dosage , Hypersensitivity/prevention & control , Female , Male , Milk, Human/chemistry , Eczema/prevention & control , Randomized Controlled Trials as Topic
4.
Environ Health Perspect ; 132(4): 47010, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38630604

ABSTRACT

BACKGROUND: Polyunsaturated fatty acids (PUFAs) have been shown to protect against fine particulate matter <2.5 µm in aerodynamic diameter (PM2.5)-induced hazards. However, limited evidence is available for respiratory health, particularly in pregnant women and their offspring. OBJECTIVES: We aimed to investigate the association of prenatal exposure to PM2.5 and its chemical components with allergic rhinitis (AR) in children and explore effect modification by maternal erythrocyte PUFAs. METHODS: This prospective birth cohort study involved 657 mother-child pairs from Guangzhou, China. Prenatal exposure to residential PM2.5 mass and its components [black carbon (BC), organic matter (OM), sulfate (SO₄²⁻), nitrate (NO₃⁻), and ammonium (NH₄⁺)] were estimated by an established spatiotemporal model. Maternal erythrocyte PUFAs during pregnancy were measured using gas chromatography. The diagnosis of AR and report of AR symptoms in children were assessed up to 2 years of age. We used Cox regression with the quantile-based g-computation approach to assess the individual and joint effects of PM2.5 components and examine the modification effects of maternal PUFA levels. RESULTS: Approximately 5.33% and 8.07% of children had AR and related symptoms, respectively. The average concentration of prenatal PM2.5 was 35.50±5.31 µg/m³. PM2.5 was positively associated with the risk of developing AR [hazard ratio (HR)=1.85; 95% confidence interval (CI): 1.16, 2.96 per 5 µg/m³] and its symptoms (HR=1.79; 95% CI: 1.22, 2.62 per 5 µg/m³) after adjustment for confounders. Similar associations were observed between individual PM2.5 components and AR outcomes. Each quintile change in a mixture of components was associated with an adjusted HR of 3.73 (95% CI: 1.80, 7.73) and 2.69 (95% CI: 1.55, 4.67) for AR and AR symptoms, with BC accounting for the largest contribution.
Higher levels of n-3 docosapentaenoic acid and lower levels of n-6 linoleic acid showed alleviating effects on AR symptoms risk associated with exposure to PM2.5 and its components. CONCLUSION: Prenatal exposure to PM2.5 and its chemical components, particularly BC, was associated with AR/symptoms in early childhood. We highlight that PUFA biomarkers could modify the adverse effects of PM2.5 on respiratory allergy. https://doi.org/10.1289/EHP13524.


Subject(s)
Air Pollutants , Air Pollution , Prenatal Exposure Delayed Effects , Rhinitis, Allergic , Humans , Female , Child, Preschool , Pregnancy , Particulate Matter/analysis , Cohort Studies , Air Pollutants/analysis , Prenatal Exposure Delayed Effects/chemically induced , Prospective Studies , Fatty Acids, Unsaturated/analysis , Rhinitis, Allergic/chemically induced , China , Air Pollution/analysis , Environmental Exposure/analysis
5.
Nutr Rev ; 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37930102

ABSTRACT

CONTEXT: Although the nutritional composition of organic food has been thoroughly researched, there is a dearth of published data relating to its impact on human health. OBJECTIVE: This systematic review aimed to examine the association between organic food intake and health effects, including changes in in vivo biomarkers, disease prevalence, and functional changes. DATA SOURCES: PubMed, EMBASE, Web of Science, the Cochrane Library, and ClinicalTrials.gov were searched from inception through Nov 13, 2022. DATA EXTRACTION: Both observational and interventional studies conducted in human populations were included, and the association between level of organic food intake and each outcome was classified as "no association," "inconsistent," "beneficial correlation/harmful correlation," or "insufficient." For outcomes with sufficient data reported by at least 3 studies, meta-analyses were conducted, using random-effects models to calculate standardized mean differences. DATA ANALYSIS: Based on the included 23 observational and 27 interventional studies, the association between levels of organic food intake and (i) pesticide exposure biomarker was assessed as "beneficial correlation," (ii) toxic metals and carotenoids in the plasma was assessed as "no association," (iii) fatty acids in human milk was assessed as "insufficient," (iv) phenolics was assessed as "beneficial," and (v) serum parameters and antioxidant status was assessed as "inconsistent." For diseases and functional changes, there was an overall "beneficial" association with organic food intake, and there were similar findings for obesity and body mass index. However, evidence for association of organic food intake with other single diseases was assessed as "insufficient" due to the limited number and extent of studies.
CONCLUSION: Organic food intake was found to have a beneficial impact in terms of reducing pesticide exposure, and the general effect on disease and functional changes (body mass index, male sperm quality) was appreciable. More long-term studies are required, especially for single diseases. SYSTEMATIC REVIEW REGISTRATION: PROSPERO registration no. CRD42022350175.

6.
Food Funct ; 14(17): 7938-7945, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37552113

ABSTRACT

Background: Previous studies on prenatal polyunsaturated fatty acids (PUFAs) and children's neurodevelopment have shown inconsistent results, and evidence from the Asian population is scarce. Objective: To investigate the association between maternal erythrocyte PUFAs and neurodevelopment in children in the Chinese population. Methods: We included 242 mother-child pairs from the Yuexiu birth cohort. The composition of maternal erythrocyte fatty acids during pregnancy was measured by gas chromatography. Each PUFA was divided into tertiles. Neurodevelopment in children was evaluated with the Ages and Stages Questionnaire at 2 years of age, including 5 domains of development: communication, gross motor, fine motor, problem solving, and personal-social skills. Results: Maternal eicosapentaenoic acid (EPA) [OR (95% CI): 0.34 (0.15, 0.74) for tertile 2, and 0.31 (0.13, 0.70) for tertile 3] was associated with a reduced risk of potential developmental delay in gross motor skills. Conversely, arachidonic acid (AA) [OR (95% CI): 2.54 (1.17, 5.70) for tertile 3] was associated with an increased risk of potential developmental delay in personal-social skills. The ratio of AA/EPA [OR (95% CI): 2.64 (1.18, 6.15) for tertile 3] was associated with an increased risk of potential developmental delay in gross motor skills. No significant association was found between other PUFAs and neurodevelopment. Conclusion: This birth cohort study is the first to show a beneficial association between maternal EPA and gross motor skills of children. Meanwhile, maternal AA and the ratio of AA/EPA have negative associations with neurodevelopment in children.


Subject(s)
Eicosapentaenoic Acid , Fatty Acids, Unsaturated , Pregnancy , Female , Humans , Cohort Studies
7.
World J Pediatr ; 19(10): 972-982, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37029331

ABSTRACT

BACKGROUND: Previous studies have linked gestational diabetes (GDM) with allergies in offspring. However, the effect of specific glucose metabolism metrics was not well characterized, and the role of polyunsaturated fatty acids (PUFAs), a modifier of metabolism and the immune system, was understudied. We aimed to investigate the association between maternal GDM and allergic diseases in children and the interaction between glucose metabolism and PUFAs on allergic outcomes. METHODS: This prospective cohort study included 706 mother-child dyads from Guangzhou, China. Maternal GDM was diagnosed via a 75-g oral glucose tolerance test (OGTT), and dietary PUFAs were assessed using a validated food frequency questionnaire. Allergic disease diagnoses and the age of onset were obtained from medical records of children up to three years of age. RESULTS: Approximately 19.4% of women had GDM, and 51.3% of children had any allergic diseases. GDM was positively associated with any allergic diseases (hazard ratio [HR] 1.40; 95% confidence interval [CI] 1.05-1.88) and eczema (HR 1.44; 95% CI 1.02-1.97). A unit increase in OGTT after two hours (OGTT-2 h) glucose was associated with an 11% (95% CI 2%-21%) higher risk of any allergic diseases and a 17% (95% CI 1%-36%) higher risk of food allergy. The positive associations between OGTT-2 h glucose and any allergic diseases were strengthened with decreased dietary α-linolenic acid (ALA) and increased n-6 PUFAs, linoleic acid (LA), LA/ALA ratio, and n-6/n-3 PUFA ratio. CONCLUSIONS: Maternal GDM was adversely associated with early-life allergic diseases, especially eczema. We were the first to identify OGTT-2 h glucose to be more sensitive in inducing allergy risk and that dietary PUFAs might modify the associations.


Subject(s)
Diabetes, Gestational , Eczema , Hypersensitivity , Pregnancy , Female , Humans , Child, Preschool , Diabetes, Gestational/diagnosis , Diabetes, Gestational/epidemiology , Cohort Studies , Prospective Studies , Hypersensitivity/epidemiology , Glucose
8.
Nat Commun ; 14(1): 668, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36750564

ABSTRACT

Systemic lupus erythematosus is a heritable autoimmune disease that predominantly affects young women. To improve our understanding of genetic etiology, we conduct multi-ancestry and multi-trait meta-analysis of genome-wide association studies, encompassing 12 systemic lupus erythematosus cohorts from 3 different ancestries and 10 genetically correlated autoimmune diseases, and identify 16 novel loci. We also perform transcriptome-wide association studies, computational drug repurposing analysis, and cell type enrichment analysis. We discover putative drug classes, including a histone deacetylase inhibitor that could be repurposed to treat lupus. We also identify multiple cell types enriched with putative target genes, such as non-classical monocytes and B cells, which may be targeted for future therapeutics. Using this newly assembled result, we further construct polygenic risk score models and demonstrate that integrating polygenic risk score with clinical lab biomarkers improves the diagnostic accuracy of systemic lupus erythematosus using the Vanderbilt BioVU and Michigan Genomics Initiative biobanks.


Subject(s)
Autoimmune Diseases , Lupus Erythematosus, Systemic , Humans , Female , Genome-Wide Association Study , Genetic Predisposition to Disease , Phenotype , Polymorphism, Single Nucleotide
9.
Front Immunol ; 13: 889296, 2022.
Article in English | MEDLINE | ID: mdl-35833142

ABSTRACT

Genome-wide association studies (GWAS) have identified hundreds of genetic variants associated with autoimmune diseases and provided unique mechanistic insights and informed novel treatments. These individual genetic variants on their own typically confer a small effect of disease risk with limited predictive power; however, when aggregated (e.g., via polygenic risk score method), they could provide meaningful risk predictions for a myriad of diseases. In this review, we describe the recent advances in GWAS for autoimmune diseases and the practical application of this knowledge to predict an individual's susceptibility/severity for autoimmune diseases such as systemic lupus erythematosus (SLE) via the polygenic risk score method. We provide an overview of methods for deriving different polygenic risk scores and discuss the strategies to integrate additional information from correlated traits and diverse ancestries. We further advocate for the need to integrate clinical features (e.g., anti-nuclear antibody status) with genetic profiling to better identify patients at high risk of disease susceptibility/severity even before clinical signs or symptoms develop. We conclude by discussing future challenges and opportunities of applying polygenic risk score methods in clinical care.
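The aggregation step described in the review above can be written down in a few lines. This is a minimal sketch of the basic weighted-sum form of a polygenic risk score, assuming additive allele dosages (0, 1, or 2 copies of the effect allele) and per-variant log-odds weights. The weights and dosages below are made-up numbers for illustration, not values from any GWAS, and the function name is hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages across variants."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant effect sizes (log odds ratios).
weights = [0.12, -0.05, 0.30]

# Hypothetical genotype: effect-allele counts at the same three variants.
person_dosages = [2, 1, 0]

score = polygenic_risk_score(person_dosages, weights)
print(f"PRS = {score:.2f}")
```

Each variant contributes little on its own, and the score becomes informative only when many such terms are summed. Published methods differ mainly in how the weights are derived, for example by shrinkage or by borrowing information across traits and ancestries.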


Subject(s)
Autoimmune Diseases , Lupus Erythematosus, Systemic , Autoimmune Diseases/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Lupus Erythematosus, Systemic/diagnosis , Lupus Erythematosus, Systemic/genetics , Risk Factors
10.
Nat Commun ; 13(1): 3258, 2022 06 07.
Article in English | MEDLINE | ID: mdl-35672318

ABSTRACT

Transcriptome-wide association studies (TWAS) are popular approaches to test for association between imputed gene expression levels and traits of interest. Here, we propose an integrative method PUMICE (Prediction Using Models Informed by Chromatin conformations and Epigenomics) to integrate 3D genomic and epigenomic data with expression quantitative trait loci (eQTL) to more accurately predict gene expressions. PUMICE helps define and prioritize regions that harbor cis-regulatory variants, which outperforms competing methods. We further describe an extension to our method PUMICE+, which jointly combines TWAS results from single- and multi-tissue models. Across 79 traits, PUMICE+ identifies 22% more independent novel genes and increases median chi-square statistics values at known loci by 35% compared to the second-best method, as well as achieves the narrowest credible interval size. Lastly, we perform computational drug repurposing and confirm that PUMICE+ outperforms other TWAS methods.


Subject(s)
Genome-Wide Association Study , Transcriptome , Drug Repositioning , Epigenomics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Genomics , Humans , Polymorphism, Single Nucleotide , Transcriptome/genetics
11.
Nat Commun ; 12(1): 1964, 2021 03 30.
Article in English | MEDLINE | ID: mdl-33785739

ABSTRACT

Genome-wide association meta-analysis (GWAMA) is an effective approach to enlarge sample sizes and empower the discovery of novel associations between genotype and phenotype. Independent replication has been used as a gold-standard for validating genetic associations. However, as current GWAMA often seeks to aggregate all available datasets, it becomes impossible to find a large enough independent dataset to replicate new discoveries. Here we introduce a method, MAMBA (Meta-Analysis Model-based Assessment of replicability), for assessing the "posterior-probability-of-replicability" for identified associations by leveraging the strength and consistency of association signals between contributing studies. We demonstrate using simulations that MAMBA is more powerful and robust than existing methods, and produces more accurate genetic effects estimates. We apply MAMBA to a large-scale meta-analysis of addiction phenotypes with 1.2 million individuals. In addition to accurately identifying replicable common variant associations, MAMBA also pinpoints novel replicable rare variant associations from imputation-based GWAMA and hence greatly expands the set of analyzable variants.


Subject(s)
Algorithms , Computational Biology/methods , Genome-Wide Association Study/methods , Meta-Analysis as Topic , Models, Genetic , Polymorphism, Single Nucleotide , Genetic Association Studies/methods , Genotype , Phenotype , Reproducibility of Results , Sample Size , Software
12.
Bioinformatics ; 36(19): 4951-4954, 2020 12 08.
Article in English | MEDLINE | ID: mdl-32756942

ABSTRACT

SUMMARY: Here, we present a highly efficient R package seqminer2 for querying and retrieving sequence variants from biobank scale datasets of millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files. It improves the speed of query and retrieval by several orders of magnitude compared to the state-of-the-art tools based upon tabix. It also reimplements support for BGEN and PLINK format, which improves speed over alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank scale sequence datasets in R. AVAILABILITY AND IMPLEMENTATION: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available in https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Biological Specimen Banks , Software , Genotype , Humans
13.
Genes (Basel) ; 11(5), 2020 05 25.
Article in English | MEDLINE | ID: mdl-32466134

ABSTRACT

There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ~10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. We discuss practical issues and methods to encode multi-allelic sites, conduct single-variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ~18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.


Subject(s)
Cigarette Smoking/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/statistics & numerical data , Rare Diseases/genetics , Alleles , Data Interpretation, Statistical , Female , Genetic Variation/genetics , Humans , Male , Phenotype , Polymorphism, Single Nucleotide/genetics , Rare Diseases/epidemiology , Rare Diseases/pathology
14.
Am J Alzheimers Dis Other Demen ; 35: 1533317520922392, 2020.
Article in English | MEDLINE | ID: mdl-32367740

ABSTRACT

Subjective cognitive decline (SCD) has been linked to Alzheimer's disease in the literature. However, little is known about whether SCD is associated with social/emotional support (SES). To investigate this association, this study utilized the 2015 and 2016 Behavioral Risk Factor Surveillance System data. A study population of 17,206 participants aged 45 years and older who responded to both the Emotional Support and Life Satisfaction survey module and the Cognition Decline survey module was included. Of this study population, 11.22% had SCD, and 21.83% reported insufficient SES. A much higher percentage of those with insufficient SES experienced SCD compared to those with sufficient SES (21.15% vs 8.45%, P < .0001). Insufficient SES was significantly associated with SCD (odds ratio = 1.68, 95% confidence interval: 1.37-2.06), after controlling for other factors. Furthermore, this study found certain demographic groups such as female, white, or married groups were more likely to receive sufficient SES.
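The two SCD percentages quoted above imply a crude (unadjusted) odds ratio that can be compared with the adjusted value of 1.68. The arithmetic below is a sketch that treats the reported percentages as simple proportions and ignores any survey weighting, which is an assumption on the reader's side rather than something the abstract specifies.

```python
def odds(p):
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1 - p)


# SCD prevalence by support group, as quoted in the abstract.
p_insufficient_support = 0.2115
p_sufficient_support = 0.0845

crude_odds_ratio = odds(p_insufficient_support) / odds(p_sufficient_support)
print(f"crude odds ratio = {crude_odds_ratio:.2f}")
```

The crude value is close to 2.9, noticeably larger than the adjusted odds ratio of 1.68. That gap suggests the covariates controlled for in the regression account for a substantial part of the raw difference.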


Subject(s)
Cognitive Dysfunction/epidemiology , Cognitive Dysfunction/psychology , Emotions , Social Interaction , Aged , Alzheimer Disease/epidemiology , Alzheimer Disease/psychology , Behavioral Risk Factor Surveillance System , Female , Humans , Male , Marital Status , Middle Aged , Surveys and Questionnaires , White People
15.
Curr Protoc Hum Genet ; 101(1): e83, 2019 04.
Article in English | MEDLINE | ID: mdl-30849219

ABSTRACT

With the advent of Next Generation Sequencing (NGS) technologies, whole genome and whole exome DNA sequencing has become affordable for routine genetic studies. Coupled with improved genotyping arrays and genotype imputation methodologies, it is increasingly feasible to obtain rare genetic variant information in large datasets. Such datasets allow researchers to gain a more complete understanding of the genetic architecture of complex traits caused by rare variants. State-of-the-art statistical methods for the statistical genetics analysis of sequence-based association, including efficient algorithms for association analysis in biobank-scale datasets, gene-association tests, meta-analysis, fine mapping methods that integrate functional genomic datasets, and phenome-wide association studies (PheWAS), are reviewed here. These methods are expected to be highly useful for next generation statistical genetics analysis in the era of precision medicine. © 2019 by John Wiley & Sons, Inc.


Subject(s)
Genetic Predisposition to Disease , Genome, Human/genetics , Multifactorial Inheritance/genetics , Algorithms , Genome-Wide Association Study/methods , Genotype , High-Throughput Nucleotide Sequencing , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Exome Sequencing/methods , Whole Genome Sequencing/methods
16.
PLoS Genet ; 14(7): e1007452, 2018 07.
Article in English | MEDLINE | ID: mdl-30016313

ABSTRACT

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.
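For context on the setting described above, the sketch below shows the textbook fixed-effect, inverse-variance-weighted combination of per-study effect estimates for a single variant. It is not the PCBS method from the abstract: it corresponds to the simple case in which every study reports the variant, and the effect sizes and standard errors are made-up numbers for illustration.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis of per-study effect estimates.

    Each study is weighted by 1 / SE^2. Returns the pooled estimate
    and its standard error.
    """
    if len(betas) != len(standard_errors):
        raise ValueError("betas and standard_errors must have the same length")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical summary statistics for one variant from two studies.
study_betas = [0.10, 0.20]
study_ses = [0.05, 0.05]

pooled_beta, pooled_se = inverse_variance_meta(study_betas, study_ses)
print(f"pooled beta = {pooled_beta:.3f}, pooled SE = {pooled_se:.4f}")
```

With two equally precise studies, the pooled estimate is their average and the standard error shrinks by a factor of the square root of two. When a study does not report a variant, dropping it from this sum is the complete-case strategy the abstract describes as limited.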


Subject(s)
Data Analysis , Tobacco Products/statistics & numerical data , Tobacco Use/genetics , Alleles , Data Interpretation, Statistical , Datasets as Topic , Genetic Loci/genetics , Genome-Wide Association Study , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide