ABSTRACT
Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.
Subject(s)
Black People/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , Genomics , Female , Gene Frequency/genetics , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide/genetics , Uganda/epidemiology , Whole Genome Sequencing
ABSTRACT
MOTIVATION: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some 'truth' is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. RESULTS: We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes, with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods, as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. AVAILABILITY AND IMPLEMENTATION: Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
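The idea of "simulating random variates about these expected values" can be illustrated with a minimal Python sketch. It is not the simGWAS implementation (which is an R package) and it does not reproduce the paper's derivation of the expected statistics. It assumes the expected Z scores are already known and that the Z scores are jointly normal with covariance equal to the SNP correlation (LD) matrix; the function name `simulate_z_scores`, the toy LD matrix, and the toy expected values are all hypothetical.

```python
import numpy as np


def simulate_z_scores(expected_z, ld_matrix, n_replicates, seed=0):
    """Draw replicate vectors of GWAS Z scores without simulating genotypes.

    expected_z   : length-m vector of expected Z scores (assumed given here;
                   the abstract derives them from causal variants, effect
                   sizes and control haplotype frequencies).
    ld_matrix    : m x m SNP correlation matrix, ASSUMED to be the
                   covariance of the Z scores.
    n_replicates : number of simulated GWAS summary data sets.
    """
    rng = np.random.default_rng(seed)
    expected_z = np.asarray(expected_z, dtype=float)
    ld_matrix = np.asarray(ld_matrix, dtype=float)
    # One multivariate normal draw per replicate; the cost depends on the
    # number of SNPs, not on the number of individuals in the study.
    return rng.multivariate_normal(expected_z, ld_matrix, size=n_replicates)


# Toy inputs (hypothetical): three SNPs, the first two in strong LD.
ld = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])
expected = np.array([5.0, 4.0, 0.5])

z = simulate_z_scores(expected, ld, n_replicates=2000)
print(z.shape)         # one row per replicate, one column per SNP
print(z.mean(axis=0))  # should lie close to the expected Z scores
```

Under these assumptions the study sample size enters only through the expected Z scores, which is consistent with the abstract's statement that simulation can be conducted independently of sample size.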
Subject(s)
Genome-Wide Association Study , Software , Case-Control Studies , Genotype , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Genome-wide association studies (GWAS) have transformed our understanding of the genetics of complex traits such as autoimmune diseases, but how risk variants contribute to pathogenesis remains largely unknown. Identifying genetic variants that affect gene expression (expression quantitative trait loci, or eQTLs) is crucial to addressing this. eQTLs vary between tissues and following in vitro cellular activation, but have not been examined in the context of human inflammatory diseases. We performed eQTL mapping in five primary immune cell types from patients with active inflammatory bowel disease (n = 91), anti-neutrophil cytoplasmic antibody-associated vasculitis (n = 46) and healthy controls (n = 43), revealing eQTLs present only in the context of active inflammatory disease. Moreover, we show that following treatment a proportion of these eQTLs disappear. Through joint analysis of expression data from multiple cell types, we reveal that previous estimates of eQTL immune cell-type specificity are likely to have been exaggerated. Finally, by analysing gene expression data from multiple cell types, we find eQTLs not previously identified by database mining at 34 inflammatory bowel disease-associated loci. In summary, this parallel eQTL analysis in multiple leucocyte subsets from patients with active disease provides new insights into the genetic basis of immune-mediated diseases.
Subject(s)
Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/genetics , Genetic Association Studies , Inflammatory Bowel Diseases/genetics , Quantitative Trait Loci/genetics , Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/immunology , Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/pathology , Female , Gene Expression Regulation , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Inflammatory Bowel Diseases/immunology , Inflammatory Bowel Diseases/pathology , Male , Monocytes/immunology , Monocytes/metabolism , Neutrophils/immunology , Neutrophils/metabolism , Phenotype , T-Lymphocytes/immunology , T-Lymphocytes/metabolism
ABSTRACT
The genes and cells that mediate genetic associations identified through genome-wide association studies (GWAS) are only partially understood. Several studies that have investigated the genetic regulation of gene expression have shown that disease-associated variants are over-represented amongst expression quantitative trait loci (eQTL) variants. Evidence for colocalisation of eQTL and disease causal variants can suggest causal genes and cells for these genetic associations. Here, we used colocalisation analysis to investigate whether 595 genetic associations to ten immune-mediated diseases are consistent with a causal variant that regulates, in cis, gene expression in resting B cells, and in resting and stimulated monocytes. Previously published candidate causal genes were over-represented amongst genes exhibiting colocalisation (odds ratio > 1.5), and we identified evidence for colocalisation (posterior odds > 5) between cis eQTLs in at least one cell type and at least one disease for six genes: ADAM15, RGS1, CARD9, LTBR, CTSH and SYNGR1. We identified cell-specific effects, such as for CTSH, the expression of which in monocytes, but not in B cells, may mediate type 1 diabetes and narcolepsy associations in the chromosome 15q25.1 region. Our results demonstrate the utility of integrating genetic studies of disease and gene expression for highlighting causal genes and cell types.
Subject(s)
Bayes Theorem , Genome-Wide Association Study , Immune System Diseases/genetics , Quantitative Trait Loci , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Quantitative Trait, Heritable
ABSTRACT
Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genome-wide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty-one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P-values were skewed (P=9.85×10⁻¹¹) with 12 of the 22 SNPs showing P<0.05. Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, P=7.86×10⁻⁹), NRP1 (rs722988, 4.88×10⁻⁸), BAD (rs694739, 2.37×10⁻⁷), CTSB (rs1296023, 2.79×10⁻⁷), FYN (rs11964650, P=5.60×10⁻⁷), UBE2G1 (rs9906760, 5.08×10⁻⁷), MAP3K14 (rs17759555, 9.67×10⁻⁷), ITGB1 (rs1557150, 1.93×10⁻⁶), and IL7R (rs1445898, 2.76×10⁻⁶). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available.
Subject(s)
Diabetes Mellitus, Type 1/genetics , Genome-Wide Association Study , Genotype , Humans , Polymorphism, Single Nucleotide , Reproducibility of Results
ABSTRACT
Introduction: Fragility ankle fractures are traditionally managed conservatively or with open reduction internal fixation. Tibiotalocalcaneal (TTC) nailing is an alternative option for the geriatric patient. This meta-analysis provides the most detailed analysis of TTC nailing for fragility ankle fractures. Methods: A systematic search was performed on MEDLINE, EMBASE, Cochrane Library, and Web of Science, identifying 14 studies for inclusion. Studies were included if their patients had a fragility ankle fracture, defined according to NICE guidelines as a low-energy fracture sustained following a fall from standing height or less, and were treated with a TTC nail. Patients with a previous fracture of the ipsilateral limb, fibular nails, and pathological fractures were excluded. This review was registered in PROSPERO (ID: CRD42021258893). Results: A total of 312 ankle fractures were included. The mean age was 77.3 years. Of the patients, 26.9% were male and 41.9% had diabetes. The pooled proportion of superficial infection was 10% (95% CI: 6-16%), deep infection 8% (95% CI: 6-11%), implant failure 11% (95% CI: 7-15%), malunion 11% (95% CI: 6-18%), and all-cause mortality 27% (95% CI: 20-34%). The pooled mean post-operative Olerud-Molander ankle score was 54.07 (95% CI: 48.98-59.16). Egger's test (P = 0.56) showed no significant publication bias. Conclusion: TTC nailing is an adequate alternative option for fragility ankle fractures. However, current evidence consists mainly of case series with inconsistent post-operative rehabilitation protocols. Prospective randomised controlled trials with long follow-up times and large cohort sizes are needed to guide the use of TTC nailing for ankle fractures.
ABSTRACT
PURPOSE: To generate a prognostic model to predict keratoconus progression to corneal crosslinking (CXL). DESIGN: Retrospective cohort study. METHODS: We recruited 5025 patients (9341 eyes) with early keratoconus between January 2011 and November 2020. Genetic data from 926 patients were available. We investigated both keratometry and CXL as end points for progression and used the Royston-Parmar method on the proportional hazards scale to generate a prognostic model. We calculated hazard ratios (HRs) for each significant covariate, with explained variation and discrimination, and performed internal-external cross validation by geographic regions. RESULTS: After exclusions, model fitting comprised 8701 eyes, of which 3232 underwent CXL. For early keratoconus, CXL provided a more robust prognostic model than keratometric progression. The final model explained 33% of the variation in time to event, with age (HR 0.9; 95% CI 0.90-0.91), maximum anterior keratometry (HR 1.08; 95% CI 1.07-1.09), and minimum corneal thickness (HR 0.95; 95% CI 0.93-0.96) as significant covariates. Single-nucleotide polymorphisms (SNPs) associated with keratoconus (n=28) did not significantly contribute to the model. The predicted time-to-event curves closely followed the observed curves during internal-external validation. Differences in discrimination between geographic regions were small, suggesting the model maintained its predictive ability. CONCLUSIONS: A prognostic model to predict keratoconus progression could aid patient empowerment, triage, and service provision. Age at presentation is the most significant predictor of progression risk. Candidate SNPs associated with keratoconus do not contribute to progression risk.
Subject(s)
Keratoconus , Photochemotherapy , Collagen/therapeutic use , Corneal Topography , Demography , Humans , Keratoconus/diagnosis , Keratoconus/drug therapy , Keratoconus/genetics , Photochemotherapy/methods , Photosensitizing Agents/therapeutic use , Retrospective Studies , Riboflavin/therapeutic use , Ultraviolet Rays , Visual Acuity
ABSTRACT
OBJECTIVES: When the prevalence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is low, many positive test results are false positives. Confirmatory testing reduces overdiagnosis and nosocomial infection and enables real-world estimates of test specificity and positive predictive value. This study estimates these parameters to evaluate the impact of confirmatory testing and to improve clinical diagnosis, epidemiological estimation and interpretation of vaccine trials. METHODS: Over 1 month we took all respiratory samples from our laboratory with a patient's first detection of SARS-CoV-2 RNA (Hologic Aptima SARS-CoV-2 assay or in-house RT-PCR platform), and repeated testing using two platforms. Samples were categorized by source, and by whether clinical details suggested COVID-19 or corroborative testing from another laboratory. We estimated specificity and positive predictive value using approaches based on maximum likelihood. RESULTS: Of 19 597 samples, SARS-CoV-2 RNA was detected in 107; 52 corresponded to first-time detection (0.27% of tests on samples without previous detection). Further testing detected SARS-CoV-2 RNA once or more ('confirmed') in 29 samples (56%), and failed to detect SARS-CoV-2 RNA ('not confirmed') in 23 (44%). Depending upon assumed parameters, point estimates for specificity and positive predictive value were 99.91-99.98% and 61.8-89.8% respectively using the Hologic Aptima SARS-CoV-2 assay, and 97.4-99.1% and 20.1-73.8% respectively using an in-house assay. CONCLUSIONS: Nucleic acid amplification testing for SARS-CoV-2 is highly specific. Nevertheless, when prevalence is low a significant proportion of initially positive results fail to confirm, and confirmatory testing substantially reduces the detection of false positives. Omitting additional testing in samples with higher prior detection probabilities focuses testing where it is clinically impactful and minimizes delay.
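The link between low prevalence and false positives can be made concrete with a short Python sketch. It uses Bayes' rule, not the maximum-likelihood approaches the study reports, and the prevalence, sensitivity, and specificity passed to the function are hypothetical values chosen for illustration. Only the counts 52 and 29 come from the abstract.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1.0 - prevalence) * (1.0 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Counts reported in the abstract: 52 first-time detections, 29 confirmed.
first_detections = 52
confirmed = 29
confirmed_fraction = confirmed / first_detections  # about 0.56, as reported

# Hypothetical inputs (NOT from the study): 0.2% prevalence,
# 95% sensitivity, 99.9% specificity.
ppv = positive_predictive_value(
    prevalence=0.002, sensitivity=0.95, specificity=0.999
)

print(f"confirmed fraction: {confirmed_fraction:.2f}")
print(f"illustrative PPV:   {ppv:.2f}")
```

With these illustrative inputs, roughly one positive result in three is a false positive even though the assumed specificity is 99.9%, which is the pattern the abstract describes when prevalence is low.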
Subject(s)
COVID-19 Nucleic Acid Testing/methods , COVID-19/diagnosis , Nucleic Acid Amplification Techniques/methods , SARS-CoV-2/isolation & purification , Adult , Aged , COVID-19/epidemiology , Diagnostic Tests, Routine , England/epidemiology , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Prevalence , SARS-CoV-2/genetics , Sensitivity and Specificity
ABSTRACT
Thousands of genetic variants are associated with human disease risk, but linkage disequilibrium (LD) hinders fine-mapping the causal variants. Both lack of power, and joint tagging of two or more distinct causal variants by a single non-causal SNP, lead to inaccuracies in fine-mapping, with stochastic search more robust than stepwise. We develop a computationally efficient multinomial fine-mapping (MFM) approach that borrows information between diseases in a Bayesian framework. We show that MFM has greater accuracy than single disease analysis when shared causal variants exist, and negligible loss of precision otherwise. MFM analysis of six immune-mediated diseases reveals causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes.
Subject(s)
Autoimmunity/genetics , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Models, Genetic , Alleles , Bayes Theorem , CD4-Positive T-Lymphocytes , CTLA-4 Antigen/genetics , Chromosome Mapping , Gene Expression Regulation , Genotype , Humans , Interleukin-2 Receptor alpha Subunit/genetics , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Determining whether potential causal variants for related diseases are shared can identify overlapping etiologies of multifactorial disorders. Colocalization methods disentangle shared and distinct causal variants. However, existing approaches require independent data sets. Here we extend two colocalization methods to allow for the shared-control design commonly used in comparison of genome-wide association study results across diseases. Our analysis of four autoimmune diseases--type 1 diabetes (T1D), rheumatoid arthritis, celiac disease and multiple sclerosis--identified 90 regions that were associated with at least one disease, 33 (37%) of which were associated with 2 or more disorders. Nevertheless, for 14 of these 33 shared regions, there was evidence that the causal variants differed. We identified new disease associations in 11 regions previously associated with one or more of the other 3 disorders. Four of eight T1D-specific regions contained known type 2 diabetes (T2D) candidate genes (COBL, GLIS3, RNLS and BCAR1), suggesting a shared cellular etiology.
Subject(s)
Arthritis, Rheumatoid/genetics , Celiac Disease/genetics , Diabetes Mellitus, Type 1/genetics , Multiple Sclerosis/genetics , Bayes Theorem , Case-Control Studies , Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Models, Genetic , Polymorphism, Single Nucleotide , Risk
ABSTRACT
Genetic studies of type 1 diabetes (T1D) have identified 50 susceptibility regions, finding major pathways contributing to risk, with some loci shared across immune disorders. To make genetic comparisons across autoimmune disorders as informative as possible, a dense genotyping array, the Immunochip, was developed, from which we identified four new T1D-associated regions (P < 5 × 10⁻⁸). A comparative analysis with 15 immune diseases showed that T1D is more similar genetically to other autoantibody-positive diseases, significantly most similar to juvenile idiopathic arthritis and significantly least similar to ulcerative colitis, and provided support for three additional new T1D risk loci. Using a Bayesian approach, we defined credible sets for the T1D-associated SNPs. The associated SNPs localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells. Enhancer-promoter interactions can now be analyzed in these cell types to identify which particular genes and regulatory sequences are causal.