ABSTRACT
Data from the Pharmacogenomics and Risk of Cardiovascular Disease (PARC) study and the Cardiovascular Health Study (CHS) provide independent and confirmatory evidence for association between common polymorphisms of the HNF1A gene encoding hepatocyte nuclear factor-1 alpha and plasma C-reactive protein (CRP) concentration. Analyses with the use of imputation-based methods to combine genotype data from both studies and to test untyped SNPs from the HapMap database identified several SNPs within a 5 kb region of HNF1A intron 1 with the strongest evidence of association with CRP phenotype.
Subject(s)
C-Reactive Protein/genetics , Hepatocyte Nuclear Factor 1-alpha/genetics , Aged , Bayes Theorem , Female , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Male , Middle Aged , Polymorphism, Single Nucleotide , Pravastatin/therapeutic use , Simvastatin/therapeutic use
ABSTRACT
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
Subject(s)
Computational Biology , Genome-Wide Association Study , Genomics , Case-Control Studies , Computational Biology/methods , Genome-Wide Association Study/methods , Genomics/methods , Genotype , Humans , Logistic Models , Machine Learning , Phenotype , Reproducibility of Results
ABSTRACT
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.
Subject(s)
Demography/statistics & numerical data , Genetics, Population/methods , Population Dynamics/trends , Population/genetics , Cluster Analysis , Demography/methods , Emigrants and Immigrants , Gene Flow/genetics , Genotyping Techniques , Haplotypes/genetics , Humans , Polymorphism, Single Nucleotide , Population Dynamics/statistics & numerical data , Sequence Analysis, DNA , United States/ethnology
ABSTRACT
The purposes of this study were 1) to examine the performance of a new multimarker regression approach for model-free linkage analysis in comparison to a conventional multipoint approach, and 2) to determine whether a conditioning strategy would improve the performance of the conventional multipoint method when applied to data from two interacting loci. Linkage analysis of the Kofendrerd Personality Disorder phenotype to chromosomes 1 and 3 was performed in three populations for all 100 replicates of the Genetic Analysis Workshop 14 simulated data. Three approaches were used: a conventional multipoint analysis using the Zlr statistic as calculated in the program ALLEGRO; a conditioning approach in which the per-family contribution on one chromosome was weighted according to evidence for linkage on the other chromosome; and a novel multimarker regression approach. The multipoint and multimarker approaches were generally successful in localizing known susceptibility loci on chromosomes 1 and 3, and were found to give broadly similar results. No advantage was found with the per-family conditioning approach. The effect on power and type I error of different choices of weighting scheme (to account for different numbers of affected siblings) in the multimarker approach was examined.
Subject(s)
Chromosome Mapping , Computer Simulation , Congresses as Topic , Databases, Genetic , Models, Genetic , Genetic Markers , Genetics, Population , Humans , Regression Analysis , Reproducibility of Results
ABSTRACT
INTRODUCTION: It is unclear whether the current distribution of surgeons practicing female pelvic medicine and reconstructive surgery in the United States is adequate to meet the needs of a growing and aging population. We assessed the geographic distribution of female pelvic surgeons as represented by members of the American Urogynecologic Society (AUGS) throughout the United States at the county, state, and American Congress of Obstetricians and Gynecologists district levels. MATERIALS AND METHODS: County-level data from the AUGS, American Congress of Obstetricians and Gynecologists, and the United States Census were analyzed in this observational study. State and national patterns of female pelvic surgeon density were mapped graphically using ArcGIS software and 2010 US Census demographic data. RESULTS: In 2013, the 1058 AUGS practicing physicians represented 0.13% of the total physician workforce. There were 6.7 AUGS members available for every 1 million women and 20 AUGS members for every 1 million postreproductive-aged women in the United States. The density of female pelvic surgeons was highest in metropolitan areas. Overall, 88% of the counties in the United States lacked female pelvic surgeons. Nationwide, there was a mean of 1 AUGS member for every 31 practicing general obstetrician-gynecologists. CONCLUSIONS: These findings have implications for training, recruiting, and retaining female pelvic surgeons. The uneven distribution of female pelvic surgeons throughout the United States is likely to worsen as graduating female pelvic medicine and reconstructive surgery fellows continue to cluster in urban areas.
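The density figures in the abstract above can be cross-checked with simple arithmetic. The sketch below back-calculates the denominators implied by the rounded figures as quoted; these implied totals are illustrative derivations, not values reported by the study, and all variable names are hypothetical.

```python
# Figures quoted in the abstract above (all rounded as reported).
AUGS_MEMBERS = 1058
PER_MILLION_WOMEN = 6.7
PER_MILLION_POSTREPRODUCTIVE = 20
OBGYNS_PER_MEMBER = 31
WORKFORCE_SHARE = 0.0013  # 0.13% of the total physician workforce


def implied_population_millions(count, rate_per_million):
    """Population (in millions) implied by a head count and a per-million rate."""
    return count / rate_per_million


# Back-calculated denominators; approximate because the inputs are rounded.
women_millions = implied_population_millions(AUGS_MEMBERS, PER_MILLION_WOMEN)
postreproductive_millions = implied_population_millions(
    AUGS_MEMBERS, PER_MILLION_POSTREPRODUCTIVE
)
implied_obgyns = AUGS_MEMBERS * OBGYNS_PER_MEMBER
implied_workforce = AUGS_MEMBERS / WORKFORCE_SHARE

print(f"Implied women: about {women_millions:.0f} million")
print(f"Implied postreproductive-aged women: about {postreproductive_millions:.1f} million")
print(f"Implied general obstetrician-gynecologists: about {implied_obgyns}")
print(f"Implied physician workforce: about {implied_workforce:.0f}")
```

Because every input is rounded, the implied totals are only order-of-magnitude checks on the internal consistency of the reported densities.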
Subject(s)
Gynecologic Surgical Procedures , Gynecology , Physicians/supply & distribution , Plastic Surgery Procedures , Urology , Cross-Sectional Studies/methods , Female , Gynecologic Surgical Procedures/statistics & numerical data , Humans , Plastic Surgery Procedures/statistics & numerical data , Rural Health Services , Societies/statistics & numerical data , Statistical Distributions , United States , Urban Health Services , Workforce
ABSTRACT
BACKGROUND: Idiopathic pulmonary fibrosis (IPF) is a devastating disease that probably involves several genetic loci. Several rare genetic variants and one common single nucleotide polymorphism (SNP) of MUC5B have been associated with the disease. Our aim was to identify additional common variants associated with susceptibility and ultimately mortality in IPF. METHODS: First, we did a three-stage genome-wide association study (GWAS): stage one was a discovery GWAS; and stages two and three were independent case-control studies. DNA samples from European-American patients with IPF meeting standard criteria were obtained from several US centres for each stage. Data for European-American control individuals for stage one were gathered from the database of genotypes and phenotypes; additional control individuals were recruited at the University of Pittsburgh to increase the number. For controls in stages two and three, we gathered data for additional sex-matched European-American control individuals who had been recruited in another study. DNA samples from patients and from control individuals were genotyped to identify SNPs associated with IPF. SNPs identified in stage one were carried forward to stage two, and those that achieved genome-wide significance (p<5×10(-8)) in a meta-analysis were carried forward to stage three. Three case series with follow-up data were selected from stages one and two of the GWAS using samples with follow-up data. Mortality analyses were done in these case series to assess the SNPs associated with IPF that had achieved genome-wide significance in the meta-analysis of stages one and two. Finally, we obtained gene-expression profiling data for lungs of patients with IPF from the Lung Genomics Research Consortium and analysed correlation with SNP genotypes. FINDINGS: In stage one of the GWAS (542 patients with IPF, 542 control individuals matched one-by-one to cases by genetic ancestry estimates), we identified 20 loci.
Six SNPs reached genome-wide significance in stage two (544 patients, 687 control individuals): three TOLLIP SNPs (rs111521887, rs5743894, rs5743890) and one MUC5B SNP (rs35705950) at 11p15.5; one MDGA2 SNP (rs7144383) at 14q21.3; and one SPPL2C SNP (rs17690703) at 17q21.31. Stage three (324 patients, 702 control individuals) confirmed the associations for all these SNPs, except for rs7144383. Linkage disequilibrium between the MUC5B SNP (rs35705950) and TOLLIP SNPs (rs111521887 [r(2)=0·07], rs5743894 [r(2)=0·16], and rs5743890 [r(2)=0·01]) was low. 683 patients from the GWAS were included in the mortality analysis. Individuals who developed IPF despite having the protective TOLLIP minor allele of rs5743890 carried an increased mortality risk (meta-analysis with fixed-effect model: hazard ratio 1·72 [95% CI 1·24-2·38]; p=0·0012). TOLLIP expression was decreased by 20% in individuals carrying the minor allele of rs5743890 (p=0·097), 40% in those with the minor allele of rs111521887 (p=3·0×10(-4)), and 50% in those with the minor allele of rs5743894 (p=2·93×10(-5)) compared with homozygous carriers of common alleles for these SNPs. INTERPRETATION: Novel variants in TOLLIP and SPPL2C are associated with IPF susceptibility. One novel variant of TOLLIP, rs5743890, is also associated with mortality. These associations and the reduced expression of TOLLIP in patients with IPF who carry TOLLIP SNPs emphasise the importance of this gene in the disease. FUNDING: National Institutes of Health; National Heart, Lung, and Blood Institute; Pulmonary Fibrosis Foundation; Coalition for Pulmonary Fibrosis; and Instituto de Salud Carlos III.
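The mortality result above reports a hazard ratio of 1.72 with a 95% CI of 1.24 to 2.38 and a p-value of 0.0012. As a rough consistency check, the sketch below recovers an approximate z-score and two-sided p-value from the interval alone. It assumes the interval is symmetric on the log scale and that a normal (Wald) approximation applies. This is a standard back-calculation, not necessarily the procedure the authors used, and the function name is hypothetical.

```python
import math


def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate z-score and two-sided p-value for a ratio from its 95% CI.

    Assumes a Wald interval that is symmetric on the log scale.
    """
    log_estimate = math.log(estimate)
    # The CI spans 2 * z_crit standard errors on the log scale.
    standard_error = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_estimate / standard_error
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = p_from_ratio_ci(1.72, 1.24, 2.38)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The back-calculated p-value lands near 0.001, in line with the reported 0.0012. Small differences are expected because the published estimate and interval are rounded.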
Subject(s)
DNA/genetics , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study/methods , Idiopathic Pulmonary Fibrosis/genetics , Adult , Aged , Female , Genotype , Humans , Idiopathic Pulmonary Fibrosis/mortality , Linkage Disequilibrium , Male , Middle Aged , Phenotype , Retrospective Studies , Survival Rate/trends , United States/epidemiology
ABSTRACT
BACKGROUND: Statins effectively lower total and plasma LDL-cholesterol, but the magnitude of decrease varies among individuals. To identify single nucleotide polymorphisms (SNPs) contributing to this variation, we performed a combined analysis of genome-wide association (GWA) results from three trials of statin efficacy. METHODS AND PRINCIPAL FINDINGS: Bayesian and standard frequentist association analyses were performed on untreated and statin-mediated changes in LDL-cholesterol, total cholesterol, HDL-cholesterol, and triglyceride on a total of 3932 subjects using data from three studies: Cholesterol and Pharmacogenetics (40 mg/day simvastatin, 6 weeks), Pravastatin/Inflammation CRP Evaluation (40 mg/day pravastatin, 24 weeks), and Treating to New Targets (10 mg/day atorvastatin, 8 weeks). Genotype imputation was used to maximize genomic coverage and to combine information across studies. Phenotypes were normalized within each study to account for systematic differences among studies, and fixed-effects analyses of the combined sample were performed to detect consistent effects across studies. Two SNP associations were assessed as having posterior probability greater than 50%, indicating that they were more likely than not to be genuinely associated with statin-mediated lipid response. SNP rs8014194, located within the CLMN gene on chromosome 14, was strongly associated with statin-mediated change in total cholesterol with an 84% probability by Bayesian analysis, and a p-value exceeding conventional levels of genome-wide significance by frequentist analysis (P = 1.8 x 10(-8)). This SNP was less significantly associated with change in LDL-cholesterol (posterior probability = 0.16, P = 4.0 x 10(-6)). Bayesian analysis also assigned a 51% probability that rs4420638, located in APOC1 and near APOE, was associated with change in LDL-cholesterol.
CONCLUSIONS AND SIGNIFICANCE: Using combined GWA analysis from three clinical trials involving nearly 4,000 individuals treated with simvastatin, pravastatin, or atorvastatin, we have identified SNPs that may be associated with variation in the magnitude of statin-mediated reduction in total and LDL-cholesterol, including one in the CLMN gene for which statistical evidence for association exceeds conventional levels of genome-wide significance. TRIAL REGISTRATION: PRINCE and TNT are not registered. CAP is registered at Clinicaltrials.gov NCT00451828.
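The posterior probabilities quoted above (for example 84% and 51%) depend on both the strength of evidence and the prior probability of association. The sketch below shows the generic relationship between a Bayes factor, a prior probability, and a posterior probability. The prior of 1e-5 is a hypothetical value chosen only for illustration, since the abstract does not state the prior or Bayes factors used, and the function names are likewise assumptions.

```python
def posterior_probability(bayes_factor, prior_probability):
    """Posterior probability of association from a Bayes factor and a prior."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def bayes_factor_needed(target_posterior, prior_probability):
    """Bayes factor required to reach a target posterior under a given prior."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    target_odds = target_posterior / (1.0 - target_posterior)
    return target_odds / prior_odds


# Hypothetical prior: 1 in 100,000 SNPs truly associated (an assumption).
HYPOTHETICAL_PRIOR = 1e-5

for target in (0.51, 0.84):
    needed = bayes_factor_needed(target, HYPOTHETICAL_PRIOR)
    check = posterior_probability(needed, HYPOTHETICAL_PRIOR)
    print(f"posterior {target:.2f} needs a Bayes factor of about {needed:.3g} "
          f"(round trip gives {check:.2f})")
```

Under a small prior of this kind, a posterior above 50% requires a very large Bayes factor, which is one reason a "more likely than not" threshold is a demanding criterion in a genome-wide setting.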
Subject(s)
Genome-Wide Association Study , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Lipids/chemistry , Adult , Aged , Atorvastatin , Bayes Theorem , Cholesterol/metabolism , Female , Genotype , Heptanoic Acids/therapeutic use , Humans , Inflammation , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide , Pravastatin/therapeutic use , Pyrroles/therapeutic use , Simvastatin/therapeutic use
ABSTRACT
We address the analytical problem of evaluating the evidence for linkage at a test locus while taking into account the effect of a known linked disease locus. The method we propose is a multimarker regression approach that models the identity-by-descent states for affected sib-pairs at a series of linked markers in terms of the identity-by-descent state at the known disease locus. Our method allows analysis to be performed at a test location (or a series of locations) without the requirement that identity-by-descent be directly observed at either the test or the known conditioning locus. An advantage of our method is that identity-by-descent states from multiple markers are included simultaneously in the test of linkage, without recourse to multipoint imputation. The properties and power of the method are examined under various null and alternative hypotheses. The method is applied to data from a study of 1,056 type 1 diabetes families to examine the evidence for an additional putative locus (IDDM15) on chromosome 6q, linked to IDDM1 in the HLA region on chromosome 6p. After accounting for the strong effect of IDDM1 and the differing rates of male and female recombination in the region, we find only marginal evidence for IDDM15 (P = 0.03 to 0.002, using different methods) approximately 15 cM centromeric of the original localisation.
Subject(s)
Genetic Linkage/genetics , Chromosome Mapping , Diabetes Mellitus, Type 1/genetics , Female , Genetic Markers , Genetic Predisposition to Disease , Humans , Male , Models, Theoretical , Siblings
ABSTRACT
OBJECTIVES: Associations between calcium-sensing receptor (CaSR) polymorphisms and serum calcium, PTH and bone mineral density (BMD) have been reported by six studies. However, three other studies have failed to detect such associations. We therefore further investigated three CaSR coding region polymorphisms (Ala986Ser, Arg990Gly and Gln1011Glu) for associations with indices of calcium homeostasis and BMD and for alterations in receptor function. PATIENTS AND DESIGN: One hundred and ten adult, Caucasian, female, dizygotic twin pairs were investigated for associations between the three CaSR polymorphisms and serum calcium, albumin, PTH, 25-hydroxyvitamin D(3) (25OHD(3)), 1,25-dihydroxyvitamin D(3) [1,25(OH)(2)D(3)], urinary calcium excretion and BMD. Each polymorphic CaSR was also transfected into HEK293 cells and functionally evaluated. RESULTS: There was a lack of association between each of these three CaSR polymorphisms and serum calcium corrected for albumin, PTH, 25OHD(3), 1,25(OH)(2)D(3), urinary calcium excretion or BMD at the hip, forearm and lumbar spine. These findings were supported by a lack of functional differences in the dose-response curves of the CaSR variants, with the EC(50) values (mean +/- SEM) of the wild-type (Ala986/Arg990/Gln1011), Ser986, Gly990 and Glu1011 CaSR variants being 2.74 +/- 0.29 mM, 3.09 +/- 0.34 mM (P > 0.4), 2.99 +/- 0.23 mM (P > 0.4) and 2.96 +/- 0.30 mM (P > 0.5), respectively. CONCLUSIONS: Our study, which was sufficiently powered to detect effects that would explain up to 5%, but not less than 1%, of the variance, has revealed that the three CaSR polymorphisms of the coding region have no major influence on indices of calcium homeostasis in this female population, and that they do not alter receptor function.
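The EC(50) comparisons above can be sanity-checked from the reported means and SEMs alone. The sketch below compares each variant with the wild-type receptor using a normal approximation for the difference of two independent means. This is an assumption made for illustration: the abstract does not state which test produced the reported P values or how many replicates were run, and the function name is hypothetical.

```python
import math


def compare_means(mean_a, sem_a, mean_b, sem_b):
    """z-score and two-sided p-value for the difference of two independent means."""
    difference = mean_b - mean_a
    # Standard error of a difference of independent estimates.
    standard_error = math.hypot(sem_a, sem_b)
    z = difference / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# EC(50) mean and SEM values as quoted in the abstract.
wild_type = (2.74, 0.29)
variants = {
    "Ser986": (3.09, 0.34),
    "Gly990": (2.99, 0.23),
    "Glu1011": (2.96, 0.30),
}

results = {name: compare_means(*wild_type, *values) for name, values in variants.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, approximate two-sided p = {p:.2f}")
```

Under this approximation, all three comparisons give p-values well above conventional significance thresholds, which agrees with the reported bounds (P > 0.4, P > 0.4 and P > 0.5).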
Subject(s)
Bone Density/physiology , Calcium/metabolism , Polymorphism, Genetic , Receptors, Calcium-Sensing/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Biomarkers/blood , Calcitriol/blood , Calcium/blood , Calcium/urine , Cell Line , Cluster Analysis , Female , Gene Expression , Genotype , Homeostasis , Humans , Middle Aged , Parathyroid Hormone/blood , Receptors, Calcium-Sensing/metabolism , Regression Analysis , Serum Albumin/analysis , Twins, Dizygotic , Vitamin D/analogs & derivatives , Vitamin D/blood
ABSTRACT
Existing standard methods of linkage analysis for quantitative phenotypes rest on the assumptions of either ordinary least squares (Haseman and Elston [1972] Behav. Genet. 2:3-19; Sham and Purcell [2001] Am. J. Hum. Genet. 68:1527-1532) or phenotypic normality (Almasy and Blangero [1998] Am. J. Hum. Genet. 68:1198-1199; Kruglyak and Lander [1995] Am. J. Hum. Genet. 57:439-454). The limitations of both these methods lie in the specification of the error distribution in the respective regression analyses. In ordinary least squares regression, the residual distribution is misspecified as being independent of the mean level. Using variance components and assuming phenotypic normality, the dependency on the mean level is correctly specified, but the remaining residual coefficient of variation is constrained a priori. Here it is shown that these limitations can be addressed (for a sample of unselected sib-pairs) using a generalized linear model based on the gamma distribution, which can be readily implemented in any standard statistical software package. The generalized linear model approach can emulate variance components when phenotypic multivariate normality is assumed (Almasy and Blangero [1998] Am. J. Hum. Genet. 68:1198-1211) and is therefore more powerful than ordinary least squares, but has the added advantage of being robust to deviations from multivariate normality and provides (often overlooked) model-fit diagnostics for linkage analysis.
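To give a feel for what a gamma-based generalized linear model for sib-pair data might look like, the sketch below fits a gamma GLM by iteratively reweighted least squares. Several choices here are assumptions made purely for illustration and are not taken from the abstract: the response is the squared sib-pair trait difference (in the spirit of Haseman-Elston regression), the single predictor is the proportion of alleles shared identical by descent, the link is logarithmic, and the data are noise-free toy values. The function name is hypothetical.

```python
import math


def fit_gamma_glm_log_link(x, y, iterations=25):
    """Fit E[y] = exp(a + b*x) for a gamma GLM with log link by IRLS."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a, b = math.log(sum(y) / n), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        # Working response z = eta + (y - mu) / mu. With gamma variance
        # (proportional to mu^2) and a log link, the IRLS weights are all
        # equal, so each step reduces to ordinary least squares on z.
        z = []
        for xi, yi in zip(x, y):
            eta = a + b * xi
            mu = math.exp(eta)
            z.append(eta + (yi - mu) / mu)
        z_bar = sum(z) / n
        b = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
        a = z_bar - b * x_bar
    return a, b


# Hypothetical, noise-free toy data: IBD sharing proportion per sib-pair and
# squared trait differences generated from exp(1.0 - 0.8 * ibd).
ibd = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
sq_diff = [math.exp(1.0 - 0.8 * p) for p in ibd]
intercept, slope = fit_gamma_glm_log_link(ibd, sq_diff)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this toy setup a negative slope means that pairs sharing more alleles identical by descent have smaller squared differences, which is the pattern expected under linkage. In practice a statistical package's GLM routine with a gamma family would replace the hand-written loop and also supply the model-fit diagnostics the abstract mentions.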