1.
Hum Mol Genet ; 33(8): 687-697, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38263910

ABSTRACT

BACKGROUND: Expansion of genome-wide association studies across population groups is needed to improve our understanding of shared and unique genetic contributions to breast cancer. We performed association and replication studies guided by a priori linkage findings from African ancestry (AA) relative pairs. METHODS: We performed fixed-effect inverse-variance weighted meta-analysis under three significant AA breast cancer linkage peaks (3q26-27, 12q22-23, and 16q21-22) in 9241 AA cases and 10 193 AA controls. We examined associations with overall breast cancer as well as estrogen receptor (ER)-positive and negative subtypes (193,132 SNPs). We replicated associations in the African-ancestry Breast Cancer Genetic Consortium (AABCG). RESULTS: In AA women, we identified two associations on chr12q for overall breast cancer (rs1420647, OR = 1.15, p = 2.50×10⁻⁶; rs12322371, OR = 1.14, p = 3.15×10⁻⁶), and one for ER-negative breast cancer (rs77006600, OR = 1.67, p = 3.51×10⁻⁶). On chr3, we identified two associations with ER-negative disease (rs184090918, OR = 3.70, p = 1.23×10⁻⁵; rs76959804, OR = 3.57, p = 1.77×10⁻⁵) and on chr16q we identified an association with ER-negative disease (rs34147411, OR = 1.62, p = 8.82×10⁻⁶). In the replication study, the chr3 associations were significant and effect sizes were larger (rs184090918, OR: 6.66, 95% CI: 1.43, 31.01; rs76959804, OR: 5.24, 95% CI: 1.70, 16.16). CONCLUSION: The two chr3 SNPs are upstream to open chromatin ENSR00000710716, a regulatory feature that is actively regulated in mammary tissues, providing evidence that variants in this chr3 region may have a regulatory role in our target organ. Our study provides support for breast cancer variant discovery using prioritization based on linkage evidence.


Subject(s)
Black People , Breast Neoplasms , Genetic Predisposition to Disease , Female , Humans , Black People/genetics , Breast Neoplasms/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide
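For readers unfamiliar with the fixed-effect inverse-variance weighted meta-analysis named in the METHODS of record 1, the following is a minimal sketch of the standard textbook estimator, not the authors' pipeline. The function name `ivw_fixed_effect` and the input values are hypothetical and chosen only for illustration; inputs are per-study log odds ratios and their standard errors.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Pool per-study effect estimates (e.g. log odds ratios) with
    fixed-effect inverse-variance weights.

    Returns the pooled estimate, its standard error, the z statistic,
    and a two-sided normal p-value.
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p

# Hypothetical log odds ratios and standard errors from two studies.
example_betas = [0.10, 0.30]
example_ses = [0.10, 0.10]
pooled, pooled_se, z, p = ivw_fixed_effect(example_betas, example_ses)
print(f"pooled OR = {math.exp(pooled):.3f}, SE(log OR) = {pooled_se:.4f}, p = {p:.3g}")
```

With equal standard errors the pooled estimate is the plain average of the inputs; studies with smaller standard errors would pull the estimate toward their own value.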
2.
Annu Rev Genomics Hum Genet ; 21: 15-36, 2020 08 31.
Article in English | MEDLINE | ID: mdl-31935127

ABSTRACT

I briefly describe my early life and how, through a series of serendipitous events, I became a genetic epidemiologist. I discuss how the Elston-Stewart algorithm was discovered and its contribution to segregation, linkage, and association analysis. New linkage findings and paternity testing resulted from having a genotyping lab. The different meanings of interaction, statistical and biological, are clarified. The computer package S.A.G.E. (Statistical Analysis for Genetic Epidemiology), based on extensive method development over two decades, was conceived in 1986, flourished for 20 years, and is now freely available for use and further development. Finally, I describe methods to estimate and test hypotheses about familial correlations, and point out that the liability model often used to estimate disease heritability estimates the heritability of that liability, rather than of the disease itself, and so can be highly dependent on the assumed distribution of that liability.


Subject(s)
Algorithms , Genetic Linkage , Models, Genetic , Molecular Epidemiology , History, 20th Century , History, 21st Century , Humans
3.
Genet Epidemiol ; 42(8): 849-853, 2018 12.
Article in English | MEDLINE | ID: mdl-30298598

ABSTRACT

This is the 100th anniversary of Fisher's 1918 paper "The correlation between relatives on the supposition of Mendelian inheritance" (Transactions of the Royal Society of Edinburgh 1918, 52 pp 399-433). Fisher's work has had a strong influence on today's genetic epidemiology and this brief autobiographical note highlights a few of the ways his influence on me has affected the field. Although I once took a course of lectures from Fisher, it was mainly his writings that influenced my statistical thinking. Not only did the concept of maximum likelihood appeal to me, but also the concepts of interclass and intraclass correlations, discriminant analysis, and transforming semiquantitative scores to minimize interactions, all topics I first learned about from the 11th edition of his book on Statistical Methods for Research Workers. This, together with a few serendipitous events that shaped my career, had a large influence on me and hence also on the field of genetic epidemiology.


Subject(s)
Molecular Epidemiology , History, 20th Century , History, 21st Century , Humans , Models, Genetic , Pedigree , Probability
4.
Genet Epidemiol ; 42(8): 812-825, 2018 12.
Article in English | MEDLINE | ID: mdl-30238496

ABSTRACT

Linear regression is a standard approach to identify genetic variants associated with continuous traits in genome-wide association studies (GWAS). In a standard epidemiology study, linear regression is often performed with adjustment for covariates to estimate the independent effect of a predictor variable or to improve statistical power by reducing residual variability. However, it is problematic to adjust for heritable covariates in genetic association analysis. Here, we propose a new method that utilizes summary statistics of the covariate from additional samples for reducing the residual variability and hence improves statistical power. Our simulation study showed that the proposed methodology can maintain a good control of Type I error and can achieve much higher power than a simple linear regression. The method is illustrated by an application to the GWAS results from the Genetic Investigation of Anthropometric Traits consortium.


Subject(s)
Genome-Wide Association Study , Statistics as Topic , Computer Simulation , Humans , Linear Models , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics , Waist Circumference , Waist-Hip Ratio
5.
Bioinformatics ; 34(16): 2851-2853, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29596615

ABSTRACT

Motivation: Despite the need for separate tools to analyze family-based data, there are only a handful of tools optimized for family-based big data compared to the number of tools available for analyzing population-based data. Results: ONETOOL implements the properties of well-known existing family data analysis tools and recently developed methods in a computationally efficient manner, and so is suitable for analyzing the vast amount of variant data available from sequencing family members, providing a rich choice of analysis methods for big data on families. Availability and implementation: ONETOOL is freely available from http://healthstat.snu.ac.kr/software/onetool/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Big Data , Databases, Factual , Software
6.
Bioinformatics ; 34(4): 635-642, 2018 02 15.
Article in English | MEDLINE | ID: mdl-28968884

ABSTRACT

Motivation: Pedigree analysis is a longstanding and powerful approach to gain insight into the underlying genetic factors in human health, but identifying, recruiting and genotyping families can be difficult, time consuming and costly. Development of high-throughput methods to identify families and foster downstream analyses is necessary. Results: This paper describes simple methods that allowed us to identify 173 368 family pedigrees with high probability using basic demographic data available in most electronic health records (EHRs). We further developed and validated a novel statistical method that uses EHR data to identify families more likely to have a major genetic component to their disease risk. Lastly, we showed that incorporating EHR-linked family data into genetic association testing may provide added power for genetic mapping without additional recruitment or genotyping. The totality of these results suggests that EHR-linked families can enable classical genetic analyses in a high-throughput manner. Availability and implementation: Pseudocode is provided as supplementary information. Contact: HEBBRING.SCOTT@marshfieldresearch.org. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Electronic Health Records , Genetic Research , Genome, Human , Pedigree , Algorithms , Chromosome Mapping , Databases, Factual , Female , Genetic Association Studies , Genetic Diseases, Inborn , Humans , Male , Middle Aged
7.
Proc Natl Acad Sci U S A ; 112(4): 1149-54, 2015 Jan 27.
Article in English | MEDLINE | ID: mdl-25583493

ABSTRACT

We used whole-exome and targeted sequencing to characterize somatic mutations in 103 colorectal cancers (CRC) from African Americans, identifying 20 new genes as significantly mutated in CRC. Resequencing 129 Caucasian derived CRCs confirmed a 15-gene set as a preferential target for mutations in African American CRCs. Two predominant genes, ephrin type A receptor 6 (EPHA6) and folliculin (FLCN), with mutations exclusive to African American CRCs, are by genetic and biological criteria highly likely African American CRC driver genes. These previously unsuspected differences in the mutational landscapes of CRCs arising among individuals of different ethnicities have potential to impact on broader disparities in cancer behaviors.


Subject(s)
Black or African American/genetics , Colonic Neoplasms/ethnology , Colonic Neoplasms/genetics , Mutation , Proto-Oncogene Proteins/genetics , Receptor, EphA6/genetics , Tumor Suppressor Proteins/genetics , Exome , Female , Genome-Wide Association Study , Humans , Male , White People/genetics
8.
PLoS Genet ; 11(8): e1005352, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26305897

ABSTRACT

Diabetic kidney disease (DKD) is the most common etiology of chronic kidney disease (CKD) in the industrialized world and accounts for much of the excess mortality in patients with diabetes mellitus. Approximately 45% of U.S. patients with incident end-stage kidney disease (ESKD) have DKD. Independent of glycemic control, DKD aggregates in families and has higher incidence rates in African, Mexican, and American Indian ancestral groups relative to European populations. The Family Investigation of Nephropathy and Diabetes (FIND) performed a genome-wide association study (GWAS) contrasting 6,197 unrelated individuals with advanced DKD with healthy and diabetic individuals lacking nephropathy of European American, African American, Mexican American, or American Indian ancestry. A large-scale replication and trans-ethnic meta-analysis included 7,539 additional European American, African American and American Indian DKD cases and non-nephropathy controls. Within ethnic group meta-analysis of discovery GWAS and replication set results identified genome-wide significant evidence for association between DKD and rs12523822 on chromosome 6q25.2 in American Indians (P = 5.74×10⁻⁹). The strongest signal of association in the trans-ethnic meta-analysis was with a SNP in strong linkage disequilibrium with rs12523822 (rs955333; P = 1.31×10⁻⁸), with directionally consistent results across ethnic groups. These 6q25.2 SNPs are located between the SCAF8 and CNKSR3 genes, a region with DKD relevant changes in gene expression and an eQTL with IPCEF1, a gene co-translated with CNKSR3. Several other SNPs demonstrated suggestive evidence of association with DKD, within and across populations. These data identify a novel DKD susceptibility locus with consistent directions of effect across diverse ancestral groups and provide insight into the genetic architecture of DKD.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Diabetic Nephropathies/genetics , Black or African American/genetics , Diabetes Mellitus, Type 2/complications , Diabetic Nephropathies/ethnology , Genetic Predisposition to Disease , Genome-Wide Association Study , Hispanic or Latino/genetics , Humans , Indians, North American/genetics , RNA-Binding Proteins/genetics , United States , White People/genetics
9.
BMC Bioinformatics ; 18(1): 217, 2017 Apr 18.
Article in English | MEDLINE | ID: mdl-28420343

ABSTRACT

BACKGROUND: Copy number variation (CNV) is known to play an important role in the genetics of complex diseases and several methods have been proposed to detect association of CNV with phenotypes of interest. Statistical methods for CNV association analysis can be categorized into two different strategies. First, the copy number is estimated by maximum likelihood and association of the expected copy number with the phenotype is tested. Second, the observed probe intensity measurements can be directly used to detect association of CNV with the phenotypes of interest. RESULTS: For each strategy we provide a statistic that can be applied to extended families. The computational efficiency of the proposed methods enables genome-wide association analysis and we show with simulation studies that the proposed methods outperform other existing approaches. In particular, we found that the first strategy is always more efficient than the second strategy, whether or not copy numbers for each individual are well identified. With the proposed methods, we performed genome-wide CNV association analyses of a hematological trait, hematocrit, on 521 Korean family samples. CONCLUSIONS: We found that statistical analysis with the expected copy number is more powerful than the statistic with the probe intensity measurements regardless of the accuracy of the estimation of copy numbers.


Subject(s)
DNA Copy Number Variations/genetics , Genome-Wide Association Study/methods , Hematocrit/methods , Humans
10.
Genet Epidemiol ; 40(6): 502-11, 2016 09.
Article in English | MEDLINE | ID: mdl-27312886

ABSTRACT

Family-based designs have been repeatedly shown to be powerful in detecting the significant rare variants associated with human diseases. Furthermore, human diseases are often defined by the outcomes of multiple phenotypes, and thus we expect multivariate family-based analyses may be very efficient in detecting associations with rare variants. However, few statistical methods implementing this strategy have been developed for family-based designs. In this report, we describe one such implementation: the multivariate family-based rare variant association tool (mFARVAT). mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes, and tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. Simulation results show that the proposed method is generally robust and efficient for various disease models, and we identify some promising candidate genes associated with chronic obstructive pulmonary disease. The software of mFARVAT is freely available at http://healthstat.snu.ac.kr/software/mfarvat/, implemented in C++ and supported on Linux and MS Windows.


Subject(s)
Genetic Variation , Models, Genetic , Computer Simulation , Genetic Association Studies , Humans , Likelihood Functions , Phenotype
11.
PLoS Genet ; 10(9): e1004641, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25233454

ABSTRACT

High blood pressure (BP) is the most common cardiovascular risk factor worldwide and a major contributor to heart disease and stroke. We previously discovered a BP-associated missense SNP (single nucleotide polymorphism), rs2272996, in the gene encoding vanin-1, a glycosylphosphatidylinositol (GPI)-anchored membrane pantetheinase. In the present study, we first replicated the association of rs2272996 and BP traits with a total sample size of nearly 30,000 individuals from the Continental Origins and Genetic Epidemiology Network (COGENT) of African Americans (P=0.01). This association was further validated using patient plasma samples; we observed that the N131S mutation is associated with significantly lower plasma vanin-1 protein levels. We observed that the N131S vanin-1 is subjected to rapid endoplasmic reticulum-associated degradation (ERAD) as the underlying mechanism for its reduction. Using HEK293 cells stably expressing vanin-1 variants, we showed that N131S vanin-1 was degraded significantly faster than wild type (WT) vanin-1. Consequently, there were only minimal quantities of variant vanin-1 present on the plasma membrane and greatly reduced pantetheinase activity. Application of MG-132, a proteasome inhibitor, resulted in accumulation of ubiquitinated variant protein. A further experiment demonstrated that atenolol and diltiazem, two current drugs for treating hypertension, reduce the vanin-1 protein level. Our study provides strong biological evidence for the association of the identified SNP with BP and suggests that vanin-1 misfolding and degradation are the underlying molecular mechanism.


Subject(s)
Amidohydrolases/genetics , Amidohydrolases/metabolism , Blood Pressure/genetics , Endoplasmic Reticulum-Associated Degradation/genetics , Genetic Variation , Alleles , Amidohydrolases/blood , Antihypertensive Agents/pharmacology , Antihypertensive Agents/therapeutic use , Blood Pressure/drug effects , Cohort Studies , Enzyme Activation , GPI-Linked Proteins/blood , GPI-Linked Proteins/genetics , GPI-Linked Proteins/metabolism , Genetic Association Studies , Genotype , Humans , Hypertension/drug therapy , Hypertension/epidemiology , Hypertension/genetics , Mutation , Phenotype , Polymorphism, Single Nucleotide
12.
BMC Genomics ; 17: 325, 2016 05 04.
Article in English | MEDLINE | ID: mdl-27142425

ABSTRACT

BACKGROUND: The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), European Americans, Mexican Americans, African Americans, and American Indians, are part of a genome-wide association study in which population structure might be particularly important. We therefore decided to study in detail one component of this, individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one of the three contrasts among ancestral populations: Europeans (HAPMAP, CEU), Africans (HAPMAP, YRI and LWK), and Native Americans (full heritage Pima Indians). We estimate IGA and present an algorithm for their standard errors, compare IGA to principal components, emphasize the importance of balancing information in the ancestry informative markers (AIMs), and test the association of IGA with diabetic nephropathy in the combined sample. RESULTS: A fixed parental allele maximum likelihood algorithm was applied to the FIND to estimate IGA in four samples: 869 American Indians; 1385 African Americans; 1451 Mexican Americans; and 826 European Americans. When the information in the AIMs is unbalanced, the estimates are incorrect with large error. Individual genetic admixture is highly correlated with principal components for capturing population structure. It takes ~700 SNPs to reduce the average standard error of individual admixture below 0.01. When the samples are combined, the resulting population structure creates associations between IGA and diabetic nephropathy. CONCLUSIONS: The identified set of AIMs, which include American Indian parental allele frequencies, may be particularly useful for estimating genetic admixture in populations from the Americas. Failure to balance information in maximum likelihood, poly-ancestry models creates biased estimates of individual admixture with large error. This also occurs when estimating IGA using the Bayesian clustering method as implemented in the program STRUCTURE. Odds ratios for the associations of IGA with disease are consistent with what is known about the incidence and prevalence of diabetic nephropathy in these populations.


Subject(s)
Black or African American/genetics , Diabetic Nephropathies/genetics , Indians, North American/genetics , Mexican Americans/genetics , Polymorphism, Single Nucleotide , White People/genetics , Algorithms , Chromosome Mapping , Diabetic Nephropathies/ethnology , Genetic Markers/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Likelihood Functions , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Principal Component Analysis , United States/ethnology
13.
Hum Genet ; 135(10): 1175-9, 2016 10.
Article in English | MEDLINE | ID: mdl-27393575

ABSTRACT

Genes of the immune system are relevant to the etiology of schizophrenia. However, to our knowledge, no large-scale studies, using molecular methods, have been undertaken to investigate the role of highly polymorphic immunoglobulin GM (γ marker) genes in this disorder. In this investigation, we aimed to determine whether particular GM genotypes were associated with susceptibility to schizophrenia. Using a matched case-control study design, we analyzed DNA samples from 798 subjects (398 patients with schizophrenia and 400 controls) obtained from the U.S. National Institute of Mental Health Repository. GM alleles were determined by the TaqMan® genotyping assay. The GM 3/3; 23-/23- genotype was highly significantly associated with susceptibility to schizophrenia (p = 0.0002). Subjects with this genotype were over three times (OR 3.4; 95% CI 1.7-6.7) as likely to develop schizophrenia as those without this genotype. Our results show that immunoglobulin GM genes are risk factors for the development of schizophrenia. Since GM alleles have been implicated in gluten sensitivity and in immunity to neurotropic viruses associated with cognitive impairment, the results presented here may help unify these two disparate areas of pathology affected in this disorder.


Subject(s)
Genetic Predisposition to Disease , Immunoglobulin G/genetics , Immunoglobulin Gm Allotypes/genetics , Schizophrenia/genetics , Adult , Alleles , Cognitive Dysfunction/genetics , Cognitive Dysfunction/pathology , Female , Genetic Association Studies , Genotype , Glutens/metabolism , Humans , Male , Middle Aged , Risk Factors , Schizophrenia/pathology
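As background for the odds ratio and 95% CI reported in record 13, here is a minimal sketch of how an unadjusted odds ratio and its Wald confidence interval are computed from a 2x2 table. The counts below are hypothetical and are not taken from the study. The function name `odds_ratio_ci` is also an assumption. A matched case-control design such as the one described would normally call for a conditional analysis, so this simple formula is only an approximation of the general idea.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    z = 1.96 gives an approximate 95% interval. All four cell
    counts must be positive.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's formula).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: genotype carriers vs. non-carriers among cases and controls.
or_hat, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_hat:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

The interval is symmetric on the log scale, which is why reported intervals such as 1.7-6.7 look lopsided around the point estimate on the original scale.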
15.
Stat Med ; 35(16): 2802-14, 2016 07 20.
Article in English | MEDLINE | ID: mdl-26833871

ABSTRACT

Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.


Subject(s)
Genetic Markers , Models, Genetic , Disease/genetics , Environment , Genetic Variation , Genome-Wide Association Study , Humans , Phenotype
16.
Genet Epidemiol ; 38(3): 242-53, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24482034

ABSTRACT

With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high-dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity-based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small-scale sequencing data without need for small-sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.


Subject(s)
Genetic Association Studies/methods , Sequence Analysis, DNA , Angiopoietin-Like Protein 3 , Angiopoietin-Like Protein 4 , Angiopoietin-like Proteins , Angiopoietins/genetics , Disease , Genetic Variation/genetics , Heart , Humans , Models, Genetic , Phenotype , Texas , Triglycerides/blood
17.
Genet Epidemiol ; 36(5): 480-7, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22648939

ABSTRACT

Current genome-wide association studies still heavily rely on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to become widely adopted and their efficacy in real data analysis is often questioned. Based on conducting extensive simulations, here we endeavor to provide more insights into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal the power advantage as well as disadvantage of the two- vs. the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test has relatively better performance than single-marker tests when the correlation of the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet, the average power difference was small whenever the one-marker test is more powerful, while there were many situations where the two-marker test can be much more powerful. These findings can be useful to guide analyses of future studies.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Alleles , Chromosome Mapping/methods , Gene Frequency , Genetic Markers/genetics , Genotype , HapMap Project , Haplotypes , Humans , Linkage Disequilibrium , Models, Statistical , Polymorphism, Single Nucleotide , Reproducibility of Results
18.
Genet Epidemiol ; 36(6): 583-93, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22760990

ABSTRACT

The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses' Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03×10⁻¹¹). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29×10⁻⁵). The nominal significance of this same association reached 4.01×10⁻⁶ in the NHS/HPFS.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Algorithms , Genome-Wide Association Study/statistics & numerical data , Humans , Likelihood Functions , Models, Genetic , Models, Statistical
19.
BMC Genet ; 14: 17, 2013 Mar 04.
Article in English | MEDLINE | ID: mdl-23497289

ABSTRACT

BACKGROUND: Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples. RESULTS: The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates. CONCLUSIONS: We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package "fassoc", which provides a useful tool for data analysis and exploration.


Subject(s)
Genetic Association Studies , Models, Genetic , Software , Computer Simulation , Family , Female , Genotype , Humans , Male , Polymorphism, Single Nucleotide
20.
Stat Med ; 32(7): 1164-90, 2013 Mar 30.
Article in English | MEDLINE | ID: mdl-23018341

ABSTRACT

This paper is concerned with evaluating whether an interaction between two sets of risk factors for a binary trait is removable and, when it is removable, fitting a parsimonious additive model using a suitable link function to estimate the disease odds (on the natural logarithm scale). Statisticians define the term 'interaction' as a departure from additivity in a linear model on a specific scale on which the data are measured. Certain interactions may be eliminated via a transformation of the outcome such that the relationship between the risk factors and the outcome is additive on the transformed scale. Such interactions are known as removable interactions. We develop a novel test statistic for detecting the presence of a removable interaction in case-control studies. We consider the Guerrero and Johnson family of transformations and show that this family constitutes an appropriate link function for fitting an additive model when an interaction is removable. We use simulation studies to examine the type I error and power of the proposed test and to show that, when an interaction is removable, an additive model based on the Guerrero and Johnson link function leads to more precise estimates of the disease odds parameters and a better fit. We illustrate the proposed test and use of the transformation by using case-control data from three published studies. Finally, we indicate how one can check that, after transformation, no further interaction is significant.


Subject(s)
Biostatistics/methods , Disease/etiology , Adenoma/enzymology , Adenoma/etiology , Analysis of Variance , Aromatase/genetics , Arylamine N-Acetyltransferase/metabolism , Case-Control Studies , Colorectal Neoplasms/etiology , Disease/genetics , Endometrial Neoplasms/enzymology , Endometrial Neoplasms/etiology , Female , Humans , Linear Models , Logistic Models , Models, Statistical , Risk Factors , Smoking/adverse effects , Tea , Urinary Bladder Neoplasms/enzymology , Urinary Bladder Neoplasms/etiology
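The notion of a removable interaction in record 20 can be illustrated with a small numeric sketch. It uses a plain logarithm as the transformation, not the Guerrero and Johnson family studied in the paper, and the disease odds in the 2x2 layout are hypothetical values constructed so that the two risk factors act multiplicatively. The function name `interaction_contrast` is likewise an assumption.

```python
import math

def interaction_contrast(cells, transform=lambda value: value):
    """Interaction contrast for two binary risk factors.

    `cells` maps (factor1, factor2) in {0, 1}^2 to the outcome measure.
    A contrast of zero means the two factors are additive on the
    scale defined by `transform`.
    """
    return (transform(cells[(1, 1)]) - transform(cells[(1, 0)])
            - transform(cells[(0, 1)]) + transform(cells[(0, 0)]))

# Hypothetical disease odds: factor 1 doubles the odds, factor 2 triples them.
disease_odds = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 1): 6.0}

raw_contrast = interaction_contrast(disease_odds)            # raw odds scale
log_contrast = interaction_contrast(disease_odds, math.log)  # log odds scale
print(f"contrast on odds scale: {raw_contrast:.3f}")
print(f"contrast on log-odds scale: {log_contrast:.3f}")
```

On the raw odds scale the contrast is nonzero, so a linear model on that scale would need an interaction term; after the log transformation the contrast vanishes, which is the sense in which such an interaction is removable.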