Results 1 - 20 of 22
1.
Stat Methods Med Res ; 29(1): 44-56, 2020 01.
Article in English | MEDLINE | ID: mdl-30612522

ABSTRACT

Genetic association studies using high-throughput genotyping and sequencing technologies have identified a large number of genetic variants associated with complex human diseases. These findings have provided an unprecedented opportunity to identify individuals in the population at high risk for disease who carry causal genetic mutations and hold great promise for early intervention and individualized medicine. While interest is high in building risk prediction models based on recent genetic findings, it is crucial to have appropriate statistical measurements to assess the performance of a genetic risk prediction model. Predictiveness curves were recently proposed as a graphic tool for evaluating a risk prediction model on the basis of a single continuous biomarker. The curve evaluates a risk prediction model for classification performance as well as its usefulness when applied to a population. In this article, we extend the predictiveness curve to measure the collective contribution of multiple genetic variants. We further propose a nonparametric, U-statistics-based measurement, referred to as the U-Index, to quantify the performance of a multi-locus predictiveness curve. In particular, a global U-Index and a partial U-Index can be used in the general population and a subpopulation of particular clinical interest, respectively. Through simulation studies, we demonstrate that the proposed U-Index has advantages over several existing summary statistics under various disease models. We also show that the partial U-Index can have its own uniqueness when rare variants have a substantial contribution to disease risk. Finally, we use the proposed predictiveness curve and its corresponding U-Index to evaluate the performance of a genetic risk prediction model for nicotine dependence.
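As a rough illustration of the predictiveness curve this abstract builds on, the sketch below sorts predicted risks and pairs each with its population quantile, so the curve shows what fraction of the population falls at or below a given risk. The function names and the toy risk values are assumptions made for illustration; the multi-locus extension and the U-Index proposed in the article are not reproduced here.

```python
import numpy as np

def predictiveness_curve(risks):
    """Return (v, r): population quantiles v and the sorted predicted risks r."""
    r = np.sort(np.asarray(risks, dtype=float))
    n = r.size
    # v[k] is the fraction of the population whose risk is at most r[k].
    v = np.arange(1, n + 1) / n
    return v, r

def fraction_above(risks, threshold):
    """Fraction of the population whose predicted risk exceeds a threshold."""
    return float(np.mean(np.asarray(risks, dtype=float) > threshold))

# Hypothetical predicted risks for six individuals.
risks = [0.40, 0.05, 0.10, 0.80, 0.20, 0.10]
v, r = predictiveness_curve(risks)
high_risk_share = fraction_above(risks, 0.3)
```

Plotting `r` against `v` gives the curve; a steeper curve means the model separates low-risk and high-risk individuals more sharply.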


Subjects
Genetic Predisposition to Disease , Genetic Models , Statistical Models , Tobacco Use Disorder/genetics , Biomarkers/analysis , Genetic Variation , Genome-Wide Association Study , Genotype , Humans , Predictive Value of Tests , Risk Assessment
2.
Ethn Health ; 24(7): 754-766, 2019 10.
Article in English | MEDLINE | ID: mdl-28922931

ABSTRACT

Background: The study of physical activity in cancer survivors has been limited to one cause, one effect relationships. In this exploratory study, we used recursive partitioning to examine multiple correlates that influence physical activity compliance rates in cancer survivors. Methods: African American breast cancer survivors (N = 267, Mean age = 54 years) participated in an online survey that examined correlates of physical activity. Recursive partitioning (RP) was used to examine complex and nonlinear associations between sociodemographic, medical, cancer-related, theoretical, and quality of life indicators. Results: Recursive partitioning revealed five distinct groups. Compliance with physical activity guidelines was highest (82% met guidelines) among survivors who reported higher mean action planning scores (P < 0.001) and lower mean barriers to physical activity (P = 0.035). Compliance with physical activity guidelines was lowest (9% met guidelines) among survivors who reported lower mean action and coping (P = 0.002) planning scores. Similarly, lower mean action planning scores and poor advanced lower functioning (P = 0.034), even in the context of higher coping planning scores, resulted in low physical activity compliance rates (13% met guidelines). Subsequent analyses revealed that body mass index (P = 0.019) and number of comorbidities (P = 0.003) were lowest in those with the highest compliance rates. Conclusion: Our findings support the notion that multiple factors determine physical activity compliance rates in African American breast cancer survivors. Interventions that encourage action and coping planning and reduce barriers in the context of addressing function limitations may increase physical activity compliance rates.


Subjects
Breast Neoplasms/psychology , Cancer Survivors/psychology , Decision Trees , Exercise/psychology , Patient Compliance , Black or African American/psychology , Breast Neoplasms/ethnology , Female , Humans , Middle Aged , Patient Compliance/ethnology , Patient Compliance/psychology , Quality of Life
3.
J Zhejiang Univ Sci B ; 19(12): 935-947, 2018.
Article in English | MEDLINE | ID: mdl-30507077

ABSTRACT

OBJECTIVE: As one of the most popular designs used in genetic research, family-based design has been well recognized for its advantages, such as robustness against population stratification and admixture. With vast amounts of genetic data collected from family-based studies, there is a great interest in studying the role of genetic markers from the aspect of risk prediction. This study aims to develop a new statistical approach for family-based risk prediction analysis with an improved prediction accuracy compared with existing methods based on family history. METHODS: In this study, we propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to consider correlations among family samples, and uses a computationally efficient tree-assembling procedure for variable selection and model building. RESULTS: Through simulations, Fam-ELR shows its robustness in various underlying disease models and pedigree structures, and attains better performance than two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset of conduct disorder, Fam-ELR demonstrates its ability to integrate potential risk predictors and interactions into the model for improved accuracy, especially on a genome-wide level. CONCLUSIONS: By comparing existing approaches, such as genetic risk-score approach, Fam-ELR has the capacity of incorporating genetic variants with small or moderate marginal effects and their interactions into an improved risk prediction model. Therefore, it is a robust and useful approach for high-dimensional family-based risk prediction, especially on complex disease with unknown or less known disease etiology.


Subjects
Conduct Disorder/genetics , Genetic Predisposition to Disease , Human Genome , Genomics , Area Under Curve , Computer Simulation , Conduct Disorder/physiopathology , Family Health , Female , Genetic Markers , Genetic Variation , Genome-Wide Association Study , Humans , Likelihood Functions , Male , Genetic Models , Odds Ratio , Pedigree , ROC Curve , Reproducibility of Results , Risk Factors
4.
Bioinformatics ; 33(13): 1963-1971, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28334117

ABSTRACT

MOTIVATION: Second generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be univariate disease status, multivariate responses and even high-dimensional outcomes. Considering the genotype and phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. RESULTS: Here we proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on the application of the test to sequencing association studies. Based on theoretical analysis, we proposed to use Laplacian kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU did have advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan of Alzheimer's Disease Neuroimaging Initiative data, identifying three genes, APOE, APOC1 and TOMM40, associated with imaging phenotype. AVAILABILITY AND IMPLEMENTATION: We developed a C++ package for analysis of WGS data using GSU. The source code can be downloaded at https://github.com/changshuaiwei/gsu. CONTACT: weichangshuai@gmail.com; qlu@epi.msu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
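The general idea of a similarity-based U statistic with a Laplacian kernel can be sketched as follows. This is a minimal illustration under stated assumptions, not the GSU implementation from the package linked above: the function names, the L1-distance form of the kernel, the `sigma` bandwidth, and the double-centering step are choices made here, and a permutation p-value stands in for the theoretical null distribution studied in the article.

```python
import numpy as np

def laplacian_kernel(x, sigma=1.0):
    """Pairwise similarity exp(-||x_i - x_j||_1 / sigma) between rows of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    return np.exp(-d / sigma)

def center(k):
    """Double-center a similarity matrix so rows and columns average to zero."""
    return k - k.mean(axis=0, keepdims=True) - k.mean(axis=1, keepdims=True) + k.mean()

def similarity_u(geno_sim, pheno_sim):
    """Average product of genotype and phenotype similarity over pairs i != j."""
    n = geno_sim.shape[0]
    prod = geno_sim * pheno_sim
    return (prod.sum() - np.trace(prod)) / (n * (n - 1))

def permutation_pvalue(geno, pheno, n_perm=199, seed=0):
    """Permutation test (an assumption here, swapped in for the asymptotic test)."""
    rng = np.random.default_rng(seed)
    kg = center(laplacian_kernel(geno))
    kp = center(laplacian_kernel(pheno))
    observed = similarity_u(kg, kp)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(kp.shape[0])
        # Shuffle phenotypes across individuals, keeping genotypes fixed.
        if similarity_u(kg, kp[np.ix_(idx, idx)]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: one variant coded 0/1/2 and a phenotype equal to it.
geno = np.array([0, 1, 2] * 10)
pheno = geno.astype(float)
u_obs, p_value = permutation_pvalue(geno, pheno)
```

Because both inputs enter only through similarity matrices, the same code accepts a multi-column genotype matrix or a multivariate phenotype without changes.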


Subjects
Genetic Association Studies/methods , Statistical Models , DNA Sequence Analysis/methods , Software , Humans , Genetic Models
5.
BMC Proc ; 10(Suppl 7): 125-129, 2016.
Article in English | MEDLINE | ID: mdl-27980623

ABSTRACT

BACKGROUND: With the advance of next-generation sequencing technologies, the study of rare variants in targeted genome regions or even the whole genome becomes feasible. Nevertheless, the massive amount of sequencing data brings great computational and statistical challenges for association analyses. Aside from sequencing variants, other high-throughput omic data (eg, gene expression data) also become available, and can be incorporated into association analysis for better modeling and power improvement. This motivates the need of developing computationally efficient and powerful approaches to model the joint associations of multilevel omic data with complex human diseases. METHODS: A similarity-based weighted U approach is used to model the joint effect of sequencing variants and gene expression. Using a Mexican American sample provided by Genetic Analysis Workshop 19 (GAW19), we performed a whole-genome joint association analysis of sequencing variants and gene expression with systolic (SBP) and diastolic blood pressure (DBP) and hypertension (HTN) phenotypes. RESULTS: The whole-genome joint association analysis was completed in 80 min on a high-performance personal computer with an i7 4700 CPU and 8 GB memory. Although no gene reached statistical significance after adjusting for multiple testing, some top-ranked genes attained a high significance level and may have biological plausibility to hypertension-related phenotypes. CONCLUSIONS: The weighted U approach is computationally efficient for high-dimensional data analysis, and is capable of integrating multiple levels of omic data into association analysis. Through a real data application, we demonstrate the potential benefit of using the new approach for joint association analysis of sequencing variants and gene expression.

6.
Stat Med ; 35(16): 2802-14, 2016 07 20.
Article in English | MEDLINE | ID: mdl-26833871

ABSTRACT

Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.


Subjects
Genetic Markers , Genetic Models , Disease/genetics , Environment , Genetic Variation , Genome-Wide Association Study , Humans , Phenotype
7.
Ann Hum Genet ; 80(1): 20-31, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26612412

ABSTRACT

Congenital heart defects (CHDs) develop through a complex interplay between genetic variants, epigenetic modifications, and maternal environmental exposures. Genetic studies of CHDs have commonly tested single genetic variants for association with CHDs. Less attention has been given to complex gene-by-gene and gene-by-environment interactions. In this study, we applied a recently developed likelihood-ratio Mann-Whitney (LRMW) method to detect joint actions among maternal variants, fetal variants, and maternal environmental exposures, allowing for high-order statistical interactions. All subjects are participants from the National Birth Defect Prevention Study, including 623 mother-offspring pairs with CHD-affected pregnancies and 875 mother-offspring pairs with unaffected pregnancies. Each individual has 872 single nucleotide polymorphisms encoding for critical enzymes in the homocysteine, folate, and trans-sulfuration pathways. By using the LRMW method, three variants (fetal rs625879, maternal rs2169650, and maternal rs8177441) were identified with a joint association to CHD risk (nominal P-value = 1.13e-07). These three variants are located within genes BHMT2, GSTP1, and GPX3, respectively. Further examination indicated that maternal SNP rs2169650 may interact with both fetal SNP rs625879 and maternal SNP rs8177441. Our findings suggest that the risk of CHD may be influenced by both the intragenerational interaction within the maternal genome and the intergenerational interaction between maternal and fetal genomes.


Subjects
Betaine-Homocysteine S-Methyltransferase/genetics , Glutathione Peroxidase/genetics , Glutathione S-Transferase pi/genetics , Congenital Heart Defects/genetics , Single Nucleotide Polymorphism , Adult , DNA Mutational Analysis , Female , Fetus , Human Genome , Genotype , Humans , Likelihood Functions , Maternal Exposure , Young Adult
8.
Curr Genomics ; 17(5): 403-415, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28479869

ABSTRACT

Many complex diseases, such as psychiatric and behavioral disorders, are commonly characterized through various measurements that reflect physical, behavioral and psychological aspects of diseases. While it remains a great challenge to find a unified measurement to characterize a disease, the available multiple phenotypes can be analyzed jointly in the genetic association study. Simultaneously testing these phenotypes has many advantages, including considering different aspects of the disease in the analysis, and utilizing correlated phenotypes to improve the power of detecting disease-associated variants. Furthermore, complex diseases are likely caused by the interplay of multiple genetic variants through complicated mechanisms. Considering gene-gene interactions in the joint association analysis of complex diseases could further increase our ability to discover genetic variants involved in complex disease pathways. In this article, we propose a stepwise U-test for joint association analysis of multiple loci and multiple phenotypes. Through simulations, we demonstrated that testing multiple phenotypes simultaneously could attain higher power than testing one single phenotype at a time, especially when there are shared genes contributing to multiple phenotypes. We also illustrated the proposed method with an application to Nicotine Dependence (ND), using datasets from the Study of Addiction: Genetics and Environment (SAGE). The joint analysis of three ND phenotypes identified two SNPs, rs10508649 and rs2491397, and reached a nominal P-value of 3.79e-13. The association was further replicated in two independent datasets with P-values of 2.37e-05 and 7.46e-05.

9.
Genet Epidemiol ; 38(8): 699-708, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25331574

ABSTRACT

With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol.


Subjects
High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Angiopoietin-Like Protein 4 , Angiopoietins/genetics , Genetic Association Studies , Genetic Variation , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , LDL Lipoproteins/genetics , Genetic Models , Phenotype , DNA Sequence Analysis/statistics & numerical data , Software
10.
BMC Genet ; 15: 101, 2014 Oct 16.
Article in English | MEDLINE | ID: mdl-25318532

ABSTRACT

BACKGROUND: While the importance of gene-gene interactions in human diseases has been well recognized, identifying them has been a great challenge, especially through association studies with millions of genetic markers and thousands of individuals. Computationally efficient and powerful tools are in great need for the identification of new gene-gene interactions in high-dimensional association studies. RESULT: We develop C++ software for genome-wide gene-gene interaction analyses (GWGGI). GWGGI utilizes tree-based algorithms to search a large number of genetic markers for a disease-associated joint association with the consideration of high-order interactions, and then uses non-parametric statistics to test the joint association. The package includes two functions, likelihood ratio Mann-Whitney (LRMW) and Tree Assembling Mann-Whitney (TAMW). We optimize the data storage and computational efficiency of the software, making it feasible to run the genome-wide analysis on a personal computer. The use of GWGGI was demonstrated by using two real data-sets with nearly 500 k genetic markers. CONCLUSION: Through the empirical study, we demonstrated that the genome-wide gene-gene interaction analysis using GWGGI could be accomplished within a reasonable time on a personal computer (i.e., ~3.5 hours for LRMW and ~10 hours for TAMW). We also showed that LRMW was suitable to detect interaction among a small number of genetic variants with moderate-to-strong marginal effect, while TAMW was useful to detect interaction among a larger number of low-marginal-effect genetic variants.


Subjects
Genome-Wide Association Study , Software , Algorithms , Computational Biology , Humans , Single Nucleotide Polymorphism , Programming Languages
11.
PLoS One ; 9(9): e105074, 2014.
Article in English | MEDLINE | ID: mdl-25244256

ABSTRACT

While progress has been made in identifying common genetic variants associated with human diseases, for most common complex diseases, the identified genetic variants only account for a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. The FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) it allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperforms two popular methods, SKAT and a previously proposed method based on functional linear models (FLM), especially when the sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying three methods (FANOVA, SKAT and FLM) to sequencing data from the Dallas Heart Study. While SKAT and FLM respectively detected ANGPTL4 and ANGPTL3 associated with obesity, FANOVA was able to identify both genes associated with obesity.


Subjects
Genetic Association Studies/statistics & numerical data , Genetic Variation , Analysis of Variance , Computer Simulation , Exome , Gene Frequency , Humans , Linear Models , Linkage Disequilibrium , Software
12.
Genet Epidemiol ; 38(3): 242-53, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24482034

ABSTRACT

With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high-dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity-based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small-scale sequencing data without need for small-sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.


Subjects
Genetic Association Studies/methods , DNA Sequence Analysis , Angiopoietin-Like Protein 3 , Angiopoietin-Like Protein 4 , Angiopoietin-Like Proteins , Angiopoietins/genetics , Disease , Genetic Variation/genetics , Heart , Humans , Genetic Models , Phenotype , Texas , Triglycerides/blood
13.
J Pediatr ; 164(1): 189-191.e1, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24209717

ABSTRACT

We studied gene expression in 9 sets of paired newborn blood spots stored for 8-10 years in either the frozen state or the unfrozen state. Fewer genes were expressed in unfrozen spots, but the average correlation coefficient for overall gene expression comparing the frozen and unfrozen state was 0.771 (95% CI, 0.700-0.828).


Subjects
Cryopreservation , Freezing , Gene Expression Profiling/methods , Neonatal Screening , Oligonucleotide Array Sequence Analysis/methods , Messenger RNA/blood , Blood Specimen Collection , Humans , Newborn Infant , Time Factors
14.
J Matern Fetal Neonatal Med ; 26(18): 1765-7, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23668672

ABSTRACT

OBJECTIVE: To examine the correlation in genes expressed in paired umbilical cord blood (UCB) and newborn blood (NB). METHOD: Total mRNA and mRNA of three gene sets (inflammatory, hypoxia, and thyroidal response) was assessed using microarray in UCB and NB spotted on Guthrie cards from 7 mother/infant pairs. RESULTS: The average gene expression correlation between paired UCB and NB samples was 0.941 when all expressed genes were considered, and 0.949 for three selected gene sets. CONCLUSION: The high correlation of UCB and NB gene expression suggest that either source may be useful for examining gene expression in the perinatal period.


Subjects
Fetal Blood/metabolism , Gene Expression , Newborn Infant/blood , Blood Specimen Collection/methods , Female , Gene Expression Profiling , Health , Humans , Oligonucleotide Array Sequence Analysis , Pregnancy
15.
Genet Epidemiol ; 37(1): 84-91, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23135745

ABSTRACT

Common complex diseases are likely influenced by the interplay of hundreds, or even thousands, of genetic variants. Converging evidence shows that genetic variants with low marginal effects (LMEs) play an important role in disease development. Despite their potential significance, discovering LME genetic variants and assessing their joint association on high-dimensional data (e.g., genome-wide data) remain a great challenge. To facilitate joint association analysis among a large ensemble of LME genetic variants, we proposed a computationally efficient and powerful approach, which we call Trees Assembling Mann-Whitney (TAMW). Through simulation studies and an empirical data application, we found that TAMW outperformed multifactor dimensionality reduction (MDR) and the likelihood ratio-based Mann-Whitney approach (LRMW) when the underlying complex disease involves multiple LME loci and their interactions. For instance, in a simulation with 20 interacting LME loci, TAMW attained a higher power (power = 0.931) than both MDR (power = 0.599) and LRMW (power = 0.704). In an empirical study of 29 known Crohn's disease (CD) loci, TAMW also identified a stronger joint association with CD than those detected by MDR and LRMW. Finally, we applied TAMW to Wellcome Trust CD GWAS to conduct a genome-wide analysis. The analysis of 459K single nucleotide polymorphisms was completed in 40 hrs using parallel computing, and revealed a joint association predisposing to CD (P-value = 2.763 × 10^-19). Further analysis of the newly discovered association suggested that 13 genes, such as ATG16L1 and LACC1, may play an important role in CD pathophysiological and etiological processes.


Subjects
Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Genetic Models , Single Nucleotide Polymorphism , Computer Simulation , Crohn Disease/genetics , Humans , Multifactor Dimensionality Reduction
16.
Genet Epidemiol ; 36(6): 583-93, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22760990

ABSTRACT

The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses' Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03×10^-11). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29×10^-5). The nominal significance of this same association reached 4.01×10^-6 in the NHS/HPFS.
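For orientation, the classical two-sample Mann-Whitney statistic that this approach extends can be written as the proportion of case-control pairs in which the case has the higher score, which is also the area under the ROC curve. The sketch below computes that proportion directly. The function name and the toy scores are hypothetical; in the article the scores would come from likelihood ratios of multi-locus genotypes, which are not reproduced here.

```python
def mann_whitney_auc(case_scores, control_scores):
    """Proportion of case-control pairs where the case scores higher (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk scores for three cases and four controls.
cases = [0.9, 0.7, 0.6]
controls = [0.8, 0.4, 0.3, 0.6]
auc = mann_whitney_auc(cases, controls)
```

A value of 0.5 corresponds to scores that do not separate cases from controls, and 1.0 to perfect separation.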


Subjects
Type 2 Diabetes Mellitus/genetics , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Algorithms , Genome-Wide Association Study/statistics & numerical data , Humans , Likelihood Functions , Genetic Models , Statistical Models
17.
Front Biosci (Elite Ed) ; 4(7): 2607-2617, 2012 06 01.
Article in English | MEDLINE | ID: mdl-22652671

ABSTRACT

Variations in complex traits are influenced by multiple genetic variants, environmental risk factors, and their interactions. Though substantial progress has been made in identifying single genetic variants associated with complex traits, detecting the gene-gene and gene-environment interactions remains a great challenge. When a large number of genetic variants and environmental risk factors are involved, searching for interactions is limited to pair-wise interactions due to the exponentially increased feature space and computational intensity. Alternatively, recursive partitioning approaches, such as random forests, have gained popularity in high-dimensional genetic association studies. In this article, we propose a U-Statistic-based random forest approach, referred to as Forest U-Test, for genetic association studies with quantitative traits. Through simulation studies, we showed that the Forest U-Test outperformed existing methods. The proposed method was also applied to study Cannabis Dependence (CD), using three independent datasets from the Study of Addiction: Genetics and Environment. A significant joint association was detected with an empirical p-value less than 0.001. The finding was also replicated in two independent datasets with p-values of 5.93e-19 and 4.70e-17, respectively.


Subjects
Genome-Wide Association Study , Decision Trees , Humans
18.
Front Genet ; 3: 83, 2012.
Article in English | MEDLINE | ID: mdl-22629285

ABSTRACT

Cocaine-associated biomedical and psychosocial problems are substantial twenty-first century global burdens of disease. This burden is largely driven by a cocaine dependence process that becomes engaged with increasing occasions of cocaine product use. For this reason, the development of a risk-prediction model for cocaine dependence may be of special value. Ultimately, success in building such a risk-prediction model may help promote personalized cocaine dependence prediction, prevention, and treatment approaches not presently available. As an initial step toward this goal, we conducted a genome-environmental risk-prediction study for cocaine dependence, simultaneously considering 948,658 single nucleotide polymorphisms (SNPs), six potentially cocaine-related facets of environment, and three personal characteristics. In this study, a novel statistical approach was applied to 1045 case-control samples from the Family Study of Cocaine Dependence. The results identify 330 low- to medium-effect size SNPs (i.e., those with a single-locus p-value of less than 10^-4) that made a substantial contribution to cocaine dependence risk prediction (AUC = 0.718). Inclusion of six facets of environment and three personal characteristics yielded greater accuracy (AUC = 0.809). Of special importance was the joint effect of childhood abuse (CA) among trauma experiences and the GBE1 gene in cocaine dependence risk prediction. Genome-environmental risk-prediction models may become more promising in future risk-prediction research, once a more substantial array of environmental facets is taken into account, sometimes with model improvement when gene-by-environment product terms are included as part of these risk-prediction models.

19.
Hum Hered ; 71(3): 161-70, 2011.
Article in English | MEDLINE | ID: mdl-21778735

ABSTRACT

OBJECTIVE: Predictive tests that capitalize on emerging genetic findings hold great promise for enhanced personalized healthcare. With the emergence of a large amount of data from genome-wide association studies (GWAS), interest has shifted towards high-dimensional risk prediction. METHODS: To form predictive genetic tests on high-dimensional data, we propose a non-parametric method, called the 'forward ROC method'. The method adopts a computationally efficient algorithm to search for environmental risk factors, genetic predictors across the entire genome, and their possible interactions for an optimal risk prediction model, without relying on prior knowledge of known risk factors. An efficient yet powerful procedure is also incorporated into the method to handle missing data. RESULTS: Through simulations and real data applications, we found that our proposed method outperformed the existing approaches. We applied the new method to the Wellcome Trust rheumatoid arthritis GWAS dataset with a total of 460,547 markers. The results from the risk prediction analysis suggested important roles of HLA-DRB1 and PTPN22 in predicting rheumatoid arthritis. CONCLUSION: We proposed a powerful and robust approach for high-dimensional risk prediction. The new method will facilitate future risk prediction that considers a large number of predictors and their interactions for improved performance.
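
The general idea of a forward search guided by the ROC curve can be illustrated with a minimal sketch. This is not the authors' forward ROC algorithm, whose details the abstract does not give; it is a generic greedy selection that, at each step, adds the predictor that most increases the empirical AUC of a simple summed score. The function names (`auc`, `forward_select`), the unweighted additive score, the `min_gain` stopping rule, and the toy data are all assumptions made for illustration.

```python
from itertools import product


def auc(scores, labels):
    """Empirical AUC: fraction of case/control pairs ranked correctly, ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c, k in product(cases, controls)
    )
    return wins / (len(cases) * len(controls))


def forward_select(X, y, max_features=None, min_gain=1e-9):
    """Greedy forward selection on AUC (illustrative, not the published method).

    X is a list of rows (one per subject), y a list of 0/1 labels.
    The score of a subject is the unweighted sum of its selected columns.
    """
    n_features = len(X[0])
    limit = n_features if max_features is None else max_features
    selected = []
    score = [0.0] * len(X)
    best_auc = 0.5  # AUC of an uninformative score
    while len(selected) < limit:
        best = None
        for j in range(n_features):
            if j in selected:
                continue
            candidate = [s + row[j] for s, row in zip(score, X)]
            a = auc(candidate, y)
            if best is None or a > best[0]:
                best = (a, j, candidate)
        # Stop when no remaining predictor improves the AUC.
        if best is None or best[0] - best_auc < min_gain:
            break
        best_auc, chosen, score = best
        selected.append(chosen)
    return selected, best_auc


# Hypothetical toy data: 3 cases, 3 controls, 3 genotype-like predictors.
X = [[2, 0, 1], [1, 1, 0], [2, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
y = [1, 1, 1, 0, 0, 0]
selected, best_auc = forward_select(X, y)
print(selected, best_auc)
```

On the toy data the search picks the first predictor, then the second, and stops because the third would lower the AUC. A real analysis would evaluate the selected model on held-out data, since an AUC computed on the data used for selection is optimistic.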


Subjects
Databases, Genetic/statistics & numerical data , Genetic Testing/methods , Genetic Testing/statistics & numerical data , Arthritis, Rheumatoid/genetics , Computer Simulation , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Predictive Value of Tests , ROC Curve , Reproducibility of Results , Risk Factors , Statistics, Nonparametric
20.
BMC Genet ; 12: 19, 2011 Jan 28.
Article in English | MEDLINE | ID: mdl-21276233

ABSTRACT

BACKGROUND: The silkworm is the basis of the sericultural industry and a model organism in the study of insect genetics. Mapping quantitative trait loci (QTLs) underlying economically important traits of the silkworm is of high significance for promoting silkworm molecular breeding and advancing our knowledge of the genetic architecture of the Lepidoptera. Yet the currently used mapping methods are not well suited to the silkworm, because they ignore the difference in meiotic recombination between the two sexes. RESULTS: A mixed linear model including QTL main effects, epistatic effects, and QTL × sex interaction effects was proposed for mapping QTLs in an F2 population of silkworm. The number and positions of QTLs were determined by F-test and model selection. The Markov chain Monte Carlo (MCMC) algorithm was employed to estimate and test genetic effects of QTLs and QTL × sex interaction effects. The effectiveness of the model and statistical method was validated by a series of simulations. The results indicate that when markers are distributed sparsely on chromosomes, our method substantially improves estimation accuracy compared with the normal chiasmate F2 model. We also found that a sample size in the hundreds was sufficient to estimate without bias all four types of epistasis (i.e., additive-additive, additive-dominance, dominance-additive, and dominance-dominance) when the paired QTLs reside on different chromosomes in silkworm. CONCLUSION: The proposed method could accurately estimate not only the additive, dominance, and digenic epistatic effects but also their interaction effects with sex, correcting the potential bias and precision loss in current QTL mapping practice for silkworm and thus representing an important addition to the arsenal of QTL mapping tools.


Subjects
Bombyx/genetics , Chromosome Mapping/methods , Models, Statistical , Quantitative Trait Loci , Animals , Epistasis, Genetic , Models, Genetic , Monte Carlo Method