1.
Genet Epidemiol ; 47(6): 409-431, 2023 09.
Article in English | MEDLINE | ID: mdl-37101379

ABSTRACT

In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while the false positives can be controlled well. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis which can be revised to analyze the bivariate ordinal traits and high-dimension genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.


Subject(s)
Macular Degeneration, Genetic Models, Humans, Phenotype, Macular Degeneration/genetics, Computer Simulation, Linear Models
2.
Genet Epidemiol ; 46(5-6): 234-255, 2022 07.
Article in English | MEDLINE | ID: mdl-35438198

ABSTRACT

In this paper, we develop functional ordinal logistic regression (FOLR) models to perform gene-based analysis of ordinal traits. In the proposed FOLR models, genetic variant data are viewed as stochastic functions of physical positions and the genetic effects are treated as a function of physical positions. The FOLR models are built upon functional data analysis which can be revised to analyze the ordinal traits and high dimension genetic data. The proposed methods are capable of dealing with dense genotype data which is usually encountered in analyzing the next-generation sequencing data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Simulation studies show that the likelihood ratio test statistics of the FOLR models control type I errors well and have good power performance. The proposed methods achieve the goals of analyzing ordinal traits directly, reducing high dimensionality of dense genetic variants, being computationally manageable, facilitating model convergence, properly controlling type I errors, and maintaining high power levels. The FOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes are found to strongly associate with four ordinal traits.
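
The idea of viewing genotypes as a function of physical position can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine basis, the rescaling of positions to [0, 1], and the names `cosine_basis` and `basis_scores` are all introduced here for illustration. Each subject's genotype vector over m variants is collapsed into a small number of basis scores, which would then enter an ordinal regression in place of the m raw genotypes.

```python
import math

def cosine_basis(t, k):
    """k-th cosine basis function on [0, 1]; k = 0 is the constant term."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi * t)

def basis_scores(genotypes, positions, n_basis):
    """Collapse one subject's genotypes (0/1/2 per variant) into n_basis scores.

    positions are physical positions in base pairs, rescaled here to [0, 1].
    Score k is sum_j genotype_j * phi_k(t_j), a discrete stand-in for the
    integral of X(t) * phi_k(t) over the gene region (an assumption of this sketch).
    """
    lo, hi = min(positions), max(positions)
    span = (hi - lo) or 1  # avoid division by zero for a single variant
    scaled = [(p - lo) / span for p in positions]
    return [
        sum(g * cosine_basis(t, k) for g, t in zip(genotypes, scaled))
        for k in range(n_basis)
    ]

# Hypothetical example: 6 variants in a region, reduced to 3 scores.
positions = [1000, 1500, 2200, 3100, 4000, 5000]
genotypes = [0, 1, 0, 2, 0, 1]
scores = basis_scores(genotypes, positions, n_basis=3)
```

The number of basis functions, not the number of variants, then determines the number of regression coefficients, which is one way to read the abstract's claim about reducing the dimensionality of dense genotype data.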


Subject(s)
Genetic Testing, Genetic Models, Computer Simulation, Genetic Variation, Genotype, Humans, Logistic Models, Phenotype
3.
Genet Epidemiol ; 45(5): 455-470, 2021 07.
Article in English | MEDLINE | ID: mdl-33645812

ABSTRACT

Genetic studies of two related survival outcomes of a pleiotropic gene are commonly encountered but statistical models to analyze them are rarely developed. To analyze sequencing data, we propose mixed effect Cox proportional hazard models by functional regressions to perform gene-based joint association analysis of two survival traits motivated by our ongoing real studies. These models extend fixed effect Cox models of univariate survival traits by incorporating variations and correlation of multivariate survival traits into the models. The associations between genetic variants and two survival traits are tested by likelihood ratio test statistics. Extensive simulation studies suggest that type I error rates are well controlled and power performances are stable. The proposed models are applied to analyze bivariate survival traits of left and right eyes in the age-related macular degeneration progression.


Subject(s)
Eye Diseases, Genetic Variation, Eye Diseases/genetics, Genetic Association Studies, Humans, Genetic Models, Phenotype
4.
Genet Epidemiol ; 43(8): 952-965, 2019 12.
Article in English | MEDLINE | ID: mdl-31502722

ABSTRACT

The importance of integrating survival analysis into genetics and genomics is widely recognized, but only a small number of statisticians have produced relevant work in this direction. For unrelated population data, functional regression (FR) models have been developed to test for association between a quantitative/dichotomous/survival trait and genetic variants in a gene region. In major gene association analysis, these models have higher power than sequence kernel association tests. In this paper, we extend this approach to analyze censored traits for family data or related samples using FR-based mixed effect Cox models (FamCoxME). The FamCoxME models the effect of a major gene as a fixed mean via functional data analysis techniques, the local gene or polygene variations or both as random effects, and the correlation of pedigree members by kinship coefficients or a genetic relationship matrix or both. The association between the censored trait and the major gene is tested by likelihood ratio tests (FamCoxME FR LRT). Simulation results indicate that the LRTs control the type I error rates accurately/conservatively and have good power levels when both local gene or polygene variations are modeled. The proposed methods were applied to analyze a breast cancer data set from the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). The FamCoxME provides a new tool for gene-based analysis of family-based studies or related samples.


Subject(s)
Genetic Association Studies, Genetic Models, Survival Analysis, Computer Simulation, Genetic Variation, Humans, Pedigree, Phenotype, Proportional Hazards Models, Regression Analysis
5.
Genet Epidemiol ; 43(2): 189-206, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30537345

ABSTRACT

We develop linear mixed models (LMMs) and functional linear mixed models (FLMMs) for gene-based tests of association between a quantitative trait and genetic variants on pedigrees. The effects of a major gene are modeled as a fixed effect, the contributions of polygenes are modeled as a random effect, and the correlations of pedigree members are modeled via inbreeding/kinship coefficients. F-statistics and χ² likelihood ratio test (LRT) statistics based on the LMMs and FLMMs are constructed to test for association. We show empirically that the F-distributed statistics provide good control of the type I error rate. The F-test statistics of the LMMs have similar or higher power than the FLMMs, kernel-based famSKAT (family-based sequence kernel association test), and burden test famBT (family-based burden test). The F-statistics of the FLMMs perform well when analyzing a combination of rare and common variants. For small samples, the LRT statistics of the FLMMs control the type I error rate well at the nominal levels α = 0.01 and 0.05. For moderate/large samples, the LRT statistics of the FLMMs control the type I error rates well. The LRT statistics of the LMMs can lead to inflated type I error rates. The proposed models are useful in whole genome and whole exome association studies of complex traits.


Subject(s)
Genetic Association Studies, High-Throughput Nucleotide Sequencing/methods, Genetic Models, Heritable Quantitative Trait, Computer Simulation, Family, Humans, Linear Models, Myopia/genetics
6.
PLoS Genet ; 12(4): e1005965, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27104857

ABSTRACT

To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.


Subject(s)
Genetic Epistasis, Genetic Models, Quantitative Trait Loci, Exome, Humans, Single Nucleotide Polymorphism
7.
BMC Bioinformatics ; 19(1): 448, 2018 Nov 22.
Article in English | MEDLINE | ID: mdl-30466390

ABSTRACT

BACKGROUND: Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variable Y (0 or 1). RESULTS: We addressed this problem by using knnAUC (k-nearest neighbors AUC test, the R package is available at https://sourceforge.net/projects/knnauc/ ). In the knnAUC software framework, we first resampled a dataset to get the training and testing dataset according to the sample ratio (from 0 to 1), and then constructed a k-nearest neighbors algorithm classifier to get the yhat estimator (the probability of y = 1) of testy (the true label of testing dataset). Finally, we calculated the AUC (area under the curve of receiver operating characteristic) estimator and tested whether the AUC estimator is greater than 0.5. To evaluate the advantages of knnAUC compared to seven other popular methods, we performed extensive simulations to explore the relationships between eight different methods and compared the false positive rates and statistical power using both simulated and real datasets (Chronic hepatitis B datasets and kidney cancer RNA-seq datasets). CONCLUSIONS: We concluded that knnAUC is an efficient R package to test non-linear dependence between one continuous variable and one binary dependent variable, especially in the area of computational biology.
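
The three steps described above (train/test split, k-nearest neighbors probability estimate, AUC compared with 0.5) can be sketched in plain Python. This is an illustration only and does not reproduce the knnAUC R package: the function names are hypothetical, and the permutation test used to judge whether the AUC exceeds 0.5 is an assumption of this sketch, since the abstract does not say how that test is carried out.

```python
import random

def knn_prob(train_x, train_y, x, k):
    """Estimate P(y = 1 | x) as the mean label of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined with a single class; treated as uninformative here
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_auc(x, y, train_ratio=0.5, k=5, rng=None):
    """Split the data, fit k-NN on the training part, return the test-set AUC."""
    rng = rng or random.Random(0)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    cut = int(len(idx) * train_ratio)
    train, test = idx[:cut], idx[cut:]
    train_x = [x[i] for i in train]
    train_y = [y[i] for i in train]
    yhat = [knn_prob(train_x, train_y, x[i], k) for i in test]
    return auc(yhat, [y[i] for i in test])

def permutation_pvalue(x, y, n_perm=200, seed=0, **kwargs):
    """Assumed significance test: compare the observed AUC with label-shuffled AUCs."""
    rng = random.Random(seed)
    observed = knn_auc(x, y, rng=random.Random(seed), **kwargs)
    hits = 0
    shuffled = list(y)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Same split each time (same seed), so only the labels differ.
        if knn_auc(x, shuffled, rng=random.Random(seed), **kwargs) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A relationship such as y = 1 when |x| > 0.5 has no linear trend, yet a local classifier like k-NN can still separate the classes, which is the kind of nonlinear dependence the abstract targets.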


Subject(s)
RNA Sequence Analysis/methods, Cluster Analysis, Computational Biology/methods, Humans
8.
Genet Epidemiol ; 41(1): 18-34, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27917525

ABSTRACT

In this paper, extensive simulations are performed to compare two statistical methods to analyze multiple correlated quantitative phenotypes: (1) approximate F-distributed tests of multivariate functional linear models (MFLM) and additive models of multivariate analysis of variance (MANOVA), and (2) Gene Association with Multiple Traits (GAMuT) for association testing of high-dimensional genotype data. It is shown that approximate F-distributed tests of MFLM and MANOVA have higher power and are more appropriate for major gene association analysis (i.e., scenarios in which some genetic variants have relatively large effects on the phenotypes); GAMuT has higher power and is more appropriate for analyzing polygenic effects (i.e., effects from a large number of genetic variants each of which contributes a small amount to the phenotypes). MFLM and MANOVA are very flexible and can be used to perform association analysis for (i) rare variants, (ii) common variants, and (iii) a combination of rare and common variants. Although GAMuT was designed to analyze rare variants, it can be applied to analyze a combination of rare and common variants and it performs well when (1) the number of genetic variants is large and (2) each variant contributes a small amount to the phenotypes (i.e., polygenes). MFLM and MANOVA are fixed effect models that perform well for major gene association analysis. GAMuT can be viewed as an extension of sequence kernel association tests (SKAT). Both GAMuT and SKAT are more appropriate for analyzing polygenic effects and they perform well not only in the rare variant case, but also in the case of a combination of rare and common variants. Data analyses of European cohorts and the Trinity Students Study are presented to compare the performance of the two methods.


Subject(s)
Genetic Association Studies, Genetic Markers/genetics, Genetic Variation/genetics, High-Throughput Nucleotide Sequencing/methods, Lipids/genetics, Genetic Models, Multifactorial Inheritance/genetics, Analysis of Variance, Human Genome, Genotype, Humans, Lipids/analysis, Phenotype, Quantitative Trait Loci
9.
PLoS Comput Biol ; 13(10): e1005788, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29040274

ABSTRACT

Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistical method referred to as quadratically regularized functional CCA (QRFCCA) for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type 1 errors. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics.


Subject(s)
Computational Biology/methods, Genetic Databases, Genetic Pleiotropy/genetics, Statistical Models, DNA Sequence Analysis, Algorithms, Computer Simulation, Genome-Wide Association Study, High-Throughput Nucleotide Sequencing, Humans, Phenotype, Single Nucleotide Polymorphism/genetics, Principal Component Analysis
10.
Genet Epidemiol ; 40(8): 702-721, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27374056

ABSTRACT

In association studies of complex traits, fixed-effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance-component tests based on mixed models were developed for region-based genetic variant association tests. In the mixed models, the association is tested by a null hypothesis of zero variance via a sequence kernel association test (SKAT), its optimal unified test (SKAT-O), and a combined sum test of rare and common variant effect (SKAT-C). Although there are some comparison studies to evaluate the performance of mixed and fixed models, there is no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT-O, and SKAT-C, (ii) traditional fixed-effect additive models, and (iii) fixed-effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed-effect tests accurately control or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed-effect models performed better than or similar to the mixed models except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT-O/SKAT-C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes). 
In major gene association studies, we argue that the fixed-effect models perform better than or similarly to mixed models in most cases because some variants should have relatively large effects on the traits. In practice, it makes sense to perform analysis by both the fixed and mixed effect models and to make a comparison, and this can be readily done using our R code and the SKAT packages.


Subject(s)
Computer Simulation, Genetic Association Studies, Genetic Markers/genetics, Genetic Variation/genetics, Statistical Models, Multifactorial Inheritance/genetics, Quantitative Trait Loci/genetics, Genotype, Hirschsprung Disease/genetics, Humans, Lipid Metabolism Disorders/genetics, Genetic Models, Neural Tube Defects/genetics, Phenotype
11.
Genet Epidemiol ; 40(2): 133-43, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26782979

ABSTRACT

Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power to Cox SKAT LRT except when 50%/50% of the causal variants have negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example.
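
For orientation, a fixed-effect Cox model with a functional genetic term can be written in the following generic form. This is a sketch under assumed notation, not a transcription of the paper's equations: the covariate vector $Z_i$, the genotype function $X_i(t)$ on a gene region rescaled to $[0, 1]$, the effect function $\beta(t)$, and the basis functions $\phi_k$ are symbols introduced here.

```latex
% Hazard for subject i at survival time s (assumed notation)
\lambda_i(s) = \lambda_0(s)\,
  \exp\!\Big( Z_i^{\top}\alpha + \int_0^1 X_i(t)\,\beta(t)\,dt \Big),
\qquad
\beta(t) = \sum_{k=1}^{K} \beta_k\,\phi_k(t).

% Likelihood ratio test of no gene effect, H_0 : \beta_1 = \dots = \beta_K = 0
\mathrm{LRT} = -2\,\big( \ell_0 - \ell_1 \big)
  \;\overset{\text{approx.}}{\sim}\; \chi^2_K \quad \text{under } H_0,
```

where $\ell_0$ and $\ell_1$ denote the maximized log partial likelihoods without and with the genetic term. Under this reading, the test has $K$ degrees of freedom regardless of how many variants fall in the region.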


Subject(s)
Disease Progression, Genetic Association Studies/methods, Genetic Variation/genetics, Genetic Models, Computer Simulation, Exome/genetics, Genetic Testing, Humans, Phenotype, Proportional Hazards Models, Regression Analysis
12.
BMC Genomics ; 18(1): 385, 2017 05 18.
Article in English | MEDLINE | ID: mdl-28521784

ABSTRACT

BACKGROUND: Epistasis plays an essential role in understanding the regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expressions remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position level read count curves. A single number for measuring gene expression, which is widely used for microarray measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. METHODS: We develop a nonlinear functional regression model (FRGM) with functional responses where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairwise SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. RESULTS: By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. 
The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. CONCLUSIONS: The proposed FRGM for epistasis analysis of RNA-seq can capture isoform and position-level information and will have a broad application. Both simulations and real data analysis highlight the potential for the FRGM to be a good choice of the epistatic analysis with sequencing data.


Subject(s)
Genetic Epistasis/genetics, Genomics/methods, Quantitative Trait Loci/genetics, Whole Genome Sequencing, Gene Ontology, Gene Regulatory Networks, MAP Kinase Signaling System/genetics, Nonlinear Dynamics, Single Nucleotide Polymorphism, Regression Analysis
13.
Genome Res ; 24(6): 989-98, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24803592

ABSTRACT

The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10⁻¹⁰) in the ESP, and 11 were replicated in the CHARGE-S study.
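
The multiple-testing argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the SNP and gene counts are hypothetical, and the nominal level of 0.05 is an assumption, since the abstract reports the threshold but not the level or the number of tests behind it.

```python
def n_pairs(n_units):
    """Number of unordered pairs among n_units SNPs or genes."""
    return n_units * (n_units - 1) // 2

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

# Hypothetical sizes, chosen only to show the scale of the problem.
n_snps, n_genes = 1_000_000, 20_000
snp_level = bonferroni_threshold(0.05, n_pairs(n_snps))
gene_level = bonferroni_threshold(0.05, n_pairs(n_genes))

# Number of tests implied by the reported threshold, ASSUMING alpha = 0.05.
implied_tests = 0.05 / 4.58e-10
```

Testing pairs of regions instead of pairs of SNPs shrinks the number of tests by several orders of magnitude in this hypothetical setting, so the per-test threshold is correspondingly less stringent.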


Subject(s)
Genetic Epistasis, Genetic Models, Heritable Quantitative Trait, Human Genome, Humans, Single Nucleotide Polymorphism, Regression Analysis
14.
Genet Epidemiol ; 39(4): 259-75, 2015 May.
Article in English | MEDLINE | ID: mdl-25809955

ABSTRACT

In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case.
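
The three multivariate statistics named above have standard definitions in terms of the hypothesis and error sums-of-squares-and-cross-products matrices, usually written H and E. The sketch below computes them for the two-trait case only, so that the 2×2 linear algebra can be written out without external libraries; the function names are hypothetical, and the conversion of each statistic to an approximate F-distributed test is not shown.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add2(a, b):
    """Elementwise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def trace2(m):
    """Trace of a 2x2 matrix."""
    return m[0][0] + m[1][1]

def manova_statistics(H, E):
    """Wilks's Lambda, Pillai-Bartlett trace, and Hotelling-Lawley trace."""
    total = add2(H, E)
    return {
        "wilks_lambda": det2(E) / det2(total),             # |E| / |H + E|
        "pillai_bartlett": trace2(matmul2(H, inv2(total))),  # tr(H (H + E)^-1)
        "hotelling_lawley": trace2(matmul2(H, inv2(E))),     # tr(H E^-1)
    }

# Hypothetical matrices for two traits.
stats = manova_statistics(H=[[1, 0], [0, 3]], E=[[1, 0], [0, 1]])
```

Small values of Wilks's Lambda and large values of the two traces indicate that the genetic term explains a large share of the joint variation in the traits.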


Subject(s)
Genetic Markers/genetics, Genetic Pleiotropy, Genetic Variation/genetics, Linear Models, Genetic Models, Quantitative Trait Loci, Cohort Studies, Human Genome, Humans, Phenotype, Software
15.
BMC Genomics ; 17(1): 881, 2016 11 07.
Article in English | MEDLINE | ID: mdl-27821073

ABSTRACT

BACKGROUND: The widely used genetic pleiotropic analyses of multiple phenotypes are often designed for examining the relationship between common variants and a few phenotypes. They are not suited for both high dimensional phenotypes and high dimensional genotype (next-generation sequencing) data. To overcome limitations of the traditional genetic pleiotropic analysis of multiple phenotypes, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend the traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternating direction method of multipliers (ADMM) techniques to reduce data dimension and improve computational efficiency. RESULTS: Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods. Simulations also demonstrate that the gene-based pleiotropic analysis has higher power than the single variant-based pleiotropic analysis. The proposed method is applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) with 11 phenotypes, which identifies a network with 137 genes connected to 11 phenotypes and 341 edges. Among them, 114 genes showed pleiotropic genetic effects and 45 genes were reported to be associated with phenotypes in the analysis or other cardiovascular disease (CVD) related phenotypes in the literature. CONCLUSIONS: Our proposed sparse functional SEMs can incorporate both common and rare variants into the analysis and the ADMM algorithm can efficiently solve the penalized SEMs. Using this model we can jointly infer genetic architecture and causal phenotype network structure, and decompose the genetic effect into direct, indirect and total effect. 
Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods.


Subject(s)
Genetic Association Studies, Genetic Pleiotropy, Genetic Predisposition to Disease, Genetic Models, Statistical Models, Phenotype, Algorithms, Computer Simulation, Genetic Association Studies/methods, Genomics/methods, Genotype, High-Throughput Nucleotide Sequencing, Single Nucleotide Polymorphism, Heritable Quantitative Trait
16.
BMC Bioinformatics ; 16: 260, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26283601

ABSTRACT

BACKGROUND: Testing dependence/correlation of two variables is one of the fundamental tasks in statistics. In this work, we proposed a new way of testing nonlinear dependence between two continuous variables (X and Y). RESULTS: We addressed this research question by using CANOVA (continuous analysis of variance, software available at https://sourceforge.net/projects/canova/). In the CANOVA framework, we first defined a neighborhood for each data point related to its X value, and then calculated the variance of the Y value within the neighborhood. Finally, we performed permutations to evaluate the significance of the observed values within the neighborhood variance. To evaluate the strength of CANOVA compared to six other methods, we performed extensive simulations to explore the relationship between methods and compared the false positive rates and statistical power using both simulated and real datasets (kidney cancer RNA-seq dataset). CONCLUSIONS: We concluded that CANOVA is an efficient method for testing nonlinear correlation with several advantages in real data applications.
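
The neighborhood-variance idea described above can be sketched as follows. This is not the CANOVA software: the exact statistic is an assumption made here (the mean variance of Y inside a sliding window of points adjacent in X order), and the window half-width `k` and all function names are hypothetical.

```python
import random

def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def within_neighborhood_variance(x, y, k=2):
    """Mean variance of Y over windows of points that are adjacent in X order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    n = len(y_sorted)
    total = 0.0
    for i in range(n):
        # Neighborhood: up to k points on each side of point i in X order.
        window = y_sorted[max(0, i - k): min(n, i + k + 1)]
        total += variance(window)
    return total / n

def neighborhood_permutation_test(x, y, k=2, n_perm=500, seed=0):
    """Permutation p-value: how often shuffled Y is at least as locally smooth."""
    rng = random.Random(seed)
    observed = within_neighborhood_variance(x, y, k)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if within_neighborhood_variance(x, y_perm, k) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

If Y depends on X in any smooth way, linear or not, points that are close in X have similar Y values, so the observed statistic is small relative to its permutation distribution. A parabola y = x² over a symmetric range is a typical case where the Pearson correlation is near zero but this kind of test still rejects.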


Subject(s)
Software, Humans, Kidney Neoplasms/genetics, Kidney Neoplasms/pathology, RNA/analysis, RNA/metabolism, RNA Sequence Analysis
17.
Genet Epidemiol ; 38(7): 622-637, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25203683

ABSTRACT

Using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative, generating type I error rates below the nominal levels, whereas global tests of the mixed effect models generate accurate type I error rates. Furthermore, we found that Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses.


Subject(s)
Case-Control Studies , Genetic Association Studies , Models, Genetic , Exome , Gene Frequency , Genes , Genetic Predisposition to Disease , Genome-Wide Association Study , Hirschsprung Disease/genetics , Humans , Linear Models , Neural Tube Defects/genetics , Phenotype , Polymorphism, Single Nucleotide , Software
18.
Am J Hum Genet ; 90(6): 1028-45, 2012 Jun 08.
Article in English | MEDLINE | ID: mdl-22682329

ABSTRACT

An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregated within the population and rare variants, inherited from recent ancestors, that segregated mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from remarkable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure, and a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based functional principal-component analysis (FPCA) with or without smoothing, a generalized T², combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect than other population-based or family-based association analysis methods. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA has a much smaller p value than the other statistics.


Subject(s)
Sequence Analysis, DNA/methods , Algorithms , Alleles , Asthma/genetics , Cardiovascular Diseases/genetics , Cohort Studies , Family Health , Genetic Variation , Genetics, Population , Genome-Wide Association Study , Genotype , Humans , Models, Genetic , Models, Statistical , Multivariate Analysis , Pedigree , Principal Component Analysis , Reproducibility of Results , Risk
19.
Stat Med ; 34(27): 3577-89, 2015 Nov 30.
Article in English | MEDLINE | ID: mdl-26123093

ABSTRACT

Continuous-time Markov chain (CTMC) models are often used to study the progression of chronic diseases in medical research but are rarely applied to studies of the process of behavioral change. In studies of interventions to modify behaviors, a widely used psychosocial model is based on the transtheoretical model, which often has more than three states (representing stages of change) and conceptually permits all possible instantaneous transitions. Very little attention has been given to the relationships between a CTMC model and associated covariates under the framework of the transtheoretical model. We developed a Bayesian approach to evaluate the covariate effects on a CTMC model through a log-linear regression link. A simulation study of this approach showed that model parameters were accurately and precisely estimated. We analyzed an existing data set on stages of change in dietary intake from the Next Step Trial using the proposed method and the generalized multinomial logit model. We found that the generalized multinomial logit model was not suitable for these data because it ignores the unbalanced data structure and the temporal correlation between successive measurements. Our analysis not only confirms that the nutrition intervention was effective but also provides information on how the intervention affected the transitions among the stages of change. We found that, compared with the control group, subjects in the intervention group, on average, spent substantively less time in the precontemplation stage, were more likely to move from an unhealthy state to a healthy state, and were less likely to move from a healthy state to an unhealthy state.
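
To make the log-linear link concrete, the sketch below builds a CTMC generator matrix whose off-diagonal rates are q_ij = exp(a_ij + b_ij * x) for a single covariate x, and derives the mean time spent in each state as -1/q_ii. It illustrates only the link function, not the Bayesian estimation described in the abstract. The three-state layout, the function names, and every coefficient value are hypothetical and chosen purely for illustration.

```python
import math


def generator_matrix(log_baseline, coef, x):
    """Generator Q with q_ij = exp(log_baseline[i][j] + coef[i][j] * x), i != j."""
    n = len(log_baseline)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = math.exp(log_baseline[i][j] + coef[i][j] * x)
        # Each row of a generator sums to zero (the diagonal is still 0.0 here).
        q[i][i] = -sum(q[i])
    return q


def mean_sojourn_times(q):
    """Expected time spent in each state per visit: -1 / q_ii."""
    return [-1.0 / q[i][i] for i in range(len(q))]


# Hypothetical three-state example; state 0 plays the role of the earliest stage.
log_baseline = [[-1.0] * 3 for _ in range(3)]   # assumed baseline log-rates
coef = [[0.0] * 3 for _ in range(3)]
coef[0][1] = 0.5                                # assumed effect on the 0 -> 1 rate

q_control = generator_matrix(log_baseline, coef, x=0)       # x = 0: control
q_intervention = generator_matrix(log_baseline, coef, x=1)  # x = 1: intervention
sojourn_control = mean_sojourn_times(q_control)
sojourn_intervention = mean_sojourn_times(q_intervention)
```

With a positive coefficient on a rate leaving state 0, the exit rate from that state rises with the covariate, so the expected time spent there falls. This is the mechanism by which a covariate effect on transition rates translates into less time in an early stage.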


Subject(s)
Bayes Theorem , Feeding Behavior , Health Behavior , Markov Chains , Models, Psychological , Aged , Empirical Research , Female , Humans , Linear Models , Male , Middle Aged , Surveys and Questionnaires
20.
Nucleic Acids Res ; 41(8): e95, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23460206

ABSTRACT

Digital transcriptome analysis by next-generation sequencing discovers substantial mRNA variants. Variation in gene expression underlies many biological processes and holds a key to unravelling the mechanisms of common diseases. However, current methods for constructing co-expression networks from overall gene expression were originally designed for microarray expression data, and they overlook a large amount of variation in gene expression. To use information on exon-level, genomic positional-level, and allele-specific expression, we develop novel component-based methods, single and bivariate canonical correlation analysis, for constructing co-expression networks with RNA-seq data. To evaluate the performance of our methods for co-expression network inference with RNA-seq data, we apply them to lung squamous cell cancer expression data from the TCGA database and to our bipolar disorder and schizophrenia RNA-seq study. The preliminary results demonstrate that co-expression networks constructed by canonical correlation analysis from RNA-seq data provide rich genetic and molecular information for gaining insight into biological processes and disease mechanisms. Our new methods substantially outperform current statistical methods for co-expression network construction with microarray expression data or with RNA-seq data based on overall gene expression levels.


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Sequence Analysis, RNA , Bipolar Disorder/genetics , Bipolar Disorder/metabolism , Computational Biology/methods , Data Interpretation, Statistical , Exons , Humans , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Neoplasms, Squamous Cell/genetics , Neoplasms, Squamous Cell/metabolism , Schizophrenia/genetics , Schizophrenia/metabolism