Results 1 - 16 of 16
1.
J Biopharm Stat ; : 1-19, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38888431

ABSTRACT

Pharmaceutical researchers are continually searching for techniques to improve both drug development processes and patient outcomes. An area of recent interest is the potential for machine learning (ML) applications within pharmacology. One such application not yet given close study is the unsupervised clustering of plasma concentration-time curves, hereafter, pharmacokinetic (PK) curves. In this paper, we present our findings on how to cluster PK curves by their similarity. Specifically, we find clustering to be effective at identifying similar-shaped PK curves and informative for understanding patterns within each cluster of PK curves. Because PK curves are time series data objects, our approach utilizes the extensive body of research related to the clustering of time series data as a starting point. As such, we examine many dissimilarity measures between time series data objects to find those most suitable for PK curves. We identify Euclidean distance as generally most appropriate for clustering PK curves, and we further show that dynamic time warping, Fréchet, and structure-based measures of dissimilarity like correlation may produce unexpected results. As an illustration, we apply these methods in a case study with 250 PK curves used in a previous pharmacogenomic study. Our case study finds that an unsupervised ML clustering with Euclidean distance, without any subject genetic information, is able to independently validate the same conclusions as the reference pharmacogenomic results. To our knowledge, this is the first such demonstration. Further, the case study demonstrates how the clustering of PK curves may generate insights that could be difficult to perceive solely with population level summary statistics of PK metrics.
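
As a rough illustration of clustering PK curves by Euclidean distance, the sketch below simulates two groups of curves and partitions them with a plain k-means. Everything in it is an assumption made for illustration: the one-compartment curve shape, the parameter values, the noise level, the farthest-point initialisation, and the choice of k-means itself, since the abstract does not state which clustering algorithm the authors used.

```python
import numpy as np

def one_compartment(t, ka, ke, scale=10.0):
    """Illustrative oral one-compartment concentration-time curve."""
    return scale * ka / (ka - ke) * (np.exp(-ke * t) - np.exp(-ka * t))

def euclidean_kmeans(curves, k, n_iter=50):
    """Lloyd's k-means on curves sampled at shared time points."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [curves[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(curves - c, axis=1) for c in centers], axis=0)
        centers.append(curves[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Euclidean distance from every curve to every cluster center.
        dist = np.linalg.norm(curves[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            curves[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers

# Hypothetical data: 20 fast and 20 slow eliminators on a shared time grid.
rng = np.random.default_rng(0)
times = np.linspace(0.5, 24.0, 12)  # hours after dose
fast = np.tile(one_compartment(times, ka=1.5, ke=0.40), (20, 1))
slow = np.tile(one_compartment(times, ka=1.5, ke=0.08), (20, 1))
curves = np.vstack([fast, slow]) * rng.lognormal(0.0, 0.05, size=(40, 12))

labels, centers = euclidean_kmeans(curves, k=2)
```

Euclidean distance compares concentrations time point by time point, so it assumes all curves are observed on the same sampling grid; curves sampled at different times would need interpolation first.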

2.
Genet Epidemiol ; 44(1): 104-116, 2020 01.
Article in English | MEDLINE | ID: mdl-31830326

ABSTRACT

Single genome-wide studies may be underpowered to detect trait-associated rare variants with moderate or weak effect sizes. As a viable alternative, meta-analysis is widely used to increase power by combining different studies. The power of meta-analysis critically depends on the underlying association patterns and heterogeneity levels, which are unknown and vary from locus to locus. However, existing methods mainly focus on one or only a few combinations of the association pattern and heterogeneity level, and thus may lose power in many situations. To address this issue, we propose a general and unified framework by combining a class of tests including and beyond some existing ones, leading to high power across a wide range of scenarios. We demonstrate that the proposed test is more powerful than some existing methods in simulation studies, and then illustrate its performance with the NHLBI Exome-Sequencing Project (ESP) data. One gene (B4GALNT2) was found by our proposed test, but not by others, to be statistically significantly associated with plasma triglyceride. The signal was driven by African-ancestry subjects, but the gene was previously reported to be associated with coronary artery disease among European-ancestry subjects. We implemented our method in an R package aSPUmeta, publicly available at https://github.com/ytzhong/metaRV and will be on CRAN soon.


Subject(s)
Coronary Artery Disease/genetics; Genome-Wide Association Study/methods; Meta-Analysis as Topic; Models, Genetic; N-Acetylgalactosaminyltransferases/genetics; Triglycerides/blood; Black People/genetics; Exome/genetics; Humans; Phenotype; White People/genetics
3.
BMC Bioinformatics ; 21(Suppl 21): 581, 2020 Dec 28.
Article in English | MEDLINE | ID: mdl-33371887

ABSTRACT

BACKGROUND: The estimation of microbial networks can provide important insight into the ecological relationships among the organisms that comprise the microbiome. However, there are a number of critical statistical challenges in the inference of such networks from high-throughput data. Since the abundances in each sample are constrained to have a fixed sum and there is incomplete overlap in microbial populations across subjects, the data are both compositional and zero-inflated. RESULTS: We propose the COmpositional Zero-Inflated Network Estimation (COZINE) method for inference of microbial networks which addresses these critical aspects of the data while maintaining computational scalability. COZINE relies on the multivariate Hurdle model to infer a sparse set of conditional dependencies which reflect not only relationships among the continuous values, but also among binary indicators of presence or absence and between the binary and continuous representations of the data. Our simulation results show that the proposed method is better able to capture various types of microbial relationships than existing approaches. We demonstrate the utility of the method with an application to understanding the oral microbiome network in a cohort of leukemic patients. CONCLUSIONS: Our proposed method addresses important challenges in microbiome network estimation, and can be effectively applied to discover various types of dependence relationships in microbial communities. The procedure we have developed, which we refer to as COZINE, is available online at https://github.com/MinJinHa/COZINE .


Subject(s)
Computational Biology/methods; Microbiota; Humans; Leukemia/microbiology
4.
Biometrics ; 75(1): 172-182, 2019 03.
Article in English | MEDLINE | ID: mdl-30051914

ABSTRACT

Hub nodes within biological networks play a pivotal role in determining phenotypes and disease outcomes. In the multiple network setting, we are interested in understanding network similarities and differences across different experimental conditions or subtypes of disease. The majority of proposed approaches for joint modeling of multiple networks focus on the sharing of edges across graphs. Rather than assuming the network similarities are driven by individual edges, we instead focus on the presence of common hub nodes, which are more likely to be preserved across settings. Specifically, we formulate a Bayesian approach to the problem of multiple network inference which allows direct inference on shared and differential hub nodes. The proposed method not only allows a more intuitive interpretation of the resulting networks and clearer guidance on potential targets for treatment, but also improves power for identifying the edges of highly connected nodes. Through simulations, we demonstrate the utility of our method and compare its performance to current popular methods that do not borrow information regarding hub nodes across networks. We illustrate the applicability of our method to inference of co-expression networks from The Cancer Genome Atlas ovarian carcinoma dataset.


Subject(s)
Bayes Theorem; Computer Graphics; Systems Biology/statistics & numerical data; Algorithms; Computer Simulation; Female; Gene Regulatory Networks; Humans; Ovarian Neoplasms/genetics
5.
Genet Epidemiol ; 41(3): 259-277, 2017 04.
Article in English | MEDLINE | ID: mdl-28191669

ABSTRACT

There has been increasing interest in developing more powerful and flexible statistical tests to detect genetic associations with multiple traits, as arising from neuroimaging genetic studies. Most existing methods treat a single trait or multiple traits as the response while treating an SNP as a predictor coded under an additive inheritance mode. In this paper, we follow an earlier approach in treating an SNP as an ordinal response while treating traits as predictors in a proportional odds model (POM). In this way, it is not only easier to handle mixed types of traits, e.g., some quantitative and some binary, but it is also potentially more robust to the commonly adopted additive inheritance mode. More importantly, we develop an adaptive test in a POM so that it can maintain high power across many possible situations. Compared to the existing methods treating multiple traits as responses, e.g., in a generalized estimating equation (GEE) approach, the proposed method can be applied to a high-dimensional setting where the number of phenotypes (p) can be larger than the sample size (n), in addition to the usual small-p setting. The promising performance of the proposed method was demonstrated with applications to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, in which either structural MRI-derived phenotypes or resting-state functional MRI (rs-fMRI) derived brain functional connectivity measures were used as phenotypes. The applications led to the identification of several top SNPs of biological interest. Furthermore, simulation studies showed competitive performance of the new method, especially for p>n.


Subject(s)
Alzheimer Disease/genetics; Brain/physiology; Genetic Predisposition to Disease; Models, Genetic; Nerve Net/physiology; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Algorithms; Alzheimer Disease/diagnosis; Alzheimer Disease/diagnostic imaging; Brain/diagnostic imaging; Brain Mapping; Computer Simulation; Genome, Human; Humans; Magnetic Resonance Imaging/methods; Neuroimaging; Phenotype; Sample Size
6.
Genet Epidemiol ; 39(8): 651-63, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26493956

ABSTRACT

We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. Because summary data for most published GWASs are substantially easier to obtain than individual-level phenotype and genotype data, and because multiple correlated traits have often been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.


Subject(s)
Genetic Markers/genetics; Genome-Wide Association Study/statistics & numerical data; Lipids/blood; Lipids/genetics; Models, Genetic; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics
7.
Neuroimage ; 121: 136-45, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26220747

ABSTRACT

Almost all genome-wide association studies (GWASs), including the Alzheimer's Disease Neuroimaging Initiative (ADNI), are based on the case-control study design, implying that the resulting case-control data are likely a biased, not random, sample of the target population. Although association analysis of the disease (e.g. Alzheimer's disease in the ADNI) can be conducted using a standard logistic regression by ignoring the biased case-control sampling, a standard linear regression analysis on a secondary phenotype (e.g. any neuroimaging phenotype in the ADNI) may in general lead to biased inference, including biased parameter estimates, inflated Type I errors and reduced power for association testing. Despite this well-known result in genetic epidemiology, to our surprise, all the published studies on secondary phenotypes with the ADNI data have ignored this potential problem. Here we aim to answer whether such a standard analysis of a secondary phenotype is valid or problematic with the ADNI data. Through both real data analyses and simulation studies, we found that, strikingly, such an analysis was generally valid (with only small biases or slightly inflated Type I errors) for the ADNI data, though caution must be taken when analyzing other data. We also illustrate applications and possible problems of two methods specifically developed for valid analysis of secondary phenotypes.


Subject(s)
Alzheimer Disease; Brain/pathology; Data Interpretation, Statistical; Genome-Wide Association Study/standards; Genotype; Neuroimaging/standards; Phenotype; Research Design/standards; Selection Bias; Aged; Aged, 80 and over; Alzheimer Disease/genetics; Alzheimer Disease/pathology; Computer Simulation; Humans; Middle Aged
8.
Neuroimage ; 101: 681-94, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25086298

ABSTRACT

Brain functional connectivity has been studied by analyzing time series correlations in regional brain activities based on resting-state fMRI data. Brain functional connectivity can be depicted as a network or graph defined as a set of nodes linked by edges. Nodes represent brain regions and an edge measures the strength of functional correlation between two regions. Most existing work focuses on estimation of such a network. A key but inadequately addressed question is how to test for possible differences of the networks between two subject groups, say between healthy controls and patients. Here we illustrate and compare the performance of several state-of-the-art statistical tests drawn from the neuroimaging, genetics, ecology and high-dimensional data literatures. Both real and simulated data were used to evaluate the methods. We found that Network Based Statistic (NBS) performed well in many but not all situations, and its performance critically depends on the choice of its threshold parameter, which is unknown and difficult to choose in practice. Importantly, two adaptive statistical tests called adaptive sum of powered score (aSPU) and its weighted version (aSPUw) are easy to use and complementary to NBS, being higher powered than NBS in some situations. The aSPU and aSPUw tests can also be applied to adjust for covariates. The aSPU and aSPUw tests often, but not always, performed similarly, with neither one a uniform winner. On the other hand, Multivariate Distance Matrix Regression (MDMR) has been applied to detect group differences for brain connectivity; with the usual choice of the Euclidean distance, MDMR is a special case of the aSPU test. Consequently NBS, aSPU and aSPUw tests are recommended to test for group differences in functional connectivity.


Subject(s)
Brain/physiology; Connectome/methods; Data Interpretation, Statistical; Nerve Net/physiology; Adolescent; Child; Computer Simulation; Fetal Alcohol Spectrum Disorders/physiopathology; Humans; Magnetic Resonance Imaging
9.
Lifetime Data Anal ; 20(4): 599-618, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24549607

ABSTRACT

The semiparametric accelerated failure time (AFT) model is not as widely used as the Cox relative risk model due to computational difficulties. Recent developments in least squares estimation and induced smoothing estimating equations for censored data provide promising tools to make the AFT models more attractive in practice. For multivariate AFT models, we propose a generalized estimating equations (GEE) approach, extending the GEE to censored data. The consistency of the regression coefficient estimator is robust to misspecification of working covariance, and the efficiency is higher when the working covariance structure is closer to the truth. The marginal error distributions and regression coefficients are allowed to be unique for each margin or partially shared across margins as needed. The initial estimator is a rank-based estimator with Gehan's weight, but obtained from an induced smoothing approach with computational ease. The resulting estimator is consistent and asymptotically normal, with variance estimated through a multiplier resampling method. In a large scale simulation study, our estimator was up to three times as efficient as the estimator that ignores the within-cluster dependence, especially when the within-cluster dependence was strong. The methods were applied to the bivariate failure times data from a diabetic retinopathy study.


Subject(s)
Models, Statistical; Multivariate Analysis; Computer Simulation; Diabetic Retinopathy/surgery; Humans; Kaplan-Meier Estimate; Laser Coagulation; Least-Squares Analysis; Life Tables; Linear Models; Proportional Hazards Models; Regression Analysis; Time Factors
10.
Stat Anal Data Min ; 14(1): 18-30, 2021 Feb.
Article in English | MEDLINE | ID: mdl-35027990

ABSTRACT

With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L0 penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.

11.
Pac Symp Biocomput ; 22: 58-69, 2017.
Article in English | MEDLINE | ID: mdl-27896962

ABSTRACT

Due to its high dimensionality and high noise levels, analysis of a large brain functional network may be neither powerful nor easy to interpret; instead, decomposition of a large network into smaller subcomponents called modules may be more promising, as suggested by some empirical evidence. For example, alteration of brain modularity is observed in patients suffering from various types of brain malfunctions. Although several methods exist for estimating brain functional networks, such as the sample correlation matrix or graphical lasso for a sparse precision matrix, it is still difficult to extract modules from such network estimates. Motivated by these considerations, we adapt a weighted gene co-expression network analysis (WGCNA) framework to resting-state fMRI (rs-fMRI) data to identify modular structures in brain functional networks. Modular structures are identified by using topological overlap matrix (TOM) elements in hierarchical clustering. We propose applying a new adaptive test built on the proportional odds model (POM) that can be applied to a high-dimensional setting, where the number of variables (p) can exceed the sample size (n), in addition to the usual p < n setting. We applied our proposed methods to the ADNI data to test for associations between a genetic variant and either the whole brain functional network or its various subcomponents using various connectivity measures. We uncovered several modules based on the control cohort, and some of them were marginally associated with the APOE4 variant and several other SNPs; however, due to the small sample size of the ADNI data, larger studies are needed.
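
To make the topological overlap step concrete, the sketch below computes a TOM from a correlation matrix using the unsigned formulation commonly associated with WGCNA: adjacency a_ij = |cor_ij|^beta and TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), with k_i the row sum of the adjacency. The function names, the value beta = 6, and the toy correlation matrix are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def soft_threshold_adjacency(corr, beta=6):
    """Unsigned adjacency a_ij = |cor_ij| ** beta with a zero diagonal."""
    a = np.abs(np.asarray(corr, dtype=float)) ** beta
    np.fill_diagonal(a, 0.0)
    return a

def topological_overlap(adjacency):
    """Topological overlap matrix for a symmetric adjacency with entries in [0, 1]."""
    a = np.array(adjacency, dtype=float)
    np.fill_diagonal(a, 0.0)
    k = a.sum(axis=1)  # connectivity of each node
    # shared[i, j] = sum_u a_iu * a_uj; the u = i and u = j terms vanish
    # because the diagonal is zero.
    shared = a @ a
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom

# Hypothetical correlations among three brain regions.
corr = np.array([
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.2],
    [0.8, 0.2, 1.0],
])
tom = topological_overlap(soft_threshold_adjacency(corr))
dissimilarity = 1.0 - tom  # input for a hierarchical clustering routine
```

The dissimilarity 1 - TOM would then be passed to a hierarchical clustering routine, whose branches are cut to obtain modules.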


Subject(s)
Brain/physiology; Gene Regulatory Networks; Polymorphism, Single Nucleotide; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/genetics; Alzheimer Disease/physiopathology; Apolipoprotein E4/genetics; Brain/diagnostic imaging; Computational Biology; Functional Neuroimaging; Genome-Wide Association Study; Humans; Magnetic Resonance Imaging
12.
Genetics ; 203(2): 715-31, 2016 06.
Article in English | MEDLINE | ID: mdl-27075728

ABSTRACT

Testing for genetic association with multiple traits has become increasingly important, not only because of its potential to boost statistical power, but also for its direct relevance to applications. For example, there is accumulating evidence showing that some complex neurodegenerative and psychiatric diseases like Alzheimer's disease are due to disrupted brain networks, for which it would be natural to identify genetic variants associated with a disrupted brain network, represented as a set of multiple traits, one for each of multiple brain regions of interest. In spite of its promise, testing for multivariate trait associations is challenging: if not appropriately used, its power can be much lower than testing on each univariate trait separately (with a proper control for multiple testing). Furthermore, differing from most existing methods for single-SNP-multiple-trait associations, we consider SNP set-based association testing to decipher complicated joint effects of multiple SNPs on multiple traits. Because the power of a test critically depends on several unknown factors such as the proportions of associated SNPs and of traits, we propose a highly adaptive test at both the SNP and trait levels, giving higher weights to those likely associated SNPs and traits, to yield high power across a wide spectrum of situations. We illuminate relationships among the proposed and some existing tests, showing that the proposed test covers several existing tests as special cases. We compare the performance of the new test with that of several existing tests, using both simulated and real data. The methods were applied to structural magnetic resonance imaging data drawn from the Alzheimer's Disease Neuroimaging Initiative to identify genes associated with gray matter atrophy in the human brain default mode network (DMN). 
For genome-wide association studies (GWAS), genes AMOTL1 on chromosome 11 and APOE on chromosome 19 were discovered by the new test to be significantly associated with the DMN. Notably, gene AMOTL1 was not detected by single SNP-based analyses. To our knowledge, AMOTL1 has not been highlighted in other Alzheimer's disease studies before, although it was indicated to be related to cognitive impairment. The proposed method is also applicable to rare variants in sequencing data and can be extended to pathway analysis.


Subject(s)
Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/genetics; Angiomotins; Humans; Membrane Proteins/genetics
13.
Genome Med ; 8(1): 56, 2016 05 19.
Article in English | MEDLINE | ID: mdl-27198579

ABSTRACT

There is increasing interest in investigating how the compositions of microbial communities are associated with human health and disease. Although existing methods have identified many associations, a proper choice of a phylogenetic distance is critical for the power of these methods. To assess an overall association between the composition of a microbial community and an outcome of interest, we present a novel multivariate testing method called aMiSPU, that is joint and highly adaptive over all observed taxa and thus high powered across various scenarios, alleviating the issue with the choice of a phylogenetic distance. Our simulations and real-data analyses demonstrated that the aMiSPU test was often more powerful than several competing methods while correctly controlling type I error rates. The R package MiSPU is available at https://github.com/ChongWu-Biostat/MiSPU and CRAN.


Subject(s)
Bacteria/classification; Computational Biology/methods; Microbiota; Genetic Association Studies; Humans; Phylogeny; Web Browser
14.
Neuroimage Clin ; 9: 625-39, 2015.
Article in English | MEDLINE | ID: mdl-26740916

ABSTRACT

Resting-state functional magnetic resonance imaging (rs-fMRI) and other technologies have provided evidence that altered brain functional networks are associated with neurological illnesses such as Alzheimer's disease. Comparing the brain networks of clinical populations with those of controls is a key step toward revealing the underlying neurological processes related to such illnesses. For such a purpose, group-level inference is a necessary first step in order to establish whether there are any genuinely disrupted brain subnetworks. Such an analysis is also challenging due to the high dimensionality of the parameters in a network model and high noise levels in neuroimaging data. Method development is still at an early stage; as Varoquaux and Craddock (2013) note, "there is currently no unique solution, but a spectrum of related methods and analytical strategies" to learn and compare brain connectivity. In practice, the important issue of how to choose several critical parameters in estimating a network, such as which association measure to use and how sparse the estimated network should be, has not been carefully addressed, largely because the answers are not yet known. For example, even though the choice of tuning parameters in model estimation has been extensively discussed in the literature, as will be shown here, an optimal choice of a parameter for network estimation may not be optimal in the current context of hypothesis testing. Arbitrarily choosing or mis-specifying such parameters may lead to extremely low-powered tests. Here we develop highly adaptive tests to detect group differences in brain connectivity while accounting for unknown optimal choices of some tuning parameters. The proposed tests combine statistical evidence against a null hypothesis from multiple sources across a range of plausible tuning parameter values, reflecting uncertainty about the unknown truth.
These highly adaptive tests are not only easy to use, but also high-powered robustly across various scenarios. The usage and advantages of these novel tests are demonstrated on an Alzheimer's disease dataset and simulated data.


Subject(s)
Alzheimer Disease/physiopathology; Brain Mapping/methods; Brain/physiopathology; Magnetic Resonance Imaging/methods; Aged; Aged, 80 and over; Computer Simulation; Data Interpretation, Statistical; Female; Humans; Image Processing, Computer-Assisted; Male
15.
Brain Connect ; 5(4): 214-31, 2015 May.
Article in English | MEDLINE | ID: mdl-25492804

ABSTRACT

Resting-state functional magnetic resonance imaging allows one to study brain functional connectivity, partly motivated by evidence that patients with complex disorders, such as Alzheimer's disease, may have altered functional brain connectivity patterns as compared with healthy subjects. A functional connectivity network describes statistical associations of the neural activities among distinct and distant brain regions. Recently, there has been major interest in group-level functional network analysis; however, there is a relative lack of studies on statistical inference, such as significance testing for group comparisons. In particular, it is still debatable which statistic should be used to measure pairwise associations as the connectivity weights. Many functional connectivity studies have used either (full or marginal) correlations or partial correlations for pairwise associations. This article investigates the performance of using either correlations or partial correlations for testing group differences in brain connectivity, and how sparsity levels and topological structures of the connectivity would influence statistical power to detect group differences. Our results suggest that, in general, testing group differences in networks deviates from estimating networks. For example, high regularization in both covariance matrices and precision matrices may lead to higher statistical power; in particular, optimally selected regularization (e.g., by cross-validation or even at the true sparsity level) on the precision matrices with small estimation errors may have low power. Most importantly, and perhaps surprisingly, using either correlations or partial correlations may give very different testing results, depending on which of the covariance matrices and the precision matrices are sparse.
Specifically, if the precision matrices are sparse, presumably and arguably a reasonable assumption, then using correlations often yields much higher powered and more stable testing results than using partial correlations; the conclusion is reversed if the covariance matrices, not the precision matrices, are sparse. These results may have useful implications for future studies on testing functional connectivity differences.
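
The distinction between the two connectivity weights can be shown with a small population-level sketch. Marginal correlations come from the covariance matrix, while partial correlations come from the precision matrix Ω (the inverse covariance) through the standard identity ρ_ij = -Ω_ij / sqrt(Ω_ii Ω_jj). The three-region AR(1)-type covariance and the value 0.6 are hypothetical choices for illustration; they give a sparse precision matrix alongside a dense covariance matrix.

```python
import numpy as np

def correlation_from_covariance(sigma):
    """Marginal (full) correlations: sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)

def partial_correlation_from_covariance(sigma):
    """Partial correlations: -omega_ij / sqrt(omega_ii * omega_jj)."""
    omega = np.linalg.inv(sigma)  # precision matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical chain of three regions: region 1 relates to region 3 only
# through region 2, so the precision matrix is tridiagonal (sparse).
rho = 0.6
idx = np.arange(3)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])

cor = correlation_from_covariance(sigma)
pcor = partial_correlation_from_covariance(sigma)
```

In this toy case regions 1 and 3 are marginally correlated, yet their partial correlation is zero up to floating-point error, so the two weighting schemes describe different networks for the same data.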


Subject(s)
Brain/physiology; Nerve Net/physiology; Brain/anatomy & histology; Computer Simulation; Connectome/methods; Humans; Magnetic Resonance Imaging; Nerve Net/anatomy & histology
16.
Genetics ; 197(4): 1081-95, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24831820

ABSTRACT

This article focuses on conducting global testing for association between a binary trait and a set of rare variants (RVs), although its application can be much broader to other types of traits, common variants (CVs), and gene set or pathway analysis. We show that many of the existing tests have deteriorating performance in the presence of many nonassociated RVs: their power can dramatically drop as the proportion of nonassociated RVs in the group to be tested increases. We propose a class of so-called sum of powered score (SPU) tests, each of which is based on the score vector from a general regression model and hence can deal with different types of traits and adjust for covariates, e.g., principal components accounting for population stratification. The SPU tests generalize the sum test, a representative burden test based on pooling or collapsing genotypes of RVs, and a sum of squared score (SSU) test that is closely related to several other powerful variance component tests; a previous study (Basu and Pan 2011) has demonstrated good performance of one, but not both, of the Sum and SSU tests in many situations. The SPU tests are versatile in the sense that one of them is often powerful, although its identity varies with the unknown true association parameters. We propose an adaptive SPU (aSPU) test to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios. We conducted extensive simulations to show superior performance of the aSPU test over several state-of-the-art association tests in the presence of many nonassociated RVs. Finally, we applied the SPU and aSPU tests to the GAW17 mini-exome sequence data to compare their practical performance with some existing tests, demonstrating their potential usefulness.
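
The idea of a family of powered score tests combined adaptively can be sketched as follows. This is a simplified reading under stated assumptions, not the authors' implementation: there are no covariates, so the score vector reduces to U_j = sum_i (y_i - mean(y)) x_ij; SPU(γ) is taken as sum_j U_j^γ, with SPU(∞) as max_j |U_j|; and p-values come from permuting the trait. The function names, the set of γ values, the number of permutations, and the simulated genotypes are all illustrative.

```python
import numpy as np

def spu_statistics(X, y, gammas):
    """SPU(gamma) = sum_j U_j ** gamma, with U the score vector under the null."""
    U = X.T @ (y - y.mean())
    stats = []
    for g in gammas:
        if np.isinf(g):
            stats.append(np.max(np.abs(U)))  # SPU(inf): largest single score
        else:
            stats.append(np.sum(U ** g))
    return np.array(stats)

def aspu_test(X, y, gammas=(1, 2, 3, 4, np.inf), n_perm=500, seed=0):
    """Permutation p-values for each SPU(gamma) and for their adaptive combination."""
    rng = np.random.default_rng(seed)
    observed = np.abs(spu_statistics(X, y, gammas))
    null = np.array([
        np.abs(spu_statistics(X, rng.permutation(y), gammas))
        for _ in range(n_perm)
    ])  # shape (n_perm, len(gammas))
    # Permutation p-value of each observed SPU(gamma) statistic.
    p_spu = ((null >= observed).sum(axis=0) + 1) / (n_perm + 1)
    # p-value of each permuted statistic relative to the other permutations,
    # obtained from its rank (0 = smallest) within its column.
    ranks = null.argsort(axis=0).argsort(axis=0)
    p_null = (n_perm - ranks) / n_perm
    # aSPU statistic: the minimum p-value over gamma, calibrated against the
    # same minimum computed for every permutation.
    t_obs = p_spu.min()
    t_null = p_null.min(axis=1)
    p_aspu = ((t_null <= t_obs).sum() + 1) / (n_perm + 1)
    return p_spu, p_aspu

# Hypothetical data: 10 variants, only the first is associated with the trait.
rng = np.random.default_rng(1)
n, p = 300, 10
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # allele counts 0/1/2
prob = 1.0 / (1.0 + np.exp(-1.5 * (X[:, 0] - 0.6)))
y = (rng.random(n) < prob).astype(float)

p_spu, p_aspu = aspu_test(X, y)
```

Reusing one set of permutations for both the individual SPU p-values and the calibration of their minimum avoids a nested permutation loop, which is the main practical appeal of this construction.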


Subject(s)
Models, Genetic; Phenotype; Polymorphism, Single Nucleotide; Computer Simulation; Databases, Genetic; Genetic Association Studies; Genotype; Humans; Principal Component Analysis; Selection, Genetic