Results 1 - 20 of 58
1.
PLoS Genet ; 19(11): e1010597, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38011285

ABSTRACT

Polygenic risk score (PRS) is a quantity that aggregates the effects of variants across the genome and estimates an individual's genetic predisposition for a given trait. PRS analysis typically involves two input data sets: base data for effect size estimation and target data for individual-level prediction. Given the availability of large-scale base data, it is increasingly common that the ancestral background of the base data does not perfectly match that of the target data. In this paper, we treat the GWAS summary information obtained in the base data as knowledge learned from a pre-trained model, and adopt a transfer learning framework to effectively leverage the knowledge learned from the base data, which may or may not have an ancestral background similar to that of the target samples, to build prediction models for target individuals. Our proposed transfer learning framework consists of two main steps: (1) conducting false negative control (FNC) marginal screening to extract useful knowledge from the base data; and (2) performing joint model training to integrate the knowledge extracted from base data with the target training data for accurate trans-data prediction. This new approach can significantly enhance the computational and statistical efficiency of joint-model training, alleviate over-fitting, and facilitate more accurate trans-data prediction whether the heterogeneity level between target and base data sets is low or high.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease , Phenotype , Multifactorial Inheritance/genetics , Machine Learning , Risk Factors
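
The first sentence of the abstract above describes a PRS as an aggregate of variant effects. A minimal sketch of that basic quantity, assuming the common weighted-sum form (effect size times allele count, summed over variants), is given below. It does not implement the authors' FNC screening or joint model training, and the function name and toy numbers are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages for one individual."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(b * g for b, g in zip(effect_sizes, dosages))


# Hypothetical toy data: 3 variants, 2 target individuals.
effect_sizes = [0.5, -0.25, 1.0]  # per-allele effects estimated in the base data
target_genotypes = [
    [0, 1, 2],  # individual A: allele counts at each variant
    [2, 0, 1],  # individual B
]
scores = [polygenic_risk_score(g, effect_sizes) for g in target_genotypes]
```

In practice the effect sizes would come from GWAS summary statistics, and which variants enter the sum is exactly what a screening step decides.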
2.
Bioinformatics ; 37(16): 2259-2265, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33674827

ABSTRACT

MOTIVATION: Facilitated by technological advances and the decrease in costs, it is feasible to gather subject data from several omics platforms. Each platform assesses different molecular events, and the challenge lies in efficiently analyzing these data to discover novel disease genes or mechanisms. A common strategy is to regress the outcomes on all omics variables in a gene set. However, this approach suffers from problems associated with high-dimensional inference. RESULTS: We introduce a tensor-based framework for variable-wise inference in multi-omics analysis. By accounting for the matrix structure of an individual's multi-omics data, the proposed tensor methods incorporate the relationship among omics effects, reduce the number of parameters, and boost the modeling efficiency. We derive the variable-specific tensor test and enhance computational efficiency of tensor modeling. Using simulations and data applications on the Cancer Cell Line Encyclopedia (CCLE), we demonstrate our method performs favorably over baseline methods and will be useful for gaining biological insights in multi-omics analysis. AVAILABILITY AND IMPLEMENTATION: R function and instruction are available from the authors' website: https://www4.stat.ncsu.edu/~jytzeng/Software/TR.omics/TRinstruction.pdf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

3.
Genet Epidemiol ; 44(6): 611-619, 2020 09.
Article in English | MEDLINE | ID: mdl-32216117

ABSTRACT

Genome-wide expression quantitative trait loci (eQTL) mapping explores the relationship between gene expression and DNA variants, such as single-nucleotide polymorphisms (SNPs), to understand the genetic basis of human diseases. Due to the large number of genes and SNPs that need to be assessed, current methods for eQTL mapping often suffer from low detection power, especially for identifying trans-eQTLs. In this paper, we propose the idea of performing SNP ranking based on the higher criticism (HC) statistic, a summary statistic developed in large-scale signal detection. We illustrate how the HC-based SNP ranking can effectively prioritize eQTL signals over noise, greatly reduce the burden of joint modeling, and improve the power for eQTL mapping. Numerical results in simulation studies demonstrate the superior performance of our method compared to existing methods. The proposed method is also evaluated in HapMap eQTL data analysis and the results are compared to a database of known eQTLs.


Subject(s)
Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Data Analysis , Gene Expression Regulation , Genome-Wide Association Study , Humans , Models, Genetic
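
For readers unfamiliar with the higher criticism statistic named in the abstract above, the sketch below computes it from a list of p-values, assuming the standard Donoho-Jin form: the maximum over the smallest p-values of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))). The `alpha0` cutoff and the function name are illustrative assumptions; the authors' SNP ranking procedure is not reproduced here.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of p-values."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    ordered = sorted(p_values)
    k_max = max(1, int(alpha0 * n))  # number of ordered p-values to scan
    best = -math.inf
    for i in range(1, k_max + 1):
        p = ordered[i - 1]
        # Standardized gap between the empirical and uniform CDF at p_(i).
        hc_i = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc_i)
    return best


null_like = [0.1, 0.3, 0.5, 0.7, 0.9]       # roughly uniform p-values
with_signal = [0.001, 0.3, 0.5, 0.7, 0.9]   # one very small p-value
```

A set of p-values containing a few strong signals yields a larger statistic than a roughly uniform set, which is the property that makes it usable for prioritization.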
4.
Genet Epidemiol ; 44(3): 272-282, 2020 04.
Article in English | MEDLINE | ID: mdl-31943371

ABSTRACT

Testing the association between single-nucleotide polymorphism (SNP) effects and a response is often carried out through kernel machine methods based on least squares, such as the sequence kernel association test (SKAT). However, these least-squares procedures are designed for a normally distributed conditional response, which may not apply. Other robust procedures such as the quantile regression kernel machine (QRKM) restrict the choice of the loss function and only allow inference on conditional quantiles. We propose a general and robust kernel association test that allows a flexible choice of the loss function, makes no distributional assumptions, and has SKAT and QRKM as special cases. We evaluate our proposed robust association test (RobKAT) across various data distributions through a simulation study. When errors are normally distributed, RobKAT controls type I error and shows comparable power with SKAT. In all other distributional settings investigated, our robust test has similar or greater power than SKAT. Finally, we apply our robust testing method to data from the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) clinical trial to detect associations between selected genes including the major histocompatibility complex (MHC) region on chromosome six and neurotropic herpesvirus antibody levels in schizophrenia patients. RobKAT detected significant association with four SNP sets (HST1H2BJ, MHC, POM12L2, and SLC17A1), three of which were undetected by SKAT.


Subject(s)
Algorithms , Genetic Association Studies , Computer Simulation , Humans , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Selection, Genetic
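
As background for the least-squares kernel test that the abstract above uses as its baseline, the sketch below computes a SKAT-style score statistic, assuming an intercept-only null model and a weighted linear kernel, so that Q is the weighted sum over variants of the squared genotype-residual cross-products. Scaling conventions differ between implementations, the p-value step is omitted, and RobKAT itself is not implemented; all names and numbers are hypothetical.

```python
def skat_q(y, genotypes, weights):
    """Score statistic Q = sum_j w_j * (sum_i g_ij * (y_i - mean(y)))**2."""
    n = len(y)
    mu = sum(y) / n                    # fitted mean under an intercept-only null
    resid = [yi - mu for yi in y]
    q = 0.0
    for j, w in enumerate(weights):
        # Cross-product of variant j's genotypes with the null residuals.
        s_j = sum(genotypes[i][j] * resid[i] for i in range(n))
        q += w * s_j ** 2
    return q


# Hypothetical toy data: 3 individuals, 2 variants, unit weights.
y = [1.0, 2.0, 3.0]
genotypes = [[0, 1], [1, 0], [2, 1]]
q_stat = skat_q(y, genotypes, [1.0, 1.0])
```

Replacing the squared-error residuals with residuals from another loss function is the kind of generalization the abstract describes.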
5.
PLoS Comput Biol ; 16(5): e1007797, 2020 05.
Article in English | MEDLINE | ID: mdl-32365089

ABSTRACT

Copy number variants (CNVs) are the gain or loss of DNA segments in the genome that can vary in dosage and length. CNVs comprise a large proportion of variation in human genomes and impact health conditions. To detect rare CNV associations, kernel-based methods have been shown to be a powerful tool due to their flexibility in modeling the aggregate CNV effects, their ability to capture effects from different CNV features, and their accommodation of effect heterogeneity. To perform a kernel association test, a CNV locus needs to be defined so that locus-specific effects can be retained during aggregation. However, CNV loci are arbitrarily defined and different locus definitions can lead to different performance depending on the underlying effect patterns. In this work, we develop a new kernel-based test called CONCUR (i.e., copy number profile curve-based association test) that is free from a definition of locus and evaluates CNV-phenotype associations by comparing individuals' copy number profiles across the genomic regions. CONCUR is built on the proposed concepts of "copy number profile curves" to describe the CNV profile of an individual, and the "common area under the curve (cAUC) kernel" to model the multi-feature CNV effects. The proposed method captures the effects of CNV dosage and length, accounts for the numerical nature of copy numbers, and accommodates between- and within-locus etiological heterogeneity without the need to define artificial CNV loci as required in current kernel methods. In a variety of simulation settings, CONCUR shows comparable or improved power over existing approaches. Real data analyses suggest that CONCUR is well powered to detect CNV effects in the Swedish Schizophrenia Study and the Taiwan Biobank.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations/genetics , Algorithms , Area Under Curve , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics/methods , Humans , Polymorphism, Single Nucleotide/genetics , Spatial Analysis
6.
PLoS Comput Biol ; 15(2): e1006722, 2019 02.
Article in English | MEDLINE | ID: mdl-30779729

ABSTRACT

Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.


Subject(s)
Computational Biology/methods , Genetic Association Studies/methods , Sequence Analysis, DNA/methods , Angiopoietin-Like Protein 4/genetics , Cholesterol Ester Transfer Proteins/genetics , Computer Simulation , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Humans , Models, Genetic , Proprotein Convertase 9/genetics , Protein Structure, Tertiary , Risk Factors
7.
Genet Epidemiol ; 42(1): 64-79, 2018 02.
Article in English | MEDLINE | ID: mdl-29314255

ABSTRACT

We consider the problem of assessing the joint effect of a set of genetic markers on multiple, possibly correlated phenotypes of interest. We develop a kernel machine based multivariate regression framework, where the joint effect of the marker set on each of the phenotypes is modeled using prespecified kernel functions with unknown variance components. Unlike most existing methods that mainly focus on the global association between the marker set and the phenotype set, we develop estimation and testing procedures to study phenotype-specific associations. Specifically, we develop an estimation method based on the penalized likelihood approach to estimate phenotype-specific effects and their corresponding standard errors while accounting for possible correlation among the phenotypes. We develop testing procedures for the association of the marker set with any subset of phenotypes using a score-based variance components testing method. We assess the performance of our proposed methodology via a simulation study and demonstrate the utility of the proposed method using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) data.


Subject(s)
Computer Simulation , Genetic Markers/genetics , Likelihood Functions , Models, Genetic , Phenotype , Age Factors , Antipsychotic Agents/therapeutic use , Humans , Sex Factors
8.
Genet Epidemiol ; 42(3): 276-287, 2018 04.
Article in English | MEDLINE | ID: mdl-29280188

ABSTRACT

Recent studies showed that population substructure (PS) can have a more complex impact on rare variant tests and that similarity-based collapsing tests (e.g., SKAT) may suffer more severely from PS than burden-based tests. In this work, we evaluate the performance of SKAT coupled with principal components (PC) or variance components (VC) based PS correction methods. We consider confounding effects caused by PS including stratified populations, admixed populations, and spatially distributed nongenetic risk; we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC-based methods can account for confounding effects in most scenarios except for admixture, although the number of sufficient PCs depends on the PS complexity and the type of variants used. (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require equal or fewer sufficient PCs and often achieve higher power than PCs based on other variant types. (iii) VC-based methods can effectively adjust for confounding in all scenarios (even for admixture), though the type of variants that should be used to construct VC may vary. (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than VC based on other variant types. Given that the best-performing method and the variants to use depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC-based methods based on all variants.


Subject(s)
Genetic Association Studies , Genetic Variation , Principal Component Analysis , Computer Simulation , Confounding Factors, Epidemiologic , Humans , Models, Genetic
9.
Int J Obes (Lond) ; 42(7): 1285-1295, 2018 07.
Article in English | MEDLINE | ID: mdl-29511319

ABSTRACT

OBJECTIVE: Human obesity is a complex metabolic disorder disproportionately affecting people of lower socioeconomic strata, and ethnic minorities, especially African Americans and Hispanics. Although genetic predisposition and a positive energy balance are implicated in obesity, these factors alone do not account for the excess prevalence of obesity in lower socioeconomic populations. Therefore, environmental factors, including exposure to pesticides, heavy metals, and other contaminants, are agents widely suspected to have obesogenic activity, and they also are spatially correlated with lower socioeconomic status. Our study investigates the causal relationship between exposure to the heavy metal, cadmium (Cd), and obesity in a cohort of children and in a zebrafish model of adipogenesis. DESIGN: An extensive collection of first trimester maternal blood samples obtained as part of the Newborn Epigenetics Study (NEST) was analyzed for the presence of Cd, and these results were cross analyzed with the weight-gain trajectory of the children through age 5 years. Next, the role of Cd as a potential obesogen was analyzed in an in vivo zebrafish model. RESULTS: Our analysis indicates that the presence of Cd in maternal blood during pregnancy is associated with increased risk of juvenile obesity in the offspring, independent of other variables, including lead (Pb) and smoking status. Our results are recapitulated in a zebrafish model, in which exposure to Cd at levels approximating those observed in the NEST study is associated with increased adiposity. CONCLUSION: Our findings identify Cd as a potential human obesogen. Moreover, these observations are recapitulated in a zebrafish model, suggesting that the underlying mechanisms may be evolutionarily conserved, and that zebrafish may be a valuable model for uncovering pathways leading to Cd-mediated obesity in human populations.


Subject(s)
Adipogenesis/drug effects , Cadmium/adverse effects , Environmental Exposure/adverse effects , Maternal Exposure/adverse effects , Metals, Heavy/adverse effects , Pediatric Obesity/chemically induced , Prenatal Exposure Delayed Effects/chemically induced , Zebrafish/metabolism , Animals , Cadmium/analysis , Cadmium/blood , Disease Models, Animal , Environmental Exposure/analysis , Female , Humans , Infant, Newborn , Male , Metals, Heavy/analysis , Pediatric Obesity/blood , Pediatric Obesity/epidemiology , Pregnancy , Pregnant Women , Prenatal Exposure Delayed Effects/blood , Prenatal Exposure Delayed Effects/epidemiology , Prospective Studies , Socioeconomic Factors , United States/epidemiology
10.
PLoS Genet ; 11(10): e1005403, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26431523

ABSTRACT

Copy number variants (CNVs) play an important role in the etiology of many diseases such as cancers and psychiatric disorders. Due to a modest marginal effect size or the rarity of the CNVs, collapsing rare CNVs and evaluating them jointly is a key approach to assessing their collective effect on disease risk. While a plethora of powerful collapsing methods are available for sequence variants (e.g., SNPs) in association analysis, these methods cannot be directly applied to rare CNVs due to CNV-specific challenges, i.e., the multi-faceted nature of CNV polymorphisms (e.g., CNVs vary in size, type, dosage, and details of gene disruption), and etiological heterogeneity (e.g., heterogeneous effects of duplications and deletions that occur within a locus or in different loci). Existing CNV collapsing analysis methods (a.k.a. the burden test) tend to have suboptimal performance because they often ignore heterogeneity and evaluate only the marginal effects of a CNV feature. We introduce CCRET, a random effects test for collapsing rare CNVs when searching for disease associations. CCRET is applicable to variants measured on a multi-categorical scale, collectively models the effects of multiple CNV features, and is robust to etiological heterogeneity. Multiple confounders can be simultaneously corrected. To evaluate the performance of CCRET, we conducted extensive simulations and analyzed large-scale schizophrenia datasets. We show that CCRET has powerful and robust performance under multiple types of etiological heterogeneity, and has performance comparable to or better than existing methods when there is no heterogeneity.


Subject(s)
DNA Copy Number Variations/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Schizophrenia/genetics , Genetic Heterogeneity , Humans , Models, Theoretical , Polymorphism, Single Nucleotide , Schizophrenia/pathology
11.
Genet Epidemiol ; 40(4): 333-40, 2016 May.
Article in English | MEDLINE | ID: mdl-27061717

ABSTRACT

DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS).


Subject(s)
DNA Methylation , Genetic Association Studies/methods , Leukemia, Myeloid, Acute/genetics , CpG Islands/genetics , Epigenesis, Genetic , Humans , Linear Models , Models, Genetic
12.
PLoS Comput Biol ; 12(6): e1004993, 2016 06.
Article in English | MEDLINE | ID: mdl-27355347

ABSTRACT

Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors on diseases. Due to the extreme rarity of mutations and high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR) procedures, are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR procedures are exceedingly over-conservative for rare-variant association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances. Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information.


Subject(s)
Gene Frequency/genetics , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Genomics , High-Throughput Nucleotide Sequencing , Cardiovascular Diseases/genetics , Computer Simulation , Databases, Factual , Drug Delivery Systems , Humans , Models, Genetic
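
The abstract above contrasts AFNC with the two standard false-positive control procedures. As a point of reference, the sketch below implements those two baselines only, assuming the usual Bonferroni rule (reject when p <= alpha/m) and the Benjamini-Hochberg step-up rule for FDR. AFNC itself is not implemented, and the function names and p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values for five variants.
p_vals = [0.001, 0.012, 0.025, 0.2, 0.8]
bonf = bonferroni_reject(p_vals)
bh = benjamini_hochberg_reject(p_vals)
```

On these toy values Bonferroni keeps only the smallest p-value while Benjamini-Hochberg keeps the three smallest, which illustrates why both can miss weak causal variants when signals are sparse and faint.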
13.
BMC Public Health ; 17(1): 354, 2017 04 24.
Article in English | MEDLINE | ID: mdl-28438148

ABSTRACT

BACKGROUND: Cadmium (Cd), lead (Pb) and arsenic (As) are common environmental contaminants that have been associated with lower birthweight. Although some essential metals may mitigate exposure, data are inconsistent. This study sought to evaluate the relationship between toxic metals, nutrient combinations and birthweight among 275 mother-child pairs. METHODS: Non-essential metals, Cd, Pb, As, and essential metals, iron (Fe), zinc (Zn), selenium (Se), copper (Cu), calcium (Ca), magnesium (Mg), and manganese (Mn) were measured in maternal whole blood obtained during the first trimester using inductively coupled plasma mass spectrometry. Folate concentrations were measured by microbial assay. Birthweight was obtained from medical records. We used quantile regression to evaluate the association between toxic metals and nutrients due to their underlying wedge-shaped relationship. Ordinary linear regression was used to evaluate associations between birthweight and toxic metals. RESULTS: After multivariate adjustment, the negative association between Pb or Cd and a combination of Fe, Se, Ca and folate was robust, persistent and dose-dependent (p < 0.05). However, a combination of Zn, Cu, Mn and Mg was positively associated with Pb and Cd levels. While prenatal blood Cd and Pb were also associated with lower birthweight, Fe, Se, Ca and folate did not modify these associations. CONCLUSION: Small sample size and cross-sectional design notwithstanding, the robust and persistent negative associations between some, but not all, nutrient combinations with these ubiquitous environmental contaminants suggest that only some recommended nutrient combinations may mitigate toxic metal exposure in chronically exposed populations. Larger longitudinal studies are required to confirm these findings.


Subject(s)
Birth Weight , Maternal Exposure/adverse effects , Metals, Heavy/blood , Adult , Arsenic , Cadmium/blood , Copper/blood , Cross-Sectional Studies , Female , Folic Acid , Heavy Metal Poisoning , Humans , Iron/blood , Lead/blood , Manganese/blood , Poisoning , Selenium/blood , Socioeconomic Factors , Zinc/blood
14.
Genet Epidemiol ; 39(2): 122-33, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25538034

ABSTRACT

Studying complex diseases in the post genome-wide association studies (GWAS) era has led to developing methods that consider factor-sets rather than individual genetic/environmental factors (i.e., Multi-G-Multi-E studies), and mining for potential gene-environment (G×E) interactions has proven to be an invaluable aid in both discovery and deciphering underlying biological mechanisms. Current approaches for examining effect profiles in Multi-G-Multi-E analyses are either underpowered due to large degrees of freedom, ill-suited for detecting G×E interactions due to imprecise modeling of the G and E effects, or lack the capacity for modeling interactions between two factor-sets (e.g., existing methods focus primarily on a single E factor). In this work, we illustrate the issues encountered in constructing kernels for investigating interactions between two factor-sets, and propose a simple yet intuitive solution to construct the G×E kernel that retains the ease-of-interpretation of classic regression. We also construct a series of kernel machine (KM) score tests to evaluate the complete effect profile (i.e., the G, E, and G×E effects individually or in combination). We show, via simulations and a data application, that the proposed KM methods outperform the classic and PC regressions across a range of scenarios, including varying effect size, effect structure, and interaction complexity. The largest power gain was observed when the underlying effect structure involved complex G×E interactions; however, the proposed methods have consistent, powerful performance whether the effect profile is simple or complex, suggesting that the proposed method could be a useful tool for exploratory or confirmatory G×E analysis.


Subject(s)
Environment , Gene-Environment Interaction , Genome-Wide Association Study/methods , Computer Simulation , Genetic Predisposition to Disease , Humans , Models, Genetic , Software
15.
Genet Epidemiol ; 39(6): 456-68, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26139508

ABSTRACT

Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene-gene or gene-environment interactions, incorporating variance-component based methods for population substructure into rare-variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust, powerful, and provides efficient dimension reduction for multifactor analyses, but requires the estimation of high dimensional nuisance parameters. Traditional estimation techniques, including regularization and the "expectation-maximization (EM)" algorithm, have a large computational cost and are not scalable to large sample sizes needed for rare variant analysis. Therefore, under the context of gene-environment interaction, we propose a computationally efficient and statistically rigorous "fastKM" algorithm for multikernel analysis that is based on a low-rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single-kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM-based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene-by-vitamin effects on recurrent stroke risk and gene-by-age effects on change in homocysteine level.


Subject(s)
Algorithms , Gene-Environment Interaction , Genetic Variation , Homocysteine/metabolism , Humans , Models, Genetic , Quantitative Trait Loci , Regression Analysis , Risk Factors , Software , Stroke/etiology , Vitamins/metabolism
16.
Biometrics ; 72(2): 364-71, 2016 06.
Article in English | MEDLINE | ID: mdl-26575303

ABSTRACT

We consider quantile regression for partially linear models where an outcome of interest is related to covariates and a marker set (e.g., gene or pathway). The covariate effects are modeled parametrically and the marker set effect of multiple loci is modeled using a kernel machine. We propose an efficient algorithm to solve the corresponding optimization problem for estimating the effects of covariates and also introduce a powerful test for detecting the overall effect of the marker set. Our test is motivated by the traditional score test, and borrows the idea of the permutation test. Our estimation and testing procedures are evaluated numerically and applied to assess genetic association of change in fasting homocysteine level using the Vitamin Intervention for Stroke Prevention Trial data.


Subject(s)
Biomarkers , Models, Genetic , Models, Statistical , Regression Analysis , Algorithms , Biometry/methods , Clinical Trials as Topic , Computer Simulation , Genetic Association Studies , Homocysteine/blood , Humans , Linear Models , Polymorphism, Single Nucleotide
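
The optimization problem mentioned in the abstract above rests on the quantile regression loss. The sketch below shows that loss in its standard "check" (pinball) form and verifies on toy data that minimizing it over a constant recovers a sample quantile. It is an illustration of the loss only, with no covariates or kernel machine term, and the names and data are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by one constant."""
    return sum(check_loss(v - candidate, tau) for v in values)


# Hypothetical outcomes with one outlier.
outcomes = [1, 2, 3, 4, 100]
# Among the observed values, the minimizer for tau = 0.5 is the sample median.
best_constant = min(outcomes, key=lambda c: total_check_loss(outcomes, c, 0.5))
```

Because the loss grows linearly rather than quadratically, the outlier at 100 does not pull the fitted value away from the median.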
17.
Biometrics ; 72(1): 85-94, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26288029

ABSTRACT

Finding an efficient and computationally feasible approach to deal with the curse of high-dimensionality is a daunting challenge faced by modern biological science. The problem becomes even more severe when the interactions are the research focus. To improve the performance of statistical analyses, we propose a sparse and low-rank (SLR) screening based on the combination of a low-rank interaction model and the Lasso screening. SLR models the interaction effects using a low-rank matrix to achieve parsimonious parametrization. The low-rank model increases the efficiency of statistical inference and, hence, SLR screening is able to more accurately detect gene-gene interactions than conventional methods. Incorporation of SLR screening into the Screen-and-Clean approach (Wasserman and Roeder, 2009; Wu et al., 2010) is also discussed, which suffers less penalty from Bonferroni correction, and is able to assign p-values for the identified variables in high-dimensional models. We apply the proposed screening procedure to the Warfarin dosage study and the CoLaus study. The results suggest that the new procedure can identify main and interaction effects that would have been omitted by conventional screening methods.


Subject(s)
Algorithms , Data Interpretation, Statistical , High-Throughput Screening Assays/methods , Models, Statistical , Protein Interaction Mapping/methods , Regression Analysis , Computer Simulation , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity
18.
Bioinformatics ; 30(11): 1501-7, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24489370

ABSTRACT

MOTIVATION: Gene set analysis is a popular method for large-scale genomic studies. Because genes that have common biological features are analyzed jointly, gene set analysis often achieves better power and generates more biologically informative results. With the advancement of technologies, genomic studies with multi-platform data have become increasingly common. Several strategies have been proposed that integrate genomic data from multiple platforms to perform gene set analysis. To evaluate the performances of existing integrative gene set methods under various scenarios, we conduct a comparative simulation analysis based on The Cancer Genome Atlas breast cancer dataset. RESULTS: We find that existing methods for gene set analysis are less effective when sample heterogeneity exists. To address this issue, we develop three methods for multi-platform genomic data with heterogeneity: two non-parametric methods, multi-platform Mann-Whitney statistics and multi-platform outlier robust T-statistics, and a parametric method, multi-platform likelihood ratio statistics. Using simulations, we show that the proposed multi-platform Mann-Whitney statistics method has higher power for heterogeneous samples and comparable performance for homogeneous samples when compared with the existing methods. Our real data applications to two datasets of The Cancer Genome Atlas also suggest that the proposed methods are able to identify novel pathways that are missed by other strategies. AVAILABILITY AND IMPLEMENTATION: http://www4.stat.ncsu.edu/~jytzeng/Software/Multiplatform_gene_set_analysis/
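For orientation, the classical two-sample Mann-Whitney U statistic that the multi-platform method takes its name from can be sketched as follows. This is only the textbook single-platform statistic computed separately per platform; how the authors combine platforms is not described in the abstract, so no combination rule is attempted here. The names `mann_whitney_u` and `per_platform_u` and the dictionary layout are assumptions made for this example.

```python
import numpy as np

def mann_whitney_u(cases, controls):
    """Classical U: number of (case, control) pairs with case > control,
    counting ties as one half."""
    x = np.asarray(cases, dtype=float)[:, None]
    y = np.asarray(controls, dtype=float)[None, :]
    return float((x > y).sum() + 0.5 * (x == y).sum())

def per_platform_u(case_data, control_data):
    """Compute U separately for each platform.

    case_data and control_data are hypothetical dicts mapping a platform
    name (e.g. "expression", "methylation") to a 1-D array of measurements
    for one gene.
    """
    return {name: mann_whitney_u(case_data[name], control_data[name])
            for name in case_data}
```

Because U depends only on the ordering of the measurements, it is insensitive to monotone transformations and to extreme values, which is consistent with the abstract's interest in heterogeneous samples.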


Subject(s)
Genomics/methods , Breast Neoplasms/genetics , DNA Copy Number Variations , DNA Methylation , Female , Gene Expression Profiling , Genes , Humans , Sequence Analysis, RNA , Statistics, Nonparametric
19.
Biometrics ; 71(2): 529-37, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25604216

ABSTRACT

Pharmacogenetics investigates the relationship between heritable genetic variation and the variation in how individuals respond to drug therapies. Often, gene-drug interactions play a primary role in this response, and identifying these effects can aid in the development of individualized treatment regimes. Haplotypes can hold key information in understanding the association between genetic variation and drug response. However, the standard approach for haplotype-based association analysis does not directly address the research questions dictated by individualized medicine. A complementary post-hoc analysis is required, and this post-hoc analysis is usually underpowered after adjusting for multiple comparisons and may lead to seemingly contradictory conclusions. In this work, we propose a penalized likelihood approach that is able to overcome the drawbacks of the standard approach and yield the desired personalized output. We demonstrate the utility of our method by applying it to the Scottish Randomized Trial in Ovarian Cancer. We also conducted simulation studies and showed that the proposed penalized method has comparable or more power than the standard approach and maintains low Type I error rates for both binary and quantitative drug responses. The largest performance gains are seen when the haplotype frequency is low, the difference in effect sizes is small, or the true relationship among the drugs is more complex.


Subject(s)
Likelihood Functions , Pharmacogenetics/statistics & numerical data , Antineoplastic Agents/adverse effects , Biometry , Computer Simulation , Female , Genes, bcl-2 , Haplotypes , Humans , Models, Statistical , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/genetics , Regression Analysis
20.
Hum Hered ; 78(1): 17-26, 2014.
Article in English | MEDLINE | ID: mdl-24969398

ABSTRACT

OBJECTIVE: Gene-gene interactions (G×G) are important to study because of their extensiveness in biological systems and their potential in explaining missing heritability of complex traits. In this work, we propose a new similarity-based test to assess G×G at the gene level, which permits the study of epistasis at biologically functional units with amplified interaction signals. METHODS: Under the framework of gene-trait similarity regression (SimReg), we propose a gene-based test for detecting G×G. SimReg uses a regression model to correlate trait similarity with genotypic similarity across a gene. Unlike existing gene-level methods based on leading principal components (PCs), SimReg summarizes all information on genotypic variation within a gene and can be used to assess the joint/interactive effects of two genes as well as the effect of one gene conditional on another. RESULTS: Using simulations and a real data application to the Warfarin study, we show that the SimReg G×G tests have satisfactory power and robustness under different genetic architectures when compared to existing gene-based interaction tests such as PC analysis or partial least squares. A genome-wide association study with approx. 20,000 genes may be completed on a parallel computing system in 2 weeks.
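The idea of regressing trait similarity on genotypic similarity can be sketched for a single gene as below. This is a minimal illustration under stated assumptions, not the authors' SimReg test: trait similarity is taken to be the product of mean-centered trait values, genotypic similarity is an identity-by-state style score on 0/1/2 allele counts, and the slope is fitted by ordinary least squares over all pairs of individuals. The function name `similarity_regression_slope` is hypothetical.

```python
import numpy as np

def similarity_regression_slope(genotypes, trait):
    """OLS slope of pairwise trait similarity on pairwise genotype similarity.

    genotypes: (n individuals) x (m markers) array of 0/1/2 allele counts.
    trait:     length-n array of quantitative trait values.
    Both similarity definitions are assumptions made for this sketch.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(trait, dtype=float)

    # Trait similarity: cross-product of mean-centered trait values.
    yc = y - y.mean()
    trait_sim = np.outer(yc, yc)

    # Genotype similarity: 1 minus the average allele-count difference / 2.
    geno_sim = 1.0 - np.abs(G[:, None, :] - G[None, :, :]).mean(axis=2) / 2.0

    # Use each unordered pair (i < j) once.
    i, j = np.triu_indices(len(y), k=1)
    z, s = trait_sim[i, j], geno_sim[i, j]

    # Simple-regression slope; undefined if all pairs share the same similarity.
    s_centered = s - s.mean()
    return float(s_centered @ (z - z.mean()) / (s_centered @ s_centered))
```

A positive slope indicates that genetically similar pairs also tend to have similar trait values. A two-gene interaction analysis would add further similarity terms to the regression, which this sketch does not cover.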


Subject(s)
Epistasis, Genetic , Models, Genetic , Polymorphism, Single Nucleotide , Regression Analysis , Algorithms , Computer Simulation , Genotype , Humans , Principal Component Analysis