Results 1 - 20 of 58
1.
PLoS Genet ; 19(11): e1010597, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38011285

ABSTRACT

Polygenic risk score (PRS) is a quantity that aggregates the effects of variants across the genome and estimates an individual's genetic predisposition for a given trait. PRS analysis typically involves two input data sets: base data for effect size estimation and target data for individual-level prediction. Given the availability of large-scale base data, it has become more common for the ancestral backgrounds of the base and target data not to match perfectly. In this paper, we treat the GWAS summary information obtained from the base data as knowledge learned by a pre-trained model, and adopt a transfer learning framework to effectively leverage the knowledge learned from base data, which may or may not share the ancestral background of the target samples, to build prediction models for target individuals. Our proposed transfer learning framework consists of two main steps: (1) conducting false negative control (FNC) marginal screening to extract useful knowledge from the base data; and (2) performing joint model training to integrate the knowledge extracted from the base data with the target training data for accurate trans-data prediction. This new approach can significantly enhance the computational and statistical efficiency of joint-model training, alleviate over-fitting, and facilitate more accurate trans-data prediction whether the heterogeneity between the target and base data sets is low or high.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease , Phenotype , Multifactorial Inheritance/genetics , Machine Learning , Risk Factors
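As background for the two-step framework in this abstract, a PRS in its most basic form is a weighted sum of a target individual's allele dosages. The weights are base-data GWAS effect sizes. The minimal Python sketch below uses hypothetical variant IDs and summary statistics. It substitutes a plain p-value threshold for marginal screening, so it is not the FNC screening or the joint transfer-learning model described in the paper.

```python
from typing import Dict


def polygenic_risk_score(
    dosages: Dict[str, float],
    betas: Dict[str, float],
    pvalues: Dict[str, float],
    p_threshold: float = 5e-8,
) -> float:
    """Return sum_j beta_j * dosage_j over variants passing a p-value screen.

    The p-value threshold is a simple placeholder for marginal screening;
    it is NOT the false negative control (FNC) screening from the paper.
    """
    score = 0.0
    for snp, beta in betas.items():
        # Keep variants that pass the screen and are genotyped in the target.
        if pvalues.get(snp, 1.0) <= p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score


# Hypothetical base-data summary statistics (effect sizes and p-values)
betas = {"rs1": 0.20, "rs2": -0.10, "rs3": 0.05}
pvals = {"rs1": 1e-9, "rs2": 1e-10, "rs3": 0.30}
# Hypothetical allele dosages (0, 1, or 2 copies) for one target individual
person = {"rs1": 2.0, "rs2": 1.0, "rs3": 2.0}
print(polygenic_risk_score(person, betas, pvals))
```

In this example rs3 is screened out, so the score is 0.20 × 2 − 0.10 × 1 = 0.3, up to floating-point rounding.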
2.
Bioinformatics ; 37(16): 2259-2265, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33674827

ABSTRACT

MOTIVATION: Facilitated by technological advances and decreasing costs, it is now feasible to gather subject data from several omics platforms. Each platform assesses different molecular events, and the challenge lies in efficiently analyzing these data to discover novel disease genes or mechanisms. A common strategy is to regress the outcomes on all omics variables in a gene set. However, this approach suffers from problems associated with high-dimensional inference. RESULTS: We introduce a tensor-based framework for variable-wise inference in multi-omics analysis. By accounting for the matrix structure of an individual's multi-omics data, the proposed tensor methods incorporate the relationship among omics effects, reduce the number of parameters, and boost modeling efficiency. We derive the variable-specific tensor test and enhance the computational efficiency of tensor modeling. Using simulations and data applications on the Cancer Cell Line Encyclopedia (CCLE), we demonstrate that our method performs favorably compared with baseline methods and will be useful for gaining biological insights in multi-omics analysis. AVAILABILITY AND IMPLEMENTATION: An R function and instructions are available from the authors' website: https://www4.stat.ncsu.edu/~jytzeng/Software/TR.omics/TRinstruction.pdf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
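A generic rank-1 bilinear model illustrates the parameter saving from respecting the matrix structure. This model is our assumption and not necessarily the paper's tensor model. Suppose each individual's data for one gene set form a p × q matrix X, with p variables measured on q omics platforms. A full linear model then needs a p × q coefficient matrix B. Restricting B to the outer product a bᵀ gives the predictor aᵀ X b, which has only p + q parameters. The Python sketch below, using hypothetical numbers, checks that the two forms agree.

```python
from typing import List

Matrix = List[List[float]]


def outer(a: List[float], b: List[float]) -> Matrix:
    """Rank-1 coefficient matrix B = a b^T."""
    return [[ai * bk for bk in b] for ai in a]


def full_predictor(X: Matrix, B: Matrix) -> float:
    """<X, B> = sum_{j,k} X[j][k] * B[j][k], using p*q coefficients."""
    return sum(x * w for xr, br in zip(X, B) for x, w in zip(xr, br))


def bilinear_predictor(X: Matrix, a: List[float], b: List[float]) -> float:
    """a^T X b, using only p + q coefficients."""
    return sum(a[j] * X[j][k] * b[k] for j in range(len(a)) for k in range(len(b)))


# Hypothetical data: p = 3 variables measured on q = 2 omics platforms
X = [[0.5, 1.0], [2.0, -1.0], [0.0, 3.0]]
a = [1.0, -0.5, 2.0]  # variable-level effects
b = [0.8, 0.2]        # platform-level effects
print(full_predictor(X, outer(a, b)), bilinear_predictor(X, a, b))

p, q = 100, 4
print("full:", p * q, "rank-1:", p + q)
```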

3.
Genet Epidemiol ; 44(6): 611-619, 2020 09.
Article in English | MEDLINE | ID: mdl-32216117

ABSTRACT

Genome-wide expression quantitative trait loci (eQTL) mapping explores the relationship between gene expression and DNA variants, such as single-nucleotide polymorphisms (SNPs), to understand the genetic basis of human diseases. Because of the large number of genes and SNPs that need to be assessed, current methods for eQTL mapping often suffer from low detection power, especially for identifying trans-eQTLs. In this paper, we propose performing SNP ranking based on the higher criticism (HC) statistic, a summary statistic developed for large-scale signal detection. We illustrate how HC-based SNP ranking can effectively prioritize eQTL signals over noise, greatly reduce the burden of joint modeling, and improve the power of eQTL mapping. Numerical results from simulation studies demonstrate the superior performance of our method compared with existing methods. The proposed method is also evaluated in an analysis of HapMap eQTL data, and the results are compared with a database of known eQTLs.


Subjects
Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Data Analysis , Gene Expression Regulation , Genome-Wide Association Study , Humans , Models, Genetic
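For reference, the higher criticism statistic of Donoho and Jin compares sorted p-values with their expected uniform positions. Write the sorted p-values as p(1) ≤ p(2) ≤ ... ≤ p(n). Then HC is the maximum, over i ≤ α0·n, of sqrt(n)·(i/n − p(i)) / sqrt(p(i)·(1 − p(i))). The Python sketch below computes this value and ranks SNPs by HC over each SNP's p-values against all genes. That per-SNP aggregation, the α0 = 0.5 default, and the SNP IDs are our assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List


def higher_criticism(pvalues: List[float], alpha0: float = 0.5) -> float:
    """Donoho-Jin HC: max_{i <= alpha0*n} sqrt(n)*(i/n - p_(i)) / sqrt(p_(i)*(1 - p_(i)))."""
    p = sorted(pvalues)
    n = len(p)
    kmax = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, kmax + 1):
        # Clamp to avoid division by zero at p = 0 or p = 1.
        pi = min(max(p[i - 1], 1e-300), 1.0 - 1e-12)
        best = max(best, math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi)))
    return best


def rank_snps_by_hc(snp_pvalues: Dict[str, List[float]]) -> List[str]:
    """Order SNPs from strongest to weakest HC signal (hypothetical aggregation)."""
    scores = {snp: higher_criticism(ps) for snp, ps in snp_pvalues.items()}
    return sorted(scores, key=lambda s: scores[s], reverse=True)


# Hypothetical per-gene association p-values for two SNPs
data = {
    "rsA": [0.5, 0.6, 0.7, 0.8, 0.9, 0.4],    # looks like noise
    "rsB": [1e-6, 1e-5, 0.5, 0.6, 0.7, 0.8],  # two strong signals
}
print(rank_snps_by_hc(data))
```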
4.
Genet Epidemiol ; 44(3): 272-282, 2020 04.
Article in English | MEDLINE | ID: mdl-31943371

ABSTRACT

Testing the association between single-nucleotide polymorphism (SNP) effects and a response is often carried out through kernel machine methods based on least squares, such as the sequence kernel association test (SKAT). However, these least-squares procedures are designed for a normally distributed conditional response, which may not apply. Other robust procedures, such as the quantile regression kernel machine (QRKM), restrict the choice of the loss function and only allow inference on conditional quantiles. We propose a general and robust kernel association test that allows a flexible choice of the loss function, makes no distributional assumptions, and includes SKAT and QRKM as special cases. We evaluate our proposed robust association test (RobKAT) across various data distributions through a simulation study. When errors are normally distributed, RobKAT controls type I error and shows power comparable to SKAT. In all other distributional settings investigated, our robust test has similar or greater power than SKAT. Finally, we apply our robust testing method to data from the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) clinical trial to detect associations between selected genes, including the major histocompatibility complex (MHC) region on chromosome six, and neurotropic herpesvirus antibody levels in schizophrenia patients. RobKAT detected significant associations with four SNP sets (HST1H2BJ, MHC, POM12L2, and SLC17A1), three of which were undetected by SKAT.


Subjects
Algorithms , Genetic Association Studies , Computer Simulation , Humans , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Selection, Genetic
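As background, SKAT's statistic has the form Q = rᵀ K r, where r holds the residuals from the null model. For the weighted linear kernel K = G W Gᵀ, with G the genotypes and W a diagonal matrix of variant weights, this reduces to Q = Σj wj (Gjᵀ r)². The Python sketch below computes Q on hypothetical data. To illustrate the idea of changing the loss, it can optionally pass residuals through the Huber ψ function. That substitution is our hypothetical illustration of a non-squared loss, not RobKAT's published statistic, and p-value computation is omitted.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def weighted_linear_kernel(G: Sequence[Sequence[float]], w: Sequence[float]) -> Matrix:
    """K = G W G^T with W = diag(w); G is n individuals x m variants."""
    n = len(G)
    return [
        [sum(wj * G[i][j] * G[k][j] for j, wj in enumerate(w)) for k in range(n)]
        for i in range(n)
    ]


def kernel_score_stat(
    resid: Sequence[float],
    K: Matrix,
    psi: Callable[[float], float] = lambda r: r,
) -> float:
    """Q = psi(r)^T K psi(r); the identity psi gives the SKAT-type statistic."""
    s = [psi(r) for r in resid]
    return sum(s[i] * K[i][k] * s[k] for i in range(len(s)) for k in range(len(s)))


def huber_psi(r: float, c: float = 1.345) -> float:
    """Derivative of the Huber loss: residuals clipped at +/- c (hypothetical choice)."""
    return max(-c, min(c, r))


# Hypothetical data: 4 individuals, 2 variants, intercept-only null model
y = [1.0, 2.0, 3.0, 6.0]
mean_y = sum(y) / len(y)
resid = [yi - mean_y for yi in y]  # [-2, -1, 0, 3]
G = [[0, 1], [1, 0], [1, 1], [2, 1]]
K = weighted_linear_kernel(G, [1.0, 1.0])
print(kernel_score_stat(resid, K))             # squared-error version
print(kernel_score_stat(resid, K, huber_psi))  # Huber-clipped version
```

With unit weights, Q equals the sum of squared per-variant scores (Gjᵀ r)², here 5² + 1² = 26.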
5.
PLoS Comput Biol ; 16(5): e1007797, 2020 05.
Article in English | MEDLINE | ID: mdl-32365089

ABSTRACT

Copy number variants (CNVs) are the gain or loss of DNA segments in the genome that can vary in dosage and length. CNVs comprise a large proportion of variation in human genomes and impact health conditions. To detect rare CNV associations, kernel-based methods have been shown to be a powerful tool due to their flexibility in modeling the aggregate CNV effects, their ability to capture effects from different CNV features, and their accommodation of effect heterogeneity. To perform a kernel association test, a CNV locus needs to be defined so that locus-specific effects can be retained during aggregation. However, CNV loci are arbitrarily defined and different locus definitions can lead to different performance depending on the underlying effect patterns. In this work, we develop a new kernel-based test called CONCUR (i.e., copy number profile curve-based association test) that is free from a definition of locus and evaluates CNV-phenotype associations by comparing individuals' copy number profiles across the genomic regions. CONCUR is built on the proposed concepts of "copy number profile curves" to describe the CNV profile of an individual, and the "common area under the curve (cAUC) kernel" to model the multi-feature CNV effects. The proposed method captures the effects of CNV dosage and length, accounts for the numerical nature of copy numbers, and accommodates between- and within-locus etiological heterogeneity without the need to define artificial CNV loci as required in current kernel methods. In a variety of simulation settings, CONCUR shows comparable or improved power over existing approaches. Real data analyses suggest that CONCUR is well powered to detect CNV effects in the Swedish Schizophrenia Study and the Taiwan Biobank.


Subjects
Computational Biology/methods , DNA Copy Number Variations/genetics , Algorithms , Area Under Curve , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics/methods , Humans , Polymorphism, Single Nucleotide/genetics , Spatial Analysis
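One way to picture a "common area under the curve" similarity is to integrate the smaller of two individuals' copy-number deviations along the genome. This reading rests on our own simplifying assumptions:

- a diploid baseline of 2 copies;
- non-overlapping, half-open CNV segments per person;
- overlap counted only where both individuals deviate in the same direction.

The Python sketch below implements that reading on hypothetical profiles. CONCUR's actual cAUC kernel may define the curves and the overlap differently.

```python
from typing import List, Tuple

# (start, end, copy_number) on a half-open interval [start, end)
Segment = Tuple[int, int, int]


def copy_number_at(segments: List[Segment], x: int) -> int:
    for start, end, cn in segments:
        if start <= x < end:
            return cn
    return 2  # assumed diploid outside the listed CNV segments


def common_area(a: List[Segment], b: List[Segment]) -> int:
    """Integrate min(|d_a|, |d_b|) where both deviations from 2 share a sign."""
    points = sorted({p for seg in a + b for p in seg[:2]})
    area = 0
    for left, right in zip(points, points[1:]):
        # Copy numbers are constant between consecutive breakpoints.
        da = copy_number_at(a, left) - 2
        db = copy_number_at(b, left) - 2
        if da * db > 0:  # both duplications or both deletions
            area += min(abs(da), abs(db)) * (right - left)
    return area


def cauc_kernel(profiles: List[List[Segment]]) -> List[List[int]]:
    return [[common_area(p, q) for q in profiles] for p in profiles]


# Hypothetical profiles: positions in base pairs within one region
A = [(100, 200, 3)]  # one-copy duplication
B = [(150, 300, 4)]  # two-copy duplication overlapping A
C = [(100, 200, 1)]  # deletion at the same place as A
print(cauc_kernel([A, B, C]))
```

In this sketch, no predefined locus is needed. Similarity comes directly from where the two profile curves overlap.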
6.
PLoS Comput Biol ; 15(2): e1006722, 2019 02.
Article in English | MEDLINE | ID: mdl-30779729

ABSTRACT

Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.


Subjects
Computational Biology/methods , Genetic Association Studies/methods , Sequence Analysis, DNA/methods , Angiopoietin-Like Protein 4/genetics , Cholesterol Ester Transfer Proteins/genetics , Computer Simulation , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Humans , Models, Genetic , Proprotein Convertase 9/genetics , Protein Structure, Tertiary , Risk Factors
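The Python sketch below makes "borrowing information from neighboring variants in the 3-dimensional protein space" concrete. For a focal variant j, it builds a local statistic Qj = Σk wjk (Gkᵀ r)². The weights are Gaussian, wjk = exp(−djk²/ρ²), where djk is the 3D distance between variant positions in the protein structure. The Gaussian weights, the fixed bandwidth ρ, and all data are hypothetical. POINT chooses the amount of borrowing in a data-adaptive way that this sketch does not reproduce, and its p-values are also omitted here.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def neighbor_weights(coords: List[Point3D], j: int, rho: float) -> List[float]:
    """Gaussian weights exp(-d^2 / rho^2) from variant j to every variant."""
    return [math.exp(-math.dist(coords[j], c) ** 2 / rho ** 2) for c in coords]


def local_score(
    G: Sequence[Sequence[float]],
    resid: Sequence[float],
    coords: List[Point3D],
    j: int,
    rho: float,
) -> float:
    """Q_j = sum_k w_jk * (G_k^T r)^2, i.e. r^T G W_j G^T r with W_j = diag(w_j)."""
    m = len(coords)
    w = neighbor_weights(coords, j, rho)
    # Per-variant score G_k^T r (column k of G against the residuals).
    u = [sum(G[i][k] * resid[i] for i in range(len(resid))) for k in range(m)]
    return sum(w[k] * u[k] ** 2 for k in range(m))


# Hypothetical: 4 individuals, 3 rare variants, residuals from a null model
G = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
resid = [1.5, 1.0, -0.5, 0.8]
# Variants 0 and 1 are close in the folded protein; variant 2 is far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print([round(local_score(G, resid, coords, j, rho=2.0), 4) for j in range(3)])
```

Variants 0 and 1 lend each other signal because they sit close together in 3D space. The distant variant 2 is scored essentially on its own.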
7.
Genet Epidemiol ; 42(1): 64-79, 2018 02.
Article in English | MEDLINE | ID: mdl-29314255

ABSTRACT

We consider the problem of assessing the joint effect of a set of genetic markers on multiple, possibly correlated phenotypes of interest. We develop a kernel machine based multivariate regression framework, where the joint effect of the marker set on each of the phenotypes is modeled using prespecified kernel functions with unknown variance components. Unlike most existing methods that mainly focus on the global association between the marker set and the phenotype set, we develop estimation and testing procedures to study phenotype-specific associations. Specifically, we develop an estimation method based on the penalized likelihood approach to estimate phenotype-specific effects and their corresponding standard errors while accounting for possible correlation among the phenotypes. We develop testing procedures for the association of the marker set with any subset of phenotypes using a score-based variance components testing method. We assess the performance of our proposed methodology via a simulation study and demonstrate the utility of the proposed method using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) data.


Subjects
Computer Simulation , Genetic Markers/genetics , Likelihood Functions , Models, Genetic , Phenotype , Age Factors , Antipsychotic Agents/therapeutic use , Humans , Sex Factors
8.
Genet Epidemiol ; 42(3): 276-287, 2018 04.
Article in English | MEDLINE | ID: mdl-29280188

ABSTRACT

Recent studies showed that population substructure (PS) can have a more complex impact on rare variant tests and that similarity-based collapsing tests (e.g., SKAT) may suffer more severely from PS than burden-based tests. In this work, we evaluate the performance of SKAT coupled with principal component (PC) or variance component (VC) based PS correction methods. We consider confounding effects caused by PS, including stratified populations, admixed populations, and spatially distributed nongenetic risk; we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC-based methods can account for confounding effects in most scenarios except admixture, although the number of PCs required depends on the PS complexity and the type of variants used. (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require an equal or smaller number of PCs and often achieve higher power than PCs based on other variant types. (iii) VC-based methods can effectively adjust for confounding in all scenarios (even admixture), though the type of variants that should be used to construct the VC may vary. (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than that of VC based on other variant types. Given that the best-performing method and the choice of variants depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC-based methods based on all variants.


Subjects
Genetic Association Studies , Genetic Variation , Principal Component Analysis , Computer Simulation , Confounding Factors, Epidemiologic , Humans , Models, Genetic
9.
Int J Obes (Lond) ; 42(7): 1285-1295, 2018 07.
Article in English | MEDLINE | ID: mdl-29511319

ABSTRACT

OBJECTIVE: Human obesity is a complex metabolic disorder that disproportionately affects people of lower socioeconomic strata and ethnic minorities, especially African Americans and Hispanics. Although genetic predisposition and a positive energy balance are implicated in obesity, these factors alone do not account for the excess prevalence of obesity in lower socioeconomic populations. Environmental factors, including exposure to pesticides, heavy metals, and other contaminants, are widely suspected to have obesogenic activity, and they are also spatially correlated with lower socioeconomic status. Our study investigates the causal relationship between exposure to the heavy metal cadmium (Cd) and obesity in a cohort of children and in a zebrafish model of adipogenesis. DESIGN: An extensive collection of first-trimester maternal blood samples obtained as part of the Newborn Epigenetics Study (NEST) was analyzed for the presence of Cd, and these results were cross-analyzed with the weight-gain trajectory of the children through age 5 years. Next, the role of Cd as a potential obesogen was analyzed in an in vivo zebrafish model. RESULTS: Our analysis indicates that the presence of Cd in maternal blood during pregnancy is associated with increased risk of juvenile obesity in the offspring, independent of other variables, including lead (Pb) and smoking status. These results are recapitulated in a zebrafish model, in which exposure to Cd at levels approximating those observed in the NEST study is associated with increased adiposity. CONCLUSION: Our findings identify Cd as a potential human obesogen. Moreover, these observations are recapitulated in a zebrafish model, suggesting that the underlying mechanisms may be evolutionarily conserved and that zebrafish may be a valuable model for uncovering pathways leading to Cd-mediated obesity in human populations.


Subjects
Adipogenesis/drug effects , Cadmium/adverse effects , Environmental Exposure/adverse effects , Maternal Exposure/adverse effects , Metals, Heavy/adverse effects , Pediatric Obesity/chemically induced , Prenatal Exposure Delayed Effects/chemically induced , Zebrafish/metabolism , Animals , Cadmium/analysis , Cadmium/blood , Disease Models, Animal , Environmental Exposure/analysis , Female , Humans , Infant, Newborn , Male , Metals, Heavy/analysis , Pediatric Obesity/blood , Pediatric Obesity/epidemiology , Pregnancy , Pregnant Women , Prenatal Exposure Delayed Effects/blood , Prenatal Exposure Delayed Effects/epidemiology , Prospective Studies , Socioeconomic Factors , United States/epidemiology
10.
PLoS Genet ; 11(10): e1005403, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26431523

ABSTRACT

Copy number variants (CNVs) play an important role in the etiology of many diseases, such as cancers and psychiatric disorders. Because of modest marginal effect sizes or the rarity of the CNVs, collapsing rare CNVs together and evaluating their effect jointly serves as a key approach to assessing the contribution of rare CNVs to disease risk. While a plethora of powerful collapsing methods are available for sequence variants (e.g., SNPs) in association analysis, these methods cannot be directly applied to rare CNVs because of CNV-specific challenges, i.e., the multi-faceted nature of CNV polymorphisms (e.g., CNVs vary in size, type, dosage, and details of gene disruption) and etiological heterogeneity (e.g., heterogeneous effects of duplications and deletions that occur within a locus or in different loci). Existing CNV collapsing analysis methods (a.k.a. burden tests) tend to have suboptimal performance because they often ignore heterogeneity and evaluate only the marginal effects of a CNV feature. We introduce CCRET, a random effects test for collapsing rare CNVs when searching for disease associations. CCRET is applicable to variants measured on a multi-categorical scale, collectively models the effects of multiple CNV features, and is robust to etiological heterogeneity. Multiple confounders can be corrected for simultaneously. To evaluate the performance of CCRET, we conducted extensive simulations and analyzed large-scale schizophrenia datasets. We show that CCRET has powerful and robust performance under multiple types of etiological heterogeneity, and has performance comparable to or better than existing methods when there is no heterogeneity.


Subjects
DNA Copy Number Variations/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Schizophrenia/genetics , Genetic Heterogeneity , Humans , Models, Theoretical , Polymorphism, Single Nucleotide , Schizophrenia/pathology
11.
Genet Epidemiol ; 40(4): 333-40, 2016 May.
Article in English | MEDLINE | ID: mdl-27061717

ABSTRACT

DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS).


Subjects
DNA Methylation , Genetic Association Studies/methods , Leukemia, Myeloid, Acute/genetics , CpG Islands/genetics , Epigenesis, Genetic , Humans , Linear Models , Models, Genetic
12.
PLoS Comput Biol ; 12(6): e1004993, 2016 06.
Article in English | MEDLINE | ID: mdl-27355347

ABSTRACT

Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging because of the very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene rather than the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of the molecular mechanisms and functions of genetic factors in disease. Because of the extreme rarity of mutations and high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as Bonferroni and false discovery rate (FDR) control, are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure, which can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine which variants can be confidently dismissed as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and the quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas Bonferroni and FDR are exceedingly over-conservative for rare variant association studies. In the analyses of the CoLaus dataset, AFNC identified the individual variants most responsible for gene-level significance. Moreover, single-variant results from the AFNC have been successfully applied to infer related genes with annotation information.


Subjects
Gene Frequency/genetics , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Genomics , High-Throughput Nucleotide Sequencing , Cardiovascular Diseases/genetics , Computer Simulation , Databases, Factual , Drug Delivery Systems , Humans , Models, Genetic
13.
BMC Public Health ; 17(1): 354, 2017 04 24.
Article in English | MEDLINE | ID: mdl-28438148

ABSTRACT

BACKGROUND: Cadmium (Cd), lead (Pb) and arsenic (As) are common environmental contaminants that have been associated with lower birthweight. Although some essential metals may mitigate exposure, data are inconsistent. This study sought to evaluate the relationship between toxic metals, nutrient combinations and birthweight among 275 mother-child pairs. METHODS: Non-essential metals (Cd, Pb, As) and essential metals (iron (Fe), zinc (Zn), selenium (Se), copper (Cu), calcium (Ca), magnesium (Mg), and manganese (Mn)) were measured in maternal whole blood obtained during the first trimester using inductively coupled plasma mass spectrometry. Folate concentrations were measured by microbial assay. Birthweight was obtained from medical records. We used quantile regression to evaluate the association between toxic metals and nutrients because of their underlying wedge-shaped relationship. Ordinary linear regression was used to evaluate associations between birthweight and toxic metals. RESULTS: After multivariate adjustment, the negative association between Pb or Cd and a combination of Fe, Se, Ca and folate was robust, persistent and dose-dependent (p < 0.05). However, a combination of Zn, Cu, Mn and Mg was positively associated with Pb and Cd levels. While prenatal blood Cd and Pb were also associated with lower birthweight, Fe, Se, Ca and folate did not modify these associations. CONCLUSION: Small sample size and cross-sectional design notwithstanding, the robust and persistent negative associations between some, but not all, nutrient combinations and these ubiquitous environmental contaminants suggest that only some recommended nutrient combinations may mitigate toxic metal exposure in chronically exposed populations. Larger longitudinal studies are required to confirm these findings.


Subjects
Birth Weight , Maternal Exposure/adverse effects , Metals, Heavy/blood , Adult , Arsenic , Cadmium/blood , Copper/blood , Cross-Sectional Studies , Female , Folic Acid , Heavy Metal Poisoning , Humans , Iron/blood , Lead/blood , Manganese/blood , Poisoning , Selenium/blood , Socioeconomic Factors , Zinc/blood
14.
Genet Epidemiol ; 39(2): 122-33, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25538034

ABSTRACT

Studying complex diseases in the post-genome-wide association study (GWAS) era has led to the development of methods that consider factor-sets rather than individual genetic/environmental factors (i.e., Multi-G-Multi-E studies), and mining for potential gene-environment (G×E) interactions has proven to be an invaluable aid both in discovery and in deciphering underlying biological mechanisms. Current approaches for examining effect profiles in Multi-G-Multi-E analyses are either underpowered because of large degrees of freedom, ill-suited for detecting G×E interactions because of imprecise modeling of the G and E effects, or lack the capacity to model interactions between two factor-sets (e.g., existing methods focus primarily on a single E factor). In this work, we illustrate the issues encountered in constructing kernels for investigating interactions between two factor-sets, and propose a simple yet intuitive solution to construct the G×E kernel that retains the ease of interpretation of classic regression. We also construct a series of kernel machine (KM) score tests to evaluate the complete effect profile (i.e., the G, E, and G×E effects individually or in combination). We show, via simulations and a data application, that the proposed KM methods outperform the classic and PC regressions across a range of scenarios, including varying effect size, effect structure, and interaction complexity. The largest power gain was observed when the underlying effect structure involved complex G×E interactions; however, the proposed methods have consistent, powerful performance whether the effect profile is simple or complex, suggesting that the proposed method could be a useful tool for exploratory or confirmatory G×E analysis.


Subjects
Environment , Gene-Environment Interaction , Genome-Wide Association Study/methods , Computer Simulation , Genetic Predisposition to Disease , Humans , Models, Genetic , Software
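The abstract does not spell out the proposed G×E kernel, so the following is generic background only. A common way to build an interaction kernel is the elementwise (Hadamard) product of a G kernel and an E kernel. With linear kernels, the product K_G ∘ K_E equals the linear kernel on all pairwise products gj·ek. Those products are exactly the G×E design columns of classic regression. The Python sketch below checks this identity numerically on hypothetical data.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def linear_kernel(X: Sequence[Sequence[float]]) -> Matrix:
    """K[i][l] = x_i . x_l."""
    return [[sum(a * b for a, b in zip(xi, xl)) for xl in X] for xi in X]


def hadamard(A: Matrix, B: Matrix) -> Matrix:
    """Elementwise product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def pairwise_products(G: Sequence[Sequence[float]], E: Sequence[Sequence[float]]) -> Matrix:
    """Row i holds g_ij * e_ik for every (j, k): the classic G x E regression terms."""
    return [[g * e for g, e in product(gi, ei)] for gi, ei in zip(G, E)]


# Hypothetical genotypes (2 SNPs) and exposures (2 E factors) for 3 people
G = [[0, 1], [1, 2], [2, 0]]
E = [[1.0, 0.5], [0.0, 2.0], [1.5, 1.0]]
K_ge = hadamard(linear_kernel(G), linear_kernel(E))
K_direct = linear_kernel(pairwise_products(G, E))
print(K_ge)
print(K_direct)
```

The identity follows from (Σj gij glj)(Σk eik elk) = Σj,k (gij eik)(glj elk). This is why a Hadamard-product kernel can still be read in terms of regression-style interaction terms.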
15.
Genet Epidemiol ; 39(6): 456-68, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26139508

ABSTRACT

Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene-gene or gene-environment interactions, incorporating variance-component based methods for population substructure into rare-variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust, powerful, and provides efficient dimension reduction for multifactor analyses, but requires the estimation of high-dimensional nuisance parameters. Traditional estimation techniques, including regularization and the expectation-maximization (EM) algorithm, have a large computational cost and do not scale to the large sample sizes needed for rare variant analysis. Therefore, in the context of gene-environment interaction, we propose a computationally efficient and statistically rigorous "fastKM" algorithm for multikernel analysis that is based on a low-rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single-kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM-based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene-by-vitamin effects on recurrent stroke risk and gene-by-age effects on change in homocysteine level.


Subjects
Algorithms , Gene-Environment Interaction , Genetic Variation , Homocysteine/metabolism , Humans , Models, Genetic , Quantitative Trait Loci , Regression Analysis , Risk Factors , Software , Stroke/etiology , Vitamins/metabolism
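The abstract says fastKM relies on a low-rank approximation to the nuisance kernel matrices but does not say which one. A common choice, assumed here, is a truncated eigendecomposition K ≈ Σ(k ≤ r) λk uk ukᵀ. The standard-library Python sketch below finds the top r eigenpairs of a symmetric positive semidefinite kernel by power iteration with deflation. A production version would more likely call an optimized eigensolver.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(A: Matrix, v: List[float]) -> List[float]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def top_eigenpairs(K: Matrix, r: int, iters: int = 500, seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Leading r eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(K)
    A = [row[:] for row in K]  # working copy, deflated after each eigenpair
    pairs = []
    for _ in range(r):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = matvec(A, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(A, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflation: remove the component just found, A <- A - lam * v v^T.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def low_rank_approx(K: Matrix, r: int) -> Matrix:
    """K_r = sum over the top r eigenpairs of lam * u u^T."""
    n = len(K)
    pairs = top_eigenpairs(K, r)
    return [[sum(lam * u[i] * u[j] for lam, u in pairs) for j in range(n)] for i in range(n)]


# Hypothetical rank-1 kernel from a single nuisance feature z: K = z z^T
z = [1.0, 2.0, 3.0]
K = [[zi * zj for zj in z] for zi in z]
K1 = low_rank_approx(K, 1)
print([[round(x, 6) for x in row] for row in K1])
```

When r is much smaller than the sample size n, downstream computations can work with n × r factors instead of the full n × n matrix. That is typically where low-rank approaches gain their speed.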
16.
Biometrics ; 72(2): 364-71, 2016 06.
Article in English | MEDLINE | ID: mdl-26575303

ABSTRACT

We consider quantile regression for partially linear models where an outcome of interest is related to covariates and a marker set (e.g., gene or pathway). The covariate effects are modeled parametrically and the marker set effect of multiple loci is modeled using a kernel machine. We propose an efficient algorithm to solve the corresponding optimization problem for estimating the effects of covariates and also introduce a powerful test for detecting the overall effect of the marker set. Our test is motivated by the traditional score test and borrows the idea of the permutation test. Our estimation and testing procedures are evaluated numerically and applied to assess the genetic association of change in fasting homocysteine level using the Vitamin Intervention for Stroke Prevention Trial data.


Subjects
Biomarkers , Models, Genetic , Models, Statistical , Regression Analysis , Algorithms , Biometry/methods , Clinical Trials as Topic , Computer Simulation , Genetic Association Studies , Homocysteine/blood , Humans , Linear Models , Polymorphism, Single Nucleotide
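Quantile regression rests on the check (pinball) loss ρτ(u) = u(τ − 1{u < 0}). A quick way to see why this loss targets quantiles: among constant predictions c, minimizing Σ ρτ(yi − c) picks the τ-th sample quantile. The Python sketch below, on hypothetical outcomes, shows only this loss. It does not reproduce the paper's partially linear model with a kernel-machine marker-set term or its proposed test.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(y: Sequence[float], tau: float) -> float:
    """Minimize sum_i rho_tau(y_i - c) over c.

    The objective is piecewise linear and convex in c with kinks at the data
    points, so a minimizer can always be found among the observed values.
    """
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


y = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical outcomes with one outlier
print(best_constant(y, 0.5), best_constant(y, 0.7))
```

At τ = 0.5, the minimizer is the median, 3.0, which ignores the outlying value 100. A squared-error fit would instead return the mean, 22.0.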
17.
Biometrics ; 72(1): 85-94, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26288029

ABSTRACT

Finding an efficient and computationally feasible approach to the curse of high dimensionality is a daunting challenge in modern biological science. The problem becomes even more severe when interactions are the research focus. To improve the performance of statistical analyses, we propose sparse and low-rank (SLR) screening based on the combination of a low-rank interaction model and Lasso screening. SLR models the interaction effects using a low-rank matrix to achieve a parsimonious parametrization. The low-rank model increases the efficiency of statistical inference and, hence, SLR screening is able to detect gene-gene interactions more accurately than conventional methods. Incorporation of SLR screening into the Screen-and-Clean approach (Wasserman and Roeder, 2009; Wu et al., 2010) is also discussed; this incurs a smaller Bonferroni correction penalty and is able to assign p-values to the identified variables in a high-dimensional model. We apply the proposed screening procedure to the Warfarin dosage study and the CoLaus study. The results suggest that the new procedure can identify main and interaction effects that would have been missed by conventional screening methods.


Subjects
Algorithms , Data Interpretation, Statistical , High-Throughput Screening Assays/methods , Models, Statistical , Protein Interaction Mapping/methods , Regression Analysis , Computer Simulation , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity
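The parsimony argument behind SLR in the record above can be made concrete. For an interaction term x'Bx, a dense p x p matrix B has p^2 entries. A rank-r factorization B = U V' needs at most 2pr numbers, and the interaction term can be evaluated as (U'x) . (V'x) without ever forming B. The Python sketch below checks that identity on a toy example. The factorization form and the function names are illustrative assumptions. The sketch does not implement the Lasso screening step or the authors' estimation procedure.

```python
def transpose_times(M, x):
    """Return M^T x, where M is a p x r matrix stored as a list of rows."""
    r = len(M[0])
    return [sum(row[k] * xi for row, xi in zip(M, x)) for k in range(r)]


def low_rank_matrix(U, V):
    """Form B = U V^T explicitly (only needed for comparison)."""
    return [[sum(u * v for u, v in zip(ui, vj)) for vj in V] for ui in U]


def interaction_full(B, x):
    """x^T B x with a dense p x p matrix: O(p^2) work."""
    p = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(p) for j in range(p))


def interaction_low_rank(U, V, x):
    """x^T (U V^T) x = (U^T x) . (V^T x): O(p r) work, B is never formed."""
    a = transpose_times(U, x)
    b = transpose_times(V, x)
    return sum(ai * bi for ai, bi in zip(a, b))


def parameter_counts(p, r):
    """Entries in a dense interaction matrix versus a rank-r factorization."""
    return {"full": p * p, "low_rank": 2 * p * r}


U = [[1, 0], [2, 1], [0, 3]]  # p = 3 variables, rank r = 2
V = [[1, 1], [0, 2], [1, 0]]
x = [1, 2, 3]
print(interaction_full(low_rank_matrix(U, V), x))  # 75
print(interaction_low_rank(U, V, x))               # 75
print(parameter_counts(1000, 2))  # {'full': 1000000, 'low_rank': 4000}
```

For p = 1000 variables and rank 2, the factorized form has 4,000 numbers instead of 1,000,000. This reduction is the kind of parsimony the abstract credits for more efficient inference.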
18.
Bioinformatics ; 30(11): 1501-7, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24489370

ABSTRACT

MOTIVATION: Gene set analysis is a popular approach in large-scale genomic studies. Because genes that share biological features are analyzed jointly, gene set analysis often achieves better power and generates more biologically informative results. As technologies have advanced, genomic studies with multi-platform data have become increasingly common, and several strategies have been proposed that integrate genomic data from multiple platforms to perform gene set analysis. To evaluate the performance of existing integrative gene set methods under various scenarios, we conduct a comparative simulation analysis based on The Cancer Genome Atlas breast cancer dataset. RESULTS: We find that existing methods for gene set analysis are less effective when sample heterogeneity exists. To address this issue, we develop three methods for multi-platform genomic data with heterogeneity: two non-parametric methods (multi-platform Mann-Whitney statistics and multi-platform outlier-robust T-statistics) and one parametric method (multi-platform likelihood ratio statistics). Using simulations, we show that, compared with existing methods, the proposed multi-platform Mann-Whitney statistics method has higher power for heterogeneous samples and comparable performance for homogeneous samples. Our applications to two real datasets from The Cancer Genome Atlas also suggest that the proposed methods can identify novel pathways missed by other strategies. AVAILABILITY AND IMPLEMENTATION: http://www4.stat.ncsu.edu/~jytzeng/Software/Multiplatform_gene_set_analysis/


Subjects
Genomics/methods , Breast Neoplasms/genetics , DNA Copy Number Variations , DNA Methylation , Female , Gene Expression Profiling , Genes , Humans , Sequence Analysis, RNA , Statistics, Nonparametric
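The Python sketch below illustrates the rank-based building block named in the record above. It computes a two-sample Mann-Whitney U statistic and its normal approximation. It then combines per-platform z-scores with Stouffer's method for a competitive gene set comparison, contrasting genes in the set with genes outside it. The abstract does not give the exact form of the multi-platform Mann-Whitney statistic. The Stouffer combination, the per-gene scores, and the platform names are therefore assumptions for illustration only.

```python
import math


def mann_whitney_u(x, y):
    """U = #{(i, j): x_i > y_j} + 0.5 * #{(i, j): x_i == y_j}."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mann_whitney_z(x, y):
    """Normal approximation to U under the null (no tie correction)."""
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (mann_whitney_u(x, y) - mean) / sd


def multi_platform_z(scores_by_platform, gene_set):
    """Stouffer combination of per-platform Mann-Whitney z-scores (assumed form).

    scores_by_platform maps a platform name to {gene: per-gene score}.
    """
    zs = []
    for scores in scores_by_platform.values():
        inside = [s for g, s in scores.items() if g in gene_set]
        outside = [s for g, s in scores.items() if g not in gene_set]
        zs.append(mann_whitney_z(inside, outside))
    return sum(zs) / math.sqrt(len(zs))


# Hypothetical per-gene scores on two platforms.
scores = {
    "expression": {"g1": 3.0, "g2": 4.0, "g3": 1.0, "g4": 2.0},
    "methylation": {"g1": 2.5, "g2": 0.5, "g3": 1.5, "g4": 1.0},
}
print(mann_whitney_u([3.0, 4.0], [1.0, 2.0]))  # 4.0
print(multi_platform_z(scores, {"g1", "g2"}))
```

Because U depends only on ranks, a few extreme scores on one platform cannot dominate the statistic. This is one reason a rank-based statistic can hold up when samples are heterogeneous.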
19.
Biometrics ; 71(2): 529-37, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25604216

ABSTRACT

Pharmacogenetics investigates the relationship between heritable genetic variation and variation in how individuals respond to drug therapies. Gene-drug interactions often play a primary role in this response, and identifying them can aid the development of individualized treatment regimes. Haplotypes can hold key information for understanding the association between genetic variation and drug response. However, the standard approach to haplotype-based association analysis does not directly address the research questions posed by individualized medicine. A complementary post-hoc analysis is required, and this analysis is usually underpowered after adjustment for multiple comparisons and may lead to seemingly contradictory conclusions. In this work, we propose a penalized likelihood approach that overcomes these drawbacks of the standard approach and yields the desired personalized output. We demonstrate the utility of our method by applying it to the Scottish Randomized Trial in Ovarian Cancer. Simulation studies show that the proposed penalized method has power comparable to or greater than that of the standard approach and maintains low Type I error rates for both binary and quantitative drug responses. The largest performance gains occur when the haplotype frequency is low, when the differences in effect sizes are small, or when the true relationship among the drugs is more complex.


Subjects
Likelihood Functions , Pharmacogenetics/statistics & numerical data , Antineoplastic Agents/adverse effects , Biometry , Computer Simulation , Female , Genes, bcl-2 , Haplotypes , Humans , Models, Statistical , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/genetics , Regression Analysis
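The abstract above does not spell out its penalty. The Python sketch below therefore shows only the generic machinery behind a penalized likelihood fit. For a quantitative response with Gaussian errors, maximizing an L1-penalized likelihood reduces to the lasso, which cyclic coordinate descent solves by repeated soft-thresholding. Reading the design columns as, for example, haplotype-by-drug indicators is an assumption for illustration, and the authors' actual penalty may differ.

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * |b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is an n x p design stored as a list of rows; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, kept up to date after every coordinate move
    col_scale = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes b_j.
            rho = sum(row[j] * (r + row[j] * b[j]) for row, r in zip(X, resid)) / n
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - row[j] * delta for row, r in zip(X, resid)]
                b[j] = new_bj
    return b


# Orthogonal toy design, so the lasso solution is soft_threshold(X^T y / n, lam).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.0, 1.0, -1.0, -3.0]
print(lasso_coordinate_descent(X, y, lam=0.5))  # [1.5, 0.5]
print(lasso_coordinate_descent(X, y, lam=2.5))  # [0.0, 0.0]
```

The second call shows the behavior that makes penalization useful for sparse gene-drug effects: a large enough penalty sets weak coefficients exactly to zero rather than merely shrinking them.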
20.
Hum Hered ; 78(1): 17-26, 2014.
Article in English | MEDLINE | ID: mdl-24969398

ABSTRACT

OBJECTIVE: Gene-gene interactions (G×G) are important to study because they are widespread in biological systems and may help explain the missing heritability of complex traits. In this work, we propose a new similarity-based test to assess G×G at the gene level, which permits the study of epistasis in biologically functional units with amplified interaction signals. METHODS: Under the framework of gene-trait similarity regression (SimReg), we propose a gene-based test for detecting G×G. SimReg uses a regression model to correlate trait similarity with genotypic similarity across a gene. Unlike existing gene-level methods based on leading principal components (PCs), SimReg summarizes all information on genotypic variation within a gene. It can be used to assess the joint or interactive effects of two genes as well as the effect of one gene conditional on another. RESULTS: Using simulations and a real data application to the Warfarin study, we show that the SimReg G×G tests have satisfactory power and robustness under different genetic architectures when compared with existing gene-based interaction tests such as PC analysis or partial least squares. A genome-wide association study covering approximately 20,000 genes can be completed on a parallel computing system in 2 weeks.


Subjects
Epistasis, Genetic , Models, Genetic , Polymorphism, Single Nucleotide , Regression Analysis , Algorithms , Computer Simulation , Genotype , Humans , Principal Component Analysis
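The core of the similarity regression described in the record above can be sketched as follows. For every pair of subjects, compute a genotypic similarity within each gene and a trait similarity. Then regress the trait similarity on [1, S_A, S_B, S_A*S_B], where the coefficient on the product term carries the G×G signal. In this Python sketch, the identity-by-state (IBS) similarity, the cross-product trait similarity, and the plain least-squares fit are assumptions chosen for simplicity. The published SimReg tests use their own statistics. Subject pairs are also not independent, so ordinary least-squares standard errors would not give valid inference here.

```python
def ibs_similarity(gi, gj):
    """Average identity-by-state similarity over loci (0/1/2 codes); result in [0, 1]."""
    return sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2.0 * len(gi))


def simreg_gxg_design(y, gene_a, gene_b):
    """Rows [1, S_A, S_B, S_A*S_B] and trait similarities for all subject pairs i < j."""
    mean = sum(y) / len(y)
    rows, response = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            s_a = ibs_similarity(gene_a[i], gene_a[j])
            s_b = ibs_similarity(gene_b[i], gene_b[j])
            rows.append([1.0, s_a, s_b, s_a * s_b])
            response.append((y[i] - mean) * (y[j] - mean))  # cross-product trait similarity
    return rows, response


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(rows, response):
    """Ordinary least squares via the normal equations X^T X beta = X^T t."""
    p = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * t for row, t in zip(rows, response)) for a in range(p)]
    return solve(xtx, xty)


# Check the regression step on a noiseless grid: t = 1 + 2 S_A + 3 S_B + 4 S_A S_B.
grid = [[1.0, a, b, a * b] for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
target = [1 + 2 * r[1] + 3 * r[2] + 4 * r[3] for r in grid]
print([round(c, 6) for c in least_squares(grid, target)])  # [1.0, 2.0, 3.0, 4.0]

# Building the pairwise design for three hypothetical subjects gives 3 pairs.
rows, response = simreg_gxg_design([1.0, 2.0, 3.0], [[0, 1], [1, 1], [2, 2]], [[0], [1], [2]])
print(len(rows))  # 3
```

Each gene enters the model only through a single pairwise similarity, so all loci in the gene contribute without selecting leading principal components. The abstract names this as the contrast with PC-based gene-level methods.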