Pesquisa | BVS Aleitamento Materno

Heuristic identification of biological architectures for simulating complex hierarchical genetic interactions.

Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C.

Genet Epidemiol ; 39(1): 25-34, 2015 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-25395175

RESUMO

Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype-phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene-gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects.

Assuntos

Epistasia Genética , Modelos Genéticos , Software , Interação Gene-Ambiente , Variação Genética , Humanos , Modelos Estatísticos

ViSEN: methodology and software for visualization of statistical epistasis networks.

Hu, Ting; Chen, Yuanzhu; Kiralis, Jeff W; Moore, Jason H.

Genet Epidemiol ; 37(3): 283-5, 2013 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-23468157

RESUMO

The nonlinear interaction effect among multiple genetic factors, i.e. epistasis, has been recognized as a key component in understanding the underlying genetic basis of complex human diseases and phenotypic traits. Due to the statistical and computational complexity, most epistasis studies are limited to interactions with an order of two. We developed ViSEN to analyze and visualize epistatic interactions of both two-way and three-way. ViSEN not only identifies strong interactions among pairs or trios of genetic attributes, but also provides a global interaction map that shows neighborhood and clustering structures. This visualized information could be very helpful to infer the underlying genetic architecture of complex diseases and to generate plausible hypotheses for further biological validations. ViSEN is implemented in Java and freely available at https://sourceforge.net/projects/visen/.

Assuntos

Gráficos por Computador , Epistasia Genética , Modelos Estatísticos , Software , Humanos , Fenótipo , Linguagens de Programação

Characterizing genetic interactions in human disease association studies using statistical epistasis networks.

Hu, Ting; Sinnott-Armstrong, Nicholas A; Kiralis, Jeff W; Andrew, Angeline S; Karagas, Margaret R; Moore, Jason H.

BMC Bioinformatics ; 12: 364, 2011 Sep 12.

Artigo em Inglês | MEDLINE | ID: mdl-21910885

RESUMO

BACKGROUND: Epistasis is recognized ubiquitous in the genetic architecture of complex traits such as disease susceptibility. Experimental studies in model organisms have revealed extensive evidence of biological interactions among genes. Meanwhile, statistical and computational studies in human populations have suggested non-additive effects of genetic variation on complex traits. Although these studies form a baseline for understanding the genetic architecture of complex traits, to date they have only considered interactions among a small number of genetic variants. Our goal here is to use network science to determine the extent to which non-additive interactions exist beyond small subsets of genetic variants. We infer statistical epistasis networks to characterize the global space of pairwise interactions among approximately 1500 Single Nucleotide Polymorphisms (SNPs) spanning nearly 500 cancer susceptibility genes in a large population-based study of bladder cancer. RESULTS: The statistical epistasis network was built by linking pairs of SNPs if their pairwise interactions were stronger than a systematically derived threshold. Its topology clearly differentiated this real-data network from networks obtained from permutations of the same data under the null hypothesis that no association exists between genotype and phenotype. The network had a significantly higher number of hub SNPs and, interestingly, these hub SNPs were not necessarily with high main effects. The network had a largest connected component of 39 SNPs that was absent in any other permuted-data networks. In addition, the vertex degrees of this network were distinctively found following an approximate power-law distribution and its topology appeared scale-free. CONCLUSIONS: In contrast to many existing techniques focusing on high main-effect SNPs or models of several interacting SNPs, our network approach characterized a global picture of gene-gene interactions in a population-based genetic data. The network was built using pairwise interactions, and its distinctive network topology and large connected components indicated joint effects in a large set of SNPs. Our observations suggested that this particular statistical epistasis network captured important features of the genetic architecture of bladder cancer that have not been described previously.

Assuntos

Epistasia Genética , Polimorfismo de Nucleotídeo Único , Neoplasias da Bexiga Urinária/genética , Adulto , Idoso , Genótipo , Humanos , Pessoa de Meia-Idade , New Hampshire

A classification and characterization of two-locus, pure, strict, epistatic models for simulation and detection.

Urbanowicz, Ryan J; Granizo-Mackenzie, Ambrose Ls; Kiralis, Jeff; Moore, Jason H.

BioData Min ; 7: 8, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25057293

RESUMO

BACKGROUND: The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. In order to evaluate strategies for detecting these complex multi-locus disease associations, simulation studies are required. The development of the GAMETES software for the generation of complex genetic models, has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e. all n loci, but no fewer, are predictive of phenotype. Previous theoretical work characterizing complex genetic models has yet to examine pure, strict, epistasis which should be the most challenging to detect. This study addresses three goals: (1) Classify and characterize pure, strict, two-locus epistatic models, (2) Investigate the effect of model 'architecture' on detection difficulty, and (3) Explore how adjusting GAMETES constraints influences diversity in the generated models. RESULTS: In this study we utilized a geometric approach to classify pure, strict, two-locus epistatic models by "shape". In total, 33 unique shape symmetry classes were identified. Using a detection difficulty metric, we found that model shape was consistently a significant predictor of model detection difficulty. Additionally, after categorizing shape classes by the number of edges in their shape projections, we found that this edge number was also significantly predictive of detection difficulty. Analysis of constraints within GAMETES indicated that increasing model population size can expand model class coverage but does little to change the range of observed difficulty metric scores. A variable population prevalence significantly increased the range of observed difficulty metric scores and, for certain constraints, also improved model class coverage. CONCLUSIONS: These analyses further our theoretical understanding of epistatic relationships and uncover guidelines for the effective generation of complex models using GAMETES. Specifically, (1) we have characterized 33 shape classes by edge number, detection difficulty, and observed frequency (2) our results support the claim that model architecture directly influences detection difficulty, and (3) we found that GAMETES will generate a maximally diverse set of models with a variable population prevalence and a larger model population size. However, a model population size as small as 1,000 is likely to be sufficient.

An information-gain approach to detecting three-way epistatic interactions in genetic association studies.

Hu, Ting; Chen, Yuanzhu; Kiralis, Jeff W; Collins, Ryan L; Wejse, Christian; Sirugo, Giorgio; Williams, Scott M; Moore, Jason H.

J Am Med Inform Assoc ; 20(4): 630-6, 2013.

Artigo em Inglês | MEDLINE | ID: mdl-23396514

RESUMO

BACKGROUND: Epistasis has been historically used to describe the phenomenon that the effect of a given gene on a phenotype can be dependent on one or more other genes, and is an essential element for understanding the association between genetic and phenotypic variations. Quantifying epistasis of orders higher than two is very challenging due to both the computational complexity of enumerating all possible combinations in genome-wide data and the lack of efficient and effective methodologies. OBJECTIVES: In this study, we propose a fast, non-parametric, and model-free measure for three-way epistasis. METHODS: Such a measure is based on information gain, and is able to separate all lower order effects from pure three-way epistasis. RESULTS: Our method was verified on synthetic data and applied to real data from a candidate-gene study of tuberculosis in a West African population. In the tuberculosis data, we found a statistically significant pure three-way epistatic interaction effect that was stronger than any lower-order associations. CONCLUSION: Our study provides a methodological basis for detecting and characterizing high-order gene-gene interactions in genetic association studies.

Assuntos

Biologia Computacional/métodos , Epistasia Genética , Estudos de Associação Genética , Predisposição Genética para Doença , Humanos , Teoria da Informação , Estatísticas não Paramétricas

Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection.

Urbanowicz, Ryan J; Kiralis, Jeff; Fisher, Jonathan M; Moore, Jason H.

BioData Min ; 5(1): 15, 2012 Sep 26.

Artigo em Inglês | MEDLINE | ID: mdl-23014095

RESUMO

BACKGROUND: Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection. RESULTS: We evaluate three metrics as predictors of relative model detection difficulty derived from previous works: (1) Penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model's EDM and COR are each stronger predictors of model detection success than heritability. CONCLUSIONS: This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.

GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures.

Urbanowicz, Ryan J; Kiralis, Jeff; Sinnott-Armstrong, Nicholas A; Heberling, Tamra; Fisher, Jonathan M; Moore, Jason H.

BioData Min ; 5(1): 16, 2012 Oct 01.

Artigo em Inglês | MEDLINE | ID: mdl-23025260

RESUMO

BACKGROUND: Geneticists who look beyond single locus disease associations require additional strategies for the detection of complex multi-locus effects. Epistasis, a multi-locus masking effect, presents a particular challenge, and has been the target of bioinformatic development. Thorough evaluation of new algorithms calls for simulation studies in which known disease models are sought. To date, the best methods for generating simulated multi-locus epistatic models rely on genetic algorithms. However, such methods are computationally expensive, difficult to adapt to multiple objectives, and unlikely to yield models with a precise form of epistasis which we refer to as pure and strict. Purely and strictly epistatic models constitute the worst-case in terms of detecting disease associations, since such associations may only be observed if all n-loci are included in the disease model. This makes them an attractive gold standard for simulation studies considering complex multi-locus effects. RESULTS: We introduce GAMETES, a user-friendly software package and algorithm which generates complex biallelic single nucleotide polymorphism (SNP) disease models for simulation studies. GAMETES rapidly and precisely generates random, pure, strict n-locus models with specified genetic constraints. These constraints include heritability, minor allele frequencies of the SNPs, and population prevalence. GAMETES also includes a simple dataset simulation strategy which may be utilized to rapidly generate an archive of simulated datasets for given genetic models. We highlight the utility and limitations of GAMETES with an example simulation study using MDR, an algorithm designed to detect epistasis. CONCLUSIONS: GAMETES is a fast, flexible, and precise tool for generating complex n-locus models with random architectures. While GAMETES has a limited ability to generate models with higher heritabilities, it is proficient at generating the lower heritability models typically used in simulation studies evaluating new algorithms. In addition, the GAMETES modeling strategy may be flexibly combined with any dataset simulation strategy. Beyond dataset simulation, GAMETES could be employed to pursue theoretical characterization of genetic models and epistasis.

Spatially uniform relieff (SURF) for computationally-efficient filtering of gene-gene interactions.

Greene, Casey S; Penrod, Nadia M; Kiralis, Jeff; Moore, Jason H.

BioData Min ; 2(1): 5, 2009 Sep 22.

Artigo em Inglês | MEDLINE | ID: mdl-19772641

RESUMO

BACKGROUND: Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) which confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult however, and methods to exhaustively analyze sets of these variants for interactions are combinatorial in nature thus making them computationally infeasible. Efficient algorithms which can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has low success rate for noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects we introduce Spatially Uniform ReliefF (SURF). RESULTS: SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly SURF, in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. It is important to note that this success rate increase does not require an increase in algorithmic complexity and allows for increased success rate, even with the removal of a nuisance parameter from the algorithm. CONCLUSION: Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from http://www.epistasis.org.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA