1.
J Comput Biol ; 30(3): 323-336, 2023 03.
Article in English | MEDLINE | ID: mdl-36322888

Abstract

Previously published information theory-based measures of variable dependency have been implemented in a software package, MIST. We describe the design of the software and its potential uses, and demonstrate it by discovering modifier alleles of the ApoE gene that affect Alzheimer's disease (AD) through an analysis of the UK Biobank dataset. The modifier genes uncovered overlap strongly with genes found to be associated with AD; others include many genes known to influence AD. We discuss a range of uses of the dependency calculations in MIST that can uncover additional genetic effects in similar complex datasets, such as higher degrees of interaction and phenotypic pleiotropy.


Subjects
Alzheimer Disease , Humans , Alleles , Alzheimer Disease/genetics , Information Theory , Apolipoproteins E/genetics , Genotype
2.
Front Neurosci ; 15: 720778, 2021.
Article in English | MEDLINE | ID: mdl-34580583

Abstract

A history of traumatic brain injury (TBI) increases the odds of developing Alzheimer's disease (AD). The long latent period between injury and dementia makes it difficult to study molecular changes initiated by TBI that may increase the risk of developing AD. MicroRNA (miRNA) levels are altered in TBI at acute times post-injury (<4 weeks), and in AD. We hypothesized that miRNA levels in cerebrospinal fluid (CSF) following TBI in veterans may be indicative of increased risk for developing AD. Our population of interest is cognitively normal veterans with a history of one or more mild TBIs (mTBI) at a chronic time following TBI. We measured miRNA levels in CSF from three groups of participants: (1) community controls with no lifetime history of TBI (ComC); (2) deployed Iraq/Afghanistan veterans with no lifetime history of TBI (DepC); and (3) deployed Iraq/Afghanistan veterans with a history of repetitive blast mTBI (DepTBI). CSF samples were collected at the baseline visit in a longitudinal, multimodal assessment of Gulf War veterans, and represent a heterogeneous group of male veterans and community controls. The average time since the last blast mTBI was 4.7 ± 2.2 years (range 1.5-11.5). Statistical analysis of TaqMan™ miRNA array data revealed 18 miRNAs with significant differential expression in the group comparisons: 10 between DepTBI and ComC, 7 between DepC and ComC, and 8 between DepTBI and DepC. We also identified 8 miRNAs with significant differential detection in the group comparisons: 5 in DepTBI vs. ComC, 3 in DepC vs. ComC, and 2 in DepTBI vs. DepC. When we applied our previously developed multivariable dependence analysis, we found 13 miRNAs (6 of which are altered in levels or detection) that show dependencies with participant phenotypes, e.g., ApoE. Target prediction and pathway analysis with miRNAs differentially expressed in DepTBI vs. either DepC or ComC identified canonical pathways highly relevant to TBI, including senescence and ephrin receptor signaling, respectively. This study shows that both TBI and deployment result in persistent changes in CSF miRNA levels that are relevant to known miRNA-mediated AD pathology, and which may reflect early events in AD.

3.
BMC Bioinformatics ; 22(1): 180, 2021 Apr 07.
Article in English | MEDLINE | ID: mdl-33827420

Abstract

BACKGROUND: Permutation testing is often considered the "gold standard" for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form, in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large. RESULTS: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP-SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10³ for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with large numbers of samples. CONCLUSIONS: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts .
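The idea of permuting a count table directly, instead of re-tabulating individual-level data, can be illustrated with a short sketch. It assumes a table whose rows are joint genotype classes (for example, the genotype combinations of a SNP pair) and whose columns are phenotype classes, and it uses plain mutual information as the test statistic. Shuffling phenotype labels preserves both the row and column totals, so a permuted table can be drawn from the counts alone by sequential multivariate hypergeometric sampling (NumPy's `Generator.multivariate_hypergeometric`). This is one way to realize a count-table transformation, not necessarily the procedure in the authors' permute-counts code; the function names are hypothetical.

```python
import numpy as np


def mutual_information(table):
    """Plug-in mutual information (bits) between the row and column variables of a count table."""
    t = np.asarray(table, dtype=float)
    p = t / t.sum()
    # Product of the row and column marginals, shape (rows, cols).
    expected = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / expected[nz])))


def permute_table(table, rng):
    """Draw the count table that a random shuffle of phenotype labels would produce.

    Row totals (genotype classes) and column totals (phenotype classes) are kept;
    each row's phenotype counts are a multivariate hypergeometric draw from the
    labels not yet assigned, so no individual-level data are touched.
    """
    t = np.asarray(table, dtype=np.int64)
    remaining = t.sum(axis=0)  # phenotype labels still available
    out = np.zeros_like(t)
    for i, row_total in enumerate(t.sum(axis=1)):
        draw = rng.multivariate_hypergeometric(remaining, int(row_total))
        out[i] = draw
        remaining = remaining - draw
    return out


def permutation_pvalue(table, n_perm=1000, seed=0):
    """Share of permuted tables with MI at least the observed MI (with the usual +1 correction)."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(table)
    hits = sum(mutual_information(permute_table(table, rng)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

For instance, `permutation_pvalue([[30, 5], [5, 30], [20, 20]], n_perm=200)` returns a small p-value for this strongly dependent table. Because each permutation costs one draw per table row rather than one operation per individual, the per-permutation work in this sketch does not grow with the number of samples.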


Subjects
Epistasis, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genotype , Humans , Phenotype
4.
Sci Rep ; 11(1): 3627, 2021 02 11.
Article in English | MEDLINE | ID: mdl-33574451

Abstract

Our aim was to investigate the associations between erythrocyte fatty acids and the risk of islet autoimmunity in children. The Environmental Determinants of Diabetes in the Young Study (TEDDY) is a longitudinal cohort study of children at high genetic risk for type 1 diabetes (n = 8676) born between 2004 and 2010 in the U.S., Finland, Sweden, and Germany. A nested case-control design comprised 398 cases with islet autoimmunity and 1178 sero-negative controls matched for clinical site, family history, and gender. Fatty acid composition was measured in erythrocytes collected at the age of 3, 6, and 12 months and then annually up to 6 years of age. Conditional logistic regression models were adjusted for HLA risk genotype, ancestry, and weight z-score. Higher levels of eicosapentaenoic and docosapentaenoic acid (n-3 polyunsaturated fatty acids) during infancy and of conjugated linoleic acid after infancy were associated with a lower risk of islet autoimmunity. Furthermore, higher levels of some even-chain saturated (SFA) and monounsaturated fatty acids (MUFA) were associated with increased risk. Fatty acid status in early life may signal the risk of islet autoimmunity: n-3 fatty acids in particular may be protective, while increased levels of some SFAs and MUFAs may precede islet autoimmunity.


Subjects
Autoimmunity , Erythrocytes/metabolism , Fatty Acids/metabolism , Islets of Langerhans/immunology , Breast Feeding , Case-Control Studies , Child , Child, Preschool , Female , Humans , Infant , Male , Risk Factors
5.
J Comput Biol ; 28(6): 527-559, 2021 06.
Article in English | MEDLINE | ID: mdl-33395537

Abstract

Quantitative genetics has evolved dramatically in the past century, and the proliferation of genetic data, in quantity as well as type, enables the characterization of complex interactions and mechanisms beyond the scope of its theoretical foundations. In this article, we argue that revisiting the framework for analysis is important, and we begin to lay the foundations of an alternative formulation of quantitative genetics based on information theory. Information theory can provide sensitive and unbiased measures of statistical dependencies among variables, and it provides a natural mathematical language for an alternative view of quantitative genetics. In previous work, we examined the information content of discrete functions and applied this approach and methods to the analysis of genetic data. In this article, we present a framework built around a set of relationships that both unifies the information measures for the discrete functions and uses them to express key quantitative genetic relationships. Information theory measures of variable interdependency are used to identify significant interactions, and a general approach is described for inferring functional relationships in genotype and phenotype data. We present information-based measures of the genetic quantities penetrance, heritability, and degrees of statistical epistasis. Our scope here includes the consideration of both two- and three-variable dependencies and independently segregating variants, which captures additive effects, genetic interactions, and two-phenotype pleiotropy. This formalism and the theoretical approach naturally apply to higher multivariable interactions and complex dependencies, and can be adapted to account for population structure, linkage, and nonrandomly segregating markers. This article thus focuses on presenting the initial groundwork for a full formulation of quantitative genetics based on information theory.


Subjects
Information Theory , Models, Genetic , Databases, Genetic , Genome, Fungal , Genome-Wide Association Study/methods , Genomics/methods , Polymorphism, Single Nucleotide , Saccharomyces cerevisiae
6.
Entropy (Basel) ; 22(12)2020 Nov 24.
Article in English | MEDLINE | ID: mdl-33266517

Abstract

Information theory provides robust measures of multivariable interdependence, but classically does little to characterize the multivariable relationships it detects. The Partial Information Decomposition (PID) characterizes the mutual information between variables by decomposing it into unique, redundant, and synergistic components. This has been usefully applied, particularly in neuroscience, but there is currently no generally accepted method for its computation. Independently, the Information Delta framework characterizes non-pairwise dependencies in genetic datasets. This framework has developed an intuitive geometric interpretation for how discrete functions encode information, but lacks some important generalizations. This paper shows that the PID and Delta frameworks are largely equivalent. We equate their key expressions, allowing results in one framework to be applied to open questions in the other. For example, we find that the approach of Bertschinger et al. is useful for the open Information Delta question of how to deal with linkage disequilibrium. We also show how PID solutions can be mapped onto the space of delta measures. Using Bertschinger et al. as an example solution, we identify a specific plane in delta-space to which this approach's optimization is constrained, and compute it for all possible three-variable discrete functions of a three-letter alphabet. This yields a clear geometric picture of how a given solution decomposes information.

7.
PLoS One ; 15(12): e0242684, 2020.
Article in English | MEDLINE | ID: mdl-33270668

Abstract

The genetic mechanisms of childhood development in its many facets remain largely undeciphered. In the population of healthy infants studied in the Growing Up in Singapore Towards Healthy Outcomes (GUSTO) program, we have identified a range of dependencies among the observed phenotypes of fetal and early childhood growth, neurological development, and a number of genetic variants. We have quantified these dependencies using our information theory-based methods. The genetic variants show dependencies with single phenotypes as well as pleiotropic effects on more than one phenotype, and thereby point to a large number of brain-specific and brain-expressed gene candidates. These dependencies provide a basis for connecting a range of variants with a spectrum of phenotypes (pleiotropy) as well as with each other. A broad survey of known regulatory expression characteristics and other function-related information from the literature for these sets of candidate genes allowed us to assemble an integrated body of evidence, including a partial regulatory network, that points towards the biological basis of these general dependencies. Notable among the implicated loci are RAB11FIP4 (next to NF1), MTMR7, and PLD5, all highly expressed in the brain; DNMT1 (DNA methyltransferase), highly expressed in the placenta; and PPP1R12B and DMD (dystrophin), known to be important growth and development genes. While we cannot specify and decipher the mechanisms responsible for the phenotypes in this study, a number of connections for further investigation of fetal and early childhood growth and neurological development are indicated. These results and this approach open the door to new explorations of early human development.


Subjects
Child Development , Fetal Development/genetics , Nervous System/growth & development , Child , Chromatin/genetics , Epistasis, Genetic , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Genetic Loci , Genetic Pleiotropy , Genome-Wide Association Study , Genotype , Humans , Linkage Disequilibrium/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
8.
G3 (Bethesda) ; 9(7): 2071-2088, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31109921

Abstract

We describe an information-theory-based method and associated software for computationally identifying sister spores derived from the same meiotic tetrad. The method exploits specific DNA sequence features of tetrads that result from meiotic centromere and allele segregation patterns. Because the method uses only the genomic sequence, it alleviates the need for tetrad-specific barcodes or other genetic modifications to the strains. Using this method, strains derived from randomly arrayed spores can be efficiently grouped back into tetrads.


Subjects
Computational Biology/methods , Software , Yeasts/physiology , Alleles , Chromosome Segregation , Gene Expression Regulation, Fungal , Meiosis , Recombination, Genetic , Reproducibility of Results , Spores, Fungal
9.
J Comput Biol ; 26(2): 152-171, 2019 02.
Article in English | MEDLINE | ID: mdl-30495984

Abstract

Missing values in complex biological data sets have significant impacts on our ability to correctly detect and quantify interactions in biological systems and to infer relationships accurately. In this article, we propose a useful metaphor to show that information theory measures, such as mutual information and interaction information, can be employed directly for evaluating multivariable dependencies even if the data contain some missing values. The metaphor is that of thinking of variable dependencies as information channels between and among variables. In this view, missing data can be thought of as noise that reduces the channel capacity in predictable ways. We extract the available information in the data even if there are missing values, and use the notion of channel capacity to assess the reliability of the result. This avoids the common practice (in the absence of prior knowledge of random imputation) of eliminating samples entirely, thus losing the information they can provide. We show how this reliability function can be implemented for pairs of variables, and generalize it to an arbitrary number of variables. Illustrations of the reliability functions for several cases are provided using simulated data.
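The contrast between using the available information and eliminating samples entirely can be shown with a small sketch. It assumes discrete data stored as tuples with `None` marking a missing value, and it shows only the first step: estimating a pairwise mutual information from every sample in which that pair is observed, compared with listwise deletion, which drops any sample with a missing value in any variable. The channel-capacity reliability function from the paper is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def pairwise_mi(rows, i, j):
    """MI (bits) between columns i and j over rows where both are observed.

    Returns (mi, n_used) so the caller can see how many samples contributed.
    """
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    if not pairs:
        return 0.0, 0
    xs, ys = zip(*pairs)
    return entropy(xs) + entropy(ys) - entropy(pairs), len(pairs)


def listwise_complete(rows):
    """Listwise deletion: keep only rows with no missing value in any column."""
    return [r for r in rows if all(v is not None for v in r)]
```

In a toy dataset where two columns are fully observed and a third is missing in half of the rows, `pairwise_mi` on the first two columns uses all rows, whereas listwise deletion would keep only half of them for every analysis.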


Subjects
Databases, Genetic/standards , Information Theory , Multivariate Analysis , Sequence Analysis, DNA/methods , Animals , Data Accuracy , Humans , Reproducibility of Results , Sequence Analysis, DNA/standards
10.
Entropy (Basel) ; 21(1)2019 Jan 18.
Article in English | MEDLINE | ID: mdl-33266804

Abstract

Relations between common information measures include the duality relations based on Möbius inversion on lattices, which are the direct consequence of the symmetries of the lattices of the sets of variables (subsets ordered by inclusion). In this paper we use the lattice and functional symmetries to provide a unifying formalism that reveals some new relations and systematizes the symmetries of the information functions. To our knowledge, this is the first systematic examination of the full range of relationships of this class of functions. We define operators on functions on these lattices, based on the Möbius inversions, that map functions into one another, which we call Möbius operators, and show that they form a group isomorphic to the symmetric group S₃. Relations among the set of functions on the lattice are transparently expressed in terms of the operator algebra, and, when applied to the information measures, can be used to derive a wide range of relationships among diverse information measures. The Möbius operator algebra is then naturally generalized, which yields an even wider range of new relationships.
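The Möbius inversion on the lattice of variable subsets, which underlies the duality relations above, can be sketched in a few lines. This assumes discrete samples stored as tuples and uses the standard zeta/Möbius pair on the subset lattice; applying the inversion to joint entropies yields co-information. Sign conventions for interaction information differ between papers (this sketch uses the co-information convention, under which XOR gives -1 bit), and the paper's Möbius operators and their S₃ group structure are not reproduced. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def subsets(s):
    """All subsets of s as frozensets: the lattice of subsets ordered by inclusion."""
    items = sorted(s)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def zeta(g, s):
    """f(S) = sum over T subset of S of g(T)."""
    return sum(g(t) for t in subsets(s))


def mobius_inverse(f, s):
    """g(S) = sum over T subset of S of (-1)^(|S|-|T|) f(T); inverts zeta."""
    return sum((-1) ** (len(s) - len(t)) * f(t) for t in subsets(s))


def joint_entropy(samples, variables):
    """Plug-in joint entropy (bits) of the given columns; H(empty set) = 0."""
    if not variables:
        return 0.0
    cols = sorted(variables)
    counts = Counter(tuple(row[c] for c in cols) for row in samples)
    n = len(samples)
    return -sum(k / n * log2(k / n) for k in counts.values())


def co_information(samples, variables):
    """Co-information via Möbius inversion of joint entropies.

    Sign chosen so that two variables give their mutual information.
    """
    s = frozenset(variables)
    g = mobius_inverse(lambda t: joint_entropy(samples, t), s)
    return (-1) ** (len(s) + 1) * g
```

For Z = X XOR Y with uniform X and Y, `co_information` returns 0 for any pair and -1 bit for the triple, the classic purely synergistic case. Applying `zeta` to the output of `mobius_inverse` gives back the original set function, which is the inversion identity the duality relations rest on.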

11.
J Comput Biol ; 24(12): 1153-1178, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29028175

Abstract

The central problem of data analysis has three components: (1) detecting the dependence of variables using quantitative measures, (2) determining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows the detection problem to be separated from the problem of inferring functional form. Here we address the third component: inferring functional forms based on the information encoded in the functions. We present a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis: the inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. We illustrate the functional description and the classes of a number of common genetic interaction modes, and also show how different modes vary widely in their sensitivity to noise.


Subjects
Algorithms , Computational Biology/methods , Data Interpretation, Statistical , Epistasis, Genetic , Information Theory , Humans , Signal-To-Noise Ratio
12.
Nucleic Acids Res ; 45(11): e104, 2017 Jun 20.
Article in English | MEDLINE | ID: mdl-28369495

Abstract

The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear-quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can use the generalized additive models for location, scale and shape (GAMLSS) distributional regression framework to calculate and apply empirical correction factors for ligase bias. Bias correction could remove more than 40% of the bias for miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input, and independent of sample composition. Using synthetic mixes of known composition, we show that the GAMLSS approach can analyze differential expression with greater accuracy and higher sensitivity and specificity than six existing algorithms (DESeq2, edgeR, EBSeq, limma, DSS, voom) for the analysis of small RNA-seq data.
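The linear-quadratic mean-variance relation can be made concrete with a simple moment-based fit. This sketch assumes a matrix of counts with one row per small-RNA sequence and one column per replicate, and fits var ≈ a·mean + b·mean² by ordinary least squares with no intercept (NumPy's `linalg.lstsq`). It only illustrates the form of the relation; it is not the GAMLSS distributional regression or the ligase-bias correction described in the paper, and `fit_mean_variance` is a hypothetical name.

```python
import numpy as np


def fit_mean_variance(counts):
    """Fit var = a*mean + b*mean**2 across sequences by least squares.

    counts: array of shape (n_sequences, n_replicates).
    Returns (a, b): the linear (Poisson-like) and quadratic (overdispersion) coefficients.
    """
    c = np.asarray(counts, dtype=float)
    mu = c.mean(axis=1)
    var = c.var(axis=1, ddof=1)  # unbiased per-sequence variance
    # No intercept column: under this model a sequence with zero mean has zero variance.
    design = np.column_stack([mu, mu ** 2])
    (a, b), *_ = np.linalg.lstsq(design, var, rcond=None)
    return float(a), float(b)
```

With a = 1 the linear term matches Poisson sampling noise, and b > 0 captures the extra variability that grows with the square of the mean. In practice a weighted or likelihood-based fit would be preferable, because the largest-mean sequences dominate an unweighted least-squares fit.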


Subjects
Sequence Analysis, RNA , Algorithms , Data Accuracy , Linear Models , Models, Genetic , Poisson Distribution , Software , Stochastic Processes
13.
J Comput Biol ; 22(11): 1005-24, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26335709

Abstract

Information theory is valuable in multiple-variable analysis because it is model-free and nonparametric and has only modest sensitivity to undersampling. We previously introduced a general approach to finding multiple dependencies that provides accurate measures of levels of dependency for subsets of variables in a data set, which are significantly nonzero only if the subset of variables is collectively dependent. This is useful, however, only if we can avoid a combinatorial explosion of calculations for increasing numbers of variables. The proposed dependence measure for a subset of variables τ, the differential interaction information Δ(τ), has the property that, when the full dependence includes more variables, some of the factors of Δ(τ) are significantly nonzero for subsets of τ. We use this property to suppress the combinatorial explosion by following the "shadows" of multivariable dependency on smaller subsets. Rather than calculating the marginal entropies of all subsets at each degree level, we need to consider only subsets of variables with appropriate "shadows." The number of calculations for n variables at degree level d therefore grows at a much smaller rate than the binomial coefficient C(n, d), but depends on the parameters of the "shadows" calculation. This approach, avoiding a combinatorial explosion, enables the use of our multivariable measures on very large data sets. We demonstrate this method on simulated data sets, and characterize the effects of noise and sample numbers. In addition, we analyze a data set of a few thousand mutant yeast strains interacting with a few thousand chemical compounds.


Subjects
Data Interpretation, Statistical , Algorithms , Information Theory , Multivariate Analysis , Sample Size , Signal-To-Noise Ratio , Yeasts/genetics
14.
PLoS One ; 9(3): e92310, 2014.
Article in English | MEDLINE | ID: mdl-24670935

Abstract

Phenotypic variation, including that which underlies health and disease in humans, results in part from multiple interactions among both genetic variation and environmental factors. While diseases or phenotypes caused by single gene variants can be identified by established association methods and family-based approaches, complex phenotypic traits resulting from multi-gene interactions remain very difficult to characterize. Here we describe a new method based on information theory, and demonstrate how it improves on previous approaches to identifying genetic interactions, including both synthetic and modifier kinds of interactions. We apply our measure, called interaction distance, to previously analyzed data sets of yeast sporulation efficiency, lipid-related mouse data, and several human disease models to characterize the method. We show how the interaction distance can reveal novel gene interaction candidates in experimental and simulated data sets, and outperforms other measures in several circumstances. The method also allows us to optimize case/control sample composition for clinical studies.


Subjects
Epistasis, Genetic , Information Theory , Models, Genetic , Animals , Body Weight/genetics , Female , Genetic Markers , Humans , Male , Mice , Phenotype , Polymorphism, Single Nucleotide , Saccharomyces cerevisiae/genetics
15.
J Comput Biol ; 21(2): 118-40, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24377753

Abstract

Context dependence is central to the description of complexity. Keying on the pairwise definition of "set complexity," we use an information theory approach to formulate general measures of systems complexity. We examine the properties of multivariable dependency starting with the concept of interaction information. We then present a new measure for unbiased detection of multivariable dependency, "differential interaction information." This quantity for two variables reduces to the pairwise "set complexity" previously proposed as a context-dependent measure of information in biological systems. We generalize it here to an arbitrary number of variables. Critical limiting properties of the "differential interaction information" are key to the generalization. This measure extends previous ideas about biological information and provides a more sophisticated basis for the study of complexity. The properties of "differential interaction information" also suggest new approaches to data analysis. Given a data set of system measurements, differential interaction information can provide a measure of collective dependence, which can be represented in hypergraphs describing complex system interaction patterns. We investigate this kind of analysis using simulated data sets. The conjoining of a generalized set complexity measure, multivariable dependency analysis, and hypergraphs is our central result. While our focus is on complex biological systems, our results are applicable to any complex system.


Subjects
Systems Biology/methods , Algorithms , Computational Biology/methods , Computer Simulation , Models, Biological
16.
Genetics ; 196(3): 853-65, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24374355

Abstract

Dissecting the molecular basis of quantitative traits is a significant challenge and is essential for understanding complex diseases. Even in model organisms, precisely determining causative genes and their interactions has remained elusive, due in part to difficulty in narrowing intervals to single genes and in detecting epistasis or linked quantitative trait loci. These difficulties are exacerbated by limitations in experimental design, such as low numbers of analyzed individuals or of polymorphisms between parental genomes. We address these challenges by applying three independent high-throughput approaches for QTL mapping to map the genetic variants underlying 11 phenotypes in two genetically distant Saccharomyces cerevisiae strains, namely (1) individual analysis of >700 meiotic segregants, (2) bulk segregant analysis, and (3) reciprocal hemizygosity scanning, a new genome-wide method that we developed. We reveal differences in the performance of each approach and, by combining them, identify eight polymorphic genes that affect eight different phenotypes: colony shape, flocculation, growth on two nonfermentable carbon sources, and resistance to two drugs, salt, and high temperature. Our results demonstrate the power of individual segregant analysis to dissect QTL and address the underestimated contribution of interactions between variants. We also reveal confounding factors like mutations and aneuploidy in pooled approaches, providing valuable lessons for future designs of complex trait mapping studies.


Subjects
Genomics/methods , Quantitative Trait Loci , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Aneuploidy , Chromosome Mapping , Genetic Variation , Genome, Fungal , Mutation , Phenotype
17.
EURASIP J Bioinform Syst Biol ; 2012(1): 13, 2012 Sep 21.
Article in English | MEDLINE | ID: mdl-22995062

Abstract

We describe some new conceptual tools for the rigorous, mathematical description of the "set-complexity" of graphs. This set-complexity has been shown previously to be a useful measure for analyzing some biological networks, and in discussing biological information in a quantitative fashion. The advances described here allow us to define some significant relationships between the set-complexity measure and the structure of graphs, and of their component sub-graphs. We show here that modular graph structures tend to maximize the set-complexity of graphs. We point out the relationship between modularity and redundancy, and discuss the significance of set-complexity in this regard. We specifically discuss the relationship between complexity and entropy in the case of complete-bipartite graphs, and present a new method for constructing highly complex, binary graphs. These results can be extended to the case of ternary graphs, and to other multi-edge graphs, which are fundamentally more relevant to biological structures and systems. Finally, our results lead us to an approach for extracting high complexity modular graphs from large, noisy graphs with low information content. We illustrate this approach with two examples.

18.
J Comput Biol ; 19(3): 316-36, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22401592

Abstract

For the computational analysis of biological problems (analyzing data, inferring networks and complex models, and estimating model parameters), it is common to use a range of methods based on probabilistic logic constructions, sometimes collectively called machine learning methods. Probabilistic modeling methods such as Bayesian Networks (BN) fall into this class, as do Hierarchical Bayesian Networks (HBN), Probabilistic Boolean Networks (PBN), Hidden Markov Models (HMM), and Markov Logic Networks (MLN). In this review, we describe the most general of these (MLN), and show how the above-mentioned methods are related to MLN and to one another by the imposition of constraints and restrictions. This approach allows us to illustrate a broad landscape of constructions and methods, and describe some of the attendant strengths, weaknesses, and constraints of many of these methods. We then provide some examples of their applications to problems in biology and medicine, with an emphasis on genetics. The key concepts needed to picture this landscape of methods are the ideas of probabilistic graphical models, the structures of the graphs, and the scope of the logical language repertoire used (from First-Order Logic [FOL] to Boolean logic). These concepts are interlinked and together define the nature of each of the probabilistic logic methods. Finally, we discuss the initial applications of MLN to genetics, show the relationship to less general methods like BN, and then mention several examples where such methods could be effective in new applications to specific biological and medical problems.


Subjects
Markov Chains , Models, Biological , Models, Statistical , Algorithms , Bayes Theorem , Genetic Markers , Humans , Logistic Models , Medicine , Models, Genetic , Phenotype
19.
J Comput Biol ; 17(11): 1491-508, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20958249

Abstract

Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics.


Subjects
Databases, Genetic , Genome-Wide Association Study , Logic , Markov Chains , Neural Networks, Computer , Algorithms , Biomarkers , Mathematics , Spores, Fungal , Yeasts/physiology
20.
Chaos ; 20(2): 026102, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20590331

Abstract

Multiple high-throughput genetic interaction studies have provided substantial evidence of modularity in genetic interaction networks. However, the correspondence between these network modules and specific pathways of information flow is often ambiguous. Genetic interaction and molecular interaction analyses have not generated large-scale maps comprising multiple clearly delineated linear pathways. We seek to clarify the situation by discerning the difference between genetic modules and classical pathways. We review a method to optimize the discovery of biologically meaningful genetic modules based on a previously described context-dependent information measure to obtain maximally informative networks. We compare the results of this method with the established measures of network clustering and find that it balances global and local clustering information in networks. We further discuss the consequences for genetic interaction networks and propose a framework for the analysis of genetic modularity.


Subjects
Epistasis, Genetic , Gene Regulatory Networks , Models, Genetic , Cluster Analysis , Genes, Fungal , Nonlinear Dynamics , Saccharomyces cerevisiae/genetics , Systems Biology