Results 1 - 11 of 11
1.
J Math Biol ; 82(1-2): 6, 2021 Jan 22.
Article in English | MEDLINE | ID: mdl-33483865

ABSTRACT

The coupled Wright-Fisher diffusion is a multi-dimensional Wright-Fisher diffusion for multi-locus and multi-allelic genetic frequencies, expressed as the strong solution to a system of stochastic differential equations that are coupled in the drift, where the pairwise interaction among loci is modelled by inter-locus selection. In this paper, an ancestral process that is dual to the coupled Wright-Fisher diffusion is derived. The dual process corresponds to the block counting process of coupled ancestral selection graphs, one for each locus. Jumps of the dual process arise from coalescence, mutation and single-branching, which occur at one locus at a time, and from double-branching, which occurs simultaneously at two loci. The coalescence and mutation rates have the typical structure of the transition rates of the Kingman coalescent process. The single-branching rate contains not only the one-locus selection parameters, in a form that generalises the rates of an ancestral selection graph, but also the two-locus selection parameters, which capture the effect of the pairwise interaction on the single loci. The double-branching rate reflects the particular structure of pairwise selection interactions in the coupled Wright-Fisher diffusion. Moreover, in the special case of two loci with two alleles each, with selection and parent-independent mutation, the stationary density of the coupled Wright-Fisher diffusion and the transition rates of the dual process are obtained in explicit form.
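To make the rate structure concrete for the simplest single-locus case, the sketch below simulates the line-counting (block counting) process of a one-locus ancestral selection graph. It is not the coupled dual derived in the paper: it ignores mutation and inter-locus selection, and it assumes the common convention of Kingman coalescence at rate n(n-1)/2 and branching at rate nσ/2 for n lineages and a scaled selection parameter σ. The function names and `sigma` are our own.

```python
import math
import random


def asg_rates(n, sigma):
    """Jump rates out of state n (number of lineages) for the line-counting
    process of a one-locus ancestral selection graph, ignoring mutation."""
    rates = {n + 1: n * sigma / 2.0}  # branching: n -> n + 1
    if n >= 2:
        rates[n - 1] = n * (n - 1) / 2.0  # Kingman coalescence: n -> n - 1
    return rates


def simulate_asg(n0, sigma, t_max, seed=0):
    """Gillespie simulation; returns [(jump_time, state), ...] up to t_max."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    path = [(0.0, n0)]
    while True:
        rates = asg_rates(n, sigma)
        total = sum(rates.values())
        if total == 0.0:
            return path  # sigma == 0 and a single lineage: absorbed
        t += rng.expovariate(total)
        if t > t_max:
            return path
        # pick the next state with probability proportional to its rate
        n = rng.choices(list(rates), weights=list(rates.values()))[0]
        path.append((t, n))


def stationary(sigma, n_max=60):
    """Stationary law via detailed balance pi(n+1)/pi(n) = sigma/(n+1):
    Poisson(sigma) conditioned on n >= 1, truncated at n_max."""
    weights = [sigma ** n / math.factorial(n) for n in range(1, n_max + 1)]
    z = sum(weights)
    return {n: w / z for n, w in zip(range(1, n_max + 1), weights)}
```

With these rates, detailed balance gives a stationary distribution that is Poisson(σ) conditioned on at least one lineage; `stationary` tabulates it, and the long-run occupation times of `simulate_asg` should approximate it.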


Subject(s)
Genetic Models , Mutation Rate , Alleles , Gene Frequency , Population Genetics , Mutation , Genetic Selection
3.
J Bioinform Comput Biol ; 3(4): 861-90, 2005 Aug.
Article in English | MEDLINE | ID: mdl-16078365

ABSTRACT

A molecular interaction library modeling favorable non-bonded interactions between atoms and molecular fragments is considered. In this paper, we represent the structure of the interaction library by a network diagram, which shows that the underlying prediction model obtained for a molecular fragment is multi-layered. We clustered the molecular fragments into four groups by analyzing the pairwise distances between them. The distances are represented as an unrooted tree, in which the molecular fragments fall into four groups according to their function. For each fragment group, we modeled the group-specific prior distribution as a Dirichlet distribution. The group-specific Dirichlet distributions enable us to derive a large population of similar molecular fragments that vary only in their contact preferences. Bayes' theorem then leads to a population distribution of the posterior probability vectors, referred to as a "Dickey-Savage" density. Two established methods for approximating multivariate integrals are applied to obtain the marginal distributions of the Dickey-Savage density, and the results of these numerical integration methods are compared with simulated marginal distributions. By studying interactions between the protein structure of cyclohydrolase and its ligand guanosine-5'-triphosphate, we show that the marginal distributions of the posterior probabilities are more informative than the corresponding point estimates.
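A minimal sketch of the Bayesian step described above, assuming a Dirichlet prior over contact-type probabilities and multinomial contact counts for a single fragment. It shows the conjugate update and a Monte Carlo estimate of one marginal, whose exact form is a Beta distribution; it does not reproduce the paper's Dickey-Savage density or its numerical integration methods. All names and numbers are hypothetical.

```python
import random


def dirichlet_posterior(alpha, counts):
    """Conjugate update: a Dirichlet(alpha) prior combined with multinomial
    counts gives a Dirichlet(alpha + counts) posterior."""
    return [a + c for a, c in zip(alpha, counts)]


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing
    independent Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def marginal_samples(alpha, k, n_samples=20000, seed=0):
    """Monte Carlo draws of component k; its exact marginal is
    Beta(alpha_k, sum(alpha) - alpha_k)."""
    rng = random.Random(seed)
    return [sample_dirichlet(alpha, rng)[k] for _ in range(n_samples)]
```

For example, a flat prior [1, 1, 1] with counts [3, 0, 1] gives the posterior [4, 1, 2], whose first component has exact marginal Beta(4, 3) with mean 4/7; the sample mean of `marginal_samples([4, 1, 2], 0)` should lie close to that value.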


Subject(s)
Algorithms , Chemical Models , Molecular Models , Protein Interaction Mapping/methods , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Amino Acid Sequence , Binding Sites , Computer Simulation , Statistical Models , Molecular Sequence Data , Protein Binding , Proteins/analysis
4.
Eur J Hum Genet ; 23(5): 688-92, 2015 May.
Article in English | MEDLINE | ID: mdl-25159868

ABSTRACT

In an attempt to map chromosomal regions carrying rare gene variants that contribute to the risk of multiple sclerosis (MS), we identified segments shared identical by descent (IBD) using the refined IBD analysis of the software BEAGLE 4.0. IBD mapping aims to identify segments that are inherited from a common ancestor and shared more frequently in case-case pairs. A total of 2106 MS patients of Nordic origin and 624 matched controls were genotyped on the Illumina Human Quad 660 chip, and an additional 1352 ethnically matched controls, typed on the Illumina HumanHap 550 and Illumina 1M, were added. After quality control, a total of 441 731 markers remained for the analysis. After identification of IBD segments and significance testing, a filter function for markers with low IBD sharing was applied. Four regions, on chromosomes 5, 9, 14 and 19, were found to be significantly associated with the risk of MS. However, all markers but one were located telomerically, including the most distal markers. For methodological reasons, such segments show low IBD sharing and are prone to being false positives. One marker on chromosome 19 reached genome-wide significance and was not one of the distal markers. This marker was located within the GNA11 gene, which has no previously reported association with MS. We conclude that IBD mapping is not sufficiently powered to identify MS risk loci, even in ethnically relatively homogeneous populations, or, alternatively, that rare variants are not adequately present.
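The core idea of IBD mapping, that a disease-associated region should be shared IBD more often among case-case pairs than among control-control pairs, can be sketched as a per-marker count. This is an illustrative assumption only: the segment tuple format is hypothetical (it is not BEAGLE's output format), case-control pairs are ignored, `min_pairs` stands in for the unspecified low-sharing filter, and no significance test is performed.

```python
def sharing_by_marker(segments, status, markers):
    """Count, per marker, the case-case and control-control pairs whose
    IBD segment covers that marker.

    segments: iterable of (id1, id2, start, end) on one chromosome
              (hypothetical format)
    status:   dict id -> True for case, False for control
    markers:  list of marker positions
    """
    case_counts = {m: 0 for m in markers}
    ctrl_counts = {m: 0 for m in markers}
    for a, b, start, end in segments:
        if status[a] != status[b]:
            continue  # case-control pairs are ignored in this sketch
        target = case_counts if status[a] else ctrl_counts
        for m in markers:
            if start <= m <= end:
                target[m] += 1
    return case_counts, ctrl_counts


def sharing_rates(case_counts, ctrl_counts, status, min_pairs=1):
    """Proportion of case-case and control-control pairs sharing at each
    marker; markers with fewer than min_pairs sharing pairs are dropped."""
    n_case = sum(status.values())
    n_ctrl = len(status) - n_case
    case_pairs = n_case * (n_case - 1) // 2
    ctrl_pairs = n_ctrl * (n_ctrl - 1) // 2
    if case_pairs == 0 or ctrl_pairs == 0:
        raise ValueError("need at least two cases and two controls")
    rates = {}
    for m in case_counts:
        if case_counts[m] + ctrl_counts[m] < min_pairs:
            continue  # low-sharing filter
        rates[m] = (case_counts[m] / case_pairs, ctrl_counts[m] / ctrl_pairs)
    return rates
```

A marker where the first proportion clearly exceeds the second is a candidate region; the abstract's caveat about telomeric markers applies to any such raw count, since sharing near chromosome ends is hard to detect reliably.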


Subject(s)
Chromosome Mapping , Genome-Wide Association Study , Multiple Sclerosis/genetics , Cohort Studies , Genetic Markers , Humans , Mutation , Scandinavian and Nordic Countries
5.
Syst Appl Microbiol ; 25(3): 403-15, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12421078

ABSTRACT

We apply minimization of stochastic complexity and the closely related method of cumulative classification to analyse the extensively studied BIOLOG GN data of Vibrio spp. Minimization of stochastic complexity provides an objective tool for bacterial taxonomy, as it produces classifications that are optimal from the point of view of information theory. We compare our results with previously published classifications of the same data set. Our results both confirm previously detected relationships between species and reveal new ones.
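Stochastic complexity scores a classification by the code length it implies for the data. As a hedged stand-in (not necessarily the exact formula used in the paper or in BinClass), the sketch below computes the negative log2 marginal likelihood of binary vectors under independent Bernoulli features with uniform Beta(1, 1) priors in each class, plus a Dirichlet(1, ..., 1) term for the class labels; a lower value indicates a better classification.

```python
import math


def log2_beta(a, b):
    """log2 of the Beta function B(a, b)."""
    return (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)) / math.log(2)


def code_length(data, labels):
    """Bits to encode binary vectors `data` given the partition `labels`."""
    classes = sorted(set(labels))
    k, n, d = len(classes), len(data), len(data[0])
    sizes = [labels.count(c) for c in classes]
    # label term: -log2[(K-1)! * prod(n_c!) / (n+K-1)!]
    log_labels = (math.lgamma(k) + sum(math.lgamma(s + 1) for s in sizes)
                  - math.lgamma(n + k))
    bits = -log_labels / math.log(2)
    for c, size in zip(classes, sizes):
        rows = [x for x, lab in zip(data, labels) if lab == c]
        for j in range(d):
            ones = sum(r[j] for r in rows)
            # Beta-Bernoulli marginal likelihood B(ones+1, size-ones+1) / B(1, 1)
            bits -= log2_beta(ones + 1, size - ones + 1)
    return bits
```

On data = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]], the partition [0, 0, 1, 1] costs exactly 6 bits less than [0, 1, 0, 1], because each of the six class-feature terms drops from log2 6 to log2 3. Minimizing such a score over partitions is the optimization that a tool like BinClass performs with its own criterion.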


Subject(s)
Classification/methods , Computational Biology , Vibrio/classification , Algorithms , Bacterial Typing Techniques , Stochastic Processes , Vibrio/genetics , Vibrio/physiology
6.
Math Biosci ; 177-178: 161-84, 2002.
Article in English | MEDLINE | ID: mdl-11965254

ABSTRACT

We present a theory of classification and predictive identification of bacteria. Bacterial strains are characterized by a binary vector, and the taxonomy is specified by attaching a label to each vector. The theory is developed from only two basic assumptions, viz. that the sequence of pairs of feature vectors and attached labels is judged (infinitely) exchangeable and predictively sufficient. We derive expressions for the training error and the probability of identification error, and show that the latter is an affine function of the former. We prove a law of large numbers for identification matrices, which contain the fundamental information of bacterial data. We prove the Bayesian risk consistency of the predictive identification rule given by the theory and show that the training error is a consistent estimate of the generalization error.
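A minimal sketch of predictive identification from an identification matrix, here stored as per-taxon strain counts n and positive-feature counts k_j. The per-feature predictive rule (k + 1)/(n + 2), the independence of features, and the neglect of taxon frequencies are our assumptions for illustration; the rule the paper derives from exchangeability may differ. The training error is the resubstitution error of that rule.

```python
def identification_matrix(vectors, labels):
    """Map each taxon label to (n strains, [positive count per feature])."""
    table = {}
    for x, lab in zip(vectors, labels):
        n, ks = table.get(lab, (0, [0] * len(x)))
        table[lab] = (n + 1, [k + xi for k, xi in zip(ks, x)])
    return table


def predictive_prob(x, n, ks):
    """Predictive probability of binary vector x within one taxon, assuming
    independent features and the Laplace-type rule (k + 1) / (n + 2)."""
    p = 1.0
    for xi, k in zip(x, ks):
        q = (k + 1) / (n + 2)
        p *= q if xi else 1.0 - q
    return p


def identify(x, table):
    """Assign x to the taxon with the highest predictive probability."""
    return max(table, key=lambda lab: predictive_prob(x, *table[lab]))


def training_error(vectors, labels):
    """Fraction of training strains misidentified by the rule built on them."""
    table = identification_matrix(vectors, labels)
    wrong = sum(identify(x, table) != lab for x, lab in zip(vectors, labels))
    return wrong / len(vectors)
```

Because the identification matrix stores only counts, adding strains updates the rule without revisiting earlier data, which is one reason such matrices are a convenient summary of bacterial data.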


Subject(s)
Bacteria/classification , Bayes Theorem , Classification/methods , Biological Models
7.
Bull Math Biol ; 69(3): 797-815, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17086368

ABSTRACT

We introduce a Bayesian theoretical formulation of the statistical learning problem concerning the genetic structure of populations. The two key concepts in our derivation are exchangeability in its various forms and random allocation models. Implications of our results for the empirical investigation of population structure are discussed.


Subject(s)
Bayes Theorem , Population Genetics , Genetic Models , Animals , Biological Evolution , Drosophila melanogaster/genetics
8.
Int J Syst Evol Microbiol ; 55(Pt 1): 57-66, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15653853

ABSTRACT

Minimization of stochastic complexity (SC) was used as a method for the classification of genotypic fingerprints. The method was applied to fluorescent amplified fragment length polymorphism (fAFLP) fingerprint patterns of 507 Vibrionaceae representatives. Because the current BinClass implementation of the optimization algorithm for classification only works on binary vectors, the original fingerprints were first discretized using the sliding-window band-matching method, in order to preserve as much of the information content of the original band patterns as possible. The novel classification generated with the BinClass software package was compared in depth with a hierarchical classification of the same dataset, in order to assess the applicability of the new method as a more objective algorithm for classifying genotypic fingerprint patterns. Recent DNA-DNA hybridization and 16S rRNA gene sequence experiments confirmed that the classification based on SC minimization forms separate clusters containing the fAFLP patterns of all representatives of each of the species Enterovibrio norvegicus, Vibrio fortis, Vibrio diazotrophicus and Vibrio campbellii, whereas the earlier hierarchical cluster analysis had suggested more heterogeneity within these fAFLP patterns by splitting the representatives of these species into multiple distant clusters. As a result, the new classification methodology has highlighted some previously unseen relationships within the biodiversity of the family Vibrionaceae.
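The abstract compares two classifications of the same strains without naming a comparison measure. One common, hedged choice is the Rand index; in addition, a cross-tabulation of species labels against cluster labels shows directly whether a species is split across clusters. Both are sketched below with hypothetical function names.

```python
from collections import Counter
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two partitions agree, i.e. the pair
    is either in the same group in both or in different groups in both."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def cross_tab(species, clusters):
    """Count strains for each (species, cluster) combination."""
    return Counter(zip(species, clusters))
```

A species whose strains appear with several different cluster labels in `cross_tab` is split across clusters, which is the pattern the abstract reports for the earlier hierarchical analysis. Note that `rand_index` depends only on which objects are grouped together, so relabelled but identical partitions score 1.0.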


Subject(s)
Bacterial Typing Techniques , DNA Fingerprinting/methods , Restriction Fragment Length Polymorphism , Vibrionaceae/classification , Algorithms , Genotype , Software , Stochastic Processes , Vibrionaceae/genetics
9.
J Comput Aided Mol Des ; 17(7): 435-61, 2003 Jul.
Article in English | MEDLINE | ID: mdl-14677639

ABSTRACT

We describe a library of molecular fragments designed to model and predict non-bonded interactions between atoms. We apply the Bayesian approach, whereby prior knowledge and uncertainty of the mathematical model are incorporated into the estimated model and its parameters. The molecular interaction data are strengthened by narrowing the atom classification to 14 atom types, focusing on independent molecular contacts that lie within a short cutoff distance, and symmetrizing the interaction data for the molecular fragments. Furthermore, the locations of atoms in contact with a molecular fragment are modeled by Gaussian mixture densities, whose maximum a posteriori estimates are obtained by applying a version of the expectation-maximization algorithm that incorporates hyperparameters for the components of the Gaussian mixtures. A routine is introduced that provides the hyperparameters and the initial values of the parameters of the Gaussian mixture densities. A model selection criterion based on the concept of 'minimum message length' is used to automatically select the optimal complexity of a mixture model and the most suitable orientation of a reference frame for a fragment in a coordinate system. The type of atom interacting with a molecular fragment is predicted from the values of the posterior probability function, and the accuracy of these predictions is evaluated by comparing the predicted atom type with the actual atom type seen in crystal structures. The fact that an atom simultaneously interacts with several molecular fragments, forming a cohesive network of interactions, is exploited by introducing two strategies that combine the atom-type predictions given by multiple fragments. The accuracy of these combined predictions is compared with that of predictions based on an individual fragment. Exhaustive validation analyses and qualitative examples (e.g., the ligand-binding domain of glutamate receptors) demonstrate that these improvements lead to effective modeling and prediction of molecular interactions.
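The prediction and combination steps can be sketched under stated assumptions: each atom type's spatial density around a fragment is an isotropic 3D Gaussian mixture given as (weight, mean, sigma) triples, the posterior over atom types is prior times density (normalized), and the two combination strategies are taken to be a normalized product and an average of per-fragment posteriors. The abstract does not specify the paper's two strategies, so these are illustrative stand-ins, and all names are hypothetical.

```python
import math


def gmm_density(x, components):
    """Isotropic 3D Gaussian mixture density at point x.
    components: list of (weight, mean_xyz, sigma)."""
    total = 0.0
    for w, mu, s in components:
        d2 = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        total += w * math.exp(-d2 / (2 * s * s)) / (2 * math.pi * s * s) ** 1.5
    return total


def type_posterior(x, models, priors):
    """Posterior over atom types at x for one fragment:
    prior times mixture density, normalized to sum to one."""
    scores = {t: priors[t] * gmm_density(x, models[t]) for t in models}
    z = sum(scores.values())
    return {t: v / z for t, v in scores.items()}


def combine_product(posteriors):
    """Normalized product of per-fragment posteriors
    (assumes conditional independence and a uniform type prior)."""
    types = posteriors[0].keys()
    prod = {t: math.prod(p[t] for p in posteriors) for t in types}
    z = sum(prod.values())
    return {t: v / z for t, v in prod.items()}


def combine_mean(posteriors):
    """Simple average of per-fragment posteriors."""
    types = posteriors[0].keys()
    return {t: sum(p[t] for p in posteriors) / len(posteriors) for t in types}
```

The product rule sharpens agreement between fragments (0.8 and 0.6 for one type combine to 6/7), while averaging is more robust to a single badly calibrated fragment (the same inputs give 0.7).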


Subject(s)
Bayes Theorem , Peptide Library , Algorithms , Automation , Drug Design , Ligands , Molecular Models , Normal Distribution , Probability
10.
Bioinformatics ; 18(9): 1257-63, 2002 Sep.
Article in English | MEDLINE | ID: mdl-12217918

ABSTRACT

MOTIVATION: Previously, Rantanen et al. (2001; J. Mol. Biol., 313, 197-214) constructed a protein atom-ligand fragment interaction library embodying experimentally solved, high-resolution three-dimensional (3D) structural data from the Protein Data Bank (PDB). The spatial locations of protein atoms that surround ligand fragments were modeled with Gaussian mixture models, whose parameters were estimated with the expectation-maximization (EM) algorithm. The validation analysis of this library strongly indicated that the protein atom classification, with 24 classes, was too fine-grained and that reducing the number of classes would improve predictions. RESULTS: Here, we derive a dissimilarity (distance) matrix suitable for comparing and fusing the 24 predefined protein atom classes. Jeffreys' distances between Gaussian mixture models are used as a basis to estimate dissimilarities between protein atom classes. The dissimilarity data are analyzed both with a hierarchical clustering method and, independently, by multidimensional scaling. The results provide additional insight into the relationships between different protein atom classes, giving guidance on, for example, how to readjust the protein atom classification, and thus will help to improve protein-ligand interaction predictions. CONTACT: vira@utu.fi
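Jeffreys' distance is the symmetrized Kullback-Leibler divergence, J(p, q) = KL(p || q) + KL(q || p). Because it has no closed form for Gaussian mixtures, one option (an assumption here; the abstract does not say how the paper computed it) is Monte Carlo estimation, sketched below for 1D mixtures given as (weight, mean, sigma) triples. The function names are hypothetical.

```python
import math
import random


def gmm_logpdf(x, comps):
    """Log-density of a 1D Gaussian mixture, computed with log-sum-exp
    to avoid underflow in the tails."""
    logs = [math.log(w) - math.log(s * math.sqrt(2 * math.pi))
            - (x - mu) ** 2 / (2 * s * s) for w, mu, s in comps]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def gmm_sample(comps, rng):
    """Draw one value: choose a component by weight, then sample from it."""
    _, mu, s = rng.choices(comps, weights=[c[0] for c in comps])[0]
    return rng.gauss(mu, s)


def kl_mc(p, q, n, rng):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(X) - log q(X)]."""
    xs = [gmm_sample(p, rng) for _ in range(n)]
    return sum(gmm_logpdf(x, p) - gmm_logpdf(x, q) for x in xs) / n


def jeffreys(p, q, n=20000, seed=0):
    """Jeffreys divergence J(p, q) = KL(p || q) + KL(q || p)."""
    rng = random.Random(seed)
    return kl_mc(p, q, n, rng) + kl_mc(q, p, n, rng)
```

For two unit-variance Gaussians whose means differ by 1, the exact value is 1, and the estimate with the default 20000 samples per direction lands close to it. A matrix of such pairwise values is the kind of input that hierarchical clustering and multidimensional scaling can then analyze.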


Subject(s)
Protein Databases , Molecular Models , Statistical Models , Normal Distribution , Proteins/chemistry , Proteins/classification , Cluster Analysis , Ligands , Protein Conformation , Reproducibility of Results , Sensitivity and Specificity
11.
Bull Math Biol ; 66(6): 1575-96, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15522346

ABSTRACT

Microbiologists have traditionally applied hierarchical clustering algorithms as their mathematical tool of choice to unravel the taxonomic relationships between micro-organisms. However, the interpretation of such hierarchical classifications suffers from subjectivity, in that a variety of ad hoc choices must be made during their construction. On the other hand, the application of more profound and objective mathematical methods, such as the minimization of stochastic complexity, to the classification of bacterial genotyping fingerprint data is hampered by the prerequisite that such methods only act upon vectorized data. In this paper we introduce a new method, coined sliding window discretization, for transforming genotypic fingerprint patterns into binary vector format. Using an extensive, previously analysed amplified fragment length polymorphism (AFLP) data set of 507 strains from the Vibrionaceae family, we demonstrate by comparison with a number of other discretization methods that this new discretization method results in minimal loss of the original information content captured in the banding patterns. Finally, we investigate the implications of the different discretization methods for the classification of bacterial genotyping fingerprints by minimization of stochastic complexity, as implemented in the BinClass software package for probabilistic clustering of binary vectors. The new taxonomic insights gained from the resulting classification of the AFLP patterns demonstrate the value of combining sliding window discretization with minimization of stochastic complexity as an alternative classification algorithm for bacterial genotyping fingerprints.
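A minimal sketch of turning band positions into a binary vector with overlapping windows, assuming windows of fixed `width` placed every `step` units between `lo` and `hi`. The abstract does not give the paper's actual window or band-matching rules, so these parameters and the function name are hypothetical.

```python
import math


def sliding_window_binarize(bands, lo, hi, width, step):
    """One bit per window [start, start + width), with start = lo + i * step
    for all starts below hi; a bit is 1 if any band position falls inside
    that window. Windows overlap whenever step < width."""
    n_windows = math.ceil((hi - lo) / step)
    bits = []
    for i in range(n_windows):
        start = lo + i * step  # computed from i to avoid float drift
        bits.append(int(any(start <= b < start + width for b in bands)))
    return bits
```

For bands at 105 and 230 with lo=100, hi=300, width=50 and step=25, this yields [1, 0, 0, 0, 1, 1, 0, 0]. Overlapping windows let a band near a window boundary still be captured, at the price of setting more than one bit, which is the kind of trade-off between binarization and information loss that the abstract evaluates.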


Subject(s)
Bacteria/genetics , DNA Fingerprinting/methods , Bacterial DNA/genetics , Bacteria/classification , Mathematical Computing , Genetic Models , Vibrio/classification , Vibrio/genetics