Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 32
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
PLoS One ; 19(5): e0303155, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-38748653

RESUMO

Partite, 3-uniform hypergraphs are 3-uniform hypergraphs in which each hyperedge contains exactly one point from each of the 3 disjoint vertex classes. We consider the degree sequence problem of partite, 3-uniform hypergraphs, that is, to decide if such a hypergraph with prescribed degree sequences exists. We prove that this decision problem is NP-complete in general, and give a polynomial running time algorithm for third almost-regular degree sequences, that is, when each degree in one of the vertex classes is k or k - 1 for some fixed k, and there is no restriction for the other two vertex classes. We also consider the sampling problem, that is, to uniformly sample partite, 3-uniform hypergraphs with prescribed degree sequences. We propose a Parallel Tempering method, where the hypothetical energy of the hypergraphs measures the deviation from the prescribed degree sequence. The method has been implemented and tested on synthetic and real data. It can also be applied for χ2 testing of contingency tables. We have shown that this hypergraph-based χ2 test is more sensitive than the standard χ2 test. The extra sensitivity is especially advantageous on small data sets, where the proposed Parallel Tempering method shows promising performance.


Assuntos
Algoritmos
2.
Molecules ; 27(9)2022 Apr 28.
Artigo em Inglês | MEDLINE | ID: mdl-35566165

RESUMO

In this paper we describe a detailed mechanistic studies on the [FeII(PBO)2(CF3SO3)2] (1), [FeII(PBT)2(CF3SO3)2] (2), and [FeII(PBI)3](CF3SO3)2 (3)-catalyzed (PBO = 2-(2'-pyridyl)benzoxazole, PBT = 2-(2'-pyridyl)benzthiazole, PBI = 2-(2'-pyridyl)benzimidazole) Baeyer-Villiger oxidation of cycloketones by dioxygen with cooxidation of aldehydes and peroxycarboxylic acids, including the kinetics on the reactivity of (µ-1,2-peroxo)diiron(III), acylperoxo- and iodosylbenzene-iron(III) species as key intermediates.


Assuntos
Ferro , Oxigênio , Catálise , Compostos Ferrosos , Iodobenzenos , Oxirredução
3.
Elife ; 112022 01 18.
Artigo em Inglês | MEDLINE | ID: mdl-35040779

RESUMO

Hippocampal place cells are activated sequentially as an animal explores its environment. These activity sequences are internally recreated ('replayed'), either in the same or reversed order, during bursts of activity (sharp wave-ripples [SWRs]) that occur in sleep and awake rest. SWR-associated replay is thought to be critical for the creation and maintenance of long-term memory. In order to identify the cellular and network mechanisms of SWRs and replay, we constructed and simulated a data-driven model of area CA3 of the hippocampus. Our results show that the chain-like structure of recurrent excitatory interactions established during learning not only determines the content of replay, but is essential for the generation of the SWRs as well. We find that bidirectional replay requires the interplay of the experimentally confirmed, temporally symmetric plasticity rule, and cellular adaptation. Our model provides a unifying framework for diverse phenomena involving hippocampal plasticity, representations, and dynamics, and suggests that the structured neural codes induced by learning may have greater influence over cortical network states than previously appreciated.


Assuntos
Ondas Encefálicas/fisiologia , Região CA3 Hipocampal/fisiologia , Aprendizagem/fisiologia , Células de Lugar/fisiologia , Animais , Hipocampo/fisiologia , Interneurônios/fisiologia , Memória/fisiologia , Camundongos , Modelos Teóricos , Sono/fisiologia , Vigília/fisiologia
4.
Cells ; 10(11)2021 11 05.
Artigo em Inglês | MEDLINE | ID: mdl-34831269

RESUMO

Over 30 years after the first cancer vaccine clinical trial (CT), scientists still search the missing link between immunogenicity and clinical responses. A predictor able to estimate the outcome of cancer vaccine CTs would greatly benefit vaccine development. Published results of 94 CTs with 64 therapeutic vaccines were collected. We found that preselection of CT subjects based on a single matching HLA allele does not increase immune response rates (IRR) compared with non-preselected CTs (median 60% vs. 57%, p = 0.4490). A representative in silico model population (MP) comprising HLA-genotyped subjects was used to retrospectively calculate in silico IRRs of CTs based on the percentage of MP-subjects having epitope(s) predicted to bind ≥ 1-4 autologous HLA allele(s). We found that in vitro measured IRRs correlated with the frequency of predicted multiple autologous allele-binding epitopes (AUC 0.63-0.79). Subgroup analysis of multi-antigen targeting vaccine CTs revealed correlation between clinical response rates (CRRs) and predicted multi-epitope IRRs when HLA threshold was ≥ 3 (r = 0.7463, p = 0.0004) but not for single HLA allele-binding epitopes (r = 0.2865, p = 0.2491). Our results suggest that CRR depends on the induction of broad T-cell responses and both IRR and CRR can be predicted when epitopes binding to multiple autologous HLAs are considered.


Assuntos
Vacinas Anticâncer/imunologia , Ensaios Clínicos como Assunto , Simulação por Computador , Antígenos de Neoplasias/imunologia , Estudos de Coortes , Epitopos/imunologia , Frequência do Gene/genética , Antígenos HLA/genética , Antígenos HLA/imunologia , Humanos , Resultado do Tratamento
5.
Front Genet ; 12: 684152, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-34249101

RESUMO

Long-term immunity to coronaviruses likely stems from T cell activity. We present here a novel approach for the selection of immunoprevalent SARS-CoV-2-derived T cell epitopes using an in silico cohort of HLA-genotyped individuals with different ethnicities. Nine 30-mer peptides derived from the four major structural proteins of SARS-CoV-2 were selected and included in a peptide vaccine candidate to recapitulate the broad virus-specific T cell responses observed in natural infection. PolyPEPI-SCoV-2-specific, polyfunctional CD8+ and CD4+ T cells were detected in each of the 17 asymptomatic/mild COVID-19 convalescents' blood against on average seven different vaccine peptides. Furthermore, convalescents' complete HLA-genotype predicted their T cell responses to SARS-CoV-2-derived peptides with 84% accuracy. Computational extrapolation of this relationship to a cohort of 16,000 HLA-genotyped individuals with 16 different ethnicities suggest that PolyPEPI-SCoV-2 vaccination will likely elicit multi-antigenic T cell responses in 98% of individuals, independent of ethnicity. PolyPEPI-SCoV-2 administered with Montanide ISA 51 VG generated robust, Th1-biased CD8+, and CD4+ T cell responses against all represented proteins, as well as binding antibodies upon subcutaneous injection into BALB/c and hCD34+ transgenic mice modeling human immune system. These results have implications for the development of global, highly immunogenic, T cell-focused vaccines against various pathogens and diseases.

6.
PLoS One ; 13(8): e0201995, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-30102714

RESUMO

Since 1997 a considerable effort has been spent on the study of the swap (switch) Markov chains on graphic degree sequences. All of these results assume some kind of regularity in the corresponding degree sequences. Recently, Greenhill and Sfragara published a breakthrough paper about irregular normal and directed degree sequences for which rapid mixing of the swap Markov chain is proved. In this paper we present two groups of results. An example from the first group is the following theorem: let [Formula: see text] be a directed degree sequence on n vertices. Denote by Δ the maximum value among all in- and out-degrees and denote by [Formula: see text] the number of edges in the realization. Assume furthermore that [Formula: see text]. Then the swap Markov chain on the realizations of [Formula: see text] is rapidly mixing. This result is a slight improvement on one of the results of Greenhill and Sfragara. An example from the second group is the following: let d be a bipartite degree sequence on the vertex set U ⊎ V, and let 0 < c1 ≤ c2 < |U| and 0 < d1 ≤ d2 < |V| be integers, where c1 ≤ d(v) ≤ c2: ∀v ∈ V and d1 ≤ d(u) ≤ d2: ∀u ∈ U. Furthermore assume that (c2 - c1 - 1)(d2 - d1 - 1) < max{c1(|V| - d2), d1(|U| - c2)}. Then the swap Markov chain on the realizations of d is rapidly mixing. A straightforward application of this latter result shows that when a random bipartite or directed graph is generated under the Erdos-Rényi G(n, p) model with mild assumptions on n and p then the degree sequence of the generated graph has, with high probability, a rapidly mixing swap Markov chain on its realizations.


Assuntos
Modelos Teóricos , Algoritmos
7.
Dalton Trans ; 45(37): 14709-18, 2016 Oct 07.
Artigo em Inglês | MEDLINE | ID: mdl-27283752

RESUMO

The reactivity of the previously reported peroxo adducts [Fe2(µ-O2)(L(1))4(CH3CN)2](2+), and [Fe2(µ-O2)(L(2))4(CH3CN)2](2+), (L(1) = 2-(2'-pyridyl)benzimidazole and L(2) = 2-(2'-pyridyl)-N-methylbenzimidazole) towards H2O2 as catalase mimics, and towards various phenols as functional RNR-R2 mimics, is described. Kinetic, mechanistic and computational studies gave direct evidence for the involvement of the (µ-1,2-peroxo)diiron(iii) intermediate in the O-H activation process via formation of low-spin oxoiron(iv) species.

8.
BMC Bioinformatics ; 16 Suppl 14: S6, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26452124

RESUMO

BACKGROUND: Even for moderate size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless what the model is and which specific question is to be answered. Therefore giving one optimal solution might be misleading and cannot be used for statistical inferring. Statistically well funded methods are necessary to sample uniformly from the solution space and then a small number of samples are sufficient for statistical inferring. CONTRIBUTION: In this paper, we give a mini-review about the state-of-the-art of sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. Above that, we also give a Gibbs sampler for sampling most parsimonious labeling of evolutionary trees under the SCJ model. The method has been implemented and tested on real life data. The software package together with example data can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/.


Assuntos
Rearranjo Gênico , Genoma , Modelos Genéticos , Software , Evolução Molecular , Humanos
9.
PLoS One ; 10(7): e0131300, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26161994

RESUMO

In 1999 Kannan, Tetali and Vempala proposed a MCMC method to uniformly sample all possible realizations of a given graphical degree sequence and conjectured its rapidly mixing nature. Recently their conjecture was proved affirmative for regular graphs (by Cooper, Dyer and Greenhill, 2007), for regular directed graphs (by Greenhill, 2011) and for half-regular bipartite graphs (by Miklós, Erdos and Soukup, 2013). Several heuristics on counting the number of possible realizations exist (via sampling processes), and while they work well in practice, so far no approximation guarantees exist for such an approach. This paper is the first to develop a method for counting realizations with provable approximation guarantee. In fact, we solve a slightly more general problem; besides the graphical degree sequence a small set of forbidden edges is also given. We show that for the general problem (which contains the Greenhill problem and the Miklós, Erdos and Soukup problem as special cases) the derived MCMC process is rapidly mixing. Further, we show that this new problem is self-reducible therefore it provides a fully polynomial randomized approximation scheme (a.k.a. FPRAS) for counting of all realizations.


Assuntos
Algoritmos , Redes de Comunicação de Computadores , Modelos Teóricos , Gráficos por Computador , Simulação por Computador , Cadeias de Markov , Estudos de Amostragem
10.
BMC Bioinformatics ; 16: 108, 2015 Apr 01.
Artigo em Inglês | MEDLINE | ID: mdl-25888064

RESUMO

BACKGROUND: A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment. RESULTS: In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased. CONCLUSIONS: The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign .


Assuntos
Algoritmos , Biologia Computacional/métodos , Gráficos por Computador , Modelos Estatísticos , Alinhamento de Sequência/métodos , Software , Simulação por Computador , Humanos , Incerteza
11.
PLoS Comput Biol ; 8(2): e1002356, 2012 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-22319430

RESUMO

HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aß) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aß HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the "transcription binding site turnover." CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.


Assuntos
Precursor de Proteína beta-Amiloide/fisiologia , Evolução Molecular , Motivos de Aminoácidos , Sequência de Aminoácidos , Precursor de Proteína beta-Amiloide/química , Precursor de Proteína beta-Amiloide/genética , Precursor de Proteína beta-Amiloide/metabolismo , Animais , Biologia Computacional , Sequência Consenso , Humanos , Metais/metabolismo , Dados de Sequência Molecular , Mutação , Alinhamento de Sequência
12.
BMC Bioinformatics ; 11: 570, 2010 Nov 23.
Artigo em Inglês | MEDLINE | ID: mdl-21092255

RESUMO

BACKGROUND: In this paper, we introduce a progressive corner cutting method called Reticular Alignment for multiple sequence alignment. Unlike previous corner-cutting methods, our approach does not define a compact part of the dynamic programming table. Instead, it defines a set of optimal and suboptimal alignments at each step during the progressive alignment. The set of alignments are represented with a network to store them and use them during the progressive alignment in an efficient way. The program contains a threshold parameter on which the size of the network depends. The larger the threshold parameter and thus the network, the deeper the search in the alignment space for better scored alignments. RESULTS: We implemented the program in the Java programming language, and tested it on the BAliBASE database. Reticular Alignment can outperform ClustalW even if a very simple scoring scheme (BLOSUM62 and affine gap penalty) is implemented and merely the threshold value is increased. However, this set-up is not sufficient for outperforming other cutting-edge alignment methods. On the other hand, the reticular alignment search strategy together with sophisticated scoring schemes (for example, differentiating gap penalties for hydrophobic and hydrophylic amino acids) overcome FSA and in some accuracy measurement, even MAFFT. The program is available from http://phylogeny-cafe.elte.hu/RetAlign/ CONCLUSIONS: Reticular alignment is an efficient search strategy for finding accurate multiple alignments. The highest accuracy achieved when this searching strategy is combined with sophisticated scoring schemes.


Assuntos
Alinhamento de Sequência/métodos , Software , Algoritmos , Bases de Dados Factuais
13.
Bioinformatics ; 26(24): 3012-9, 2010 Dec 15.
Artigo em Inglês | MEDLINE | ID: mdl-21037244

RESUMO

MOTIVATION: When comparing the organization of two genomes, it is important not to draw conclusions on their modes of evolution from a single most parsimonious scenario explaining their differences. Better estimations can be obtained by sampling many different genomic rearrangement scenarios. For this problem, the Double Cut and Join (DCJ) model, while less relevant, is computationally easier than the Hannenhalli-Pevzner (HP) model. Indeed, in some special cases, the total number of DCJ sorting scenarios can be analytically calculated, and uniformly distributed random DCJ scenarios can be drawn in polynomial running time, while the complexity of counting the number of HP scenarios and sampling from the uniform distribution of their space is unknown, and conjectured to be #P-complete. Statistical methods, like Markov chain Monte Carlo (MCMC) for sampling from the uniform distribution of the most parsimonious or the Bayesian distribution of all possible HP scenarios are required. RESULTS: We use the computational facilities of the DCJ model to draw a sampling of HP scenarios. It is based on a parallel MCMC method that cools down DCJ scenarios to HP scenarios. We introduce two theorems underlying the theoretical mixing properties of this parallel MCMC method. The method was tested on yeast and mammalian genomic data, and allowed us to provide estimates of the different modes of evolution in diverse lineages. AVAILABILITY: The program implemented in Java 1.5 programming language is available from http://www.renyi.hu/~miklosi/DCJ2HP/.


Assuntos
Evolução Molecular , Genoma , Modelos Genéticos , Animais , Teorema de Bayes , Humanos , Cadeias de Markov , Método de Monte Carlo , Saccharomyces cerevisiae/genética
14.
Artigo em Inglês | MEDLINE | ID: mdl-21030742

RESUMO

Markov chain Monte Carlo has been the standard technique for inferring the posterior distribution of genome rearrangement scenarios under a Bayesian approach. We present here a negative result on the rate of convergence of the generally used Markov chains. We prove that the relaxation time of the Markov chains walking on the optimal reversal sorting scenarios might grow exponentially with the size of the signed permutations, namely, with the number of syntheny blocks.


Assuntos
Biologia Computacional/métodos , Rearranjo Gênico/genética , Genoma , Cadeias de Markov , Método de Monte Carlo
16.
Methods Mol Biol ; 673: 19-36, 2010.
Artigo em Inglês | MEDLINE | ID: mdl-20835790

RESUMO

We give an overview of RNA structure predictions in this chapter. We discuss here the main approaches to RNA structure prediction: combinatorial approaches, comparative approaches, and kinetic approaches. The main algorithms and mathematical concepts such as transformational grammars will be briefly introduced.


Assuntos
Biologia Computacional/métodos , Conformação de Ácido Nucleico , RNA/química , Algoritmos , Cinética , Alinhamento de Sequência
17.
BMC Evol Biol ; 9: 217, 2009 Aug 28.
Artigo em Inglês | MEDLINE | ID: mdl-19715598

RESUMO

BACKGROUND: We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. RESULTS: We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the alpha-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. CONCLUSION: BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/


Assuntos
Algoritmos , Teorema de Bayes , Modelos Genéticos , Evolução Molecular , Cadeias de Markov
18.
Mol Biol Evol ; 26(9): 2087-95, 2009 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-19570746

RESUMO

Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire.


Assuntos
Archaea/genética , Genoma Arqueal/genética , Modelos Genéticos , Filogenia , Substituição de Aminoácidos/genética , Sequência de Bases , Biologia Computacional , Evolução Molecular , Genes Arqueais
19.
Stat Methods Med Res ; 18(5): 453-85, 2009 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-19221170

RESUMO

Comparison of sequences that have descended from a common ancestor based on an explicit stochastic model of substitutions, insertions and deletions has risen to prominence in the last decade. Making statements about the positions of insertions-deletions (abbr. indels) is central in sequence and genome analysis and is called alignment. This statistical approach is harder conceptually and computationally, than competing approaches based on choosing an alignment according to some optimality criteria. But it has major practical advantages in terms of testing evolutionary hypotheses and parameter estimation. Basic dynamic approaches can allow the analysis of up to 4-5 sequences. MCMC techniques can bring this to about 10-15 sequences. Beyond this, different or heuristic approaches must be used. Besides the computational challenges, increasing realism in the underlying models is presently being addressed. A recent development that has been especially fruitful is combining statistical alignment with the problem of sequence annotation, making statements about the function of each nucleotide/amino acid. So far gene finding, protein secondary structure prediction and regulatory signal detection has been tackled within this framework. Much progress can be reported, but clearly major challenges remain if this approach is to be central in the analyses of large incoming sequence data sets.


Assuntos
Evolução Molecular , Genômica , Modelos Genéticos , Modelos Estatísticos , Análise de Sequência de DNA/métodos , Algoritmos , Genoma , Humanos , Filogenia , Alinhamento de Sequência , Processos Estocásticos
20.
Genome Biol Evol ; 1: 153-64, 2009 Jun 22.
Artigo em Inglês | MEDLINE | ID: mdl-20333186

RESUMO

Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the minimum length series of inversions (the optimal sorting path) is often not unique as many such optimal sorting paths exist. If we assume that all optimal sorting paths are equally likely, then statistical inference on genome arrangement history must account for all such sorting paths and not just a single estimate. No deterministic polynomial algorithm is known to count the number of optimal sorting paths nor sample from the uniform distribution of optimal sorting paths. Here, we propose a stochastic method that uniformly samples the set of all optimal sorting paths. Our method uses a novel formulation of parallel Markov chain Monte Carlo. In practice, our method can quickly estimate the total number of optimal sorting paths. We introduce a variant of our approach in which short inversions are modeled to be more likely, and we show how the method can be used to estimate the distribution of inversion lengths and breakpoint usage in pathogenic Yersinia pestis. The proposed method has been implemented in a program called "MC4Inversion." We draw comparison of MC4Inversion to the sampler implemented in BADGER and a previously described importance sampling (IS) technique. We find that on high-divergence data sets, MC4Inversion finds more optimal sorting paths per second than BADGER and the IS technique and simultaneously avoids bias inherent in the IS technique.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...