Results 1 - 20 of 68
1.
Mol Biol Evol ; 41(7)2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38922185

ABSTRACT

Modern phylogeography aims at reconstructing the geographic movement of organisms based on their genomic sequences and spatial information. Phylogeographic approaches are often applied to pathogen sequences and therefore tend to neglect the possibility of recombination, which decouples the evolutionary and geographic histories of different parts of the genome. Genomic regions of recombining or reassorting pathogens often originate and evolve at different times and locations, which characterize their unique spatial histories. Measuring the extent of these differences requires new methods to compare geographic information on phylogenetic trees reconstructed from different parts of the genome. Here we develop for the first time a set of measures of phylogeographic incompatibility, aimed at detecting differences between geographical histories in terms of distances between phylogeographies. We study the effect of varying demography and recombination on phylogeographic incompatibilities using coalescent simulations. We further apply these measures to the evolutionary history of human and livestock pathogens, either reassorting or recombining, such as the Victoria and Yamagata lineages of influenza B and the O/Ind-2001 foot-and-mouth disease virus strain. Our results reveal diverse geographical paths of migration that characterize the origins and evolutionary histories of different viral genes and genomic segments. These incompatibility measures can be applied to any phylogeography, and more generally to any phylogeny where each tip has been assigned either a continuous or discrete "trait" independent of the sequence. We illustrate this flexibility with an analysis of the interplay between the phylogeography and phylolinguistics of Uralic-speaking human populations, hinting at patrilinear language transmission.


Subject(s)
Phylogeny , Phylogeography , Recombination, Genetic , Humans , Animals , Evolution, Molecular , Foot-and-Mouth Disease Virus/genetics , Genome, Viral , Models, Genetic
2.
PLoS Biol ; 20(6): e3001626, 2022 06.
Article in English | MEDLINE | ID: mdl-35658016

ABSTRACT

The evolution of cooperation in cellular groups is threatened by lineages of cheaters that proliferate at the expense of the group. These cell lineages occur within microbial communities and within multicellular organisms in the form of tumours and cancer. In contrast to an earlier study, here we show how the evolution of pleiotropic genetic architectures, which link the expression of cooperative and private traits, can protect against cheater lineages and allow cooperation to evolve. We develop an age-structured model of cellular groups and show that cooperation breaks down more slowly within groups that tie expression to a private trait than in groups that do not. We then show that this results in group selection for pleiotropy, which strongly promotes cooperation by limiting the emergence of cheater lineages. These results predict that pleiotropy will rapidly evolve, so long as groups persist long enough for cheater lineages to threaten cooperation. Our results hold when pleiotropic links can be undermined by mutations, when pleiotropy is itself costly, and in mixed-genotype groups such as those that occur in microbes. Finally, we consider features of multicellular organisms (a germ line and delayed reproductive maturity) and show that pleiotropy is again predicted to be important for maintaining cooperation. The study of cancer in multicellular organisms provides the best evidence for pleiotropic constraints, where aberrant cell proliferation is linked to apoptosis, senescence, and terminal differentiation. Alongside development from a single cell, we propose that the evolution of pleiotropic constraints has been critical for cooperation in many cellular groups.


Subject(s)
Biological Evolution , Microbiota , Genotype , Mutation , Phenotype
3.
Mol Biol Evol ; 39(2)2022 02 03.
Article in English | MEDLINE | ID: mdl-35106601

ABSTRACT

The evolutionary process of genetic recombination has the potential to rapidly change the properties of a viral pathogen, and its presence is a crucial factor to consider in the development of treatments and vaccines. It can also significantly affect the results of phylogenetic analyses and the inference of evolutionary rates. The detection of recombination from samples of sequencing data is a very challenging problem and is further complicated for SARS-CoV-2 by its relatively slow accumulation of genetic diversity. The extent to which recombination is ongoing for SARS-CoV-2 is not yet resolved. To address this, we use a parsimony-based method to reconstruct possible genealogical histories for samples of SARS-CoV-2 sequences, which enables us to pinpoint specific recombination events that could have generated the data. We propose a statistical framework for disentangling the effects of recurrent mutation from recombination in the history of a sample, and hence provide a way of estimating the probability that ongoing recombination is present. We apply this to samples of sequencing data collected in England and South Africa and find evidence of ongoing recombination.


Subject(s)
COVID-19 , SARS-CoV-2 , Genome, Viral , Humans , Mutation , Phylogeny , Recombination, Genetic
4.
Theor Popul Biol ; 154: 27-39, 2023 12.
Article in English | MEDLINE | ID: mdl-37544486

ABSTRACT

Recombination is a powerful evolutionary process that shapes the genetic diversity observed in the populations of many species. Reconstructing genealogies in the presence of recombination from sequencing data is a very challenging problem, as this relies on mutations having occurred on the correct lineages in order to detect the recombination and resolve the ordering of coalescence events in the local trees. We investigate the probability of reconstructing the true topology of ancestral recombination graphs (ARGs) under the coalescent with recombination and gene conversion. We explore how sample size and mutation rate affect the inherent uncertainty in reconstructed ARGs, which sheds light on the theoretical limitations of ARG reconstruction methods. We illustrate our results using estimates of evolutionary rates for several organisms; in particular, we find that for parameter values that are realistic for SARS-CoV-2, the probability of reconstructing genealogies that are close to the truth is low.


Subject(s)
Algorithms , Recombination, Genetic , Models, Genetic , Mutation , Biological Evolution , Phylogeny
5.
PLoS Comput Biol ; 18(6): e1009414, 2022 06.
Article in English | MEDLINE | ID: mdl-35731801

ABSTRACT

Gene expression is controlled by pathways of regulatory factors often involving the activity of protein kinases on transcription factor proteins. Despite this well-established mechanism, the number of well-described pathways that include the regulatory role of protein kinases on transcription factors is surprisingly small in eukaryotes. To address this, PhosTF was developed to infer functional regulatory interactions and pathways in both simulated and real biological networks, based on linear cyclic causal models with latent variables. GeneNetWeaverPhos, an extension of GeneNetWeaver, was developed to allow the simulation of perturbations in known networks that included the activity of protein kinases and phosphatases on gene regulation. Over 2000 genome-wide gene expression profiles, where the loss or gain of regulatory genes could be observed to perturb gene regulation, were then used to infer the existence of regulatory interactions, and their mode of regulation in the budding yeast Saccharomyces cerevisiae. Despite the additional complexity, our inference performed comparably to the best methods that inferred transcription factor regulation assessed in the DREAM4 challenge on similar simulated networks. Inference on integrated genome-scale data sets for yeast identified ∼8800 protein kinase/phosphatase-transcription factor interactions and ∼6500 interactions among protein kinases and/or phosphatases. Both types of regulatory predictions captured statistically significant numbers of known interactions of their type. Surprisingly, kinases and phosphatases regulated transcription factors by a negative mode of regulation (deactivation) in over 70% of the predictions.


Subject(s)
Phosphoric Monoester Hydrolases , Protein Kinases , Gene Expression Profiling , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Phosphoric Monoester Hydrolases/genetics , Phosphoric Monoester Hydrolases/metabolism , Protein Kinases/genetics , Protein Kinases/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
6.
Bioinformatics ; 37(19): 3277-3284, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33970217

ABSTRACT

MOTIVATION: The reconstruction of possible histories given a sample of genetic data in the presence of recombination and recurrent mutation is a challenging problem, but can provide key insights into the evolution of a population. We present KwARG, which implements a parsimony-based greedy heuristic algorithm for finding plausible genealogical histories (ancestral recombination graphs) that are minimal or near-minimal in the number of posited recombination and mutation events. RESULTS: Given an input dataset of aligned sequences, KwARG outputs a list of possible candidate solutions, each comprising a list of mutation and recombination events that could have generated the dataset; the relative proportion of recombinations and recurrent mutations in a solution can be controlled via specifying a set of 'cost' parameters. We demonstrate that the algorithm performs well when compared against existing methods. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/a-ignatieva/kwarg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
Mol Biol Evol ; 37(2): 576-592, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31665393

ABSTRACT

Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here, we introduce a sequence evolution model, MESSI (Modeling the Evolution of Secondary Structure Interactions), that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution while accounting for an unknown secondary structure. MESSI can also use graphics processing unit parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), GU (GT in DNA) pairs in noncoding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and noncoding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses. A potential biophysical explanation is that GT pairs do not stabilize DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degrees of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure. We found that estimates of coevolution were more strongly correlated with experimentally determined SHAPE-MaP pairing scores than three nonevolutionary measures of base-pairing covariation. To assist researchers in prioritizing substructures with potential functionality, MESSI automatically ranks substructures by degrees of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures that have been previously identified as having structure-related functional importance, among several uncharacterized top-ranking substructures.


Subject(s)
Computational Biology/methods , DNA/chemistry , RNA/chemistry , Base Pairing , DNA/genetics , DNA, Viral/chemistry , DNA, Viral/genetics , Evolution, Molecular , Models, Molecular , RNA/genetics , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA, Viral/chemistry , RNA, Viral/genetics , Software
8.
J Chem Inf Model ; 61(4): 1637-1646, 2021 04 26.
Article in English | MEDLINE | ID: mdl-33844913

ABSTRACT

A main challenge in the enumeration of small-molecule chemical spaces for drug design is to quickly and accurately differentiate between possible and impossible molecules. Current approaches for screening enumerated molecules (e.g., 2D heuristics and 3D force fields) have not been able to achieve a balance between accuracy and speed. We have developed a new automated approach for fast and high-quality screening of small molecules, with the following steps: (1) for each molecule in the set, an ensemble of 2D descriptors as feature encoding is computed; (2) on a random small subset, classification (feasible/infeasible) targets via a 3D-based approach are generated; (3) a classification dataset with the computed features and targets is formed and a machine learning model for predicting the 3D approach's decisions is trained; and (4) the trained model is used to screen the remainder of the enumerated set. Our approach is ≈8× (7.96× to 8.84×) faster than screening via 3D simulations without significantly sacrificing accuracy; while compared to 2D-based pruning rules, this approach is more accurate, with better coverage of known feasible molecules. Once the topological features and 3D conformer evaluation methods are established, the process can be fully automated, without any additional chemistry expertise.


Subject(s)
Drug Design , Machine Learning
9.
Theor Popul Biol ; 134: 61-76, 2020 08.
Article in English | MEDLINE | ID: mdl-32439294

ABSTRACT

The dynamics of a population exhibiting exponential growth can be modelled as a birth-death process, which naturally captures the stochastic variation in population size over time. In this article, we consider a supercritical birth-death process, started at a random time in the past, and conditioned to have n sampled individuals at the present. The genealogy of individuals sampled at the present time is then described by the reversed reconstructed process (RRP), which traces the ancestry of the sample backwards from the present. We show that a simple, analytic, time rescaling of the RRP provides a straightforward way to derive its inter-event times. The same rescaling characterises other distributions underlying this process, obtained elsewhere in the literature via more cumbersome calculations. We also consider the case of incomplete sampling of the population, in which each leaf of the genealogy is retained with an independent Bernoulli trial with probability ψ, and we show that corresponding results for Bernoulli-sampled RRPs can be derived using time rescaling, for any values of the underlying parameters. A central result is the derivation of a scaling limit as ψ approaches 0, corresponding to the underlying population growing to infinity, using the time rescaling formalism. We show that in this setting, after a linear time rescaling, the event times are the order statistics of n logistic random variables with mode log(1∕ψ); moreover, we show that the inter-event times are approximately exponentially distributed.


Subject(s)
Population Density , Humans , Probability
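The scaling-limit result stated in this abstract (after a linear time rescaling, event times are the order statistics of n logistic random variables with mode log(1/ψ)) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not give the scale of the logistic distribution or the exact form of the rescaling, so the `scale=1.0` default and the function name `sample_event_times` are hypothetical choices made for illustration.

```python
import math
import random


def sample_event_times(n, psi, scale=1.0, rng=None):
    """Draw n logistic variates centred at log(1/psi) and return them sorted.

    The sorted values play the role of the rescaled event times.
    ASSUMPTION: unit scale by default; the abstract specifies only the mode.
    """
    rng = rng or random.Random()
    mode = math.log(1.0 / psi)
    times = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # random() can return 0.0; avoid log(0)
            u = rng.random()
        # Inverse-CDF sampling for the logistic distribution.
        times.append(mode + scale * math.log(u / (1.0 - u)))
    return sorted(times)


# Example: 5 sampled lineages with sampling probability psi = 0.01.
event_times = sample_event_times(5, 0.01, rng=random.Random(1))
print(event_times)
```

Differences between consecutive entries of `event_times` correspond to the inter-event times discussed in the abstract.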
10.
Syst Biol ; 68(2): 252-266, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30239957

ABSTRACT

Classic alignment algorithms utilize scoring functions which maximize similarity or minimize edit distances. These scoring functions account for both insertion-deletion (indel) and substitution events. In contrast, alignments based on stochastic models aim to explicitly describe the evolutionary dynamics of sequences by inferring relevant probabilistic parameters from input sequences. Despite advances in stochastic modeling during the last two decades, scoring-based methods are still dominant, partially due to slow running times of probabilistic approaches. Alignment inference using stochastic models involves estimating the probability of events, such as the insertion or deletion of a specific number of characters. In this work, we present SimBa-SAl, a simulation-based approach to statistical alignment inference, which relies on an explicit continuous time Markov model for both indels and substitutions. SimBa-SAl has several advantages. First, using simulations, it decouples the estimation of event probabilities from the inference stage, which allows the introduction of accelerations to the alignment inference procedure. Second, it is general and can accommodate various stochastic models of indel formation. Finally, it allows computing the maximum-likelihood alignment, the probability of a given pair of sequences integrated over all possible alignments, and sampling alternative alignments according to their probability. We first show that SimBa-SAl allows accurate estimation of parameters of the long-indel model previously developed by Miklós et al. (2004). We next show that SimBa-SAl is more accurate than previously developed pairwise alignment algorithms, when analyzing simulated as well as empirical data sets. Finally, we study the goodness-of-fit of the long-indel and TKF91 models. We show that although the long-indel model fits the data sets better than TKF91, there is still room for improvement concerning the realistic modeling of evolutionary sequence dynamics.


Subject(s)
Classification/methods , Models, Statistical , Phylogeny , Computer Simulation , Evolution, Molecular , INDEL Mutation/genetics
11.
Mol Biol Evol ; 34(8): 2085-2100, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28453724

ABSTRACT

Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both "smooth" conformational changes and "catastrophic" conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence-structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.


Subject(s)
Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Evolution, Molecular , Models, Genetic , Models, Molecular , Protein Conformation , Protein Structural Elements/genetics , Proteins/metabolism , Sequence Analysis, Protein/statistics & numerical data
12.
PLoS Comput Biol ; 12(6): e1004964, 2016 06.
Article in English | MEDLINE | ID: mdl-27295277

ABSTRACT

About 8% of the human genome is made up of endogenous retroviruses (ERVs). Though most human endogenous retroviruses (HERVs) are thought to be irrelevant to our biology, notable exceptions include members of the HERV-H family that are necessary for the correct functioning of stem cells. ERVs are commonly found in two forms, the full-length proviral form, and the more numerous solo-LTR form, thought to result from homologous recombination events. Here we introduce a phylogenetic framework to study ERV insertion and solo-LTR formation. We then apply the framework to site patterns sampled from a set of long alignments covering six primate genomes. Studying six categories of ERVs, we quantitatively recapitulate patterns of insertional activity that are usually described in qualitative terms in the literature. A slowdown in most ERV groups is observed, but we suggest that HERV-K activity may have increased in humans since they diverged from chimpanzees. We find that the rate of solo-LTR formation decreases rapidly as a function of ERV age and that an age-dependent model of solo-LTR formation describes the history of ERVs more accurately than the commonly used exponential decay model. We also demonstrate that HERV-H loci are markedly less likely to form solo-LTRs than ERVs from other families. We conclude that the slower dynamics of HERV-H suggest a host role for the internal regions of these exapted elements and posit that in future it will be possible to use the relationship between full-length proviruses and solo-LTRs to help identify large-scale co-options in distant vertebrate genomes.


Subject(s)
Endogenous Retroviruses/genetics , Genome, Human/genetics , Models, Genetic , Animals , Base Sequence , Conserved Sequence , Evolution, Molecular , Humans , Phylogeny , Primates/genetics
13.
BMC Bioinformatics ; 16: 108, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25888064

ABSTRACT

BACKGROUND: A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment. RESULTS: In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased. CONCLUSIONS: The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. 
Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign.


Subject(s)
Algorithms , Computational Biology/methods , Computer Graphics , Models, Statistical , Sequence Alignment/methods , Software , Computer Simulation , Humans , Uncertainty
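The alignment-DAG idea in this abstract (nodes are alignment columns, each path is a valid alignment, and the graph encodes more alignments than were sampled) can be shown with a toy example. This is a minimal sketch, not the WeaveAlign implementation: the column encoding (residues consumed so far per sequence, plus a residue/gap flag) and the names `columns`, `build_dag` and `count_paths` are assumptions introduced here for illustration.

```python
from collections import defaultdict


def columns(rows):
    """Convert one alignment (equal-length gapped strings) into column keys.

    ASSUMED encoding: for each sequence, (residues consumed before this
    column, whether this column holds a residue). Two columns from different
    sampled alignments get the same key only if they place the same residues
    at the same point in every sequence.
    """
    consumed = [0] * len(rows)
    keys = []
    for col in zip(*rows):
        keys.append(tuple((consumed[i], ch != "-") for i, ch in enumerate(col)))
        for i, ch in enumerate(col):
            if ch != "-":
                consumed[i] += 1
    return keys


def build_dag(alignments):
    """Add an edge between every pair of columns adjacent in some sample."""
    edges = defaultdict(set)
    for rows in alignments:
        path = ["START"] + columns(rows) + ["END"]
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return edges


def count_paths(edges, node="START", memo=None):
    """Count START-to-END paths, i.e. alignments encoded by the DAG."""
    if memo is None:
        memo = {}
    if node == "END":
        return 1
    if node not in memo:
        memo[node] = sum(count_paths(edges, nxt, memo) for nxt in edges[node])
    return memo[node]


# Two sampled pairwise alignments of "ACG" with "ACG" that share the C/C column.
samples = [("ACG", "ACG"), ("A-CG-", "-AC-G")]
n_paths = count_paths(build_dag(samples))
print(n_paths)  # 4 paths from 2 sampled alignments
```

Because both samples pass through the same C/C column, the regions before and after it can be recombined freely, which is why two samples yield four encoded alignments. Column probabilities estimated from empirical frequencies, as mentioned in the abstract, are not modelled in this sketch.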
14.
Mol Biol Evol ; 31(9): 2251-66, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24899668

ABSTRACT

For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence-structure model. We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences. We use the model to investigate the order of divergence of cytoglobins, myoglobins, and hemoglobins and observe a stabilization of phylogenetic inference: although a sequence-based inference assigns significant posterior probability to several different topologies, the structural model strongly favors one of these over the others and is more robust to the choice of data set.


Subject(s)
Bayes Theorem , Computational Biology/methods , Globins/chemistry , Hemoglobins/chemistry , Myoglobin/chemistry , Animals , Cytoglobin , Globins/genetics , Hemoglobins/genetics , Humans , Markov Chains , Models, Molecular , Mutation , Myoglobin/genetics , Phylogeny , Protein Conformation , Sequence Alignment , Sequence Analysis, Protein
15.
Retrovirology ; 12: 52, 2015 Jun 20.
Article in English | MEDLINE | ID: mdl-26088204

ABSTRACT

BACKGROUND: Endogenous retroviruses (ERVs) are often viewed as selfish DNA that do not contribute to host phenotype. Yet ERVs have also been co-opted to play important roles in the maintenance of stem cell identity and placentation, amongst other things. This has led to debate over whether the typical ERV confers a cost or benefit upon the host. We studied the divergence of orthologous ERVs since the chimp-human split with the aim of assessing whether ERVs exert detectable fitness effects. RESULTS: ERVs have evolved faster than other selfish DNA in human and chimpanzee. The divergence of ERVs relative to neighbouring selfish DNA is positively correlated with the length of the long terminal repeat of an ERV and with the percentage of neighbouring DNA that is not selfish. ERVs from the HERV-H family have diverged particularly quickly and in a manner that correlates with their level of transcription in human stem cells. A substitution into a highly transcribed HERV-H has a selective coefficient of the order of 10⁻⁴. This is large enough to suggest these substitutions are not dominated by drift. CONCLUSIONS: ERVs differ from other selfish DNA in the extent to which they diverge and appear to have measurable effects on hosts, even after fixation. The effects are strongest for HERV-H and suggest that the HERV-H transcriptome has recently evolved under the influence of directional selection. As there are many HERV-H loci distributed across the ape lineage, our results suggest that in future this family can be used to model the evolutionary consequences of ERV exaptation in primates and other mammals.


Subject(s)
Endogenous Retroviruses/genetics , Evolution, Molecular , Pan troglodytes/virology , Primates/virology , Animals , Genetic Fitness , Humans , Repetitive Sequences, Nucleic Acid , Terminal Repeat Sequences
16.
Bioinformatics ; 29(5): 654-5, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23335014

ABSTRACT

MOTIVATION: Comparative modeling of RNA is known to be important for making accurate secondary structure predictions. RNA structure prediction tools such as PPfold or RNAalifold use an aligned set of sequences in predictions. Obtaining a multiple alignment from a set of sequences is quite a challenging problem itself, and the quality of the alignment can affect the quality of a prediction. By implementing RNA secondary structure prediction in a statistical alignment framework, and predicting structures from multiple alignment samples instead of a single fixed alignment, it may be possible to improve predictions. RESULTS: We have extended the program StatAlign to make use of RNA-specific features, which include RNA secondary structure prediction from multiple alignments using either a thermodynamic approach (RNAalifold) or a Stochastic Context-Free Grammars (SCFGs) approach (PPfold). We also provide the user with scores relating to the quality of a secondary structure prediction, such as information entropy values for the combined space of secondary structures and sampled alignments, and a reliability score that predicts the expected number of correctly predicted base pairs. Finally, we have created RNA secondary structure visualization plugins and automated the process of setting up Markov Chain Monte Carlo runs for RNA alignments in StatAlign. AVAILABILITY AND IMPLEMENTATION: The software is available from http://statalign.github.com/statalign/.


Subject(s)
RNA/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA , Software , Algorithms , Base Pairing , Bayes Theorem , Markov Chains , Nucleic Acid Conformation , Thermodynamics
17.
Bioinformatics ; 29(6): 704-10, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23396120

ABSTRACT

MOTIVATION: Many computational methods for RNA secondary structure prediction, and, in particular, for the prediction of a consensus structure of an alignment of RNA sequences, have been developed. Most methods, however, ignore biophysical factors, such as the kinetics of RNA folding; no current implementation considers both evolutionary information and folding kinetics, thus losing information that, when considered, might lead to better predictions. RESULTS: We present an iterative algorithm, Oxfold, in the framework of stochastic context-free grammars, that emulates the kinetics of RNA folding in a simplified way, in combination with a molecular evolution model. This method improves considerably on existing grammatical models that do not consider folding kinetics. Additionally, the model compares favourably to non-kinetic thermodynamic models.


Subject(s)
Algorithms , RNA Folding , RNA/chemistry , Bayes Theorem , Evolution, Molecular , Kinetics , Models, Molecular , Sequence Alignment , Sequence Analysis, RNA/methods , Stochastic Processes , Thermodynamics
18.
BMC Bioinformatics ; 14: 149, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23634662

ABSTRACT

BACKGROUND: With the advancement of next-generation sequencing and transcriptomics technologies, regulatory effects involving RNA, in particular RNA structural changes are being detected. These results often rely on RNA secondary structure predictions. However, current approaches to RNA secondary structure modelling produce predictions with a high variance in predictive accuracy, and we have little quantifiable knowledge about the reasons for these variances. RESULTS: In this paper we explore a number of factors which can contribute to poor RNA secondary structure prediction quality. We establish a quantified relationship between alignment quality and loss of accuracy. Furthermore, we define two new measures to quantify uncertainty in alignment-based structure predictions. One of the measures improves on the "reliability score" reported by PPfold, and considers alignment uncertainty as well as base-pair probabilities. The other measure considers the information entropy for SCFGs over a space of input alignments. CONCLUSIONS: Our predictive accuracy improves on the PPfold reliability score. We can successfully characterize many of the underlying reasons for and variances in poor prediction. However, there is still variability unaccounted for, which we therefore suggest comes from the RNA secondary structure predictive model itself.


Subject(s)
RNA/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA , Algorithms , Base Pairing , Evolution, Molecular , Nucleic Acid Conformation , Probability , Reproducibility of Results , Sequence Alignment/standards
19.
BMC Evol Biol ; 13: 243, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-24195754

ABSTRACT

BACKGROUND: We wish to understand how sex and recombination affect endogenous retroviral insertion and deletion. While theory suggests that the risk of ectopic recombination will limit the accumulation of repetitive DNA in areas of high meiotic recombination, the experimental evidence so far has been inconsistent. Under the assumption of neutrality, we examine the genomes of eighteen species of animal in order to compute the ratio of solo-LTRs that derive from insertions occurring down the male germ line as opposed to the female one (male bias). We also extend the simple idea of comparing autosome to allosome in order to predict the ratio of full-length proviruses we would expect to see under conditions of recombination linked deletion or otherwise. RESULTS: Using our model, we predict the ratio of allosomal to autosomal full-length proviruses to lie between 3/2 and 2/3 under increasing male bias in mammals and between 1 and 2 under increasing male bias in birds. In contrast to our expectations, we find that a pattern of male bias is not universal across species and that there is a frequent overabundance of full-length proviruses on the allosome beyond the ratios predicted by our model. CONCLUSIONS: We use our data as a whole to argue that full-length proviruses should be treated as deleterious mutations or as effectively neutral mutations whose persistence in a full-length state is linked to the rate of meiotic recombination and whose origin is not universally male biased. These conclusions suggest that retroviral insertions on the allosome may be more prolific and that it might be possible to identify mechanisms of replication that are enhanced in the female sex.


Subject(s)
Mammals/genetics , Mammals/virology , Mutagenesis, Insertional , Retroviridae/genetics , Animals , Birds/genetics , Birds/virology , Chromosomes , Female , Humans , Male , Proviruses/genetics , Sequence Deletion , Terminal Repeat Sequences
20.
PLoS Comput Biol ; 8(11): e1002749, 2012.
Article in English | MEDLINE | ID: mdl-23133356

ABSTRACT

In molecular recognition, it is often the case that ligand binding is coupled to conformational change in one or both of the binding partners. Two hypotheses describe the limiting cases involved; the first is the induced fit and the second is the conformational selection model. The conformational selection model requires that the protein adopts conformations that are similar to the ligand-bound conformation in the absence of ligand, whilst the induced-fit model predicts that the ligand-bound conformation of the protein is only accessible when the ligand is actually bound. The flexibility of the apo protein clearly plays a major role in these interpretations. For many proteins involved in signaling pathways there is the added complication that they are often promiscuous in that they are capable of binding to different ligand partners. The relationship between protein flexibility and promiscuity is an area of active research and is perhaps best exemplified by the PDZ domain family of proteins. In this study we use molecular dynamics simulations to examine the relationship between flexibility and promiscuity in five PDZ domains: the human Dvl2 (Dishevelled-2) PDZ domain, the human Erbin PDZ domain, the PDZ1 domain of InaD (inactivation no after-potential D protein) from fruit fly, the PDZ7 domain of GRIP1 (glutamate receptor interacting protein 1) from rat and the PDZ2 domain of PTP-BL (protein tyrosine phosphatase) from mouse. We show that despite their high structural similarity, the PDZ binding sites have significantly different dynamics. Importantly, the degree of binding pocket flexibility was found to be closely related to the various characteristics of peptide binding specificity and promiscuity of the five PDZ domains. Our findings suggest that the intrinsic motions of the apo structures play a key role in distinguishing functional properties of different PDZ domains and allow us to make predictions that can be experimentally tested.


Subject(s)
PDZ Domains , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Animals , Binding Sites , Cluster Analysis , Computational Biology , Drosophila Proteins , Humans , Mice , Molecular Dynamics Simulation , Molecular Sequence Data , Protein Binding , Rats , Sequence Alignment , Signal Transduction