1.
Syst Biol ; 72(6): 1403-1417, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37862116

ABSTRACT

The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.


Subjects
Genome; Genomics; Phylogeny; Genomics/methods; Computer Simulation; Evolution, Molecular
2.
J Theor Biol ; 579: 111697, 2024 02 21.
Article in English | MEDLINE | ID: mdl-38142045

ABSTRACT

The association of DNA methylation with age has been extensively studied. Previous work has investigated the trajectories of methylation with age, and developed predictive biomarkers of age. However, we still have a limited understanding of the functional form of methylation-age dynamics. To address this, we present a theoretical framework to model the dynamics of DNA methylation at single sites. We show that this model leads to convergence to a steady-state methylation level at an exponential rate. By fitting the model to a dataset that measures changes in DNA methylation in the brain from birth to old age, we show that the timescales of this exponential convergence are heterogeneous across sites. To model this heterogeneity, we generated a simulation of CpG methylation changes with time and investigated the functional form of the dynamics of methylation with age under the empirical distribution of timescales estimated from the dataset. The resulting dynamics of the average methylation of the system were characterized and were found to closely follow an exponential trajectory. We conclude that DNA methylation can be modeled as a system that starts out of equilibrium at birth and approaches equilibrium with age in an exponential fashion. These insights illustrate the importance of accounting for nonlinear dynamics when utilizing age-associated DNA methylation changes for constructing biomarkers of aging. Thus, DNA methylation, along with the exponentially increasing risk of mortality with age, further establishes the exponential nature of aging.
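The exponential convergence described in this abstract can be illustrated with a small sketch. It assumes, purely for illustration and not as the authors' stated model, a two-state site that gains methylation at rate `gain_rate` and loses it at rate `loss_rate`, so that dm/dt = gain_rate * (1 - m) - loss_rate * m. Under that assumption the steady state is gain_rate / (gain_rate + loss_rate) and the distance to it shrinks as exp(-(gain_rate + loss_rate) * t). All function and parameter names are hypothetical.

```python
import math


def methylation_level(t, m0, gain_rate, loss_rate):
    """Methylation fraction of one site at age t under the assumed two-state model."""
    steady = gain_rate / (gain_rate + loss_rate)   # equilibrium level
    decay = math.exp(-(gain_rate + loss_rate) * t)  # exponential approach to it
    return steady + (m0 - steady) * decay


def mean_methylation(t, sites):
    """Average level over sites, each given as (m0, gain_rate, loss_rate)."""
    return sum(methylation_level(t, *site) for site in sites) / len(sites)


# Sites with heterogeneous timescales, all starting away from equilibrium.
example_sites = [(0.1, 0.02, 0.03), (0.9, 0.001, 0.004), (0.5, 0.2, 0.2)]
trajectory = [mean_methylation(age, example_sites) for age in (0, 10, 40, 80)]
```

Each site relaxes with its own timescale 1 / (gain_rate + loss_rate), so the average is a mixture of exponentials; the sketch makes no claim about how closely that mixture follows a single exponential.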


Subjects
DNA Methylation; Epigenesis, Genetic; CpG Islands/genetics; Biomarkers
3.
Mol Biol Evol ; 37(5): 1470-1479, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31845962

ABSTRACT

The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results assuming infinite-size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned into 39 clusters at the rank of genus indicates that the average number of genome dynamics events per gene at the phylogenetic depth of a genus is around one half, with significant variability between genera. This result extends and confirms similar results obtained for individual genera in different manners.
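To make the synteny index concrete, here is a minimal sketch under stated assumptions: each genome is a circular list of unique gene identifiers, the k-neighborhood of a gene is the 2k genes nearest to it, and the SI of a genome pair is the average, over shared genes, of the fraction of neighbors they have in common. The abstract does not spell out these details, so the definition below and the names `neighborhood` and `synteny_index` are illustrative only.

```python
def neighborhood(genome, index, k):
    """Genes within k positions of genome[index], treating the genome as circular.

    Assumes len(genome) > 2 * k so that the neighborhood has exactly 2k genes.
    """
    n = len(genome)
    return {genome[(index + off) % n] for off in range(-k, k + 1) if off != 0}


def synteny_index(genome_a, genome_b, k):
    """Average shared-neighborhood fraction over genes present in both genomes."""
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    common = pos_a.keys() & pos_b.keys()
    if not common:
        return 0.0
    total = 0.0
    for gene in common:
        shared = neighborhood(genome_a, pos_a[gene], k) & neighborhood(genome_b, pos_b[gene], k)
        total += len(shared) / (2 * k)
    return total / len(common)


ancestor = list("ABCDEFGH")
descendant = list("ABCXEFGH")  # gene D replaced by a foreign gene X
si = synteny_index(ancestor, descendant, k=2)
```

Identical gene orders give an SI of 1, and every replaced or relocated gene lowers the score of its former neighbors, which is what lets the measure compare genomes of unequal gene content.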


Subjects
Gene Transfer, Horizontal; Genetic Techniques; Models, Genetic; Synteny; Genome, Microbial; Phylogeny
4.
Bioinformatics ; 36(17): 4662-4663, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32573701

ABSTRACT

SUMMARY: Epigenetic rates of change, much like evolutionary mutation rates along a lineage, vary during a lifetime. Accurate estimation of the epigenetic state has vast medical and biological implications. To account for these non-linear epigenetic changes with age, we recently developed a formalism inspired by the Pacemaker model of evolution that accounts for varying rates of mutations with time. Here, we present a Python implementation of the Epigenetic Pacemaker (EPM), a conditional expectation maximization algorithm that estimates epigenetic landscapes and the state of individuals and may be used to study non-linear epigenetic aging. AVAILABILITY AND IMPLEMENTATION: The EPM is available at https://pypi.org/project/EpigeneticPacemaker/ under the MIT license. The EPM is compatible with Python version 3.6 and above.


Subjects
Epigenomics; Pacemaker, Artificial; Aging; Algorithms; Epigenesis, Genetic; Humans; Software
5.
PLoS Comput Biol ; 16(11): e1008454, 2020 11.
Article in English | MEDLINE | ID: mdl-33253159

ABSTRACT

One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. Inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inference of mutation rates of individual cancer subclones using single-cell sequencing data. It utilizes the partial information about the orders of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluating mutation rates at the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.


Subjects
Mutation; Neoplasms/genetics; Neoplasms/pathology; Single-Cell Analysis/methods; Algorithms; Genomic Instability; Humans
6.
BMC Genomics ; 21(Suppl 2): 257, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32299339

ABSTRACT

BACKGROUND: DNA methylation is widely used as a biomarker in crucial medical applications as well as for human age prediction of very high accuracy. This biomarker is based on the methylation status of several hundred CpG sites. In a recent line of publications we have adapted a versatile concept from evolutionary biology, the Universal Pacemaker (UPM), to the setting of epigenetic aging and denoted it the Epigenetic PaceMaker (EPM). The EPM, as opposed to other epigenetic clocks, is not confined to a specific pattern of aging, and the epigenetic age of the individual is inferred independently of other individuals. This allows an explicit modeling of aging trends, in particular a nonlinear relationship between chronological and epigenetic age. In one of these recent works, we have presented an algorithmic improvement based on a two-step conditional expectation maximization (CEM) algorithm to arrive at a critical point on the likelihood surface. The algorithm alternates between a time step and a site step while advancing on the likelihood surface. RESULTS: Here we introduce nontrivial improvements to these steps that are essential for analyzing data sets of realistic magnitude in a manageable time and space. These structural improvements are based on insights from linear algebra and symbolic algebra tools, providing us with a greater understanding of the degeneracy of the complex problem space. This understanding, in turn, leads to the complete elimination of the bottleneck of cumbersome matrix multiplication and inversion, yielding a fast closed-form solution in both steps of the CEM. In the experimental results part, we compare the CEM algorithm over several data sets and demonstrate the speedup obtained by the closed-form solutions. Our results support the theoretical analysis of this improvement. CONCLUSIONS: These improvements enable us to substantially increase the scale of inputs analyzed by the method, allowing us to apply the new approach to data sets that could not be analyzed before.
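The alternation between a site step and a time step can be sketched as follows. This is a simplified illustration, not the published implementation and not the API of the EpigeneticPacemaker package: it assumes a linear model in which the methylation of site i in individual j is rate_i * age_j + intercept_i plus noise, and every function name is hypothetical. Under that assumption both steps have closed forms: the site step is an ordinary least-squares line per site, and the time step is a one-dimensional least-squares solve per individual.

```python
def site_step(meth, ages):
    """Given ages, fit a rate and an intercept for every site (row of meth)."""
    n = len(ages)
    mean_age = sum(ages) / n
    var_age = sum((t - mean_age) ** 2 for t in ages)
    rates, intercepts = [], []
    for row in meth:
        mean_m = sum(row) / n
        cov = sum((t - mean_age) * (m - mean_m) for t, m in zip(ages, row))
        rate = cov / var_age
        rates.append(rate)
        intercepts.append(mean_m - rate * mean_age)
    return rates, intercepts


def time_step(meth, rates, intercepts):
    """Given site parameters, solve for each individual's epigenetic age."""
    norm = sum(r * r for r in rates)
    individuals = len(meth[0])
    return [
        sum(r * (row[j] - s) for r, s, row in zip(rates, intercepts, meth)) / norm
        for j in range(individuals)
    ]


def residual_sum_of_squares(meth, rates, intercepts, ages):
    """Squared error between observed and predicted methylation."""
    return sum(
        (m - (r * t + s)) ** 2
        for r, s, row in zip(rates, intercepts, meth)
        for t, m in zip(ages, row)
    )


def fit(meth, initial_ages, iterations=20):
    """Alternate the two closed-form steps, starting from initial_ages."""
    ages = list(initial_ages)
    for _ in range(iterations):
        rates, intercepts = site_step(meth, ages)
        ages = time_step(meth, rates, intercepts)
    return rates, intercepts, ages
```

Neither step can increase the residual sum of squares, so the alternation moves monotonically toward a critical point. The fitted ages are identifiable only up to an affine rescaling, since rescaling all ages can be absorbed by the rates and intercepts.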


Subjects
Aging/genetics; Biological Clocks/genetics; Genomics/methods; Algorithms; CpG Islands; DNA Methylation; Epigenesis, Genetic; Epigenomics; Humans; Likelihood Functions; Models, Genetic
7.
BMC Genomics ; 21(Suppl 1): 106, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32138652

ABSTRACT

BACKGROUND: Horizontal gene transfer (HGT) is the event of a DNA sequence being transferred between species not by inheritance. HGT is a crucial factor in prokaryotic evolution and is a significant source of genomic novelty resulting in antibiotic resistance or the outbreak of virulent strains. Detection of HGT and of the mechanisms responsible for and enabling it is hence of prime importance. Existing algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from its recipient genome. Closely related species pose an even greater challenge, as most genes are very similar and the phylogenetic signal is therefore weak. Notwithstanding, the importance of detecting HGT between such organisms is extremely high because of the role of HGT in the emergence of new highly virulent strains. RESULTS: In a recent work we devised a novel technique that relies on loss of synteny around a gene as a witness for HGT. We used a novel heuristic for synteny measurement, the SI (Synteny Index), and the technique was tested on both simulated and real data and was found to provide greater sensitivity than other HGT techniques. This synteny-based approach suffers from low specificity, in particular for more closely related species. Here we devise an adaptive approach to cope with this by varying the criteria according to species distance. The new approach is doubly adaptive as it also considers the lengths of the genes being transferred. In particular, we use the Chernoff bound to decree HGT both in simulations and in real bacterial genomes taken from the EggNOG database. CONCLUSIONS: Here we show empirically that this approach is more conservative than the previous χ²-based approach and provides a lower false positive rate, especially for closely related species and under a wide range of genome parameters.


Subjects
Bacteria/genetics; Computational Biology/methods; Gene Transfer, Horizontal; Viruses/genetics; Algorithms; Evolution, Molecular; Genetic Speciation; Phylogeny
8.
Mol Phylogenet Evol ; 136: 128-137, 2019 07.
Article in English | MEDLINE | ID: mdl-30946898

ABSTRACT

BACKGROUND: Extensive research efforts have been made to reconstruct the Tree of Life, aiming to explain the evolutionary history of life on earth. We expect the advent of next generation sequencing methods to bring us close to solving this challenge. Notwithstanding, with the accumulation of this mass of molecular data, it becomes evident that this solution is more complex and far from reach, especially among prokaryotes. One of the reasons for this is the ability of bacteria to perform horizontal gene transfer (HGT), creating substantial conflicts between different gene histories. Fortunately, evolution has equipped us with several markers with different levels of resolution, among which is synteny, the conservation of gene order along the chromosome. RESULTS: We have performed a comprehensive phylogenomic study via synteny-based footprints. We build on the synteny index (SI) concept, defined in a pilot work of ours, and extend it to a systematic phylogenetic method with well-defined valid regions of operation. Applying it to the EggNOG repository divides all species into 39 clusters, agreeing with the conventional taxonomy. We show analytically that the signal of the standard phylogenetic marker, the 16S, is too faint for reliable classification. CONCLUSIONS: This work exhibits three separate yet related contributions. In terms of phylogenetics, it demonstrates quantitatively the advantage of the SI-based approach over the standard sequence-based marker. Evolutionarily, the tree we produce is unique both in its specificity and broadness. Methodologically, the U-shape approach we developed, from the synthetic realm, to real life, and back to simulation, is novel and allows us to simulate the exact realistic conditions.


Subjects
Phylogeny; Prokaryotic Cells/classification; Synteny/genetics; Base Sequence; Computer Simulation; Databases, Genetic; Reproducibility of Results
9.
BMC Genomics ; 19(Suppl 6): 570, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30367577

ABSTRACT

BACKGROUND: Deciphering the history of life on Earth has long been regarded as one of the most central tasks in biology. In past years, widespread discordance between the evolutionary histories of different groups of orthologous genes of prokaryotes has been revealed, primarily due to horizontal gene transfers (HGTs). Nonetheless, evidence that supports a strong tree-like signal of evolution has been uncovered, despite the presence of HGT events. Therefore, a challenging task is to distill this tree-like signal from the noise induced by all sources of non-tree-like events. RESULTS: In this work we tackle this question, using real and simulated data. We first tighten a recent related theoretical result in this field. In a simulation study, we infer individual quartet topologies, and then use the inferred quartets to reconstruct simulated species trees. We demonstrate that accurate tree reconstruction is feasible despite surprisingly high rates of HGT. In a real data study, we construct phylogenies of two sets of prokaryotes, and show that our tree reconstruction scheme is comparable with (and in complementary respects better than) other commonly used methods. CONCLUSIONS: Using a blend of theoretical and empirical investigations, our study proves the feasibility of accurate quartet-based phylogenetic reconstruction, the vast impact of HGT events notwithstanding.
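The abstract does not say how individual quartet topologies are inferred. As one illustrative stand-in, the classical four-point method picks, among the three possible pairings of four taxa, the one whose two within-pair distances have the smallest sum. The sketch below assumes pairwise distances are supplied in a dictionary keyed by `frozenset` pairs; that layout and the function name are hypothetical.

```python
def quartet_topology(dist, a, b, c, d):
    """Return the split ((x, y), (z, w)) favored by the four-point condition."""
    def pair(x, y):
        return dist[frozenset((x, y))]

    # On an additive tree metric, the true split has the smallest sum.
    candidates = {
        ((a, b), (c, d)): pair(a, b) + pair(c, d),
        ((a, c), (b, d)): pair(a, c) + pair(b, d),
        ((a, d), (b, c)): pair(a, d) + pair(b, c),
    }
    return min(candidates, key=candidates.get)


# Distances on the tree ((A,B),(C,D)) with pendant edges of length 1
# and an internal edge of length 2.
raw = {("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 4,
       ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4}
distances = {frozenset(k): v for k, v in raw.items()}
split = quartet_topology(distances, "A", "B", "C", "D")
```

Quartets inferred this way for many four-taxon subsets can then be handed to a quartet amalgamation method to assemble a species tree.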


Subjects
Phylogeny; Computer Simulation; Gene Transfer, Horizontal; Genes, Archaeal; Genes, Bacterial
10.
J Mol Evol ; 86(2): 150-165, 2018 02.
Article in English | MEDLINE | ID: mdl-29460038

ABSTRACT

Despite impressive advancements in technological and theoretical tools, construction of phylogenetic (evolutionary) trees is still a challenging task. The availability of enormous quantities of molecular data has made large-scale phylogenetic reconstruction involving thousands of species a more viable goal. For this goal, separate trees over different, overlapping subsets of species, representing histories of various markers of these species, are collected. These trees, typically with conflicting signals, are subsequently combined into a single tree over the full set, an operation denoted as supertree construction. The amalgamation of such trees into a single tree lies at the heart of many tasks in phylogenetics, yet remains a daunting endeavor, especially in light of conflicting signals. In this work, we study the performance of matrix representation with parsimony (MRP), the most widely used supertree method to date, when confronted with quartet trees. Quartet trees are the most basic informational unit when amalgamation of unrooted trees is attempted, and they remain relevant in more general settings even though standard supertree methods are not necessarily confined to quartets. This study involves both real and simulated data, and the effects of several parameters on the results are evaluated, revealing a number of anomalies associated with MRP. We show that these anomalies are surmountable when using a recently introduced supertree method, weighted quartet MaxCut (wQMC).
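To show what the MRP input looks like for quartet trees, the sketch below builds the character matrix under the standard coding: each quartet ab|cd contributes one binary character, scored 1 for one side of the split, 0 for the other, and ? for taxa absent from that quartet. The function name and data layout are hypothetical, and the parsimony search that MRP then runs on the matrix is not shown.

```python
def mrp_matrix(taxa, quartets):
    """Encode quartet trees as one character column each.

    taxa     -- ordered list of all taxon names
    quartets -- list of splits ((a, b), (c, d))
    Returns a dict mapping each taxon to its character string.
    """
    columns = {taxon: [] for taxon in taxa}
    for (a, b), (c, d) in quartets:
        for taxon in taxa:
            if taxon in (a, b):
                state = "1"
            elif taxon in (c, d):
                state = "0"
            else:
                state = "?"  # taxon not sampled in this quartet
            columns[taxon].append(state)
    return {taxon: "".join(states) for taxon, states in columns.items()}


taxa = ["A", "B", "C", "D", "E"]
quartets = [(("A", "B"), ("C", "D")), (("A", "E"), ("B", "C"))]
matrix = mrp_matrix(taxa, quartets)
```

Because every quartet covers only four taxa, most cells of a large matrix are missing data, which may be relevant to how a parsimony search behaves on quartet input.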


Subjects
Sequence Analysis, DNA/methods; Algorithms; Biological Evolution; Computer Simulation/statistics & numerical data; Data Interpretation, Statistical; Phylogeny; Research Design/statistics & numerical data
11.
Bioinformatics ; 33(14): i67-i74, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881962

ABSTRACT

MOTIVATION: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Genotype-Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues. RESULTS: We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses. AVAILABILITY AND IMPLEMENTATION: Source code is at https://github.com/datduong/RECOV. CONTACT: eeskin@cs.ucla.edu or datdb@cs.ucla.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods; Genetic Variation; Quantitative Trait Loci; Software; Gene Expression Profiling/methods; Genome-Wide Association Study/methods; Humans; Meta-Analysis as Topic; Models, Genetic
12.
Mol Phylogenet Evol ; 116: 141-148, 2017 11.
Article in English | MEDLINE | ID: mdl-28842276

ABSTRACT

Horizontal gene transfer (HGT) is a major part of the evolution of Archaea and Bacteria, to the extent that the validity of the Tree of Life concept for prokaryotes has been seriously questioned. The patterns and routes of HGT remain a subject of intense study and debate. It was discovered that while several genes exhibit rampant HGT across the whole prokaryotic tree of life, others are lethal to certain organisms and therefore cannot be successfully transferred to them. We distinguish between these two classes of genes and show analytically that genes found to be toxic to a specific species (E. coli) also resist HGT in general. Several tools we employ show evidence to support that claim. One of those tools is the quartet plurality distribution (QPD), a mathematical tool that measures the tendency toward HGT over a large set of genes and species. When aggregated over a collection of genes, it can reveal important properties of this collection. We conclude that evidence of toxicity of certain genes to a wide variety of prokaryotes is revealed using the new tool of quartet plurality distribution.


Subjects
Phylogeny; Toxins, Biological/genetics; Archaea/genetics; Computer Simulation; Escherichia coli; Evolution, Molecular; Gene Transfer, Horizontal; RNA, Ribosomal, 16S/genetics; Species Specificity
13.
Mol Phylogenet Evol ; 107: 209-220, 2017 02.
Article in English | MEDLINE | ID: mdl-27818264

ABSTRACT

With the availability of enormous quantities of genetic data it has become common to construct very accurate trees describing the evolutionary history of the species under study, as well as every single gene of these species. These trees allow us to examine the evolutionary compliance of given markers (characters). A marker compliant with the history of the species investigated has undergone mutations along the species tree branches, such that every subtree of that tree exhibits a different state. Convex recoloring (CR) uses combinatorial representation to measure the adequacy of a taxonomic classifier to a given tree. Despite its biological origins, research on CR has been almost exclusively dedicated to mathematical properties of the problem, or variants of it with little, if any, relationship to taxonomy. In this work we return to the origins of CR. We put CR in a statistical framework and introduce and learn the notion of the statistical significance of a character. We apply this measure to two data sets, Passerine birds and prokaryotes, and four examples. These examples demonstrate various applications of CR, from evolutionary relatedness, through lateral evolution, to supertree construction. The above study was done with new software that we provide, containing an algorithmic improvement with a graphical output of an (optimally) recolored tree. AVAILABILITY: Code implementing the features, together with a README, is available at http://research.haifa.ac.il/ssagi/software/convexrecoloring.zip.


Subjects
Algorithms; Biological Evolution; Animal Migration; Animals; Birds/genetics; Computer Simulation; Genetic Markers; Molting; Phylogeny; Prokaryotic Cells/metabolism
14.
PLoS Comput Biol ; 12(11): e1005183, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27835646

ABSTRACT

In multiple studies DNA methylation has proven to be an accurate biomarker of age. To develop these biomarkers, the methylation of multiple CpG sites is typically linearly combined to predict chronological age. By contrast, in this study we apply the Universal PaceMaker (UPM) model to investigate changes in DNA methylation during aging. The UPM was initially developed to study rate acceleration/deceleration in sequence evolution. Rather than identifying which linear combinations of sites predict age, the UPM models the rates of change of multiple CpG sites, as well as their starting methylation levels, and estimates the age of each individual to optimize the model fit. We refer to the estimated age as the "epigenetic age", which is in contrast to the known chronological age of each individual. We construct a statistical framework and devise an algorithm to determine whether a genomic pacemaker is in effect (i.e., rates of change vary with age). The decision is made by comparing two competing likelihood-based models, the molecular clock (MC) and the UPM. For the molecular clock model, we use the known chronological age of each individual and fit the methylation rates at multiple sites, expressing the problem as a linear least squares problem and solving it in polynomial time. For the UPM case, the search space is larger, as we are fitting both the epigenetic age of each individual and the rates for each site, yet we succeed in reducing the problem to the space of individuals, with polynomial complexity in the more significant space, the methylated sites. We first tested our algorithm on simulated data to elucidate the factors affecting the identification of the pacemaker model. We find that, provided with enough data, our algorithm is capable of identifying a pacemaker even when a weak signal is present in the data. Based on these results, we applied our method to DNA methylation data from human blood from individuals of various ages. Although the improvement in variance across sites between the UPM and MC was small, the results suggest that the existence of a pacemaker is highly significant. The PaceMaker results also suggest a decay in the rate of change in DNA methylation with age.


Subjects
Aging/genetics; Biological Clocks/genetics; CpG Islands/genetics; DNA Methylation/genetics; DNA/genetics; Epigenesis, Genetic/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Animals; Child; Child, Preschool; Computer Simulation; Data Interpretation, Statistical; Female; Humans; Infant; Infant, Newborn; Male; Middle Aged; Models, Genetic; Models, Statistical; Sequence Analysis, DNA/methods; Young Adult
15.
Syst Biol ; 64(2): 233-42, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25414175

ABSTRACT

Despite impressive technical and theoretical developments, reconstruction of phylogenetic trees for enormous quantities of molecular data is still a challenging task. A key tool in analyses of large data sets has been the construction of separate trees for subsets (e.g., quartets) of sequences, and subsequent combination of these subtrees into a single tree for the full set (i.e., supertree analysis). Unfortunately, even amalgamating quartets into a supertree remains a computationally daunting task. Assigning weights to quartets to indicate importance or reliability was proposed more than a decade ago, but handling weighted quartets is even more challenging and has scarcely been attempted in the past. In this work, we focus on weighted quartet-based approaches. We propose a scheme to assign weights to quartets coming from weighted trees and devise a tree similarity measure for weighted trees based on weighted quartets. We also extend the quartet MaxCut (QMC) algorithm to handle weighted quartets. We evaluate these tools on simulated and real data. Our simulated data analysis highlights the additional information that is conveyed when using the new weighted tree similarity measure, and shows that extending QMC to a weighted setting improves the quality of tree reconstruction. Our analyses of a cyanobacterial data set with weighted QMC reinforce previous results achieved with other tools.


Subjects
Classification/methods; Models, Biological; Phylogeny; Cyanobacteria/classification; Software
16.
PLoS Comput Biol ; 11(10): e1004408, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26439115

ABSTRACT

Horizontal gene transfer (HGT), the transfer of genetic material between organisms, is crucial for genetic innovation and the evolution of genome architecture. Existing HGT detection algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from ancestral (vertically derived) genes in its recipient genome. Detecting HGT between closely related species or strains is challenging, as the phylogenetic signal is usually weak and the nucleotide composition is normally nearly identical. Nevertheless, it is of great importance to detect HGT between congeneric species or strains, especially in clinical microbiology, where understanding the emergence of new virulent and drug-resistant strains is crucial, and often time-sensitive. We developed a novel, self-contained technique named Near HGT, based on the synteny index, to measure the divergence of a gene from its native genomic environment and used it to identify candidate HGT events between closely related strains. The method confirms candidate transferred genes based on the constant relative mutability (CRM). Using CRM, the algorithm assigns a confidence score based on "unusual" sequence divergence. A gene exhibiting exceptional deviations according to both synteny and mutability criteria is considered a validated HGT product. We first applied the technique to a set of three E. coli strains and detected several highly probable horizontally acquired genes. We then compared the method to existing HGT detection tools using a larger strain data set. When combined with additional approaches, our new algorithm provides a richer picture and brings us closer to the goal of detecting all newly acquired genes in a particular strain.


Subjects
Chromosome Mapping/methods; Escherichia coli/genetics; Evolution, Molecular; Gene Transfer, Horizontal/genetics; Genome, Bacterial/genetics; Synteny/genetics; Algorithms; Models, Genetic; Phylogeny
17.
Nucleic Acids Res ; 42(4): 2391-404, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24243847

ABSTRACT

The evolutionary history of all life forms is usually represented as a vertical tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). HGT creates widespread discordance between evolutionary histories of different genes as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been questioned as an appropriate representation of the evolution of prokaryotes. Nevertheless, a common hypothesis is that prokaryotic evolution is primarily tree-like, and a routine effort is made to place new isolates in their appropriate location in the TOL. Moreover, it appears desirable to exploit non-tree-like evolutionary processes for the task of microbial classification. In this work, we present a novel technique that builds on the straightforward observation that gene order conservation ('synteny') decreases in time as a result of gene mobility. This is particularly true in prokaryotes, mainly due to HGT. Using a 'synteny index' (SI) that measures the average synteny between a pair of genomes, we developed the phylogenetic reconstruction tool 'Phylo SI'. Phylo SI offers several attractive properties such as easy bootstrapping, high sensitivity in cases where the phylogenetic signal is weak, and computational efficiency. Phylo SI was tested both on simulated data and on two bacterial data sets and compared with two well-established phylogenetic methods. Phylo SI is particularly efficient on short evolutionary distances where synteny footprints remain detectable, whereas the nucleotide substitution signal is too weak for reliable sequence-based phylogenetic reconstruction. The method is publicly available at http://research.haifa.ac.il/ssagi/software/PhyloSI.zip.


Subjects
Archaea/classification , Bacteria/classification , Phylogeny , Software , Genome, Archaeal , Genome, Bacterial , Genomics/methods , Synteny
18.
Mol Phylogenet Evol ; 91: 226-37, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25987530

ABSTRACT

Scorpio Linnaeus, 1758 (family Scorpionidae Latreille, 1802) was considered monotypic for over a century, and comprised a single species, Scorpio maurus Linnaeus, 1758, with 19 subspecies, distributed from West Africa, throughout the Maghreb and the Middle East, to Iran. Two parapatric subspecies, Scorpio maurus fuscus (Ehrenberg, 1829) and Scorpio maurus palmatus (Ehrenberg, 1828), have long been recognized in the eastern Mediterranean region. We examined morphological variation, burrow architecture and genetic divergence among 39 populations across the distribution of the two subspecies to assess whether they are conspecific and, if not, how many species might be involved. Cuticle coloration, pedipalp chela digital carina condition, and selected measurements were recorded. Sixty burrows were excavated and examined for burrow structure and depth. A multilocus dataset comprising concatenated fragments of one nuclear (28S rDNA) and three mitochondrial (12S rDNA, 16S rDNA, Cytochrome c Oxidase Subunit I) loci, totaling ca. 2400 base-pairs, was produced for 41 individuals, and a single-locus dataset comprising 658 base-pairs of the COI locus for 156 individuals. Despite overlapping ranges in morphometric characters of pedipalp chela shape, the putative subspecies were easily distinguished by cuticle coloration and condition of the pedipalp chela digital carina, and were also found to differ significantly in burrow architecture and depth. Phylogeographical analyses of the COI and multilocus datasets recovered seven distinct clades. Separate analyses of mitochondrial sequences, and combined analyses of mitochondrial and nuclear sequences support most clades. The two major clades corresponded with the geographical distributions of S. m. fuscus and S. m. palmatus in the region. Specimens from these clades were genetically distinct, and exhibited different burrow structure in geographically proximate localities, suggesting reproductive isolation. The palmatus clade included two distinct subclades of specimens from localities adjacent to the Dead Sea. Three other clades, comprising specimens from the most northeastern localities, were tentatively assigned to subspecies previously recorded in neighboring Jordan and Syria. The morphological, behavioral and genetic evidence supports previous suggestions that Scorpio maurus is a species complex and justifies the following taxonomic emendations: Scorpio fuscus (Ehrenberg, 1829), stat. nov.; Scorpio kruglovi Birula, 1910, stat. nov.; Scorpio palmatus (Ehrenberg, 1828), stat. nov.; Scorpio propinquus (Simon, 1872), stat. nov.


Subjects
Scorpions/classification , Animals , DNA, Mitochondrial/chemistry , Ecological and Environmental Phenomena , Middle East , Phylogeny , Phylogeography , Scorpions/anatomy & histology , Scorpions/genetics , Scorpions/physiology
19.
BMC Genomics ; 15: 252, 2014 Mar 31.
Article in English | MEDLINE | ID: mdl-24684786

ABSTRACT

BACKGROUND: In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA), to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., its coding and non-coding parts, the latter being much more abundant in the genome than the former. RESULTS: We identified 38 OP-type clusters of segments that differ in their compositional spectrum (CS) organization. Many of the segments that shared the same OP type were enriched with genes related to the same biological processes (developmental, signaling, etc.), components of biochemical complexes, or organelles. Thirteen OP-type clusters showed significant enrichment in genes connected to specific gene-ontology terms. Some of these clusters seemed to reflect certain events during periods of horizontal gene transfer and genome expansion, and subsequent evolution of genomic regions requiring coordinated regulation. CONCLUSIONS: There may be a tendency for genes that are involved in the same biological process, complex or organelle to use the same OP, even at a distance of ~100 kb from the genes. Although the intergenic DNA is non-coding, the general pattern of sequence organization (e.g., reflected in over-represented oligonucleotide "words") may be important and was protected, to some extent, in the course of evolution.
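The segment-wise workflow described above (fixed-size segments, GC content, oligonucleotide counting) can be sketched roughly as follows. This is an illustrative simplification, not the authors' CSA pipeline: it counts exact k-mers only, the helper names `split_segments`, `gc_content` and `kmer_spectrum` are hypothetical, and the GC thresholds for the five compositional groups are not given in the abstract, so no grouping is attempted.

```python
from collections import Counter


def split_segments(seq, size):
    """Cut a sequence into consecutive segments of at most `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def kmer_spectrum(seq, k=3):
    """Relative frequency of each exact k-mer, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


# Toy usage: 4-base segments stand in for the 100 kb segments of the study.
toy = "ACGTACGTGGCC"
profile = [(gc_content(s), kmer_spectrum(s, k=2)) for s in split_segments(toy, 4)]
```

Segments with similar GC content could then be compared by the distance between their spectra, which is the kind of comparison a clustering into OP types would rely on.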


Subjects
Genetic Heterogeneity , Genome, Human , Genomics , Animals , Base Composition , Evolution, Molecular , Genes , Genetic Variation , Genome, Mitochondrial , Humans , Multigene Family , Segmental Duplications, Genomic
20.
J Comput Biol ; 31(5): 396-415, 2024 05.
Article in English | MEDLINE | ID: mdl-38754138

ABSTRACT

In addition to undergoing evolution, members of biological populations may also migrate between locations. Examples include the spread of tumor cells from the primary tumor to distant metastases or the spread of pathogens from one host to another. One may represent migration histories by assigning a location label to each vertex of a given phylogenetic tree such that an edge connecting vertices with distinct locations represents a migration. Some biological populations undergo comigration, a phenomenon where multiple taxa from distinct lineages simultaneously comigrate from one location to another. In this work, we show that a previous problem statement for inferring migration histories that are parsimonious in terms of migrations and comigrations may lead to temporally inconsistent solutions. To remedy this deficiency, we introduce precise definitions of temporal consistency of comigrations in a phylogenetic tree, leading to three successive problems. First, we formulate the temporally consistent comigration problem to check if a set of comigrations is temporally consistent and provide a linear-time algorithm for solving this problem. Second, we formulate the parsimonious consistent comigrations (PCC) problem, which aims to find comigrations given a location labeling of a phylogenetic tree. We show that PCC is NP-hard. Third, we formulate the parsimonious consistent comigration history (PCCH) problem, which infers the migration history given a phylogenetic tree and locations of its extant vertices only. We show that PCCH is NP-hard as well. On the positive side, we propose integer linear programming models to solve the PCC and PCCH problems. We demonstrate our algorithms on simulated and real data.
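The representation described above, in which a tree edge whose endpoints carry different location labels counts as a migration, can be illustrated with a minimal sketch. The tree, the labels and the function name `migration_edges` are made up for illustration; grouping migrations into comigrations and checking their temporal consistency, which are the paper's contributions, are not modeled here.

```python
def migration_edges(edges, location):
    """Return the (parent, child) edges whose endpoints have different location labels."""
    return [(u, v) for (u, v) in edges if location[u] != location[v]]


# Hypothetical labeled tree: internal vertices and leaves each carry one location.
edges = [("root", "a"), ("root", "b"), ("a", "t1"), ("a", "t2"), ("b", "t3")]
location = {
    "root": "primary",
    "a": "primary",
    "b": "liver",
    "t1": "primary",
    "t2": "lung",
    "t3": "liver",
}

migrations = migration_edges(edges, location)
migration_count = len(migrations)
```

A parsimony criterion in this setting would prefer labelings of the internal vertices that keep this count small, which is why the labeling of the extant vertices alone does not determine the history.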


Subjects
Animal Migration , Cell Movement , Models, Biological , Human Migration , Humans , Animals , Algorithms , Time Factors