Results 1 - 20 of 59
1.
J Comput Biol ; 31(5): 396-415, 2024 May.
Article in English | MEDLINE | ID: mdl-38754138

ABSTRACT

In addition to undergoing evolution, members of biological populations may also migrate between locations. Examples include the spread of tumor cells from the primary tumor to distant metastases or the spread of pathogens from one host to another. One may represent migration histories by assigning a location label to each vertex of a given phylogenetic tree such that an edge connecting vertices with distinct locations represents a migration. Some biological populations undergo comigration, a phenomenon where multiple taxa from distinct lineages simultaneously comigrate from one location to another. In this work, we show that a previous problem statement for inferring migration histories that are parsimonious in terms of migrations and comigrations may lead to temporally inconsistent solutions. To remedy this deficiency, we introduce precise definitions of temporal consistency of comigrations in a phylogenetic tree, leading to three successive problems. First, we formulate the temporally consistent comigration problem to check if a set of comigrations is temporally consistent and provide a linear time algorithm for solving this problem. Second, we formulate the parsimonious consistent comigrations (PCC) problem, which aims to find comigrations given a location labeling of a phylogenetic tree. We show that PCC is NP-hard. Third, we formulate the parsimonious consistent comigration history (PCCH) problem, which infers the migration history given a phylogenetic tree and locations of its extant vertices only. We show that PCCH is NP-hard as well. On the positive side, we propose integer linear programming models to solve the PCC and PCCH problems. We demonstrate our algorithms on simulated and real data.
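
The representation described in this abstract, where an edge whose endpoints carry different location labels counts as a migration, can be illustrated with a short sketch. The function name, the toy tree, and the location labels below are hypothetical and are not taken from the paper; the sketch only lists migration edges and does not attempt the temporal-consistency check or the PCC/PCCH formulations.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (parent, child)


def migration_edges(edges: List[Edge], location: Dict[str, str]) -> List[Edge]:
    """Return the tree edges whose two endpoints have different location labels."""
    return [(u, v) for u, v in edges if location[u] != location[v]]


# Hypothetical toy tree with a location label on every vertex.
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("a", "d")]
location = {
    "root": "primary",
    "a": "primary",
    "b": "liver",
    "c": "lung",
    "d": "primary",
}

for parent, child in migration_edges(edges, location):
    print(f"{parent} ({location[parent]}) -> {child} ({location[child]})")
```

Grouping such migration edges into comigrations would additionally require the temporal-consistency conditions that the paper defines, which this sketch leaves out.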


Subjects
Algorithms , Phylogeny , Humans , Computer Simulation , Animals , Computational Biology/methods
2.
J Theor Biol ; 579: 111697, 2024 02 21.
Article in English | MEDLINE | ID: mdl-38142045

ABSTRACT

The association of DNA methylation with age has been extensively studied. Previous work has investigated the trajectories of methylation with age, and developed predictive biomarkers of age. However, we still have a limited understanding of the functional form of methylation-age dynamics. To address this, we present a theoretical framework to model the dynamics of DNA methylation at single sites. We show that this model leads to convergence to a steady-state methylation level at an exponential rate. By fitting the model to a dataset that measures changes in DNA methylation in the brain from birth to old age, we show that the timescales of this exponential convergence are heterogeneous across sites. To model this heterogeneity, we generated a simulation of CpG methylation changes with time and investigated the functional form of the dynamics of methylation with age under the empirical distribution of timescales estimated from the dataset. The resulting dynamics of the average methylation of the system were characterized and were found to closely follow an exponential trajectory. We conclude that DNA methylation can be modeled as a system that starts out of equilibrium at birth and approaches equilibrium with age in an exponential fashion. These insights illustrate the importance of accounting for nonlinear dynamics when utilizing age-associated DNA methylation changes for constructing biomarkers of aging. Thus DNA methylation, along with the exponentially increasing risk of mortality with age, further establishes the exponential nature of aging.
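
A minimal sketch of the kind of dynamics this abstract describes: each site relaxes exponentially from a starting level toward a steady-state level with its own timescale, and the system average is taken over sites. The parametrization (`m0`, `m_inf`, `tau`) and the numeric values are assumptions made for illustration; the abstract does not give the paper's actual model equations or fitted values.

```python
import math
from typing import List, Tuple

Site = Tuple[float, float, float]  # (m0, m_inf, tau), all hypothetical


def methylation(t: float, m0: float, m_inf: float, tau: float) -> float:
    """Methylation level at age t for a site relaxing exponentially to m_inf."""
    return m_inf + (m0 - m_inf) * math.exp(-t / tau)


def mean_methylation(t: float, sites: List[Site]) -> float:
    """Average methylation over sites with heterogeneous timescales."""
    return sum(methylation(t, m0, m_inf, tau) for m0, m_inf, tau in sites) / len(sites)


# Hypothetical sites: start level, steady-state level, timescale in years.
sites = [(0.1, 0.8, 5.0), (0.9, 0.3, 20.0), (0.5, 0.6, 60.0)]

for age in (0, 10, 40, 80):
    print(age, round(mean_methylation(age, sites), 4))
```

At age 0 every site sits at its starting level, and for ages much larger than the longest timescale the average approaches the mean of the steady-state levels.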


Subjects
DNA Methylation , Epigenesis, Genetic , CpG Islands/genetics , Biomarkers
3.
Syst Biol ; 72(6): 1403-1417, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37862116

ABSTRACT

The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.
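
To make the synteny index concrete, the sketch below computes one commonly used form of it: for each gene shared by two genomes, compare its k-neighborhoods (the k genes on either side) in both genomes, divide the number of shared neighbors by 2k, and average over the shared genes. This definition, the assumption of circular genomes with unique gene names, and the function names are illustrative assumptions; the abstract itself does not spell out the formula, and the distance estimator derived in the paper is not reproduced here.

```python
from typing import List, Set


def neighborhood(genome: List[str], idx: int, k: int) -> Set[str]:
    """Genes within k positions of genome[idx] on a circular genome."""
    n = len(genome)
    return {genome[(idx + d) % n] for d in range(-k, k + 1) if d != 0}


def synteny_index(g1: List[str], g2: List[str], k: int) -> float:
    """Average fraction of shared k-neighbors over genes common to g1 and g2.

    Assumes unique gene names and genomes longer than 2k genes.
    """
    pos1 = {gene: i for i, gene in enumerate(g1)}
    pos2 = {gene: i for i, gene in enumerate(g2)}
    common = set(pos1) & set(pos2)
    if not common:
        return 0.0
    total = 0.0
    for gene in common:
        shared = neighborhood(g1, pos1[gene], k) & neighborhood(g2, pos2[gene], k)
        total += len(shared) / (2 * k)
    return total / len(common)


# Hypothetical genomes: the second replaces gene "c" with a foreign gene "x".
genome_a = list("abcdefgh")
genome_b = list("abxdefgh")
print(synteny_index(genome_a, genome_b, k=1))
```

Because only shared genes are scored, the measure handles genomes of unequal gene content, while the neighborhood comparison captures gene order.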


Subjects
Genome , Genomics , Phylogeny , Genomics/methods , Computer Simulation , Evolution, Molecular
4.
Front Bioinform ; 3: 1308680, 2023.
Article in English | MEDLINE | ID: mdl-38235295

ABSTRACT

Epigenetic clocks are DNA methylation-based chronological age prediction models that are commonly employed to study age-related biology. The difference between the predicted and observed age is often interpreted as a form of biological age acceleration, and many studies have measured the impact of environmental and disease-associated factors on epigenetic age. Most epigenetic clocks are fit using approaches that minimize the error between the predicted and observed chronological age, and as a result, they may not accurately model the impact of factors that moderate the relationship between the actual and epigenetic age. Here, we compare epigenetic clocks that are constructed using penalized regression methods to an evolutionary framework of epigenetic aging with the epigenetic pacemaker (EPM), which directly models DNA methylation as a function of a time-dependent epigenetic state. In simulations, we show that the value of the epigenetic state is impacted by factors such as age, sex, and cell-type composition. Next, in a dataset aggregated from previous studies, we show that the epigenetic state is also moderated by sex and the cell type. Finally, we demonstrate that the epigenetic state is also moderated by toxins in a study on polybrominated biphenyl exposure. Thus, we find that the pacemaker provides a robust framework for the study of factors that impact epigenetic age acceleration and that the effect of these factors may be obscured in traditional clocks based on linear regression models.

5.
Epigenetics ; 17(11): 1497-1512, 2022 11.
Article in English | MEDLINE | ID: mdl-35502722

ABSTRACT

Unlike genomes, which are static throughout the lifespan of an organism, DNA methylomes are dynamic. To study these dynamics, we developed quantitative models that measure the effect of multiple factors on DNA methylomes, including age, sex, weight, and genetics. We conducted our study in canids, which prove to be an ideal species to assess epigenetic moderators due to their extreme variability in size and well-characterized genetic structure. We collected buccal swabs from 217 canids (207 domestic dogs and 10 grey wolves) and used targeted bisulphite sequencing to measure methylomes. We also measured genotypes at over one thousand single nucleotide polymorphisms (SNPs). As expected, we found that DNA methylomes are strongly associated with age, enabling the construction of epigenetic clocks. However, we also identify novel associations between methylomes and sex, weight, and sterilization status, leading to accurate models that predict these factors. Methylomes are also affected by genetics, and we observe multiple associations between SNP loci and methylated CpGs. Finally, we show that several factors moderate the relationship between epigenetic ages and real ages, such as body weight, which increases epigenetic ageing. In conclusion, we demonstrate that the plasticity of DNA methylomes is impacted by myriad genetic and physiological factors, and that DNA methylation biomarkers are accurate predictors of age, sex and sterilization status.


Subjects
DNA Methylation , Epigenome , Animals , Dogs , Epigenomics , Longevity , Genotype , Epigenesis, Genetic
6.
Nat Ecol Evol ; 6(4): 418-426, 2022 04.
Article in English | MEDLINE | ID: mdl-35256811

ABSTRACT

Species that hibernate generally live longer than would be expected based solely on their body size. Hibernation is characterized by long periods of metabolic suppression (torpor) interspersed by short periods of increased metabolism (arousal). The torpor-arousal cycles occur multiple times during hibernation, and it has been suggested that processes controlling the transition between torpor and arousal states cause ageing suppression. Metabolic rate is also a known correlate of longevity; we thus proposed the 'hibernation-ageing hypothesis' whereby ageing is suspended during hibernation. We tested this hypothesis in a well-studied population of yellow-bellied marmots (Marmota flaviventer), which spend 7-8 months per year hibernating. We used two approaches to estimate epigenetic age: the epigenetic clock and the epigenetic pacemaker. Variation in epigenetic age of 149 samples collected throughout the life of 73 females was modelled using generalized additive mixed models (GAMM), where season (cyclic cubic spline) and chronological age (cubic spline) were fixed effects. As expected, the GAMM using epigenetic ages calculated from the epigenetic pacemaker was better able to detect nonlinear patterns in epigenetic ageing over time. We observed a logarithmic curve of epigenetic age with time, where the epigenetic age increased at a higher rate until females reached sexual maturity (two years old). With respect to circannual patterns, the epigenetic age increased during the active season and essentially stalled during the hibernation period. Taken together, our results are consistent with the hibernation-ageing hypothesis and may explain the enhanced longevity in hibernators.


Subjects
Hibernation , Marmota , Animals , Epigenesis, Genetic , Female , Longevity/genetics , Marmota/genetics , Marmota/metabolism , Seasons
7.
PLoS Comput Biol ; 16(11): e1008454, 2020 11.
Article in English | MEDLINE | ID: mdl-33253159

ABSTRACT

One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. Inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inference of mutation rates of individual cancer subclones using single-cell sequencing data. It utilizes the partial information about the orders of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluation of mutation rates on the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.


Subjects
Mutation , Neoplasms/genetics , Neoplasms/pathology , Single-Cell Analysis/methods , Algorithms , Genomic Instability , Humans
8.
Sci Rep ; 10(1): 12425, 2020 07 24.
Article in English | MEDLINE | ID: mdl-32709941

ABSTRACT

It is well established nowadays that among prokaryotes, various families of orthologous genes exhibit conflicting evolutionary history. A prime factor for this conflict is horizontal gene transfer (HGT) - the transfer of genetic material not via vertical descent. Thus, the prevalence of HGT is challenging the meaningfulness of the classical Tree of Life concept. Here we present a comprehensive study of HGT representing the entire prokaryotic world. We mainly rely on a novel analytic approach for analyzing an aggregate of gene histories, by means of the quartet plurality distribution (QPD) that we develop. Through the analysis of real and simulated data, QPD is used to reveal evidence of a barrier against HGT, separating the archaea from the bacteria and making HGT between the two domains, in general, quite rare. In contrast, bacteria's confined HGT is substantially more frequent than archaea's. Our approach also reveals that despite intensive HGT, a strong tree-like signal can be extracted, corroborating several previous works. Thus, QPD, which enables one to analytically combine information from an aggregate of gene trees, can be used for understanding patterns and rates of HGT in prokaryotes, as well as for validating or refuting models of horizontal genetic transfers and evolution in general.


Subjects
Archaea/genetics , Bacteria/genetics , Evolution, Molecular , Gene Transfer, Horizontal , Models, Genetic , Phylogeny
9.
Bioinformatics ; 36(17): 4662-4663, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32573701

ABSTRACT

SUMMARY: Epigenetic rates of change, much like the evolutionary mutation rate along a lineage, vary during lifetime. Accurate estimation of the epigenetic state has vast medical and biological implications. To account for these non-linear epigenetic changes with age, we recently developed a formalism inspired by the Pacemaker model of evolution that accounts for varying rates of mutations with time. Here, we present a Python implementation of the Epigenetic Pacemaker (EPM), a conditional expectation maximization algorithm that estimates epigenetic landscapes and the state of individuals and may be used to study non-linear epigenetic aging. AVAILABILITY AND IMPLEMENTATION: The EPM is available at https://pypi.org/project/EpigeneticPacemaker/ under the MIT license. The EPM is compatible with Python version 3.6 and above.


Subjects
Epigenomics , Pacemaker, Artificial , Aging , Algorithms , Epigenesis, Genetic , Humans , Software
10.
BMC Genomics ; 21(Suppl 2): 257, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32299339

ABSTRACT

BACKGROUND: DNA methylation is widely used as a biomarker in crucial medical applications as well as for highly accurate human age prediction. This biomarker is based on the methylation status of several hundred CpG sites. In a recent line of publications we have adapted a versatile concept from evolutionary biology, the Universal Pacemaker (UPM), to the setting of epigenetic aging and denoted it the Epigenetic PaceMaker (EPM). The EPM, as opposed to other epigenetic clocks, is not confined to a specific pattern of aging, and the epigenetic age of the individual is inferred independently of other individuals. This allows an explicit modeling of aging trends, in particular a nonlinear relationship between chronological and epigenetic age. In one of these recent works, we have presented an algorithmic improvement based on a two-step conditional expectation maximization (CEM) algorithm to arrive at a critical point on the likelihood surface. The algorithm alternates between a time step and a site step while advancing on the likelihood surface. RESULTS: Here we introduce nontrivial improvements to these steps that are essential for analyzing data sets of realistic magnitude in a manageable time and space. These structural improvements are based on insights from linear algebra and symbolic algebra tools, providing us greater understanding of the degeneracy of the complex problem space. This understanding, in turn, leads to the complete elimination of the bottleneck of cumbersome matrix multiplication and inversion, yielding a fast closed-form solution in both steps of the CEM. In the experimental results part, we compare the CEM algorithm over several data sets and demonstrate the speedup obtained by the closed-form solutions. Our results support the theoretical analysis of this improvement. CONCLUSIONS: These improvements enable us to increase substantially the scale of inputs analyzed by the method, allowing us to apply the new approach to data sets that could not be analyzed before.
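
The alternation between a site step and a time step can be sketched under a simple assumed model in which the methylation of site i in individual j is `m0_i + r_i * s_j` plus noise of equal variance, where `s_j` is the individual's epigenetic state. Under that assumption both steps have closed forms: an ordinary least-squares fit per site, and a weighted average per individual. The function names and the model form are illustrative assumptions; this is not the API of the EpigeneticPacemaker package and not necessarily the exact update rules derived in the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # meth[i][j]: site i, individual j


def site_step(meth: Matrix, states: List[float]) -> Tuple[List[float], List[float]]:
    """Given states s_j, fit each site's rate r_i and start m0_i by least squares."""
    n = len(states)
    s_mean = sum(states) / n
    s_var = sum((s - s_mean) ** 2 for s in states)
    rates, starts = [], []
    for row in meth:
        m_mean = sum(row) / n
        cov = sum((s - s_mean) * (m - m_mean) for s, m in zip(states, row))
        r = cov / s_var
        rates.append(r)
        starts.append(m_mean - r * s_mean)
    return rates, starts


def time_step(meth: Matrix, rates: List[float], starts: List[float]) -> List[float]:
    """Given site parameters, solve for each individual's state in closed form."""
    denom = sum(r * r for r in rates)
    n_individuals = len(meth[0])
    return [
        sum(r * (row[j] - m0) for r, m0, row in zip(rates, starts, meth)) / denom
        for j in range(n_individuals)
    ]


def fit_epm(meth: Matrix, ages: List[float], iterations: int = 50):
    """Alternate site and time steps, starting from chronological ages."""
    states = list(ages)
    for _ in range(iterations):
        rates, starts = site_step(meth, states)
        states = time_step(meth, rates, starts)
    return rates, starts, states
```

Note that in this sketch the states are only determined up to an affine rescaling, since rescaling every `s_j` can be absorbed into the rates and starts; a real implementation would need a convention to fix that scale.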


Subjects
Aging/genetics , Biological Clocks/genetics , Genomics/methods , Algorithms , CpG Islands , DNA Methylation , Epigenesis, Genetic , Epigenomics , Humans , Likelihood Functions , Models, Genetic
11.
BMC Genomics ; 21(Suppl 1): 106, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32138652

ABSTRACT

BACKGROUND: Horizontal gene transfer (HGT) is the event of a DNA sequence being transferred between species not by inheritance. HGT is a crucial factor in prokaryotic evolution and is a significant source for genomic novelty resulting in antibiotic resistance or the outbreak of virulent strains. Detection of HGT, and of the mechanisms responsible for and enabling it, is hence of prime importance. Existing algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from its recipient genome. Closely related species pose an even greater challenge as most genes are very similar and therefore, the phylogenetic signal is weak anyhow. Notwithstanding, the importance of detecting HGT between such organisms is extremely high for the role of HGT in the emergence of new highly virulent strains. RESULTS: In a recent work we devised a novel technique that relies on loss of synteny around a gene as a witness for HGT. We used a novel heuristic for synteny measurement, SI (Synteny Index), and the technique was tested on both simulated and real data and was found to provide greater sensitivity than other HGT techniques. This synteny-based approach suffers from low specificity, in particular for more closely related species. Here we devise an adaptive approach to cope with this by varying the criteria according to species distance. The new approach is doubly adaptive as it also considers the lengths of the genes being transferred. In particular, we use a Chernoff bound to decree HGT both in simulations and real bacterial genomes taken from the EggNOG database. CONCLUSIONS: Here we show empirically that this approach is more conservative than the previous χ²-based approach and provides a lower false positive rate, especially for closely related species and under a wide range of genome parameters.


Subjects
Bacteria/genetics , Computational Biology/methods , Gene Transfer, Horizontal , Viruses/genetics , Algorithms , Evolution, Molecular , Genetic Speciation , Phylogeny
12.
NAR Genom Bioinform ; 2(1): lqz013, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33575565

ABSTRACT

Word-based or 'alignment-free' methods for phylogeny inference have become popular in recent years. These methods are much faster than traditional, alignment-based approaches, but they are generally less accurate. Most alignment-free methods calculate 'pairwise' distances between nucleic-acid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining. In this paper, we propose the first word-based phylogeny approach that is based on 'multiple' sequence comparison and 'maximum likelihood'. Our algorithm first samples small, gap-free alignments involving four taxa each. For each of these alignments, it then calculates a quartet tree and, finally, the program 'Quartet MaxCut' is used to infer a super tree for the full set of input taxa from the calculated quartet trees. Experimental results show that trees produced with our approach are of high quality.

13.
Mol Biol Evol ; 37(5): 1470-1479, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31845962

ABSTRACT

The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results, assuming infinite size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned into 39 clusters by the rank of genus shows that the average number of genome dynamics events per gene at the phylogenetic depth of genus is around one half, with significant variability between genera. This result extends and confirms similar results obtained for individual genera in different manners.


Subjects
Gene Transfer, Horizontal , Genetic Techniques , Models, Genetic , Synteny , Genome, Microbial , Phylogeny
14.
Epigenetics ; 14(9): 912-926, 2019 09.
Article in English | MEDLINE | ID: mdl-31138013

ABSTRACT

Epigenetic changes during ageing have been characterized by multiple epigenetic clocks that allow the prediction of chronological age based on methylation status. Despite their accuracy and utility, epigenetic age biomarkers leave many questions about epigenetic ageing unanswered. Specifically, they do not permit the unbiased characterization of non-linear epigenetic ageing trends across entire life spans, a critical question underlying this field of research. Here we provide an integrated framework to address this question. Our model, inspired by evolutionary models, is able to account for acceleration/deceleration in epigenetic changes by fitting an individual's model age, the epigenetic age, which is related to chronological age in a non-linear fashion. Application of this model to DNA methylation data measured across broad age ranges, from before birth to old age, and from two tissue types, suggests that a universal logarithmic trend characterizes epigenetic ageing across entire lifespans.


Subjects
Aging/genetics , DNA Methylation , Epigenomics/methods , Adult , Aged , Aged, 80 and over , CpG Islands , Epigenesis, Genetic , Genome-Wide Association Study , Humans , Longevity , Middle Aged , Models, Genetic , Young Adult
15.
Mol Phylogenet Evol ; 136: 128-137, 2019 07.
Article in English | MEDLINE | ID: mdl-30946898

ABSTRACT

BACKGROUND: Extensive research efforts have been made to reconstruct the Tree of Life, aiming to explain the evolutionary history of life on earth. We expect the advent of next generation sequencing methods to bring us close to solving this challenge. Notwithstanding, with the accumulation of this mass of molecular data, it becomes evident that this solution is more complex and far from reach, especially among prokaryotes. One of the reasons for this is the ability of bacteria to perform horizontal gene transfer (HGT), creating substantial conflicts between the histories of different genes. Fortunately, evolution has equipped us with several markers with different levels of resolution, among which is synteny - the conservation of gene order along the chromosome. RESULTS: We have performed a comprehensive phylogenomic study via synteny-based footprints. We build on the synteny index (SI) concept, defined in a pilot work of ours, and extend it to a systematic phylogenetic method with well-defined valid regions of operations. Applying it to the EggNOG repository divides all species into 39 clusters, agreeing with the conventional taxonomy. We show analytically that the signal of the standard phylogenetic marker, the 16S, is too faint for reliable classification. CONCLUSIONS: This work exhibits three separate yet related contributions. In terms of phylogenetics, it demonstrates quantitatively the advantage of the SI-based approach over the standard sequence-based marker. Evolutionarily, the tree we produce is unique both in its specificity and broadness. Methodologically, the U-shape approach we developed, from the synthetic realm to real life and back to simulation, is novel and allows us to simulate the exact realistic conditions.


Subjects
Phylogeny , Prokaryotic Cells/classification , Synteny/genetics , Base Sequence , Computer Simulation , Databases, Genetic , Reproducibility of Results
16.
J Comput Biol ; 26(8): 806-821, 2019 08.
Article in English | MEDLINE | ID: mdl-30676086

ABSTRACT

Several studies have pointed out that the tight correlation between genes' evolutionary rates is better explained by a model denoted as the Universal PaceMaker (UPM) rather than by a simple rate constancy as manifested by the classical hypothesis of molecular clock (MC). Under UPM, each gene is associated with a single pacemaker (PM) and varies its evolutionary rate according to this PM's ticks. Hence, the relative rates of all genes associated with the same PM remain nearly constant, whereas the absolute rates can change arbitrarily according to the PM's ticks. A consequent question is how to find the gene-PM association from the gene sequence data alone. This, however, turns out to be a nontrivial task and is affected by the number of variables, their random noise, and the amount of available information. To this end, a clustering heuristic was devised by exploiting the correlation between corresponding edge lengths across thousands of gene trees. Nevertheless, no theoretical study linking the relationship between the affecting parameters was done. We here study this question by providing theoretical bounds, expressed by the system parameters, on probabilities for positive and negative results. We corroborate these results by a simulation study that reveals the critical role of the variances.


Subjects
Evolution, Molecular , Genome , Models, Genetic , Phylogeny
17.
J Comput Biol ; 26(8): 794-805, 2019 08.
Article in English | MEDLINE | ID: mdl-30457889

ABSTRACT

In 2006, Valiant introduced a variation to his celebrated PAC (Probably Approximately Correct) model to biology, by which he wished to explain how, with two simple mechanisms (random variation and natural selection), complex life mechanisms evolved in such a short time. Subsequently, several works extended and specialized the evolvability framework to more specific processes. In this study, we extend the evolvability framework to accommodate horizontal gene transfer, the transfer of genetic material between unrelated organisms. While in a separate work we focused on the theoretical aspects of this extension and its learnability power, here the focus is on more practical and biological facets of this new model. Specifically, we focus on the evolutionary process of developing a trait and model it as the conjunction function. We demonstrate the speedup in learning time for a variant of conjunction for which learning algorithms are known. We also confront the new model with the recombination model on real data of Escherichia coli strains under the task of developing pathogenicity and obtain results adhering to existing knowledge. Apart from the sheer extension to the understudied prokaryotic world, our work offers comparisons of three different models of evolution under the same conditions, which we believe is unique and of a separate interest.


Subjects
Computer Simulation , Escherichia coli/genetics , Evolution, Molecular , Gene Transfer, Horizontal , Models, Genetic , Prokaryotic Cells
18.
J Comput Biol ; 26(1): 27-37, 2019 01.
Article in English | MEDLINE | ID: mdl-30422680

ABSTRACT

Extracting the strength of the tree signal that is encompassed by a collection of gene trees is an exceptionally challenging problem in phylogenomics. Often, this problem not only involves the construction of individual phylogenies based on different genes, which may be a difficult endeavor on its own, but is also exacerbated by many factors that create conflicts between the evolutionary histories of different gene families, such as duplications or losses of genes; hybridization events; incomplete lineage sorting; and horizontal gene transfer, the latter two play central roles in the evolution of eukaryotes and prokaryotes, respectively. In this work, we tackle the aforementioned problem by focusing on quartet trees, which are the most basic unit of information in the context of unrooted phylogenies. In the first part, we show how a theorem of Janson that generalizes the classical Hoeffding inequality can be used to develop a statistical test involving quartets. In the second part, we study real and simulated data using this theoretical advancement, thus demonstrating how the significance of the differences between sets of quartets can be assessed. Our results are particularly intriguing since they nonstandardly require the analysis of dependent random variables.
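
For orientation, the classical Hoeffding inequality mentioned in this abstract bounds how far the mean of n independent variables taken from [0, 1] can stray from its expectation: the probability of a deviation of at least t is at most 2·exp(−2·n·t²). The sketch below evaluates only that classical bound; the function name and the numbers are illustrative. It does not implement Janson's generalization, which the paper relies on precisely because quartets drawn from the same trees are dependent.

```python
import math


def hoeffding_two_sided(n: int, t: float) -> float:
    """Classical Hoeffding bound on P(|sample mean - expected mean| >= t).

    Valid for n independent random variables that each take values in [0, 1].
    """
    return min(1.0, 2.0 * math.exp(-2.0 * n * t * t))


# Hypothetical example: 500 independent 0/1 indicators, deviation of 0.1.
print(hoeffding_two_sided(500, 0.1))
```

The bound shrinks exponentially in n, which is why large collections of observations can support a significance test even with modest deviations.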


Subjects
Multigene Family , Algorithms , Animals , Evolution, Molecular , Gene Transfer, Horizontal , Humans , Phylogeny
19.
PLoS One ; 13(11): e0204322, 2018.
Article in English | MEDLINE | ID: mdl-30383852

ABSTRACT

BACKGROUND: Pseudogenes are non-functional sequences in the genome with homologous sequences that are functional (i.e. genes). They are abundant in eukaryotes where they have been extensively investigated, while in prokaryotes they are significantly scarcer and less well studied. Here we conduct a comprehensive analysis of the evolution of orthologs of Mycobacterium leprae pseudogenes in prokaryotes. The leprosy pathogen M. leprae is of particular interest since it contains an unusually large number of pseudogenes, comprising approximately 40% of its entire genome. The analysis is conducted in both broad and narrow phylogenetic ranges. RESULTS: We have developed an informatics-based approach to characterize the evolution of pseudogenes. This approach combines tools from phylogenomics, genomics, and transcriptomics. The results we obtain are used to assess the contributions of two mechanisms for pseudogene formation: failed horizontal gene transfer events and disruption of native genes. CONCLUSIONS: We conclude that, although it was reported that in most bacteria the former is most likely responsible for the majority of pseudogenization events, in mycobacteria, and in particular in M. leprae with its exceptionally high pseudogene numbers, the latter predominates. We believe that our study sheds new light on the evolution of pseudogenes in bacteria, by utilizing new methodologies that are applied to the unusually abundant M. leprae pseudogenes and their orthologs.


Subjects
Genomics/methods , Leprosy/microbiology , Mycobacterium leprae/genetics , Phylogeny , Pseudogenes , Evolution, Molecular , Gene Transfer, Horizontal , Genome, Bacterial , Humans
20.
BMC Genomics ; 19(Suppl 6): 570, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30367577

ABSTRACT

BACKGROUND: Deciphering the history of life on Earth has long been regarded as one of the most central tasks in biology. In past years, widespread discordance between the evolutionary histories of different groups of orthologous genes of prokaryotes has been revealed, primarily due to horizontal gene transfers (HGTs). Nonetheless, evidence supporting a strong tree-like signal of evolution has been uncovered, despite the presence of HGT events. Therefore, a challenging task is to distill this tree-like signal from the noise induced by all sources of non-tree-like events. RESULTS: In this work we tackle this question, using real and simulated data. We first tighten a recent related theoretical result in this field. In a simulation study, we infer individual quartet topologies, and then use the inferred quartets to reconstruct simulated species trees. We demonstrate that accurate tree reconstruction is feasible despite surprisingly high rates of HGT. In a real data study, we construct phylogenies of two sets of prokaryotes, and show that our tree reconstruction scheme is comparable with (and complementary better than) other commonly used methods. CONCLUSIONS: Using a blend of theoretical and empirical investigations, our study proves the feasibility of accurate quartet-based phylogenetic reconstruction, the vast impact of HGT events notwithstanding.
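
One standard way to infer the topology of a single quartet from pairwise distances is the four-point method: of the three possible pairings of four taxa, pick the one whose within-pair distances sum to the smallest value. The sketch below illustrates that method only; the abstract does not say how the study inferred its quartets, so this should not be read as the authors' procedure, and the function name and distances are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Pair = Tuple[str, str]
Split = Tuple[Pair, Pair]


def quartet_topology(
    dist: Dict[FrozenSet[str], float], taxa: Tuple[str, str, str, str]
) -> Split:
    """Return the split ((w, x), (y, z)) chosen by the four-point method."""
    a, b, c, d = taxa

    def get(x: str, y: str) -> float:
        return dist[frozenset((x, y))]

    sums = {
        ((a, b), (c, d)): get(a, b) + get(c, d),
        ((a, c), (b, d)): get(a, c) + get(b, d),
        ((a, d), (b, c)): get(a, d) + get(b, c),
    }
    # The true split of an additive tree metric has the smallest sum.
    return min(sums, key=sums.get)


# Hypothetical additive distances for the tree ab|cd
# (pendant edges of length 1, internal edge of length 2).
dist = {
    frozenset(("a", "b")): 2.0,
    frozenset(("c", "d")): 2.0,
    frozenset(("a", "c")): 4.0,
    frozenset(("a", "d")): 4.0,
    frozenset(("b", "c")): 4.0,
    frozenset(("b", "d")): 4.0,
}
print(quartet_topology(dist, ("a", "b", "c", "d")))
```

Quartets inferred this way, one per four-taxon subset, are the kind of input that a quartet-amalgamation step would then combine into a full tree.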


Subjects
Phylogeny , Computer Simulation , Gene Transfer, Horizontal , Genes, Archaeal , Genes, Bacterial