ABSTRACT
Despite having important biological implications, insertion and deletion (indel) events are often disregarded or mishandled during phylogenetic inference. In multiple sequence alignment, indels are represented as gaps and are estimated without considering the distinct evolutionary history of insertions and deletions. Consequently, indels are usually excluded from subsequent inference steps, such as ancestral sequence reconstruction and phylogenetic tree search. Here, we introduce indel-aware parsimony (indelMaP), a novel way to treat gaps under the parsimony criterion by considering insertions and deletions as separate evolutionary events and accounting for long indels. By identifying the precise location of an evolutionary event on the tree, we can separate overlapping indel events and use affine gap penalties for long indel modeling. Our indel-aware approach harnesses the phylogenetic signal from indels, incorporating them into all inference stages. Validation and comparison to state-of-the-art inference tools on simulated data show that indelMaP is most suitable for densely sampled datasets with closely to moderately related sequences, where it can reach alignment quality comparable to probabilistic methods and accurately infer ancestral sequences, including indel patterns. Due to its remarkable speed, our method is well suited for epidemiological datasets, eliminating the need for downsampling and enabling the exploitation of the additional information provided by dense taxonomic sampling. Moreover, indelMaP offers new insights into the indel patterns of biologically significant sequences and advances our understanding of genetic variability by considering gaps as crucial evolutionary signals rather than mere artefacts.
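The affine gap penalty mentioned in the abstract can be illustrated with a minimal sketch. This is not indelMaP's implementation: the function names, the penalty values (`gap_open`, `gap_extend`) and the row-wise scoring of a single aligned sequence are assumptions made for illustration, whereas the method itself places insertion and deletion events on the tree.

```python
def affine_gap_cost(gap_length: int, gap_open: float = 2.5, gap_extend: float = 0.5) -> float:
    """Cost of one contiguous gap under an affine scheme.

    The first gapped position pays gap_open; every further position pays
    gap_extend. Penalty values are hypothetical defaults.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gap_open + (gap_length - 1) * gap_extend


def row_gap_cost(aligned_row: str, gap_open: float = 2.5, gap_extend: float = 0.5) -> float:
    """Sum the affine cost of every run of '-' characters in one aligned row."""
    total = 0.0
    run = 0  # length of the gap run currently being scanned
    for symbol in aligned_row:
        if symbol == "-":
            run += 1
        else:
            total += affine_gap_cost(run, gap_open, gap_extend)
            run = 0
    # Account for a gap run that reaches the end of the row.
    total += affine_gap_cost(run, gap_open, gap_extend)
    return total
```

With these assumed penalties, one gap of length three costs less than three separate single-position gaps, which is the property that lets a long indel be treated as one event rather than several.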
Subjects
INDEL Mutation, Phylogeny, Sequence Alignment, Sequence Alignment/methods, Molecular Evolution, Genetic Models, Humans
ABSTRACT
Modern phylogenetic methods allow inference of ancestral molecular sequences given an alignment and phylogeny relating present-day sequences. This provides insight into the evolutionary history of molecules, helping to understand gene function and to study biological processes such as adaptation and convergent evolution across a variety of applications. Here, we propose a dynamic programming algorithm for fast joint likelihood-based reconstruction of ancestral sequences under the Poisson Indel Process (PIP). Unlike previous approaches, our method, named ARPIP, enables the reconstruction with insertions and deletions based on an explicit indel model. Consequently, inferred indel events have an explicit biological interpretation. Likelihood computation is achieved in linear time with respect to the number of sequences. Our method consists of two steps, namely finding the most probable indel points and reconstructing ancestral sequences. First, we find the most likely indel points and prune the phylogeny to reflect the insertion and deletion events per site. Second, we infer the ancestral states on the pruned subtree in a manner similar to FastML. We applied ARPIP (Ancestral Reconstruction under PIP) on simulated data sets and on real data from the Betacoronavirus genus. ARPIP reconstructs both the indel events and substitutions with a high degree of accuracy. Our method fares well when compared to established state-of-the-art methods such as FastML and PAML. Moreover, the method can be extended to explore both optimal and suboptimal reconstructions, include rate heterogeneity through time and more. We believe it will expand the range of novel applications of ancestral sequence reconstruction. [Ancestral sequences; dynamic programming; evolutionary stochastic process; indel; joint ancestral sequence reconstruction; maximum likelihood; Poisson Indel Process; phylogeny; SARS-CoV.].
Subjects
Algorithms, INDEL Mutation, Phylogeny, Likelihood Functions, Sequence Alignment, INDEL Mutation/genetics, Molecular Evolution
ABSTRACT
Drug-resistant HIV is a major threat to the long-term efficacy of antiretroviral treatment. Around 10% of ART-naïve patients in Europe are infected with drug-resistant HIV type 1. Hence, it is important to understand the dynamics of transmitted drug resistance evolution. Thanks to routinely performed drug resistance tests, HIV sequence data is increasingly available and can be used to reconstruct the phylogenetic relationship among viral lineages. In this study we employ a phylodynamic approach to quantify the fitness costs of major resistance mutations in the Swiss HIV cohort. The viral phylogeny reflects the transmission tree, which we model using stochastic birth-death-sampling processes with two types: hosts infected by a sensitive or resistant strain. This allows quantification of fitness cost as the ratio between transmission rates of hosts infected by drug-resistant strains and transmission rates of hosts infected by drug-sensitive strains. The resistance mutations 41L, 67N, 70R, 184V, 210W, 215D, 215S and 219Q (nRTI-related) and 103N, 108I, 138A, 181C, 190A (NNRTI-related) in the reverse transcriptase and the 90M mutation in the protease gene are included in this study. Among the considered resistance mutations, only the 90M mutation in the protease gene was found to have significantly higher fitness than the drug-sensitive strains. The following mutations associated with resistance to reverse transcriptase inhibitors were found to be less fit than the sensitive strains: 67N, 70R, 184V, 219Q. The highest posterior density intervals of the transmission ratios for the remaining resistance mutations included in this study all included 1, suggesting that these mutations do not have a significant effect on viral transmissibility within the Swiss HIV cohort. These patterns are consistent with alternative measures of the fitness cost of resistance mutations.
Overall, we have developed and validated a novel phylodynamic approach to estimate the transmission fitness cost of drug resistance mutations.
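The fitness measure described in this abstract can be written compactly. The symbols below are assumptions introduced for illustration and do not appear in the abstract: \lambda_R denotes the transmission rate of hosts infected by a drug-resistant strain and \lambda_S that of hosts infected by a drug-sensitive strain.

```latex
\[
  r \;=\; \frac{\lambda_{R}}{\lambda_{S}},
  \qquad
  r < 1 \ \text{(resistant strain transmits less, i.e. a fitness cost)},
  \qquad
  r > 1 \ \text{(resistant strain transmits more)}.
\]
```

Read this way, a highest posterior density interval for r that includes 1 gives no evidence of a transmission difference between the two strain types, which is how the abstract interprets the mutations not listed as more or less fit.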
Subjects
Anti-HIV Agents/therapeutic use, Viral Drug Resistance/genetics, Genetic Fitness, HIV Infections/drug therapy, HIV-1/genetics, Mutation Rate, Biological Adaptation/genetics, Highly Active Antiretroviral Therapy, Factual Databases, Genotype, HIV Infections/epidemiology, HIV Infections/virology, Humans, Mutation, Phylogeny, Reverse Transcriptase Inhibitors/therapeutic use, Switzerland/epidemiology
ABSTRACT
Phylogenetics and phylodynamics are central topics in modern evolutionary biology. Phylogenetic methods reconstruct the evolutionary relationships among organisms, whereas phylodynamic approaches reveal the underlying diversification processes that lead to the observed relationships. These two fields have many practical applications in disciplines as diverse as epidemiology, developmental biology, palaeontology, ecology, and linguistics. The combination of increasingly large genetic data sets and increases in computing power is facilitating the development of more sophisticated phylogenetic and phylodynamic methods. Big data sets allow us to answer complex questions. However, since the required analyses are highly specific to the particular data set and question, a black-box method is not sufficient anymore. Instead, biologists are required to be actively involved with modeling decisions during data analysis. The modular design of the Bayesian phylogenetic software package BEAST 2 enables, and in fact enforces, this involvement. At the same time, the modular design enables computational biology groups to develop new methods at a rapid rate. A thorough understanding of the models and algorithms used by inference software is a critical prerequisite for successful hypothesis formulation and assessment. In particular, there is a need for more readily available resources aimed at helping interested scientists equip themselves with the skills to confidently use cutting-edge phylogenetic analysis software. These resources will also benefit researchers who do not have access to similar courses or training at their home institutions. Here, we introduce the "Taming the Beast" (https://taming-the-beast.github.io/) resource, which was developed as part of a workshop series bearing the same name, to facilitate the usage of the Bayesian phylogenetic software package BEAST 2.
Subjects
Computational Biology/education, Computational Biology/methods, Phylogeny, Software, Teaching Materials, Algorithms
ABSTRACT
This chapter reviews the use of mathematical and computational models to facilitate understanding of the epidemiology and evolution of Mycobacterium tuberculosis. First, we introduce general epidemiological models, and describe their use with respect to epidemiological dynamics of a single strain and of multiple strains of M. tuberculosis. In particular, we discuss multi-strain models that include drug sensitivity and drug resistance. Second, we describe models for the evolution of M. tuberculosis within and between hosts, and how the resulting diversity of strains can be assessed by considering the evolutionary relationships among different strains. Third, we discuss developments in integrating evolutionary and epidemiological models to analyse M. tuberculosis genetic sequencing data. We conclude the chapter with a discussion of the practical implications of modelling - particularly modelling strain diversity - for controlling the spread of tuberculosis, and future directions for research in this area.
Subjects
Biological Evolution, Multiple Bacterial Drug Resistance/genetics, Genetic Models, Statistical Models, Mycobacterium tuberculosis/genetics, Multidrug-Resistant Tuberculosis/epidemiology, Antitubercular Agents/therapeutic use, Computer Simulation, Epidemiological Monitoring, Genetic Variation, Genotype, High-Throughput Nucleotide Sequencing, Humans, Molecular Epidemiology, Mycobacterium tuberculosis/drug effects, Mycobacterium tuberculosis/growth & development, Phylogeny, Multidrug-Resistant Tuberculosis/drug therapy, Multidrug-Resistant Tuberculosis/microbiology
ABSTRACT
MOTIVATION: Currently, more than 40 sequence tandem repeat detectors are published, providing heterogeneous, partly complementary, partly conflicting results. RESULTS: We present TRAL, a tandem repeat annotation library that allows running and parsing of various detection outputs, clustering of redundant or overlapping annotations, several statistical frameworks for filtering false positive annotations, and importantly a tandem repeat annotation and refinement module based on circular profile hidden Markov models (cpHMMs). Using TRAL, we evaluated the performance of a multi-step tandem repeat annotation workflow on 547 085 sequences in UniProtKB/Swiss-Prot. The researcher can use these results to predict run-times for specific datasets, and to choose annotation complexity accordingly. AVAILABILITY AND IMPLEMENTATION: TRAL is an open-source Python 3 library and is available, together with documentation and tutorials via http://www.vital-it.ch/software/tral. CONTACT: elke.schaper@isb-sib.ch.
Subjects
Protein Databases, Knowledge Bases, Molecular Sequence Annotation, Software, Tandem Repeat Sequences/genetics, Amino Acid Sequence, Cluster Analysis, Documentation, Gene Library, Humans, Molecular Sequence Data
ABSTRACT
As multi-drug resistant tuberculosis (MDR-TB) continues to spread, investigating the transmission potential of different drug-resistant strains becomes an ever more pressing topic in public health. While phylogenetic and transmission tree inferences provide valuable insight into possible transmission chains, phylodynamic inference combines evolutionary and epidemiological analyses to estimate the parameters of the underlying epidemiological processes, allowing us to describe the overall dynamics of disease spread in the population. In this study, we introduce an approach to Mycobacterium tuberculosis (M. tuberculosis) phylodynamic analysis employing an existing computationally efficient model to quantify the transmission fitness costs of drug resistance with respect to drug-sensitive strains. To determine the accuracy and precision of our approach, we first perform a simulation study, mimicking the simultaneous spread of drug-sensitive and drug-resistant tuberculosis (TB) strains. We analyse the simulated transmission trees using the phylodynamic multi-type birth-death model (MTBD; Kühnert et al., 2016) within the BEAST2 framework and show that this model can estimate the parameters of the epidemic well, despite the simplifying assumptions that MTBD makes compared to the complex TB transmission dynamics used for simulation. We then apply the MTBD model to an M. tuberculosis lineage 4 dataset that primarily consists of MDR sequences. Some of the MDR strains additionally exhibit resistance to pyrazinamide - an important first-line anti-tuberculosis drug. Our results support the previously proposed hypothesis that pyrazinamide resistance confers a transmission fitness cost to the bacterium, which we quantify for the given dataset. Importantly, our sensitivity analyses show that the estimates are robust to different prior distributions on the resistance acquisition rate, but are affected by the size of the dataset - i.e. we estimate a higher fitness cost when using fewer sequences for analysis. Overall, we propose that MTBD can be used to quantify the transmission fitness cost for a wide range of pathogens where the strains can be appropriately divided into two or more categories with distinct properties.
Subjects
Epidemics, Mycobacterium tuberculosis, Multidrug-Resistant Tuberculosis, Antitubercular Agents/pharmacology, Antitubercular Agents/therapeutic use, Humans, Microbial Sensitivity Tests, Mycobacterium tuberculosis/genetics, Phylogeny, Multidrug-Resistant Tuberculosis/drug therapy, Multidrug-Resistant Tuberculosis/epidemiology
ABSTRACT
According to the World Health Organization (WHO), an estimated 257 million people worldwide are chronically infected with hepatitis B virus (HBV), with approximately 15 million of them being coinfected with hepatitis D virus (HDV). To investigate the prevalence and transmission of HBV and HDV within the general population of a rural village in Cameroon, we analyzed serum samples from most (401/448) of the villagers. HBV surface antigen (HBsAg) was detected in 54 (13.5%) of the 401 samples, with 15% of them also containing anti-HDV antibodies. Although Cameroon integrated HBV vaccination into its Expanded Program on Immunization for newborns in 2005, an HBsAg carriage rate of 5% was found in children below the age of 5 years. Of the 54 HBsAg-positive samples, 49 HBV pre-S/S sequences (7 genotype A and 42 genotype E sequences) could be amplified by PCR. In spite of the extreme geographical restriction in the recruitment of study participants, a remarkable genetic diversity within HBV genotypes was observed. Phylogenetic analysis of the sequences obtained from PCR products combined with demographic information revealed that the presence of some genetic variants was restricted to members of one household, indicative of intrafamilial transmission, which appears to take place at least in part perinatally from mother to child. Other genetic variants were more widely distributed, reflecting horizontal interhousehold transmission. Data for two households with more than one HBV-HDV-coinfected individual indicate that the two viruses are not necessarily transmitted together, as family members with identical HBV sequences had different HDV statuses. IMPORTANCE This study revealed that the prevalence of HBV and HDV in a rural area of Cameroon is extremely high, underlining the pressing need for the improvement of control strategies.
Systematic serological and phylogenetic analyses of HBV sequences turned out to be useful tools to identify networks of virus transmission within and between households. The high HBsAg carriage rate found among children demonstrates that implementation of the HBV birth dose vaccine and improvement of vaccine coverage will be key elements in preventing both HBV and HDV infections. In addition, the high HBsAg carriage rate in adolescents and adults emphasizes the need for identification of chronically infected individuals and linkage to WHO-recommended treatment to prevent progression to liver cirrhosis and hepatocellular carcinoma.
ABSTRACT
BACKGROUND: Tracking recent transmission is a vital part of controlling widespread pathogens such as Mycobacterium tuberculosis. Multiple methods with specific performance characteristics exist for detecting recent transmission chains, usually by clustering strains based on genotype similarities. With such a large variety of methods available, informed selection of an appropriate approach for determining transmissions within a given setting/time period is difficult. METHODS: This study combines whole genome sequence (WGS) data derived from 324 isolates collected 2005-2010 in Kinshasa, Democratic Republic of Congo (DRC), a high endemic setting, with phylodynamics to unveil the timing of transmission events posited by a variety of standard genotyping methods. Clustering data based on Spoligotyping, 24-loci MIRU-VNTR typing, WGS-based SNP (Single Nucleotide Polymorphism) typing and core genome multi locus sequence typing (cgMLST) were evaluated. FINDINGS: Our results suggest that clusters based on Spoligotyping could encompass transmission events that occurred almost 200 years prior to sampling while 24-loci-MIRU-VNTR often represented three decades of transmission. Instead, WGS-based genotyping applying low SNP or cgMLST allele thresholds allows for determination of recent transmission events, e.g. in timespans of up to 10 years for a 5 SNP/allele cut-off. INTERPRETATION: With the rapid uptake of WGS methods in surveillance and outbreak tracking, the findings obtained in this study can guide the selection of appropriate clustering methods for uncovering relevant transmission chains within a given time-period. For high resolution cluster analyses, WGS-SNP and cgMLST based analyses have similar clustering/timing characteristics even for data obtained from a high incidence setting.
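Threshold-based clustering of the kind evaluated in this abstract can be sketched as follows. This is an illustrative assumption, not the study's pipeline: it uses single-linkage clustering over pairwise SNP distances between equal-length aligned sequences, treats the cut-off as inclusive (distance <= cutoff), and all function names and the toy isolates are hypothetical. Real workflows would additionally handle ambiguous bases and missing data.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_by_snp_threshold(seqs: dict[str, str], cutoff: int = 5) -> list[set[str]]:
    """Single-linkage clusters: isolates are joined if a chain of pairs,
    each within `cutoff` SNPs, connects them (union-find over all pairs)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    # Sort for a deterministic order of the returned clusters.
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Toy example with hypothetical isolates.
isolates = {
    "iso1": "AAAAAAAAAAAA",
    "iso2": "AAAAAAAAAACC",  # 2 SNPs from iso1
    "iso3": "CCCCCCCCAAAA",  # 8 SNPs from iso1, 10 from iso2
}
print(cluster_by_snp_threshold(isolates, cutoff=5))
```

Raising the cut-off merges more distant isolates into one cluster, which mirrors the abstract's point that looser genotype definitions group together transmission events spanning longer time periods.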
Subjects
Alleles, Bacterial Genome, Genotype, Mycobacterium tuberculosis/genetics, Single Nucleotide Polymorphism, Tuberculosis, Democratic Republic of the Congo/epidemiology, Female, Genotyping Techniques, Humans, Male, Tuberculosis/epidemiology, Tuberculosis/genetics, Tuberculosis/transmission
ABSTRACT
Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious/neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil core mechanisms governing TR evolution and TR roles in shaping genomes. However, TR-related studies are often non-trivial due to heterogeneous and sometimes fast evolving TR regions. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. The discussed work has a focus on protein TRs, yet is generally applicable to nucleic acid TRs, sharing similar features.