1.
Genome Biol Evol ; 16(4)2024 04 02.
Article in English | MEDLINE | ID: mdl-38451738

ABSTRACT

Evolutionary convergences are observed at all levels, from phenotype to DNA and protein sequences, and changes at these different levels tend to be correlated. Notably, convergent mutations can lead to convergent changes in phenotype, such as changes in metabolism, drug resistance, and other adaptations to changing environments. We propose a two-component approach to detect mutations subject to convergent evolution in protein alignments. The "Emergence" component selects mutations that emerge more often than expected, while the "Correlation" component selects mutations that correlate with the convergent phenotype under study. With regard to Emergence, a phylogeny deduced from the alignment is provided by the user and is used to simulate the evolution of each alignment position. These simulations allow us to estimate the expected number of mutations in a neutral model, which is compared to the observed number of mutations in the data studied. In Correlation, a comparative phylogenetic approach is used to measure whether the presence of each of the observed mutations is correlated with the convergent phenotype. Each component can be used on its own, for example, Emergence when no phenotype is available. Our method is implemented in a standalone workflow and a webserver, called ConDor. We evaluate the properties of ConDor using simulated data, and we apply it to three real datasets: sedge PEPC proteins, HIV reverse transcriptase, and fish rhodopsin. The results show that the two components of ConDor complement each other, with an overall accuracy that compares favorably to other available tools, especially on large datasets.
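
The Emergence idea above (comparing an observed number of emergences with the number expected under neutral simulations) can be illustrated with a toy sketch. This is not ConDor's implementation: the function names are hypothetical, and the neutral model here is a deliberately crude assumption (each branch independently carries the mutation with a fixed probability) standing in for simulations along the user-provided phylogeny.

```python
import random


def simulate_neutral_counts(n_branches, rate, n_sims, seed=0):
    """Toy neutral model (an assumption, not ConDor's simulator).

    Each of `n_branches` branches independently gains the mutation with
    probability `rate`; returns the number of emergences per simulation.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < rate for _ in range(n_branches))
        for _ in range(n_sims)
    ]


def emergence_p_value(observed, simulated_counts):
    """Empirical one-sided p-value for 'more emergences than expected'.

    Counts the simulations with at least `observed` emergences and applies
    the usual +1 correction so the p-value is never exactly zero.
    """
    extreme = sum(1 for count in simulated_counts if count >= observed)
    return (extreme + 1) / (len(simulated_counts) + 1)
```

A mutation would be flagged when `emergence_p_value(observed, simulate_neutral_counts(...))` falls below a chosen threshold; the threshold and any multiple-testing correction are left to the reader.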


Subject(s)
Evolution, Molecular , Fishes , Animals , Phylogeny , Fishes/genetics , Rhodopsin/genetics , Mutation
2.
Syst Biol ; 72(6): 1387-1402, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37703335

ABSTRACT

Multi-type birth-death (MTBD) models are phylodynamic analogues of compartmental models in classical epidemiology. They serve to infer such epidemiological parameters as the average number of secondary infections Re and the infectious time from a phylogenetic tree (a genealogy of pathogen sequences). The representatives of this model family focus on various aspects of pathogen epidemics. For instance, the birth-death exposed-infectious (BDEI) model describes the transmission of pathogens featuring an incubation period (when there is a delay between the moment of infection and becoming infectious, as for Ebola and SARS-CoV-2), and permits its estimation along with other parameters. With constantly growing sequencing data, MTBD models should be extremely useful for unravelling information on pathogen epidemics. However, existing implementations of these models in a phylodynamic framework have not yet caught up with the sequencing speed. Computing time and numerical instability issues limit their applicability to medium data sets (≤ 500 samples), while the accuracy of estimations should increase with more data. We propose a new highly parallelizable formulation of ordinary differential equations for MTBD models. We also extend them to forests to represent situations when a (sub-)epidemic started from several cases (e.g., multiple introductions to a country). We implemented it for the BDEI model in a maximum likelihood framework using a combination of numerical analysis methods for efficient equation resolution. Our implementation estimates epidemiological parameter values and their confidence intervals in two minutes on a phylogenetic tree of 10,000 samples. Comparison to the existing implementations on simulated data shows that it is not only much faster but also more accurate. An application of our tool to the 2014 Ebola epidemic in Sierra Leone is also convincing, with very fast calculation and precise estimates. As MTBD models are closely related to Cladogenetic State Speciation and Extinction (ClaSSE)-like models, our findings could also be easily transferred to the macroevolution domain.
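
As a small illustration of how the epidemiological quantities named above relate to model rates, the sketch below assumes the common rate parameterization of an exposed-infectious model: a rate `mu` of becoming infectious, a transmission rate `la`, and a removal rate `psi`. These names, and the class itself, are assumptions made for illustration and are not taken from the abstract; the sketch does not perform any inference from a tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BDEIRates:
    """Hypothetical container for exposed-infectious model rates (per unit time)."""

    mu: float   # rate of becoming infectious (exposed -> infectious)
    la: float   # transmission rate of an infectious individual
    psi: float  # removal rate (end of the infectious period)

    def __post_init__(self):
        if min(self.mu, self.la, self.psi) <= 0:
            raise ValueError("all rates must be positive")

    @property
    def reproduction_number(self):
        """Average number of secondary infections, Re = la / psi."""
        return self.la / self.psi

    @property
    def incubation_period(self):
        """Expected delay between infection and becoming infectious, 1 / mu."""
        return 1.0 / self.mu

    @property
    def infectious_time(self):
        """Expected duration of the infectious period, 1 / psi."""
        return 1.0 / self.psi
```

For example, `BDEIRates(mu=0.5, la=0.4, psi=0.2)` corresponds to an incubation period of 2 time units, an infectious time of 5 time units, and Re of about 2.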


Subject(s)
Epidemics , Hemorrhagic Fever, Ebola , Humans , Phylogeny , Hemorrhagic Fever, Ebola/epidemiology , Likelihood Functions , Epidemiological Models
3.
Syst Biol ; 72(6): 1280-1295, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37756489

ABSTRACT

The bootstrap method is based on resampling sequence alignments and re-estimating trees. Felsenstein's bootstrap proportions (FBP) are the most common approach to assess the reliability and robustness of sequence-based phylogenies. However, when increasing taxon sampling (i.e., the number of sequences) to hundreds or thousands of taxa, FBP tend to return low support for deep branches. The transfer bootstrap expectation (TBE) has been recently suggested as an alternative to FBP. TBE is measured using a continuous transfer index in [0,1] for each bootstrap tree, instead of the binary {0,1} index used in FBP to measure the presence/absence of the branch of interest. TBE has been shown to yield higher and more informative supports while inducing a very low number of falsely supported branches. Nonetheless, it has been argued that TBE must be used with care due to sampling issues, especially in datasets with a high number of closely related taxa. In this study, we conduct multiple experiments by varying taxon sampling and comparing FBP and TBE support values on different phylogenetic depths, using empirical datasets. Our results show that the main critique of TBE stands in extreme cases with shallow branches and highly unbalanced sampling among clades, but that TBE is still robust in most cases, while FBP is inescapably negatively impacted by high taxon sampling. We suggest guidelines and good practices in TBE (and FBP) computing and interpretation.
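
The contrast between the binary FBP index and the continuous transfer index can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: each branch is represented by the set of taxa on one of its sides, each bootstrap tree is given as a collection of such sets, and the function names are hypothetical. The normalization by p - 1, where p is the size of the smaller side of the reference branch, follows the usual published definition of TBE.

```python
def transfer_distance(side_a, side_b, taxa):
    """Minimum number of taxa to move so that two bipartitions coincide.

    Each bipartition is given by one of its sides; the complement is implied,
    hence the min over the symmetric difference and its complement count.
    """
    diff = len(side_a ^ side_b)
    return min(diff, len(taxa) - diff)


def branch_supports(ref_side, bootstrap_trees, taxa):
    """Return (FBP, TBE) for one reference branch.

    `bootstrap_trees` is a list of trees, each a collection of bipartition
    sides (sets of taxa). FBP counts exact matches; TBE averages the
    normalized transfer index over bootstrap trees.
    """
    p = min(len(ref_side), len(taxa) - len(ref_side))
    if p < 2:
        raise ValueError("support is only defined for internal branches")
    matches = 0
    normalized_sum = 0.0
    for tree in bootstrap_trees:
        delta = min(transfer_distance(ref_side, side, taxa) for side in tree)
        # With leaf branches included the index never exceeds p - 1; clamp in
        # case the caller supplied internal branches only.
        delta = min(delta, p - 1)
        matches += delta == 0
        normalized_sum += delta / (p - 1)
    n_trees = len(bootstrap_trees)
    return matches / n_trees, 1.0 - normalized_sum / n_trees
```

With eight taxa and the reference branch ABCD, one bootstrap tree containing ABCD and another whose closest branch is ABC give FBP = 0.5 but TBE = 1 - (0 + 1/3)/2, illustrating how a near-match contributes partial support under TBE and none under FBP.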


Subject(s)
Phylogeny , Reproducibility of Results
4.
Viruses ; 15(6)2023 05 25.
Article in English | MEDLINE | ID: mdl-37376544

ABSTRACT

A deeper understanding of HIV-1 transmission and drug resistance mechanisms can lead to improvements in current treatment policies. However, the rates at which HIV-1 drug resistance mutations (DRMs) are acquired and which transmitted DRMs persist are multi-factorial and vary considerably between different mutations. We develop a method for the estimation of drug resistance acquisition and transmission patterns. The method uses maximum likelihood ancestral character reconstruction informed by treatment roll-out dates and allows for the analysis of very large datasets. We apply our method to transmission trees reconstructed on the data obtained from the UK HIV Drug Resistance Database to make predictions for known DRMs. Our results show important differences between DRMs, in particular between polymorphic and non-polymorphic DRMs and between the B and C subtypes. Our estimates of reversion times, based on a very large number of sequences, are compatible with, but more accurate than, those already available in the literature, with narrower confidence intervals. We consistently find that large resistance clusters are associated with polymorphic DRMs and DRMs with long loss times, which require special surveillance. As in other high-income countries (e.g., Switzerland), the prevalence of sequences with DRMs is decreasing, but among these, the fraction of transmitted resistance is clearly increasing compared to the fraction of acquired resistance mutations. All this indicates that efforts to monitor these mutations and the emergence of resistance clusters in the population must be maintained in the long term.


Subject(s)
Anti-HIV Agents , HIV Infections , HIV Seropositivity , HIV-1 , Humans , HIV-1/genetics , HIV Infections/drug therapy , HIV Infections/epidemiology , Drug Resistance, Viral/genetics , Genotype , Mutation , United Kingdom/epidemiology , Anti-HIV Agents/pharmacology , Anti-HIV Agents/therapeutic use
5.
Microbiol Spectr ; 11(1): e0326722, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36692300

ABSTRACT

In the search for control of human immunodeficiency virus type 1 (HIV-1) infection without antiretroviral therapy, posttreatment controllers (PTCs) are models of HIV remission. To better understand their mechanisms of control, we characterized the HIV blood reservoirs of 8 PTCs (median of 9.4 years after treatment interruption) in comparison with those of 13 natural HIV infection controllers (HICs) (median of 18 years of infection) and with those of individuals receiving efficient antiretroviral therapy initiated during either primary HIV infection (PHIs; n = 8) or chronic HIV infection (CHIs; n = 6). This characterization was performed with single-genome amplification and deep sequencing. The proviral diversity, which reflects the history of past viral replication, was lower in the PTCs, PHIs, and aviremic HICs than in the blipper HICs and CHIs. The proportions of intact and defective proviruses among the proviral pool in PTCs were not significantly different from those of other groups. When looking at the quantities of proviruses per million peripheral blood mononuclear cells (PBMCs), they had similar amounts of intact proviruses as other groups but smaller amounts of defective proviruses than CHIs, suggesting a role of these forms in HIV pathogenesis. Two HICs but none of the PTCs harbored only proviruses with deletion in nef; these attenuated strains could contribute to viral control in these participants. We show, for the first time, the presence of intact proviruses and low viral diversity in PTCs long after treatment interruption, as well as the absence of evolution of the proviral quasispecies in subsequent samples. This reflects low residual replication over time. Further data are necessary to confirm these results. IMPORTANCE Most people living with HIV need antiretroviral therapy to control their infection and experience viral relapse in case of treatment interruption, because of viral reservoir (proviruses) persistence. 
Knowing that proviruses are very diverse and that most of them are defective in treated individuals, we aimed to characterize the HIV blood reservoirs of posttreatment controllers (PTCs), rare models of drug-free remission, in comparison with spontaneous controllers and treated individuals. At a median time of 9 years after treatment interruption, which is unprecedented in the literature, we showed that the proportions and quantities of intact proviruses were similar between PTCs and other individuals. Whereas 2/7 spontaneous controllers harbored only nef-deleted proviruses, attenuated strains that could contribute to their control, no such case was observed in PTCs. Furthermore, PTCs displayed low viral genetic diversity and no evolution of their reservoirs, indicating very low residual replication, despite the presence of intact proviruses.


Subject(s)
HIV Infections , HIV-1 , Humans , Leukocytes, Mononuclear , HIV-1/genetics , Proviruses/genetics , Genome, Viral , Viral Load , CD4-Positive T-Lymphocytes
6.
Nucleic Acids Res ; 50(21): 12328-12343, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36453997

ABSTRACT

G-quadruplexes (G4s) are four-stranded nucleic acid structures formed by the stacking of G-tetrads. Here we investigated their formation and function during HIV-1 infection. Using bioinformatics and biophysics analyses, we first searched for evolutionarily conserved G4-forming sequences in the HIV-1 genome. We identified 10 G4s with conservation rates higher than those of HIV-1 regulatory sequences such as RRE and TAR. We then used porphyrin-based G4-binders to probe the formation of the G4s during infection of human cells by native HIV-1. The G4-binders efficiently inhibited HIV-1 infectivity, which is attributed to the formation of G4 structures during HIV-1 replication. Using a qRT-PCR approach, we showed that the formation of viral G4s occurs during the first 2 h post-infection and their stabilization by the G4-binders prevents initiation of reverse transcription. We also used a G4-RNA pull-down approach, based on a G4-specific biotinylated probe, to allow the direct detection and identification of viral G4-RNA in infected cells. Most of the detected G4-RNAs contain crucial regulatory elements such as the PPT and cPPT sequences as well as the U3 region. Hence, these G4s would function in the early stages of infection when the viral RNA genome is being processed for the reverse transcription step.
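
A sequence-based search for putative G4-forming regions is often sketched with the canonical motif of four runs of at least three guanines separated by loops of one to seven nucleotides. The snippet below uses that widely known pattern purely as an illustration; the abstract does not state which search tool or motif definition the authors used, and the function name is hypothetical.

```python
import re

# Canonical putative-G4 motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}.
# The loop class accepts both DNA (T) and RNA (U) letters.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_putative_g4(sequence):
    """Return (start, end, motif) for each non-overlapping canonical match.

    Coordinates are 0-based with an exclusive end. Only the G-rich strand
    given as input is scanned; the complementary strand is not searched.
    """
    sequence = sequence.upper()
    return [
        (m.start(), m.end(), m.group())
        for m in G4_PATTERN.finditer(sequence)
    ]
```

Conservation of a candidate could then be assessed by counting how many aligned genomes retain a match at the same position, which is beyond this sketch.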


Subject(s)
G-Quadruplexes , HIV-1 , Humans , RNA/chemistry , HIV-1/genetics , Regulatory Sequences, Nucleic Acid , Conserved Sequence
7.
Front Bioinform ; 2: 867111, 2022.
Article in English | MEDLINE | ID: mdl-36304258

ABSTRACT

High-throughput sequencing has provided the capacity for broad virus detection for both known and unknown viruses in a variety of hosts and habitats. It has been successfully applied for novel virus discovery in many agricultural crops, leading to the current drive to apply this technology routinely for plant health diagnostics. For this, efficient and precise methods for sequencing-based virus detection and discovery are essential. However, both existing alignment-based methods relying on reference databases and even more recent machine learning approaches are not efficient enough in detecting unknown viruses in RNAseq datasets of plant viromes. We present VirHunter, a deep learning convolutional neural network approach, to detect novel and known viruses in assemblies of sequencing datasets. While our method is generally applicable to a variety of viruses, here, we trained and evaluated it specifically for RNA viruses by reinforcing the coding sequences' content in the training dataset. Trained on the NCBI plant viruses data for three different host species (peach, grapevine, and sugar beet), VirHunter outperformed the state-of-the-art method, DeepVirFinder, for the detection of novel viruses, both in the synthetic leave-out setting and on the 12 newly acquired RNAseq datasets. Compared with the traditional tBLASTx approach, VirHunter has consistently exhibited better results in the majority of leave-out experiments. In conclusion, we have shown that VirHunter can be used to streamline the analyses of plant HTS-acquired viromes and is particularly well suited for the detection of novel viral contigs in RNAseq datasets.

8.
Virus Evol ; 8(1): veac029, 2022.
Article in English | MEDLINE | ID: mdl-35478717

ABSTRACT

The Zika virus (ZIKV) disease caused a public health emergency of international concern that started in February 2016. The overall number of ZIKV-related cases increased until November 2016, after which it declined sharply. While the evaluation of the potential risk and impact of future arbovirus epidemics remains challenging, intensified surveillance efforts along with a scale-up of ZIKV whole-genome sequencing provide an opportunity to understand the patterns of genetic diversity, evolution, and spread of ZIKV. However, a classification system that reflects the true extent of ZIKV genetic variation is lacking. Our objective was to characterize ZIKV genetic diversity and phylodynamics, identify genomic footprints of differentiation patterns, and propose a dynamic classification system that reflects its divergence levels. We analysed a curated dataset of 762 publicly available sequences spanning the full-length coding region of ZIKV from across its geographical span and collected between 1947 and 2021. The definition of genetic groups was based on comprehensive evolutionary dynamics analyses, which included recombination and phylogenetic analyses, within- and between-group pairwise genetic distances comparison, detection of selective pressure, and clustering analyses. Evidence for potential recombination events was detected in a few sequences. However, we argue that these events are likely due to sequencing errors as proposed in previous studies. There was evidence of strong purifying selection, widespread across the genome, as also detected for other arboviruses. A total of 50 sites showed evidence of positive selection, and for a few of these sites, there was amino acid (AA) differentiation between genetic clusters. Two main genetic clusters were defined, ZA and ZB, which correspond to the already characterized 'African' and 'Asian' genotypes, respectively. 
Within ZB, two subgroups, ZB.1 and ZB.2, represent the Asiatic and the American (and Oceania) lineages, respectively. ZB.1 is further subdivided into ZB.1.0 (a basal Malaysia sequence sampled in the 1960s and a recent Indian sequence), ZB.1.1 (South-Eastern Asia, Southern Asia, and Micronesia sequences), and ZB.1.2 (very similar sequences from the outbreak in Singapore). ZB.2 is subdivided into ZB.2.0 (basal American sequences and the sequences from French Polynesia, the putative origin of the introduction into South America), ZB.2.1 (Central America), and ZB.2.2 (Caribbean and North America). This classification system does not use geographical references and is flexible enough to accommodate potential future lineages. It will be a helpful tool for studies that involve analyses of ZIKV genomic variation and its association with pathogenicity, and will serve as a starting point for public health surveillance and response to ongoing and future epidemics and to outbreaks that lead to the emergence of new variants.

9.
PLoS Pathog ; 18(1): e1010224, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34990490

ABSTRACT

[This corrects the article DOI: 10.1371/journal.ppat.1009786.].

10.
Syst Biol ; 71(3): 630-648, 2022 04 19.
Article in English | MEDLINE | ID: mdl-34469581

ABSTRACT

Widely used approaches for extracting phylogenetic information from aligned sets of molecular sequences rely upon probabilistic models of nucleotide substitution or amino-acid replacement. The phylogenetic information that can be extracted depends on the number of columns in the sequence alignment and will be decreased when the alignment contains gaps due to insertion or deletion events. Motivated by the measurement of information loss, we suggest assessment of the effective sequence length (ESL) of an aligned data set. The ESL can differ from the actual number of columns in a sequence alignment because of the presence of alignment gaps. Furthermore, the estimation of phylogenetic information is affected by model misspecification. Inevitably, the actual process of molecular evolution differs from the probabilistic models employed to describe this process. This disparity means the amount of phylogenetic information in an actual sequence alignment will differ from the amount in a simulated data set of equal size, which motivated us to develop a new test for model adequacy. Via theory and empirical data analysis, we show how to disentangle the effects of gaps and model misspecification. By comparing the Fisher information of actual and simulated sequences, we identify which alignment sites and tree branches are most affected by gaps and model misspecification. [Fisher information; gaps; insertion; deletion; indel; model adequacy; goodness-of-fit test; sequence alignment.].


Subject(s)
Evolution, Molecular , INDEL Mutation , Models, Genetic , Models, Statistical , Phylogeny , Sequence Alignment
11.
Curr Opin Virol ; 51: 56-64, 2021 12.
Article in English | MEDLINE | ID: mdl-34597873

ABSTRACT

Drug resistance mutations appear in HIV under treatment pressure. Resistant variants can be transmitted to treatment-naive individuals, which can lead to rapid virological failure and can limit treatment options. Consequently, quantifying the prevalence, emergence and transmission of drug resistance is critical to effectively treating patients and to shape health policies. We review recent bioinformatics developments and in particular describe: (1) the machine learning approaches intended to predict and explain the level of resistance of HIV variants from their sequence data; (2) the phylogenetic methods used to survey the emergence and dynamics of resistant HIV transmission clusters; (3) the impact of deep sequencing in studying within-host and between-host genetic diversity of HIV variants, notably regarding minority resistant variants.


Subject(s)
Computational Biology , Drug Resistance, Viral/genetics , HIV Infections/drug therapy , HIV Infections/virology , HIV/drug effects , HIV/genetics , Mutation , HIV/classification , Humans , Phylogeny
12.
NAR Genom Bioinform ; 3(3): lqab083, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34522882

ABSTRACT

[This corrects the article DOI: 10.1093/nargab/lqab075.].

13.
PLoS Pathog ; 17(8): e1009786, 2021 08.
Article in English | MEDLINE | ID: mdl-34370795

ABSTRACT

CRF19 is a recombinant form of HIV-1 subtypes D, A1 and G, which was first sampled in Cuba in 1999, but was already present there in the 1980s. CRF19 was reported almost exclusively in Cuba, where it accounts for ∼25% of new HIV-positive patients and causes rapid progression to AIDS (∼3 years). We analyzed a large data set comprising ∼350 pol and env sequences sampled in Cuba over the last 15 years and ∼350 from the Los Alamos database. This data set contained both CRF19 (∼315) and A1, D and G sequences. We performed and combined analyses for the three A1, G and D regions, using fast maximum likelihood approaches, including: (1) phylogeny reconstruction, (2) spatio-temporal analysis of the virus spread, and ancestral character reconstruction for (3) transmission mode and (4) drug resistance mutations (DRMs). We verified these results with a Bayesian approach. This allowed us to acquire new insights into the CRF19 origin and transmission patterns. We showed that CRF19 recombined between 1966 and 1977, most likely in the Cuban community stationed in the Congo region. We further investigated CRF19 spread at the Cuban province level, and discovered that the epidemic started in the 1970s, most probably in Villa Clara, that it was at first carried by heterosexual transmissions, and then quickly spread in the 1980s within the "men having sex with men" (MSM) community, with multiple transmissions back to heterosexuals. The analysis of the transmission patterns of common DRMs found very few resistance transmission clusters. Our results show a very early introduction of CRF19 in Cuba, which could explain its local epidemiological success. Ignited by a major founder event, the epidemic then followed a pattern similar to that of other subtypes and CRFs in Cuba. The reason for the short time to AIDS remains to be understood and requires specific surveillance, in Cuba and elsewhere.


Subject(s)
Disease Transmission, Infectious/statistics & numerical data , Genetic Variation , HIV Infections/epidemiology , HIV-1/classification , Phylogeny , Bayes Theorem , Cuba/epidemiology , Female , HIV Infections/transmission , HIV Infections/virology , HIV-1/genetics , HIV-1/physiology , Humans , Male
14.
NAR Genom Bioinform ; 3(3): lqab075, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34396097

ABSTRACT

Phylogenetics is nowadays at the center of numerous studies in many fields, ranging from comparative genomics to molecular epidemiology. However, phylogenetic analysis workflows are usually complex and difficult to implement, as they are often composed of many small, recurring, but important data manipulation steps. Among these, we can find file reformatting, sequence renaming, tree re-rooting, tree comparison, bootstrap support computation, etc. These are often performed by custom scripts or by several heterogeneous tools, which may be error prone, difficult to maintain, and produce results that are challenging to reproduce. For all these reasons, the development and reuse of phylogenetic workflows is often a complex task. We identified many operations that are part of most phylogenetic analyses, and implemented them in a toolkit called Gotree/Goalign. The Gotree/Goalign toolkit implements more than 120 user-friendly commands and an API dedicated to multiple sequence alignment and phylogenetic tree manipulations. It is developed in Go, which makes executables easily installable, integrable in workflow environments, and parallelizable when possible. Moreover, Go is a compiled language, which accelerates computations compared to interpreted languages. This toolkit is freely available on most platforms (Linux, MacOS and Windows) and most architectures (amd64, i386) on GitHub at https://github.com/evolbioinfo/gotree, Bioconda and DockerHub.

15.
PLoS Comput Biol ; 17(8): e1008873, 2021 08.
Article in English | MEDLINE | ID: mdl-34437532

ABSTRACT

Drug resistance mutations (DRMs) appear in HIV under treatment pressure. DRMs are commonly transmitted to naive patients. The standard approach to reveal new DRMs is to test for significant frequency differences of mutations between treated and naive patients. However, we then consider each mutation individually and cannot hope to study interactions between several mutations. Here, we aim to leverage the ever-growing quantity of high-quality sequence data and machine learning methods to study such interactions (i.e. epistasis), as well as try to find new DRMs. We trained classifiers to discriminate between Reverse Transcriptase Inhibitor (RTI)-experienced and RTI-naive samples on a large HIV-1 reverse transcriptase (RT) sequence dataset from the UK (n ≈ 55,000), using all observed mutations as binary representation features. To assess the robustness of our findings, our classifiers were evaluated on independent data sets, both from the UK and Africa. Important representation features for each classifier were then extracted as potential DRMs. To find novel DRMs, we repeated this process by removing either features or samples associated with known DRMs. When keeping all known resistance signal, we detected sufficiently prevalent known DRMs, thus validating the approach. When removing features corresponding to known DRMs, our classifiers retained some prediction accuracy, and six new mutations significantly associated with resistance were identified. These six mutations have a low genetic barrier, are correlated with known DRMs, and are spatially close to either the RT active site or the regulatory binding pocket. When removing both known DRM features and sequences containing at least one known DRM, our classifiers lost all prediction accuracy. These results likely indicate that all mutations directly conferring resistance have been found, and that our newly discovered DRMs are accessory or compensatory mutations. 
Moreover, apart from the accessory nature of the relationships we found, we did not find any significant signal of further, more subtle epistasis combining several mutations which individually do not seem to confer any resistance.
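
The phrase "all observed mutations as binary representation features" can be made concrete with a short encoding sketch. It assumes aligned protein sequences of the same length as a reference, and a hypothetical naming convention of reference residue, 1-based position, and observed residue (for example K2R); the function name and the toy data are illustrative and not taken from the study. Training a classifier on the resulting matrix is not shown.

```python
def mutation_features(reference, sequences):
    """Encode aligned sequences as binary mutation-presence features.

    Returns (feature_names, matrix), where matrix[i][j] is 1 if sequence i
    carries mutation feature_names[j] relative to the reference, else 0.
    """
    observed = set()
    per_sequence = []
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        mutations = {
            f"{ref_aa}{pos}{aa}"
            for pos, (ref_aa, aa) in enumerate(zip(reference, seq), start=1)
            if aa != ref_aa
        }
        per_sequence.append(mutations)
        observed |= mutations
    # Order features by position, then by the substituted residue.
    names = sorted(observed, key=lambda m: (int(m[1:-1]), m[-1]))
    matrix = [[int(name in muts) for name in names] for muts in per_sequence]
    return names, matrix
```

Removing the columns that correspond to known DRMs, or the rows that carry at least one of them, mirrors the two filtering strategies described in the abstract.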


Subject(s)
Big Data , Drug Resistance, Viral/genetics , HIV Infections/drug therapy , HIV Infections/virology , HIV-1/drug effects , HIV-1/genetics , Supervised Machine Learning , Africa , Anti-HIV Agents/pharmacology , Bayes Theorem , Computational Biology , Databases, Genetic , Decision Trees , Epistasis, Genetic , Genes, Viral , HIV Reverse Transcriptase/antagonists & inhibitors , HIV Reverse Transcriptase/chemistry , HIV Reverse Transcriptase/genetics , Humans , Logistic Models , Models, Genetic , Mutation , United Kingdom
16.
Bioinformatics ; 37(11): 1506-1514, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-30726875

ABSTRACT

MOTIVATION: Most evolutionary analyses are based on a pre-estimated multiple sequence alignment. Wong et al. established the existence of an uncertainty induced by multiple sequence alignment when reconstructing phylogenies. They were able to show that in many cases different aligners produce different phylogenies, with no simple objective criterion sufficient to distinguish among these alternatives. RESULTS: We demonstrate that incorporating MSA-induced uncertainty into bootstrap sampling can significantly increase correlation between clade correctness and its corresponding bootstrap value. Our procedure involves concatenating several alternative multiple sequence alignments of the same sequences, produced using different commonly used aligners. We then draw bootstrap replicates while favoring columns of the more unique aligner among the concatenated aligners. We named this concatenation and bootstrapping method Weighted Partial Super Bootstrap (wpSBOOT). We show on three simulated datasets of 16, 32 and 64 tips that our method improves the predictive power of bootstrap values. We also used as a benchmark an empirical collection of 853 one-to-one orthologous genes from seven yeast species and found wpSBOOT to significantly improve discrimination capacity between topologically correct and incorrect trees. Bootstrap values of wpSBOOT are comparable to similar readouts estimated using a single method. However, for trees reduced using 50% and 95% bootstrap thresholds, wpSBOOT yields the lowest Type I error (fewer false positives). AVAILABILITY AND IMPLEMENTATION: The automated generation of replicates has been implemented in the T-Coffee package, which is available as open-source freeware from www.tcoffee.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

17.
Bioinformatics ; 37(12): 1761-1762, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33045068

ABSTRACT

MOTIVATION: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1000 and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. RESULTS: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1000 genomes requires ∼50 minutes on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). AVAILABILITY AND IMPLEMENTATION: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , Software , Genome , Humans , Pandemics , SARS-CoV-2
18.
C R Biol ; 2020 Nov 24.
Article in English | MEDLINE | ID: mdl-33274614

ABSTRACT

SARS-CoV-2 is the virus responsible for the global COVID-19 pandemic. We review what is known about the origin of this virus, detected in China at the end of December 2019. The genome of this virus mainly evolves under the effect of point mutations. These are generally neutral and have no impact on virulence and severity, but some appear to influence infectivity, notably the D614G mutation of the Spike protein. To date (30/09/2020), no recombination of the virus has been documented in the human host, and very few insertions and deletions. The worldwide spread of the virus was the subject of controversies that we summarize, before proposing a new approach free from the limitations of previous methods. The results show a complex scenario with, for example, numerous introductions to the USA and returns of the virus from the USA to certain countries including France.



19.
bioRxiv ; 2020 Nov 06.
Article in English | MEDLINE | ID: mdl-33173870

ABSTRACT

Although the global response to COVID-19 has not been entirely unified, the opportunity arises to assess the impact of regional public health interventions and to classify strategies according to their outcome. Analysis of genetic sequence data gathered over the course of the pandemic allows us to link the dynamics associated with networks of connected individuals with specific interventions. In this study, clusters of transmission were inferred from a phylogenetic tree representing the relationships of patient sequences sampled from December 30, 2019 to April 17, 2020. Metadata comprising sampling time and location were used to define the global behavior of transmission over this earlier sampling period, but also the involvement of individual regions in transmission cluster dynamics. Results demonstrate a positive impact of international travel restrictions and nationwide lockdowns on global cluster dynamics. However, residual, localized clusters displayed a wide range of estimated initial secondary infection rates, for which uniform public health interventions are unlikely to have sustainable effects. Our findings highlight the presence of so-called "super-spreaders", with the propensity to infect a larger-than-average number of people, in countries, such as the USA, for which additional mitigation efforts targeting events surrounding this type of spread are urgently needed to curb further dissemination of SARS-CoV-2.

20.
Nat Commun ; 11(1): 5347, 2020 10 22.
Article in English | MEDLINE | ID: mdl-33093464

ABSTRACT

In 1970, the seventh pandemic of cholera (7P) reached both Africa and Europe. Between 1970 and 2011, several European countries reported cholera outbreaks of a few to more than 2,000 cases. We report here a whole-genome analysis of 1,324 7P V. cholerae El Tor (7PET) isolates, including 172 from autochthonous sporadic or outbreak cholera cases occurring between 1970 and 2011 in Europe, providing insight into the spatial and temporal spread of this pathogen across Europe. In this work, we show that the 7PET lineage was introduced at least eight times into two main regions: Eastern and Southern Europe. Greater recurrence of the disease was observed in Eastern Europe, where it persisted until 2011. It was introduced into this region from Southern Asia, often circulating regionally in the countries bordering the Black Sea, and in the Middle East before reaching Eastern Africa on several occasions. In Southern Europe, the disease was mostly seen in individual countries during the 1970s and was imported from North and West Africa, except in 1994, when cholera was imported into Albania and Italy from the Black Sea region. These results shed light on the geographic course of cholera during the seventh pandemic and highlight the role of humans in its global dissemination.


Subject(s)
Cholera/history , Pandemics/history , Cholera/epidemiology , Cholera/microbiology , Bacterial Drug Resistance/genetics , Europe/epidemiology , Molecular Evolution , Bacterial Genome , Genomics , 20th Century History , 21st Century History , Human Migration/history , Humans , Phylogeny , Ribotyping , Spatio-Temporal Analysis , Vibrio cholerae/classification , Vibrio cholerae/genetics , Vibrio cholerae/isolation & purification