1.
Commun Med (Lond) ; 4(1): 101, 2024 May 25.
Article En | MEDLINE | ID: mdl-38796507

Bacteria are becoming increasingly resistant to antibiotics, reducing our ability to treat infections and threatening to undermine modern health care. Optimising antibiotic use is a key element in tackling the problem. Traditional economic evaluation methods do not capture many of the benefits from improved antibiotic use and the potential impact on resistance. Not capturing these benefits is a major obstacle to optimising antibiotic use, as it fails to incentivise the development and use of interventions to optimise the use of antibiotics and preserve their effectiveness (stewardship interventions). Estimates of the benefits of improving antibiotic use involve considerable uncertainty as they depend on the evolution of resistance and associated health outcomes and costs. Here we discuss how economic evaluation methods might be adapted, in the face of such uncertainties. We propose a threshold-based approach that estimates the minimum resistance-related costs that would need to be averted by an intervention to make it cost-effective. If it is probable that without the intervention costs will exceed the threshold then the intervention should be deemed cost-effective.
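The threshold idea can be read as simple arithmetic. The sketch below is one possible formalisation and is not taken from the paper: the function names, the break-even condition, and all figures are illustrative assumptions. It treats the threshold as the part of an intervention's cost that is not already offset by conventionally measured savings and monetised health gains, and then asks in what share of projected scenarios the resistance-related costs would exceed it.

```python
def resistance_cost_threshold(intervention_cost, direct_savings,
                              health_gain_qalys, willingness_to_pay):
    """Minimum resistance-related cost an intervention must avert to break even.

    Assumed break-even condition (illustrative, not from the paper):
    direct_savings + willingness_to_pay * health_gain_qalys + averted_cost
        >= intervention_cost
    """
    shortfall = (intervention_cost - direct_savings
                 - willingness_to_pay * health_gain_qalys)
    # If conventional benefits already cover the cost, nothing more is needed.
    return max(0.0, shortfall)


def probability_exceeds(threshold, projected_costs):
    """Share of projected resistance-cost scenarios that exceed the threshold."""
    return sum(cost > threshold for cost in projected_costs) / len(projected_costs)


# Hypothetical figures, chosen only to show the calculation.
threshold = resistance_cost_threshold(
    intervention_cost=100_000.0, direct_savings=20_000.0,
    health_gain_qalys=1.5, willingness_to_pay=20_000.0)
scenarios = [30_000.0, 60_000.0, 80_000.0, 120_000.0]
print(threshold, probability_exceeds(threshold, scenarios))
```

Under this reading, the uncertain quantity (future resistance costs) only has to be compared with a threshold rather than estimated precisely.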

2.
J Infect Dis ; 2024 Jan 20.
Article En | MEDLINE | ID: mdl-38245822

BACKGROUND: Carbapenemase-producing Enterobacterales (CPE) are challenging in healthcare, with resistance to multiple classes of antibiotics. This study describes the emergence of IMP-encoding CPE amongst diverse Enterobacterales species between 2016 and 2019 across a London regional network. METHODS: We performed a network analysis of patient pathways, using electronic health records, to identify contacts between IMP-encoding CPE positive patients. Genomes of IMP-encoding CPE isolates were overlaid with patient contacts to imply potential transmission events. RESULTS: Genomic analysis of 84 Enterobacterales isolates revealed diverse species (predominantly Klebsiella spp, Enterobacter spp, E. coli); 86% (72/84) harboured an IncHI2 plasmid carrying blaIMP and colistin resistance gene mcr-9 (68/72). Phylogenetic analysis of IncHI2 plasmids identified three lineages showing significant association with patient contacts and movements between four hospital sites and across medical specialities, which was missed on initial investigations. CONCLUSIONS: Combined, our patient network and plasmid analyses demonstrate an interspecies, plasmid-mediated outbreak of blaIMP CPE, which remained unidentified during standard investigations. With DNA sequencing and multi-modal data incorporation, the outbreak investigation approach proposed here provides a framework for real-time identification of key factors causing pathogen spread. Plasmid-level outbreak analysis reveals that resistance spread may be wider than suspected, allowing more interventions to stop transmission within hospital networks.

3.
Nature ; 626(7997): 145-150, 2024 Feb.
Article En | MEDLINE | ID: mdl-38122820

How likely is it to become infected by SARS-CoV-2 after being exposed? Almost everyone wondered about this question during the COVID-19 pandemic. Contact-tracing apps recorded measurements of proximity and duration between nearby smartphones. Contacts (individuals exposed to confirmed cases) were notified according to public health policies such as the 2 m, 15 min guideline, despite limited evidence supporting this threshold. Here we analysed 7 million contacts notified by the National Health Service COVID-19 app in England and Wales to infer how app measurements translated to actual transmissions. Empirical metrics and statistical modelling showed a strong relation between app-computed risk scores and actual transmission probability. Longer exposures at greater distances had risk similar to that of shorter exposures at closer distances. The probability of transmission confirmed by a reported positive test increased initially linearly with duration of exposure (1.1% per hour) and continued increasing over several days. Whereas most exposures were short (median 0.7 h, interquartile range 0.4-1.6), transmissions typically resulted from exposures lasting between 1 h and several days (median 6 h, interquartile range 1.4-28). Households accounted for about 6% of contacts but 40% of transmissions. With sufficient preparation, privacy-preserving yet precise analyses of risk that would inform public health measures, based on digital contact tracing, could be performed within weeks of the emergence of a new pathogen.


COVID-19 , Contact Tracing , Mobile Applications , Public Health , Risk Assessment , Humans , Contact Tracing/methods , Contact Tracing/statistics & numerical data , COVID-19/epidemiology , COVID-19/transmission , Pandemics , SARS-CoV-2 , State Medicine , Time Factors , England/epidemiology , Wales/epidemiology , Models, Statistical , Family Characteristics , Public Health/methods , Public Health/trends
4.
Viruses ; 15(2)2023 01 18.
Article En | MEDLINE | ID: mdl-36851491

Understanding how geography and human mobility shape the patterns and spread of infectious diseases such as COVID-19 is key to controlling future epidemics. An interesting example is provided by the second wave of the COVID-19 epidemic in Europe, which was facilitated by the intense movement of tourists around the Mediterranean coast in summer 2020. The Italian island of Sardinia is a major tourist destination and is widely believed to be the origin of the second Italian wave. In this study, we characterize the genetic variation among SARS-CoV-2 strains circulating in northern Sardinia during the first and second Italian waves using both Illumina and Oxford Nanopore Technologies Next Generation Sequencing methods. Most viruses were placed into a single clade, implying that despite substantial virus inflow, most outbreaks did not spread widely. The second epidemic wave on the island was actually driven by local transmission of a single B.1.177 subclade. Phylogeographic analyses further suggest that those viral strains circulating on the island were not a relevant source for the second epidemic wave in Italy. This result, however, does not rule out the possibility of intense mixing and transmission of the virus among tourists as a major contributor to the second Italian wave.


COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Molecular Epidemiology , Italy/epidemiology , Phylogeography , Genetic Variation
5.
J Theor Biol ; 548: 111186, 2022 09 07.
Article En | MEDLINE | ID: mdl-35697144

The coalescent model represents how individuals sampled from a population may have originated from a last common ancestor. The bounded coalescent model is obtained by conditioning the coalescent model such that the last common ancestor must have existed after a certain date. This conditioned model arises in a variety of applications, such as speciation, horizontal gene transfer or transmission analysis, and yet the bounded coalescent model has not been previously analysed in detail. Here we describe a new algorithm to simulate from this model directly, without resorting to rejection sampling. We show that this direct simulation algorithm is more computationally efficient than the rejection sampling approach. We also show how to calculate the probability of the last common ancestor occurring after a given date, which is required to compute the probability density of realisations under the bounded coalescent model. Our results are applicable in both the isochronous (when all samples have the same date) and heterochronous (where samples can have different dates) settings. We explore the effect of setting a bound on the date of the last common ancestor, and show that it affects a number of properties of the resulting phylogenies. All our methods are implemented in a new R package called BoundedCoalescent which is freely available online.
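For orientation, the rejection sampling baseline that the abstract compares against can be sketched in a few lines. This is not the direct algorithm of the BoundedCoalescent package. It assumes the isochronous case, a constant effective population size `ne` expressed in the same units as time, and a bound given as time before sampling. All names are illustrative.

```python
import random


def simulate_coalescent_times(n, ne, rng):
    """Coalescent event times (measured backwards from sampling) for n samples."""
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        # With k lineages, the waiting time to the next coalescence is
        # exponential with rate k * (k - 1) / 2 / ne (standard coalescent).
        rate = k * (k - 1) / 2 / ne
        t += rng.expovariate(rate)
        times.append(t)
    return times


def simulate_bounded_by_rejection(n, ne, bound, rng, max_tries=100_000):
    """Redraw unconditioned genealogies until the last common ancestor
    is no older than `bound`; also report how many draws were needed."""
    for tries in range(1, max_tries + 1):
        times = simulate_coalescent_times(n, ne, rng)
        if times[-1] <= bound:
            return times, tries
    raise RuntimeError("no sample satisfied the bound within max_tries")


rng = random.Random(1)
times, tries = simulate_bounded_by_rejection(n=5, ne=1.0, bound=1.0, rng=rng)
print(times, tries)
```

The number of tries grows as the bound becomes tighter, which is the inefficiency that a direct simulation method avoids.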


Algorithms , Models, Genetic , Computer Simulation , Genetics, Population , Humans , Phylogeny , Probability
6.
Microb Genom ; 8(4)2022 04.
Article En | MEDLINE | ID: mdl-35442183

A hospital outbreak of carbapenem-resistant Enterobacterales was detected by routine surveillance. Whole genome sequencing and subsequent analysis revealed a conserved promiscuous blaOXA-48 carrying plasmid as the defining factor within this outbreak. Four different species of Enterobacterales were involved in the outbreak. Escherichia coli ST399 accounted for 35 of all the 55 isolates. Comparative genomics analysis using publicly available E. coli ST399 genomes showed that the outbreak E. coli ST399 isolates formed a unique clade. We developed a mathematical model of pOXA-48-like plasmid transmission between host lineages and used it to estimate its conjugation rate, giving a lower bound of 0.23 conjugation events per lineage per year. Our analysis suggests that co-evolution between the pOXA-48-like plasmid and E. coli ST399 could have played a role in the outbreak. This is the first study to report carbapenem-resistant E. coli ST399 carrying blaOXA-48 as the main cause of a plasmid-borne outbreak within a hospital setting. Our findings suggest complementary roles for both plasmid conjugation and clonal expansion in the emergence of this outbreak.


Carbapenems , Escherichia coli Infections , Carbapenems/pharmacology , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Infections/epidemiology , Hospitals , Humans , Klebsiella pneumoniae/genetics , Plasmids/genetics , beta-Lactamases/genetics , beta-Lactamases/metabolism
7.
Syst Biol ; 71(5): 1073-1087, 2022 08 10.
Article En | MEDLINE | ID: mdl-34893904

Microbial population genetics models often assume that all lineages are constrained by the same population size dynamics over time. However, many neutral and selective events can invalidate this assumption and can contribute to the clonal expansion of a specific lineage relative to the rest of the population. Such differential phylodynamic properties between lineages result in asymmetries and imbalances in phylogenetic trees that are sometimes described informally but which are difficult to analyze formally. To this end, we developed a model of how clonal expansions occur and affect the branching patterns of a phylogeny. We show how the parameters of this model can be inferred from a given dated phylogeny using Bayesian statistics, which allows us to assess the probability that one or more clonal expansion events occurred. For each putative clonal expansion event, we estimate its date of emergence and subsequent phylodynamic trajectory, including its long-term evolutionary potential which is important to determine how much effort should be placed on specific control measures. We demonstrate the applicability of our methodology on simulated and real data sets. Inference under our clonal expansion model can reveal important features in the evolution and epidemiology of infectious disease pathogens. [Clonal expansion; genomic epidemiology; microbial population genomics; phylodynamics.].


Genetics, Population , Genomics , Bayes Theorem , Phylogeny , Probability
8.
Epidemics ; 36: 100472, 2021 09.
Article En | MEDLINE | ID: mdl-34153623

INTRODUCTION: Many countries with an early outbreak of SARS-CoV-2 struggled to gauge the size and start date of the epidemic mainly due to limited testing capacities and a large proportion of undetected asymptomatic and mild infections. Iran was among the first countries with a major outbreak outside China. METHODS: We constructed a globally representative sample of 802 genomes, including 46 samples from patients inside or with a travel history to Iran. We then performed a phylogenetic analysis to identify clades related to samples from Iran and estimated the start of the epidemic and early doubling times in cases. We leveraged air travel data from 36 exported cases of COVID-19 to estimate the point-prevalence and the basic reproductive number across the country. We also analysed the province-level all-cause mortality data during winter and spring 2020 to estimate under-reporting of COVID-19-related deaths. Finally, we used this information in an SEIR model to reconstruct the early outbreak dynamics and assess the effectiveness of intervention measures in Iran. RESULTS: By identifying the most basal clade that contained genomes from Iran, our phylogenetic analysis showed that the age of the root is placed on 2019-12-21 (95% HPD: 2019-09-07 - 2020-02-14). This date coincides with our estimated epidemic start date of 2019-12-25 (95% CI: 2019-12-11 - 2020-02-24), based on air travel data from exported cases, with an early doubling time of 4.0 (95% CI: 1.4-6.7) days in cases. Our analysis of all-cause mortality showed 21.9 (95% CI: 16.7-27.2) thousand excess deaths by the end of summer. Our model forecasted the second epidemic peak and suggested that by 2020-08-31 a total of 15.0 (95% CI: 4.9-25.0) million individuals recovered from the disease across the country. CONCLUSION: These findings have profound implications for assessing the stage of the epidemic in Iran despite significant levels of under-reporting. Moreover, the results shed light on the dynamics of SARS-CoV-2 transmissions in Iran and central Asia. They also suggest that in the absence of border screening, there is a high risk of introduction from travellers from areas with active outbreaks. Finally, they show both that well-informed epidemic models are able to forecast episodes of resurgence following a relaxation of interventions, and that NPIs are key to controlling ongoing epidemics.
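As a rough illustration of the two quantitative ingredients named in the abstract, the sketch below converts a doubling time into an exponential growth rate and integrates a basic SEIR model with a fixed-step Euler scheme. It is a generic textbook SEIR, not the authors' fitted model. Apart from the 4.0-day doubling time, every parameter value (transmission rate, incubation and infectious periods, population size) is a hypothetical placeholder.

```python
import math


def growth_rate_from_doubling_time(doubling_time_days):
    """Exponential growth rate r (per day) such that cases double every T days."""
    return math.log(2) / doubling_time_days


def simulate_seir(beta, sigma, gamma, population, initial_infectious,
                  days, dt=0.1):
    """Euler integration of a basic SEIR model.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period (all per day).
    Returns a list of (time, S, E, I, R) tuples.
    """
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    trajectory = [(0.0, s, e, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        trajectory.append((step * dt, s, e, i, r))
    return trajectory


print(round(growth_rate_from_doubling_time(4.0), 3))  # about 0.173 per day
# Hypothetical parameter values, for illustration only.
final_state = simulate_seir(beta=0.5, sigma=1 / 3, gamma=1 / 5,
                            population=1_000_000, initial_infectious=10,
                            days=100)[-1]
print(final_state)
```

Interventions could be represented in such a sketch by letting `beta` change over time, which is left out here to keep the example short.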


COVID-19 , Epidemics , Humans , Iran/epidemiology , Phylogeny , SARS-CoV-2
9.
Int J Infect Dis ; 102: 463-471, 2021 Jan.
Article En | MEDLINE | ID: mdl-33130212

OBJECTIVES: In this data collation study, we aimed to provide a comprehensive database describing the epidemic trends and responses during the first wave of coronavirus disease 2019 (COVID-19) throughout the main provinces in China. METHODS: From mid-January to March 2020, we extracted publicly available data regarding the spread and control of COVID-19 from 31 provincial health authorities and major media outlets in mainland China. Based on these data, we conducted descriptive analyses of the epidemic in the six most-affected provinces. RESULTS: School closures, travel restrictions, community-level lockdown, and contact tracing were introduced concurrently around late January but subsequent epidemic trends differed among provinces. Compared with Hubei, the other five most-affected provinces reported a lower crude case fatality ratio and proportion of critical and severe hospitalised cases. From March 2020, as the local transmission of COVID-19 declined, switching the focus of measures to the testing and quarantine of inbound travellers may have helped to sustain the control of the epidemic. CONCLUSIONS: Aggregated indicators of case notifications and severity distributions are essential for monitoring an epidemic. A publicly available database containing these indicators and information regarding control measures is a useful resource for further research and policy planning in response to the COVID-19 epidemic.


COVID-19/epidemiology , SARS-CoV-2 , COVID-19/prevention & control , China/epidemiology , Contact Tracing , Databases, Factual , Humans
10.
Viruses ; 12(11)2020 11 02.
Article En | MEDLINE | ID: mdl-33147786

The expression of accessory non-structural proteins V and W in Newcastle disease virus (NDV) infections depends on RNA editing. These proteins are derived from frameshifts of the sequence coding for the P protein via co-transcriptional insertion of one or two guanines in the mRNA. However, a larger number of guanines can be inserted with lower frequencies. We analysed data from deep RNA sequencing of samples from in vitro and in vivo NDV infections to uncover the patterns of mRNA editing in NDV. The distribution of insertions is well described by a simple Markov model of polymerase stuttering, providing strong quantitative confirmation of the molecular process hypothesised by Kolakofsky and collaborators three decades ago. Our results suggest that the probability that the NDV polymerase would stutter is about 0.45 initially, and 0.3 for further subsequent insertions. The latter probability is approximately independent of the number of previous insertions, the host cell, and viral strain. However, in LaSota infections, we also observe deviations from the predicted V/W ratio of about 3:1 according to this model, which could be attributed to deviations from this stuttering model or to further mechanisms downregulating the abundance of W protein.
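The stuttering model lends itself to a short numerical check. The sketch below assumes the simplest reading of the abstract: the polymerase inserts a first guanine with probability 0.45 and each further guanine with probability 0.3, independently of how many were inserted before. It also assumes that transcripts with 1, 4, 7, ... insertions encode V and those with 2, 5, 8, ... encode W. The function names are illustrative.

```python
def insertion_distribution(p_first=0.45, p_next=0.3, max_insertions=30):
    """P(k inserted guanines), k = 0..max_insertions, under a two-parameter
    stutter model (assumed form: geometric after the first insertion)."""
    probs = [1.0 - p_first]  # no stutter at all
    for k in range(1, max_insertions + 1):
        # One initial stutter, k - 1 further stutters, then a stop.
        probs.append(p_first * p_next ** (k - 1) * (1.0 - p_next))
    return probs


def v_to_w_ratio(probs):
    """Ratio of V-frame (k % 3 == 1) to W-frame (k % 3 == 2) transcripts."""
    v = sum(p for k, p in enumerate(probs) if k % 3 == 1)
    w = sum(p for k, p in enumerate(probs) if k % 3 == 2)
    return v / w


probs = insertion_distribution()
print([round(p, 4) for p in probs[:4]])
print(round(v_to_w_ratio(probs), 2))
```

Under these assumptions every W-frame term is the matching V-frame term multiplied by 0.3, so the ratio is 1/0.3, or roughly 3.3, in line with the "about 3:1" quoted in the abstract.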


Capsid Proteins/genetics , Newcastle Disease/virology , Newcastle disease virus/genetics , RNA Editing , Viral Nonstructural Proteins/genetics , Animals , Cell Line , Chickens/virology , DNA-Directed DNA Polymerase/genetics , Data Analysis , Female , Fibroblasts/virology , High-Throughput Nucleotide Sequencing , Male , Markov Chains , Newcastle disease virus/enzymology
11.
Microb Genom ; 5(9)2019 09.
Article En | MEDLINE | ID: mdl-31389782

We undertook a comprehensive comparative analysis of a collection of 30 small (<25 kb) non-conjugative Escherichia coli plasmids previously classified by the gene sharing approach into 10 families, as well as plasmids found in the National Center for Biotechnology Information (NCBI) nucleotide database sharing similar genomic sequences. In total, 302 mobilizable (belonging to 2 MOBrep and 5 MOBRNA families) and 106 non-transferable/relaxase-negative (belonging to three ReLRNA families) plasmids were explored. The most striking feature was the specialization of the plasmid family types that was not related to their transmission mode and replication system. We observed a range of host strain specificity, from narrow E. coli host specificity to broad host range specificity, including a wide spectrum of Enterobacteriaceae. We found a wide variety of toxin/antitoxin systems and colicin operons in the plasmids, whose numbers and types varied according to the plasmid family type. The plasmids carried genes conferring resistance spanning almost all of the antibiotic classes, from those to which resistance developed early, such as sulphonamides, to those for which resistance has only developed recently, such as colistin. However, the prevalence of the resistance genes varied greatly according to the family type, ranging from 0 to 100 %. The evolutionary history of the plasmids based on the family type core genes showed variability within family nucleotide divergences in the range of E. coli chromosomal housekeeping genes, indicating long-term co-evolution between plasmids and host strains. In rare cases, a low evolutionary divergence suggested the massive spread of an epidemic plasmid. Overall, the importance of these small non-conjugative plasmids in bacterial adaptation varied greatly according to the type of family they belonged to, with each plasmid family having specific hosts and genetic traits.


Escherichia coli/genetics , Plasmids/metabolism , Databases, Genetic , Evolution, Molecular , Gene Frequency , Phylogeny , Plasmids/classification , Plasmids/genetics , Species Specificity
12.
Mol Biol Evol ; 36(8): 1686-1700, 2019 08 01.
Article En | MEDLINE | ID: mdl-31004162

One of the major challenges in evolutionary biology is the identification of the genetic basis of postzygotic reproductive isolation. Given its pivotal role in this process, here we explore the drivers that may account for the evolutionary dynamics of the PRDM9 gene between continental and island systems of chromosomal variation in house mice. Using a data set of nearly 400 wild-caught mice of Robertsonian systems, we identify the extent of PRDM9 diversity in natural house mouse populations, determine the phylogeography of PRDM9 at a local and global scale based on a new measure of pairwise genetic divergence, and analyze selective constraints. We find 57 newly described PRDM9 variants, this diversity being especially high on Madeira Island, a result that is contrary to the expectations of reduced variation for island populations. Our analysis suggests that the PRDM9 allelic variability observed in Madeira mice might be influenced by the presence of distinct chromosomal fusions resulting from a complex pattern of introgression or multiple colonization events onto the island. Importantly, we detect a significant reduction in the proportion of PRDM9 heterozygotes in Robertsonian mice, which showed a high degree of similarity in the amino acids responsible for protein-DNA binding. Our results suggest that despite the rapid evolution of PRDM9 and the variability detected in natural populations, functional constraints could facilitate the accumulation of allelic combinations that maintain recombination hotspot symmetry. We anticipate that our study will provide the basis for examining the role of different PRDM9 genetic backgrounds in reproductive isolation in natural populations.


Evolution, Molecular , Histone-Lysine N-Methyltransferase/genetics , Mice/genetics , Animals , Genetic Variation , Heterozygote , Phylogeography , Portugal , Selection, Genetic , Spain
13.
Microb Genom ; 4(9)2018 09.
Article En | MEDLINE | ID: mdl-30080134

To understand the evolutionary dynamics of extended-spectrum β-lactamase (ESBL)-encoding genes in Escherichia coli, we undertook a comparative genomic analysis of 116 whole plasmid sequences of human or animal origin isolated over a period spanning before and after the use of third-generation cephalosporins (3GCs) using a gene-sharing network approach. The plasmids included 82 conjugative, 22 mobilizable and 9 non-transferable plasmids and 3 P-like bacteriophages. ESBL-encoding genes were found on 64 conjugative, 6 mobilizable, 2 non-transferable plasmids and 2 P1-like bacteriophages, indicating that these last three types of mobile elements also play a role, albeit modest, in the diffusion of the ESBLs. The network analysis showed that the plasmids clustered according to their genome backbone type, but not by origin or period of isolation or by antibiotic-resistance type, including type of ESBL-encoding gene. There was no association between the type of plasmid and the phylogenetic history of the parental strains. Finer scale analysis of the more abundant clusters IncF and IncI1 showed that ESBL-encoding plasmids and plasmids isolated before the use of 3GCs had the same diversity and phylogenetic history, and that acquisition of ESBL-encoding genes had occurred during multiple independent events. Moreover, the blaCTX-M-15 gene, unlike other CTX-M genes, was inserted at a hot spot in a blaTEM-1-Tn2 transposon. These findings showed that ESBL-encoding genes have arrived on a wide range of pre-existing plasmids and that the successful spread of blaCTX-M-15 seems to be favoured by the presence of well-adapted IncF plasmids that carry a Tn2-blaTEM-1 transposon.


Escherichia coli/genetics , Plasmids/genetics , beta-Lactamases/genetics , Animals , Anti-Bacterial Agents/therapeutic use , Cephalosporins/therapeutic use , Cluster Analysis , Escherichia coli/classification , Escherichia coli/enzymology , Escherichia coli/isolation & purification , Genes, Bacterial , Humans , Phylogeny , Plasmids/classification , Sequence Analysis, DNA
14.
Genetics ; 207(1): 229-240, 2017 09.
Article En | MEDLINE | ID: mdl-28679545

We investigate the dependence of the site frequency spectrum on the topological structure of genealogical trees. We show that basic population genetic statistics, for instance, estimators of θ or neutrality tests such as Tajima's D, can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima's D and Fay and Wu's H depend in a direct way on a peculiar measure of tree balance, which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu's H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulas for these extreme values as a function of sample size and number of segregating sites.


Models, Genetic , Mutation Rate , Phylogeny , Selection, Genetic
15.
J Antimicrob Chemother ; 72(5): 1285-1288, 2017 05 01.
Article En | MEDLINE | ID: mdl-28108681

Objectives: MRSA is a leading cause of hospital-associated infection. Acquired resistance is encoded by the mecA gene or its homologue mecC, but little is known about the evolutionary dynamics involved in gain and loss of resistance. The objective of this study was to obtain an expanded understanding of Staphylococcus aureus methicillin resistance microevolution in vivo, by focusing on a single lineage. Methods: We compared the whole-genome sequences of 231 isolates from a single epidemic lineage [clonal complex 30 (CC30) and spa-type t018] of S. aureus that caused an epidemic in the UK. Results: We show that resistance to methicillin in this single lineage was gained on at least two separate occasions, one of which led to a clonal expansion around 1995 presumably caused by a selective advantage. Resistance was, however, subsequently lost in vivo by nine strains isolated between 2008 and 2012. We describe the genetic mechanisms involved in this loss of resistance and the imperfect relationship between genotypic and phenotypic resistance. Conclusions: The recent re-emergence of methicillin susceptibility in this epidemic lineage suggests a significant fitness cost of resistance and reduced selective advantage following the introduction in the mid-2000s of MRSA hospital control measures throughout the UK.


Methicillin Resistance/genetics , Methicillin-Resistant Staphylococcus aureus/drug effects , Methicillin-Resistant Staphylococcus aureus/genetics , Methicillin/pharmacology , Cross Infection/epidemiology , Cross Infection/microbiology , DNA, Bacterial/genetics , Evolution, Molecular , Genetic Fitness , Genome, Bacterial , Genotype , Humans , Methicillin-Resistant Staphylococcus aureus/isolation & purification , Microbial Sensitivity Tests , Phenotype , Staphylococcal Infections/epidemiology , Staphylococcal Infections/microbiology , United Kingdom/epidemiology
16.
PLoS One ; 9(9): e108738, 2014.
Article En | MEDLINE | ID: mdl-25268639

The Escherichia coli species is divided into phylogenetic groups that differ in their virulence and commensal distribution. Strains belonging to the B2 group are involved in extra-intestinal pathologies but also appear to be more prevalent as commensals among human occidental populations. To investigate the genetic specificities of the B2 sub-group, we used 128 sequenced genomes and identified genes of the core genome that showed marked differences between B2 and non-B2 genomes. We focused on the gene with the strongest divergence between B2 and non-B2, the antiporter gene nhaA, and its surrounding region. This gene is part of the nhaAR operon, which is in the core genome but flanked by mobile regions, and is involved in growth at high pH and high sodium concentrations. Consistently, we found that a panel of non-B2 strains grew faster than B2 at high pH and high sodium concentrations. However, we could not identify differences in expression of the nhaAR operon using fluorescence reporter plasmids. Furthermore, the operon deletion had no differential impact between B2 and non-B2 strains, and did not result in a fitness modification in a murine model of gut colonization. Nevertheless, sequence analysis and experiments in a murine model of septicemia revealed that recombination in nhaA among B2 strains was observed in strains with low virulence. Finally, nhaA and nhaAR operon deletions drastically decreased virulence in one B2 strain. This effect of nhaAR deletion appeared to be stronger than deletion of all pathogenicity islands. Thus, a population genetic approach allowed us to identify an operon in the core genome without strong effect in commensalism but with an important role in extra-intestinal virulence, a landmark of the B2 strains.


DNA-Binding Proteins/genetics , Escherichia coli Proteins/genetics , Escherichia coli/pathogenicity , Sodium-Hydrogen Exchangers/genetics , Transcription Factors/genetics , Animals , DNA-Binding Proteins/classification , DNA-Binding Proteins/metabolism , Disease Models, Animal , Escherichia coli/growth & development , Escherichia coli Proteins/classification , Escherichia coli Proteins/metabolism , Female , Genome, Bacterial , Hydrogen-Ion Concentration , Mice , Operon , Osmolar Concentration , Phenotype , Phylogeny , Sepsis/microbiology , Sepsis/mortality , Sepsis/pathology , Sodium-Hydrogen Exchangers/classification , Sodium-Hydrogen Exchangers/metabolism , Survival Rate , Transcription Factors/classification , Transcription Factors/metabolism , Virulence
17.
Phys Rev E Stat Nonlin Soft Matter Phys ; 85(6 Pt 2): 066124, 2012 Jun.
Article En | MEDLINE | ID: mdl-23005179

Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.


Algorithms , Chromosome Mapping/methods , DNA/chemistry , DNA/genetics , Models, Genetic , Models, Statistical , Sequence Analysis, DNA/methods , Computer Simulation , Markov Chains , Structure-Activity Relationship
18.
Biol Direct ; 7: 30, 2012 Sep 13.
Article En | MEDLINE | ID: mdl-22974057

BACKGROUND: With the exception of the coding of non-canonical amino acids, the evolution and genomic frequencies of stop codons have not been rigorously studied. Here we study the rate of evolution and frequency distribution of stop codons in bacterial genomes. RESULTS: We show that in bacteria, stop codons evolve more slowly than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicated that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Nes < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content. For biologically plausible rates of mutations we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being optimal for G-content < 16%, while for G-content > 16% TGA has a higher fitness than TAG. CONCLUSIONS: Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications.
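The two quantities the abstract relates, stop codon frequencies and nucleotide content, can be tabulated per genome along the following lines. This is a minimal sketch that assumes the coding sequences are already available as in-frame DNA strings ending in their stop codon. The toy sequences and function names are illustrative, and no selection model is fitted.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def gc_content(sequence):
    """Fraction of G and C bases in a DNA string."""
    sequence = sequence.upper()
    return sum(base in "GC" for base in sequence) / len(sequence)


def stop_codon_frequencies(coding_sequences):
    """Relative usage of TAA, TAG and TGA as the terminal codon."""
    counts = Counter()
    for cds in coding_sequences:
        codon = cds[-3:].upper()
        if codon in STOP_CODONS:  # ignore sequences lacking a standard stop
            counts[codon] += 1
    total = sum(counts.values())
    return {codon: (counts[codon] / total if total else 0.0)
            for codon in STOP_CODONS}


# Toy coding sequences, for illustration only.
genes = ["ATGAAATAA", "ATGCCCTGA", "ATGGGGTAA", "ATGTTTTAG"]
print(stop_codon_frequencies(genes))
print(round(gc_content("".join(genes)), 3))
```

Repeating this over many genomes would give one (GC content, stop codon frequency) point per genome, the kind of data to which a model such as the one described could be fitted.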


Bacteria/genetics , Codon, Terminator/genetics , Evolution, Molecular , Gene Expression Regulation, Bacterial , Genome, Bacterial , Base Composition , Models, Genetic , Mutation
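
As an illustration of the quantities compared in entry 18 (stop codon frequencies against genomic GC-content), the following is a minimal sketch and not the authors' analytical model. The function names `gc_content` and `stop_codon_frequencies` are hypothetical, and the input is assumed to be a list of in-frame coding sequences given as plain strings.

```python
from collections import Counter

# The three standard stop codons discussed in the abstract.
STOP_CODONS = ("TAA", "TAG", "TGA")


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def stop_codon_frequencies(coding_seqs):
    """Relative frequency of each stop codon among sequences ending in one.

    Assumption: each sequence is in frame, so its last three bases are the
    terminating codon. Sequences not ending in a stop codon are ignored.
    """
    counts = Counter()
    for cds in coding_seqs:
        last = cds.upper()[-3:]
        if last in STOP_CODONS:
            counts[last] += 1
    total = sum(counts.values())
    return {codon: (counts[codon] / total if total else 0.0)
            for codon in STOP_CODONS}
```

Computing both values per genome and plotting one against the other would give the kind of frequency-versus-GC-content relationship the abstract describes; the fitness modelling itself is outside the scope of this sketch.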
19.
Genome Res ; 20(6): 745-54, 2010 Jun.
Article En | MEDLINE | ID: mdl-20335526

Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.


Amino Acids/genetics , Proteins/genetics , Repetitive Sequences, Amino Acid , Selection, Genetic , Amino Acid Sequence , Amino Acids/chemistry , Animals , Humans , Molecular Sequence Data , Proteins/chemistry , Sequence Homology, Amino Acid
20.
PLoS Genet ; 5(3): e1000397, 2009 Mar.
Article En | MEDLINE | ID: mdl-19266028

Single amino acid repeats are prevalent in eukaryotic organisms, although the role of many such sequences is still poorly understood. We have performed a comprehensive analysis of the proteins containing homopolymeric histidine tracts in the human genome and identified 86 human proteins that contain stretches of five or more histidines. Most of them are endowed with DNA- and RNA-related functions, and, in addition, there is an overrepresentation of proteins expressed in the brain and/or nervous system development. An analysis of their subcellular localization shows that 15 of the 22 nuclear proteins identified accumulate in the nuclear subcompartment known as nuclear speckles. This localization is lost when the histidine repeat is deleted, and significantly, closely related paralogous proteins without histidine repeats also fail to localize to nuclear speckles. Hence, the histidine tract appears to be directly involved in targeting proteins to this compartment. The removal of DNA-binding domains or treatment with RNA polymerase II inhibitors induces the re-localization of several polyhistidine-containing proteins from the nucleoplasm to nuclear speckles. These findings highlight the dynamic relationship between sites of transcription and nuclear speckles. Therefore, we define the histidine repeats as a novel targeting signal for nuclear speckles, and we suggest that these repeats are a way of generating evolutionary diversification in gene duplicates. These data contribute to a better understanding of the physiological role of single amino acid repeats in proteins.


Cell Nucleus/metabolism , Genome, Human , Histidine/chemistry , Nuclear Localization Signals , Proteins/metabolism , Amino Acids , Cell Line , Cell Nucleus/chemistry , Cell Nucleus/genetics , Histidine/genetics , Histidine/metabolism , Humans , Molecular Sequence Data , Nuclear Proteins/chemistry , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Protein Transport , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Tandem Repeat Sequences
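
The selection criterion stated in entry 20 (stretches of five or more histidines) can be sketched as a simple scan of a protein sequence. This is an illustrative assumption about how such tracts might be located, not the authors' pipeline; the function name `find_histidine_tracts` is hypothetical, and sequences are assumed to use one-letter amino acid codes.

```python
import re


def find_histidine_tracts(protein_seq, min_len=5):
    """Return (start, length) for each run of at least min_len histidines.

    start is a 0-based index into protein_seq; runs are maximal, so a run
    of six histidines is reported once with length 6.
    """
    # H{n,} matches n or more consecutive 'H' characters (greedy, maximal).
    pattern = re.compile("H{%d,}" % min_len)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(protein_seq.upper())]
```

For example, calling `find_histidine_tracts` on a sequence with one run of six histidines and one run of four reports only the first run, since the second falls below the five-residue threshold.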
...