Results 1 - 20 of 28
1.
medRxiv ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38903069

ABSTRACT

Whole-genome sequencing of bacterial pathogens is used by public health agencies to link cases of food poisoning caused by the same source of contamination. The vast majority of these appear to be sporadic cases associated with small contamination episodes and do not trigger investigations. We analyzed clusters of sequenced clinical isolates of Salmonella, Escherichia coli, Campylobacter, and Listeria that differ by only a small number of mutations to provide a new understanding of the underlying contamination episodes. These analyses provide new evidence that the youngest age groups have greater susceptibility to infection from Salmonella, Escherichia coli, and Campylobacter than older age groups. This age bias is weaker for the common Salmonella serovar Enteritidis than Salmonella in general. Analysis of these clusters reveals significant regional variations in relative frequencies of Salmonella serovars across the United States. A large fraction of the contamination episodes causing sickness appear to have long duration. For example, 50% of the Salmonella cases are in clusters that persist for almost three years. For all four pathogen species, the majority of the cases were part of genetic clusters with illnesses in multiple states and likely to be caused by contaminated commercially distributed foods. The vast majority of Salmonella cases among infants < 6 months of age appear to be caused by cross-contamination from foods consumed by older age groups or by environmental bacteria rather than infant formula contaminated at production sites.
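
The idea of grouping isolates that "differ by only a small number of mutations" can be illustrated with single-linkage clustering. This is a minimal sketch only, not the method of the study: the function names, the use of Hamming distance on pre-aligned equal-length sequences, and the `max_diff` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count differing positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(seqs: dict[str, str], max_diff: int) -> list[set[str]]:
    """Group isolates so that any two linked by <= max_diff differences share a cluster.

    `max_diff` is a hypothetical threshold chosen by the user.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if hamming(seqs[a], seqs[b]) <= max_diff:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Because linkage is transitive, two isolates can share a cluster even when they differ from each other by more than `max_diff`, provided a chain of closer isolates connects them.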

2.
mBio ; 15(1): e0264923, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38078770

ABSTRACT

IMPORTANCE: For decades, researchers have studied the rapid evolution of influenza A viruses for vaccine design and as a useful model system for the study of host/parasite evolution. By performing an exhaustive analysis of hemagglutinin protein (HA) sequences from 49 lineages independently evolving in birds, swine, canines, equines, and humans over the last century, our work uncovers surprising features of HA evolution. In particular, the canine H3 stalk, unlike human H3 and H1 stalk domains, is not evolving slowly, suggesting that evolution in the stalk domain is not universally constrained across all host species. Therefore, a broader multi-host perspective on HA evolution may be useful during the evaluation and design of stalk-targeted vaccine candidates.


Subject(s)
Influenza A virus , Influenza Vaccines , Influenza, Human , Orthomyxoviridae Infections , Vaccines , Animals , Dogs , Humans , Swine , Horses , Influenza A virus/genetics , Hemagglutinin Glycoproteins, Influenza Virus , Hemagglutinins , Host Specificity , Antibodies, Viral
3.
PLoS Comput Biol ; 19(8): e1011419, 2023 08.
Article in English | MEDLINE | ID: mdl-37639445

ABSTRACT

Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck: integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.


Subject(s)
Influenza A Virus, H1N1 Subtype , Bayes Theorem , Influenza A Virus, H1N1 Subtype/genetics , Phylogeny , Flowers , Glycosylation
4.
Genome Biol Evol ; 15(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37216188

ABSTRACT

The rate of mutation varies among positions in a genome. Local sequence context can affect the rate and has different effects on different types of mutation. Here, I report an effect of local context that operates to some extent in all bacteria examined: the rate of T→G mutation is greatly increased by preceding runs of three or more G residues. The strength of the effect increases with the length of the run. In Salmonella, in which the effect is strongest, a G run of length 3 increases the rate by a factor of ∼26, a run of length 4 increases it by almost a factor of 100, and runs of length 5 or more increase it by a factor of more than 400 on average. The effect is much stronger when the T is on the leading rather than the lagging strand of DNA replication. Several observations eliminate the possibility that this effect is an artifact of sequencing error.
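
The sequence context described above (a T preceded by a run of G residues) can be tabulated with a short script. This is a hedged sketch rather than the author's pipeline: the function names and the `cap` parameter are hypothetical, only the given strand is scanned, and estimating rates would additionally require mutation calls that are not modeled here.

```python
def g_run_length_before(seq: str, pos: int) -> int:
    """Return the number of consecutive G residues immediately 5' of seq[pos]."""
    run = 0
    i = pos - 1
    while i >= 0 and seq[i] == "G":
        run += 1
        i -= 1
    return run


def t_sites_by_g_run(seq: str, cap: int = 5) -> dict[int, list[int]]:
    """Group 0-based positions of T by preceding G-run length.

    Runs longer than `cap` are pooled into the `cap` bin, mirroring the
    "length 5 or more" category in the abstract.
    """
    seq = seq.upper()
    groups: dict[int, list[int]] = {}
    for pos, base in enumerate(seq):
        if base == "T":
            run = min(g_run_length_before(seq, pos), cap)
            groups.setdefault(run, []).append(pos)
    return groups
```

Dividing the number of observed T→G changes in each bin by the number of T sites in that bin would give a per-context rate that can be compared with the run-length-0 baseline.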


Subject(s)
Bacteria , DNA Replication , Mutation , Bacteria/genetics
5.
Appl Environ Microbiol ; 89(1): e0167022, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36519847

ABSTRACT

Metagenomic sequencing is a swift and powerful tool to ascertain the presence of an organism of interest in a sample. However, sequencing coverage of the organism of interest can be insufficient due to an inundation of reads from irrelevant organisms in the sample. Here, we report a nuclease-based approach to rapidly enrich for DNA from certain organisms, including enterobacteria, based on their differential endogenous modification patterns. We exploit the ability of taxon-specific methylated motifs to resist the action of cognate methylation-sensitive restriction endonucleases that thereby digest unwanted, unmethylated DNA. Subsequently, we use a distributive exonuclease or electrophoretic separation to deplete or exclude the digested fragments, thus enriching for undigested DNA from the organism of interest. As a proof of concept, we apply this method to enrich for the enterobacteria Escherichia coli and Salmonella enterica by 11- to 142-fold from mock metagenomic samples and validate this approach as a versatile means to enrich for genomes of interest in metagenomic samples. IMPORTANCE Pathogens that contaminate the food supply or spread through other means can cause outbreaks that bring devastating repercussions to the health of a populace. Investigations to trace the source of these outbreaks are initiated rapidly but can be drawn out due to the labored methods of pathogen isolation. Metagenomic sequencing can alleviate this hurdle but is often insufficiently sensitive. The approach and implementations detailed here provide a rapid means to enrich for many pathogens involved in foodborne outbreaks, thereby improving the utility of metagenomic sequencing as a tool in outbreak investigations. Additionally, this approach provides a means to broadly enrich for otherwise minute levels of modified DNA, which may escape unnoticed in metagenomic samples.


Subject(s)
DNA Restriction Enzymes , DNA, Bacterial , Escherichia coli , Metagenomics , Salmonella enterica , DNA , Escherichia coli/genetics , Escherichia coli/isolation & purification , High-Throughput Nucleotide Sequencing , Metagenome , Metagenomics/methods , Salmonella enterica/genetics , Salmonella enterica/isolation & purification , DNA, Bacterial/genetics
6.
Viruses ; 14(7)2022 07 15.
Article in English | MEDLINE | ID: mdl-35891531

ABSTRACT

Four seasonal human coronaviruses (sHCoVs) are endemic globally (229E, NL63, OC43, and HKU1), accounting for 5-30% of human respiratory infections. However, the epidemiology and evolution of these CoVs remain understudied due to their association with mild symptomatology. Using a multigene and complete genome analysis approach, we find the evolutionary histories of sHCoVs to be highly complex, owing to frequent recombination of CoVs including within and between sHCoVs, and uncertain, due to the undersampling of non-human viruses. The recombination rate was highest for 229E and OC43 whereas substitutions per recombination event were highest in NL63 and HKU1. Depending on the gene studied, OC43 may have ungulate, canine, or rabbit CoV ancestors. 229E may have origins in a bat, camel, or an unsampled intermediate host. HKU1 had the earliest common ancestor (1809-1899) but fell into two distinct clades (genotypes A and B), possibly representing two independent transmission events from murine-origin CoVs that appear to be a single introduction due to large gaps in the sampling of CoVs in animals. In fact, genotype B was genetically more diverse than all the other sHCoVs. Finally, we found shared amino acid substitutions in multiple proteins along the non-human to sHCoV host-jump branches. The complex evolution of CoVs and their frequent host switches could benefit from continued surveillance of CoVs across non-human hosts.


Subject(s)
Coronavirus Infections , Coronavirus , Respiratory Tract Infections , Animals , Coronavirus/genetics , Coronavirus Infections/epidemiology , Coronavirus Infections/veterinary , Dogs , Humans , Mice , Rabbits , Seasons , Sequence Analysis, DNA
7.
Microbiol Spectr ; 10(3): e0050122, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35467376

ABSTRACT

Enterohemorrhagic E. coli (EHEC) is responsible for significant human illness, death, and economic loss. The main reservoir for EHEC is cattle, but plant-based foods are common vectors for human infection. Several outbreaks have been attributed to lettuce and leafy green vegetables grown in the Salinas and Santa Maria regions of California. Bacteria causing different outbreaks are mostly not close relatives, but one group of closely-related O157:H7 has caused several of them. This unusual pattern of recurrence may have some genetic basis. Here I use whole-genome sequences to reconstruct the genetic changes that occurred in the recent ancestry of this EHEC. In a short period of time corresponding to little genetic change, there were several changes to adhesion-related sequences, mainly adhesins. These changes may have greatly altered the adhesive properties of the bacteria. Possible consequences include increased persistence of cattle infections, more bacteria shed in cattle feces, and greater virulence in humans. Similar constellations of genetic change, which are detectable by current sequencing-based surveillance, may identify other bacteria that are particular threats to human health. In addition, the Santa Maria subclade carries a nonsense mutation affecting ArsR, a repressor of genes that confer resistance to arsenic and antimony. This suggests that the persistent source of Santa Maria contamination is located in an area with arsenic-contaminated groundwater, a problem in many parts of California. This inference may aid identification of the reservoir of EHEC, which would greatly aid mitigation efforts. IMPORTANCE Food-borne bacterial infections cause substantial illness and death. Understanding how bacteria contaminate food and cause disease is important for combating the problem. Closely-related E. coli, likely originating in cattle, have repeatedly caused outbreaks spread by vegetables grown in California. Such recurrence is atypical, and might have a genetic basis. The genetic changes that occurred in the recent ancestry of these E. coli can be reconstructed from their DNA sequences. Several mutations affect genes involved in bacterial adhesion. These might affect persistence of infection in cattle, quantity of bacteria in their feces, and human disease. They also suggest a way of detecting dangerous bacteria from their genome sequences. Furthermore, a subgroup carries a mutation affecting the regulation of genes conferring arsenic resistance. This suggests that the reservoir for contamination utilizes groundwater contaminated with arsenic, a problem in parts of California. This observation may be an aid to locating the persistent reservoir of contamination.


Subject(s)
Arsenic , Enterohemorrhagic Escherichia coli , Escherichia coli Infections , Escherichia coli O157 , Animals , Cattle , Disease Outbreaks , Enterohemorrhagic Escherichia coli/genetics , Escherichia coli Infections/epidemiology , Escherichia coli Infections/microbiology , Escherichia coli Infections/veterinary , Escherichia coli O157/genetics , Lactuca/microbiology
8.
mBio ; 12(2)2021 04 13.
Article in English | MEDLINE | ID: mdl-33849975

ABSTRACT

Methylation of cytosine in DNA at position C5 increases the rate of C→T mutations in bacteria and eukaryotes. Methylation at the N4 position, employed by some restriction-modification systems, is not known to increase the mutation rate. Here, I report that a Salmonella enterica Type III restriction-modification system that includes a cytosine-N4 methyltransferase causes an enormous increase in the rate of mutation of the methylated cytosines, which occur at the overlined C in the motif CACC̅GT. Mutations consist mainly of C→A transversions, the rate of which is increased ∼500-fold by the restriction-modification system. The rate of C→T transitions is also increased and somewhat exceeds that at C5-methylated cytosines in Dcm sites. Two other Salmonella N4 methyltransferases investigated do not have such dramatic effects, although in one case there is a modest increase in C→A mutations along with an increase in C→T mutations. The sensitivity of the C→A rate to orientation with respect to both DNA replication and transcription is higher at hypermutable sites than at other cytosines, suggesting a fundamental mechanistic difference between hypermutation and ordinary mutation. IMPORTANCE Mutation produces the raw material for adaptive evolution but also imposes a burden because most mutations are deleterious. The rate of mutation at a particular site is affected by a variety of factors. In both prokaryotes and eukaryotes, methylation of C at the C5 position, a naturally occurring DNA modification, greatly increases the rate of C→T mutation. A distinct C modification that occurs in prokaryotes, methylation at N4, is not known to increase mutation rate. Here, I report that a bacterial restriction-modification system, found in some Salmonella bacteria, increases the rate of C→A mutation by a factor of 500 at sites that it methylates at N4. This rate increase is much greater than that caused by C5 methylation. Although fewer than 1 in 1,600 positions analyzed are methylation sites, over 10% of all mutations occur at these sites. Like other examples of extremely high mutation rate, whether naturally occurring or the result of laboratory mutation, this phenomenon may shed light on the mechanism of mutation in general.


Subject(s)
Cytosine/metabolism , DNA Methylation , Methyltransferases/metabolism , Mutation , Salmonella enterica/genetics , Base Sequence , Salmonella enterica/enzymology , Salmonella enterica/metabolism , Substrate Specificity
9.
Genome Biol Evol ; 12(3): 18-34, 2020 03 01.
Article in English | MEDLINE | ID: mdl-32044996

ABSTRACT

Bacterial genes are sometimes found to be inactivated by mutation. This inactivation may be observable simply because selection for function is intermittent or too weak to eliminate inactive alleles quickly. Here, I investigate cases in Salmonella enterica where inactivation is instead positively selected. These are identified by a rate of introduction of premature stop codons to a gene that is higher than expected under selective neutrality, as assessed by comparison to the rate of synonymous changes. I identify 84 genes that meet this criterion at a 10% false discovery rate. Many of these genes are involved in virulence, motility and chemotaxis, biofilm formation, and resistance to antibiotics or other toxic substances. It is hypothesized that most of these genes are subject to an ongoing process in which inactivation is favored under rare conditions, but the inactivated allele is deleterious under most other conditions and is subsequently driven to extinction by purifying selection.


Subject(s)
Genes, Bacterial , Mutation , Salmonella enterica/genetics , Selection, Genetic , Artifacts , Bacterial Proteins/genetics , Codon, Terminator , DNA Methylation , Evolution, Molecular , Ligases/genetics , Membrane Proteins/genetics , Membrane Transport Proteins/genetics , Phosphoric Diester Hydrolases/genetics , Polysaccharides, Bacterial/biosynthesis , Salmonella enterica/pathogenicity , Sigma Factor/genetics , Transcription Factors/metabolism , Virulence Factors/genetics
10.
J Bacteriol ; 200(24)2018 12 15.
Article in English | MEDLINE | ID: mdl-30275280

ABSTRACT

Methylation of DNA at the C-5 position of cytosine occurs in diverse organisms. This modification can increase the rate of C→T transitions at the methylated position. In Escherichia coli and related enteric bacteria, the inner C residues of the sequence CCWGG (W is A or T) are methylated by the Dcm enzyme. These sites are hot spots of mutation during rapid growth in the laboratory but not in nondividing cells, in which repair by the Vsr protein is effective. It has been suggested that hypermutation at these sites is a laboratory artifact and does not occur in nature. Many other methyltransferases, with a variety of specificities, can be found in bacteria, usually associated with restriction enzymes and confined to a subset of the population. Their methylation targets are also possible sites of hypermutation. Here, I show using whole-genome sequence data for thousands of isolates that there is indeed considerable hypermutation at Dcm sites in natural populations: their transition rate is approximately eight times the average. I also demonstrate hypermutability of targets of restriction-associated methyltransferases in several distantly related bacteria: methylation increases the transition rate by a factor ranging from 12 to 58. In addition, I demonstrate how patterns of hypermutability inferred from massive sequence data can be used to determine previously unknown methylation patterns and methyltransferase specificities. IMPORTANCE A common type of DNA modification, addition of a methyl group to cytosine (C) at carbon atom C-5, can greatly increase the rate of mutation of the C to a T. In mammals, methylation of CG sequences increases the rate of CG→TG mutations. It is unknown whether cytosine C-5 methylation increases the mutation rate in bacteria under natural conditions. I show that sites methylated by the Dcm enzyme exhibit an 8-fold increase in mutation rate in natural bacterial populations. I also show that modifications at other sites in various bacteria also increase the mutation rate, in some cases by a factor of forty or more. Finally, I demonstrate how this phenomenon can be used to infer sequence specificities of methylation enzymes.
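
As an illustration of the Dcm target described above, the following sketch locates CCWGG sites and reports the position of the inner C on each strand. It is an assumption-laden example, not code from the paper: the function name is hypothetical, coordinates are 0-based on the given strand, and the bottom-strand C is reported as the position of the G it pairs with.

```python
import re

# CCWGG, where W is A or T. The motif is its own reverse complement,
# so every site has one inner C on each strand.
DCM_MOTIF = re.compile(r"CC[AT]GG")


def dcm_inner_c_positions(seq: str) -> list[tuple[int, int]]:
    """Return (top_strand_C, bottom_strand_C_as_top_coordinate) for each CCWGG site."""
    seq = seq.upper()
    sites = []
    for match in DCM_MOTIF.finditer(seq):
        start = match.start()
        # Inner C on the given strand is the second base of the motif;
        # the inner C on the complementary strand pairs with the fourth base.
        sites.append((start + 1, start + 3))
    return sites
```

Comparing C→T (or, on the opposite strand, G→A) changes at these positions with changes at other cytosines is one way to frame the kind of rate ratio the abstract reports.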


Subject(s)
Bacteria/growth & development , DNA Methylation , DNA, Bacterial/chemistry , Whole Genome Sequencing/methods , Bacteria/genetics , Binding Sites , Cytosine , DNA, Bacterial/genetics , Genome, Bacterial , Methyltransferases/metabolism , Mutation , Promoter Regions, Genetic
11.
BMC Bioinformatics ; 18(1): 127, 2017 Feb 23.
Article in English | MEDLINE | ID: mdl-28231758

ABSTRACT

BACKGROUND: Maximum compatibility is a method of phylogenetic reconstruction that is seldom applied to molecular sequences. It may be ideal for certain applications, such as reconstructing phylogenies of closely-related bacteria on the basis of whole-genome sequencing. RESULTS: Here I present an algorithm that rapidly computes phylogenies according to a compatibility criterion. Although based on solutions to the maximum clique problem, this algorithm deals properly with ambiguities in the data. The algorithm is applied to bacterial data sets containing up to nearly 2000 genomes with several thousand variable nucleotide sites. Run times are several seconds or less. Computational experiments show that maximum compatibility is less sensitive than maximum parsimony to the inclusion of nucleotide data that, though derived from actual sequence reads, has been identified as likely to be misleading. CONCLUSIONS: Maximum compatibility is a useful tool for certain phylogenetic problems, such as inferring the relationships among closely-related bacteria from whole-genome sequence data. The algorithm presented here rapidly solves fairly large problems of this type, and provides robustness against misleading characters that can pollute large-scale sequencing data.
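
To make the compatibility criterion concrete, here is a toy sketch for binary characters. It is not the algorithm presented in the paper: it uses the standard four-gamete test for pairwise compatibility, a brute-force search in place of a real maximum-clique solver, and a simplistic treatment of ambiguity (taxa with `?` in either character are skipped), all of which are assumptions made for illustration.

```python
from itertools import combinations


def compatible(a: str, b: str) -> bool:
    """Four-gamete test for two binary characters ('0'/'1'; '?' marks ambiguity).

    Two binary characters fit on a common tree iff at most three of the
    four state combinations 00, 01, 10, 11 are observed.
    """
    seen = {(x, y) for x, y in zip(a, b) if x != "?" and y != "?"}
    return len(seen) < 4


def largest_compatible_set(chars: list[str]) -> tuple[int, ...]:
    """Return indices of a largest pairwise-compatible subset (tiny inputs only)."""
    n = len(chars)
    ok = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if compatible(chars[i], chars[j])
    }
    # Exhaustive search from the largest subset size downward.
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(pair in ok for pair in combinations(subset, 2)):
                return subset
    return ()
```

For binary characters without ambiguity, pairwise compatibility implies that the whole set fits on a single tree, which is why the problem reduces to finding a maximum clique in the compatibility graph. The exhaustive search here is exponential and only suitable for a handful of characters.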


Subject(s)
Algorithms , Evolution, Molecular , Genome, Bacterial , Phylogeny , Salmonella enterica/classification , Salmonella enterica/genetics , Sequence Analysis, DNA , Software
13.
Genome Biol Evol ; 5(3): 494-503, 2013.
Article in English | MEDLINE | ID: mdl-23436005

ABSTRACT

The sequences of different proteins evolve at different rates. The relative evolutionary rate (ER) of a single protein also changes over evolutionary time. The cause of this ER fluctuation remains uncertain, and study of this phenomenon may shed light on protein evolution more broadly. We have characterized ER fluctuation in mammals and Drosophila. We found little correlation between the amount of rate variation observed for a protein and such factors as its expression level or phylogenetic distribution. Perhaps more surprisingly, we found little correlation between our measure of rate variation and ER itself. We also investigated the extent to which the ERs of different domains of a protein vary independently. We found that rates of different domains do tend to vary together. In fact, rates at positions in different domains are coupled just as strongly as rates at equally distant positions in the same domain. These findings provide clues to the protein evolutionary process.


Subject(s)
Drosophila Proteins/genetics , Drosophila/genetics , Evolution, Molecular , Mammals/genetics , Proteins/genetics , Animals , Drosophila/classification , Humans , Macaca mulatta , Mammals/classification , Mice , Molecular Sequence Data , Mutation Rate , Phylogeny , Rats
14.
J Virol ; 87(3): 1400-10, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23115287

ABSTRACT

Individuals ≥60 years of age had the lowest incidence of infection, with ~25% of these people having preexisting, cross-reactive antibodies to novel 2009 H1N1 influenza. Many people <60 years old also had preexisting antibodies to novel H1N1. These observations are puzzling because the seasonal H1N1 viruses circulating during the last 60 years were not antigenically similar to novel H1N1. We therefore hypothesized that a sequence of exposures to antigenically different seasonal H1N1 viruses can elicit an antibody response that protects against novel 2009 H1N1. Ferrets were preinfected with seasonal H1N1 viruses and assessed for cross-reactive antibodies to novel H1N1. Serum from infected ferrets was assayed for cross-reactivity to both seasonal and novel 2009 H1N1 strains. These results were compared to those of ferrets that were sequentially infected with H1N1 viruses isolated prior to 1957 or more-recently isolated viruses. Following seroconversion, ferrets were challenged with novel H1N1 influenza virus and assessed for viral titers in the nasal wash, morbidity, and mortality. There was no hemagglutination inhibition (HAI) cross-reactivity in ferrets infected with any single seasonal H1N1 influenza virus, with limited protection to challenge. However, sequential H1N1 influenza infections reduced the incidence of disease and elicited cross-reactive antibodies to novel H1N1 isolates. The amount and duration of virus shedding and the frequency of transmission following novel H1N1 challenge were reduced. Exposure to multiple seasonal H1N1 influenza viruses, and not to any single H1N1 influenza virus, elicits a breadth of antibodies that neutralize novel H1N1 even though the host was never exposed to the novel H1N1 influenza viruses.


Subject(s)
Influenza A Virus, H1N1 Subtype/immunology , Orthomyxoviridae Infections/immunology , Orthomyxoviridae Infections/virology , Animals , Antibodies, Viral/blood , Cross Reactions , Disease Models, Animal , Ferrets , Hemagglutination Inhibition Tests , Nasal Cavity/virology , Orthomyxoviridae Infections/mortality , Orthomyxoviridae Infections/pathology , Survival Analysis , Viral Load , Virus Shedding
15.
PLoS One ; 7(7): e39435, 2012.
Article in English | MEDLINE | ID: mdl-22815705

ABSTRACT

BACKGROUND: During the 2009 influenza pandemic, individuals over the age of 60 had the lowest incidence of infection with approximately 25% of these people having pre-existing, cross-reactive antibodies to novel 2009 H1N1 influenza isolates. It was proposed that older people had pre-existing antibodies induced by previous 1918-like virus infection(s) that cross-reacted to novel H1N1 strains. METHODOLOGY/PRINCIPAL FINDINGS: Using antisera collected from a cohort of individuals collected before the second wave of novel H1N1 infections, only a minority of individuals with 1918 influenza specific antibodies also demonstrated hemagglutination-inhibition activity against the novel H1N1 influenza. In this study, we examined human antisera collected from individuals that ranged between the ages of 1 month and 90 years to determine the profile of seropositive influenza immunity to viruses representing H1N1 antigenic eras over the past 100 years. Even though HAI titers to novel 2009 H1N1 and the 1918 H1N1 influenza viruses were positively associated, the association was far from perfect, particularly for the older and younger age groups. CONCLUSIONS/SIGNIFICANCE: Therefore, there may be a complex set of immune responses that are retained in people infected with seasonal H1N1 that can contribute to the reduced rates of H1N1 influenza infection in older populations.


Subject(s)
Antibodies, Viral/immunology , Immune Sera/immunology , Influenza A Virus, H1N1 Subtype/immunology , Cross Reactions , Hemagglutinin Glycoproteins, Influenza Virus/immunology , Humans , Species Specificity , Viral Vaccines/immunology
16.
PLoS Curr ; 2: RRN1200, 2010 Dec 03.
Article in English | MEDLINE | ID: mdl-21152078

ABSTRACT

Severity of seasonal influenza A epidemics is related to the antigenic novelty of the predominant viral strains circulating each year. Support for a strong correlation between epidemic severity and antigenic drift comes from infectious challenge experiments on vaccinated animals and human volunteers, field studies of vaccine efficacy, prospective studies of subjects with laboratory-confirmed prior infections, and analysis of the connection between drift and severity from surveillance data. We show that, given data on the antigenic and sequence novelty of the hemagglutinin protein of clinical isolates of H3N2 virus from a season along with the corresponding data from prior seasons, we can accurately predict the influenza severity for that season. This model therefore provides a framework for making projections of the severity of the upcoming season using assumptions based on viral isolates collected in the current season. Our results based on two independent data sets from the US and Hong Kong suggest that seasonal severity is largely determined by the novelty of the hemagglutinin protein although other factors, including mutations in other influenza genes, co-circulating pathogens and weather conditions, might also play a role. These results should be helpful for the control of seasonal influenza and have implications for improvement of influenza surveillance.

17.
Genome Biol Evol ; 2: 757-69, 2010.
Article in English | MEDLINE | ID: mdl-20884723

ABSTRACT

There is great variation in the rates of sequence evolution among proteins encoded by the same genome. The strongest correlate of evolutionary rate is expression level: highly expressed proteins tend to evolve slowly. This observation has led to the proposal that a major determinant of protein evolutionary rate involves the toxic effects of protein that misfolds due to transcriptional and translational errors (the mistranslation-induced misfolding [MIM] hypothesis). Here, I present a model that explains the correlation of evolutionary rate and expression level by selection for function. The basis of this model is that selection keeps expression levels near optima that reflect a trade-off between beneficial effects of the protein's function and some nonspecific cost of expression (e.g., the biochemical cost of synthesizing protein). Simulations confirm the predictions of the model. Like the MIM hypothesis, this model predicts several other relationships that are observed empirically. Although the model is based on selection for protein function, it is consistent with findings that a protein's rate of evolution is at most weakly correlated with its importance for fitness as measured by gene knockout experiments.


Subject(s)
Evolution, Molecular , Gene Expression , Computer Simulation , Genome , Models, Biological , Mutation , Proteins/genetics , Proteins/metabolism , Selection, Genetic , Statistics as Topic , Time Factors
18.
Mol Biol Evol ; 27(3): 735-41, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19910385

ABSTRACT

The sequences of proteins encoded by a genome evolve at different rates. A correlate of a protein's evolutionary rate is its expression level: highly expressed proteins tend to evolve slowly. Some explanations of rate variation and the correlation between rate and expression predict that more slowly evolving and more highly expressed proteins have more favorable equilibrium constants for folding. Proteins from thermophiles generally have more stable folds than proteins from mesophiles, and it is known that there are systematic differences in amino acid content between thermophilic and mesophilic proteins. I examined whether there are analogous correlations of amino acid frequencies with evolutionary rate and expression level within genomes. In most of the organisms analyzed, there is a striking tendency for more slowly evolving proteins to be more thermophile-like in their amino acid compositions when adjustments are made for variation in GC content. More highly expressed proteins also tend to be more thermophile-like by the same criteria. These results suggest that part of the evolutionary rate variation among proteins is due to variation in the strength of selection for stability of the folded state. They also suggest that increasing strength of this selective force with expression level plays a role in the correlation between evolutionary rate and expression level.


Subject(s)
Evolution, Molecular , Proteins/genetics , Amino Acids/chemistry , Amino Acids/genetics , Animals , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Base Composition , Fungal Proteins/chemistry , Fungal Proteins/genetics , Hot Temperature , Humans , Normal Distribution , Proteins/chemistry , Regression Analysis , Statistics, Nonparametric
19.
PLoS Curr ; 1: RRN1001, 2009 Aug 18.
Article in English | MEDLINE | ID: mdl-20025194

ABSTRACT

The hemagglutinin protein of influenza virus bears several sites of N-linked asparagine glycosylation. The number and location of these sites varies with strain and substrain. The human H3 hemagglutinin has gained several glycosylation sites on the antigenically important globular head since its introduction to humans, presumably due to selection. Although there is abundant evidence that glycosylation can affect antigenic and functional properties of the protein, direct evidence for selection is lacking. We have analyzed gain and loss of glycosylation sites on the side branches of a large phylogenetic tree of H3 HA1 sequences (branches off of the main, long-term line of descent). Side branches contrast with the main line of descent: losses of glycosylation sites are not uncommon, and they outnumber gains. Although other explanations are possible, this observation is consistent with weak selection for glycosylation sites or a more complicated pattern of selection. Furthermore, terminal and internal branches differ with respect to rates of gain and loss of glycosylation sites. This pattern would not be expected under selective neutrality, but is easily explained by weak selection or selection that changes with the immune state of the host population. Thus, it provides evidence that selection acts on the glycosylation state of hemagglutinin.
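
Counting gains and losses of potential N-linked glycosylation sites along a branch can be sketched as follows. This is an illustrative example, not the authors' procedure: it assumes the conventional sequon definition N-X-S/T with X not proline, assumes parent and child sequences are already aligned and of equal length, and uses hypothetical function names.

```python
import re

# Conventional N-glycosylation sequon: N, any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNTT") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list[int]:
    """Return 0-based positions of the asparagine in each potential sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


def gains_and_losses(parent: str, child: str) -> tuple[list[int], list[int]]:
    """Compare aligned parent and child sequences along one tree branch.

    Returns (gained_positions, lost_positions).
    """
    before = set(sequon_positions(parent))
    after = set(sequon_positions(child))
    return sorted(after - before), sorted(before - after)
```

Applying `gains_and_losses` to each parent-child pair of a reconstructed tree, and tallying the results separately for terminal and internal branches, would give the kind of counts the abstract compares.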

20.
Genome Res ; 19(7): 1316-23, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19498102

ABSTRACT

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.


Subject(s)
Consensus Sequence , Genome , Open Reading Frames/genetics , Animals , Humans , Mice , Sequence Alignment