Results 1 - 20 of 27
1.
mBio ; 15(1): e0264923, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38078770

ABSTRACT

IMPORTANCE: For decades, researchers have studied the rapid evolution of influenza A viruses for vaccine design and as a useful model system for the study of host/parasite evolution. By performing an exhaustive analysis of hemagglutinin protein (HA) sequences from 49 lineages independently evolving in birds, swine, canines, equines, and humans over the last century, our work uncovers surprising features of HA evolution. In particular, the canine H3 stalk, unlike human H3 and H1 stalk domains, is not evolving slowly, suggesting that evolution in the stalk domain is not universally constrained across all host species. Therefore, a broader multi-host perspective on HA evolution may be useful during the evaluation and design of stalk-targeted vaccine candidates.


Subjects
Influenza A virus , Influenza Vaccines , Influenza, Human , Orthomyxoviridae Infections , Vaccines , Animals , Dogs , Humans , Swine , Horses , Influenza A virus/genetics , Hemagglutinin Glycoproteins, Influenza Virus , Hemagglutinins , Host Specificity , Antibodies, Viral
2.
PLoS Comput Biol ; 19(8): e1011419, 2023 08.
Article in English | MEDLINE | ID: mdl-37639445

ABSTRACT

Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck: integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.


Subjects
Influenza A Virus, H1N1 Subtype , Bayes Theorem , Influenza A Virus, H1N1 Subtype/genetics , Phylogeny , Flowers , Glycosylation
3.
Genome Biol Evol ; 15(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37216188

ABSTRACT

The rate of mutation varies among positions in a genome. Local sequence context can affect the rate and has different effects on different types of mutation. Here, I report an effect of local context that operates to some extent in all bacteria examined: the rate of T→G mutation is greatly increased by preceding runs of three or more G residues. The strength of the effect increases with the length of the run. In Salmonella, in which the effect is strongest, a G run of length 3 increases the rate by a factor of ∼26, a run of length 4 increases it by almost a factor of 100, and runs of length 5 or more increase it by a factor of more than 400 on average. The effect is much stronger when the T is on the leading rather than the lagging strand of DNA replication. Several observations eliminate the possibility that this effect is an artifact of sequencing error.


Subjects
Bacteria , DNA Replication , Mutation , Bacteria/genetics
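The sequence context described in the abstract above (a T immediately preceded by a run of three or more G residues) can be located with a short scan. The following is only an illustrative sketch, not the author's method: the function name `t_sites_after_g_runs` and the example sequence are hypothetical, and only the strand as given is scanned, so leading versus lagging strand orientation is not considered.

```python
import re

def t_sites_after_g_runs(seq, min_run=3):
    """Return (position, run_length) for each T that directly follows
    a run of at least `min_run` G residues (0-based position of the T)."""
    seq = seq.upper()
    sites = []
    # The lookbehind stops a match from starting inside a G run, and G+ is
    # greedy, so each match is a maximal G run followed directly by a T.
    for m in re.finditer(r"(?<!G)(G+)T", seq):
        run_len = len(m.group(1))
        if run_len >= min_run:
            sites.append((m.end() - 1, run_len))
    return sites

# Hypothetical example sequence with G runs of length 3, 2 and 5 before a T.
print(t_sites_after_g_runs("ACGGGTAGGTCGGGGGTA"))  # [(5, 3), (16, 5)]
```

Grouping the returned sites by run length would be one way to tabulate candidate positions before comparing mutation counts between contexts.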
4.
Appl Environ Microbiol ; 89(1): e0167022, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36519847

ABSTRACT

Metagenomic sequencing is a swift and powerful tool to ascertain the presence of an organism of interest in a sample. However, sequencing coverage of the organism of interest can be insufficient due to an inundation of reads from irrelevant organisms in the sample. Here, we report a nuclease-based approach to rapidly enrich for DNA from certain organisms, including enterobacteria, based on their differential endogenous modification patterns. We exploit the ability of taxon-specific methylated motifs to resist the action of cognate methylation-sensitive restriction endonucleases that thereby digest unwanted, unmethylated DNA. Subsequently, we use a distributive exonuclease or electrophoretic separation to deplete or exclude the digested fragments, thus enriching for undigested DNA from the organism of interest. As a proof of concept, we apply this method to enrich for the enterobacteria Escherichia coli and Salmonella enterica by 11- to 142-fold from mock metagenomic samples and validate this approach as a versatile means to enrich for genomes of interest in metagenomic samples. IMPORTANCE Pathogens that contaminate the food supply or spread through other means can cause outbreaks that bring devastating repercussions to the health of a populace. Investigations to trace the source of these outbreaks are initiated rapidly but can be drawn out due to the labored methods of pathogen isolation. Metagenomic sequencing can alleviate this hurdle but is often insufficiently sensitive. The approach and implementations detailed here provide a rapid means to enrich for many pathogens involved in foodborne outbreaks, thereby improving the utility of metagenomic sequencing as a tool in outbreak investigations. Additionally, this approach provides a means to broadly enrich for otherwise minute levels of modified DNA, which may escape unnoticed in metagenomic samples.


Subjects
DNA Restriction Enzymes , DNA, Bacterial , Escherichia coli , Metagenomics , Salmonella enterica , DNA , Escherichia coli/genetics , Escherichia coli/isolation & purification , High-Throughput Nucleotide Sequencing , Metagenome , Metagenomics/methods , Salmonella enterica/genetics , Salmonella enterica/isolation & purification , DNA, Bacterial/genetics
5.
Viruses ; 14(7)2022 07 15.
Article in English | MEDLINE | ID: mdl-35891531

ABSTRACT

Four seasonal human coronaviruses (sHCoVs) are endemic globally (229E, NL63, OC43, and HKU1), accounting for 5-30% of human respiratory infections. However, the epidemiology and evolution of these CoVs remain understudied due to their association with mild symptomatology. Using a multigene and complete genome analysis approach, we find the evolutionary histories of sHCoVs to be highly complex, owing to frequent recombination of CoVs including within and between sHCoVs, and uncertain, due to the undersampling of non-human viruses. The recombination rate was highest for 229E and OC43 whereas substitutions per recombination event were highest in NL63 and HKU1. Depending on the gene studied, OC43 may have ungulate, canine, or rabbit CoV ancestors. 229E may have origins in a bat, camel, or an unsampled intermediate host. HKU1 had the earliest common ancestor (1809-1899) but fell into two distinct clades (genotypes A and B), possibly representing two independent transmission events from murine-origin CoVs that appear to be a single introduction due to large gaps in the sampling of CoVs in animals. In fact, genotype B was genetically more diverse than all the other sHCoVs. Finally, we found shared amino acid substitutions in multiple proteins along the non-human to sHCoV host-jump branches. The complex evolution of CoVs and their frequent host switches could benefit from continued surveillance of CoVs across non-human hosts.


Subjects
Coronavirus Infections , Coronavirus , Respiratory Tract Infections , Animals , Coronavirus/genetics , Coronavirus Infections/epidemiology , Coronavirus Infections/veterinary , Dogs , Humans , Mice , Rabbits , Seasons , Sequence Analysis, DNA
6.
Microbiol Spectr ; 10(3): e0050122, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35467376

ABSTRACT

Enterohemorrhagic E. coli (EHEC) is responsible for significant human illness, death, and economic loss. The main reservoir for EHEC is cattle, but plant-based foods are common vectors for human infection. Several outbreaks have been attributed to lettuce and leafy green vegetables grown in the Salinas and Santa Maria regions of California. Bacteria causing different outbreaks are mostly not close relatives, but one group of closely-related O157:H7 has caused several of them. This unusual pattern of recurrence may have some genetic basis. Here I use whole-genome sequences to reconstruct the genetic changes that occurred in the recent ancestry of this EHEC. In a short period of time corresponding to little genetic change, there were several changes to adhesion-related sequences, mainly adhesins. These changes may have greatly altered the adhesive properties of the bacteria. Possible consequences include increased persistence of cattle infections, more bacteria shed in cattle feces, and greater virulence in humans. Similar constellations of genetic change, which are detectable by current sequencing-based surveillance, may identify other bacteria that are particular threats to human health. In addition, the Santa Maria subclade carries a nonsense mutation affecting ArsR, a repressor of genes that confer resistance to arsenic and antimony. This suggests that the persistent source of Santa Maria contamination is located in an area with arsenic-contaminated groundwater, a problem in many parts of California. This inference may aid identification of the reservoir of EHEC, which would greatly aid mitigation efforts. IMPORTANCE Food-borne bacterial infections cause substantial illness and death. Understanding how bacteria contaminate food and cause disease is important for combating the problem. Closely-related E. coli, likely originating in cattle, have repeatedly caused outbreaks spread by vegetables grown in California. Such recurrence is atypical, and might have a genetic basis. The genetic changes that occurred in the recent ancestry of these E. coli can be reconstructed from their DNA sequences. Several mutations affect genes involved in bacterial adhesion. These might affect persistence of infection in cattle, quantity of bacteria in their feces, and human disease. They also suggest a way of detecting dangerous bacteria from their genome sequences. Furthermore, a subgroup carries a mutation affecting the regulation of genes conferring arsenic resistance. This suggests that the reservoir for contamination utilizes groundwater contaminated with arsenic, a problem in parts of California. This observation may be an aid to locating the persistent reservoir of contamination.


Subjects
Arsenic , Enterohemorrhagic Escherichia coli , Escherichia coli Infections , Escherichia coli O157 , Animals , Cattle , Disease Outbreaks , Enterohemorrhagic Escherichia coli/genetics , Escherichia coli Infections/epidemiology , Escherichia coli Infections/microbiology , Escherichia coli Infections/veterinary , Escherichia coli O157/genetics , Lactuca/microbiology
7.
mBio ; 12(2)2021 04 13.
Article in English | MEDLINE | ID: mdl-33849975

ABSTRACT

Methylation of cytosine in DNA at position C5 increases the rate of C→T mutations in bacteria and eukaryotes. Methylation at the N4 position, employed by some restriction-modification systems, is not known to increase the mutation rate. Here, I report that a Salmonella enterica Type III restriction-modification system that includes a cytosine-N4 methyltransferase causes an enormous increase in the rate of mutation of the methylated cytosines, which occur at the overlined C in the motif CACC̅GT. Mutations consist mainly of C→A transversions, the rate of which is increased ∼500-fold by the restriction-modification system. The rate of C→T transitions is also increased and somewhat exceeds that at C5-methylated cytosines in Dcm sites. Two other Salmonella N4 methyltransferases investigated do not have such dramatic effects, although in one case there is a modest increase in C→A mutations along with an increase in C→T mutations. The sensitivity of the C→A rate to orientation with respect to both DNA replication and transcription is higher at hypermutable sites than at other cytosines, suggesting a fundamental mechanistic difference between hypermutation and ordinary mutation. IMPORTANCE Mutation produces the raw material for adaptive evolution but also imposes a burden because most mutations are deleterious. The rate of mutation at a particular site is affected by a variety of factors. In both prokaryotes and eukaryotes, methylation of C at the C5 position, a naturally occurring DNA modification, greatly increases the rate of C→T mutation. A distinct C modification that occurs in prokaryotes, methylation at N4, is not known to increase mutation rate. Here, I report that a bacterial restriction-modification system, found in some Salmonella bacteria, increases the rate of C→A mutation by a factor of 500 at sites that it methylates at N4. This rate increase is much greater than that caused by C5 methylation. Although fewer than 1 in 1,600 positions analyzed are methylation sites, over 10% of all mutations occur at these sites. Like other examples of extremely high mutation rate, whether naturally occurring or the result of laboratory mutation, this phenomenon may shed light on the mechanism of mutation in general.


Subjects
Cytosine/metabolism , DNA Methylation , Methyltransferases/metabolism , Mutation , Salmonella enterica/genetics , Base Sequence , Salmonella enterica/enzymology , Salmonella enterica/metabolism , Substrate Specificity
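The two bounds quoted in the abstract above (fewer than 1 in 1,600 positions are methylation sites, yet over 10% of all mutations fall on them) imply a large per-site enrichment averaged over all mutation types. The arithmetic can be sketched as follows. The function name `per_site_enrichment` is hypothetical, and the result is only a lower bound implied by those two figures, not a number reported in the abstract.

```python
def per_site_enrichment(site_fraction, mutation_fraction):
    """Ratio of the per-site mutation rate at target sites to the
    per-site rate at all remaining sites."""
    target_rate = mutation_fraction / site_fraction
    other_rate = (1 - mutation_fraction) / (1 - site_fraction)
    return target_rate / other_rate

# Bounds taken from the abstract: 1 in 1,600 positions, 10% of mutations.
print(round(per_site_enrichment(1 / 1600, 0.10), 1))  # 177.7
```

Because the true site fraction is smaller and the true mutation share is larger than the values used here, the actual overall enrichment would exceed this figure.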
8.
Genome Biol Evol ; 12(3): 18-34, 2020 03 01.
Article in English | MEDLINE | ID: mdl-32044996

ABSTRACT

Bacterial genes are sometimes found to be inactivated by mutation. This inactivation may be observable simply because selection for function is intermittent or too weak to eliminate inactive alleles quickly. Here, I investigate cases in Salmonella enterica where inactivation is instead positively selected. These are identified by a rate of introduction of premature stop codons to a gene that is higher than expected under selective neutrality, as assessed by comparison to the rate of synonymous changes. I identify 84 genes that meet this criterion at a 10% false discovery rate. Many of these genes are involved in virulence, motility and chemotaxis, biofilm formation, and resistance to antibiotics or other toxic substances. It is hypothesized that most of these genes are subject to an ongoing process in which inactivation is favored under rare conditions, but the inactivated allele is deleterious under most other conditions and is subsequently driven to extinction by purifying selection.


Subjects
Genes, Bacterial , Mutation , Salmonella enterica/genetics , Selection, Genetic , Artifacts , Bacterial Proteins/genetics , Codon, Terminator , DNA Methylation , Evolution, Molecular , Ligases/genetics , Membrane Proteins/genetics , Membrane Transport Proteins/genetics , Phosphoric Diester Hydrolases/genetics , Polysaccharides, Bacterial/biosynthesis , Salmonella enterica/pathogenicity , Sigma Factor/genetics , Transcription Factors/metabolism , Virulence Factors/genetics
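The abstract above selects genes at a 10% false discovery rate but does not say which procedure was used. As an assumption for illustration only, the widely used Benjamini-Hochberg step-up procedure is sketched below. The function name and the example p-values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(pvalues, fdr=0.10):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at the given false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k/m * fdr.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])

# Hypothetical per-gene p-values.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2, 0.5]))  # [0, 1, 2]
```

In a gene-by-gene test of this kind, each p-value would come from comparing the observed count of premature stop codons with the count expected from the synonymous rate. How that comparison was made is not described in the abstract.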
9.
J Bacteriol ; 200(24)2018 12 15.
Article in English | MEDLINE | ID: mdl-30275280

ABSTRACT

Methylation of DNA at the C-5 position of cytosine occurs in diverse organisms. This modification can increase the rate of C→T transitions at the methylated position. In Escherichia coli and related enteric bacteria, the inner C residues of the sequence CCWGG (W is A or T) are methylated by the Dcm enzyme. These sites are hot spots of mutation during rapid growth in the laboratory but not in nondividing cells, in which repair by the Vsr protein is effective. It has been suggested that hypermutation at these sites is a laboratory artifact and does not occur in nature. Many other methyltransferases, with a variety of specificities, can be found in bacteria, usually associated with restriction enzymes and confined to a subset of the population. Their methylation targets are also possible sites of hypermutation. Here, I show using whole-genome sequence data for thousands of isolates that there is indeed considerable hypermutation at Dcm sites in natural populations: their transition rate is approximately eight times the average. I also demonstrate hypermutability of targets of restriction-associated methyltransferases in several distantly related bacteria: methylation increases the transition rate by a factor ranging from 12 to 58. In addition, I demonstrate how patterns of hypermutability inferred from massive sequence data can be used to determine previously unknown methylation patterns and methyltransferase specificities. IMPORTANCE A common type of DNA modification, addition of a methyl group to cytosine (C) at carbon atom C-5, can greatly increase the rate of mutation of the C to a T. In mammals, methylation of CG sequences increases the rate of CG→TG mutations. It is unknown whether cytosine C-5 methylation increases the mutation rate in bacteria under natural conditions. I show that sites methylated by the Dcm enzyme exhibit an 8-fold increase in mutation rate in natural bacterial populations. I also show that modifications at other sites in various bacteria also increase the mutation rate, in some cases by a factor of forty or more. Finally, I demonstrate how this phenomenon can be used to infer sequence specificities of methylation enzymes.


Subjects
Bacteria/growth & development , DNA Methylation , DNA, Bacterial/chemistry , Whole Genome Sequencing/methods , Bacteria/genetics , Binding Sites , Cytosine , DNA, Bacterial/genetics , Genome, Bacterial , Methyltransferases/metabolism , Mutation , Promoter Regions, Genetic
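The Dcm target described in the abstract above, the inner C of CCWGG with W being A or T, is simple to locate in a sequence. The sketch below is a minimal illustration with a hypothetical function name and example sequence. It reports positions on the given strand only. Because CCWGG is its own reverse complement, the inner C of the opposite strand pairs with the G two positions further along.

```python
import re

def dcm_inner_c_positions(seq):
    """Return 0-based positions of the inner C in each CCWGG site
    (W = A or T) on the strand as given."""
    seq = seq.upper()
    # Two CCWGG sites cannot overlap, so a plain scan finds every site.
    return [m.start() + 1 for m in re.finditer(r"CC[AT]GG", seq)]

# Hypothetical sequence with CCAGG, CCTGG and a non-matching CCGGG.
print(dcm_inner_c_positions("ATCCAGGTTCCTGGACCGGG"))  # [3, 10]
```

Comparing transition counts at these positions with counts at other cytosines would be the natural next step. The normalization used in the study is not given in the abstract.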
10.
BMC Bioinformatics ; 18(1): 127, 2017 Feb 23.
Article in English | MEDLINE | ID: mdl-28231758

ABSTRACT

BACKGROUND: Maximum compatibility is a method of phylogenetic reconstruction that is seldom applied to molecular sequences. It may be ideal for certain applications, such as reconstructing phylogenies of closely-related bacteria on the basis of whole-genome sequencing. RESULTS: Here I present an algorithm that rapidly computes phylogenies according to a compatibility criterion. Although based on solutions to the maximum clique problem, this algorithm deals properly with ambiguities in the data. The algorithm is applied to bacterial data sets containing up to nearly 2000 genomes with several thousand variable nucleotide sites. Run times are several seconds or less. Computational experiments show that maximum compatibility is less sensitive than maximum parsimony to the inclusion of nucleotide data that, though derived from actual sequence reads, has been identified as likely to be misleading. CONCLUSIONS: Maximum compatibility is a useful tool for certain phylogenetic problems, such as inferring the relationships among closely-related bacteria from whole-genome sequence data. The algorithm presented here rapidly solves fairly large problems of this type, and provides robustness against misleading characters that can pollute large-scale sequencing data.


Subjects
Algorithms , Evolution, Molecular , Genome, Bacterial , Phylogeny , Salmonella enterica/classification , Salmonella enterica/genetics , Sequence Analysis, DNA , Software
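The compatibility criterion mentioned in the abstract above can be illustrated for the simplest case of two-state characters, where two characters are compatible when at most three of the four possible state pairs occur across taxa (the four-gamete test). The sketch below is not the published algorithm. The function names are hypothetical, the clique search is brute force and only suitable for tiny inputs, and skipping taxa with a missing state is a simplifying assumption rather than the treatment of ambiguities the abstract refers to.

```python
from itertools import combinations

def compatible(a, b, missing="?"):
    """Four-gamete test for two binary characters given as equal-length
    strings of '0'/'1'. Taxa with a missing state are ignored (assumption)."""
    seen = {(x, y) for x, y in zip(a, b) if missing not in (x, y)}
    return len(seen) < 4

def largest_compatible_set(chars):
    """Brute-force search for a largest set of pairwise compatible characters.
    Exponential time, so for illustration on tiny inputs only."""
    n = len(chars)
    ok = {(i, j) for i, j in combinations(range(n), 2)
          if compatible(chars[i], chars[j])}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(pair in ok for pair in combinations(subset, 2)):
                return list(subset)
    return []

# Hypothetical characters scored over five taxa.
chars = ["00011", "00111", "01010", "0?011"]
print(largest_compatible_set(chars))  # [0, 1, 3]
```

Each character in the returned set corresponds to a split of the taxa, and the splits from a mutually compatible set can be combined into a single tree. For data sets of the size quoted in the abstract, a dedicated maximum clique solver would be needed in place of the exhaustive search.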
12.
Genome Biol Evol ; 5(3): 494-503, 2013.
Article in English | MEDLINE | ID: mdl-23436005

ABSTRACT

The sequences of different proteins evolve at different rates. The relative evolutionary rate (ER) of a single protein also changes over evolutionary time. The cause of this ER fluctuation remains uncertain, and study of this phenomenon may shed light on protein evolution more broadly. We have characterized ER fluctuation in mammals and Drosophila. We found little correlation between the amount of rate variation observed for a protein and such factors as its expression level or phylogenetic distribution. Perhaps more surprisingly, we found little correlation between our measure of rate variation and ER itself. We also investigated the extent to which the ERs of different domains of a protein vary independently. We found that rates of different domains do tend to vary together. In fact, rates at positions in different domains are coupled just as strongly as rates at equally distant positions in the same domain. These findings provide clues to the protein evolutionary process.


Subjects
Drosophila Proteins/genetics , Drosophila/genetics , Evolution, Molecular , Mammals/genetics , Proteins/genetics , Animals , Drosophila/classification , Humans , Macaca mulatta , Mammals/classification , Mice , Molecular Sequence Data , Mutation Rate , Phylogeny , Rats
13.
J Virol ; 87(3): 1400-10, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23115287

ABSTRACT

Individuals <60 years of age had the lowest incidence of infection, with ~25% of these people having preexisting, cross-reactive antibodies to novel 2009 H1N1 influenza. Many people >60 years old also had preexisting antibodies to novel H1N1. These observations are puzzling because the seasonal H1N1 viruses circulating during the last 60 years were not antigenically similar to novel H1N1. We therefore hypothesized that a sequence of exposures to antigenically different seasonal H1N1 viruses can elicit an antibody response that protects against novel 2009 H1N1. Ferrets were preinfected with seasonal H1N1 viruses and assessed for cross-reactive antibodies to novel H1N1. Serum from infected ferrets was assayed for cross-reactivity to both seasonal and novel 2009 H1N1 strains. These results were compared to those of ferrets that were sequentially infected with H1N1 viruses isolated prior to 1957 or more-recently isolated viruses. Following seroconversion, ferrets were challenged with novel H1N1 influenza virus and assessed for viral titers in the nasal wash, morbidity, and mortality. There was no hemagglutination inhibition (HAI) cross-reactivity in ferrets infected with any single seasonal H1N1 influenza virus, with limited protection to challenge. However, sequential H1N1 influenza infections reduced the incidence of disease and elicited cross-reactive antibodies to novel H1N1 isolates. The amount and duration of virus shedding and the frequency of transmission following novel H1N1 challenge were reduced. Exposure to multiple seasonal H1N1 influenza viruses, and not to any single H1N1 influenza virus, elicits a breadth of antibodies that neutralize novel H1N1 even though the host was never exposed to the novel H1N1 influenza viruses.


Subjects
Influenza A Virus, H1N1 Subtype/immunology , Orthomyxoviridae Infections/immunology , Orthomyxoviridae Infections/virology , Animals , Antibodies, Viral/blood , Cross Reactions , Disease Models, Animal , Ferrets , Hemagglutination Inhibition Tests , Nasal Cavity/virology , Orthomyxoviridae Infections/mortality , Orthomyxoviridae Infections/pathology , Survival Analysis , Viral Load , Virus Shedding
14.
PLoS One ; 7(7): e39435, 2012.
Article in English | MEDLINE | ID: mdl-22815705

ABSTRACT

BACKGROUND: During the 2009 influenza pandemic, individuals over the age of 60 had the lowest incidence of infection with approximately 25% of these people having pre-existing, cross-reactive antibodies to novel 2009 H1N1 influenza isolates. It was proposed that older people had pre-existing antibodies induced by previous 1918-like virus infection(s) that cross-reacted to novel H1N1 strains. METHODOLOGY/PRINCIPAL FINDINGS: Using antisera collected from a cohort of individuals collected before the second wave of novel H1N1 infections, only a minority of individuals with 1918 influenza specific antibodies also demonstrated hemagglutination-inhibition activity against the novel H1N1 influenza. In this study, we examined human antisera collected from individuals that ranged between the ages of 1 month and 90 years to determine the profile of seropositive influenza immunity to viruses representing H1N1 antigenic eras over the past 100 years. Even though HAI titers to novel 2009 H1N1 and the 1918 H1N1 influenza viruses were positively associated, the association was far from perfect, particularly for the older and younger age groups. CONCLUSIONS/SIGNIFICANCE: Therefore, there may be a complex set of immune responses that are retained in people infected with seasonal H1N1 that can contribute to the reduced rates of H1N1 influenza infection in older populations.


Subjects
Antibodies, Viral/immunology , Immune Sera/immunology , Influenza A Virus, H1N1 Subtype/immunology , Cross Reactions , Hemagglutinin Glycoproteins, Influenza Virus/immunology , Humans , Species Specificity , Viral Vaccines/immunology
15.
PLoS Curr ; 2: RRN1200, 2010 Dec 03.
Article in English | MEDLINE | ID: mdl-21152078

ABSTRACT

Severity of seasonal influenza A epidemics is related to the antigenic novelty of the predominant viral strains circulating each year. Support for a strong correlation between epidemic severity and antigenic drift comes from infectious challenge experiments on vaccinated animals and human volunteers, field studies of vaccine efficacy, prospective studies of subjects with laboratory-confirmed prior infections, and analysis of the connection between drift and severity from surveillance data. We show that, given data on the antigenic and sequence novelty of the hemagglutinin protein of clinical isolates of H3N2 virus from a season along with the corresponding data from prior seasons, we can accurately predict the influenza severity for that season. This model therefore provides a framework for making projections of the severity of the upcoming season using assumptions based on viral isolates collected in the current season. Our results based on two independent data sets from the US and Hong Kong suggest that seasonal severity is largely determined by the novelty of the hemagglutinin protein although other factors, including mutations in other influenza genes, co-circulating pathogens and weather conditions, might also play a role. These results should be helpful for the control of seasonal influenza and have implications for improvement of influenza surveillance.

16.
Genome Biol Evol ; 2: 757-69, 2010.
Article in English | MEDLINE | ID: mdl-20884723

ABSTRACT

There is great variation in the rates of sequence evolution among proteins encoded by the same genome. The strongest correlate of evolutionary rate is expression level: highly expressed proteins tend to evolve slowly. This observation has led to the proposal that a major determinant of protein evolutionary rate involves the toxic effects of protein that misfolds due to transcriptional and translational errors (the mistranslation-induced misfolding [MIM] hypothesis). Here, I present a model that explains the correlation of evolutionary rate and expression level by selection for function. The basis of this model is that selection keeps expression levels near optima that reflect a trade-off between beneficial effects of the protein's function and some nonspecific cost of expression (e.g., the biochemical cost of synthesizing protein). Simulations confirm the predictions of the model. Like the MIM hypothesis, this model predicts several other relationships that are observed empirically. Although the model is based on selection for protein function, it is consistent with findings that a protein's rate of evolution is at most weakly correlated with its importance for fitness as measured by gene knockout experiments.


Subjects
Evolution, Molecular , Gene Expression , Computer Simulation , Genome , Models, Biological , Mutation , Proteins/genetics , Proteins/metabolism , Selection, Genetic , Statistics as Topic , Time Factors
17.
Mol Biol Evol ; 27(3): 735-41, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19910385

ABSTRACT

The sequences of proteins encoded by a genome evolve at different rates. A correlate of a protein's evolutionary rate is its expression level: highly expressed proteins tend to evolve slowly. Some explanations of rate variation and the correlation between rate and expression predict that more slowly evolving and more highly expressed proteins have more favorable equilibrium constants for folding. Proteins from thermophiles generally have more stable folds than proteins from mesophiles, and it is known that there are systematic differences in amino acid content between thermophilic and mesophilic proteins. I examined whether there are analogous correlations of amino acid frequencies with evolutionary rate and expression level within genomes. In most of the organisms analyzed, there is a striking tendency for more slowly evolving proteins to be more thermophile-like in their amino acid compositions when adjustments are made for variation in GC content. More highly expressed proteins also tend to be more thermophile-like by the same criteria. These results suggest that part of the evolutionary rate variation among proteins is due to variation in the strength of selection for stability of the folded state. They also suggest that increasing strength of this selective force with expression level plays a role in the correlation between evolutionary rate and expression level.


Subjects
Evolution, Molecular , Proteins/genetics , Amino Acids/chemistry , Amino Acids/genetics , Animals , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Base Composition , Fungal Proteins/chemistry , Fungal Proteins/genetics , Hot Temperature , Humans , Normal Distribution , Proteins/chemistry , Regression Analysis , Statistics, Nonparametric
18.
PLoS Curr ; 1: RRN1001, 2009 Aug 18.
Article in English | MEDLINE | ID: mdl-20025194

ABSTRACT

The hemagglutinin protein of influenza virus bears several sites of N-linked asparagine glycosylation. The number and location of these sites varies with strain and substrain. The human H3 hemagglutinin has gained several glycosylation sites on the antigenically important globular head since its introduction to humans, presumably due to selection. Although there is abundant evidence that glycosylation can affect antigenic and functional properties of the protein, direct evidence for selection is lacking. We have analyzed gain and loss of glycosylation sites on the side branches of a large phylogenetic tree of H(3) HA1 sequences (branches off of the main, long-term line of descent). Side branches contrast with the main line of descent: losses of glycosylation sites are not uncommon, and they outnumber gains. Although other explanations are possible, this observation is consistent with weak selection for glycosylation sites or a more complicated pattern of selection. Furthermore, terminal and internal branches differ with respect to rates of gain and loss of glycosylation sites. This pattern would not be expected under selective neutrality, but is easily explained by weak selection or selection that changes with the immune state of the host population. Thus, it provides evidence that selection acts on the glycosylation state of hemagglutinin.

19.
Genome Res ; 19(7): 1316-23, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19498102

ABSTRACT

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.


Subjects
Consensus Sequence , Genome , Open Reading Frames/genetics , Animals , Humans , Mice , Sequence Alignment
20.
PLoS Biol ; 7(5): e1000112, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19468303

ABSTRACT

The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.


Subjects
Computational Biology/methods , Genome/genetics , Animals , Databases, Genetic , Gene Duplication , Genome/physiology , Humans , Mice