ABSTRACT
Because errors at the DNA level power pathogen evolution, a systematic understanding of the rate and molecular spectra of mutations could guide the avoidance and treatment of infectious diseases. We thus accumulated tens of thousands of spontaneous mutations in 768 repeatedly bottlenecked lineages of 18 strains from various geographical sites, temporal spread, and genetic backgrounds. Entailing ~1.36 million generations, the resultant data yield an average mutation rate of ~0.0005 per genome per generation, with significant within-species variation. This is one of the lowest bacterial mutation rates reported, giving direct support for a high genome stability in this pathogen resulting from high DNA-mismatch-repair efficiency and replication-machinery fidelity. Pathogenicity genes do not exhibit an accelerated mutation rate, and thus, elevated mutation rates may not be the major determinant for the diversification of toxin and secretion systems. Intriguingly, a low error rate at the transcript level is not observed, suggesting distinct fidelity of the replication and transcription machinery. This study urges more attention on the most basic evolutionary processes of even the best-known human pathogens and deepens the understanding of their genome evolution.
Subject(s)
Salmonella enterica, Salmonella, Bacterial Genome, Mutation, Mutation Rate, Salmonella/genetics, Salmonella enterica/genetics
ABSTRACT
The rate of cytosine deamination is much higher in single-stranded DNA (ssDNA) than in double-stranded DNA, and copying the resulting uracils causes C to T mutations. To study this phenomenon, the catalytic domain of APOBEC3G (A3G-CTD), an ssDNA-specific cytosine deaminase, was expressed in an Escherichia coli strain defective in uracil repair (ung mutant), and the mutations that accumulated over thousands of generations were determined by whole-genome sequencing. C:G to T:A transitions dominated, with significantly more cytosines mutated to thymine in the lagging-strand template (LGST) than in the leading-strand template (LDST). This strand bias was present in both repair-defective and repair-proficient cells and was strongest and highly significant in cells expressing A3G-CTD. These results show that the LGST is accessible to cellular cytosine deaminating agents, explain the well-known GC skew in microbial genomes, and suggest that the APOBEC3 family of mutators may target the LGST in the human genome.
Subject(s)
Escherichia coli/genetics, Escherichia coli/metabolism, APOBEC-3G Deaminase, Base Sequence, Cytidine Deaminase/genetics, Cytidine Deaminase/metabolism, Cytosine/metabolism, DNA/genetics, DNA/metabolism, DNA Repair/genetics, DNA Replication, Bacterial DNA/genetics, Bacterial DNA/metabolism, Single-Stranded DNA/genetics, Single-Stranded DNA/metabolism, Deamination, Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, Bacterial Genes, Humans, Mutation, Recombinant Proteins/genetics, Recombinant Proteins/metabolism, Thymine/metabolism, Uracil/metabolism, Uracil-DNA Glycosidase/genetics, Uracil-DNA Glycosidase/metabolism
ABSTRACT
Although it is well known that microbial populations can respond adaptively to challenges from antibiotics, empirical difficulties in distinguishing the roles of de novo mutation and natural selection have left several issues unresolved. Here, we explore the mutational properties of Escherichia coli exposed to long-term sublethal levels of the antibiotic norfloxacin, using a mutation accumulation design combined with whole-genome sequencing of replicate lines. The genome-wide mutation rate significantly increases with norfloxacin concentration. This response is associated with enhanced expression of error-prone DNA polymerases and may also involve indirect effects of norfloxacin on DNA mismatch and oxidative-damage repair. Moreover, we find that acquisition of antibiotic resistance can be enhanced solely by accelerated mutagenesis, i.e., without direct involvement of selection. Our results suggest that antibiotics may generally enhance the mutation rates of target cells, thereby accelerating the rate of adaptation not only to the antibiotic itself but to additional challenges faced by invasive pathogens.
Subject(s)
Escherichia coli/genetics, Bacterial Genome/genetics, Genomic Instability/genetics, Mutagenesis/genetics, Mutation/genetics, Norfloxacin/administration & dosage, Anti-Bacterial Agents/administration & dosage, DNA Damage/genetics, DNA Repair/drug effects, DNA Repair/genetics, Drug Dose-Response Relationship, Escherichia coli/drug effects, Molecular Evolution, Bacterial Genome/drug effects, Genomic Instability/drug effects, Mutagenesis/drug effects, Mutation/drug effects
ABSTRACT
A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole genome shotgun sequencing data. Based on 857 identified events (758 IS insertions, 98 recombinations and 1 excision), we estimate that the rate of IS insertion is 3.5 × 10⁻⁴ insertions per genome per generation and the rate of IS homologous recombination is 4.5 × 10⁻⁵ recombinations per genome per generation. These events are mostly contributed by the IS elements IS1, IS2, IS5 and IS186. Spatial analysis of new insertions suggests that transposition is biased to proximal insertions, and the length spectrum of IS-caused deletions is largely explained by local hopping. For any of the ISs studied there is no region of the circular genome that is favored or disfavored for new insertions, but there are notable hotspots for deletions. Some elements have preferences for non-coding sequence or for the beginning and end of coding regions, largely explained by target site motifs. Interestingly, transposition and deletion rates remain constant across the wild-type and 12 mutant E. coli lines, each deficient in a distinct DNA repair pathway. Finally, we characterized the target sites of four IS families, confirming previous results and characterizing a highly specific pattern at IS186 target sites, 5'-GGGG(N6/N7)CCCC-3'. We also detected 48 long deletions not involving IS elements.
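The rate arithmetic in this abstract can be illustrated with a minimal sketch. It assumes the usual mutation accumulation definition (events divided by the total number of generations summed over all lines); the function names are hypothetical, and the generation totals are back-calculated from the reported counts and rates rather than taken from the paper.

```python
# Sketch: event rates in a mutation accumulation (MA) experiment.
# Counts and rates are from the abstract; the function names and the
# back-calculated generation totals are illustrative assumptions.

def rate_per_genome_per_generation(events: int, line_generations: float) -> float:
    """Events divided by the generations summed over all MA lines."""
    return events / line_generations


def implied_line_generations(events: int, rate: float) -> float:
    """Invert the rate definition to see how many generations it implies."""
    return events / rate


insertion_generations = implied_line_generations(758, 3.5e-4)
recombination_generations = implied_line_generations(98, 4.5e-5)

print(f"IS insertions imply about {insertion_generations:,.0f} line-generations")
print(f"IS recombinations imply about {recombination_generations:,.0f} line-generations")
```

Both reported rates imply a similar total of roughly two million line-generations, which is a quick consistency check on the two estimates.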
Subject(s)
DNA Transposable Elements/genetics, Escherichia coli/genetics, Bacterial Genome/genetics, Insertional Mutagenesis/genetics, Base Sequence, Molecular Evolution
ABSTRACT
A complete understanding of evolutionary processes requires that factors determining spontaneous mutation rates and spectra be identified and characterized. Using mutation accumulation followed by whole-genome sequencing, we found that the mutation rates of three widely diverged commensal Escherichia coli strains differ only by about 50%, suggesting that a rate of 1-2 × 10⁻³ mutations per generation per genome is common for this bacterium. Four major forces are postulated to contribute to spontaneous mutations: intrinsic DNA polymerase errors, endogenously induced DNA damage, DNA damage caused by exogenous agents, and the activities of error-prone polymerases. To determine the relative importance of these factors, we studied 11 strains, each defective for a major DNA repair pathway. The striking result was that only loss of the ability to prevent or repair oxidative DNA damage significantly impacted mutation rates or spectra. These results suggest that, with the exception of oxidative damage, endogenously induced DNA damage does not perturb the overall accuracy of DNA replication in normally growing cells and that repair pathways may exist primarily to defend against exogenously induced DNA damage. The thousands of mutations caused by oxidative damage recovered across the entire genome revealed strong local-sequence biases of these mutations. Specifically, we found that the identity of the 3' base can affect the mutability of a purine by oxidative damage by as much as eightfold.
Subject(s)
Escherichia coli/genetics, Bacterial Genes, Mutation, Alkylation, DNA Repair
ABSTRACT
Deinococcus bacteria are extremely resistant to radiation, oxidation, and desiccation. Resilience to these factors has been suggested to be due to enhanced damage prevention and repair mechanisms, as well as highly efficient antioxidant protection systems. Here, using mutation-accumulation experiments, we find that the GC-rich Deinococcus radiodurans has an overall background genomic mutation rate similar to that of E. coli, but differs in mutation spectrum, with the A/T to G/C mutation rate (based on a total count of 88 A:T → G:C transitions and 82 A:T → C:G transversions) per site per generation higher than that in the other direction (based on a total count of 157 G:C → A:T transitions and 33 G:C → T:A transversions). We propose that this unique spectrum is shaped mainly by the abundant uracil DNA glycosylases reducing G:C → A:T transitions, adenine methylation elevating A:T → C:G transversions, and absence of cytosine methylation decreasing G:C → A:T transitions. As opposed to the greater than 100× elevation of the mutation rate in MMR⁻ (DNA mismatch repair deficient) strains of most other organisms, MMR⁻ D. radiodurans only exhibits a 4-fold elevation, raising the possibility that other DNA repair mechanisms compensate for a relatively low-efficiency DNA MMR pathway. As D. radiodurans has plentiful insertion sequence (IS) elements in the genome and the activities of IS elements are rarely directly explored, we also estimated the insertion (transposition) rate of the IS elements to be 2.50 × 10⁻³ per genome per generation in the wild-type strain; knocking out MMR did not elevate the IS element insertion rate in this organism.
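The per-site comparison in this abstract can look counterintuitive, because the raw G/C to A/T count (190) exceeds the A/T to G/C count (170). A minimal sketch of the normalization follows. The counts are from the abstract; the genome GC fraction of about 0.67 is an assumed approximate value for this GC-rich species, not a figure stated in the abstract, and the function name is hypothetical.

```python
# Sketch: normalizing mutation counts by the number of available sites.
# Counts are from the abstract; GC_CONTENT is an assumed, approximate value.
GC_CONTENT = 0.67

at_to_gc_count = 88 + 82    # A:T -> G:C transitions plus A:T -> C:G transversions
gc_to_at_count = 157 + 33   # G:C -> A:T transitions plus G:C -> T:A transversions


def per_site_ratio(at_count: int, gc_count: int, gc_content: float) -> float:
    """Ratio of the per-A:T-site rate to the per-G:C-site rate.

    Generations and genome length are shared by both directions, so they
    cancel and only the fraction of each site type matters.
    """
    at_sites = 1.0 - gc_content
    gc_sites = gc_content
    return (at_count / at_sites) / (gc_count / gc_sites)


ratio = per_site_ratio(at_to_gc_count, gc_to_at_count, GC_CONTENT)
print(f"Per-site A/T->G/C rate relative to G/C->A/T: {ratio:.2f}")
```

Under this assumed GC fraction, fewer A:T sites carry almost as many mutations, so the per-site rate toward G/C comes out higher, in line with the direction the abstract reports.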
Subject(s)
Bacterial DNA/genetics, Deinococcus/genetics, Bacterial Proteins/genetics, DNA Damage, DNA Methylation, DNA Repair, Deinococcus/enzymology, Bacterial Genes, Genetic Drift, Insertional Mutagenesis, Mutation Rate, Plasmids/genetics, Point Mutation, Radiation Tolerance, Uracil-DNA Glycosidase/genetics
ABSTRACT
Recent advances in sequencing technologies have initiated an era of personal genome sequences. To date, human genome sequences have been reported for individuals with ancestry in three distinct geographical regions: a Yoruba African, two individuals of northwest European origin, and a person from China. Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1. The genome of AK1 was determined by an exacting, combined approach that included whole-genome shotgun sequencing (27.8x coverage), targeted bacterial artificial chromosome sequencing, and high-resolution comparative genomic hybridization using custom microarrays featuring more than 24 million probes. Alignment to the NCBI reference, a composite of several ethnic clades, disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 non-synonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels). SNP and indel densities were strongly correlated genome-wide. Applying very conservative criteria yielded highly reliable copy number variants for clinical considerations. Potential medical phenotypes were annotated for non-synonymous SNPs, coding domain indels, and structural variants. The integration of several human whole-genome sequences derived from several ethnic groups will assist in understanding genetic ancestry, migration patterns and population bottlenecks.
Subject(s)
Asian People/genetics, Human Genome/genetics, Bacterial Artificial Chromosomes/genetics, Comparative Genomic Hybridization, Computational Biology, Humans, INDEL Mutation/genetics, Korea (Geographic), Oligonucleotide Array Sequence Analysis, Single Nucleotide Polymorphism/genetics, DNA Sequence Analysis
ABSTRACT
Knowledge of the rate and nature of spontaneous mutation is fundamental to understanding evolutionary and molecular processes. In this report, we analyze spontaneous mutations accumulated over thousands of generations by wild-type Escherichia coli and a derivative defective in mismatch repair (MMR), the primary pathway for correcting replication errors. The major conclusions are (i) the mutation rate of a wild-type E. coli strain is ~1 × 10⁻³ per genome per generation; (ii) mutations in the wild-type strain have the expected mutational bias for G:C > A:T mutations, but the bias changes to A:T > G:C mutations in the absence of MMR; (iii) during replication, A:T > G:C transitions preferentially occur with A templating the lagging strand and T templating the leading strand, whereas G:C > A:T transitions preferentially occur with C templating the lagging strand and G templating the leading strand; (iv) there is a strong bias for transition mutations to occur at 5'ApC3'/3'TpG5' sites (where bases 5'A and 3'T are mutated) and, to a lesser extent, at 5'GpC3'/3'CpG5' sites (where bases 5'G and 3'C are mutated); (v) although the rate of small (≤4 nt) insertions and deletions is high at repeat sequences, these events occur at only 1/10th the genomic rate of base-pair substitutions. MMR activity is genetically regulated, and bacteria isolated from nature often lack MMR capacity, suggesting that modulation of MMR can be adaptive. Thus, comparing results from the wild-type and MMR-defective strains may lead to a deeper understanding of factors that determine mutation rates and spectra, how these factors may differ among organisms, and how they may be shaped by environmental conditions.
Subject(s)
Escherichia coli/genetics, Bacterial Genome/genetics, Mutation, DNA Sequence Analysis/methods, Adenosine Triphosphatases/genetics, Base Sequence, Binding Sites/genetics, DNA Methylation, DNA Mismatch Repair/genetics, DNA Replication/genetics, Bacterial DNA/chemistry, Bacterial DNA/genetics, Escherichia coli Proteins/genetics, Bacterial Genes/genetics, INDEL Mutation, Monte Carlo Method, MutL Proteins, Mutation Rate, Point Mutation, Single Nucleotide Polymorphism, Genetic Selection
ABSTRACT
The rapidly expanding use of wastewater for public health surveillance requires new strategies to protect privacy rights, while data are collected at increasingly discrete geospatial scales, i.e., city, neighborhood, campus, and building-level. Data collected at high geospatial resolution can inform on labile, short-lived biomarkers, thereby making wastewater-derived data both more actionable and more likely to cause privacy concerns and stigmatization of subpopulations. Additionally, data sharing restrictions among neighboring cities and communities can complicate efforts to balance public health protections with citizens' privacy. Here, we have created an encrypted framework that facilitates the sharing of sensitive population health data among entities that lack trust for one another (e.g., between adjacent municipalities with different governance of health monitoring and data sharing). We demonstrate the utility of this approach with two real-world cases. Our results show the feasibility of sharing encrypted data between two municipalities and a laboratory, while performing secure private computations for wastewater-based epidemiology (WBE) with high precision, fast speeds, and low data costs. This framework is amenable to other computations used by WBE researchers including population normalized mass loads, fecal indicator normalizations, and quality control measures. The Centers for Disease Control and Prevention's National Wastewater Surveillance System shows ~8% of the records attributed to collection before the wastewater treatment plant, illustrating an opportunity to further expand currently limited community-level sampling and public health surveillance through security and responsible data-sharing as outlined here.
Subject(s)
Information Dissemination, Wastewater, Privacy, Humans, Computer Security, Environmental Monitoring/methods, Wastewater-Based Epidemiological Monitoring
ABSTRACT
MOTIVATION: Gene clusters are arrangements of functionally related genes on a chromosome. In bacteria, it is expected that evolutionary pressures would conserve these arrangements due to the functional advantages they provide. Visualization of conserved gene clusters across multiple genomes provides key insights into their evolutionary histories. Therefore, a software tool that enables visualization and functional analyses of gene clusters would be a great asset to the biological research community. RESULTS: We have developed GeneclusterViz, a Java-based tool that allows for the visualization, exploration and downstream analyses of conserved gene clusters across multiple genomes. GeneclusterViz combines an easy-to-use exploration interface for gene clusters with a host of other analysis features such as multiple sequence alignments, phylogenetic analyses and integration with the KEGG pathway database. AVAILABILITY: http://biohealth.snu.ac.kr/GeneclusterViz/; http://microbial.informatics.indiana.edu/GeneclusterViz/
Subject(s)
Alphaproteobacteria/classification, Alphaproteobacteria/genetics, Multigene Family, Phylogeny, Software, Cluster Analysis, Escherichia coli/genetics, Genome, Sequence Alignment
ABSTRACT
Accurate prediction of TCR binding affinity to a target antigen is important for development of immunotherapy strategies. Recent computational methods were built on various deep neural networks and used the evolutionary-based distance matrix BLOSUM to embed amino acids of TCR and epitope sequences to numeric values. A pre-trained language model of amino acids is an alternative embedding method where each amino acid in a peptide is embedded as a continuous numeric vector. Little attention has yet been given to summarizing the amino-acid-wise embedding vectors into sequence-wise representations. In this paper, we propose PiTE, a two-step pipeline for TCR-epitope binding affinity prediction. First, we use an amino acid embedding model pre-trained on a large number of unlabeled TCR sequences and obtain a real-valued representation from a string representation of amino acid sequences. Second, we train a binding affinity prediction model that consists of two sequence encoders and a stack of linear layers predicting the affinity score of a given TCR and epitope pair. In particular, we explore various types of neural network architectures for the sequence encoders in the two-step binding affinity prediction pipeline. We show that our Transformer-like sequence encoder achieves state-of-the-art performance and significantly outperforms the others, perhaps due to the model's ability to capture contextual information between amino acids in each sequence. Our work highlights that an advanced sequence encoder on top of a pre-trained representation significantly improves performance of TCR-epitope binding affinity prediction.
Subject(s)
Computational Biology, Computer Neural Networks, Humans, Epitopes, Computational Biology/methods, Amino Acids, T-Cell Antigen Receptors/genetics
ABSTRACT
TCR-epitope pair binding is the key component for T cell regulation. The ability to predict whether a given pair binds is fundamental to understanding the underlying biology of the binding mechanism as well as developing T-cell mediated immunotherapy approaches. The advent of large-scale public databases containing TCR-epitope binding pairs enabled the recent development of computational prediction methods for TCR-epitope binding. However, the number of epitopes reported along with binding TCRs is far too small, resulting in poor out-of-sample performance for unseen epitopes. In order to address this issue, we present our model ATM-TCR which uses a multi-head self-attention mechanism to capture biological contextual information and improve generalization performance. Additionally, we present a novel application of the attention map from our model to improve out-of-sample performance by demonstrating on recent SARS-CoV-2 data.
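The self-attention mechanism named in this abstract can be sketched in a few lines. This is a generic single-head, scaled dot-product version with identity projections and toy two-dimensional embeddings, all of which are simplifying assumptions; it is not the ATM-TCR architecture, which uses multiple heads and learned projections whose details the abstract does not give.

```python
# Sketch: single-head scaled dot-product self-attention over a toy sequence.
# Identity query/key/value projections and the toy embeddings are assumptions.
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Return (outputs, weights) where weights[i][j] is how much position i
    attends to position j, and outputs[i] is the weighted sum of embeddings."""
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in embeddings]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[d] for w, value in zip(row, embeddings))
                        for d in range(dim)])
    return outputs, weights


# Three residues with made-up embeddings, purely for illustration.
toy_sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_outputs, toy_weights = self_attention(toy_sequence)
for position, row in enumerate(toy_weights):
    print(position, [round(w, 3) for w in row])
```

Each row of the weight matrix sums to one, and that matrix is the kind of attention map the abstract refers to when it describes inspecting which positions the model weights most heavily.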
Subject(s)
T-Lymphocyte Epitopes, T-Cell Antigen Receptors, Computational Biology, T-Lymphocyte Epitopes/metabolism, Humans, Protein Binding, T-Cell Antigen Receptors/metabolism, SARS-CoV-2
ABSTRACT
Encounters between DNA replication and transcription can cause genomic disruption, particularly when the two meet head-on. Whether these conflicts produce point mutations is debated. This paper presents detailed analyses of a large collection of mutations generated during mutation accumulation experiments with mismatch repair (MMR)-defective Escherichia coli. With MMR absent, mutations are primarily due to DNA replication errors. Overall, there were no differences in the frequencies of base pair substitutions or small indels (i.e., insertions and deletions of ≤4 bp) in the coding sequences or promoters of genes oriented codirectionally versus head-on to replication. Among a subset of highly expressed genes, there was a 2- to 3-fold bias for indels in genes oriented head-on to replication, but this difference was almost entirely due to the asymmetrical genomic locations of tRNA genes containing mononucleotide runs, which are hot spots for indels. No additional orientation bias in mutation frequencies occurred when MMR-defective strains were also defective for transcription-coupled repair (TCR). However, in contrast to other reports, loss of TCR slightly increased the overall mutation rate, meaning that TCR is antimutagenic. There was no orientation bias in mutation frequencies among the stress response genes that are regulated by RpoS or induced by DNA damage. Thus, biases in the locations of mutational targets can account for most, if not all, apparent biases in mutation frequencies between genes oriented head-on versus codirectional to replication. In addition, the data revealed a strong correlation of the frequency of base pair substitutions with gene length but no correlation with gene expression levels.
IMPORTANCE: Because DNA replication and transcription occur on the same DNA template, encounters between the two machines occur frequently. When these encounters are head-to-head, genomic disruption can occur. However, whether replication-transcription conflicts contribute to spontaneous mutations is debated. Analyzing in detail a large collection of mutations generated with mismatch repair-defective Escherichia coli strains, we found that across the genome there are no significant differences in mutation frequencies between genes oriented codirectionally and those oriented head-on to replication. Among a subset of highly expressed genes, there was a 2- to 3-fold bias for small insertions and deletions in head-on-oriented genes, but this difference was almost entirely due to the asymmetrical locations of tRNA genes containing mononucleotide runs, which are hot spots for these mutations. Thus, biases in the positions of mutational target sequences can account for most, if not all, apparent biases in mutation frequencies between genes oriented head-on and codirectionally to replication.
Subject(s)
DNA Replication, Escherichia coli/genetics, Bacterial Genome/genetics, Mutation, Genetic Transcription, DNA Mismatch Repair, Frameshift Mutation, Mutation Rate, Point Mutation
ABSTRACT
When its DNA is damaged, Escherichia coli induces the SOS response, which consists of about 40 genes that encode activities to repair or tolerate the damage. Certain alleles of the major SOS-control genes, recA and lexA, cause constitutive expression of the response, resulting in an increase in spontaneous mutations. These mutations, historically called "untargeted", have been the subject of many previous studies. Here we re-examine SOS-induced mutagenesis using mutation accumulation followed by whole-genome sequencing (MA/WGS), which allows a detailed picture of the types of mutations induced as well as their sequence-specificity. Our results confirm previous findings that SOS expression specifically induces transversion base-pair substitutions, with rates averaging about 60-fold above wild-type levels. Surprisingly, the rates of G:C to C:G transversions, normally an extremely rare mutation, were induced an average of 160-fold above wild-type levels. The SOS-induced transversions showed strong sequence specificity, the most extreme of which was the G:C to C:G transversions, 60% of which occurred at the middle base of 5'GGC3'+5'GCC3' sites, although these sites represent only 8% of the G:C base pairs in the genome. SOS-induced transversions were also DNA strand-biased, occurring, on average, 2 to 4 times more often when the purine was on the leading-strand template and the pyrimidine on the lagging-strand template than in the opposite orientation. However, the strand bias was also sequence specific, and even of reverse orientation at some sites. By eliminating constraints on the mutations that can be recovered, the MA/WGS protocol revealed new complexities of SOS "untargeted" mutations.
Subject(s)
Escherichia coli/genetics, Mutagenesis, Mutation, SOS Response (Genetics), Bacterial DNA/metabolism, DNA-Directed DNA Polymerase/metabolism, Mutation Rate, Whole Genome Sequencing
ABSTRACT
Mutation accumulation experiments followed by whole-genome sequencing have revealed that, for several bacterial species, the rate of base-pair substitutions (BPSs) is not constant across the chromosome but varies in a wave-like pattern that is symmetrical about the origin of replication. The experiments reported here demonstrated that, in Escherichia coli, several interacting factors determine the wave. The origin is a major driver of BPS rates. When it is relocated, the BPS rates in a 1,000-kb region surrounding the new origin reproduce the pattern that surrounds the normal origin. However, the pattern across distant regions of the chromosome is unaltered and thus must be determined by other factors. Increasing the deoxynucleoside triphosphate (dNTP) concentration shifts the wave pattern away from the origin, supporting the hypothesis that fluctuations in dNTP pools coincident with replication firing contribute to the variations in the mutation rate. The nucleoid binding proteins (HU and Fis) and the terminus organizing protein (MatP) are also major factors. These proteins alter the three-dimensional structure of the DNA, and results suggest that mutation rates increase when highly structured DNA is replicated. Biases in error correction by proofreading and mismatch repair, both of which may be responsive to dNTP concentrations and DNA structure, also are major determinants of the wave pattern. These factors should apply to most bacterial and, possibly, eukaryotic genomes and suggest that different areas of the genome evolve at different rates.
IMPORTANCE: It has been found in several species of bacteria that the rate at which single base pairs are mutated is not constant across the genome but varies in a wave-like pattern that is symmetrical about the origin of replication. Using Escherichia coli as our model system, we show that this pattern is the result of several interconnected factors. First, the timing and progression of replication are important in determining the wave pattern. Second, the three-dimensional structure of the DNA is also a factor, and the results suggest that mutation rates increase when highly structured DNA is replicated. Finally, biases in error correction, which may be responsive both to the progression of DNA synthesis and to DNA structure, are major determinants of the wave pattern. These factors should apply to most bacterial and, possibly, eukaryotic genomes and suggest that different areas of the genome evolve at different rates.
Subject(s)
Base Pairing, Bacterial Chromosomes, Escherichia coli/genetics, Mutation Rate, Point Mutation, Replication Origin, Escherichia coli Proteins/metabolism, Nucleosides/metabolism, Spatial Analysis
ABSTRACT
Members of the genus Thermococcus, sulfur-reducing hyperthermophilic archaea, are ubiquitously present in various deep-sea hydrothermal vent systems and are considered to play a significant role in the microbial consortia. We present the complete genome sequence and feature analysis of Thermococcus onnurineus NA1 isolated from a deep-sea hydrothermal vent area, which reveal clues to its physiology. Based on results of genomic analysis, T. onnurineus NA1 possesses the metabolic pathways for organotrophic growth on peptides, amino acids, or sugars. More interesting was the discovery that the genome encoded unique proteins that are involved in carboxydotrophy to generate energy by oxidation of CO to CO₂, thereby providing a mechanistic basis for growth with CO as a substrate. This lithotrophic feature in combination with carbon fixation via RuBisCO (ribulose 1,5-bisphosphate carboxylase/oxygenase) introduces a new strategy with a complementing energy supply for T. onnurineus NA1, potentially allowing it to cope with nutrient stress in the surroundings of hydrothermal vents, and provides the first genomic evidence for carboxydotrophy in Thermococcus.
Subject(s)
Archaeal Genome/genetics, Thermococcus/genetics, Thermococcus/metabolism, Amino Acids/metabolism, Carbon Monoxide/metabolism, Archaeal DNA/chemistry, Archaeal DNA/genetics, Genetic Models, Molecular Sequence Data, Multigene Family/genetics, Phylogeny, Reverse Transcriptase Polymerase Chain Reaction, Seawater/microbiology, DNA Sequence Analysis, Thermococcus/classification
ABSTRACT
Accurate typing of human leukocyte antigen (HLA) is essential for successful organ transplantation, and HLA genes are heavily associated with various diseases. Widely used typing assays often involve a set of specially designed primers or probes requiring additional experiments. With the maturing of high-throughput sequencing (HTS) technologies, whole genome sequencing (WGS) as well as other HTS assays are becoming more accessible even in clinical settings. We describe various computational methods capable of directly typing HLA genes using HTS data, including Kourami, our HLA assembler. Kourami is the first HLA assembler capable of discovering novel alleles. Kourami assembles full-length sequences across the peptide-binding regions of HLA genes. Here, we focus on how a user would use Kourami on a new sample. We demonstrate the application by using Kourami to type HLA alleles from recently published WGS data with validated HLA types.
Subject(s)
Algorithms, Histocompatibility Testing/methods, Alleles, Base Sequence, Human Genome, Humans, Sequence Alignment, Software, Whole Genome Sequencing
ABSTRACT
Accurate typing of human leukocyte antigen (HLA) is important because HLA genes play important roles in immune responses and disease genesis. Previously available computational methods are database-matching approaches and their outputs are inherently limited by the completeness of already known types, making them unsuitable for discovery of novel alleles. We have developed a graph-guided assembly technique for classical HLA genes, which can construct allele sequences given high-coverage whole-genome sequencing data. Our method delivers highly accurate HLA typing, comparable to the current state-of-the-art methods. Using various data, we also demonstrate that our method can type novel alleles.
Subject(s)
Alleles, HLA Antigens/genetics, Histocompatibility Testing/methods, Genome, Haplotypes, Humans, Terminology as Topic, Whole Genome Sequencing
ABSTRACT
When the DNA polymerase that replicates the Escherichia coli chromosome, DNA polymerase III, makes an error, there are two primary defenses against mutation: proofreading by the ϵ subunit of the holoenzyme and mismatch repair. In proofreading-deficient strains, mismatch repair is partially saturated and the cell's response to DNA damage, the SOS response, may be partially induced. To investigate the nature of replication errors, we used mutation accumulation experiments and whole-genome sequencing to determine mutation rates and mutational spectra across the entire chromosome of strains deficient in proofreading, mismatch repair, and the SOS response. We report that a proofreading-deficient strain has a mutation rate 4000-fold greater than wild-type strains. While the SOS response may be induced in these cells, it does not contribute to the mutational load. Inactivating mismatch repair in a proofreading-deficient strain increases the mutation rate another 1.5-fold. DNA polymerase has a bias for converting G:C to A:T base pairs, but proofreading reduces the impact of these mutations, helping to maintain the genomic G:C content. These findings give an unprecedented view of how polymerase and error-correction pathways work together to maintain E. coli's low mutation rate of 1 per 1000 generations.
Subject(s)
DNA Replication, Bacterial DNA/genetics, Escherichia coli/genetics, Whole Genome Sequencing/methods, DNA Damage, DNA Mismatch Repair, DNA Polymerase III/metabolism, Escherichia coli Proteins/metabolism, Mutation Rate, SOS Response (Genetics)
ABSTRACT
Mismatch repair (MMR) is a major contributor to replication fidelity, but its impact varies with sequence context and the nature of the mismatch. Mutation accumulation experiments followed by whole-genome sequencing of MMR-defective Escherichia coli strains yielded ≈30,000 base-pair substitutions (BPSs), revealing mutational patterns across the entire chromosome. The BPS spectrum was dominated by A:T to G:C transitions, which occurred predominantly at the center base of 5'NAC3'+5'GTN3' triplets. Surprisingly, growth on minimal medium or at low temperature attenuated these mutations. Mononucleotide runs were also hotspots for BPSs, and the rate at which these occurred increased with run length. Comparison with ≈2000 BPSs accumulated in MMR-proficient strains revealed that both kinds of hotspots appeared in the wild-type spectrum and so are likely to be sites of frequent replication errors. In MMR-defective strains transitions were strand biased, occurring twice as often when A and C rather than T and G were on the lagging-strand template. Loss of nucleoside diphosphate kinase increases the cellular concentration of dCTP, which resulted in increased rates of mutations due to misinsertion of C opposite A and T. In an mmr ndk double mutant strain, these mutations were more frequent when the template A and T were on the leading strand, suggesting that lagging-strand synthesis was more error-prone, or less well corrected by proofreading, than was leading-strand synthesis.