ABSTRACT
The draft genome sequences of six Bacillus strains, isolated from the International Space Station and belonging to the Bacillus anthracis-B. cereus-B. thuringiensis group, are presented here. These strains were isolated from the Japanese Experiment Module (one strain), U.S. Harmony Node 2 (three strains), and Russian Segment Zvezda Module (two strains).
ABSTRACT
BACKGROUND: High-throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within and between hosts. The challenge, however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. RESULTS: We demonstrate that overlapping read pairs (ORP), generated by combining short fragment sequencing libraries and longer sequencing reads, significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). CONCLUSIONS: Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
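The following is a minimal sketch of why overlapping read pairs can lower the effective error rate; it is not the authors' pipeline or error model. It assumes the second read has already been reverse-complemented and aligned to the first, that errors in the two reads are independent, and that a wrong base is equally likely to be any of the three alternatives. The function names are hypothetical.

```python
def orp_consensus(read1: str, read2: str) -> str:
    """Combine two reads that cover the same fragment.

    Positions where the reads agree keep their base; positions where
    they disagree are masked with 'N' so they cannot be called as variants.
    Assumes read2 is already reverse-complemented and aligned to read1.
    """
    if len(read1) != len(read2):
        raise ValueError("reads must have the same length")
    return "".join(a if a == b else "N" for a, b in zip(read1, read2))


def concordant_error_rate(per_read_error: float) -> float:
    """Chance that both reads show the same wrong base at one position.

    Under the independence and uniform-substitution assumptions above,
    this is e * (e / 3) for a per-read, per-base error rate e.
    """
    return per_read_error ** 2 / 3


if __name__ == "__main__":
    print(orp_consensus("ACGTAC", "ACGAAC"))
    print(concordant_error_rate(0.01))
```

Under these assumptions, a 1% per-read error rate leaves roughly a 0.003% chance that an error survives in both reads, which is below the 0.05% variant frequency mentioned in the abstract.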
Subjects
DNA Mutational Analysis/methods; High-Throughput Nucleotide Sequencing/methods; Mutation; Viruses/genetics; Benchmarking; Genome, Viral/genetics; Polymerase Chain Reaction
ABSTRACT
The high mutation rate of RNA viruses enables a diverse genetic population of viral genotypes to exist within a single infected host. In-host genetic diversity could better position the virus population to respond and adapt to a diverse array of selective pressures such as host-switching events. Multiple new coronaviruses, including SARS, have been identified in human samples just within the last ten years, demonstrating the potential of coronaviruses as emergent human pathogens. Deep sequencing was used to characterize genomic changes in coronavirus quasispecies during simulated host-switching. Three bovine nasal samples infected with bovine coronavirus were used to infect human and bovine macrophage and lung cell lines. The virus reproduced relatively well in macrophages, but the lung cell lines were not infected efficiently enough to allow passage of non-lab-adapted samples. Approximately 12 kb of the genome was amplified before and after passage and sequenced at average coverages of nearly 950× (454 sequencing) and 38,000× (Illumina). The consensus sequence of many of the passaged samples had a 12-nucleotide insert in the spike gene, and multiple point mutations were associated with the presence of the insert. Deep sequencing revealed that the insert was present but very rare in the unpassaged samples and could quickly shift to dominate the population when placed in a different environment. The insert coded for three arginine residues, occurred in a region associated with fusion entry into host cells, and may allow infection of new cell types via heparan sulfate binding. Analysis of the deep sequencing data indicated that two distinct genotypes circulated at different frequency levels in each sample, and supports the hypothesis that the mutations present in passaged strains were "selected" from a pre-existing pool rather than arising through de novo mutation and subsequent population fixation.
Subjects
Cattle/virology; Coronavirus Infections/veterinary; Coronavirus Infections/virology; Coronavirus, Bovine/genetics; Amino Acid Sequence; Animals; Cell Line; Consensus Sequence; Coronavirus, Bovine/chemistry; Coronavirus, Bovine/physiology; Genetic Variation; Genome, Viral; High-Throughput Nucleotide Sequencing; Humans; Models, Molecular; Molecular Sequence Data; Mutation Rate; Phylogeny; Point Mutation; Protein Structure, Tertiary; Sequence Alignment; Viral Proteins/chemistry; Viral Proteins/genetics; Virus Internalization
ABSTRACT
BACKGROUND: We developed an extendable open-source Loop-mediated isothermal AMPlification (LAMP) signature design program called LAVA (LAMP Assay Versatile Analysis). LAVA was created in response to limitations of existing LAMP signature programs. RESULTS: LAVA identifies combinations of six primer regions for basic LAMP signatures, or combinations of eight primer regions for LAMP signatures with loop primers. The identified primers are conserved among target organism sequences. Primer combinations are optimized based on lengths, melting temperatures, and spacing among primer sites. We compare LAMP signature candidates for Staphylococcus aureus created both by LAVA and by PrimerExplorer. We also include signatures from a sample run targeting all strains of Mycobacterium tuberculosis. CONCLUSIONS: We have designed and demonstrated new software for identifying signature candidates appropriate for LAMP assays. The software is available for download at http://lava-dna.googlecode.com/.
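As an illustration of screening primer candidates on length, melting temperature, and spacing, the sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough estimate intended for short oligonucleotides. This is not LAVA's code or scoring scheme; the function names and every threshold are hypothetical placeholders chosen only for the example.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_ok(primer: str, min_len: int = 18, max_len: int = 22,
              min_tm: int = 55, max_tm: int = 65) -> bool:
    """Check a candidate against hypothetical length and Tm windows."""
    return (min_len <= len(primer) <= max_len
            and min_tm <= wallace_tm(primer) <= max_tm)


def spacing_ok(end_of_first: int, start_of_second: int,
               min_gap: int = 0, max_gap: int = 60) -> bool:
    """Check that two primer sites are separated by an allowed gap (hypothetical limits)."""
    gap = start_of_second - end_of_first
    return min_gap <= gap <= max_gap


if __name__ == "__main__":
    candidate = "ACGTACGTACGTACGTACGT"
    print(wallace_tm(candidate), primer_ok(candidate), spacing_ok(120, 150))
```

A real design tool would use a nearest-neighbor thermodynamic model and assay-specific windows; the simple rule here only shows the shape of the filter.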
Subjects
Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Base Sequence; DNA Primers/genetics; Nucleic Acid Amplification Techniques; Sensitivity and Specificity
ABSTRACT
Bluetongue virus (BTV) causes disease in domestic and wild ruminants and results in significant economic loss. The closely related Epizootic hemorrhagic disease virus (EHDV) has been associated with bluetongue-like disease in cattle. Although U.S. EHDV strains have not been experimentally proven to cause disease in cattle, there is serologic evidence of infection in cattle. Therefore, rapid diagnosis and differentiation of BTV and EHDV are required. The genetic sequence information and bioinformatic analysis necessary to design a real-time reverse transcription polymerase chain reaction (RT-PCR) assay for the early detection of indigenous and exotic BTV and EHDV are described. This sequence data foundation focused on two conserved target genes: one that is highly expressed in infected mammalian cells and another that is highly expressed in infected insect cells. The analysis of all BTV and EHDV prototype strains indicated that a complex primer design was necessary for both a virus group-comprehensive and virus group-specific gene amplification diagnostic test. This information has been used as the basis for the development of a rapid multiplex BTV-EHDV real-time RT-PCR that detects all known serotypes of both viruses and distinguishes between BTV and EHDV serogroups. The sensitivity of this rapid, single-tube, real-time RT-PCR assay is sufficient for diagnostic application, without the contamination problems associated with standard gel-based RT-PCR, especially nested RT-PCR tests.
Subjects
Bluetongue virus/genetics; Hemorrhagic Disease Virus, Epizootic/genetics; Reverse Transcriptase Polymerase Chain Reaction/methods; Animals; Base Sequence; Bluetongue/epidemiology; Bluetongue virus/classification; Cattle; Cattle Diseases/epidemiology; Cattle Diseases/virology; Cloning, Molecular; DNA Primers; Gene Amplification; Hemorrhagic Disease Virus, Epizootic/classification; Phylogeny; Reoviridae Infections/epidemiology; Serotyping; Species Specificity
ABSTRACT
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. Using Marburg and variola virus sequences, we apply SAP to assess whether draft data are sufficient or finished sequencing is required. Simulations indicate that intermediate to high-quality draft with error rates of 10^-3 to 10^-5 (approximately 8x coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of approximately 1% (3x to 6x coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures.
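A back-of-the-envelope calculation helps show why a 1% error rate is so much more damaging than 10^-3 or 10^-5. The sketch below is not SAP's simulation; it assumes independent per-base errors and a hypothetical 100-base signature region, and simply computes the chance that the region is read without any error.

```python
def error_free_probability(error_rate: float, region_length: int) -> float:
    """Probability that a region of the given length contains no sequencing error,
    assuming independent errors at a constant per-base rate."""
    return (1.0 - error_rate) ** region_length


if __name__ == "__main__":
    region_length = 100  # hypothetical signature region length, not from the source
    for rate in (1e-2, 1e-3, 1e-5):
        p = error_free_probability(rate, region_length)
        print(f"error rate {rate:g}: P(no error in {region_length} bases) = {p:.4f}")
```

Under these assumptions, only about a third of 100-base regions would be error-free at a 1% error rate, compared with roughly nine in ten at 10^-3, which is consistent in spirit with the abstract's conclusion that low-quality draft of target genomes is inadequate.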
Subjects
Genome, Bacterial; Genome, Viral; Genomics/methods; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Computational Biology; Marburgvirus/genetics; Marburgvirus/isolation & purification; Phylogeny; Sequence Alignment; Software; Variola virus/genetics; Variola virus/isolation & purification; Viral Proteins/chemistry
ABSTRACT
We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives (near neighbors) that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near-neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near-neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. Severe acute respiratory syndrome and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near-neighbor sequences are urgently needed. Our results also indicate that double-stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
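The two selection criteria, conserved among target strains and unique relative to other species, can be sketched with exact k-mer sets. This is a toy illustration and not the authors' signature prediction pipeline; the function names, the toy sequences, and the use of exact k-mer matching are assumptions made for the example.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_candidates(targets: Iterable[str], neighbors: Iterable[str], k: int) -> Set[str]:
    """k-mers present in every target genome and absent from every near neighbor."""
    target_sets = [kmers(t, k) for t in targets]
    if not target_sets:
        return set()
    conserved = set.intersection(*target_sets)      # reliability: avoid false negatives
    background: Set[str] = set()
    for n in neighbors:
        background |= kmers(n, k)                    # specificity: avoid false positives
    return conserved - background


if __name__ == "__main__":
    targets = ["AAACCCGGG", "TTACCCGGA"]   # toy "strains"
    neighbors = ["GGCCCGTT"]               # toy near neighbor
    print(sorted(signature_candidates(targets, neighbors, k=4)))
```

Adding target genomes can only shrink the conserved set, and adding near-neighbor genomes can only shrink the unique set, which is why the number of each that must be sequenced is the quantity the simulations in the abstract try to estimate.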
Subjects
Base Sequence; DNA Viruses/classification; Genome, Viral; RNA Viruses/classification; Virus Diseases/diagnosis; DNA Viruses/genetics; Humans; Monte Carlo Method; RNA Viruses/genetics; Species Specificity; Virus Diseases/virology
ABSTRACT
Rapid advances in the genomic sequencing of bacteria and viruses over the past few years have made it possible to consider sequencing the genomes of all pathogens that affect humans and the crops and livestock upon which our lives depend. Recent events make it imperative that full genome sequencing be accomplished as soon as possible for pathogens that could be used as weapons of mass destruction or disruption. This sequence information must be exploited to provide rapid and accurate diagnostics to identify pathogens and distinguish them from harmless near-neighbours and hoaxes. The Chem-Bio Non-Proliferation (CBNP) programme of the US Department of Energy (DOE) began a large-scale effort of pathogen detection in early 2000 when it was announced that the DOE would be providing bio-security at the 2002 Winter Olympic Games in Salt Lake City, Utah. Our team at the Lawrence Livermore National Lab (LLNL) was given the task of developing reliable and validated assays for a number of the most likely bioterrorist agents. The short timeline led us to devise a novel system that utilised whole-genome comparison methods to rapidly focus on parts of the pathogen genomes that had a high probability of being unique. Assays developed with this approach have been validated by the Centers for Disease Control (CDC). They were used at the 2002 Winter Olympics, have entered the public health system, and have been in continual use for non-publicised aspects of homeland defence since autumn 2001. Assays have been developed for all major threat list agents for which adequate genomic sequence is available, as well as for other pathogens requested by various government agencies. Collaborations with comparative genomics algorithm developers have enabled our LLNL team to make major advances in pathogen detection, since many of the existing tools simply did not scale well enough to be of practical use for this application. 
It is hoped that a discussion of a real-life practical application of comparative genomics algorithms may help spur algorithm developers to tackle some of the many remaining problems that need to be addressed. Solutions to these problems will advance a wide range of biological disciplines, only one of which is pathogen detection. For example, exploration in evolution and phylogenetics, annotating gene coding regions, predicting and understanding gene function and regulation, and untangling gene networks all rely on tools for aligning multiple sequences, detecting gene rearrangements and duplications, and visualising genomic data. Two key problems currently needing improved solutions are: (1) aligning incomplete, fragmentary sequence (eg draft genome contigs or arbitrary genome regions) with both complete genomes and other fragmentary sequences; and (2) ordering, aligning and visualising non-colinear gene rearrangements and inversions in addition to the colinear alignments handled by current tools.