ABSTRACT
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
Subject(s)
Conserved Sequence/genetics, Genome/genetics, Zebrafish/genetics, Animals, Chromosomes/genetics, Molecular Evolution, Female, Genes/genetics, Human Genome/genetics, Genomics, Humans, Male, Meiosis/genetics, Molecular Sequence Annotation, Pseudogenes/genetics, Reference Standards, Sex Determination Processes/genetics, Zebrafish Proteins/genetics
ABSTRACT
The Red Queen hypothesis proposes that coevolution of interacting species (such as hosts and parasites) should drive molecular evolution through continual natural selection for adaptation and counter-adaptation. Although the divergence observed at some host-resistance and parasite-infectivity genes is consistent with this, the long time periods typically required to study coevolution have so far prevented any direct empirical test. Here we show, using experimental populations of the bacterium Pseudomonas fluorescens SBW25 and its viral parasite, phage Phi2 (refs 10, 11), that the rate of molecular evolution in the phage was far higher when both bacterium and phage coevolved with each other than when phage evolved against a constant host genotype. Coevolution also resulted in far greater genetic divergence between replicate populations, which was correlated with the range of hosts that coevolved phage were able to infect. Consistent with this, the most rapidly evolving phage genes under coevolution were those involved in host infection. These results demonstrate, at both the genomic and phenotypic level, that antagonistic coevolution is a cause of rapid and divergent evolution, and is likely to be a major driver of evolutionary change within species.
Subject(s)
Bacteriophages/physiology, Biological Evolution, Molecular Evolution, Pseudomonas fluorescens/genetics, Pseudomonas fluorescens/virology, Bacteriophages/genetics, Genetic Variation, Molecular Sequence Data, Phenotype, Genetic Selection/genetics
ABSTRACT
Babesia spp. are tick-borne, intraerythrocytic hemoparasites that use antigenic variation to resist host immunity, through sequential modification of the parasite-derived variant erythrocyte surface antigen (VESA) expressed on the infected red blood cell surface. We identified the genomic processes driving antigenic diversity in genes encoding VESA (ves1) through comparative analysis within and between three Babesia species (B. bigemina, B. divergens and B. bovis). Ves1 structure diverges rapidly after speciation, notably through the evolution of shortened forms (ves2) from 5' ends of canonical ves1 genes. Phylogenetic analyses show that ves1 genes are transposed between loci routinely, whereas ves2 genes are not. Similarly, analysis of sequence mosaicism shows that recombination drives variation in ves1 sequences, but less so for ves2, indicating the adoption of different mechanisms for variation of the two families. Proteomic analysis of the B. bigemina PR isolate shows that two dominant VESA1 proteins are expressed in the population, whereas numerous VESA2 proteins are co-expressed, consistent with differential transcriptional regulation of each family. Hence, VESA2 proteins are abundant and previously unrecognized elements of Babesia biology, with evolutionary dynamics consistently different to those of VESA1, suggesting that their functions are distinct.
Subject(s)
Antigenic Variation, Babesia/genetics, Molecular Evolution, Protozoan Genes, Host-Parasite Interactions/genetics, Chromosome Breakpoints, Protozoan Genome, Protozoan Proteins/genetics, Genetic Recombination
ABSTRACT
Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their response to in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.
Subject(s)
Drug Resistance/genetics, Gene Dosage, Protozoan Genes, Leishmania donovani/genetics, Visceral Leishmaniasis/genetics, Base Sequence, Humans, Leishmania donovani/metabolism, Visceral Leishmaniasis/drug therapy, Visceral Leishmaniasis/epidemiology, Visceral Leishmaniasis/metabolism, Molecular Sequence Data, DNA Sequence Analysis, Species Specificity
ABSTRACT
BACKGROUND: Our understanding of the dynamics of genome stability versus gene flux within bacteriophage lineages is limited. Recently, there has been a renewed interest in the use of bacteriophages as 'therapeutic' agents; a prerequisite for their use in such therapies is a thorough understanding of their genetic complement, genome stability and their ecology to avoid the dissemination or mobilisation of phage or bacterial virulence and toxin genes. Campylobacter, a food-borne pathogen, is one of the organisms for which the use of bacteriophage is being considered to reduce human exposure to this organism. RESULTS: Sequencing and genome analysis were performed for two Campylobacter bacteriophages. The genomes were extremely similar at the nucleotide level (≥ 96%) with most differences accounted for by novel insertion sequences, DNA methylases and an approximately 10 kb contiguous region of metabolic genes that were dissimilar at the sequence level but similar in gene function between the two phages. Both bacteriophages contained a large number of radical S-adenosylmethionine (SAM) genes, presumably involved in boosting host metabolism during infection, as well as evidence that many genes had been acquired from a wide range of bacterial species. Further bacteriophages, from the UK Campylobacter typing set, were screened for the presence of bacteriophage structural genes, DNA methylases, mobile genetic elements and regulatory genes identified from the genome sequences. The results indicate that many of these bacteriophages are related, with 10 out of 15 showing some relationship to the sequenced genomes. CONCLUSIONS: Two large virulent Campylobacter bacteriophages were found to show very high levels of sequence conservation despite separation in time and place of isolation. The bacteriophages show adaptations to their host and possess genes that may enhance Campylobacter metabolism, potentially advantaging both the bacteriophage and its host.
Genetic conservation has been shown to extend to other Campylobacter bacteriophages, forming a highly conserved lineage of bacteriophages that predate upon campylobacters and indicating that highly adapted bacteriophage genomes can be stable over prolonged periods of time.
Subject(s)
Bacteriophages/genetics, Campylobacter/virology, Bacteriophages/pathogenicity, Conserved Sequence, Viral Genome, Sequence Analysis, Viral Structural Proteins/genetics, Virulence
ABSTRACT
Stearoyl-CoA desaturases (SCDs) are key enzymes of fatty acid biosynthesis whose regulation underpins responses to dietary, thermal, and hormonal treatment. Although two isoforms are known to exist in the common carp and human and four in mouse, there is no coherent view on how this gene family evolved to generate functionally diverse members. Here we identify numerous new SCD homologs in teleost fishes, using sequence data from expressed sequence tag (EST) and cDNA collections and genomic model species. Phylogenetic analyses of the deduced coding sequences produced only partially resolved molecular trees. The multiple SCD isoforms were, however, consistent with having arisen by an ancient gene duplication event in teleost fishes together with a more recent duplication in the tetraploid carp and possibly also salmonid lineages. Critical support for this interpretation comes from comparison across all vertebrate groups of the gene order in the genomic environments of the SCD isoforms. Using syntenically aligned chromosomal fragments from large-insert clones of common carp and grass carp together with those from genomically sequenced model species, we show that the ancient and modern SCD duplication events in the carp lineage were each associated with large chromosomal segment duplications, both possibly linked to whole genome duplications. By contrast, the four mouse isoforms likely arose by tandem duplications. Each duplication in the carp lineage gave rise to differentially expressed SCD isoforms, either induced by cold or diet as previously shown for the recent duplicated carp isoforms or tissue specific as demonstrated here for the ancient duplicate zebrafish isoforms.
Subject(s)
Molecular Evolution, Fishes/genetics, Gene Duplication, Stearoyl-CoA Desaturase/genetics, Amino Acid Sequence, Animals, Complementary DNA/genetics, Expressed Sequence Tags, Genomics/methods, Molecular Sequence Data, Phylogeny, Protein Isoforms/genetics, Synteny, Takifugu/genetics, Zebrafish/genetics
ABSTRACT
We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.
Subject(s)
Chromosome Mapping, Genetic Loci, Genome, Haplotypes, Inbred Mice/genetics, Animals, Laboratory Animals, Chromosome Mapping/veterinary, Haplotypes/genetics, Mice, Inbred BALB C Mice/genetics, Inbred C3H Mice/genetics, Inbred C57BL Mice/genetics, Inbred CBA Mice/genetics, Inbred DBA Mice/genetics, Inbred NOD Mice/genetics, Inbred Mice/classification, Molecular Sequence Annotation, Phylogeny, Single Nucleotide Polymorphism, Species Specificity
ABSTRACT
Comparative nematode genomics has thus far been largely constrained to the genus Caenorhabditis, but a huge diversity of other nematode species, and genomes, exist. The Brugia malayi genome is approximately 100 Mb in size, and distributed across five chromosome pairs. Previous genomic investigations have included definition of major repeat classes and sequencing of selected genes. We have generated over 18,000 sequences from the ends of large-insert clones from bacterial artificial chromosome libraries. These end sequences, totalling over 10 Mb of sequence, contain just under 8 Mb of unique sequence. We identified the known Mbo I and Hha I repeat families in the sequence data, and also identified several new repeats based on their abundance. Genomic copies of 17% of B. malayi genes defined by expressed sequence tags have been identified. Nearly one quarter of end sequences can encode peptides with significant similarity to protein sequences in the public databases, and we estimate that we have identified more than 2700 new B. malayi genes. Importantly, 459 end sequences had homologues in other organisms, but lacked a match in the completely sequenced genomes of Caenorhabditis briggsae and Caenorhabditis elegans, emphasising the role of gene loss in genome evolution. B. malayi is estimated to have over 18,500 protein-coding genes.
Subject(s)
Brugia malayi/genetics, Helminth Genes, Animals, Caenorhabditis/genetics, Caenorhabditis elegans/genetics, Chromosome Mapping, Bacterial Artificial Chromosomes/genetics, Helminth DNA/genetics, Gene Conversion, Genome, Genomics, Molecular Sequence Data, Helminth RNA/genetics, Ribosomal RNA/genetics, Repetitive Nucleic Acid Sequences, Retroelements/genetics, Species Specificity
ABSTRACT
IMPORTANCE: The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These could be used to more effectively contain the spread of multidrug-resistant pathogens. OBJECTIVE: To compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and typing of other clinically important pathogens. DESIGN, SETTING, AND PARTICIPANTS: A laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information. MAIN OUTCOMES AND MEASURES: Specimens were taken and processed as part of routine clinical care, and cultured isolates stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data. RESULTS: We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). 
This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of carbapenemases or other resistance mechanisms. Whole-genome sequencing was used to recapitulate reference laboratory typing of clinical isolates of Neisseria meningitidis and to provide extended phylogenetic analyses of these. CONCLUSIONS AND RELEVANCE: The speed, accuracy, and depth of information provided by WGS platforms to confirm or refute outbreaks in hospitals and the community, and to accurately define transmission of multidrug-resistant and other organisms, represent an important advance.
Subject(s)
Cross Infection/diagnosis, Bacterial Genome/genetics, Gram-Negative Bacteria/isolation & purification, DNA Sequence Analysis/methods, Anti-Bacterial Agents/pharmacology, Bacterial Proteins/genetics, Cross Infection/microbiology, Disease Outbreaks, Bacterial Drug Resistance/genetics, England, Gram-Negative Bacteria/classification, Gram-Negative Bacteria/genetics, University Hospitals, Humans, Public Health, beta-Lactamases/genetics
ABSTRACT
We have developed a sequencing method on the Pacific Biosciences RS sequencer (the PacBio) for small DNA molecules that avoids the need for a standard library preparation. To date this approach has been applied toward sequencing single-stranded and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA-modification analysis, and linear DNA fragments covering an entire bacterial genome. Using direct sequencing it is possible to generate sequence data from as little as 1 ng of DNA, offering a significant advantage over current protocols which typically require 400-500 ng of sheared DNA for the library preparation.
Subject(s)
DNA Sequence Analysis/instrumentation, DNA Sequence Analysis/methods, Software, Base Sequence, Gene Library, Bacterial Genome, Genomics/methods, Molecular Sequence Data, User-Computer Interface
ABSTRACT
Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.
Subject(s)
DNA/genetics, Genomic Library, Genomics/methods, Bacteriophages, Clone Cells, DNA Contamination, DNA Probes/metabolism, Electricity, Escherichia coli/virology, Genetic Vectors/genetics, Artificial Membranes, Reproducibility of Results
ABSTRACT
Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.
Subject(s)
Molecular Cloning/methods, DNA/genetics, Genomic Library, Genomics/methods, DNA Sequence Analysis/methods, Base Sequence, Bacterial Artificial Chromosomes/genetics, Clone Cells, Genetic Databases, Genetic Vectors/genetics, Molecular Sequence Data
ABSTRACT
The obligately anaerobic bacterium Bacteroides fragilis, an opportunistic pathogen and inhabitant of the normal human colonic microbiota, exhibits considerable within-strain phase and antigenic variation of surface components. The complete genome sequence has revealed an unusual breadth (in number and in effect) of DNA inversion events that potentially control expression of many different components, including surface and secreted components, regulatory molecules, and restriction-modification proteins. Invertible promoters of two different types (12 group 1 and 11 group 2) were identified. One group has inversion crossover (fix) sites similar to the hix sites of Salmonella typhimurium. There are also four independent intergenic shufflons that potentially alter the expression and function of varied genes. The composition of the 10 different polysaccharide biosynthesis gene clusters identified (7 with associated invertible promoters) suggests a mechanism of synthesis similar to the O-antigen capsules of Escherichia coli.