ABSTRACT
Large, repeat-rich genomes present challenges for assembly using short-read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage, but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking, as it locally assembles whole-genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models that include upstream and downstream genomic sequence as well as intronic sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid future whole-genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .
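The iterative extension idea described in this abstract can be illustrated with a toy sketch. This is not the iterassemble pipeline: it is a minimal greedy extension that assumes error-free reads, all on the same strand, and a sequence with no repeated 5-mers. The names `extend_right`, `extend_left`, `walk`, and `min_overlap`, and the example sequences `GENOME`, `READS`, and `SEED`, are hypothetical and chosen only for illustration.

```python
def extend_right(contig, reads, min_overlap):
    """Return the longest one-read extension of contig to the right.

    A read extends the contig when a prefix of the read (at least
    min_overlap bases) equals a suffix of the contig.
    """
    best = contig
    for read in reads:
        # Try the longest possible overlap first for this read.
        for k in range(min(len(read), len(contig)), min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                candidate = contig + read[k:]
                if len(candidate) > len(best):
                    best = candidate
                break
    return best


def extend_left(contig, reads, min_overlap):
    """Mirror image of extend_right, implemented by reversing the strings."""
    reversed_reads = [r[::-1] for r in reads]
    return extend_right(contig[::-1], reversed_reads, min_overlap)[::-1]


def walk(seed, reads, min_overlap=5, max_rounds=100):
    """Grow a seed (for example, an exon) in both directions until no read extends it."""
    contig = seed
    for _ in range(max_rounds):
        extended = extend_right(contig, reads, min_overlap)
        extended = extend_left(extended, reads, min_overlap)
        if extended == contig:
            break
        contig = extended
    return contig


# Toy data (invented): a 36-base "genome", 12-base reads tiled every 3 bases,
# and an 8-base seed standing in for an exon found via the transcriptome.
GENOME = "TTGACCGTAAGCTTGGATCCAGTCAAGGCTTACGGA"
READS = [GENOME[i:i + 12] for i in range(0, len(GENOME) - 11, 3)]
SEED = "GGATCCAG"

recovered = walk(SEED, READS, min_overlap=5)
```

In a repeat-rich genome such as the axolotl's, a greedy rule like this would be misled by repeated sequence, which is one reason a real pipeline needs additional linking and refinement steps such as those the abstract mentions.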
Subject(s)
Ambystoma mexicanum/genetics , Chromosome Walking/methods , Computational Biology/methods , Animals , Gene Expression Profiling/methods , Introns , Molecular Sequence Annotation , Software
ABSTRACT
PIF-like elements are the first-described members of a recently discovered and widespread superfamily of DNA transposons, named PIF/Harbinger. Complete and partial PIF-like elements have been isolated from hundreds of plant species. Previously, we identified 139 partial PIF-like transposases in the Bambusoideae, of which three were from the bamboo species Phyllostachys pubescens. Here we report identification and isolation of the first full-length PIF-like element (PpPIF-1) from P. pubescens; identification was made by chromosome walking, based on a modified magnetic enrichment procedure that allows efficient cloning of flanking sequences up to 3 kb in length. PpPIF-1 is 5953 bp in length, with 20-bp imperfect inverted terminal repeats and 3-bp target site duplications. This element contains two open reading frames, one encoding a putative transposase, including the complete DDE-domain typical of PIF/Harbinger elements from plants, and the other encoding a DNA-binding protein. There are seven termination codons and two frameshift mutations in the open reading frames, probably due to vertical inactivation.
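As a small illustration of what "imperfect inverted terminal repeats" means, the sketch below compares the first bases of an element with the reverse complement of its last bases and counts the matching positions. The sequence `toy_element` is invented and is not the PpPIF-1 sequence; the helper names `reverse_complement` and `itr_matches` are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def itr_matches(element, length):
    """Count positions at which the left terminus matches the
    reverse complement of the right terminus over `length` bases."""
    left = element[:length]
    right = reverse_complement(element[-length:])
    return sum(a == b for a, b in zip(left, right))


# Invented example: 10-base termini that are inverted repeats of each
# other except for a single mismatch, around an arbitrary internal region.
toy_element = "GGCATTCAGT" + "TTTTGGGGAAAA" + "ACTTAATGCC"
matches = itr_matches(toy_element, 10)
```

A perfect inverted repeat would match at every position; an imperfect one, as reported for the 20-bp termini of PpPIF-1, matches at most but not all of them.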
Subject(s)
Bambusa/genetics , Genes, Plant , Amino Acid Sequence , Base Sequence , Chromosome Walking , DNA Primers , Frameshift Mutation , Molecular Sequence Data , Open Reading Frames , Phylogeny , Sequence Homology, Amino Acid , Transposases/chemistry , Transposases/genetics
ABSTRACT
The mechanisms involved in the formation of subtelomeric rearrangements are now beginning to be elucidated. Breakpoint sequencing analysis of 1p36 rearrangements has made important contributions to this line of inquiry. Despite the unique architecture of segmental duplications inherent to human subtelomeres, no common mechanism has been identified thus far and different nonexclusive recombination-repair mechanisms seem to predominate. In order to gain further insights into the mechanisms of chromosome breakage, repair, and stabilization mediating subtelomeric rearrangements in humans, we investigated the constitutional rearrangements of 1p36. Cloning of the breakpoint junctions in a complex rearrangement and three non-reciprocal translocations revealed similarities at the junctions, such as microhomology of up to three nucleotides, along with no significant sequence identity in close proximity to the breakpoint regions. All the breakpoints appeared to be unique and their occurrence was limited to non-repetitive, unique DNA sequences. Several recombination- or cleavage-associated motifs that may promote non-homologous recombination were observed in close proximity to the junctions. We conclude that NHEJ is likely the mechanism of DNA repair that generates these rearrangements. Additionally, two apparently pure terminal deletions were also investigated, and the refinement of the breakpoint regions identified two distinct genomic intervals ~25-kb apart, each containing a series of 1p36 specific segmental duplications with 90-98% identity. Segmental duplications can serve as substrates for ectopic homologous recombination or stimulate genomic rearrangements.
Subject(s)
Chromosomes, Human, Pair 1/genetics , Gene Duplication , Gene Rearrangement , Recombination, Genetic , Base Sequence , Cell Line , Chromosome Breakage , Chromosome Walking , Cloning, Molecular , Comparative Genomic Hybridization , DNA Repair , Humans , In Situ Hybridization, Fluorescence , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA , Translocation, Genetic
ABSTRACT
BACKGROUND: With over 20 parapatric races differing in their warningly colored wing patterns, the butterfly Heliconius erato provides a fascinating example of an adaptive radiation. Together with matching races of its co-mimic Heliconius melpomene, H. erato also represents a textbook case of Müllerian mimicry, a phenomenon where common warning signals are shared amongst noxious organisms. It is of great interest to identify the specific genes that control the mimetic wing patterns of H. erato and H. melpomene. To this end we have undertaken comparative mapping and targeted genomic sequencing in both species. This paper reports on a comparative analysis of genomic sequences linked to color pattern mimicry genes in Heliconius. RESULTS: Scoring AFLP polymorphisms in H. erato broods allowed us to survey loci at approximately 362 kb intervals across the genome. With this strategy we were able to identify markers tightly linked to two color pattern genes: D and Cr, which were then used to screen H. erato BAC libraries in order to identify clones for sequencing. Gene density across 600 kb of BAC sequences appeared relatively low, although the number of predicted open reading frames was typical for an insect. We focused analyses on the D- and Cr-linked H. erato BAC sequences and on the Yb-linked H. melpomene BAC sequence. A comparative analysis between homologous regions of H. erato (Cr-linked BAC) and H. melpomene (Yb-linked BAC) revealed high levels of sequence conservation and microsynteny between the two species. We found that repeated elements constitute 26% and 20% of BAC sequences from H. erato and H. melpomene respectively. The majority of these repetitive sequences appear to be novel, as they showed no significant similarity to any other available insect sequences. 
We also observed signs of fine scale conservation of gene order between Heliconius and the moth Bombyx mori, suggesting that lepidopteran genome architecture may be conserved over very long evolutionary time scales. CONCLUSION: Here we have demonstrated the tractability of progressing from a genetic linkage map to genomic sequence data in Heliconius butterflies. We have also shown that fine-scale gene order is highly conserved between distantly related Heliconius species, and also between Heliconius and B. mori. Together, these findings suggest that genome structure in macrolepidoptera might be very conserved, and show that mapping and positional cloning efforts in different lepidopteran species can be reciprocally informative.
Subject(s)
Butterflies/genetics , Gene Order , Genes, Insect , Genetic Linkage , Repetitive Sequences, Nucleic Acid , Amplified Fragment Length Polymorphism Analysis , Animals , Base Sequence , Chromosome Walking , Chromosomes, Artificial, Bacterial , Conserved Sequence , DNA/genetics , Genetic Markers , Phenotype , Pigmentation/genetics , Sequence Analysis , Synteny , Wings, Animal
ABSTRACT
The rplKAJL-rpoBC operon or beta operon is a classic bacterial gene cluster, which codes for proteins K, A, J and L of the large ribosomal subunit, as well as proteins B (beta subunit) and C (beta' subunit) of RNA polymerase. In the early 1990s, the operon was obtained as a 2.6 kbp DNA fragment (In-2.6) by random cloning of DNA from periwinkle plants infected with the Poona (India) strain of the huanglongbing agent, later named 'Candidatus (Ca.) Liberibacter asiaticus'. DNA from periwinkle plants infected with the Nelspruit strain (South Africa) of 'Ca. L. africanus' was amplified with a primer pair designed from In-2.6 and yielded, after cloning and sequencing, a 1.7 kbp DNA fragment (AS-1.7) of the beta operon of 'Ca. L. africanus'. The beta operon of the American liberibacter, as well as the three upstream genes (tufB, secE, nusG), have now also been obtained by the technique of chromosome walking and extend over 4673 bp, comprising the following genes: tufB, secE, nusG, rplK, rplA, rplJ, rplL and rpoB. The sequence of the beta operon was also determined for a Brazilian strain of 'Ca. L. asiaticus', from nusG to rpoB (3025 bp), and was found to share 99 % identity with the corresponding beta operon sequences of an Indian and a Japanese strain. Finally, the beta operon sequence of 'Ca. L. africanus' was extended from 1673 bp (rplA to rpoB) to 3013 bp (nusG to rpoB), making it possible to compare the beta operon sequences of the African, Asian and American liberibacters over a length of approximately 3000 bp, from nusG to rpoB. While 'Ca. L. africanus' and 'Ca. L. asiaticus' shared 81.2 % sequence identity, the percentage for 'Ca. L. americanus' and 'Ca. L. africanus' was only 72.2 %, and identity for 'Ca. L. americanus' and 'Ca. L. asiaticus' was only 71.4 %. The approximately 3000 bp nusG-rpoB sequence was also used to construct a phylogenetic tree, and this tree was found to be identical to the known 16S rRNA gene sequence-based tree. 
These results confirm earlier findings that 'Ca. L. americanus' is a distinct liberibacter, more distantly related to 'Ca. L. africanus' and 'Ca. L. asiaticus' than 'Ca. L. africanus' is to 'Ca. L. asiaticus'. The dates of speciation have also been estimated.
Subject(s)
Bacterial Proteins/genetics , Citrus sinensis/microbiology , Multigene Family , Phylogeny , Plant Diseases/microbiology , Rhizobiaceae/classification , Sequence Analysis, DNA , Vinca/microbiology , Chromosome Walking , DNA, Bacterial/analysis , Molecular Sequence Data , Plant Leaves/microbiology , RNA, Ribosomal, 16S/genetics , Rhizobiaceae/genetics , Rhizobiaceae/isolation & purification , Ribosomal Proteins/genetics , Species Specificity
ABSTRACT
The increased demand for enzymes with new properties makes indispensable the development of easy and rapid strategies to obtain complete genes of new enzymes. Here a strategy is described which includes screening by PCR of new subtilases mediated by Consensus-Degenerate Hybrid Oligonucleotide Primers (CODEHOP) and an improved genome walking method to obtain the complete sequence of the identified genes. Existing methods of genome walking have many limitations, which make them inefficient and time consuming. We have developed an improved genome walking method with novel advances to get a simple, rapid and more efficient procedure based on cassette-ligation. Improvements consist basically in the possibility of a genomic DNA digestion with any restriction enzyme, blunting and 3' adenylation of digested DNA by Taq DNA polymerase to avoid self-circularization, followed by TA ligation of the adenine 3' overhanging end to the same unphosphorylated oligo-cassette. The efficiency of the genome walking method was demonstrated by finding the unknown ends of all gene fragments tested, previously obtained by CODEHOP-mediated PCR, including three subtilases (P4, P6 and P7), one xylanase and one lipase, from different strains of Antarctic marine bacteria.
Subject(s)
Bacteria/enzymology , Bacteria/genetics , Chromosome Walking/methods , Genes, Bacterial/genetics , Genome, Bacterial/genetics , Seawater/microbiology , Water Microbiology , Antarctic Regions , Base Sequence , Cloning, Molecular , Consensus Sequence , DNA Restriction Enzymes , Gene Library , Hydrolysis , Phylogeny , Polymerase Chain Reaction , Recombinant Proteins/metabolism , Sequence Analysis, Protein , Solubility , Subtilisin/genetics
ABSTRACT
The whole nucleotide sequence of Oropouche virus medium (M) RNA, Orthobunyavirus genus, Bunyaviridae family, was obtained using a new genomic amplification method. This method is based on the use of a single and specific primer of high melting temperature in a linear amplification (LA), followed by a single primer polymerase chain reaction (LASP-PCR). The LASP-PCR was used to walk along the Oropouche M RNA, completing the sequence in seven successive walks. The amplicons obtained in each walking step ranged from 300 to 1100 bp; however, amplicons of up to 3970 bp were obtained when the extension time of the LASP-PCR was increased from 120 to 270 s. This method was tested successfully for Escherichia coli and cytomegalovirus, yielding amplicons of up to 2130 and 6500 bp, respectively, indicating that it can be applied to amplify unknown DNA sequences adjacent to a short stretch of known sequence in more complex genomes.
Subject(s)
Base Sequence , Bunyaviridae/genetics , Chromosome Walking/methods , Polymerase Chain Reaction/methods , RNA, Viral/chemistry , Cytomegalovirus/genetics , DNA Primers , DNA, Complementary/genetics , Escherichia coli/genetics , Escherichia coli Proteins/genetics , RNA, Viral/genetics , Sequence Analysis, DNA
ABSTRACT
Some genes present in only certain strains of the genetically diverse gastric pathogen Helicobacter pylori may affect its phenotype and/or evolutionary potential. Here we describe a new 16.3-kb segment, 7 of whose 16 open reading frames are homologs of type IV secretion genes (virB4, virB7 to virB11, and virD4), the third such putative secretion gene cluster found in H. pylori. This segment, to be called tfs3, was discovered by subtractive hybridization and chromosome walking. Full-length and truncated tfs3 elements were found in 20 and 19%, respectively, of 94 strains tested, which were from Spain, Peru, India, and Japan. A tfs3 remnant (6 kb) was found in an archived stock of reference strain J99, although it was not included in this strain's published genome sequence. PCR and DNA sequence analyses indicated the following. (i) tfs3's ends are conserved. (ii) Right-end insertion occurred at one specific site in a chromosomal region that is varied in gene content and arrangement, the "plasticity zone." (iii) Left-end insertion occurred at different sites in each of nine strains studied. (iv) Sequences next to the right-end target in tfs3-free strains were absent from most strains carrying full-length tfs3 elements. These patterns suggested insertion by a transposition-like event, but one in which targets are chosen with little or no specificity at the left end and high specificity at the right end, thereby deleting the intervening DNA.
Subject(s)
Bacterial Proteins/genetics , DNA Transposable Elements , Genes, Bacterial , Helicobacter pylori/genetics , Bacterial Proteins/metabolism , Base Sequence , Chromosome Walking , DNA, Bacterial/analysis , Helicobacter pylori/physiology , Humans , India , Japan , Molecular Sequence Data , Multigene Family , Nucleic Acid Hybridization , Open Reading Frames , Peru , Sequence Analysis, DNA , Spain
ABSTRACT
Eight additional genes, jadX, O, P, Q, S, T, U and V, in the jad cluster of Streptomyces venezuelae ISP5230, were located immediately downstream of jadN by chromosome walking. Sequence analyses and comparisons implicated them in biosynthesis of the 2,6-dideoxysugar in jadomycin B. The genes were cloned in Escherichia coli, inactivated by inserting an apramycin resistance cassette with a promoter driving transcription of downstream genes, and transferred into Streptomyces venezuelae by intergeneric conjugation. Analysis by HPLC and NMR of intermediates accumulated by cultures of the insertionally inactivated Streptomyces venezuelae mutants indicated that jadO, P, Q, S, T, U and V mediate formation of the dideoxysugar moiety of jadomycin B and its attachment to the aglycone. Based on these results and sequence similarities to genes described in other species producing deoxysugar derivatives, a biosynthetic pathway is proposed in which the jadQ product (glucose-1-phosphate nucleotidyltransferase) activates glucose to its nucleotide diphosphate (NDP) derivative, and the jadT product (a 4,6-dehydratase) converts this to NDP-4-keto-6-deoxy-D-glucose. An NDP-hexose 2,3-dehydratase and an oxidoreductase, encoded by jadO and jadP, respectively, catalyse ensuing reactions that produce an NDP-2,6-dideoxy-D-threo-4-hexulose. The product of jadU (NDP-4-keto-2,6-dideoxy-5-epimerase) converts this intermediate to its L-erythro form and the jadV product (NDP-4-keto-2,6-dideoxyhexose 4-ketoreductase) reduces the keto group of the NDP-4-hexulose to give an activated form of the L-digitoxose moiety in jadomycin B. Finally, a glycosyltransferase encoded by jadS transfers the activated sugar to jadomycin aglycone. The function of jadX is unclear; the gene is not essential for jadomycin B biosynthesis, but its presence ensures complete conversion of the aglycone to the glycoside. 
The deduced amino acid sequence of a 612 bp ORF (jadR*) downstream of the dideoxysugar biosynthesis genes resembles many TetR-family transcriptional regulator sequences.
Subject(s)
Anti-Bacterial Agents/chemical synthesis , Chromosomes, Fungal/genetics , Deoxy Sugars/biosynthesis , Hexoses/biosynthesis , Isoquinolines/metabolism , Multigene Family , Streptomyces/genetics , Amino Acid Sequence , Chromosome Walking , Cloning, Molecular , Conjugation, Genetic , Diploidy , Escherichia coli/genetics , Genetic Complementation Test , Genotype , Introns , Molecular Sequence Data , Phenotype , Promoter Regions, Genetic , Restriction Mapping , Sequence Alignment , Sequence Homology, Amino Acid
ABSTRACT
Regions of the Streptomyces venezuelae ISP5230 chromosome flanking pabAB, an amino-deoxychorismate synthase gene needed for chloramphenicol (Cm) production, were examined for involvement in biosynthesis of the antibiotic. Three of four ORFs in the sequence downstream of pabAB resembled genes involved in the shikimate pathway. BLASTX searches of GenBank showed that the deduced amino acid sequences of ORF3 and ORF4 were similar to proteins encoded by monofunctional genes for chorismate mutase and prephenate dehydrogenase, respectively, while the sequence of the ORF5 product resembled deoxy-arabino-heptulosonate-7-phosphate (DAHP) synthase, the enzyme that initiates the shikimate pathway. A relationship to Cm biosynthesis was indicated by sequence similarities between the ORF6 product and membrane proteins associated with Cm export. BLASTX searches of GenBank for matches with the translated sequence of ORF1 in chromosomal DNA immediately upstream of pabAB did not detect products relevant to Cm biosynthesis. However, the presence of Cm biosynthesis genes in a 7.5 kb segment of the chromosome beyond ORF1 was inferred when conjugal transfer of the DNA into a blocked S. venezuelae mutant restored Cm production. Deletions in the 7.5 kb segment of the wild-type chromosome eliminated Cm production, confirming the presence of Cm biosynthesis genes in this region. Sequencing and analysis located five ORFs, one of which (ORF8) was deduced from BLAST searches of GenBank, and from characteristic motifs detected in alignments of its deduced amino acid sequence, to be a monomodular nonribosomal peptide synthetase. GenBank searches did not identify ORF7, but matched the translated sequences of ORFs 9, 10 and 11 with short-chain ketoreductases, the ATP-binding cassettes of ABC transporters, and coenzyme A ligases, respectively. As has been shown for ORF2, disrupting ORF3, ORF7, ORF8 or ORF9 blocked Cm production.
Subject(s)
Bacterial Proteins/genetics , Chloramphenicol/biosynthesis , Genes, Bacterial , Shikimic Acid/metabolism , Streptomyces/genetics , Transaminases/genetics , Amino Acid Sequence , Bacterial Proteins/chemistry , Carbon-Nitrogen Ligases , Chromosome Walking , Cloning, Molecular , Gene Deletion , Genetic Complementation Test , Molecular Sequence Data , Multigene Family , Mutation , Sequence Analysis, DNA , Streptomyces/metabolism , Transaminases/metabolism
ABSTRACT
Six different domains of CAG repeats from a human chromosome 12-specific cosmid library were identified, cloned, and sequenced. These CAG repeat domains were localized to the human chromosomal region 12q24.1. Five of them are candidate repeats for expansion in autosomal dominant neurological disorders with genetic anticipation, and they can also contribute to chromosome walking in the Human Genome Project.
Subject(s)
Chromosomes, Human, Pair 12/genetics , Trinucleotide Repeats/genetics , Anticipation, Genetic , Ataxins , Base Sequence , Chromosome Walking , Chromosomes, Artificial, Yeast/genetics , Cloning, Molecular , Codon/genetics , Cosmids/genetics , DNA Probes/genetics , Genetic Linkage/genetics , Glutamine/genetics , Histidine/genetics , Human Genome Project , Humans , Molecular Sequence Data , Nerve Tissue Proteins , Nervous System Diseases/genetics , Peptides/genetics , Proteins/genetics , Trinucleotide Repeat Expansion/genetics
ABSTRACT
Three XX males, two XX true hermaphrodites, and an XY female were studied for possible deletions using probes for the recently characterised SRY gene and the pseudoautosomal boundary. The XX males and true hermaphrodites were negative for all three probes, while the XY female was positive. One XX male and one XX true hermaphrodite were sibs. A previous sib pair of an XX male and an XX true hermaphrodite have been shown to be positive for Y chromosomal material near the pseudoautosomal boundary. Thus, both phenotypes can be produced from different mutations, some involving the SRY gene and others not.