ABSTRACT
We report on the quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it. Three independent external data sources essentially agree with and support the assembly's sequence and ordering of contigs across the euchromatic portion of the genome. In addition, there are isolated contigs that we believe represent nonrepetitive pockets within the heterochromatin of the centromeres. Comparison with a previously sequenced 2.9-megabase region indicates that sequencing accuracy within nonrepetitive segments is greater than 99.99% without manual curation. As such, this initial reconstruction of the Drosophila sequence should be of substantial value to the scientific community.
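To make the accuracy figure concrete, the short sketch below works through the arithmetic it implies. It assumes, purely for illustration, that the whole 2.9-megabase region were compared base by base (the abstract restricts the figure to nonrepetitive segments), and it also expresses the rate on the Phred scale, a common sequencing convention that the abstract itself does not use. The function names are hypothetical.

```python
from fractions import Fraction
import math


def max_discrepancies(length_bp: int, accuracy: Fraction) -> Fraction:
    """Upper bound on mismatched bases implied by a per-base accuracy over length_bp bases.

    Fraction is used so that 1 - 0.9999 is exact and no floating-point rounding creeps in.
    """
    return length_bp * (1 - accuracy)


def phred_quality(accuracy: Fraction) -> float:
    """Phred-scaled quality Q = -10 * log10(error rate); an assumed convention, not from the source."""
    error_rate = 1 - accuracy
    return -10 * math.log10(float(error_rate))


# Illustrative inputs: the 2.9-megabase comparison region and the 99.99% accuracy threshold.
region_bp = 2_900_000
accuracy = Fraction(9999, 10000)

print(max_discrepancies(region_bp, accuracy))  # 290
print(round(phred_quality(accuracy)))  # 40
```

Read this way, an accuracy above 99.99% over 2.9 Mb would mean fewer than about 290 discrepant bases, roughly one error per 10,000 bases (Phred Q40).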
Subjects
Computational Biology , Drosophila melanogaster/genetics , Genome , Sequence Analysis, DNA , Algorithms , Animals , Chromatin/genetics , Contig Mapping , Euchromatin , Genes, Insect , Heterochromatin/genetics , Molecular Sequence Data , Physical Chromosome Mapping , Repetitive Sequences, Nucleic Acid , Sequence Tagged Sites
ABSTRACT
The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.
Subjects
Drosophila melanogaster/genetics , Genome , Sequence Analysis, DNA , Animals , Biological Transport/genetics , Chromatin/genetics , Cloning, Molecular , Computational Biology , Contig Mapping , Cytochrome P-450 Enzyme System/genetics , DNA Repair/genetics , DNA Replication/genetics , Drosophila melanogaster/metabolism , Euchromatin , Gene Library , Genes, Insect , Heterochromatin/genetics , Insect Proteins/chemistry , Insect Proteins/genetics , Insect Proteins/physiology , Nuclear Proteins/genetics , Protein Biosynthesis , Transcription, Genetic
ABSTRACT
We have optimized the conditions for using the Stretch modification of Applied Biosystems 373 Automated DNA Sequencers to sequence double-stranded DNA in 34-cm and 48-cm well-to-read configurations. With the manufacturer's recommended settings, we observed uneven base spacing within the first 100 bases, which led to miscalls, insertions and deletions in the analyzed data, as well as a significant decrease in accuracy for reads longer than 400 bases. Various gel concentrations were tested to improve base spacing over the first 100 bases while maintaining the accuracy and usable length of the data. Increasing the acrylamide concentration together with the wattage gave a longer average usable read length and better resolution of smaller fragments. Using the Applied Biosystems CATALYST 800 Molecular Biology LabStation, Taq dye primer cycle sequencing reactions were optimized for the -21 M13 and M13RP1 primers to produce a more even distribution of dye-labeled fragments, which increased overall signal strength and decreased background signal. These reaction products, run on the Stretch sequencers under the new gel conditions, gave longer reads with greater reliability and accuracy.