Results 1 - 17 of 17
1.
PLoS One ; 7(11): e50586, 2012.
Article in English | MEDLINE | ID: mdl-23226320

ABSTRACT

Formalin fixed paraffin embedded (FFPE) tissues are a vast resource of annotated clinical samples. As such, they represent highly desirable and informative materials for the application of high definition genomics for improved patient management and to advance the development of personalized therapeutics. However, a limitation of FFPE tissues is the variable quality of DNA extracted for analyses. Furthermore, admixtures of non-tumor and polyclonal neoplastic cell populations limit the number of biopsies that can be studied and make it difficult to define cancer genomes in patient samples. To exploit these valuable tissues we applied flow cytometry-based methods to isolate pure populations of tumor cell nuclei from FFPE tissues and developed a methodology compatible with oligonucleotide array CGH and whole exome sequencing analyses. These were used to profile a variety of tumors (breast, brain, bladder, ovarian and pancreas) including the genomes and exomes of matching fresh frozen and FFPE pancreatic adenocarcinoma samples.


Subject(s)
Formaldehyde/metabolism , High-Throughput Nucleotide Sequencing/methods , Paraffin Embedding , Sequence Analysis, DNA/methods , Tissue Fixation , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Cloning, Molecular , Comparative Genomic Hybridization , Humans , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology
2.
Genes Chromosomes Cancer ; 51(5): 490-500, 2012 May.
Article in English | MEDLINE | ID: mdl-22334367

ABSTRACT

To identify the genetic drivers of colorectal tumorigenesis, we applied array comparative genomic hybridization (aCGH) to 13 formalin-fixed paraffin-embedded (FFPE) samples of early, localized human colon adenocarcinomas arising in high-grade adenomas (so-called "malignant polyps"). These lesions are small and hence the amount of DNA is limited. Additionally, the quality of DNA is compromised due to the fragmentation as a consequence of formalin fixation. To overcome these problems, we optimized a newly developed isothermal whole genome amplification system (NuGEN Ovation® WGA FFPE System). Starting with 100 ng of FFPE DNA, the amplification system produced 4.01 ± 0.29 µg (mean ± standard deviation) of DNA. The excellent quality of amplified DNA was further indicated by a high signal-to-noise ratio and a low derivative log2 ratio spread. Both the amount of amplified DNA and aCGH performance were independent of the age of the FFPE blocks and the associated degradation of the extracted DNA. We observed losses of chromosome arms 5q and 18q in the adenoma components of the malignant polyp samples, while the embedded early carcinomas revealed losses of 8p, 17p, and 18, and gains of 7, 13, and 20. Aberrations detected in the adenoma components were invariably maintained in the embedded carcinomas. This approach demonstrates that using isothermally whole genome amplified FFPE DNA is technically suitable for aCGH. In addition to demonstrating the clonal origin of the adenoma and carcinoma part within a malignant polyp, the gain of chromosome arm 20q was an indicator for progression from adenoma to carcinoma.


Subject(s)
Clonal Evolution/genetics , Colonic Polyps/genetics , Colorectal Neoplasms/genetics , Comparative Genomic Hybridization , Adenoma/genetics , Carcinoma/genetics , Chromosome Aberrations , Chromosomes, Human, Pair 20 , DNA Copy Number Variations , DNA, Neoplasm , Humans
3.
Cell ; 147(3): 690-703, 2011 Oct 28.
Article in English | MEDLINE | ID: mdl-22036573

ABSTRACT

Determining the composition of protein complexes is an essential step toward understanding the cell as an integrated system. Using coaffinity purification coupled to mass spectrometry analysis, we examined protein associations involving nearly 5,000 individual, FLAG-HA epitope-tagged Drosophila proteins. Stringent analysis of these data, based on a statistical framework designed to define individual protein-protein interactions, led to the generation of a Drosophila protein interaction map (DPiM) encompassing 556 protein complexes. The high quality of the DPiM and its usefulness as a paradigm for metazoan proteomes are apparent from the recovery of many known complexes, significant enrichment for shared functional attributes, and validation in human cells. The DPiM defines potential novel members for several important protein complexes and assigns functional links to 586 protein-coding genes lacking previous experimental annotation. The DPiM represents, to our knowledge, the largest metazoan protein complex map and provides a valuable resource for analysis of protein complex evolution.


Subject(s)
Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Protein Interaction Mapping , Animals , Drosophila Proteins/genetics , Proteasome Endopeptidase Complex/metabolism , Proteomics , SNARE Proteins/metabolism
4.
Methods Mol Biol ; 723: 257-72, 2011.
Article in English | MEDLINE | ID: mdl-21370071

ABSTRACT

We describe a method for high-throughput production of protein expression-ready clones. Open-reading frames (ORFs) are amplified by PCR from sequence-verified cDNA clones and subcloned into an appropriate loxP-containing donor vector. Each ORF is represented by two types of clones, one containing the native stop codon for expression of the native protein or amino-terminal fusion constructs and the other made without the stop codon to allow for carboxy-terminal fusion constructs. The expression-ready clone is sequenced to verify that no PCR errors have been introduced. We have made over 11,000 clones ranging in size from 78-6,699 bp with a median of 1,056 bp. This is the largest set of fully sequence-verified "movable ORFs" of any model organism genome project. The donor clone facilitates rapid and simple transfer of the ORF into any expression vector of choice. Vectors are available for expressing these ORFs in bacteria, cell lines, or transgenic animals. The flexibility of this ORF clone collection makes possible a variety of proteomic applications, including protein interaction mapping, high-throughput cell-based expression screens, and functional studies. We have transferred 5,800 ORFs to a vector that allows production of a FLAG-HA tagged protein in Drosophila tissue culture cells with a metallothionein-inducible promoter. These clones are being used to produce a protein complex map of Drosophila from Schneider cells.


Subject(s)
Gene Library , Proteomics/methods , Ampicillin/pharmacology , Bacteria/cytology , Bacteria/drug effects , Bacteria/genetics , Cell Culture Techniques , Cell Extracts , Chromatography, Gel , Cloning, Molecular , Computational Biology , DNA Primers/genetics , Deoxyribonucleases, Type II Site-Specific/metabolism , Drug Resistance, Bacterial , Electrophoresis, Agar Gel , Gene Expression , Genetic Vectors/genetics , Open Reading Frames/genetics , Polymerase Chain Reaction , Transformation, Bacterial
5.
Genome Biol ; 10(7): R80, 2009.
Article in English | MEDLINE | ID: mdl-19627575

ABSTRACT

BACKGROUND: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. RESULTS: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. CONCLUSIONS: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.


Subject(s)
Blastoderm/metabolism , Drosophila Proteins/metabolism , Genome, Insect/genetics , Transcription Factors/metabolism , Animals , Binding Sites/genetics , Body Patterning/genetics , Chromatin Immunoprecipitation , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Expression Regulation, Developmental , Homeodomain Proteins/genetics , Protein Binding , Snail Family Transcription Factors , Transcription Factors/genetics , Transcription Initiation Site
6.
PLoS Biol ; 6(2): e27, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18271625

ABSTRACT

Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.


Subject(s)
Blastoderm/metabolism , Drosophila melanogaster/embryology , Transcription Factors/metabolism , Animals , Binding Sites , DNA/metabolism , Evolution, Molecular , MicroRNAs/metabolism
7.
Pediatr Dent ; 29(5): 397-402, 2007.
Article in English | MEDLINE | ID: mdl-18027774

ABSTRACT

PURPOSE: The purpose of this study was to analyze cases in which dentistry was combined with other procedures during a single outpatient general anesthetic (GA) in a children's hospital. Financial and time savings were evaluated for a subgroup of combined care patients. METHODS: Records of 120 patients who received combined dental and one other procedure under GA were reviewed. All were treated as outpatients, and dental procedures were more than just radiographs. Descriptive statistics were calculated for: (1) patient characteristics; (2) procedures; (3) times for procedures; (4) anesthesia; (5) recovery; and (6) total time in hospital. Records of 18 patients with combined dentistry and extraction of third molars were compared to 36 patients receiving the same procedures during separate GAs to evaluate time and costs for combined vs separate procedures. RESULTS: Patients ranged from 2 to 21 years, and 98% had special health care needs. Oral surgery (41%) and otolaryngology (23%) were most frequently combined with dentistry. Estimated mean savings for patients receiving dentistry and third molar extractions in combination were 312 minutes and $2,177. CONCLUSIONS: Combining care offers an economical vehicle for providing medical and dental care to patients needing multiple procedures. Awareness of the efficiency of combined care may lead to more combinations of procedures when possible.


Subject(s)
Ambulatory Surgical Procedures , Anesthesia, General/statistics & numerical data , Dental Care for Disabled , Otorhinolaryngologic Surgical Procedures , Tooth Extraction , Adolescent , Adult , Analysis of Variance , Anesthesia Recovery Period , Child , Child, Preschool , Cost-Benefit Analysis , Diagnosis-Related Groups , Female , Hospitals, Pediatric , Humans , Length of Stay , Male , Time Factors
8.
RNA ; 12(11): 1922-32, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17018572

ABSTRACT

Adenosine deaminases that act on RNA [adenosine deaminase, RNA specific (ADAR)] catalyze the site-specific conversion of adenosine to inosine in primary mRNA transcripts. These re-coding events affect coding potential, splice sites, and stability of mature mRNAs. ADAR is an essential gene, and studies in mouse, Caenorhabditis elegans, and Drosophila suggest that its primary function is to modify adult behavior by altering signaling components in the nervous system. By comparing the sequence of isogenic cDNAs to genomic DNA, we have identified and experimentally verified 27 new targets of Drosophila ADAR. Our analyses led us to identify new classes of genes whose transcripts are targets of ADAR, including components of the actin cytoskeleton and genes involved in ion homeostasis and signal transduction. Our results indicate that editing in Drosophila increases the diversity of the proteome, and does so in a manner that has direct functional consequences on protein function.


Subject(s)
Adenosine Deaminase/metabolism , Drosophila melanogaster/genetics , Genes, Insect/genetics , RNA Editing/genetics , Adenosine Deaminase/genetics , Animals , Base Sequence , Cytoskeletal Proteins/metabolism , DNA Primers , DNA, Complementary/genetics , Drosophila Proteins/genetics , Drosophila melanogaster/enzymology , Eye Proteins/genetics , Ion Channels/genetics , Membrane Glycoproteins/genetics , Molecular Sequence Data , RNA-Binding Proteins , Receptors, Peptide/genetics , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, DNA , Signal Transduction/genetics
9.
Nat Protoc ; 1(2): 624-32, 2006.
Article in English | MEDLINE | ID: mdl-17406289

ABSTRACT

Libraries of cDNA clones are valuable resources for analyzing the expression, structure and regulation of genes, and for studying protein functions and interactions. Full-length cDNA clones provide information about intron and exon structures, splice junctions, and 5' and 3' untranslated regions (UTRs). Open reading frames (ORFs) derived from cDNA clones can be used to generate constructs allowing the expression of both wild-type proteins and proteins tagged at their amino or carboxy terminus. Thus, obtaining full-length cDNA clones and sequences for most or all genes in an organism is essential for understanding genome functions. EST sequencing samples cDNA libraries at random, an approach that is most useful at the beginning of large-scale screening projects. As projects progress towards completion, however, the probability of identifying unique cDNAs by EST sequencing diminishes, resulting in poor recovery of rare transcripts. Here we describe an adapted, high-throughput protocol intended for the recovery of specific, full-length clones from plasmid cDNA libraries in 5 d.


Subject(s)
Cloning, Molecular/methods , Gene Library , Plasmids/genetics , Open Reading Frames/genetics
10.
Nucleic Acids Res ; 33(21): e185, 2005 Dec 02.
Article in English | MEDLINE | ID: mdl-16326860

ABSTRACT

cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.


Subject(s)
DNA, Complementary/genetics , Gene Library , Polymerase Chain Reaction/methods , Animals , DNA, Complementary/chemistry , Drosophila melanogaster/genetics , Expressed Sequence Tags , Genes, Insect , Plasmids/genetics , Sequence Analysis, DNA , Time Factors , Transcription Factors/genetics
12.
Proc Natl Acad Sci U S A ; 99(26): 16899-903, 2002 Dec 24.
Article in English | MEDLINE | ID: mdl-12477932

ABSTRACT

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).


Subject(s)
DNA, Complementary/chemistry , Sequence Analysis, DNA , Algorithms , Animals , DNA, Complementary/analysis , Gene Library , Humans , Mice , Open Reading Frames
13.
Genome Res ; 12(8): 1294-300, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12176937

ABSTRACT

Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection Release 1 (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5' expressed sequence tags (5' ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to ~40% of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5' ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we previously did not exploit for gene discovery; two new cap-trapped normalized libraries are derived from 0-22-h embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70% of the predicted genes in Drosophila.


Subject(s)
DNA, Complementary/genetics , Drosophila melanogaster/genetics , Genes, Insect/genetics , Animals , Cell Line , Cluster Analysis , Drosophila melanogaster/cytology , Expressed Sequence Tags , Gene Library , Male , Membrane Glycoproteins , Membrane Proteins , Molecular Sequence Data , Platelet Glycoprotein GPIb-IX Complex , RNA/isolation & purification , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Testis/chemistry
14.
Genome Biol ; 3(12): RESEARCH0079, 2002.
Article in English | MEDLINE | ID: mdl-12537568

ABSTRACT

BACKGROUND: The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions. RESULTS: Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp. CONCLUSIONS: The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.


Subject(s)
Drosophila melanogaster/genetics , Euchromatin/genetics , Genome , Sequence Analysis, DNA/methods , Animals , Physical Chromosome Mapping/methods , Research Design , X Chromosome/genetics
15.
Genome Biol ; 3(12): RESEARCH0083, 2002.
Article in English | MEDLINE | ID: mdl-12537572

ABSTRACT

BACKGROUND: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. RESULTS: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. CONCLUSIONS: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.


Subject(s)
Computational Biology/methods , Drosophila melanogaster/genetics , Euchromatin/genetics , Genes, Insect , Genome , Animals , Databases, Genetic , Databases, Protein , Drosophila Proteins/genetics , Humans
16.
Genome Biol ; 3(12): RESEARCH0086, 2002.
Article in English | MEDLINE | ID: mdl-12537575

ABSTRACT

BACKGROUND: It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. RESULTS: We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences. CONCLUSIONS: Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.


Subject(s)
Computational Biology/methods , Drosophila/genetics , Genome , Animals , Conserved Sequence/genetics , Databases, Genetic , Drosophila melanogaster/genetics , Evolution, Molecular , Forecasting , Gene Rearrangement , Genes, Insect , Genetic Variation , RNA, Messenger/analysis , Sequence Analysis, DNA/methods , Species Specificity , Untranslated Regions/analysis
17.
Genome Biol ; 3(12): RESEARCH0080, 2002.
Article in English | MEDLINE | ID: mdl-12537569

ABSTRACT

BACKGROUND: A collection of sequenced full-length cDNAs is an important resource both for functional genomics studies and for the determination of the intron-exon structure of genes. Providing this resource to the Drosophila melanogaster research community has been a long-term goal of the Berkeley Drosophila Genome Project. We have previously described the Drosophila Gene Collection (DGC), a set of putative full-length cDNAs that was produced by generating and analyzing over 250,000 expressed sequence tags (ESTs) derived from a variety of tissues and developmental stages. RESULTS: We have generated high-quality full-insert sequence for 8,921 clones in the DGC. We compared the sequence of these clones to the annotated Release 3 genomic sequence, and identified more than 5,300 cDNAs that contain a complete and accurate protein-coding sequence. This corresponds to at least one splice form for 40% of the predicted D. melanogaster genes. We also identified potential new cases of RNA editing. CONCLUSIONS: We show that comparison of cDNA sequences to a high-quality annotated genomic sequence is an effective approach to identifying and eliminating defective clones from a cDNA collection and ensuring its utility for experimentation. Clones were eliminated either because they carry single nucleotide discrepancies, which most probably result from reverse transcriptase errors, or because they are truncated and contain only part of the protein-coding sequence.


Subject(s)
DNA, Complementary/analysis , Databases, Genetic , Drosophila melanogaster/genetics , Amino Acid Sequence/genetics , Animals , Base Sequence/genetics , Drosophila melanogaster/chemistry , Genes, Insect , Molecular Sequence Data , Sequence Analysis, DNA/methods