ABSTRACT
A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high-quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein-coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and of avian versus mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in the number and variety of genes of the avian immune system, where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and the genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene-level and chromosome-level assemblies of other species of agricultural, ecological, and evolutionary interest.
Subjects
Genome , Turkeys/genetics , Animals , Base Sequence , Chromosome Mapping , DNA/genetics , Single Nucleotide Polymorphism , DNA Sequence Analysis , Nucleic Acid Sequence Homology , Species Specificity
ABSTRACT
BACKGROUND: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. RESULTS: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well-annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs, evaluated the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy, and compared the results with DNA microarrays and quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs, with an average 90% sequence similarity, identifying 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. CONCLUSION: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome, including the discovery of thousands of new splice variants.
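The read-counting step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the tuple layout of the hits, the choice to keep only the best-scoring hit per read, and the reads-per-million normalization are all assumptions made for illustration. Only the e-value cutoff of 10^-20 comes from the abstract.

```python
from collections import Counter

# E-value threshold quoted in the abstract; everything else here is hypothetical.
E_VALUE_CUTOFF = 1e-20


def count_reads_per_gene(hits, cutoff=E_VALUE_CUTOFF):
    """Count reads per gene from (read_id, gene_id, e_value) tuples.

    Assumption: each read is assigned to the single gene with its lowest
    e-value, and hits with an e-value above the cutoff are discarded.
    """
    best = {}  # read_id -> (gene_id, e_value) of the best hit seen so far
    for read_id, gene_id, e_value in hits:
        if e_value > cutoff:
            continue
        if read_id not in best or e_value < best[read_id][1]:
            best[read_id] = (gene_id, e_value)
    return Counter(gene_id for gene_id, _ in best.values())


def reads_per_million(counts, total_reads):
    """Scale raw counts by sequencing depth (an assumed normalization)."""
    return {gene: n * 1_000_000 / total_reads for gene, n in counts.items()}


# Toy input: four reads, one of which fails the e-value cutoff.
example_hits = [
    ("r1", "GENE_A", 1e-50),
    ("r1", "GENE_B", 1e-30),  # weaker second hit for r1, ignored
    ("r2", "GENE_A", 1e-25),
    ("r3", "GENE_B", 1e-5),   # above the cutoff, dropped
    ("r4", "GENE_B", 1e-22),
]
example_counts = count_reads_per_gene(example_hits)
example_rpm = reads_per_million(example_counts, total_reads=4)
```

Dividing by the total number of reads makes counts from runs of different depth comparable, which is one simple way to examine how reproducibility changes with sequencing depth.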
Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , RNA Sequence Analysis/methods , Complementary DNA/genetics , Genetic Databases , Gene Library , Human Genome , Humans , Quality Control , Reference Standards , Sensitivity and Specificity , Sequence Alignment , Software
ABSTRACT
BACKGROUND: The design and construction of novel biological systems by combining basic building blocks represents a dominant paradigm in synthetic biology. Creating and maintaining a database of these building blocks is a way to streamline the fabrication of complex constructs. The Registry of Standard Biological Parts (Registry) is the most advanced implementation of this idea. METHODS/PRINCIPAL FINDINGS: By analyzing inclusion relationships between the sequences of the Registry entries, we build a network that can be related to the Registry abstraction hierarchy. The distribution of entry reuse and complexity was extracted from this network. The collection of clones associated with the database entries was also analyzed. The plasmid inserts were amplified and sequenced. The sequences of 162 inserts could be confirmed experimentally, but unexpected discrepancies were also identified. CONCLUSIONS/SIGNIFICANCE: Organizational guidelines are proposed to help design and manage this new type of scientific resource. In particular, it appears necessary to compare the cost of ensuring the integrity of database entries and associated biological samples with their value to the users. The initial strategy, which permits including any combination of parts irrespective of its potential value, leads to exponential and economically unsustainable growth that may be detrimental to the quality and long-term value of the resource to its users.
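The inclusion network mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: it treats "inclusion" as an exact, same-strand substring match, defines reuse as the number of other entries that contain a given entry, and uses made-up entry names and sequences.

```python
def inclusion_edges(entries):
    """Return (part, composite) pairs where part's sequence occurs inside composite's.

    entries: dict mapping an entry name to its DNA sequence.
    Assumption: inclusion means an exact substring match on the same strand,
    and only strictly shorter sequences count as parts.
    """
    edges = set()
    for part, part_seq in entries.items():
        for composite, composite_seq in entries.items():
            if part == composite:
                continue
            if len(part_seq) < len(composite_seq) and part_seq in composite_seq:
                edges.add((part, composite))
    return edges


def reuse_counts(entries):
    """Count, for every entry, how many other entries contain it."""
    counts = {name: 0 for name in entries}
    for part, _composite in inclusion_edges(entries):
        counts[part] += 1
    return counts


# Hypothetical entries: two basic parts and one device assembled from them.
example_entries = {
    "promoter": "TTGACA",
    "rbs": "AGGAGG",
    "device": "TTGACAAGGAGGATG",
}
example_edges = inclusion_edges(example_entries)
example_reuse = reuse_counts(example_entries)
```

The all-pairs comparison is quadratic in the number of entries, which is acceptable for a toy example; a collection the size of a real registry would likely need an index over the sequences.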