ABSTRACT
BACKGROUND: One of the most overlooked, yet critical, components of a whole genome sequencing (WGS) project is the submission and curation of the data to a genomic repository, most commonly the National Center for Biotechnology Information (NCBI). While large genome centers or genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into the NCBI's annotation table format, these tools typically require back-end setup and connection to a Structured Query Language (SQL) database and/or some knowledge of programming (Perl, Python) to implement. With WGS becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where fewer resources are present to support the process of genome assembly curation. To fill this gap, we developed software that assesses, filters, and transfers annotations and converts a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and uses a simple command to perform a variety of simple and complex post-analysis, pre-NCBI submission WGS project tasks. FINDINGS: The Genome Annotation Generator is a consistent and user-friendly bioinformatics tool that can be used to generate a .tbl file that is consistent with the NCBI submission pipeline. CONCLUSIONS: The Genome Annotation Generator achieves the goal of providing a publicly available tool that will facilitate the submission of annotated genome assemblies to the NCBI. It is useful for any individual researcher or research group that wishes to submit a genome assembly of their study system to the NCBI.
Subjects
Computational Biology/methods , Software , Genetic Databases , Molecular Sequence Annotation , Whole Genome Sequencing
ABSTRACT
BACKGROUND: The Braconid wasp Fopius arisanus (Sonan) has been utilized for biological control of the Mediterranean fruit fly (Ceratitis capitata) and the oriental fruit fly (Bactrocera dorsalis), both of which are phytophagous fruit fly pests of economic importance in many tropical and subtropical regions of the world. We have sequenced and assembled the transcriptome of this wasp using tissue from four different life stages: larvae, pupae, adult males and adult females, with the aim of contributing foundational resources to aid in the understanding of the biology and behavior of this important parasitoid. FINDINGS: The transcriptome of the parasitic wasp Fopius arisanus was sequenced and reconstructed using a strategy that identified 15,346 high confidence, non-redundant transcripts derived from 8,307 predicted unigenes. In addition, Pfam domain annotations were detected in 78% of these transcripts. The distribution of transcript length is comparable to that found in other Hymenoptera genomes. Through orthology analysis, 7,154 transcripts were identified as having orthologs in at least one of the four other hymenopteran parasitoid species examined. Approximately 4,000 core orthologs were found to be shared between F. arisanus and all four of the other parasitoids. CONCLUSIONS: Availability of high quality genomic data is fundamental for the improvement and advancement of research in any biological organism. Parasitic wasps are important in the biological control of agricultural pests. The transcriptome data presented here represent the first large-scale molecular resource for this species, or any closely related Opiine species. The assembly is available in NCBI for use by the scientific community, with supporting data available in GigaDB.
Subjects
Ceratitis capitata/parasitology , Hymenoptera/physiology , Ovum/parasitology , Transcriptome , Animals , Hymenoptera/genetics , RNA/genetics
ABSTRACT
BACKGROUND: Bactrocera cucurbitae is a serious global agricultural pest. Basic genomic information, which would be useful to inform methods of control, damage mitigation, and eradication efforts, is lacking for this species. Here, we have sequenced, assembled, and annotated a comprehensive transcriptome for a mass-rearing sexing strain of this species. This forms a foundational genomic and transcriptomic resource that can be used to better understand the physiology and biochemistry of this insect, as well as being a useful tool for population genetics. FINDINGS: A transcriptome assembly was constructed containing 17,654 transcript isoforms derived from 10,425 unigenes. This transcriptome size is similar to reports from other Tephritid species and probably includes about 70-80% of the protein-coding genes in the genome. The dataset is publicly available in NCBI and GigaDB as a resource for researchers. CONCLUSIONS: Foundational knowledge on the protein-coding genes in B. cucurbitae will lead to improved resources for this species. Through comparison with a model system such as Drosophila as well as a growing number of related Tephritid transcriptomes, improved strategies can be developed to control this pest.