Search | BVS Integralidade em Saúde

GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity Database.

Pannarale, Paolo; Catalano, Domenico; De Caro, Giorgio; Grillo, Giorgio; Leo, Pietro; Pappadà, Graziano; Rubino, Francesco; Scioscia, Gaetano; Licciulli, Flavio.

BMC Bioinformatics ; 13 Suppl 4: S4, 2012 Mar 28.

Article in English | MEDLINE | ID: mdl-22536971

ABSTRACT

BACKGROUND: In the scientific biodiversity community, there is an increasingly perceived need to build a bridge between molecular and traditional biodiversity studies. We believe that information technology could play a pre-eminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here stores the sequence and annotation data contained in the GenBank primary database locally and organizes them ontologically. METHODS: The GIDL architecture consists of a relational database and an intelligent data loader. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and is organized into four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema that makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load (ETL) software able to parse data, discover hidden information in GenBank entries and populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, which parses GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database.
In GIDL, Semantic Web technologies have been adopted because of their advantages in data representation, integration and processing. RESULTS AND CONCLUSIONS: Entries from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank release 180 have been loaded into the Molecular Biodiversity Database by GIDL. By combining the Sequence Ontology and the Chado schema, our system offers greater query expressiveness than the most commonly used sequence retrieval systems, such as Entrez or SRS.
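The Parser → Reasoner → DBFiller flow can be sketched as a toy ETL pipeline. This is a minimal sketch under loud assumptions, not GIDL's implementation: plain Python dataclasses stand in for CLIPS facts, a four-entry dictionary (`SO_TERMS`) stands in for the Reasoner's Sequence Ontology rules, only simple `start..end` and `complement(start..end)` locations are parsed, and the tables `cvterm`, `feature` and `featureloc` are Chado-inspired but drastically simplified. "Ordered" SQL is interpreted here as dependency order (ontology terms, then features, then their locations), so every foreign key points at a row that already exists. The record `DEMO0001` is made up.

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical, minimal GenBank-feature-key -> Sequence Ontology mapping.
# GIDL's Reasoner encodes far richer rules; this only illustrates the idea.
SO_TERMS = {
    "source": "SO:0000001",  # region
    "gene": "SO:0000704",    # gene
    "mRNA": "SO:0000234",    # mRNA
    "CDS": "SO:0000316",     # CDS
}

# Feature line: 5 spaces, key, then a simple (optionally complemented) range.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(complement\()?(\d+)\.\.(\d+)\)?\s*$")

# Chado-inspired but heavily simplified schema (assumption).
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, accession TEXT UNIQUE NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL,
                      type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id));
CREATE TABLE featureloc (feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
                         fmin INTEGER, fmax INTEGER, strand INTEGER);
"""


@dataclass(frozen=True)
class Fact:
    """Plain-Python stand-in for a CLIPS fact about one located feature."""
    accession: str
    so_term: str
    fmin: int  # 0-based interbase start, as in Chado
    fmax: int
    strand: int


def parse(flat_file):
    """Parser: return the LOCUS name and (key, start, end, strand) tuples."""
    accession, features, in_features = None, [], False
    for line in flat_file.splitlines():
        if line.startswith("LOCUS"):
            accession = line.split()[1]
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            in_features = False
        elif in_features and (m := FEATURE_RE.match(line)):
            key, comp, start, end = m.groups()
            features.append((key, int(start), int(end), -1 if comp else 1))
    return accession, features


def reason(accession, features):
    """Reasoner: map feature keys to SO terms; 1-based inclusive -> interbase."""
    return [Fact(accession, SO_TERMS[key], start - 1, end, strand)
            for key, start, end, strand in features if key in SO_TERMS]


def fill(facts):
    """DBFiller: yield (sql, params) so referenced rows are always inserted first."""
    for term in sorted({f.so_term for f in facts}):
        yield "INSERT OR IGNORE INTO cvterm (accession) VALUES (?)", (term,)
    for i, f in enumerate(facts):
        yield ("INSERT INTO feature (uniquename, type_id) "
               "SELECT ?, cvterm_id FROM cvterm WHERE accession = ?",
               (f"{f.accession}:{i}", f.so_term))
    for i, f in enumerate(facts):
        yield ("INSERT INTO featureloc (feature_id, fmin, fmax, strand) "
               "SELECT feature_id, ?, ?, ? FROM feature WHERE uniquename = ?",
               (f.fmin, f.fmax, f.strand, f"{f.accession}:{i}"))


def load(record):
    """Run Parser -> Reasoner -> DBFiller into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    for sql, params in fill(reason(*parse(record))):
        conn.execute(sql, params)
    conn.commit()
    return conn


# Made-up record for illustration only.
RECORD = """LOCUS       DEMO0001    1200 bp    DNA     linear   PLN 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..1200
     gene            101..900
     CDS             complement(101..900)
                     /product="hypothetical protein"
ORIGIN
//
"""
```

In this sketch the conversion from GenBank's 1-based, inclusive coordinates to Chado's 0-based interbase `fmin` happens in the Reasoner, so the SQL stage only copies values; with foreign keys enabled, emitting the statements in any other order would fail or silently insert nothing.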

Subjects

Biodiversity, Computational Biology/methods, Nucleic Acid Databases, Expert Systems, Animals, Internet, Software

Towards barcode markers in Fungi: an intron map of Ascomycota mitochondria.

Santamaria, Monica; Vicario, Saverio; Pappadà, Graziano; Scioscia, Gaetano; Scazzocchio, Claudio; Saccone, Cecilia.

BMC Bioinformatics ; 10 Suppl 6: S15, 2009 Jun 16.

Article in English | MEDLINE | ID: mdl-19534740

ABSTRACT

BACKGROUND: A standardized and cost-effective molecular identification system is now urgently needed for Fungi, owing to their wide involvement in the quality of human life. In particular, the potential use of mitochondrial DNA species markers has been taken into account. Unfortunately, PCR and bioinformatic surveys face a serious difficulty: mobile introns are present in almost all fungal mitochondrial genes. The aim of this work is to verify the incidence of this phenomenon in Ascomycota while testing a new bioinformatic tool for extracting and managing sequence database annotations, in order to identify the mitochondrial gene regions where introns are missing and propose them as species markers. METHODS: The general trend towards a large occurrence of introns in the mitochondrial genome of Fungi has been confirmed in Ascomycota by an extensive bioinformatic analysis of all entries available in public nucleotide sequence databases for 11 mitochondrial protein-coding genes and 2 mitochondrial rRNA (ribosomal RNA) genes of this phylum. A new query approach has been developed to retrieve the intron information included in these entries effectively. RESULTS: When the new query-based approach was compared with a BLAST-based procedure, with the aim of designing a faithful Ascomycota mitochondrial intron map, the former clearly proved the more accurate. Within this map, despite the pervasiveness of introns, it is possible to distinguish specific regions within several genes, including the full NADH dehydrogenase subunit 6 (ND6) gene, which could be considered barcode candidates for Ascomycota owing to their paucity of introns and their length (above 400 bp), comparable to the lower end of the length range of barcodes successfully used in animals.
CONCLUSION: The development of the new query system described here would meet the pressing need to drastically improve bioinformatics support for the DNA Barcode Initiative. The large-scale investigation of Ascomycota mitochondrial introns performed with this tool, which makes it possible to exclude intron-rich sequences when exploring barcode candidates, could be the first step towards a mitochondrial barcoding strategy for these organisms, similar to the standard approach employed in metazoans.
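The last step of the analysis, finding gene regions that are free of introns yet long enough to serve as barcodes, can be illustrated with a small interval computation. This is a hypothetical sketch, not the authors' query system: it assumes intron insertion sites have already been retrieved from the database entries and mapped onto one reference coding sequence, pools them across entries, and reports the intron-free stretches of at least 400 bp (the threshold mentioned in the abstract). The function name `intron_free_regions` and its input format are illustrative.

```python
from typing import Iterable, List, Tuple


def intron_free_regions(gene_length: int, intron_sites: Iterable[int],
                        min_len: int = 400) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of a reference coding sequence
    that contain no intron insertion site in any entry and are >= min_len long.

    gene_length  -- length of the reference coding sequence (bp)
    intron_sites -- hypothetical input: positions after which an intron is
                    inserted, already mapped to reference coordinates and
                    pooled across all database entries for this gene
    """
    # Every distinct interior insertion site splits the gene; the ends bound it.
    inner = {s for s in intron_sites if 0 < s < gene_length}
    cuts = sorted({0, gene_length} | inner)
    # Keep only the stretches between consecutive cuts that are long enough.
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b - a >= min_len]


# Illustrative data only: a 1,500 bp gene with introns at three pooled sites.
print(intron_free_regions(1500, [250, 260, 900, 260]))  # [(260, 900), (900, 1500)]
```

Pooling sites across all entries is deliberately conservative: a region counts as intron-free only if no sequenced strain carries an intron there.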

Subjects

Ascomycota/genetics, Mitochondrial DNA/chemistry, Introns, Fungal Genes, Genetic Markers, Fungal Genome, Mitochondrial Genome

HmtDB, a human mitochondrial genomic resource based on variability studies supporting population genetics and biomedical research.

Attimonelli, Marcella; Accetturo, Matteo; Santamaria, Monica; Lascaro, Daniela; Scioscia, Gaetano; Pappadà, Graziano; Russo, Luigi; Zanchetta, Luigi; Tommaseo-Ponzetta, Mila.

BMC Bioinformatics ; 6 Suppl 4: S4, 2005 Dec 01.

Article in English | MEDLINE | ID: mdl-16351753

ABSTRACT

BACKGROUND: Population genetics studies based on mtDNA analysis and mitochondrial disease studies have produced a huge quantity of sequence data and related information. These data are currently distributed worldwide across differently organised databases and web sites that are poorly integrated with one another. Moreover, it is generally not possible for users to submit their own data and simultaneously analyse them against the content of a given database, for either population genetics or mitochondrial disease data. RESULTS: HmtDB is a well-integrated, web-based human mitochondrial bioinformatic resource aimed at supporting population genetics and mitochondrial disease studies through a new approach based on site-specific nucleotide and amino acid variability estimation. HmtDB consists of a database of human mitochondrial genomes, annotated with population data, and a set of bioinformatic tools able to produce site-specific variability data and to automatically characterize newly sequenced human mitochondrial genomes. A query system for the retrieval of genomes and a web submission tool for the annotation of new genomes have been designed and will soon be implemented. The first release contains 1255 fully annotated human mitochondrial genomes. Nucleotide site-specific variability data and multialigned genomes can be downloaded. Intra-human and inter-species amino acid variability data, estimated on the 13 protein-coding genes of the 1255 human genomes and of 60 mammalian species, are also available. HmtDB is freely available, upon registration, at http://www.hmdb.uniba.it. CONCLUSION: On the basis of variability estimation, the HmtDB project will contribute towards completing and/or refining haplogroup classification and revealing the real pathogenic potential of mitochondrial mutations.
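Site-specific variability can be illustrated with a simple per-column statistic over a multiple alignment. The sketch below uses normalised Shannon entropy, a common generic measure swapped in here for illustration; it is an assumption, not necessarily the estimator HmtDB implements. Gap characters are ignored, and `alphabet_size` should be 4 for nucleotides or 20 for amino acids, so that an invariant column scores 0.0 and a column in which every symbol of the alphabet is equally frequent scores 1.0.

```python
import math
from collections import Counter
from typing import List


def site_variability(alignment: List[str], alphabet_size: int = 4) -> List[float]:
    """Normalised Shannon entropy for each column of a multiple alignment.

    Returns values in [0, 1]: 0 for an invariant column, 1 when every symbol
    of the alphabet is equally frequent. Gap characters ('-') are ignored.
    Assumes alphabet_size >= 2.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    scores = []
    for i in range(width):
        counts = Counter(seq[i].upper() for seq in alignment if seq[i] != "-")
        total = sum(counts.values())
        # An all-gap column yields an empty sum, i.e. zero variability.
        entropy = sum(-(n / total) * math.log(n / total) for n in counts.values())
        scores.append(entropy / math.log(alphabet_size))
    return scores


# Toy alignment (not HmtDB data): columns 1-2 invariant, column 4 maximal.
print(site_variability(["ACGT", "ACGA", "ACTC", "ACAG"]))
```

Normalising by the logarithm of the alphabet size puts nucleotide and amino acid scores on the same 0-to-1 scale, which makes sites from different genes or data types easier to compare.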

Subjects

Computational Biology/methods, Mitochondrial DNA, Genetic Databases, Mitochondrial Proteins/genetics, Factual Databases, Nucleic Acid Databases, Protein Databases, Population Genetics, Genome, Genotype, Humans, Information Storage and Retrieval, Internet, Mitochondrial Proteins/physiology, Sequence Alignment
