1.
Nucleic Acids Res ; 45(D1): D611-D618, 2017 01 04.
Article in English | MEDLINE | ID: mdl-28053166

ABSTRACT

The World Data Centre for Microorganisms (WDCM) was established 50 years ago as the data center of the World Federation for Culture Collections (WFCC)-Microbial Resource Center (MIRCEN). WDCM aims to provide integrated information services using big data technology for microbial resource centers and microbiologists all over the world. Here, we provide an overview of WDCM including all of its integrated services. Culture Collections Information Worldwide (CCINFO) provides metadata information on 708 culture collections from 72 countries and regions. Global Catalogue of Microorganisms (GCM) gathers strain catalogue information and provides a data retrieval, analysis, and visualization system of microbial resources. Currently, GCM includes >368 000 strains from 103 culture collections in 43 countries and regions. Analyzer of Bioresource Citation (ABC) is a data mining tool that extracts strain-related publications, patents, nucleotide sequences and genome information from public data sources to form a knowledge base. Reference Strain Catalogue (RSC) maintains a database of strains listed in International Organization for Standardization (ISO) and other international or regional standards. RSC allocates a unique identifier to strains recommended for use in diagnosis and quality control, and hence serves as a valuable cross-platform reference. WDCM provides free access to all these services at www.wdcm.org.


Subjects
Computational Biology/methods , Databases, Factual , Microbiology , Microbiota , Software , Biodiversity , Data Mining , Metagenomics/methods , Phylogeny , Web Browser , Workflow
2.
BMC Struct Biol ; 17(1): 4, 2017 04 24.
Article in English | MEDLINE | ID: mdl-28438161

ABSTRACT

BACKGROUND: More than 7000 papers related to "protein refolding" have been published to date, with approximately 300 reports each year during the last decade. Whilst some of these papers provide experimental protocols for protein refolding, a survey in the structural life science communities showed a necessity for a comprehensive database for refolding techniques. We therefore have developed a new resource - "REFOLDdb" that collects refolding techniques into a single, searchable repository to help researchers develop refolding protocols for proteins of interest. RESULTS: We based our resource on the existing REFOLD database, which has not been updated since 2009. We redesigned the data format to be more concise, allowing consistent representations among data entries compared with the original REFOLD database. The remodeled data architecture enhances the search efficiency and improves the sustainability of the database. After an exhaustive literature search we added experimental refolding protocols from reports published 2009 to early 2017. In addition to this new data, we fully converted and integrated existing REFOLD data into our new resource. REFOLDdb contains 1877 entries as of March 17th, 2017, and is freely available at http://p4d-info.nig.ac.jp/refolddb/ . CONCLUSION: REFOLDdb is a unique database for the life sciences research community, providing annotated information for designing new refolding protocols and customizing existing methodologies. We envisage that this resource will find wide utility across broad disciplines that rely on the production of pure, active, recombinant proteins. Furthermore, the database also provides a useful overview of the recent trends and statistics in refolding technology development.


Subjects
Algorithms , Databases, Protein , Internet , Protein Refolding , Proteins/chemistry , Humans , User-Computer Interface
3.
J Struct Funct Genomics ; 17(4): 69-81, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28012137

ABSTRACT

Life science research now relies heavily on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve expert knowledge stored in the databases and build new hypotheses about their research targets. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search, exemplified by an investigation of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .


Subjects
Computational Biology , Databases, Genetic , Genome , Internet , Software , Animals , Humans , Mice , Protein Conformation , Rats , Sequence Analysis, DNA
4.
BMC Genomics ; 14: 933, 2013 Dec 30.
Article in English | MEDLINE | ID: mdl-24377417

ABSTRACT

BACKGROUND: Throughout the long history of industrial and academic research, many microbes have been isolated, characterized and preserved (whenever possible) in culture collections. With the steady accumulation of observational biodiversity data as well as microbial sequencing data, bio-resource centers have to function as data and information repositories to serve academia, industry, and regulators on behalf of and for the general public. Hence, the World Data Centre for Microorganisms (WDCM) took on the responsibility of constructing an effective information environment that would promote and sustain microbial research data activities, and bridge the gaps currently present within and outside the microbiology communities. DESCRIPTION: Strain catalogue information was collected from collections by online submission. We developed tools for automatic extraction of strain numbers and species names from various sources, including GenBank, PubMed, and SwissProt. These new tools connect strain catalogue information with the corresponding nucleotide and protein sequences, as well as with genome sequences and references citing a particular strain. All information has been processed and compiled to create a comprehensive database of microbial resources, named the Global Catalogue of Microorganisms (GCM). The current version of GCM contains information on over 273,933 strains, which include 43,436 bacterial, fungal and archaeal species from 52 collections in 25 countries and regions. A number of online analysis and statistical tools have been integrated, together with advanced search functions, which should greatly facilitate the exploration of the content of GCM. CONCLUSION: A comprehensive dynamic database of microbial resources has been created. It unveils the resources preserved in culture collections, especially those whose informatics infrastructures are still under development, and should foster cumulative research and facilitate the activities of microbiologists worldwide who work in both public and industrial research centres. This database is available from http://gcm.wfcc.info.
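
The abstract above does not say how strain numbers are extracted from free text. As an illustration only, the following minimal Python sketch shows one common way to do it: a regular expression that looks for a culture-collection acronym followed by a number. The acronym list, the pattern, and the function name `extract_strain_numbers` are assumptions made for this example and are not taken from the GCM tools.

```python
import re

# Hypothetical, short list of culture-collection acronyms (assumption for this sketch).
ACRONYMS = ("ATCC", "DSM", "JCM", "NBRC", "CGMCC")

# An acronym, an optional space or hyphen, then the numeric part of the strain number.
STRAIN_RE = re.compile(r"\b(" + "|".join(ACRONYMS) + r")[\s-]?(\d+)\b")


def extract_strain_numbers(text):
    """Return normalized strain numbers in order of first appearance, without duplicates."""
    seen = []
    for acronym, number in STRAIN_RE.findall(text):
        strain = f"{acronym} {number}"
        if strain not in seen:
            seen.append(strain)
    return seen


if __name__ == "__main__":
    sample = "Escherichia coli ATCC 25922 and Bacillus subtilis DSM10; ATCC-25922 again"
    print(extract_strain_numbers(sample))
```

Normalizing every hit to the form "ACRONYM number" lets the same strain be matched across sources that write it with a space, a hyphen, or no separator.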


Subjects
Archaea/classification , Bacteria/classification , Databases, Factual , Fungi/classification , Information Storage and Retrieval , Computational Biology , Internet
5.
Nucleic Acids Res ; 39(Database issue): D986-90, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20972215

ABSTRACT

Autophagy is a process of self-digestion generally observed in eukaryotes and has been shown to play crucial roles for survival under starvation and removal of deleterious substances. Despite great advances that have been made, many problems in mechanisms of autophagy remain unsolved. As a large number of autophagy-related proteins are identified in each species, a database that collects data, identifies their homologs in other species and makes them available will contribute to research advancement. As no such resources exist, we built the Autophagy database (http://tp-apg.genes.nig.ac.jp/autophagy) to provide basics, up-to-date information on relevant literature, and a list of autophagy-related proteins and their homologs in 41 eukaryotes. From the database, the user can search for proteins by keywords or sequences to obtain a wealth of data including functional and structural information and find possible functional homologs of proteins whose functions have been demonstrated in other species. As proteins that bind the phospholipid, phosphatidyl inositol 3-phosphate (PI3P) are essential for autophagy to proceed, we carried out an original analysis to identify probable PI3P-binding proteins, and made the list available from the database. The database is expected to give impetus to further research on autophagy by providing basic and specialized data on the subject.


Subjects
Autophagy , Databases, Protein , Animals , Humans , Mice , Phosphatidylinositol Phosphates/metabolism , Research , Sequence Homology, Amino Acid
6.
Nucleic Acids Res ; 39(Database issue): D19-21, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21062823

ABSTRACT

The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.


Subjects
Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing
7.
Nucleic Acids Res ; 39(Database issue): D22-7, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21062814

ABSTRACT

The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) provides a nucleotide sequence archive database and accompanying database tools for sequence submission, entry retrieval and annotation analysis. The DDBJ collected and released 3,637,446 entries/2,272,231,889 bases between July 2009 and June 2010. A highlight of the released data was archive datasets from next-generation sequencing reads of Japanese rice cultivar, Koshihikari submitted by the National Institute of Agrobiological Sciences. In this period, we started a new archive for quantitative genomics data, the DDBJ Omics aRchive (DOR). The DOR stores quantitative data both from the microarray and high-throughput new sequencing platforms. Moreover, we improved the content of the DDBJ patent sequence, released a new submission tool of the DDBJ Sequence Read Archive (DRA) which archives massive raw sequencing reads, and enhanced a cloud computing-based analytical system from sequencing reads, the DDBJ Read Annotation Pipeline. In this article, we describe these new functions of the DDBJ databases and support tools.


Subjects
Databases, Nucleic Acid , Amino Acid Sequence , Databases, Protein , Genomics , Molecular Sequence Annotation , Patents as Topic , Software
8.
J Struct Funct Genomics ; 13(3): 145-54, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22644393

ABSTRACT

The Targeted Proteins Research Program (TPRP), promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, is phase II of a structural biology project (2007-2011) following the Protein 3000 Project (2002-2006) in Japan. While the phase I Protein 3000 Project put partial emphasis on the construction and maintenance of pipelines for structural analyses, the TPRP is dedicated to revealing the structures and functions of targeted proteins that have great importance in both basic research and industrial applications. To pursue this objective, 35 Targeted Proteins (TP) Projects, selected in the three areas of fundamental biology, medicine and pharmacology, and food and environment, collaborate closely with 10 Advanced Technology (AT) Projects in the four fields of protein production, structural analyses, chemical library and screening, and information platform. Here, the outlines and achievements of the 35 TP Projects are summarized in the system named TP Atlas. Progress in the diversified areas is described in the Graphical Summary, General Summary, Tabular Summary, and Structure Gallery modules of the TP Atlas in a standard, unified format. Advances in TP Projects owing to novel technologies stemming from AT Projects and to collaborative research among TP Projects are illustrated as a hallmark of the Program. The TP Atlas can be accessed at http://net.genes.nig.ac.jp/tpatlas/index_e.html .


Subjects
Proteins/chemistry , Proteomics/methods , Software , Computer Graphics , Databases, Protein , Information Management/methods , Information Management/organization & administration , Internet , Japan , Protein Conformation , Protein Interaction Maps , Proteomics/organization & administration , Signal Transduction , Structure-Activity Relationship
9.
Nucleic Acids Res ; 38(Database issue): D870-1, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19965774

ABSTRACT

Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequence Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.


Subjects
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Genomics/methods , Information Storage and Retrieval/methods , Sequence Analysis, DNA/methods , Animals , Computational Biology/trends , Computers , Europe , Humans , Internet , Japan , National Library of Medicine (U.S.) , Sequence Analysis, DNA/trends , Software , United States
10.
Nucleic Acids Res ; 38(Database issue): D26-32, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19934255

ABSTRACT

The National BioResource Project (NBRP) is a Japanese project that aims to establish a system for collecting, preserving and providing bioresources for use as experimental materials for life science research. It is promoted by 27 core resource facilities, each concerned with a particular group of organisms, and by one information center. The NBRP database is a product of this project. Thirty databases and an integrated database-retrieval system (BioResource World: BRW) have been created and made available through the NBRP home page (http://www.nbrp.jp). The 30 independent databases have individual features which directly reflect the data maintained by each resource facility. The BRW is designed for users who need to search across several resources without moving from one database to another. BRW provides access to a collection of 4.5-million records on bioresources including wild species, inbred lines, mutants, genetically engineered lines, DNA clones and so on. BRW supports summary browsing, keyword searching, and searching by DNA sequences or gene ontology. The results of searches provide links to online requests for distribution of research materials. A circulation system allows users to submit details of papers published on research conducted using NBRP resources.


Subjects
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Algorithms , Animals , Computational Biology/trends , Databases, Protein , Gene Expression Profiling/methods , Genome, Plant , Genome, Viral , Humans , Information Storage and Retrieval/methods , Internet , Japan , Software
11.
Nucleic Acids Res ; 37(Web Server issue): W11-6, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19417067

ABSTRACT

DNA Data Bank of Japan (DDBJ) provides Web-based systems for biological analysis, called Web APIs for biology (WABI). So far, we have developed over 20 SOAP services and several workflows that consist of a series of method invocations. In this article, we present newly developed services of WABI, that is, REST-based Web services, additional workflows and a workflow navigation system. Each Web service and workflow can be used as a complete service or a building block for programmers to construct more complex information processing systems. The workflow navigation system aims to help non-programming biologists perform analysis tasks by providing next applicable services on Web browsers according to the output of a previously selected service. With this function, users can apply multiple services consecutively only by following links without any programming or manual copy-and-paste operations on Web browsers. The listed services are determined automatically by the system referring to the dictionaries of service categories, the input/output types of services and HTML tags. WABI and the workflow navigation system are freely accessible at http://www.xml.nig.ac.jp/index.html and http://cyclamen.ddbj.nig.ac.jp/, respectively.


Subjects
Biology , Databases, Nucleic Acid , Software , Internet , User-Computer Interface
12.
Nucleic Acids Res ; 37(Database issue): D16-8, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18927114

ABSTRACT

DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) collected and released 2 368 110 entries or 1 415 106 598 bases in the period from July 2007 to June 2008. The releases in this period include genome-scale data of Bombyx mori, Oryzias latipes, Drosophila and Lotus japonicus. In addition, from this year we collected and released trace archive data in collaboration with the National Center for Biotechnology Information (NCBI). The first release contains data from O. latipes and bacterial metagenomes in the human gut. To cope with the current progress of sequencing technology, we also accepted and released more than 100 million short reads of parasitic protozoa and their hosts that were produced using a Solexa sequencer.


Subjects
Databases, Nucleic Acid , Sequence Analysis, DNA/trends , Animals , Genomics , Humans
13.
Nucleic Acids Res ; 37(Database issue): D333-7, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18987007

ABSTRACT

The Genomes TO Protein Structures and Functions (GTOP) database (http://spock.genes.nig.ac.jp/~genome/gtop.html) freely provides an extensive collection of information on protein structures and functions obtained by application of various computational tools to the amino acid sequences of entirely sequenced genomes. GTOP contains annotations of 3D structures, protein families, functions, and other useful data of a protein of interest in user-friendly ways to give a deep insight into the protein structure. From the initial 1999 version, GTOP has been continually updated to reap the fruits of genome projects and augmented to supply novel information, in particular intrinsically disordered regions. As intrinsically disordered regions constitute a considerable fraction of proteins and often play crucial roles especially in eukaryotes, their assignments give important additional clues to the functionality of proteins. Additionally, we have incorporated the following features into GTOP: a platform independent structural viewer, results of HMM searches against SCOP and Pfam, secondary structure predictions, color display of exon boundaries in eukaryotic proteins, assignments of gene ontology terms, search tools, and master files.


Subjects
Databases, Protein , Protein Conformation , Proteins/genetics , Exons , Genomics , Proteins/chemistry , Proteins/physiology , Sequence Analysis, Protein , Software
14.
Adv Exp Med Biol ; 680: 125-35, 2010.
Article in English | MEDLINE | ID: mdl-20865494

ABSTRACT

The Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ) has operated biological databases since 1987 in collaboration with NCBI and EBI. As one of the three major public databases, CIB-DDBJ has run four primary databases DDBJ, CIBEX, DDBJ Trace Archive (DTA), and DDBJ Read Archive (DRA) to collect, archive, and provide various kinds of biological data. As the massively parallel new sequencing platforms are increasingly in use, huge amounts of the raw data have been produced. To archive these raw data, we at CIB-DDBJ began operating a new repository, the DDBJ Read Archive (DRA). To accommodate efficiently the processed data as well, we have developed a new pipeline, the DDBJ Read Annotation Pipeline that deals with both data submission and analysis. For data produced by the next generation platforms, the three archives DRA, DDBJ, and CIBEX, which are interconnected by the pipeline, collect the raw, processed sequence, and quantitative data, respectively. The public biological databases at CIB-DDBJ, EBI, and NCBI will together construct world-wide archives for biological data by data sharing to accelerate research in life sciences in the era of next generation sequencing technologies.


Subjects
Databases, Nucleic Acid/statistics & numerical data , Sequence Analysis, DNA/statistics & numerical data , Computational Biology , Databases, Nucleic Acid/trends , Japan , Models, Statistical , Sequence Analysis, DNA/trends
15.
Hum Mutat ; 30(6): 968-77, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19479963

ABSTRACT

Torrents of genotype-phenotype data are being generated, all of which must be captured, processed, integrated, and exploited. To do this optimally requires the use of standard and interoperable "object models," providing a description of how to partition the total spectrum of information being dealt with into elemental "objects" (such as "alleles," "genotypes," "phenotype values," "methods") with precisely stated logical interrelationships (such as "A objects are made up from one or more B objects"). We herein propose the Phenotype and Genotype Experiment Object Model (PaGE-OM; www.pageom.org), which has been tested and implemented in conjunction with several major databases, and approved as a standard by the Object Management Group (OMG). PaGE-OM is open-source, ready for use by the wider community, and can be further developed as needs arise. It will help to improve information management, assist data integration, and simplify the task of informatics resource design and construction for genotype and phenotype data projects.


Subjects
DNA/genetics , Databases, Genetic , Genetic Variation , Models, Genetic , Genotype , Humans , Phenotype
16.
Nucleic Acids Res ; 35(Database issue): D13-5, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17108353

ABSTRACT

DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) newly collected and released 12,927,184 entries or 13,787,688,598 bases in the period from July 2005 to June 2006. The released data contain honeybee expressed sequence tags (ESTs), re-examined and re-annotated complete genome data of Escherichia coli K-12 W3110, medaka WGS and human MGA. We also systematically evaluated and classified the genes in the complete bacterial genomes submitted to the International Nucleotide Sequence Database Collaboration (INSDC, http://insdc.org) that is composed of DDBJ, EMBL Bank and GenBank. The examination and classification selected 557,000 genes as reliable ones among all the bacterial genes predicted by us.


Subjects
Databases, Nucleic Acid , Genes, Bacterial , Animals , Classification , Databases, Nucleic Acid/statistics & numerical data , Evaluation Studies as Topic , Genome, Bacterial , Humans , Internet , Open Reading Frames
17.
Nucleic Acids Res ; 35(Database issue): D339-42, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17158166

ABSTRACT

Genome Information Broker for Viruses (GIB-V) is a comprehensive virus genome/segment database. We extracted 18 418 complete virus genomes/segments from the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org/) by DNA Data Bank of Japan (DDBJ), EMBL and GenBank and stored them in our system. The list of registered viruses is arranged hierarchically according to taxonomy. Keyword searches can be performed for genome/segment data or biological features of any virus stored in GIB-V. GIB-V is equipped with a BLAST search function, and search results are displayed graphically or in list form. Moreover, the BLAST results can be used online with the ClustalW feature of the DDBJ. All available virus genome/segment data can be collected by the GIB-V download function. GIB-V can be accessed at no charge at http://gib-v.genes.nig.ac.jp/.


Subjects
Databases, Nucleic Acid , Genome, Viral , Genomics , Internet , Sequence Alignment , Software , User-Computer Interface
18.
Nucleic Acids Res ; 34(Database issue): D6-9, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381940

ABSTRACT

In the past year, DDBJ (http://www.ddbj.nig.ac.jp) collected and released 1,956,826 entries or 1,741,313,111 bases. The released data include approximately 90,000 ESTs and cDNAs of Macaca fascicularis, and 280 million bases of mouse GSS. In addition to the data collection, we have indexed the submitted data to the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org) to classify the entries into research projects behind data submissions. They are expected to be useful to the data submitters and users for enhancing the data submission, retrieval and systematic data analyses at INSDC. The results of indexing also allow one to grasp research projects in life sciences that promoted and produced the DNA sequences submitted to INSDC.


Subjects
Databases, Nucleic Acid , Abstracting and Indexing , Animals , Base Sequence , DNA/chemistry , Databases, Nucleic Acid/statistics & numerical data , Geography , Humans , Internet , Macaca fascicularis , Mice , Research , User-Computer Interface
19.
BMC Bioinformatics ; 8: 281, 2007 Aug 02.
Article in English | MEDLINE | ID: mdl-17683520

ABSTRACT

BACKGROUND: There are a number of different methods for generation of trees and algorithms for phylogenetic analysis in the study of bacterial taxonomy. Genotypic information, such as SSU rRNA gene sequences, now plays a more prominent role in microbial systematics than does phenotypic information. However, the integration of genotypic and phenotypic information for polyphasic studies is necessary for the classification and identification of microbes. Thus, we devised an algorithm that objectively identifies discriminative characteristics for focused clusters on generated trees from a dataset composed of coded data, such as phenotypic information. Moreover, this algorithm has been integrated into the polyphasic analysis software, InforBIO. RESULTS: We developed a differential-character-finding algorithm based on information measures and used this algorithm to identify the characteristic that best discriminates operational taxonomic unit clusters. For all characteristics in a dataset, the algorithm estimates commonality in focused clusters and diversity among clusters by scoring based on Shannon's and relative entropies. All the characteristics selected for scoring are equally weighted. Thresholds for the scores are defined to identify discriminative characteristics for clusters efficiently from a database. The unique feature of the algorithm, which is implemented in the InforBIO software, is that it can identify the phenotypic characteristics that discriminate and are associated with the clusters of a phylogenetic tree. We successfully applied this algorithm to the study of phylogenetic clusters of Pseudomonas species. CONCLUSION: The algorithm in the InforBIO software is a novel and useful approach for microbial polyphasic studies. The algorithm can also be applied to diverse cluster analyses. The InforBIO software is available from the download site http://wdcm.nig.ac.jp/inforbio/. This software is free for personal but not commercial use.
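
The scoring idea in this abstract (low Shannon entropy inside a focused cluster, high relative entropy against the other clusters) can be illustrated with a small Python sketch. This is not the InforBIO implementation: the data layout, the additive smoothing, the threshold values, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Shannon entropy (in bits) of a list of coded character states."""
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def relative_entropy(focus, others, pseudocount=1.0):
    """Relative entropy D(focus || others) in bits, with additive smoothing (an assumption)."""
    alphabet = set(focus) | set(others)
    f_counts, o_counts = Counter(focus), Counter(others)
    f_total = len(focus) + pseudocount * len(alphabet)
    o_total = len(others) + pseudocount * len(alphabet)
    score = 0.0
    for state in alphabet:
        p = (f_counts[state] + pseudocount) / f_total
        q = (o_counts[state] + pseudocount) / o_total
        score += p * math.log2(p / q)
    return score


def discriminative_characters(table, focus_ids, max_entropy=0.5, min_divergence=0.5):
    """Select characteristics that are uniform inside the focused cluster and differ outside it.

    table maps a characteristic name to {OTU id: coded state}; thresholds are illustrative.
    """
    selected = []
    for name, states in table.items():
        focus = [s for otu, s in states.items() if otu in focus_ids]
        others = [s for otu, s in states.items() if otu not in focus_ids]
        if not focus or not others:
            continue
        if (shannon_entropy(focus) <= max_entropy
                and relative_entropy(focus, others) >= min_divergence):
            selected.append(name)
    return selected


if __name__ == "__main__":
    phenotypes = {
        "oxidase": {"A": "+", "B": "+", "C": "-", "D": "-"},
        "motility": {"A": "+", "B": "+", "C": "+", "D": "+"},
    }
    print(discriminative_characters(phenotypes, {"A", "B"}))
```

In this toy table, "oxidase" is uniform within the focused cluster {A, B} and differs from the remaining OTUs, so it passes both thresholds, while "motility" is identical everywhere and is rejected. All characteristics are weighted equally, as the abstract states.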


Subjects
DNA, Bacterial/genetics , Databases, Genetic , Pseudomonas/classification , Pseudomonas/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA/methods , Software , Cluster Analysis , Discriminant Analysis
20.
BMC Bioinformatics ; 7: 368, 2006 Aug 04.
Article in English | MEDLINE | ID: mdl-16887044

ABSTRACT

BACKGROUND: Genome databases contain diverse kinds of information, including gene annotations and nucleotide and amino acid sequences. It is not easy to integrate such information for genomic study. There are few tools for integrated analyses of genomic data; therefore, we developed software that enables users to handle, manipulate, and analyze genome data with a variety of sequence analysis programs. RESULTS: The G-InforBIO system is a novel tool for genome data management and sequence analysis. The system can import genome data encoded as eXtensible Markup Language documents as formatted text documents, including annotations and sequences, from DNA Data Bank of Japan and GenBank encoded as flat files. The genome database is constructed automatically after importing, and the database can be exported as documents formatted with eXtensible Markup Language or tab-delimited text. Users can retrieve data from the database by keyword searches, edit annotation data of genes, and process data with G-InforBIO. In addition, information in the G-InforBIO database can be analyzed seamlessly with nine different software programs, including programs for clustering and homology analyses. CONCLUSION: The G-InforBIO system simplifies genome analyses by integrating several available software programs to allow efficient handling and manipulation of genome data. G-InforBIO is freely available from the download site.
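
To make the XML-to-tab-delimited export step concrete, here is a minimal Python sketch using only the standard library. The XML layout (`genome`, `gene`, `product`, `sequence` elements and the `id` attribute) and the function name `genes_xml_to_tsv` are hypothetical; the actual G-InforBIO and DDBJ XML schemas are not described in the abstract and may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET


def genes_xml_to_tsv(xml_text):
    """Convert a (hypothetical) genome XML document into tab-delimited text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "product", "sequence"])
    # One output row per <gene> element; missing fields become empty strings.
    for gene in root.iter("gene"):
        writer.writerow([
            gene.get("id", ""),
            gene.findtext("product", default=""),
            gene.findtext("sequence", default=""),
        ])
    return out.getvalue()


if __name__ == "__main__":
    example = (
        '<genome><gene id="g1"><product>DnaA</product>'
        "<sequence>ATGC</sequence></gene></genome>"
    )
    print(genes_xml_to_tsv(example), end="")
```

Using `csv.writer` with a tab delimiter, rather than joining strings by hand, keeps fields that contain tabs or quotes correctly escaped.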


Subjects
Chromosome Mapping/methods , Databases, Genetic , Genome, Bacterial/genetics , Information Storage and Retrieval/methods , Sequence Analysis/methods , Software , User-Computer Interface , Algorithms , Computer Graphics , Database Management Systems , Genomics/methods , Sequence Alignment/methods , Software Design , Systems Integration