Results 1 - 20 of 61
1.
Am J Hum Genet ; 103(3): 389-399, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30173820

ABSTRACT

Recently, to speed up the differential-diagnosis process based on symptoms and signs observed from an affected individual in the diagnosis of rare diseases, researchers have developed and implemented phenotype-driven differential-diagnosis systems. The performance of those systems relies on the quantity and quality of underlying databases of disease-phenotype associations (DPAs). Although such databases are often developed by manual curation, they inherently suffer from limited coverage. To address this problem, we propose a text-mining approach to increase the coverage of DPA databases and consequently improve the performance of differential-diagnosis systems. Our analysis showed that a text-mining approach using one million case reports obtained from PubMed could increase the coverage of manually curated DPAs in Orphanet by 125.6%. We also present PubCaseFinder (see Web Resources), a new phenotype-driven differential-diagnosis system in a freely available web application. By utilizing automatically extracted DPAs from case reports in addition to manually curated DPAs, PubCaseFinder improves the performance of automated differential diagnosis. Moreover, PubCaseFinder helps clinicians search for relevant case reports by using phenotype-based comparisons and confirm the results with detailed contextual information.


Subject(s)
Rare Diseases/diagnosis; Rare Diseases/genetics; Data Mining/methods; Databases, Genetic; Diagnosis, Differential; Humans; Phenotype
2.
Nucleic Acids Res ; 46(D1): D48-D51, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29190397

ABSTRACT

For more than 30 years, the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been committed to capturing, preserving and providing access to comprehensive public domain nucleotide sequence and associated metadata which enables discovery in biomedicine, biodiversity and biological sciences. Since 1987, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA have worked collaboratively to enable access to nucleotide sequence data in standardized formats for the worldwide scientific community. In this article, we reiterate the principles of the INSDC collaboration and briefly summarize the trends of the archival content.


Subject(s)
Databases, Nucleic Acid; Animals; Classification; Computational Biology; Databases, Factual; Databases, Nucleic Acid/trends; Europe; High-Throughput Nucleotide Sequencing; Humans; International Cooperation; Japan; National Library of Medicine (U.S.); United States
3.
Nucleic Acids Res ; 46(D1): D30-D35, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29040613

ABSTRACT

The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect genotype and phenotype data of human individuals. Here, we outline our database activities for INSDC and JGA over the past year, and introduce submission, retrieval and analysis services running on our supercomputer system and their recent developments. Furthermore, we highlight our responses to the amended Japanese rules for the protection of personal information and the launch of the DDBJ Group Cloud service for sharing pre-publication data among research groups.


Subject(s)
Databases, Nucleic Acid; Academies and Institutes; Cloud Computing; Computational Biology; Confidentiality/legislation & jurisprudence; Databases, Nucleic Acid/history; Databases, Nucleic Acid/trends; Europe; Genetic Association Studies; History, 20th Century; History, 21st Century; Humans; Information Storage and Retrieval; International Cooperation; Japan; National Library of Medicine (U.S.); United States
4.
Nucleic Acids Res ; 45(D1): D25-D31, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27924010

ABSTRACT

The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). The DDBJ Center also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect human subject data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center is improving semantic web technologies to integrate and share biological data, in order to provide an RDF version of the sequence data.


Subject(s)
Databases, Nucleic Acid; Sequence Analysis, DNA; Animals; Genotype; Humans; Internet; Japan; Molecular Sequence Annotation; Phenotype; Software
5.
Nucleic Acids Res ; 44(D1): D48-50, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26657633

ABSTRACT

The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org) comprises three global partners committed to capturing, preserving and providing comprehensive public-domain nucleotide sequence information. The INSDC establishes standards, formats and protocols for data and metadata to make it easier for individuals and organisations to submit their nucleotide data reliably to public archives. This work enables the continuous, global exchange of information about living things. Here we present an update of the INSDC in 2015, including data growth and diversification, new standards and requirements by publishers for authors to submit their data to the public archives. The INSDC serves as a model for data sharing in the life sciences.


Subject(s)
Databases, Nucleic Acid; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Cooperative Behavior; Databases, Nucleic Acid/standards; Policy
6.
Nucleic Acids Res ; 44(D1): D51-7, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578571

ABSTRACT

The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration.


Subject(s)
Databases, Nucleic Acid; Sequence Analysis, DNA; Biological Ontologies; Computers; Genotype; Phenotype
7.
Nucleic Acids Res ; 43(Database issue): D18-22, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25477381

ABSTRACT

The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency. DDBJ Center provides the JGA database system which securely stores genotype and phenotype data collected from individuals whose consent agreements authorize data release only for specific research use. NBDC has established guidelines and policies for sharing human-derived data and reviews data submission and usage requests from researchers. In addition to the JGA project, DDBJ Center develops Semantic Web technologies for data integration and sharing in collaboration with the Database Center for Life Science. This paper describes the overview of the JGA project, updates to the DDBJ databases, and services for data retrieval, analysis and integration.


Subject(s)
Databases, Nucleic Acid; Genotype; Phenotype; Genetic Association Studies; Humans; Internet; Sequence Analysis, DNA
8.
Nucleic Acids Res ; 42(Database issue): D44-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24194602

ABSTRACT

The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. This database content is shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). DDBJ launched a new nucleotide sequence submission system for receiving traditional nucleotide sequences. We expect that the new submission system will help many submitters to input accurate annotation and reduce the time needed for data input. In addition, DDBJ has started a new service, the Japanese Genotype-phenotype Archive (JGA), with our partner institute, the National Bioscience Database Center (NBDC). JGA permanently archives and shares all types of individual human genetic and phenotypic data. We also introduce improvements in the DDBJ services and databases made during the past year.


Subject(s)
Base Sequence; Databases, Nucleic Acid; Molecular Sequence Annotation; Genomics; Genotype; High-Throughput Nucleotide Sequencing; Humans; Internet; Phenotype
9.
Nucleic Acids Res ; 41(Database issue): D25-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23180790

ABSTRACT

The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) maintains a primary nucleotide sequence database and provides analytical resources for biological information to researchers. This database content is exchanged with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Resources provided by the DDBJ include traditional nucleotide sequence data released in the form of 27 316 452 entries or 16 876 791 557 base pairs (as of June 2012), and raw reads of new generation sequencers in the sequence read archive (SRA). A Japanese researcher published his own genome sequence via DDBJ-SRA on 31 July 2012. To cope with the ongoing genomic data deluge, in March 2012, our previous computer system was totally replaced by a commodity cluster-based system that boasts 122.5 TFlops of CPU capacity and 5 PB of storage space. During this upgrade, it was considered crucial to replace and refactor substantial portions of the DDBJ software systems as well. As a result of the replacement process, which took more than 2 years to perform, we have achieved significant improvements in system performance.


Subject(s)
Base Sequence; Databases, Nucleic Acid; Genomics; High-Throughput Nucleotide Sequencing; Internet; Sequence Analysis, DNA; Software
10.
Brief Bioinform ; 13(2): 258-68, 2012 Mar.
Article in English | MEDLINE | ID: mdl-21803786

ABSTRACT

In recent years, biological web resources such as databases and tools have become more complex because of the enormous amounts of data generated in the field of life sciences. Traditional methods of distributing tutorials include publishing textbooks and posting web documents, but these static contents cannot adequately describe recent dynamic web services. Due to improvements in computer technology, it is now possible to create dynamic content such as video with minimal effort and low cost on most modern computers. The ease of creating and distributing video tutorials instead of static content improves accessibility for researchers, annotators and curators. This article focuses on online video repositories for educational and tutorial videos provided by resource developers and users. It also describes a project in Japan named TogoTV (http://togotv.dbcls.jp/en/) and discusses, with examples, the production and distribution of high-quality tutorial videos that would be useful to viewers. This article intends to stimulate and encourage researchers who develop and use databases and tools to distribute how-to videos as a tool to enhance product usability.


Subject(s)
Computational Biology/methods; Internet; Software; Videotape Recording; Databases, Factual; Information Storage and Retrieval; Japan
11.
Nucleic Acids Res ; 40(Database issue): D38-42, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22110025

ABSTRACT

The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.


Subject(s)
Databases, Nucleic Acid; Genomics; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Internet; Oligonucleotide Array Sequence Analysis; Sequence Analysis, DNA; Sequence Analysis, RNA
12.
Nucleic Acids Res ; 39(Database issue): D22-7, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21062814

ABSTRACT

The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) provides a nucleotide sequence archive database and accompanying database tools for sequence submission, entry retrieval and annotation analysis. The DDBJ collected and released 3,637,446 entries/2,272,231,889 bases between July 2009 and June 2010. A highlight of the released data was the archive datasets from next-generation sequencing reads of the Japanese rice cultivar Koshihikari, submitted by the National Institute of Agrobiological Sciences. In this period, we started a new archive for quantitative genomics data, the DDBJ Omics aRchive (DOR). The DOR stores quantitative data from both microarray and new high-throughput sequencing platforms. Moreover, we improved the content of the DDBJ patent sequence, released a new submission tool for the DDBJ Sequence Read Archive (DRA), which archives massive raw sequencing reads, and enhanced a cloud computing-based analytical system for sequencing reads, the DDBJ Read Annotation Pipeline. In this article, we describe these new functions of the DDBJ databases and support tools.


Subject(s)
Databases, Nucleic Acid; Amino Acid Sequence; Databases, Protein; Genomics; Molecular Sequence Annotation; Patents as Topic; Software
13.
BMC Bioinformatics ; 13 Suppl 11: S1, 2012 Jun 26.
Article in English | MEDLINE | ID: mdl-22759455

ABSTRACT

BACKGROUND: The Genia task, when it was introduced in 2009, was the first community-wide effort to address fine-grained, structural information extraction from biomedical literature. Arranged for the second time as one of the main tasks of BioNLP Shared Task 2011, it aimed to measure the progress of the community since 2009, and to evaluate generalization of the technology to full text papers. The Protein Coreference task was arranged as one of the supporting tasks, motivated by one of the lessons of the 2009 task: that the abundance of coreference structures in natural language text hinders further improvement with the Genia task. RESULTS: The Genia task received final submissions from 15 teams. The results show that the community has made significant progress, with the best F-score reaching 74% in extracting bio-molecular events of simple structure, e.g., gene expressions, and 45% to 48% in extracting those of complex structure, e.g., regulations. The Protein Coreference task received 6 final submissions. The results show that coreference resolution performance in the biomedical domain is lagging behind that in the newswire domain, cf. 50% vs. 66% in MUC score. In particular, for protein coreference resolution the best system achieved 34% in F-score. CONCLUSIONS: Detailed analysis performed on the results improves our insight into the problem and suggests the directions for further improvements.


Subject(s)
Information Systems; Natural Language Processing; Proteins/chemistry; Congresses as Topic; Gene Expression; Proteins/genetics; Proteins/metabolism
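
The F-scores quoted in record 13 combine precision and recall into one number. The following is a minimal sketch of the standard F-beta formula, not the shared task's official evaluation script; the function name and the counts in the usage note are illustrative assumptions.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts (hypothetical helper).

    tp: correctly extracted items, fp: spurious extractions,
    fn: missed items. beta=1.0 gives the usual balanced F1.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `f_score(8, 2, 8)` corresponds to precision 0.8 and recall 0.5. Because the balanced F1 is a harmonic mean, it sits closer to the lower of the two values than an arithmetic mean would.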
14.
BMC Genomics ; 13 Suppl 7: S24, 2012.
Article in English | MEDLINE | ID: mdl-23281970

ABSTRACT

BACKGROUND: The overwhelming amount of network data in functional genomics is making its visualization cluttered with jumbled nodes and edges. Such cluttered network visualization, which is known as "hair-balls", significantly hinders data interpretation and analysis by researchers. Effective navigation approaches that can always abstract network data properly and present them insightfully are hence required, to help researchers interpret the data and acquire knowledge efficiently. Cytoscape is a de facto standard platform for network visualization and analysis, which has many users around the world. Apart from its sophisticated core features, it easily allows for extension of the functionalities by loading extra plug-ins. RESULTS: We developed NaviClusterCS, which enables researchers to interactively navigate large biological networks of ~100,000 nodes in a "Google Maps-like" manner in the Cytoscape environment. NaviClusterCS rapidly and automatically identifies biologically meaningful clusters in large networks, e.g., proteins sharing similar biological functions in protein-protein interaction networks. Then, it displays not all nodes but only preferable numbers of those clusters at any magnification to avoid creating the cluttered network visualization, while its zooming and re-centering functions still enable researchers to interactively analyze the networks in detail. Its application to a real Arabidopsis co-expression network dataset illustrated a practical use of the tool for suggesting knowledge that is hidden in large biological networks and difficult to obtain using other visualization methods. CONCLUSIONS: NaviClusterCS provides interactive and multi-scale network navigation to a wide range of biologists in the big data era, via the de facto standard platform for network visualization. It can be freely downloaded at http://navicluster.cb.k.u-tokyo.ac.jp/cs/ and installed as a plug-in of Cytoscape.


Subject(s)
Genomics; Software; Arabidopsis/genetics; Arabidopsis/metabolism; Cluster Analysis; Computational Biology; Databases, Factual; Gene Regulatory Networks; Internet; User-Computer Interface
15.
BMC Genomics ; 13 Suppl 3: S8, 2012 Jun 11.
Article in English | MEDLINE | ID: mdl-22759617

ABSTRACT

BACKGROUND: Term clustering, by measuring the string similarities between terms, is known within the natural language processing community to be an effective method for improving the quality of texts and dictionaries. However, we have observed that chemical names are difficult to cluster using string similarity measures. In order to clearly demonstrate this difficulty, we compared the string similarities determined using the edit distance, the Monge-Elkan score, SoftTFIDF, and the bigram Dice coefficient for chemical names with those for non-chemical names. RESULTS: Our experimental results revealed the following: (1) The edit distance had the best performance in the matching of full forms, whereas Cohen et al. reported that SoftTFIDF with the Jaro-Winkler distance would yield the best measure for matching pairs of terms for their experiments. (2) For each of the string similarity measures above, the best threshold for term matching differs for chemical names and for non-chemical names; the difference is especially large for the edit distance. (3) Although the matching results obtained for chemical names using the edit distance, Monge-Elkan scores, or the bigram Dice coefficients are better than the result obtained for non-chemical names, the results were contrary when using SoftTFIDF. (4) A suitable weight for chemical names varies substantially from one for non-chemical names. In particular, a weight vector that has been optimized for non-chemical names is not suitable for chemical names. (5) The matching results using the edit distances improve further by dividing a set of full forms into two subsets, according to whether a full form is a chemical name or not. These results show that our hypothesis is acceptable, and that we can significantly improve the performance of abbreviation-full form clustering by computing chemical names and non-chemical names separately. CONCLUSIONS: In conclusion, the discriminative application of string similarity methods to chemical and non-chemical names may be a simple yet effective way to improve the performance of term clustering.


Subject(s)
Algorithms; Cluster Analysis; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Chemistry; Computational Biology/methods; Reproducibility of Results; Terminology as Topic
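
Two of the string similarity measures compared in record 15, the edit distance and the bigram Dice coefficient, can be sketched in a few lines. This is an illustrative sketch only: the abstract does not give the authors' exact variants, so unit-cost Levenshtein distance and set-based character bigrams are assumptions, and all function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def bigrams(s: str) -> list:
    """Character bigrams of s, in order of appearance."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over bigram sets (assumed variant)."""
    set_a, set_b = set(bigrams(a)), set(bigrams(b))
    if not set_a and not set_b:
        # Strings too short to have bigrams: fall back to exact equality.
        return 1.0 if a == b else 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))
```

Note that the two measures point in opposite directions: a lower edit distance means more similar strings, while a higher Dice coefficient does. A matching threshold therefore has to be chosen per measure, which is consistent with the abstract's finding that the best thresholds differ between chemical and non-chemical names.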
16.
Bioinformatics ; 27(8): 1121-7, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21349867

ABSTRACT

MOTIVATION: Many types of omics data are compiled as lists of connections between elements and visualized as networks or graphs where the nodes and edges correspond to the elements and the connections, respectively. However, these networks often appear as 'hair-balls', with a large number of extremely tangled edges, and cannot be visually interpreted. RESULTS: We present an interactive, multiscale navigation method for biological networks. Our approach can automatically and rapidly abstract any portion of a large network of interest to an immediately interpretable extent. The method is based on an ultrafast graph clustering technique that abstracts networks of about 100 000 nodes in a second by iteratively grouping densely connected portions and a biological-property-based clustering technique that takes advantage of biological information often provided for biological entities (e.g. Gene Ontology terms). It was confirmed to be effective by applying it to real yeast protein network data, and would greatly help modern biologists faced with large, complicated networks in a similar manner to how Web mapping services enable interactive multiscale navigation of geographical maps (e.g. Google Maps). AVAILABILITY: Java implementation of our method, named NaviCluster, is available at http://navicluster.cb.k.u-tokyo.ac.jp/. CONTACT: thanet@cb.k.u-tokyo.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Interaction Mapping/methods; Cluster Analysis; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism; Software; User-Computer Interface
17.
Nucleic Acids Res ; 38(Web Server issue): W706-11, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20472643

ABSTRACT

Web services have become widely used in bioinformatics analysis, but there exist incompatibilities in interfaces and data types, which prevent users from making full use of a combination of these services. Therefore, we have developed the TogoWS service to provide an integrated interface with advanced features. In the TogoWS REST (REpresentational State Transfer) API (application programming interface), we introduce a unified access method for major database resources through intuitive URIs that can be used to search, retrieve, parse and convert the database entries. The TogoWS SOAP API resolves compatibility issues found in server-side and client-side SOAP implementations. The TogoWS service is freely available at: http://togows.dbcls.jp/.


Subject(s)
Computational Biology; Databases, Factual; Software; Internet; Systems Integration; User-Computer Interface
18.
Nucleic Acids Res ; 38(Database issue): D33-8, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19850725

ABSTRACT

The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) collected and released 1,701,110 entries/1,116,138,614 bases between July 2008 and June 2009. A few highlighted data releases from DDBJ were the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mouse deposited from the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a novel user announcement service using Really Simple Syndication (RSS) to deliver a list of data released from DDBJ on a daily basis. Comprehensive visualization of DDBJ release data was attempted by using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the 'DDBJ Read Archive' (DRA), was launched. Concurrently, for read data registered in DRA, a semi-automatic annotation tool called the 'DDBJ Read Annotation Pipeline' was released as a preliminary step. The pipeline consists of two parts: basic analysis for reference genome mapping and de novo assembly, and high-level analysis of structural and functional annotations. These new services will aid users' research and provide easier access to DDBJ databases.


Subject(s)
Computational Biology/methods; Databases, Genetic; Databases, Nucleic Acid; Algorithms; Animals; Computational Biology/trends; Databases, Protein; Genome, Bacterial; Humans; Information Storage and Retrieval/methods; Internet; Japan; Software
19.
PLoS Genet ; 5(3): e1000402, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19266023

ABSTRACT

The evolutionary history of biological pathways is of general interest, especially in this post-genomic era, because it may provide clues for understanding how complex systems encoded on genomes have been organized. To explain how pathways can evolve de novo, some noteworthy models have been proposed. However, direct reconstruction of pathway evolutionary history both on a genomic scale and at the depth of the tree of life has suffered from artificial effects in estimating the gene content of ancestral species. Recently, we developed an algorithm that effectively reconstructs gene-content evolution without these artificial effects, and we applied it to this problem. The carefully reconstructed history, which was based on the metabolic pathways of 160 prokaryotic species, confirmed that pathways have grown beyond the random acquisition of individual genes. Pathway acquisition took place quickly, probably eliminating the difficulty in holding genes during the course of the pathway evolution. This rapid evolution was due to massive horizontal gene transfers as gene groups, some of which were possibly operon transfers, which would convey existing pathways but not be able to generate novel pathways. To this end, we analyzed how these pathways originally appeared and found that the original acquisition of pathways occurred more contemporaneously than expected across different phylogenetic clades. As a possible model to explain this observation, we propose that novel pathway evolution may be facilitated by bidirectional horizontal gene transfers in prokaryotic communities. Such a model would complement existing pathway evolution models.


Subject(s)
Bacteria/genetics; Evolution, Molecular; Gene Transfer, Horizontal; Bacteria/classification; Models, Genetic; Phylogeny
20.
Hum Genome Var ; 9(1): 44, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36509753

ABSTRACT

TogoVar ( https://togovar.org ) is a database that integrates allele frequencies derived from Japanese populations and provides annotations for variant interpretation. First, a scheme to reanalyze individual-level genome sequence data deposited in the Japanese Genotype-phenotype Archive (JGA), a controlled-access database, was established to make allele frequencies publicly available. As more Japanese individual-level genome sequence data are deposited in JGA, the sample size employed in TogoVar is expected to increase, contributing to genetic study as reference data for Japanese populations. Second, public datasets of Japanese and non-Japanese populations were integrated into TogoVar to easily compare allele frequencies in Japanese and other populations. Each variant detected in Japanese populations was assigned a TogoVar ID as a permanent identifier. Third, these variants were annotated with molecular consequence, pathogenicity, and literature information for interpreting and prioritizing variants. Here, we introduce the newly developed TogoVar database that compares allele frequencies among Japanese and non-Japanese populations and describes the integrated annotations.
