Search | BVS Educação Profissional em Saúde

DDBJ update in 2023: the MetaboBank for metabolomics data and associated metadata.

Ara, Takeshi; Kodama, Yuichi; Tokimatsu, Toshiaki; Fukuda, Asami; Kosuge, Takehide; Mashima, Jun; Tanizawa, Yasuhiro; Tanjo, Tomoya; Ogasawara, Osamu; Fujisawa, Takatomo; Nakamura, Yasukazu; Arita, Masanori.

Nucleic Acids Res ; 52(D1): D67-D71, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971299

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample information along with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute (EBI). Besides INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open for domestic life science researchers to analyze large-scale sequence data. This paper reports recent updates on the archival databases and the services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample in its metadata description making it suitable for multi-omics large studies. Its collaboration with MetaboLights at EBI brings synergy in locating and reusing public data.

Subjects

Databases, Nucleic Acid; Metabolomics; Metadata; Humans; Computational Biology; Genomics; Internet; Japan; Multiomics/methods

DNA Data Bank of Japan (DDBJ) update report 2022.

Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kodama, Yuichi; Kosuge, Takehide; Mashima, Jun; Tanjo, Tomoya; Nakamura, Yasukazu.

Nucleic Acids Res ; 51(D1): D101-D105, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36420889

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) maintains database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), our primary mission is to collect and distribute nucleotide sequence data, as well as their study and sample information, in collaboration with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute. In addition to INSDC resources, the Center operates databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank), and human genetic and phenotypic data (JGA: Japanese Genotype-Phenotype Archive). These databases are built on the supercomputer of the National Institute of Genetics, whose remaining computational capacity is actively utilized by domestic researchers for large-scale biological data analyses. Here, we report our recent updates and the activities of our services.

Subjects

Databases, Nucleic Acid; Genomics; Humans; United States; Computational Biology; Computers; Base Sequence; Japan; Internet

Practical guide for managing large-scale human genome data in research.

Tanjo, Tomoya; Kawai, Yosuke; Tokunaga, Katsushi; Ogasawara, Osamu; Nagasaki, Masao.

J Hum Genet ; 66(1): 39-52, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33097812

ABSTRACT

Studies in human genetics deal with a plethora of human genome sequencing data that are generated from specimens as well as available on public domains. With the development of various bioinformatics applications, maintaining the productivity of research, managing human genome data, and analyzing downstream data is essential. This review aims to guide struggling researchers to process and analyze these large-scale genomic data to extract relevant information for improved downstream analyses. Here, we discuss worldwide human genome projects that could be integrated into any data for improved analysis. Obtaining human whole-genome sequencing data from both data stores and processes is costly; therefore, we focus on the development of data format and software that manipulate whole-genome sequencing. Once the sequencing is complete and its format and data processing tools are selected, a computational platform is required. For the platform, we describe a multi-cloud strategy that balances between cost, performance, and customizability. A good quality published research relies on data reproducibility to ensure quality results, reusability for applications to other datasets, as well as scalability for the future increase of datasets. To solve these, we describe several key technologies developed in computer science, including workflow engine. We also discuss the ethical guidelines inevitable for human genomic data analysis that differ from model organisms. Finally, the future ideal perspective of data processing and analysis is summarized.

Subjects

Computational Biology/methods; Genome, Human/genetics; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Human Genome Project; Whole Genome Sequencing/methods; Humans; Information Storage and Retrieval/methods; Reproducibility of Results; Software

Cloud service checklist for academic communities and customization for genome medical research.

Kobayashi, Kumiko; Yoshida, Hiroshi; Tanjo, Tomoya; Aida, Kento.

Hum Genome Var ; 9(1): 36, 2022 Oct 17.

Article in English | MEDLINE | ID: mdl-36253343

ABSTRACT

In this paper, we present a cloud service checklist designed to help IT administrators or researchers in academic organizations select the most suitable cloud services. This checklist, which comprises items that we believe IT administrators or researchers in academic organizations should consider when they adopt cloud services, comprehensively covers the issues related to a variety of cloud services, including security, functionality, performance, and law. In response to the increasing demands for storage and computing resources in genome medical science communities, various guidelines for using resources operated by external organizations, such as cloud services, have been published by different academic funding agencies and the Japanese government. However, it is sometimes difficult to identify the checklist items that satisfy the genome medical science community's guidelines, and some of these requirements are not included in the existing checklists. This issue provided our motivation for creating a cloud service checklist customized for genome medical research communities. The resulting customized checklist is designed to help researchers easily find information about the cloud services that satisfy the guidelines in genome medical science communities. Additionally, we explore whether many cloud service providers satisfy the requirements or checklist items in the cloud service checklist for genome medical research by evaluating their survey responses.

Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection.

Ohta, Tazro; Tanjo, Tomoya; Ogasawara, Osamu.

Gigascience ; 8(4), 2019 Apr 01.

Article in English | MEDLINE | ID: mdl-31222199

ABSTRACT

BACKGROUND: Container virtualization technologies such as Docker are popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Along with software packaged in containers, the standardized workflow descriptors Common Workflow Language (CWL) enable data to be easily analyzed on multiple computing environments. These technologies accelerate the use of on-demand cloud computing platforms, which can be scaled according to the quantity of data. However, to optimize the time and budgetary restraints of cloud usage, users must select a suitable instance type that corresponds to the resource requirements of their workflows.

RESULTS: We developed CWL-metrics, a utility tool for cwltool (the reference implementation of CWL), to collect runtime metrics of Docker containers and workflow metadata to analyze workflow resource requirements. To demonstrate the use of this tool, we analyzed 7 transcriptome quantification workflows on 6 instance types. The results revealed that choice of instance type can deliver lower financial costs and faster execution times using the required amount of computational resources.

CONCLUSIONS: CWL-metrics can generate a summary of resource requirements for workflow executions, which can help users to optimize their use of cloud computing by selecting appropriate instances. The runtime metrics data generated by CWL-metrics can also help users to share workflows between different workflow management frameworks.

Subjects

Cloud Computing; Computational Biology/methods; Genomics/methods; Software; High-Throughput Nucleotide Sequencing; Workflow
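
The instance-selection idea described in the abstract above can be illustrated with a minimal sketch. It assumes per-run metrics (peak memory and elapsed time) have already been collected for the same workflow on several instance types, and it picks the cheapest run that fits in memory with some headroom. This is not the CWL-metrics interface or its output format: the class names, the instance names, the prices, and the measurements below are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class InstanceType:
    """A cloud instance type (hypothetical catalog entry, made-up price)."""
    name: str
    vcpus: int
    memory_gib: float
    usd_per_hour: float


@dataclass(frozen=True)
class RunMetrics:
    """Metrics observed for one workflow run on one instance type."""
    instance: str
    peak_memory_gib: float
    elapsed_hours: float


def run_cost(run: RunMetrics, catalog: Dict[str, InstanceType]) -> float:
    """Cost of a run: hourly price of its instance times elapsed hours."""
    return catalog[run.instance].usd_per_hour * run.elapsed_hours


def cheapest_feasible(
    runs: List[RunMetrics],
    catalog: Dict[str, InstanceType],
    headroom: float = 1.2,
) -> Optional[RunMetrics]:
    """Return the lowest-cost run whose peak memory, scaled by a safety
    headroom factor, fits in the instance's memory; None if no run fits."""
    feasible = [
        r for r in runs
        if r.peak_memory_gib * headroom <= catalog[r.instance].memory_gib
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda r: run_cost(r, catalog))


# Hypothetical catalog and measurements (illustrative values only).
CATALOG = {
    "small": InstanceType("small", vcpus=2, memory_gib=4.0, usd_per_hour=0.05),
    "medium": InstanceType("medium", vcpus=4, memory_gib=16.0, usd_per_hour=0.20),
    "large": InstanceType("large", vcpus=16, memory_gib=64.0, usd_per_hour=0.80),
}
RUNS = [
    RunMetrics("small", peak_memory_gib=3.8, elapsed_hours=2.0),   # too tight on memory
    RunMetrics("medium", peak_memory_gib=3.9, elapsed_hours=1.0),
    RunMetrics("large", peak_memory_gib=4.0, elapsed_hours=0.5),
]

best = cheapest_feasible(RUNS, CATALOG)
if best is not None:
    print(best.instance, round(run_cost(best, CATALOG), 2))
```

The headroom factor is a design choice of this sketch, not something stated in the abstract; it simply excludes instance types where the observed peak memory sits too close to the limit.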
