Search | BVS IEC

Closha: bioinformatics workflow system for the analysis of massive sequencing data.

Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook.

BMC Bioinformatics ; 19(Suppl 1): 43, 2018 Feb 19.

Article in English | MEDLINE | ID: mdl-29504905

ABSTRACT

BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in biomedical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. Integrating data and analytic resources into workflow systems addresses this problem by simplifying the task of data analysis.

RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows that make optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag-and-drop functionality and to modify the parameters of pipeline tools. Users can also import Galaxy pipelines into Closha. Closha is a hybrid system that lets users combine traditional analysis tools and MapReduce-based big data analysis programs in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transfer large volumes of data quickly. KoDS transfers files at up to 10 times the speed of standard FTP and HTTP. Closha runs on hardware with 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run concurrently.

CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface that helps genomic scientists derive accurate results from NGS platform data.

The Closha cloud server is freely available for use from http://closha.kobic.re.kr/.

Subjects

High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Cloud Computing , Genomics/methods , Workflow

KoNA: Korean Nucleotide Archive as A New Data Repository for Nucleotide Sequence Data.

Ko, Gunhwan; Lee, Jae Ho; Sim, Young Mi; Song, Wangho; Yoon, Byung-Ha; Byeon, Iksu; Lee, Bang Hyuck; Kim, Sang-Ok; Choi, Jinhyuk; Jang, Insoo; Kim, Hyerin; Yang, Jin Ok; Jang, Kiwon; Kim, Sora; Kim, Jong-Hwan; Jeon, Jongbum; Jung, Jaeeun; Hwang, Seungwoo; Park, Ji-Hwan; Kim, Pan-Gyu; Kim, Seon-Young; Lee, Byungwook.

Genomics Proteomics Bioinformatics ; 22(1), 2024 May 09.

Article in English | MEDLINE | ID: mdl-38862433

ABSTRACT

During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet need for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. KoNA is available at https://www.kobic.re.kr/kona/.

Subjects

Nucleic Acid Databases , Republic of Korea , Humans , High-Throughput Nucleotide Sequencing/methods

Introduction of the Korea BioData Station (K-BDS) for sharing biological data.

Lee, Byungwook; Hwang, Seungwoo; Kim, Pan-Gyu; Ko, Gunwhan; Jang, Kiwon; Kim, Sangok; Kim, Jong-Hwan; Jeon, Jongbum; Kim, Hyerin; Jung, Jaeeun; Yoon, Byoung-Ha; Byeon, Iksu; Jang, Insu; Song, Wangho; Choi, Jinhyuk; Kim, Seon-Young.

Genomics Inform ; 21(1): e12, 2023 Mar.

Article in English | MEDLINE | ID: mdl-37037470

ABSTRACT

A wave of new technologies has created opportunities for the cost-effective generation of high-throughput profiles of biological systems, foreshadowing a "data-driven science" era. The large variety of data available from biological research is also a rich resource that can be used for innovative endeavors. However, we are facing considerable challenges in big data deposition, integration, and translation due to the complexity of biological data and its production at unprecedented exponential rates. To address these problems, in 2020, the Korean government officially announced a national strategy to collect and manage the biological data produced through national R&D fund allocations and provide the collected data to researchers. To this end, the Korea Bioinformation Center (KOBIC) developed a new biological data repository, the Korea BioData Station (K-BDS), for sharing data from individual researchers and research programs to create a data-driven biological study environment. The K-BDS is dedicated to providing free open access to a suite of featured data resources in support of worldwide activities in both academia and industry.
