Search | VHL Regional Portal

Introduction of the Korea BioData Station (K-BDS) for sharing biological data.

Lee, Byungwook; Hwang, Seungwoo; Kim, Pan-Gyu; Ko, Gunwhan; Jang, Kiwon; Kim, Sangok; Kim, Jong-Hwan; Jeon, Jongbum; Kim, Hyerin; Jung, Jaeeun; Yoon, Byoung-Ha; Byeon, Iksu; Jang, Insu; Song, Wangho; Choi, Jinhyuk; Kim, Seon-Young.

Genomics Inform ; 21(1): e12, 2023 Mar.

Article in English | MEDLINE | ID: mdl-37037470

ABSTRACT

A wave of new technologies has created opportunities for the cost-effective generation of high-throughput profiles of biological systems, foreshadowing a "data-driven science" era. The large variety of data available from biological research is also a rich resource that can be used for innovative endeavors. However, we are facing considerable challenges in big data deposition, integration, and translation due to the complexity of biological data and its production at unprecedented exponential rates. To address these problems, in 2020, the Korean government officially announced a national strategy to collect and manage the biological data produced through national R&D fund allocations and provide the collected data to researchers. To this end, the Korea Bioinformation Center (KOBIC) developed a new biological data repository, the Korea BioData Station (K-BDS), for sharing data from individual researchers and research programs to create a data-driven biological study environment. The K-BDS is dedicated to providing free open access to a suite of featured data resources in support of worldwide activities in both academia and industry.

Prometheus, an omics portal for interkingdom comparative genomic analyses.

Ko, Gunhwan; Jang, Insu; Koo, Namjin; Park, Seong-Jin; Oh, Sang-Ho; Kim, Min-Seo; Choi, Jin-Hyuk; Kim, Hyeongmin; Sim, Young Mi; Byeon, Iksu; Kim, Pan-Gyu; Kim, Kye Young; Yoon, Jong-Cheol; Mun, Kyung-Lok; Lee, Banghyuk; Han, Gukhee; Kim, Yong-Min.

PLoS One ; 15(10): e0240191, 2020.

Article in English | MEDLINE | ID: mdl-33112870

ABSTRACT

Functional analyses of genes are crucial for unveiling biological responses, genetic engineering, and developing new medicines. However, functional analyses have largely been restricted to model organisms, representing a major hurdle for functional studies and industrial applications. To resolve this, comparative genome analyses can be used to provide clues to gene functions as well as their evolutionary history. To this end, we present Prometheus, a web-based omics portal that contains more than 17,215 sequences from prokaryotic and eukaryotic genomes. This portal supports interkingdom comparative analyses via a domain architecture-based gene identification system and Gene Search, and users can easily and rapidly identify single or entire gene sets in specific pathways. Bioinformatics tools for further analyses are provided in Prometheus or through Bio-Express, a cloud-based bioinformatics analysis platform. Prometheus is a new paradigm for comparative analyses of large amounts of genomic information.

Subjects

Genomics/methods , Software , Animals , Archaea/genetics , Bacteria/genetics , Fungi/genetics , Humans , Metabolomics/methods , Plants/genetics , Sequence Alignment/methods

Bioinformatics services for analyzing massive genomic datasets.

Ko, Gunhwan; Kim, Pan-Gyu; Cho, Youngbum; Jeong, Seongmun; Kim, Jae-Yoon; Kim, Kyoung Hyoun; Lee, Ho-Yeon; Han, Jiyeon; Yu, Namhee; Ham, Seokjin; Jang, Insoon; Kang, Byunghee; Shin, Sunguk; Kim, Lian; Lee, Seung-Won; Nam, Dougu; Kim, Jihyun F; Kim, Namshin; Kim, Seon-Young; Lee, Sanghyuk; Roh, Tae-Young; Lee, Byungwook.

Genomics Inform ; 18(1): e8, 2020 Mar.

Article in English | MEDLINE | ID: mdl-32224841

ABSTRACT

The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

Closha: bioinformatics workflow system for the analysis of massive sequencing data.

Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook.

BMC Bioinformatics ; 19(Suppl 1): 43, 2018 02 19.

Article in English | MEDLINE | ID: mdl-29504905

ABSTRACT

BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis.

RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import Galaxy pipelines into Closha. Closha is a hybrid system that lets users combine traditional analysis programs and MapReduce-based big data analysis programs in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time.

CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha gives genomic scientists a user-friendly interface for deriving accurate results from NGS platform data. The Closha cloud server is freely available at http://closha.kobic.re.kr/.

Subjects

High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Cloud Computing , Genomics/methods , Workflow

Genome analysis of Hibiscus syriacus provides insights of polyploidization and indeterminate flowering in woody plants.

Kim, Yong-Min; Kim, Seungill; Koo, Namjin; Shin, Ah-Young; Yeom, Seon-In; Seo, Eunyoung; Park, Seong-Jin; Kang, Won-Hee; Kim, Myung-Shin; Park, Jieun; Jang, Insu; Kim, Pan-Gyu; Byeon, Iksu; Kim, Min-Seo; Choi, JinHyuk; Ko, Gunhwan; Hwang, JiHye; Yang, Tae-Jin; Choi, Sang-Bong; Lee, Je Min; Lim, Ki-Byung; Lee, Jungho; Choi, Ik-Young; Park, Beom-Seok; Kwon, Suk-Yoon; Choi, Doil; Kim, Ryan W.

DNA Res ; 24(1): 71-80, 2017 Feb 01.

Article in English | MEDLINE | ID: mdl-28011721

ABSTRACT

Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome comprising a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Predicted gene modeling detected 87,603 genes, mostly supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species, including H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships based on divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication (WGD) were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicate that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus and may affect its unique floral morphology.

Subjects

Flowers/growth & development , Genome, Plant , Hibiscus/genetics , Polyploidy , DNA-Binding Proteins/genetics , Hibiscus/physiology , Multigene Family , RNA-Binding Proteins/genetics , Transcriptome

A scaffold analysis tool using mate-pair information in genome sequencing.

Kim, Pan-Gyu; Cho, Hwan-Gue; Park, Kiejung.

J Biomed Biotechnol ; 2008: 675741, 2008.

Article in English | MEDLINE | ID: mdl-18414585

ABSTRACT

We have developed a Windows-based program, ConPath, as a scaffold analyzer. ConPath constructs scaffolds by ordering and orienting separate sequence contigs by exploiting the mate-pair information between contig-pairs. Our algorithm builds directed graphs from link information and traverses them to find the longest acyclic graphs. Using end read pairs of fixed-sized mate-pair libraries, ConPath determines relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports wrong assembly information by validating orientations and gap sizes. We have utilized ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath. ConPath also provides convenient features and viewers for investigating each contig in detail; these include a contig viewer, a scaffold viewer, an edge information list, a mate-pair list, and the printing of complex scaffold structures.

Subjects

Algorithms , Contig Mapping/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Base Pair Mismatch , Base Sequence , Molecular Sequence Data
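
The graph step described in the ConPath abstract (build a directed graph from mate-pair links, then traverse it to find the longest acyclic ordering of contigs) can be illustrated with a minimal Python sketch. This is not ConPath's implementation: the function names `build_link_graph` and `longest_path`, the `min_support` threshold, and the input format (one ordered contig pair per mate pair) are assumptions made for illustration, and contig orientation and gap-size estimation are left out.

```python
from collections import defaultdict


def build_link_graph(mate_pairs, min_support=2):
    """Keep a directed edge between two contigs only if enough mate pairs link them.

    mate_pairs: iterable of (left_contig, right_contig) tuples (hypothetical format).
    min_support: assumed threshold; not a value taken from the ConPath paper.
    """
    support = defaultdict(int)
    for left_contig, right_contig in mate_pairs:
        if left_contig != right_contig:  # pairs inside one contig carry no link
            support[(left_contig, right_contig)] += 1

    graph = defaultdict(list)
    for (src, dst), count in support.items():
        if count >= min_support:
            graph[src].append(dst)
    return dict(graph)


def longest_path(graph):
    """Return the longest path (by number of contigs) in an acyclic link graph.

    Raises ValueError if the links contain a cycle, since this sketch
    assumes the input graph is acyclic.
    """
    nodes = set(graph) | {dst for dsts in graph.values() for dst in dsts}
    best = {}         # contig -> longest path starting at that contig
    visiting = set()  # contigs on the current depth-first search stack

    def visit(node):
        if node in best:
            return best[node]
        if node in visiting:
            raise ValueError(f"cycle detected at contig {node!r}")
        visiting.add(node)
        tail = []
        for nxt in sorted(graph.get(node, [])):
            candidate = visit(nxt)
            if len(candidate) > len(tail):
                tail = candidate
        visiting.discard(node)
        best[node] = [node] + tail
        return best[node]

    paths = [visit(node) for node in sorted(nodes)]
    return max(paths, key=len) if paths else []


if __name__ == "__main__":
    pairs = [
        ("c1", "c2"), ("c1", "c2"),
        ("c2", "c3"), ("c2", "c3"),
        ("c3", "c4"), ("c3", "c4"),
        ("c1", "c4"),  # single unsupported link, dropped by min_support
    ]
    print(longest_path(build_link_graph(pairs)))
```

Filtering on link support before the traversal is one simple way to keep stray mate pairs from creating spurious edges; a real scaffolder would also need to resolve cycles rather than reject them.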
