Search | BVS Regional Portal

GAAP: A Genome Assembly + Annotation Pipeline.

Kong, Jinhwa; Huh, Sun; Won, Jung-Im; Yoon, Jeehee; Kim, Baeksop; Kim, Kiyong.

Biomed Res Int ; 2019: 4767354, 2019.

Article in English | MEDLINE | ID: mdl-31346518

ABSTRACT

Genomic analysis begins with de novo assembly of short-read fragments to reconstruct full-length base sequences without using a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools for whole-genome analysis have been developed and published, enabling even individual researchers in small laboratories to perform whole-genome analyses of the organisms they study. However, these tools are generally complex and use diverse algorithms, parameter-setting methods, and input formats; thus, it remains difficult for individual researchers to select, use, and combine them to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method.

Subjects

Nucleic Acid Databases, Helminth Genome, Molecular Sequence Annotation, Toxocara canis/genetics, Whole Genome Sequencing, Animals, Genomics

Shape-based retrieval of CNV regions in read coverage data.

Hong, Sangkyun; Yoon, Jeehee; Hong, Dongwan; Lee, Unjoo; Kim, Baeksop; Park, Sanghyun.

Int J Data Min Bioinform ; 9(3): 254-76, 2014.

Article in English | MEDLINE | ID: mdl-25163168

ABSTRACT

This study proposes a novel copy number variation (CNV) detection method, CNV_shape, based on variations in the shape of read coverage data obtained from millions of short reads aligned to a reference sequence. The method applies two transforms, the mean shift transform and the mean slope transform, to extract the shape of a CNV more precisely from real human data, which are vulnerable to experimental and biological noise. The mean shift transform produces a preliminary estimate of the CNVs by statistically evaluating moving averages of the read coverage data. The mean slope transform extracts candidate CNVs by filtering out non-stationary sub-regions from each of the primary CNVs pre-estimated by the mean shift procedure. Each candidate CNV is then merged with its neighbours depending on a merging score and is finally identified as a putative CNV. The merging score is the ratio of the number of positions with non-zero mean shift transform values to the total length of the region spanning two neighbouring candidate CNVs and the interval between them. The method was validated experimentally with simulated data and real human data. Simulated data with coverage from 1x to 10x were generated for various sampling sizes and p-values, and five individual human genomes were used as real human data. The results show that relatively small CNVs (> 1 kbp) can be detected from low-coverage (> 1.7x) data. The results also show that CNV_shape improves performance over conventional methods by 8.18% to 87.90%. These outcomes suggest that the proposed method is very effective both in reducing the noise inherent in real data and in detecting CNVs of various sizes and types.
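The merging step can be sketched in Python as follows. This is a minimal illustration, not the authors' CNV_shape code. It assumes the mean shift transform has already been computed as a list `mst` with one value per position, zero where no shift was detected. Candidates are represented as half-open index intervals. The greedy left-to-right merge order and the default threshold of 0.5 are hypothetical, because the abstract specifies neither.

```python
from typing import List, Tuple

# Half-open interval [start, end) of positions on the coverage array.
Interval = Tuple[int, int]


def merging_score(mst: List[float], left: Interval, right: Interval) -> float:
    """Ratio of positions with a non-zero mean shift value to the length of the
    region spanning both candidates and the gap between them."""
    start, end = left[0], right[1]
    nonzero = sum(1 for value in mst[start:end] if value != 0)
    return nonzero / (end - start)


def merge_candidates(mst: List[float], candidates: List[Interval],
                     threshold: float = 0.5) -> List[Interval]:
    """Greedily merge each candidate into its left neighbour when the merging
    score reaches the threshold (merge order and threshold are assumptions)."""
    merged: List[Interval] = []
    for cand in sorted(candidates):
        if merged and merging_score(mst, merged[-1], cand) >= threshold:
            merged[-1] = (merged[-1][0], cand[1])
        else:
            merged.append(cand)
    return merged


if __name__ == "__main__":
    # Toy mean shift transform: non-zero where a coverage shift was flagged.
    mst = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
    print(merge_candidates(mst, [(2, 5), (6, 8), (16, 18)]))  # prints [(2, 8), (16, 18)]
```

In the toy run, the first two candidates are separated by a single zero, so their combined region scores 5/6 and they merge. The third candidate lies beyond a long run of zeros, so the region spanning it scores 7/16 and it stays separate.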

Subjects

DNA Copy Number Variations, High-Throughput Nucleotide Sequencing/methods, Algorithms, Computational Biology/methods, Computer Simulation, Electronic Data Processing, Genetic Variation, Human Genome, Humans, Statistical Models, Reproducibility of Results, Computer-Assisted Signal Processing

A computational method for detecting copy number variations using scale-space filtering.

Lee, Jongkeun; Lee, Unjoo; Kim, Baeksop; Yoon, Jeehee.

BMC Bioinformatics ; 14: 57, 2013 Feb 18.

Article in English | MEDLINE | ID: mdl-23418726

ABSTRACT

BACKGROUND: Next-generation sequencing technology has made sequencing rapid and cost-effective, and as a result, computational approaches for finding and analyzing copy number variations (CNVs) have become increasingly important. Furthermore, most genome projects need to analyze sequences accurately from fairly low-coverage read data. A method that detects the exact types and locations of CNVs from low-coverage read data is therefore urgently needed. RESULTS: Here, we propose a new CNV detection method, CNV_SS, which uses scale-space filtering. Scale-space filtering is performed by convolving the read coverage data with a Gaussian at various scales set according to a given scaling parameter. Next, the inflection points of the scale-space filtered read coverage data are calculated at each scale by differentiating twice and finding the zero-crossing points. Then, the types and exact locations of CNVs are obtained by analyzing the fingerprint map, that is, the contours of zero-crossing points across scales. CONCLUSIONS: The false-negative rate (FNR) and false-positive rate (FPR) of CNV_SS stayed in the ranges of 1.27% to 2.43% and 1.14% to 2.44%, respectively, even at relatively low coverage (0.5x ≤ C ≤ 2x). CNV_SS also achieved a much better FNR than conventional methods, by at least 3.82% and at most 76.97%, even when the coverage of the read data was low. CNV_SS source code is freely available from http://dblab.hallym.ac.kr/CNV SS/.
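The filtering and inflection-point steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the CNV_SS source. Several details are assumptions introduced here and are not taken from the paper: truncating the kernel at 3σ, clamping indices at the array edges, using a discrete second difference in place of the analytic second derivative, and the helper names `smooth`, `second_difference`, `zero_crossings`, and `fingerprint`.

```python
import math
from typing import Dict, List


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized Gaussian weights, truncated at about 3 sigma (an assumption)."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: List[float], sigma: float) -> List[float]:
    """Convolve the coverage with a Gaussian; out-of-range indices are clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def second_difference(values: List[float]) -> List[float]:
    """Discrete second derivative; the two endpoints are set to 0."""
    d2 = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d2[i] = values[i - 1] - 2 * values[i] + values[i + 1]
    return d2


def zero_crossings(d2: List[float], eps: float = 1e-9) -> List[int]:
    """Indices i where the second difference changes sign between i and i + 1,
    i.e. an inflection point lies between positions i and i + 1."""
    return [i for i in range(len(d2) - 1)
            if (d2[i] > eps and d2[i + 1] < -eps) or (d2[i] < -eps and d2[i + 1] > eps)]


def fingerprint(coverage: List[float], sigmas: List[float]) -> Dict[float, List[int]]:
    """Zero-crossing positions of the smoothed coverage at each scale."""
    return {s: zero_crossings(second_difference(smooth(coverage, s))) for s in sigmas}


if __name__ == "__main__":
    coverage = [10.0] * 20 + [20.0] * 20 + [10.0] * 20  # toy duplication at [20, 40)
    for sigma, crossings in fingerprint(coverage, [2.0, 4.0]).items():
        print(sigma, crossings)
```

For this noise-free toy profile, both scales report crossings at indices 19 and 39, which bracket the duplicated segment. The direction of each sign change hints at the boundary type. A change from positive to negative marks a rising edge, such as the start of a gain. A change from negative to positive marks a falling edge.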

Subjects

DNA Copy Number Variations, DNA Sequence Analysis/methods, Computational Biology/methods, Genome, HapMap Project, Humans

Detection of copy number variation using scale space filtering.

Lee, Jongkeun; Kim, Baeksop; Yoon, Jeehee; Lee, Unjoo.

Annu Int Conf IEEE Eng Med Biol Soc ; 2011: 5555-8, 2011.

Article in English | MEDLINE | ID: mdl-22255597

ABSTRACT

This study proposes a novel CNV detection algorithm based on scale-space filtering. It uses a Gaussian filter for the convolution, with a scale parameter whose range is adjusted according to the coverage level of the read data. The position of a CNV region is determined through a coarse search and a fine search over the scales. The results showed that the performance of the proposed method depends less on coverage level than that of conventional methods. The results also showed that the proposed method outperforms conventional methods by 63.29% to 73.57%.
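A coarse-to-fine search over scales can be sketched as tracking zero crossings from the largest scale down to the smallest. This is a hypothetical illustration, not the authors' algorithm, because the abstract does not say how the coarse and fine searches are carried out. The following are all assumptions: the nearest-neighbour tracking rule, the `window` parameter, and the input format, which is a dict from each scale to its sorted crossing positions. The coverage-dependent choice of the scale range is not modelled.

```python
from typing import Dict, List


def coarse_to_fine(fingerprint: Dict[float, List[int]], window: int = 3) -> List[int]:
    """Track each zero crossing found at the coarsest scale down to the finest.

    Crossings are seeded at the largest sigma, where noise has been smoothed
    away, and refined at each smaller sigma by moving to the nearest crossing
    within +/- window positions. A track with no nearby crossing at some scale
    keeps its previous position.
    """
    scales = sorted(fingerprint, reverse=True)  # coarse -> fine
    positions = list(fingerprint[scales[0]])
    for sigma in scales[1:]:
        candidates = fingerprint[sigma]
        refined = []
        for pos in positions:
            nearby = [c for c in candidates if abs(c - pos) <= window]
            refined.append(min(nearby, key=lambda c: abs(c - pos)) if nearby else pos)
        positions = refined
    return positions


if __name__ == "__main__":
    # Toy fingerprint map: fine scales contain extra, noise-driven crossings.
    fp = {4.0: [18, 41], 2.0: [19, 30, 39], 1.0: [5, 20, 38, 50]}
    print(coarse_to_fine(fp))  # prints [20, 38]
```

Seeding at the coarse scale ignores the spurious fine-scale crossings at 5, 30, and 50. The fine scales are used only to sharpen the boundary positions.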

Subjects

Algorithms, DNA Copy Number Variations/genetics, DNA Mutational Analysis/methods, Gene Dosage/genetics, DNA Sequence Analysis/methods, Base Sequence, Molecular Sequence Data