Search | BVS Regional Portal

Results 1 - 10 of 10

Finding Highly Similar Regions of Genomic Sequences Through Homomorphic Encryption.

Bataa, Magsarjav; Song, Siwoo; Park, Kunsoo; Kim, Miran; Cheon, Jung Hee; Kim, Sun.

J Comput Biol ; 31(3): 197-212, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38531050

ABSTRACT

Finding highly similar regions of genomic sequences is a basic computation in genomic analysis. Genomic analyses on large amounts of data are efficiently processed in cloud environments, but outsourcing them to a cloud raises privacy and security concerns. Homomorphic encryption (HE) is a powerful cryptographic primitive that preserves the privacy of genomic data in various analyses processed in an untrusted cloud environment. We introduce an efficient algorithm for finding highly similar regions of two homomorphically encrypted sequences, and describe how to implement it using bit-wise and word-wise HE schemes. In our experiments, our algorithm outperforms an existing algorithm by up to two orders of magnitude in elapsed time. Overall, it finds highly similar regions of the sequences in real data sets in feasible time.

Subjects

Computer Security, Genomics, Algorithms

RDscan: A New Method for Improving Germline and Somatic Variant Calling Based on Read Depth Distribution.

Lee, Sunho; Hong, Seokchol; Woo, Jonathan; Lee, Jae-Hak; Kim, Kyunghee; Kim, Lucia; Park, Kunsoo; Jung, Jongsun.

J Comput Biol ; 29(9): 987-1000, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35749140

ABSTRACT

Several tools have been developed for calling variants from next-generation sequencing (NGS) data. Although they are generally accurate and reliable, most of them have room for improvement, especially regarding calling variants in datasets with low read depth. In addition, the somatic variants predicted by several somatic variant callers tend to have very low concordance rates. In this study, we developed a new method (RDscan) for improving germline and somatic variant calling in NGS data. RDscan removes misaligned reads, repositions reads, and calculates RDscore based on the read depth distribution. With RDscore, RDscan improves the precision of variant callers by removing false-positive variant calls. When we tested our new tool using the latest variant calling algorithms and data from the 1000 Genomes Project and Illumina's public datasets, accuracy was improved for most of the algorithms. After screening variants with RDscan, calling accuracies increased for germline variants in 11 of 12 cases and for somatic variants in 21 of 24 cases. RDscan is simple to use and can effectively remove false-positive variants while maintaining a low computation load. Therefore, RDscan, along with existing variant callers, should contribute to improvements in genome analysis.

Subjects

Algorithms, High-Throughput Nucleotide Sequencing, Germ Cells, High-Throughput Nucleotide Sequencing/methods, Polymorphism, Single Nucleotide, Software

A new graph model and algorithms for consistent superstring problems.

Na, Joong Chae; Cho, Sukhyeun; Choi, Siwon; Kim, Jin Wook; Park, Kunsoo; Sim, Jeong Seop.

Philos Trans A Math Phys Eng Sci ; 372(2016): 20130134, 2014 May 28.

Article in English | MEDLINE | ID: mdl-24751868

ABSTRACT

Problems related to string inclusion and non-inclusion have been vigorously studied in diverse fields such as data compression, molecular biology and computer security. Given a finite set of positive strings P and a finite set of negative strings N, a string α is a consistent superstring if every positive string is a substring of α and no negative string is a substring of α. The shortest (resp. longest) consistent superstring problem is to find a string α that is the shortest (resp. longest) among all the consistent superstrings for the given sets of strings. In this paper, we first propose a new graph model for consistent superstrings for given P and N. In our graph model, the set of strings represented by paths satisfying some conditions is the same as the set of consistent superstrings for P and N. We also present algorithms for the shortest and the longest consistent superstring problems. Our algorithms solve the consistent superstring problems for all cases, including cases that are not considered in previous work. Moreover, our algorithms solve in polynomial time the consistent superstring problems for more cases than the previous algorithms. For the polynomially solvable cases, our algorithms are more efficient than the previous ones.

Subjects

Algorithms, Computer Graphics, Models, Theoretical
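
The definition of a consistent superstring given in the abstract above can be checked directly. This brute-force sketch tests whether a string α is a consistent superstring for P and N and searches for the shortest one over a small alphabet. It is exponential in the string length and is not the graph-based algorithm of the paper; the function names are hypothetical.

```python
from itertools import product


def is_consistent_superstring(alpha, positives, negatives):
    """Every positive string occurs in alpha and no negative string does."""
    return (all(p in alpha for p in positives)
            and not any(n in alpha for n in negatives))


def shortest_consistent_superstring(positives, negatives, alphabet, max_len):
    """Brute force: try all strings in order of increasing length.

    Exponential in max_len; only meant to illustrate the definition.
    """
    for length in range(max_len + 1):
        for chars in product(sorted(alphabet), repeat=length):
            alpha = "".join(chars)
            if is_consistent_superstring(alpha, positives, negatives):
                return alpha
    return None  # no consistent superstring of length <= max_len


# "abc" is the only length-3 string containing both "ab" and "bc",
# and it is forbidden, so the answer has length 4.
print(shortest_consistent_superstring({"ab", "bc"}, {"abc"}, "abc", 6))  # abbc
```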

GapMis: a tool for pairwise sequence alignment with a single gap.

Flouri, Tomás; Frousios, Kimon; Iliopoulos, Costas S; Park, Kunsoo; Pissis, Solon P; Tischler, German.

Recent Pat DNA Gene Seq ; 7(2): 84-95, 2013 Aug.

Article in English | MEDLINE | ID: mdl-22974258

ABSTRACT

MOTIVATION: Pairwise sequence alignment has gained new motivation from recent patents in next-generation sequencing technologies, particularly for the application of re-sequencing, that is, the assembly of a genome directed by a reference sequence. After a short-read alignment programme quickly aligns a factor of the reference sequence with a high-quality fragment of a short read, an important problem is to align a relatively short succeeding factor of the reference sequence with the remaining low-quality part of the read, allowing a number of mismatches and the insertion of a single gap in the alignment. RESULTS: We present GapMis, a tool for pairwise sequence alignment with a single gap. It is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix. The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task.

Subjects

Algorithms, Software, High-Throughput Nucleotide Sequencing, Patents as Topic, Sequence Alignment
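
A brute-force sketch of global alignment with at most one gap, minimizing the number of mismatches. It assumes the gap is placed in the shorter sequence with length equal to the length difference and carries no gap penalty; these are simplifying assumptions, the function name is hypothetical, and this is not the dynamic-programming algorithm used by GapMis.

```python
def single_gap_alignment(a: str, b: str):
    """Global alignment of a and b allowing at most one gap, minimizing mismatches.

    The gap (length |len(a) - len(b)|) is inserted into the shorter sequence;
    every gap position is tried, so this takes O(len(a) * len(b)) time.
    Returns (mismatches, aligned_a, aligned_b).
    """
    a_is_long = len(a) >= len(b)
    long_s, short_s = (a, b) if a_is_long else (b, a)
    gap = len(long_s) - len(short_s)
    best = None
    for pos in range(len(short_s) + 1):
        gapped = short_s[:pos] + "-" * gap + short_s[pos:]
        # Gap columns are not counted as mismatches.
        mismatches = sum(1 for x, y in zip(long_s, gapped) if y != "-" and x != y)
        if best is None or mismatches < best[0]:
            best = (mismatches, gapped)
    mismatches, gapped = best
    # Return the aligned strings in the caller's original order.
    if a_is_long:
        return mismatches, long_s, gapped
    return mismatches, gapped, long_s


print(single_gap_alignment("ACGTACGT", "ACGACGT"))  # (0, 'ACGTACGT', 'ACG-ACGT')
```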

Monoisotopic mass determination algorithm for selenocysteine-containing polypeptides from mass spectrometric data based on theoretical modeling of isotopic peak intensity ratios.

Kim, Jin Wook; Lee, Sunho; Park, Kunsoo; Na, Seungjin; Paek, Eunok; Park, Hyung Seo; Park, Heejin; Lee, Kong-Joo; Jeong, Jaeho; Kim, Hwa-Young.

J Proteome Res ; 11(9): 4488-98, 2012 Sep 07.

Article in English | MEDLINE | ID: mdl-22779694

ABSTRACT

Selenoproteins, which contain selenocysteine (Sec, U) as the 21st amino acid in the genetic code, are well conserved from bacteria to humans, except in yeast and higher plants, which lack the Sec insertion machinery. Determining Sec association is important for finding substrates and understanding the redox action of selenoproteins. While mass spectrometry (MS) has become a common and powerful tool for determining the amino acid sequence of a protein, identifying a protein sequence containing Sec has not been easy with MS because of the limited stability of Sec in selenoproteins. Se has six naturally occurring isotopes, 74Se, 76Se, 77Se, 78Se, 80Se, and 82Se, of which 80Se is the most abundant. These characteristics provide a good indicator for selenopeptides but make it difficult to detect selenopeptides with software analysis tools developed for common peptides. Thus, previous reports verified MS scans of selenopeptides by manual inspection. None of the fully automated algorithms have taken the isotopes of Se into account, leading to wrong interpretations of selenopeptides. In this paper, we present an algorithm to determine monoisotopic masses of selenocysteine-containing polypeptides. Our algorithm is based on a theoretical model for the isotopic distribution of a selenopeptide, which derives the peak intensities in an isotopic distribution from the natural abundances of C, H, N, O, S, and Se. Our algorithm uses two kinds of isotopic peak intensity ratios: one for two adjacent peaks and another for two distant peaks. Our algorithm for selenopeptides was shown to perform accurately on two LC-MS/MS data sets. Using this algorithm, we successfully identified the Sec-Cys and Sec-Sec cross-linking of glutaredoxin 1 (GRX1) from mass spectra obtained with a UPLC-ESI-q-TOF instrument.

Subjects

Algorithms, Mass Spectrometry/methods, Models, Chemical, Peptides/chemistry, Selenocysteine/chemistry, Selenoproteins/chemistry, Amino Acid Sequence, Isotopes/chemistry, Molecular Sequence Data
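
The Se isotope pattern is what makes selenopeptides stand out. This sketch convolves approximate natural Se abundances (rounded values) with a carbon-only binomial 13C envelope to show that the most intense peak of a Se-containing molecule sits 6 nominal mass units above the all-light isotopologue rather than at it. Hydrogen, nitrogen, oxygen, and sulfur isotopes are ignored, the function names are hypothetical, and this is not the authors' model.

```python
from math import comb

# Approximate natural abundances of the stable Se isotopes, keyed by nominal
# mass offset from 74Se (74Se, 76Se, 77Se, 78Se, 80Se, 82Se). Rounded values.
SE_ABUNDANCE = {0: 0.0089, 2: 0.0937, 3: 0.0763, 4: 0.2377, 6: 0.4961, 8: 0.0873}
C13 = 0.0107  # approximate natural abundance of 13C


def carbon_envelope(n_carbons: int, max_k: int = 6) -> list[float]:
    """Binomial probability of k 13C atoms among n carbons, for k = 0..max_k."""
    return [comb(n_carbons, k) * C13**k * (1 - C13) ** (n_carbons - k)
            for k in range(max_k + 1)]


def convolve(p: dict[int, float], q: list[float]) -> dict[int, float]:
    """Distribution of the summed mass offset of two independent components."""
    out: dict[int, float] = {}
    for i, pi in p.items():
        for j, qj in enumerate(q):
            out[i + j] = out.get(i + j, 0.0) + pi * qj
    return out


def selenopeptide_envelope(n_carbons: int) -> dict[int, float]:
    """Nominal isotopic envelope from carbon and a single Se atom only."""
    return convolve(SE_ABUNDANCE, carbon_envelope(n_carbons))


env = selenopeptide_envelope(50)
# Offset of the most intense peak from the all-light isotopologue.
print(max(env, key=env.get))  # 6
```

A peak-picking tool that assumes the lightest peak of a cluster is also among its strongest, as holds for ordinary peptides of moderate mass, will misplace the monoisotopic peak of such a cluster.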

High-throughput peptide quantification using mTRAQ reagent triplex.

Yoon, Joo Young; Yeom, Jeonghun; Lee, Heebum; Kim, Kyutae; Na, Seungjin; Park, Kunsoo; Paek, Eunok; Lee, Cheolju.

BMC Bioinformatics ; 12 Suppl 1: S46, 2011 Feb 15.

Article in English | MEDLINE | ID: mdl-21342578

ABSTRACT

BACKGROUND: Protein quantification is an essential step in many proteomics experiments. A number of labeling approaches have been proposed and adopted in mass spectrometry (MS)-based relative quantification. mTRAQ, one of the stable isotope labeling methods, is amine-specific and available in triplex format, so sample throughput can be doubled compared with duplex reagents. METHODS AND RESULTS: Here we propose a novel data analysis algorithm for peptide quantification in triplex mTRAQ experiments. It improved the accuracy of quantification in two ways. First, it identified and separated the triplex isotopic clusters of a peptide in each full MS scan. We designed a schematic model of overlapping triplex isotopic clusters and separated the clusters by solving cubic equations deduced from that model. Second, it automatically determined the elution areas of peptides. Some peptides have similar atomic masses and elution times, so their elution areas can overlap. Our algorithm successfully identified these overlaps and found accurate elution areas. We validated our algorithm using standard protein mixture experiments. CONCLUSIONS: We showed that our algorithm was able to accurately quantify peptides in triplex mTRAQ experiments. Its software implementation is compatible with the Trans-Proteomic Pipeline (TPP) and thus enables high-throughput analysis of proteomics data.

Subjects

Algorithms, Peptides/chemistry, Proteomics/methods, Software, Cluster Analysis, Isotope Labeling, Mass Spectrometry, Models, Statistical, Protein Isoforms/chemistry
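
As an illustration of elution-area determination, here is a sketch that integrates an extracted ion chromatogram around a peak apex with the trapezoidal rule. The boundary rule is an assumption, not the paper's algorithm: extension stops when the signal drops below a hypothetical fraction of the apex (min_fraction) or starts rising again, and the rise is read as the onset of an overlapping peptide. The function name elution_area is also hypothetical.

```python
def elution_area(times, intensities, apex, min_fraction=0.05):
    """Trapezoidal area of one elution peak around index `apex`.

    Boundaries extend outward while the signal keeps decreasing and stays at
    or above min_fraction * apex intensity. A rise (local minimum) is treated
    as the start of an overlapping neighbor peak and stops the extension.
    """
    floor = min_fraction * intensities[apex]
    left = apex
    while left > 0 and floor <= intensities[left - 1] <= intensities[left]:
        left -= 1
    right = apex
    last = len(intensities) - 1
    while right < last and floor <= intensities[right + 1] <= intensities[right]:
        right += 1
    # Trapezoidal rule between the chosen boundaries.
    return sum((times[i + 1] - times[i]) * (intensities[i] + intensities[i + 1]) / 2
               for i in range(left, right))


# Isolated peak: integrated from index 1 to 5.
print(elution_area(list(range(7)), [0, 1, 4, 9, 4, 1, 0], apex=3))  # 18.0
# Overlapping peaks: the valley at index 3 bounds the second peak on the left.
print(elution_area(list(range(8)), [0, 2, 6, 3, 5, 8, 2, 0], apex=5))  # 15.5
```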

Improved quantitative analysis of mass spectrometry using quadratic equations.

Yoon, Joo Young; Lim, Kyung Young; Lee, Sunho; Park, Kunsoo; Paek, Eunok; Kang, Un-Beom; Yeom, Jeonghun; Lee, Cheolju.

J Proteome Res ; 9(5): 2775-85, 2010 May 07.

Article in English | MEDLINE | ID: mdl-20329765

ABSTRACT

Protein quantification is one of the principal computational problems in mass spectrometry (MS)-based proteomics. Robust and trustworthy protein quantification requires accurate peptide quantification first. In recent years, stable isotope labeling has become the most popular method for relative quantification of peptides. However, some stable isotope labeling methods have a critical problem: overlap of isotopic clusters. If the mass difference between the light- and heavy-labeled peptides is very small, the overlap of their isotopic clusters grows as the mass of the original peptide increases. Here we propose a new algorithm for peptide quantification that separates overlapping isotopic clusters using quadratic equations. It can easily be used in the Trans-Proteomic Pipeline (TPP) in place of XPRESS. For mTRAQ-labeled peptides obtained with an Orbitrap mass spectrometer, it gave more accurate ratios and smaller standard deviations than XPRESS. In particular, for peptides that do not contain lysine, the ratio difference between XPRESS and our algorithm grew as peptide mass increased. We expect that this algorithm can also be applied to other labeling methods such as (18)O labeling and acrylamide labeling.

Subjects

Algorithms, Mass Spectrometry/methods, Proteomics/methods, Blood Proteins/analysis, Data Mining/methods, Humans, Isotope Labeling, Linear Models, Peptide Fragments/analysis
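
A sketch of separating two overlapping isotopic clusters. It swaps in a simpler technique, linear least-squares unmixing, in place of the paper's quadratic equations. It assumes both labeled forms share the same normalized theoretical envelope (template), with the heavy form shifted by a whole number of peaks (shift); the function name unmix_pair is hypothetical.

```python
def unmix_pair(observed, template, shift):
    """Least-squares amounts (a, b) of light and heavy forms in the model
    observed[k] ~ a * template[k] + b * template[k - shift].
    """
    def u(k):  # light-labeled envelope at peak index k
        return template[k] if 0 <= k < len(template) else 0.0

    def v(k):  # heavy-labeled envelope, shifted by `shift` peaks
        return u(k - shift)

    ks = range(len(observed))
    # Entries of the 2x2 normal equations.
    uu = sum(u(k) ** 2 for k in ks)
    vv = sum(v(k) ** 2 for k in ks)
    uv = sum(u(k) * v(k) for k in ks)
    uo = sum(u(k) * observed[k] for k in ks)
    vo = sum(v(k) * observed[k] for k in ks)
    det = uu * vv - uv * uv  # nonzero unless the two envelopes are collinear
    return (uo * vv - vo * uv) / det, (vo * uu - uo * uv) / det


# Synthetic overlap: 2 parts light + 1 part heavy, heavy shifted by two peaks.
a, b = unmix_pair([1.0, 0.6, 0.8, 0.4, 0.15, 0.05], [0.5, 0.3, 0.15, 0.05], 2)
print(round(a, 6), round(b, 6))  # 2.0 1.0
```

With exact synthetic data, as here, the two amounts are recovered. With real spectra the fit spreads noise across all peaks in the window, and the template would have to come from a model of the peptide's isotopic distribution.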

Alignment of biological sequences with quality scores.

Na, Joong Chae; Roh, Kangho; Apostolico, Alberto; Park, Kunsoo.

Int J Bioinform Res Appl ; 5(1): 97-113, 2009.

Article in English | MEDLINE | ID: mdl-19136367

ABSTRACT

In this paper we consider the problem of sequence alignment with quality scores. DNA sequences produced by a base-calling program (as part of sequencing) have quality scores that represent the confidence level for individual bases. However, previous sequence alignment algorithms do not consider such quality scores. To solve sequence alignment with quality scores, we first consider a more general problem in which the input consists of weighted sequences, that is, sequences that give, for each position, the probability of each character occurring there. We propose a meaningful measure of an alignment of two weighted sequences and show that an optimal alignment under this measure can be found by dynamic programming. Sequence alignment with quality scores can be solved as a special case of the weighted sequence alignment problem.

Subjects

Computational Biology/methods, Sequence Alignment/methods, Sequence Analysis, DNA/methods, Algorithms
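
A sketch of alignment over weighted sequences built from quality scores. The conversion uses the standard Phred relation (error probability 10^(-Q/10)) and assumes the error probability is spread evenly over the other three bases. The expected +1/-1 match score, the gap penalty of -1, and the function names are hypothetical choices, not the measure proposed in the paper; the dynamic program itself is ordinary Needleman-Wunsch.

```python
def phred_to_profile(base: str, q: int) -> dict[str, float]:
    """Weighted position for a called base with Phred quality q.

    Assumption: the error probability is split evenly over the other bases.
    """
    err = 10 ** (-q / 10)
    return {c: (1 - err if c == base else err / 3) for c in "ACGT"}


def expected_score(x: dict[str, float], y: dict[str, float]) -> float:
    """Expected +1 (match) / -1 (mismatch) score of two independent positions."""
    p_same = sum(x[c] * y[c] for c in "ACGT")
    return p_same - (1 - p_same)


def weighted_global_alignment(s: list, t: list, gap: float = -1.0) -> float:
    """Needleman-Wunsch optimal score over weighted sequences (lists of profiles)."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + expected_score(s[i - 1], t[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m]


ref = [phred_to_profile(b, 60) for b in "ACGA"]
low_q = [phred_to_profile(b, q) for b, q in zip("ACGT", [40, 40, 40, 5])]
high_q = [phred_to_profile(b, 40) for b in "ACGT"]
# The final T/A mismatch costs less when the read's T has low quality.
print(weighted_global_alignment(low_q, ref) > weighted_global_alignment(high_q, ref))  # True
```

A mismatch at a low-quality base is penalized less than the same mismatch at a high-quality base, which is the behavior quality-aware alignment is meant to capture.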

Isotopic peak intensity ratio based algorithm for determination of isotopic clusters and monoisotopic masses of polypeptides from high-resolution mass spectrometric data.

Park, Kunsoo; Yoon, Joo Young; Lee, Sunho; Paek, Eunok; Park, Heejin; Jung, Hee-Jung; Lee, Sang-Won.

Anal Chem ; 80(19): 7294-303, 2008 Oct 01.

Article in English | MEDLINE | ID: mdl-18754627

ABSTRACT

Determining isotopic clusters and their monoisotopic masses is a first step in interpreting complex mass spectra generated by high-resolution mass spectrometers. We propose a mathematical model for isotopic distributions of polypeptides and an effective interpretation algorithm. Our model uses two types of ratios: the intensity ratio of two adjacent peaks and the intensity ratio product of three adjacent peaks in an isotopic distribution. These ratios can be approximated as simple functions of polypeptide mass, whose values fall within certain ranges depending on the polypeptide mass. Given a spectrum as a peak list, our algorithm first finds all isotopic clusters consisting of two or more peaks. It then scores the clusters using the ranges of the ratio functions and computes the monoisotopic masses of the identified clusters. Our method was applied to high-resolution mass spectra obtained from a Fourier transform ion cyclotron resonance (FTICR) mass spectrometer coupled to reverse-phase liquid chromatography (RPLC). For polypeptides whose amino acid sequences were identified by tandem mass spectrometry (MS/MS), we applied both THRASH-based software implementations and our method. Our method found more masses of known peptides when the total number of clusters identified by both methods was fixed. Experimental results show that our method performed better for weak-intensity isotopic clusters whose isotopic distributions deviate significantly from their theoretical distributions. It also correctly identified some isotopic clusters that THRASH-based implementations did not find, especially those for which THRASH gave 1 Da mismatches. Another advantage of our method is speed: it is much faster than THRASH, which calculates a least-squares fit.

Subjects

Algorithms, Models, Statistical, Peptides/analysis, Tandem Mass Spectrometry/methods, Cluster Analysis, Cyclotrons, Fourier Analysis, Peptides/chemistry, Proteomics/methods
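
A sketch of mass-dependent ratio functions of the kind the abstract describes, under a carbon-only binomial model of the isotopic distribution. The carbon count per dalton (taken from the averagine model), the 13C abundance, the relative tolerance, and all function names are assumptions for illustration; the paper's actual ratio functions and ranges are not reproduced here.

```python
C13 = 0.0107  # approximate natural abundance of 13C (assumption)
CARBONS_PER_DA = 4.9384 / 111.1254  # averagine carbon density (assumption)


def carbon_count(mass: float) -> float:
    """Estimated number of carbon atoms for a polypeptide of this mass."""
    return mass * CARBONS_PER_DA


def adjacent_ratio(mass: float, k: int = 0) -> float:
    """Model ratio I[k+1] / I[k] under a binomial 13C distribution."""
    n = carbon_count(mass)
    return (n - k) / (k + 1) * C13 / (1 - C13)


def ratio_product(mass: float) -> float:
    """Model value of I[0] * I[2] / I[1]**2, which simplifies to (n - 1) / (2n)."""
    n = carbon_count(mass)
    return (n - 1) / (2 * n)


def plausible_cluster(peaks, mass, tol=0.3):
    """True if each observed I[k+1] / I[k] is within a relative tolerance
    (hypothetical default of 30%) of the model ratio."""
    for k in range(len(peaks) - 1):
        expected = adjacent_ratio(mass, k)
        observed = peaks[k + 1] / peaks[k]
        if abs(observed - expected) > tol * expected:
            return False
    return True


mass = 1111.254  # about 49.4 carbons under the averagine assumption
print(round(adjacent_ratio(mass), 3), round(ratio_product(mass), 3))
print(plausible_cluster([100.0, 10.0], mass))  # False: second peak far too weak
```

Both ratios depend only on the mass, so a candidate cluster can be screened with a few multiplications instead of a least-squares fit of a full theoretical envelope.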

The influence of the signal dynamics of activated form of IKK on NF-kappaB and anti-apoptotic gene expressions: a systems biology approach.

Park, Sung Gyoo; Lee, Taehyung; Kang, Hee Yong; Park, Kunsoo; Cho, Kwang-Hyun; Jung, Guhung.

FEBS Lett ; 580(3): 822-30, 2006 Feb 06.

Article in English | MEDLINE | ID: mdl-16413545

ABSTRACT

NF-kappaB activation plays a crucial role in anti-apoptotic responses to apoptotic signaling during tumor necrosis factor (TNF)-alpha stimulation. Hepatitis B virus (HBV)-infected cells are sensitive to TNF-alpha-induced apoptosis despite sustained NF-kappaB activation. Our results indicate that HBV infection induces sustained NF-kappaB activation in a manner similar to TNF-alpha stimulation. However, these effects do not simply combine. Computational simulations show that the level of the phosphorylation-activated form of the IKK complex (IKK-p) affects the dynamic pattern of NF-kappaB activation during TNF-alpha stimulation in the following ways: (i) the initial level of IKK-p determines the incremental change in IKK-p at the same level of TNF-alpha stimulation, (ii) the incremental change in IKK-p determines the amplitude of active NF-kappaB oscillation, and (iii) the steady-state level of IKK-p after the incremental change determines the period of active NF-kappaB oscillation. Experimentally, we observed that the initial level of IKK-p was upregulated and that active NF-kappaB oscillation showed smaller amplitudes and a shorter period in HepG2.2.15 cells (HBV-producing cells) during TNF-alpha stimulation, as predicted by the computational simulations. Furthermore, we found that during TNF-alpha stimulation, NF-kappaB-regulated anti-apoptotic genes were upregulated in HepG2 cells but downregulated in HepG2.2.15 cells. Based on these results, we conclude that the IKK-p level changes induced by HBV infection modulate the dynamic pattern of active NF-kappaB and could thereby affect the expression of NF-kappaB-regulated anti-apoptotic genes. Finally, we postulate that the sensitive apoptotic response of HBV-infected cells to TNF-alpha stimulation is governed by the dynamic patterns of active NF-kappaB resulting from IKK-p level changes.

Subjects

Apoptosis, Computer Simulation, Hepatitis B virus/metabolism, Hepatitis B/metabolism, I-kappa B Kinase/metabolism, NF-kappa B/metabolism, Cell Line, Computational Biology, Humans, Tumor Necrosis Factor-alpha/metabolism, Tumor Necrosis Factor-alpha/pharmacology, Up-Regulation/drug effects
