Results 1 - 14 of 14
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38695119

ABSTRACT

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness by being contextually dependent. It relies on recent advances in deep learning architectures that employ self-supervised learning in order to leverage the power of enormous amounts of unlabelled data to generate contextual embeddings, which are vector representations for words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We propose the E-score between two residues as the cosine similarity between their embedding vector representations. Thorough testing on a wide variety of reference multiple sequence alignments indicates that the alignments produced using the new E-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method proposes to change the way alignments are computed, with far-reaching implications in all areas of textual data that use sequence similarity. The program to compute alignments based on various E-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.
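The definition in this abstract (the E-score of two residues is the cosine similarity of their embedding vectors) can be illustrated with a minimal sketch. The function name `e_score` and the toy three-dimensional vectors are assumptions made for illustration; real residue embeddings would come from a protein language model such as ProtT5 and have far more dimensions. This is not the authors' implementation.

```python
import math


def e_score(u, v):
    """Cosine similarity between two embedding vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for contextual residue embeddings (made-up values).
residue_a = [0.2, -1.0, 0.5]
residue_b = [0.1, -0.8, 0.7]
score = e_score(residue_a, residue_b)  # a value in [-1, 1]
```

Unlike a fixed matrix lookup, the same pair of amino acids can receive different scores at different positions, because each residue's embedding depends on its sequence context.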


Subjects
Algorithms , Computational Biology , Sequence Alignment , Sequence Alignment/methods , Computational Biology/methods , Software , Sequence Analysis, Protein/methods , Amino Acid Sequence , Proteins/chemistry , Proteins/genetics , Deep Learning , Databases, Protein
2.
Entropy (Basel) ; 25(8)2023 Aug 05.
Article in English | MEDLINE | ID: mdl-37628198

ABSTRACT

Stochastic modeling of biochemical processes at the cellular level has been the subject of intense research in recent years. The Chemical Master Equation is a broadly utilized stochastic discrete model of such processes. Numerous important biochemical systems consist of many species subject to many reactions. As a result, their mathematical models depend on many parameters. In applications, some of the model parameters may be unknown, so their values need to be estimated from the experimental data. However, the problem of parameter value inference can be quite challenging, especially in the stochastic setting. To accurately estimate the values of a subset of parameters, the system should be sensitive to variations in each of these parameters, and the parameters should not be correlated. In this paper, we propose a technique for detecting collinearity among model parameters and we apply this method to select subsets of parameters that can be estimated from the available data. The analysis relies on finite-difference sensitivity estimations and the singular value decomposition of the sensitivity matrix. We illustrate the advantages of the proposed method by successfully testing it on several models of biochemical systems of practical interest.
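As a hedged illustration of the two ingredients named in this abstract, the following uses assumed notation: the symbols f_i (model outputs), θ (parameter vector), h_j (perturbation size), e_j (unit vector) and σ_i (singular values) are not taken from the abstract, and the centered difference is one possible choice of finite-difference estimator.

```latex
S_{ij} \;\approx\; \frac{\mathbb{E}\!\left[f_i(\theta + h_j e_j)\right] - \mathbb{E}\!\left[f_i(\theta - h_j e_j)\right]}{2 h_j},
\qquad
S = U \Sigma V^{\top}, \quad
\Sigma = \operatorname{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0).
```

If the smallest singular value σ_p is close to zero relative to σ_1, the corresponding right singular vector gives a combination of parameter changes that leaves the outputs almost unchanged. The columns of S are then nearly collinear, and the parameters involved cannot be estimated independently from the data.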

3.
Math Biosci ; 312: 23-32, 2019 06.
Article in English | MEDLINE | ID: mdl-30998936

ABSTRACT

The present paper introduces a new micro-meso hybrid algorithm based on the Ghost Cell Method concept in which the microscopic subdomain is governed by the Reactive Multi-Particle Collision (RMPC) dynamics. The mesoscopic subdomain is modeled using the Reaction-Diffusion Master Equation (RDME). The RDME is solved by means of the Inhomogeneous Stochastic Simulation Algorithm. No hybrid algorithm has hitherto used the RMPC dynamics for modeling reactions and the trajectories of each individual particle. The RMPC is faster than other molecular-based methods and has the advantage of conserving mass, energy and momentum in the collision and free streaming steps. The new algorithm is tested on three reaction-diffusion systems. In all the systems studied, very good agreement with the deterministic solutions of the corresponding differential equations is obtained. In addition, it is shown that proper discretization of the computational domain results in significant speed-ups in comparison with the full RMPC algorithm.


Subjects
Algorithms , Biochemical Phenomena , Computer Simulation , Models, Theoretical , Stochastic Processes , Diffusion
4.
IET Syst Biol ; 12(4): 123-130, 2018 Aug.
Article in English | MEDLINE | ID: mdl-33451187

ABSTRACT

Simulation of cellular processes is achieved through a range of mathematical modelling approaches. Deterministic differential equation models are a commonly used first strategy. However, because many biochemical processes are inherently probabilistic, stochastic models are often called for to capture the random fluctuations observed in these systems. In that context, the Chemical Master Equation (CME) is a widely used stochastic model of biochemical kinetics. Use of these models relies on estimates of kinetic parameters, which are often poorly constrained by experimental observations. Consequently, sensitivity analysis, which quantifies the dependence of systems dynamics on model parameters, is a valuable tool for model analysis and assessment. A number of approaches to sensitivity analysis of biochemical models have been developed. In this study, the authors present a novel method for estimation of sensitivity coefficients for CME models of biochemical reaction systems that span a wide range of time-scales. They make use of finite-difference approximations and adaptive implicit tau-leaping strategies to estimate sensitivities for these stiff models, resulting in significant computational efficiencies in comparison with previously published approaches of similar accuracy, as evidenced by illustrative applications.

5.
Biosystems ; 151: 43-52, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27914944

ABSTRACT

Sensitivity analysis characterizes the dependence of a model's behaviour on system parameters. It is a critical tool in the formulation, characterization, and verification of models of biochemical reaction networks, for which confident estimates of parameter values are often lacking. In this paper, we propose a novel method for sensitivity analysis of discrete stochastic models of biochemical reaction systems whose dynamics occur over a range of timescales. This method combines finite-difference approximations and adaptive tau-leaping strategies to efficiently estimate parametric sensitivities for stiff stochastic biochemical kinetics models, with negligible loss in accuracy compared with previously published approaches. We analyze several models of interest to illustrate the advantages of our method.
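The combination of finite differences with tau-leaping described here can be sketched on a toy model. Everything in the block is an assumption for illustration: the birth-death model (production at rate k, degradation at rate g per molecule), the function names, the fixed step size, and the use of explicit rather than adaptive tau-leaping. Reusing the same random seed for both perturbed runs (common random numbers) is one standard way to reduce the variance of the difference; the abstract does not say which coupling the authors use.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) sample with Knuth's multiplication method."""
    if lam <= 0.0:
        return 0
    limit = math.exp(-lam)
    count = 0
    prod = rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def tau_leap_final_state(k, g, t_end, tau, seed):
    """Explicit fixed-step tau-leaping for 0 -> X (rate k), X -> 0 (rate g*X)."""
    rng = random.Random(seed)
    x = 0
    for _ in range(int(round(t_end / tau))):
        births = poisson(rng, k * tau)
        deaths = poisson(rng, g * x * tau)
        x = max(x + births - deaths, 0)  # clip to keep the population non-negative
    return x


def fd_sensitivity(k, g, h, t_end=5.0, tau=0.05, runs=1000):
    """Centered finite-difference estimate of d E[X(t_end)] / dk."""
    total = 0
    for seed in range(runs):
        # Same seed for both runs: common random numbers.
        plus = tau_leap_final_state(k + h, g, t_end, tau, seed)
        minus = tau_leap_final_state(k - h, g, t_end, tau, seed)
        total += plus - minus
    return total / (2.0 * h * runs)


estimate = fd_sensitivity(k=10.0, g=1.0, h=0.5)
```

For this linear model the mean settles at k/g, so the long-time sensitivity with respect to k is 1/g, which gives a simple sanity check on the estimate. A stiff model would need the adaptive or implicit step selection that the abstract refers to.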


Subjects
Algorithms , Biochemical Phenomena/physiology , Cell Physiological Phenomena/physiology , Stochastic Processes , Computer Simulation , Kinetics , Models, Biological , Models, Chemical , Models, Genetic
6.
J Chem Phys ; 143(23): 234108, 2015 Dec 21.
Article in English | MEDLINE | ID: mdl-26696047

ABSTRACT

In this paper, we present a novel hybrid method to simulate discrete stochastic reaction-diffusion models arising in biochemical signaling pathways. We study moderately stiff systems, for which we can partition each reaction or diffusion channel into either a slow or fast subset, based on its propensity. Numerical approaches missing this distinction are often limited with respect to computational run time or approximation quality. We design an approximate scheme that remedies these pitfalls by using a new blending strategy of the well-established inhomogeneous stochastic simulation algorithm and the tau-leaping simulation method. The advantages of our hybrid simulation algorithm are demonstrated on three benchmarking systems, with special focus on approximation accuracy and efficiency.

7.
J Chem Phys ; 137(23): 234110, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23267474

ABSTRACT

Stochastic modeling is essential for an accurate description of the biochemical network dynamics at the level of a single cell. Biochemically reacting systems often evolve on multiple time-scales, thus their stochastic mathematical models manifest stiffness. Stochastic models that are also stiff are computationally very challenging, hence the need to develop effective and accurate numerical methods for approximating their solution. An important stochastic model of well-stirred biochemical systems is the chemical Langevin equation. The chemical Langevin equation is a system of stochastic differential equations with multidimensional non-commutative noise. This model is valid in the regime of large molecular populations, far from the thermodynamic limit. In this paper, we propose a variable time-stepping strategy for the numerical solution of a general chemical Langevin equation, which applies to any level of randomness in the system. Our variable stepsize method allows arbitrary values of the time-step. Numerical results on several models arising in applications show significant improvement in accuracy and efficiency of the proposed adaptive scheme over the existing methods, namely the strategies based on halving/doubling of the stepsize and the fixed stepsize ones.


Subjects
Algorithms , Linear Models , Models, Chemical , Stochastic Processes , Computer Simulation , Kinetics
8.
J Chem Phys ; 136(18): 184101, 2012 May 14.
Article in English | MEDLINE | ID: mdl-22583271

ABSTRACT

Mathematical and computational modeling are key tools in analyzing important biological processes in cells and living organisms. In particular, stochastic models are essential to accurately describe the cellular dynamics, when the assumption of the thermodynamic limit can no longer be applied. However, stochastic models are computationally much more challenging than the traditional deterministic models. Moreover, many biochemical systems arising in applications have multiple time-scales, which lead to mathematical stiffness. In this paper we investigate the numerical solution of a stochastic continuous model of well-stirred biochemical systems, the chemical Langevin equation. The chemical Langevin equation is a stochastic differential equation with multiplicative, non-commutative noise. We propose an adaptive stepsize algorithm for approximating the solution of models of biochemical systems in the Langevin regime, with small noise, based on estimates of the local error. The underlying numerical method is the Milstein scheme. The proposed adaptive method is tested on several examples arising in applications and it is shown to have improved efficiency and accuracy compared to the existing fixed stepsize schemes.
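The Milstein scheme named as the underlying method can be sketched for a scalar equation with multiplicative noise, dX = mu X dt + sigma X dW. The test equation, the function name `milstein_gbm` and the fixed step size are assumptions chosen for brevity. The chemical Langevin equation in the abstract is multidimensional with non-commutative noise, and the proposed method adapts the step from local error estimates, neither of which this sketch attempts.

```python
import math
import random


def milstein_gbm(x0, mu, sigma, t_end, n_steps, seed=0):
    """Fixed-step Milstein integration of dX = mu*X dt + sigma*X dW."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        drift = mu * x * dt
        diffusion = sigma * x * dw
        # Milstein correction: 0.5 * b(x) * b'(x) * (dW^2 - dt) with b(x) = sigma*x.
        correction = 0.5 * sigma * sigma * x * (dw * dw - dt)
        x += drift + diffusion + correction
    return x


path_end = milstein_gbm(x0=1.0, mu=0.5, sigma=0.2, t_end=1.0, n_steps=200, seed=42)
```

The correction term is what distinguishes Milstein from Euler-Maruyama; with sigma set to zero both reduce to the explicit Euler method for the drift.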


Subjects
Models, Chemical , Algorithms , Nonlinear Dynamics , Stochastic Processes , Time Factors
9.
BMC Res Notes ; 5: 123, 2012 Feb 28.
Article in English | MEDLINE | ID: mdl-22373455

ABSTRACT

BACKGROUND: The most frequently used tools in bioinformatics are those searching for similarities, or local alignments, between biological sequences. Since the exact dynamic programming algorithm is quadratic, linear-time heuristics such as BLAST are used. Spaced seeds are much more sensitive than the consecutive seed of BLAST, and using several seeds represents the current state of the art in approximate search for biological sequences. The most important aspect is computing highly sensitive seeds. Since the problem seems hard, heuristic algorithms are used. The leading software in the common Bernoulli model is the SpEED program. FINDINGS: SpEED uses a hill climbing method based on the overlap complexity heuristic. We propose a new algorithm for this heuristic that improves its speed by over one order of magnitude. We use the new implementation to compute improved seeds for several software programs. We also compute multiple seeds of the same weight as MegaBLAST that greatly improve its sensitivity. CONCLUSION: Multiple spaced seeds are being successfully used in bioinformatics software programs. Enabling researchers to compute high-quality seeds very quickly will help expand the range of their applications.
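The difference between a consecutive seed and a spaced seed can be shown on a toy example. The notation is an assumption for illustration: a similarity region is written as a string of `1` (match) and `0` (mismatch), a seed as a string of `1` (position must match) and `0` (don't care), and the seeds `111` and `1101` and the function name `seed_hits` are hypothetical, not taken from SpEED or BLAST.

```python
def seed_hits(seed, region):
    """Return start positions where every '1' of the seed lines up with a match."""
    required = [j for j, c in enumerate(seed) if c == "1"]
    hits = []
    for start in range(len(region) - len(seed) + 1):
        if all(region[start + j] == "1" for j in required):
            hits.append(start)
    return hits


region = "11011011"  # matches (1) and mismatches (0) between two aligned sequences

consecutive_hits = seed_hits("111", region)   # weight 3, no don't-care positions
spaced_hits = seed_hits("1101", region)       # weight 3, one don't-care position
```

Both seeds have weight 3, yet the consecutive seed finds no hit in this region because it never contains three matches in a row, while the spaced seed hits at two positions by skipping over a mismatch.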


Subjects
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Software , Reproducibility of Results , Time Factors
10.
BMC Genomics ; 12: 280, 2011 Jun 01.
Article in English | MEDLINE | ID: mdl-21627845

ABSTRACT

BACKGROUND: DNA oligonucleotides are a very useful tool in biology. The best algorithms for designing good DNA oligonucleotides filter out unsuitable regions using a seeding approach. Determining the quality of the seeds is crucial for the performance of these algorithms. RESULTS: We present a sound framework for evaluating the quality of seeds for oligonucleotide design. The F-score is used to measure the accuracy of each seed. A number of natural candidates are tested: contiguous (BLAST-like), spaced, transitions-constrained, and multiple spaced seeds. Multiple spaced seeds are the best, with more seeds providing better accuracy. Single spaced and transitions-constrained seeds are very close whereas, as expected, contiguous seeds come last. Increased accuracy comes at the price of reduced efficiency. An exception is that single spaced and transitions-constrained seeds are both more accurate and more efficient than contiguous ones. CONCLUSIONS: Our work confirms another application where multiple spaced seeds perform the best. It will be useful in improving the algorithms for oligonucleotide design.


Subjects
Nucleic Acid Amplification Techniques/methods , Oligonucleotides/genetics , DNA/genetics
11.
Bioinformatics ; 27(17): 2433-4, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21690104

ABSTRACT

SUMMARY: Multiple spaced seeds represent the current state-of-the-art for similarity search in bioinformatics, with applications in various areas such as sequence alignment, read mapping, oligonucleotide design, etc. We present SpEED, a software program that computes highly sensitive multiple spaced seeds. SpEED can be several orders of magnitude faster and computes better seeds than the existing leading software programs. AVAILABILITY: The source code of SpEED is freely available at www.csd.uwo.ca/~ilie/SpEED/ CONTACT: ilie@csd.uwo.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Software , Algorithms , Sequence Alignment
12.
Bioinformatics ; 27(3): 295-302, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-21115437

ABSTRACT

MOTIVATION: High-throughput sequencing technologies produce very large amounts of data and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data. RESULTS: We present HiTEC, an algorithm that provides a highly accurate, robust and fully automated method to correct reads produced by high-throughput sequencing methods. Our approach provides significantly higher accuracy than previous methods. It is time and space efficient and works very well for all read lengths, genome sizes and coverage levels. AVAILABILITY: The source code of HiTEC is freely available at www.csd.uwo.ca/~ilie/HiTEC/.


Subjects
Algorithms , Sequence Analysis, DNA/methods , Genome , Models, Genetic , Reproducibility of Results , Software
13.
Bioinformatics ; 25(6): 822-3, 2009 Mar 15.
Article in English | MEDLINE | ID: mdl-19176560

ABSTRACT

MOTIVATION: Alignment of biological sequences is one of the most frequently performed computer tasks. The current state of the art involves the use of (multiple) spaced seeds for producing high quality alignments. A particularly important class is that of neighbor seeds, which combine high sensitivity with reduced space requirements. Current algorithms for computing good neighbor seeds are very slow (exponential). RESULTS: We give a polynomial-time heuristic algorithm that computes better neighbor seeds than previous ones while being several orders of magnitude faster.


Subjects
Algorithms , Computational Biology/methods , Sequence Alignment/methods
14.
Bioinformatics ; 23(22): 2969-77, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-17804438

ABSTRACT

MOTIVATION: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and spaced seeds have been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smith-Waterman sensitivity is approached at BLASTn speed. However, computing optimal multiple spaced seeds was proved to be NP-hard and current heuristic algorithms are all very slow (exponential). RESULTS: We give a simple algorithm which computes good multiple seeds in polynomial time. Due to a completely different approach, the difference with respect to the previous methods is dramatic. The multiple spaced seed of PatternHunterII, with 16 weight 11 seeds, was computed in 12 days. It takes us 17 s to find a better one. Our approach changes the way of looking at multiple spaced seeds.


Subjects
Algorithms , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Sequence Analysis/methods , Sequence Homology