1.
Bioinformatics ; 34(13): i447-i456, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949967

ABSTRACT

Motivation: Most gene prioritization methods model each disease or phenotype individually, but this fails to capture patterns common to several diseases or phenotypes. To overcome this limitation, we formulate the gene prioritization task as the factorization of a sparsely filled gene-phenotype matrix, where the objective is to predict the unknown matrix entries. To deliver more accurate gene-phenotype matrix completion, we extend classical Bayesian matrix factorization to work with multiple side information sources. The availability of side information allows us to make non-trivial predictions for genes for which no previous disease association is known. Results: Our gene prioritization method integrates not only data sources describing genes but also data sources describing Human Phenotype Ontology terms. Experimental results on our benchmarks show that our proposed model improves accuracy over the well-established gene prioritization method Endeavour. In particular, compared to Endeavour, our proposed method offers promising results on diseases of the nervous system; diseases of the eye and adnexa; endocrine, nutritional and metabolic diseases; and congenital malformations, deformations and chromosomal abnormalities. Availability and implementation: The Bayesian data fusion method is implemented as a Python/C++ package: https://github.com/jaak-s/macau. It is also available as a Julia package: https://github.com/jaak-s/BayesianDataFusion.jl. All data and benchmarks generated or analyzed during this study can be downloaded at https://owncloud.esat.kuleuven.be/index.php/s/UGb89WfkZwMYoTn. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Gene Ontology , Genetic Predisposition to Disease , Information Storage and Retrieval/methods , Software , Algorithms , Bayes Theorem , Humans
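
The abstract of entry 1 describes completing a sparse gene-phenotype matrix with the help of side information, so that genes with no known association still receive non-trivial scores. The sketch below illustrates only that idea and is not the Macau implementation: it replaces Bayesian inference with plain stochastic gradient descent, and every name in it (`fit`, `predict`, `gene_latent`, the toy features and associations) is a hypothetical choice made for illustration. Each gene's latent vector is assumed to be a gene-specific residual plus a linear map of its side features.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gene_latent(residual, link, features):
    # Latent vector = gene-specific residual + linear map of side features.
    return [r + dot(row, features) for r, row in zip(residual, link)]


def fit(observed, gene_feats, n_phen, rank=2, lr=0.05, reg=0.01,
        epochs=500, seed=0):
    """SGD stand-in for Bayesian inference (an assumption of this sketch)."""
    rng = random.Random(seed)
    n_feat = len(gene_feats[0])
    U = [[0.0] * rank for _ in gene_feats]        # per-gene residuals
    link = [[0.0] * n_feat for _ in range(rank)]  # side features -> latent space
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_phen)]
    data = list(observed)
    for _ in range(epochs):
        rng.shuffle(data)
        for g, p, y in data:
            x = gene_feats[g]
            z = gene_latent(U[g], link, x)
            v = V[p][:]
            err = y - dot(z, v)
            for k in range(rank):
                U[g][k] += lr * (err * v[k] - reg * U[g][k])
                for f in range(n_feat):
                    link[k][f] += lr * (err * v[k] * x[f] - reg * link[k][f])
                V[p][k] += lr * (err * z[k] - reg * V[p][k])
    return U, link, V


def predict(model, gene_feats, g, p):
    U, link, V = model
    return dot(gene_latent(U[g], link, gene_feats[g]), V[p])


# Toy data: genes 0-2 share one feature profile, genes 3-5 another.
gene_feats = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
patterns = {0: [1, 0, 1], 1: [1, 0, 1], 2: [1, 0, 1], 3: [0, 1, 0], 4: [0, 1, 0]}
observed = [(g, p, float(y)) for g, row in patterns.items() for p, y in enumerate(row)]

model = fit(observed, gene_feats, n_phen=3)
# Gene 5 has no observed associations; its scores come from side features only.
cold_scores = [predict(model, gene_feats, 5, p) for p in range(3)]
```

In this toy setting, gene 5 is ranked highest for the phenotype associated with the genes that share its feature profile, which is the cold-start behaviour the abstract attributes to side information.
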
2.
Bioinformatics ; 30(13): 1850-7, 2014 Jul 01.
Article in English | MEDLINE | ID: mdl-24590441

ABSTRACT

MOTIVATION: Various approaches based on features extracted from protein sequences, often combined with machine learning methods, have been used to predict protein folds. Finding an efficient technique for integrating these different protein features has received increasing attention. In particular, kernel methods are an interesting class of techniques for integrating heterogeneous data. Various methods have been proposed to fuse multiple kernels. Most techniques for multiple kernel learning focus on learning a convex linear combination of base kernels. In addition to the limitation of linear combinations, working with such approaches could cause a loss of potentially useful information. RESULTS: We design several techniques to combine kernel matrices by taking more involved, geometry-inspired means of these matrices instead of convex linear combinations. We consider various sequence-based protein features, including information extracted directly from position-specific scoring matrices and local sequence alignment. We evaluate our methods for classification on the SCOP PDB-40D benchmark dataset for protein fold recognition. The best overall accuracy on the protein fold recognition test set obtained by our methods is about 86.7%. This is an improvement over the results of the best existing approach. Moreover, our computational model has been developed by incorporating the functional domain composition of proteins through a hybridization model. With the proposed hybridization model, the protein fold recognition accuracy improves further to 89.30%. Furthermore, we investigate the performance of our approach on the protein remote homology detection problem by fusing multiple string kernels. AVAILABILITY AND IMPLEMENTATION: The MATLAB code used for our proposed geometric kernel fusion frameworks is publicly available at http://people.cs.kuleuven.be/~raf.vandebril/homepage/software/geomean.php?menu=5/.


Subjects
Protein Folding , Proteins/chemistry , Algorithms , Pattern Recognition, Automated/methods , Position-Specific Scoring Matrices , Proteins/metabolism , Software
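
The abstract of entry 2 contrasts convex linear combinations of kernels with geometry-inspired means but does not say which means are used. For orientation, the block below writes out the convex combination and one standard definition of the geometric mean of two symmetric positive definite matrices. The symbols $K_i$, $\mu_i$, $A$ and $B$ are notation assumed here, and kernel matrices that are only positive semidefinite would first need a small regularization to make the inverse well defined.

```latex
% Convex linear combination of base kernels K_1, ..., K_m
K_{\mathrm{lin}} = \sum_{i=1}^{m} \mu_i K_i,
\qquad \mu_i \ge 0, \quad \sum_{i=1}^{m} \mu_i = 1

% Geometric mean of two symmetric positive definite matrices A and B
A \,\#\, B = A^{1/2} \left( A^{-1/2} B A^{-1/2} \right)^{1/2} A^{1/2}
```

When $A$ and $B$ commute, the second expression reduces to $(AB)^{1/2}$, the matrix analogue of the scalar geometric mean $\sqrt{ab}$.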
3.
Comput Struct Biotechnol J ; 20: 5235-5255, 2022.
Article in English | MEDLINE | ID: mdl-36187917

ABSTRACT

Multi-omics technologies are increasingly used in angiogenesis research. Yet computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break down boundaries between the two fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we discuss challenges and recommend guidelines that can help to optimize the process of accurate target discovery.

4.
J Theor Biol ; 269(1): 208-16, 2011 Jan 21.
Article in English | MEDLINE | ID: mdl-21040732

ABSTRACT

In this study, we develop predictors of protein submitochondrial location based on various sequence features. Knowing the submitochondrial location of a mitochondrial protein can provide a much better understanding of its function. We use ten representative models of protein samples, such as pseudo amino acid composition, dipeptide composition, functional domain composition, a combined discrete model based on predicted solvent accessibility and secondary structure elements, and a discrete model of pairwise sequence similarity. We construct a support vector machine (SVM) predictor for each representative model. In the leave-one-out cross-validation test, the overall accuracy of the predictor based on the discrete model of pairwise sequence similarity is 1% better than that of the best existing computational system for this problem. Moreover, we develop a method based on ordered weighted averaging (OWA), a data fusion operator. OWA is applied to the 11 best SVM-based classifiers constructed from the various sequence features. This method is called Mito-Loc. The overall leave-one-out cross-validation accuracy obtained by Mito-Loc is about 95%, which indicates that Mito-Loc outperforms the best previously reported approach.


Subjects
Computational Biology/methods , Databases, Protein , Mitochondria/metabolism , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/metabolism , Sequence Analysis, Protein , Algorithms , Amino Acid Sequence , Protein Transport
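
The abstract of entry 4 fuses classifier outputs with ordered weighted averaging (OWA). An OWA operator sorts its inputs in descending order and then takes a weighted sum, so each weight attaches to a rank rather than to a particular classifier. The sketch below shows that operator applied per class label. The weights, the three classifiers, their scores and the label names are made-up values for illustration and are not taken from Mito-Loc.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to ranks, not to sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ranked = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ranked))


def fuse(classifier_scores, weights):
    """Fuse per-classifier {label: score} dicts and return the winning label."""
    labels = list(classifier_scores[0])
    fused = {
        label: owa([scores[label] for scores in classifier_scores], weights)
        for label in labels
    }
    return max(fused, key=fused.get), fused


# Hypothetical scores from three classifiers for one protein.
scores = [
    {"inner membrane": 0.7, "matrix": 0.2, "outer membrane": 0.1},
    {"inner membrane": 0.4, "matrix": 0.5, "outer membrane": 0.1},
    {"inner membrane": 0.6, "matrix": 0.3, "outer membrane": 0.1},
]
weights = [0.5, 0.3, 0.2]  # hypothetical; emphasises the most confident votes

label, fused = fuse(scores, weights)
# inner membrane: 0.5*0.7 + 0.3*0.6 + 0.2*0.4 = 0.61
```

With weights `[1, 0, 0]` the operator becomes the maximum, with `[0, 0, 1]` the minimum, and with equal weights the arithmetic mean, which is why OWA is often described as a family of aggregation rules.
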
5.
PLoS One ; 16(2): e0247200, 2021.
Article in English | MEDLINE | ID: mdl-33626106

ABSTRACT

Inspired by the competitive exclusion principle, this work aims to provide a computational framework for exploring the theoretical feasibility of viral co-infection as a possible strategy to reduce the spread of a fatal strain in a population. We propose a stochastic model, called Co-Wish, to understand how competition between two viruses over a shared niche can affect the spread of each virus in infected tissue. To demonstrate the co-infection of two viruses, we first simulate the characteristics of the two virus growth processes separately. Then, we examine their interactions until one dominates the other. We use Co-Wish to explore how the model varies as the parameters of each virus growth process change when two viruses infect the host simultaneously. We also investigate the effect of delaying the start of each infection. Moreover, Co-Wish not only examines co-infection at the cell level but also includes the innate immune response during viral infection. The results highlight that the waiting times in the five stages of the viral infection of a cell in the model (attachment, penetration, eclipse, replication, and release) play an essential role in the competition between the two viruses. While it could prove challenging to fully understand the therapeutic potential of viral co-infection, we argue that our theoretical framework hints at an intriguing research direction: applying co-infection dynamics to control the speed of a viral outbreak.


Subjects
Coinfection/virology , Models, Theoretical , Virus Diseases/virology , Virus Physiological Phenomena , Animals , Coinfection/prevention & control , Communicable Disease Control/methods , Humans , Stochastic Processes , Virus Diseases/prevention & control , Viruses/pathogenicity
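
The abstract of entry 5 attributes the outcome of the competition to stage waiting times and to the delay before each infection starts. The event-driven sketch below illustrates that idea under strong assumptions and is not the Co-Wish model: two viruses compete for a fixed pool of uninfected cells, each infected cell passes through the five named stages with exponentially distributed waiting times, and it then infects up to `burst` further cells. The function `simulate`, its parameters and all numeric values are hypothetical, and the innate immune response mentioned in the abstract is left out.

```python
import heapq
import random

STAGES = ("attachment", "penetration", "eclipse", "replication", "release")


def simulate(n_cells, mean_wait, start_delay, burst=2, seed=0):
    """Return the number of cells each virus ends up infecting.

    mean_wait:   {virus: {stage: mean waiting time}}
    start_delay: {virus: time at which the virus infects its first cell}
    """
    rng = random.Random(seed)

    def cycle_time(virus):
        # Total time for one infected cell to pass through all five stages.
        return sum(rng.expovariate(1.0 / mean_wait[virus][s]) for s in STAGES)

    infected = {virus: 0 for virus in mean_wait}
    free = n_cells
    events = []  # (time, virus, number of cells the virus tries to infect)
    for virus in sorted(mean_wait):
        heapq.heappush(events, (start_delay[virus], virus, 1))

    while events and free > 0:
        time, virus, wanted = heapq.heappop(events)
        newly = min(wanted, free)  # both viruses draw on the same cell pool
        free -= newly
        infected[virus] += newly
        for _ in range(newly):
            heapq.heappush(events, (time + cycle_time(virus), virus, burst))
    return infected


fast = {stage: 1.0 for stage in STAGES}
slow = {stage: 3.0 for stage in STAGES}

# Both viruses start together and share 200 target cells.
together = simulate(200, {"A": fast, "B": slow}, {"A": 0.0, "B": 0.0})

# Virus B never gets started, so virus A takes the whole niche.
a_only = simulate(50, {"A": fast, "B": slow}, {"A": 0.0, "B": float("inf")})
```

Varying the per-stage means or the entries of `start_delay` changes how the shared pool is split, which is the kind of parameter exploration the abstract describes.
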
6.
Nat Commun ; 12(1): 124, 2021 01 05.
Article in English | MEDLINE | ID: mdl-33402734

ABSTRACT

High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. To achieve proper integration, joint Dimensionality Reduction (jDR) methods are among the most efficient approaches. However, several jDR methods are available, creating a need for a comprehensive benchmark with practical guidelines. We perform a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluate their performance in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we use TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assess their classification of multi-omics single-cell data. From these in-depth comparisons, we observe that intNMF performs best in clustering, while MCIA performs well across many contexts. The code developed for this benchmark study is implemented in a Jupyter notebook, multi-omics mix (momix), to foster reproducibility and support users and future developers.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Neoplasm Proteins/genetics , Neoplasms/genetics , Benchmarking , Cell Line, Tumor , Datasets as Topic , Gene Ontology , Humans , Molecular Sequence Annotation , Multifactor Dimensionality Reduction , Neoplasm Proteins/metabolism , Neoplasms/diagnosis , Neoplasms/mortality , Neoplasms/pathology , Reproducibility of Results , Single-Cell Analysis , Survival Analysis
7.
Sci Rep ; 9(1): 7106, 2019 May 03.
Article in English | MEDLINE | ID: mdl-31053760

ABSTRACT

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

8.
Sci Rep ; 8(1): 8322, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29844324

ABSTRACT

Despite the abundance of large-scale molecular and drug-response data, the insights gained about the mechanisms underlying treatment efficacy in cancer have in general been limited. Machine learning algorithms applied to those datasets are most often used to provide predictions without interpretation, or they reveal single drug-gene associations and fail to derive robust insights. We propose to use Macau, a Bayesian multitask multi-relational algorithm, to generalize from individual drugs and genes and to explore the interactions between drug targets and the activation of signaling pathways. A typical insight would be: "Activation of pathway Y will confer sensitivity to any drug targeting protein X". We applied our methodology to the Genomics of Drug Sensitivity in Cancer (GDSC) screening, using gene expression of 990 cancer cell lines, activity scores of 11 signaling pathways derived from the tool PROGENy as cell line input and 228 nominal targets for 265 drugs as drug input. These interactions can guide a tissue-specific combination treatment strategy, for example by suggesting which pathway to modulate to maximize the drug response for a given tissue. We confirmed in the literature drug combination strategies derived from our results for brain, skin and stomach tissues. Such an analysis of interactions across tissues might help target discovery, drug repurposing and patient stratification strategies.


Subjects
Computational Biology/methods , Drug Discovery/methods , Drug Repositioning/methods , Algorithms , Antineoplastic Agents/therapeutic use , Bayes Theorem , Drug Delivery Systems , Humans , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Signal Transduction , Treatment Outcome