Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-31369384

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology provides quantitative gene expression profiles at single-cell resolution. As a result, researchers have established new ways to explore cell population heterogeneity and the genetic variability of cells. One current research direction for scRNA-seq data is to identify different cell types accurately through unsupervised clustering methods. However, scRNA-seq data analysis is challenging because of the data's high noise level, high dimensionality and sparsity. Moreover, the impact of multiple latent factors on gene expression heterogeneity and on the ability to accurately identify cell types remains unclear. Overcoming these challenges to reveal the biological differences between cell types has become the key to analyzing scRNA-seq data. For these reasons, unsupervised learning for cell population discovery based on scRNA-seq data analysis has become an important research area. A cell similarity assessment method plays a significant role in cell clustering. Here, we present BioRank, a new cell similarity assessment method based on annotated gene sets and gene ranks. To evaluate its performance, we cluster cells with two classical clustering algorithms based on the similarity between cells obtained by BioRank. In addition, BioRank can be used by any clustering algorithm that requires a similarity matrix. Applying BioRank to 12 public scRNA-seq datasets, we show that it performs better than, or at least as well as, several popular similarity assessment methods for single-cell clustering.
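
To make the idea of a rank-based cell similarity matrix concrete, the sketch below computes Spearman rank correlation between expression vectors. This is not BioRank itself: the abstract does not give its formula, and the use of annotated gene sets is left out. The function names and the toy expression values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_similarity(cell_a: Sequence[float], cell_b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the gene ranks."""
    return pearson(average_ranks(cell_a), average_ranks(cell_b))


def similarity_matrix(cells: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric cell-by-cell similarity matrix (one row of genes per cell)."""
    n = len(cells)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rank_similarity(cells[i], cells[j])
    return matrix
```

A matrix of this shape is what a similarity-based clustering algorithm (for example, hierarchical or spectral clustering) would take as input.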


Subjects
Computational Biology/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Algorithms; Animals; Cluster Analysis; Databases, Genetic; Gene Ontology; Humans; Mice; Transcriptome/genetics
2.
Article in English | MEDLINE | ID: mdl-28541220

ABSTRACT

A cellular signal transduction network is an important means of describing biological responses to environmental stimuli and the exchange of biological signals. Constructing the cellular signal transduction network provides an important basis for studying biological activities, disease mechanisms, drug targets and so on. Statistical approaches to network inference are popular in the literature. The Granger test has been used as an effective method for causality inference. Compared with bivariate Granger tests, multivariate Granger tests reduce indirect causality and have been widely used to construct cellular signal transduction networks. A multivariate Granger test requires that the number of time points in the time-series data exceed the number of nodes in the network. However, many real datasets have far fewer time points than nodes in the network. In this study, we propose a new multivariate Granger test-based framework, called MGT-SM, to construct cellular signal transduction networks. MGT-SM uses SVD to compute the coefficient matrix from gene expression data and adopts Monte Carlo simulation to estimate the significance of directed edges in the constructed networks. We apply the proposed MGT-SM to the Yeast Synthetic Network and MDA-MB-468, and evaluate its performance in terms of recall and AUC. The results show that MGT-SM achieves better results than other popular methods (CGC2SPR, PGC, and DBN).


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; Algorithms; Computer Simulation; Humans; Linear Models; Monte Carlo Method; Neoplasms/genetics; Signal Transduction/genetics; Yeasts/genetics
3.
IEEE/ACM Trans Comput Biol Bioinform ; 16(6): 1890-1900, 2019.
Article in English | MEDLINE | ID: mdl-29994051

ABSTRACT

Drug repositioning is an efficient and promising strategy to identify new indications for existing drugs, which can improve the productivity of traditional drug discovery and development. Rapid advances in high-throughput technologies have generated various types of biomedical data over the past decades, which lay the foundations for the further development of computational drug repositioning approaches. Although many studies have tried to improve repositioning accuracy by integrating information from multiple sources and different levels, it is still appealing to investigate further how to exploit valuable data efficiently for drug repositioning. In this study, we propose an efficient approach, Random Walk on a Heterogeneous Network for Drug Repositioning (RWHNDR), to prioritize candidate drugs for diseases. First, an integrated heterogeneous network is constructed by combining multiple sources, including data on drugs, drug targets, diseases and disease genes. Then, a random walk model is developed to capture the global information of the heterogeneous network. RWHNDR takes advantage of drug target and disease gene data more comprehensively for drug repositioning. The experimental results show that our approach achieves better performance than other state-of-the-art approaches that prioritize candidate drugs based on multi-source data.
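
The random walk step described above can be illustrated with a generic random walk with restart on a small graph. This is a minimal sketch, not the RWHNDR model: the abstract does not specify its transition rules, so the column normalization, the restart probability of 0.3, and the four-node toy network are all assumptions.

```python
from typing import List

# Hypothetical toy network: disease - gene - target - drug (a simple path).
NODES = ["disease", "gene", "target", "drug"]
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def random_walk_with_restart(
    adjacency: List[List[float]],
    seeds: List[int],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.

    W is the column-normalized adjacency matrix and p0 puts equal mass on
    the seed nodes. Nodes without edges contribute nothing to the walk.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            walk = sum(
                adjacency[i][j] / col_sums[j] * p[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            nxt.append((1.0 - restart) * walk + restart * p0[i])
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Seed the walk at the disease node; higher scores mean closer to the seed.
SCORES = random_walk_with_restart(ADJACENCY, seeds=[0])
```

In a prioritization setting, the steady-state scores of the drug nodes would be sorted to rank candidate drugs for the seeded disease.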


Subjects
Computational Biology/methods; Drug Discovery; Drug Repositioning/methods; Algorithms; Breast Neoplasms/drug therapy; Databases, Factual; Disease Progression; Drug Industry/trends; Humans; Huntington Disease/drug therapy; Lung Neoplasms/drug therapy; Parkinson Disease/drug therapy; Phenotype; ROC Curve; Software
4.
Brief Bioinform ; 18(5): 798-819, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27444371

ABSTRACT

Protein-protein interactions (PPIs) participate in all important biological processes in living organisms, such as catalyzing metabolic reactions, DNA replication, DNA transcription, responding to stimuli and transporting molecules from one location to another. To reveal the functional mechanisms in cells, it is important to identify the PPIs that take place in the living organism. A large number of PPIs have been discovered by high-throughput experiments and computational methods. However, false-positive PPIs have been introduced too. Therefore, many computational methods have been proposed to obtain reliable PPIs. Generally, these methods can be classified into two categories. One category includes methods designed to determine new reliable PPIs. The other includes methods designed to assess the reliability of existing PPIs and filter out the unreliable ones. In this article, we review the two kinds of methods for detecting reliable PPIs, and then focus on evaluating the performance of some typical methods. We also enumerate several PPI network-based applications that take a reliability assessment of the PPI data into consideration. Finally, we discuss the challenges of obtaining reliable PPIs and future directions for the construction of reliable PPI networks. Our research provides readers with some guidance for choosing appropriate methods and features for obtaining reliable PPIs.


Subjects
Proteins/metabolism; Computational Biology; Humans; Protein Interaction Mapping; Reproducibility of Results
5.
BMC Med Genomics ; 7 Suppl 2: S2, 2014.
Article in English | MEDLINE | ID: mdl-25350511

ABSTRACT

BACKGROUND: Multiple types of data are now available for identifying disease genes. These data include gene-disease associations, disease phenotype similarities, protein-protein interactions, pathways, gene expression profiles, etc. It is believed that integrating different kinds of biological data is an effective way to identify disease genes. RESULTS: In this paper, we propose a multiple data integration method based on the theory of Markov random fields (MRF) and Bayesian analysis for identifying human disease genes. The proposed method is not only flexible in easily incorporating different kinds of data, but also reliable in predicting candidate disease genes. CONCLUSIONS: Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. Predictions are evaluated by the leave-one-out method. The proposed method achieves an AUC score of 0.743 when integrating all those biological data in our experiments.
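
For readers unfamiliar with the AUC metric cited above, the sketch below computes it from held-out prediction scores using the pairwise (Mann-Whitney) definition. It assumes binary labels (1 for a known disease gene, 0 otherwise) and one score per gene; the function name and the idea of feeding it leave-one-out scores are illustrative, not taken from the paper.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the reported value.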


Subjects
Computational Biology/methods; Disease/genetics; Bayes Theorem; Humans; Markov Chains
6.
Sci China Life Sci ; 57(11): 1054-63, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25326067

ABSTRACT

Genes associated with similar diseases are often functionally related. This principle is largely supported by many biological data sources, such as disease phenotype similarities, protein complexes, protein-protein interactions, pathways and gene expression profiles. Integrating multiple types of biological data is an effective method to identify disease genes for many genetic diseases. To capture the gene-disease associations based on biological networks, a kernel-based MRF method is proposed by combining graph kernels and the Markov random field (MRF) method. In the proposed method, three kinds of kernels are employed to describe the overall relationships of vertices in five biological networks, respectively, and a novel weighted MRF method is developed to integrate those data. In addition, an improved Gibbs sampling procedure and a novel parameter estimation method are proposed to generate predictions from the kernel-based MRF method. Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. The proposed kernel-based MRF method is evaluated by the leave-one-out cross validation paradigm, achieving an AUC score of 0.771 when integrating all those biological data in our experiments, which indicates that our proposed method is very promising compared with many existing methods.
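
To show how Gibbs sampling produces predictions from an MRF on a gene network, here is a plain Gibbs sampler for a binary pairwise model. It is a generic sketch, not the improved procedure or the kernel-based weighting of the paper: the conditional probability (a logistic function of the difference between neighbors labeled 1 and neighbors labeled 0), the parameters `alpha` and `beta`, and the toy network are all assumptions.

```python
import math
import random
from typing import Dict, List

# Hypothetical toy network: K* are known disease genes, N* known non-disease
# genes, U* unlabeled candidates. Every gene must appear as a key.
TOY_NEIGHBORS: Dict[str, List[str]] = {
    "K1": ["U1"], "K2": ["U1"], "N1": ["U3"], "N2": ["U3"],
    "U1": ["K1", "K2", "U2"], "U2": ["U1"], "U3": ["N1", "N2"],
}
TOY_KNOWN: Dict[str, int] = {"K1": 1, "K2": 1, "N1": 0, "N2": 0}


def gibbs_marginals(
    neighbors: Dict[str, List[str]],
    known: Dict[str, int],
    alpha: float = 0.0,
    beta: float = 1.0,
    sweeps: int = 2000,
    burn_in: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate P(label = 1) for each unlabeled gene by Gibbs sampling."""
    rng = random.Random(seed)
    labels = dict(known)
    unknown = [g for g in neighbors if g not in known]
    for g in unknown:
        labels[g] = rng.randint(0, 1)  # random initial state
    counts = {g: 0 for g in unknown}
    for sweep in range(sweeps):
        for g in unknown:
            n1 = sum(labels[h] for h in neighbors[g])
            n0 = len(neighbors[g]) - n1
            # Conditional probability given the current neighbor labels.
            p1 = 1.0 / (1.0 + math.exp(-(alpha + beta * (n1 - n0))))
            labels[g] = 1 if rng.random() < p1 else 0
        if sweep >= burn_in:
            for g in unknown:
                counts[g] += labels[g]
    kept = sweeps - burn_in
    return {g: counts[g] / kept for g in unknown}
```

The estimated marginals can then be used as prediction scores: candidates surrounded by known disease genes score high, and those surrounded by non-disease genes score low.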


Subjects
Computational Biology/methods; Software; Algorithms; Area Under Curve; Bayes Theorem; False Positive Reactions; Genetic Predisposition to Disease; Genome, Human; Humans; Markov Chains; Models, Theoretical; Phenotype; Probability; ROC Curve; Reproducibility of Results
7.
Proteomics ; 11(19): 3779-85, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21809447

ABSTRACT

Peptide-spectrum matching is one of the most time-consuming portions of the database search method for assigning tandem mass spectra to peptides. In this study, we develop a parallel algorithm for peptide-spectrum matching using Single-Instruction Multiple-Data (SIMD) instructions. Unlike other parallel algorithms for peptide-spectrum matching, our algorithm parallelizes the computation of matches between a single spectrum and a given peptide sequence from the database. It also significantly reduces the number of comparison operations. Extra improvements are obtained by using SIMD instructions to avoid conditional branches and unnecessary memory accesses within the algorithm. The implementation of the developed algorithm is based on the Streaming SIMD Extensions technology that is embedded in most Intel microprocessors. Similar technology also exists in other modern microprocessors. A simulation shows that the developed algorithm achieves an 18-fold speedup over the previous version of the Real-Time Peptide-Spectrum Matching algorithm [F. X. Wu et al., Rapid Commun. Mass Spectrom. 2006, 20, 1199-1208]. Therefore, the developed algorithm can be employed to develop real-time control methods for MS/MS.
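
As a point of reference for what is being parallelized, the sketch below counts matches between an observed spectrum and the theoretical fragment masses of one peptide using a scalar two-pointer scan. It is not the SIMD algorithm of the paper and makes no attempt to avoid branches; the function name, the greedy one-to-one matching rule, and the tolerance value are assumptions.

```python
from typing import Sequence


def count_matches(
    observed_mz: Sequence[float],
    theoretical_mz: Sequence[float],
    tolerance: float = 0.1,
) -> int:
    """Count peak pairs within the tolerance; both lists must be sorted.

    Each observed and each theoretical peak is matched at most once.
    """
    i = j = matches = 0
    while i < len(observed_mz) and j < len(theoretical_mz):
        diff = observed_mz[i] - theoretical_mz[j]
        if abs(diff) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # observed peak is too light; advance it
        else:
            j += 1  # theoretical peak is too light; advance it
    return matches
```

The comparisons inside this loop are the kind of work that a SIMD implementation would perform on several values per instruction.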


Subjects
Peptides/chemistry; Tandem Mass Spectrometry/methods; Algorithms; Databases, Protein; Tandem Mass Spectrometry/economics; Time Factors
8.
Int J Data Min Bioinform ; 5(1): 73-88, 2011.
Article in English | MEDLINE | ID: mdl-21491845

ABSTRACT

In the literature, hundreds of features have been proposed to assess the quality of tandem mass spectra. However, many of these features are irrelevant to describing spectrum quality, and they can degrade the performance of spectrum quality assessment. We propose a two-stage Recursive Feature Elimination method based on Support Vector Machines (SVM-RFE) to select the highly relevant features from those collected in the literature. Classifiers are trained to verify the relevance of the selected features. The results demonstrate that the selected features better describe the quality of tandem mass spectra and hence improve the performance of tandem mass spectrum quality assessment.


Subjects
Artificial Intelligence; Tandem Mass Spectrometry/methods; Algorithms; Databases, Factual; Pattern Recognition, Automated; Proteins/chemistry
9.
Article in English | MEDLINE | ID: mdl-19963684

ABSTRACT

Several computational methods have been proposed to assess the quality of tandem mass spectra. These methods range from supervised to unsupervised algorithms and from discriminative to generative models. Existing unsupervised learning algorithms for tandem mass spectra are not based on probabilistic models and do not provide probabilities for spectrum quality assessment. In this study, the distributions of high-quality and poor-quality spectra are modeled by a mixture of Gaussian distributions. The Expectation Maximization (EM) algorithm is used to estimate the parameters of the Gaussian mixture model. A spectrum is assigned to the high-quality or poor-quality cluster according to its posterior probability. Experiments are conducted on two datasets: ISB and TOV. The results show that about 57.64% and 66.38% of poor-quality spectra can be removed without losing more than 10% of high-quality spectra for the two spectral datasets, respectively. This indicates that clustering, as an exploratory data analysis tool, is valuable for the quality assessment of tandem mass spectra without using a pre-labeled training dataset.
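
The EM procedure and posterior assignment described above can be sketched for the simplest case: a two-component Gaussian mixture over a single quality score per spectrum. The paper works with multi-feature spectra, so the one-dimensional setting, the initialization, the variance floor, and the function names are all simplifying assumptions.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(
    xs: Sequence[float], n_iter: int = 100
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (weights, means, variances). Means start at the data extremes.
    """
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n or 1.0
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]
    variances = [overall_var, overall_var]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; keep old parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid collapse
    return weights, means, variances


def posterior_high_quality(
    x: float, weights: Sequence[float], means: Sequence[float], variances: Sequence[float]
) -> float:
    """Posterior probability that x belongs to the component with the larger mean."""
    p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
    total = p[0] + p[1]
    if total == 0.0:
        return 0.5  # x is far from both components; no evidence either way
    high = 0 if means[0] > means[1] else 1
    return p[high] / total
```

Thresholding this posterior is what trades off the fraction of poor-quality spectra removed against the fraction of high-quality spectra lost.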


Subjects
Models, Biological; Tandem Mass Spectrometry/methods; Tandem Mass Spectrometry/standards; Algorithms; Cluster Analysis; Databases, Protein; Humans; Quality Control
10.
IEEE Trans Inf Technol Biomed ; 13(2): 184-94, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19272861

ABSTRACT

In this paper, we present a novel method to mine, model, and evaluate a regulatory system executing cellular functions that can be represented as a biomolecular network. Our method consists of two steps. First, a novel scale-free network clustering approach is applied to such a biomolecular network to obtain various subnetworks. Second, computational models are generated for the subnetworks and simulated to predict their behavior in the cellular context. We discuss and evaluate some of the advanced computational modeling approaches, in particular, state-space modeling, probabilistic Boolean network modeling, and fuzzy logic modeling. The modeling and simulation results represent hypotheses that are tested against high-throughput biological datasets (microarrays and/or genetic screens) under normal and perturbation conditions. Experimental results on time-series gene expression data for the human cell cycle indicate that our approach is promising for subnetwork mining and simulation from large biomolecular networks.


Subjects
Databases, Genetic; Fuzzy Logic; Gene Regulatory Networks; Algorithms; Cluster Analysis; Computer Simulation; Humans; Markov Chains; Microarray Analysis; Models, Molecular
11.
BMC Bioinformatics ; 10 Suppl 1: S49, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208151

ABSTRACT

BACKGROUND: Tandem mass spectrometry has become particularly useful for the rapid identification and characterization of protein components of complex biological mixtures. Powerful database search methods have been developed for peptide identification, such as SEQUEST and MASCOT, which compare the mass spectra obtained from unknown proteins or peptides with theoretically predicted spectra derived from protein databases. However, the majority of spectra generated in a mass spectrometry experiment are of too poor quality to be interpreted, while some high-quality spectra cannot be interpreted by one method but may be interpreted by others. Hence a filtering algorithm that removes poor-quality spectra prior to the database search is appealing. RESULTS: This paper proposes a support vector machine (SVM) based approach to assess the quality of tandem mass spectra. Each mass spectrum is mapped into the 16 proposed features to describe its quality. Based on the results from SEQUEST, four SVM classifiers with the 16 features as input are trained and tested on ISB data and TOV data, respectively. The superior performance of the proposed SVM classifiers is illustrated both by comparison with existing classifiers and by validation in terms of MASCOT search results. CONCLUSION: The proposed method can be employed to effectively remove poor-quality spectra before spectral searching, and also to find more peptides or post-translational peptides from high-quality spectra using different search engines or de novo methods.


Subjects
Artificial Intelligence; Proteins/analysis; Tandem Mass Spectrometry/standards; Algorithms; Computational Biology/methods; Databases, Protein; Proteins/chemistry; Proteomics/methods; Tandem Mass Spectrometry/methods
12.
BMC Bioinformatics ; 9 Suppl 6: S13, 2008 May 28.
Article in English | MEDLINE | ID: mdl-18541048

ABSTRACT

BACKGROUND: Tandem mass spectrometry has emerged as a cornerstone of high-throughput proteomic studies, owing in part to the various high-throughput search engines used to interpret tandem mass spectra. However, the majority of experimental tandem mass spectra cannot be interpreted by any existing method. There are many reasons for this, but one of the most important is that the majority of experimental spectra are of too poor quality to be interpretable. Attempting to interpret these "uninterpretable" spectra by any method wastes time. On the other hand, some high-quality spectra cannot obtain a score high enough to be interpreted by existing search engines because there are many similar peptides in the searched database. However, such spectra may be good enough to be interpreted by de novo methods or manual verification. Therefore, it is worth developing a method for assessing spectral quality, which can be used to filter out poor-quality spectra before any interpretation attempt or to find the most promising candidates for de novo methods or manual verification. RESULTS: This paper develops a novel method to assess the quality of tandem mass spectra, which can eliminate the majority of poor-quality spectra while losing only a small minority of high-quality spectra. First, a number of features are proposed to describe the quality of tandem mass spectra. The proposed method maps each tandem spectrum into a feature vector. Then Fisher linear discriminant analysis (FLDA) is employed to construct the classifier (the filter) that discriminates high-quality spectra from poor-quality ones. The proposed method has been tested on two tandem mass spectra datasets acquired by ion trap mass spectrometers. CONCLUSION: Computational experiments illustrate that the proposed method outperforms the existing ones. The proposed method is generic, and is expected to be applicable to assessing the quality of spectra acquired by instruments other than ion trap mass spectrometers.
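
The FLDA filter described above can be sketched for two features per spectrum, where the within-class scatter matrix is 2x2 and can be inverted by hand. The discriminant direction is w = Sw^{-1} (m_high - m_poor). The two-feature restriction, the midpoint threshold, the function names, and the toy feature values are assumptions; the paper's actual features are not listed in the abstract.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _mean(points: Sequence[Point]) -> List[float]:
    return [sum(p[d] for p in points) / len(points) for d in range(2)]


def _scatter(points: Sequence[Point], mean: Sequence[float]) -> List[List[float]]:
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(
    high: Sequence[Point], poor: Sequence[Point]
) -> Tuple[List[float], float]:
    """Return (w, threshold) for 2-D features.

    A spectrum x is called high quality when w . x > threshold, where the
    threshold is the projection of the midpoint between the class means.
    """
    m_high, m_poor = _mean(high), _mean(poor)
    s_high, s_poor = _scatter(high, m_high), _scatter(poor, m_poor)
    sw = [[s_high[i][j] + s_poor[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    inv = [
        [sw[1][1] / det, -sw[0][1] / det],
        [-sw[1][0] / det, sw[0][0] / det],
    ]
    diff = [m_high[0] - m_poor[0], m_high[1] - m_poor[1]]
    w = [
        inv[0][0] * diff[0] + inv[0][1] * diff[1],
        inv[1][0] * diff[0] + inv[1][1] * diff[1],
    ]
    midpoint = [(m_high[d] + m_poor[d]) / 2.0 for d in range(2)]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold


def is_high_quality(x: Point, w: Sequence[float], threshold: float) -> bool:
    return w[0] * x[0] + w[1] * x[1] > threshold
```

In a filtering setting the threshold could be shifted away from the midpoint to lose fewer high-quality spectra at the cost of keeping more poor-quality ones.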


Subjects
Algorithms; Mass Spectrometry/methods; Peptide Mapping/methods; Peptides/chemistry; Sequence Analysis, Protein/methods; Amino Acid Sequence; Molecular Sequence Data; Quality Control
13.
J Bioinform Comput Biol ; 4(5): 959-80, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17099936

ABSTRACT

Hidden Markov models (HMMs) are one of various methods that have been applied to the prediction of major histocompatibility complex (MHC) binding peptides. In terms of model topology, a fully-connected HMM (fcHMM) has the greatest potential to predict binders, at the cost of intensive computation. While a profile HMM (pHMM) performs dramatically fewer computations, it potentially merges overlapping patterns into one, which results in some patterns being missed. In a profile HMM a state corresponds to a position on a peptide, while in an fcHMM a state has no specific biological meaning. This work proposes optimally-connected HMMs (ocHMMs), which do not merge overlapping patterns, yet whose connectivity is greatly reduced relative to an fcHMM through topological reductions. The parameters of ocHMMs are initialized using a novel amino acid grouping approach called "multiple property grouping." Each group represents a state in an ocHMM. The proposed ocHMMs are compared to a pHMM implementation using HMMER, based on performance tests on two MHC alleles, HLA (Human Leukocyte Antigen)-A*0201 and HLA-B*3501. The results show that the heuristic approaches can be adjusted to make an ocHMM achieve higher predictive accuracy than HMMER. Hence, ocHMMs obtained in this way are worth trying for predicting MHC-binding peptides.
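
Whatever the topology, scoring a peptide with an HMM comes down to the forward algorithm, sketched here for a tiny fully-connected model. This is a generic illustration, not an ocHMM: the two states, the reduced alphabet ("H" for hydrophobic, "P" for polar residues), and all probabilities are made-up assumptions.

```python
import math
from typing import Dict, Sequence

# Hypothetical two-state model over a reduced amino acid alphabet.
STATES = ["A", "B"]
START_P = {"A": 0.5, "B": 0.5}
TRANS_P = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
EMIT_P = {"A": {"H": 0.8, "P": 0.2}, "B": {"H": 0.2, "P": 0.8}}


def forward_log_likelihood(
    sequence: str,
    states: Sequence[str] = STATES,
    start_p: Dict[str, float] = START_P,
    trans_p: Dict[str, Dict[str, float]] = TRANS_P,
    emit_p: Dict[str, Dict[str, float]] = EMIT_P,
) -> float:
    """Log-probability of the sequence under the HMM (forward algorithm).

    No rescaling is applied, which is adequate for short peptides only.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    # alpha[s] = P(symbols so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][sequence[0]] for s in states}
    for symbol in sequence[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return math.log(sum(alpha.values()))
```

The cost of each step grows with the number of allowed transitions, which is why reducing connectivity relative to a fully-connected model reduces computation.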


Subjects
Artificial Intelligence; Histocompatibility Antigens Class I/chemistry; Models, Chemical; Models, Molecular; Peptides/chemistry; Protein Interaction Mapping/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Binding Sites; Computer Simulation; Markov Chains; Molecular Sequence Data; Protein Binding