1.
Article in English | MEDLINE | ID: mdl-38963747

ABSTRACT

Microarray data provide extensive information about gene expression levels. Because of the volume of such data, their analysis requires adequate computational methods for identifying and analyzing gene regulatory networks. However, researchers in this field face numerous challenges, such as the very large number of genes, the limited number of samples, and the noisy nature of the data. In this paper, a hybrid method based on fuzzy cognitive maps and compressed sensing is used to identify interactions between genes. Specifically, to infer the gene regulatory network, Ensemble Kalman filtered compressed sensing is used to learn the fuzzy cognitive map. Combining the Ensemble Kalman filter with compressed sensing makes the fuzzy cognitive map robust against noise. The proposed algorithm is evaluated using several metrics and compared with several well-known methods, such as LASSOFCM, KFRegular, and CMI2NI. The experimental results show that the proposed method outperforms recently proposed methods in terms of SSmean, Data Error, and accuracy.
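The fuzzy cognitive map (FCM) in the first abstract can be illustrated with a short sketch of its forward model. It assumes the common FCM update, in which each gene's next state is a sigmoid of the weighted sum of all current states. The code simulates a trajectory for a given weight matrix and scores it against observed data with a mean squared error. The names `fcm_step`, `simulate`, and `mean_squared_error`, and the sigmoid steepness of 5, are illustrative assumptions. The Ensemble Kalman filtered compressed sensing step that would learn the weights is not reproduced. The error shown is not necessarily the paper's Data Error metric.

```python
import math
from typing import List, Sequence


def sigmoid(z: float, steepness: float = 5.0) -> float:
    """Logistic squashing function commonly used in FCMs (steepness is an assumption)."""
    return 1.0 / (1.0 + math.exp(-steepness * z))


def fcm_step(weights: Sequence[Sequence[float]], state: Sequence[float],
             steepness: float = 5.0) -> List[float]:
    """One FCM update: x_i(t+1) = f(sum_j w_ij * x_j(t)).

    weights[i][j] is the assumed influence of gene j on gene i.
    """
    return [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, state)), steepness)
        for row in weights
    ]


def simulate(weights: Sequence[Sequence[float]], initial_state: Sequence[float],
             n_steps: int, steepness: float = 5.0) -> List[List[float]]:
    """Return the trajectory [x(0), x(1), ..., x(n_steps)] produced by the map."""
    trajectory = [list(initial_state)]
    for _ in range(n_steps):
        trajectory.append(fcm_step(weights, trajectory[-1], steepness))
    return trajectory


def mean_squared_error(simulated: Sequence[Sequence[float]],
                       observed: Sequence[Sequence[float]]) -> float:
    """Average squared difference between simulated and observed expression values."""
    diffs = [
        (s - o) ** 2
        for s_row, o_row in zip(simulated, observed)
        for s, o in zip(s_row, o_row)
    ]
    return sum(diffs) / len(diffs)
```

With an all-zero weight matrix, every gene settles at 0.5 after one step. This gives a quick sanity check of the update rule before plugging in learned weights.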

2.
PLoS One ; 18(8): e0288173, 2023.
Article in English | MEDLINE | ID: mdl-37535616

ABSTRACT

Drug discovery relies on predicting drug-target interactions (DTIs), an important and challenging task. The purpose of DTI prediction is to identify interactions between drug chemical compounds and protein targets. Traditional wet-lab experiments are time-consuming and expensive. For this reason, computational methods based on machine learning have attracted the attention of many researchers in recent years. A dry-lab environment focused on computational prediction of interactions can help limit the search space for wet-lab experiments. In this paper, a novel multi-stage approach for DTI prediction, called SRX-DTI, is proposed. In the first stage, a combination of various descriptors from protein sequences and an FP2 fingerprint encoded from each drug are extracted as feature vectors. A major challenge in this application is data imbalance caused by the lack of known interactions. To address this problem, the second stage applies the proposed One-SVM-US technique. Next, the FFS-RF algorithm, a forward feature selection algorithm coupled with a random forest (RF) classifier, is developed to maximize predictive performance. This feature selection algorithm removes irrelevant features to obtain the optimal ones. Finally, the balanced dataset with the optimal features is given to an XGBoost classifier to identify DTIs. The experimental results demonstrate that SRX-DTI achieves higher performance than other existing methods in predicting DTIs. The datasets and source code are available at: https://github.com/Khojasteh-hb/SRX-DTI.


Subjects
Proteins , Software , Proteins/chemistry , Computer Simulation , Amino Acid Sequence , Drug Discovery/methods , Drug Interactions
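The FFS-RF stage in the SRX-DTI abstract is a greedy forward feature selection wrapper. Below is a minimal sketch of greedy forward selection. It assumes a generic higher-is-better `score` callable in place of the random forest evaluation. `forward_selection` is a hypothetical helper name, and the One-SVM-US balancing and XGBoost stages are not covered.

```python
from typing import Callable, List, Optional


def forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
    max_features: Optional[int] = None,
) -> List[int]:
    """Greedily add the single feature that most improves score(subset).

    score is any higher-is-better callable; in FFS-RF it would be a
    random-forest evaluation, here it is left abstract (assumption).
    """
    selected: List[int] = []
    best_score = float("-inf")
    limit = n_features if max_features is None else max_features
    while len(selected) < limit:
        best_candidate = None
        candidate_score = best_score
        for f in range(n_features):
            if f in selected:
                continue
            s = score(selected + [f])
            if s > candidate_score:  # strict: ties keep the earlier feature
                best_candidate, candidate_score = f, s
        if best_candidate is None:  # no remaining feature improves the score
            break
        selected.append(best_candidate)
        best_score = candidate_score
    return selected
```

In practice, `score` would train and cross-validate a classifier on each candidate subset. Each greedy step therefore costs one model fit per remaining feature.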
3.
Sci Rep ; 13(1): 3594, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36869062

ABSTRACT

Drug-target interaction prediction is a vital stage in drug development, and many methods have been developed for it. Experimental methods that identify these relationships on the basis of clinical remedies are time-consuming, costly, laborious, and complex, which introduces many challenges. Computational methods are a newer group of approaches. More accurate computational methods can be preferable to experimental ones in terms of total cost and time. In this paper, a new computational model to predict drug-target interaction (DTI) is proposed. It consists of three phases: feature extraction, feature selection, and classification. In the feature extraction phase, features such as EAAC and PSSM are extracted from protein sequences, and fingerprint features are extracted from drugs. These features are then combined. Next, because of the large amount of extracted data, a wrapper feature selection method named IWSSR is applied. The selected features are then given to a rotation forest classifier for more efficient prediction. The innovation of this work is the extraction of diverse features followed by feature selection with IWSSR. Using tenfold cross-validation on the gold standard datasets (enzymes, ion channels, G-protein-coupled receptors, and nuclear receptors), the rotation forest classifier achieves accuracies of 98.12%, 98.07%, 96.82%, and 95.64%, respectively. The experimental results indicate that the proposed model achieves an acceptable DTI prediction rate and is comparable with the methods proposed in other papers.


Subjects
Drug Delivery Systems , Labor, Obstetric , Pregnancy , Female , Humans , Drug Development , Forests , Protein Domains
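IWSSR (Incremental Wrapper Subset Selection with Replacement) walks through features in filter-ranked order. At each step it either appends the next feature or swaps it in for one already selected, keeping whichever move improves the wrapper score. The sketch below is a simplified version under stated assumptions. `evaluate` is a hypothetical higher-is-better callable, for example cross-validated rotation forest accuracy. Only strict improvements are accepted, without any additional acceptance criterion that published versions may apply.

```python
from typing import Callable, List, Sequence, Tuple


def iwssr(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Simplified Incremental Wrapper Subset Selection with Replacement.

    ranked: feature indices sorted by a filter score, best first.
    evaluate: hypothetical wrapper score of a subset, higher is better.
    """
    selected = [ranked[0]]
    best = evaluate(selected)
    for f in ranked[1:]:
        # Candidate moves: append f, or swap f in for each selected feature.
        candidates = [selected + [f]]
        candidates += [
            selected[:i] + [f] + selected[i + 1:] for i in range(len(selected))
        ]
        move, move_score = None, best
        for cand in candidates:
            s = evaluate(cand)
            if s > move_score:
                move, move_score = cand, s
        if move is not None:  # accept only strict improvements
            selected, best = move, move_score
    return selected, best
```

The replacement moves let a weak early feature be dropped later without restarting the search. This is the main difference from plain incremental wrappers.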
4.
Sci Rep ; 12(1): 5756, 2022 04 06.
Article in English | MEDLINE | ID: mdl-35388017

ABSTRACT

Lysine malonylation is one of the most important post-translational modifications (PTMs), and it affects cell functionality. Predicting malonylation sites in proteins can help reveal the mechanisms of cellular functions. Experimental methods are one way to identify these sites, but they are typically costly and time-consuming to implement. Recently, machine learning-based methods have been proposed to tackle this problem. These methods have been shown to reduce cost and time while increasing accuracy. However, they also have specific shortcomings, including inappropriate feature extraction from protein sequences, high-dimensional features, and inefficient underlying classifiers. This paper proposes a machine learning-based method to address these problems. In the proposed approach, seven different features are extracted. The extracted features are then combined and ranked by Fisher's score (F-score), and the most efficient ones are selected. Afterward, malonylation sites are predicted using various classifiers. Simulation results show that the proposed method performs acceptably compared with some state-of-the-art approaches. In addition, the XGBoost classifier, using extracted features such as TFCRF, has a higher prediction rate than the other methods. The code is publicly available at: https://github.com/jimy2020/Malonylation-site-prediction.


Subjects
Lysine , Protein Processing, Post-Translational , Amino Acid Sequence , Lysine/metabolism , Machine Learning , Proteins/metabolism
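The F-score ranking step in the malonylation abstract can be sketched for binary-labelled features. This sketch assumes the widely used two-class F-score definition: the squared distance of each class mean from the overall mean, divided by the sum of the two within-class sample variances. Whether the paper uses exactly this normalization is an assumption, and `f_score` and `rank_features` are illustrative names.

```python
from statistics import mean, variance
from typing import List, Sequence


def f_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """Two-class F-score of one feature (labels: 1 = positive, 0 = negative).

    Each class needs at least two samples for the sample variance.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)  # sample variances (n - 1)
    return numerator / denominator if denominator > 0 else float("inf")


def rank_features(X: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(X[0])
    scores = [f_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A top-k cut of `rank_features` would then give the reduced feature set passed to the classifiers.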
5.
BMC Bioinformatics ; 22(1): 555, 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34789169

ABSTRACT

BACKGROUND: Wet-lab experiments for identifying interactions between drugs and target proteins are time-consuming, costly, and labor-intensive. Computational prediction of drug-target interactions (DTIs), a significant step in drug discovery, has been considered by many researchers in recent years. It also reduces the search space of interactions by proposing potential interaction candidates. RESULTS: In this paper, a new approach that unifies matrix factorization and nuclear norm minimization is proposed to find a low-rank interaction matrix. To solve the low-rank matrix approximation, the DTI problem is formulated so that the nuclear-norm regularized problem is optimized by a bilinear factorization based on Rank-Restricted Soft Singular Value Decomposition (RRSSVD). In the proposed method, adjacencies between drugs and targets are encoded by graphs. The drug-target interactions, drug-drug similarities, target-target similarities, and combinations of these similarities are used as input. CONCLUSIONS: The proposed method is evaluated on four benchmark datasets: Enzymes (E), Ion channels (ICs), G protein-coupled receptors (GPCRs), and nuclear receptors (NRs). The evaluation uses AUC, AUPR, and running time. The results show that the proposed method improves performance compared with state-of-the-art techniques.


Subjects
Algorithms , Pharmaceutical Preparations , Drug Development , Drug Discovery , Proteins
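For readers unfamiliar with the nuclear-norm formulation in the fifth abstract, the generic problem and the identity that links it to a bilinear factorization can be written as follows. The symbols are assumptions: Y is the observed drug-target interaction matrix, Omega the set of observed entries, P_Omega the projection onto them, lambda a regularization weight, and k a rank bound. The paper's exact objective, which also incorporates similarity graphs, may differ.

```latex
\min_{X}\; \tfrac{1}{2}\bigl\lVert P_{\Omega}(Y - X)\bigr\rVert_F^{2} + \lambda\,\lVert X\rVert_{*}
\qquad
\lVert X\rVert_{*} = \min_{UV^{\top} = X}\; \tfrac{1}{2}\bigl(\lVert U\rVert_F^{2} + \lVert V\rVert_F^{2}\bigr),
\quad U \in \mathbb{R}^{m\times k},\; V \in \mathbb{R}^{n\times k},\; k \ge \operatorname{rank} X
\\[4pt]
\Longrightarrow\;
\min_{U,V}\; \tfrac{1}{2}\bigl\lVert P_{\Omega}(Y - UV^{\top})\bigr\rVert_F^{2}
+ \tfrac{\lambda}{2}\bigl(\lVert U\rVert_F^{2} + \lVert V\rVert_F^{2}\bigr)
\\[4pt]
S_{\lambda}(Z) = U_Z\,\operatorname{diag}\bigl(\max(\sigma_i - \lambda,\,0)\bigr)\,V_Z^{\top},
\qquad Z = U_Z\,\operatorname{diag}(\sigma_i)\,V_Z^{\top}
```

The soft-thresholding operator S_lambda is the proximal operator of lambda times the nuclear norm. Rank-restricted variants compute only the top k singular triplets, which keeps each iteration cheap for large interaction matrices.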
6.
J Bioinform Comput Biol ; 19(2): 2150002, 2021 04.
Article in English | MEDLINE | ID: mdl-33657986

ABSTRACT

A central problem of systems biology is the reconstruction of gene regulatory networks (GRNs) from time series data. Although many attempts have been made to design an efficient method for GRN inference, finding the best solution is still a challenging task. Noise, the low number of samples, and the high number of nodes are the main causes of the poor performance of existing methods. The present study applies the ensemble Kalman filter algorithm to model a GRN from gene time series data. The inference of a GRN with p genes is decomposed into p subproblems. In each subproblem, the ensemble Kalman filter algorithm identifies the interaction weights for one target gene. To do so, it predicts the expression pattern of the target gene from the expression patterns of all the remaining genes. The proposed method is compared with several well-known approaches. The evaluation results indicate that the proposed method improves inference accuracy and recovers regulatory relations better from noisy data.


Subjects
Algorithms , Gene Regulatory Networks , Systems Biology , Time Factors
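The per-gene decomposition described in the sixth abstract can be sketched as p independent regressions. Each is solved with a stochastic (perturbed-observation) ensemble Kalman filter that treats the target gene's weights as a static state. This assumes a linear model in which gene i at time t+1 is a weighted sum of the other genes at time t. `enkf_weights`, `infer_grn`, the ensemble size, the prior spread, and the observation variance are illustrative choices, not values from the paper.

```python
import random
from typing import Dict, List, Sequence


def enkf_weights(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_ens: int = 100,
    obs_var: float = 0.01,
    prior_std: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Estimate w in y[t] ~ sum_j w[j] * X[t][j] with a perturbed-observation EnKF."""
    rng = random.Random(seed)
    d = len(X[0])
    # Each ensemble member is one guess of the weight vector, drawn from the prior.
    ens = [[rng.gauss(0.0, prior_std) for _ in range(d)] for _ in range(n_ens)]
    for x_t, y_t in zip(X, y):
        preds = [sum(w[j] * x_t[j] for j in range(d)) for w in ens]
        p_mean = sum(preds) / n_ens
        w_mean = [sum(w[j] for w in ens) / n_ens for j in range(d)]
        # Sample cross-covariance between each weight and the predicted observation.
        cov_wy = [
            sum((w[j] - w_mean[j]) * (p - p_mean) for w, p in zip(ens, preds)) / (n_ens - 1)
            for j in range(d)
        ]
        # Innovation variance: spread of predictions plus observation noise.
        var_y = sum((p - p_mean) ** 2 for p in preds) / (n_ens - 1) + obs_var
        gain = [c / var_y for c in cov_wy]
        for w, p in zip(ens, preds):
            perturbed = y_t + rng.gauss(0.0, obs_var ** 0.5)
            for j in range(d):
                w[j] += gain[j] * (perturbed - p)
    return [sum(w[j] for w in ens) / n_ens for j in range(d)]


def infer_grn(series: Sequence[Sequence[float]], **kwargs) -> Dict[int, Dict[int, float]]:
    """Split p-gene inference into p subproblems: gene i at t+1 from the others at t."""
    p = len(series[0])
    network: Dict[int, Dict[int, float]] = {}
    for i in range(p):
        others = [j for j in range(p) if j != i]
        X = [[series[t][j] for j in others] for t in range(len(series) - 1)]
        y = [series[t + 1][i] for t in range(len(series) - 1)]
        network[i] = dict(zip(others, enkf_weights(X, y, **kwargs)))
    return network
```

Because the weights are static, the ensemble spread shrinks as observations arrive, much like a Kalman filter posterior. The absolute weights in the returned dictionaries could then be thresholded to obtain candidate edges.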
7.
Data Brief ; 32: 106144, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32835040

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the COVID-19 pandemic. It was first detected in China and rapidly spread to other countries. Several thousand whole-genome sequences of SARS-CoV-2 have been reported, and it is important to compare them and identify distinctive evolutionary and mutation markers. Using chaos game representation (CGR) together with recurrence quantification analysis (RQA), a powerful nonlinear analysis technique, we proposed an effective process to extract several valuable features from SARS-CoV-2 genomic sequences. The extracted features make it possible to compare genomic sequences of different lengths. The provided dataset contains a total of 18 RQA-based features for 4496 SARS-CoV-2 instances.
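The chaos game representation step in the seventh abstract maps a nucleotide sequence to points in the unit square. Recurrence quantification measures, such as the recurrence rate, then summarize how often those points revisit the same neighbourhood. The sketch below assumes the classic corner layout (A, C, G, T at (0,0), (0,1), (1,1), (1,0)). It also treats the CGR points directly as the phase-space trajectory and skips ambiguous symbols. It illustrates the two techniques; it is not a reconstruction of the 18 features in the dataset.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Classic CGR corner layout (assumption; some papers permute the corners).
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr(sequence: str) -> List[Point]:
    """Chaos game representation: move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points: List[Point] = []
    for base in sequence.upper():
        if base not in CORNERS:  # skip ambiguous symbols such as N
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def recurrence_rate(points: Sequence[Point], eps: float) -> float:
    """Fraction of point pairs (i, j), including i == j, closer than eps."""
    n = len(points)
    if n == 0:
        return 0.0
    close = sum(1 for p in points for q in points if math.dist(p, q) < eps)
    return close / (n * n)
```

The recurrence rate is normalized by the number of point pairs, so it can be compared across sequences of different lengths. The quadratic pair loop is only practical for short fragments, not whole genomes.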

8.
Sci Rep ; 9(1): 18580, 2019 12 09.
Article in English | MEDLINE | ID: mdl-31819106

ABSTRACT

The feature selection problem is one of the most significant issues in data classification. The purpose of feature selection is to select the smallest number of features that increases accuracy and decreases the cost of classification. In recent years, high-dimensional datasets with few samples have appeared, and classification models trained on them have encountered over-fitting. Therefore, feature selection methods that remove redundant and irrelevant features are needed. Various methods have recently been proposed for selecting the optimal subset of features with high precision. However, they suffer from problems such as instability, long convergence time, and selection of a suboptimal solution as the final result. In other words, they have not been able to fully extract the effective features. In this paper, a hybrid method based on the IWSSr method and the Shuffled Frog Leaping Algorithm (SFLA) is proposed to select effective features in a large-scale gene dataset. The proposed algorithm is implemented in two phases: filtering and wrapping. In the filter phase, the Relief method is used to weight features. Then, in the wrapping phase, the SFLA and IWSSr algorithms search for effective features within a feature-rich region. The proposed method is evaluated on several standard gene expression datasets. The experimental results confirm that, compared with similar methods, the proposed approach achieves a more compact feature set with high accuracy. The source code and testing datasets are available at https://github.com/jimy2020/SFLA_IWSSr-Feature-Selection.


Subjects
Data Interpretation, Statistical , Neoplasms/genetics , Algorithms , Computer Simulation , Databases, Genetic , Datasets as Topic , Female , Gene Expression Profiling , Gene Expression Regulation , Humans , Machine Learning , Male , Oligonucleotide Array Sequence Analysis , Reproducibility of Results , Sensitivity and Specificity , Software
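The filter phase in the eighth abstract weights features with Relief. A minimal two-class sketch follows. It uses range-normalized absolute differences and a Manhattan-style distance to find each instance's nearest hit (same class) and nearest miss (other class), which are assumptions. Unlike the original randomized procedure, it visits every instance once. `relief` is an illustrative name, and the SFLA/IWSSr wrapping phase is not shown.

```python
from typing import List, Sequence


def relief(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Basic two-class Relief weights, visiting every instance once.

    Assumes each class has at least two instances.
    """
    n, d = len(X), len(X[0])
    lows = [min(row[f] for row in X) for f in range(d)]
    highs = [max(row[f] for row in X) for f in range(d)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]  # avoid divide-by-zero

    def diff(f: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[f] - b[f]) / spans[f]

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        # Reward features that separate classes; penalize those that vary within a class.
        for f in range(d):
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return weights
```

Features with higher weights would be the ones the wrapping phase concentrates its search on.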
9.
J Bioinform Comput Biol ; 17(3): 1950018, 2019 06.
Article in English | MEDLINE | ID: mdl-31288638

ABSTRACT

In this study, to deal with noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, which can use prior knowledge, were used to infer the gene regulatory network. Learning networks are methods that combine a network structure with a learning process for obtaining relationships. Correlation metrics are one way of measuring the relationship between genes, but highly correlated genes do not necessarily have a causal effect on each other. Common methods for inferring gene regulatory networks have yet to pay attention to biological importance, so their predictions are less accurate in terms of biological significance. Hence, in the proposed method, clustering was used to group highly correlated genes into one cluster, and edges between genes in the same cluster were prohibited. Finally, after Bayesian network modeling, a refinement phase based on the knowledge gained from clustering improved the regulatory interactions using biological correlation. To show its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The evaluation results indicate that the proposed method recognized regulatory relations well in the Bayesian modeling process. This is due to its use of biological knowledge hidden in the data collection. The method can also recognize gene regulatory networks in line with the important methods in this field.


Subjects
Computational Biology/methods , Gene Regulatory Networks , Bayes Theorem , Cluster Analysis , Databases, Genetic , Models, Genetic , ROC Curve
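The clustering constraint in the ninth abstract, which prevents edges between highly correlated genes, can be sketched as a candidate-edge mask. This assumes a simple single-linkage grouping of genes whose absolute Pearson correlation exceeds a threshold. The abstract does not give the actual clustering algorithm or threshold. `correlation_clusters` and `allowed_edges` are hypothetical helpers whose output would restrict the Bayesian network structure search.

```python
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Set, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def correlation_clusters(profiles: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Group genes whose |r| exceeds threshold (single linkage via union-find)."""
    parent = list(range(len(profiles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if abs(pearson(profiles[i], profiles[j])) > threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(profiles))]


def allowed_edges(cluster_ids: Sequence[int]) -> Set[Tuple[int, int]]:
    """Directed candidate edges (i -> j), excluding pairs inside the same cluster."""
    n = len(cluster_ids)
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and cluster_ids[i] != cluster_ids[j]
    }
```

A structure learner would then score only the edges in this set. This shrinks the search space and encodes the assumption that strong correlation alone does not imply a direct regulatory link.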
10.
PLoS One ; 13(7): e0200094, 2018.
Article in English | MEDLINE | ID: mdl-30001352

ABSTRACT

The reconstruction of the topology of gene regulatory networks (GRNs) from high-throughput genomic data, such as microarray gene expression data, is an important problem in systems biology. The main challenge with gene expression data is the high number of genes and the low number of samples, and the data are often noisy. In this paper, to deal with noisy data, a Kalman filter-based method that can use prior knowledge when learning the network was used. In the first phase of the proposed method, named KFLR, mutual information was used to remove noisy regulations with low correlations. The proposed method used a new closed-form solution to compute the posterior probabilities of the edges from regulators to the target gene. This computation takes place within a hybrid framework of Bayesian model averaging and linear regression. To show its efficiency, the proposed method was compared with several well-known methods. The evaluation results indicate that the proposed method improved inference accuracy and recovered regulatory relations better from noisy data.


Subjects
Gene Regulatory Networks , Algorithms , Bayes Theorem , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/statistics & numerical data , Linear Models , Models, Genetic , ROC Curve
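The first KFLR phase in the last abstract removes weak candidate regulations using mutual information. A minimal sketch is shown below. It assumes jointly Gaussian expression values, so that MI can be computed from the Pearson correlation as I = -0.5 ln(1 - r^2). `gaussian_mi`, `filter_regulators`, and the threshold are illustrative, and the Kalman filter and Bayesian model averaging phases are not reproduced.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def gaussian_mi(a: Sequence[float], b: Sequence[float]) -> float:
    """MI in nats under a joint-Gaussian assumption: -0.5 * ln(1 - r^2)."""
    r = pearson(a, b)
    r2 = min(r * r, 1.0 - 1e-12)  # guard against log(0) for perfectly correlated inputs
    return -0.5 * math.log(1.0 - r2)


def filter_regulators(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Keep candidate regulators whose estimated MI with the target reaches threshold."""
    return [
        name
        for name, profile in candidates.items()
        if gaussian_mi(target, profile) >= threshold
    ]
```

Only the regulators that survive this filter would be passed on to the edge-posterior computation. This keeps the later model averaging over a much smaller candidate set.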