Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 18 de 18
Filtrar
1.
Nucleic Acids Res ; 43(5): e31, 2015 Mar 11.
Artigo em Inglês | MEDLINE | ID: mdl-25539927

RESUMO

Mutual information (MI), a quantity describing the nonlinear dependence between two random variables, has been widely used to construct gene regulatory networks (GRNs). Despite its good performance, MI cannot separate the direct regulations from indirect ones among genes. Although the conditional mutual information (CMI) is able to identify the direct regulations, it generally underestimates the regulation strength, i.e. it may result in false negatives when inferring gene regulations. In this work, to overcome the problems, we propose a novel concept, namely conditional mutual inclusive information (CMI2), to describe the regulations between genes. Furthermore, with CMI2, we develop a new approach, namely CMI2NI (CMI2-based network inference), for reverse-engineering GRNs. In CMI2NI, CMI2 is used to quantify the mutual information between two genes given a third one through calculating the Kullback-Leibler divergence between the postulated distributions of including and excluding the edge between the two genes. The benchmark results on the GRNs from DREAM challenge as well as the SOS DNA repair network in Escherichia coli demonstrate the superior performance of CMI2NI. Specifically, even for gene expression data with small sample size, CMI2NI can not only infer the correct topology of the regulation networks but also accurately quantify the regulation strength between genes. As a case study, CMI2NI was also used to reconstruct cancer-specific GRNs using gene expression data from The Cancer Genome Atlas (TCGA). CMI2NI is freely accessible at http://www.comp-sysbio.org/cmi2ni.


Assuntos
Algoritmos , Biologia Computacional/métodos , Regulação da Expressão Gênica , Redes Reguladoras de Genes/genética , Modelos Genéticos , Doença Aguda , Simulação por Computador , Escherichia coli/genética , Internet , Leucemia Mieloide/genética , Reprodutibilidade dos Testes , Saccharomyces cerevisiae/genética
2.
Bioinformatics ; 31(8): 1226-34, 2015 Apr 15.
Artigo em Inglês | MEDLINE | ID: mdl-25505085

RESUMO

MOTIVATION: MicroRNAs (miRNAs) are short non-coding RNAs that play important roles in post-transcriptional regulations as well as other important biological processes. Recently, accumulating evidences indicate that miRNAs are extensively involved in cancer. However, it is a big challenge to identify which miRNAs are related to which cancer considering the complex processes involved in tumors, where one miRNA may target hundreds or even thousands of genes and one gene may regulate multiple miRNAs. Despite integrative analysis of matched gene and miRNA expression data can help identify cancer-associated miRNAs, such kind of data is not commonly available. On the other hand, there are huge amount of gene expression data that are publicly accessible. It will significantly improve the efficiency of characterizing miRNA's function in cancer if we can identify cancer miRNAs directly from gene expression data. RESULTS: We present a novel computational framework to identify the cancer-related miRNAs based solely on gene expression profiles without requiring either miRNA expression data or the matched gene and miRNA expression data. The results on multiple cancer datasets show that our proposed method can effectively identify cancer-related miRNAs with higher precision compared with other popular approaches. Furthermore, some of our novel predictions are validated by both differentially expressed miRNAs and evidences from literature, implying the predictive power of our proposed method. In addition, we construct a cancer-miRNA-pathway network, which can help explain how miRNAs are involved in cancer. AVAILABILITY AND IMPLEMENTATION: The R code and data files for the proposed method are available at http://comp-sysbio.org/miR_Path/ CONTACT: liukeq@gmail.com SUPPLEMENTARY INFORMATION: supplementary data are available at Bioinformatics online.


Assuntos
Biomarcadores Tumorais/genética , Biologia Computacional/métodos , Perfilação da Expressão Gênica/métodos , Regulação Neoplásica da Expressão Gênica , MicroRNAs/genética , Neoplasias/genética , Redes Reguladoras de Genes , Humanos , MicroRNAs/análise , Neoplasias/diagnóstico , Neoplasias/metabolismo , RNA Mensageiro/metabolismo
3.
BMC Bioinformatics ; 15: 156, 2014 May 21.
Artigo em Inglês | MEDLINE | ID: mdl-24885988

RESUMO

BACKGROUND: The locations of the TM segments inside the membrane proteins are the consequence of a cascade of several events: the localizing of the nascent chain to the membrane, its insertion through the translocon, and the conformation adopted to reach its stable state inside the lipid bilayer. Even though the hydrophobic h-region of signal peptides and a typical TM segment are both composed of mostly hydrophobic side chains, the translocon has the ability to determine whether a given segment is to be inserted into the membrane. Our goal is to acquire robust biological insights into the influence of the translocon on membrane insertion of helices, obtained from the in silico discrimination between signal peptides and transmembrane segments of bitopic proteins. Therefore, by exploiting this subtle difference, we produce an optimized scale that evaluates the tendency of each amino acid to form sequences destined for membrane insertion by the translocon. RESULTS: The learning phase of our approach is conducted on carefully chosen data and easily converges on an optimal solution called the PMIscale (Potential Membrane Insertion scale). Our study leads to two striking results. Firstly, with a very simple sliding-window prediction method, PMIscale enables an efficient discrimination between signal peptides and signal anchors. Secondly, PMIscale is also able to identify TM segments and to localize them within protein sequences. CONCLUSIONS: Despite its simplicity, the localization method based on PMIscale nearly attains the highest level of TM topography prediction accuracy as the current state-of-the-art prediction methods. These observations confirm the prominent role of the translocon in the localization of TM segments and suggest several biological hypotheses about the physical properties of the translocon.


Assuntos
Algoritmos , Proteínas de Membrana/química , Sequência de Aminoácidos , Aminoácidos/química , Simulação por Computador , Interações Hidrofóbicas e Hidrofílicas , Proteínas de Membrana/metabolismo , Sinais Direcionadores de Proteínas , Estrutura Secundária de Proteína
4.
Bioinformatics ; 29(1): 106-13, 2013 Jan 01.
Artigo em Inglês | MEDLINE | ID: mdl-23080116

RESUMO

MOTIVATION: Reconstruction of gene regulatory networks (GRNs) is of utmost interest to biologists and is vital for understanding the complex regulatory mechanisms within the cell. Despite various methods developed for reconstruction of GRNs from gene expression profiles, they are notorious for high false positive rate owing to the noise inherited in the data, especially for the dataset with a large number of genes but a small number of samples. RESULTS: In this work, we present a novel method, namely NARROMI, to improve the accuracy of GRN inference by combining ordinary differential equation-based recursive optimization (RO) and information theory-based mutual information (MI). In the proposed algorithm, the noisy regulations with low pairwise correlations are first removed by using MI, and the redundant regulations from indirect regulators are further excluded by RO to improve the accuracy of inferred GRNs. In particular, the RO step can help to determine regulatory directions without prior knowledge of regulators. The results on benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge and experimentally determined GRN of Escherichia coli show that NARROMI significantly outperforms other popular methods in terms of false positive rates and accuracy. AVAILABILITY: All the source data and code are available at: http://csb.shu.edu.cn/narromi.htm.


Assuntos
Algoritmos , Redes Reguladoras de Genes , Escherichia coli/genética , Transcriptoma
5.
Bioinformatics ; 28(1): 98-104, 2012 Jan 01.
Artigo em Inglês | MEDLINE | ID: mdl-22088843

RESUMO

MOTIVATION: Reconstruction of gene regulatory networks (GRNs), which explicitly represent the causality of developmental or regulatory process, is of utmost interest and has become a challenging computational problem for understanding the complex regulatory mechanisms in cellular systems. However, all existing methods of inferring GRNs from gene expression profiles have their strengths and weaknesses. In particular, many properties of GRNs, such as topology sparseness and non-linear dependence, are generally in regulation mechanism but seldom are taken into account simultaneously in one computational method. RESULTS: In this work, we present a novel method for inferring GRNs from gene expression data considering the non-linear dependence and topological structure of GRNs by employing path consistency algorithm (PCA) based on conditional mutual information (CMI). In this algorithm, the conditional dependence between a pair of genes is represented by the CMI between them. With the general hypothesis of Gaussian distribution underlying gene expression data, CMI between a pair of genes is computed by a concise formula involving the covariance matrices of the related gene expression profiles. The method is validated on the benchmark GRNs from the DREAM challenge and the widely used SOS DNA repair network in Escherichia coli. The cross-validation results confirmed the effectiveness of our method (PCA-CMI), which outperforms significantly other previous methods. Besides its high accuracy, our method is able to distinguish direct (or causal) interactions from indirect associations. AVAILABILITY: All the source data and code are available at: http://csb.shu.edu.cn/subweb/grn.htm. CONTACT: lnchen@sibs.ac.cn; zpliu@sibs.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Escherichia coli/genética , Redes Reguladoras de Genes , Análise de Componente Principal , Algoritmos , Perfilação da Expressão Gênica , Resposta SOS em Genética
6.
PeerJ Comput Sci ; 9: e1597, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-37869465

RESUMO

Product development projects usually contain many interrelated activities with complex information dependences, which induce activity rework, project delay and cost overrun. To reduce negative impacts, scheduling interrelated activities in an appropriate sequence is an important issue for project managers. This study develops a double-decomposition based parallel branch-and-prune algorithm, to determine the optimal activity sequence that minimizes the total feedback length (FLMP). This algorithm decomposes FLMP from two perspectives, which enables the use of all available computing resources to solve subproblems concurrently. In addition, we propose a result-compression strategy and a hash-address strategy to enhance this algorithm. Experimental results indicate that our algorithm can find the optimal sequence for FLMP up to 27 activities within 1 h, and outperforms state of the art exact algorithms.

7.
PeerJ Comput Sci ; 9: e1245, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-37346631

RESUMO

Given a directed graph G = (V, E), a feedback vertex set is a vertex subset C whose removal makes the graph G acyclic. The feedback vertex set problem is to find the subset C* whose cardinality is the minimum. As a general model, this problem has a variety of applications. However, the problem is known to be NP-hard, and thus computationally challenging. To solve this difficult problem, this article develops an iterated dynamic thresholding search algorithm, which features a combination of local optimization, dynamic thresholding search, and perturbation. Computational experiments on 101 benchmark graphs from various sources demonstrate the advantage of the algorithm compared with the state-of-the-art algorithms, by reporting record-breaking best solutions for 24 graphs, equally best results for 75 graphs, and worse best results for only two graphs. We also study how the key components of the algorithm affect its performance of the algorithm.

8.
BMC Bioinformatics ; 13 Suppl 7: S11, 2012 May 08.
Artigo em Inglês | MEDLINE | ID: mdl-22594997

RESUMO

BACKGROUND: Biclustering aims at finding subgroups of genes that show highly correlated behaviors across a subgroup of conditions. Biclustering is a very useful tool for mining microarray data and has various practical applications. From a computational point of view, biclustering is a highly combinatorial search problem and can be solved with optimization methods. RESULTS: We describe a stochastic pattern-driven neighborhood search algorithm for the biclustering problem. Starting from an initial bicluster, the proposed method improves progressively the quality of the bicluster by adjusting some genes and conditions. The adjustments are based on the quality of each gene and condition with respect to the bicluster and the initial data matrix. The performance of the method was evaluated on two well-known microarray datasets (Yeast cell cycle and Saccharomyces cerevisiae), showing that it is able to obtain statistically and biologically significant biclusters. The proposed method was also compared with six reference methods from the literature. CONCLUSIONS: The proposed method is computationally fast and can be applied to discover significant biclusters. It can also used to effectively improve the quality of existing biclusters provided by other biclustering methods.


Assuntos
Algoritmos , Análise por Conglomerados , Saccharomyces cerevisiae/genética , Ciclo Celular , Perfilação da Expressão Gênica , Análise de Sequência com Séries de Oligonucleotídeos , Saccharomyces cerevisiae/citologia
9.
BMC Bioinformatics ; 13: 126, 2012 Jun 07.
Artigo em Inglês | MEDLINE | ID: mdl-22676414

RESUMO

BACKGROUND: Cancers, a group of multifactorial complex diseases, are generally caused by mutation of multiple genes or dysregulation of pathways. Identifying biomarkers that can characterize cancers would help to understand and diagnose cancers. Traditional computational methods that detect genes differentially expressed between cancer and normal samples fail to work due to small sample size and independent assumption among genes. On the other hand, genes work in concert to perform their functions. Therefore, it is expected that dysregulated pathways will serve as better biomarkers compared with single genes. RESULTS: In this paper, we propose a novel approach to identify dysregulated pathways in cancer based on a pathway interaction network. Our contribution is three-fold. Firstly, we present a new method to construct pathway interaction network based on gene expression, protein-protein interactions and cellular pathways. Secondly, the identification of dysregulated pathways in cancer is treated as a feature selection problem, which is biologically reasonable and easy to interpret. Thirdly, the dysregulated pathways are identified as subnetworks from the pathway interaction networks, where the subnetworks characterize very well the functional dependency or crosstalk between pathways. The benchmarking results on several distinct cancer datasets demonstrate that our method can obtain more reliable and accurate results compared with existing state of the art methods. Further functional analysis and independent literature evidence also confirm that our identified potential pathogenic pathways are biologically reasonable, indicating the effectiveness of our method. CONCLUSIONS: Dysregulated pathways can serve as better biomarkers compared with single genes. In this work, by utilizing pathway interaction networks and gene expression data, we propose a novel approach that effectively identifies dysregulated pathways, which can not only be used as biomarkers to diagnose cancers but also serve as potential drug targets in the future.


Assuntos
Biologia Computacional/métodos , Expressão Gênica , Redes e Vias Metabólicas , Neoplasias/metabolismo , Mapas de Interação de Proteínas , Biomarcadores Tumorais/análise , Humanos , Neoplasias/genética
10.
Brief Bioinform ; 11(1): 127-41, 2010 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-19789265

RESUMO

Gene selection aims at identifying a (small) subset of informative genes from the initial data in order to obtain high predictive accuracy for classification. Gene selection can be considered as a combinatorial search problem and thus be conveniently handled with optimization methods. In this article, we summarize some recent developments of using metaheuristic-based methods within an embedded approach for gene selection. In particular, we put forward the importance and usefulness of integrating problem-specific knowledge into the search operators of such a method. To illustrate the point, we explain how ranking coefficients of a linear classifier such as support vector machine (SVM) can be profitably used to reinforce the search efficiency of Local Search and Evolutionary Search metaheuristic algorithms for gene selection and classification.


Assuntos
Genes , Análise de Sequência com Séries de Oligonucleotídeos , Algoritmos
11.
PeerJ Comput Sci ; 8: e972, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-35721414

RESUMO

The Clustered Traveling Salesman Problem (CTSP) is a variant of the popular Traveling Salesman Problem (TSP) arising from a number of real-life applications. In this work, we explore a transformation approach that solves the CTSP by converting it to the well-studied TSP. For this purpose, we first investigate a technique to convert a CTSP instance to a TSP and then apply powerful TSP solvers (including exact and heuristic solvers) to solve the resulting TSP instance. We want to answer the following questions: How do state-of-the-art TSP solvers perform on clustered instances converted from the CTSP? Do state-of-the-art TSP solvers compete well with the best performing methods specifically designed for the CTSP? For this purpose, we present intensive computational experiments on various benchmark instances to draw conclusions.

12.
IEEE Trans Cybern ; 52(9): 9391-9403, 2022 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-33635807

RESUMO

The clique partitioning problem (CPP) of an edge-weighted complete graph is to partition the vertex set V into k disjoint subsets such that the sum of the edge weights within all cliques induced by the subsets is as large as possible. The problem has a number of practical applications in areas, such as data mining, engineering, and bioinformatics, and is, however, computationally challenging. To solve this NP-hard problem, we propose the first evolutionary algorithm that combines a dedicated merge-divide crossover operator to generate offspring solutions and an effective simulated annealing-based local optimization procedure to find high-quality local optima. The extensive experiments on three sets of 94 benchmark instances (including two sets of 63 classical benchmark instances and one new set of 31 large benchmark) show a remarkable performance of the proposed approach compared to the state-of-the-art methods. We analyze the key algorithmic ingredients to shed light on their impacts on the performance of the algorithm. The algorithm and its available source code can benefit people working on practical problems related to CPP.


Assuntos
Algoritmos , Biologia Computacional , Biologia Computacional/métodos , Mineração de Dados , Humanos , Software
13.
IEEE Trans Cybern ; 49(10): 3699-3712, 2019 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-29994417

RESUMO

Critical node problems (CNPs) involve finding a set of critical nodes from a graph whose removal results in optimizing a predefined measure over the residual graph. As useful models for a variety of practical applications, these problems are computationally challenging. In this paper, we study the classic CNP and introduce an effective memetic algorithm for solving CNP. The proposed algorithm combines a double backbone-based crossover operator (to generate promising offspring solutions), a component-based neighborhood search procedure (to find high-quality local optima), and a rank-based pool updating strategy (to guarantee a healthy population). Extensive evaluations on 42 synthetic and real-world benchmark instances show that the proposed algorithm discovers 24 new upper bounds and matches 15 previous best-known bounds. We also demonstrate the relevance of our algorithm for effectively solving a variant of the classic CNP, called the cardinality-constrained CNP. Finally, we investigate the usefulness of each key algorithmic component.

14.
Artigo em Inglês | MEDLINE | ID: mdl-18245882

RESUMO

The Maximum Parsimony (MP) problem aims at reconstructing a phylogenetic tree from DNA sequences while minimizing the number of genetic transformations. To solve this NP-complete problem, heuristic methods have been developed, often based on local search. In this article, we focus on the influence of the neighborhood relations. After analyzing the advantages and drawbacks of the well-known Nearest Neighbor Interchange (NNI), Subtree Pruning Regrafting (SPR) and Tree-Bisection-Reconnection (TBR) neighborhoods, we introduce the concept of Progressive Neighborhood (PN) which consists in constraining progressively the size of the neighborhood as the search advances. We empirically show that applied to the Maximum Parsimony problem, this progressive neighborhood turns out to be more efficient and robust than the classic neighborhoods using a descent algorithm. Indeed, it allows to find better solutions with a smaller number of iterations or trees evaluated.


Assuntos
Modelos Genéticos , Filogenia , Algoritmos , Sequência de Bases , Evolução Molecular , Modelos Estatísticos , Probabilidade
15.
Genomics Proteomics Bioinformatics ; 6(2): 61-73, 2008 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-18973862

RESUMO

Gene subset selection is essential for classification and analysis of microarray data. However, gene selection is known to be a very difficult task since gene expression data not only have high dimensionalities, but also contain redundant information and noises. To cope with these difficulties, this paper introduces a fuzzy logic based pre-processing approach composed of two main steps. First, we use fuzzy inference rules to transform the gene expression levels of a given dataset into fuzzy values. Then we apply a similarity relation to these fuzzy values to define fuzzy equivalence groups, each group containing strongly similar genes. Dimension reduction is achieved by considering for each group of similar genes a single representative based on mutual information. To assess the usefulness of this approach, extensive experimentations were carried out on three well-known public datasets with a combined classification model using three statistic filters and three classifiers.


Assuntos
Lógica Fuzzy , Análise de Sequência com Séries de Oligonucleotídeos/estatística & dados numéricos , Colo/metabolismo , Neoplasias do Colo/genética , Biologia Computacional , Interpretação Estatística de Dados , Bases de Dados Genéticas , Perfilação da Expressão Gênica/estatística & dados numéricos , Humanos , Leucemia/genética , Linfoma/genética , Modelos Estatísticos
16.
Cell Discov ; 2: 16025, 2016.
Artigo em Inglês | MEDLINE | ID: mdl-27625789

RESUMO

Despite the explosion in the numbers of cancer genomic studies, metastasis is still the major cause of cancer mortality. In breast cancer, approximately one-fifth of metastatic patients survive 5 years. Therefore, detecting the patients at a high risk of developing distant metastasis at first diagnosis is critical for effective treatment strategy. We hereby present a novel systems biology approach to identify driver mutations escalating the risk of metastasis based on both exome and RNA sequencing of our collected 78 normal-paired breast cancers. Unlike driver mutations occurring commonly in cancers as reported in the literature, the mutations detected here are relatively rare mutations occurring in less than half metastatic samples. By supposing that the driver mutations should affect the metastasis gene signatures, we develop a novel computational pipeline to identify the driver mutations that affect transcription factors regulating metastasis gene signatures. We identify driver mutations in ADPGK, NUP93, PCGF6, PKP2 and SLC22A5, which are verified to enhance cancer cell migration and prompt metastasis with in vitro experiments. The discovered somatic mutations may be helpful for identifying patients who are likely to develop distant metastasis.

17.
BioData Min ; 2: 9, 2009 Dec 16.
Artigo em Inglês | MEDLINE | ID: mdl-20015398

RESUMO

BACKGROUND: In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. METHODS: We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters. RESULTS: The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test the biological significance using a gene annotation web-tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available upon request from the authors to academic users.

18.
Evol Comput ; 14(2): 223-53, 2006.
Artigo em Inglês | MEDLINE | ID: mdl-16831107

RESUMO

This paper presents GASAT, a hybrid algorithm for the satisfiability problem (SAT). The main feature of GASAT is that it includes a recombination stage based on a specific crossover and a tabu search stage. We have conducted experiments to evaluate the different components of GASAT and to compare its overall performance with state-of-the-art SAT algorithms. These experiments show that GASAT provides very competitive results.


Assuntos
Biologia Computacional/métodos , Algoritmos , Simulação por Computador , Evolução Molecular , Humanos , Modelos Genéticos , Modelos Estatísticos , Modelos Teóricos , Reconhecimento Automatizado de Padrão , Linguagens de Programação , Software
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA