Results 1 - 20 of 20
1.
J Comput Aided Mol Des ; 33(9): 831-844, 2019 09.
Article in English | MEDLINE | ID: mdl-31628660

ABSTRACT

Quantitative Structure-Activity Relationship (QSAR) models are critical in various areas of drug discovery, for example in lead optimisation and virtual screening. Recently, the need for models that are not only predictive but also interpretable has been highlighted. In this paper, a new methodology is proposed to build interpretable QSAR models by combining elements of network analysis and piecewise linear regression. The algorithm presented, modSAR, splits data using a two-step procedure. First, compounds associated with a common target are represented as a network in terms of their structural similarity, revealing modules of similar chemical properties. Second, each module is subdivided into subsets (regions), each of which is modelled by an independent linear equation. A comparative analysis of QSAR models across five data sets of protein inhibitors obtained from ChEMBL is reported, and it is shown that modSAR offers similar predictive accuracy to popular algorithms such as Random Forest and Support Vector Machine. Moreover, we show that models built by modSAR are interpretable, capable of evaluating the applicability domain of the compounds, and well suited to tasks such as virtual screening and the development of new drug leads.


Subjects
Computational Biology , Drug Discovery/methods , Proteins/ultrastructure , Quantitative Structure-Activity Relationship , Algorithms , Humans , Linear Models , Models, Molecular , Proteins/antagonists & inhibitors , Proteins/chemistry , Support Vector Machine , User-Computer Interface
2.
BMC Bioinformatics ; 15: 390, 2014 Dec 05.
Article in English | MEDLINE | ID: mdl-25475756

ABSTRACT

BACKGROUND: Applying machine learning methods to microarray gene expression profiles for disease classification problems is a popular way to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches, in which the expression of each gene was treated independently, suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in the literature are either unsupervised or applied to two-class datasets, there is good scope to address such limitations by proposing novel methodologies. RESULTS: A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model in such a way that the resulting pathway activity has the optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. CONCLUSIONS: The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
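
As an illustration of the "weighted linear summation" described in this abstract, the pathway activity of pathway p in sample s can be written as below. The symbols are assumptions introduced here and are not taken from the paper: G_p is the gene set of pathway p, e_{gs} the expression of gene g in sample s, w_g the gene weight chosen by the optimisation model, y_g a binary indicator of whether gene g participates, and N^max the user-specified limit on participating genes. The weight bounds of plus or minus one are purely illustrative.

```latex
a_{ps} = \sum_{g \in G_p} w_g \, e_{gs}, \qquad
\sum_{g \in G_p} y_g \le N^{\max}, \qquad
-y_g \le w_g \le y_g, \quad y_g \in \{0, 1\}
```

Under this sketch a gene with y_g = 0 is forced to weight zero, which is one common way to cap the number of genes that contribute to the composite feature.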


Subjects
Disease/classification , Disease/genetics , Gene Expression Profiling/methods , Mathematical Computing , Models, Theoretical , Oligonucleotide Array Sequence Analysis/methods , Signal Transduction , Artificial Intelligence , Breast Neoplasms/genetics , Databases, Genetic , Female , Humans , Lung Neoplasms/genetics , Male , Prostatic Neoplasms/genetics , Psoriasis/genetics
3.
Artif Intell Med ; 147: 102700, 2024 01.
Article in English | MEDLINE | ID: mdl-38184363

ABSTRACT

BACKGROUND: The search for new antimalarial treatments is urgent due to growing resistance to existing therapies. The Open Source Malaria (OSM) project offers a promising starting point, having extensively screened various compounds for their effectiveness. Further analysis of the chemical space surrounding these compounds could provide the means for innovative drugs. METHODS: We report an optimisation-based method for quantitative structure-activity relationship (QSAR) modelling that provides explainable modelling of ligand activity through a mathematical programming formulation. The methodology is based on piecewise regression principles and offers optimal detection of breakpoint features, efficient allocation of samples into distinct sub-groups based on breakpoint feature values, and insightful regression coefficients. Analysis of OSM antimalarial compounds yields interpretable results through rules generated by the model that reflect the contribution of individual fingerprint fragments in ligand activity prediction. Using knowledge of fragment prioritisation and screening of commercially available compound libraries, potential lead compounds for antimalarials are identified and evaluated experimentally via a Plasmodium falciparum asexual growth inhibition assay (PfGIA) and a human cell cytotoxicity assay. CONCLUSIONS: Three compounds are identified as potential leads for antimalarials using the methodology described above. This work illustrates how explainable predictive models based on mathematical optimisation can pave the way towards more efficient fragment-based lead discovery as applied in malaria.


Subjects
Antimalarials , Malaria , Humans , Antimalarials/pharmacology , Ligands , Malaria/drug therapy
4.
Cancers (Basel) ; 15(6), 2023 Mar 15.
Article in English | MEDLINE | ID: mdl-36980673

ABSTRACT

BACKGROUND: With advances in high-throughput technologies, there has been an enormous increase in data related to profiling the activity of molecules in disease. While such data provide more comprehensive information on cellular actions, their large volume and complexity make accurate classification of disease phenotypes difficult. Therefore, novel modelling methods that can improve accuracy while offering interpretable means of analysis are required. Biological pathways can be used to incorporate a priori knowledge of biological interactions to decrease data dimensionality and increase the biological interpretability of machine learning models. METHODOLOGY: A mathematical optimisation model is proposed for pathway activity inference towards precise disease phenotype prediction and is applied to RNA-Seq datasets. The model is based on mixed-integer linear programming (MILP) principles and infers pathway activity as a linear combination of pathway member gene expression, multiplying expression values by model-determined gene weights that are optimised to maximise discrimination of phenotype classes and minimise incorrect sample allocation. RESULTS: The model is evaluated on the transcriptome of breast and colorectal cancer, and yields solutions of good optimality as well as good prediction performance on related cancer subtypes. Two baseline pathway activity inference methods and three advanced methods are used for comparison. Sample prediction accuracy, robustness against noisy expression data, and survival analysis suggest competitive prediction performance of our model while providing interpretability and insight on key pathways and genes. Overall, our work demonstrates that the flexible nature of mathematical programming lends itself well to developing efficient computational strategies for pathway activity inference and disease subtype prediction.

5.
Mol Inform ; 38(3): e1800028, 2019 03.
Article in English | MEDLINE | ID: mdl-30251339

ABSTRACT

Quantitative Structure-Activity Relationship (QSAR) models have been successfully applied to lead optimisation, virtual screening and other areas of drug discovery over the years. Recent studies, however, have focused on the development of models that are predictive but often not interpretable. In this article, we propose the application of a piecewise linear regression algorithm, OPLRAreg, to develop both predictive and interpretable QSAR models. The algorithm determines a feature to best separate the data into regions and identifies linear equations to predict the outcome variable in each region. A regularisation term is introduced to prevent overfitting problems and implicitly selects the most informative features. As OPLRAreg is based on mathematical programming, a flexible and transparent representation for optimisation problems, the algorithm also permits customised constraints to be easily added to the model. The proposed algorithm is presented as a more interpretable alternative to other commonly used machine learning algorithms and has shown comparable predictive accuracy to Random Forest, Support Vector Machine and Random Generalised Linear Model on tests with five QSAR data sets compiled from the ChEMBL database.
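
The idea of choosing a feature value that separates the data into regions, each with its own linear equation, can be sketched as follows. This is a minimal illustration and not the OPLRAreg algorithm: it assumes a single feature, exactly two regions, ordinary least squares in each region, and an exhaustive search over breakpoints, whereas the abstract describes a regularised mathematical programming formulation. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_two_regions(xs, ys, min_size=2):
    """Try every breakpoint on x and keep the split with the lowest total squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(min_size, len(pairs) - min_size + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot place a breakpoint between equal x values
        left, right = pairs[:k], pairs[k:]
        la, lb, lsse = fit_line([p[0] for p in left], [p[1] for p in left])
        ra, rb, rsse = fit_line([p[0] for p in right], [p[1] for p in right])
        total = lsse + rsse
        if best is None or total < best["sse"]:
            cut = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = {"breakpoint": cut, "left": (la, lb), "right": (ra, rb), "sse": total}
    return best


def predict(model, x):
    """Use the equation of whichever region x falls into."""
    a, b = model["left"] if x <= model["breakpoint"] else model["right"]
    return a * x + b


# Toy data (invented): rising trend up to x = 3, falling trend afterwards.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 2, 3, 6, 5, 4, 3]
model = fit_two_regions(xs, ys)
```

Each region's slope and intercept can be read directly from the returned model, which is the interpretability argument made for piecewise linear models.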


Subjects
Enzyme Inhibitors/chemistry , Quantitative Structure-Activity Relationship , Databases, Chemical , Enzyme Inhibitors/pharmacology , Humans , Linear Models
6.
PeerJ Comput Sci ; 4: e161, 2018.
Article in English | MEDLINE | ID: mdl-33816814

ABSTRACT

This paper presents a novel prototype platform that uses the LaTeX mark-up language, commonly used to typeset mathematical content, as an input language for modeling optimization problems of various classes. The platform converts the LaTeX model into a formal Algebraic Modeling Language (AML) representation based on Pyomo through a parsing engine written in Python, and solves it either via the NEOS server or with locally installed solvers, using a friendly Graphical User Interface (GUI). The distinct advantages of our approach can be summarized as (i) simplification and speed-up of the model design and development process, (ii) non-commercial character, (iii) cross-platform support, (iv) easier detection of typos and logic errors in the description of the models, and (v) minimization of the working knowledge of programming and AMLs needed to perform mathematical programming modeling. Overall, this is a presentation of a complete workable scheme for using LaTeX for mathematical programming modeling, which assists in furthering our ability to reproduce and replicate scientific work.

7.
Sci Total Environ ; 636: 314-338, 2018 Sep 15.
Article in English | MEDLINE | ID: mdl-29709850

ABSTRACT

Climate change is becoming increasingly relevant in the context of water systems planning. Tools are necessary to provide the most economic investment option considering the reliability of the infrastructure from technical and environmental perspectives. Accordingly, in this work, an optimisation approach, formulated as a spatially-explicit multi-period Mixed Integer Linear Programming (MILP) model, is proposed for the design of water supply chains at regional and national scales. The optimisation framework encompasses decisions such as installation of new purification plants, capacity expansion, and raw water trading schemes. The objective is to minimise the total cost incurred from capital and operating expenditures. Assessment of available resources for withdrawal is performed based on hydrological balances, governmental rules and sustainable limits. In the light of the increasing importance of reliability of water supply, a second objective, seeking to maximise the reliability of the supply chains, is introduced. The epsilon-constraint method is used as a solution procedure for the multi-objective formulation. A Nash bargaining approach is applied to investigate fair trade-offs between the two objectives and to identify a Pareto-optimal solution. The models' capability is demonstrated through a case study based on Australia. The impact of variability in key input parameters is tackled through the implementation of a rigorous global sensitivity analysis (GSA). The findings suggest that variations in water demand can be more disruptive for the water supply chain than scenarios in which rainfall is reduced. The frameworks can facilitate governmental multi-aspect decision making processes for adequate and strategic investment in regional water supply infrastructure.
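
The epsilon-constraint method mentioned in this abstract keeps one objective (here cost) and turns the other (here reliability) into a constraint whose bound is swept over a range of values. The sketch below is only a toy illustration under strong assumptions: the candidate designs, their costs and reliabilities are invented, and the inner problem is solved by enumeration rather than by a MILP solver.

```python
def epsilon_constraint(designs, eps_levels):
    """Sweep a reliability floor and minimise cost under each floor.

    designs: list of (name, cost, reliability) tuples (hypothetical data).
    Returns the distinct minimisers found, in the order they appear.
    """
    front = []
    for eps in eps_levels:
        feasible = [d for d in designs if d[2] >= eps]
        if not feasible:
            continue  # no design reaches this reliability level
        # Cheapest feasible design; ties broken in favour of higher reliability.
        best = min(feasible, key=lambda d: (d[1], -d[2]))
        if best not in front:
            front.append(best)
    return front


# Invented example: design D costs more than B yet is less reliable.
designs = [("A", 100, 0.80), ("B", 120, 0.90), ("C", 150, 0.95), ("D", 130, 0.85)]
front = epsilon_constraint(designs, [0.80, 0.85, 0.90, 0.95])
```

In this toy data the sweep returns designs A, B and C, while the dominated design D never appears, which is the behaviour one expects from a Pareto front approximation.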

8.
Biotechnol Prog ; 33(4): 1116-1126, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28393478

ABSTRACT

This work addresses rapid resin selection for integrated chromatographic separations when conducted as part of a high-throughput screening exercise during the early stages of purification process development. An optimization-based decision support framework is proposed to process the data generated from microscale experiments to identify the best resins to maximize key performance metrics for a biopharmaceutical manufacturing process, such as yield and purity. A multiobjective mixed integer nonlinear programming model is developed and solved using the ε-constraint method. Dinkelbach's algorithm is used to solve the resulting mixed integer linear fractional programming model. The proposed framework is successfully applied to an industrial case study of a process to purify recombinant Fc Fusion protein from low molecular weight and high molecular weight product related impurities, involving two chromatographic steps with eight and three candidate resins for each step, respectively. The computational results show the advantage of the proposed framework in terms of computational efficiency and flexibility. © 2017 The Authors Biotechnology Progress published by Wiley Periodicals, Inc. on behalf of American Institute of Chemical Engineers Biotechnol. Prog., 33:1116-1126, 2017.
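
Dinkelbach's algorithm, named in this abstract, maximises a ratio N(x)/D(x) by repeatedly solving the parametric problem max N(x) - q·D(x) and updating q to the ratio at the maximiser until the parametric optimum reaches zero. The sketch below assumes a finite candidate list and solves each subproblem by enumeration; in the paper's setting the subproblem would be a mixed integer program. The candidate data and function names are hypothetical.

```python
def dinkelbach(candidates, numerator, denominator, tol=1e-9, max_iter=100):
    """Maximise numerator(x) / denominator(x) over a finite list of candidates.

    Assumes denominator(x) > 0 for every candidate.
    """
    x = candidates[0]
    q = numerator(x) / denominator(x)
    for _ in range(max_iter):
        # Parametric subproblem: maximise N(x) - q * D(x).
        x = max(candidates, key=lambda c: numerator(c) - q * denominator(c))
        gap = numerator(x) - q * denominator(x)
        if gap <= tol:
            break  # the parametric optimum is zero, so q is the best ratio
        q = numerator(x) / denominator(x)
    return x, q


# Invented candidates: (benefit, cost) pairs for four hypothetical options.
options = [(8, 4), (9, 3), (5, 1), (12, 6)]
best, ratio = dinkelbach(options, lambda o: o[0], lambda o: o[1])
```

The appeal of the scheme is that each subproblem is linear in the original variables whenever N and D are, so a fractional objective can be handled with a sequence of ordinary linear or mixed integer solves.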


Subjects
Chromatography/methods , Immunoglobulin Fc Fragments/isolation & purification , Recombinant Fusion Proteins/isolation & purification , Resins, Synthetic/chemistry , High-Throughput Screening Assays , Humans , Immunoglobulin Fc Fragments/chemistry , Recombinant Fusion Proteins/chemistry
9.
Biotechnol Prog ; 22(6): 1630-6, 2006.
Article in English | MEDLINE | ID: mdl-17137311

ABSTRACT

The growing trend of employing multiproduct manufacturing facilities along with the randomness inherent in the biopharmaceutical manufacturing environment is creating significant scheduling and planning challenges for the biopharmaceutical industry. This work focuses on capturing the effect of uncertainty in fermentation titers when optimizing the planning of biopharmaceutical manufacturing campaigns. A mixed integer linear programming (MILP) model based on previous work is derived via chance constrained programming (CCP). The methodology is applied to two illustrative examples, and the results are compared with those from the deterministic model and a multiscenario model accompanied by an iterative construction algorithm. The computational results indicate that the proposed methodology offers significant improvements in solution quality over the compared approaches and presents an opportunity for biopharmaceutical manufacturers to make better medium term planning decisions, particularly under uncertain manufacturing conditions.
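
To show how chance constrained programming can lead to a deterministic linear constraint, consider a single illustrative requirement: production x at an uncertain titer must meet demand D with probability at least α. The normal distribution for the titer and all symbols below are assumptions made for this sketch; the abstract does not state the distribution or the constraints used in the paper.

```latex
\Pr\left( \tilde{t}\, x \ge D \right) \ge \alpha
\quad\Longleftrightarrow\quad
\left( \mu - \Phi^{-1}(\alpha)\, \sigma \right) x \ge D,
\qquad \tilde{t} \sim \mathcal{N}(\mu, \sigma^2),\; x > 0
```

Here Φ^{-1} is the inverse standard normal distribution function. For α above 0.5 the effective titer μ - Φ^{-1}(α)σ is lower than the mean, so the plan builds in a safety margin while the constraint stays linear in x.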


Subjects
Algorithms , Bacterial Physiological Phenomena , Biopharmaceutics/methods , Decision Support Techniques , Drug Industry/methods , Fermentation/physiology , Recombinant Proteins/biosynthesis , Computer Simulation , Models, Biological , Models, Statistical , Planning Techniques , Titrimetry
10.
FEBS Lett ; 579(14): 3037-42, 2005 Jun 06.
Article in English | MEDLINE | ID: mdl-15896791

ABSTRACT

The p53 protein interaction network is crucial in regulating the metazoan cell cycle and apoptosis. Here, the robustness of the p53 network is studied by analyzing its degeneration under two modes of attack. Linear Programming is used to calculate average path lengths among proteins and the network diameter as measures of functionality. The p53 network is found to be robust to random loss of nodes, but vulnerable to a targeted attack against its hubs, as a result of its architecture. The significance of the results is considered with respect to mutational knockouts of proteins and the directed attacks mounted by tumour inducing viruses.
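
The two functionality measures named in this abstract, average path length and network diameter, can be recomputed after removing nodes to compare random loss with a targeted attack on hubs. The sketch below uses breadth-first search instead of the Linear Programming approach used in the paper, and it averages over connected pairs only, which is an assumption about how disconnected pairs are treated. The star-shaped toy network is invented.

```python
from collections import deque


def path_stats(adjacency, removed=frozenset()):
    """Return (average shortest path length, diameter) over connected node pairs."""
    nodes = [n for n in adjacency if n not in removed]
    total, pairs, diameter = 0, 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    average = total / pairs if pairs else float("inf")
    return average, diameter


# Toy network: one hub "h" connected to four otherwise isolated nodes.
star = {"h": ["a", "b", "c", "d"], "a": ["h"], "b": ["h"], "c": ["h"], "d": ["h"]}
intact = path_stats(star)
without_leaf = path_stats(star, removed={"a"})
without_hub = path_stats(star, removed={"h"})
```

In this toy case removing a peripheral node barely changes the statistics, while removing the hub leaves no connected pairs at all, which mirrors the robust-yet-vulnerable pattern described in the abstract.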


Subjects
Computer Simulation , Models, Biological , Oncogenic Viruses/physiology , Signal Transduction , Tumor Suppressor Protein p53/metabolism , Neoplasms/metabolism , Neoplasms/virology
11.
Biotechnol Prog ; 21(5): 1478-89, 2005.
Article in English | MEDLINE | ID: mdl-16209554

ABSTRACT

Regulatory pressures and capacity constraints are forcing the biopharmaceutical industry to consider employing multiproduct manufacturing facilities running on a campaign basis. The need for such flexible and cost-effective manufacture poses a significant challenge for planning and scheduling. This paper reviews the problem of planning and scheduling of biopharmaceutical manufacture and presents a methodology for the planning of multiproduct biopharmaceutical manufacturing facilities. The problem is formulated as a mixed integer linear program (MILP) to represent the relevant decisions required within the planning process and is tested on two typical biopharmaceutical industry planning problems. The proposed formulation is compared with an industrial rule based approach, which it outperforms in terms of profitability. The results indicate that the developed formulation offers an effective representation of the planning problem and would be a useful decision tool for manufacturers in the biopharmaceutical industry particularly at times of limited manufacturing capacity.


Subjects
Algorithms , Biopharmaceutics/methods , Biopharmaceutics/organization & administration , Decision Support Techniques , Drug Industry/methods , Drug Industry/organization & administration , Planning Techniques , Manufactured Materials , Models, Theoretical , Numerical Analysis, Computer-Assisted
12.
Biotechnol Prog ; 21(3): 875-84, 2005.
Article in English | MEDLINE | ID: mdl-15932268

ABSTRACT

The development of systematic methods for the synthesis of downstream protein processing operations has seen growing interest in recent years, as purification is often the most complex and costly stage in biochemical production plants. The objective of the work presented here is to develop mathematical models based on mixed integer optimization techniques, which integrate the selection of optimal peptide purification tags into an established framework for the synthesis of protein purification processes. Peptide tags are comparatively short sequences of amino acids fused onto the protein product, capable of reducing the required purification steps. The methodology is illustrated through its application to two example protein mixtures involving up to 13 contaminants and a set of 11 candidate chromatographic steps. The results indicate the benefits resulting from the appropriate use of peptide tags in purification processes and provide a guideline for both optimal tag design and downstream process synthesis.


Subjects
Chromatography/methods , Combinatorial Chemistry Techniques , Models, Chemical , Peptides/chemistry , Proteins/chemistry , Proteins/isolation & purification , Sequence Analysis, Protein/methods , Amino Acid Sequence , Complex Mixtures/chemistry , Complex Mixtures/isolation & purification , Computer Simulation , Molecular Sequence Data , Numerical Analysis, Computer-Assisted , Peptides/isolation & purification , Solutions
13.
Sci Rep ; 5: 10345, 2015 May 27.
Article in English | MEDLINE | ID: mdl-26012716

ABSTRACT

The detection of community structure is a widely accepted means of investigating the principles governing biological systems. Recent efforts are exploring ways in which multiple data sources can be integrated to generate a more comprehensive model of cellular interactions, leading to the detection of more biologically relevant communities. In this work, we propose a mathematical programming model to cluster multiplex biological networks, i.e. multiple network slices, each with a different interaction type, to determine a single representative partition of composite communities. Our method, known as SimMod, is evaluated through its application to yeast networks of physical, genetic and co-expression interactions. A comparative analysis involving partitions of the individual networks, partitions of aggregated networks and partitions generated by similar methods from the literature highlights the ability of SimMod to identify functionally enriched modules. It is further shown that SimMod offers enhanced results when compared to existing approaches without the need to train on known cellular interactions.


Subjects
Models, Theoretical , Cluster Analysis , Metabolic Networks and Pathways , Saccharomyces cerevisiae/metabolism
14.
Math Biosci ; 260: 25-34, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25242610

ABSTRACT

In microarray data analysis, traditional methods that focus on single genes are increasingly replaced by methods that analyse functional units corresponding to biochemical pathways, as these are considered to offer more insight into gene expression and disease associations. However, the development of robust pipelines to relate genotypic functional modules to disease phenotypes through known molecular interactions is still in its early stages. In this article we first discuss methodologies that employ groups of genes in disease classification tasks that aim to link gene expression patterns with disease outcome. Then we present a pathway-based approach for disease classification through a mathematical programming model based on hyper-box principles. Association rules derived from the model are extracted and discussed with respect to pathway-specific molecular patterns related to the disease. Overall, we argue that the use of gene sets corresponding to disease-relevant pathways is a promising route to uncover expression-to-phenotype relations in disease classification, and we illustrate the potential of hyper-box classification in assessing the predictive power of functional pathways and in uncovering the effect of specific genes in the prediction of disease phenotypes.


Subjects
Data Mining/methods , Gene Expression Profiling/methods , Models, Theoretical , Breast Neoplasms/classification , Humans , Psoriasis/classification
15.
PLoS One ; 9(11): e112821, 2014.
Article in English | MEDLINE | ID: mdl-25412367

ABSTRACT

Community structure detection has proven to be important in revealing the underlying properties of complex networks. The standard problem, where a partition of disjoint communities is sought, has been continually adapted to offer more realistic models of interactions in these systems. Here, a two-step procedure is outlined for exploring the concept of overlapping communities. First, a hard partition is detected by employing existing methodologies. We then propose a novel mixed integer nonlinear programming (MINLP) model, known as OverMod, which transforms disjoint communities into overlapping ones. The procedure is evaluated through its application to protein-protein interaction (PPI) networks of the rat, E. coli, yeast and human organisms. Connector nodes of hard partitions exhibit topological and functional properties indicative of their suitability as candidates for multiple module membership. OverMod identifies two types of connector nodes, inter- and intra-connector, each with its own particular characteristics pertaining to its topological and functional role in the organisation of the network. Inter-connector proteins are shown to be highly conserved proteins participating in pathways that control essential cellular processes, such as proliferation, differentiation and apoptosis, and their differences from intra-connectors are highlighted. Many of these proteins are shown to possess multiple roles of distinct nature through their participation in different network modules, setting them apart from proteins that are simply 'hubs', i.e. proteins with many interaction partners but with a more specific biochemical role.


Subjects
Models, Theoretical , Protein Interaction Maps , Algorithms , Animals , Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Humans , Rats , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
16.
Biotechnol Prog ; 30(3): 594-606, 2014.
Article in English | MEDLINE | ID: mdl-24376262

ABSTRACT

Production planning for biopharmaceutical portfolios becomes more complex when products switch between fed-batch and continuous perfusion culture processes. This article describes the development of a discrete-time mixed integer linear programming (MILP) model to optimize capacity plans for multiple biopharmaceutical products, with either batch or perfusion bioprocesses, across multiple facilities to meet quarterly demands. The model comprised specific features to account for products with fed-batch or perfusion culture processes such as sequence-dependent changeover times, continuous culture constraints, and decoupled upstream and downstream operations that permit independent scheduling of each. Strategic inventory levels were accounted for by applying cost penalties when they were not met. A rolling time horizon methodology was utilized in conjunction with the MILP model and was shown to obtain solutions with greater optimality in less computational time than the full-scale model. The model was applied to an industrial case study to illustrate how the framework aids decisions regarding outsourcing capacity to third party manufacturers or building new facilities. The impact of variations on key parameters such as demand or titres on the optimal production plans and costs was captured. The analysis identified the critical ratio of in-house to contract manufacturing organization (CMO) manufacturing costs that led the optimization results to favor building a future facility over using a CMO. The tool predicted that if titres were higher than expected then the optimal solution would allocate more production to in-house facilities, where manufacturing costs were lower. Utilization graphs indicated when capacity expansion should be considered.


Subjects
Antibodies, Monoclonal/economics , Biopharmaceutics/economics , Biotechnology , Drug Industry , Antibodies, Monoclonal/biosynthesis , Batch Cell Culture Techniques , Costs and Cost Analysis , Humans , Models, Theoretical
17.
Biotechnol Prog ; 29(6): 1472-83, 2013.
Article in English | MEDLINE | ID: mdl-23956206

ABSTRACT

Chromatography operations are identified as critical steps in a monoclonal antibody (mAb) purification process and can represent a significant proportion of the purification material costs. This becomes even more critical with increasing product titers that result in higher mass loads onto chromatography columns, potentially causing capacity bottlenecks. In this work, a mixed-integer nonlinear programming (MINLP) model was created and applied to an industrially relevant case study to optimize the design of a facility by determining the most cost-effective chromatography equipment sizing strategies for the production of mAbs. Furthermore, the model was extended to evaluate the ability of a fixed facility to cope with higher product titers up to 15 g/L. Examination of the characteristics of the optimal chromatography sizing strategies across different titer values enabled the identification of the maximum titer that the facility could handle using a sequence of single column chromatography steps as well as multi-column steps. The critical titer levels for different ratios of upstream to downstream trains where multiple parallel columns per step resulted in the removal of facility bottlenecks were identified. Different facility configurations in terms of number of upstream trains were considered and the trade-off between their cost and ability to handle higher titers was analyzed. The case study insights demonstrate that the proposed modeling approach, combining MINLP models with visualization tools, is a valuable decision-support tool for the design of cost-effective facility configurations and to aid facility fit decisions.


Subjects
Antibodies, Monoclonal/isolation & purification , Chromatography/methods , Cost-Benefit Analysis , Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/therapeutic use , Chromatography/economics , Humans , Models, Theoretical
18.
Biotechnol Prog ; 27(6): 1653-60, 2011.
Article in English | MEDLINE | ID: mdl-21976368

ABSTRACT

Downstream bioprocessing, and especially chromatographic steps, commonly used for the purification of multicomponent systems, are significant cost drivers in the production of therapeutic proteins. There has been increased interest in the development of systematic methods for the design of such processes, and the appropriate selection of a series of chromatographic steps is still a major challenge to be addressed. Several models have been developed previously but have assumed that 100% recovery of the desired product is obtained at each chromatographic step. In this work, a mathematical framework is proposed, based on mixed integer optimisation techniques, that removes this assumption and allows full flexibility on the position of retention time cut-points, between which the desired product fraction is collected. The proposed model is demonstrated on three example protein mixtures, each containing up to 13 contaminants and selecting from a set of up to 21 candidate steps. The proposed model results in a reduction of one to three chromatographic steps compared with solutions in which no losses are allowed.


Subjects
Biotechnology/instrumentation , Chromatography/instrumentation , Proteins/isolation & purification , Biotechnology/methods , Chromatography/methods , Models, Theoretical
19.
Algorithms Mol Biol ; 5: 36, 2010 Nov 12.
Article in English | MEDLINE | ID: mdl-21073720

ABSTRACT

BACKGROUND: The detection of modules or community structure is widely used to reveal the underlying properties of complex networks in biology, as well as physical and social sciences. Since the adoption of modularity as a measure of network topological properties, several methodologies for the discovery of community structure based on modularity maximisation have been developed. However, satisfactory partitions of large graphs with modest computational resources are particularly challenging due to the NP-hard nature of the related optimisation problem. Furthermore, it has been suggested that optimising the modularity metric can reach a resolution limit whereby the algorithm fails to detect communities smaller than a specific size in large networks. RESULTS: We present a novel solution approach to identify community structure in large complex networks and address resolution limitations in module detection. The proposed algorithm employs modularity to express network community structure and it is based on mixed integer optimisation models. The solution procedure is extended through an iterative procedure to diminish effects that tend to agglomerate smaller modules (resolution limitations). CONCLUSIONS: A comprehensive comparative analysis of methodologies for module detection based on modularity maximisation shows that our approach outperforms previously reported methods. Furthermore, in contrast to previous reports, we propose a strategy to handle resolution limitations in modularity maximisation. Overall, we illustrate ways to improve existing methodologies for community structure identification so as to increase their efficiency and applicability.
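
The modularity measure that this abstract builds on can be computed for any given partition as the sum, over communities, of the fraction of edges inside the community minus the squared fraction of edge ends attached to it. The sketch below evaluates that score for an undirected, unweighted graph; it only scores a partition and does not perform the mixed integer optimisation described in the paper. The function name and toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {n: "all" for n in range(6)}
q_split = modularity(edges, two_modules)
q_merged = modularity(edges, one_module)
```

For this toy graph the two-triangle partition scores 5/14, while placing every node in one community scores zero, so a maximiser prefers the split.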

20.
Metab Eng ; 5(3): 211-9, 2003 Jul.
Article in English | MEDLINE | ID: mdl-12948755

ABSTRACT

The solution of the shortest path problem in biochemical systems constitutes an important step for studies of their evolution. In this paper, a linear programming (LP) algorithm for calculating minimal pathway distances in metabolic networks is studied. Minimal pathway distances are identified as the smallest number of metabolic steps separating two enzymes in metabolic pathways. The algorithm deals effectively with circularity and reaction directionality. The applicability of the algorithm is illustrated by calculating the minimal pathway distances for Escherichia coli small molecule metabolism enzymes, and then considering their correlations with genome distance (distance separating two genes on a chromosome) and enzyme function (as characterised by enzyme commission number). The results illustrate the effectiveness of the LP model. In addition, the data confirm that propinquity of genes on the genome implies similarity in function (as determined by co-involvement in the same region of the metabolic network), but suggest that no correlation exists between pathway distance and enzyme function. These findings offer insight into the probable mechanism of pathway evolution.
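
For orientation, the textbook linear programming formulation of a shortest path between a source s and a target t on a directed graph with node set V and arc set A is given below. It is shown as a generic reference point only; the abstract does not state the exact formulation used in the paper, and all symbols are assumptions introduced here. With unit arc costs, the optimal objective value equals the smallest number of steps from s to t.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{(i,j) \in A} x_{ij} \\
\text{s.t.} \quad & \sum_{j:(i,j) \in A} x_{ij} - \sum_{j:(j,i) \in A} x_{ji} =
\begin{cases} 1 & i = s \\ -1 & i = t \\ 0 & \text{otherwise} \end{cases}
\qquad \forall i \in V, \\
& x_{ij} \ge 0 \qquad \forall (i,j) \in A
\end{aligned}
```

Because arcs are directed, reaction directionality is respected automatically, and cycles are never attractive since every arc used adds to the objective.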


Subjects
Algorithms , Escherichia coli/genetics , Escherichia coli/metabolism , Metabolism/physiology , Models, Biological , Multienzyme Complexes/genetics , Multienzyme Complexes/metabolism , Programming, Linear , Computer Simulation , Evolution, Molecular , Gene Expression Regulation, Bacterial/physiology , Gene Expression Regulation, Enzymologic/genetics , Numerical Analysis, Computer-Assisted