1.
Bioinformatics ; 40(8)2024 08 02.
Article in English | MEDLINE | ID: mdl-39180771

ABSTRACT

MOTIVATION: A key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities. Currently, only a small fraction of measurements can be assigned identities. Two complementary computational approaches have emerged to address the annotation problem: mapping candidate molecules to spectra, and mapping query spectra to molecular candidates. In essence, the candidate molecule with the spectrum that best explains the query spectrum is recommended as the target molecule. Despite candidate ranking being fundamental in both approaches, limited prior works incorporated rank learning tasks in determining the target molecule. RESULTS: We propose a novel machine learning model, Ensemble Spectral Prediction (ESP), for metabolite annotation. ESP takes advantage of prior neural network-based annotation models that utilize multilayer perceptron (MLP) networks and Graph Neural Networks (GNNs). Based on the ranking results of the MLP- and GNN-based models, ESP learns a weighting for the outputs of MLP and GNN spectral predictors to generate a spectral prediction for a query molecule. Importantly, training data is stratified by molecular formula to provide candidate sets during model training. Further, baseline MLP and GNN models are enhanced by considering peak dependencies through label mixing and multi-tasking on spectral topic distributions. When trained on the NIST 2020 dataset and evaluated on the relevant candidate sets from PubChem, ESP improves average rank by 23.7% and 37.2% over the MLP and GNN baselines, respectively, demonstrating performance gain over state-of-the-art neural network approaches. However, MLP approaches remain strong contenders when considering top five ranks. Importantly, we show that annotation performance is dependent on the training dataset, the number of molecules in the candidate set and candidate similarity to the target molecule. 
AVAILABILITY AND IMPLEMENTATION: The ESP code, a trained model, and a Jupyter notebook that guides users on using the ESP tool are available at https://github.com/HassounLab/ESP.


Subjects
Machine Learning , Metabolomics , Neural Networks, Computer , Metabolomics/methods , Algorithms , Metabolome
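The candidate-ranking idea in the abstract above can be sketched as follows. This is a minimal illustration, not the ESP implementation: it assumes spectra are already binned into equal-length intensity vectors, uses cosine similarity as the scoring function, and replaces the learned weighting with a single fixed scalar `w`. The function names and the toy spectra are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query, mlp_preds, gnn_preds, w):
    """Rank candidate ids by how well their blended predicted spectrum matches the query.

    mlp_preds / gnn_preds: dict mapping candidate id -> predicted spectrum.
    w: weight in [0, 1] on the MLP prediction (assumption: a fixed scalar here,
       whereas the abstract describes a learned weighting).
    """
    scores = {}
    for cid in mlp_preds:
        blended = [w * m + (1 - w) * g
                   for m, g in zip(mlp_preds[cid], gnn_preds[cid])]
        scores[cid] = cosine(query, blended)
    # Best-matching candidate first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): two candidates, four intensity bins.
query = [0.0, 1.0, 0.5, 0.0]
mlp = {"A": [0.0, 0.9, 0.4, 0.1], "B": [1.0, 0.0, 0.0, 0.2]}
gnn = {"A": [0.1, 1.0, 0.6, 0.0], "B": [0.8, 0.1, 0.0, 0.3]}
ranking = rank_candidates(query, mlp, gnn, w=0.5)
```

The rank of the true molecule in `ranking` is the quantity that an average-rank metric would aggregate over many query spectra.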
2.
NPJ Syst Biol Appl ; 10(1): 56, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802371

ABSTRACT

Despite significant advances in reconstructing genome-scale metabolic networks, the understanding of cellular metabolism remains incomplete for many organisms. A promising approach for elucidating cellular metabolism is analysing the full scope of enzyme promiscuity, which exploits the capacity of enzymes to bind to non-annotated substrates and generate novel reactions. To guide time-consuming costly experimentation, different computational methods have been proposed for exploring enzyme promiscuity. One relevant algorithm is PROXIMAL, which strongly relies on KEGG to define generic reaction rules and link specific molecular substructures with associated chemical transformations. Here, we present a completely new pipeline, PROXIMAL2, which overcomes the dependency on KEGG data. In addition, PROXIMAL2 introduces two relevant improvements with respect to the former version: i) correct treatment of multi-step reactions and ii) tracking of electric charges in the transformations. We compare PROXIMAL and PROXIMAL2 in recovering annotated products from substrates in KEGG reactions, finding a highly significant improvement in the level of accuracy. We then applied PROXIMAL2 to predict degradation reactions of phenolic compounds in the human gut microbiota. The results were compared to RetroPath RL, a different and relevant enzyme promiscuity method. We found a significant overlap between these two methods but also complementary results, which open new research directions into this relevant question in nutrition.


Subjects
Algorithms , Computational Biology , Gastrointestinal Microbiome , Metabolic Networks and Pathways , Phenols , Gastrointestinal Microbiome/physiology , Humans , Phenols/metabolism , Computational Biology/methods
3.
bioRxiv ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38370723

ABSTRACT

Although untargeted mass spectrometry-based metabolomics is crucial for understanding life's molecular underpinnings, its effectiveness is hampered by low annotation rates of the generated tandem mass spectra. To address this issue, we introduce a novel data-driven approach, Biotransformation-based Annotation Method (BAM), that leverages molecular structural similarities inherent in biochemical reactions. BAM operates by applying biotransformation rules to known 'anchor' molecules, which exhibit high spectral similarity to unknown spectra, thereby hypothesizing and ranking potential structures for the corresponding 'suspect' molecule. BAM's effectiveness is demonstrated by its success in annotating suspect spectra in a global molecular network comprising hundreds of millions of spectra. BAM was able to assign correct molecular structures to 24.2% of examined anchor-suspect cases, thereby demonstrating remarkable advancement in metabolite annotation.

4.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37490457

ABSTRACT

MOTIVATION: Accurately predicting the likelihood of interaction between two objects (compound-protein sequence, user-item, author-paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multi-views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects. RESULTS: We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound-protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug-protein interaction prediction), metabolic engineering, and synthetic biology (compound-enzyme interaction prediction) applications. We compare CSI with a baseline model that does not utilize data stratification and contrastive learning, and show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug-target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets. AVAILABILITY AND IMPLEMENTATION: Code and dataset available at https://github.com/HassounLab/CSI.


Subjects
Drug Delivery Systems , Metabolic Engineering , Amino Acid Sequence , Probability , Synthetic Biology
5.
Curr Opin Chem Biol ; 74: 102288, 2023 06.
Article in English | MEDLINE | ID: mdl-36966702

ABSTRACT

The computational metabolomics field brings together computer scientists, bioinformaticians, chemists, clinicians, and biologists to maximize the impact of metabolomics across a wide array of scientific and medical disciplines. The field continues to expand as modern instrumentation produces datasets with increasing complexity, resolution, and sensitivity. These datasets must be processed, annotated, modeled, and interpreted to enable biological insight. Techniques for visualization, integration (within or between omics), and interpretation of metabolomics data have evolved along with innovation in the databases and knowledge resources required to aid understanding. In this review, we highlight recent advances in the field and reflect on opportunities and innovations in response to the most pressing challenges. This review was compiled from discussions from the 2022 Dagstuhl seminar entitled "Computational Metabolomics: From Spectra to Knowledge".


Subjects
Computational Biology , Metabolomics , Metabolomics/methods , Mass Spectrometry/methods , Databases, Factual , Computational Biology/methods
6.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36790067

ABSTRACT

MOTIVATION: While traditionally utilized for identifying site-specific metabolic activity within a compound to alter its interaction with a metabolizing enzyme, predicting the site-of-metabolism (SOM) is essential in analyzing the promiscuity of enzymes on substrates. The successful prediction of SOMs and the relevant promiscuous products has a wide range of applications that include creating extended metabolic models (EMMs) that account for enzyme promiscuity and the construction of novel heterologous synthesis pathways. There is therefore a need to develop generalized methods that can predict molecular SOMs for a wide range of metabolizing enzymes. RESULTS: This article develops a Graph Neural Network (GNN) model for the classification of an atom (or a bond) being an SOM. Our model, GNN-SOM, is trained on enzymatic interactions, available in the KEGG database, that span all enzyme commission numbers. We demonstrate that GNN-SOM consistently outperforms baseline machine learning models, when trained on all enzymes, on Cytochrome P450 (CYP) enzymes, or on non-CYP enzymes. We showcase the utility of GNN-SOM in prioritizing predicted enzymatic products due to enzyme promiscuity for two biological applications: the construction of EMMs and the construction of synthesis pathways. AVAILABILITY AND IMPLEMENTATION: A Python implementation of the trained SOM predictor model can be found at https://github.com/HassounLab/GNN-SOM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Cytochrome P-450 Enzyme System , Neural Networks, Computer , Cytochrome P-450 Enzyme System/metabolism
7.
Bioinformatics ; 38(10): 2832-2838, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561204

ABSTRACT

MOTIVATION: Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates continues to be largely unexplored and under-documented. Providing computational tools for the exploration of the enzyme-substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying products of metabolism on ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which are currently unexplored for the enzyme-substrate interaction prediction problem, can be utilized to provide enzyme recommendations for substrates, and vice versa. The performance of Collaborative-Filtering (CF) RSs, however, hinges on the quality of embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchical, pairwise or groupings), remains a challenge. RESULTS: We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by 'boosting' embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks. Boost-RS utilizes contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme-substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors. 
AVAILABILITY AND IMPLEMENTATION: A Python implementation for Boost-RS is provided at https://github.com/HassounLab/Boost-RS. The enzyme-substrate interaction data is available from the KEGG database (https://www.genome.jp/kegg/).


Subjects
Machine Learning
8.
Integr Comp Biol ; 61(6): 2267-2275, 2022 02 05.
Article in English | MEDLINE | ID: mdl-34448841

ABSTRACT

Despite efforts to integrate research across different subdisciplines of biology, the scale of integration remains limited. We hypothesize that future generations of Artificial Intelligence (AI) technologies specifically adapted for biological sciences will help enable the reintegration of biology. AI technologies will allow us not only to collect, connect, and analyze data at unprecedented scales, but also to build comprehensive predictive models that span various subdisciplines. They will make possible both targeted (testing specific hypotheses) and untargeted discoveries. AI for biology will be the cross-cutting technology that will enhance our ability to do biological research at every scale. We expect AI to revolutionize biology in the 21st century much like statistics transformed biology in the 20th century. The difficulties, however, are many, including data curation and assembly, development of new science in the form of theories that connect the subdisciplines, and new predictive and interpretable AI models that are more suited to biology than existing machine learning and AI techniques. Development efforts will require strong collaborations between biological and computational scientists. This white paper provides a vision for AI for Biology and highlights some challenges.


Subjects
Artificial Intelligence , Machine Learning , Animals , Biology , Technology
9.
Metab Eng Commun ; 12: e00170, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33850714

ABSTRACT

Increasing understanding of metabolic and regulatory networks underlying microbial physiology has enabled creation of progressively more complex synthetic biological systems for biochemical, biomedical, agricultural, and environmental applications. However, despite best efforts, confounding phenotypes still emerge from unforeseen interplay between biological parts, and the design of robust and modular biological systems remains elusive. Such interactions are difficult to predict when designing synthetic systems and may manifest during experimental testing as inefficiencies that need to be overcome. Transforming organisms such as Escherichia coli into microbial factories is achieved via several engineering strategies, used individually or in combination, with the goal of maximizing the production of chosen target compounds. One technique relies on suppressing or overexpressing selected genes; another involves introducing heterologous enzymes into a microbial host. These modifications steer mass flux towards the set of desired metabolites but may create unexpected interactions. In this work, we develop a computational method, termed Metabolic Disruption Workflow (MDFlow), for discovering interactions and network disruptions arising from enzyme promiscuity - the ability of enzymes to act on a wide range of molecules that are structurally similar to their native substrates. We apply MDFlow to two experimentally verified cases where strains with essential genes knocked out are rescued by interactions resulting from overexpression of one or more other genes. We demonstrate how enzyme promiscuity may aid cells in adapting to disruptions of essential metabolic functions. We then apply MDFlow to predict and evaluate a number of putative promiscuous reactions that can interfere with two heterologous pathways designed for 3-hydroxypropionic acid (3-HP) production. 
Using MDFlow, we can identify putative enzyme promiscuity and the subsequent formation of unintended and undesirable byproducts that are not only disruptive to the host metabolism but also to the intended end-objective of high biosynthetic productivity and yield. As we demonstrate, MDFlow provides an innovative workflow to systematically identify incompatibilities between the native metabolism of the host and its engineered modifications due to enzyme promiscuity.

10.
Bioinformatics ; 2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33515234

ABSTRACT

MOTIVATION: As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions however involve non-natural substrates, thus reflecting promiscuous enzymatic activities. RESULTS: We frame this "enzyme promiscuity prediction" problem as a multi-label classification task. We maximally utilize inhibitor and unlabelled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbours similarity-based and other machine learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split when compared to a random data split, and when evaluating performance on non-natural substrates compared to natural substrates. AVAILABILITY AND IMPLEMENTATION: We provide Python code for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Bioinformatics ; 37(6): 793-799, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33051674

ABSTRACT

MOTIVATION: The complete characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. We develop in this work a technique, enzymatic link prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions cataloged in the KEGG database as a graph. ELP is innovative over prior works in using graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity. RESULTS: We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves high AUC when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction by 30% in area under curve over fingerprint-based similarity approaches and by 8% over support vector machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstruction of edges in reaction networks of four common gut microbiota phyla: actinobacteria, bacteroidetes, firmicutes and proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization. AVAILABILITY AND IMPLEMENTATION: The code and datasets are available through https://github.com/HassounLab/ELP.


Subjects
Machine Learning
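The AUC figure quoted in the abstract above measures how often a true link is scored above a non-link. The sketch below is a generic illustration of that metric, not the ELP code: it assumes links are scored by the dot product of two node embedding vectors (the abstract does not state the scoring function), and all names and numbers are hypothetical.

```python
def link_score(u, v):
    """Score a candidate link as the dot product of two node embeddings (assumed scorer)."""
    return sum(a * b for a, b in zip(u, v))


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen true link scores higher
    than a randomly chosen non-link, with ties counted as one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for three known links and two non-links.
positives = [0.9, 0.8, 0.4]
negatives = [0.5, 0.3]
value = auc(positives, negatives)  # 5 of the 6 pairs are ordered correctly
```

The pairwise form is quadratic in the number of scored pairs, so it suits small illustrations; a sort-based rank computation gives the same value on large test sets.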
12.
Metabolites ; 10(5)2020 May 03.
Article in English | MEDLINE | ID: mdl-32375258

ABSTRACT

Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures metabolomics measurements and the biological network for the biological sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the path generated the observed measurements is above a particular (user-defined) threshold. Due to the lack of "ground truth" metabolomics datasets, where all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets that are designed to mimic cellular processes. PUMA, on average, outperforms pathway enrichment analysis by 8%. PUMA is applied to two case studies. PUMA suggests many biologically meaningful pathways as active. Annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously only identified as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
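The thresholding rule described above (a pathway is called active when its estimated likelihood exceeds a user-defined threshold) can be sketched as below. This is an illustrative assumption about the final step only: it takes binary activity indicators from stochastic samples as given and does not reproduce PUMA's generative model or sampler. The pathway names and sample values are hypothetical.

```python
def active_pathways(samples, threshold):
    """Estimate per-pathway posterior activity and keep those above the threshold.

    samples: list of dicts, one per posterior sample, mapping
             pathway name -> 1 (active in that sample) or 0 (inactive).
    threshold: user-defined cutoff on the estimated probability.
    """
    n = len(samples)
    # Monte Carlo estimate: fraction of samples in which each pathway is active.
    posterior = {name: sum(s[name] for s in samples) / n for name in samples[0]}
    return {name: prob for name, prob in posterior.items() if prob > threshold}


# Four hypothetical posterior samples over two pathways.
draws = [
    {"glycolysis": 1, "tca_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 1},
    {"glycolysis": 1, "tca_cycle": 0},
    {"glycolysis": 0, "tca_cycle": 0},
]
active = active_pathways(draws, threshold=0.5)
```

With these toy draws the estimates are 0.75 and 0.25, so only the first pathway passes a 0.5 threshold.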

13.
Metabolites ; 10(4)2020 Apr 21.
Article in English | MEDLINE | ID: mdl-32326153

ABSTRACT

Mass spectrometry coupled with chromatography separation techniques provides a powerful platform for untargeted metabolomics. Determining the chemical identities of detected compounds however remains a major challenge. Here, we present a novel computational workflow, termed extended metabolic model filtering (EMMF), that aims to engineer a candidate set, a listing of putative chemical identities to be used during annotation, through an extended metabolic model (EMM). An EMM includes not only canonical substrates and products of enzymes already cataloged in a database through a reference metabolic model, but also metabolites that can form due to substrate promiscuity. EMMF aims to strike a balance between discovering previously uncharacterized metabolites and the computational burden of annotation. EMMF was applied to untargeted LC-MS data collected from cultures of Chinese hamster ovary (CHO) cells and murine cecal microbiota. EMM metabolites matched, on average, to 23.92% of measured masses, providing a > 7-fold increase in the candidate set size when compared to a reference metabolic model. Many metabolites suggested by EMMF are not catalogued in PubChem. For the CHO cell, we experimentally confirmed the presence of 4-hydroxyphenyllactate, a metabolite predicted by EMMF that has not been previously documented as part of the CHO cell metabolic model.
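Matching a candidate set against measured masses, as reported above, can be sketched with a simple tolerance test. This is a minimal illustration rather than the EMMF workflow: it assumes measurements have already been converted to neutral monoisotopic masses (no adduct handling) and uses a 10 ppm tolerance, which is an assumed value. The measured masses are hypothetical; the two candidate masses are standard monoisotopic values for C6H12O6 and C9H10O4.

```python
def ppm_error(measured, theoretical):
    """Absolute mass error in parts per million."""
    return abs(measured - theoretical) / theoretical * 1e6


def match_masses(measured_masses, candidates, tol_ppm=10.0):
    """Map each measured mass to the candidate names within the ppm tolerance.

    candidates: dict mapping metabolite name -> monoisotopic mass.
    Measured masses with no candidate in range are left out of the result.
    """
    matches = {}
    for m in measured_masses:
        hits = [name for name, mass in candidates.items()
                if ppm_error(m, mass) <= tol_ppm]
        if hits:
            matches[m] = hits
    return matches


candidates = {
    "glucose": 180.06339,                 # C6H12O6
    "4-hydroxyphenyllactate": 182.05791,  # C9H10O4
}
measured = [180.0635, 182.0580, 250.1234]  # hypothetical measurements
matches = match_masses(measured, candidates)
fraction_matched = len(matches) / len(measured)
```

The ratio `fraction_matched` corresponds to the kind of "percentage of measured masses matched" statistic quoted in the abstract, here computed on toy data.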

14.
PLoS Comput Biol ; 16(4): e1007779, 2020 04.
Article in English | MEDLINE | ID: mdl-32339164

ABSTRACT

Antibodies are capable of potently and specifically binding individual antigens and, in some cases, disrupting their functions. The key challenge in generating antibody-based inhibitors is the lack of fundamental information relating sequences of antibodies to their unique properties as inhibitors. We develop a pipeline, Antibody Sequence Analysis Pipeline using Statistical testing and Machine Learning (ASAP-SML), to identify features that distinguish one set of antibody sequences from antibody sequences in a reference set. The pipeline extracts feature fingerprints from sequences. The fingerprints represent germline, CDR canonical structure, isoelectric point and frequent positional motifs. Machine learning and statistical significance testing techniques are applied to antibody sequences and extracted feature fingerprints to identify distinguishing feature values and combinations thereof. To demonstrate how it works, we applied the pipeline on sets of antibody sequences known to bind or inhibit the activities of matrix metalloproteinases (MMPs), a family of zinc-dependent enzymes that promote cancer progression and undesired inflammation under pathological conditions, against reference datasets that do not bind or inhibit MMPs. ASAP-SML identifies features and combinations of feature values found in the MMP-targeting sets that are distinct from those in the reference sets.


Subjects
Antibodies , Computational Biology/methods , Machine Learning , Sequence Analysis, Protein/methods , Software , Algorithms , Antibodies/chemistry , Antibodies/metabolism , Databases, Protein , Humans , Matrix Metalloproteinase Inhibitors/chemistry , Matrix Metalloproteinase Inhibitors/metabolism , Matrix Metalloproteinases/chemistry , Matrix Metalloproteinases/metabolism
15.
Brief Bioinform ; 21(6): 1875-1885, 2020 12 01.
Article in English | MEDLINE | ID: mdl-31745550

ABSTRACT

While elementary flux mode (EFM) analysis is now recognized as a cornerstone computational technique for cellular pathway analysis and engineering, EFM application to genome-scale models remains computationally prohibitive. This article provides a review of aspects of EFM computation that elucidates bottlenecks in scaling EFM computation. First, algorithms for computing EFMs are reviewed. Next, the impact of redundant constraints, sensitivity to constraint ordering and network compression are evaluated. Then, the advantages and limitations of recent parallelization and GPU-based efforts are highlighted. The article then reviews alternative pathway analysis approaches that aim to reduce the EFM solution space. Despite advances in EFM computation, our review concludes that continued scaling of EFM computation is necessary to apply EFM to genome-scale models. Further, our review concludes that pathway analysis methods that target specific pathway properties can provide powerful alternatives to EFM analysis.


Subjects
Algorithms , Metabolic Flux Analysis , Metabolic Flux Analysis/methods , Metabolic Networks and Pathways , Research Design , Systems Biology/methods
16.
Microb Cell Fact ; 18(1): 109, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31196115

ABSTRACT

BACKGROUND: Metabolic models are indispensable in guiding cellular engineering and in advancing our understanding of systems biology. As not all enzymatic activities are fully known and/or annotated, metabolic models remain incomplete, resulting in suboptimal computational analysis and leading to unexpected experimental results. We posit that one major source of unaccounted metabolism is promiscuous enzymatic activity. It is now well-accepted that most, if not all, enzymes are promiscuous, i.e., they transform substrates other than their primary substrate. However, there have been no systematic analyses of genome-scale metabolic models to predict putative reactions and/or metabolites that arise from enzyme promiscuity. RESULTS: Our workflow utilizes PROXIMAL, a tool that uses reactant-product transformation patterns from the KEGG database, to predict putative structural modifications due to promiscuous enzymes. Using iML1515 as a model system, we first utilized a computational workflow, referred to as Extended Metabolite Model Annotation (EMMA), to predict promiscuous reactions catalyzed, and metabolites produced, by natively encoded enzymes in Escherichia coli. We predict hundreds of new metabolites that can be used to augment iML1515. We then validated our method by comparing predicted metabolites with the Escherichia coli Metabolome Database (ECMDB). CONCLUSIONS: We utilized EMMA to augment the iML1515 metabolic model to more fully reflect cellular metabolic activity. This workflow uses enzyme promiscuity as a basis to predict hundreds of reactions and metabolites that may exist in E. coli but may not have been documented in iML1515 or other databases. We provide detailed analysis of 23 predicted reactions and 16 associated metabolites. Interestingly, nine of these metabolites, which are in ECMDB, have not previously been documented in any other E. coli databases. Four of the predicted reactions provide putative transformations parallel to those already in iML1515. 
We suggest adding predicted metabolites and reactions to iML1515 to create an extended metabolic model (EMM) for E. coli.


Subjects
Escherichia coli Proteins/metabolism , Escherichia coli/enzymology , Databases, Protein , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Metabolome , Metabolomics , Models, Biological
17.
Biotechnol Bioeng ; 116(6): 1405-1416, 2019 06.
Article in English | MEDLINE | ID: mdl-30802311

ABSTRACT

Current pathway synthesis tools identify possible pathways that can be added to a host to produce the desired target molecule through the exploration of abstract metabolic and reaction network space. However, not many of these tools explore gene-level information required to physically realize the identified synthesis pathways, and none explore enzyme-host compatibility. Developing tools that address this disconnect between abstract reactions/metabolic design space and physical genetic sequence design space will enable expedited experimental efforts that avoid exploring unprofitable synthesis pathways. This work describes a workflow, termed Probabilistic Pathway Assembly with Solubility Confidence Scores (ProPASS), which links synthesis pathway construction with the exploration of the physical design space as imposed by the availability of enzymes with predicted characterized activities within the host. Predicted protein solubility propensity scores are used as a confidence level to quantify the compatibility of each pathway enzyme with the host Escherichia coli (E. coli). This study also presents a database, termed Protein Solubility Database (ProSol DB), which provides solubility confidence scores in E. coli for 240,016 characterized enzymes obtained from UniProtKB/Swiss-Prot. The utility of ProPASS is demonstrated by generating genetic implementations of heterologous synthesis pathways in E. coli that target several commercially useful biomolecules.


Subjects
Biosynthetic Pathways , Escherichia coli Proteins/metabolism , Escherichia coli/enzymology , Biocatalysis , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Industrial Microbiology , Metabolic Engineering , Solubility , Workflow
18.
Genomics ; 109(3-4): 196-203, 2017 07.
Article in English | MEDLINE | ID: mdl-28347827

ABSTRACT

Failure by RNA polymerase to break contacts with promoter DNA results in release of bound RNA and re-initiation of transcription. These abortive RNAs were assumed to be non-functional but have recently been shown to affect termination in bacteriophage T7. Little is known about the functional role of these RNAs in other genetic models. Using a computational approach, we investigated whether abortive RNA could exert function in E. coli. Fragments generated from 3780 transcription units were used as query sequences within their respective transcription units to search for possible binding sites. Sites that fell within known regulatory features were then ranked based upon the free energy of hybridization to the abortive RNA. We further hypothesize about mechanisms of regulatory action for a select number of likely matches. Future experimental validation of these putative abortive-mRNA pairs may confirm our findings and promote exploration of functional abortive RNAs (faRNAs) in natural and synthetic systems.


Subjects
DNA-Directed RNA Polymerases/metabolism , Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Promoter Regions, Genetic , RNA, Untranslated/metabolism , Computational Biology , Escherichia coli/metabolism , Models, Genetic , RNA, Bacterial/metabolism , Transcription, Genetic
19.
Metab Eng Commun ; 4: 37-47, 2017 Jun.
Article in English | MEDLINE | ID: mdl-29468131

ABSTRACT

Directed evolution of enzymes consists of an iterative process of creating mutant libraries and choosing desired phenotypes through screening or selection until the enzymatic activity reaches a desired goal. The biggest challenge in directed enzyme evolution is identifying high-throughput screens or selections to isolate the variant(s) with the desired property. We present a computational metabolic engineering framework, Selection Finder (SelFi), to construct a selection pathway from a desired enzymatic product to a cellular host and to couple the pathway with cell survival. We applied SelFi to construct selection pathways for four enzymes and their desired enzymatic products xylitol, D-ribulose-1,5-bisphosphate, methanol, and aniline. Two of the selection pathways identified by SelFi were previously experimentally validated for engineering xylose reductase and RuBisCO. Importantly, SelFi advances directed evolution of enzymes, as there are currently no known generalized strategies or computational techniques for identifying high-throughput selections for engineering enzymes.

20.
PLoS One ; 11(5): e0155405, 2016.
Article in English | MEDLINE | ID: mdl-27228122

ABSTRACT

Stunting or reduced linear growth is very prevalent in low-income countries. Recent studies have demonstrated a causal relationship between alterations in the gut microbiome and moderate or severe acute malnutrition in children in these countries. However, there have been no primary longitudinal studies comparing the intestinal microbiota of persistently stunted children to that of non-stunted children in the same community. In this pilot study, we characterized gut microbial community composition and diversity of the fecal microbiota of 10 children with low birth weight and persistent stunting (cases) and 10 children with normal birth weight and no stunting (controls) from a birth cohort every 3 months up to 2 years of age in a slum community in south India. There was an increase in diversity indices (P < 0.0001) with increasing age in all children. However, there were no differences in diversity indices or in the rates of their increase with increasing age between cases and controls. The percent relative abundance of the Bacteroidetes phylum was higher in stunted than in control children at 12 months of age (P = 0.043). There was an increase in the relative abundance of this phylum with increasing age in all children (P = 0.0380) with no difference in the rate of increase between cases and controls. There was a decrease in the relative abundance of Proteobacteria (P = 0.0004) and Actinobacteria (P = 0.0489) with increasing age in cases. The microbiota of control children was enriched in probiotic species Bifidobacterium longum and Lactobacillus mucosae, whereas that of stunted children was enriched in inflammogenic taxa including those in the Desulfovibrio genus and Campylobacterales order. Larger, longitudinal studies on the compositional and functional maturation of the microbiome in children are needed.
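
The abstract does not say which diversity indices were computed. As one common example, the sketch below calculates the Shannon index and relative abundances from taxon counts; the choice of index, the function names, and the counts are all assumptions made for illustration and are not data from the study.

```python
import math


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw taxon counts to percent relative abundance."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def shannon_index(counts: dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total) for n in counts.values() if n > 0
    )


# Hypothetical phylum-level counts for a single fecal sample.
sample = {
    "Bacteroidetes": 250,
    "Firmicutes": 250,
    "Proteobacteria": 250,
    "Actinobacteria": 250,
}
print(relative_abundance(sample)["Bacteroidetes"])
print(shannon_index(sample))
```

With four equally abundant taxa the index equals ln 4, its maximum for four taxa; a sample dominated by one taxon gives a value near zero.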


Subjects
Bacteria , Gastrointestinal Microbiome , Growth Disorders/microbiology , Age Factors , Bacteria/classification , Bacteria/genetics , Bacteria/metabolism , Child, Preschool , Female , Humans , India , Infant , Infant, Newborn , Longitudinal Studies , Male