Results 1 - 20 of 78
1.
Biochemistry ; 63(2): 230-240, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38150593

ABSTRACT

The first step of histidine biosynthesis in Acinetobacter baumannii, the condensation of ATP and 5-phospho-α-D-ribosyl-1-pyrophosphate to produce N1-(5-phospho-β-D-ribosyl)-ATP (PRATP) and pyrophosphate, is catalyzed by the hetero-octameric enzyme ATP phosphoribosyltransferase, a promising target for antibiotic design. The catalytic subunit, HisGS, is allosterically activated upon binding of the regulatory subunit, HisZ, to form the hetero-octameric holoenzyme (ATPPRT), leading to a large increase in kcat. Here, we present the crystal structure of ATPPRT, along with kinetic investigations of the rate-limiting steps governing catalysis in the nonactivated (HisGS) and activated (ATPPRT) forms of the enzyme. A pH-rate profile showed that maximum catalysis is achieved above pH 8.0. Surprisingly, at 25 °C, kcat is higher when ADP replaces ATP as substrate for ATPPRT but not for HisGS. The HisGS-catalyzed reaction is limited by the chemical step, as suggested by the enhancement of kcat when Mg²⁺ was replaced by Mn²⁺, and by the lack of a pre-steady-state burst of product formation. Conversely, the ATPPRT-catalyzed reaction rate is determined by PRATP diffusion from the active site, as gleaned from a substantial solvent viscosity effect. A burst of product formation could be inferred from pre-steady-state kinetics, but the first turnover was too fast to be directly observed. Lowering the temperature to 5 °C allowed observation of the PRATP formation burst by ATPPRT. At this temperature, the single-turnover rate constant was significantly higher than kcat, providing additional evidence for a step after chemistry limiting catalysis by ATPPRT. This demonstrates that allosteric activation by HisZ accelerates the chemical step.


Subjects
ATP Phosphoribosyltransferase , Acinetobacter baumannii , ATP Phosphoribosyltransferase/chemistry , Diphosphates , Acinetobacter baumannii/metabolism , Catalytic Domain , Kinetics , Adenosine Triphosphate/metabolism , Catalysis
2.
BMC Bioinformatics ; 23(1): 261, 2022 Jul 01.
Article in English | MEDLINE | ID: mdl-35778683

ABSTRACT

BACKGROUND: Relationships among genetic or epigenetic features can be explored by learning probabilistic networks and unravelling the dependencies among a set of given genetic/epigenetic features. Bayesian networks (BNs) consist of nodes that represent the variables and arcs that represent the probabilistic relationships between the variables. However, practical guidance on how to make choices among the wide array of possibilities in Bayesian network analysis is limited. Our study aimed to apply a BN approach, while clearly laying out our analysis choices as an example for future researchers, in order to provide further insights into the relationships among epigenetic features and a stressful condition in chickens (Gallus gallus). RESULTS: Chickens raised under control conditions (n = 22) and chickens exposed to a social isolation protocol (n = 24) were used to identify differentially methylated regions (DMRs). A total of 60 DMRs were selected by a threshold, after bioinformatic pre-processing and analysis. The treatment was included as a binary variable (control = 0; stress = 1). Thereafter, a BN approach was applied: initially, a pre-filtering test was used for identifying pairs of features that must not be included in the process of learning the structure of the network; then, the average probability values for each arc of being part of the network were calculated; and finally, the arcs that were part of the consensus network were selected. The structure of the BN consisted of 47 out of 61 features (60 DMRs and the stressful condition), displaying 43 functional relationships. The stress condition was connected to two DMRs, one of them playing a role in tight and adhesive intracellular junctions in organs such as ovary, intestine, and brain. CONCLUSIONS: We clearly explain our steps in making each analysis choice, from discrete BN models to final generation of a consensus network from multiple model averaging searches. The epigenetic BN unravelled functional relationships among the DMRs, as well as epigenetic features in close association with the stressful condition the chickens were exposed to. The DMRs interacting with the stress condition could be further explored in future studies as possible biomarkers of stress in poultry species.


Subjects
Chickens , Poultry , Animals , Female , Bayes Theorem , Chickens/genetics , Epigenesis, Genetic
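
The arc-averaging and consensus steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: it assumes each structure search returns a set of directed arcs, that an arc's probability is simply the fraction of searches containing it, and that the consensus keeps arcs at or above a hypothetical threshold of 0.5. The node names are placeholders.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str]  # (parent, child)


def arc_probabilities(networks: List[Set[Arc]]) -> Dict[Arc, float]:
    """Fraction of learned networks in which each directed arc appears."""
    counts = Counter(arc for net in networks for arc in net)
    return {arc: n / len(networks) for arc, n in counts.items()}


def consensus_network(networks: List[Set[Arc]], threshold: float = 0.5) -> Set[Arc]:
    """Keep arcs whose averaged probability reaches the (assumed) threshold."""
    probs = arc_probabilities(networks)
    return {arc for arc, p in probs.items() if p >= threshold}


# Three hypothetical search results over placeholder nodes.
runs = [
    {("stress", "DMR1"), ("DMR1", "DMR2")},
    {("stress", "DMR1")},
    {("stress", "DMR1"), ("DMR2", "DMR1")},
]
consensus = consensus_network(runs)
```

Here only the arc from `stress` to `DMR1` survives, because the two orientations of the DMR1/DMR2 arc each appear in a single run.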
3.
Chemistry ; 28(70): e202201728, 2022 Dec 15.
Article in English | MEDLINE | ID: mdl-36112344

ABSTRACT

Is-PETase has become an enzyme of significant interest due to its ability to catalyse the degradation of polyethylene terephthalate (PET) at mesophilic temperatures. We performed hybrid quantum mechanics and molecular mechanics (QM/MM) calculations at the DSD-PBEP86-D3/ma-def2-TZVP/CHARMM27//rev-PBE-D3/def2-SVP/CHARMM level to calculate the energy profile for the degradation of a suitable PET model by this enzyme. Very low overall barriers are computed for serine protease-type hydrolysis steps (as low as 34.1 kJ mol⁻¹). Spontaneous deprotonation of the final product, terephthalic acid, with a high computed driving force indicates that product release could be rate limiting.


Subjects
Phthalic Acids , Polyethylene Terephthalates , Hydrolases/metabolism , Catalysis , Ethylenes
4.
J Chem Inf Model ; 56(11): 2162-2179, 2016 Nov 28.
Article in English | MEDLINE | ID: mdl-27749062

ABSTRACT

We compare a range of computational methods for the prediction of sublimation thermodynamics (enthalpy, entropy, and free energy of sublimation). These include a model from theoretical chemistry that utilizes crystal lattice energy minimization (with the DMACRYS program) and quantitative structure property relationship (QSPR) models generated by both machine learning (random forest and support vector machines) and regression (partial least squares) methods. Using these methods we investigate the predictability of the enthalpy, entropy and free energy of sublimation, with consideration of whether such a method may be able to improve solubility prediction schemes. Previous work has suggested that the major source of error in solubility prediction schemes involving a thermodynamic cycle via the solid state is in the modeling of the free energy change away from the solid state. Yet contrary to this conclusion other work has found that the inclusion of terms such as the enthalpy of sublimation in QSPR methods does not improve the predictions of solubility. We suggest the use of theoretical chemistry terms, detailed explicitly in the Methods section, as descriptors for the prediction of the enthalpy and free energy of sublimation. A data set of 158 molecules with experimental sublimation thermodynamics values and some CSD refcodes has been collected from the literature and is provided with their original source references.


Subjects
Informatics/methods , Organic Chemicals/chemistry , Phase Transition , Entropy , Models, Molecular , Molecular Conformation , Quantitative Structure-Activity Relationship
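
For orientation, the three sublimation quantities named in the abstract above are linked by the standard Gibbs relation, and a thermodynamic cycle via the solid state splits the free energy of solution into a sublimation term and a transfer term. The symbols below are conventional notation and not taken from the paper; the subscript "hyd" assumes an aqueous solvent.

```latex
\begin{align}
\Delta G_{\mathrm{sub}} &= \Delta H_{\mathrm{sub}} - T\,\Delta S_{\mathrm{sub}} \\
\Delta G_{\mathrm{sol}} &= \Delta G_{\mathrm{sub}} + \Delta G_{\mathrm{hyd}}
\end{align}
```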
5.
PLoS Comput Biol ; 10(5): e1003642, 2014 May.
Article in English | MEDLINE | ID: mdl-24874434

ABSTRACT

Phylogenomic analysis of the occurrence and abundance of protein domains in proteomes has recently shown that the α/β architecture is probably the oldest fold design. This holds important implications for the origins of biochemistry. Here we explore structure-function relationships addressing the use of chemical mechanisms by ancestral enzymes. We test the hypothesis that the oldest folds used the most mechanisms. We start by tracing biocatalytic mechanisms operating in metabolic enzymes along a phylogenetic timeline of the first appearance of homologous superfamilies of protein domain structures from CATH. A total of 335 enzyme reactions were retrieved from MACiE and were mapped over fold age. We define a mechanistic step type as one of the 51 mechanistic annotations given in MACiE, and each step of each of the 335 mechanisms was described using one or more of these annotations. We find that the first two folds, the P-loop containing nucleotide triphosphate hydrolase and the NAD(P)-binding Rossmann-like homologous superfamilies, were α/β architectures responsible for introducing 35% (18/51) of the known mechanistic step types. We find that these two oldest structures in the phylogenomic analysis of protein domains introduced many mechanistic step types that were later combinatorially spread in catalytic history. The most common mechanistic step types included fundamental building blocks of enzyme chemistry: "Proton transfer," "Bimolecular nucleophilic addition," "Bimolecular nucleophilic substitution," and "Unimolecular elimination by the conjugate base." They were associated with the most ancestral fold structure typical of P-loop containing nucleotide triphosphate hydrolases. Over half of the mechanistic step types were introduced in the evolutionary timeline before the appearance of structures specific to diversified organisms, during a period of architectural diversification. The other half unfolded gradually after organismal diversification and during a period that spanned ∼2 billion years of evolutionary history.


Subjects
Catalysis , Enzymes/chemistry , Enzymes/genetics , Evolution, Molecular , Enzymes/ultrastructure , Protein Folding , Protein Structure, Tertiary , Structure-Activity Relationship
6.
J Comput Aided Mol Des ; 29(2): 183-98, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25425329

ABSTRACT

Recently developed multi-targeted ligands are novel drug candidates able to interact with monoamine oxidase A and B; acetylcholinesterase and butyrylcholinesterase; or with histamine N-methyltransferase and histamine H3-receptor (H3R). These proteins are drug targets in the treatment of depression, Alzheimer's disease, obsessive disorders, and Parkinson's disease. A probabilistic method, the Parzen-Rosenblatt window approach, was used to build a "predictor" model using data collected from the ChEMBL database. The model can be used to predict both the primary pharmaceutical target and off-targets of a compound based on its structure. Molecular structures were represented based on the circular fingerprint methodology. The same approach was used to build a "predictor" model from the DrugBank dataset to determine the main pharmacological groups of the compound. The study of off-target interactions is now recognised as crucial to the understanding of both drug action and toxicology. Primary pharmaceutical targets and off-targets for the novel multi-target ligands were examined by use of the developed cheminformatic method. Several multi-target ligands were selected for further study, as compounds with possible additional beneficial pharmacological activities. The cheminformatic target identifications were in agreement with four 3D-QSAR (H3R/D1R/D2R/5-HT2aR) models and with in vitro assays for serotonin 5-HT1a and 5-HT2a receptor binding of the most promising ligand (71/MBA-VEG8).


Subjects
Alzheimer Disease/drug therapy , Nervous System Diseases/drug therapy , Parkinson Disease/drug therapy , Acetylcholinesterase/chemistry , Acetylcholinesterase/metabolism , Databases, Factual , Drug Discovery , Histamine N-Methyltransferase/chemistry , Histamine N-Methyltransferase/metabolism , Humans , Ligands , Monoamine Oxidase/chemistry , Monoamine Oxidase/metabolism , Quantitative Structure-Activity Relationship , Receptor, Serotonin, 5-HT2A/chemistry , Receptor, Serotonin, 5-HT2A/metabolism
7.
Pattern Recognit Lett ; 63: 30-35, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26435560

ABSTRACT

Pattern classification methods assign an object to one of several predefined classes/categories based on features extracted from observed attributes of the object (pattern). When L discriminatory features for the pattern can be accurately determined, the pattern classification problem presents no difficulty. However, precise identification of the relevant features for a classification algorithm (classifier) to be able to categorize real world patterns without errors is generally infeasible. In this case, the pattern classification problem is often cast as devising a classifier that minimizes the misclassification rate. One way of doing this is to consider both the pattern attributes and its class label as random variables, estimate the posterior class probabilities for a given pattern and then assign the pattern to the class/category for which the posterior class probability value estimated is maximum. More often than not, the form of the posterior class probabilities is unknown. The so-called Parzen Window approach is widely employed to estimate class-conditional probability (class-specific probability) densities for a given pattern. These probability densities can then be utilized to estimate the appropriate posterior class probabilities for that pattern. However, the Parzen Window scheme can become computationally impractical when the size of the training dataset is in the tens of thousands and L is also large (a few hundred or more). Over the years, various schemes have been suggested to ameliorate the computational drawback of the Parzen Window approach, but the problem still remains outstanding and unresolved. In this paper, we revisit the Parzen Window technique and introduce a novel approach that may circumvent the aforementioned computational bottleneck. The current paper presents the mathematical aspect of our idea. Practical realizations of the proposed scheme will be given elsewhere.
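
The classification rule this abstract describes, in which class-conditional densities are estimated with Parzen windows and the pattern is assigned to the class with the largest posterior, can be sketched as follows. This is a minimal illustration only: it assumes an isotropic Gaussian kernel with a single bandwidth `h` and class priors taken from training-set frequencies. The function names and toy data are hypothetical, and the sketch does not implement the speed-up the paper proposes.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Sequence[float]


def gaussian_kernel(x: Pattern, xi: Pattern, h: float) -> float:
    """Isotropic Gaussian window of bandwidth h centred on training point xi."""
    n_features = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    norm = (2.0 * math.pi * h * h) ** (n_features / 2.0)
    return math.exp(-sq_dist / (2.0 * h * h)) / norm


def parzen_posteriors(x: Pattern, training: List[Tuple[Pattern, str]],
                      h: float = 1.0) -> Dict[str, float]:
    """Posterior class probabilities from Parzen class-conditional densities."""
    by_class = defaultdict(list)
    for features, label in training:
        by_class[label].append(features)
    joint = {}
    for label, points in by_class.items():
        # Class-conditional density: average of windows over that class's points.
        density = sum(gaussian_kernel(x, p, h) for p in points) / len(points)
        prior = len(points) / len(training)  # assumed: empirical class frequency
        joint[label] = prior * density
    evidence = sum(joint.values())  # can underflow to 0 far from all training data
    return {label: value / evidence for label, value in joint.items()}


def classify(x: Pattern, training: List[Tuple[Pattern, str]], h: float = 1.0) -> str:
    """Assign x to the class with the maximum estimated posterior."""
    posteriors = parzen_posteriors(x, training, h)
    return max(posteriors, key=posteriors.get)


toy_training = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
                ((3.0, 3.0), "b"), ((3.2, 2.9), "b")]
label_near_a = classify((0.2, 0.1), toy_training)
label_near_b = classify((2.9, 3.1), toy_training)
```

Each query evaluates one kernel per training pattern over all L features, which is the cost that becomes impractical for the dataset sizes the abstract mentions.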

8.
BMC Bioinformatics ; 15: 150, 2014 May 19.
Article in English | MEDLINE | ID: mdl-24885296

ABSTRACT

BACKGROUND: In this work we predict enzyme function at the level of chemical mechanism, providing a finer granularity of annotation than traditional Enzyme Commission (EC) classes. Hence we can predict not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but how the reaction is performed, using which cofactors and with susceptibility to which drugs or inhibitors, details with important consequences for drug and enzyme design. Work that predicts enzyme catalytic activity based on 3D protein structure features limits the prediction of mechanism to proteins already having either a solved structure or a close relative suitable for homology modelling. RESULTS: In this study, we evaluate whether sequence identity, InterPro or Catalytic Site Atlas sequence signatures provide enough information for bulk prediction of enzyme mechanism. By splitting MACiE (Mechanism, Annotation and Classification in Enzymes database) mechanism labels to a finer granularity, which includes the role of the protein chain in the overall enzyme complex, the method can predict at 96% accuracy (and 96% micro-averaged precision, 99.9% macro-averaged recall) the MACiE mechanism definitions of 248 proteins available in the MACiE, EzCatDb (Database of Enzyme Catalytic Mechanisms) and SFLD (Structure Function Linkage Database) databases using an off-the-shelf K-Nearest Neighbours multi-label algorithm. CONCLUSION: We find that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also find that incorporating Catalytic Site Atlas attributes does not seem to provide additional accuracy. The software code (ml2db), data and results are available online at http://sourceforge.net/projects/ml2db/ and as supplementary files.


Subjects
Artificial Intelligence , Enzymes/chemistry , Sequence Analysis, Protein , Algorithms , Catalysis , Catalytic Domain , Databases, Protein , Protein Conformation , Software
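
The abstract above reports micro-averaged precision alongside macro-averaged recall, two averages that can differ sharply when some labels are rare. The sketch below shows one common way to compute both for multi-label predictions. It is not the ml2db code: the label names are hypothetical, and skipping labels with an undefined ratio in the macro average is an assumption.

```python
from typing import Dict, List, Set


def micro_macro_scores(true_sets: List[Set[str]],
                       pred_sets: List[Set[str]]) -> Dict[str, float]:
    """Micro- and macro-averaged precision and recall for multi-label output."""
    labels = set().union(*true_sets, *pred_sets)
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for truth, pred in zip(true_sets, pred_sets):
        for label in labels:
            if label in pred and label in truth:
                tp[label] += 1
            elif label in pred:
                fp[label] += 1
            elif label in truth:
                fn[label] += 1

    # Micro: pool the counts over all labels, then take one ratio.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_precision = total_tp / (total_tp + total_fp)
    micro_recall = total_tp / (total_tp + total_fn)

    # Macro: one ratio per label, then an unweighted mean over the labels
    # for which the ratio is defined (assumption: undefined labels are skipped).
    per_label_precision = [tp[l] / (tp[l] + fp[l]) for l in labels if tp[l] + fp[l] > 0]
    per_label_recall = [tp[l] / (tp[l] + fn[l]) for l in labels if tp[l] + fn[l] > 0]
    return {
        "micro_precision": micro_precision,
        "micro_recall": micro_recall,
        "macro_precision": sum(per_label_precision) / len(per_label_precision),
        "macro_recall": sum(per_label_recall) / len(per_label_recall),
    }


# Hypothetical mechanism labels for four proteins.
truth = [{"M1"}, {"M1"}, {"M2"}, {"M1", "M3"}]
predicted = [{"M1"}, {"M1"}, {"M1"}, {"M1", "M3"}]
scores = micro_macro_scores(truth, predicted)
```

In this toy case the single missed rare label `M2` barely moves the micro averages but pulls macro recall down to two thirds.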
9.
J Mol Evol ; 79(3-4): 117-29, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25185655

ABSTRACT

Bacteria use metallo-β-lactamase enzymes to hydrolyse lactam rings found in many antibiotics, rendering them ineffective. Metallo-β-lactamase activity is thought to be polyphyletic, having arisen on more than one occasion within a single functionally diverse homologous superfamily. Since discovery of multiple origins of enzymatic activity conferring antibiotic resistance has broad implications for the continued clinical use of antibiotics, we test the hypothesis of polyphyly further; if lactamase function has arisen twice independently, the most recent common ancestor (MRCA) is not expected to possess lactam-hydrolysing activity. Two major problems present themselves. Firstly, even with a perfectly known phylogeny, ancestral sequence reconstruction is error prone. Secondly, the phylogeny is not known, and in fact reconstructing a single, unambiguous phylogeny for the superfamily has proven impossible. To obtain a more statistical view of the strength of evidence for or against MRCA lactamase function, we reconstructed a sample of 98 MRCAs of the metallo-β-lactamases, each based on a different tree in a bootstrap sample of reconstructed phylogenies. InterPro sequence signatures and homology modelling were then used to assess our sample of MRCAs for lactamase functionality. Only 5% of these models conform to our criteria for metallo-β-lactamase functionality, suggesting that the ancestor was unlikely to have been a metallo-β-lactamase. On the other hand, given that ancestral proteins may have had metallo-β-lactamase functionality with variation in sequence and structural properties compared with extant enzymes, our criteria are conservative, estimating a lower bound of evidence for metallo-β-lactamase functionality but not an upper bound.


Subjects
Bacteria/genetics , Biological Evolution , Phylogeny , beta-Lactamases/genetics , Amino Acid Sequence , Bacterial Proteins/genetics , Likelihood Functions , Models, Genetic , Models, Molecular , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Homology, Amino Acid
10.
Mol Pharm ; 11(8): 2962-72, 2014 Aug 04.
Article in English | MEDLINE | ID: mdl-24919008

ABSTRACT

We report the results of testing quantitative structure-property relationships (QSPR) that were trained upon the same druglike molecules but two different sets of solubility data: (i) data extracted from several different sources from the published literature, for which the experimental uncertainty is estimated to be 0.6-0.7 log S units (referred to mol/L); (ii) data measured by a single accurate experimental method (CheqSol), for which experimental uncertainty is typically <0.05 log S units. Contrary to what might be expected, the models derived from the CheqSol experimental data are not more accurate than those derived from the "noisy" literature data. The results suggest that, at the present time, it is the deficiency of QSPR methods (algorithms and/or descriptor sets), and not, as is commonly quoted, the uncertainty in the experimental measurements, which is the limiting factor in accurately predicting aqueous solubility for pharmaceutical molecules.


Subjects
Chemistry, Pharmaceutical/methods , Water/chemistry , Algorithms , Kinetics , Models, Chemical , Molecular Structure , Quantitative Structure-Activity Relationship , Regression Analysis , Reproducibility of Results , Research Design , Software , Solubility , Temperature , Thermodynamics
11.
J Chem Inf Model ; 54(3): 844-56, 2014 Mar 24.
Article in English | MEDLINE | ID: mdl-24564264

ABSTRACT

We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure-property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ~1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9-1.0 log S units.


Subjects
Models, Chemical , Pharmaceutical Preparations/chemistry , Artificial Intelligence , Crystallization , Models, Molecular , Solubility , Thermodynamics , Water/chemistry
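
The RMSE figures in the abstract above come from 10-fold cross-validation. The sketch below shows the general shape of that evaluation on toy data. It is an assumption-laden illustration rather than the authors' protocol: errors are pooled over all held-out points before taking the square root, the model is a hypothetical one-descriptor least-squares line, and the data are synthetic.

```python
import math
import random
from typing import Callable, List


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Ordinary least-squares line for one descriptor (hypothetical stand-in model)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validated_rmse(xs: List[float], ys: List[float], fit=fit_line,
                         k: int = 10, seed: int = 0) -> float:
    """RMSE over held-out predictions, pooled across the k folds."""
    squared_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)  # model never sees the held-out fold
        squared_errors.extend((predict(xs[i]) - ys[i]) ** 2 for i in held_out)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic, noise-free data: the line is recovered in every fold.
toy_x = [float(i) for i in range(20)]
toy_y = [2.0 * x + 1.0 for x in toy_x]
toy_rmse = cross_validated_rmse(toy_x, toy_y)
```

With real solubility data the value returned would be in the same units as the target, log S units in the abstract's case.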
12.
Commun Chem ; 7(1): 77, 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38582930

ABSTRACT

Heavy-isotope substitution into enzymes slows down bond vibrations and may alter transition-state barrier crossing probability if this is coupled to fast protein motions. ATP phosphoribosyltransferase from Acinetobacter baumannii is a multi-protein complex where the regulatory protein HisZ allosterically enhances catalysis by the catalytic protein HisGS. This is accompanied by a shift in rate-limiting step from chemistry to product release. Here we report that isotope-labelling of HisGS has no effect on the nonactivated reaction, which involves negative activation heat capacity, while HisZ-activated HisGS catalytic rate decreases in a strictly mass-dependent fashion across five different HisGS masses, at low temperatures. Surprisingly, the effect is not linked to the chemical step, but to fast motions governing product release in the activated enzyme. Disruption of a specific enzyme-product interaction abolishes the isotope effects. Results highlight how altered protein mass perturbs allosterically modulated thermal motions relevant to the catalytic cycle beyond the chemical step.

13.
Sci Rep ; 14(1): 9019, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641606

ABSTRACT

Bayesian networks represent a useful tool to explore interactions within biological systems. The aims of this study were to identify a reduced number of genes associated with a stress condition in chickens (Gallus gallus) and to unravel their interactions by implementing a Bayesian network approach. Initially, one publicly available dataset (3 control vs. 3 heat-stressed chickens) was used to identify the stress signal, represented by 25 differentially expressed genes (DEGs). The dataset was augmented by looking for the 25 DEGs in four other publicly available databases. Bayesian network algorithms were used to discover the informative relationships between the DEGs. Only ten out of the 25 DEGs displayed interactions. Four of them were Heat Shock Proteins that could be playing a key role, especially under stress conditions, where maintaining the correct functioning of the cell machinery might be crucial. One of the DEGs is an open reading frame whose function is as yet unknown, highlighting the power of Bayesian networks in knowledge discovery. Identifying an initial stress signal, augmenting it by combining other databases, and finally learning the structure of Bayesian networks allowed us to find genes closely related to stress, with the possibility of further exploring the system in future studies.


Subjects
Chickens , Gene Expression Profiling , Animals , Chickens/genetics , Chickens/metabolism , Gene Expression Profiling/veterinary , Bayes Theorem , Heat-Shock Response/genetics , Brain , Gene Regulatory Networks
14.
BMC Bioinformatics ; 14: 213, 2013 Jul 03.
Article in English | MEDLINE | ID: mdl-23819480

ABSTRACT

BACKGROUND: We present the algorithm PFClust (Parameter Free Clustering), which is able automatically to cluster data and identify a suitable number of clusters to group them into without requiring any parameters to be specified by the user. The algorithm partitions a dataset into a number of clusters that share some common attributes, such as their minimum expectation value and variance of intra-cluster similarity. A set of n objects can be clustered into any number of clusters from one to n, and there are many different hierarchical and partitional, agglomerative and divisive, clustering methodologies available that can be used to do this. Nonetheless, automatically determining the number of clusters present in a dataset constitutes a significant challenge for clustering algorithms. Identifying a putative optimum number of clusters to group the objects into involves computing and evaluating a range of clusterings with different numbers of clusters. However, there is no agreed or unique definition of optimum in this context. Thus, we test PFClust on datasets for which an external gold standard of 'correct' cluster definitions exists, noting that this division into clusters may be suboptimal according to other reasonable criteria. PFClust is heuristic in the sense that it cannot be described in terms of optimising any single simply-expressed metric over the space of possible clusterings. RESULTS: We validate PFClust firstly with reference to a number of synthetic datasets consisting of 2D vectors, showing that its clustering performance is at least equal to that of six other leading methodologies - even though five of the other methods are told in advance how many clusters to use. We also demonstrate the ability of PFClust to classify the three dimensional structures of protein domains, using a set of folds taken from the structural bioinformatics database CATH. 
CONCLUSIONS: We show that PFClust is able to cluster the test datasets a little better, on average, than any of the other algorithms, and furthermore is able to do this without the need to specify any external parameters. Results on the synthetic datasets demonstrate that PFClust generates meaningful clusters, while our algorithm also shows excellent agreement with the correct assignments for a dataset extracted from the CATH part-manually curated classification of protein domain structures.


Subjects
Algorithms , Cluster Analysis , Protein Folding , Protein Structure, Tertiary
18.
J Chem Inf Model ; 53(8): 1957-66, 2013 Aug 26.
Article in English | MEDLINE | ID: mdl-23829430

ABSTRACT

In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in conjunction with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. This data set is also provided as a benchmark data set for future target prediction methods due to its size as well as the number of bioactivity classes it contains. In addition to evaluating the methods, different performance measures were explored. This is not as straightforward as in binary classification settings, due to the number of classes, the possibility of multiple class memberships, and the need to translate model scores into "yes/no" predictions for assessing model performance. Both algorithms achieved a recall of correct targets that exceeds 80% in the top 1% of predictions. Performance depends significantly on the underlying diversity and size of a given class of bioactive compounds, with small classes and low structural similarity affecting both algorithms to different degrees. When tested on an external test set extracted from WOMBAT covering more than 500 targets by excluding all compounds with Tanimoto similarity above 0.8 to compounds from the ChEMBL data set, the current methodologies achieved a recall of 63.3% and 66.6% among the top 1% for Naïve Bayes and Parzen-Rosenblatt Window, respectively. While those numbers seem to indicate lower performance, they are also more realistic for settings where protein targets need to be established for novel chemical substances.


Subjects
Algorithms , Computational Biology/methods , Bayes Theorem , Benchmarking , Drug Discovery , Humans , Ligands , Protein Binding , Proteins/metabolism , Reproducibility of Results
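
The external test set in the abstract above was built by excluding compounds with Tanimoto similarity above 0.8 to the training compounds. A rough sketch of that filter follows, assuming each fingerprint is represented as the set of its on-bit indices. The representation, function names and toy fingerprints are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def filter_external(candidates: List[Set[int]], reference: List[Set[int]],
                    cutoff: float = 0.8) -> List[Set[int]]:
    """Keep candidates whose similarity to every reference compound is <= cutoff."""
    return [c for c in candidates
            if all(tanimoto(c, r) <= cutoff for r in reference)]


# Toy fingerprints as sets of on-bit indices.
training_set = [{1, 2, 3, 4, 5}]
external_candidates = [{1, 2, 3, 4, 5, 6}, {1, 2, 9}]
kept = filter_external(external_candidates, training_set)
```

The first candidate shares five of six bits with the training compound (similarity about 0.83) and is dropped, while the second is kept.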
19.
BMC Bioinformatics ; 13: 60, 2012 Apr 24.
Article in English | MEDLINE | ID: mdl-22530800

ABSTRACT

BACKGROUND: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used. RESULTS: The three descriptor sets encoding overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. CONCLUSIONS: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification. We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction.


Subjects
Algorithms , Artificial Intelligence , Enzymes/chemistry , Enzymes/classification , Models, Chemical , Cluster Analysis , Databases, Protein , Enzymes/metabolism , Support Vector Machine , Terminology as Topic
20.
Sci Rep ; 12(1): 7482, 2022 May 06.
Article in English | MEDLINE | ID: mdl-35523843

ABSTRACT

Differences in the expression patterns of genes have been used to measure the effects of non-stress or stress conditions in poultry species. However, the list of genes identified can be extensive and they might be related to several biological systems. Therefore, the aim of this study was to identify a small set of genes closely associated with stress in a poultry animal model, the chicken (Gallus gallus), by reusing and combining data previously published together with bioinformatic analysis and Bayesian networks in a multi-step approach. Two datasets were collected from publicly available repositories and pre-processed. Bioinformatics analyses were performed to identify genes common to both datasets that showed differential expression patterns between non-stress and stress conditions. Bayesian networks were learnt using a Simulated Annealing algorithm implemented in the software Banjo. The structure of the Bayesian network consisted of 16 out of 19 genes together with the stress condition. Network structure showed CARD19 directly connected to the stress condition plus highlighted CYGB, BRAT1, and EPN3 as relevant, suggesting these genes could play a role in stress. The biological functionality of these genes is related to damage, apoptosis, and oxygen provision, and they could potentially be further explored as biomarkers of stress.


Subjects
Chickens , Spleen , Algorithms , Animals , Bayes Theorem , Chickens/genetics , Computational Biology , Gene Expression Profiling , Gene Regulatory Networks