Search | BVS Infectious and Parasitic Diseases

1.

Characterizing and predicting ccRCC-causing missense mutations in Von Hippel-Lindau disease.

Serghini, Adam; Portelli, Stephanie; Troadec, Guillaume; Song, Catherine; Pan, Qisheng; Pires, Douglas E V; Ascher, David B.

Hum Mol Genet ; 33(3): 224-232, 2024 Jan 20.

Article in English | MEDLINE | ID: mdl-37883464

ABSTRACT

BACKGROUND: Mutations within the Von Hippel-Lindau (VHL) tumor suppressor gene are known to cause VHL disease, which is characterized by the formation of cysts and tumors in multiple organs of the body, particularly clear cell renal cell carcinoma (ccRCC). A major challenge in clinical practice is determining tumor risk from a given mutation in the VHL gene. Previous efforts have been hindered by limited available clinical data and technological constraints. METHODS: To overcome this, we first manually curated the largest set of clinically validated VHL mutations to date, enabling a robust assessment of existing predictive tools on an independent test set. Additionally, we comprehensively characterized the effects of mutations within VHL using in silico biophysical tools describing changes in protein stability, dynamics and affinity to binding partners to provide insights into the structure-phenotype relationship. These descriptive properties were used as molecular features for the construction of a machine learning model, designed to predict the risk of ccRCC development as a result of a VHL missense mutation. RESULTS: Analysis of our model showed an accuracy of 0.81 in the identification of ccRCC-causing missense mutations, and a Matthews correlation coefficient of 0.44 on a non-redundant blind test, a significant improvement over previously available approaches. CONCLUSION: This work highlights the power of using protein 3D structure to fully explore the range of molecular and functional consequences of genomic variants. We believe this optimized model will better enable clinical implementation and assist in guiding patient risk stratification and management.

Subjects

Machine Learning , Missense Mutation , von Hippel-Lindau Disease , Humans , Renal Cell Carcinoma/genetics , Renal Cell Carcinoma/metabolism , Kidney Neoplasms/metabolism , Missense Mutation/genetics , von Hippel-Lindau Disease/genetics , von Hippel-Lindau Disease/pathology , Von Hippel-Lindau Tumor Suppressor Protein/genetics , Von Hippel-Lindau Tumor Suppressor Protein/chemistry , Von Hippel-Lindau Tumor Suppressor Protein/metabolism
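
The abstract of entry 1 reports both an accuracy (0.81) and a Matthews correlation coefficient (0.44). The minimal sketch below shows how the two metrics are computed from a binary confusion matrix and why they can diverge on imbalanced data. The labels are invented toy values, not data from the study, and the function names are illustrative only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy, imbalanced example (hypothetical labels): 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", round(accuracy(*counts), 3))
print("MCC:", round(mcc(*counts), 3))
```

Because MCC uses all four cells of the confusion matrix, it penalizes the two missed positives in this toy example more heavily than accuracy does, which is one reason both metrics are often reported together.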

2.

A recurrent de novo splice site variant involving DNM1 exon 10a causes developmental and epileptic encephalopathy through a dominant-negative mechanism.

Parthasarathy, Shridhar; Ruggiero, Sarah McKeown; Gelot, Antoinette; Soardi, Fernanda C; Ribeiro, Bethânia F R; Pires, Douglas E V; Ascher, David B; Schmitt, Alain; Rambaud, Caroline; Represa, Alfonso; Xie, Hongbo M; Lusk, Laina; Wilmarth, Olivia; McDonnell, Pamela Pojomovsky; Juarez, Olivia A; Grace, Alexandra N; Buratti, Julien; Mignot, Cyril; Gras, Domitille; Nava, Caroline; Pierce, Samuel R; Keren, Boris; Kennedy, Benjamin C; Pena, Sergio D J; Helbig, Ingo; Cuddapah, Vishnu Anand.

Am J Hum Genet ; 109(12): 2253-2269, 2022 Dec 01.

Article in English | MEDLINE | ID: mdl-36413998

ABSTRACT

Heterozygous pathogenic variants in DNM1 cause developmental and epileptic encephalopathy (DEE) as a result of a dominant-negative mechanism impeding vesicular fission. Thus far, pathogenic variants in DNM1 have been studied with a canonical transcript that includes the alternatively spliced exon 10b. However, after performing RNA sequencing in 39 pediatric brain samples, we find the primary transcript expressed in the brain includes the downstream exon 10a instead. Using this information, we evaluated genotype-phenotype correlations of variants affecting exon 10a and identified a cohort of eleven previously unreported individuals. Eight individuals harbor a recurrent de novo splice site variant, c.1197-8G>A (GenBank: NM_001288739.1), which affects exon 10a and leads to DEE consistent with the classical DNM1 phenotype. We find this splice site variant leads to disease through an unexpected dominant-negative mechanism. Functional testing reveals an in-frame upstream splice acceptor causing insertion of two amino acids predicted to impair oligomerization-dependent activity. This is supported by neuropathological samples showing accumulation of enlarged synaptic vesicles adherent to the plasma membrane consistent with impaired vesicular fission. Two additional individuals with missense variants affecting exon 10a, p.Arg399Trp and p.Gly401Asp, had a similar DEE phenotype. In contrast, one individual with a missense variant affecting exon 10b, p.Pro405Leu, which is less expressed in the brain, had a correspondingly less severe presentation. Thus, we implicate variants affecting exon 10a as causing the severe DEE typically associated with DNM1-related disorders. We highlight the importance of considering relevant isoforms for disease-causing variants as well as the possibility of splice site variants acting through a dominant-negative mechanism.

Subjects

Brain Diseases , Dynamins , Epileptic Syndromes , Humans , Brain Diseases/genetics , Causality , Dynamins/genetics , Exons/genetics , Heterozygote , Mutation/genetics , Epileptic Syndromes/genetics

3.

epitope1D: accurate taxonomy-aware B-cell linear epitope prediction.

da Silva, Bruna Moreira; Ascher, David B; Pires, Douglas E V.

Brief Bioinform ; 24(3), 2023 May 19.

Article in English | MEDLINE | ID: mdl-37039696

ABSTRACT

The ability to identify B-cell epitopes is an essential step in vaccine design, immunodiagnostic tests and antibody production. Several computational approaches have been proposed to identify, from an antigen protein or peptide sequence, which residues are more likely to be part of an epitope, but they have limited performance on relatively homogeneous data sets and lack interpretability, limiting the biological insights that could otherwise be obtained. To address these limitations, we have developed epitope1D, an explainable machine learning method capable of accurately identifying linear B-cell epitopes, leveraging two new descriptors: a graph-based signature representation of protein sequences, based on our well-established Cutoff Scanning Matrix algorithm, and Organism Ontology information. Our model achieved areas under the ROC curve of up to 0.935 on cross-validation and blind tests, demonstrating robust performance. A comprehensive comparison to alternative methods using distinct benchmark data sets was also performed, with our model outperforming state-of-the-art tools. epitope1D not only represents a significant advance in predictive performance, but also allows biologically meaningful features to be combined and used for model interpretation. epitope1D has been made available as a user-friendly web server interface and application programming interface at https://biosig.lab.uq.edu.au/epitope1d/.

Subjects

Algorithms , B-Lymphocyte Epitopes , Amino Acid Sequence , ROC Curve

4.

DDMut: predicting effects of mutations on protein stability using deep learning.

Zhou, Yunzhuo; Pan, Qisheng; Pires, Douglas E V; Rodrigues, Carlos H M; Ascher, David B.

Nucleic Acids Res ; 51(W1): W122-W128, 2023 Jul 05.

Article in English | MEDLINE | ID: mdl-37283042

ABSTRACT

Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.

Subjects

Deep Learning , Protein Stability , Proteins , Software , Mutation , Point Mutation , Proteins/chemistry , Proteins/genetics
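
The DDMut abstract (entry 4) refers to anti-symmetry: the change in Gibbs free energy predicted for a forward mutation should be the negative of the value predicted for the hypothetical reverse mutation. The sketch below shows one common way to check this property for any predictor, using the Pearson correlation between forward and reverse predictions (ideally -1) and a mean bias term (ideally 0). It is a generic evaluation sketch with invented numbers, not DDMut's own code, and the function names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def antisymmetry_report(forward, reverse):
    """Return (correlation, bias) for paired forward/reverse predictions.

    A perfectly anti-symmetric predictor gives correlation -1 and bias 0,
    because each reverse value equals the negated forward value.
    """
    corr = pearson(forward, reverse)
    bias = sum(f + r for f, r in zip(forward, reverse)) / (2 * len(forward))
    return corr, bias


# Hypothetical predicted free-energy changes in kcal/mol (toy values).
forward = [-1.2, 0.4, -2.5, 0.9]
reverse = [1.2, -0.4, 2.5, -0.9]

corr, bias = antisymmetry_report(forward, reverse)
print("forward/reverse correlation:", round(corr, 3))
print("bias:", round(bias, 3))
```

A predictor that is biased towards destabilising mutations would show a bias term clearly different from zero under this check.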

5.

Oxidative desulfurization pathway for complete catabolism of sulfoquinovose by bacteria.

Sharma, Mahima; Lingford, James P; Petricevic, Marija; Snow, Alexander J D; Zhang, Yunyang; Järvå, Michael A; Mui, Janice W-Y; Scott, Nichollas E; Saunders, Eleanor C; Mao, Runyu; Epa, Ruwan; da Silva, Bruna M; Pires, Douglas E V; Ascher, David B; McConville, Malcolm J; Davies, Gideon J; Williams, Spencer J; Goddard-Borger, Ethan D.

Proc Natl Acad Sci U S A ; 119(4), 2022 Jan 25.

Article in English | MEDLINE | ID: mdl-35074914

ABSTRACT

Catabolism of sulfoquinovose (SQ; 6-deoxy-6-sulfoglucose), the ubiquitous sulfosugar produced by photosynthetic organisms, is an important component of the biogeochemical carbon and sulfur cycles. Here, we describe a pathway for SQ degradation that involves oxidative desulfurization to release sulfite and enable utilization of the entire carbon skeleton of the sugar to support the growth of the plant pathogen Agrobacterium tumefaciens. SQ or its glycoside sulfoquinovosyl glycerol are imported into the cell by an ATP-binding cassette transporter system with an associated SQ binding protein. A sulfoquinovosidase hydrolyzes the SQ glycoside and the liberated SQ is acted on by a flavin mononucleotide-dependent sulfoquinovose monooxygenase, in concert with an NADH-dependent flavin reductase, to release sulfite and 6-oxo-glucose. An NAD(P)H-dependent oxidoreductase reduces the 6-oxo-glucose to glucose, enabling entry into primary metabolic pathways. Structural and biochemical studies provide detailed insights into the recognition of key metabolites by proteins in this pathway. Bioinformatic analyses reveal that the sulfoquinovose monooxygenase pathway is distributed across Alpha- and Betaproteobacteria and is especially prevalent within the Rhizobiales order. This strategy for SQ catabolism is distinct from previously described pathways because it enables the complete utilization of all carbons within SQ by a single organism with concomitant production of inorganic sulfite.

Subjects

Bacteria/metabolism , Bacterial Physiological Phenomena , Metabolic Networks and Pathways , Methylglucosides/metabolism , Oxidative Stress , ATP-Binding Cassette Transporters/chemistry , ATP-Binding Cassette Transporters/genetics , ATP-Binding Cassette Transporters/metabolism , Carbohydrate Metabolism , Bacterial Gene Expression Regulation , Biological Models , Molecular Models , Protein Binding , Protein Conformation , Structure-Activity Relationship , Sulfur/metabolism

6.

Structural landscapes of PPI interfaces.

Rodrigues, Carlos H M; Pires, Douglas E V; Blundell, Tom L; Ascher, David B.

Brief Bioinform ; 23(4), 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35656714

ABSTRACT

Proteins are capable of highly specific interactions and are responsible for a wide range of functions, making them attractive in the pursuit of new therapeutic options. Previous studies focusing on the overall geometry of protein-protein interfaces, however, concluded that PPI interfaces were generally flat. More recently, this idea has been challenged by their structural and thermodynamic characterisation, suggesting the existence of concave binding sites that are closer in character to traditional small-molecule binding sites, rather than exhibiting complete flatness. Here, we present a large-scale analysis of binding geometry and physicochemical properties of all protein-protein interfaces available in the Protein Data Bank. In this review, we provide a comprehensive overview of the protein-protein interface landscape, including evidence that even for overall larger, flatter interfaces that utilize discontinuous interacting regions, small and potentially druggable pockets are utilized at binding sites.

Subjects

Proteins , Binding Sites , Protein Databases , Protein Binding , Proteins/chemistry

7.

cropCSM: designing safe and potent herbicides with graph-based signatures.

Pires, Douglas E V; Stubbs, Keith A; Mylne, Joshua S; Ascher, David B.

Brief Bioinform ; 23(2), 2022 Mar 10.

Article in English | MEDLINE | ID: mdl-35211724

ABSTRACT

Herbicides have revolutionised weed management, increased crop yields and improved profitability allowing for an increase in worldwide food security. Their widespread use, however, has also led to a rise in resistance and concerns about their environmental impact. Despite the need for potent and safe herbicidal molecules, no herbicide with a new mode of action has reached the market in 30 years. Although development of computational approaches has proven invaluable to guide rational drug discovery pipelines, leading to higher hit rates and lower attrition due to poor toxicity, little has been done in contrast for herbicide design. To fill this gap, we have developed cropCSM, a computational platform to help identify new, potent, nontoxic and environmentally safe herbicides. By using a knowledge-based approach, we identified physicochemical properties and substructures enriched in safe herbicides. By representing the small molecules as a graph, we leveraged these insights to guide the development of predictive models trained and tested on the largest collected data set of molecules with experimentally characterised herbicidal profiles to date (over 4500 compounds). In addition, we developed six new environmental and human toxicity predictors, spanning five different species to assist in molecule prioritisation. cropCSM was able to correctly identify 97% of herbicides currently available commercially, while predicting toxicity profiles with accuracies of up to 92%. We believe cropCSM will be an essential tool for the enrichment of screening libraries and to guide the development of potent and safe herbicides. We have made the method freely available through a user-friendly webserver at http://biosig.unimelb.edu.au/crop_csm.

Subjects

Herbicides , Drug Discovery , Herbicides/chemistry , Herbicides/toxicity , Humans

8.

Systematic evaluation of computational tools to predict the effects of mutations on protein stability in the absence of experimental structures.

Pan, Qisheng; Nguyen, Thanh Binh; Ascher, David B; Pires, Douglas E V.

Brief Bioinform ; 23(2), 2022 Mar 10.

Article in English | MEDLINE | ID: mdl-35189634

ABSTRACT

Changes in protein sequence can have dramatic effects on how proteins fold, their stability and dynamics. Over the last 20 years, pioneering methods have been developed to try to estimate the effects of missense mutations on protein stability, leveraging the growing availability of protein 3D structures. These, however, have been developed and validated using experimentally derived structures and biophysical measurements. A large proportion of protein structures remain to be experimentally elucidated and, while many studies have based their conclusions on predictions made using homology models, there has been no systematic evaluation of the reliability of these tools in the absence of experimental structural data. We have, therefore, systematically investigated the performance and robustness of ten widely used structural methods when presented with homology models built using templates at a range of sequence identity levels (from 15% to 95%) and contrasted performance with sequence-based tools, as a baseline. We found there is indeed performance deterioration on homology models built using templates with sequence identity below 40%, where sequence-based tools might become preferable. This was most marked for mutations in solvent-exposed residues and stabilizing mutations. As structure prediction tools improve, the reliability of these predictors is expected to follow; however, we strongly suggest that these factors should be taken into consideration when interpreting results from structure-based predictors of mutation effects on protein stability.

Subjects

Computational Biology , Proteins , Computational Biology/methods , Protein Databases , Mutation , Protein Stability , Proteins/chemistry , Proteins/genetics , Reproducibility of Results

9.

epitope3D: a machine learning method for conformational B-cell epitope prediction.

da Silva, Bruna Moreira; Myung, YooChan; Ascher, David B; Pires, Douglas E V.

Brief Bioinform ; 23(1), 2022 Jan 17.

Article in English | MEDLINE | ID: mdl-34676398

ABSTRACT

The ability to identify antigenic determinants of pathogens, or epitopes, is fundamental to guide rational vaccine development and immunotherapies, which are particularly relevant for rapid pandemic response. A range of computational tools has been developed over the past two decades to assist in epitope prediction; however, they have presented limited performance and generalization, particularly for the identification of conformational B-cell epitopes. Here, we present epitope3D, a novel scalable machine learning method capable of accurately identifying conformational epitopes, trained and evaluated on the largest curated epitope data set to date. Our method uses the concept of graph-based signatures to model epitope and non-epitope regions as graphs and extract distance patterns that are used as evidence to train and test predictive models. We show epitope3D outperforms available alternative approaches, achieving Matthews correlation coefficient and F1-scores of 0.55 and 0.57 on cross-validation and 0.45 and 0.36 during independent blind tests, respectively.

Subjects

B-Lymphocyte Epitopes , Support Vector Machine , Computational Biology/methods , Machine Learning , Molecular Conformation
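
Several entries in this listing, including epitope3D (entry 9), describe graph-based signatures that extract distance patterns from a structure. The sketch below illustrates the general idea behind a cutoff-scanning distance pattern: compute all pairwise distances between points, then count how many pairs fall within each of a series of increasing cutoffs. It is a heavily simplified illustration under stated assumptions (unlabelled points, arbitrary cutoff range and step, invented coordinates), not the published implementation.

```python
import math
from itertools import combinations


def cutoff_scanning_signature(coords, d_min=2.0, d_max=10.0, step=2.0):
    """Cumulative count of point pairs within each distance cutoff.

    coords: list of (x, y, z) tuples, e.g. atom or residue positions.
    Returns one count per cutoff d_min, d_min + step, ..., d_max.
    """
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    n_cutoffs = int(round((d_max - d_min) / step)) + 1
    signature = []
    for i in range(n_cutoffs):
        cutoff = d_min + i * step
        signature.append(sum(1 for d in distances if d <= cutoff))
    return signature


# Four hypothetical points in 3D space (toy coordinates, not real atoms).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 9.0)]
print(cutoff_scanning_signature(coords))
```

The resulting fixed-length vector can be used as a feature vector for a supervised learning model. Published signatures typically also distinguish pairs by atom or residue type, which this sketch omits.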

10.

CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function.

Nguyen, Thanh Binh; Pires, Douglas E V; Ascher, David B.

Brief Bioinform ; 23(1), 2022 Jan 17.

Article in English | MEDLINE | ID: mdl-34882232

ABSTRACT

Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to biologically characterise. To improve our understanding and ability to model these molecular interactions, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data in order to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, using machine learning algorithms to accurately predict their binding affinity and rank docking poses as a scoring function. Information on both protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. Across both training and independent test sets, we achieved comparable Pearson's correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show CSM-carbohydrate significantly outperformed previous approaches and have implemented our method and made all data freely available through both a user-friendly web interface and an application programming interface, to facilitate programmatic access, at http://biosig.unimelb.edu.au/csm_carbohydrate/. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.

Subjects

Proteins , Software , Carbohydrates , Ligands , Machine Learning , Molecular Docking Simulation , Protein Binding , Proteins/chemistry

11.

Evaluating hierarchical machine learning approaches to classify biological databases.

Rezende, Pâmela M; Xavier, Joicymara S; Ascher, David B; Fernandes, Gabriel R; Pires, Douglas E V.

Brief Bioinform ; 23(4), 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35724625

ABSTRACT

The rate of biological data generation has increased dramatically in recent years, which has driven the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, imposing different challenges to train, test and validate accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These include 'Local' approaches considering the hierarchy, building models per level or node, and 'Global' hierarchical classification, using a flat classification approach. To fill this gap, here we have systematically contrasted the performance of 'Local per Level' and 'Local per Node' approaches with a 'Global' approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.

Subjects

Deep Learning , Algorithms , Factual Databases
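
Entry 11 contrasts 'Local per Level', 'Local per Node' and 'Global' (flat) schemes for hierarchical classification. The sketch below illustrates one plausible way the training targets differ between the three schemes for dot-separated hierarchical labels, similar in shape to CATH codes. The label format, the helper names and the choice of a binary target per node are assumptions made for illustration. The abstract does not specify these details.

```python
def per_level_targets(labels):
    """Local per Level: one multiclass target list per depth of the hierarchy.

    Each target is the label prefix down to that depth.
    """
    paths = [label.split(".") for label in labels]
    depth = min(len(p) for p in paths)
    return [[".".join(p[: d + 1]) for p in paths] for d in range(depth)]


def per_node_targets(labels):
    """Local per Node: one binary target list per node of the hierarchy.

    A sample is positive for a node if its label is that node or lies below it.
    """
    paths = [label.split(".") for label in labels]
    nodes = sorted({".".join(p[: d + 1]) for p in paths for d in range(len(p))})
    return {
        node: [1 if lab == node or lab.startswith(node + ".") else 0 for lab in labels]
        for node in nodes
    }


def global_targets(labels):
    """Global (flat): a single multiclass target, the full label of each sample."""
    return list(labels)


# Three hypothetical samples with two-level labels.
labels = ["1.10", "1.20", "2.40"]
print(per_level_targets(labels))
print(per_node_targets(labels))
print(global_targets(labels))
```

Under these assumptions, the per-level scheme trains one model per depth, the per-node scheme trains one binary model per node, and the global scheme trains a single model, which is where the trade-off in computational resources mentioned in the abstract comes from.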

12.

toxCSM: comprehensive prediction of small molecule toxicity profiles.

de Sá, Alex G C; Long, Yangyang; Portelli, Stephanie; Pires, Douglas E V; Ascher, David B.

Brief Bioinform ; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35998885

ABSTRACT

Drug discovery is a lengthy, costly and high-risk endeavour that is further complicated by high attrition rates in later development stages. Toxicity has been one of the main causes of failure during clinical trials, increasing drug development time and costs. To facilitate early identification and optimisation of toxicity profiles, several computational tools have emerged that aim to improve success rates through timely pre-screening of drug candidates. Despite these efforts, there is an increasing demand for platforms capable of assessing both environmental and human-based toxicity properties at large scale. Here, we present toxCSM, a comprehensive computational platform for the study and optimisation of toxicity profiles of small molecules. toxCSM leverages the well-established concepts of graph-based signatures, molecular descriptors and similarity scores to develop 36 models for predicting a range of toxicity properties, which can assist in developing safer drugs and agrochemicals. toxCSM achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of up to 0.99 and Pearson's correlation coefficients of up to 0.94 on 10-fold cross-validation, with comparable performance on blind test sets, outperforming all alternative methods. toxCSM is freely available as a user-friendly web server and API at http://biosig.lab.uq.edu.au/toxcsm.

Subjects

Agrochemicals , Drug Discovery , Drug Discovery/methods , Humans , ROC Curve
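
Entry 12, like several other entries in this listing, reports performance as an area under the ROC curve (AUC). As a minimal sketch of what that number means, the function below computes AUC through its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are invented toy values, and the function name is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for positive (e.g. toxic), 0 for negative.
    scores: predicted scores, where higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: one positive (score 0.6) is ranked below one negative (0.7).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print("AUC:", round(roc_auc(labels, scores), 3))
```

This pairwise form is quadratic in the number of examples, so it is suited to illustration rather than to large data sets, where a sort-based computation is the usual choice.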

13.

GASS-Metal: identifying metal-binding sites on protein structures using genetic algorithms.

Paiva, Vinícius A; Mendonça, Murillo V; Silveira, Sabrina A; Ascher, David B; Pires, Douglas E V; Izidoro, Sandro C.

Brief Bioinform ; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35595534

ABSTRACT

Metals are present in >30% of proteins found in nature and assist them to perform important biological functions, including storage, transport, signal transduction and enzymatic activity. Traditional and experimental techniques for metal-binding site prediction are usually costly and time-consuming, making computational tools that can assist in these predictions of significant importance. Here we present Genetic Active Site Search (GASS)-Metal, a new method for protein metal-binding site prediction. The method relies on a parallel genetic algorithm to find candidate metal-binding sites that are structurally similar to curated templates from M-CSA and MetalPDB. GASS-Metal was thoroughly validated using homologous proteins and conservative mutations of residues, showing a robust performance. The ability of GASS-Metal to identify metal-binding sites was also compared with state-of-the-art methods, outperforming similar methods and achieving an MCC of up to 0.57 and detecting up to 96.1% of the sites correctly. GASS-Metal is freely available at https://gassmetal.unifei.edu.br. The GASS-Metal source code is available at https://github.com/sandroizidoro/gassmetal-local.

Subjects

Proteins , Software , Algorithms , Binding Sites , Catalytic Domain , Metals/chemistry , Metals/metabolism , Proteins/chemistry

14.

Understanding the complementarity and plasticity of antibody-antigen interfaces.

Myung, Yoochan; Pires, Douglas E V; Ascher, David B.

Bioinformatics ; 39(7), 2023 Jul 01.

Article in English | MEDLINE | ID: mdl-37382557

ABSTRACT

MOTIVATION: While antibodies have been ground-breaking therapeutic agents, the structural determinants for antibody binding specificity remain to be fully elucidated, which is compounded by the virtually unlimited repertoire of antigens they can recognize. Here, we have explored the structural landscapes of antibody-antigen interfaces to identify the structural determinants driving target recognition by assessing concavity and interatomic interactions. RESULTS: We found that complementarity-determining regions utilized deeper concavity with their longer H3 loops, with the H3 loops of nanobodies showing the deepest use of concavity. Of all amino acid residues found in complementarity-determining regions, tryptophan used deeper concavity, especially in nanobodies, making it suitable for leveraging concave antigen surfaces. Similarly, antigens utilized arginine to bind to deeper pockets of the antibody surface. Our findings fill a gap in knowledge about antibody specificity, binding affinity, and the nature of antibody-antigen interface features, which will lead to a better understanding of how antibodies can more effectively target druggable sites on antigen surfaces. AVAILABILITY AND IMPLEMENTATION: The data and scripts are available at: https://github.com/YoochanMyung/scripts.

Subjects

Antibodies , Complementarity Determining Regions , Complementarity Determining Regions/chemistry , Antibodies/chemistry , Antigens , Antibody Specificity , Antibody Binding Sites

15.

LEGO-CSM: a tool for functional characterization of proteins.

Nguyen, Thanh Binh; de Sá, Alex G C; Rodrigues, Carlos H M; Pires, Douglas E V; Ascher, David B.

Bioinformatics ; 39(7), 2023 Jul 01.

Article in English | MEDLINE | ID: mdl-37382560

ABSTRACT

MOTIVATION: With the development of sequencing techniques, the discovery of new proteins significantly exceeds the human capacity and resources for experimentally characterizing protein functions. Localization, EC numbers, and GO terms with the structure-based Cutoff Scanning Matrix (LEGO-CSM) is a comprehensive web-based resource that fills this gap by applying well-established and robust graph-based signatures to supervised learning models, using both protein sequence and structure information to accurately model protein function in terms of subcellular localization, Enzyme Commission (EC) numbers, and Gene Ontology (GO) terms. RESULTS: We show our models perform as well as or better than alternative approaches, achieving an area under the receiver operating characteristic curve of up to 0.93 for subcellular localization, up to 0.93 for EC, and up to 0.81 for GO terms on independent blind tests. AVAILABILITY AND IMPLEMENTATION: LEGO-CSM's web server is freely available at https://biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM's models can be downloaded at https://biosig.lab.uq.edu.au/lego_csm/data.

Subjects

Proteins , Software , Humans , Proteins/chemistry

16.

GRaSP-web: a machine learning strategy to predict binding sites based on residue neighborhood graphs.

Santana, Charles A; Izidoro, Sandro C; de Melo-Minardi, Raquel C; Tyzack, Jonathan D; Ribeiro, António J M; Pires, Douglas E V; Thornton, Janet M; de A Silveira, Sabrina.

Nucleic Acids Res ; 50(W1): W392-W397, 2022 Jul 05.

Article in English | MEDLINE | ID: mdl-35524575

ABSTRACT

Proteins are essential macromolecules for the maintenance of living systems. Many of them perform their function by interacting with other molecules in regions called binding sites. The identification and characterization of these regions are of fundamental importance to determine protein function, and are a key step in processes such as drug design and discovery. However, identifying such binding regions is not trivial due to the drawbacks of experimental methods, which are costly and time-consuming. Here we propose GRaSP-web, a web server that uses GRaSP (Graph-based Residue neighborhood Strategy to Predict binding sites), a residue-centric method based on graphs that uses machine learning to predict putative ligand binding site residues. The method outperformed 6 state-of-the-art residue-centric methods (MCC of 0.61). Also, GRaSP-web is scalable as it takes 10-20 seconds to predict binding sites for a protein complex (the state-of-the-art residue-centric method takes 2-5 h on average). It proved to be consistent in predicting binding sites for bound/unbound structures (MCC 0.61 for both) and for a large dataset of multi-chain proteins (4500 entries, MCC 0.61). GRaSP-web is freely available at https://grasp.ufv.br.

Subjects

Machine Learning , Proteins , Proteins/chemistry , Binding Sites , Ligands , Protein Domains , Protein Binding

17.

CSM-AB: graph-based antibody-antigen binding affinity prediction and docking scoring function.

Myung, Yoochan; Pires, Douglas E V; Ascher, David B.

Bioinformatics ; 38(4): 1141-1143, 2022 Jan 27.

Article in English | MEDLINE | ID: mdl-34734992

ABSTRACT

MOTIVATION: Understanding antibody-antigen interactions is key to improving their binding affinities and specificities. While experimental approaches are fundamental for developing new therapeutics, computational methods can provide quick assessment of binding landscapes, guiding experimental design. Despite this, little effort has been devoted to accurately predicting the binding affinity between antibodies and antigens and to developing tailored docking scoring functions for this type of interaction. Here, we developed CSM-AB, a machine learning method capable of predicting antibody-antigen binding affinity by modelling interaction interfaces as graph-based signatures. RESULTS: CSM-AB outperformed alternative methods, achieving a Pearson's correlation of up to 0.64 on blind tests. We also show CSM-AB can accurately rank near-native poses, working effectively as a docking scoring function. We believe CSM-AB will be an invaluable tool to assist in the development of new immunotherapies. AVAILABILITY AND IMPLEMENTATION: CSM-AB is freely available as a user-friendly web interface and API at http://biosig.unimelb.edu.au/csm_ab/datasets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Anticorpos , Software , Antígenos , Aprendizado de Máquina , Ligação Proteica , Simulação de Acoplamento Molecular

18.

embryoTox: Using Graph-Based Signatures to Predict the Teratogenicity of Small Molecules.

Aljarf, Raghad; Tang, Simon; Pires, Douglas E V; Ascher, David B.

J Chem Inf Model ; 63(2): 432-441, 2023 01 23.

Article in English | MEDLINE | ID: mdl-36595441

ABSTRACT

Teratogenic drugs can lead to extreme fetal malformation and consequently critically influence the fetus's health, yet the teratogenic risks associated with most approved drugs are unknown. Here, we propose a novel predictive tool, embryoTox, which utilizes a graph-based signature representation of the chemical structure of a small molecule to predict and classify molecules likely to be safe during pregnancy. embryoTox was trained and validated using in vitro bioactivity data of over 700 small molecules with characterized teratogenicity effects. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.96 on 10-fold cross-validation and 0.82 on nonredundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a practical resource for optimizing screening libraries to determine effective and safe molecules to use during pregnancy. To provide a simple and integrated platform to rapidly screen for potentially safe molecules and their risk factors, we made embryoTox freely available online at https://biosig.lab.uq.edu.au/embryotox/.

Subjects

Research Design , Pregnancy , Female , Humans , ROC Curve
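
The embryoTox abstract above reports an area under the receiver operating characteristic curve (AUC). One way to read that figure is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes the AUC directly from that pairwise definition; the function name `roc_auc` and the toy scores are assumptions for illustration and do not come from embryoTox.

```python
def roc_auc(labels, scores):
    """ROC AUC from its pairwise-ranking definition.

    Hypothetical helper for illustration only. It compares every
    positive with every negative, so it is only suitable for small inputs.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one positive (score 0.4) is ranked below one negative (0.8).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a gap between cross-validation and blind-test AUC is worth noting when comparing models.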

19.

Data-driven overdiagnosis definitions: A scoping review.

Senevirathna, Prabodi; Pires, Douglas E V; Capurro, Daniel.

J Biomed Inform ; 147: 104506, 2023 11.

Article in English | MEDLINE | ID: mdl-37769829

ABSTRACT

INTRODUCTION: Adequate methods to promptly translate digital health innovations for improved patient care are essential. Advances in Artificial Intelligence (AI) and Machine Learning (ML) have been sources of digital innovation and hold the promise to revolutionize the way we treat, manage and diagnose patients. Understanding the benefits but also the potential adverse effects of digital health innovations, particularly when these are made available or applied to healthier segments of the population, is essential. One such adverse effect is overdiagnosis. OBJECTIVE: To comprehensively analyze quantification strategies and data-driven definitions for overdiagnosis reported in the literature. METHODS: We conducted a scoping systematic review of manuscripts describing quantitative methods to estimate the proportion of overdiagnosed patients. RESULTS: We identified 46 studies that met our inclusion criteria. They covered a variety of clinical conditions, primarily breast and prostate cancer. Methods to quantify overdiagnosis included both prospective and retrospective methods, including randomized clinical trials and simulations. CONCLUSION: A variety of methods to quantify overdiagnosis have been published, producing widely diverging results. A standard method to quantify overdiagnosis is needed to allow its mitigation during the rapidly increasing development of new digital diagnostic tools.

Subjects

Artificial Intelligence , Prostatic Neoplasms , Male , Humans , Retrospective Studies , Overdiagnosis , Prospective Studies , Prostatic Neoplasms/diagnosis

20.

Developing a deep learning natural language processing algorithm for automated reporting of adverse drug reactions.

McMaster, Christopher; Chan, Julia; Liew, David F L; Su, Elizabeth; Frauman, Albert G; Chapman, Wendy W; Pires, Douglas E V.

J Biomed Inform ; 137: 104265, 2023 01.

Article in English | MEDLINE | ID: mdl-36464227

ABSTRACT

The detection of adverse drug reactions (ADRs) is critical to our understanding of the safety and risk-benefit profile of medications. With an incidence that has not changed over the last 30 years, ADRs are a significant source of patient morbidity, responsible for 5%-10% of acute care hospital admissions worldwide. Spontaneous reporting of ADRs has long been the standard method of reporting; however, this approach is known to have high rates of under-reporting, a problem that limits pharmacovigilance efforts. Automated ADR reporting presents an alternative pathway to increase reporting rates, although this may be limited by over-reporting of other drug-related adverse events. We developed a deep learning natural language processing algorithm to identify ADRs in discharge summaries at a single academic hospital centre. Our model was developed in two stages: first, a pre-trained model (DeBERTa) was further pre-trained on 1.1 million unlabelled clinical documents; second, this model was fine-tuned to detect ADR mentions in a corpus of 861 annotated discharge summaries. This model was compared to a version without the further pre-training step, and to a previously published RoBERTa model pre-trained on MIMIC III, which has demonstrated strong performance on other pharmacovigilance tasks. To ensure that our algorithm could differentiate ADRs from other drug-related adverse events, the annotated corpus was enriched for both validated ADR reports and confounding drug-related adverse events using. The final model demonstrated good performance with a ROC-AUC of 0.955 (95% CI 0.933 - 0.978) for the task of identifying discharge summaries containing ADR mentions, significantly outperforming the two comparator models.

Subjects

Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Natural Language Processing , Adverse Drug Reaction Reporting Systems , Algorithms , Drug-Related Side Effects and Adverse Reactions/epidemiology , Pharmacovigilance
