ABSTRACT
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.
Subject(s)
Databases, Protein; Humans; Amino Acid Sequence; Artificial Intelligence; Internet; Proteins/chemistry; Software
ABSTRACT
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed, in rapid response to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and the development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.
Subject(s)
COVID-19/prevention & control; Computational Biology; SARS-CoV-2/isolation & purification; Biomedical Research; COVID-19/epidemiology; COVID-19/virology; Genome, Viral; Humans; Pandemics; SARS-CoV-2/genetics
ABSTRACT
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on clustering with the MMseqs2 software. We have compared all of the regions in RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.
Subject(s)
Computational Biology/statistics & numerical data; Databases, Protein; Proteins/metabolism; Proteome/metabolism; Animals; COVID-19/epidemiology; COVID-19/prevention & control; COVID-19/virology; Computational Biology/methods; Epidemics; Humans; Internet; Models, Molecular; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics; Proteome/classification; Proteome/genetics; Repetitive Sequences, Amino Acid/genetics; SARS-CoV-2/genetics; SARS-CoV-2/physiology; Sequence Analysis, Protein/methods
ABSTRACT
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
Subject(s)
Databases, Protein; Proteins/chemistry; Amino Acid Sequence; COVID-19/metabolism; Internet; Molecular Sequence Annotation; Protein Domains; Protein Interaction Maps; SARS-CoV-2/metabolism; Sequence Alignment
ABSTRACT
Classification of protein domains based on homology and structural similarity serves as a fundamental tool to gain biological insights into protein function. Recent advancements in protein structure prediction, exemplified by AlphaFold, have revolutionized the availability of protein structural data. We focus on classifying about 9000 Pfam families into ECOD (Evolutionary Classification of Domains) by using predicted AlphaFold models and the DPAM (Domain Parser for AlphaFold Models) tool. Our results offer insights into their homologous relationships and domain boundaries. More than half of these Pfam families contain DPAM domains that can be confidently assigned to the ECOD hierarchy. Most assigned domains belong to highly populated folds such as Immunoglobulin-like (IgL), Armadillo (ARM), helix-turn-helix (HTH), and Src homology 3 (SH3). A large fraction of DPAM domains, however, cannot be confidently assigned to ECOD homologous groups. These unassigned domains exhibit statistically different characteristics, including shorter average length, fewer secondary structure elements, and more abundant transmembrane segments. They could potentially define novel families remotely related to domains with known structures or novel superfamilies and folds. Manual scrutiny of a subset of these domains revealed an abundance of internal duplications and recurring structural motifs. Exploring sequence and structural features such as disulfide bond patterns, metal-binding sites, and enzyme active sites helped uncover novel structural folds as well as remote evolutionary relationships. By bridging the gap between sequence-based Pfam and structure-based ECOD domain classifications, our study contributes to a more comprehensive understanding of the protein universe by providing structural and functional insights into previously uncharacterized proteins.
ABSTRACT
BACKGROUND: Since their introduction in the virtual screening field, Receiver Operating Characteristic (ROC) curve-derived metrics have been widely used for benchmarking computational methods and algorithms intended for virtual screening applications. Whereas in classification problems the ratio between sensitivity and specificity for a given score value is very informative, a practical concern in virtual screening campaigns is to predict the actual probability that a predicted hit will prove truly active when submitted to experimental testing (in other words, the Positive Predictive Value, PPV). Estimation of this probability is, however, hindered by its dependence on the yield of actives of the screened library, which cannot be known a priori. OBJECTIVE: To explore the use of PPV surfaces derived from simulated ranking experiments (retrospective virtual screening) as a complementary tool to ROC curves, for both benchmarking and optimization of score cutoff values. METHODS: The utility of the proposed approach is assessed in retrospective virtual screening experiments with four datasets used to infer QSAR classifiers: inhibitors of Trypanosoma cruzi trypanothione synthetase; inhibitors of Trypanosoma brucei N-myristoyltransferase; inhibitors of GABA transaminase; and anticonvulsant activity in the 6 Hz seizure model. RESULTS: Besides illustrating the utility of PPV surfaces to compare the performance of machine learning models for virtual screening applications and to select an adequate score threshold, our results also suggest that ensemble learning provides models with better predictivity and more robust behavior. CONCLUSION: PPV surfaces are valuable tools to assess virtual screening tools and choose score thresholds to be applied in prospective in silico screens. Ensemble learning approaches seem to consistently lead to improved predictivity and robustness.
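The dependence of PPV on the yield of actives mentioned above follows from Bayes' theorem: PPV = Se·Ya / (Se·Ya + (1 − Sp)·(1 − Ya)), where Se is sensitivity, Sp is specificity and Ya is the yield of actives. The following is a minimal sketch of that relationship only, not the authors' implementation; the function names, the operating points and the yield values are all hypothetical and chosen purely for illustration.

```python
def ppv(sensitivity: float, specificity: float, yield_of_actives: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * yield_of_actives
    false_pos = (1.0 - specificity) * (1.0 - yield_of_actives)
    if true_pos + false_pos == 0.0:
        raise ValueError("PPV is undefined when no compound is predicted active")
    return true_pos / (true_pos + false_pos)


def ppv_surface(operating_points, yields):
    """One row per (Se, Sp) operating point, one column per yield of actives."""
    return [[ppv(se, sp, ya) for ya in yields] for se, sp in operating_points]


# Hypothetical (Se, Sp) pairs, e.g. read off a ROC curve at three score cutoffs,
# and hypothetical yields of actives for the screened library.
points = [(0.90, 0.80), (0.70, 0.95), (0.50, 0.99)]
yields = [0.001, 0.01, 0.05]

for (se, sp), row in zip(points, ppv_surface(points, yields)):
    print(f"Se={se:.2f} Sp={sp:.2f} ->", [round(v, 3) for v in row])
```

Because Ya is unknown before a prospective screen, evaluating PPV over a range of plausible Ya values, as in the grid above, shows how sensitive a chosen score cutoff is to that assumption.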
Subject(s)
Machine Learning; Quantitative Structure-Activity Relationship; 4-Aminobutyrate Transaminase/antagonists & inhibitors; 4-Aminobutyrate Transaminase/metabolism; Animals; Anticonvulsants/chemistry; Anticonvulsants/therapeutic use; Area Under Curve; Protozoan Proteins/antagonists & inhibitors; Protozoan Proteins/metabolism; ROC Curve; Seizures/drug therapy; Seizures/pathology; Trypanosoma/metabolism
ABSTRACT
Malaria is among the leading causes of death worldwide. The emergence of resistant Plasmodium falciparum strains with reduced sensitivity to the first-line combination therapy, together with suboptimal responses to the insecticides used for Anopheles vector management, has led to renewed interest in novel therapeutic options. Here, we report the development and validation of an ensemble of ligand-based computational models capable of identifying falcipain-2 inhibitors, and their subsequent application in the virtual screening of the DrugBank and Sweetlead libraries. Among four hits submitted to enzymatic assays, two (odanacatib, an abandoned investigational treatment for osteoporosis and bone metastasis, and the antibiotic methacycline) showed confirmed inhibitory effects on falcipain-2, with Ki values of 98.2 nM and 84.4 µM, respectively. Interestingly, methacycline proved to be a non-competitive inhibitor (α = 1.42) of falcipain-2. The effects of both hits on falcipain-2 hemoglobinase activity and on the development of P. falciparum were also studied.
ABSTRACT
Much attention has been paid in the last decade to molecular predictors of promiscuity, including molecular weight, log P, molecular complexity, acidity constant and molecular topology, with correlations between promiscuity and those descriptors seemingly being context-dependent. It has been observed that certain therapeutic categories (e.g. mood disorder therapies) display a tendency to include multi-target agents (i.e. selective non-selectivity). Numerous QSAR models based on topological descriptors suggest that the topology of a given drug could be used to infer its therapeutic applications. Here, we have used descriptive statistics to explore the distribution of molecular topology descriptors and other promiscuity predictors across different therapeutic categories. Working with the publicly available ChEMBL database and 14 molecular descriptors, both hierarchical and non-hierarchical clustering methods were applied to the mean descriptor values of the therapeutic categories after refinement of the database (770 drugs grouped into 34 therapeutic categories). In addition, another publicly available database (repoDB) was used to retrieve clinically approved drug repositioning cases that could be classified into the therapeutic categories considered by the aforementioned clusters (111 cases), and the correspondence between the two studies was evaluated. Interestingly, a 3-cluster hierarchical clustering scheme based on only 14 molecular descriptors linked to promiscuity seems to explain up to 82.9% of the approved cases of drug repurposing retrieved from repoDB. Therapeutic categories seem to display distinctive molecular patterns, which could be used as a basis for drug screening and drug design campaigns, and to unveil drug repurposing opportunities between particular therapeutic categories.
Subject(s)
Drug Design; Drug Repositioning; Models, Chemical; Databases, Factual; Drug Discovery; Humans; Structure-Activity Relationship
ABSTRACT
Bisphosphonates such as alendronate are antiosteoporotic drugs that inhibit the activity of bone-resorbing osteoclasts and secondarily promote osteoblastic function. Diabetes increases bone-matrix-associated advanced glycation end products (AGEs) that impair bone marrow progenitor cell (BMPC) osteogenic potential and decrease bone quality. Here we investigated the in vitro effect of alendronate and/or AGEs on the osteoblastogenic, adipogenic, and chondrogenic potential of BMPC isolated from nondiabetic untreated rats. We also evaluated the in vivo effect of alendronate (administered orally to rats with insulin-deficient diabetes) on long-bone microarchitecture and BMPC multilineage potential. In vitro, the osteogenesis (Runx2, alkaline phosphatase, type 1 collagen, and mineralization) and chondrogenesis (glycosaminoglycan production) of BMPC were both decreased by AGEs, while coincubation with alendronate prevented these effects. The adipogenesis of BMPC (PPARγ, intracellular triglycerides, and lipase) was increased by AGEs, and this was prevented by coincubation with alendronate. In vivo, experimental diabetes (a) decreased femoral trabecular bone area, osteocyte density, and osteoclastic TRAP activity; (b) increased bone marrow adiposity; and (c) deregulated BMPC phenotypic potential (increasing adipogenesis and decreasing osteogenesis and chondrogenesis). Orally administered alendronate prevented all these diabetes-induced effects on bone. Thus, alendronate could improve bone alterations in diabetic rats by preventing the antiosteogenic, antichondrogenic, and proadipocytic effects of AGEs on BMPC.
Subject(s)
Alendronate/administration & dosage; Bone Marrow Cells/cytology; Cell Differentiation/drug effects; Diabetes Mellitus, Experimental/drug therapy; Adipogenesis/drug effects; Animals; Bone Marrow Cells/drug effects; Bone Marrow Cells/metabolism; Bone Regeneration/drug effects; Chondrogenesis/drug effects; Diabetes Mellitus, Experimental/pathology; Glycation End Products, Advanced/administration & dosage; Glycation End Products, Advanced/metabolism; Humans; Osteogenesis/drug effects; Rats
ABSTRACT
AIMS: Diabetes mellitus is associated with metabolic bone disease and increased low-impact fractures. The insulin-sensitizer metformin possesses in vitro, in vivo and ex vivo osteogenic effects, although this has not been adequately studied in the context of diabetes. We evaluated the effect of insulin-deficient diabetes and/or metformin on bone microarchitecture and on the osteogenic potential of bone marrow progenitor cells (BMPC), as well as the possible mechanisms involved. METHODS: Partially insulin-deficient diabetes was induced in rats by nicotinamide/streptozotocin injection, with or without oral metformin treatment. Femoral metaphysis micro-architecture, ex vivo osteogenic potential of BMPC, and BMPC expression of Runx-2, PPARγ and receptor for advanced glycation endproducts (RAGE) were investigated. RESULTS: Histomorphometric analysis of diabetic femoral metaphysis demonstrated a slight decrease in trabecular area and a significant reduction in osteocyte density, growth plate height and TRAP (tartrate-resistant acid phosphatase) activity in the primary spongiosa. BMPC obtained from diabetic animals showed a reduction in Runx-2/PPARγ ratio and in their osteogenic potential, and an increase in RAGE expression. Metformin treatment prevented the diabetes-induced alterations in bone micro-architecture and BMPC osteogenic potential. CONCLUSION: Partially insulin-deficient diabetes induces deleterious effects on long-bone micro-architecture that are associated with a decrease in BMPC osteogenic potential, which could be mediated by a decrease in their Runx-2/PPARγ ratio and up-regulation of RAGE. These diabetes-induced alterations can be totally or partially prevented by oral administration of metformin.