Results 1 - 20 of 28
1.
Brain ; 146(2): 519-533, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36256779

ABSTRACT

Neurodevelopmental disorders (NDDs), including severe paediatric epilepsy, autism and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in this or the coming years in clinical trials. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are 'variants of uncertain significance'. To safely enrol patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can 'tolerate' missense variants and which ones are 'essential' and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the 3D structure context of proteins can also provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold-predicted and experimentally solved protein structures, we generated the currently largest expert-curated protein structure set for 242 NDDs and identified 14 377 Essential3D sites across 189 gene-disorder-associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations. The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites and revealed key inter-molecule interactions in protein complexes that were otherwise not annotated.
Using the currently largest autism, developmental disorders, and epilepsies exome sequencing studies including >360 000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 NDDs and identified 14 377 Essential3D sites in these. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and encoded proteins.
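
The "8-fold enriched" figure above is a ratio of rates between patients and controls. The abstract does not say how the enrichment was computed, so the following is only a minimal sketch of one common reading (a ratio of carrier fractions); the function name `fold_enrichment` and the counts in the usage note are hypothetical and are not taken from the study.

```python
def fold_enrichment(case_hits, case_total, control_hits, control_total):
    """Ratio of the variant-carrying fraction in cases to that in controls.

    All four arguments are plain counts. This is an illustrative
    definition, not necessarily the statistic used in the paper.
    """
    if case_total <= 0 or control_total <= 0 or control_hits <= 0:
        raise ValueError("totals and control_hits must be positive")
    case_rate = case_hits / case_total
    control_rate = control_hits / control_total
    return case_rate / control_rate
```

With made-up counts of 80 carriers among 1000 cases and 10 carriers among 1000 controls, the function returns a value of about 8, which is the kind of statement the abstract summarizes.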


Subject(s)
Intellectual Disability , Neurodevelopmental Disorders , Humans , Child , Neurodevelopmental Disorders/genetics , Genetic Testing , Mutation/genetics , Intellectual Disability/genetics , Missense Mutation
2.
Nucleic Acids Res ; 50(W1): W593-W597, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35609995

ABSTRACT

Knowledge of protein-ligand binding sites (LBSs) enables research ranging from protein function annotation to structure-based drug design. To this end, we have previously developed a stand-alone tool, P2Rank, and the web server PrankWeb (https://prankweb.cz/) for fast and accurate LBS prediction. Here, we present significant enhancements to PrankWeb. First, a new, more accurate evolutionary conservation estimation pipeline based on the UniRef50 sequence database and the HMMER3 package is introduced. Second, PrankWeb now allows users to enter a UniProt ID to carry out LBS predictions in situations where no experimental structure is available by utilizing the AlphaFold model database. Additionally, a range of minor improvements has been implemented. These include the ability to deploy PrankWeb and P2Rank as Docker containers, support for the mmCIF file format, improved public REST API access, and the ability to batch download the LBS predictions for the whole PDB archive and parts of the AlphaFold database.


Subject(s)
Proteins , Software , Ligands , Proteins/chemistry , Binding Sites , Protein Binding , Protein Domains , Protein Databases , Internet
3.
Bioinformatics ; 38(24): 5452-5453, 2022 12 13.
Article in English | MEDLINE | ID: mdl-36282546

ABSTRACT

SUMMARY: Understanding the mechanism of action of a protein or designing better ligands for it often requires access to a bound (holo) and an unbound (apo) state of the protein. Resources for the quick and easy retrieval of such conformations are severely limited. Apo-Holo Juxtaposition (AHoJ) is a web application for retrieving apo-holo structure pairs for user-defined ligands. Given a query structure and one or more user-specified ligands, it retrieves all other structures of the same protein that feature the same binding site(s), aligns them, and examines the superimposed binding sites to determine whether each structure is apo or holo, in reference to the query. The resulting superimposed datasets of apo-holo pairs can be visualized and downloaded for further analysis. AHoJ accepts multiple input queries, allowing the creation of customized apo-holo datasets. AVAILABILITY AND IMPLEMENTATION: Freely available for non-commercial use at http://apoholo.cz. Source code available at https://github.com/cusbg/AHoJ-project. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
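
The apo/holo decision described above can be pictured with a small sketch. It is not AHoJ's implementation: it assumes the candidate structure has already been superimposed onto the query, represents atoms as bare (x, y, z) tuples, and uses a hypothetical distance threshold `radius` chosen only for illustration.

```python
from math import dist


def classify_candidate(query_ligand_atoms, candidate_ligand_atoms, radius=4.0):
    """Label a superimposed candidate structure as 'holo' or 'apo'.

    The candidate is 'holo' if any of its ligand atoms lies within
    `radius` of any atom of the query ligand, otherwise 'apo'.
    Coordinates are assumed to be in the same (superimposed) frame.
    """
    for q in query_ligand_atoms:
        for c in candidate_ligand_atoms:
            if dist(q, c) <= radius:
                return "holo"
    return "apo"
```

A candidate with no ligand atoms at all, or with ligands bound only far from the query site, comes out as apo under this simplified rule.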


Subject(s)
Proteins , Software , Protein Conformation , Ligands , Proteins/chemistry , Binding Sites
4.
Proc Natl Acad Sci U S A ; 117(45): 28201-28211, 2020 11 10.
Article in English | MEDLINE | ID: mdl-33106425

ABSTRACT

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.


Subject(s)
Missense Mutation/genetics , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , BRCA1 Protein/chemistry , BRCA1 Protein/genetics , Computational Biology/methods , Humans , Machine Learning , Molecular Models , Missense Mutation/physiology , PTEN Phosphohydrolase/chemistry , PTEN Phosphohydrolase/genetics , Protein Conformation , Proteins/physiology
5.
Brief Bioinform ; 21(4): 1249-1260, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31273380

ABSTRACT

The understanding of complex biological networks often relies on both a dedicated layout and a topology. Currently, there are three major competing layout-aware systems biology formats, but there are no software tools or software libraries supporting all of them. This complicates the management of molecular network layouts and hinders their reuse and extension. In this paper, we present a high-level overview of the layout formats in systems biology, focusing on their commonalities and differences, review their support in existing software tools, libraries and repositories and finally introduce a new conversion module within the MINERVA platform. The module is available via a REST API and offers, besides the ability to convert between layout-aware systems biology formats, the possibility to export layouts into several graphical formats. The module enables conversion of very large networks with thousands of elements, such as disease maps or metabolic reconstructions, rendering it widely applicable in systems biology.


Subject(s)
Systems Biology , Algorithms , Humans , Information Storage and Retrieval , Software
6.
Nucleic Acids Res ; 48(W1): W132-W139, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32402084

ABSTRACT

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.


Subject(s)
Missense Mutation , Protein Conformation , Software , Humans , Internet , Proteins/chemistry , Proteins/genetics
7.
Nucleic Acids Res ; 47(W1): W345-W349, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31114880

ABSTRACT

PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of local chemical neighborhood ligandability centered on points placed on a solvent-accessible protein surface. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. In addition, PrankWeb provides a web interface enabling users to easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use this in both the prediction and result visualization steps. Alongside its online visualization options, PrankWeb also offers the possibility of exporting the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. In high-throughput scenarios, therefore, users can utilize the server API directly, bypassing the need for a web-based frontend or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/, while the web application source code and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively.
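
The step in which high-scoring surface points are clustered into binding sites can be illustrated with a short sketch. This is not P2Rank's code: it swaps in a plain single-linkage grouping, and the names `cluster_points`, `threshold` and `cutoff` as well as their default values are assumptions made for the example rather than P2Rank parameters.

```python
from math import dist


def cluster_points(points, scores, threshold=0.5, cutoff=3.0):
    """Group high-scoring surface points into candidate binding sites.

    points: list of (x, y, z) tuples; scores: one score per point.
    threshold and cutoff are illustrative values, not P2Rank defaults.
    """
    # Keep only the points whose score reaches the threshold.
    kept = [p for p, s in zip(points, scores) if s >= threshold]
    clusters = []
    unvisited = set(range(len(kept)))
    while unvisited:
        # Grow one cluster by single linkage from an arbitrary seed.
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if dist(kept[i], kept[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                members.append(j)
                stack.append(j)
        clusters.append(sorted(kept[m] for m in members))
    # Largest clusters first; ties broken by coordinates for determinism.
    return sorted(clusters, key=lambda c: (-len(c), c))
```

Each returned cluster stands for one predicted site. A real predictor would also rank the sites by an aggregate score, which this sketch leaves out.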


Subject(s)
Machine Learning , Proteins/chemistry , Software , Amino Acid Sequence , Benchmarking , Binding Sites , Datasets as Topic , Humans , Internet , Ligands , Protein Binding , Alpha-Helical Protein Conformation , Beta-Strand Protein Conformation , Protein Interaction Domains and Motifs , Proteins/metabolism , Thermodynamics
8.
Bioinformatics ; 35(21): 4496-4498, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31074494

ABSTRACT

SUMMARY: The complexity of molecular networks makes them difficult to navigate and interpret, creating a need for specialized software. MINERVA is a web platform for visualization, exploration and management of molecular networks. Here, we introduce an extension to the MINERVA architecture that greatly facilitates the access and use of the stored molecular network data. It allows such data to be incorporated in analytical pipelines via a programmatic access interface, and the platform's visual exploration and analytics functionality to be extended via a plugin architecture. This is possible for any molecular network hosted by the MINERVA platform encoded in well-recognized systems biology formats. To showcase the possibilities of the plugin architecture, we have developed several plugins extending the MINERVA core functionalities. In the article, we demonstrate the plugins for interactive tree traversal of molecular networks, for enrichment analysis and for mapping and visualization of known disease variants or known adverse drug reactions to molecules in the network. AVAILABILITY AND IMPLEMENTATION: Plugins developed and maintained by the MINERVA team are available under the AGPL v3 license at https://git-r3lab.uni.lu/minerva/plugins/. The MINERVA API and plugin documentation is available at https://minerva-web.lcsb.uni.lu.


Subject(s)
Software , Systems Biology
9.
Bioinformatics ; 34(23): 4127-4128, 2018 12 01.
Article in English | MEDLINE | ID: mdl-29931246

ABSTRACT

Summary: MolArt fills the gap between sequence and structure visualization by providing a light-weight, interactive environment enabling exploration of sequence annotations in the context of available experimental or predicted protein structures. Given a UniProt ID, MolArt downloads and displays sequence annotations, sequence-structure mapping and relevant structures. The sequence and structure views are interlinked, so that sequence annotations can be overlaid in color on the mapped structures, thus providing an enhanced understanding and interpretation of the available molecular data. Availability and implementation: MolArt is released under the Apache 2 license and is available at https://github.com/davidhoksza/MolArt. The project web page https://davidhoksza.github.io/MolArt/ features examples and applications of the tool.


Subject(s)
Molecular Structure , Protein Conformation , Proteins , Software , Color , Computational Biology
11.
BMC Bioinformatics ; 18(1): 487, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-29141608

ABSTRACT

BACKGROUND: Visualization of RNA secondary structures is a complex task, and, especially in the case of large RNA structures where the expected layout is largely habitual, the existing visualization tools often fail to produce suitable visualizations. This led us to the idea of using existing layouts as templates for the visualization of new RNAs, similarly to how templates are used in homology-based structure prediction. RESULTS: This article introduces Traveler, a software tool enabling visualization of a target RNA secondary structure using an existing layout of a sufficiently similar RNA structure as a template. Traveler is based on an algorithm which converts the target and template structures into corresponding tree representations and utilizes tree edit distance coupled with layout modification operations to transform the template layout into the target one. Traveler thus accepts a pair of secondary structures and a template layout and outputs a layout for the target structure. CONCLUSIONS: Traveler is a command-line open source tool able to quickly generate layouts for even the largest RNA structures in the presence of a sufficiently similar layout. It is available at http://github.com/davidhoksza/traveler.
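
To make the "tree representation" step above concrete, here is a minimal sketch that turns a secondary structure into a nested tree. It assumes dot-bracket notation as input and a simple dict-based node type; both are choices made for this example, and Traveler's actual tree encoding and its tree edit distance step are not reproduced here.

```python
def dot_bracket_to_tree(structure):
    """Parse a dot-bracket string into a nested tree.

    Base pairs become inner nodes labeled [open_index, close_index];
    unpaired bases become leaves labeled with their index.
    Pseudoknots are not handled.
    """
    root = {"label": "root", "children": []}
    stack = [root]
    for i, ch in enumerate(structure):
        if ch == "(":
            # Open a new base-pair node under the current parent.
            node = {"label": [i, None], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at position {i}")
            # Close the most recently opened base pair.
            stack.pop()["label"][1] = i
        elif ch == ".":
            stack[-1]["children"].append({"label": i, "children": []})
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return root
```

Two such trees, one for the template and one for the target, are the kind of input a tree edit distance algorithm would then compare.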


Subject(s)
RNA/chemistry , Software , Algorithms , Nucleic Acid Conformation
12.
BMC Bioinformatics ; 18(Suppl 15): 492, 2017 Dec 06.
Article in English | MEDLINE | ID: mdl-29244012

ABSTRACT

BACKGROUND: Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. RESULTS: We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. CONCLUSION: In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
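
The knowledge-base lookup described above can be sketched as a frequency table keyed by the encoded neighborhood. The sketch is a simplification under stated assumptions: the bit strings are taken as given (the encoding itself is not shown), and the majority-vote rule and the `default` label for unseen neighborhoods are illustrative choices, not details confirmed by the abstract.

```python
from collections import defaultdict


def build_knowledge_base(examples):
    """Count how often each encoded neighborhood is interface or not.

    examples: iterable of (bit_string, is_interface) pairs.
    Returns a dict mapping bit_string -> [interface_count, other_count].
    """
    kb = defaultdict(lambda: [0, 0])
    for bits, is_interface in examples:
        kb[bits][0 if is_interface else 1] += 1
    return dict(kb)


def predict(kb, bits, default=False):
    """Label a residue as interface (True) by majority vote in the table."""
    if bits not in kb:
        # Neighborhood never seen in the knowledge base.
        return default
    interface, other = kb[bits]
    return interface > other
```

For example, a neighborhood seen twice as interface and once as non-interface would be labeled interface under this rule.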


Subject(s)
Amino Acids , Knowledge Bases , Protein Interaction Mapping/methods , Proteins , Software , Amino Acids/chemistry , Amino Acids/metabolism , Computational Biology , Protein Databases , Statistical Models , Proteins/chemistry , Proteins/metabolism
13.
BMC Bioinformatics ; 16: 253, 2015 Aug 12.
Article in English | MEDLINE | ID: mdl-26264783

ABSTRACT

BACKGROUND: Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their tertiary and quaternary structures. While structural superposition of short RNAs is achievable in a reasonable time, large structures represent a much bigger challenge. Therefore, we have developed a fast and accurate algorithm for RNA pairwise structure superposition called SETTER and implemented it in the SETTER web server. However, though biological relationships can be inferred by a pairwise structure alignment, key features preserved by evolution can be identified only from a multiple structure alignment. Thus, we extended the SETTER algorithm to the alignment of multiple RNA structures and developed the MultiSETTER algorithm. RESULTS: In this paper, we present the updated version of the SETTER web server that implements a user-friendly interface to the MultiSETTER algorithm. The server accepts RNA structures either as a list of PDB IDs or as user-defined PDB files. After the superposition is computed, structures are visualized in 3D and several reports and statistics are generated. CONCLUSION: To the best of our knowledge, the MultiSETTER web server is the first publicly available tool for multiple RNA structure alignment. The MultiSETTER server offers visual inspection of an alignment in 3D space, which may reveal structural and functional relationships not captured by other multiple alignment methods based either on sequence or on secondary structure motifs.


Subject(s)
Algorithms , Internet , Nucleic Acid Conformation , RNA/chemistry , RNA Sequence Analysis/methods , Software , Sequence Alignment/methods
14.
Nucleic Acids Res ; 40(Web Server issue): W42-8, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22693209

ABSTRACT

The recent discoveries of regulatory non-coding RNAs changed our view of RNA as a simple information transfer molecule. Understanding the architecture and function of active RNA molecules requires methods for comparing and analyzing their 3D structures. While structural alignment of short RNAs is achievable in a reasonable amount of time, large structures represent a much bigger challenge. Here, we present the SETTER web server for the RNA structure pairwise comparison utilizing the SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) algorithm. The SETTER method divides an RNA structure into a set of non-overlapping structural elements called generalized secondary structure units (GSSUs). The SETTER algorithm scales as O(n^2) with the size of a GSSU and as O(n) with the number of GSSUs in the structure. This scaling gives SETTER its high speed as the average size of the GSSU remains constant irrespective of the size of the structure. However, the favorable speed of the algorithm does not compromise its accuracy. The SETTER web server together with the stand-alone implementation of the SETTER algorithm are freely accessible at http://siret.cz/setter.
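
The scaling argument above can be written out as a short worked bound. This is a reading of the abstract rather than a formula taken from the paper: the symbols m (number of GSSUs), s (size of one GSSU) and c (an assumed constant bound on s) are introduced here for illustration, and combining the two stated factors into a single product is an assumption.

```latex
T(m, s) = O\!\left(m \cdot s^{2}\right),
\qquad
s \le c \;\Longrightarrow\; T(m, s) = O\!\left(m \cdot c^{2}\right) = O(m)
```

Under this reading, the quadratic term is absorbed into a constant because the GSSU size does not grow with the structure, which leaves a running time that is linear in the number of GSSUs.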


Subject(s)
RNA/chemistry , Software , Algorithms , Internet , Nucleic Acid Conformation
15.
J Mol Biol ; : 168545, 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38508305

ABSTRACT

A single protein structure is rarely sufficient to capture the conformational variability of a protein. Both bound and unbound (holo and apo) forms of a protein are essential for understanding its geometry and making meaningful comparisons. Nevertheless, docking or drug design studies often still consider only single protein structures in their holo form, which are for the most part rigid. With the recent explosion in the field of structural biology, large, curated datasets are urgently needed. Here, we use a previously developed application (AHoJ) to perform a comprehensive search for apo-holo pairs for 468,293 biologically relevant protein-ligand interactions across 27,983 proteins. In each search, the binding pocket is captured and mapped across existing structures within the same UniProt, and the mapped pockets are annotated as apo or holo, based on the presence or absence of ligands. We assemble the results into a database, AHoJ-DB (www.apoholo.cz/db), that captures the variability of proteins with identical sequences, thereby exposing the agents responsible for the observed differences in geometry. We report several metrics for each annotated pocket, and we also include binding pockets that form at the interface of multiple chains. Analysis of the database shows that about 24% of the binding sites occur at the interface of two or more chains and that less than 50% of the total binding sites processed have an apo form in the PDB. These results can be used to train and evaluate predictors, discover potentially druggable proteins, and reveal protein- and ligand-specific relationships that were previously obscured by intermittent or partial data. Availability: www.apoholo.cz/db.

16.
bioRxiv ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38260256

ABSTRACT

Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics technologies have enabled the detection and generation of variants at an unprecedented scale. However, efficient tools and resources are needed to link these two disparate data types - to "map" variants onto protein structures, to better understand how the variation causes disease and thereby design therapeutics. Here we present the Genomics 2 Proteins Portal (G2P; g2p.broadinstitute.org/): a human proteome-wide resource that maps 19,996,443 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the G2P portal generalizes the capability of linking genomics to proteins beyond databases by allowing users to interactively upload protein residue-wise annotations (variants, scores, etc.) as well as the protein structure to establish the connection. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure-function relationship between natural or synthetic variations and their molecular phenotype.

17.
Bioinformatics ; 28(14): 1858-64, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22611129

ABSTRACT

MOTIVATION: Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their 3D structures. Although a structural alignment of short RNAs is achievable in a reasonable amount of time, large structures represent a much bigger challenge. However, the growth of the number of large RNAs deposited in the PDB database calls for the development of fast and accurate methods for analyzing their structures, as well as for rapid similarity searches in databases. RESULTS: In this article, a novel algorithm for RNA structural comparison, SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm), is introduced. SETTER uses a pairwise comparison method based on 3D similarity of the so-called generalized secondary structure units. For each pair of structures, SETTER produces a distance score and an indication of its statistical significance. SETTER can be used both for the structural alignments of structures that are already known to be homologous, as well as for 3D structure similarity searches and functional annotation. The algorithm presented is both accurate and fast and does not impose limits on the size of aligned RNA structures. AVAILABILITY: The SETTER program, as well as all datasets, is freely available from http://siret.cz/hoksza/projects/setter/.


Subject(s)
Algorithms , Nucleic Acid Conformation , RNA/chemistry , RNA Sequence Analysis/methods , Software , Computational Biology/methods , Sequence Alignment/methods
18.
Front Bioinform ; 3: 1101505, 2023.
Article in English | MEDLINE | ID: mdl-37502697

ABSTRACT

Introduction: Investigation of molecular mechanisms of human disorders, especially rare diseases, requires exploration of various knowledge repositories for building precise hypotheses and complex data interpretation. Recently, an increasing number of resources offer diagrammatic representations of such mechanisms, including disease-dedicated schematics in pathway databases and disease maps. However, collection of knowledge across them is challenging, especially for research projects with limited manpower. Methods: In this article we present an automated workflow for construction of maps of molecular mechanisms for rare diseases. The workflow requires a standardized definition of a disease using Orphanet or HPO identifiers to collect relevant genes and variants, and to assemble a functional, visual repository of related mechanisms, including data overlays. The diagrams composing the final map are unified to a common systems biology format from CellDesigner SBML, GPML and SBML+layout+render. The constructed resource contains disease-relevant genes and variants as data overlays for immediate visual exploration, including an embedded genetic variant browser and protein structure viewer. Results: We demonstrate the functionality of our workflow on two examples of rare diseases: Kawasaki disease and retinitis pigmentosa. Two maps are constructed based on their corresponding identifiers. Moreover, for the retinitis pigmentosa use-case, we include a list of differentially expressed genes to demonstrate how to tailor the workflow using omics datasets. Discussion: In summary, our work allows for an ad-hoc construction of molecular diagrams combined from different sources, preserving their layout and graphical style, but integrating them into a single resource. This reduces the time-consuming task of prototyping a molecular disease map, enabling visual exploration, hypothesis building, data visualization and further refinement.
The code of the workflow is open and accessible at https://gitlab.lcsb.uni.lu/minerva/automap/.

19.
Proteome Sci ; 9 Suppl 1: S20, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-22166105

ABSTRACT

BACKGROUND: Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus has shifted from sequences to structures. Structure similarity poses a major challenge, since no standard definition of optimal structure similarity exists in the field. RESULTS: We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity in the process of feature extraction. SProt's features are based on the spherical spatial neighborhood of amino acids, where similarity can be well-defined. On top of the partial local similarities, a global measure assessing the similarity of a pair of protein structures is built. Finally, indexing is applied, making the search process an order of magnitude faster. CONCLUSIONS: The proposed method outperforms other methods in classification accuracy on SCOP superfamily and fold level, while it is at least comparable to the best existing solutions in terms of precision-recall or quality of alignment.

20.
IEEE/ACM Trans Comput Biol Bioinform ; 18(3): 1130-1141, 2021.
Article in English | MEDLINE | ID: mdl-31484128

ABSTRACT

Visualization of biological mechanisms by means of pathway graphs is necessary to better understand the often complex underlying system. Manual layout of such pathways or maps of knowledge is a difficult and time consuming process. Node duplication is a technique that makes layouts with improved readability possible by reducing edge crossings and shortening edge lengths in drawn diagrams. In this article, we propose an approach using Machine Learning (ML) to facilitate parts of this task by training a Support Vector Machine (SVM) with actions taken during manual biocuration. Our training input is a series of incremental snapshots of a diagram describing mechanisms of a disease, progressively curated by a human expert employing node duplication in the process. As a test of the trained SVM models, they are applied to a single large instance and 25 medium-sized instances of hand-curated biological pathways. Finally, in a user validation study, we compare the model predictions to the outcome of a node duplication questionnaire answered by users of biological pathways with varying experience. We successfully predicted nodes for duplication and emulated human choices, demonstrating that our approach can effectively learn human-like node duplication preferences to support curation of pathway diagrams in various contexts.


Subject(s)
Computational Biology/methods , Machine Learning , Biological Models , Data Display , Humans , Signal Transduction , Support Vector Machine