Results 1 - 20 of 58
1.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38648741

ABSTRACT

SUMMARY: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements. AVAILABILITY AND IMPLEMENTATION: The pipeline is implemented using Nextflow, Python3, and Bash. It is publicly available on github.com/Bio2Byte/simsapiper.


Subject(s)
Proteins , Sequence Alignment , Sequence Analysis, Protein , Software , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Computational Biology/methods , Databases, Protein
2.
Elife ; 12, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38363283

ABSTRACT

The RNA recognition motif (RRM) is the most common RNA-binding protein domain identified in nature. However, RRM-containing proteins are only prevalent in eukaryotic phyla, in which they play central regulatory roles. Here, we engineered an orthogonal post-transcriptional control system of gene expression in the bacterium Escherichia coli with the mammalian RNA-binding protein Musashi-1, which is a stem cell marker with a neurodevelopmental role that contains two canonical RRMs. In the circuit, Musashi-1 is regulated transcriptionally and works as an allosteric translation repressor thanks to a specific interaction with the N-terminal coding region of a messenger RNA and its structural plasticity to respond to fatty acids. We fully characterized the genetic system at the population and single-cell levels showing a significant fold change in reporter expression, and the underlying molecular mechanism by assessing the in vitro binding kinetics and in vivo functionality of a series of RNA mutants. The dynamic response of the system was well recapitulated by a bottom-up mathematical model. Moreover, we applied the post-transcriptional mechanism engineered with Musashi-1 to specifically regulate a gene within an operon, implement combinatorial regulation, and reduce protein expression noise. This work illustrates how RRM-based regulation can be adapted to simple organisms, thereby adding a new regulatory layer in prokaryotes for translation control.


Subject(s)
Nerve Tissue Proteins , RNA-Binding Proteins , Animals , Nerve Tissue Proteins/metabolism , RNA-Binding Proteins/metabolism , RNA/metabolism , RNA, Messenger/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Mammals/genetics
3.
Proteins ; 91(6): 771-780, 2023 06.
Article in English | MEDLINE | ID: mdl-36629258

ABSTRACT

Inactive rhodopsin can absorb photons, which induces different structural transitions that finally activate rhodopsin. We have examined the change in spatial configurations and physicochemical factors that result during the transition mechanism from the inactive to the active rhodopsin state via intermediates. During the activation process, many existing atomic contacts are disrupted, and new ones are formed. This is related to the movement of Helix 5, which tilts away from Helix 3 in the intermediate state in lumirhodopsin and moves closer to Helix 3 again in the active state. Similar patterns of changing atomic contacts are observed between Helices 3 and 5 of the adenosine and neurotensin receptors. In addition, residues 220-238 of rhodopsin, which are disordered in the inactive state, fold in the active state before binding to the Gα, where it catalyzes GDP/GTP exchange on the Gα subunit. Finally, molecular dynamics simulations in the membrane environment revealed that the arrestin binding region adopts a more flexible extended conformation upon phosphorylation, likely promoting arrestin binding and inactivation. In summary, our results provide additional structural understanding of specific rhodopsin activation which might be relevant to other Class A G protein-coupled receptor proteins.


Subject(s)
Receptors, G-Protein-Coupled , Rhodopsin , Animals , Cattle , Rhodopsin/chemistry , Rhodopsin/metabolism , Protein Conformation , Receptors, G-Protein-Coupled/chemistry , Molecular Dynamics Simulation , Arrestins/metabolism
4.
PLoS Comput Biol ; 19(1): e1010859, 2023 01.
Article in English | MEDLINE | ID: mdl-36689472

ABSTRACT

RNA recognition motifs (RRM) are the most prevalent class of RNA binding domains in eucaryotes. Their RNA binding preferences have been investigated for almost two decades, and even though some RRM domains are now very well described, their RNA recognition code has remained elusive. An increasing number of experimental structures of RRM-RNA complexes has become available in recent years. Here, we perform an in-depth computational analysis to derive an RNA recognition code for canonical RRMs. We present and validate a computational scoring method to estimate the binding between an RRM and a single stranded RNA, based on structural data from a carefully curated multiple sequence alignment, which can predict RRM binding RNA sequence motifs based on the RRM protein sequence. Given the importance and prevalence of RRMs in humans and other species, this tool could help design RNA binding motifs with uses in medical or synthetic biology applications, leading towards the de novo design of RRMs with specific RNA recognition.


Subject(s)
RNA Recognition Motif , RNA , Humans , RNA/chemistry , Amino Acid Sequence , Sequence Alignment , Nucleotide Motifs/genetics , Protein Binding , Binding Sites
5.
Front Mol Biosci ; 9: 959956, 2022.
Article in English | MEDLINE | ID: mdl-35992270

ABSTRACT

Traditionally, our understanding of how proteins operate and how evolution shapes them is based on two main data sources: the overall protein fold and the protein amino acid sequence. However, a significant part of the proteome shows highly dynamic and/or structurally ambiguous behavior, which cannot be correctly represented by the traditional fixed set of static coordinates. Representing such protein behaviors remains challenging and necessarily involves a complex interpretation of conformational states, including probabilistic descriptions. Relating protein dynamics and multiple conformations to their function as well as their physiological context (e.g., post-translational modifications and subcellular localization), therefore, remains elusive for much of the proteome, with studies to investigate the effect of protein dynamics relying heavily on computational models. We here investigate the possibility of delineating three classes of protein conformational behavior: order, disorder, and ambiguity. These definitions are explored based on three different datasets, using interpretable machine learning from a set of features, from AlphaFold2 to sequence-based predictions, to understand the overlap and differences between these datasets. This forms the basis for a discussion on the current limitations in describing the behavior of dynamic and ambiguous proteins.

6.
J Proteome Res ; 21(8): 1894-1915, 2022 08 05.
Article in English | MEDLINE | ID: mdl-35793420

ABSTRACT

Protein phosphorylation is the most common reversible post-translational modification of proteins and is key in the regulation of many cellular processes. Due to this importance, phosphorylation is extensively studied, resulting in the availability of a large amount of mass spectrometry-based phospho-proteomics data. Here, we leverage the information in these large-scale phospho-proteomics data sets, as contained in Scop3P, to analyze and characterize proteome-wide protein phosphorylation sites (P-sites). First, we set out to differentiate correctly observed P-sites from false-positive sites using five complementary site properties. We then describe the context of these P-sites in terms of the protein structure, solvent accessibility, structural transitions and disorder, and biophysical properties. We also investigate the relative prevalence of disease-linked mutations on and around P-sites. Moreover, we assess the structural dynamics of P-sites in their phosphorylated and unphosphorylated states. As a result, we show how large-scale reprocessing of available proteomics experiments can enable a more reliable view on proteome-wide P-sites. Furthermore, adding the structural context of proteins around P-sites helps uncover possible conformational switches upon phosphorylation. Moreover, by placing sites in different biophysical contexts, we show the differential preference in protein dynamics at phosphorylated sites when compared to the nonphosphorylated counterparts.


Subject(s)
Proteome , Proteomics , Humans , Mass Spectrometry , Phosphorylation , Protein Processing, Post-Translational , Proteome/metabolism , Proteomics/methods
7.
BMC Mol Cell Biol ; 22(1): 23, 2021 Apr 23.
Article in English | MEDLINE | ID: mdl-33892639

ABSTRACT

BACKGROUND: The SARS-CoV-2 virus, the causative agent of COVID-19, consists of an assembly of proteins that determine its infectious and immunological behavior, as well as its response to therapeutics. Major structural biology efforts on these proteins have already provided essential insights into the mode of action of the virus, as well as avenues for structure-based drug design. However, not all of the SARS-CoV-2 proteins, or regions thereof, have a well-defined three-dimensional structure, and as such might exhibit ambiguous, dynamic behaviour that is not evident from static structure representations, nor from molecular dynamics simulations using these structures. MAIN: We present a website ( https://bio2byte.be/sars2/ ) that provides protein sequence-based predictions of the backbone and side-chain dynamics and conformational propensities of these proteins, as well as derived early folding, disorder, β-sheet aggregation, protein-protein interaction and epitope propensities. These predictions attempt to capture the inherent biophysical propensities encoded in the sequence, rather than context-dependent behaviour such as the final folded state. In addition, we provide the biophysical variation that is observed in homologous proteins, which gives an indication of the limits of their functionally relevant biophysical behaviour. CONCLUSION: The https://bio2byte.be/sars2/ website provides a range of protein sequence-based predictions for 27 SARS-CoV-2 proteins, enabling researchers to form hypotheses about their possible functional modes of action.


Subject(s)
SARS-CoV-2/chemistry , Viral Proteins/chemistry , Databases, Protein , Humans , Internet Access , Sequence Alignment , Sequence Analysis, Protein , Software , Viral Proteins/metabolism
8.
Nucleic Acids Res ; 49(D1): D361-D367, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237329

ABSTRACT

The MobiDB database (URL: https://mobidb.org/) provides predictions and annotations for intrinsically disordered proteins. Here, we report recent developments implemented in MobiDB version 4, regarding the database format, with novel types of annotations and an improved update process. The new website includes a re-designed user interface, a more effective search engine and advanced API for programmatic access. The new database schema gives more flexibility for the users, as well as simplifying the maintenance and updates. In addition, the new entry page provides more visualisation tools including customizable feature viewer and graphs of the residue contact maps. MobiDB v4 annotates the binding modes of disordered proteins, whether they undergo disorder-to-order transitions or remain disordered in the bound state. In addition, disordered regions undergoing liquid-liquid phase separation or post-translational modifications are defined. The integrated information is presented in a simplified interface, which enables faster searches and allows large customized datasets to be downloaded in TSV, Fasta or JSON formats. An alternative advanced interface allows users to drill deeper into features of interest. A new statistics page provides information at database and proteome levels. The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.


Subject(s)
Databases, Protein , Intrinsically Disordered Proteins/chemistry , Algorithms , Internet , Molecular Sequence Annotation , Protein Processing, Post-Translational , Software
9.
Nucleic Acids Res ; 48(W1): W36-W40, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32459331

ABSTRACT

Nuclear magnetic resonance (NMR) spectroscopy data provides valuable information on the behaviour of proteins in solution. The primary data to determine when studying proteins are the per-atom NMR chemical shifts, which reflect the local environment of atoms and provide insights into amino acid residue dynamics and conformation. Within an amino acid residue, chemical shifts present multi-dimensional and complexly cross-correlated information, making them difficult to analyse. The ShiftCrypt method, based on neural network auto-encoder architecture, compresses the per-amino acid chemical shift information in a single, interpretable, amino acid-type independent value that reflects the biophysical state of a residue. We here present the ShiftCrypt web server, which makes the method readily available. The server accepts chemical shifts input files in the NMR Exchange Format (NEF) or NMR-STAR format, executes ShiftCrypt and visualises the results, which are also accessible via an API. It also enables the "biophysically-based" pairwise alignment of two proteins based on their ShiftCrypt values. This approach uses Dynamic Time Warping and can optionally include their amino acid code information, and has applications in, for example, the alignment of disordered regions. The server uses a token-based system to ensure the anonymity of the users and results. The web server is available at www.bio2byte.be/shiftcrypt.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Software , Amino Acids/chemistry , Neural Networks, Computer , Protein Denaturation , Protein Folding , Protein Unfolding
10.
F1000Res ; 8, 2019.
Article in English | MEDLINE | ID: mdl-31824649

ABSTRACT

Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.


Subject(s)
Intrinsically Disordered Proteins/metabolism
11.
Sci Rep ; 9(1): 16932, 2019 11 15.
Article in English | MEDLINE | ID: mdl-31729443

ABSTRACT

Machine learning (ML) is ubiquitous in bioinformatics, due to its versatility. One of the most crucial aspects to consider while training an ML model is to carefully select the optimal feature encoding for the problem at hand. Biophysical propensity scales are widely adopted in structural bioinformatics because they describe amino acid properties that are intuitively relevant for many structural and functional aspects of proteins, and are thus commonly used as input features for ML methods. In this paper we reproduce three classical structural bioinformatics prediction tasks to investigate the main assumptions about the use of propensity scales as input features for ML methods. We investigate their usefulness with different randomization experiments and we show that their effectiveness varies among the ML methods used and the tasks. We show that while linear methods are more dependent on the feature encoding, the specific biophysical meaning of the features is less relevant for non-linear methods. Moreover, we show that even among linear ML methods, the simpler one-hot encoding can surprisingly outperform the "biologically meaningful" scales. We also show that feature selection performed with non-linear ML methods may not be able to distinguish between randomized and "real" propensity scales by properly prioritizing the latter. Finally, we show that learning problem-specific embeddings could be a simple, assumptions-free and optimal way to perform feature learning/engineering for structural bioinformatics tasks.


Subject(s)
Computational Biology/methods , Machine Learning , Sequence Analysis, Protein/methods , Amino Acids/chemistry , Biophysical Phenomena , Cysteine , Oxidation-Reduction , Propensity Score , Proteins/chemistry , Solvents/chemistry
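
Illustrative note on record 11: the abstract contrasts one-hot encoding, "real" propensity scales, and randomized scales. The sketch below shows what these three encodings can look like for a short sequence. It is not the authors' code. The choice of the Kyte-Doolittle hydropathy scale, the function names, and the shuffling scheme are assumptions made only for illustration; the abstract does not say which scales or randomization procedure were used.

```python
import random

# The 20 standard amino acids, in a fixed order used for one-hot columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as one example of a
# biophysical propensity scale (an assumption, not taken from the paper).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]


def scale_encode(sequence, scale):
    """Encode each residue as a single value looked up in a propensity scale."""
    return [[scale[aa]] for aa in sequence]


def randomized_scale(scale, seed=0):
    """Shuffle the scale values among amino acids (hypothetical randomization)."""
    rng = random.Random(seed)
    values = list(scale.values())
    rng.shuffle(values)
    return dict(zip(scale.keys(), values))


if __name__ == "__main__":
    seq = "MKVL"
    print(one_hot(seq)[0])
    print(scale_encode(seq, KYTE_DOOLITTLE))
    print(scale_encode(seq, randomized_scale(KYTE_DOOLITTLE)))
```

A randomized scale keeps the same value distribution but breaks the link between each amino acid and its biophysical meaning, which is the kind of control the abstract describes.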
12.
Sci Rep ; 9(1): 12140, 2019 Aug 15.
Article in English | MEDLINE | ID: mdl-31413290

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

13.
Bioinformatics ; 35(22): 4617-4623, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30994888

ABSTRACT

MOTIVATION: Eukaryotic cells contain different membrane-delimited compartments, which are crucial for the biochemical reactions necessary to sustain cell life. Recent studies showed that cells can also trigger the formation of membraneless organelles composed of phase-separated proteins to respond to various stimuli. These condensates provide new ways to control the reactions and phase-separation proteins (PSPs) are thus revolutionizing how cellular organization is conceived. The small number of experimentally validated proteins, and the difficulty in discovering them, remain bottlenecks in PSP research. RESULTS: Here we present PSPer, the first in-silico screening tool for prion-like RNA-binding PSPs. We show that it can prioritize PSPs among proteins containing similar RNA-binding domains, intrinsically disordered regions and prions. PSPer is thus suitable to screen proteomes, identifying the most likely PSPs for further experimental investigation. Moreover, its predictions are fully interpretable in the sense that it assigns specific functional regions to the predicted proteins, providing valuable information for experimental investigation of targeted mutations on these regions. Finally, we show that it can estimate the ability of artificially designed proteins to form condensates (r=-0.87), thus providing an in-silico screening tool for protein design experiments. AVAILABILITY AND IMPLEMENTATION: PSPer is available at bio2byte.com/psp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA-Binding Proteins/metabolism , Organelles , Prions , Proteome
14.
Sci Rep ; 8(1): 16980, 2018 11 19.
Article in English | MEDLINE | ID: mdl-30451933

ABSTRACT

Next generation sequencing technologies are providing increasing amounts of sequencing data, paving the way for improvements in clinical genetics and precision medicine. The interpretation of the observed genomic variants in the light of their phenotypic effects is thus emerging as a crucial task to solve in order to advance our understanding of how exomic variants affect proteins and how the proteins' functional changes affect human health. Since the experimental evaluation of the effects of every observed variant is unfeasible, Bioinformatics methods are being developed to address this challenge in-silico, by predicting the impact of millions of variants, thus providing insight into the deleteriousness landscape of entire proteomes. Here we show the feasibility of this approach by using the recently developed DEOGEN2 variant-effect predictor to perform the largest in-silico mutagenesis scan to date. We computed the deleteriousness score of 170 million variants over 15000 human proteins and we analysed the results, investigating how the predicted deleteriousness landscape of the proteins relates to known functionally and structurally relevant protein regions and biophysical properties. Moreover, we qualitatively validated our results by comparing them with two mutagenesis studies targeting two specific proteins, showing the consistency of DEOGEN2 predictions with respect to experimental data.


Subject(s)
Mutagenesis , Proteome , Computational Biology , Computer Simulation , Humans
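
Illustrative note on record 14: an in-silico mutagenesis scan starts by enumerating candidate variants for each protein before scoring them. The sketch below lists every single amino-acid substitution of a sequence, which gives 19 variants per position. It is only an assumption that the scan in the abstract is restricted to single substitutions, and no DEOGEN2 call is shown because its interface is not described in the abstract; the function name and the toy sequence are hypothetical.

```python
# The 20 standard amino acids considered as possible replacements.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(sequence):
    """Yield every single amino-acid substitution as 'wild-type, position, mutant'.

    Positions are 1-based, following the usual variant notation (e.g. M1A).
    """
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{position}{mutant}"


if __name__ == "__main__":
    toy_sequence = "MKT"  # hypothetical three-residue example
    variants = list(single_substitutions(toy_sequence))
    print(len(variants), variants[:3])
```

Each generated variant would then be passed to a variant-effect predictor to build the per-protein deleteriousness landscape the abstract refers to.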
15.
Bioinformatics ; 34(18): 3118-3125, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29684140

ABSTRACT

Motivation: Evolutionary information is crucial for the annotation of proteins in bioinformatics. The amount of retrieved homologs often correlates with the quality of predicted protein annotations related to structure or function. With a growing amount of sequences available, fast and reliable methods for homology detection are essential, as they have a direct impact on predicted protein annotations. Results: We developed a discriminative, alignment-free algorithm for homology detection with quasi-linear complexity, enabling theoretically much faster homology searches. To reach this goal, we convert the protein sequence into numeric biophysical representations. These are shrunk to a fixed length using a novel vector quantization method which uses a Discrete Cosine Transform compression. We then compute, for each compressed representation, similarity scores between proteins with the Dynamic Time Warping algorithm and we feed them into a Random Forest. The performance of WARP is comparable with state-of-the-art methods. Availability and implementation: The method is available at http://ibsquare.be/warp. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Proteins/chemistry , Algorithms , Amino Acid Sequence , Data Compression , Molecular Sequence Annotation , Software , Time Factors
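
Illustrative note on record 15: the abstract names two standard building blocks, Discrete Cosine Transform compression to a fixed length and Dynamic Time Warping for similarity. The sketch below shows textbook versions of both on plain numeric profiles. It is not the WARP implementation: the scaling of the DCT, the absolute-difference cost in the warping step, the function names, and the toy profiles are assumptions, and the final Random Forest step is omitted.

```python
import math


def dct_compress(signal, n_coeffs):
    """Return the first n_coeffs DCT-II coefficients of a 1-D signal.

    Coefficients are divided by the signal length so that profiles of
    different lengths are on a comparable scale (coefficient 0 is the mean).
    """
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal)) / n
        for k in range(n_coeffs)
    ]


def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance with absolute-difference cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


if __name__ == "__main__":
    # Hypothetical per-residue biophysical profiles of two proteins.
    profile_a = [0.1, 0.4, 0.9, 0.7, 0.3, 0.2]
    profile_b = [0.2, 0.5, 0.8, 0.8, 0.6, 0.3, 0.1, 0.1]
    fixed_a = dct_compress(profile_a, 4)
    fixed_b = dct_compress(profile_b, 4)
    print(dtw_distance(fixed_a, fixed_b))
```

Compressing first means the warping step always runs on short, fixed-length vectors, which is one way to read the abstract's claim of quasi-linear complexity.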
16.
Nucleic Acids Res ; 46(D1): D471-D476, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29136219

ABSTRACT

The MobiDB (URL: mobidb.bio.unipd.it) database of protein disorder and mobility annotations has been significantly updated and upgraded since its last major renewal in 2014. Several curated datasets for intrinsic disorder and folding upon binding have been integrated from specialized databases. The indirect evidence has also been expanded to better capture information available in the PDB, such as high temperature residues in X-ray structures and overall conformational diversity. Novel nuclear magnetic resonance chemical shift data provides an additional experimental information layer on conformational dynamics. Predictions have been expanded to provide new types of annotation on backbone rigidity, secondary structure preference and disordered binding regions. MobiDB 3.0 contains information for the complete UniProt protein set and synchronization has been improved by covering all UniParc sequences. An advanced search function allows the creation of a wide array of custom-made datasets for download and further analysis. A large amount of information and cross-links to more specialized databases are intended to make MobiDB the central resource for the scientific community working on protein intrinsic disorder and mobility.


Subject(s)
Databases, Protein , Intrinsically Disordered Proteins/chemistry , Molecular Sequence Annotation , Software , Amino Acid Sequence , Binding Sites , Datasets as Topic , Gene Ontology , Humans , Internet , Intrinsically Disordered Proteins/genetics , Intrinsically Disordered Proteins/metabolism , Models, Molecular , Protein Binding , Protein Folding , Protein Interaction Domains and Motifs , Sequence Alignment
17.
Nucleic Acids Res ; 46(D1): D387-D392, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29040693

ABSTRACT

Soluble functional proteins may transform into insoluble amyloid fibrils that deposit in a variety of tissues. Amyloid formation is a hallmark of age-related degenerative disorders. Perhaps surprisingly, amyloid fibrils can also be beneficial and are frequently exploited for diverse functional roles in organisms. Here we introduce AmyPro, an open-access database providing a comprehensive, carefully curated collection of validated amyloid fibril-forming proteins from all kingdoms of life classified into broad functional categories (http://amypro.net). In particular, AmyPro provides the boundaries of experimentally validated amyloidogenic sequence regions, short descriptions of the functional relevance of the proteins and their amyloid state, a list of the experimental techniques applied to study the amyloid state, important structural/functional/variation/mutation data transferred from UniProt, a list of relevant PDB structures categorized according to protein states, database cross-references and literature references. AmyPro greatly improves on similar currently available resources by incorporating both prions and functional amyloids in addition to pathogenic amyloids, and allows users to screen their sequences against the entire collection of validated amyloidogenic sequence fragments. By enabling further elucidation of the sequential determinants of amyloid fibril formation, we hope AmyPro will enhance the development of new methods for the precise prediction of amyloidogenic regions within proteins.


Subject(s)
Amyloidogenic Proteins/chemistry , Databases, Protein , User-Computer Interface
18.
Bioinformatics ; 34(2): 294-296, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-29028877

ABSTRACT

MOTIVATION: Protein function is directly related to amino acid residue composition and the dynamics of these residues. Centrality analyses based on residue interaction networks make it possible to identify key residues in a protein that are important for its fold or function. Such central residues and their environment constitute suitable targets for mutagenesis experiments. Predicted flexibility and changes in flexibility upon mutation provide valuable additional information for the design of such experiments. RESULTS: We combined centrality analyses with DynaMine flexibility predictions in a Cytoscape app called RINspector. The app performs centrality analyses and directly visualizes the results on a graph of predicted residue flexibility. In addition, the effect of mutations on local flexibility can be calculated. AVAILABILITY AND IMPLEMENTATION: The app is publicly available in the Cytoscape app store. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
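
Illustrative note on record 18: a centrality analysis treats residues as nodes and contacts as edges, then ranks residues by how central they are in that graph. The sketch below computes closeness centrality with a breadth-first search. Closeness is used only as a simple stand-in, since the abstract does not state which centrality measures RINspector implements, and the residue labels and contacts are hypothetical.

```python
from collections import deque


def closeness_centrality(edges):
    """Compute closeness centrality for each node of an undirected graph.

    The score is (number of reachable nodes) / (sum of shortest-path lengths),
    so residues that are few steps away from all others score highest.
    """
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total = sum(distance.values())
        scores[source] = (len(distance) - 1) / total if total else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical residue contacts: L25 touches three other residues.
    contacts = [("A10", "L25"), ("L25", "F40"), ("L25", "W60")]
    scores = closeness_centrality(contacts)
    print(max(scores, key=scores.get), scores)
```

Residues with the highest scores would be the candidates for mutagenesis that the abstract describes as central.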

19.
Sci Rep ; 7(1): 8826, 2017 08 18.
Article in English | MEDLINE | ID: mdl-28821744

ABSTRACT

Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.


Subject(s)
Amino Acid Sequence , Protein Folding , Proteins/chemistry , Intrinsically Disordered Proteins/chemistry , Magnetic Resonance Spectroscopy , Mass Spectrometry , Mechanical Phenomena , Models, Molecular , Protein Conformation , ROC Curve , Reproducibility of Results
20.
Bioinformatics ; 33(24): 3902-3908, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28666322

ABSTRACT

MOTIVATION: Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years many different algorithms have been developed and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may have different importance in diverse protein classes and it would be an advantage to have more customizable approaches, capable of dealing with different alignment definitions. RESULTS: Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences. AVAILABILITY AND IMPLEMENTATION: A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. CONTACT: wim.vranken@vub.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Alignment/methods , Sequence Analysis, Protein/methods , Support Vector Machine , Algorithms , Markov Chains , Protein Structure, Secondary , Proteins/chemistry , Software