RESUMO
An unambiguous description of an experiment, and the subsequent biological observation, is vital for accurate data interpretation. Minimum information guidelines define the fundamental complement of data that can support an unambiguous conclusion based on experimental observations. We present the Minimum Information About Disorder Experiments (MIADE) guidelines to define the parameters required for the wider scientific community to understand the findings of an experiment studying the structural properties of intrinsically disordered regions (IDRs). MIADE guidelines provide recommendations for data producers to describe the results of their experiments at source, for curators to annotate experimental data to community resources and for database developers maintaining community resources to disseminate the data. The MIADE guidelines will improve the interpretability of experimental results for data consumers, facilitate direct data submission, simplify data curation, improve data exchange among repositories and standardize the dissemination of the key metadata on an IDR experiment by IDR data sources.
Assuntos
Proteínas Intrinsicamente Desordenadas , Proteínas Intrinsicamente Desordenadas/química , Conformação ProteicaRESUMO
DisProt (URL: https://disprot.org) is the gold standard database for intrinsically disordered proteins and regions, providing valuable information about their functions. The latest version of DisProt brings significant advancements, including a broader representation of functions and an enhanced curation process. These improvements aim to increase both the quality of annotations and their coverage at the sequence level. Higher coverage has been achieved by adopting additional evidence codes. Quality of annotations has been improved by systematically applying Minimum Information About Disorder Experiments (MIADE) principles and reporting all the details of the experimental setup that could potentially influence the structural state of a protein. The DisProt database now includes new thematic datasets and has expanded the adoption of Gene Ontology terms, resulting in an extensive functional repertoire which is automatically propagated to UniProtKB. Finally, we show that DisProt's curated annotations strongly correlate with disorder predictions inferred from AlphaFold2 pLDDT (predicted Local Distance Difference Test) confidence scores. This comparison highlights the utility of DisProt in explaining apparent uncertainty of certain well-defined predicted structures, which often correspond to folding-upon-binding fragments. Overall, DisProt serves as a comprehensive resource, combining experimental evidence of disorder information to enhance our understanding of intrinsically disordered proteins and their functional implications.
Assuntos
Bases de Dados de Proteínas , Proteínas Intrinsicamente Desordenadas , Ontologia Genética , Proteínas Intrinsicamente Desordenadas/química , Anotação de Sequência MolecularRESUMO
The Protein Ensemble Database (PED) (URL: https://proteinensemble.org) is the primary resource for depositing structural ensembles of intrinsically disordered proteins. This updated version of PED reflects advancements in the field, denoting a continual expansion with a total of 461 entries and 538 ensembles, including those generated without explicit experimental data through novel machine learning (ML) techniques. With this significant increment in the number of ensembles, a few yet-unprecedented new entries entered the database, including those also determined or refined by electron paramagnetic resonance or circular dichroism data. In addition, PED was enriched with several new features, including a novel deposition service, improved user interface, new database cross-referencing options and integration with the 3D-Beacons network-all representing efforts to improve the FAIRness of the database. Foreseeably, PED will keep growing in size and expanding with new types of ensembles generated by accurate and fast ML-based generative models and coarse-grained simulations. Therefore, among future efforts, priority will be given to further develop the database to be compatible with ensembles modeled at a coarse-grained level.
Assuntos
Bases de Dados de Proteínas , Proteínas Intrinsicamente Desordenadas , Proteínas Intrinsicamente Desordenadas/química , Simulação de Dinâmica Molecular , Ressonância Magnética Nuclear Biomolecular , Conformação ProteicaRESUMO
Residue interaction networks (RINs) are a valuable approach for representing contacts in protein structures. RINs have been widely used in various research areas, including the analysis of mutation effects, domain-domain communication, catalytic activity, and molecular dynamics simulations. The RING server is a powerful tool to calculate non-covalent molecular interactions based on geometrical parameters, providing high-quality and reliable results. Here, we introduce RING 4.0, which includes significant enhancements for identifying both covalent and non-covalent bonds in protein structures. It now encompasses seven different interaction types, with the addition of π-hydrogen, halogen bonds and metal ion coordination sites. The definitions of all available bond types have also been refined and RING can now process the complete PDB chemical component dictionary (over 35000 different molecules) which provides atom names and covalent connectivity information for all known ligands. Optimization of the software has improved execution time by an order of magnitude. The RING web server has been redesigned to provide a more engaging and interactive user experience, incorporating new visualization tools. Users can now visualize all types of interactions simultaneously in the structure viewer and network component. The web server, including extensive help and tutorials, is available from URL: https://ring.biocomputingup.it/.
Assuntos
Software , Proteínas/química , Proteínas/metabolismo , Internet , Ligantes , Conformação ProteicaRESUMO
Proteins form complex interactions in the cellular environment to carry out their functions. They exhibit a wide range of binding modes depending on the cellular conditions, which result in a variety of ordered or disordered assemblies. To help rationalise the binding behavior of proteins, the FuzPred server predicts their sequence-based binding modes without specifying their binding partners. The binding mode defines whether the bound state is formed through a disorder-to-order transition resulting in a well-defined conformation, or through a disorder-to-disorder transition where the binding partners remain conformationally heterogeneous. To account for the context-dependent nature of the binding modes, the FuzPred method also estimates the multiplicity of binding modes, the likelihood of sampling multiple binding modes. Protein regions with a high multiplicity of binding modes may serve as regulatory sites or hot-spots for structural transitions in the assembly. To facilitate the interpretation of the predictions, protein regions with different interaction behaviors can be visualised on protein structures generated by AlphaFold. The FuzPred web server (https://fuzpred.bio.unipd.it) thus offers insights into the structural and dynamical changes of proteins upon interactions and contributes to development of structure-function relationships under a variety of cellular conditions.
Assuntos
Computadores , Proteínas , Conformação Proteica , Proteínas/química , Domínios Proteicos , SoftwareRESUMO
Intrinsic disorder (ID) in proteins is well-established in structural biology, with increasing evidence for its involvement in essential biological processes. As measuring dynamic ID behavior experimentally on a large scale remains difficult, scores of published ID predictors have tried to fill this gap. Unfortunately, their heterogeneity makes it difficult to compare performance, confounding biologists wanting to make an informed choice. To address this issue, the Critical Assessment of protein Intrinsic Disorder (CAID) benchmarks predictors for ID and binding regions as a community blind-test in a standardized computing environment. Here we present the CAID Prediction Portal, a web server executing all CAID methods on user-defined sequences. The server generates standardized output and facilitates comparison between methods, producing a consensus prediction highlighting high-confidence ID regions. The website contains extensive documentation explaining the meaning of different CAID statistics and providing a brief description of all methods. Predictor output is visualized in an interactive feature viewer and made available for download in a single table, with the option to recover previous sessions via a private dashboard. The CAID Prediction Portal is a valuable resource for researchers interested in studying ID in proteins. The server is available at the URL: https://caid.idpcentral.org.
Assuntos
Biologia Molecular , Proteínas , Benchmarking , Consenso , Proteínas/química , Software , Proteínas Intrinsicamente DesordenadasRESUMO
The MobiDB database (URL: https://mobidb.org/) is a knowledge base of intrinsically disordered proteins. MobiDB aggregates disorder annotations derived from the literature and from experimental evidence along with predictions for all known protein sequences. MobiDB generates new knowledge and captures the functional significance of disordered regions by processing and combining complementary sources of information. Since its first release 10 years ago, the MobiDB database has evolved in order to improve the quality and coverage of protein disorder annotations and its accessibility. MobiDB has now reached its maturity in terms of data standardization and visualization. Here, we present a new release which focuses on the optimization of user experience and database content. The major advances compared to the previous version are the integration of AlphaFoldDB predictions and the re-implementation of the homology transfer pipeline, which expands manually curated annotations by two orders of magnitude. Finally, the entry page has been restyled in order to provide an overview of the available annotations along with two separate views that highlight structural disorder evidence and functions associated with different binding modes.
Assuntos
Proteínas Intrinsicamente Desordenadas , Proteínas Intrinsicamente Desordenadas/química , Bases de Dados de Proteínas , Anotação de Sequência Molecular , Sequência de Aminoácidos , Bases de Conhecimento , Conformação ProteicaRESUMO
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide a more user friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.
Assuntos
Bases de Dados de Proteínas , Humanos , Sequência de Aminoácidos , Inteligência Artificial , Internet , Proteínas/química , SoftwareRESUMO
Intrinsically disordered proteins, defying the traditional protein structure-function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 following filtering out of bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
Assuntos
Biologia Computacional , Proteínas Intrinsicamente Desordenadas/química , Sequência de Aminoácidos , Bases de Dados de Proteínas , Ligação Proteica , Conformação Proteica , Dobramento de Proteína , SoftwareRESUMO
RING-PyMOL is a plugin for PyMOL providing a set of analysis tools for structural ensembles and molecular dynamic simulations. RING-PyMOL combines residue interaction networks, as provided by the RING software, with structural clustering to enhance the analysis and visualization of the conformational complexity. It combines precise calculation of non-covalent interactions with the power of PyMOL to manipulate and visualize protein structures. The plugin identifies and highlights correlating contacts and interaction patterns that can explain structural allostery, active sites, and structural heterogeneity connected with molecular function. It is easy to use and extremely fast, processing and rendering hundreds of models and long trajectories in seconds. RING-PyMOL generates a number of interactive plots and output files for use with external tools. The underlying RING software has been improved extensively. It is 10 times faster, can process mmCIF files and it identifies typed interactions also for nucleic acids. AVAILABILITY AND IMPLEMENTATION: https://github.com/BioComputingUP/ring-pymol.
Assuntos
Simulação de Dinâmica Molecular , Software , Proteínas/química , Análise por Conglomerados , Domínio CatalíticoRESUMO
Many proteins perform their functions within membraneless organelles, where they form a liquid-like condensed state, also known as droplet state. The FuzDrop method predicts the probability of spontaneous liquid-liquid phase separation of proteins and provides a sequence-based score to identify the regions that promote this process. Furthermore, the FuzDrop method estimates the propensity of conversion of proteins to the amyloid state, and identifies aggregation hot-spots, which can drive the irreversible maturation of the liquid-like droplet state. These predictions can also identify mutations that can induce formation of amyloid aggregates, including those implicated in human diseases. To facilitate the interpretation of the predictions, the droplet-promoting and aggregation-promoting regions can be visualized on protein structures generated by AlphaFold. The FuzDrop server (https://fuzdrop.bio.unipd.it) thus offers insights into the complex behavior of proteins in their condensed states and facilitates the understanding of the functional relationships of proteins.
Assuntos
Amiloide , Conformação Proteica , Software , Humanos , Amiloide/genética , Amiloide/química , MutaçãoRESUMO
Fuzzy interactions are specific, variable contacts between proteins and other biomolecules (proteins, DNA, RNA, small molecules) formed in accord to the cellular context. Fuzzy interactions have recently been demonstrated to regulate biomolecular condensates generated by liquid-liquid phase separation. The FuzDB v4.0 database (https://fuzdb.org) assembles experimentally identified examples of fuzzy interactions, where disordered regions mediate functionally important, context-dependent contacts between the partners in stoichiometric and higher-order assemblies. The new version of FuzDB establishes cross-links with databases on structure (PDB, BMRB, PED), function (ELM, UniProt) and biomolecular condensates (PhaSepDB, PhaSePro, LLPSDB). FuzDB v4.0 is a source to decipher molecular basis of complex cellular interaction behaviors, including those in protein droplets.
Assuntos
DNA/metabolismo , Bases de Dados de Proteínas , Proteínas Intrinsicamente Desordenadas/metabolismo , RNA/metabolismo , Software , Sequência de Aminoácidos , Sítios de Ligação , DNA/química , DNA/genética , Humanos , Proteínas Intrinsicamente Desordenadas/química , Proteínas Intrinsicamente Desordenadas/genética , Modelos Moleculares , Anotação de Sequência Molecular , Transição de Fase , Ligação Proteica , Conformação Proteica em alfa-Hélice , Conformação Proteica em Folha beta , Domínios e Motivos de Interação entre Proteínas , RNA/química , RNA/genética , Relação Estrutura-AtividadeRESUMO
Residue interaction networks (RINs) are used to represent residue contacts in protein structures. Thanks to the advances in network theory, RINs have been proved effective as an alternative to coordinate data in the analysis of complex systems. The RING server calculates high quality and reliable non-covalent molecular interactions based on geometrical parameters. Here, we present the new RING 3.0 version extending the previous functionality in several ways. The underlying software library has been re-engineered to improve speed by an order of magnitude. RING now also supports the mmCIF format and provides typed interactions for the entire PDB chemical component dictionary, including nucleic acids. Moreover, RING now employs probabilistic graphs, where multiple conformations (e.g. NMR or molecular dynamics ensembles) are mapped as weighted edges, opening up new ways to analyze structural data. The web interface has been expanded to include a simultaneous view of the RIN alongside a structure viewer, with both synchronized and clickable. Contact evolution across models (or time) is displayed as a heatmap and can help in the discovery of correlating interaction patterns. The web server, together with an extensive help and tutorial, is available from URL: https://ring.biocomputingup.it/.
Assuntos
Proteínas , Software , Internet , Simulação de Dinâmica Molecular , Conformação Proteica , Proteínas/química , ProbabilidadeRESUMO
The Evidence and Conclusion Ontology (ECO) is a community resource that provides an ontology of terms used to capture the type of evidence that supports biomedical annotations and assertions. Consistent capture of evidence information with ECO allows tracking of annotation provenance, establishment of quality control measures, and evidence-based data mining. ECO is in use by dozens of data repositories and resources with both specific and general areas of focus. ECO is continually being expanded and enhanced in response to user requests as well as our aim to adhere to community best-practices for ontology development. The ECO support team engages in multiple collaborations with other ontologies and annotating groups. Here we report on recent updates to the ECO ontology itself as well as associated resources that are available through this project. ECO project products are freely available for download from the project website (https://evidenceontology.org/) and GitHub (https://github.com/evidenceontology/evidenceontology). ECO is released into the public domain under a CC0 1.0 Universal license.
Assuntos
Biologia Computacional/normas , Bases de Dados Genéticas , Ontologia Genética , Software , Humanos , Anotação de Sequência MolecularRESUMO
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Assuntos
Proteoma , Proteômica , Humanos , Padrões de Referência , Vocabulário Controlado , Espectrometria de Massas , Bases de Dados de ProteínasRESUMO
Structured tandem repeats proteins (STRPs) are a specific kind of tandem repeat proteins characterized by a modular and repetitive three-dimensional structure arrangement. The majority of STRPs adopt solenoid structures, but with the increasing availability of experimental structures and high-quality predicted structural models, more STRP folds can be characterized. Here, we describe "Box repeats", an overlooked STRP fold present in the DNA sliding clamp processivity factors, which has eluded classification although structural data has been available since the late 1990s. Each Box repeat is a ßâºßßß module of about 60 residues, which forms a class V "beads-on-a-string" type STRP. The number of repeats present in processivity factors is organism dependent. Monomers of PCNA proteins in both Archaea and Eukarya have 4 repeats, while the monomers of bacterial beta-sliding clamps have 6 repeats. This new repeat fold has been added to the RepeatsDB database, which now provides structural annotation for 66 Box repeat proteins belonging to different organisms, including viruses.
Assuntos
Proteínas , Sequências de Repetição em Tandem , Proteínas/química , Sequências de Repetição em Tandem/genética , DNA/genéticaRESUMO
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of its salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on Tandem Repeat Proteins (TRPs) by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Assuntos
Proteínas , Sequências de Repetição em Tandem , Proteínas/genética , Proteínas/química , Sequências de Repetição em Tandem/genética , Sequência de AminoácidosRESUMO
Protein intrinsic disorder (ID) is a complex and context-dependent phenomenon that covers a continuum between fully disordered states and folded states with long dynamic regions. The lack of a ground truth that fits all ID flavors and the potential for order-to-disorder transitions depending on specific conditions makes ID prediction challenging. The CAID2 challenge aimed to evaluate the performance of different prediction methods across different benchmarks, leveraging the annotation provided by the DisProt database, which stores the coordinates of ID regions when there is experimental evidence in the literature. The CAID2 challenge demonstrated varying performance of different prediction methods across different benchmarks, highlighting the need for continued development of more versatile and efficient prediction software. Depending on the application, researchers may need to balance performance with execution time when selecting a predictor. Methods based on AlphaFold2 seem to be good ID predictors but they are better at detecting absence of order rather than ID regions as defined in DisProt. The CAID2 predictors can be freely used through the CAID Prediction Portal, and CAID has been integrated into OpenEBench, which will become the official platform for running future CAID challenges.
Assuntos
Proteínas Intrinsicamente Desordenadas , Proteínas , Software , Bases de Dados de ProteínasRESUMO
SUMMARY: Mathematical models are effective in studying cancer development at different scales from metabolism to tissue. Phase Field Models (PFMs) have been shown to reproduce accurately cancer growth and other related phenomena, including expression of relevant molecules, extracellular matrix remodeling and angiogenesis. However, implementations of such models are rarely published, reducing access to these techniques. To reduce this gap, we developed Mocafe, a modular open-source Python package that implements some of the most important PFMs reported in the literature. Mocafe is designed to handle both PFMs purely based on differential equations and hybrid agent-based PFMs. Moreover, Mocafe is meant to be extensible, allowing the inclusion of new models in future releases. AVAILABILITY AND IMPLEMENTATION: Mocafe is a Python package based on FEniCS, a popular computing platform for solving partial differential equations. The source code, extensive documentation and demos are provided on GitHub at URL: https://github.com/BioComputingUP/mocafe. Moreover, we uploaded on Zenodo an archive of the package, which is available at https://doi.org/10.5281/zenodo.6366052.
Assuntos
Bibliotecas , Neoplasias , Humanos , Software , DocumentaçãoRESUMO
SUMMARY: Biological data is ever-increasing in amount and complexity. The mapping of this data to biological entities such as nucleotide and amino acid sequences supports biological data analysis, classification and prediction. Sequence alignments and comparison allow the transfer of knowledge to evolutionary-related entities, the mapping of functional domains, the identification of binding and modification sites. To support these types of studies, we developed ProSeqViewer, a tool to visualize annotation on single sequences and multiple sequence alignments. This state-of-the-art multifunctional library was developed as a modular component to be integrated into static or dynamic web resources and support intuitive visualization of sequence features. ProseSeqViewer is extremely lightweight, fast, interactive, dynamic, responsive and works at any screen size. It generates pure HTML which is compatible with any browser and operating system. ProSeqViewer can exchange events with other visualization components and is already used by multiple biological databases. AVAILABILITY AND IMPLEMENTATION: ProSeqViewer is an open-source TypeScript library compatible with state-of-the-art website environments. The source code and an extensive documentation including use cases are available from the URL: https://github.com/BioComputingUP/ProSeqViewer.