ABSTRACT
ChannelsDB 2.0 is an updated database providing structural information about the position, geometry and physicochemical properties of protein channels (tunnels and pores) within deposited biomacromolecular structures from the PDB and AlphaFoldDB databases. The newly deposited information originates from several sources. First, we included data calculated using the popular CAVER tool to complement the data obtained with the original MOLE tool for the detection and analysis of protein tunnels and pores. Second, we added tunnels starting from cofactors within the AlphaFill database to extend the scope of the database to protein models based on UniProt. This has enlarged the available channel annotations ~4.6 times as of 1 September 2023. The database stores information about geometrical features, e.g. length and radius, and physicochemical properties based on channel-lining amino acids. The stored data are interlinked with the available UniProt mutation annotation data. ChannelsDB 2.0 provides an excellent resource for in-depth analysis of the role of biomacromolecular tunnels and pores. The database is available free of charge: https://channelsdb2.biodata.ceitec.cz.
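To make "length and radius" concrete: a minimal sketch, assuming a channel is represented as a centerline sampled at points, each carrying a free radius. The `ProfilePoint` layout and both helper functions are hypothetical illustrations, not the ChannelsDB schema or the MOLE/CAVER output format. Length is taken as the summed distance along the centerline, and the bottleneck as the smallest radius.

```python
import math
from dataclasses import dataclass


@dataclass
class ProfilePoint:
    """One sample along a channel centerline (hypothetical layout)."""
    x: float
    y: float
    z: float
    radius: float  # free radius at this point, in ångströms


def channel_length(profile: list[ProfilePoint]) -> float:
    """Length as the sum of distances between consecutive centerline points."""
    return sum(
        math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        for a, b in zip(profile, profile[1:])
    )


def bottleneck_radius(profile: list[ProfilePoint]) -> float:
    """The narrowest point of the channel: the minimum radius along the profile."""
    return min(p.radius for p in profile)


# Made-up three-point profile, for illustration only.
profile = [
    ProfilePoint(0.0, 0.0, 0.0, 2.5),
    ProfilePoint(3.0, 4.0, 0.0, 1.4),
    ProfilePoint(3.0, 4.0, 12.0, 1.9),
]
print(channel_length(profile))     # 17.0
print(bottleneck_radius(profile))  # 1.4
```

A real profile would have many more samples, but the two quantities are computed the same way.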
Subject(s)
Databases, Protein , Proteins , Software , Amino Acids , Proteins/chemistry , Protein Conformation
ABSTRACT
AlphaFind is a web-based search engine that provides fast structure-based retrieval across the entire set of AlphaFold DB structures. Unlike other protein processing tools, AlphaFind focuses entirely on tertiary structure: it automatically extracts the main 3D features of each protein chain and uses a machine learning model to find the most similar structures. Both this indexing approach and the 3D feature extraction method used by AlphaFind have demonstrated remarkable scalability, to large datasets as well as to large protein structures. The web application itself has been designed with a focus on clarity and ease of use. The search engine accepts any valid UniProt ID, Protein Data Bank ID or gene symbol as input, and returns a set of similar protein chains from AlphaFold DB, together with various similarity metrics between the query and each retrieved result. In addition to the main search functionality, the application provides 3D visualizations of protein structure superpositions, allowing researchers to instantly analyze the structural similarity of the retrieved results. The AlphaFind web application is available online for free and without any registration at https://alphafind.fi.muni.cz.
Subject(s)
Databases, Protein , Proteome , Software , Proteome/chemistry , Proteome/genetics , Internet , Search Engine , Machine Learning , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Protein Folding , Models, Molecular , Structural Homology, Protein
ABSTRACT
The AlphaFold2 prediction algorithm opened up the possibility of exploring proteins' structural space at an unprecedented scale. Currently, >200 million protein structures predicted by this approach are deposited in AlphaFoldDB, covering the entire proteomes of multiple organisms, including humans. However, predicted structures are stored without detailed functional annotations describing their chemical behaviour. Partial atomic charges, which map the electron distribution over a molecule and provide a clue to its chemical reactivity, are an important example of such data. We introduce the web application αCharges: a tool for the quick calculation of partial atomic charges for protein structures from AlphaFoldDB. The charges are calculated by the recent empirical method SQE+qp, parameterised for this class of molecules using robust quantum mechanical charges (B3LYP/6-31G*/NPA) computed on PROPKA3-protonated structures. The computed partial atomic charges can be downloaded in common data formats or visualised in the powerful Mol* viewer. The αCharges application is freely available at https://alphacharges.ncbr.muni.cz with no login requirement.
Subject(s)
Computational Biology , Proteins , Software , Humans , Algorithms , Proteome , Protein Conformation , Proteins/chemistry , Computational Biology/instrumentation , Computational Biology/methods
ABSTRACT
Segmentation helps interpret imaging data in a biological context. With the development of powerful tools for automated segmentation, public repositories for imaging data have added support for sharing and visualizing segmentations, creating the need for interactive web-based visualization of 3D volume segmentations. To address the ongoing challenge of integrating and visualizing multimodal data, we developed Mol* Volumes and Segmentations (Mol*VS), which enables the interactive, web-based visualization of cellular imaging data supported by macromolecular data and biological annotations. Mol*VS is fully integrated into Mol* Viewer, which is already used for visualization by several public repositories. All EMDB and EMPIAR entries with segmentation datasets are accessible via Mol*VS, which supports the visualization of data from a wide range of electron and light microscopy experiments. Additionally, users can run a local instance of Mol*VS to visualize and share custom datasets in generic or application-specific formats including volumes in .ccp4, .mrc, and .map, and segmentations in EMDB-SFF .hff, Amira .am, iMod .mod, and Segger .seg. Mol*VS is open source and freely available at https://molstarvolseg.ncbr.muni.cz/.
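Of the volume formats listed, .mrc and .map files (like .ccp4 maps) begin with a fixed 1,024-byte header whose first four 32-bit integers give the grid dimensions and the voxel data mode. A minimal sketch, assuming the MRC2014 layout and little-endian byte order, that reads those fields; `read_map_header` is a hypothetical helper and says nothing about how Mol*VS itself parses files.

```python
import struct

HEADER_SIZE = 1024  # fixed MRC/CCP4 header length in bytes


def read_map_header(data: bytes) -> dict:
    """Read grid size and data mode from an MRC2014-style header.

    Assumes little-endian byte order; a robust reader would check the
    machine stamp at bytes 212-215 and handle big-endian files too.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("data is shorter than the 1024-byte header")
    nx, ny, nz, mode = struct.unpack_from("<4i", data, 0)
    return {
        "nx": nx,      # columns
        "ny": ny,      # rows
        "nz": nz,      # sections
        "mode": mode,  # e.g. 2 = 32-bit float voxels
        "has_map_tag": data[208:212] == b"MAP ",
    }


# Synthetic header for a 64 x 64 x 32 grid of 32-bit floats, built in memory.
header = bytearray(HEADER_SIZE)
struct.pack_into("<4i", header, 0, 64, 64, 32, 2)
header[208:212] = b"MAP "
print(read_map_header(bytes(header)))
```

The voxel data follow the header (plus any extended header), so the grid dimensions and mode are enough to know how many bytes to read next.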
Subject(s)
Image Processing, Computer-Assisted , Microscopy , Software , Macromolecular Substances , Internet
ABSTRACT
SUMMARY: Every protein family has a set of characteristic secondary structures. However, due to individual variations, a single structure is not enough to represent the whole family. OverProt can create a secondary structure consensus, showing the general fold of the family as well as its variation. Our server provides precomputed results for all CATH superfamilies and user-defined computations, visualized by an interactive viewer, which shows the secondary structure element type, length, frequency of occurrence, spatial variability and β-connectivity. AVAILABILITY AND IMPLEMENTATION: OverProt Server is freely available at https://overprot.ncbr.muni.cz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Proteins , Software , Consensus , Proteins/chemistry , Protein Structure, Secondary , Computers
ABSTRACT
Large biomolecular structures are being determined experimentally on a daily basis using established techniques such as crystallography and electron microscopy. In addition, emerging integrative or hybrid methods (I/HM) are producing structural models of huge macromolecular machines and assemblies, sometimes containing hundreds of millions of non-hydrogen atoms. The performance requirements for visualization and analysis tools delivering these data are increasing rapidly. Significant progress in developing online, web-native three-dimensional (3D) visualization tools was previously accomplished with the introduction of the LiteMol suite and the NGL Viewer. Thereafter, Mol* development was jointly initiated by PDBe and RCSB PDB to combine and build on the strengths of LiteMol (developed by PDBe) and NGL (developed by RCSB PDB). The web-native Mol* Viewer enables 3D visualization and streaming of macromolecular coordinate and experimental data, together with capabilities for displaying structure quality, functional, or biological context annotations. High-performance graphics and data management allow users to simultaneously visualise up to hundreds of (superimposed) protein structures, stream molecular dynamics simulation trajectories, render cell-level models, or display huge I/HM structures. It is the primary 3D structure viewer used by PDBe and RCSB PDB, and it can be easily integrated into third-party services. Mol* Viewer is open source and freely available at https://molstar.org/.
Subject(s)
Macromolecular Substances/chemistry , Models, Molecular , Software , Internet , Protein Conformation
ABSTRACT
CATH (https://www.cathdb.info) identifies domains in protein structures from the wwPDB and classifies them into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases the coverage of structural and sequence data, with the addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5,481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
ABSTRACT
SUMMARY: Secondary structures provide deep insight into protein architecture. They can be used to compare individual members of a protein family. The most straightforward way to deal with protein secondary structure is to visualize it using 2D diagrams. Several software tools for generating 2D diagrams have been developed. Unfortunately, they create 2D diagrams based on only a single protein. Therefore, 2D diagrams of two proteins from the same family differ markedly. For this reason, we developed the 2DProts database, which contains secondary structure 2D diagrams for all domains from CATH and all proteins from the PDB. These 2D diagrams are generated based on a whole protein family, and they also consider information about the 3D arrangement of secondary structure elements. Moreover, the 2DProts database contains multiple 2D diagrams, which provide an overview of the secondary structures of a whole protein family. 2DProts is updated weekly and is integrated into CATH. AVAILABILITY AND IMPLEMENTATION: Freely accessible at https://2dprots.ncbr.muni.cz. The web interface was implemented in JavaScript. The database was implemented in Python. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Proteins , Software , Proteins/chemistry , Protein Structure, Secondary , Databases, Factual
ABSTRACT
Partial atomic charges serve as a simple model of the electrostatic distribution of a molecule, which drives its interactions with its surroundings. Since partial atomic charges are frequently used in computational chemistry, chemoinformatics and bioinformatics, many computational approaches for calculating them have been introduced. The most applicable are fast and reasonably accurate empirical charge calculation approaches. Here, we introduce Atomic Charge Calculator II (ACC II), a web application that enables the calculation of partial atomic charges via all the main empirical approaches and for all types of molecules. ACC II implements 17 empirical charge calculation methods, including highly cited ones (QEq, EEM), recently published ones (EQeq, EQeq+C), and old but still often used ones (PEOE). ACC II enables the fast calculation of charges even for large macromolecular structures. The web server also offers charge visualization, courtesy of the powerful LiteMol viewer. The calculation setup of ACC II is very straightforward and enables the quick calculation of high-quality partial charges. The application is available at https://acc2.ncbr.muni.cz.
Subject(s)
Models, Molecular , Software , Hydrogen/chemistry , Internet , Molecular Structure , Phenols/chemistry , Receptors, Nicotinic/chemistry , Static Electricity , bcl-2-Associated X Protein/chemistry
ABSTRACT
3D macromolecular structural data are growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analysis, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by the growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two compared with gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: a factor of ten compared with CIF files and a factor of four compared with gzipped CIF files. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
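Much of BinaryCIF's gain comes from encoding each CIF column separately with simple, reversible transforms; delta and run-length encoding are among the encodings the format defines. The sketch below illustrates only the idea: the four helpers are hypothetical, not the CIFTools API, and the list layout is not the actual BinaryCIF wire format (which also packs the results into compact binary arrays).

```python
def delta_encode(values: list[int]) -> tuple[int, list[int]]:
    """Keep the first value as an origin; store the rest as successive differences."""
    if not values:
        raise ValueError("delta encoding needs at least one value")
    return values[0], [b - a for a, b in zip(values, values[1:])]


def delta_decode(origin: int, deltas: list[int]) -> list[int]:
    """Rebuild the column by accumulating differences onto the origin."""
    out = [origin]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def run_length_encode(values: list[int]) -> list[int]:
    """Flatten runs of equal values into [value, count, value, count, ...]."""
    out: list[int] = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1
        else:
            out.extend([v, 1])
    return out


def run_length_decode(pairs: list[int]) -> list[int]:
    """Expand [value, count, ...] pairs back into the original sequence."""
    out: list[int] = []
    for value, count in zip(pairs[::2], pairs[1::2]):
        out.extend([value] * count)
    return out


# A sequential ID column, such as _atom_site.id in an mmCIF file.
atom_ids = list(range(1, 11))            # 1, 2, ..., 10
origin, deltas = delta_encode(atom_ids)  # 1 and nine differences of 1
packed = run_length_encode(deltas)       # [1, 9]
print(origin, packed)                    # 1 [1, 9]
assert delta_decode(origin, run_length_decode(packed)) == atom_ids
```

Ten integers collapse to three, and the saving grows with column length, which is consistent with the larger compression factors reported for the largest structures.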
Subject(s)
Crystallography/methods , Data Compression/methods , Models, Molecular , Software , Databases, Chemical , Macromolecular Substances/chemistry , Macromolecular Substances/ultrastructure
ABSTRACT
SUMMARY: Structures in the PDB tend to contain errors. This is a very serious issue for authors who rely on such potentially problematic data. The community of structural biologists develops validation methods as countermeasures, which are also included in the PDB deposition system. But how are these validation efforts influencing the quality of subsequently published structures? Which quality aspects are improving, and which remain problematic? We developed ValTrendsDB, a database that provides the results of an extensive exploratory analysis of relationships between quality criteria, size and metadata of biomacromolecules. Key input data are sourced from the PDB. The discovered trends are presented via precomputed, information-rich plots. ValTrendsDB also supports the visualization of a set of user-defined structures on top of general quality trends. ValTrendsDB therefore enables users to see the quality of structures published by a selected author, laboratory or journal, discover quality outliers, etc. ValTrendsDB is updated weekly. AVAILABILITY AND IMPLEMENTATION: Freely accessible at http://ncbr.muni.cz/ValTrendsDB. The web interface was implemented in JavaScript. The database was implemented in C++. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Software , Databases, Protein , Internet , Proteins , User-Computer Interface
ABSTRACT
OBJECTIVES: To analyse the expression regulation of two inducible HSP70 genes, HSPA1A and HSPA1B, located within the major histocompatibility complex (MHC) in patients with various systemic autoimmune diseases, and to prove the reliability of MHC-located HSP70 genes as molecular markers reflecting the autoimmune process. METHODS: Ninety-four adult patients with idiopathic inflammatory myopathy (IIM, n=31), systemic lupus erythematosus (SLE, n=31) or systemic sclerosis (SSc, n=32) and 37 healthy individuals were analysed. The mRNA expression level was determined using quantitative real-time PCR. The expression of intracellular HSP70 was established by flow cytometry, and extracellular HSP70 protein was measured in plasma samples using a commercially available sandwich enzyme-linked immunosorbent assay (ELISA). RESULTS: The expression of the HSPA1A gene was significantly up-regulated in patients with autoimmune diseases (SLE: p<0.01; SSc: p<0.01; IIM: p<0.0001) compared to healthy controls. The expression of the HSPA1B gene was increased only in patients with myositis (p<0.05). Furthermore, HSPA1B gene expression is associated with the HLA-DRB1*03 risk allele in patients with IIM. In addition, we found a relationship between HSPA1A gene expression regulation and the presence of disease-specific autoantibodies in patients with SLE and myositis. The level of intracellular HSP70 was not increased; however, the level of extracellular HSP70 protein was increased in patients with SSc and IIM compared to controls. CONCLUSIONS: The results suggest an involvement of the MHC-linked HSP70 genes in the pathology of the studied autoimmune disorders. Therefore, the HSPA1A and HSPA1B genes might serve as interesting candidate molecules in the development of distinct types of autoimmunity.
Subject(s)
Autoimmunity/genetics , HSP70 Heat-Shock Proteins/genetics , Lupus Erythematosus, Systemic/genetics , Myositis/genetics , Scleroderma, Systemic/genetics , Adult , Aged , Alleles , Autoantibodies , Biomarkers , Female , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Lupus Erythematosus, Systemic/immunology , Male , Middle Aged , Myositis/immunology , Scleroderma, Systemic/immunology
ABSTRACT
Channels, tunnels, and pores serve as pathways for the transport of molecules and ions through protein structures, thus contributing to their functions. MOLEonline (https://mole.upol.cz) is an interactive web-based tool with enhanced capabilities for detecting and characterizing channels, tunnels, and pores within protein structures. MOLEonline has two distinct calculation modes: one for the analysis of channels and tunnels, and one for transmembrane pores. This application gives researchers rich analytical insights into channel detection, structural characterization, and physicochemical properties. ChannelsDB 2.0 (https://channelsdb2.biodata.ceitec.cz/) is a comprehensive database that offers information on the location, geometry, and physicochemical characteristics of tunnels and pores within macromolecular structures deposited in the Protein Data Bank and AlphaFill databases. These tunnels are sourced from manual deposition from the literature and from automatic detection using the software tools MOLE and CAVER. Visualization in MOLEonline and ChannelsDB is powered by the LiteMol Viewer and the Mol* viewer, ensuring a user-friendly workspace. This chapter provides an overview of these applications and their usage.
Subject(s)
Databases, Protein , Software , Protein Conformation , User-Computer Interface , Models, Molecular , Ion Channels/metabolism , Ion Channels/chemistry , Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Web Browser
ABSTRACT
OBJECTIVES: The risk of activation of latent tuberculosis infection (LTBI) is increased in patients treated with anti-TNF-α drugs. The tuberculin skin test (TST) and the Quantiferon-TB Gold test (QFT) are used to detect LTBI before and during anti-TNF-α treatment. Here we describe the relationship between these tests at various timepoints, as well as longitudinal QFT data. METHODS: The study group consisted of 305 patients with several rheumatic inflammatory diseases treated with and/or scheduled for anti-TNF-α drugs. The QFT was performed in 303 patients during therapy and in 177 patients also during screening. The TST was used in 284 patients. Both tests were used simultaneously in 360 instances. RESULTS: Twenty-two patients were QFT positive: 3.9% before and 5.9% during anti-TNF-α treatment. Two patients who became QFT positive developed active tuberculosis. The TST was positive in 42% and 38% of patients before and during treatment, respectively. There was poor agreement between the two tests. Patients on glucocorticoids more frequently had a negative TST. The IFN-γ response to mycobacterial antigens increased significantly after application of tuberculin, but never reached the positive threshold. There was a significant increase in mitogen-induced IFN-γ production after initiation of anti-TNF-α therapy. CONCLUSIONS: The poor correlation between the QFT and TST renders the TST non-specific for LTBI. The QFT is more specific for detecting LTBI, and conversion to a positive result may predict active TB. An increase in IFN-γ production in response to mycobacterial antigens is seen when the TST is performed before the QFT. Mitogen-induced IFN-γ production increases after initiation of anti-TNF-α therapy.
Subject(s)
Antirheumatic Agents/adverse effects , Arthritis/drug therapy , Interferon-gamma Release Tests , Interferon-gamma/blood , Latent Tuberculosis/diagnosis , Tuberculin Test , Tumor Necrosis Factor-alpha/antagonists & inhibitors , Adult , Arthritis/diagnosis , Arthritis/immunology , Biomarkers/blood , Chi-Square Distribution , Female , Humans , Latent Tuberculosis/blood , Latent Tuberculosis/chemically induced , Latent Tuberculosis/immunology , Latent Tuberculosis/microbiology , Longitudinal Studies , Male , Middle Aged , Predictive Value of Tests , Risk Factors , Sensitivity and Specificity , Time Factors , Treatment Outcome
ABSTRACT
BACKGROUND: Partial atomic charges find many applications in computational chemistry, chemoinformatics, bioinformatics, and nanoscience. Currently, frequently used methods for charge calculation are the Electronegativity Equalization Method (EEM), the Charge Equilibration method (QEq), and Extended QEq (EQeq). All of them are fast, even for large molecules, but require empirical parameters. However, even these advanced methods have limitations; e.g., their application to peptides, proteins, and other macromolecules is problematic. An empirical charge calculation method that is promising for peptides and other macromolecular systems is the Split-charge Equilibration method (SQE) and its extension SQE+q0. Unfortunately, only one parameter set is available for these methods, and their implementation is not easily accessible. RESULTS: In this article, we present for the first time an optimized guided minimization method (optGM) for the fast parameterization of empirical charge calculation methods and compare it with the currently available guided minimization (GDMIN) method. We then introduce a further extension to SQE, SQE+qp, adapted for peptide datasets, and compare it with the common approaches EEM, QEq, EQeq, SQE, and SQE+q0. Finally, we integrate SQE and SQE+qp, including several parameter sets, into the web application Atomic Charge Calculator II (ACC II). CONCLUSION: The main contribution of the article is that it makes SQE methods with their parameters accessible to users via the ACC II web application (https://acc2.ncbr.muni.cz) and also via a command-line application. Furthermore, our improvement, SQE+qp, provides an excellent solution for peptide datasets. Additionally, optGM provides parameters comparable to those of GDMIN in a markedly shorter time. Therefore, optGM allows us to perform parameterizations for charge calculation methods with more parameters (e.g., SQE and its extensions) using large datasets.
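EEM, one of the methods compared here, reduces charge calculation to one linear system: each atom's effective electronegativity A_i + B_i q_i + κ Σ_{j≠i} q_j / R_ij is set equal to a common molecular electronegativity, and the charges must sum to the total molecular charge. A minimal NumPy sketch follows; the function name and the values of A, B and κ are made up for illustration (real parameter sets come from parameterization, as described above), and this is not the ACC II implementation.

```python
import numpy as np


def eem_charges(A, B, coords, kappa, total_charge=0.0):
    """Solve the EEM equations for partial charges.

    For each atom i:  A_i + B_i * q_i + kappa * sum_{j != i} q_j / R_ij = chi_bar
    subject to:       sum_i q_i = total_charge
    Unknowns: the charges q_1..q_N and the common electronegativity chi_bar.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(A)

    # Pairwise interatomic distances R_ij (zero on the diagonal).
    R = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Atom-atom block: B_i on the diagonal, kappa / R_ij off the diagonal.
    J = np.zeros((n, n))
    off_diag = ~np.eye(n, dtype=bool)
    J[off_diag] = kappa / R[off_diag]
    np.fill_diagonal(J, B)

    M = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    M[:n, :n] = J
    M[:n, n] = -1.0   # move chi_bar to the left-hand side
    rhs[:n] = -A      # ... and A_i to the right-hand side
    M[n, :n] = 1.0    # total charge conservation
    rhs[n] = total_charge

    solution = np.linalg.solve(M, rhs)
    return solution[:n], solution[n]


# Toy diatomic with made-up parameters (not a published EEM parameter set).
q, chi_bar = eem_charges(
    A=[2.0, 3.0], B=[1.0, 1.0], coords=[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], kappa=1.0
)
print(q, chi_bar)  # q close to [1.0, -1.0], chi_bar close to 2.5
```

The atom with the larger A ends up negatively charged, as expected for the more electronegative atom. Because each evaluation is a single dense solve, the cost of the method is dominated by the system size, which is why large macromolecules motivate sparser or cutoff-based variants.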
ABSTRACT
Protein structural families are groups of homologous proteins defined by the organization of secondary structure elements (SSEs). Nowadays, many families contain vast numbers of structures, and the SSEs can help researchers orient themselves within them. Communities around specific protein families have even developed specialized SSE annotations, always assigning the same name to the equivalent SSEs in homologous proteins. A detailed analysis of the groups of equivalent SSEs provides an overview of the studied family and enriches the analysis of any particular protein at hand. We developed a workflow for the analysis of the secondary structure anatomy of a protein family. We applied this analysis to the model family of cytochromes P450 (CYPs), a family of important biotransformation enzymes with a community-wide SSE annotation. We report the occurrence, typical length and amino acid sequence of the equivalent SSE groups, the conservation/variability of these properties and their relationship to the substrate recognition sites. We also suggest a generic residue numbering scheme for the CYP family. Comparing the bacterial and eukaryotic parts of the family highlights significant differences and reveals a well-known anomalous group of bacterial CYPs with some typically eukaryotic features. Our workflow for SSE annotation of CYP and other families is freely available at https://sestra.ncbr.muni.cz.
Subject(s)
Cytochrome P-450 Enzyme System/chemistry , Sequence Analysis, Protein/methods , Software , Animals , Humans , Molecular Dynamics Simulation
ABSTRACT
Two citations in the article by Sehnal et al. [(2020), Acta Cryst. D76, 1167-1173] are corrected.
ABSTRACT
LiteMol suite is an innovative solution that enables near-instant delivery of model and experimental biomacromolecular structural data, providing users with an interactive and responsive experience in all modern web browsers and on mobile devices. LiteMol suite is a combination of data delivery services (CoordinateServer and DensityServer), a compression format (BinaryCIF), and a molecular viewer (LiteMol Viewer). The LiteMol suite is integrated into the Protein Data Bank in Europe (PDBe) and other life science web applications (e.g., UniProt, Ensembl, SIB, and CNRS services); it is freely available at https://litemol.org, and its source code is available via GitHub. LiteMol suite provides advanced functionality (annotations and their visualization, powerful selection features), and this chapter describes their use for the visual inspection of protein structures.
Subject(s)
Protein Conformation , Proteins/chemistry , Databases, Protein , Europe , Internet , Software , User-Computer Interface , Web Browser
ABSTRACT
Biomacromolecular structural data make up a vital scientific resource that has grown not only in amount but also in size and complexity. Furthermore, these data are accompanied by large and increasing amounts of experimental data. Additionally, the macromolecular data are enriched with value-added annotations describing their biological, physicochemical and structural properties. Today, the scientific community requires fast and fully interactive web visualization to exploit this complex structural information. This article provides a survey of the available cutting-edge web services that address this challenge. Specifically, it focuses on data-delivery problems, discusses the visualization of a single structure, including experimental data and annotations, and concludes with a focus on the results of molecular-dynamics simulations and the visualization of structural ensembles.