Results 1 - 20 of 106
1.
ACS Omega ; 9(16): 18412-18428, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38680295

ABSTRACT

The present study discusses the influence of the TRiC chaperonin on the folding of the reovirus component mu1/σ3. The TRiC chaperone is treated as the provider of a specific external force field in the fuzzy oil drop model during the structural formation of the target folded protein. The model also determines the status of the final product, which represents the structure directed by an external force field in the form of a chaperonin. This can be used for in silico folding, as the process is environment-dependent. The application of the model enables the quantitative assessment of the dependence of folding on an external force field, which appears to have universal application.

2.
Front Chem ; 12: 1342434, 2024.
Article in English | MEDLINE | ID: mdl-38595701

ABSTRACT

Introduction: The protein folding process is very sensitive to environmental conditions. Many possibilities in the form of numerous pathways for this process can, if an incorrect one is chosen, lead to the creation of forms described as misfolded. The aqueous environment is the natural one for the protein folding process. Nonetheless, other factors such as the cell membrane and the presence of specific molecules (chaperones) affect this process, ensuring the correct expected structural form to guarantee biological activity. All these factors can be considered components of the external force field for this process. Methods: The fuzzy oil drop-modified (FOD-M) model makes possible the quantitative evaluation of the modification of the external field, treating the aqueous environment as a reference. The FOD-M model (tested on membrane proteins) includes the component modifying the water environment, allowing the assessment of the external force field generated by prefoldin. Results: In this work, prefoldin was treated as the provider of a specific external force field for actin and tubulin. The discussed model can be applied to any folding process simulation, taking into account the changed external conditions. Hence, it can help simulate the in silico protein folding process under defined external conditions determined by the respective external force field. In this work, the structures of prefoldin and of proteins folded with the participation of prefoldin were analyzed. Discussion: Thus, prefoldin can be treated as a provider of an external field comparable to other environmental factors affecting the protein folding process.

3.
ACS Omega ; 9(7): 8188-8203, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38405467

ABSTRACT

The biocatalysis process takes place with the participation of enzymes, which, depending on the reaction carried out, require, apart from the appropriate arrangement of catalytic residues, an appropriate external force field. It is generated by the protein body. The relatively small size of the part directly involved in the process itself is supported by the presence of an often complex structure of the protein body, the purpose of which is to provide an appropriate local force field, eliminating the influence of water. Very often, the large size of the enzyme is an expression of the complex form of this field. In this paper, a comparative analysis of arbitrarily selected enzymes, representatives of different enzyme classes, was carried out, focusing on the measurement of the diversity of the force field provided by a given protein. This analysis was based on the fuzzy oil drop model (FOD) and its modified version (FOD-M), which takes into account the participation of nonaqueous external factors in shaping the structure and thus the force field within the protein. The degree and type of ordering of the hydrophobicity distribution in the protein molecule is the result of the influence of the environment but also the supplier of the local environment for a given process, including the catalysis process in particular. Determining the share of a nonaqueous environment is important due to the ubiquity of polar water, whose participation in processes with high specificity requires control. It can be assumed that some enzymes in their composition have a permanently built-in part, the role of which is reduced to that of a permanent chaperone. It provides a specific external force field needed for the process. The proposed model, generalized to other types of proteins, may also provide a form of recording the environment model for the simulation of the in silico protein folding process, taking into account the impact of its differentiation.

4.
Nanomedicine (Lond) ; 19(4): 281-292, 2024 02.
Article in English | MEDLINE | ID: mdl-38240228

ABSTRACT

Aim: FeT is a complex of Fe3+, ferricyanide and tartrate, similar in structure to Prussian Blue. Its synthesis was planned to produce a potential antiproliferative drug. Methods: Dynamic light scattering was applied to study nanostructures formed by FeT complexes, while their biological activity was tested following changes in cell proliferation using cultured T24 human bladder cancer cells. Results: The antiproliferative activity of FeT derived from its ability to peroxidate unsaturated fatty acids, which can cause cell death through oxidative stress and/or ferroptosis. FeT molecules associate into drop-like nanostructures in water solutions, between 10-130 nm, which can bind albumin. Conclusion: Fatty acid peroxidation is significantly activated by light. The characteristics and reactivity of FeT represent a prospective application in medicine.


Subject(s)
Iron , Nanostructures , Humans , Iron/chemistry , Unsaturated Fatty Acids , Nanostructures/chemistry , Ferrocyanides/chemistry
5.
Proteins ; 92(5): 593-609, 2024 May.
Article in English | MEDLINE | ID: mdl-38062872

ABSTRACT

Transmembrane proteins are active in amphipathic environments. To stabilize the protein in such surroundings, the exposure of hydrophobic residues on the protein surface is required. Transmembrane proteins are responsible for the transport of various molecules. Therefore, they often represent structures in the form of channels. This analysis focused on the stability and local flexibility of transmembrane proteins, particularly those related to their biological activity. Different forms of anchorage were identified using the fuzzy oil-drop model (FOD) and its modified form, FOD-M. The mainly helical as well as β-barrel structural forms are compared with respect to the mechanism of stabilization in the cell membrane. Different anchoring systems were found to stabilize protein molecules with possible local fluctuation.


Subject(s)
Membrane Proteins , Cell Membrane
6.
BMC Bioinformatics ; 24(1): 425, 2023 Nov 11.
Article in English | MEDLINE | ID: mdl-37950210

ABSTRACT

BACKGROUND: Recently, significant progress has been made in the field of protein structure prediction by the application of artificial intelligence techniques, as shown by the results of the CASP13 and CASP14 (Critical Assessment of Structure Prediction) competition. However, the question of the mechanism behind the protein folding process itself remains unanswered. Correctly predicting the structure also does not solve the problem of, for example, amyloid proteins, where a polypeptide chain with an unaltered sequence adopts a different 3D structure. RESULTS: This work was an attempt at explaining the structural variation by considering the contribution of the environment to protein structuring. The application of the fuzzy oil drop (FOD) model to assess the validity of the selected models provided in the CASP13, CASP14 and CASP15 projects reveals the need for an environmental factor to determine the 3D structure of proteins. Consideration of the external force field in the form of polar water (Fuzzy Oil Drop) and a version modified by the presence of the hydrophobic compounds, FOD-M (FOD-Modified) reveals that the protein folding process is environmentally dependent. An analysis of selected models from the CASP competitions indicates the need for structure prediction as dependent on the consideration of the protein folding environment. CONCLUSIONS: The conditions governed by the environment direct the protein folding process occurring in a certain environment. Therefore, the variation of the external force field should be taken into account in the models used in protein structure prediction.


Subject(s)
Artificial Intelligence , Proteins , Molecular Models , Proteins/chemistry , Protein Folding , Hydrophobic and Hydrophilic Interactions , Protein Conformation
7.
BMC Bioinformatics ; 24(1): 418, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37932669

ABSTRACT

BACKGROUND: The aqueous environment directs the protein folding process towards the generation of micelle-type structures, which results in the exposure of hydrophilic residues on the surface (polarity) and the concentration of hydrophobic residues in the center (hydrophobic core). Obtaining a structure without a hydrophobic core requires a different type of external force field than that generated by water. Examples are membrane proteins, where the distribution of hydrophobicity is opposite to that of water-soluble proteins. Apart from these two extreme examples, the process of protein folding can be directed by chaperones, resulting in a structure devoid of a hydrophobic core. RESULTS: The current work presents such an example: DnaJ Hsp40 in complex with alkaline phosphatase PhoA-U (PDB ID: 6PSI), the client molecule. The availability of the WT form of the folding protein, alkaline phosphatase (PDB ID: 1EW8), enables a comparative analysis of the structures: at the stage of interaction with the chaperone and the final, folded structure of this biologically active protein. The fuzzy oil drop model in its modified FOD-M version was used in this analysis, taking into account the influence of an external force field, in this case coming from a chaperone. CONCLUSIONS: The FOD-M model identifies the external force field introduced by the chaperone that influences the folding process. The identified specific external force field can be applied in ab initio protein structure prediction as the environmental factor conditioning the folding process.


Subject(s)
Alkaline Phosphatase , Molecular Chaperones , Humans , Alkaline Phosphatase/metabolism , Molecular Chaperones/metabolism , HSP40 Heat-Shock Proteins/metabolism , Protein Folding , Water
8.
Front Mol Biosci ; 10: 1230922, 2023.
Article in English | MEDLINE | ID: mdl-37583961

ABSTRACT

Proteins from the intrinsically disordered group (IDP) focus the attention of many researchers engaged in protein structure analysis. The main criteria used in their identification are lack of secondary structure and significant structural variability. This variability takes forms that cannot be identified in the X-ray technique. In the present study, different criteria were used to assess the status of IDP proteins and their fragments recognized as intrinsically disordered regions (IDRs). The status of the hydrophobic core in proteins identified as IDPs and in their complexes was assessed. The status of IDRs as components of the ordering structure resulting from the construction of the hydrophobic core was also assessed. The hydrophobic core is understood as a structure encompassing the entire molecule in the form of a centrally located high concentration of hydrophobicity and a shell with a gradually decreasing level of hydrophobicity until it reaches a level close to zero on the protein surface. It is a model assuming that the protein folding process follows a micellization pattern aiming at exposing polar residues on the surface, with the simultaneous isolation of hydrophobic amino acids from the polar aquatic environment. The use of the model of hydrophobicity distribution in proteins in the form of the 3D Gaussian distribution described on the protein particle introduces the possibility of assessing the degree of similarity to the assumed micelle-like distribution and also enables the identification of deviations and mismatch between the actual distribution and the idealized distribution. The FOD (fuzzy oil drop) model and its modified FOD-M version allow for the quantitative assessment of these differences and the assessment of the relationship of these areas to the protein function. In the present work, the sections of IDRs in protein complexes classified as IDPs are analyzed. 
The classification "disordered" in the structural sense (lack of secondary structure or high flexibility) does not always entail a mismatch with the structure of the hydrophobic core. Particularly, the interface area, often consisting of IDRs, in many analyzed complexes shows the compliance of the hydrophobicity distribution with the idealized distribution, which proves that matching to the structure of the hydrophobic core does not require secondary structure ordering.
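
The comparison between an observed hydrophobicity distribution and the idealized 3D Gaussian distribution described in this abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption: the toy coordinates, the sigma values, and the relative-distance measure RD (a ratio of Kullback-Leibler divergences against the Gaussian and against a uniform distribution, as commonly described for the FOD model). In real use the positions would be effective atoms of an oriented, centred molecule.

```python
import math

def gaussian_profile(coords, sigmas):
    """Idealized hydrophobicity per residue from a 3D Gaussian centred at the origin,
    normalized so that the values sum to 1."""
    sx, sy, sz = sigmas
    raw = [
        math.exp(-(x * x) / (2 * sx * sx)
                 - (y * y) / (2 * sy * sy)
                 - (z * z) / (2 * sz * sz))
        for x, y, z in coords
    ]
    total = sum(raw)
    return [v / total for v in raw]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relative_distance(observed, theoretical):
    """Assumed RD measure: 0 means the observed profile matches the Gaussian,
    1 means it matches a uniform (core-free) distribution."""
    n = len(observed)
    uniform = [1.0 / n] * n
    d_t = kl_divergence(observed, theoretical)
    d_r = kl_divergence(observed, uniform)
    if d_t + d_r == 0:
        raise ValueError("Gaussian and uniform references coincide; RD is undefined")
    return d_t / (d_t + d_r)

# Hypothetical residue positions (in angstroms) and sigma values, for illustration only.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 1.0, 0.0), (6.0, 2.0, 1.0), (9.0, 0.0, 4.0)]
toy_sigmas = (4.0, 4.0, 4.0)
theoretical = gaussian_profile(toy_coords, toy_sigmas)
uniform = [1.0 / len(toy_coords)] * len(toy_coords)
print(relative_distance(theoretical, theoretical), relative_distance(uniform, theoretical))
```

Under this assumed definition, a value below 0.5 would indicate that the observed distribution is closer to the micelle-like reference than to the uniform one.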

9.
Acta Biochim Pol ; 70(2): 435-445, 2023 Jun 18.
Article in English | MEDLINE | ID: mdl-37330698

ABSTRACT

Numerous Alpha-synuclein amyloid structures available in PDB enable their comparative analysis. They are all characterized by a flat structure of each individual chain with an extensive network of inter-chain hydrogen bonds. The identification of such amyloid fibril structures requires determining the special conditions imposed on the torsion angles. Such conditions have already been formulated by the Authors resulting in the model of idealised amyloid. Here, we investigate the fit of this model in the group of A-Syn amyloid fibrils. We identify and describe the characteristic supersecondary structures in amyloids. Generally, the amyloid transformation is suggested to be the 3D to 2D transformation engaging mostly the loops linking Beta-structural fragments. The loop structure introducing the 3D organisation of Beta-sheet change to flat form (2D) introduces the mutual reorientation of Beta-strands enabling the large-scale H-bonds generation with the water molecules. Based on the model of idealised amyloid we postulate the hypothesis for amyloid fibril formation based on the shaking, an experimental procedure producing the amyloids.


Subject(s)
Amyloid , alpha-Synuclein , Amyloid/chemistry , alpha-Synuclein/chemistry , Protein Secondary Structure , Amyloidogenic Proteins
10.
Entropy (Basel) ; 25(6), 2023 May 26.
Article in English | MEDLINE | ID: mdl-37372194

ABSTRACT

Interpreting biological phenomena at the molecular and cellular levels reveals the ways in which information that is specific to living organisms is processed: from the genetic record contained in a strand of DNA, to the translation process, and then to the construction of proteins that carry the flow and processing of information as well as reveal evolutionary mechanisms. A surprisingly small amount of information, in the range of 1 GB, constitutes the record of human DNA that is used in the construction of the highly complex system that is the human body. This shows that what is important is not the quantity of information but rather its skillful use, in other words, its proper processing. This paper describes the quantitative relations that characterize information during the successive steps of the "biological dogma", illustrating a transition from the recording of information in a DNA strand to the production of proteins exhibiting a defined specificity. It is this specificity that is encoded in the form of information and that determines the unique activity, i.e., the measure of a protein's "intelligence". In a situation of information deficit at the transformation stage of a primary protein structure to a tertiary or quaternary structure, a particular role is served by the environment as a supplier of complementary information, thus leading to the achievement of a structure that guarantees the fulfillment of a specified function. Its quantitative evaluation is possible using the "fuzzy oil drop" (FOD) model, particularly in its modified version. This can be achieved when taking into account the participation of an environment other than water in the construction of a specific 3D structure (FOD-M). 
The next step of information processing on the higher organizational level is the construction of the proteome, where the interrelationship between different functional tasks and organism requirements can be generally characterized by homeostasis. An open system that maintains the stability of all components can be achieved exclusively in a condition of automatic control that is realized by negative feedback loops. This suggests a hypothesis of proteome construction that is based on the system of negative feedback loops. The purpose of this paper is the analysis of information flow in organisms with a particular emphasis on the role of proteins in this process. This paper also presents a model introducing the component of changed conditions and its influence on the protein folding process, since the specificity of proteins is coded in their structure.
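
The "range of 1 GB" figure quoted in this abstract can be checked with a short back-of-the-envelope calculation. This is only a sketch under stated assumptions: a haploid genome length of roughly 3.1 billion base pairs (a commonly cited approximation, not a value from the abstract) and four equally likely, independent bases, which gives an upper bound of 2 bits per base.

```python
import math

def sequence_information_bytes(length, alphabet_size):
    """Upper bound on the information in a sequence, assuming every symbol is
    equally likely and independent: length * log2(alphabet_size) bits."""
    bits = length * math.log2(alphabet_size)
    return bits / 8

# Assumption: approximate haploid human genome length in base pairs.
HAPLOID_BASE_PAIRS = 3.1e9

genome_gb = sequence_information_bytes(HAPLOID_BASE_PAIRS, alphabet_size=4) / 1e9
print(f"DNA record: about {genome_gb:.3f} GB")

# The same bound for one amino-acid residue drawn from a 20-letter alphabet.
bits_per_residue = math.log2(20)
print(f"Protein sequence: about {bits_per_residue:.2f} bits per residue")
```

The result, somewhat below 1 GB, is consistent with the order of magnitude stated in the abstract; real sequences are not uniformly random, so the true information content is lower than this bound.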

11.
Biomedicines ; 11(5), 2023 Apr 29.
Article in English | MEDLINE | ID: mdl-37238996

ABSTRACT

The structural transformation producing amyloids is a phenomenon that sheds new light on the protein folding problem. The analysis of the polymorphic structures of the α-synuclein amyloid available in the PDB database allows analysis of the amyloid-oriented structural transformation itself, but also the protein folding process as such. The polymorphic amyloid structures of α-synuclein analyzed employing the hydrophobicity distribution (fuzzy oil drop model) reveal a differentiation with a dominant distribution consistent with the micelle-like system (hydrophobic core with polar shell). This type of ordering of the hydrophobicity distribution covers the entire spectrum from the example with all three structural units (single chain, proto-fibril, super-fibril) exhibiting micelle-like form, through gradually emerging examples of local disorder, to structures with an extremely different structuring pattern. The water environment directing protein structures towards the generation of ribbon micelle-like structures (concentration of hydrophobic residues in the center of the molecule forming a hydrophobic core with the exposure of polar residues on the surface) also plays a role in the amyloid forms of α-synuclein. The polymorphic forms of α-synuclein reveal local structural differentiation with a common tendency to accept the micelle-like structuralization in certain common fragments of the polypeptide chain of this protein.

12.
J Cell Biochem ; 124(6): 818-835, 2023 06.
Article in English | MEDLINE | ID: mdl-37139783

ABSTRACT

Generating the structure of the hydrophobic core is based on the orientation of hydrophobic residues towards the central part of the protein molecule with the simultaneous exposure of polar residues. Such a course of the protein folding process takes place with the active participation of the polar water environment. While the self-assembly process leading to the formation of micelles concerns freely moving bipolar molecules, bipolar amino acids in a polypeptide chain have limited mobility due to the covalent bonds. Therefore, proteins form a more or less perfect micelle-like structure. The criterion is the hydrophobicity distribution, which to a greater or lesser extent reproduces the distribution expressed by the 3D Gaussian function on the protein body. The vast majority of proteins must ensure solubility, so a certain part of the protein is expected to reproduce the structuring of micelles. The biological activity of proteins is encoded in the part that does not reproduce the micelle-like system. The location and quantitative assessment of the contribution of orderliness to disorder is of critical importance for the determination of biological activity. The form of maladjustment to the 3D Gauss function may be varied, hence the high diversity of specific interactions with strictly defined molecules: ligands or substrates. The correctness of this interpretation was verified on the basis of the group of enzymes peptidylprolyl isomerase (E.C.5.2.1.8). In proteins representing this class of enzymes, the zones responsible for solubility (the micelle-like hydrophobicity system) were identified, as were the location and specificity of the incompatible part in which the specific activity of the enzyme is located and coded. The present study showed that the enzymes of the discussed group show two different schemes of the structure of the catalytic center (taking into account the status as defined by the fuzzy oil drop model).


Subject(s)
Micelles , Peptidylprolyl Isomerase , Molecular Models , Proteins/chemistry , Peptides/chemistry
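
The idea in this abstract, that activity is encoded in the part of the chain that does not reproduce the micelle-like distribution, can be sketched as a per-residue comparison of two normalized profiles. The abstract gives no procedure, so the profiles, the threshold, and the function name `mismatch_positions` below are hypothetical and serve only as an illustration.

```python
def mismatch_positions(observed, theoretical, threshold):
    """Return 0-based indices where the observed hydrophobicity differs from the
    idealized (Gaussian-derived) value by more than the given threshold."""
    if len(observed) != len(theoretical):
        raise ValueError("profiles must have the same length")
    return [
        i for i, (o, t) in enumerate(zip(observed, theoretical))
        if abs(o - t) > threshold
    ]

# Hypothetical normalized profiles for a 6-residue fragment (each sums to 1).
theoretical_profile = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]
observed_profile = [0.05, 0.16, 0.29, 0.10, 0.15, 0.25]

# Assumed cut-off; a real analysis would need a justified criterion.
candidates = mismatch_positions(observed_profile, theoretical_profile, threshold=0.05)
print(candidates)
```

In this toy example the flagged positions are a hydrophobicity deficit inside the expected core and an excess at the expected surface, the two kinds of local mismatch that such an analysis would inspect further.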
13.
Proteins ; 91(5): 608-618, 2023 05.
Article in English | MEDLINE | ID: mdl-36448315

ABSTRACT

The protein secondary structure (SS) prediction plays an important role in the characterization of general protein structure and function. In recent years, a new generation of algorithms for SS prediction based on embeddings from protein language models (pLMs) is emerging. These algorithms reach state-of-the-art accuracy without the need for time-consuming multiple sequence alignment (MSA) calculations. Long short-term memory (LSTM)-based SPOT-1D-LM and NetSurfP-3.0 are the latest examples of such predictors. We present the ProteinUnetLM model using a convolutional Attention U-Net architecture that provides prediction quality and inference times at least as good as the best LSTM-based models for 8-class SS prediction (SS8). Additionally, we address the issue of the heavily imbalanced nature of the SS8 problem by extending the loss function with the Matthews correlation coefficient, and by proper assessment using previously introduced adjusted geometric mean (AGM) metric. ProteinUnetLM achieved better AGM and sequence overlap score than LSTM-based predictors, especially for the rare structures 3₁₀-helix (G), beta-bridge (B), and high curvature loop (S). It is also competitive on challenging datasets without homologs, free-modeling targets, and chameleon sequences. Moreover, ProteinUnetLM outperformed its previous MSA-based version ProteinUnet2, and provided better AGM than AlphaFold2 for 1/3 of proteins from the CASP14 dataset, proving its potential for making a significant step forward in the domain. To facilitate the usage of our solution by protein scientists, we provide an easy-to-use web interface under https://biolib.com/SUT/ProteinUnetLM/.


Subject(s)
Short-Term Memory , Computer Neural Networks , Proteins/chemistry , Algorithms , Protein Secondary Structure
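
Why the Matthews correlation coefficient (MCC) suits a heavily imbalanced problem such as SS8 can be shown with a small sketch. It uses the standard binary (one-vs-rest) MCC formula on hypothetical counts for a rare class; it is not the differentiable loss term used by ProteinUnetLM, whose exact form the abstract does not give.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified residues."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC; returns 0.0 when any marginal count is zero (undefined case)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical one-vs-rest counts for a rare class: 10 of 1000 residues belong to it.
never_predicts_class = dict(tp=0, tn=990, fp=0, fn=10)
perfect_predictor = dict(tp=10, tn=990, fp=0, fn=0)

for name, counts in [("never predicts", never_predicts_class),
                     ("perfect", perfect_predictor)]:
    print(name, accuracy(**counts), matthews_corrcoef(**counts))
```

A predictor that never outputs the rare class still reaches 99% accuracy here, while its MCC is 0, which is the behaviour that motivates adding MCC to the training objective.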
14.
Membranes (Basel) ; 12(12), 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36557119

ABSTRACT

Proteins transporting ions or other molecules across the membrane, whose proper concentration is required to maintain homeostasis, perform very sophisticated biological functions. The symport and antiport active transport can be performed only by the structures specially prepared for this purpose. In the present work, such structures in both In and Out conformations have been analyzed with respect to the hydrophobicity distribution using the FOD-M model. This allowed for identifying the role of individual protein chain fragments in the stabilization of the specific cell membrane environment as well as the contribution of hydrophobic interactions to the conformational changes between In/Out conformations.

15.
PLoS One ; 17(10): e0275300, 2022.
Article in English | MEDLINE | ID: mdl-36215254

ABSTRACT

A collection of intrinsically disordered proteins (IDPs) having regions with the status of intrinsically disordered (IDR) according to the DisProt database was analyzed from the point of view of the structure of the hydrophobic core in the structural unit (chain / domain). The analysis includes all the Homo sapiens as well as Mus musculus proteins present in the DisProt database for which the structure is available. In the analysis, the fuzzy oil drop modified model (FOD-M) was used, taking into account the external force field, modified by the presence of other factors apart from polar water, influencing protein structuring. The paper presents an alternative to secondary-structure-based classification of intrinsically disordered regions (IDR). The basis of our classification is the ordering of the hydrophobic core as calculated by the FOD-M model, resulting in FOD-ordered or FOD-unordered IDRs.


Subject(s)
Intrinsically Disordered Proteins , Animals , Humans , Hydrophobic and Hydrophilic Interactions , Intrinsically Disordered Proteins/chemistry , Mice , Protein Conformation , Protein Secondary Structure , Water
16.
Int J Mol Sci ; 23(16), 2022 Aug 11.
Article in English | MEDLINE | ID: mdl-36012200

ABSTRACT

The uptake and distribution of doxorubicin in the MCF7 line of breast-cancer cells were monitored by Raman measurements. It was demonstrated that bioavailability of doxorubicin can be significantly enhanced by applying Congo red. To understand the mechanism of doxorubicin delivery by Congo red supramolecular carriers, additional monolayer measurements and molecular dynamics simulations on model membranes were undertaken. Acting as molecular scissors, Congo red particles cut doxorubicin aggregates and incorporated them into small-sized Congo red clusters. The mixed doxorubicin/Congo red clusters were adsorbed to the hydrophilic part of the model membrane. Such behavior promoted transfer through the membrane.


Subject(s)
Congo Red , Doxorubicin , Congo Red/pharmacology , Doxorubicin/pharmacology , Excipients , Hydrophobic and Hydrophilic Interactions
17.
Int J Mol Sci ; 23(16), 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-36012765

ABSTRACT

The specificity of the available experimentally determined structures of amyloid forms is expressed primarily by the two- and not three-dimensional forms of a single polypeptide chain. Such a flat structure is possible due to the β structure, which occurs predominantly. The stabilization of the fibril in this structure is achieved due to the presence of the numerous hydrogen bonds between the adjacent chains. Together with the different forms of twists created by the single R- or L-handed α-helices, they form the hydrogen bond network. The specificity of the arrangement of these hydrogen bonds lies in their joint orientation in a system perpendicular to the plane formed by the chain and parallel to the fibril axis. The present work proposes the possible mechanism for obtaining such a structure based on the geometric characterization of the polypeptide chain constituting the basis of our early intermediate model for protein folding introduced formerly. This model, being the conformational subspace of Ramachandran plot (the ellipse path), was developed on the basis of the backbone conformation, with the side-chain interactions excluded. Our proposal is also based on the results from molecular dynamics available in the literature leading to the unfolding of α-helical sections, resulting in the β-structural forms. Both techniques used provide a similar suggestion in a search for a mechanism of conformational changes leading to a formation of the amyloid form. The potential mechanism of amyloid transformation is presented here using the fragment of the transthyretin as well as amyloid Aβ.


Subject(s)
Amyloid , Protein Folding , Amyloid/metabolism , Amyloidogenic Proteins , Hydrogen Bonding , Molecular Dynamics Simulation , Peptides
18.
Biomedicines ; 10(7), 2022 Jun 25.
Article in English | MEDLINE | ID: mdl-35884807

ABSTRACT

Connexins and pannexins are the transmembrane proteins of highly distinguished biological activity in the form of transport of molecules and electrical signals. Their common role is to connect the external environment with the cytoplasm of the cell, while connexin is also able to link two cells together allowing the transport from one to another. The analysis presented here aims to identify the similarities and differences between connexin and pannexin. As a comparative criterion, the hydrophobicity distribution in the structure of the discussed proteins was used. The comparative analysis is carried out with the use of a mathematical model, the FOD-M model (fuzzy oil drop model in its Modified version) expressing the specificity of the membrane's external field, which in the case of the discussed proteins is significantly different from the external field for globular proteins in the polar environment of water. The characteristics of the external force field influence the structure of protein allowing the activity in a different environment.

19.
J Mol Graph Model ; 114: 108166, 2022 07.
Article in English | MEDLINE | ID: mdl-35325843

ABSTRACT

During the protein folding process in computer simulations involving the use of a United RESidue (UNRES) force field, an additional module was introduced to represent directly the presence of a polar solvent in water form. This module implements the fuzzy oil drop model (FOD) where the 3D Gauss function expresses the presence of a polar environment which directs the polypeptide chain folding process towards the generation of a centric hydrophobic core. Sample test polypeptide chains of 8 proteins with chain lengths ranging from 37 to 75 aa were simulated in silico using the UNRES (U) package with an implicit solvent model and a built-in module expressing the FOD model (UNRES-FOD-UNRES (U + F) interleaved simulation). The protein structure obtained by both simulation schemes, i.e., U and U + F, respectively, for all the analyzed protein models shows the presence of a hydrophobic core, including where it is absent in the native structure. The proposed FOD-M model (M-modified) explaining the source of this phenomenon reveals the need to modify the external field expressing the role of a folding environment. The modification takes into account the influence of factors other than polar ones present in the folding environment.


Subject(s)
Protein Folding , Proteins , Computer Simulation , Peptides/chemistry , Protein Conformation , Proteins/chemistry , Solvents
20.
BMC Bioinformatics ; 23(1): 100, 2022 Mar 22.
Article in English | MEDLINE | ID: mdl-35317722

ABSTRACT

BACKGROUND: The prediction of protein secondary structures is a crucial and significant step for ab initio tertiary structure prediction, which delivers information about protein activity and functions. As the experimental methods are expensive and sometimes impossible, many SS predictors, mainly based on different machine learning methods, have been proposed for many years. Currently, most of the top methods use evolutionary-based input features produced by PSSM and HHblits software, although quite recently embeddings, a new description of protein sequences generated by language models (LM), have appeared that could be leveraged as input features. Apart from input features calculation, the top models usually need extensive computational resources for training and prediction and are barely possible to run on a regular PC. SS prediction as an imbalanced classification problem should not be judged by the commonly used Q3/Q8 metrics. Moreover, as the benchmark datasets are not random samples, the classical statistical null hypothesis testing based on the Neyman-Pearson approach is not appropriate. RESULTS: We present a lightweight deep network ProteinUnet2 for SS prediction which is based on U-Net convolutional architecture and evolutionary-based input features (from PSSM and HHblits) as well as SPOT-Contact features. Through an extensive evaluation study, we report the performance of ProteinUnet2 in comparison with top SS prediction methods based on evolutionary information (SAINT and SPOT-1D). We also propose a new statistical methodology for prediction performance assessment based on the significance from Fisher-Pitman permutation tests accompanied by practical significance measured by Cohen's effect size. CONCLUSIONS: Our results suggest that ProteinUnet2 architecture has much shorter training and inference times while maintaining results similar to SAINT and SPOT-1D predictors. 
Taking into account the relatively long times of calculating evolutionary-based features (from PSSM in particular), it would be worth conducting the predictive ability tests on embeddings as input features in the future. We strongly believe that the statistical methodology proposed here for the evaluation of SS prediction results will be adopted and used (and even expanded) by the research community.


Subject(s)
Computational Biology , Proteins , Amino Acid Sequence , Computational Biology/methods , Protein Databases , Protein Secondary Structure , Proteins/chemistry
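
The evaluation approach named in this abstract, a permutation test for statistical significance paired with Cohen's effect size for practical significance, can be sketched as follows. This is a minimal illustration under assumptions: it uses an unpaired two-sample design, a Monte Carlo approximation of the permutation distribution, the pooled-standard-deviation form of Cohen's d, and hypothetical per-protein scores. The abstract does not specify the authors' exact procedure.

```python
import math
import random
import statistics

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (one common definition)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

def permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided permutation p-value for the difference of means, estimated by
    random relabelling of the pooled observations."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical per-protein scores for two predictors (illustrative values only).
new_scores = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80]
old_scores = [0.61, 0.63, 0.60, 0.62, 0.59, 0.64, 0.60, 0.61]

print("p-value:", permutation_test(new_scores, old_scores))
print("Cohen's d:", cohens_d(new_scores, old_scores))
```

Reporting both numbers separates the question of whether a difference is unlikely under random relabelling from the question of whether it is large enough to matter, which is the distinction the abstract draws.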
...