ABSTRACT
This paper describes outcomes of the 2019 Cryo-EM Model Challenge. The goals were to (1) assess the quality of models that can be produced from cryogenic electron microscopy (cryo-EM) maps using current modeling software, (2) evaluate reproducibility of modeling results from different software developers and users and (3) compare performance of current metrics used for model evaluation, particularly Fit-to-Map metrics, with focus on near-atomic resolution. Our findings demonstrate the relatively high accuracy and reproducibility of cryo-EM models derived by 13 participating teams from four benchmark maps, including three forming a resolution series (1.8 to 3.1 Å). The results permit specific recommendations to be made about validating near-atomic cryo-EM structures both in the context of individual experiments and structure data archives such as the Protein Data Bank. We recommend the adoption of multiple scoring parameters to provide full and objective annotation and assessment of the model, reflective of the observed cryo-EM map density.
Subjects
Cryoelectron Microscopy/methods; Models, Molecular; Crystallography, X-Ray; Protein Conformation; Proteins/chemistry
ABSTRACT
MOTIVATION: State-of-the-art protein structure prediction methods such as AlphaFold are widely used to predict structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep learning-based, end-to-end, all-atom protein structural model refinement method. It uses an SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph. RESULTS: The method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both backbone atoms and all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods in multiple evaluation metrics, including an all-atom model quality score, the MolProbity score, which is based on the analysis of all-atom contacts, bond length, atom clashes, torsion angles, and side-chain rotamers. As ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement. AVAILABILITY AND IMPLEMENTATION: The source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.
Subjects
Proteins; Software; Proteins/chemistry; Molecular Conformation
ABSTRACT
MOTIVATION: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them, since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods. RESULTS: We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects, to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method, AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/BioinfoMachineLearning/EnQA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
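To make the notion of equivariance to rotation and translation concrete, the following is a minimal sketch, not EnQA's network. It uses the centroid of a set of coordinates as a toy stand-in for an equivariant function and checks that transforming the input and then applying the function gives the same result as applying the function and then transforming the output. All names (`rotate_z`, `translate`, `centroid`, `transform`) and the coordinates are hypothetical illustrations.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))

def transform(point, angle, shift):
    """Rigid-body motion: rotation about z followed by translation."""
    return translate(rotate_z(point, angle), shift)

def centroid(points):
    """Toy equivariant function (an assumption for illustration only)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Illustrative coordinates, not taken from any real structure.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.5)]
angle, shift = math.pi / 3, (2.0, -1.0, 0.5)

# Equivariance: f(T(x)) should equal T(f(x)).
moved_then_averaged = centroid([transform(p, angle, shift) for p in coords])
averaged_then_moved = transform(centroid(coords), angle, shift)
print(moved_then_averaged, averaged_then_moved)
```

Pairwise distances between atoms, by contrast, are invariant under the same motion, which is why distance-based features are a common input for rotation- and translation-aware models.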
Subjects
Neural Networks, Computer; Proteins; Proteins/chemistry; Software; Rotation
ABSTRACT
BACKGROUND: The Endoscopic Purse-string Suture (EPSS) technique has gained attention for its potential in closing large defects following gastrointestinal procedures. However, its application in fistula closure is not as widely reported. This study aims to evaluate the safety and efficacy of EPSS and naso-jejunal tube feeding in the closure of duodenal cutaneous fistulas and gastric cutaneous fistulas. METHODS: This single-center retrospective study, conducted from September 2020 to September 2023 at Tongji University in Shanghai, China, examined the outcomes of EPSS and naso-jejunal tube feeding for patients with gastric and duodenal cutaneous fistulas (n = 10). Demographic data, fistula characteristics, procedure technique and outcomes were evaluated. RESULTS: The average size of the fistula opening was 7.9 ± 4.6 mm. The operations took an average of 25.8 ± 5.6 min. Patients needed naso-jejunal tube feeding for a median of 14.0 days, with an interquartile range (IQR) of 7.7-19.0 days. The median duration of hospital stay post-operation was 16.5 days, with an IQR of 7.0-25.0 days. In nine patients, the initial fistula closure with the EPSS technique was successful. The other patient underwent a second EPSS and, ultimately, all patients experienced complete healing and fully recovered. There were no major adverse events reported. CONCLUSIONS: EPSS combined with naso-jejunal tube feeding is a safe and effective treatment option for duodenal and gastric cutaneous fistulas. Larger, prospective studies are needed to validate these findings and establish the long-term safety and efficacy of this approach.
ABSTRACT
In recent years, the question of whether executive function (EF) is malleable has been widely studied. Even when the same training tasks are used, transfer effects remain uncertain. Researchers have suggested that the inconsistency might be attributed to individual differences in temperamental traits. In the current study, we investigated how effortful control, a temperamental trait, would affect EF training outcomes in children. Based on parent rating, 79 6-year-old preschoolers were identified as having higher or lower effortful control and were assigned to three conditions: working memory (WM) training, inhibitory control (IC) training, and a business-as-usual control group. Children completed assessments at baseline, 1 week after intervention (posttest), and 3 months after intervention (follow-up). As compared with the control group, the WM and IC training groups showed improvement in both trained tasks and nontrained measures. At baseline, children with higher effortful control scores showed greater WM capacity and better IC. Furthermore, effortful control was positively correlated with training gain in both training groups, with children with higher effortful control benefitting more from training. In the WM training group, effortful control was positively correlated with near transfer on WM outcomes both immediately and longitudinally. At posttest, the WM and IC training groups showed a positive correlation between effortful control and fluid intelligence performance. Our results underscore the importance of individual differences in training benefits, in particular the role of effortful control, and further illustrate the potential avenues for designing more effective individualized cognitive training programs to foster learning and optimize children's development.
Subjects
Executive Function; Learning; Child; Humans; Memory, Short-Term; Intelligence; Individuality
ABSTRACT
Benzovindiflupyr is a succinate dehydrogenase inhibitor fungicide that targets mitochondrial function for disease control. In this study, we investigated the adsorption-desorption and leaching behavior of benzovindiflupyr in eight soil types using the batch equilibrium method and the soil column leaching method. A Freundlich model (r² > 0.9959) was used to characterize the adsorption-desorption process in the eight soil types, with adsorption coefficients (KF-ads) ranging from 2.303 to 17.886. KF-ads was significantly and positively correlated (p < 0.05) with the organic carbon content. High temperatures and increased initial pH of aqueous solutions led to a decrease in benzovindiflupyr adsorption in the soil. The adsorption was also influenced by factors such as ionic strength, humic acid, surfactant type, microplastic type, and particle size and concentration. Moreover, benzovindiflupyr exhibited low leachability in all four soils selected, but different leaching solutions affected the risk of benzovindiflupyr migration to groundwater. Overall, this study provides insights into the adsorption characteristics of benzovindiflupyr in different soils and provides key information for environmental risk assessment.
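As an illustration of how a Freundlich adsorption coefficient such as KF-ads is typically obtained, the sketch below fits the linearized isotherm log10(Cs) = log10(KF) + (1/n) log10(Ce) by ordinary least squares. The function name `fit_freundlich` and the data are hypothetical; the values are synthetic and are not taken from the study, and the authors' actual fitting procedure is not described in the abstract.

```python
import math

def fit_freundlich(ce, cs):
    """Fit log10(Cs) = log10(KF) + (1/n) * log10(Ce) by least squares.

    ce: equilibrium concentrations in solution
    cs: amounts adsorbed per unit mass of soil
    Returns (KF, 1/n, r^2) for the log-log regression.
    """
    xs = [math.log10(c) for c in ce]
    ys = [math.log10(c) for c in cs]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # this is 1/n
    intercept = mean_y - slope * mean_x    # this is log10(KF)
    r2 = (sxy * sxy) / (sxx * syy)
    return 10 ** intercept, slope, r2

# Synthetic data generated from KF = 5.0 and 1/n = 0.9 (illustrative only).
ce = [0.1, 0.5, 1.0, 2.0, 5.0]
cs = [5.0 * c ** 0.9 for c in ce]
kf, inv_n, r2 = fit_freundlich(ce, cs)
print(f"KF = {kf:.3f}, 1/n = {inv_n:.3f}, r^2 = {r2:.4f}")
```

A nonlinear fit on the untransformed data is an equally common choice; the log-log form is used here only because it needs no external libraries.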
Subjects
Fungicides, Industrial; Soil Pollutants; Soil; Adsorption; Soil Pollutants/chemistry; Soil/chemistry; Fungicides, Industrial/chemistry; Hydrogen-Ion Concentration; Osmolar Concentration; Humic Substances; Particle Size; Water Pollutants, Chemical/chemistry; Groundwater/chemistry
ABSTRACT
Utilizing infrared spectroscopy coupled with batch equilibrium methods, the adsorption and desorption characteristics of the novel insecticide fluchlordiniliprole were assessed in four different soil types. It was found that fluchlordiniliprole's adsorption and desorption in these soils were consistent with the Freundlich isotherm, exhibiting adsorption capacities (KF-ads) ranging from 8.436 to 36.269. Temperature fluctuations, encompassing both high and low extremes, impaired the ability of soil to adsorb fluchlordiniliprole. In addition, adsorption dynamics were modulated by several other factors, including soil pH, ionic strength, amendments (e.g., biochar and humic substances), and the presence of various surfactants and microplastics. Although capable of leaching, fluchlordiniliprole exhibited weak mobility in most soils. Therefore, fluchlordiniliprole appears to pose a threat to surface soil and aquatic biota, but a minimal threat to groundwater. SYNOPSIS STATEMENT: This research examines the dynamics of fluchlordiniliprole in soil and will aid in maintaining ecological safety and managing agricultural pesticides. The study's comprehensive analysis of adsorption, desorption, and soil migration patterns significantly contributes to our understanding of pesticide interactions with diverse soil types. The results of this study will enable the development of environmentally responsible agricultural practices.
Subjects
Soil Pollutants; Soil; Adsorption; Soil/chemistry; Soil Pollutants/chemistry; Soil Pollutants/analysis; Insecticides/chemistry; Hydrogen-Ion Concentration; Osmolar Concentration; Charcoal/chemistry; Temperature; Surface-Active Agents/chemistry; Humic Substances
ABSTRACT
Microplastics (MPs) and pesticides are two categories of contaminants with proposed negative impacts on aquatic ecosystems, and adsorption of pesticides on MPs may result in their long-range transport and combined effects. Florpyrauxifen-benzyl, a novel pyridine-2-carboxylate auxin herbicide, has been widely used to control weeds in paddy fields, but insight into its behavior is extremely limited. Therefore, the adsorption and desorption behaviors of florpyrauxifen-benzyl on polyvinyl chloride (PVC), polyethylene (PE) and disposable face masks (DFMs) in five water environments were investigated. The impacts of various environmental factors on adsorption capacity were evaluated, as well as the adsorption mechanisms. The results revealed significant variations in the adsorption capacity of florpyrauxifen-benzyl on the three MPs, in the approximate order DFMs > PE > PVC. The discrepancy can be attributed to differences in structural and physicochemical properties, as evidenced by various characterization analyses. The kinetics and isotherms of florpyrauxifen-benzyl on the three MPs fit different models, with physical forces predominantly governing the adsorption process. Thermodynamic analysis revealed that both high and low temperatures weakened adsorption on PE and DFMs, whereas temperature had a negligible impact on adsorption on PVC. The adsorption capacity was significantly influenced by most environmental factors, particularly pH, cations and a coexisting herbicide. This study provides valuable insights into the fate of florpyrauxifen-benzyl in the presence of MPs, suggesting that PVC, PE and DFMs can serve as carriers of florpyrauxifen-benzyl in aquatic environments.
Subjects
Herbicides; Pesticides; Water Pollutants, Chemical; Microplastics/toxicity; Microplastics/chemistry; Plastics/chemistry; Adsorption; Ecosystem; Water; Polyethylene/chemistry; Pesticides/analysis; Herbicides/analysis; Water Pollutants, Chemical/analysis
ABSTRACT
BACKGROUND: Chest X-rays (CXR) are widely used to facilitate the diagnosis and treatment of critically ill and emergency patients in clinical practice. Accurate hemi-diaphragm detection based on postero-anterior (P-A) CXR images is crucial for the diaphragm function assessment of critically ill and emergency patients to provide precision healthcare for these vulnerable populations. OBJECTIVE: An effective and accurate hemi-diaphragm detection method for P-A CXR images is therefore urgently needed to assess these vulnerable populations' diaphragm function. METHODS: This paper proposes an effective hemi-diaphragm detection method for P-A CXR images based on the convolutional neural network (CNN) and graphics. First, we develop a robust and standard CNN model of pathological lungs, trained on human P-A CXR images of normal and abnormal cases with multiple lung diseases, to extract lung fields from P-A CXR images. Second, we propose a novel localization method of the cardiophrenic angle based on the two-dimensional projection morphology of the left and right lungs by graphics for detecting the hemi-diaphragm. RESULTS: The mean errors of the four key hemi-diaphragm points in the lung field mask images abstracted from static P-A CXR images based on five different segmentation models are 9.05, 7.19, 7.92, 7.27, and 6.73 pixels, respectively. The results also show that the mean errors of these four key hemi-diaphragm points in the lung field mask images abstracted from dynamic P-A CXR images based on these segmentation models are 5.50, 7.07, 4.43, 4.74, and 6.24 pixels, respectively. CONCLUSION: Our proposed hemi-diaphragm detection method can effectively perform hemi-diaphragm detection and may become an effective tool to assess these vulnerable populations' diaphragm function for precision healthcare.
Subjects
Diaphragm; Neural Networks, Computer; Radiography, Thoracic; Humans; Diaphragm/diagnostic imaging; Radiography, Thoracic/methods; Lung Diseases/diagnostic imaging
ABSTRACT
Understanding the spatiotemporal changes in net primary productivity (NPP) and the driving factors behind these changes in climate-vulnerable regions is crucial for ecological conservation. This study simulates the actual NPP (NPPA) and climate potential NPP (NPPC) in the Three-River Headwaters Region from 2000 to 2020. The Theil-Sen Median method and Mann-Kendall mutation analyses are employed to explore their spatiotemporal variation patterns, while geographically weighted regression and machine learning are used to investigate the influence of anthropogenic activities and climatic factors on NPPA. The results indicate that the average NPPA across the entire region over multiple years is 382.506 g C m⁻² yr⁻¹, which is 0.132 times the average annual NPPC over the past 21 years, showing an overall distribution pattern of low in the northwest and high in the southeast. The annual increase in NPPA from 2000 to 2020 is approximately 1.034 g C m⁻² yr⁻¹. The source region of the Yangtze River shows the largest improvement in vegetation, with 74.1% of the area showing improvement. Between 2002 and 2003, the annual NPPA in the Three-River Headwaters Region experienced a sudden change, lagging behind the NPPC change by 1 year, and after 2005, the upward trend in NPPA became more pronounced. The impact of anthropogenic activities on NPPA shifted from positive to negative to positive from 2000 to 2020, with significant impact areas mainly concentrated in the northeast and a few areas in the central and southern parts. The proportion of areas with extremely significant impact increased from 1.9% in 2000 to 3.7% in 2020. Over the past 21 years, the main factors influencing NPPA changes in the Three-River Headwaters Region have been soil moisture and precipitation, with the influence of different climate factors on NPP changing over time. Additionally, NPP is more sensitive to changes in altitude in low-altitude areas.
This study can provide more accurate theoretical support for ecological environment assessment and subsequent protection efforts in the Three-River Headwaters Region.
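For readers unfamiliar with the trend statistics named above, the following is a minimal sketch of the Theil-Sen median slope and of the Mann-Kendall S statistic for monotonic trend. It is the simple trend form of Mann-Kendall, not the sequential mutation (abrupt-change) analysis the abstract refers to, and it is not the authors' pipeline. The function names and the yearly values are hypothetical.

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(years, values):
    """Median of the slopes between all pairs of observations."""
    points = list(zip(years, values))
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(points, 2)
    ]
    return median(slopes)

def mann_kendall_s(values):
    """Mann-Kendall S: count of increasing pairs minus decreasing pairs."""
    s = 0
    for earlier, later in combinations(values, 2):
        s += (later > earlier) - (later < earlier)
    return s

# Illustrative yearly values in g C m^-2 yr^-1 (synthetic, not from the study).
years = list(range(2000, 2006))
npp = [378.0, 380.5, 379.0, 383.0, 384.5, 386.0]
print("Theil-Sen slope:", theil_sen_slope(years, npp))
print("Mann-Kendall S:", mann_kendall_s(npp))
```

A positive slope together with a large positive S suggests an upward trend; a significance test on S would be needed before drawing conclusions.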
Subjects
Environmental Monitoring; Rivers; Rivers/chemistry; Climate Change; Anthropogenic Effects; China; Ecosystem
ABSTRACT
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo-trimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integrating advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
Subjects
Algorithms; Protein Interaction Mapping; Protein Interaction Mapping/methods; Protein Conformation; Protein Binding; Molecular Docking Simulation; Computational Biology/methods; Software
ABSTRACT
Intestinal inflammation modifies host physiology to promote the occurrence of colorectal cancer (CRC), as seen in colitis-associated CRC. Gut microbiota is crucial in cancer progression, primarily by inducing a chronic inflammatory intestinal microenvironment, leading to DNA damage, chromosomal mutation, and alterations in specific metabolite production. Therefore, there is increasing interest in microbiota-based prevention and treatment strategies, such as probiotics, prebiotics, microbiota-derived metabolites, and fecal microbiota transplantation. This review aims to provide valuable insights into the potential correlations between gut microbiota and colitis-associated CRC, as well as the promising microbiota-based strategies for colitis-associated CRC.
ABSTRACT
Propyrisulfuron is a novel sulfonylurea herbicide used for controlling annual grass and broad-leaved weeds in fields, but its environmental fate and behavior, which are of utmost importance for environmental protection, are still unknown. To reduce its potential environmental risks in agricultural production, the hydrolysis kinetics, the influence of 34 environmental factors (including 12 microplastics (MPs), disposable face masks (DFMs) and their different parts, 6 fertilizers, 5 ions, 3 surfactants, the co-existing herbicide florpyrauxifen-benzyl, humic acid and biochar), and the effect of MPs and DFMs on its hydrolysis mechanisms were systematically investigated. The main hydrolysis products (HPs), possible mechanisms, toxicities and potential risks to aquatic organisms were studied. Propyrisulfuron hydrolysis was an acid catalytic pyrolysis, endothermic and spontaneous process driven by the reduction of activation enthalpy, and followed first-order kinetics. All environmental factors except humic acid accelerated propyrisulfuron hydrolysis to varying degrees, and different hydrolysis mechanisms occurred in the presence of MPs and DFMs. In addition, 10 possible HPs and 7 possible mechanisms were identified and proposed. ECOSAR prediction and ecotoxicity testing showed that the acute toxicity of propyrisulfuron and its HPs to aquatic organisms was low, but that they may have high chronic toxicity and pose a potential threat to aquatic ecosystems. These investigations are important for elucidating the environmental fate and behavior of propyrisulfuron, assessing its risks for environmental protection, and providing guidance for its scientific application in agro-ecosystems.
Subjects
Herbicides; Water; Ecosystem; Humic Substances; Hydrolysis; Kinetics; Plastics; Herbicides/toxicity; Microplastics
ABSTRACT
Florpyrauxifen-benzyl is a novel herbicide used to control weeds in paddy fields. To clarify and evaluate its hydrolytic behavior and safety in water environments, its hydrolytic characteristics were investigated under varying temperatures, pH values, initial mass concentrations and water types, as well as the effects of 40 environmental factors such as microplastics (MPs) and disposable face masks (DFMs). Hydrolytic products were identified by UPLC-QTOF-MS/MS, and hydrolytic pathways were proposed. The effects of MPs and DFMs on hydrolytic products and pathways were also investigated. The results showed that hydrolysis of florpyrauxifen-benzyl was a spontaneous, endothermic process driven by base catalysis and an increase in activation entropy, and conformed to first-order kinetics. Temperature had an obvious effect on the hydrolysis rate under alkaline conditions; the hydrolysis reaction conformed to the Arrhenius equation, and activation enthalpy, activation entropy, and Gibbs free energy were negatively correlated with temperature. Most environmental factors promoted hydrolysis of florpyrauxifen-benzyl, especially cetyltrimethyl ammonium bromide (CTAB). The hydrolysis mechanism was an ester hydrolysis reaction with florpyrauxifen as the main product. MPs and DFMs affected the hydrolysis rate but not the hydrolytic mechanisms. The results are crucial for illustrating and assessing the environmental fate and risks of florpyrauxifen-benzyl.
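The first-order kinetics and Arrhenius behavior mentioned above can be written out in a few lines. This is a textbook sketch, assuming the standard forms C(t) = C0 exp(-kt), t½ = ln 2 / k, and k = A exp(-Ea / RT); the function names and every numeric value are hypothetical and are not measurements from the study.

```python
import math

R = 8.314  # gas constant in J mol^-1 K^-1

def first_order_concentration(c0, k, t):
    """Remaining concentration after time t for rate constant k."""
    return c0 * math.exp(-k * t)

def half_life(k):
    """Half-life of a first-order process."""
    return math.log(2) / k

def activation_energy(k1, t1_kelvin, k2, t2_kelvin):
    """Activation energy (J/mol) from rate constants at two temperatures,
    using ln(k2/k1) = (Ea/R) * (1/T1 - 1/T2)."""
    return R * math.log(k2 / k1) / (1.0 / t1_kelvin - 1.0 / t2_kelvin)

# Illustrative numbers only: rate constants in day^-1 at 25 and 45 degrees C.
k_25, k_45 = 0.010, 0.036
print(f"half-life at 25 C: {half_life(k_25):.1f} days")
print(f"fraction left after 30 days: {first_order_concentration(1.0, k_25, 30):.3f}")
print(f"Ea: {activation_energy(k_25, 298.15, k_45, 318.15) / 1000:.1f} kJ/mol")
```

In practice k is estimated by regressing ln C against time at each temperature, and Ea from the slope of ln k against 1/T over more than two temperatures.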
Subjects
Herbicides; Water; Tandem Mass Spectrometry; Kinetics; Plastics; Hydrolysis
ABSTRACT
Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
Subjects
Deep Learning; Models, Molecular; Proteins/chemistry; Algorithms; Humans; Protein Structure, Tertiary/genetics; Proteins/genetics; Sequence Alignment; Sequence Analysis, Protein; Software
ABSTRACT
MOTIVATION: Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. RESULTS: Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. 
However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps. AVAILABILITY AND IMPLEMENTATION: The software package, source code and data of DeepDist2 are freely available at https://github.com/multicom-toolbox/deepdist and https://zenodo.org/record/4712084#.YIIM13VKhQM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
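The "precision of top L/5 long-range contact predictions" metric can be sketched as follows. The 8 Angstrom contact threshold comes from the abstract; the minimum sequence separation of 24 residues for "long-range" is the usual CASP convention and is an assumption here, as are the function name `top_long_range_precision` and the toy matrices.

```python
def top_long_range_precision(pred_prob, true_dist, k=5, min_sep=24, threshold=8.0):
    """Precision of the top L/k predicted contacts with |i - j| >= min_sep.

    pred_prob: L x L matrix of predicted contact probabilities
    true_dist: L x L matrix of true inter-residue distances (Angstrom)
    """
    length = len(true_dist)
    pairs = [
        (pred_prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Rank residue pairs from most to least confident.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if true_dist[i][j] < threshold)
    return hits / len(top)

# Toy 10-residue example; the chain is too short for a separation of 24,
# so min_sep is lowered to 2 purely for illustration.
L = 10
dist = [[20.0] * L for _ in range(L)]
prob = [[0.0] * L for _ in range(L)]
for i, j in [(0, 5), (1, 7)]:
    dist[i][j] = dist[j][i] = 5.0          # true contacts
for (i, j), p in {(0, 5): 0.9, (2, 8): 0.8, (1, 7): 0.7}.items():
    prob[i][j] = prob[j][i] = p            # predicted probabilities
print(top_long_range_precision(prob, dist, k=5, min_sep=2))
```

Averaging the probabilities of the selected top pairs gives the kind of confidence estimate that the abstract reports as correlating with prediction accuracy.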
ABSTRACT
BACKGROUND: Driven by deep learning, inter-residue contact/distance prediction has improved significantly and has substantially enhanced ab initio protein structure prediction. Currently, most distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. RESULTS: To explore the potential of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist's real-value distance prediction is 0.896 Å² when filtering out predicted distances ≥ 16 Å, which is lower than the 1.003 Å² of DeepDist's multi-class distance prediction. When distance predictions are converted into contact predictions at the 8 Å threshold (the standard threshold in the field), the precision of top L/5 and L/2 contact predictions of DeepDist's multi-class distance prediction is 79.3% and 66.1%, respectively, higher than the 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. CONCLUSIONS: DeepDist can predict inter-residue distances well and improve binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be directly used to reconstruct protein tertiary structures better than multi-class distance predictions due to the lower MSE.
Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone.
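The evaluation steps described above (MSE restricted to predicted distances below 16 Å, conversion of distances to contacts at 8 Å, and conversion of multi-class output to a real value) can be sketched as below. The expected-value conversion over bin midpoints is one common choice and is an assumption here, not necessarily what DeepDist does; all function names and numbers are hypothetical.

```python
def filtered_mse(pred, true, cutoff=16.0):
    """Mean squared error over pairs whose predicted distance is < cutoff."""
    errors = [(p - t) ** 2 for p, t in zip(pred, true) if p < cutoff]
    return sum(errors) / len(errors) if errors else float("nan")

def to_contacts(distances, threshold=8.0):
    """Binary contacts: True when the distance is below the threshold."""
    return [d < threshold for d in distances]

def expected_distance(bin_probs, bin_midpoints):
    """Real-value distance as the probability-weighted mean of bin midpoints
    (an assumed conversion, shown for illustration)."""
    total = sum(bin_probs)
    return sum(p * m for p, m in zip(bin_probs, bin_midpoints)) / total

# Illustrative values in Angstrom for three residue pairs.
predicted = [5.0, 10.0, 20.0]
observed = [6.0, 12.0, 10.0]
print(filtered_mse(predicted, observed))      # third pair is excluded
print(to_contacts(predicted))
print(expected_distance([0.5, 0.5], [4.0, 6.0]))
```

Filtering on the predicted rather than the true distance mirrors the wording of the abstract and keeps the metric computable without knowing the native structure at selection time.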
Subjects
Algorithms; Computational Biology; Proteins; Models, Molecular; Proteins/chemistry
ABSTRACT
Deep learning has emerged as a revolutionary technology for protein residue-residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of the deep learning-based contact predictions have been achieved since then. However, little effort has been put into interpreting the black-box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention-based convolutional neural network for protein contact prediction, which consists of two attention mechanism-based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free-modeling targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to prediction improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold-determining residues in proteins. We expect the attention-based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction. The source code of our method is available at https://github.com/jianlin-cheng/InterpretContactMap.
Subjects
Computational Biology/statistics & numerical data, Deep Learning, Proteins/chemistry, Software, Benchmarking, Binding Sites, Protein Databases, Humans, Protein Binding, Protein Conformation, Protein Interaction Domains and Motifs, Proteins/metabolism, Research Design, Sequence Alignment, Protein Sequence Analysis
ABSTRACT
MOTIVATION: Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated. RESULTS: We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced the contact prediction accuracy. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. We demonstrated that the CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns composed of multiple correlated contacts for more accurate contact prediction than the local coevolution-based methods, leading to a substantial increase in precision of 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. The comparison of the three methods showed that deeper sequence alignments and the integration of domain-based contact prediction with full-length contact prediction improved the performance of contact prediction. Moreover, we demonstrated that domain-based contact prediction based on a novel ab initio approach of parsing domains from MSAs alone, without using known protein structures, was a simple, fast approach to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances in multiple distance intervals could capture more structural information and improve binary contact prediction. AVAILABILITY AND IMPLEMENTATION: https://github.com/multicom-toolbox/DNCON2/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
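The link between a multi-interval distance distribution and a binary contact prediction can be sketched in a few lines of Python. This is a minimal illustration, not the MULTICOM or DNCON2 code: the function name is hypothetical, the bin edges are left to the caller because the abstract does not specify them, and the 8 Å default is the conventional contact threshold. The sketch sums the probability mass of every distance bin that lies entirely at or below the threshold.

```python
from typing import Sequence


def contact_probability(
    bin_upper_edges: Sequence[float],  # upper edge of each distance bin, in angstroms
    probs: Sequence[float],            # predicted probability of each bin for one residue pair
    threshold: float = 8.0,            # contact threshold in angstroms
) -> float:
    """Collapse a per-pair distance distribution into one contact probability.

    Assumption: a bin counts toward "contact" only if its upper edge does not
    exceed the threshold, so bins straddling the threshold are excluded.
    """
    if len(bin_upper_edges) != len(probs):
        raise ValueError("each bin needs exactly one probability")
    return sum(p for edge, p in zip(bin_upper_edges, probs) if edge <= threshold)
```

Applied to every residue pair, this turns an L x L x (number of bins) distance distribution into an L x L contact map, which is one way the richer multi-interval output can still be scored as a binary contact prediction.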
Subjects
Computational Biology, Deep Learning, Algorithms, Proteins, Sequence Alignment
ABSTRACT
Predicting residue-residue distance relationships (e.g., contacts) has become the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free-modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated for the first time to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, challenges remain in accurately predicting protein contact distances when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.