1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36458437

ABSTRACT

One of the key features of intrinsically disordered regions (IDRs) is their facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo a disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
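
The combination step described in this abstract can be illustrated with a minimal sketch. It assumes three per-residue scores already scaled to the range 0 to 1; the function name, the weights and the bias are hypothetical placeholders chosen for illustration, not CLIP's trained parameters.

```python
import math


def lip_propensity(coevolution, physchem, disorder,
                   weights=(1.5, 0.8, 1.0), bias=-1.6):
    """Combine three per-residue scores into one probability-like value.

    The weights and the bias are illustrative assumptions, not values
    taken from CLIP.
    """
    z = (bias
         + weights[0] * coevolution
         + weights[1] * physchem
         + weights[2] * disorder)
    # The logistic (sigmoid) link maps the linear score into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Score three residues from three hypothetical input tracks.
coevolution_track = [0.1, 0.7, 0.9]
physchem_track = [0.2, 0.5, 0.8]
disorder_track = [0.3, 0.8, 0.9]
scores = [lip_propensity(c, p, d)
          for c, p, d in zip(coevolution_track, physchem_track, disorder_track)]
```

In a real predictor the weights would be fitted on annotated training proteins; here they only show how three input types collapse into a single per-residue score.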


Subjects
Intrinsically Disordered Proteins; Intrinsically Disordered Proteins/chemistry; Computational Biology/methods; Amino Acid Sequence; Peptides/metabolism; Protein Domains; Databases, Protein; Protein Binding
2.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38341662

ABSTRACT

MOTIVATION: RNA threading aims to identify remote homologies for template-based modeling of RNA 3D structure. Existing RNA alignment methods primarily rely on secondary structure alignment. They are often time- and memory-consuming, limiting large-scale applications. In addition, their accuracy is far from satisfactory. RESULTS: Using RNA secondary structure and sequence profile, we developed a novel RNA threading algorithm, named RNAthreader. To enhance the alignment process and minimize memory usage, a novel approach has been introduced to simplify RNA secondary structures into compact diagrams. RNAthreader employs a two-step methodology. Initially, integer programming and dynamic programming are combined to create an initial alignment for the simplified diagram. Subsequently, the final alignment is obtained using dynamic programming, taking into account the initial alignment derived from the previous step. The benchmark test on 80 RNAs illustrates that RNAthreader generates more accurate alignments than other methods, especially for RNAs with pseudoknots. Another benchmark, involving 30 RNAs from the RNA-Puzzles experiments, shows that the models constructed using RNAthreader templates have a lower average RMSD than those created by alternative methods. Remarkably, RNAthreader takes less than two hours to complete alignments with ∼5000 RNAs, which is 3-40 times faster than other methods. These compelling results suggest that RNAthreader is a promising algorithm for RNA template detection. AVAILABILITY AND IMPLEMENTATION: https://yanglab.qd.sdu.edu.cn/RNAthreader.


Subjects
RNA; Software; RNA/chemistry; Sequence Alignment; Algorithms; Protein Structure, Secondary
3.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36734597

ABSTRACT

MOTIVATION: It is fundamental to cut multi-domain proteins into individual domains, for precise domain-based structural and functional studies. In the past, sequence-based and structure-based domain parsing was carried out independently with different methodologies. The recent progress in deep learning-based protein structure prediction provides the opportunity to unify sequence-based and structure-based domain parsing. RESULTS: Based on the inter-residue distance matrix, which can be either derived from the input structure or predicted by trRosettaX, we can decode the domain boundaries under a unified framework. We name the proposed method UniDoc. The principle of UniDoc is based on the well-accepted physical concept of maximizing intra-domain interaction while minimizing inter-domain interaction. Comprehensive tests on five benchmark datasets indicate that UniDoc outperforms other state-of-the-art methods in terms of both accuracy and speed, for both sequence-based and structure-based domain parsing. The major contribution of UniDoc is providing a unified framework for structure-based and sequence-based domain parsing. We hope that UniDoc would be a convenient tool for protein domain analysis. AVAILABILITY AND IMPLEMENTATION: https://yanglab.nankai.edu.cn/UniDoc/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
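
The principle stated in this abstract (few interactions between domains, many within them) can be sketched on a toy distance matrix. This is not the UniDoc algorithm: the function name, the 8 Å contact cutoff, the restriction to a single boundary and the density-based score are all assumptions made for illustration.

```python
def best_two_domain_split(dist, cutoff=8.0, min_len=2):
    """Return (boundary, inter_domain_contact_density) for the best split.

    dist is a square matrix (list of lists) of inter-residue distances.
    Residues [0, boundary) form one domain and [boundary, n) the other.
    The cutoff and the scoring rule are illustrative assumptions.
    """
    n = len(dist)
    best = None
    for b in range(min_len, n - min_len + 1):
        # Count contacts that cross the candidate boundary.
        inter = sum(1 for i in range(b) for j in range(b, n)
                    if dist[i][j] < cutoff)
        # Normalize by the number of possible cross-boundary pairs so that
        # trivially small domains are not favored.
        density = inter / (b * (n - b))
        if best is None or density < best[1]:
            best = (b, density)
    return best


# Toy matrix: residues 0-2 are mutually close, residues 3-5 are mutually
# close, and the two groups are far apart.
toy = [[0.0 if i == j else (5.0 if (i < 3) == (j < 3) else 20.0)
        for j in range(6)] for i in range(6)]
boundary, density = best_two_domain_split(toy)
```

Minimizing cross-boundary contact density is one simple way to express the idea; a practical parser would also handle discontinuous domains and more than two domains.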


Subjects
Algorithms; Computational Biology; Protein Domains; Computational Biology/methods; Proteins/chemistry
4.
New Phytol ; 242(4): 1798-1813, 2024 May.
Article in English | MEDLINE | ID: mdl-38155454

ABSTRACT

It is well understood that agricultural management influences arbuscular mycorrhizal (AM) fungi, but there is controversy about whether farmers should manage for AM symbiosis. We assessed AM fungal communities colonizing wheat roots for three consecutive years in a long-term (> 14 yr) tillage and fertilization experiment. Relationships among mycorrhizas, crop performance, and soil ecosystem functions were quantified. Tillage, fertilizers and continuous monoculture all reduced AM fungal richness and shifted community composition toward dominance of a few ruderal taxa. Rhizophagus and Dominikia were depressed by tillage and/or fertilization, and their abundances as well as AM fungal richness correlated positively with soil aggregate stability and nutrient cycling functions across all or no-tilled samples. In the field, wheat yield was unrelated to AM fungal abundance and correlated negatively with AM fungal richness. In a complementary glasshouse study, wheat biomass was enhanced by soil inoculum from unfertilized, no-till plots while neutral to depressed growth was observed in wheat inoculated with soils from fertilized and conventionally tilled plots. This study demonstrates contrasting impacts of low-input and conventional agricultural practices on AM symbiosis and highlights the importance of considering both crop yield and soil ecosystem functions when managing mycorrhizas for more sustainable agroecosystems.


Subjects
Crops, Agricultural; Ecosystem; Fertilizers; Mycorrhizae; Soil Microbiology; Soil; Triticum; Mycorrhizae/physiology; Soil/chemistry; Triticum/microbiology; Triticum/growth & development; Triticum/physiology; Crops, Agricultural/microbiology; Crops, Agricultural/growth & development; Agriculture/methods; Biomass; Plant Roots/microbiology; Time Factors; Biodiversity
5.
Proteins ; 91(12): 1704-1711, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37565699

ABSTRACT

We present the monomer and multimer structure prediction results of our methods in CASP15. We first designed an elaborate pipeline that leverages complementary sequence databases and advanced database searching algorithms to generate high-quality multiple sequence alignments (MSAs). Top MSAs were then selected for the subsequent step of structure prediction. We utilized trRosettaX2 and AlphaFold2 for monomer structure prediction (group name Yang-Server), and AlphaFold-Multimer for multimer structure prediction (group name Yang-Multimer). Yang-Server and Yang-Multimer ranked first and fourth for monomer and multimer structure prediction, respectively. For 94 monomers, the average TM-score of the structure models predicted by Yang-Server is 0.876, compared to 0.798 for the default AlphaFold2 (i.e., the group NBIS-AF2-standard). For 42 multimers, the average DockQ score of the structure models predicted by Yang-Multimer is 0.464, compared to 0.389 for the default AlphaFold-Multimer (i.e., the group NBIS-AF2-multimer). Detailed analysis of the results shows that several factors contribute to the improvement, including improved MSAs, iterated modeling for large targets, and the interplay between monomer and multimer structure prediction for intertwined structures. However, structure prediction for orphan proteins and multimers remains challenging, and breakthroughs in this area are anticipated in the future.


Subjects
Algorithms; Furylfuramide; Sequence Alignment; Databases, Nucleic Acid
6.
Bioinformatics ; 38(4): 962-969, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34791040

ABSTRACT

MOTIVATION: Significant progress has been achieved in distance-based protein folding, due to improved prediction of inter-residue distance by deep learning. Many efforts are thus made to improve distance prediction in recent years. However, it remains unknown what is the best way of objectively assessing the accuracy of predicted distance. RESULTS: A total of 19 metrics were proposed to measure the accuracy of predicted distance. These metrics were discussed and compared quantitatively on three benchmark datasets, with distance and structure models predicted by the trRosetta pipeline. The experiments show that a few metrics, such as distance precision, have a high correlation with the model accuracy measure TM-score (Pearson's correlation coefficient >0.7). In addition, the metrics are applied to rank the distance prediction groups in CASP14. The ranking by our metrics coincides largely with the official version. These data suggest that the proposed metrics are effective for measuring distance prediction. We anticipate that this study paves the way for objectively monitoring the progress of inter-residue distance prediction. A web server and a standalone package are provided to implement the proposed metrics. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/APD. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
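
The abstract reports Pearson's correlation coefficient between distance-accuracy metrics and TM-score. A minimal sketch of that computation follows; the sample values are invented for illustration and are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-target values: one distance metric versus TM-score.
metric_values = [0.2, 0.4, 0.6, 0.8]
tm_scores = [0.3, 0.5, 0.6, 0.9]
r = pearson(metric_values, tm_scores)
```

A metric whose per-target values track TM-score closely yields a coefficient near 1, which is the sense in which the abstract calls a correlation above 0.7 high.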


Subjects
Algorithms; Proteins; Proteins/chemistry; Computational Biology; Protein Folding
7.
Proc Natl Acad Sci U S A ; 117(3): 1496-1503, 2020 01 21.
Article in English | MEDLINE | ID: mdl-31896580

ABSTRACT

The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.


Subjects
Protein Conformation; Sequence Analysis, Protein/methods; Software; Animals; Deep Learning; Humans
8.
Mycorrhiza ; 33(5-6): 359-368, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37821597

ABSTRACT

Strong effects of plant identity, soil nutrient availability or mycorrhizal fungi on root traits have been well documented, but their interactive influences on root traits are still poorly understood. Here, three crop species (maize, wheat and soybean) were grown under four phosphorus (P) addition levels (0, 20, 40 and 60 mg P kg⁻¹ dry soil), and plants were inoculated with or without five combined arbuscular mycorrhizal fungal (AMF) species. Plant biomass, nutrient contents, root traits (including total root length, average root diameter, specific root length and root tissue density) and plants' mycorrhizal responses were measured. Crop species, P level, AMF, and their interactions strongly affected plant biomass and root traits. P fertilization promoted plant growth but reduced mycorrhizal benefits on plant biomass and nutrient uptake. Root traits of maize were sensitive to P addition only under the non-mycorrhizal condition, whilst most root traits of soybean and wheat plants were responsive to mycorrhizal inoculation but not P addition. Mycorrhizal colonization reduced the root plasticity in response to P fertility for maize but not for wheat or soybean. This study highlights the importance of soil nutrient fertility and mycorrhizal symbiosis in influencing root traits.


Subjects
Mycorrhizae; Mycorrhizae/physiology; Soil; Glycine max; Triticum; Zea mays; Phosphorus; Plant Roots/microbiology
9.
BMC Plant Biol ; 22(1): 60, 2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35114932

ABSTRACT

BACKGROUND: The impacts of increasing nitrogen (N) deposition and overgrazing on terrestrial ecosystems remain topics of intense interest. Grazing exclusion, aimed at restoring grassland ecosystem function and services, has been extensively applied and is considered a rapid and effective vegetation restoration method. However, the combined effects of exclosure and N deposition on plant and community characteristics have rarely been studied. Here, a 4-year field experiment of N addition and exclusion treatment was conducted in the desert steppe dominated by Alhagi sparsifolia and Lycium ruthenicum in northwest China, and the responses of soil characteristics, plant nutrition and plant community to the treatments were analyzed. RESULTS: Grazing exclusion significantly increased total N concentration in the surface soil (0-20 cm), and increased plant height, coverage (P < 0.05) and aboveground biomass. Specifically, A. sparsifolia recovered faster than L. ruthenicum at both the individual and community levels after exclusion. There was no difference in response to N addition gradients between the two plants. CONCLUSIONS: Our findings suggest that exclusion, rather than N addition, has the greater impact on soil properties and plant community in desert steppe. Based on short-term experimental treatments, the present N deposition level has no effect on the plant community of desert steppe.


Subjects
Biodiversity; Ecosystem; Grassland; Herbivory; Nitrogen/metabolism; Plant Physiological Phenomena/drug effects; Soil Microbiology; China; Desert Climate
10.
Bioinformatics ; 37(1): 36-42, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33416863

ABSTRACT

MOTIVATION: RNA molecules have become attractive small-molecule drug targets for treating disease in recent years. Computer-aided drug design can be facilitated by detecting the RNA sites that bind small molecules. However, very limited progress has been reported for the prediction of small molecule-RNA binding sites. RESULTS: We developed a novel method, RNAsite, to predict small molecule-RNA binding sites using sequence profile- and structure-based descriptors. RNAsite was shown to be competitive with state-of-the-art methods on the experimental structures of two independent test sets. When predicted structure models were used, RNAsite outperformed other methods by a large margin. The possibility of improving RNAsite by geometry-based binding pocket detection was investigated. The influence of RNA structural flexibility and of the conformational changes caused by ligand binding on RNAsite was also discussed. RNAsite is anticipated to be a useful tool for the design of RNA-targeting small-molecule drugs. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/RNAsite. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Bioinformatics ; 37(8): 1093-1098, 2021 05 23.
Article in English | MEDLINE | ID: mdl-33135062

ABSTRACT

MOTIVATION: Recent years have witnessed that the inter-residue contact/distance in proteins could be accurately predicted by deep neural networks, which significantly improve the accuracy of predicted protein structure models. In contrast, fewer studies have been done for the prediction of RNA inter-nucleotide 3D closeness. RESULTS: We proposed a new algorithm named RNAcontact for the prediction of RNA inter-nucleotide 3D closeness. RNAcontact was built based on the deep residual neural networks. The covariance information from multiple sequence alignments and the predicted secondary structure were used as the input features of the networks. Experiments show that RNAcontact achieves the respective precisions of 0.8 and 0.6 for the top L/10 and L (where L is the length of an RNA) predictions on an independent test set, significantly higher than other evolutionary coupling methods. Analysis shows that about 1/3 of the correctly predicted 3D closenesses are not base pairings of secondary structure, which are critical to the determination of RNA structure. In addition, we demonstrated that the predicted 3D closeness could be used as distance restraints to guide RNA structure folding by the 3dRNA package. More accurate models could be built by using the predicted 3D closeness than the models without using 3D closeness. AVAILABILITY AND IMPLEMENTATION: The webserver and a standalone package are available at: http://yanglab.nankai.edu.cn/RNAcontact/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
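
The precision of the top L/10 and top L predictions quoted in this abstract follows a common evaluation pattern: rank predicted pairs by score, keep the best L/k, and count how many are true. The sketch below assumes that reading; the function name, data layout and example values are hypothetical and do not come from RNAcontact.

```python
def top_precision(scores, true_pairs, length, divisor=1):
    """Fraction of the top length // divisor predicted pairs that are true.

    scores maps (i, j) index pairs to predicted scores; true_pairs is a set
    of (i, j) pairs that are close in the reference structure.
    """
    k = max(1, length // divisor)
    # Highest-scoring pairs first, truncated to the top k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_pairs)
    return hits / len(ranked)


# Hypothetical predictions for an RNA of length 20.
predicted = {(0, 10): 0.9, (1, 15): 0.8, (2, 7): 0.1}
reference = {(0, 10), (2, 7)}
precision_top_l10 = top_precision(predicted, reference, length=20, divisor=10)
```

With length 20 and divisor 10 only the two highest-scoring pairs are evaluated, so one correct pair out of two gives a precision of 0.5.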


Subjects
Computational Biology; RNA; Algorithms; Neural Networks, Computer; Nucleotides; Sequence Alignment
12.
Bioinformatics ; 37(21): 3752-3759, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34473228

ABSTRACT

MOTIVATION: Protein model quality assessment (QA) is an essential component in protein structure prediction, which aims to estimate the quality of a structure model and/or select the most accurate model from a pool of structure models, without knowing the native structure. QA remains a challenging task in protein structure prediction. RESULTS: Based on the inter-residue distance predicted by the recent deep learning-based structure prediction algorithm trRosetta, we developed QDistance, a new approach to the estimation of both global and local qualities. QDistance works for both single- and multi-model inputs. We designed several distance-based features to assess the agreement between the predicted and model-derived inter-residue distances. Together with a few widely used features, they are fed into a simple yet powerful linear regression model to infer the global QA scores. The local QA scores for each structure model are predicted based on a comparative analysis with a set of selected reference models. For multi-model input, the reference models are selected from the input based on the predicted global QA scores. For single-model input, the reference models are predicted by trRosetta. With the informative distance-based features, QDistance can predict the global quality with satisfactory accuracy. Benchmark tests on the CASP13 and the CAMEO structure models suggested that QDistance was competitive with other methods. Blind tests in the CASP14 experiments showed that QDistance was robust and ranked among the top predictors. In particular, QDistance ranked among the top 3 local QA methods and made the most accurate local QA predictions for unreliable local regions. Analysis showed that this superior performance can be attributed to the inclusion of the predicted inter-residue distance. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/QDistance. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology; Proteins; Computational Biology/methods; Proteins/chemistry; Algorithms
13.
Bioinformatics ; 36(Suppl_2): i754-i761, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381830

ABSTRACT

MOTIVATION: Disordered flexible linkers (DFLs) are abundant and functionally important intrinsically disordered regions that connect protein domains and structural elements within domains and which facilitate disorder-based allosteric regulation. Although computational estimates suggest that thousands of proteins have DFLs, they were annotated experimentally in <200 proteins. This substantial annotation gap can be reduced with the help of accurate computational predictors. The sole predictor of DFLs, DFLpred, trades off accuracy for shorter runtime by excluding relevant but computationally costly predictive inputs. Moreover, it relies on local/window-based information and fails to consider useful protein-level characteristics. RESULTS: We conceptualize, design and test APOD (Accurate Predictor Of DFLs), the first highly accurate predictor that utilizes both local- and protein-level inputs that quantify propensity for disorder, sequence composition, sequence conservation and selected putative structural properties. Consequently, APOD offers significantly more accurate predictions when compared with its faster predecessor, DFLpred, and several other alternative ways to predict DFLs. These improvements stem from the use of a more comprehensive set of inputs that cover the protein-level information and the application of a more sophisticated predictive model, a well-parametrized support vector machine. APOD achieves area under the curve = 0.82 (28% improvement over DFLpred) and Matthews correlation coefficient = 0.42 (180% increase over DFLpred) when tested on an independent/low-similarity test dataset. Consequently, APOD is a suitable choice for accurate and small-scale prediction of DFLs. AVAILABILITY AND IMPLEMENTATION: https://yanglab.nankai.edu.cn/APOD/.
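
The Matthews correlation coefficient reported in this abstract is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the function name and the convention of returning 0.0 when the denominator vanishes are choices made here, not details from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    adopted here as an assumption.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Perfect, chance-level and fully inverted predictions on 100 residues.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
```

Unlike raw accuracy, the coefficient stays near zero for uninformative predictions even when one class dominates, which suits residue-level annotations where the positive class is rare.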


Subjects
Computational Biology; Intrinsically Disordered Proteins; Databases, Protein; Protein Domains; Proteins/genetics; Support Vector Machine
14.
Bioinformatics ; 36(1): 41-48, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31173061

ABSTRACT

MOTIVATION: Almost all protein residue contact prediction methods rely on the availability of deep multiple sequence alignments (MSAs). However, many proteins from poorly populated families do not have a sufficient number of homologs in the conventional UniProt database. Here we aim to solve this issue by exploring the rich sequence data from metagenome sequencing projects. RESULTS: Based on the improved MSA constructed from the metagenome sequence data, we developed MapPred, a new deep learning-based contact prediction method. MapPred consists of two component methods, DeepMSA and DeepMeta, both trained with residual neural networks. DeepMSA was inspired by the recent method DeepCov, which was trained on 441 matrices of covariance features. By considering the symmetry of the contact map, we reduced the number of matrices to 231, which makes the training more efficient in DeepMSA. Experiments show that DeepMSA outperforms DeepCov by 10-13% in precision. DeepMeta works by combining predicted contacts and other sequence profile features. Experiments on three benchmark datasets suggest that the contribution from the metagenome sequence data is significant, with P-values less than 4.04E-17. MapPred is shown to be complementary and comparable to the state-of-the-art methods. The success of MapPred is attributed to three factors: the deeper MSA from the metagenome sequence data, improved feature design in DeepMSA and optimized training by the residual neural networks. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/mappred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
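
The reduction from 441 to 231 matrices mentioned in this abstract matches a simple counting argument. The sketch below assumes the 441 channels correspond to all ordered pairs over a 21-letter alphabet (20 amino acids plus a gap symbol); the abstract does not state this, so treat the alphabet as an assumption.

```python
from itertools import combinations_with_replacement, product

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

# Ordered pairs (a, b): one covariance channel per pair.
ordered_pairs = list(product(ALPHABET, repeat=2))

# Unordered pairs {a, b}, including a == b: by symmetry, channel (a, b) at
# position (i, j) carries the same information as channel (b, a) at (j, i).
unordered_pairs = list(combinations_with_replacement(ALPHABET, 2))

n = len(ALPHABET)
print(len(ordered_pairs), len(unordered_pairs), n * (n + 1) // 2)
```

Under that assumption the counts are n² = 441 and n(n + 1)/2 = 231, so the symmetric representation drops nearly half of the input channels.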


Subjects
Computational Biology; Metagenome; Neural Networks, Computer; Sequence Analysis, Protein; Algorithms; Computational Biology/methods; Proteins/chemistry; Sequence Alignment; Sequence Analysis, Protein/methods
15.
Bioinformatics ; 36(7): 2119-2125, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31790141

ABSTRACT

MOTIVATION: Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy of protein contact map prediction has opened a new avenue to improve the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to make better use of predicted contacts. RESULTS: We have developed a new contact-assisted threading algorithm named CATHER using both conventional sequential profiles and contact maps predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER made significant improvements over other methods that use only either the sequential profile or the predicted contact map. Our method was ranked in the top 10 among all 39 participating server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that it is promising to push forward the threading algorithms by using predicted contacts. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/CATHER/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology; Sequence Analysis, Protein; Algorithms; Proteins; Software
16.
Cell Mol Life Sci ; 77(1): 149-160, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31175370

ABSTRACT

Protein-coding nucleic acids exhibit composition and codon biases between sequences coding for intrinsically disordered regions (IDRs) and those coding for structured regions. IDRs are regions of proteins that are folding self-insufficient and which function without the prerequisite of folded structure. Several authors have investigated composition bias or codon selection in regions encoding for IDRs, primarily in Eukaryota, and concluded that elevated GC content is the result of the biased amino acid composition of IDRs. We substantively extend previous work by examining GC content in regions encoding IDRs, from 44 species in Eukaryota, Archaea, and Bacteria, spanning a wide range of GC content. We confirm that regions coding for IDRs show a significantly elevated GC content, even across all domains of life. Although this is largely attributable to the amino acid composition bias of IDRs, we show that this bias is independent of the overall GC content and, most importantly, we are the first to observe that GC content bias in IDRs is significantly different than expected from IDR amino acid composition alone. We empirically find compensatory codon selection that reduces the observed GC content bias in IDRs. This selection is dependent on the overall GC content of the organism. The codon selection bias manifests as use of infrequent, AT-rich codons in encoding IDRs. Further, we find these relationships to be independent of the intrinsic disorder prediction method used, and independent of estimated translation efficiency. These observations are consistent with the previous work, and we speculate on whether the observed biases are causal or symptomatic of other driving forces.
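
The quantity compared throughout this abstract, GC content, is the fraction of G and C bases in a nucleotide sequence. A minimal sketch follows; the two example sequences are invented for illustration and are not taken from the study.

```python
def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical coding fragments for a disordered and a structured region.
idr_coding = "GGCAGCCCGGGC"
structured_coding = "ATTAAAGTTTAT"
gc_difference = gc_content(idr_coding) - gc_content(structured_coding)
```

Comparing this value between IDR-coding and structured-region-coding segments of the same genome is the kind of contrast the abstract describes, although the study additionally controls for amino acid composition.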


Subjects
Codon/chemistry; Intrinsically Disordered Proteins/chemistry; Animals; Base Composition; Codon/genetics; Codon Usage; Humans; Intrinsically Disordered Proteins/genetics; Protein Biosynthesis; Protein Conformation
17.
Brief Bioinform ; 19(2): 219-230, 2018 03 01.
Article in English | MEDLINE | ID: mdl-27802931

ABSTRACT

Sequence-based prediction of residue-residue contacts in proteins is becoming increasingly important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a large data set consisting of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, the alignment depth and the distribution of contact pairs in a protein structure. We found that: (1) the machine learning-based methods outperform the direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction. The consensus-based methods, which combine machine learning and direct-coupling methods, perform the best. (2) The target difficulty does not have a clear influence on the machine learning-based methods, while it does affect the direct-coupling and consensus-based methods significantly. (3) The alignment depth has a relatively weak effect on the machine learning-based methods. However, for the direct-coupling-based methods and consensus-based methods, the predicted contacts for targets with deeper alignment tend to be more accurate. (4) All methods perform relatively better on β and α + β proteins than on α proteins. (5) Residues buried in the core of protein structure are more prone to be in contact than residues on the surface (22 versus 6%). We believe these are useful results for guiding future development of new approaches to contact prediction.


Subjects
Algorithms; Protein Interaction Domains and Motifs; Proteins/metabolism; Sequence Analysis, Protein/methods; Computational Biology/methods; Humans; Models, Molecular; Protein Conformation; Protein Folding; Proteins/chemistry
18.
Bioinformatics ; 35(6): 930-936, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30169574

ABSTRACT

MOTIVATION: The interactions between proteins and nucleic acids play a key role in various biological processes. Accurate recognition of the residues that bind nucleic acids can facilitate the study of uncharacterized protein-nucleic acid interactions. The accuracy of existing methods for predicting nucleic acid-binding residues is relatively low. RESULTS: In this work, we introduce NucBind, a novel method for the prediction of nucleic acid-binding residues. NucBind combines the predictions from a support vector machine-based ab-initio method SVMnuc and a template-based method COACH-D. SVMnuc was trained with features from three complementary sequence profiles. COACH-D predicts the binding residues based on homologous templates identified from a nucleic acid-binding library. The proposed methods were assessed and compared with other peer methods on three benchmark datasets. Experimental results show that NucBind consistently outperforms other state-of-the-art methods. Despite the higher accuracy, cross prediction between DNA- and RNA-binding residues was also observed in SVMnuc and NucBind, as in many other ab-initio methods. We attribute the success of NucBind to two factors. The first is the utilization of improved features extracted from three complementary sequence profiles in SVMnuc. The second is the combination of two complementary methods: the ab-initio method SVMnuc and the template-based method COACH-D. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/NucBind. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Sequence Analysis, Protein; Algorithms; Binding Sites; Computational Biology; Consensus; Nucleic Acids
19.
Bioinformatics ; 35(10): 1686-1691, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30321300

ABSTRACT

MOTIVATION: The de novo prediction of RNA tertiary structure remains a grand challenge. Predicted RNA solvent accessibility provides an opportunity to address this challenge. To the best of our knowledge, there is only one method (RNAsnap) available for RNA solvent accessibility prediction. However, its performance is unsatisfactory for protein-free RNAs. RESULTS: We developed RNAsol, a new algorithm to predict RNA solvent accessibility. RNAsol was built based on improved sequence profiles from the covariance models and trained with the long short-term memory (LSTM) neural networks. Independent tests on the same datasets from RNAsnap show that RNAsol achieves the mean Pearson's correlation coefficient (PCC) of 0.43/0.26 for the protein-bound/protein-free RNA molecules, which is 26.5%/136.4% higher than that of RNAsnap. When the training set is enlarged to include both types of RNAs, the PCCs increase to 0.49 and 0.46 for protein-bound and protein-free RNAs, respectively. The success of RNAsol is attributed to two aspects, including the improved sequence profiles constructed by the sequence-profile alignment and the enhanced training by the LSTM neural networks. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/RNAsol/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology; Neural Networks, Computer; Algorithms; Memory, Short-Term; RNA
20.
Nucleic Acids Res ; 46(W1): W438-W442, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29846643

ABSTRACT

The identification of protein-ligand binding sites is critical to protein function annotation and drug discovery. The consensus algorithm COACH developed by us represents one of the most efficient approaches to protein-ligand binding site prediction. One of the most commonly seen issues with the COACH prediction is the low quality of the predicted ligand-binding poses, which usually have severe steric clashes with the protein structure. Here, we present COACH-D, an enhanced version of COACH that utilizes molecular docking to refine the ligand-binding poses. The input to the COACH-D server is the amino acid sequence or the three-dimensional structure of a query protein. In addition, users can submit their own ligand of interest. For each job submission, the COACH algorithm is first used to predict the protein-ligand binding sites. The ligands from the users or the templates are then docked into the predicted binding pockets to build their complex structures. Blind tests show that the algorithm significantly outperforms other ligand-binding site prediction methods. Benchmark tests show that the steric clashes between the ligand and the protein structures in the COACH models are reduced by 85% after molecular docking in COACH-D. The COACH-D server is freely available to all users at http://yanglab.nankai.edu.cn/COACH-D/.


Subjects
Algorithms; Molecular Docking Simulation/methods; Proteins/chemistry; Software; Amino Acid Sequence; Benchmarking; Binding Sites; Computational Biology/methods; Databases, Protein; Humans; Internet; Ligands; Protein Binding; Protein Interaction Domains and Motifs; Protein Structure, Secondary