Search | BVS Research Portal

Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction.

Vorberg, Susann; Seemayer, Stefan; Söding, Johannes.

PLoS Comput Biol; 14(11): e1006526, 2018 Nov.

Article in English | MEDLINE | ID: mdl-30395601

ABSTRACT

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.
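The abstract names two corrections applied to a matrix of coupling scores: the average product correction (APC) and the entropy bias correction (EntC), which subtracts an expected score proportional to the product of the square roots of the column entropies. The pure-Python sketch below illustrates both under stated assumptions: scores are a symmetric L×L list of lists, entropies use the natural log with gaps counted as an ordinary symbol, and the EntC scaling factor alpha is fitted by least squares over off-diagonal pairs (an assumption; the paper's exact estimator may differ). The function names are hypothetical and are not the CCMpredPy API.

```python
import math
from collections import Counter


def column_entropies(msa):
    """Shannon entropy (natural log) of each column of an MSA given as equal-length strings."""
    n_seq = len(msa)
    entropies = []
    for i in range(len(msa[0])):
        counts = Counter(seq[i] for seq in msa)
        entropies.append(-sum((c / n_seq) * math.log(c / n_seq) for c in counts.values()))
    return entropies


def apc_correct(scores):
    """Subtract the average product correction S_i. * S_.j / S_.. (off-diagonal means)."""
    L = len(scores)
    row_mean = [sum(scores[i][j] for j in range(L) if j != i) / (L - 1) for i in range(L)]
    total_mean = sum(row_mean) / L  # equals the mean over all off-diagonal entries
    return [[0.0 if i == j else scores[i][j] - row_mean[i] * row_mean[j] / total_mean
             for j in range(L)] for i in range(L)]


def entropy_correct(scores, entropies):
    """Subtract alpha * sqrt(H_i) * sqrt(H_j); alpha is a least-squares fit (assumption)."""
    L = len(scores)
    root_h = [math.sqrt(h) for h in entropies]
    pairs = [(i, j) for i in range(L) for j in range(L) if i != j]
    num = sum(scores[i][j] * root_h[i] * root_h[j] for i, j in pairs)
    den = sum((root_h[i] * root_h[j]) ** 2 for i, j in pairs)
    alpha = num / den if den > 0 else 0.0
    return [[0.0 if i == j else scores[i][j] - alpha * root_h[i] * root_h[j]
             for j in range(L)] for i in range(L)]
```

If the raw scores were exactly proportional to sqrt(H_i) * sqrt(H_j), as the abstract states for the no-coupling expectation, `entropy_correct` would reduce every off-diagonal entry to zero; APC removes a similar row-times-column pattern without using entropies explicitly.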

Subjects

Proteins/chemistry; Sequence Alignment; Algorithms; Amino Acid Sequence; Binding Sites; Entropy; Noise; Sequence Homology, Amino Acid

LocTree3 prediction of localization.

Goldberg, Tatyana; Hecht, Maximilian; Hamp, Tobias; Karl, Timothy; Yachdav, Guy; Ahmed, Nadeem; Altermann, Uwe; Angerer, Philipp; Ansorge, Sonja; Balasz, Kinga; Bernhofer, Michael; Betz, Alexander; Cizmadija, Laura; Do, Kieu Trinh; Gerke, Julia; Greil, Robert; Joerdens, Vadim; Hastreiter, Maximilian; Hembach, Katharina; Herzog, Max; Kalemanov, Maria; Kluge, Michael; Meier, Alice; Nasir, Hassan; Neumaier, Ulrich; Prade, Verena; Reeb, Jonas; Sorokoumov, Aleksandr; Troshani, Ilira; Vorberg, Susann; Waldraff, Sonja; Zierer, Jonas; Nielsen, Henrik; Rost, Burkhard.

Nucleic Acids Res; 42(Web Server issue): W350-5, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24848019

ABSTRACT

The prediction of protein sub-cellular localization is an important step toward elucidating protein function. For each query protein sequence, LocTree2 applies machine learning (profile kernel SVM) to predict the native sub-cellular localization in 18 classes for eukaryotes, in six for bacteria and in three for archaea. The method outputs a score that reflects the reliability of each prediction. LocTree2 has performed on par with or better than any other state-of-the-art method. Here, we report the availability of LocTree3 as a public web server. The server includes the machine learning-based LocTree2 and improves over it through the addition of homology-based inference. Assessed on sequence-unique data, LocTree3 reached an 18-state accuracy Q18=80±3% for eukaryotes and a six-state accuracy Q6=89±4% for bacteria. The server accepts submissions ranging from single protein sequences to entire proteomes. Response time of the unloaded server is about 90 s for a 300-residue eukaryotic protein and a few hours for an entire eukaryotic proteome not considering the generation of the alignments. For over 1000 entirely sequenced organisms, the predictions are directly available as downloads. The web server is available at http://www.rostlab.org/services/loctree3.
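The n-state accuracies quoted above (Q18 for eukaryotes, Q6 for bacteria) are, in the usual sense, the percentage of proteins whose predicted localization class matches the observed class. The short sketch below computes that percentage; the function name and the class labels are hypothetical, and it does not reproduce how the reported error margins (±3%, ±4%) were estimated.

```python
def q_accuracy(predicted, observed):
    """Percentage of proteins whose predicted localization class matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    # Count exact class matches; the number of classes n only matters through the labels used.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, three correct predictions out of four proteins give a value of 75.0, regardless of whether the labels come from an 18-class or a six-class scheme.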

Subjects

Proteins/analysis; Software; Archaeal Proteins/analysis; Artificial Intelligence; Bacterial Proteins/analysis; Internet; Sequence Homology, Amino Acid

Modeling the Biodegradability of Chemical Compounds Using the Online CHEmical Modeling Environment (OCHEM).

Vorberg, Susann; Tetko, Igor V.

Mol Inform; 33(1): 73-85, 2014 Jan.

Article in English | MEDLINE | ID: mdl-27485201

ABSTRACT

Biodegradability describes the capacity of substances to be mineralized by free-living bacteria. It is a crucial property in estimating a compound's long-term impact on the environment. The ability to reliably predict biodegradability would reduce the need for laborious experimental testing. However, this endpoint is difficult to model due to unavailability or inconsistency of experimental data. Our approach makes use of the Online Chemical Modeling Environment (OCHEM) and its rich supply of machine learning methods and descriptor sets to build classification models for ready biodegradability. These models were analyzed to determine the relationship between characteristic structural properties and biodegradation activity. The distinguishing feature of the developed models is their ability to estimate the accuracy of prediction for each individual compound. The models developed using seven individual descriptor sets were combined in a consensus model, which provided the highest accuracy. The identified overrepresented structural fragments can be used by chemists to improve the biodegradability of new chemical compounds. The consensus model, the datasets used, and the calculated structural fragments are publicly available at http://ochem.eu/article/31660.
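The abstract states that models built on seven descriptor sets were combined into a consensus model but does not say how. A common choice is to average the per-model probabilities, and the sketch below assumes exactly that: a 0.5 decision threshold and the standard deviation across models as a rough per-compound reliability signal. These are assumptions for illustration only; OCHEM's own consensus and accuracy-estimation methods may differ, and the function name and the labels "RB" (readily biodegradable) and "NRB" are hypothetical.

```python
from statistics import mean, pstdev


def consensus_predict(model_probs, threshold=0.5):
    """Average per-model probabilities that a compound is readily biodegradable.

    Returns (label, mean probability, spread across models). A larger spread means
    the individual models disagree, which is used here as a simple reliability hint.
    """
    avg = mean(model_probs)
    spread = pstdev(model_probs)
    label = "RB" if avg >= threshold else "NRB"
    return label, avg, spread
```

Averaging keeps every individual model's vote, so a compound on which all seven models agree gets a spread of zero, while disagreement shows up directly in the third return value.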
