Results 1 - 20 of 4,701
1.
Methods Mol Biol ; 2231: 3-16, 2021.
Article in English | MEDLINE | ID: mdl-33289883

ABSTRACT

Clustal Omega is a version, completely rewritten and revised in 2011, of the widely used Clustal series of programs for multiple sequence alignment. It can deal with very large numbers (many tens of thousands) of DNA/RNA or protein sequences due to its use of the mBed algorithm for calculating guide trees. This algorithm allows very large alignment problems to be tackled very quickly, even on personal computers. The accuracy of the program has been considerably improved over earlier Clustal programs, through the use of the HHalign method for aligning profile hidden Markov models. The program is currently used from the command line or can be run online.


Subjects
Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Base Sequence; Software
2.
Methods Mol Biol ; 2231: 121-134, 2021.
Article in English | MEDLINE | ID: mdl-33289890

ABSTRACT

Sequence alignment is at the heart of DNA and protein sequence analysis. For the data volumes that are nowadays produced by massively parallel sequencing technologies, however, pairwise and multiple alignment methods are often too slow. Therefore, fast alignment-free approaches to sequence comparison have become popular in recent years. Most of these approaches are based on word frequencies, for words of a fixed length, or on word-matching statistics. Other approaches use the length of maximal word matches. While these methods are very fast, most of them rely on ad hoc measures of sequence similarity or dissimilarity that are hard to interpret. In this chapter, I describe a number of alignment-free methods that we developed in recent years. Our approaches are based on spaced-word matches ("SpaM"), i.e. on inexact word matches that are allowed to contain mismatches at certain pre-defined positions. Unlike most previous alignment-free approaches, our approaches are able to accurately estimate phylogenetic distances between DNA or protein sequences using a stochastic model of molecular evolution.


Subjects
Biostatistics/methods; Genomics/methods; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Software; Algorithms; Evolution, Molecular; Phylogeny; Sequence Alignment
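
Illustration for entry 2: the spaced-word idea in the abstract can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the SpaM software: the binary pattern ("1" = positions that must match, "0" = positions where mismatches are allowed), the function names, and the use of the Jukes-Cantor correction are all choices made here for illustration. It also does not filter out chance matches, which a practical tool would need to do.

```python
from collections import defaultdict
from math import log


def spaced_words(seq, pattern):
    """Yield (start, word); the word keeps only characters at '1' positions."""
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in match_pos)


def spaced_word_matches(seq_a, seq_b, pattern):
    """Return all (i, j) start positions where both sequences share a spaced word."""
    index = defaultdict(list)
    for j, word in spaced_words(seq_b, pattern):
        index[word].append(j)
    return [(i, j)
            for i, word in spaced_words(seq_a, pattern)
            for j in index.get(word, [])]


def jukes_cantor_distance(seq_a, seq_b, pattern):
    """Estimate a DNA distance from mismatches at the '0' positions.

    Assumption for this sketch: every spaced-word match is treated as
    homologous, and the Jukes-Cantor model is used for the correction.
    Returns None when no estimate is possible.
    """
    dont_care = [i for i, c in enumerate(pattern) if c == "0"]
    matches = spaced_word_matches(seq_a, seq_b, pattern)
    total = len(matches) * len(dont_care)
    if total == 0:
        return None
    mismatches = sum(seq_a[i + k] != seq_b[j + k]
                     for i, j in matches for k in dont_care)
    p = mismatches / total
    if p >= 0.75:  # correction is undefined at or beyond saturation
        return None
    return -0.75 * log(1 - 4 * p / 3)
```

For example, `spaced_word_matches("ACGT", "AGGT", "101")` pairs the windows `ACG` and `AGG`, which agree at the two match positions and differ at the middle "don't care" position.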
3.
Methods Mol Biol ; 2231: 163-177, 2021.
Article in English | MEDLINE | ID: mdl-33289893

ABSTRACT

The Database of Aligned Structural Homologs (DASH) is a tool for efficiently navigating the Protein Data Bank (PDB) by means of pre-computed pairwise structural alignments. We recently showed that, by integrating DASH structural alignments with the multiple sequence alignment (MSA) software MAFFT, we were able to significantly improve MSA accuracy without dramatically increasing manual or computational complexity. In the latest DASH update, such queries are not limited to PDB entries but can also be launched from user-provided protein coordinates. Here, we describe a further extension of DASH that retrieves intermolecular interactions of all structurally similar domains in the PDB to a query domain of interest. We illustrate these new features using a model of the NYN domain of the ribonuclease N4BP1 as an example. We show that the protein-nucleotide interactions returned are distributed on the surface of the NYN domain in an asymmetric manner, roughly centered on the known nuclease active site.


Subjects
RNA-Binding Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; Nuclear Proteins/chemistry; Protein Binding; Protein Domains; Ribonucleases/chemistry
4.
Methods Mol Biol ; 2231: 203-224, 2021.
Article in English | MEDLINE | ID: mdl-33289895

ABSTRACT

In this chapter, we introduce core functionality of the Jalview interactive platform for the creation, analysis, and publication of multiple sequence alignments. A workflow is described based on Jalview's core functions: from data import to figure generation, including import of alignment reliability scores from T-Coffee and use of Jalview from the command line. The accompanying notes provide background information on the underlying methods and discuss additional options for working with Jalview to perform multiple sequence alignment, functional site analysis, and publication of alignments on the web.


Subjects
Sequence Alignment/methods; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Software; Phylogeny; Reproducibility of Results; Workflow
5.
Methods Mol Biol ; 2231: 299-317, 2021.
Article in English | MEDLINE | ID: mdl-33289899

ABSTRACT

Multiple sequence alignment is a core first step in many bioinformatics analyses, and errors in these alignments can have negative consequences for scientific studies. In this article, we review some of the recent literature evaluating multiple sequence alignment methods and identify specific challenges that arise when performing these evaluations. In particular, we discuss the different trends observed in simulation studies and when using biological benchmarks. Overall, we find that multiple sequence alignment, far from being a "solved problem," would benefit from new attention.


Subjects
Benchmarking/methods; Computational Biology/methods; Sequence Alignment/methods; Software; Algorithms; Computer Simulation; Databases, Genetic; Phylogeny; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods
6.
Gene ; 766: 145096, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-32919006

ABSTRACT

The phylogenetic analysis based on sequence similarity targeted to real biological taxa is one of the major challenging tasks. In this paper, we propose a novel alignment-free method, CoFASA (Codon Feature based Amino acid Sequence Analyser), for similarity analysis of nucleotide sequences. At first, we assign numerical weights to the four nucleotides. We then calculate a score of each codon based on the numerical value of the constituent nucleotides, termed as degree of codons. Accordingly, we obtain the degree of each amino acid based on the degree of codons targeted towards a specific amino acid. Utilizing the degree of twenty amino acids and their relative abundance within a given sequence, we generate 20-dimensional features for every coding DNA sequence or protein sequence. We use the features for performing phylogenetic analysis of the set of candidate sequences. We use multiple protein sequences derived from Beta-globin (BG), NADH dehydrogenase subunit 5 (ND5), Transferrins (TFs), Xylanases, low identity (<40%) and high identity (⩾40%) protein sequences (encompassing 533 and 1064 protein families) for experimental assessments. We compare our results with sixteen (16) well-known methods, including both alignment-based and alignment-free methods. Various assessment indices are used, such as the Pearson correlation coefficient, RF (Robinson-Foulds) distance and ROC score for performance analysis. While comparing the performance of CoFASA with alignment-based methods (ClustalW, ClustalΩ, MAFFT, and MUSCLE), it shows very similar results. Further, CoFASA shows better performance in comparison to well-known alignment-free methods, including LZW-Kernal, jD2Stat, FFP, spaced, and AFKS-D2s in predicting taxonomic relationship among candidate taxa. Overall, we observe that the features derived by CoFASA are very much useful in isolating the sequences according to their taxonomic labels. 
Our method is cost-effective and, at the same time, produces consistent and satisfactory outcomes.


Subjects
Amino Acid Sequence/genetics; Amino Acids/genetics; Codon/genetics; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Animals; Humans; Nucleotides/genetics; Phylogeny; Proteins/genetics
7.
PLoS One ; 15(12): e0239154, 2020.
Article in English | MEDLINE | ID: mdl-33378336

ABSTRACT

BACKGROUND: Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition varies from the expected, determined proteome-wise, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. RESULTS: We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called 'low complexity triangle' as a proof-of-concept to represent the low complexity measured values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated to complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. CONCLUSIONS: The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.


Subjects
Proteome; Sequence Analysis, Protein/methods; Amino Acid Sequence; Humans; Repetitive Sequences, Amino Acid
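
Illustration for entry 7: one of the two low-complexity measures named in the abstract, the fraction of the most frequent amino acid, is simple enough to sketch. The Python below is a hedged illustration only: the sliding-window scheme, the default window size of 20, and the function names are assumptions made here, and the repeatability measure (RES algorithm) is not reproduced.

```python
from collections import Counter


def most_frequent_fraction(window):
    """Fraction of the window occupied by its most common residue."""
    if not window:
        raise ValueError("window must not be empty")
    counts = Counter(window)
    return max(counts.values()) / len(window)


def sliding_fraction(seq, window_size=20):
    """Most-frequent-residue fraction for every full window along seq.

    window_size is a hypothetical default chosen for this sketch.
    Returns an empty list if seq is shorter than window_size.
    """
    return [most_frequent_fraction(seq[i:i + window_size])
            for i in range(len(seq) - window_size + 1)]
```

A homorepeat such as a polyglutamine tract gives values of 1.0, while windows with mixed composition give lower values.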
8.
ACS Chem Neurosci ; 11(22): 3701-3703, 2020 11 18.
Article in English | MEDLINE | ID: mdl-33140636

ABSTRACT

Cell entry, the fundamental step in cross-species transmission of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), is initiated by the recognition of the host cell angiotensin-converting enzyme-2 (ACE2) receptor by the receptor-binding domain (RBD) of the spike protein of SARS-CoV-2. To date, several peptides have been proposed against SARS-CoV-2 both as inhibitor agents or as detection tools that can also be attached to the surfaces of nanoparticle carriers. But owing to their natural amino acid sequences, such peptides cannot be considered as efficient therapeutic candidates from a biostability point of view. This discussion demonstrates the design strategy of synthetic nonprotein amino acid substituted peptides with enhanced biostability and binding affinity, the implication of which can make those peptides potential therapeutic agents for inhibition and simple detection tools.


Subjects
Antiviral Agents/therapeutic use; Betacoronavirus; Coronavirus Infections/drug therapy; Drug Design; Peptide Fragments/therapeutic use; Pneumonia, Viral/drug therapy; Amino Acid Sequence; Antiviral Agents/metabolism; Betacoronavirus/drug effects; Betacoronavirus/genetics; Coronavirus Infections/genetics; Coronavirus Infections/metabolism; Humans; Pandemics; Peptide Fragments/genetics; Peptide Fragments/metabolism; Pneumonia, Viral/genetics; Pneumonia, Viral/metabolism; Protein Binding/physiology; Sequence Analysis, Protein/methods
9.
BMC Bioinformatics ; 21(Suppl 11): 294, 2020 Sep 14.
Article in English | MEDLINE | ID: mdl-32921315

ABSTRACT

BACKGROUND: The alignment of character sequences is important in bioinformatics. The quality of this procedure is determined by the substitution matrix and parameters of the insertion-deletion penalty function. These matrices are derived from sequence alignment and thus reflect the evolutionary process. Currently, in addition to evolutionary matrices, a large number of different background matrices have been obtained. To make an optimal choice of the substitution matrix and the penalty parameters, we conducted a numerical experiment using a representative sample of existing matrices of various types and origins. RESULTS: We tested both the classical evolutionary matrix series (PAM, Blosum, VTML, Pfasum); structural alignment based matrices, contact energy matrix, and matrix based on the properties of the genetic code. This study presents results for two test set types: first, we simulated sequences that reflect the divergent evolution; second, we performed tests on Balibase sequences. In both cases, we obtained the dependences of the alignment quality (Accuracy, Confidence) on the evolutionary distance between sequences and the evolutionary distance to which the substitution matrices correspond. Optimization of a combination of matrices and the penalty parameters was carried out for local and global alignment on the values of penalty function parameters. Consequently, we found that the best alignment quality is achieved with matrices corresponding to the largest evolutionary distance. These matrices prove to be universal, i.e. suitable for aligning sequences separated by both large and small evolutionary distances. We analysed the correspondence of the correlation coefficients of matrices to the alignment quality. It was found that matrices showing high quality alignment have an above average correlation value, but the converse is not true. 
CONCLUSIONS: This study showed that the best alignment quality is achieved with evolutionary matrices designed for long distances: Gonnet, VTML250, PAM250, MIQS, and Pfasum050. The same property is inherent in matrices not only of evolutionary origin, but also of another background corresponding to a large evolutionary distance. Therefore, matrices based on structural data show alignment quality close enough to its value for evolutionary matrices. This agrees with the idea that the spatial structure is more conservative than the protein sequence.


Subjects
Amino Acid Substitution; Computational Biology/methods; Sequence Alignment; Sequence Analysis, Protein/methods; Software; Amino Acid Sequence; Evolution, Molecular; Proteins/chemistry; Proteins/metabolism
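
Illustration for entry 9: the abstract's starting point, that alignment quality is governed by the substitution matrix and the gap penalty, can be made concrete with a small global-alignment scorer in which both are parameters. This is a generic Needleman-Wunsch sketch, not the authors' test setup. The linear gap penalty (rather than an affine insertion-deletion penalty function) and the toy match/mismatch scores standing in for a real matrix such as PAM250 are simplifying assumptions.

```python
def global_alignment_score(a, b, score, gap=-2):
    """Optimal global alignment score of a and b (Needleman-Wunsch).

    score(x, y) plays the role of the substitution matrix; gap is a
    linear per-position gap penalty (an assumption of this sketch).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            ))
        prev = cur
    return prev[-1]


def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 1 if x == y else -1
```

Swapping `toy_score` for a lookup into a real matrix, or changing `gap`, changes which alignment is optimal, which is the dependence the study measures.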
10.
PLoS One ; 15(9): e0238625, 2020.
Article in English | MEDLINE | ID: mdl-32915813

ABSTRACT

Recent advances in DNA sequencing methods revolutionized biology by providing highly accurate reads, with high throughput or high read length. These read data are being used in many biological and medical applications. Modern DNA sequencing methods have no equivalent in protein sequencing, severely limiting the widespread application of protein data. Recently, several optical protein sequencing methods have been proposed that rely on the fluorescent labeling of amino acids. Here, we introduce the reprotonation-deprotonation protein sequencing method. Unlike other methods, this proposed technique relies on the measurement of an electrical signal and requires no fluorescent labeling. In reprotonation-deprotonation protein sequencing, the terminal amino acid is identified through its unique protonation signal, and by repeatedly cleaving the terminal amino acids one-by-one, each amino acid in the peptide is measured. By means of simulations, we show that, given a reference database of known proteins, reprotonation-deprotonation sequencing has the potential to correctly identify proteins in a sample. Our simulations provide target values for the signal-to-noise ratios that sensor devices need to attain in order to detect reprotonation-deprotonation events, as well as suitable pH values and required measurement times per amino acid. For instance, an SNR of 10 is required for a 61.71% proteome recovery rate with 100 ms measurement time per amino acid.


Subjects
Amino Acids/chemistry; Proteins/chemistry; Proteome/genetics; Sequence Analysis, Protein/methods; Amino Acids/genetics; Fluorescent Dyes/chemistry; Peptides/chemistry; Peptides/genetics; Proteins/genetics; Proteome/chemistry; Protons; Sequence Analysis, DNA/methods; Signal-To-Noise Ratio
11.
Nat Commun ; 11(1): 3784, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728052

ABSTRACT

The CRISPR-Cas are adaptive bacterial and archaeal immunity systems that have been harnessed for the development of powerful genome editing and engineering tools. In the incessant host-parasite arms race, viruses evolved multiple anti-defense mechanisms including diverse anti-CRISPR proteins (Acrs) that specifically inhibit CRISPR-Cas and therefore have enormous potential for application as modulators of genome editing tools. Most Acrs are small and highly variable proteins which makes their bioinformatic prediction a formidable task. We present a machine-learning approach for comprehensive Acr prediction. The model shows high predictive power when tested against an unseen test set and was employed to predict 2,500 candidate Acr families. Experimental validation of top candidates revealed two unknown Acrs (AcrIC9, IC10) and three other top candidates were coincidentally identified and found to possess anti-CRISPR activity. These results substantially expand the repertoire of predicted Acrs and provide a resource for experimental Acr discovery.


Subjects
Bacteriophages/genetics; CRISPR-Associated Protein 9/antagonists & inhibitors; Machine Learning; Sequence Analysis, Protein/methods; Viral Proteins/genetics; Archaea/genetics; Archaea/virology; Bacteria/genetics; Bacteria/virology; CRISPR-Associated Protein 9/genetics; CRISPR-Cas Systems/genetics; Computational Biology/methods; Datasets as Topic; Gene Editing/methods; Host-Parasite Interactions/genetics; Sequence Homology, Amino Acid
12.
PLoS Pathog ; 16(5): e1008190, 2020 05.
Article in English | MEDLINE | ID: mdl-32413071

ABSTRACT

DNA replication protein Cdc45 is an integral part of the eukaryotic replicative helicase whose other components are the Mcm2-7 core, and GINS. We identified a PIP box motif in Leishmania donovani Cdc45. This motif is typically linked to interaction with the eukaryotic clamp proliferating cell nuclear antigen (PCNA). The homotrimeric PCNA can potentially bind up to three different proteins simultaneously via a loop region present in each monomer. Multiple binding partners have been identified from among the replication machinery in other eukaryotes, and the concerted/sequential binding of these partners is central to the fidelity of the replication process. Though conserved in Cdc45 across Leishmania species and Trypanosoma cruzi, the PIP box is absent in Trypanosoma brucei Cdc45. Here we investigate the possibility of Cdc45-PCNA interaction and the role of such an interaction in the in vivo context. Having confirmed the importance of Cdc45 in Leishmania DNA replication, we establish that Cdc45 and PCNA interact stably in whole cell extracts, also interacting with each other directly in vitro. The interaction is mediated via the Cdc45 PIP box. This PIP box is essential for Leishmania survival. The importance of the Cdc45 PIP box is also examined in Schizosaccharomyces pombe, and it is found to be essential for cell survival here as well. Our results implicate a role for the Leishmania Cdc45 PIP box in recruiting or stabilizing PCNA on chromatin. The Cdc45-PCNA interaction might help tether PCNA and associated replicative DNA polymerase to the DNA template, thus facilitating replication fork elongation. Though multiple replication proteins that associate with PCNA have been identified in other eukaryotes, this is the first report demonstrating a direct interaction between Cdc45 and PCNA, and while our analysis suggests the interaction may not occur in human cells, it indicates that it may not be confined to trypanosomatids.


Subjects
Leishmania donovani/metabolism; Nuclear Proteins/metabolism; Schizosaccharomyces pombe Proteins/metabolism; Cell Cycle Proteins/genetics; Cell Cycle Proteins/metabolism; Cell Cycle Proteins/physiology; Chromatin/genetics; DNA Helicases/metabolism; DNA Replication/physiology; Leishmania donovani/genetics; Nuclear Proteins/genetics; Nuclear Proteins/physiology; Nucleotidyltransferases/genetics; Proliferating Cell Nuclear Antigen/genetics; Proliferating Cell Nuclear Antigen/metabolism; Protein Binding; Protein Domains; Schizosaccharomyces pombe Proteins/genetics; Schizosaccharomyces pombe Proteins/physiology; Sequence Analysis, Protein/methods
13.
Nucleic Acids Res ; 48(W1): W65-W71, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32313959

ABSTRACT

Zebra2 is a highly automated web-tool to search for subfamily-specific and conserved positions (i.e. the determinants of functional diversity as well as the key catalytic and structural residues) in protein superfamilies. The bioinformatic analysis is facilitated by Mustguseal, a companion web-server to automatically collect and superimpose a large representative set of functionally diverse homologs with high structure similarity but low sequence identity to the selected query protein. The results are automatically prioritized and provided at four information levels to facilitate the knowledge-driven expert selection of the most promising positions on-line: as a sequence similarity network; interfaces to sequence-based and 3D-structure-based analysis of conservation and variability; and accompanied by the detailed annotation of proteins accumulated from the integrated databases with links to the external resources. The integration of Zebra2 and Mustguseal web-tools provides the first of its kind out-of-the-box open-access solution to conduct a systematic analysis of evolutionarily related proteins implementing different functions within a shared 3D-structure of the superfamily, determine common and specific patterns of function-associated local structural elements, assist in selecting hot-spots for rational design, and prepare focused libraries for directed evolution. The web-servers are free and open to all users at https://biokinet.belozersky.msu.ru/zebra2, no login required.


Subjects
Sequence Alignment; Sequence Analysis, Protein/methods; Software; Algorithms; Amino Acid Sequence; Computational Biology/methods; Conserved Sequence; Internet; Protein Conformation; Proteins/chemistry; Proteins/classification; Sequence Homology, Amino Acid
14.
PLoS Comput Biol ; 16(4): e1007779, 2020 04.
Article in English | MEDLINE | ID: mdl-32339164

ABSTRACT

Antibodies are capable of potently and specifically binding individual antigens and, in some cases, disrupting their functions. The key challenge in generating antibody-based inhibitors is the lack of fundamental information relating sequences of antibodies to their unique properties as inhibitors. We develop a pipeline, Antibody Sequence Analysis Pipeline using Statistical testing and Machine Learning (ASAP-SML), to identify features that distinguish one set of antibody sequences from antibody sequences in a reference set. The pipeline extracts feature fingerprints from sequences. The fingerprints represent germline, CDR canonical structure, isoelectric point and frequent positional motifs. Machine learning and statistical significance testing techniques are applied to antibody sequences and extracted feature fingerprints to identify distinguishing feature values and combinations thereof. To demonstrate how it works, we applied the pipeline on sets of antibody sequences known to bind or inhibit the activities of matrix metalloproteinases (MMPs), a family of zinc-dependent enzymes that promote cancer progression and undesired inflammation under pathological conditions, against reference datasets that do not bind or inhibit MMPs. ASAP-SML identifies features and combinations of feature values found in the MMP-targeting sets that are distinct from those in the reference sets.


Subjects
Antibodies; Computational Biology/methods; Machine Learning; Sequence Analysis, Protein/methods; Software; Algorithms; Antibodies/chemistry; Antibodies/metabolism; Databases, Protein; Humans; Matrix Metalloproteinase Inhibitors/chemistry; Matrix Metalloproteinase Inhibitors/metabolism; Matrix Metalloproteinases/chemistry; Matrix Metalloproteinases/metabolism
15.
PLoS One ; 15(4): e0232087, 2020.
Article in English | MEDLINE | ID: mdl-32348325

ABSTRACT

Many proteins exist in nature as oligomers with various quaternary structural attributes rather than as single chains. Predicting these attributes is an essential task in computational biology for the advancement of proteomics. However, the existing methods do not consider the integration of heterogeneous coding and the accuracy of subunit categories with limited data. To this end, we proposed a tool that can predict more than 12 subunit protein oligomers, QUATgo. Meanwhile, three kinds of sequence coding were used, including dipeptide composition, which was used for the first time to predict protein quaternary structural attributes, and protein half-life characteristics, and we modified the coding method of the functional domain composition proposed by predecessors to solve the problem of large feature vectors. QUATgo solves the problem of insufficient data for a single subunit using a two-stage architecture and uses 10-fold cross-validation to test the predictive accuracy of the classifier. QUATgo has 49.0% cross-validation accuracy and 31.1% independent test accuracy. In the case study, the accuracy of QUATgo can reach 61.5% for predicting the quaternary structure of influenza virus hemagglutinin proteins. Finally, QUATgo is freely accessible to the public as a web server via the site http://predictor.nchu.edu.tw/QUATgo.


Subjects
Computational Biology/methods; Machine Learning; Protein Structure, Quaternary; Proteins/chemistry; Sequence Analysis, Protein/methods; Software; Viral Proteins/chemistry; Algorithms; Animals; Databases, Protein; Humans; Protein Domains; Proteins/classification; Support Vector Machine
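
Illustration for entry 15: dipeptide composition, one of the sequence codings named in the abstract, is commonly computed as a 400-dimensional vector of adjacent-residue pair frequencies. The Python below is a generic sketch of that encoding and is not taken from QUATgo. The alphabet ordering, the choice to skip pairs containing non-standard residues, and normalising by the number of counted pairs are assumptions made here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return a 400-element list of dipeptide frequencies for seq.

    Pairs containing a character outside AMINO_ACIDS are skipped
    (an assumption of this sketch). An input with no valid pair
    yields an all-zero vector.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

The resulting vector has a fixed length regardless of protein length, which is what makes it usable as classifier input.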
16.
BMC Bioinformatics ; 21(1): 133, 2020 Apr 03.
Article in English | MEDLINE | ID: mdl-32245403

ABSTRACT

BACKGROUND: Despite the great advance of protein structure prediction, accurate prediction of the structures of mainly β proteins is still highly challenging, but could be assisted by the knowledge of residue-residue pairing in β strands. Previously, we proposed a ridge-detection-based algorithm RDb2C that adopted a multi-stage random forest framework to predict the β-β pairing given the amino acid sequence of a protein. RESULTS: In this work, we developed a second version of this algorithm, RDb2C2, by employing the residual neural network to further enhance the prediction accuracy. In the benchmark test, this new algorithm improves the F1-score by > 10 percentage points, reaching impressively high values of ~ 72% and ~ 73% in the BetaSheet916 and BetaSheet1452 sets, respectively. CONCLUSION: Our new method promotes the prediction accuracy of β-β pairing to a new level and the prediction results could better assist the structure modeling of mainly β proteins. We prepared an online server of RDb2C2 at http://structpred.life.tsinghua.edu.cn/rdb2c2.html.


Subjects
Algorithms; Protein Conformation, beta-Strand; Sequence Analysis, Protein/methods; Neural Networks, Computer
17.
J Mol Biol ; 432(7): 2289-2303, 2020 03 27.
Article in English | MEDLINE | ID: mdl-32112804

ABSTRACT

It is becoming increasingly recognised that disordered proteins may be fuzzy, in that they can exhibit a wide variety of binding modes. In addition to the well-known process of folding upon binding (disorder-to-order transition), many examples are emerging of interacting proteins that remain disordered in their bound states (disorder-to-disorder transitions). Furthermore, disordered proteins may populate ordered and disordered states to different extents depending on their partners (context-dependent binding). Here we assemble three datasets comprising disorder-to-order, context-dependent, and disorder-to-disorder transitions of 828 protein regions represented in 2157 complexes and elucidate the sequence-determinants of the different interaction modes. We found that fuzzy interactions originate from local sequence compositions that promote the sampling of a wide range of different structures. Based on this observation, we developed the FuzPred method (http://protdyn-fuzpred.org) of predicting the binding modes of disordered proteins based on their amino acid sequences, without specifying their partners. We thus illustrate how the amino acid sequences of proteins can encode a wide range of conformational changes upon binding, including transitions from disordered to ordered and from disordered to disordered states.


Subjects
Databases, Protein; Fuzzy Logic; Intrinsically Disordered Proteins/metabolism; Protein Interaction Domains and Motifs; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Humans; Intrinsically Disordered Proteins/chemistry; Models, Molecular; Protein Binding; Protein Conformation; Protein Domains; Protein Folding; Sequence Homology
18.
J Mol Biol ; 432(7): 2428-2443, 2020 03 27.
Article in English | MEDLINE | ID: mdl-32142788

ABSTRACT

The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for the understanding of almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we described a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of protein to DNA, RNA, and other proteins. Firstly, we needed to develop several new methods to predict whether or not proteins bind (per-protein prediction). Secondly, we developed independent methods that predict which residues bind (per-residue). Not requiring three-dimensional information, the system can predict the actual binding residue. The system combined homology-based inference with machine learning and motif-based profile-kernel approaches with word-based (ProtVec) solutions to machine learning protein level predictions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (±one standard error) corresponding to a 1.8 fold improvement over random (best classification for protein-protein with F1 = 91 ± 0.8%). Standard neural networks for per-residue binding residue predictions appeared best for DNA-binding (Q2 = 81 ± 0.9%) followed by RNA-binding (Q2 = 80 ± 1%) and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through github (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).


Subjects
Computational Biology/methods; DNA/metabolism; Neural Networks, Computer; Proteins/metabolism; RNA/metabolism; Sequence Analysis, Protein/methods; Software; Animals; Binding Sites; DNA/chemistry; Eukaryota/metabolism; Humans; Machine Learning; Nucleic Acid Conformation; Prokaryotic Cells/metabolism; Protein Binding; Protein Conformation; Proteins/chemistry; RNA/chemistry
19.
PLoS Comput Biol ; 16(3): e1007741, 2020 03.
Article in English | MEDLINE | ID: mdl-32150535

ABSTRACT

We present ProteoClade, a Python toolkit that performs taxa-specific peptide assignment, protein inference, and quantitation for multi-species proteomics experiments. ProteoClade scales to hundreds of millions of protein sequences, requires minimal computational resources, and is open source, multi-platform, and accessible to non-programmers. We demonstrate its utility for processing quantitative proteomic data derived from patient-derived xenografts, and its speed and scalability enable a novel de novo proteomic workflow for complex microbiota samples.


Subjects
Proteins; Proteomics/methods; Software; Animals; Databases, Protein; Humans; Mice; Microbiota/genetics; Proteins/chemistry; Proteins/classification; Proteins/genetics; Sequence Analysis, Protein/methods
20.
Phys Chem Chem Phys ; 22(9): 5057-5069, 2020 Mar 07.
Article in English | MEDLINE | ID: mdl-32073000

ABSTRACT

Graph theory-based reaction pathway searches (ACE-Reaction program) and density functional theory calculations were performed to shed light on the mechanisms for the production of $[a_n + \mathrm{H}]^+$, $x_n^+$, $y_n^+$, $z_n^+$, and $[y_n + 2\mathrm{H}]^+$ fragments formed in free radical-initiated peptide sequencing (FRIPS) mass spectrometry measurements of a small model system of glycine-glycine-arginine (GGR). In particular, the graph theory-based searches, which are rarely applied to gas-phase reaction studies, allowed us to investigate reaction mechanisms in an exhaustive manner without resorting to chemical intuition. As expected, radical-driven reaction pathways were favorable over charge-driven reaction pathways in terms of kinetics and thermodynamics. Charge- and radical-driven pathways for the formation of $[y_n + 2\mathrm{H}]^+$ fragments were carefully compared, and it was revealed that the $[y_n + 2\mathrm{H}]^+$ fragments observed in our FRIPS MS spectra originated from the radical-driven pathway, which is in contrast to the general expectation. The acquired understanding of the FRIPS fragmentation mechanism is expected to aid in the interpretation of FRIPS MS spectra. It should be emphasized that graph theory-based searches are powerful and effective methods for studying reaction mechanisms, including gas-phase reactions in mass spectrometry.


Subjects
Density Functional Theory; Free Radicals/chemistry; Oligopeptides/chemistry; Sequence Analysis, Protein/methods; Amino Acid Sequence; Cyclic N-Oxides/chemistry; Gases/chemistry; Kinetics; Mass Spectrometry; Molecular Dynamics Simulation; Thermodynamics