Results 1 - 20 of 14,757
1.
Gene ; 723: 144134, 2020 Jan 10.
Article in English | MEDLINE | ID: mdl-31589960

ABSTRACT

Viral kinases are known to undergo autophosphorylation and also phosphorylate viral and host substrates. Viral kinases have been implicated in various diseases and are also known to acquire host kinases to mimic cellular functions and exhibit virulence. Although substantial analyses have been reported in the literature on the diversity of viral kinases, there is a gap in the understanding of sequence and structural similarity among kinases from different classes of viruses. In this study, we performed a comprehensive analysis of protein kinases encoded in viral genomes. Homology search methods were used to identify kinases from 104,282 viral genomic datasets. Serine/threonine and tyrosine kinases were identified in only 390 viral genomes. Of the seven viral classes defined by the nature of their genetic material, only double-stranded DNA viruses and single-stranded RNA retroviruses were found to encode kinases. The 716 identified protein kinases were classified into 63 subfamilies based on their sequence similarity within each cluster, and sequence signatures were identified for each subfamily. Eleven clusters are well represented, with at least 10 members each. Kinases from the dsDNA viruses Phycodnaviridae, which infect green algae, and Herpesvirales, which infect vertebrates including humans, form a major group. Our analysis shows that the protein kinases in viruses belonging to the same taxonomic lineages form discrete clusters and that the kinases encoded in alphaherpesviruses form host-specific clusters. A comprehensive sequence and structure-based analysis enabled us to identify the conserved residues or motifs in kinase catalytic domain regions across all viral kinases. Conserved sequence regions that are specific to a particular viral kinase cluster, and the kinases that show close similarity to eukaryotic kinases, were identified by using sequence and three-dimensional structural regions of eukaryotic kinases as reference. The regions specific to each viral kinase cluster can be used in the future as signatures for classifying uncharacterized viral kinases. We note that kinases from the giant viruses Marseilleviridae have close similarity to viral oncogenes in the functional regions and in putative substrate binding regions, indicating their possible role in cancer.


Subjects
Protein Kinases/chemistry , Protein Kinases/genetics , Viruses/classification , Catalytic Domain , Computational Biology/methods , Databases, Protein , Genetic Variation , Phosphorylation , Phylogeny , Protein Kinases/metabolism , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism , Virulence Factors/chemistry , Virulence Factors/genetics , Virulence Factors/metabolism , Viruses/enzymology , Viruses/pathogenicity
2.
Adv Exp Med Biol ; 1163: 65-87, 2019.
Article in English | MEDLINE | ID: mdl-31707700

ABSTRACT

An allosteric mechanism refers to the biological regulation process wherein macromolecules propagate the effect of ligand binding at one site to a spatially distant orthosteric locus, thus affecting activity. The theory has remained a trending topic in biology research for over 50 years, since understanding allostery is fundamental to elucidating numerous biological processes and developing new drug therapies. In the past two decades, the allosteric paradigm has evolved into more descriptive models, with ever-expanding amounts of experimental data pertaining to newly identified allosteric molecules. The AlloSteric Database (ASD, accessible at http://mdl.shsmu.edu.cn/ASD ), a comprehensive knowledge repository, has since 2009 provided the public with integrated information on allosteric proteins, modulators, sites, pathways, and networks for investigating allostery. In this chapter, we introduce the history and usage of the ASD and highlight specific applications that have benefited from it.


Subjects
Allosteric Site , Drug Discovery , Proteins , Allosteric Regulation , Databases, Protein , Drug Discovery/trends , Proteins/chemistry
3.
J Chem Theory Comput ; 15(11): 6456-6470, 2019 Nov 12.
Article in English | MEDLINE | ID: mdl-31553606

ABSTRACT

Accurate determination of noncovalent interactions in biological systems is fundamental to rationalizing the structure of macromolecules and gaining insight into their functions and dynamics. Here we propose a new tool for the efficient identification of noncovalent interactions in proteins. The noncovalent interaction (NCI) method, a well-established strategy to detect noncovalent interactions in chemical systems, is coupled with the libraries of extremely localized molecular orbitals (ELMOs), which allow instantaneous reconstruction of quantum mechanically rigorous electron distributions of polypeptides and proteins. Test calculations performed on different interactions show that the new NCI-ELMO strategy always outperforms the original NCI method based on the promolecular approximation, while it is in close agreement with original NCI analyses based on fully quantum mechanical calculations. The new technique allows the network of interactions in biological systems to be unraveled and their evolution over time to be monitored rapidly, with possible consequences for the design of new drugs.


Subjects
Models, Molecular , Proteins/chemistry , Quantum Theory , Databases, Protein , Drug Design , Enkephalin, Leucine/chemistry , Enkephalin, Leucine/metabolism , Hydrogen Bonding , Metals/chemistry , Proteins/metabolism
4.
BMC Bioinformatics ; 20(1): 454, 2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31488049

ABSTRACT

BACKGROUND: As genome sequencing projects grow rapidly, the diversity of organisms with recently assembled genome sequences has reached an unprecedented scale, highlighting the need to make gene functional annotation fast and efficient. However, the high quality of such annotations must be guaranteed, as this is the first indicator of the genomic potential of every organism. Automatic procedures help accelerate the annotation process, though at the cost of lower confidence and reliability of the outcomes. Manually curating a genome-wide annotation of the functions of genes, enzymes and transporter proteins is a highly time-consuming, tedious and impractical task, even for the most proficient curator. Hence, a semi-automated procedure that balances the two approaches will increase the reliability of the annotation while speeding up the process. In fact, a prior analysis of the annotation algorithm may improve its performance by tuning its parameters, hastening the downstream processing and the manual curation involved in assigning functions to genes encoding proteins. RESULTS: Here, SamPler, a novel strategy to select parameters for gene functional annotation routines, is presented. This semi-automated method is based on the manual curation of a randomly selected set of genes/proteins. Then, in a multi-dimensional array, this sample is used to assess the automatic annotations for all possible combinations of the algorithm's parameters. These assessments allow creating an array of confusion matrices, for which several metrics are calculated (accuracy, precision and negative predictive value) and used to reach optimal values for the parameters. CONCLUSIONS: The potential of this methodology is demonstrated with four genome functional annotations performed in merlin, an in-house user-friendly computational framework for genome-scale metabolic annotation and model reconstruction. For that, SamPler was implemented as a new plugin for the merlin tool.


Subjects
Algorithms , Molecular Sequence Annotation/methods , Bacteria/genetics , Chromosome Mapping , Databases, Protein , Reproducibility of Results
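
The parameter-selection idea in the SamPler abstract above (score every parameter combination against a manually curated sample, build a confusion matrix for each, and compare accuracy, precision and negative predictive value) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two-score annotation rule, the parameter names `alpha` and `threshold`, and the toy curated sample are all hypothetical and are not taken from SamPler or merlin.

```python
from itertools import product

# Hypothetical curated sample: (frequency_score, taxonomy_score, curator_accepts).
CURATED = [
    (0.9, 0.8, True),
    (0.7, 0.2, True),
    (0.4, 0.9, False),
    (0.3, 0.1, False),
    (0.8, 0.5, True),
    (0.2, 0.6, False),
]


def confusion(alpha, threshold, sample):
    """Count TP, FP, TN, FN for one parameter combination (assumed scoring rule)."""
    tp = fp = tn = fn = 0
    for freq, tax, truth in sample:
        predicted = alpha * freq + (1 - alpha) * tax >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, negative predictive value)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return accuracy, precision, npv


def grid_search(sample, alphas, thresholds):
    """Evaluate every parameter combination and pick the best-scoring one."""
    results = {
        (alpha, threshold): metrics(*confusion(alpha, threshold, sample))
        for alpha, threshold in product(alphas, thresholds)
    }
    # Tuples compare lexicographically: accuracy first, then precision, then NPV.
    best = max(results, key=lambda key: results[key])
    return best, results


best, results = grid_search(CURATED, [0.0, 0.5, 1.0], [0.3, 0.5, 0.7])
```

Ranking by accuracy first is an arbitrary choice made for this sketch; the abstract does not say how the three metrics are combined.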
5.
Comput Biol Chem ; 81: 9-15, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31472418

ABSTRACT

The Position-Specific Scoring Matrix (PSSM) is an excellent feature extraction method that was proposed early on for protein classification, but because of the restrictions imposed by its shape, researchers have made many attempts to process the PSSM so that it can be fed to traditional machine learning algorithms. This processing discards some of the information the PSSM provides, which limits the feature representation. Moreover, the high-dimensional feature representation of the PSSM makes it incompatible with other feature extraction methods. We use the PSSM as the input of a Recurrent Neural Network (RNN) without any post-processing; the amino acids in a protein sequence are treated as the time steps of the RNN. This approach takes full advantage of the information that the PSSM provides. In this study, the PSSM is fed to the model directly so that its internal information is fully utilized; we propose an end-to-end solution and achieve state-of-the-art performance. Finally, we explore how to combine the PSSM with traditional feature extraction methods and achieve slightly improved performance. Our network architecture is implemented in Python and is available at https://github.com/YellowcardD/RNN-for-membrane-protein-types-prediction.


Subjects
Membrane Proteins/classification , Neural Networks, Computer , Position-Specific Scoring Matrices , Computational Biology/methods , Databases, Protein/statistics & numerical data , Membrane Proteins/chemistry
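
To make the "one residue per time step" idea from the abstract above concrete, here is a minimal sketch of a plain tanh recurrent cell run over an L x 20 PSSM. It is not the authors' architecture (their code is at the repository linked in the abstract); the hidden size, the random weights and the toy PSSM values are all assumptions for illustration, and the network is untrained.

```python
import math
import random


def rnn_forward(pssm, w_in, w_rec, bias):
    """Run a plain tanh RNN over a PSSM, one residue (row) per time step."""
    hidden_size = len(bias)
    h = [0.0] * hidden_size
    for row in pssm:  # row: the 20 substitution scores of one residue
        h = [
            math.tanh(
                sum(w_in[j][k] * row[k] for k in range(len(row)))
                + sum(w_rec[j][k] * h[k] for k in range(hidden_size))
                + bias[j]
            )
            for j in range(hidden_size)
        ]
    return h  # the final hidden state summarises the whole sequence


def random_matrix(rows, cols, rng, scale=0.1):
    """Helper for this sketch: a rows x cols matrix of uniform random values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
HIDDEN = 8  # hypothetical hidden size
w_in = random_matrix(HIDDEN, 20, rng)
w_rec = random_matrix(HIDDEN, HIDDEN, rng)
bias = [0.0] * HIDDEN

# Toy PSSMs for two "proteins" of different lengths; the values are random.
short_pssm = random_matrix(5, 20, rng, scale=5.0)
long_pssm = random_matrix(12, 20, rng, scale=5.0)

h_short = rnn_forward(short_pssm, w_in, w_rec, bias)
h_long = rnn_forward(long_pssm, w_in, w_rec, bias)
```

Both proteins end up as vectors of the same size even though their PSSMs have different lengths, which is why no reshaping of the PSSM is needed before a classifier layer.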
6.
Phys Chem Chem Phys ; 21(32): 17950-17958, 2019 Aug 15.
Article in English | MEDLINE | ID: mdl-31384849

ABSTRACT

The A. aeolicus intrinsically disordered protein FlgM has four well-defined α-helices when bound to σ28, but in water FlgM undergoes a change in tertiary structure. In this work, we investigate the structure of FlgM in aqueous solutions of the ionic liquid [C4mpy][Tf2N]. We find that FlgM is induced to fold by the addition of the ionic liquid, achieving average α-helicity values similar to the bound state. Analysis of secondary structure reveals significant similarity with the bound state, but the tertiary structure is found to be more compact. Interestingly, the ionic liquid is not homogeneously dispersed in the water, but instead aggregates near the protein. Separate simulations of aqueous ionic liquid do not show ion clustering, which suggests that FlgM stabilizes ionic liquid aggregation.


Subjects
Bacterial Proteins/chemistry , Imides/chemistry , Intrinsically Disordered Proteins/chemistry , Ionic Liquids/chemistry , Models, Molecular , Pyrrolidines/chemistry , Databases, Protein , Protein Conformation, alpha-Helical , Protein Folding , Thermodynamics , Water
7.
Chemistry ; 25(58): 13436-13443, 2019 Oct 17.
Article in English | MEDLINE | ID: mdl-31453653

ABSTRACT

Studying noncanonical intermolecular interactions between a ligand and a protein constitutes an emerging research field. Identifying synthetically accessible molecular fragments that can engage in intermolecular interactions is a key objective in this area. Here, it is shown that so-called "π-hole interactions" are present between the nitro moiety in nitro aromatic ligands and lone pairs within protein structures (water and protein carbonyls and sulfurs). Ample structural evidence was found in a PDB analysis and computations reveal interaction energies of about -5 kcal mol⁻¹ for ligand-protein π-hole interactions. Several examples are highlighted for which a π-hole interaction is implicated in the superior binding affinity or inhibition of a nitro aromatic ligand versus a similar non-nitro analogue. The discovery that π-hole interactions with nitro aromatics are significant within protein structures parallels the finding that halogen bonds are biologically relevant. This has implications for the interpretation of ligand-protein complexation phenomena, for example, involving the more than 50 approved drugs that contain a nitro aromatic moiety.


Subjects
Computer Simulation , Models, Molecular , Nitro Compounds/chemistry , Proteins/chemistry , Amino Acid Sequence , Databases, Protein , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Ligands , Molecular Structure , Protein Binding , Structure-Activity Relationship , Thermodynamics , Water
8.
Acta Crystallogr D Struct Biol ; 75(Pt 8): 696-717, 2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31373570

ABSTRACT

Current software tools for the automated building of models for macromolecular X-ray crystal structures are capable of assembling high-quality models for ordered macromolecule and small-molecule scattering components with minimal or no user supervision. Many of these tools also incorporate robust functionality for modelling the ordered water molecules that are found in nearly all macromolecular crystal structures. However, no current tools focus on differentiating these ubiquitous water molecules from other frequently occurring multi-atom solvent species, such as sulfate, or the automated building of models for such species. PeakProbe has been developed specifically to address the need for such a tool. PeakProbe predicts likely solvent models for a given point (termed a 'peak') in a structure based on analysis ('probing') of its local electron density and chemical environment. PeakProbe maps a total of 19 resolution-dependent features associated with electron density and two associated with the local chemical environment to a two-dimensional score space that is independent of resolution. Peaks are classified based on the relative frequencies with which four different classes of solvent (including water) are observed within a given region of this score space as determined by large-scale sampling of solvent models in the Protein Data Bank. Designed to classify peaks generated from difference density maxima, PeakProbe also incorporates functionality for identifying peaks associated with model errors or clusters of peaks likely to correspond to multi-atom solvent, and for the validation of existing solvent models using solvent-omit electron-density maps. When tasked with classifying peaks into one of four distinct solvent classes, PeakProbe achieves greater than 99% accuracy for both peaks derived directly from the atomic coordinates of existing solvent models and those based on difference density maxima. While the program is still under development, a fully functional version is publicly available. PeakProbe makes extensive use of cctbx libraries, and requires a PHENIX licence and an up-to-date phenix.python environment for execution.


Subjects
Crystallography, X-Ray/methods , Macromolecular Substances/chemistry , Proteins/chemistry , Software , Solvents/chemistry , Water/chemistry , Databases, Protein , Datasets as Topic , Models, Molecular , Protein Conformation
9.
Acta Crystallogr D Struct Biol ; 75(Pt 8): 753-763, 2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31373574

ABSTRACT

The performance of automated model building in crystal structure determination usually decreases with the resolution of the experimental data, and may result in fragmented models and incorrect side-chain assignment. Presented here are new methods for machine-learning-based docking of main-chain fragments to the sequence and for their sequence-independent connection using a dedicated library of protein fragments. The combined use of these new methods noticeably increases sequence coverage and reduces fragmentation of the protein models automatically built with ARP/wARP.


Subjects
Crystallography, X-Ray/methods , Machine Learning , Proteins/chemistry , Algorithms , Amino Acid Sequence , Databases, Protein , Datasets as Topic , Molecular Docking Simulation , Protein Conformation , Software
10.
BMC Bioinformatics ; 20(1): 446, 2019 Aug 28.
Article in English | MEDLINE | ID: mdl-31462221

ABSTRACT

BACKGROUND: Protein interaction databases often provide confidence scores for each recorded interaction based on the available experimental evidence. Protein interaction networks (PINs) are then built by thresholding on these scores, so that only interactions of sufficiently high quality are included. These networks are used to identify biologically relevant motifs or nodes using metrics such as degree or betweenness centrality. This type of analysis can be sensitive to the choice of threshold. If a node metric is to be useful for extracting biological signal, it should induce similar node rankings across PINs obtained at different reasonable confidence score thresholds. RESULTS: We propose three measures (rank continuity, identifiability, and instability) to evaluate how robust a node metric is to changes in the score threshold. We apply our measures to twenty-five metrics and identify four as the most robust: the number of edges in the step-1 ego network, as well as the leave-one-out differences in average redundancy, average number of edges in the step-1 ego network, and natural connectivity. Our measures show good agreement across PINs from different species and data sources. Analysis of synthetically generated scored networks shows that robustness results are context-specific, and depend both on network topology and on how scores are placed across network edges. CONCLUSION: Due to the uncertainty associated with protein interaction detection, and therefore network structure, for PIN analysis to be reproducible, it should yield similar results across different confidence score thresholds. We demonstrate that while certain node metrics are robust with respect to threshold choice, this is not always the case. Promisingly, our results suggest that there are some metrics that are robust across networks constructed from different databases, and different scoring procedures.


Subjects
Computational Biology/methods , Databases, Protein , Protein Interaction Maps , Proteins/metabolism , Algorithms , Humans
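
The threshold sensitivity described in the abstract above can be illustrated with a small sketch that builds a network at two confidence thresholds and checks how much the top-ranked nodes by degree overlap. The top-k overlap used here is a simple stand-in chosen for this example; it is not one of the paper's three measures, whose definitions are not given in the abstract. The edge list is invented.

```python
from collections import Counter

# Hypothetical scored interactions: (protein_a, protein_b, confidence).
EDGES = [
    ("H", "A", 0.35), ("H", "B", 0.40), ("H", "C", 0.45), ("H", "D", 0.35),
    ("A", "B", 0.90), ("B", "C", 0.85), ("C", "D", 0.80), ("A", "C", 0.95),
]


def degrees(edges, threshold):
    """Node degrees in the network that keeps edges scoring >= threshold."""
    deg = Counter()
    for a, b, score in edges:
        if score >= threshold:
            deg[a] += 1
            deg[b] += 1
    return deg


def top_k(deg, k):
    """The k highest-degree nodes; ties are broken alphabetically."""
    ranked = sorted(deg, key=lambda node: (-deg[node], node))
    return set(ranked[:k])


def top_k_overlap(edges, t1, t2, k):
    """Fraction of the top-k nodes shared by the networks at thresholds t1 and t2."""
    return len(top_k(degrees(edges, t1), k) & top_k(degrees(edges, t2), k)) / k


low_vs_high = top_k_overlap(EDGES, 0.3, 0.7, k=2)
```

In this toy network, node H looks like a hub only because of low-confidence edges, so it drops out of the top two once the threshold is raised from 0.3 to 0.7.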
11.
BMC Bioinformatics ; 20(1): 438, 2019 Aug 23.
Article in English | MEDLINE | ID: mdl-31443634

ABSTRACT

BACKGROUND: One of the most important steps in peptide identification is to estimate the false discovery rate (FDR). The most commonly used method for estimating FDR is the target-decoy search strategy (TDS). While this method is simple and effective, it is time/space-inefficient because it searches a database that is twice as large as the original protein database. This inefficiency becomes more evident as protein databases grow. We propose a target-small decoy search strategy and present a rigorous verification that it reduces the database size and search time while retaining the accuracy of TDS. RESULTS: We show that peptide spectrum matches (PSMs) obtained at 1% FDR in TDS overlap ~99% with those in our method. (Considering that 1% FDR is used, 99% overlap means our method is very accurate.) Moreover, our method is more time/space-efficient than TDS. The search time of our method is reduced to only 1/4 of that of TDS when UniProt and its 1/8 decoy database are used. CONCLUSIONS: We demonstrate that our method is almost as accurate as TDS and more time/space-efficient than TDS. Since the efficiency of our method is more evident as the database size increases, our method is expected to be useful for identifying peptides in proteogenomics databases constructed from inflated databases using genomic data.


Subjects
Computational Biology/methods , Peptides/chemistry , Algorithms , Cell Line , Databases, Protein , Humans
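
For readers unfamiliar with target-decoy FDR estimation, the basic calculation behind the abstract above can be sketched as follows: at a given score threshold, the number of decoy hits estimates the number of false target hits. The `decoy_fraction` rescaling for a smaller decoy database is an assumption made purely for illustration; the abstract does not state how the authors correct for the reduced decoy size. The PSM list is invented.

```python
def estimate_fdr(psms, threshold, decoy_fraction=1.0):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    psms: list of (score, is_decoy) pairs.
    decoy_fraction: decoy database size relative to the target database
    (1.0 for the classic target-decoy search). Dividing the decoy count by
    this fraction is an illustrative assumption, not the published method.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return min(1.0, (decoys / decoy_fraction) / targets)


def threshold_at_fdr(psms, max_fdr, decoy_fraction=1.0):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        # Keep scanning: the estimate is not guaranteed to be monotone.
        if estimate_fdr(psms, score, decoy_fraction) <= max_fdr:
            best = score
    return best


# Hypothetical PSMs: (score, is_decoy).
PSMS = [
    (10, False), (9, False), (8, False), (7, True),
    (6, False), (5, False), (4, True), (3, False),
]
cutoff = threshold_at_fdr(PSMS, max_fdr=0.2)
```

With a decoy database smaller than the target database, each decoy hit would have to stand in for more false target hits, which is what the rescaling in this sketch expresses.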
12.
BMC Bioinformatics ; 20(1): 443, 2019 Aug 28.
Article in English | MEDLINE | ID: mdl-31455212

ABSTRACT

BACKGROUND: Cryo-electron tomography (Cryo-ET) is an imaging technique used to generate three-dimensional structures of cellular macromolecule complexes in their native environment. Due to developing cryo-electron microscopy technology, the image quality of three-dimensional reconstruction of cryo-electron tomography has greatly improved. However, cryo-ET images are characterized by low resolution, partial data loss and low signal-to-noise ratio (SNR). In order to tackle these challenges and improve resolution, a large number of subtomograms containing the same structure need to be aligned and averaged. Existing methods for refining and aligning subtomograms are still highly time-consuming, requiring many computationally intensive processing steps (i.e. the rotations and translations of subtomograms in three-dimensional space). RESULTS: In this article, we propose a Stochastic Average Gradient (SAG) fine-grained alignment method for optimizing the sum of dissimilarity measures in real space. We introduce a Message Passing Interface (MPI) parallel programming model in order to explore further speedup. CONCLUSIONS: We compare our stochastic average gradient fine-grained alignment algorithm with two baseline methods, high-precision alignment and fast alignment. Our SAG fine-grained alignment algorithm is much faster than the two baseline methods. Results on simulated data of GroEL from the Protein Data Bank (PDB ID:1KP8) showed that our parallel SAG-based fine-grained alignment method could achieve close-to-optimal rigid transformations with higher precision than both high-precision alignment and fast alignment at a low SNR (SNR=0.003) with tilt angle range ±60° or ±40°. For the experimental subtomogram data of GroEL and GroEL/GroES complexes, our parallel SAG-based fine-grained alignment achieved higher precision and converged in fewer iterations than the two baseline methods.


Subjects
Algorithms , Cryoelectron Microscopy/methods , Electron Microscope Tomography/methods , Chaperonin 10/ultrastructure , Chaperonin 60/ultrastructure , Databases, Protein , Image Processing, Computer-Assisted/methods , Signal-To-Noise Ratio , Time Factors
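
The stochastic average gradient (SAG) update named in the abstract above can be shown on a deliberately simple alignment problem: estimating a 2D translation that minimises a sum of squared distances between point pairs. This toy problem, the function name and the step size are assumptions for illustration only; the paper's method aligns 3D subtomograms with rotations as well as translations, which this sketch does not attempt.

```python
import random


def sag_translation(points, targets, step=1.0 / 32, iterations=2000, seed=0):
    """Estimate the shift t minimising sum_i ||p_i + t - q_i||^2 with SAG.

    SAG keeps the last gradient seen for every sample and steps along the
    average of those stored gradients, refreshing one sample per iteration.
    """
    rng = random.Random(seed)
    n = len(points)
    t = [0.0, 0.0]
    stored = [[0.0, 0.0] for _ in range(n)]  # last gradient seen per sample
    grad_sum = [0.0, 0.0]                    # running sum of stored gradients
    for _ in range(iterations):
        i = rng.randrange(n)
        # Gradient of ||p_i + t - q_i||^2 with respect to t, at the current t.
        new = [2.0 * (points[i][d] + t[d] - targets[i][d]) for d in range(2)]
        for d in range(2):
            grad_sum[d] += new[d] - stored[i][d]
            stored[i][d] = new[d]
            t[d] -= step * grad_sum[d] / n
    return t


# Toy data: the targets are the points shifted by (0.5, -0.25).
POINTS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
TARGETS = [(x + 0.5, y - 0.25) for x, y in POINTS]
shift = sag_translation(POINTS, TARGETS)
```

Each iteration touches a single sample, so the cost per step stays constant as the number of samples grows; the default step of 1/32 corresponds to the conservative 1/(16 L) choice for per-sample gradients with Lipschitz constant L = 2.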
13.
Nat Chem Biol ; 15(9): 853-864, 2019 09.
Article in English | MEDLINE | ID: mdl-31427814

ABSTRACT

Glycans linked to proteins and lipids play key roles in biology; thus, accurate replication of cellular glycans is crucial for maintaining function following cell division. The fact that glycans are not copied from genomic templates suggests that fidelity is provided by the catalytic templates of glycosyltransferases that accurately add sugars to specific locations on growing oligosaccharides. To form new glycosidic bonds, glycosyltransferases bind acceptor substrates and orient a specific hydroxyl group, frequently one of many, for attack of the donor sugar anomeric carbon. Several recent crystal structures of glycosyltransferases with bound acceptor substrates reveal that these enzymes have common core structures that function as scaffolds upon which variable loops are inserted to confer substrate specificity and correctly orient the nucleophilic hydroxyl group. The varied approaches for acceptor binding site assembly suggest an ongoing evolution of these loop regions provides templates for assembly of the diverse glycan structures observed in biology.


Subjects
Glycosyltransferases/metabolism , Polysaccharides/biosynthesis , Polysaccharides/chemistry , Animals , Carbohydrate Conformation , Databases, Protein , Gene Expression Regulation, Enzymologic
14.
Nat Biotechnol ; 37(8): 953-961, 2019 08.
Article in English | MEDLINE | ID: mdl-31375809

ABSTRACT

Ruminants provide essential nutrition for billions of people worldwide. The rumen is a specialized stomach that is adapted to the breakdown of plant-derived complex polysaccharides. The genomes of the rumen microbiota encode thousands of enzymes adapted to digestion of the plant matter that dominates the ruminant diet. We assembled 4,941 rumen microbial metagenome-assembled genomes (MAGs) using approximately 6.5 terabases of short- and long-read sequence data from 283 ruminant cattle. We present a genome-resolved metagenomics workflow that enabled assembly of bacterial and archaeal genomes that were at least 80% complete. Of note, we obtained three single-contig, whole-chromosome assemblies of rumen bacteria, two of which represent previously unknown rumen species, assembled from long-read data. Using our rumen genome collection we predicted and annotated a large set of rumen proteins. Our set of rumen MAGs increases the rate of mapping of rumen metagenomic sequencing reads from 15% to 50-70%. These genomic and protein resources will enable a better understanding of the structure and functions of the rumen microbiota.


Subjects
Archaea/genetics , Bacteria/genetics , Metagenome , Metagenomics/methods , Rumen/microbiology , Animals , Archaeal Proteins , Bacterial Proteins , Cattle/microbiology , Databases, Protein , Genome, Archaeal , Genome, Bacterial , Phylogeny , Sheep/microbiology
15.
Cell Mol Life Sci ; 76(22): 4407-4412, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31432235

ABSTRACT

Moonlighting proteins perform multiple unrelated functions without any change in polypeptide sequence. They can coordinate cellular activities, serving as switches between pathways and helping to respond to changes in the cellular environment. Therefore, regulation of the multiple protein activities, in space and time, is likely to be important for the homeostasis of biological systems. Some moonlighting proteins may perform their multiple functions simultaneously while others alternate between functions due to certain triggers. The switch of the moonlighting protein's functions can be regulated by several distinct factors, including the binding of other molecules such as proteins. We here review the approaches used to identify moonlighting proteins and existing repositories. We particularly emphasise the role played by short linear motifs and PTMs as regulatory switches of moonlighting functions.


Subjects
Proteins/metabolism , Animals , Cell Physiological Phenomena/physiology , Databases, Protein , Humans , Protein Conformation
16.
J Phys Chem Lett ; 10(15): 4382-4400, 2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31304749

ABSTRACT

It has been demonstrated that the MMP13 enzyme is related to most cancer cell tumors. The world's largest traditional Chinese medicine (TCM) database was screened using structure-based and ligand-based drug design. To predict drug activity, machine learning models (Random Forest (RF), AdaBoost Regressor (ABR), Gradient Boosting Regressor (GBR)) and deep learning models were used to validate the docking results; the RF algorithm gave an R² of 0.922 on the training set and 0.804 on the test set. For the deep learning algorithm, R² was 0.90 on the training set and 0.810 on the test set. However, these TCM compounds drifted away from the target during the molecular dynamics (MD) simulation. We therefore sought another method: peptide design. All peptide databases were screened by docking. Modified peptides were designed to optimize the interaction modes, and the affinities were assessed with the ZDOCK protocol and the Refine Docked Protein protocol. A 300 ns MD simulation was used to evaluate the stability of the receptor-peptide complexes. A double-site effect appeared for S2, a designed peptide based on a known inhibitor, when complexed with BCL2. S3, a designed peptide derived from the endogenous inhibitor P16, competed against cyclin when binding to CDK6. The MDM2 inhibitors S5 and S6 were derived from the P53 structure and bound stably to MDM2. A flexible region of peptides S5 and S6 may enhance the binding ability by changing its own conformation, which was unforeseen. These peptides (S2, S3, S5, and S6) are potentially interesting for cancer treatment; however, these findings need to be confirmed by biological testing, which will be conducted in the near future.


Subjects
Antineoplastic Agents/chemistry , Deep Learning , Machine Learning , Models, Molecular , Peptides/chemistry , Proteins/chemistry , Algorithms , Binding Sites , Cyclin-Dependent Kinase 6/chemistry , Cyclin-Dependent Kinase Inhibitor p16/chemistry , Databases, Pharmaceutical , Databases, Protein , Drug Design , Ligands , Matrix Metalloproteinase 13/chemistry , Mutation , Proto-Oncogene Proteins c-bcl-2/chemistry , Proto-Oncogene Proteins c-mdm2/chemistry , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/genetics
17.
BMC Bioinformatics ; 20(1): 397, 2019 Jul 17.
Article in English | MEDLINE | ID: mdl-31315562

ABSTRACT

BACKGROUND: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics. RESULTS: This paper presents MCtandem, an efficient tool for large-scale peptide identification on the Intel Many Integrated Core (MIC) architecture. To support big data processing, a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), and its two-level parallelization are presented in MCtandem's design. In addition, a series of optimization strategies on both the host CPU side and the MIC side, including pre-fetching, an optimized communication overlapping scheme, multithreading and hyper-threading, are exploited to improve execution performance. CONCLUSIONS: For fair comparisons, we first set up experiments and verified a 28-fold speedup on a single MIC against the original CPU-based implementation. We then executed MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem .


Subjects
Peptides/chemistry , Software , Tandem Mass Spectrometry , Algorithms , Databases, Protein , Humans , Proteomics/methods
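
The spectrum dot product (SDP) score mentioned in the abstract above can be illustrated generically: bin both spectra by m/z and sum the products of the intensities in matching bins. This is a plain, single-threaded sketch of the general idea, not the MIC-SDP implementation (which is written in C++ and parallelised); the bin width, peak lists and peptide names are invented.

```python
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins; peaks is a list of (mz, intensity)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def spectrum_dot_product(experimental, theoretical, bin_width=1.0):
    """Dot product of two binned spectra; a higher value means a better match."""
    exp_bins = bin_spectrum(experimental, bin_width)
    theo_bins = bin_spectrum(theoretical, bin_width)
    return sum(value * theo_bins[key] for key, value in exp_bins.items() if key in theo_bins)


def best_match(experimental, candidates, bin_width=1.0):
    """Return the name of the candidate peptide with the highest score."""
    return max(
        candidates,
        key=lambda name: spectrum_dot_product(experimental, candidates[name], bin_width),
    )


# Hypothetical experimental spectrum and theoretical candidate spectra.
EXPERIMENTAL = [(100.2, 10.0), (200.7, 5.0), (300.1, 2.0)]
CANDIDATES = {
    "PEPTIDE_A": [(100.4, 1.0), (200.9, 1.0)],
    "PEPTIDE_B": [(300.3, 1.0), (400.0, 1.0)],
}
winner = best_match(EXPERIMENTAL, CANDIDATES)
```

Each candidate is scored independently of the others, which is the property that makes this kind of scoring straightforward to spread across many cores.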
18.
BMC Bioinformatics ; 20(1): 400, 2019 Jul 18.
Article in English | MEDLINE | ID: mdl-31319797

ABSTRACT

BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. RESULTS: FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. CONCLUSIONS: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.


Subjects
Protein Domains , Proteins/chemistry , Binding Sites , Databases, Protein , Protein Binding , Proteins/classification , Proteins/metabolism
19.
J Chem Theory Comput ; 15(8): 4602-4614, 2019 Aug 13.
Article in English | MEDLINE | ID: mdl-31268700

ABSTRACT

Many biological processes are based on molecular recognition between highly charged molecules such as nucleic acids, inorganic ions, charged amino acids, etc. For such cases, it has been demonstrated that molecular simulations with fixed partial charges often fail to achieve experimental accuracy. Although incorporation of more advanced electrostatic models (such as multipoles, mutual polarization, etc.) can significantly improve simulation accuracy, it increases computational expense by a factor of 5-20×. Indirect free energy (IFE) methods can mitigate this cost by modeling intermediate states at fixed-charge resolution. For example, an efficient "reference" model such as a pairwise Amber, CHARMM, or OPLS-AA force field can be used to derive an initial estimate, followed by thermodynamic corrections to a more advanced "target" potential such as the polarizable AMOEBA model. Unfortunately, all currently described IFE methods encounter difficulties reweighting more than ∼50 atoms between resolutions due to extensive scaling of both the magnitude of the thermodynamic corrections and their statistical uncertainty. We present an approach called "simultaneous bookending" (SB) that is fundamentally different from existing IFE methods based on a tunable sampling approximation, which permits scaling to thousands of atoms. SB is demonstrated on the relative binding affinity of Mg2+/Ca2+ to a set of metalloproteins with up to 2972 atoms, finding no statistically significant difference between direct AMOEBA results and those from correcting Amber to AMOEBA. The ability to change the resolution of thousands of atoms during reweighting suggests the approach may be applicable in the future to protein-protein binding affinities or nucleic acid thermodynamics.
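
For orientation, the kind of thermodynamic correction that conventional IFE methods apply from a reference to a target potential can be sketched with the standard exponential-averaging (Zwanzig) estimator. This is not the simultaneous bookending method of the paper; the function name, the units and the kT value of about 0.593 kcal/mol (near 298 K) are assumptions for illustration.

```python
import math


def zwanzig_correction(delta_u, kT=0.593):
    """Free energy correction from a reference to a target potential.

    delta_u: values of U_target - U_reference evaluated on snapshots sampled
             from the reference ensemble (assumed to be in kcal/mol).
    kT: thermal energy in the same units (about 0.593 kcal/mol near 298 K).
    Returns -kT * ln < exp(-delta_u / kT) >_reference.
    """
    if not delta_u:
        raise ValueError("delta_u must contain at least one sample")
    scaled = [-du / kT for du in delta_u]
    # Log-sum-exp form keeps the exponential average numerically stable.
    m = max(scaled)
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -kT * log_mean


if __name__ == "__main__":
    print(zwanzig_correction([0.0, 2.0]))
```

Because the average is dominated by the few snapshots with the lowest energy differences, the estimate becomes noisier as those differences grow, which is consistent with the scaling difficulty the abstract reports when many atoms change resolution.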


Subjects
Calcium/metabolism , Cations, Divalent/metabolism , Magnesium/metabolism , Metalloproteins/metabolism , Animals , Calcium/chemistry , Cations, Divalent/chemistry , Databases, Protein , Humans , Magnesium/chemistry , Metalloproteins/chemistry , Molecular Docking Simulation , Molecular Dynamics Simulation , Protein Binding , Software , Static Electricity , Thermodynamics
20.
Food Chem Toxicol ; 132: 110656, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31279045

ABSTRACT

Part of the allergenicity assessment of newly expressed proteins in genetically engineered food crops involves an assessment of potential cross-reactivity with known allergens. Bioinformatic approaches are used to evaluate the amino acid sequence identity or similarity between newly expressed proteins and the sequences of known allergens. To be useful, such approaches must be sensitive to detecting cross-reactive potential, but also capable of excluding low-risk sequences. One difficulty in comparing the effectiveness of different bioinformatic approaches has been the lack of a standardized validation and evaluation method. Here, we propose a standardized method for evaluating the sensitivity of different bioinformatic algorithms using a comprehensive database of known allergen sequences. We combine this with a previously described method for evaluating selectivity using sequences from a crop not known to commonly cause food allergy (e.g. maize) to compare the standard ">35% identity-criterion over sliding-window of ≥80 amino acids" bioinformatic approach with the previously described "one-to-one (1:1) FASTA" similarity approach using an E-value threshold of 1E-9. Results confirm the superiority of the 1:1 FASTA approach for selectively detecting cross-reactive allergens. The validation methods described here can be applied to other algorithms to select even better fit-for-purpose approaches for evaluating cross-reactive risk.
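
The ">35% identity over a sliding window of 80 amino acids" criterion can be illustrated with a deliberately simplified sketch. It compares ungapped windows only, whereas assessments of this kind normally rely on gapped alignments from tools such as FASTA, so it should be read as a demonstration of the threshold logic and not as the evaluated method. The function names and the handling of sequences shorter than the window are assumptions.

```python
def window_identity(a, b):
    """Fraction of identical positions between two equal-length segments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exceeds_identity_criterion(query, allergen, window=80, cutoff=0.35):
    """True if any ungapped window pair exceeds the identity cutoff.

    Simplification: no gaps are allowed, and sequences shorter than the
    window are reported as not exceeding the criterion (an assumption).
    """
    if len(query) < window or len(allergen) < window:
        return False
    for i in range(len(query) - window + 1):
        segment = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            if window_identity(segment, allergen[j:j + window]) > cutoff:
                return True
    return False


if __name__ == "__main__":
    query = "A" * 80
    print(exceeds_identity_criterion(query, "A" * 29 + "G" * 51))
    print(exceeds_identity_criterion(query, "A" * 20 + "G" * 60))
```

The nested loop is quadratic in sequence length, which is acceptable for a demonstration but not for screening against a full allergen database.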


Subjects
Allergens/chemistry , Computational Biology/standards , Plant Proteins/chemistry , Algorithms , Allergens/immunology , Amino Acid Sequence , Cross Reactions/immunology , Databases, Protein/statistics & numerical data , Plant Proteins/immunology , Zea mays/chemistry