Results 1 - 20 of 4,611
1.
Phys Chem Chem Phys ; 22(9): 5057-5069, 2020 Mar 07.
Article in English | MEDLINE | ID: mdl-32073000

ABSTRACT

Graph theory-based reaction pathway searches (ACE-Reaction program) and density functional theory calculations were performed to shed light on the mechanisms for the production of [an + H]+, xn+, yn+, zn+, and [yn + 2H]+ fragments formed in free radical-initiated peptide sequencing (FRIPS) mass spectrometry measurements of a small model system of glycine-glycine-arginine (GGR). In particular, the graph theory-based searches, which are rarely applied to gas-phase reaction studies, allowed us to investigate reaction mechanisms in an exhaustive manner without resorting to chemical intuition. As expected, radical-driven reaction pathways were favorable over charge-driven reaction pathways in terms of kinetics and thermodynamics. Charge- and radical-driven pathways for the formation of [yn + 2H]+ fragments were carefully compared, and it was revealed that the [yn + 2H]+ fragments observed in our FRIPS MS spectra originated from the radical-driven pathway, which is in contrast to the general expectation. The acquired understanding of the FRIPS fragmentation mechanism is expected to aid in the interpretation of FRIPS MS spectra. It should be emphasized that graph theory-based searches are powerful and effective methods for studying reaction mechanisms, including gas-phase reactions in mass spectrometry.


Subjects
Density Functional Theory ; Free Radicals/chemistry ; Oligopeptides/chemistry ; Sequence Analysis, Protein/methods ; Amino Acid Sequence ; Cyclic N-Oxides/chemistry ; Gases/chemistry ; Kinetics ; Mass Spectrometry ; Molecular Dynamics Simulation ; Thermodynamics
2.
Genome Biol ; 20(1): 279, 2019 Dec 16.
Article in English | MEDLINE | ID: mdl-31842968

ABSTRACT

Identifying the functional elements of a protein of interest is important for achieving a mechanistic understanding. However, it remains cumbersome to assess the functional significance of each and every amino acid of a given protein. Here, we report a strategy, PArsing fragmented DNA Sequences from CRISPR Tiling MUtagenesis Screening (PASTMUS), which provides a streamlined workflow and a bioinformatics pipeline to identify critical amino acids of proteins in their native biological contexts. Using this approach, we map six proteins (three bacterial toxin receptors and three cancer drug targets) and acquire their corresponding functional maps at amino acid resolution.


Subjects
Sequence Analysis, Protein/methods ; Amino Acid Sequence ; Cell Cycle Proteins/chemistry ; Heparin-binding EGF-like Growth Factor/chemistry ; Humans ; Protein Serine-Threonine Kinases/chemistry ; Proto-Oncogene Proteins/chemistry ; Structure-Activity Relationship
3.
Int J Mol Sci ; 20(20), 2019 Oct 12.
Article in English | MEDLINE | ID: mdl-31614716

ABSTRACT

Scientists have to perform multiple experiments producing qualitative and quantitative data to determine whether a compound is able to bind to a given target. Because the potential ligand chemical space is so diverse, experimentally exploring many compounds on a target rapidly becomes out of reach. Scientists therefore need to use virtual screening methods to determine the putative binding mode of ligands on a protein and then post-process the raw docking experiments with a dedicated scoring function in relation to experimental data. Two of the major difficulties in comparing docking predictions with experiments are the lack of transferability of experimental data and the lack of standardisation in molecule names. Although large portals like PubChem or ChEMBL are available for general purposes, there is no service allowing a formal expert annotation of both experimental data and docking studies. To address these issues, researchers build their own collections of data in flat files, often in spreadsheets, with limited possibilities for extensive annotation or for standardised ligand descriptions that would allow cross-database retrieval. We have conceived the dockNmine platform to provide a service allowing expert, authenticated annotation of ligands and targets. First, this portal allows a scientist to incorporate controlled information in the database using reference identifiers for the protein (Uniprot ID) and the ligand (SMILES description), together with the data and the publication associated with it. Second, it allows the incorporation of docking experiments using forms that automatically parse useful parameters and results. Last, the web interface provides many pre-computed outputs to assess the degree of correlation between docking experiments and experimental data.


Subjects
Drug Discovery/methods ; Sequence Analysis, Protein/methods ; Software ; Animals ; Binding Sites ; Humans ; Ligands ; Protein Binding ; Quantitative Structure-Activity Relationship
4.
PLoS Comput Biol ; 15(10): e1007411, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31622328

ABSTRACT

Accurate prediction of atomic-level protein structure is important for annotating the biological functions of protein molecules and for designing new compounds to regulate the functions. Template-based modeling (TBM), which aims to construct structural models by copying and refining the structural frameworks of other known proteins, remains the most accurate method for protein structure prediction. Due to the difficulty in recognizing distant-homology templates, however, the accuracy of TBM decreases rapidly when the evolutionary relationship between the query and template vanishes. In this study, we propose a new method, CEthreader, which first predicts residue-residue contacts by coupling evolutionary precision matrices with deep residual convolutional neural-networks. The predicted contact maps are then integrated with sequence profile alignments to recognize structural templates from the PDB. The method was tested on two independent benchmark sets consisting collectively of 1,153 non-homologous protein targets, where CEthreader detected 176% or 36% more correct templates with a TM-score >0.5 than the best state-of-the-art profile- or contact-based threading methods, respectively, for the Hard targets that lacked homologous templates. Moreover, CEthreader was able to identify 114% or 20% more correct templates with the same Fold as the query, after excluding structures from the same SCOPe Superfamily, than the best profile- or contact-based threading methods. Detailed analyses show that the major advantage of CEthreader lies in the efficient coupling of contact maps with profile alignments, which helps recognize global fold of protein structures when the homologous relationship between the query and template is weak. These results demonstrate an efficient new strategy to combine ab initio contact map prediction with profile alignments to significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
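
The TM-score threshold of 0.5 quoted above is easier to read with the standard definition in hand. The following is a minimal sketch, not CEthreader code: it assumes the distances between aligned residue pairs after superposition are already available, the helper names `tm_d0` and `tm_score` are hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption made here so the function stays defined. A full TM-score calculation also searches over superpositions to maximise the sum, which this sketch leaves out.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    if target_length <= 15:
        # Assumption: fall back to a small floor where the cube-root formula is undefined.
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Assumption: clamp to the same floor for short chains.
    return max(0.5, d0)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distance (angstroms) for each aligned residue pair.
    target_length: length of the query protein, used for normalisation.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes a value in (0, 1]; unaligned residues contribute 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

With every residue aligned at zero distance the score is 1, and when every aligned pair sits exactly at d0 the score is 0.5, which gives some feel for the cutoff used in the benchmark.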


Subjects
Nerve Net/physiology ; Sequence Analysis, Protein/methods ; Structural Homology, Protein ; Algorithms ; Amino Acid Sequence ; Computational Biology/methods ; Databases, Protein ; Models, Biological ; Protein Conformation ; Proteins/chemistry ; Sequence Alignment ; Software
5.
BMC Bioinformatics ; 20(1): 473, 2019 Sep 14.
Article in English | MEDLINE | ID: mdl-31521110

ABSTRACT

BACKGROUND: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor 4 and HHblits by a factor 2 over the previous version 2.0.16. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite . CONCLUSION: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.
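
For readers unfamiliar with the Viterbi algorithm named above, the following is a minimal sketch of the textbook dynamic programme on a toy two-state HMM. It is not HH-suite code: HH-suite aligns two profile HMMs against each other and vectorizes the recursion with SIMD, whereas this example decodes a single sequence against one small model in plain Python. The state names, the three-letter alphabet, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its probability (log-space DP)."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialise with the first observation.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []

    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this position.
            best_prev = max(states, key=lambda p: score[p] + log(trans_p[p][s]))
            new_score[s] = (score[best_prev] + log(trans_p[best_prev][s])
                            + log(emit_p[s][obs]))
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(score[last])


# Hypothetical toy model: a "match" state and an "insert" state emitting L, G or P.
STATES = ["match", "insert"]
START_P = {"match": 0.6, "insert": 0.4}
TRANS_P = {"match": {"match": 0.7, "insert": 0.3},
           "insert": {"match": 0.4, "insert": 0.6}}
EMIT_P = {"match": {"L": 0.5, "G": 0.4, "P": 0.1},
          "insert": {"L": 0.1, "G": 0.3, "P": 0.6}}

best_path, best_prob = viterbi("LGP", STATES, START_P, TRANS_P, EMIT_P)
```

The inner loop over states is the part that a SIMD implementation would process several cells at a time; the recursion itself is unchanged.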


Subjects
Molecular Sequence Annotation/methods ; Sequence Alignment/methods ; Sequence Analysis, Protein/methods ; Software ; Algorithms ; Markov Chains
6.
PLoS Comput Biol ; 15(9): e1006909, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31479443

ABSTRACT

Proteases are multifunctional, promiscuous enzymes that degrade proteins as well as peptides and drive important processes in health and disease. Current technology has enabled the construction of libraries of peptide substrates that detect protease activity, which provides valuable biological information. An ideal library would be orthogonal, such that each protease only hydrolyzes one unique substrate, however this is impractical due to off-target promiscuity (i.e., one protease targets multiple different substrates). Therefore, when a library of probes is exposed to a cocktail of proteases, each protease activates multiple probes, producing a convoluted signature. Computational methods for parsing these signatures to estimate individual protease activities primarily use an extensive collection of all possible protease-substrate combinations, which require impractical amounts of training data when expanding to search for more candidate substrates. Here we provide a computational method for estimating protease activities efficiently by reducing the number of substrates and clustering proteases with similar cleavage activities into families. We envision that this method will be used to extract meaningful diagnostic information from biological samples.


Subjects
Computational Biology/methods ; Peptide Hydrolases ; Sequence Analysis, Protein/methods ; Substrate Specificity/physiology ; Cluster Analysis ; Humans ; Kinetics ; Models, Molecular ; Peptide Hydrolases/analysis ; Peptide Hydrolases/chemistry ; Peptide Hydrolases/classification ; Peptide Hydrolases/metabolism ; Peptides/analysis ; Peptides/chemistry ; Peptides/metabolism ; Recombinant Proteins/chemistry ; Recombinant Proteins/metabolism
7.
Int J Mol Sci ; 20(13), 2019 Jun 28.
Article in English | MEDLINE | ID: mdl-31261733

ABSTRACT

Discovering conserved three-dimensional (3D) patterns among protein structures may provide valuable insights into protein classification, functional annotation or the rational design of multi-target drugs. Thus, several computational tools have been developed to discover and compare protein 3D-patterns. However, most of them only consider previously known 3D-patterns such as orthosteric binding sites or structural motifs. This makes it necessary to develop new methods for identifying all possible 3D-patterns that exist in protein structures (allosteric sites, enzyme-cofactor interaction motifs, among others). In this work, we present 3D-PP, a new free-access web server for the discovery and recognition of all similar 3D amino acid patterns among a set of protein structures (independent of their sequence similarity). This new tool does not require any previous structural knowledge about ligands, and all data are organized in a high-performance graph database. The input can be a text file with the PDB access codes or a zip file of PDB coordinates, regardless of the origin of the structural data: X-ray crystallographic experiments or in silico homology modeling. The results are presented as lists of sequence patterns that can be further analyzed within the web page. We tested the accuracy and suitability of 3D-PP using two sets of proteins from the Protein Data Bank: (a) zinc finger-containing proteins and (b) serotonin target proteins. We also evaluated its usefulness for the discovery of new 3D-patterns, using a set of protein structures obtained by in silico homology modeling, all of which are overexpressed in different types of cancer. Results indicate that 3D-PP is a reliable, flexible and user-friendly tool to identify conserved structural motifs, which could be relevant to improving knowledge about protein function or classification. The web server can be freely used at https://appsbio.utalca.cl/3d-pp/.


Subjects
Conserved Sequence ; Sequence Analysis, Protein/methods ; Software ; Allosteric Site ; Amino Acid Sequence ; Animals ; Humans ; Protein Conformation
8.
Talanta ; 204: 367-371, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31357307

ABSTRACT

A rapid and efficient method to isolate global N-termini is presented. Utilizing laser-assisted proteolysis and Fe3O4 microspheres, protein N-termini could be isolated in 4 h. The amino-blocked protein was digested by trypsin assisted by laser radiation, shortening the digestion time from overnight to 40 s. Non-N-terminal peptides were characterized by a free amino group at their tryptic N-terminus, which could be efficiently derivatized with a sulfhydryl group by Traut's reagent and then coupled to Fe3O4 microspheres nearly completely in less than 4 h. The rapid method was beneficial for the identification of unstable N-termini in short-lived proteins. Human serum albumin was studied as a model. The N-terminus was successfully isolated from the digest within 4 h. Also, 2011 N-terminal peptides from 936 proteins in a mouse liver proteome sample were identified using liquid chromatography-tandem mass spectrometry (LC-MS/MS). This method was demonstrated to be a facile and efficient N-termini enrichment method for targeted protein N-termini analysis, especially for proteins with a short half-life.


Subjects
Ferrosoferric Oxide/chemistry ; Peptides/analysis ; Protein Domains ; Proteome/chemistry ; Sequence Analysis, Protein/methods ; Amino Acid Sequence ; Animals ; Chromatography, Liquid/methods ; Ferrosoferric Oxide/chemical synthesis ; Humans ; Infrared Rays ; Lasers ; Liver/chemistry ; Mice ; Microspheres ; Peptides/chemistry ; Proteolysis/radiation effects ; Proteome/radiation effects ; Proteomics/methods ; Serum Albumin, Human/chemistry ; Serum Albumin, Human/radiation effects ; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods ; Tandem Mass Spectrometry/methods
9.
Medicina (Kaunas) ; 55(8), 2019 Jul 26.
Article in English | MEDLINE | ID: mdl-31357502

ABSTRACT

Background and Objectives: Defects in the CLDN16 gene are a cause of primary hypomagnesemia (FHHNC), which is characterized by massive renal magnesium wasting, resulting in nephrocalcinosis and renal failure. The mutations occur throughout the gene's coding region and can impact on intracellular trafficking of the protein or its paracellular pore forming function. To gain more understanding about the mechanisms by which CLDN16 mutations can induce FHHNC, we performed an in-depth computational analysis of the CLDN16 gene and protein, focusing specifically on the prediction of the latter's subcellular localization. Materials and Methods: The complete nucleotide or amino acid sequence of CLDN16 in FASTA format was entered and processed in 14 databases. Results: One CpG island was identified. Twenty-five promoters/enhancers were predicted. The CLDN16 interactome was found to consist of 20 genes, mainly involved in kidney diseases. No signal peptide cleavage site was identified. A probability of export to mitochondria equal to 0.9740 and a cleavable mitochondrial localization signal in the N-terminus of the CLDN16 protein were predicted. The secondary structure prediction was visualized. No phosphorylation sites were identified within the CLDN16 protein region by applying DISPHOS to the functional class of transport. The KnotProt database did not predict any knot or slipknot in the protein structure of CLDN16. Seven putative miRNA binding sites within the 3'-UTR region of CLDN16 were identified. Conclusions: This is the first study to identify mitochondria as a probable cytoplasmic compartment for CLDN16 localization, thus providing new insights into the protein's intracellular transport. The results relative to the CLDN16 interactome underline its role in renal pathophysiology and highlight the functional dependence of CLDNs-10, 14, 16, 19. The predictions pertaining to the miRNAs, promoters/enhancers and CpG islands of the CLDN16 gene indicate a strict regulation of its expression both transcriptionally and post-transcriptionally.


Subjects
Claudins/analysis ; Claudins/genetics ; Computational Biology/methods ; Mitochondria/genetics ; Humans ; Hypercalciuria/genetics ; Nephrocalcinosis/genetics ; Promoter Regions, Genetic/genetics ; Renal Tubular Transport, Inborn Errors/genetics ; Sequence Analysis, Protein/methods
10.
PLoS One ; 14(6): e0218717, 2019.
Article in English | MEDLINE | ID: mdl-31233538

ABSTRACT

The diversity of antibody variable regions makes cDNA sequencing challenging, and conventional monoclonal antibody cDNA amplification requires the use of degenerate primers. Here, we describe a simplified workflow for amplification of IgG antibody variable regions from hybridoma RNA by a specialized RT-PCR followed by Sanger sequencing. We perform three separate reactions for each hybridoma: one each for kappa, lambda, and heavy chain transcripts. We prime reverse transcription with a primer specific to the respective constant region and use a template-switch oligonucleotide, which creates a custom sequence at the 5' end of the antibody cDNA. This template-switching circumvents the issue of low sequence homology and the need for degenerate primers. Instead, subsequent PCR amplification of the antibody cDNA molecules requires only two primers: one primer specific for the template-switch oligonucleotide sequence and a nested primer to the respective constant region. We successfully sequenced the variable regions of five mouse monoclonal IgG antibodies using this method, which enabled us to design chimeric mouse/human antibody expression plasmids for recombinant antibody production in mammalian cell culture expression systems. All five recombinant antibodies bind their respective antigens with high affinity, confirming that the amino acid sequences determined by our method are correct and demonstrating the high success rate of our method. Furthermore, we also designed RT-PCR primers and amplified the variable regions from RNA of cells transfected with chimeric mouse/human antibody expression plasmids, showing that our approach is also applicable to IgG antibodies of human origin. Our monoclonal antibody sequencing method is highly accurate, user-friendly, and very cost-effective.


Subjects
Antibodies, Monoclonal/genetics ; Immunoglobulin Variable Region/genetics ; Sequence Analysis, Protein/methods ; Amino Acid Sequence ; Animals ; Antibodies, Monoclonal/metabolism ; Antigen-Antibody Reactions ; DNA Primers/genetics ; DNA, Complementary/genetics ; HEK293 Cells ; Humans ; Hybridomas/immunology ; Immunoglobulin G/genetics ; Mice ; Recombinant Fusion Proteins/genetics ; Recombinant Fusion Proteins/immunology ; Recombinant Fusion Proteins/metabolism ; Reverse Transcriptase Polymerase Chain Reaction/methods ; Workflow
11.
PLoS Comput Biol ; 15(6): e1007129, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31199797

ABSTRACT

Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to not be sufficiently informative to predict accurate DTIs. Thus, in this study, we propose a deep learning based DTI prediction model capturing local residue patterns of proteins participating in DTIs. When we employ a convolutional neural network (CNN) on raw protein sequences, we perform convolution on various lengths of amino acids subsequences to capture local residue patterns of generalized protein classes. We train our model with large-scale DTI information and demonstrate the performance of the proposed model using an independent dataset that is not seen during the training phase. As a result, our model performs better than previous protein descriptor-based models. Also, our model performs better than the recently developed deep learning models for massive prediction of DTIs. By examining pooled convolution results, we confirmed that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Our code is available at https://github.com/GIST-CSBL/DeepConv-DTI.


Subjects
Deep Learning ; Drug Discovery/methods ; Proteins ; Sequence Analysis, Protein/methods ; Amino Acid Sequence ; Binding Sites ; Computational Biology ; Computer Simulation ; Ligands ; Models, Molecular ; Proteins/chemistry ; Proteins/metabolism
12.
BMC Genomics ; 20(Suppl 5): 424, 2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31167665

ABSTRACT

BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report more spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than the (l,d)-motif model. This model is called the (l,d1,d2)-motif model (LDDMS) and is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. RESULTS: We did both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern-driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d1,d2) instances. CONCLUSIONS: We proposed the LDDMS model, which is more practically relevant. We also proposed three exact efficient algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS).
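
As background to the (l,d)-motif model discussed above, the following is a minimal sketch of the brute-force, pattern-driven search that the abstract uses as its cost baseline: enumerate every l-mer over the alphabet and keep those that occur in each input sequence within Hamming distance d. It does not implement the (l,d1,d2) generalisation or any of the LDDMS algorithms, whose exact definitions are not given here. The function names and the example sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(candidate, sequence):
    """Smallest Hamming distance between candidate and any l-mer of sequence."""
    l = len(candidate)
    if len(sequence) < l:
        raise ValueError("sequence shorter than motif length")
    return min(hamming(candidate, sequence[i:i + l])
               for i in range(len(sequence) - l + 1))


def ld_motifs(sequences, l, d, alphabet="ACGT"):
    """All l-mers that occur in every sequence with at most d mismatches."""
    motifs = []
    # Pattern-driven enumeration: len(alphabet) ** l candidates, so only viable for small l.
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(min_distance(candidate, s) <= d for s in sequences):
            motifs.append(candidate)
    return motifs


# Hypothetical toy input: ACG is the only 3-mer shared exactly by all three sequences.
EXAMPLE_SEQUENCES = ["ACGTAC", "TTACGA", "GACGTT"]
exact_motifs = ld_motifs(EXAMPLE_SEQUENCES, l=3, d=0)
```

The exponential candidate count in `ld_motifs` is the cost that exact algorithms for this family of problems try to avoid.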


Subjects
Algorithms ; Amino Acid Motifs ; Nucleotide Motifs ; Computational Biology ; Humans ; Models, Theoretical ; Sequence Analysis, DNA/methods ; Sequence Analysis, Protein/methods
13.
J Am Soc Mass Spectrom ; 30(10): 1914-1922, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31250319

ABSTRACT

A strategy to sequence lysine-containing cyclic peptides by MSn is presented. Doubly protonated cyclic peptide ions are transformed into gold(I)-cationized peptide ions via a cation-switching ion/ion reaction. Gold(I) cationization facilitates the oxidation of neutral lysine residues in the gas phase, weakening the adjacent amide bond. Upon activation, facile cleavage N-terminal to the oxidized lysine residue provides a site-specific ring opening pathway that converts cyclic peptides into acyclic analogs. The ensuing ion contains a cyclic imine as the new N-terminus and an oxazolone, or structural equivalent, as the new C-terminus. Product ions are formed from subsequent fragmentation events of the linearized peptide ion. Such an approach simplifies MS/MS data interpretation as a series of fragment ions with common N- and C-termini are generated. Results are presented for two cyclic peptides, sunflower trypsin inhibitor and the model cyclic peptide, β-Loop. The power of this strategy lies in the ability to generate the oxidized peptide, which is easily identified via the loss of HAuNH3 from [M + Au]+. While some competitive processes are observed, the site of ring opening can be pinpointed to the lysine residue upon MS4, enabling the unambiguous sequencing of cyclic peptides.


Subjects
Gold/chemistry ; Lysine/chemistry ; Peptides, Cyclic/chemistry ; Sequence Analysis, Protein/methods ; Amino Acid Sequence ; Cations/chemistry ; Peptides, Cyclic/analysis ; Tandem Mass Spectrometry/methods
14.
BMC Plant Biol ; 19(1): 210, 2019 May 21.
Article in English | MEDLINE | ID: mdl-31113367

ABSTRACT

BACKGROUND: Taxus cuspidata is well known worldwide for its ability to produce Taxol, one of the top-selling natural anticancer drugs. However, current Taxol production cannot match the increasing needs of the market, and novel strategies should be considered to increase the supply of Taxol. Since the biosynthetic mechanism of Taxol remains largely unknown, elucidating this pathway in detail will be very helpful in exploring alternative methods for Taxol production. RESULTS: Here, we sequenced Taxus cuspidata transcriptomes with next-generation sequencing (NGS) and third-generation sequencing (TGS) platforms. After correction with Illumina reads and removal of redundant reads, more than 180,000 nonredundant transcripts were generated from the raw Iso-Seq data. Using Cogent software and an alignment-based method, we identified a total of 139 cytochrome P450s (CYP450s), 31 BAHD acyltransferases (ACTs) and 1940 transcription factors (TFs). Based on phylogenetic and coexpression analysis, we identified 9 CYP450s and 7 BAHD ACTs as potential lead candidates for Taxol biosynthesis and 6 TFs that are possibly involved in the regulation of this process. Using coexpression analysis of genes known to be involved in Taxol biosynthesis, we elucidated the stem biosynthetic pathway. In addition, we analyzed the expression patterns of 12 characterized genes in the Taxol pathway and speculated that the isoprene precursors for Taxol biosynthesis were mainly synthesized via the MEP pathway. In addition, we found and confirmed that the alternative splicing patterns of some genes varied in different tissues, which may be an important tissue-specific method of posttranscriptional regulation. CONCLUSIONS: A strategy was developed to generate corrected full-length or nearly full-length transcripts without assembly to ensure sequence accuracy, thus greatly improving the reliability of coexpression and phylogenetic analysis and greatly facilitating gene cloning and characterization. This strategy was successfully utilized to elucidate the Taxol biosynthetic pathway, which will greatly contribute to the goals of improving the Taxol content in Taxus spp. using molecular breeding or plant management strategies and synthesizing Taxol in microorganisms using synthetic biological technology.


Subjects
Gene Expression Regulation, Plant ; High-Throughput Nucleotide Sequencing/methods ; Paclitaxel/biosynthesis ; Sequence Analysis, Protein/methods ; Taxus/genetics ; Transcriptome ; Amino Acid Sequence ; Phylogeny ; Reproducibility of Results ; Sequence Alignment ; Taxus/metabolism
16.
BMC Bioinformatics ; 20(1): 205, 2019 Apr 23.
Article in English | MEDLINE | ID: mdl-31014229

ABSTRACT

BACKGROUND: Sub-nuclear structures or locations are associated with various nuclear processes. Proteins localized in these substructures are important for understanding the interior nuclear mechanisms. Despite advances in high-throughput methods, experimental protein annotations remain limited. Predictions of cellular compartments have become very accurate, largely at the expense of leaving out substructures inside the nucleus, making a fine-grained analysis impossible. RESULTS: Here, we present a new method (LocNuclei) that predicts nuclear substructures from sequence alone. LocNuclei uses a string-based Profile Kernel with Support Vector Machines (SVMs). It distinguishes sub-nuclear localization in 13 distinct substructures and distinguishes between nuclear proteins confined to the nucleus and those that are also native to other compartments (traveler proteins). High performance was achieved by implicitly leveraging a large biological knowledge-base in creating predictions by homology-based inference through BLAST. Using this approach, the performance reached AUC = 0.70-0.74 and Q13 = 59-65%. Traveler proteins (nucleus and other) were identified at Q2 = 70-74%. A Gene Ontology (GO) analysis of the enrichment of biological processes revealed that the predicted sub-nuclear compartments matched the expected functionality. Analysis of protein-protein interactions (PPI) shows that the formation of compartments and the functionality of proteins in these compartments rely heavily on interactions between proteins. This suggests that the LocNuclei predictions carry important information about function. The source code and data sets are available through GitHub: https://github.com/Rostlab/LocNuclei . CONCLUSIONS: LocNuclei predicts subnuclear compartments and traveler proteins accurately. These predictions carry important information about functionality and PPIs.


Subjects
Cell Nucleus/chemistry ; Computational Biology/methods ; Nuclear Proteins ; Sequence Analysis, Protein/methods ; Nuclear Proteins/chemistry ; Nuclear Proteins/classification ; Nuclear Proteins/physiology ; Proteins/chemistry ; Proteins/classification ; Proteins/physiology ; Support Vector Machine
17.
Biopolymers ; 110(8): e23282, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30977898

ABSTRACT

How to characterize short protein sequences to make an effective connection to their functions is an unsolved problem. Here we propose to map the physicochemical properties of each amino acid onto unit spheres so that each protein sequence can be represented quantitatively. We demonstrate the usefulness of this representation by applying it to the prediction of cell penetrating peptides. We show that its combination with traditional composition features yields the best performance across different datasets, among several methods compared. For the convenience of users, a web server has been established for automatic calculations of the proposed features at http://biophy.dzu.edu.cn/SNumD/.


Subjects
Algorithms ; Proteins/chemistry ; Amino Acid Sequence ; Sequence Analysis, Protein/methods ; User-Computer Interface
18.
PLoS One ; 14(3): e0212868, 2019.
Article in English | MEDLINE | ID: mdl-30921350

ABSTRACT

We propose and theoretically study an approach to massively parallel single molecule peptide sequencing, based on single molecule measurement of the kinetics of probe binding (Havranek, et al., 2013) to the N-termini of immobilized peptides. Unlike previous proposals, this method is robust to both weak and non-specific probe-target affinities, which we demonstrate by applying the method to a range of randomized affinity matrices consisting of relatively low-quality binders. This suggests a novel principle for proteomic measurement whereby highly non-optimized sets of low-affinity binders could be applicable for protein sequencing, thus shifting the burden of amino acid identification from biomolecular design to readout. Measurement of probe occupancy times, or of time-averaged fluorescence, should allow high-accuracy determination of N-terminal amino acid identity for realistic probe sets. The time-averaged fluorescence method scales well to weakly-binding probes with dissociation constants of tens or hundreds of micromolar, and bypasses photobleaching limitations associated with other fluorescence-based approaches to protein sequencing. We argue that this method could lead to an approach with single amino acid resolution and the ability to distinguish many canonical and modified amino acids, even using highly non-optimized probe sets. This readout method should expand the design space for single molecule peptide sequencing by removing constraints on the properties of the fluorescent binding probes.
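
The idea of shifting amino acid identification from probe design to readout can be illustrated with a small calculation. The following is a minimal sketch under stated assumptions, not the authors' estimator: it assumes simple one-site equilibrium binding, so that the time-averaged occupancy of a probe at concentration c is c / (c + Kd), and it picks the residue whose predicted occupancy profile across the probe set is closest (least squares) to the measured one. The affinity table, the probe concentration, and the function names are hypothetical.

```python
def occupancy(concentration_um, kd_um):
    """Equilibrium fraction of time a probe is bound (one-site binding)."""
    return concentration_um / (concentration_um + kd_um)


def identify_residue(measured, kd_table, concentration_um):
    """Return the residue whose predicted occupancies best match the measurement.

    measured: observed time-averaged occupancy for each probe, in probe order.
    kd_table: residue -> list of dissociation constants (micromolar), one per probe.
    """

    def squared_error(residue):
        return sum((occupancy(concentration_um, kd) - m) ** 2
                   for kd, m in zip(kd_table[residue], measured))

    return min(kd_table, key=squared_error)


# Hypothetical weak, non-specific affinities (micromolar) for three probes.
KD_TABLE_UM = {
    "A": [50.0, 500.0, 200.0],
    "G": [400.0, 80.0, 300.0],
    "K": [150.0, 150.0, 40.0],
}
PROBE_CONCENTRATION_UM = 100.0

# Noise-free occupancy profile that a glycine N-terminus would produce.
profile_g = [occupancy(PROBE_CONCENTRATION_UM, kd) for kd in KD_TABLE_UM["G"]]
called = identify_residue(profile_g, KD_TABLE_UM, PROBE_CONCENTRATION_UM)
```

No single probe in this toy table is specific, yet the combined profile still separates the residues, which is the intuition behind using sets of low-affinity binders.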


Subjects
Proteomics/methods , Protein Sequence Analysis/methods , Single Molecule Imaging , Amino Acid Sequence , Fluorescence , Fluorescent Dyes/chemistry , Kinetics , Peptides/chemistry , Protein Binding
19.
MAbs ; 11(4): 767-778, 2019.
Article in English | MEDLINE | ID: mdl-30919719

ABSTRACT

Growth in the pharmaceutical industry has led to an increasing demand for rapid characterization of therapeutic monoclonal antibodies. The current methods for antibody sequence confirmation (e.g., N-terminal Edman sequencing and traditional peptide mapping methods) are not sufficient; thus, we developed a fast method for sequencing recombinant monoclonal antibodies using a novel digestion-on-emitter technology. Using this method, a monoclonal antibody can be denatured, reduced, digested, and sequenced in less than an hour. High throughput and satisfactory protein sequence coverage were achieved by using a non-specific protease from Aspergillus saitoi, protease XIII, to digest the denatured and reduced monoclonal antibody on an electrospray emitter, while electrospray high voltage was applied to the digestion mixture through the emitter. Tandem mass spectrometry data was acquired over the course of enzyme digestion, generating similar information compared to standard peptide mapping experiments in much less time. We demonstrated that this fast protein sequencing method provided sufficient sequence information for bovine serum albumin and two commercially available monoclonal antibodies, mouse IgG1 MOPC21 and humanized IgG1 NISTmAb. For two monoclonal antibodies, we obtained sequence coverage of 90.5-95.1% for the heavy chains and 98.6-99.1% for the light chains. We found that on-emitter digestion by protease XIII generated peptides of various lengths during the digestion process, which was critical for achieving sufficient sequence coverage. Moreover, we discovered that the enzyme-to-substrate ratio was an important parameter that affects protein sequence coverage. Due to its highly automatable and efficient design, our method offers a major advantage over N-terminal Edman sequencing and traditional peptide mapping methods in the identification of protein sequence, and is capable of meeting an ever-increasing demand for monoclonal antibody sequence confirmation in the biopharmaceutical industry.


Subjects
Monoclonal Antibodies/chemistry , Aspartic Acid Endopeptidases/chemistry , Aspergillus/metabolism , Immunoglobulin G/chemistry , Protein Sequence Analysis/methods , Animals , Humans , Mice , Nanostructures/chemistry , Peptide Mapping , Proteolysis , Electrospray Ionization Mass Spectrometry
20.
Mol Biol Rep ; 46(3): 3571-3596, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30915687

ABSTRACT

Life depends on specific, purposeful interactions between molecules. Such interactions make possible the various processes inside the cells and bodies of living organisms. Among all types of molecular interactions, DNA-protein interactions are of considerable importance. With the development of numerous experimental techniques, diverse methods are now available for recognizing and investigating such interactions. Traditional experimental techniques for identifying DNA-protein complexes are time-consuming and unsuitable for genome-scale studies; current high-throughput approaches determine such interactions more efficiently at large scale, but they are too costly to be practical for routine use. Hence, the availability of extensive biological sequence data, a clearer picture of the conditions under which such interactions form, and advances in computing, mathematics, and statistics have motivated scientists to develop bioinformatics tools for predicting interaction sites. Much progress has been made in this field. This review addresses the factors and conditions governing the interaction and the laboratory techniques for examining it. In addition, the bioinformatics tools developed for this purpose are introduced and compared and, finally, several suggestions are offered for improving the precision of such tools.


Subjects
Binding Sites/physiology , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Forecasting/methods , Animals , Binding Sites/genetics , Computational Biology/methods , DNA/genetics , DNA/metabolism , Data Analysis , Humans , Molecular Models , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods