Results 1 - 20 of 180
1.
Nucleic Acids Res ; 52(D1): D1062-D1071, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000392

ABSTRACT

The SysteMHC Atlas v1.0 was the first public repository dedicated to mass spectrometry-based immunopeptidomics. Here we introduce a newly released version, the SysteMHC Atlas v2.0 (https://systemhc.sjtu.edu.cn), a comprehensive collection of 7190 MS files from 303 allotypes. We extended and optimized a computational pipeline that allows the identification of MHC-bound peptides carrying unexpected post-translational modifications (PTMs), resulting in 471K modified peptides identified over 60 distinct PTM types. In total, we identified approximately 1.0 million and 1.1 million unique peptides for MHC class I and class II immunopeptidomes, respectively, representing a 6.8-fold increase and a 28-fold increase over those in v1.0. The SysteMHC Atlas v2.0 introduces several new features, including the inclusion of non-UniProt peptides and the incorporation of several novel computational tools for FDR estimation, binding affinity prediction and motif deconvolution. Additionally, we enhanced the user interface, upgraded the website framework, and provided external links to other related resources. Finally, we built and provided various spectral libraries as community resources for data mining and future immunopeptidomic and proteomic analysis. We believe that the SysteMHC Atlas v2.0 is a unique resource that provides key insights to the immunology and proteomics community and will accelerate the development of vaccines and immunotherapies.


Subject(s)
Databases, Protein; Peptides; Proteomics; Mass Spectrometry; Peptides/chemistry; Peptides/immunology; Protein Processing, Post-Translational; Proteomics/methods; Databases, Protein/standards; Internet; Humans; Animals
2.
Acta Crystallogr F Struct Biol Commun ; 77(Pt 7): 226-229, 2021 Jul 01.
Article in English | MEDLINE | ID: mdl-34196613

ABSTRACT

In macromolecular crystallography, paired refinement is generally accepted to be the optimal approach for the determination of the high-resolution cutoff. The software tool PAIREF provides automation of the protocol and associated analysis. Support for phenix.refine as a refinement engine has recently been implemented in the program. This feature is presented here using previously published data for thermolysin. The results demonstrate the importance of the complete cross-validation procedure to obtain a thorough and unbiased insight into the quality of high-resolution data.


Subject(s)
Crystallography, X-Ray/methods; Databases, Protein; Software; Crystallography, X-Ray/standards; Databases, Protein/standards; Software/standards
3.
Structure ; 29(4): 393-400.e1, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33657417

ABSTRACT

The Worldwide Protein Data Bank (wwPDB) has provided validation reports based on recommendations from community Validation Task Forces for structures in the PDB since 2013. To further enhance validation of small molecules as recommended by the 2016 Ligand Validation Workshop, wwPDB, Global Phasing Ltd., and the Noguchi Institute recently formed a public/private partnership to incorporate some of their software tools into the wwPDB validation package. Augmented wwPDB validation report features include: two-dimensional (2D) diagrams of small-molecule ligands and carbohydrates, highlighting geometric validation outcomes; 2D topological diagrams of oligosaccharides present in branched entities generated using 2D Symbol Nomenclature for Glycan representation; and views of 3D electron density maps for ligands and carbohydrates, illustrating the goodness-of-fit between the atomic structure and experimental data (X-ray crystallographic structures only). These improvements will impact confidence in ligand conformation and ligand-macromolecular interactions that will aid in understanding biochemical function and contribute to small-molecule drug discovery.


Subject(s)
Carbohydrates/chemistry; Databases, Protein/standards; Molecular Docking Simulation/methods; Proteomics/methods; Small Molecule Libraries/chemistry; Cheminformatics/methods; Databases, Chemical/standards; Humans; Ligands; Protein Binding; Proteome/chemistry; Proteome/metabolism
4.
Proteins ; 89(2): 242-250, 2021 02.
Article in English | MEDLINE | ID: mdl-32935893

ABSTRACT

A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.


Subject(s)
Databases, Protein/standards; Molecular Sequence Annotation/standards; Proteins/metabolism; Atlases as Topic; Cell Compartmentation; Cell Line; Eukaryotic Cells/metabolism; Eukaryotic Cells/ultrastructure; Humans; Observer Variation; Proteins/chemistry; Proteins/genetics; Reproducibility of Results; Uncertainty
5.
J Immunother Cancer ; 8(2)2020 10.
Article in English | MEDLINE | ID: mdl-33109630

ABSTRACT

BACKGROUND: Checkpoint targets play a key role in tumor-mediated immune escape and therefore are critical for cancer immunotherapy. Unfortunately, there is a lack of bioinformatics resources that compile all the checkpoint targets for translational research and drug discovery in immuno-oncology. METHODS: To this end, we developed the checkpoint therapeutic target database (CKTTD), the first comprehensive database for immune checkpoint targets (proteins, miRNAs and LncRNAs) and their modulators. A scoring system was adopted to filter more relevant targets with high confidence. In addition, a few biological databases such as Oncomine, Drugbank, miRBase and Lnc2Cancer were integrated into CKTTD to provide in-depth information. Moreover, we computed and provided ligand-binding site information for all the targets, which may support bench scientists in drug discovery efforts. RESULTS: In total, CKTTD compiles 105 checkpoint protein targets, 53 modulators (small molecules and antibodies), 30 miRNAs and 18 LncRNAs in cancer immunotherapy with validated experimental evidence curated from 10 649 publications via an enhanced text-mining system. CONCLUSIONS: In conclusion, the CKTTD may serve as a useful platform for the research of cancer immunotherapy and drug discovery. The CKTTD database is freely available to the public at http://www.ckttdb.org/.


Subject(s)
Databases, Protein/standards; Immunotherapy/methods; Humans
6.
BMC Bioinformatics ; 21(Suppl 13): 384, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938375

ABSTRACT

BACKGROUND: Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical to understanding the principles of protein-DNA interactions. There are already some computational methods that can accurately and efficiently predict a large number of hot residues. However, the insufficiency of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the employed features limit the performance of existing methods. RESULTS: Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (the abbreviation of Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which together consist of 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones. CONCLUSIONS: PreHots, which is based on a stacked ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with the existing methods, PreHots can achieve better prediction performance. Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/ .


Subject(s)
Algorithms; DNA-Binding Proteins/genetics; Databases, Protein/standards; Humans; Models, Molecular
7.
FEBS J ; 287(17): 3703-3718, 2020 09.
Article in English | MEDLINE | ID: mdl-32418327

ABSTRACT

A bright spot in the SARS-CoV-2 (CoV-2) coronavirus pandemic has been the immediate mobilization of the biomedical community, working to develop treatments and vaccines for COVID-19. Rational drug design against emerging threats depends on well-established methodology, mainly utilizing X-ray crystallography, to provide accurate structure models of the macromolecular drug targets and of their complexes with candidates for drug development. In the current crisis, the structural biological community has responded by presenting structure models of CoV-2 proteins and depositing them in the Protein Data Bank (PDB), usually without time embargo and before publication. Since the structures from the first-line research are produced in an accelerated mode, there is an elevated chance of mistakes and errors, with the ultimate risk of hindering, rather than speeding up, drug development. In the present work, we have used model-validation metrics and examined the electron density maps for the deposited models of CoV-2 proteins and a sample of related proteins available in the PDB as of April 1, 2020. We present these results with the aim of helping the biomedical community establish a better-validated pool of data. The proteins are divided into groups according to their structure and function. In most cases, no major corrections were necessary. However, in several cases significant revisions in the functionally sensitive area of protein-inhibitor complexes or for bound ions justified correction, re-refinement, and eventually reversioning in the PDB. The re-refined coordinate files and a tool for facilitating model comparisons are available at https://covid-19.bioreproducibility.org. DATABASE: Validated models of CoV-2 proteins are available in a dedicated, publicly accessible web service https://covid-19.bioreproducibility.org.


Subject(s)
Angiotensin-Converting Enzyme 2/chemistry; Antiviral Agents/chemistry; Coronavirus 3C Proteases/chemistry; Receptors, Virus/chemistry; SARS-CoV-2/chemistry; Spike Glycoprotein, Coronavirus/chemistry; Angiotensin-Converting Enzyme 2/antagonists & inhibitors; Angiotensin-Converting Enzyme 2/genetics; Angiotensin-Converting Enzyme 2/metabolism; Antiviral Agents/pharmacology; Binding Sites; COVID-19/virology; Coronavirus 3C Proteases/antagonists & inhibitors; Coronavirus 3C Proteases/genetics; Coronavirus 3C Proteases/metabolism; Cryoelectron Microscopy; Crystallography, X-Ray; Databases, Protein/standards; Drug Design; Humans; Ligands; Models, Molecular; Protease Inhibitors/chemistry; Protease Inhibitors/pharmacology; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Receptors, Virus/antagonists & inhibitors; Receptors, Virus/genetics; Receptors, Virus/metabolism; Spike Glycoprotein, Coronavirus/antagonists & inhibitors; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/metabolism; Thermodynamics
8.
FEBS J ; 287(13): 2685-2698, 2020 07.
Article in English | MEDLINE | ID: mdl-32311227

ABSTRACT

Crystallographic models of biological macromolecules have been ranked using the quality criteria associated with them in the Protein Data Bank (PDB). The outcomes of this quality analysis have been correlated with time and with the journals that published papers based on those models. The results show that the overall quality of PDB structures has substantially improved over the last ten years, but this period of progress was preceded by several years of stagnation or even depression. Moreover, the study shows that the historically observed negative correlation between journal impact and the quality of structural models presented therein seems to disappear as time progresses.


Subject(s)
Computational Biology/methods; Databases, Protein/standards; Macromolecular Substances/chemistry; Models, Molecular; Proteins/chemistry; Quality Control; Algorithms; Protein Conformation; Protein Domains
9.
Proteomics ; 20(10): e1900261, 2020 05.
Article in English | MEDLINE | ID: mdl-32249536

ABSTRACT

Proteogenomics is gaining momentum as, today, genomics, transcriptomics, and proteomics can be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA-seq-informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species for which the genome has not yet been sequenced. It relies on high-performance de novo RNA-seq assembly and optimized translation strategies. Here, several pre-treatments for Illumina RNA-seq reads before assembly are explored to translate the resulting contigs into useful polypeptide sequences. Experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans are used, and the most relevant procedure is defined by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score of less than 17 (which represents a single probable nucleotide error on 150-bp reads) prior to assembly increases the proteomics outcome. The best translation using Transdecoder is achieved with a minimal open reading frame length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.


Subject(s)
Proteogenomics/methods; Proteomics; RNA-Seq; Transcriptome/genetics; Amino Acid Sequence/genetics; Animals; Computational Biology; Databases, Protein/standards; Genome/genetics; Genomics/trends; Sequence Analysis, RNA
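
The read pre-treatment described in entry 9 (discarding reads whose mean quality score is below 17 before assembly) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy records, and the assumption that qualities are Phred scores encoded with an ASCII offset of 33 are all illustrative choices.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read (assumes offset-33 quality encoding)."""
    scores = [ord(ch) - offset for ch in quality_string]
    # An empty quality string is treated as quality 0 so it is always discarded.
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records, min_mean_quality=17):
    """Keep (name, sequence, quality) records with mean quality >= threshold."""
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_quality]


# Hypothetical toy reads: 'I' encodes Q40 and '#' encodes Q2 under offset 33.
reads = [
    ("read_1", "ACGTACGT", "IIIIIIII"),
    ("read_2", "ACGTACGT", "########"),
]
kept = filter_reads(reads)
print([name for name, _, _ in kept])
```

In practice a dedicated read-trimming tool would normally do this step; the sketch only shows what a mean-quality cutoff means for an individual read.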
10.
FEBS J ; 287(13): 2664-2684, 2020 07.
Article in English | MEDLINE | ID: mdl-31944606

ABSTRACT

Phosphatases play an essential role in the regulation of protein phosphorylation. Less abundant than kinases, many phosphatases are components of one or more macromolecular complexes with different substrate specificities and specific functionalities. The expert scientific curation of phosphatase complexes for the UniProt and Complex Portal databases supports the whole scientific community by collating and organising small- and large-scale experimental data from the scientific literature into context-specific central resources, where the data can be freely accessed and used to further academic and translational research. In this review, we discuss how the diverse biological functions of phosphatase complexes are presented in UniProt and the Complex Portal, and how understanding the biological significance of phosphatase complexes in Caenorhabditis elegans offers insight into the mechanisms of substrate diversity in a variety of cellular and molecular processes.


Subject(s)
Caenorhabditis elegans Proteins/metabolism; Caenorhabditis elegans/metabolism; Databases, Protein/standards; Multiprotein Complexes/metabolism; Phosphoric Monoester Hydrolases/metabolism; Protein Processing, Post-Translational; Animals; Caenorhabditis elegans Proteins/chemistry; Multiprotein Complexes/chemistry; Phosphoric Monoester Hydrolases/chemistry; Phosphorylation; Substrate Specificity
11.
BMC Bioinformatics ; 20(1): 228, 2019 May 06.
Article in English | MEDLINE | ID: mdl-31060495

ABSTRACT

BACKGROUND: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. RESULTS: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Unlike previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method in a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. CONCLUSION: The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline .


Subject(s)
Databases, Protein/standards; Phylogeny
12.
J Proteome Res ; 18(3): 1019-1031, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30652484

ABSTRACT

In the current study, we show how ProCan90, a curated data set of HEK293 technical replicates, can be used to optimize the configuration options for algorithms in the OpenSWATH pipeline. Furthermore, we use this case study as a proof of concept for horizontal scaling of such a pipeline to allow 45 810 computational analysis runs of OpenSWATH to be completed within four and a half days on a budget of US $10 000. Through the use of Amazon Web Services (AWS), we have successfully processed each of the ProCan 90 files with 506 combinations of input parameters. In total, the project consumed more than 340 000 core hours of compute and generated in excess of 26 TB of data. Using the resulting data and a set of quantitative metrics, we show an analysis pathway that allows the calculation of two optimal parameter sets, one for a compute rich environment (where run time is not a constraint), and another for a compute poor environment (where run time is optimized). For the same input files and the compute rich parameter set, we show a 29.8% improvement in the number of quality protein (>2 peptide) identifications found compared to the current OpenSWATH defaults, with negligible adverse effects on quantification reproducibility or drop in identification confidence, and a median run time of 75 min (103% increase). For the compute poor parameter set, we find a 55% improvement in the run time from the default parameter set, at the expense of a 3.4% decrease in the number of quality protein identifications, and an intensity CV decrease from 14.0% to 13.7%.


Subject(s)
Computational Biology/methods; Databases, Protein/standards; Datasets as Topic/standards; HEK293 Cells; Humans; Proteins/analysis; Proteomics/methods; Reproducibility of Results; Time Factors
13.
J Proteome Res ; 18(2): 585-593, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30560673

ABSTRACT

Decoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed data set analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, data sets, or databases. The average TDC (aTDC) protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.


Subject(s)
Databases, Protein/standards; Proteomics/methods; Software; Datasets as Topic; Humans; Models, Statistical; Reproducibility of Results
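
The idea behind target-decoy competition in entry 13 can be sketched in a few lines. This is a simplified illustration, not the aTDC protocol as published and not the Crux implementation: the function names, the tie-breaking rule, and the toy scores are assumptions, and the second function merely averages plain TDC estimates over several decoy sets to show why multiple decoys reduce variability.

```python
def tdc_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold by target-decoy competition.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    match scores for spectrum i (higher is better).
    """
    targets = decoys = 0
    for t, d in zip(target_scores, decoy_scores):
        # Competition: each spectrum keeps only its better-scoring match.
        # Ties go to the decoy, a conservative choice made for this sketch.
        best, is_decoy = (t, False) if t > d else (d, True)
        if best >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)


def averaged_fdr(target_scores, decoy_score_sets, threshold):
    """Naive average of TDC estimates over several shuffled decoy sets."""
    estimates = [tdc_fdr(target_scores, d, threshold) for d in decoy_score_sets]
    return sum(estimates) / len(estimates)


# Hypothetical scores for four spectra and two independently shuffled decoy sets.
targets = [9.0, 8.0, 7.0, 3.0]
decoys_a = [2.0, 1.0, 7.5, 4.0]
decoys_b = [1.0, 2.0, 3.0, 1.0]
print(tdc_fdr(targets, decoys_a, 5.0))                     # 0.5
print(tdc_fdr(targets, decoys_b, 5.0))                     # 0.0
print(averaged_fdr(targets, [decoys_a, decoys_b], 5.0))    # 0.25
```

The two decoy sets give different estimates for the same data (0.5 versus 0.0), which is the decoy-induced variability the abstract describes.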
14.
Autophagy ; 14(12): 2033-2034, 2018.
Article in English | MEDLINE | ID: mdl-30296899

ABSTRACT

I routinely see people use incorrect names for MAP1LC3/LC3 isoforms in scientific papers. In fact, it happens often enough that I decided to investigate the reason for the apparent confusion. It turns out that the sources of misinformation are abundant, including UniProt and antibody supplier web sites.


Subject(s)
Antibodies/classification; Microtubule-Associated Proteins/classification; Terminology as Topic; Autophagy-Related Proteins/chemistry; Autophagy-Related Proteins/immunology; Commerce/standards; Databases, Protein/classification; Databases, Protein/standards; Humans; Microtubule-Associated Proteins/chemistry; Microtubule-Associated Proteins/immunology; Protein Isoforms/classification; Protein Isoforms/immunology
15.
J Proteome Res ; 17(12): 4051-4060, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30270626

ABSTRACT

The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.


Subject(s)
Databases, Protein/standards; Peptide Library; Proteomics/methods; Animals; Humans; Tandem Mass Spectrometry/methods; Workflow
16.
Acta Crystallogr D Struct Biol ; 74(Pt 9): 939-945, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-30198902

ABSTRACT

The Protein Data Bank (PDB) constitutes a collection of the available atomic models of macromolecules and their complexes obtained by various methods used in structural biology, but chiefly by crystallography. It is an indispensable resource for all branches of science that deal with the structures of biologically active molecules, such as structural biology, bioinformatics, the design of novel drugs, etc. Since not all users of the PDB are familiar with the methods of crystallography, it is important to present the results of crystallographic analyses in a form that is easy for nonspecialists to interpret. It is advisable during the submission of structures to the PDB to pay attention to the optimal placement of molecules within the crystal unit cell, to the correct representation of oligomeric assemblies and to the proper selection of the space-group symmetry. Examples of significant departures from these principles illustrate the potential for the misinterpretation of such suboptimally presented crystal structures.


Subject(s)
Databases, Protein/standards; Protein Conformation; Proteins/chemistry; Crystallography, X-Ray; Humans; Models, Molecular
17.
Acta Crystallogr F Struct Biol Commun ; 74(Pt 8): 463-472, 2018 08 01.
Article in English | MEDLINE | ID: mdl-30084395

ABSTRACT

Glycosylation is one of the most common forms of protein post-translational modification, but is also the most complex. Dealing with glycoproteins in structure model building, refinement, validation and PDB deposition is more error-prone than dealing with nonglycosylated proteins owing to limitations of the experimental data and available software tools. Also, experimentalists are typically less experienced in dealing with carbohydrate residues than with amino-acid residues. The results of the reannotation and re-refinement by PDB-REDO of 8114 glycoprotein structure models from the Protein Data Bank are analyzed. The positive aspects of 3620 reannotations and subsequent refinement, as well as the remaining challenges to obtaining consistently high-quality carbohydrate models, are discussed.


Subject(s)
Databases, Protein/classification; Databases, Protein/standards; Glycoproteins/chemistry; Glycoproteins/classification
18.
Acta Crystallogr D Struct Biol ; 74(Pt 6): 531-544, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29872004

ABSTRACT

This article describes the implementation of real-space refinement in the phenix.real_space_refine program from the PHENIX suite. The use of a simplified refinement target function enables very fast calculation, which in turn makes it possible to identify optimal data-restraint weights as part of routine refinements with little runtime cost. Refinement of atomic models against low-resolution data benefits from the inclusion of as much additional information as is available. In addition to standard restraints on covalent geometry, phenix.real_space_refine makes use of extra information such as secondary-structure and rotamer-specific restraints, as well as restraints or constraints on internal molecular symmetry. The re-refinement of 385 cryo-EM-derived models available in the Protein Data Bank at resolutions of 6 Å or better shows significant improvement of the models and of the fit of these models to the target maps.


Subject(s)
Cryoelectron Microscopy/methods; Software; Animals; Computer Simulation; Crystallography/methods; Databases, Protein/standards; Humans; Macromolecular Substances/chemistry; Models, Molecular; TRPV Cation Channels/chemistry; Validation Studies as Topic
19.
BMC Bioinformatics ; 19(1): 204, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29859055

ABSTRACT

BACKGROUND: Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. RESULTS: In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the very recent Critical Assessment of Structure Prediction (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions. CONCLUSIONS: Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49 .


Subject(s)
Databases, Protein/standards; Proteins/chemistry; Sequence Analysis, Protein/methods; Humans
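
Entry 19 reports its results as a Matthews correlation coefficient (MCC) and a recall. For readers unfamiliar with these metrics, the sketch below computes both from a binary confusion matrix using their standard definitions. The counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def recall(tp, fn):
    """Fraction of true positives that were recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for residues predicted as functional-site members.
tp, tn, fp, fn = 8, 80, 2, 2
print(round(matthews_corrcoef(tp, tn, fp, fn), 3))
print(recall(tp, fn))
```

MCC uses all four cells of the confusion matrix, so it stays informative when functional-site residues are a small minority of all residues, a setting in which accuracy alone can be misleading.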
20.
Endocrinology ; 159(6): 2397-2407, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29718163

ABSTRACT

Nuclear receptors (NRs) are ligand-inducible transcription factors that play critical roles in metazoan development, reproduction, and physiology and therefore are implicated in a broad range of pathologies. The transcriptional activity of NRs critically depends on their interaction(s) with transcriptional coregulator proteins, including coactivators and corepressors. Short leucine-rich peptide motifs in these proteins (LxxLL in coactivators and LxxxIxxxL in corepressors) are essential and sufficient for NR binding. With 350 different coregulator proteins identified to date and with many coregulators containing multiple interaction motifs, an enormous combinatorial potential is present for selective NR-mediated gene regulation. However, NR-coregulator interactions have often been determined experimentally on a one-to-one basis across diverse experimental conditions. In addition, NR-coregulator interactions are difficult to predict because the molecular determinants that govern specificity are not well established. Therefore, many biologically and clinically relevant NR-coregulator interactions may remain to be discovered. Here, we present a comprehensive overview of 3696 NR-coregulator interactions by systematically characterizing the binding of 24 nuclear receptors with 154 coregulator peptides. We identified unique ligand-dependent NR-coregulator interaction profiles for each NR, confirming many well-established NR-coregulator interactions. Hierarchical clustering based on the NR-coregulator interaction profiles largely recapitulates the classification of NR subfamilies based on the primary amino acid sequences of the ligand-binding domains, indicating that amino acid sequence is an important, although not the only, molecular determinant in directing and fine-tuning NR-coregulator interactions. This NR-coregulator peptide interactome provides an open data resource for future biological and clinical discovery as well as NR-based drug design.


Subject(s)
Co-Repressor Proteins/genetics; Databases, Protein; Protein Interaction Mapping/methods; Receptors, Cytoplasmic and Nuclear/metabolism; Transcription Factors/genetics; Animals; Cluster Analysis; Co-Repressor Proteins/metabolism; Databases, Protein/standards; Databases, Protein/supply & distribution; Drug Design; Gene Expression Profiling; High-Throughput Screening Assays; Humans; Phylogeny; Protein Binding; Protein Domains; Receptors, Cytoplasmic and Nuclear/genetics; Transcription Factors/metabolism
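
The two peptide motifs named in entry 20 (LxxLL in coactivators and LxxxIxxxL in corepressors, where x is any residue) are simple enough to scan for with regular expressions. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and a pattern match says nothing about whether a peptide actually binds a receptor. The last line checks the abstract's arithmetic, 24 receptors times 154 peptides giving 3696 pairings.

```python
import re

# 'x' in the motif notation means any residue, written '.' in a regex.
MOTIFS = {
    "coactivator LxxLL": r"L..LL",
    "corepressor LxxxIxxxL": r"L...I...L",
}


def find_motifs(sequence):
    """Return (motif name, 0-based start, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start(), match.group(1)))
    return hits


# Hypothetical toy sequence containing one instance of each motif.
toy_sequence = "MKHKILHRLLQEGSAALEEAIRKALMD"
hits = find_motifs(toy_sequence)
for hit in hits:
    print(hit)

# Size of the screened interaction matrix stated in the abstract.
print(24 * 154)  # 3696
```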