Results 1 - 20 of 97
1.
Protein Sci ; 31(1): 92-106, 2022 01.
Article in English | MEDLINE | ID: mdl-34529321

ABSTRACT

The antimicrobial peptide database (APD) has served the antimicrobial peptide field for 18 years. Because it is widely used in research and education, this article documents database milestones and key events that have transformed it into the current form. A comparison is made for the APD peptide statistics between 2010 and 2020, validating the major database findings to date. We also describe new additions ranging from peptide entries to search functions. Of note, the APD also contains antimicrobial peptides from host microbiota, which are important in shaping immune systems and could be linked to a variety of human diseases. Finally, the database has been re-programmed to the web branding and latest security compliance of the University of Nebraska Medical Center. The reprogrammed APD can be accessed at https://aps.unmc.edu.


Subject(s)
Antimicrobial Peptides , Computational Biology , Databases, Protein , Antimicrobial Peptides/chemistry , Antimicrobial Peptides/genetics , Computational Biology/history , Computational Biology/trends , Databases, Protein/history , Databases, Protein/trends , History, 21st Century
2.
PLoS Biol ; 19(12): e3001464, 2021 12.
Article in English | MEDLINE | ID: mdl-34871295

ABSTRACT

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.


Subject(s)
Crowdsourcing/methods , Data Curation/methods , Molecular Sequence Annotation/methods , Amino Acid Sequence/genetics , Computational Biology/methods , Databases, Protein/trends , Humans , Literature , Proteins/metabolism , Stakeholder Participation
4.
Acta Crystallogr D Struct Biol ; 76(Pt 5): 400-405, 2020 May 01.
Article in English | MEDLINE | ID: mdl-32355036

ABSTRACT

The number of new X-ray crystallography-based submissions to the Protein Data Bank appears to be at the beginning of a decline, perhaps signalling an end to the era of the dominance of X-ray crystallography within structural biology. This letter, from the viewpoint of a young structural biologist, applies the Copernican method to the life expectancy of crystallography and asks whether the technique is still the mainstay of structural biology. A study of the rate of Protein Data Bank depositions allows a more nuanced analysis of the fortunes of macromolecular X-ray crystallography and shows that cryo-electron microscopy might now be outcompeting crystallography for new labour and talent, perhaps heralding a change in the landscape of the field.
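
The Copernican method mentioned in this abstract can be illustrated with a short sketch. The abstract gives neither the formula nor the input figures, so the following assumes the standard delta-t form of the argument (usually attributed to J. R. Gott): if a phenomenon is observed at a random point in its lifetime, its remaining duration lies between past * (1 - c) / (1 + c) and past * (1 + c) / (1 - c) with confidence c. The function name and the example age are hypothetical.

```python
def copernican_interval(past_duration, confidence=0.95):
    """Return (lower, upper) bounds on the remaining duration.

    Assumes the delta-t argument: the observation time is uniformly
    distributed over the total lifetime of the phenomenon.
    """
    if past_duration <= 0:
        raise ValueError("past_duration must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    lower = past_duration * (1.0 - confidence) / (1.0 + confidence)
    upper = past_duration * (1.0 + confidence) / (1.0 - confidence)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical input: a technique that has been in use for 60 years.
    low, high = copernican_interval(60.0, confidence=0.95)
    print(f"95% interval for remaining lifetime: {low:.1f} to {high:.0f} years")
```

At 95% confidence the bounds are 1/39 and 39 times the past duration, which is why the interval is so wide; the method constrains orders of magnitude rather than giving a precise forecast.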


Subject(s)
Cryoelectron Microscopy/trends , Crystallography, X-Ray/trends , Proteins/chemistry , Databases, Protein/trends , Multiprotein Complexes/chemistry , Protein Conformation
5.
J Proteome Res ; 17(12): 4031-4041, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30099871

ABSTRACT

The Human Proteome Project (HPP) annually reports on progress throughout the field in credibly identifying and characterizing the human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2018-01-17, the baseline for this sixth annual HPP special issue of the Journal of Proteome Research, contains 17 470 PE1 proteins, 89% of all neXtProt predicted PE1-4 proteins, up from 17 008 in release 2017-01-23 and 13 975 in release 2012-02-24. Conversely, the number of neXtProt PE2,3,4 missing proteins has been reduced from 2949 to 2579 to 2186 over the past two years. Of the PE1 proteins, 16 092 are based on mass spectrometry results, and 1378 on other kinds of protein studies, notably protein-protein interaction findings. PeptideAtlas has 15 798 canonical proteins, up 625 over the past year, including 269 from SUMOylation studies. The largest reason for missing proteins is low abundance. Meanwhile, the Human Protein Atlas has released its Cell Atlas, Pathology Atlas, and updated Tissue Atlas, and is applying recommendations from the International Working Group on Antibody Validation. Finally, there is progress using the quantitative multiplex organ-specific popular proteins targeted proteomics approach in various disease categories.
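
The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 89% share is computed as PE1 / (PE1 + PE2,3,4 missing proteins) using the most recent missing-protein count (2186); the abstract does not state the denominator explicitly, and the names used are illustrative.

```python
# Counts quoted in the abstract (neXtProt release 2018-01-17).
PE1_PROTEINS = 17_470
MISSING_PROTEINS = 2_186        # PE2,3,4; assumed to refer to the same release
PE1_FROM_MASS_SPEC = 16_092
PE1_FROM_OTHER_STUDIES = 1_378


def pe1_share(pe1, missing):
    """Fraction of predicted PE1-4 proteins that are PE1."""
    return pe1 / (pe1 + missing)


if __name__ == "__main__":
    share = pe1_share(PE1_PROTEINS, MISSING_PROTEINS)
    print(f"PE1 share of PE1-4 proteins: {share:.1%}")
    # The two evidence categories should add up to the PE1 total.
    print(PE1_FROM_MASS_SPEC + PE1_FROM_OTHER_STUDIES == PE1_PROTEINS)
```

Under that assumption the share comes to roughly 88.9%, consistent with the 89% stated in the abstract, and the mass spectrometry and other-evidence counts sum to the PE1 total.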


Subject(s)
Databases, Protein/trends , Proteome/analysis , Proteomics/methods , Guidelines as Topic , Humans , Mass Spectrometry/methods , Protein Interaction Maps , Research Design , Software
6.
Nucleic Acids Res ; 46(W1): W84-W88, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29741643

ABSTRACT

The unprecedented growth of high-throughput sequencing has led to an ever-widening annotation gap in protein databases. While computational prediction methods are available to make up the shortfall, a majority of public web servers are hindered by practical limitations and poor performance. Here, we introduce PANNZER2 (Protein ANNotation with Z-scoRE), a fast functional annotation web server that provides both Gene Ontology (GO) annotations and free text description predictions. PANNZER2 uses SANSparallel to perform high-performance homology searches, making bulk annotation based on sequence similarity practical. PANNZER2 can output GO annotations from multiple scoring functions, enabling users to see which predictions are robust across predictors. Finally, PANNZER2 predictions scored within the top 10 methods for molecular function and biological process in the CAFA2 NK-full benchmark. The PANNZER2 web server is updated on a monthly schedule and is accessible at http://ekhidna2.biocenter.helsinki.fi/sanspanz/. The source code is available under the GNU Public Licence v3.


Subject(s)
Computational Biology/trends , Gene Ontology/trends , Internet , Software , Algorithms , Databases, Protein/trends , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation
9.
J Proteome Res ; 16(12): 4281-4287, 2017 12 01.
Article in English | MEDLINE | ID: mdl-28853897

ABSTRACT

The Human Proteome Organization (HUPO) Human Proteome Project (HPP) continues to make progress on its two overall goals: (1) completing the protein parts list, with an annual update of the HUPO draft human proteome, and (2) making proteomics an integrated complement to genomics and transcriptomics throughout biomedical and life sciences research. neXtProt version 2017-01-23 has 17 008 confident protein identifications (Protein Existence [PE] level 1) that are compliant with the HPP Guidelines v2.1 ( https://hupo.org/Guidelines ), up from 13 664 in 2012-12 and 16 518 in 2016-04. Remaining to be found by mass spectrometry and other methods are 2579 "missing proteins" (PE2+3+4), down from 2949 in 2016. PeptideAtlas 2017-01 has 15 173 canonical proteins, accounting for nearly all of the 15 290 PE1 proteins based on MS data. These resources have extensive data on PTMs, single amino acid variants, and splice isoforms. The Human Protein Atlas v16 has 10 492 highly curated protein entries with tissue and subcellular spatial localization of proteins and transcript expression. Organ-specific popular protein lists have been generated for broad use in quantitative targeted proteomics using SRM-MS or DIA-SWATH-MS studies of biology and disease.


Subject(s)
Databases, Protein/trends , Proteome/analysis , Guidelines as Topic , Human Genome Project , Humans , Mass Spectrometry/methods , Proteomics/methods , Proteomics/trends
10.
J Proteome Res ; 16(12): 4288-4298, 2017 12 01.
Article in English | MEDLINE | ID: mdl-28849660

ABSTRACT

The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, cochairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community standards via special workshops and ongoing work. Among the existing ratified standards, the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Minimum Information About a Proteomics Experiment) guidelines with the advance of new technologies and techniques. Furthermore, new standards are currently either in the final stages of completion (proBed and proBAM for proteogenomics results as well as PEFF) or in early stages of design (a spectral library standard format, a universal spectrum identifier, the qcML quality control format, and the Protein Expression Interface (PROXI) web services Application Programming Interface). In this work we review the current status of all of these aspects of the PSI, describe synergies with other efforts such as the ProteomeXchange Consortium, the Human Proteome Project, and the metabolomics community, and provide a look at future directions of the PSI.


Subject(s)
Proteomics/standards , Software , Databases, Protein/standards , Databases, Protein/trends , Guidelines as Topic , Humans , Metabolomics , Proteomics/trends , Reference Standards , Software/standards , Software/trends
11.
J Proteomics ; 163: 67-75, 2017 06 23.
Article in English | MEDLINE | ID: mdl-28385663

ABSTRACT

The unique physicochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can be unreliable. Proteomic methods offer an alternative but require comprehensive and well-annotated sequence databases, which are lacking for gluten. A manually curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full-length protein sequences, has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. SIGNIFICANCE: We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance.
Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass spectrometry methods in line with Codex Alimentarius Commission requirements for foods designed to meet the needs of gluten intolerant individuals.


Subject(s)
Databases, Protein , Glutens/analysis , Proteomics/methods , Amino Acid Sequence , Celiac Disease/etiology , Databases, Protein/standards , Databases, Protein/trends , Diet, Gluten-Free , Gliadin/analysis , Humans
12.
Nucleic Acids Res ; 45(D1): D1-D11, 2017 01 04.
Article in English | MEDLINE | ID: mdl-28053160

ABSTRACT

This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein-protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as 'breakthrough' contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the 'golden set' of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/.


Subject(s)
Databases, Nucleic Acid/trends , Databases, Protein/trends , Databases, Chemical/trends , Genomics , Humans
13.
Ann N Y Acad Sci ; 1387(1): 95-104, 2017 01.
Article in English | MEDLINE | ID: mdl-27862010

ABSTRACT

Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension: functionality supporting preservation of file system structure within Dataverse, which is essential for both in-place computation and supporting non-HTTP data transfers.


Subject(s)
Access to Information , Biomedical Research , Computational Biology/methods , Computer Communication Networks , Database Management Systems , Databases, Protein , Animals , Biomedical Research/trends , Computational Biology/instrumentation , Computational Biology/trends , Computer Communication Networks/instrumentation , Computer Communication Networks/trends , Crystallography, X-Ray , Data Mining/trends , Database Management Systems/trends , Databases, Protein/trends , Humans , Image Interpretation, Computer-Assisted , Internet , Periodicals as Topic , Protein Conformation , Software
14.
J Proteome Res ; 15(11): 4091-4100, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27577934

ABSTRACT

The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/ .
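
The two-step strategy described here (search against a simpler database, then check peptide uniqueness against a more complex one) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy sequences, and the exact-substring matching rule are all assumptions, and a real workflow would read FASTA files and might treat isoleucine and leucine as equivalent.

```python
def proteins_containing(peptide, database):
    """Return sorted names of database entries whose sequence contains the peptide."""
    return sorted(name for name, sequence in database.items() if peptide in sequence)


def check_uniqueness(peptides, larger_database):
    """Map each peptide to the entries of the larger database that contain it."""
    return {pep: proteins_containing(pep, larger_database) for pep in peptides}


# Hypothetical stand-in for a larger, more complete tier: name -> sequence.
TOY_LARGE_TIER = {
    "protA": "MKTAYIAKQRQISFVK",
    "protA_variant": "MKTAYIAKQRQLSFVK",
    "protB": "MSHFSRQISFVKAAGT",
}

# Hypothetical peptides identified in a search against a smaller tier.
IDENTIFIED_PEPTIDES = ["TAYIAK", "QISFVK", "QLSFVK"]

if __name__ == "__main__":
    for peptide, hits in check_uniqueness(IDENTIFIED_PEPTIDES, TOY_LARGE_TIER).items():
        status = "unique" if len(hits) == 1 else "shared"
        print(f"{peptide}: {status} ({', '.join(hits)})")
```

A peptide that maps to a single entry in the smaller database but to several in the larger one is no longer unambiguous evidence for one protein, which is the situation the second step is meant to catch.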


Subject(s)
Databases, Protein/trends , Proteomics/methods , Computational Biology/methods , HeLa Cells , Humans , Liver/chemistry , Liver/cytology , Mass Spectrometry , Protein Isoforms/analysis , Proteins/analysis
15.
J Proteome Res ; 15(11): 3979-3987, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27573249

ABSTRACT

The Biology and Disease-driven Human Proteome Project (B/D-HPP) is aimed at supporting and enhancing the broad use of state-of-the-art proteomic methods to characterize and quantify proteins for in-depth understanding of the molecular mechanisms of biological processes and human disease. Based on a foundation of the pre-existing HUPO initiatives begun in 2002, the B/D-HPP is designed to provide standardized methods and resources for mass spectrometry and specific protein affinity reagents and facilitate accessibility of these resources to the broader life sciences research and clinical communities. Currently there are 22 B/D-HPP initiatives and 3 closely related HPP resource pillars. The B/D-HPP groups are working to define sets of protein targets that are highly relevant to each particular field, to deliver relevant assays for the measurement of these selected targets, and to disseminate and make publicly accessible the information and tools generated. Major developments are the 2016 publications of the Human SRM Atlas and of "popular protein sets" for six organ systems. Here we present the current activities and plans of the B/D-HPP initiatives as highlighted in numerous B/D-HPP workshops at the 14th annual HUPO 2015 World Congress of Proteomics in Vancouver, Canada.


Subject(s)
Databases, Protein/trends , Proteome , Proteomics/methods , Biomedical Research/standards , Computational Biology , Disease/etiology , Human Genome Project/organization & administration , Humans , Information Services/organization & administration , Mass Spectrometry , Proteomics/trends
16.
Toxins (Basel) ; 8(8), 2016 07 23.
Article in English | MEDLINE | ID: mdl-27455327

ABSTRACT

Spiders and scorpions are notorious for their fearful dispositions and their ability to inject venom into prey and predators, causing symptoms such as necrosis, paralysis, and excruciating pain. Information on venom composition and the toxins present in these species is growing due to an interest in using bioactive toxins from spiders and scorpions for drug discovery purposes and for solving crystal structures of membrane-embedded receptors. Additionally, the identification and isolation of a myriad of spider and scorpion toxins has allowed research within next generation antivenoms to progress at an increasingly faster pace. In this review, the current knowledge of spider and scorpion venoms is presented, followed by a discussion of all published biotechnological efforts within development of spider and scorpion antitoxins based on small molecules, antibodies and fragments thereof, and next generation immunization strategies. The increasing number of discovery and development efforts within this field may point towards an upcoming transition from serum-based antivenoms towards therapeutic solutions based on modern biotechnology.


Subject(s)
Antivenins/therapeutic use , Biotechnology/trends , Drug Discovery/trends , Scorpion Stings/drug therapy , Scorpion Venoms/antagonists & inhibitors , Spider Bites/drug therapy , Spider Venoms/antagonists & inhibitors , Animals , Antivenins/chemistry , Computational Biology/trends , Databases, Protein/trends , Humans , Scorpion Stings/immunology , Scorpion Stings/metabolism , Scorpion Venoms/immunology , Scorpion Venoms/metabolism , Spider Bites/immunology , Spider Bites/metabolism , Spider Venoms/immunology , Spider Venoms/metabolism
17.
J Proteome Res ; 15(11): 4060-4072, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27470641

ABSTRACT

Identification of all phosphorylation forms of known proteins is a major goal of the Chromosome-Centric Human Proteome Project (C-HPP). Recent studies have found that certain phosphoproteins can be encapsulated in exosomes and function as key regulators in the tumor microenvironment, but no deep-coverage phosphoproteome of human exosomes has been reported to date, which makes the exosome a potential source for new phosphosite discovery. In this study, we performed highly optimized MS analyses on the exosomal and cellular proteins isolated from human colorectal cancer SW620 cells. With stringent data quality control, 313 phosphoproteins with 1091 phosphosites were confidently identified from the SW620 exosome, from which 202 new phosphosites were detected. Exosomal phosphoproteins were significantly enriched in the 11q12.1-13.5 region of chromosome 11 and had a remarkably high level of tyrosine-phosphorylated proteins (6.4%), which were functionally relevant to ephrin signaling pathway-directed cytoskeleton remodeling. In conclusion, we here report the first high-coverage phosphoproteome of human cell-secreted exosomes, which leads to the identification of new phosphosites for C-HPP. Our findings provide insights into the exosomal phosphoprotein systems that help to understand the signaling language being delivered by exosomes in cell-cell communications. The mass spectrometry proteomics data have been deposited to the ProteomeXchange consortium with the data set identifier PXD004079, and iProX database (accession number: IPX00076800).


Subject(s)
Colorectal Neoplasms/pathology , Databases, Protein/trends , Exosomes , Phosphoproteins/analysis , Proteome/genetics , Cell Communication , Cell Line, Tumor , Chromosomes, Human, Pair 11/genetics , Colorectal Neoplasms/genetics , Human Genome Project , Humans , Mass Spectrometry , Neoplasm Proteins , Phosphopeptides/analysis , Phosphoproteins/genetics , Proteomics/methods , Signal Transduction
19.
Biochimie ; 119: 209-17, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26253692

ABSTRACT

This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families.


Subject(s)
Databases, Protein/history , Models, Molecular , Protein Isoforms/chemistry , Animals , Catalytic Domain , Cluster Analysis , Databases, Genetic/history , Databases, Genetic/trends , Databases, Protein/trends , England , Evolution, Molecular , History, 20th Century , History, 21st Century , Humans , Isoenzymes/chemistry , Isoenzymes/classification , Isoenzymes/genetics , Isoenzymes/metabolism , Molecular Sequence Annotation , Protein Folding , Protein Isoforms/classification , Protein Isoforms/genetics , Protein Isoforms/metabolism , Protein Structure, Tertiary , Structural Homology, Protein
20.
Acta Crystallogr F Struct Biol Commun ; 71(Pt 7): 831-7, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26144227

ABSTRACT

High-quality macromolecular crystals are a prerequisite for the process of protein structure determination by X-ray diffraction. Unfortunately, the relative yield of diffraction-quality crystals from crystallization experiments is often very low. In this context, innovative crystallization screen formulations are continuously being developed. In the past, MORPHEUS, a screen in which each condition integrates a mix of additives selected from the Protein Data Bank, a cryoprotectant and a buffer system, was developed. Here, MORPHEUS II, a follow-up to the original 96-condition initial screen, is described. Reagents were selected to yield crystals when none might be observed in traditional initial screens. Besides, the screen includes heavy atoms for experimental phasing and small polyols to ensure the cryoprotection of crystals. The suitability of the resulting novel conditions is shown by the crystallization of a broad variety of protein samples and their efficiency is compared with commercially available conditions.


Subject(s)
Crystallization/methods , Databases, Protein , Macromolecular Substances/chemistry , Databases, Protein/trends , Macromolecular Substances/analysis , X-Ray Diffraction/methods