Results 1 - 20 of 33
1.
Nucleic Acids Res ; 52(D1): D672-D678, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37941124

ABSTRACT

The Reactome Knowledgebase (https://reactome.org), an Elixir and GCBR core biological data resource, provides manually curated molecular details of a broad range of normal and disease-related biological processes. Processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Here we review progress towards annotation of the entire human proteome, targeted annotation of disease-causing genetic variants of proteins and of small-molecule drugs in a pathway context, and towards supporting explicit annotation of cell- and tissue-specific pathways. Finally, we briefly discuss issues involved in making Reactome more fully interoperable with other related resources such as the Gene Ontology and maintaining the resulting community resource network.


Subjects
Knowledge Bases , Metabolic Networks and Pathways , Signal Transduction , Humans , Metabolic Networks and Pathways/genetics , Proteome/genetics
2.
J Proteome Res ; 22(6): 1800-1815, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37183442

ABSTRACT

Understanding autoimmunity to endogenous proteins is crucial in diagnosing and treating autoimmune diseases. In this work, we developed a user-friendly AAgAtlas portal (http://biokb.ncpsb.org.cn/aagatlas_portal/index.php#), which can be used to search for 8045 non-redundant autoantigens (AAgs) and 47 post-translationally modified AAgs against 1073 human diseases that are prioritized by a credential score derived from multisource evidence. Using AAgAtlas, the immunogenic properties of human AAgs were systematically elucidated according to their genetic, biophysical, cytological, expression profile, and evolutionary characteristics. The results indicated that human AAgs are evolutionarily conserved in protein sequence and enriched in three hydrophilic and polar amino acid residues (K, D, and E) that are located at the protein surface. AAgs are enriched in proteins that are involved in nucleic acid binding, transferase activity, and the cytoskeleton. Genome, transcriptome, and proteome analyses further indicated that autoantibody (AAb) production is associated with gene variance and abnormal protein expression related to the pathological activities of different tumors. Collectively, our data outline the hallmarks of human AAgs that facilitate the understanding of humoral autoimmunity and the identification of biomarkers of human diseases.


Subjects
Autoantigens , Autoimmune Diseases , Humans , Autoantigens/genetics , Autoantibodies , Autoimmune Diseases/genetics , Autoimmunity , Amino Acid Sequence
3.
Curr Protoc ; 3(4): e722, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37053306

ABSTRACT

Pathway databases provide descriptions of the roles of proteins, nucleic acids, lipids, carbohydrates, and other molecular entities within their biological cellular contexts. Pathway-centric views of these roles may allow for the discovery of unexpected functional relationships in data such as gene expression profiles and somatic mutation catalogues from tumor cells. For this reason, there is a high demand for high-quality pathway databases and their associated tools. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, New York University Langone Health, the European Bioinformatics Institute, and Oregon Health & Science University) is one such pathway database. Reactome collects detailed information on biological pathways and processes in humans from the primary literature. Reactome content is manually curated, expert-authored, and peer-reviewed and spans the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm, and other model organisms. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Browsing a Reactome pathway
Basic Protocol 2: Exploring Reactome annotations of disease and drugs
Basic Protocol 3: Finding the pathways involving a gene or protein
Alternate Protocol 1: Finding the pathways involving a gene or protein using UniProtKB (SwissProt), Ensembl, or Entrez gene identifier
Alternate Protocol 2: Using advanced search
Basic Protocol 4: Using the Reactome pathway analysis tool to identify statistically overrepresented pathways
Basic Protocol 5: Using the Reactome pathway analysis tool to overlay expression data onto Reactome pathway diagrams
Basic Protocol 6: Comparing inferred model organism and human pathways using the Species Comparison tool
Basic Protocol 7: Comparing tissue-specific expression using the Tissue Distribution tool


Subjects
Metabolic Networks and Pathways , Zebrafish , Humans , Animals , Mice , Rats , Zebrafish/metabolism , Databases, Protein , Proteins/metabolism , Signal Transduction
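
The entry above mentions identifying statistically overrepresented pathways (Basic Protocol 4). A minimal sketch of one common way to score overrepresentation, a one-sided hypergeometric test, is shown below. The abstract does not state which statistic the Reactome analysis tool uses, so the test itself, the function name `overrepresentation_p`, and the toy numbers are assumptions for illustration only.

```python
from math import comb


def overrepresentation_p(hits, background, pathway_size, sample_size):
    """One-sided hypergeometric p-value: P(X >= hits).

    background   -- number of genes in the background set (assumed known)
    pathway_size -- number of background genes annotated to the pathway
    sample_size  -- number of genes in the submitted list
    hits         -- submitted genes that fall in the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(background, sample_size)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy example (hypothetical numbers): 3 submitted genes, 2 of them in a
# 4-gene pathway, drawn from a 10-gene background.
p_value = overrepresentation_p(hits=2, background=10, pathway_size=4, sample_size=3)
print(p_value)
```

In practice the p-values for many pathways would also be corrected for multiple testing; that step is omitted here.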
5.
Nucleic Acids Res ; 50(D1): D687-D692, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34788843

ABSTRACT

The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.


Subjects
Antiviral Agents/pharmacology , Knowledge Bases , Proteins/metabolism , COVID-19/metabolism , Data Curation , Genome, Human , Host-Pathogen Interactions , Humans , Proteins/genetics , Signal Transduction , Software
6.
Mol Cell Proteomics ; 19(12): 2115-2125, 2020 12.
Article in English | MEDLINE | ID: mdl-32907876

ABSTRACT

Pathway analyses are key methods to analyze 'omics experiments. Nevertheless, integrating data from different 'omics technologies and different species still requires considerable bioinformatics knowledge. Here we present the novel ReactomeGSA resource for comparative pathway analyses of multi-omics datasets. ReactomeGSA can be used through Reactome's existing web interface and the novel ReactomeGSA R Bioconductor package with explicit support for scRNA-seq data. Data from different species is automatically mapped to a common pathway space. Public data from ExpressionAtlas and Single Cell ExpressionAtlas can be directly integrated in the analysis. ReactomeGSA greatly reduces the technical barrier for multi-omics, cross-species, comparative pathway analyses. We used ReactomeGSA to characterize the role of B cells in anti-tumor immunity. We compared B cell rich and poor human cancer samples from five of the Cancer Genome Atlas (TCGA) transcriptomics and two of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteomics studies. B cell-rich lung adenocarcinoma samples lacked the otherwise present activation through NFkappaB. This may be linked to the presence of a specific subset of tumor associated IgG+ plasma cells that lack NFkappaB activation in scRNA-seq data from human melanoma. This showcases how ReactomeGSA can derive novel biomedical insights by integrating large multi-omics datasets.


Subjects
Databases, Genetic , Proteomics , Software , B-Lymphocytes/immunology , Humans , Internet , User-Computer Interface
7.
Nucleic Acids Res ; 48(D1): D498-D503, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31691815

ABSTRACT

The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations in a single consistent data model, an extended version of a classic metabolic map. Reactome functions both as an archive of biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. To extend our ability to annotate human disease processes, we have implemented a new drug class and have used it initially to annotate drugs relevant to cardiovascular disease. Our annotation model depends on external domain experts to identify new areas for annotation and to review new content. New web pages facilitate recruitment of community experts and allow those who have contributed to Reactome to identify their contributions and link them to their ORCID records. To improve visualization of our content, we have implemented a new tool to automatically lay out the components of individual reactions with multiple options for downloading the reaction diagrams and associated data, and a new display of our event hierarchy that will facilitate visual interpretation of pathway analysis results.


Subjects
Databases, Chemical , Databases, Pharmaceutical , Knowledge Bases , Software , Genome, Human , Humans , Metabolic Networks and Pathways , Protein Interaction Maps , Signal Transduction
8.
Nucleic Acids Res ; 46(D1): D649-D655, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29145629

ABSTRACT

The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism, and other cellular processes as an ordered network of molecular transformations in a single consistent data model, an extended version of a classic metabolic map. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression profiles or somatic mutation catalogues from tumor cells. To support the continued brisk growth in the size and complexity of Reactome, we have implemented a graph database, improved performance of data analysis tools, and designed new data structures and strategies to boost diagram viewer performance. To make our website more accessible to human users, we have improved pathway display and navigation by implementing interactive Enhanced High Level Diagrams (EHLDs) with an associated icon library, and subpathway highlighting and zooming, in a simplified and reorganized web site with adaptive design. To encourage re-use of our content, we have enabled export of pathway diagrams as 'PowerPoint' files.


Subjects
Knowledge Bases , Metabolic Networks and Pathways , Computer Graphics , Databases, Chemical , Databases, Protein , Humans , Internet , Molecular Sequence Annotation , Signal Transduction , User-Computer Interface
9.
Nucleic Acids Res ; 45(D1): D769-D776, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27924021

ABSTRACT

Autoantibodies are antibodies that target self-antigens; they can play pivotal roles in maintaining homeostasis, distinguishing normal from tumor tissue, and triggering autoimmune diseases. In the last three decades, tremendous efforts have been devoted to elucidating the generation, evolution and functions of autoantibodies, as well as their target autoantigens. However, reports of these countless previously identified autoantigens are randomly dispersed in the literature. Here, we constructed an AAgAtlas database 1.0 using text-mining and manual curation. We extracted 45 830 autoantigen-related abstracts and 94 313 sentences from PubMed using the keywords 'autoantigen' or 'autoantibody' or their lexical variants, which were further refined to 25 520 abstracts, 43 253 sentences and 3984 candidates by our bio-entity recognizer based on the Protein Ontology. Finally, we identified 1126 genes as human autoantigens and 1071 related human diseases, with which we constructed a human autoantigen database (AAgAtlas database 1.0). The database provides a user-friendly interface to conveniently browse, retrieve and download human autoantigens as well as their associated diseases. The database is freely accessible at http://biokb.ncpsb.org/aagatlas/. We believe this database will be a valuable resource to track and understand human autoantigens as well as to investigate their functions in basic and translational research.


Subjects
Autoantigens , Computational Biology/methods , Databases, Factual , Data Curation , Humans , Search Engine , User-Computer Interface , Web Browser
10.
J Proteomics ; 150: 170-182, 2017 01 06.
Article in English | MEDLINE | ID: mdl-27498275

ABSTRACT

In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended.
SIGNIFICANCE: Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there are a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts the final results of the research, the quantitation values and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Journal of Proteomics has previously published multiple benchmark studies of other bioinformatics algorithms in proteomics (PMID: 26585461; PMID: 22728601), making clear the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA) were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines (Mascot, X!Tandem, MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that 1) PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical but also the known isoforms of proteins has a small impact on the number of reported proteins. Depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter protein lists produced by parsimonious reporting. 4) The current workflow can be easily extended to support new algorithms and search engine combinations.


Subjects
Algorithms , Computational Biology/methods , Databases, Protein , Proteomics/methods , Search Engine/methods , Humans , Peptides/chemistry , Protein Isoforms , Software , Tandem Mass Spectrometry
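
The entry above describes protein inference as assembling identified peptides into a list of proteins and mentions parsimonious reports. As a rough illustration of the parsimony idea only, the sketch below picks proteins greedily until every peptide is explained (a greedy set cover). It is not the algorithm of PIA, ProteinProphet, Fido, ProteinLP or MSBayesPro; the function name and the toy protein and peptide labels are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy set cover: return a short list of proteins explaining all peptides.

    protein_to_peptides -- dict mapping a protein name to a set of peptide strings
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Pick the protein explaining the most still-unexplained peptides;
        # sorting the names first makes tie-breaking deterministic.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Toy input (hypothetical): P2 is fully explained by P1 and is dropped.
example = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB", "pepC"},
    "P3": {"pepD"},
}
print(parsimonious_proteins(example))
```

Greedy selection is only an approximation of a minimal protein list, and it ignores peptide scores, which the probabilistic tools in the study take into account.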
11.
Nucleic Acids Res ; 44(D1): D447-56, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527722

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) has been the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we give a brief update on the resources under development, 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.


Subjects
Databases, Protein , Mass Spectrometry , Proteomics , Peptides/chemistry , Proteins/chemistry , Proteins/metabolism , Software , User-Computer Interface
12.
Bioinformatics ; 32(6): 821-7, 2016 03 15.
Article in English | MEDLINE | ID: mdl-26568629

ABSTRACT

MOTIVATION: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore accurate theoretical prediction of pI would expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. RESULTS: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed a superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset and their resulting performance will strongly depend on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
CONTACT: yperez@ebi.ac.uk AVAILABILITY AND IMPLEMENTATION: The software and data are freely available at https://github.com/ypriverol/pIR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Amino Acid Sequence , Isoelectric Focusing , Isoelectric Point , Peptides , Proteomics , Tandem Mass Spectrometry
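
The entry above defines the pI as the pH at which the net charge sums to zero and contrasts iterative methods with machine-learning ones. A minimal sketch of the iterative idea follows: compute the net charge from Henderson-Hasselbalch terms and bisect on pH until the charge changes sign. The pKa values below are illustrative assumptions (published pKa sets differ, which is what the abstract means by sensitivity to the basis set), and the function names are hypothetical, not taken from the pIR package.

```python
# Illustrative pKa values; an assumption, not the set used in the paper.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Bisection on pH; the net charge decreases monotonically with pH."""
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


for peptide in ("G", "KKKK", "DDDD"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Swapping in a different pKa table changes the predicted pI without touching the search loop, which makes it easy to compare basis sets.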
13.
Nucleic Acids Res ; 44(D1): D481-7, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26656494

ABSTRACT

The Reactome Knowledgebase (www.reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations in a single consistent data model, an extended version of a classic metabolic map. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression pattern surveys or somatic mutation catalogues from tumour cells. Over the last two years we redeveloped major components of the Reactome web interface to improve usability, responsiveness and data visualization. A new pathway diagram viewer provides a faster, clearer interface and smooth zooming from the entire reaction network to the details of individual reactions. Tool performance for analysis of user datasets has been substantially improved, now generating detailed results for genome-wide expression datasets within seconds. The analysis module can now be accessed through a RESTful interface, facilitating its inclusion in third party applications. A new overview module allows the visualization of analysis results on a genome-wide Reactome pathway hierarchy using a single screen page. The search interface now provides auto-completion as well as a faceted search to narrow result lists efficiently.


Subjects
Databases, Chemical , Metabolic Networks and Pathways , Gene Expression , Humans , Knowledge Bases , Proteins/metabolism , Signal Transduction , Software
14.
Database (Oxford) ; 2013: bat066, 2013.
Article in English | MEDLINE | ID: mdl-24067240

ABSTRACT

The complex biological processes that control cellular function are mediated by intricate networks of molecular interactions. Accumulating evidence indicates that these interactions are often interdependent, thus acting cooperatively. Cooperative interactions are prevalent in and indispensable for reliable and robust control of cell regulation, as they underlie the conditional decision-making capability of large regulatory complexes. Despite an increased focus on experimental elucidation of the molecular details of cooperative binding events, as evidenced by their growing occurrence in literature, they are currently lacking from the main bioinformatics resources. One of the contributing factors to this deficiency is the lack of a computer-readable standard representation and exchange format for cooperative interaction data. To tackle this shortcoming, we added functionality to the widely used PSI-MI interchange format for molecular interaction data by defining new controlled vocabulary terms that allow annotation of different aspects of cooperativity without making structural changes to the underlying XML schema. As a result, we are able to capture cooperative interaction data in a structured format that is backward compatible with PSI-MI-based data and applications. This will facilitate the storage, exchange and analysis of cooperative interaction data, which in turn will advance experimental research on this fundamental principle in biology.


Subjects
Databases, Protein , Protein Interaction Mapping , Proteomics , Allosteric Regulation , Cell Cycle Proteins/chemistry , Cyclin A/chemistry , Cyclin-Dependent Kinase 2/chemistry , Humans , Models, Molecular , Molecular Sequence Annotation , Phosphorylation , Protein Binding
15.
Mol Cell Proteomics ; 12(11): 3026-35, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23813117

ABSTRACT

The Proteomics Standards Initiative has recently released the mzIdentML data standard for representing peptide and protein identification results, for example, created by a search engine. When a new standard format is produced, it is important that software tools are available that make it straightforward for laboratory scientists to use it routinely and for bioinformaticians to embed support in their own tools. Here we report the release of several open-source Java-based software packages based on mzIdentML: ProteoIDViewer, mzidLibrary, and mzidValidator. The ProteoIDViewer is a desktop application allowing users to visualize mzIdentML-formatted results originating from any appropriate identification software; it supports visualization of all the features of the mzIdentML format. The mzidLibrary is a software library containing routines for importing data from external search engines, post-processing identification data (such as false discovery rate calculations), combining results from multiple search engines, performing protein inference, setting identification thresholds, and exporting results from mzIdentML to plain text files. The mzidValidator is able to process files and report warnings or errors if files are not correctly formatted or contain some semantic error. We anticipate that these developments will simplify adoption of the new standard in proteomics laboratories and the integration of mzIdentML into other software tools. All three tools are freely available in the public domain.


Subjects
Peptides/chemistry , Proteins/chemistry , Proteomics/statistics & numerical data , Software , Proteomics/standards , Search Engine
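
The entry above lists false discovery rate calculations and identification thresholds among the mzidLibrary routines. The sketch below shows one widely used way to estimate FDR, the target-decoy approach, where the ratio of decoy to target hits above a score cutoff approximates the error rate. The abstract does not say which estimator mzidLibrary implements, so the approach, the function name `score_cutoff_at_fdr`, and the toy scores are assumptions.

```python
def score_cutoff_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    matches -- iterable of (score, is_decoy) pairs, higher score = better match
    Returns None when no cutoff satisfies the threshold.
    """
    ranked = sorted(matches, key=lambda match: match[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple estimator: decoys / targets among all matches accepted so far.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy data (hypothetical scores): three target hits and one decoy hit.
toy_matches = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(score_cutoff_at_fdr(toy_matches, max_fdr=0.01))
print(score_cutoff_at_fdr(toy_matches, max_fdr=0.40))
```

Tied scores are not treated specially here; a production implementation would usually also convert these estimates into monotone q-values.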
16.
Anal Chem ; 85(7): 3515-20, 2013 Apr 02.
Article in English | MEDLINE | ID: mdl-23448308

ABSTRACT

Peptide sequence matching algorithms used for peptide identification by tandem mass spectrometry (MS/MS) enumerate theoretical peptides from the database, predict their fragment ions, and match them to the experimental MS/MS spectra. Here, we present an approach for scoring MS/MS identifications based on the high mass accuracy matching of precursor ions, the identification of a high intensity b1 fragment ion, and partial sequence tags from phenylthiocarbamoyl-derivatized peptides. This derivatization process boosts the b1 fragment ion signal, which turns it into a powerful feature for peptide identification. We demonstrate the effectiveness of our scoring system by implementing it in a computational tool called "HI-bone" and by identifying mass spectra of an Escherichia coli sample acquired on an Orbitrap Velos instrument using Higher-energy C-trap dissociation. Following this strategy, we identified 1614 peptide spectrum matches with a peptide false discovery rate (FDR) below 1%. This number was significantly higher than those obtained with Mascot and SEQUEST at a similar FDR.


Subjects
Isothiocyanates/chemistry , Peptides/analysis , Proteomics/methods , Tandem Mass Spectrometry/methods , Algorithms , Amino Acid Sequence , Escherichia coli/chemistry , Escherichia coli Proteins/chemistry , Humans , Ions/chemistry , Molecular Sequence Data , Software
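
The entry above relies on predicting fragment ions, in particular the b1 ion of derivatized peptides. As a minimal sketch of how theoretical singly charged b-ion m/z values can be computed, the code below sums monoisotopic residue masses from the N-terminus and adds one proton. It is not the HI-bone scoring code. The optional `n_term_mod` argument and the value 135.01427 Da (the monoisotopic mass of phenyl isothiocyanate, C7H5NS, added on derivatization) are assumptions introduced here for illustration.

```python
PROTON_MASS = 1.007276

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Assumed mass shift of an N-terminal phenylthiocarbamoyl group (C7H5NS).
PTC_MASS_SHIFT = 135.01427


def b_ion_mz(peptide, n_term_mod=0.0):
    """Singly charged b-ion m/z values b1..b(n-1) for a peptide sequence."""
    values = []
    running = n_term_mod + PROTON_MASS
    for residue in peptide[:-1]:  # the full-length prefix is not a b ion here
        running += RESIDUE_MASS[residue]
        values.append(round(running, 5))
    return values


plain = b_ion_mz("GAK")
derivatized = b_ion_mz("GAK", n_term_mod=PTC_MASS_SHIFT)
print(plain)
print(derivatized)
```

Every b ion of the derivatized peptide is shifted by the same N-terminal mass, so a scoring scheme can look for the shifted b1 peak directly.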
17.
Nucleic Acids Res ; 41(Database issue): D1063-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203882

ABSTRACT

The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics in data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update about the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the quality control efforts that are ongoing at present and outline our future plans.


Subjects
Databases, Protein , Proteomics , Internet , Mass Spectrometry , Peptides/chemistry , Peptides/metabolism , Proteins/chemistry , Proteins/metabolism , Software
18.
BMC Bioinformatics ; 13: 324, 2012 Dec 05.
Article in English | MEDLINE | ID: mdl-23216909

ABSTRACT

BACKGROUND: In shotgun mass spectrometry-based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.


Subjects
Algorithms , Proteomics/methods , Search Engine , Sequence Analysis, Protein/methods , Software , Databases, Factual , Mass Spectrometry/methods , Peptides/chemistry , Protein Processing, Post-Translational
19.
Proteomics ; 12(6): 790-4, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22539429

ABSTRACT

We present a Java application programming interface (API), jmzIdentML, for the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) mzIdentML standard for peptide and protein identification data. The API combines the power of Java Architecture for XML Binding (JAXB) and an XPath-based random-access indexer to allow a fast and efficient mapping of extensible markup language (XML) elements to Java objects. The internal references in the mzIdentML files are resolved in an on-demand manner, where the whole file is accessed as a random-access swap file, and only the relevant piece of XML is selected for mapping to its corresponding Java object. The API is highly efficient in its memory usage and can handle files of arbitrary sizes. The API follows the official release of the mzIdentML (version 1.1) specifications and is available in the public domain under a permissive licence at http://www.code.google.com/p/jmzidentml/.


Subjects
Proteins/chemistry , Proteomics/methods , Proteomics/standards , Software , Amino Acid Sequence , Databases, Protein , Humans , Molecular Sequence Data , Peptides/chemistry , Proteome/chemistry , Software/standards
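
The entry above describes reading only the relevant piece of XML so that memory use stays low for files of any size. The sketch below illustrates the same goal with a different, simpler technique: streaming the file and discarding each element once it has been handled. It is not jmzIdentML's XPath-based random-access indexer and does not use its Java API; the function name `iter_elements` and the tiny sample document are hypothetical, while `SpectrumIdentificationItem` and the namespace are taken from the mzIdentML 1.1 schema.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements(source, local_name):
    """Yield the attributes of each matching element without keeping the tree.

    source     -- file path or binary file object containing XML
    local_name -- element name to match, ignoring any XML namespace
    """
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag.rsplit("}", 1)[-1] == local_name:
            yield dict(element.attrib)  # copy before the element is cleared
            element.clear()             # release the element's contents


# Tiny in-memory stand-in for an mzIdentML file (hypothetical content).
sample = io.BytesIO(
    b'<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">'
    b'<SpectrumIdentificationItem id="SII_1" rank="1"/>'
    b'<SpectrumIdentificationItem id="SII_2" rank="2"/>'
    b"</MzIdentML>"
)
items = list(iter_elements(sample, "SpectrumIdentificationItem"))
print(items)
```

Streaming only supports a single forward pass; resolving internal references on demand, as the abstract describes, requires an index of element positions in the file.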
20.
Cancers (Basel) ; 4(4): 1180-211, 2012 Nov 08.
Article in English | MEDLINE | ID: mdl-24213504

ABSTRACT

Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of "cancer" pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics.
