Results 1 - 20 of 21
1.
Nat Methods ; 18(11): 1363-1369, 2021 11.
Article in English | MEDLINE | ID: mdl-34711972

ABSTRACT

The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.
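
The idea of encoding a peptide by atomic composition can be illustrated with a minimal Python sketch. This is not DeepLC's actual feature encoding, which the abstract does not describe in detail; the function name `encode`, the dictionaries, and the small subset of residues and modifications are assumptions chosen for illustration. Each residue becomes a vector of atom counts, and a modification simply adds its atoms to that vector, so a modification never seen before still maps onto the same features.

```python
from collections import Counter

# Atomic composition of amino acid residues in peptide-bonded form (minus H2O).
# Illustrative subset only; a real encoder would cover all residues.
RESIDUES = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
}

# Atoms added by a modification (hypothetical lookup table for this sketch).
MODIFICATIONS = {
    "Phospho": {"H": 1, "O": 3, "P": 1},
    "Oxidation": {"O": 1},
}

ATOMS = ("C", "H", "N", "O", "P", "S")


def encode(sequence, mods=None):
    """Return one atom-count vector per residue.

    mods maps a 0-based residue position to a modification name.
    """
    mods = mods or {}
    rows = []
    for i, aa in enumerate(sequence):
        counts = Counter(RESIDUES[aa])
        if i in mods:
            # A modification only changes the atom counts of its residue.
            counts.update(MODIFICATIONS[mods[i]])
        rows.append([counts[atom] for atom in ATOMS])
    return rows


if __name__ == "__main__":
    # Phosphorylated serine at position 1 of the dipeptide "GS".
    print(encode("GS", {1: "Phospho"}))
```

Because the feature space is defined by atoms rather than by a fixed alphabet of residue symbols, adding a new entry to the modification table requires no change to the vector layout.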


Subject(s)
Algorithms , Deep Learning , Peptide Fragments/analysis , Protein Processing, Post-Translational , Proteins/analysis , Proteins/chemistry , Proteome/analysis , Datasets as Topic , Humans , Peptide Fragments/chemistry , Peptide Mapping
2.
J Proteome Res ; 19(7): 2786-2793, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32384242

ABSTRACT

Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly because of low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also creates a threshold to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool supporting two scoring functions. COSS also includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with established spectral searching tools and a sequence database search tool. Our comparison showed that COSS more reliably identifies spectra, is capable of handling large data sets and libraries, and is an easy-to-use tool that can run on low-specification computers. COSS binaries and source code can be freely downloaded from https://github.com/compomics/COSS.
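
As a rough illustration of spectrum-to-spectrum scoring, the sketch below computes a binned cosine similarity in Python. The abstract does not name the two scoring functions COSS supports, so cosine similarity is used here only as a common, generic choice; the function names and the 1.0 m/z bin width are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(query, library, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(query, bin_width)
    b = bin_spectrum(library, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query = [(100.2, 3.0), (200.7, 4.0)]
    library_match = [(100.4, 3.0), (200.1, 4.0)]
    print(cosine_score(query, library_match))
```

In a library search, a score like this would be computed against every candidate library spectrum (and against decoy spectra), with the best-scoring candidate reported per query.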


Subject(s)
Software , Tandem Mass Spectrometry , Algorithms , Databases, Protein , Peptides , Search Engine
3.
J Proteome Res ; 19(8): 3478-3486, 2020 08 07.
Article in English | MEDLINE | ID: mdl-32508104

ABSTRACT

Protein phosphorylation is a key post-translational modification in many biological processes and is associated with human diseases such as cancer and metabolic disorders. The accurate identification, annotation, and functional analysis of phosphosites are therefore crucial to understanding their various roles. Phosphosites are mainly analyzed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to the protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites. Furthermore, these sites are put into biophysical context by annotating each phosphoprotein with per-residue structural propensity, solvent accessibility, disorder probability, and early folding information. Scop3P, available at https://iomics.ugent.be/scop3p, presents a unique resource for visualization and analysis of phosphosites and for understanding of phosphosite structure-function relationships.


Subject(s)
Phosphoproteins , Protein Processing, Post-Translational , Amino Acid Sequence , Databases, Protein , Humans , Phosphoproteins/genetics , Phosphoproteins/metabolism , Phosphorylation
4.
J Proteome Res ; 19(1): 537-542, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31755270

ABSTRACT

The field of computational proteomics is approaching the big data age, driven both by a continuous growth in the number of samples analyzed per experiment and by the growing amount of data obtained in each analytical run. In order to process these large amounts of data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures. Unfortunately, the vast majority of cross-platform proteomics tools are not able to operate directly on the proprietary formats generated by the diverse mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability and to increase integration capabilities with popular workflow systems such as Galaxy or Nextflow, we have also built a Conda package and a BioContainers container around ThermoRawFileParser. In addition, we implemented a user-friendly interface (ThermoRawFileParserGUI) for those users not familiar with command-line tools. Finally, we performed a benchmark of ThermoRawFileParser and msconvert to verify that the converted mzML files contain reliable quantitative results.


Subject(s)
Computational Biology/methods , Proteomics/methods , Software , Databases, Protein , Saccharomyces cerevisiae Proteins/metabolism , Workflow
5.
J Proteome Res ; 18(2): 765-769, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30540477

ABSTRACT

Scop3D is a tool that automatically annotates protein structure with sequence conservation starting from a set of protein sequence variants. We present a complete upgrade and rewrite of Scop3D. We have included a DNA module that allows the analysis of single nucleotide polymorphisms in relation to the structural context of the protein. Scop3D therefore forms a bridge between genomics and protein structure. Moreover, Scop3D is now also available through an intuitive web-interface that makes the tool highly user-friendly.


Subject(s)
Databases, Protein , Internet , Mutation Rate , Proteins/genetics , Software , Polymorphism, Single Nucleotide , Proteins/chemistry , User-Computer Interface
6.
Nucleic Acids Res ; 43(W1): W543-6, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25897125

ABSTRACT

The iceLogo web server and SOAP service implement the previously published iceLogo algorithm. iceLogo builds on probability theory to visualize protein consensus sequences in a format resembling sequence logos. Peptide sequences are compared against a reference sequence set that can be tailored to the studied system and the used protocol. As such, not only over- but also underrepresented residues can be visualized in a statistically sound manner, which further allows the user to easily analyse and interpret conserved sequence patterns in proteins. The web application and SOAP service can be found free and open to all users without the need for a login on http://iomics.ugent.be/icelogoserver/main.html.
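
The comparison of an experimental peptide set against a reference set can be sketched as follows. This is a generic binomial z-score per residue at one sequence position, offered only as an illustration of how over- and underrepresentation can both be scored; it is not claimed to be the exact statistic implemented in iceLogo, and the function name and inputs are hypothetical.

```python
import math


def position_z_scores(peptides, reference_freq, position):
    """Score each residue at one position against its reference frequency.

    peptides: aligned sequences of equal length.
    reference_freq: residue -> expected frequency (0..1) at this position.
    Positive scores suggest overrepresentation, negative scores underrepresentation.
    """
    n = len(peptides)
    counts = {}
    for peptide in peptides:
        residue = peptide[position]
        counts[residue] = counts.get(residue, 0) + 1

    scores = {}
    for residue, p_ref in reference_freq.items():
        p_obs = counts.get(residue, 0) / n
        # Standard deviation of a sample proportion under the reference frequency.
        sd = math.sqrt(p_ref * (1.0 - p_ref) / n)
        scores[residue] = (p_obs - p_ref) / sd if sd else 0.0
    return scores


if __name__ == "__main__":
    experimental = ["AK", "AR", "GK", "AK"]
    reference = {"A": 0.5, "G": 0.5}
    print(position_z_scores(experimental, reference, position=0))
```

Residues whose absolute score exceeds a chosen significance threshold would then be drawn above the axis (overrepresented) or below it (underrepresented).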


Subject(s)
Consensus Sequence , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Animals , Internet , Mice
7.
J Proteome Res ; 15(3): 707-12, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26510693

ABSTRACT

The use of proteomics bioinformatics substantially contributes to an improved understanding of proteomes, but this novel and in-depth knowledge comes at the cost of increased computational complexity. Parallelization across multiple computers, a strategy termed distributed computing, can be used to handle this increased complexity; however, setting up and maintaining a distributed computing infrastructure requires resources and skills that are not readily available to most research groups. Here we propose a free and open-source framework named Pladipus that greatly facilitates the establishment of distributed computing networks for proteomics bioinformatics tools. Pladipus is straightforward to install and operate thanks to its user-friendly graphical interface, allowing complex bioinformatics tasks to be run easily on a network instead of a single computer. As a result, any researcher can benefit from the increased computational efficiency provided by distributed computing, hence empowering them to tackle more complex bioinformatics challenges. Notably, it enables any research group to perform large-scale reprocessing of publicly available proteomics data, thus supporting the scientific community in mining these data for novel discoveries.


Subject(s)
Computational Biology/methods , Computer Communication Networks , Proteomics/methods , Data Mining , User-Computer Interface
8.
J Proteome Res ; 15(6): 1963-70, 2016 06 03.
Article in English | MEDLINE | ID: mdl-27089233

ABSTRACT

Shotgun proteomics experiments often take the form of a differential analysis, where two or more samples are compared against each other. The objective is to identify proteins that are either unique to a specific sample or a set of samples (qualitative differential proteomics), or that are significantly differentially expressed in one or more samples (quantitative differential proteomics). However, the success depends on the availability of a reliable protein sequence database for each sample. To perform such an analysis in the absence of a database, we here propose a novel, generic pipeline comprising an adapted spectral similarity score derived from database search algorithms that compares samples at the spectrum level to detect unique spectra. We applied our pipeline to compare two parasitic tapeworms: Taenia solium and Taenia hydatigena, of which only the former poses a threat to humans. Furthermore, because the genome of T. solium recently became available, we were able to prove the effectiveness and reliability of our pipeline a posteriori.


Subject(s)
Proteomics/methods , Taenia/chemistry , Algorithms , Animals , Databases, Protein , Genome , Species Specificity , Tandem Mass Spectrometry , Workflow
9.
Anal Chem ; 88(20): 9949-9957, 2016 10 18.
Article in English | MEDLINE | ID: mdl-27642655

ABSTRACT

Chemical cross-linking coupled with mass spectrometry plays an important role in unravelling protein interactions, especially weak and transient ones. Moreover, cross-linking complements several structural determination approaches such as cryo-EM. Although several computational approaches are available for the annotation of spectra obtained from cross-linked peptides, there remains room for improvement. Here, we present Xilmass, a novel algorithm to identify cross-linked peptides that introduces two new concepts: (i) the cross-linked peptides are represented in the search database such that the cross-linking sites are explicitly encoded, and (ii) the scoring function derived from the Andromeda algorithm was adapted to score against a theoretical tandem mass spectrometry (MS/MS) spectrum that contains the peaks from all possible fragment ions of a cross-linked peptide pair. The performance of Xilmass was evaluated against the recently published Kojak and the popular pLink algorithms on a calmodulin-plectin complex data set, as well as three additional, published data sets. The results show that Xilmass typically had the highest number of identified distinct cross-linked sites and also the highest number of predicted cross-linked sites.


Subject(s)
Algorithms , Calmodulin/analysis , Plectin/analysis , Calmodulin/chemistry , Cross-Linking Reagents/chemistry , Databases, Protein , Humans , Plectin/chemistry , Succinimides/chemistry , Tandem Mass Spectrometry
10.
J Proteome Res ; 14(4): 1987-90, 2015 Apr 03.
Article in English | MEDLINE | ID: mdl-25728987

ABSTRACT

Proteins are dynamic molecules; they undergo crucial conformational changes induced by post-translational modifications and by binding of cofactors or other molecules. The characterization of these conformational changes and their relation to protein function is a central goal of structural biology. Unfortunately, most conventional methods to obtain structural information do not provide information on protein dynamics. Therefore, mass spectrometry-based approaches, such as limited proteolysis, hydrogen-deuterium exchange, and stable-isotope labeling, are frequently used to characterize protein conformation and dynamics, yet the interpretation of these data can be cumbersome and time consuming. Here, we present PepShell, a tool that allows interactive data analysis of mass spectrometry-based conformational proteomics studies by visualization of the identified peptides both at the sequence and structure levels. Moreover, PepShell allows the comparison of experiments under different conditions, including different proteolysis times or binding of the protein to different substrates or inhibitors.


Subject(s)
Data Display , Mass Spectrometry/methods , Protein Conformation , Proteins/chemistry , Proteomics/methods , Software
11.
Nucleic Acids Res ; 41(Database issue): D333-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23093603

ABSTRACT

We here present The Online Protein Processing Resource (TOPPR; http://iomics.ugent.be/toppr/), an online database that contains thousands of published proteolytically processed sites in human and mouse proteins. These cleavage events were identified with COmbined FRActional DIagonal Chromatography proteomics technologies, and the resulting database is provided with full data provenance. Indeed, TOPPR provides an interactive visual display of the actual fragmentation mass spectrum that led to each identification of a reported processed site, complete with fragment ion annotations and search engine scores. Apart from warehousing and disseminating these data in an intuitive manner, TOPPR also provides an online analysis platform, including methods to analyze protease specificity and substrate-centric analyses. Concretely, TOPPR supports three ways to retrieve data: (i) the retrieval of all substrates for one or more cellular stimuli or assays; (ii) a substrate search by UniProtKB/Swiss-Prot accession number, entry name or description; and (iii) a motif search that retrieves substrates matching a user-defined protease specificity profile. The analysis of the substrates is supported through the presence of a variety of annotations, including predicted secondary structure, known domains and experimentally obtained 3D structure where available. Across substrates, substrate orthologs and conserved sequence stretches can also be shown, with iceLogo visualization provided for the latter.


Subject(s)
Databases, Protein , Peptide Hydrolases/metabolism , Protein Processing, Post-Translational , Proteolysis , Animals , Humans , Internet , Mice , Proteins/metabolism , Substrate Specificity
13.
Bioinformatics ; 29(20): 2661-3, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-23918247

ABSTRACT

SUMMARY: Automated image processing has allowed cell migration research to evolve to a high-throughput research field. As a consequence, there is now an unmet need for data management in this domain. The absence of a generic management system for the quantitative data generated in cell migration assays results in each dataset being treated in isolation, making data comparison across experiments difficult. Moreover, by integrating quality control and analysis capabilities into such a data management system, the common practice of having to manually transfer data across different downstream analysis tools will be markedly sped up and made more robust. In addition, access to a data management solution creates gateways for data standardization, meta-analysis and structured public data dissemination. We here present CellMissy, a cross-platform data management system for cell migration data with a focus on wound healing data. CellMissy simplifies and automates data management, storage and analysis from the initial experimental set-up to data exploration. AVAILABILITY AND IMPLEMENTATION: CellMissy is a cross-platform open-source software developed in Java. Source code and cross-platform binaries are freely available under the Apache2 open source license at http://cellmissy.googlecode.com.


Subject(s)
Cell Movement , Software , Databases, Chemical , Image Processing, Computer-Assisted , Programming Languages , Wound Healing
14.
Mol Cell Proteomics ; 11(12): 1682-9, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22949509

ABSTRACT

The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.


Subject(s)
Databases, Protein , Electronic Data Processing , Mass Spectrometry , Proteomics , Proteome/analysis , Software , Software Design , User-Computer Interface
15.
Anal Chem ; 85(22): 11054-60, 2013 Nov 19.
Article in English | MEDLINE | ID: mdl-24134513

ABSTRACT

The use of internal calibrants (the so-called lock mass approach) provides much greater accuracy in mass spectrometry based proteomics. However, the polydimethylcyclosiloxane (PCM) peaks commonly used for this purpose are quite unreliable, leading to missing calibrant peaks in spectra and correspondingly lower mass measurement accuracy. Therefore, we here introduce a universally applicable and robust internal calibrant, the tripeptide Asn3. We show that Asn3 is a substantial improvement over PCM both in terms of consistent detection and resulting mass measurement accuracy. Asn3 is also very easy to adopt in the lab, as it requires only minor adjustments to the analytical setup.
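
The lock mass idea can be sketched in a few lines of Python: compare the observed m/z of the calibrant with its theoretical value and rescale the other peaks accordingly. The sketch assumes the calibrant is observed as the singly protonated, unmodified tripeptide Asn-Asn-Asn (C12H20N6O7), which the abstract does not state, and it uses a simple multiplicative correction as one possible model rather than the procedure actually applied by instrument software. All names are hypothetical.

```python
PROTON_MASS = 1.00727646688

# Monoisotopic atomic masses.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Assumption: free, unmodified Asn-Asn-Asn (three residues C4H6N2O2 plus H2O).
ASN3_FORMULA = {"C": 12, "H": 20, "N": 6, "O": 7}


def mz_singly_protonated(formula):
    """Theoretical m/z of the [M+H]+ ion for an elemental formula."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON_MASS


def ppm_error(observed, theoretical):
    """Mass measurement error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def recalibrate(mz_values, observed_lock, theoretical_lock):
    """Rescale m/z values so that the lock mass lands on its theoretical value."""
    factor = theoretical_lock / observed_lock
    return [mz * factor for mz in mz_values]


if __name__ == "__main__":
    theoretical = mz_singly_protonated(ASN3_FORMULA)  # about 361.1466
    observed = theoretical * (1 + 5e-6)               # simulated 5 ppm drift
    print(round(ppm_error(observed, theoretical), 3))
    print(recalibrate([observed, 785.8426], observed, theoretical))
```

If the calibrant peak is missing from a spectrum, no correction factor can be computed for that scan, which is why consistent detection of the calibrant matters as much as its accuracy.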


Subject(s)
Asparagine/chemistry , Chromatography, Liquid/methods , Peptide Fragments/chemistry , Siloxanes/chemistry , Tandem Mass Spectrometry/methods , Humans , Jurkat Cells , Proteomics
16.
Proteomics ; 12(8): 1142-6, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22577015

ABSTRACT

We have created a new software platform called sigpep that analyzes transition redundancy in selected reaction monitoring assays. Building on this platform, we also created a web application to generate transition sets with unique signatures for targeted peptides. The platform has been made available under the permissive Apache 2.0 open-source license, and the web application can be accessed from http://iomics.ugent.be/sigpep.
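
Transition redundancy can be illustrated with a short Python sketch: a transition (precursor m/z, fragment m/z) of a target peptide is treated as redundant when some background transition falls within the isolation tolerances of both quadrupoles. This is a simplified model under assumed tolerances of 0.5 m/z, not a description of how sigpep is implemented; the function names and example values are hypothetical.

```python
def is_redundant(transition, background, q1_tol=0.5, q3_tol=0.5):
    """True if any background transition overlaps in both Q1 and Q3."""
    q1, q3 = transition
    return any(
        abs(q1 - bg_q1) <= q1_tol and abs(q3 - bg_q3) <= q3_tol
        for bg_q1, bg_q3 in background
    )


def unique_transitions(target, background, q1_tol=0.5, q3_tol=0.5):
    """Keep only target transitions that no background transition can mimic."""
    return [
        transition
        for transition in target
        if not is_redundant(transition, background, q1_tol, q3_tol)
    ]


if __name__ == "__main__":
    # Hypothetical (precursor m/z, fragment m/z) pairs.
    target = [(500.3, 700.4), (500.3, 813.5)]
    background = [(500.6, 700.2), (620.0, 813.5)]
    print(unique_transitions(target, background))
```

A set of the surviving transitions then forms a candidate signature for the targeted peptide.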


Subject(s)
Mass Spectrometry/methods , Peptides/analysis , Proteome/analysis , Proteomics/methods , Software , Amino Acid Sequence , Animals , Humans , Internet , Molecular Sequence Data , Reference Standards , Trypsin/chemistry
17.
Anal Bioanal Chem ; 404(4): 1069-77, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22447219

ABSTRACT

Proteomics research has taken up an increasingly important role in life sciences over the past few years. Due to a strong push from publishers and funders alike, the community has also started to freely share its data in earnest, making use of public repositories such as the highly popular PRIDE database at EMBL-EBI. Reuse of these publicly available data has so far been confined to rather specific, targeted reanalyses, but this limited reuse is set to expand dramatically as repositories continue to grow exponentially. Examples of large-scale reuse are readily found in other omics disciplines, where more comprehensive public data have already accumulated over longer periods. Here, a typical example of integrative data reuse is provided by the construction of so-called expression atlases. We here therefore investigate the issues involved in using the human data currently stored in the PRIDE database to construct a robust, tissue-specific protein expression atlas from tandem-MS based label-free quantification.


Subject(s)
Databases, Protein , Proteins/chemistry , Proteomics , Humans , Mass Spectrometry , Proteins/genetics , Proteins/metabolism , Software
18.
Nat Commun ; 12(1): 6414, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34741024

ABSTRACT

While transcriptome- and proteome-wide technologies to assess processes in protein biogenesis are now widely available, we still lack global approaches to assay post-ribosomal biogenesis events, in particular those occurring in the eukaryotic secretory system. We here develop a method, SECRiFY, to simultaneously assess the secretability of >10^5 protein fragments by two yeast species, S. cerevisiae and P. pastoris, using custom fragment libraries, surface display and a sequencing-based readout. Screening human proteome fragments with a median size of 50-100 amino acids, we generate datasets that enable datamining into protein features underlying secretability, revealing a striking role for intrinsic disorder and chain flexibility. The SECRiFY methodology generates sufficient amounts of annotated data for advanced machine learning methods to deduce secretability patterns. The finding that secretability is indeed a learnable feature of protein sequences provides a solid base for application-focused studies.


Subject(s)
Saccharomyces cerevisiae/metabolism , Humans , Proteome/genetics , Proteome/physiology , Transcriptome/genetics , Transcriptome/physiology
20.
Adv Nutr ; 8(5): 639-651, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28916566

ABSTRACT

Pooled analysis of secondary data increases the power of research and enables scientific discovery in nutritional epidemiology. Information on study characteristics that determine data quality is needed to enable correct reuse and interpretation of data. This study aims to define essential quality characteristics for data from observational studies in nutrition. First, a literature review was performed to get an insight on existing instruments that assess the quality of cohort, case-control, and cross-sectional studies and dietary measurement. Second, 2 face-to-face workshops were organized to determine the study characteristics that affect data quality. Third, consensus on the data descriptors and controlled vocabulary was obtained. From 4884 papers retrieved, 26 relevant instruments, containing 164 characteristics for study design and 93 characteristics for measurements, were selected. The workshop and consensus process resulted in 10 descriptors allocated to "study design" and 22 to "measurement" domains. Data descriptors were organized as an ordinal scale of items to facilitate the identification, storage, and querying of nutrition data. Further integration of an Ontology for Nutrition Studies will facilitate interoperability of data repositories.


Subject(s)
Diet , Nutrition Assessment , Observational Studies as Topic , Adiposity , Anthropometry , Databases, Factual , Epidemiologic Studies , Humans , Research Design