Results 1 - 19 of 19
1.
BMC Bioinformatics ; 24(1): 412, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37915001

ABSTRACT

BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. RESULTS: We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. 
Finally, we provide a simple and intuitive open-source web interface ( https://skim.morgridge.org ) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. CONCLUSIONS: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.
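
The A-B-C linking described in this abstract can be sketched in a few lines. The abstract only says that links are "statistically significant"; the sketch below assumes a one-sided Fisher's exact test on article co-occurrence counts, and the function names, the threshold `alpha`, and all counts are hypothetical toy values, not SKiM's actual implementation or data.

```python
from math import comb

def fisher_right_tail(both, only_x, only_y, neither):
    """One-sided Fisher's exact test: P(overlap >= both) if X and Y were independent."""
    n_x = both + only_x                      # articles mentioning X
    n_y = both + only_y                      # articles mentioning Y
    total = both + only_x + only_y + neither
    upper = min(n_x, n_y)
    tail = sum(comb(n_x, k) * comb(total - n_x, n_y - k) for k in range(both, upper + 1))
    return tail / comb(total, n_y)

def skim_sketch(a_term, b_terms, c_terms, counts, total, alpha=1e-5):
    """Return (A, B, C) triples where both A-B and B-C co-occur significantly.

    counts maps a frozenset of terms to the number of articles mentioning all of them.
    """
    def p_value(x, y):
        both = counts.get(frozenset((x, y)), 0)
        n_x = counts[frozenset((x,))]
        n_y = counts[frozenset((y,))]
        return fisher_right_tail(both, n_x - both, n_y - both, total - n_x - n_y + both)

    links = []
    for b in b_terms:
        if p_value(a_term, b) >= alpha:      # first pass: keep only significant B terms
            continue
        for c in c_terms:
            if p_value(b, c) < alpha:        # second pass: link surviving B terms to C terms
                links.append((a_term, b, c))
    return links

# Toy corpus of 10,000 articles (all counts invented for illustration).
counts = {
    frozenset(("raynaud",)): 100,
    frozenset(("blood viscosity",)): 200,
    frozenset(("fish oil",)): 150,
    frozenset(("caffeine",)): 500,
    frozenset(("raynaud", "blood viscosity")): 40,
    frozenset(("blood viscosity", "fish oil")): 30,
    frozenset(("blood viscosity", "caffeine")): 4,
}
links = skim_sketch("raynaud", ["blood viscosity"], ["fish oil", "caffeine"], counts, 10_000)
print(links)
```

Running the two passes serially keeps the second pass small: only B terms that survive the A-B test are ever compared against the (possibly thousands of) C terms.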


Subject(s)
Algorithms , Neoplasms , Humans , PubMed , Knowledge , Knowledge Discovery
2.
bioRxiv ; 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-37397987

ABSTRACT

Background: The PubMed database contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: 1) they identify a relationship but not the type of relationship, 2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, 3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or 4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. Results: We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. 
Finally, we provide a simple and intuitive open-source web interface ( https://skim.morgridge.org ) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. Conclusions: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.

3.
Methods Mol Biol ; 2426: 35-66, 2023.
Article in English | MEDLINE | ID: mdl-36308684

ABSTRACT

MetaMorpheus is a free and open-source software program dedicated to the comprehensive analysis of proteomic data. In bottom-up proteomics, protein samples are digested into peptides prior to chromatographic separation and tandem mass spectrometric analysis. The resulting fragmentation spectra are subsequently analyzed with search software programs to obtain peptide identifications and infer the presence of proteins in the samples. MetaMorpheus seeks to maximize the information gleaned from proteomic data through the use of (a) mass calibration, (b) post-translational modification discovery, (c) multiple search algorithms, which aid in the analysis of data from traditional, crosslinking, and glycoproteomic experiments, (d) isotope-based or label-free quantification, (e) multi-protease protein inference, and (f) spectral annotation and data visualization capabilities. This protocol provides detailed descriptions of how to use MetaMorpheus and how to customize data analysis workflows using MetaMorpheus tasks to meet the specific needs of the user.


Subject(s)
Data Analysis , Proteomics , Proteomics/methods , Software , Tandem Mass Spectrometry/methods , Peptides/chemistry , Proteins/chemistry , Algorithms , Databases, Protein
4.
Methods Mol Biol ; 2426: 303-313, 2023.
Article in English | MEDLINE | ID: mdl-36308694

ABSTRACT

The rapid and accurate quantification of peptides is a critical element of modern proteomics that has become increasingly challenging as proteomic data sets grow in size and complexity. We present here FlashLFQ, a computer program for high-speed label-free quantification of peptides and proteins following a search of bottom-up mass spectrometry data. FlashLFQ is approximately an order of magnitude faster than established label-free quantification methods and can quantify data-dependent analysis (DDA) search results from any proteomics search program. It is available as a graphical user interface program, a command line tool, a Docker image, and integrated into the MetaMorpheus search software.


Subject(s)
Proteins , Proteomics , Proteomics/methods , Proteins/chemistry , Peptides/chemistry , Software , Mass Spectrometry/methods
5.
J Proteome Res ; 21(11): 2609-2618, 2022 Nov 04.
Article in English | MEDLINE | ID: mdl-36206157

ABSTRACT

Tandem mass spectrometry (MS/MS) is widely employed for the analysis of complex proteomic samples. While protein sequence database searching and spectral library searching are both well-established peptide identification methods, each has shortcomings. Protein sequence databases lack fragment peak intensity information, which can result in poor discrimination between correct and incorrect spectrum assignments. Spectral libraries usually contain fewer peptides than protein sequence databases, which limits the number of peptides that can be identified. Notably, few post-translationally modified peptides are represented in spectral libraries. This is because few search engines can both identify a broad spectrum of PTMs and create corresponding spectral libraries. Also, programs that generate spectral libraries using deep learning approaches are not yet able to accurately predict spectra for the vast majority of PTMs. Here, we address these limitations through use of a hybrid search strategy that combines protein sequence database and spectral library searches to improve identification success rates and sensitivity. This software uses Global PTM Discovery (G-PTM-D) to produce spectral libraries for a wide variety of different PTMs. These features, along with a new spectrum annotation and visualization tool, have been integrated into the freely available and open-source search engine MetaMorpheus.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Databases, Protein , Proteomics/methods , Tandem Mass Spectrometry/methods , Data Analysis , Software , Peptides/analysis , Peptide Library , Algorithms
6.
Genome Biol ; 23(1): 69, 2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35241129

ABSTRACT

BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.


Subject(s)
Proteogenomics , Alternative Splicing , Humans , Protein Isoforms/genetics , Proteomics , Sequence Analysis, RNA/methods , Transcriptome
7.
J Biol Chem ; 297(3): 101049, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34375640

ABSTRACT

Fused in sarcoma (FUS) encodes an RNA-binding protein with diverse roles in transcriptional activation and RNA splicing. While oncogenic fusions of FUS and transcription factor DNA-binding domains are associated with soft tissue sarcomas, dominant mutations in FUS can cause amyotrophic lateral sclerosis. FUS has also been implicated in genome maintenance. However, the underlying mechanisms of its actions in genome stability are unknown. Here, we applied gene editing, functional reconstitution, and integrated proteomics and transcriptomics to illuminate roles for FUS in DNA replication and repair. Consistent with a supportive role in DNA double-strand break repair, FUS-deficient cells exhibited subtle alterations in the recruitment and retention of double-strand break-associated factors, including 53BP1 and BRCA1. FUS-/- cells also exhibited reduced proliferative potential that correlated with reduced speed of replication fork progression, diminished loading of prereplication complexes, enhanced micronucleus formation, and attenuated expression and splicing of S-phase-associated genes. Finally, FUS-deficient cells exhibited genome-wide alterations in DNA replication timing that were reversed upon re-expression of FUS complementary DNA. We also showed that FUS-dependent replication domains were enriched in transcriptionally active chromatin and that FUS was required for the timely replication of transcriptionally active DNA. These findings suggest that alterations in DNA replication kinetics and programming contribute to genome instability and functional defects in FUS-deficient cells.


Subject(s)
DNA Replication Timing , RNA-Binding Protein FUS/metabolism , Sarcoma/genetics , Sarcoma/metabolism , BRCA1 Protein/genetics , BRCA1 Protein/metabolism , Cell Proliferation , DNA Breaks, Double-Stranded , DNA Repair , Humans , Kinetics , RNA-Binding Protein FUS/genetics , Tumor Suppressor p53-Binding Protein 1/genetics , Tumor Suppressor p53-Binding Protein 1/metabolism
8.
J Proteome Res ; 20(4): 1997-2004, 2021 Apr 02.
Article in English | MEDLINE | ID: mdl-33683901

ABSTRACT

MetaMorpheus is a free, open-source software program for the identification of peptides and proteoforms from data-dependent acquisition tandem MS experiments. There is inherent uncertainty in these assignments for several reasons, including the limited overlap between experimental and theoretical peaks, the m/z uncertainty, and noise peaks or peaks from coisolated peptides that produce false matches. False discovery rates provide only a set-wise approximation for incorrect spectrum matches. Here we implemented a binary decision tree calculation within MetaMorpheus to compute a posterior error probability, which provides a measure of uncertainty for each peptide-spectrum match. We demonstrate its utility for increasing identifications and resolving ambiguities in bottom-up, top-down, proteogenomic, and nonspecific digestion searches.
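
To make the contrast between a set-wise false discovery rate and a per-match error measure concrete, here is a minimal sketch of the set-wise side only: q-values from a target-decoy ranking. It assumes a target-decoy search and uses invented scores; it does not reproduce the decision-tree posterior error probability described in the abstract, and `q_values` is a hypothetical helper name.

```python
def q_values(psms):
    """Compute q-values for peptide-spectrum matches given as (score, is_decoy) pairs.

    Returns the q-values in order of descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value at a rank is the lowest FDR at that rank or any lower-scoring rank.
    qs, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return qs

# Invented scores: three targets and two decoys.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(q_values(example))
```

Every match above a given cutoff shares the same set-level error estimate, which is why a q-value says little about whether one particular match is wrong; a posterior error probability is meant to answer that per-match question.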


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Databases, Protein , Peptides , Probability , Software
9.
J Proteome Res ; 20(4): 1826-1834, 2021 Apr 02.
Article in English | MEDLINE | ID: mdl-32967423

ABSTRACT

Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.github.io/Spritz/), an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.


Subject(s)
Proteogenomics , Databases, Protein , Humans , Mass Spectrometry , Protein Processing, Post-Translational , Proteomics , Software
10.
J Am Soc Mass Spectrom ; 31(9): 1783-1802, 2020 Sep 02.
Article in English | MEDLINE | ID: mdl-32812765

ABSTRACT

The Consortium for Top-Down Proteomics (www.topdownproteomics.org) launched the present study to assess the current state of top-down mass spectrometry (TD MS) and middle-down mass spectrometry (MD MS) for characterizing monoclonal antibody (mAb) primary structures, including their modifications. To meet the needs of the rapidly growing therapeutic antibody market, it is important to develop analytical strategies to characterize the heterogeneity of a therapeutic product's primary structure accurately and reproducibly. The major objective of the present study is to determine whether current TD/MD MS technologies and protocols can add value to the more commonly employed bottom-up (BU) approaches with regard to confirming protein integrity, sequencing variable domains, avoiding artifacts, and revealing modifications and their locations. We also aim to gather information on the common TD/MD MS methods and practices in the field. A panel of three mAbs was selected and centrally provided to 20 laboratories worldwide for the analysis: Sigma mAb standard (SiLuLite), NIST mAb standard, and the therapeutic mAb Herceptin (trastuzumab). Various MS instrument platforms and ion dissociation techniques were employed. The present study confirms that TD/MD MS tools are available in laboratories worldwide and provide complementary information to the BU approach that can be crucial for comprehensive mAb characterization. The current limitations, as well as possible solutions to overcome them, are also outlined. A primary limitation revealed by the results of the present study is that the expert knowledge in both experiment and data analysis is indispensable to practice TD/MD MS.


Subject(s)
Antibodies, Monoclonal , Mass Spectrometry/methods , Proteomics/methods , Animals , Antibodies, Monoclonal/analysis , Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/genetics , Complementarity Determining Regions/analysis , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/genetics , Humans , Mice
11.
Proteomes ; 8(3), 2020 Jul 08.
Article in English | MEDLINE | ID: mdl-32650610

ABSTRACT

For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file-formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.

12.
J Proteome Res ; 19(8): 3510-3517, 2020 Aug 07.
Article in English | MEDLINE | ID: mdl-32584579

ABSTRACT

Cellular functions are performed by a vast and diverse set of proteoforms. Proteoforms are the specific forms of proteins produced as a result of genetic variations, RNA splicing, and post-translational modifications (PTMs). Top-down mass spectrometric analysis of intact proteins enables proteoform identification, including proteoforms derived from sequence cleavage events or harboring multiple PTMs. In contrast, bottom-up proteomics identifies peptides, which necessitates protein inference and does not yield proteoform identifications. We seek here to exploit the synergies between these two data types to improve the quality and depth of the overall proteomic analysis. To this end, we automated the large-scale integration of results from multiprotease bottom-up and top-down analyses in the software program Proteoform Suite and applied it to the analysis of proteoforms from the human Jurkat T lymphocyte cell line. We implemented the recently developed proteoform-level classification scheme for top-down tandem mass spectrometry (MS/MS) identifications in Proteoform Suite, which enables users to observe the level and type of ambiguity for each proteoform identification, including which of the ambiguous proteoform identifications are supported by bottom-up-level evidence. We used Proteoform Suite to find instances where top-down identifications aid in protein inference from bottom-up analysis and conversely where bottom-up peptide identifications aid in proteoform PTM localization. We also show the use of bottom-up data to infer proteoform candidates potentially present in the sample, allowing confirmation of such proteoform candidates by intact-mass analysis of MS1 spectra. The implementation of these capabilities in the freely available software program Proteoform Suite enables users to integrate large-scale top-down and bottom-up data sets and to utilize the synergies between them to improve and extend the proteomic analysis.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Humans , Protein Processing, Post-Translational , Proteome/metabolism , Software
13.
J Proteome Res ; 19(5): 1975-1981, 2020 May 01.
Article in English | MEDLINE | ID: mdl-32243168

ABSTRACT

Statistical significance tests are a common feature in quantitative proteomics workflows. The Student's t-test is widely used to compute the statistical significance of a protein's change between two groups of samples. However, the t-test's null hypothesis asserts that the difference in means between two groups is exactly zero, often marking small but uninteresting fold-changes as statistically significant. Compensations to address this issue are widely used in quantitative proteomics, but we suggest that a replacement of the t-test with a Bayesian approach offers a better path forward. In this article, we describe a Bayesian hypothesis test in which the null hypothesis is an interval rather than a single point at zero; the width of the interval is estimated from population statistics. The improved sensitivity of the method substantially increases the number of truly changing proteins detected in two benchmark data sets (ProteomeXchange identifiers PXD005590 and PXD016470). The method has been implemented within FlashLFQ, an open-source software program that quantifies bottom-up proteomics search results obtained from any search tool. FlashLFQ is rapid, sensitive, and accurate and is available both as an easy-to-use graphical user interface (Windows) and as a command-line tool (Windows/Linux/OSX).
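
The idea of an interval null hypothesis can be illustrated with a rough sketch. It assumes a flat prior and a normal approximation to the posterior of the difference in means, and it takes the interval half-width as a caller-supplied number; the abstract says the width is estimated from population statistics but does not say how, so this is not the model implemented in FlashLFQ. All intensities are invented log2 values.

```python
from math import erf, sqrt
from statistics import mean, variance

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_inside_null_interval(group1, group2, half_width):
    """Approximate posterior probability that the true change lies in [-half_width, +half_width]."""
    diff = mean(group2) - mean(group1)
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return normal_cdf(half_width, diff, se) - normal_cdf(-half_width, diff, se)

control = [20.00, 20.02, 19.98, 20.01]        # log2 intensities, invented
small_shift = [20.10, 20.12, 20.08, 20.11]    # tiny but very consistent change
large_shift = [21.50, 21.60, 21.40, 21.50]    # change well outside the interval

p_small = prob_inside_null_interval(control, small_shift, half_width=0.5)
p_large = prob_inside_null_interval(control, large_shift, half_width=0.5)
print(p_small, p_large)
```

A point-null t-test would flag the small, consistent shift as significant because the measurements are so tight; with an interval null, nearly all of the posterior mass falls inside the "uninteresting" range, so the change is not reported.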


Subject(s)
Proteomics , Software , Bayes Theorem , Humans , Proteins , Workflow
14.
Curr Bioinform ; 15(9): 1065-1074, 2020.
Article in English | MEDLINE | ID: mdl-33692656

ABSTRACT

BACKGROUND: The identification of non-specifically cleaved peptides in proteomics and peptidomics poses a significant computational challenge. Current strategies for the identification of such peptides are typically time-consuming and hinder routine data analysis. OBJECTIVE: We aimed to design an algorithm that would improve the speed of semi- and non-specific enzyme searches and could be applicable to existing search programs. METHOD: We developed a novel search algorithm that leverages fragment-ion redundancy to search multiple non-specifically cleaved peptides simultaneously. Briefly, a theoretical peptide tandem mass spectrum is generated using only the fragment-ion series from a single terminus. This spectrum serves as a proxy for several shorter theoretical peptides sharing the same terminus. After database searching, amino acids are removed from the opposing terminus until the observed and theoretical precursor masses match within a given mass tolerance. RESULTS: The algorithm was implemented in the search program MetaMorpheus and found to perform an order of magnitude faster than the traditional MetaMorpheus search and produce superior results. CONCLUSION: We report a speedy non-specific enzyme search algorithm which is open-source and enables search programs to utilize fragment-ion redundancy to achieve a notable increase in search speed.
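
The final trimming step in the METHOD section can be sketched as follows. This is a simplified illustration, not the MetaMorpheus code: it assumes an N-terminus-anchored proxy sequence, unmodified peptides, and standard monoisotopic residue masses; the function names, the 10 ppm tolerance, the minimum length, and the example sequence are assumptions.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER

def trim_to_precursor(proxy_seq, observed_mass, ppm_tol=10.0, min_len=5):
    """Remove residues from the C-terminus until the theoretical mass matches the precursor.

    Returns the matching sequence, or None if no truncation matches.
    """
    seq = proxy_seq
    while len(seq) >= min_len:
        theoretical = peptide_mass(seq)
        if abs(theoretical - observed_mass) / theoretical * 1e6 <= ppm_tol:
            return seq
        if theoretical < observed_mass:
            return None          # already lighter than the precursor; trimming cannot help
        seq = seq[:-1]           # drop one residue from the opposing (C) terminus
    return None

observed = peptide_mass("PEPTIDEK")
print(trim_to_precursor("PEPTIDEKAAG", observed))
```

Because one proxy spectrum stands in for every truncation that shares its terminus, the fragment matching is done once, and only the inexpensive precursor-mass comparison is repeated per candidate length.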

15.
J Proteome Res ; 18(10): 3671-3680, 2019 Oct 04.
Article in English | MEDLINE | ID: mdl-31479276

ABSTRACT

Complex human biomolecular processes are made possible by the diversity of human proteoforms. Constructing proteoform families, groups of proteoforms derived from the same gene, is one way to represent this diversity. Comprehensive, high-confidence identification of human proteoforms remains a central challenge in mass spectrometry-based proteomics. We have previously reported a strategy for proteoform identification using intact-mass measurements, and we have since improved that strategy by mass calibration based on search results, the use of a global post-translational modification discovery database, and the integration of top-down proteomics results with intact-mass analysis. In the present study, we combine these strategies for enhanced proteoform identification in total cell lysate from the Jurkat human T lymphocyte cell line. We collected, processed, and integrated three types of proteomics data (NeuCode-labeled intact-mass, label-free top-down, and multi-protease bottom-up) to maximize the number of confident proteoform identifications. The integrated analysis revealed 5950 unique experimentally observed proteoforms, which were assembled into 848 proteoform families. Twenty percent of the observed proteoforms were confidently identified at a 3.9% false discovery rate, representing 1207 unique proteoforms derived from 484 genes.


Subject(s)
Databases, Protein , Proteome , Proteomics/methods , Humans , Jurkat Cells , Mass Spectrometry , Peptide Hydrolases/analysis , Protein Isoforms , Protein Processing, Post-Translational
16.
J Proteome Res ; 18(9): 3429-3438, 2019 Sep 06.
Article in English | MEDLINE | ID: mdl-31378069

ABSTRACT

Peptides detected by tandem mass spectrometry (MS/MS) in bottom-up proteomics serve as proxies for the proteins expressed in the sample. Protein inference is a process routinely applied to these peptides to generate a plausible list of candidate protein identifications. The use of multiple proteases for parallel protein digestions expands sequence coverage, provides additional peptide identifications, and increases the probability of identifying peptides that are unique to a single protein, which are all valuable for protein inference. We have developed and implemented a multi-protease protein inference algorithm in MetaMorpheus, a bottom-up search software program, which incorporates the calculation of protease-specific q-values and preserves the association of peptide sequences and their protease of origin. This integrated multi-protease protein inference algorithm provides more accurate results than either the aggregation of results from the separate analysis of the peptide identifications produced by each protease (separate approach) in MetaMorpheus, or results that are obtained using Fido, ProteinProphet, or DTASelect2. MetaMorpheus' integrated multi-protease data analysis decreases the ambiguity of the protein group list, reduces the frequency of erroneous identifications, and increases the number of post-translational modifications identified, while combining multi-protease search and protein inference into a single software program.
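
A toy sketch can show why peptides from a second protease help protein inference. It uses a plain greedy parsimony (set-cover) heuristic with peptides keyed by sequence and protease of origin; this is a stand-in chosen for brevity, not the MetaMorpheus algorithm, and it omits the protease-specific q-values mentioned in the abstract. The peptide sequences and accessions are invented.

```python
def greedy_parsimony(peptide_to_proteins):
    """Pick a small set of proteins that explains every identified peptide.

    peptide_to_proteins maps (sequence, protease) to the set of proteins containing it.
    """
    uncovered = set(peptide_to_proteins)
    chosen = []
    while uncovered:
        # Count how many still-unexplained peptides each protein would explain.
        coverage = {}
        for pep in uncovered:
            for prot in peptide_to_proteins[pep]:
                coverage.setdefault(prot, set()).add(pep)
        if not coverage:
            break                # remaining peptides map to no protein
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(coverage), key=lambda p: len(coverage[p]))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

trypsin_only = {("LSSPATLNSR", "trypsin"): {"P1", "P2"}}
with_second_protease = {
    ("LSSPATLNSR", "trypsin"): {"P1", "P2"},
    ("AVDIKE", "Glu-C"): {"P2"},
}
print(greedy_parsimony(trypsin_only))          # P1 vs. P2 is an arbitrary tie-break
print(greedy_parsimony(with_second_protease))  # the Glu-C peptide is unique to P2
```

With trypsin alone the shared peptide cannot distinguish the two proteins; the peptide from the second digest is unique to one of them and resolves the ambiguity.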


Subject(s)
Proteins/isolation & purification , Proteomics , Software , Tandem Mass Spectrometry/methods , Algorithms , Amino Acid Sequence/genetics , Databases, Protein , Peptide Hydrolases/chemistry , Peptide Hydrolases/isolation & purification , Peptides/chemistry , Peptides/isolation & purification , Proteins/chemistry
17.
Proteomics ; 19(10): e1800361, 2019 May.
Article in English | MEDLINE | ID: mdl-31050378

ABSTRACT

A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications. In top-down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top-down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.


Subject(s)
Amino Acids/analysis , Mass Spectrometry , Protein Processing, Post-Translational , Proteome/analysis , Proteomics/methods , Animals , Computational Biology , Electrophoresis, Capillary , Humans , Programming Languages , Reproducibility of Results , Software
18.
J Proteome Res ; 17(7): 2370-2376, 2018 Jul 06.
Article in English | MEDLINE | ID: mdl-29793340

ABSTRACT

Protein chemical cross-linking combined with mass spectrometry has become an important technique for the analysis of protein structure and protein-protein interactions. A variety of cross-linkers are well developed, but reliable, rapid, and user-friendly tools for large-scale analysis of cross-linked proteins are still needed. Here we report MetaMorpheusXL, a new search module within the MetaMorpheus software suite that identifies both MS-cleavable and noncleavable cross-linked peptides in MS data. MetaMorpheusXL identifies MS-cleavable cross-linked peptides with an ion-indexing algorithm, which enables an efficient large database search. The identification does not require the presence of signature fragment ions, an advantage compared with similar programs such as XlinkX. One complication associated with the need for signature ions from cleavable cross-linkers such as DSSO (disuccinimidyl sulfoxide) is the requirement for multiple fragmentation types and energy combinations, which is not necessary for MetaMorpheusXL. The ability to perform proteome-wide analysis is another advantage of MetaMorpheusXL compared with programs such as MeroX and DXMSMS. MetaMorpheusXL is also faster than other currently available MS-cleavable cross-link search software programs. It is embedded in MetaMorpheus, an open-source and freely available software suite that provides a reliable, fast, user-friendly graphical user interface that is readily accessible to researchers.


Subject(s)
Algorithms , Cross-Linking Reagents/chemistry , Peptides/analysis , Tandem Mass Spectrometry/methods , Databases as Topic , Peptides/chemistry , Proteome/analysis , Software
19.
J Proteome Res ; 17(1): 386-391, 2018 Jan 05.
Article in English | MEDLINE | ID: mdl-29083185

ABSTRACT

The rapid and accurate quantification of peptides is a critical element of modern proteomics that has become increasingly challenging as proteomic data sets grow in size and complexity. We present here FlashLFQ, a computer program for high-speed label-free quantification of peptides following a search of bottom-up mass spectrometry data. FlashLFQ is approximately an order of magnitude faster than established label-free quantification methods. The increased speed makes it practical to base quantification upon all of the charge states for a given peptide rather than solely upon the charge state that was selected for MS2 fragmentation. This increases the number of quantified peptides, improves replicate-to-replicate reproducibility, and increases quantitative accuracy. We integrated FlashLFQ into the graphical user interface of the MetaMorpheus search software, allowing it to work together with the global post-translational modification discovery (G-PTM-D) engine to accurately quantify modified peptides. FlashLFQ is also available as a NuGet package, facilitating its integration into other software, and as a standalone command line software program for the quantification of search results from other programs (e.g., MaxQuant).
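
Quantifying across all charge states, rather than only the one selected for MS2, can be sketched as below. This is an illustration under simplifying assumptions (a single MS1 scan, monoisotopic peaks only, a fixed 10 ppm window), not FlashLFQ's implementation; the function names and peak list are invented.

```python
PROTON = 1.007276  # mass of a proton in daltons

def mz(neutral_mass, charge):
    """m/z of a peptide with the given neutral mass and positive charge."""
    return (neutral_mass + charge * PROTON) / charge

def quantify_all_charges(neutral_mass, peaks, charges=range(1, 5), ppm_tol=10.0):
    """Sum the intensity of MS1 peaks that match the peptide at any listed charge state.

    peaks is a list of (m/z, intensity) pairs from one MS1 scan.
    """
    total = 0.0
    for z in charges:
        target = mz(neutral_mass, z)
        tol = target * ppm_tol / 1e6
        total += sum(intensity for m, intensity in peaks if abs(m - target) <= tol)
    return total

# Invented scan: the 2+ and 3+ ions of a 1000 Da peptide plus an unrelated peak.
scan = [(501.0073, 2e6), (334.3406, 1e6), (600.0, 5e5)]
all_states = quantify_all_charges(1000.0, scan)
only_2_plus = quantify_all_charges(1000.0, scan, charges=[2])
print(all_states, only_2_plus)
```

In this toy scan, restricting quantification to the 2+ ion misses a third of the peptide's signal, which is the effect the abstract attributes to using only the charge state selected for fragmentation.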


Subject(s)
Peptides/analysis , Software , Mass Spectrometry , Proteomics/methods , Reproducibility of Results , Time Factors , User-Computer Interface