Results 1 - 20 of 115
1.
mSystems ; : e0092923, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38934598

ABSTRACT

Airway microbiota are known to contribute to lung diseases, such as cystic fibrosis (CF), but their contributions to pathogenesis are still unclear. To improve our understanding of host-microbe interactions, we have developed an integrated analytical and bioinformatic mass spectrometry (MS)-based metaproteomics workflow to analyze clinical bronchoalveolar lavage (BAL) samples from people with airway disease. Proteins from BAL cellular pellets were processed and pooled together in groups categorized by disease status (CF vs. non-CF) and bacterial diversity, based on previously performed small subunit rRNA sequencing data. Proteins from each pooled sample group were digested and subjected to liquid chromatography tandem mass spectrometry (MS/MS). MS/MS spectra were matched to human and bacterial peptide sequences leveraging a bioinformatic workflow using a metagenomics-guided protein sequence database and rigorous evaluation. Label-free quantification revealed differentially abundant human peptides from proteins with known roles in CF, like neutrophil elastase and collagenase, and proteins with lesser-known roles in CF, including apolipoproteins. Differentially abundant bacterial peptides were identified from known CF pathogens (e.g., Pseudomonas), as well as other taxa with potentially novel roles in CF. We used this host-microbe peptide panel for targeted parallel-reaction monitoring validation, demonstrating for the first time an MS-based assay effective for quantifying host-microbe protein dynamics within BAL cells from individual CF patients. Our integrated bioinformatic and analytical workflow combining discovery, verification, and validation should prove useful for diverse studies to characterize microbial contributors in airway diseases. 
Furthermore, we describe a promising preliminary panel of differentially abundant microbe and host peptide sequences for further study as potential markers of host-microbe relationships in CF disease pathogenesis. IMPORTANCE: Identifying microbial pathogenic contributors and dysregulated human responses in airway disease, such as CF, is critical to understanding disease progression and developing more effective treatments. To this end, characterizing the proteins expressed by bacterial microbes and human host cells during disease progression can provide valuable new insights. We describe here a new method to confidently detect and monitor abundance changes of both microbe and host proteins from challenging BAL samples commonly collected from CF patients. Our method uses both state-of-the-art mass spectrometry-based instrumentation to detect proteins present in these samples and customized bioinformatic software tools to analyze the data and characterize detected proteins and their association with CF. We demonstrate the use of this method to characterize microbe and host proteins from individual BAL samples, paving the way for a new approach to understanding molecular contributors to CF and other diseases of the airway.

2.
Res Sq ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38883770

ABSTRACT

Background: Obstructive lung disease (OLD) is increasingly prevalent among persons living with HIV (PLWH). However, the role of proteases in HIV-associated OLD remains unclear. Methods: We combined proteomics and peptidomics to comprehensively characterize protease activities. Mass spectrometry (MS) analysis of bronchoalveolar lavage fluid (BALF) peptides and proteins from PLWH with OLD (n=25) and without OLD (n=26) was combined with a targeted SomaScan aptamer-based proteomic approach to quantify individual proteases and assess their correlation with lung function. Endogenous peptidomics mapped peptides to native proteins to identify substrates of protease activity. Using the MEROPS database, we identified candidate proteases linked to peptide generation based on binding-site affinities, which were assessed via z-scores. We used t-tests to compare the average forced expiratory volume in 1 second percent predicted (FEV1pp) between samples with and without detection of each cleaved protein, and adjusted for multiple comparisons by controlling the false discovery rate (FDR). Findings: We identified 101 proteases, of which 95 had functional network associations and 22 correlated with FEV1pp. These included cathepsins, metalloproteinases (MMP), caspases and neutrophil elastase. We discovered 31 proteins subject to proteolytic cleavage that associate with FEV1pp, with the top pathways involved in small ubiquitin-like modifier (SUMO)-mediated modification (SUMOylation). Proteases linked to protein cleavage included neutrophil elastase, granzyme, and cathepsin D. Interpretations: In HIV-associated OLD, a significant number of proteases are up-regulated, many of which are involved in protein degradation. These proteases degrade proteins involved in cell cycle and protein stability, thereby disrupting critical biological functions.
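FDR control over many per-protein tests is commonly done with the Benjamini-Hochberg step-up procedure. The abstract does not name the exact method, so the following is a minimal Python sketch under that assumption, using made-up p-values rather than study data.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone (and <= 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values from four per-protein t-tests.
    q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
    print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
    print([x <= 0.05 for x in q])    # [True, False, False, False]
```

At a 5% FDR only the first test would be retained, even though three raw p-values fall below 0.05.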

3.
mSphere ; 9(6): e0079323, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38780289

ABSTRACT

Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, which supports the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies. IMPORTANCE: Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including the handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results.
To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.
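Unipept annotates a peptide with the lowest common ancestor (LCA) of the organisms whose proteins contain it. The following minimal Python sketch illustrates only that LCA idea on simplified, hand-written lineages; it does not call the Unipept API, and the function name and lineages are illustrative assumptions.

```python
from typing import List, Optional


def lowest_common_ancestor(lineages: List[List[str]]) -> Optional[str]:
    """Deepest taxon shared by every lineage; each lineage is ordered root first."""
    lca = None
    # zip() walks all lineages rank by rank and stops at the shortest one.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) == 1:
            lca = taxa_at_rank[0]
        else:
            break
    return lca


if __name__ == "__main__":
    # Simplified lineages for organisms that (hypothetically) share a peptide.
    s_pneumoniae = ["Bacteria", "Bacillota", "Bacilli", "Lactobacillales",
                    "Streptococcaceae", "Streptococcus", "Streptococcus pneumoniae"]
    s_mitis = s_pneumoniae[:-1] + ["Streptococcus mitis"]
    pseudomonas = ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                   "Pseudomonadales", "Pseudomonadaceae", "Pseudomonas"]
    print(lowest_common_ancestor([s_pneumoniae, s_mitis]))      # Streptococcus
    print(lowest_common_ancestor([s_pneumoniae, pseudomonas]))  # Bacteria
```

A peptide shared across distant taxa collapses to a high rank, which is why taxon-specific peptides carry most of the taxonomic signal.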


Subject(s)
Computational Biology , Proteomics , Workflow , Proteomics/methods , Humans , Computational Biology/methods , Host Microbial Interactions , Mass Spectrometry , Microbiota/genetics , Bronchoalveolar Lavage Fluid/microbiology , Bronchoalveolar Lavage Fluid/chemistry , Bacterial Proteins/genetics
4.
bioRxiv ; 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38045370

ABSTRACT

Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification and prioritization of microbial and host proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, which supports the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), and Unipept and MSstatsTMT (for taxonomy and functional annotation). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.

5.
Chem Res Toxicol ; 36(12): 2019-2030, 2023 12 18.
Article in English | MEDLINE | ID: mdl-37963067

ABSTRACT

Hemoglobin (Hb) adducts are widely used in human biomonitoring because of the high abundance of hemoglobin in human blood, its reactivity toward electrophiles, and the stability of the adducted protein for up to 120 days. In the present paper, we compared three methods of analysis of hemoglobin adducts: mass spectrometry of derivatized N-terminal Val adducts (the FIRE method), mass spectrometry of N-terminal adducted hemoglobin peptides (bottom-up proteomics), and limited proteolysis mass spectrometry. Blood from human donors was incubated with a selection of contact allergens and other electrophiles, after which hemoglobin was isolated and analyzed by each of the three methods. We found that the FIRE method was able to detect and reliably quantify N-terminal adducts of acrylamide, acrylic acid, glycidic acid, and 2,3-epoxypropyl phenyl ether (PGE), but it was less efficient for 2-methyleneglutaronitrile (2-MGN) and failed to detect 1-chloro-2,4-dinitrobenzene (DNCB). By contrast, bottom-up proteomics was able to determine the presence of adducts from all six electrophiles at both the N-terminus and reactive hemoglobin side chains. Limited proteolysis mass spectrometry, studied for four contact allergens (three electrophiles and a metal salt), was able to determine the presence of covalent hemoglobin adducts with one of the three electrophiles (DNCB) and coordination complexation with the nickel salt. Together, these approaches represent complementary tools in the study of the hemoglobin adductome.
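Searching MS data for an adducted N-terminal peptide comes down to mass arithmetic: the adduct adds a fixed shift to the unmodified peptide mass. The following Python sketch is not the authors' pipeline; it computes this for VHLTPEEK, the N-terminal tryptic peptide of the hemoglobin beta chain, assuming an acrylamide adduct of C3H5NO (+71.03711 Da) and standard monoisotopic residue masses.

```python
# Monoisotopic amino acid residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
ACRYLAMIDE_ADDUCT = 71.03711  # C3H5NO, added by Michael addition


def peptide_mass(sequence: str, modification: float = 0.0) -> float:
    """Neutral monoisotopic peptide mass plus an optional mass shift."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + modification


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    unmodified = peptide_mass("VHLTPEEK")
    adducted = peptide_mass("VHLTPEEK", ACRYLAMIDE_ADDUCT)
    print(round(unmodified, 4))       # 951.5025
    print(round(adducted, 4))         # 1022.5396
    print(round(mz(adducted, 2), 4))  # 512.2771
```

The same calculation, repeated over side-chain residues instead of the N-terminus, is what lets bottom-up searches localize adducts beyond the N-terminal valine.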


Subject(s)
Dinitrochlorobenzene , Hemoglobins , Humans , Hemoglobins/analysis , Mass Spectrometry
6.
Chem Res Toxicol ; 36(11): 1666-1682, 2023 11 20.
Article in English | MEDLINE | ID: mdl-37862059

ABSTRACT

Exogenous compounds and metabolites derived from therapeutics, microbiota, or environmental exposures directly interact with endogenous metabolic pathways, influencing disease pathogenesis and modulating outcomes of clinical interventions. Because few spectral library references exist, identifying covalently modified biomolecules, secondary metabolites, and xenobiotics is challenging with global metabolomics profiling approaches. Numerous liquid chromatography-coupled mass spectrometry (LC-MS) small molecule analytical workflows have been developed to curate global profiling experiments for specific compound groups of interest. These workflows exploit shared structural moieties, functional groups, or elemental composition to discover novel and undescribed compounds through nontargeted small molecule discovery pipelines. This Review introduces the concept of structure-oriented LC-MS discovery methodology and aims to highlight common approaches employed for the detection and characterization of covalently modified biomolecules, secondary metabolites, and xenobiotics. These approaches represent a combination of instrument-dependent and computational techniques to rapidly curate global profiling experiments to detect putative ions of interest based on fragmentation patterns, predictable phase I or phase II metabolic transformations, or rare elemental composition. Application of these methods is explored for the detection and identification of novel and undescribed biomolecules relevant to the fields of toxicology, pharmacology, and drug discovery. Continued advances in these methods expand the capacity for selective compound discovery and characterization that promise remarkable insights into the molecular interactions of exogenous chemicals with host biochemical pathways.
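One computational strategy described above, flagging ions whose mass matches a predictable phase I or phase II transformation of a parent compound, can be sketched as a simple mass-shift screen. The transformation list, the 5 ppm tolerance, the assumption of singly protonated [M+H]+ ions, and the observed m/z values are all illustrative choices, not taken from the Review; the parent mass is that of acetaminophen (C8H9NO2).

```python
from typing import Dict, List, Tuple

PROTON = 1.007276  # Da

# Monoisotopic mass shifts (Da) for a few common biotransformations (illustrative subset).
TRANSFORMATIONS: Dict[str, float] = {
    "hydroxylation (+O)": 15.994915,
    "methylation (+CH2)": 14.015650,
    "acetylation (+C2H2O)": 42.010565,
    "sulfation (+SO3)": 79.956815,
    "glucuronidation (+C6H8O6)": 176.032088,
}


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def screen_transformations(parent_mass: float, observed_mz: List[float],
                           tol_ppm: float = 5.0) -> List[Tuple[str, float]]:
    """Match observed [M+H]+ ions to parent_mass + each expected shift."""
    hits = []
    for name, shift in TRANSFORMATIONS.items():
        expected = parent_mass + shift + PROTON
        for value in observed_mz:
            if abs(ppm_error(value, expected)) <= tol_ppm:
                hits.append((name, value))
    return hits


if __name__ == "__main__":
    # Parent: acetaminophen, neutral monoisotopic mass 151.063329 Da; m/z values are hypothetical.
    for name, value in screen_transformations(151.063329, [328.1027, 232.0275, 200.0]):
        print(f"{value:.4f}  putative {name}")
```

A match is only a putative annotation; fragmentation patterns or standards are still needed to confirm structure.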


Subject(s)
Tandem Mass Spectrometry , Xenobiotics , Chromatography, Liquid , Drug Discovery , Environmental Exposure
7.
Expert Rev Proteomics ; 20(11): 251-266, 2023.
Article in English | MEDLINE | ID: mdl-37787106

ABSTRACT

INTRODUCTION: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include the availability of software spanning diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and associated training resources to empower researcher-driven analyses. EXPERT OPINION: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.


Subject(s)
Proteomics , Humans , Computational Biology/methods , Mass Spectrometry/methods , Proteomics/methods , Software
8.
Environ Microbiome ; 18(1): 56, 2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37420292

ABSTRACT

BACKGROUND: 'Omics methods have empowered scientists to tackle the complexity of microbial communities on a scale not attainable before. Individually, omics analyses can provide great insight; combined as "meta-omics", they enhance the understanding of which organisms occupy specific metabolic niches, how they interact, and how they utilize environmental nutrients. Here we present three integrative meta-omics workflows, developed in Galaxy, for enhanced analysis and integration of metagenomics, metatranscriptomics, and metaproteomics, combined with our newly developed web application, ViMO (Visualizer for Meta-Omics), to analyse metabolisms in complex microbial communities. RESULTS: In this study, we applied the workflows to a highly efficient cellulose-degrading minimal consortium enriched from a biogas reactor to analyse the key roles of uncultured microorganisms in complex biomass degradation processes. Metagenomic analysis recovered metagenome-assembled genomes (MAGs) for several constituent populations including Hungateiclostridium thermocellum, Thermoclostridium stercorarium and multiple heterogenic strains affiliated to Coprothermobacter proteolyticus. The metagenomics workflow was developed as two modules, one standard and one optimized for improving MAG quality in complex samples by implementing a combination of single- and co-assembly, and dereplication after binning. Active pathways within the recovered MAGs can be explored and visualized in ViMO, which also provides an overview of MAG taxonomy and quality (contamination and completeness), information about carbohydrate-active enzymes (CAZymes), and KEGG annotations and pathways, with counts and abundances at both the mRNA and protein level.
To achieve this, the metatranscriptomic reads and metaproteomic mass spectrometry spectra are mapped onto predicted genes from the metagenome to analyse the functional potential of MAGs, as well as the proteins and functions actually expressed by the microbiome, all visualized in ViMO. CONCLUSION: Our three workflows for integrative meta-omics, in combination with ViMO, present a progression in the analysis of 'omics data, particularly within Galaxy, but also beyond. The optimized metagenomics workflow allows for detailed reconstruction of microbial communities as high-quality MAGs, and thus improves analyses of the metabolism of the microbiome using the metatranscriptomics and metaproteomics workflows.
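ViMO reports MAG completeness and contamination. A common way to tier MAGs by these two numbers is the MIMAG reporting standard; the abstract does not say which thresholds the workflows apply, so the Python sketch below is an assumption-labeled illustration with hypothetical bins. It also ignores the rRNA and tRNA requirements that the full MIMAG high-quality tier adds.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    name: str
    completeness: float   # percent
    contamination: float  # percent


def mimag_tier(mag: Mag) -> str:
    """MIMAG-style draft tier from completeness and contamination only (assumed thresholds)."""
    if mag.contamination >= 10:
        return "below low-quality threshold"
    if mag.completeness > 90 and mag.contamination < 5:
        return "high-quality draft"
    if mag.completeness >= 50:
        return "medium-quality draft"
    return "low-quality draft"


if __name__ == "__main__":
    # Hypothetical bins, not results from the study.
    for mag in [Mag("bin_1", 97.2, 1.1), Mag("bin_2", 95.0, 6.5),
                Mag("bin_3", 35.0, 2.0), Mag("bin_4", 88.0, 14.0)]:
        print(mag.name, mimag_tier(mag))
```

Contamination is checked first because a highly contaminated bin fails every tier regardless of completeness.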

9.
J Proteome Res ; 22(8): 2608-2619, 2023 08 04.
Article in English | MEDLINE | ID: mdl-37450889

ABSTRACT

During the COVID-19 pandemic, impaired immunity and medical interventions resulted in cases of secondary infections. The clinical difficulties and dangers associated with secondary infections in patients necessitate the exploration of their microbiome. Metaproteomics is a powerful approach to study the taxonomic composition and functional status of a microbiome. In this study, mass spectrometry (MS)-based data from nasopharyngeal swab samples of COVID-19 patients were used to investigate the metaproteome. We have established a robust bioinformatics workflow within the Galaxy platform, which includes (a) generation of a tailored database of common respiratory tract pathogens, (b) database search using multiple search algorithms, and (c) verification of the detected microbial peptides. The microbial peptides detected in this study belong to several opportunistic pathogens, such as Streptococcus pneumoniae, Klebsiella pneumoniae, Rhizopus microsporus, and Syncephalastrum racemosum. Microbial proteins with roles in stress response, gene expression, and DNA repair were found to be upregulated in severe patients compared with negative patients. Using parallel reaction monitoring (PRM), we confirmed some of the microbial peptides in fresh clinical samples. MS-based clinical metaproteomics can serve as a powerful tool for the detection and characterization of potential pathogens, which can significantly impact the diagnosis and treatment of patients.
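Database searching of bottom-up MS data depends on in silico digestion of the tailored protein database. Below is a minimal Python sketch of the classic trypsin rule (cleave after K or R unless the next residue is P); the missed-cleavage and length limits are hypothetical defaults, and the search engines in the workflow apply their own configurable digestion settings.

```python
import re
from typing import List


def tryptic_digest(sequence: str, missed_cleavages: int = 0,
                   min_len: int = 7, max_len: int = 30) -> List[str]:
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    # Zero-width split (Python 3.7+); drop the empty string left at the C-terminus.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra fragments onto each start fragment.
        for end in range(start, min(start + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[start:end + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Made-up sequence; the K before P is not cleaved.
    print(tryptic_digest("MAGICKPEPTIDERSAMPLEKTESTR"))
    print(tryptic_digest("MAGICKPEPTIDERSAMPLEKTESTR", missed_cleavages=1))
```

Allowing missed cleavages enlarges the candidate peptide list, which is one reason tailored, reduced databases help keep searches tractable.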


Subject(s)
COVID-19 , Coinfection , Humans , COVID-19/diagnosis , Pandemics , Peptides , Nasopharynx
10.
Clin Proteomics ; 20(1): 14, 2023 Apr 02.
Article in English | MEDLINE | ID: mdl-37005570

ABSTRACT

BACKGROUND: Clinical bronchoalveolar lavage fluid (BALF) samples are rich in biomolecules, including proteins, and useful for molecular studies of lung health and disease. However, mass spectrometry (MS)-based proteomic analysis of BALF is challenged by the wide dynamic range of protein abundance and the potential for interfering contaminants. A robust, MS-compatible sample preparation workflow for BALF samples, including those of small and large volume, would be useful for many researchers. RESULTS: We have developed a workflow that combines high-abundance protein depletion, protein trapping, clean-up, and in-situ tryptic digestion, and that is compatible with either qualitative or quantitative MS-based proteomic analysis. The workflow includes a value-added collection of endogenous peptides for peptidomic analysis of BALF samples, if desired, and is amenable to offline semi-preparative or microscale fractionation of complex peptide mixtures prior to LC-MS/MS analysis for increased depth of analysis. We demonstrate the effectiveness of this workflow on BALF samples collected from COPD patients, including smaller sample volumes of 1-5 mL that are commonly available from the clinic. We also demonstrate the repeatability of the workflow as an indicator of its utility for quantitative proteomic studies. CONCLUSIONS: Overall, our described workflow consistently provided high-quality proteins and tryptic peptides for MS analysis. It should enable researchers to apply MS-based proteomics to a wide variety of studies focused on BALF clinical specimens.
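Repeatability in quantitative proteomics is often summarized as the coefficient of variation (CV) of protein abundance across replicate preparations. The abstract does not specify the metric used, so this Python sketch, with invented intensities, is only one plausible way to compute it.

```python
from statistics import mean, median, stdev
from typing import Dict, List


def cv_percent(values: List[float]) -> float:
    """Coefficient of variation (sample SD / mean), in percent."""
    return stdev(values) / mean(values) * 100.0


def median_cv(table: Dict[str, List[float]]) -> float:
    """Median CV across proteins measured in at least two replicates."""
    cvs = [cv_percent(v) for v in table.values() if len(v) >= 2 and mean(v) > 0]
    return median(cvs)


if __name__ == "__main__":
    # Hypothetical intensities for three proteins in three replicate preparations.
    replicates = {
        "protein_A": [100.0, 110.0, 90.0],
        "protein_B": [50.0, 50.0, 50.0],
        "protein_C": [200.0, 180.0, 220.0],
    }
    print(round(median_cv(replicates), 2))  # 10.0
```

Reporting the median rather than the mean keeps a few noisy low-abundance proteins from dominating the summary.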

11.
ERJ Open Res ; 9(2)2023 Mar.
Article in English | MEDLINE | ID: mdl-36949960

ABSTRACT

Purpose: Obstructive lung disease is increasingly common among persons with HIV, both smokers and nonsmokers. We used aptamer proteomics to identify proteins and associated pathways in HIV-associated obstructive lung disease. Methods: Bronchoalveolar lavage fluid (BALF) samples from 26 persons living with HIV with obstructive lung disease were matched to persons living with HIV without obstructive lung disease based on age, smoking status and antiretroviral treatment. A total of 6414 proteins were measured using the SomaScan® aptamer-based assay. We used sparse distance-weighted discrimination (sDWD) to test for a difference in protein expression and permutation tests to identify univariate associations between proteins and forced expiratory volume in 1 s % predicted (FEV1 % pred). Significant proteins were entered into a pathway over-representation analysis. We also constructed protein-driven endotypes using K-means clustering and performed over-representation analysis on the proteins that were significantly different between clusters. We compared protein-associated clusters to those obtained from BALF and plasma metabolomics data on the same patient cohort. Results: After filtering, we retained 3872 proteins for further analysis. Based on sDWD, protein expression was able to separate cases and controls. We found 575 proteins that were significantly correlated with FEV1 % pred after multiple comparisons adjustment. We identified two protein-driven endotypes, one of which was associated with poor lung function, and found that insulin and apoptosis pathways were differentially represented. We found similar clusters driven by metabolomics in BALF but not plasma. Conclusion: Protein expression differs in persons living with HIV with and without obstructive lung disease. We were not able to identify specific pathways differentially expressed among patients based on FEV1 % pred; however, we identified a unique protein endotype associated with insulin and apoptotic pathways.
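The protein-driven endotypes above were built with K-means clustering. The abstract does not give initialization, distance, or scaling details, so the Python sketch below shows plain Lloyd's algorithm with Euclidean distance and random initial centroids on toy two-dimensional points. In practice each point would be one sample's (typically scaled) protein profile.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's K-means with Euclidean distance and random initial centroids."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        converged = new_labels == labels
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


if __name__ == "__main__":
    # Two obvious toy groups; real inputs would be per-sample protein profiles.
    toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (4.9, 5.1)]
    labels, centroids = kmeans(toy, k=2)
    print(labels)
```

Because the result depends on initialization, runs are usually repeated with several seeds and the lowest within-cluster sum of squares is kept.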

12.
PLoS Comput Biol ; 19(1): e1010752, 2023 01.
Article in English | MEDLINE | ID: mdl-36622853

ABSTRACT

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors: we have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.


Subject(s)
Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel
13.
Viruses ; 14(10)2022 10 07.
Article in English | MEDLINE | ID: mdl-36298760

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) resulted in a major health crisis worldwide, with continuously emerging viral variants driving "waves" of infection. PCR or antigen detection assays have been routinely used to detect clinical infections; however, the emergence of newer variants has presented challenges in detection. One alternative has been to detect and characterize variant-specific peptide sequences from viral proteins using mass spectrometry (MS)-based methods. MS methods can potentially help in both diagnostics and vaccine development by revealing the dynamic changes in the viral proteome associated with specific variants and infection waves. In this study, we developed an accessible, flexible, and shareable bioinformatics workflow, implemented in the Galaxy Platform, to detect variant-specific peptide sequences from MS data derived from clinical samples. We demonstrated the utility of the workflow by characterizing published clinical data from across the world during various pandemic waves. Our analysis identified six SARS-CoV-2 variant-specific peptides suitable for confident detection by MS in commonly collected clinical samples.
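A variant-specific peptide is one produced by digesting the variant protein but not the reference. The Python sketch below illustrates that set difference on short, made-up sequences with a single substitution, using a simplified trypsin rule and a hypothetical minimum length; it is not the published Galaxy workflow.

```python
import re
from typing import Set


def tryptic_peptides(sequence: str, min_len: int = 6) -> Set[str]:
    """Fully cleaved tryptic peptides: split after K/R unless followed by P."""
    return {p for p in re.split(r"(?<=[KR])(?!P)", sequence) if len(p) >= min_len}


def variant_specific_peptides(reference: str, variant: str, min_len: int = 6) -> Set[str]:
    """Peptides produced by the variant protein but not by the reference."""
    return tryptic_peptides(variant, min_len) - tryptic_peptides(reference, min_len)


if __name__ == "__main__":
    # Toy sequences differing by one substitution (Y -> H); not real viral proteins.
    reference = "GASTLEKVNDFYQRWPAGLLEHK"
    variant = "GASTLEKVNDFHQRWPAGLLEHK"
    print(variant_specific_peptides(reference, variant))  # {'VNDFHQR'}
```

Candidates found this way still need checking against other variants and the host proteome before they can serve as confident markers.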


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/diagnosis , Proteome , Peptides , Viral Proteins/genetics
14.
J Acquir Immune Defic Syndr ; 91(3): 312-318, 2022 11 01.
Article in English | MEDLINE | ID: mdl-35849661

ABSTRACT

BACKGROUND: HIV is a risk factor for obstructive lung disease (OLD), independent of smoking. We used mass spectrometry (MS) approaches to identify metabolomic biomarkers that inform mechanistic pathogenesis of OLD in persons with HIV (PWH). METHODS: We obtained bronchoalveolar lavage fluid (BALF) samples from 52 PWH, in case:control (+OLD/-OLD) pairs matched on age, smoking status, and antiretroviral treatment. Four hundred nine metabolites from 8 families were measured in BALF and plasma samples using an MS-based Biocrates platform. After filtering out metabolites with a high proportion of missing values or values below the limit of detection, we performed univariate testing using paired t tests followed by false discovery rate correction. We used distance-weighted discrimination (DWD) to test for an overall difference in the metabolite profile between cases and controls. RESULTS: After filtering, there were 252 BALF metabolites for analysis from 8 metabolite families. DWD testing found that collectively, BALF metabolites differentiated cases from controls, whereas plasma metabolites did not. In BALF samples, we identified 3 metabolites that correlated with OLD at a false discovery rate of 10%; all were in the phosphatidylcholine family. We identified additional BALF metabolites when analyzing lung function as a continuous variable, and these included acylcarnitines, triglycerides, and a cholesterol ester. CONCLUSIONS: Collectively, BALF metabolites differentiate PWH with and without OLD, and these included several BALF lipid metabolites. These findings were limited to BALF and were not found in plasma from the same individuals. Phosphatidylcholine, the most common lipid component of surfactant, was the predominant lipid metabolite differentially expressed.
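The filtering and paired-testing steps described in the Methods can be sketched as follows. The 20% missingness cutoff, the metabolite names, and the values are assumptions for illustration. The sketch computes only the paired t statistic; a p-value would come from the t distribution with n - 1 degrees of freedom (for example via scipy.stats.ttest_rel) before FDR correction.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Optional

Values = List[Optional[float]]  # None marks a missing or below-detection value


def filter_by_missingness(table: Dict[str, Values], max_missing: float = 0.2) -> Dict[str, Values]:
    """Keep metabolites whose fraction of missing values is at most max_missing."""
    return {
        name: values
        for name, values in table.items()
        if sum(v is None for v in values) / len(values) <= max_missing
    }


def paired_t_statistic(cases: Values, controls: Values) -> float:
    """Paired t statistic over matched case:control pairs with no missing value."""
    diffs = [a - b for a, b in zip(cases, controls) if a is not None and b is not None]
    if len(diffs) < 2:
        raise ValueError("need at least two complete pairs")
    # Undefined (division by zero) if every difference is identical.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    table = {"metabolite_1": [1.0, None, 2.0, 3.0, 4.0],
             "metabolite_2": [None, None, None, 1.0, 2.0]}
    print(list(filter_by_missingness(table)))  # ['metabolite_1']
    # Hypothetical matched pairs (case value, control value).
    print(paired_t_statistic([5.0, 6.0, 7.0, 8.0], [4.0, 4.5, 6.0, 6.5]))
```

Pairing on matched case:control sets removes between-pair variation (age, smoking, treatment) from the comparison.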


Subject(s)
HIV Infections , Lung Diseases, Obstructive , Biomarkers , Bronchoalveolar Lavage Fluid/chemistry , Cholesterol Esters , HIV Infections/complications , HIV Infections/pathology , Humans , Lung , Metabolome , Phosphatidylcholines , Surface-Active Agents , Triglycerides
15.
Proteomes ; 10(2)2022 Apr 14.
Article in English | MEDLINE | ID: mdl-35466239

ABSTRACT

Chronic inflammation of the colon causes genomic and/or transcriptomic events, which can lead to expression of non-canonical protein sequences contributing to oncogenesis. To better understand these mechanisms, Rag2-/-Il10-/- mice were infected with Helicobacter hepaticus to induce chronic inflammation of the cecum and the colon. Transcriptomic data from harvested proximal colon samples were used to generate a customized FASTA database containing non-canonical protein sequences. Using a proteogenomic approach, mass spectrometry data for proximal colon proteins were searched against this custom FASTA database using the Galaxy for Proteomics (Galaxy-P) platform. In addition to the increased abundance in inflammatory response proteins, we also discovered several non-canonical peptide sequences derived from unique proteoforms. We confirmed the veracity of these novel sequences using an automated bioinformatics verification workflow with targeted MS-based assays for peptide validation. Our bioinformatics discovery workflow identified 235 putative non-canonical peptide sequences, of which 58 were verified with high confidence and 39 were validated in targeted proteomics assays. This study provides insights into challenges faced when identifying non-canonical peptides using a proteogenomics approach and demonstrates an integrated workflow addressing these challenges. Our bioinformatic discovery and verification workflow is publicly available and accessible via the Galaxy platform and should be valuable in non-canonical peptide identification using proteogenomics.
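Before a peptide is called non-canonical, it must be absent from the canonical proteome. Because isoleucine and leucine have identical masses, such checks typically treat I and L as equivalent. This Python sketch of that single check uses a toy canonical sequence and hypothetical function names; it is not the published verification workflow.

```python
from typing import Iterable


def normalize_il(sequence: str) -> str:
    """Map isoleucine to leucine because the two are isobaric in MS."""
    return sequence.replace("I", "L")


def is_non_canonical(peptide: str, canonical_proteins: Iterable[str]) -> bool:
    """True if the peptide (I/L-insensitive) occurs in none of the canonical proteins."""
    target = normalize_il(peptide)
    return not any(target in normalize_il(protein) for protein in canonical_proteins)


if __name__ == "__main__":
    canonical = ["MKTAYIAKQRQISFVK"]  # toy "canonical proteome"
    print(is_non_canonical("QLSFVK", canonical))  # False: matches QISFVK under I/L equivalence
    print(is_non_canonical("QLSFVR", canonical))  # True
```

Without the I/L normalization, a canonical peptide could be miscalled as novel simply because the search assigned L where the reference has I.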

16.
Expert Rev Proteomics ; 19(3): 165-181, 2022 03.
Article in English | MEDLINE | ID: mdl-35466851

ABSTRACT

INTRODUCTION: Mass spectrometry-based proteomics reveals dynamic molecular signatures underlying phenotypes reflecting normal and perturbed conditions in living systems. Although valuable on its own, the proteome represents only one level of molecular information, with the genome, epigenome, transcriptome, and metabolome all providing complementary information. Multi-omic analysis integrating information from one or more of these other domains with proteomic information provides a more complete picture of molecular contributors to dynamic biological systems. AREAS COVERED: Here, we discuss the improvements to mass spectrometry-based technologies, focused on peptide-based, bottom-up approaches, that have enabled deep, quantitative characterization of complex proteomes. These advances are facilitating the integration of proteomics data with other 'omic information, providing a more complete picture of living systems. We also describe the current state of bioinformatics software and approaches for integrating proteomics and other 'omics data, critical for enabling new discoveries driven by multi-omics. EXPERT COMMENTARY: Multi-omics, centered on the integration of proteomics information with other 'omic information, has tremendous promise for biological and biomedical studies. Continued advances in approaches for generating deep, reliable proteomic data and bioinformatics tools aimed at integrating data across 'omic domains will ensure the discoveries offered by these multi-omic studies continue to increase.


Proteomics uses mass spectrometry to identify as many of the proteins in a system of interest as possible, making it extremely useful in both biomedical and basic biological research. Unlike next-generation DNA/genome sequencing, proteomics directly measures the changes in gene translation in response to a disease state, injury, or other perturbation. When proteomics data are coupled to and examined together with other forms of 'omics' data, such as transcriptomics, genomics, and metabolomics, a more complete biological picture emerges that can reveal the underlying regulatory networks of living systems and how they respond to positive and negative stimuli. This integration is called multi-omics and represents a powerful paradigm shift in systems biology. To be fully compatible with other 'omics datasets, proteomics data must be as complete and accurate as possible; in addition, the task of integrating multiple different kinds of datasets can be daunting to novice researchers. With this in mind, in this manuscript we review the technologies that allow for the generation of the best possible proteomics data for multi-omics analysis, in addition to the software tools needed to integrate proteomics data with other 'omics data. We believe this review will enable other researchers to begin applying multi-omics approaches to answer their research questions.
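As a minimal illustration of integrating proteomics with another 'omic layer, the sketch below joins hypothetical transcript and protein abundance tables on a shared gene identifier and computes a Pearson correlation of log2-transformed values. The gene names and numbers are invented for illustration, the helper names are hypothetical, and dedicated multi-omics tools typically go well beyond a single correlation.

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def integrate(transcripts: dict[str, float], proteins: dict[str, float]):
    """Keep genes measured in both layers, log2(x + 1) transform, then correlate."""
    shared = sorted(transcripts.keys() & proteins.keys())
    x = [math.log2(transcripts[g] + 1) for g in shared]
    y = [math.log2(proteins[g] + 1) for g in shared]
    return shared, pearson(x, y)

# Hypothetical abundance tables (arbitrary units) for one sample.
transcripts = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 10.0, "GENE_D": 5.0}
proteins = {"GENE_A": 1.0e6, "GENE_B": 4.0e5, "GENE_C": 2.0e5, "GENE_E": 1.0e4}

shared_genes, r = integrate(transcripts, proteins)
print(shared_genes, round(r, 3))
```

Genes seen in only one layer (GENE_D, GENE_E) drop out of the join, which is one concrete reason the text stresses that proteomics data must be as complete as possible before integration.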


Subject(s)
Proteome , Proteomics , Computational Biology , Software , Mass Spectrometry
17.
Nat Commun ; 12(1): 7305, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34911965

ABSTRACT

Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness of present-day metaproteomics research, serves as a template for multi-laboratory studies in metaproteomics, and provides publicly available data sets for benchmarking future developments.
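The observation that peptide-level differences largely disappear at the protein group level can be illustrated with a toy comparison of two workflows' identifications. The sketch assumes each peptide maps to exactly one protein group, which sidesteps the protein inference step real pipelines perform; the workflow contents, peptides, and group names are hypothetical, and Jaccard overlap is just one simple agreement measure, not the CAMPI metric.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two identification sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def to_protein_groups(peptides: set[str], peptide_to_group: dict[str, str]) -> set[str]:
    """Collapse peptide identifications to protein groups; unmapped peptides are dropped."""
    return {peptide_to_group[p] for p in peptides if p in peptide_to_group}

# Hypothetical identifications from two workflows run on the same sample.
workflow_a = {"PEPTIDEAK", "PEPTIDEBR", "PEPTIDECK", "PEPTIDEDR"}
workflow_b = {"PEPTIDECK", "PEPTIDEDR", "PEPTIDEEK", "PEPTIDEFR"}

# Hypothetical peptide-to-protein-group assignment.
peptide_to_group = {
    "PEPTIDEAK": "GROUP_X", "PEPTIDECK": "GROUP_X", "PEPTIDEEK": "GROUP_X",
    "PEPTIDEBR": "GROUP_Y", "PEPTIDEDR": "GROUP_Y", "PEPTIDEFR": "GROUP_Y",
}

peptide_overlap = jaccard(workflow_a, workflow_b)
group_overlap = jaccard(to_protein_groups(workflow_a, peptide_to_group),
                        to_protein_groups(workflow_b, peptide_to_group))
print(f"peptide-level Jaccard: {peptide_overlap:.2f}")  # peptide-level Jaccard: 0.33
print(f"protein-group Jaccard: {group_overlap:.2f}")    # protein-group Jaccard: 1.00
```

The two workflows share only a third of their peptides, yet every peptide still points to the same two groups, so the group-level sets agree completely.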


Subject(s)
Bacteria/genetics , Bacterial Proteins/chemistry , Feces/microbiology , Proteomics/methods , Adult , Bacteria/classification , Bacteria/isolation & purification , Bacterial Proteins/genetics , Female , Gastrointestinal Microbiome , Humans , Intestines/microbiology , Laboratories , Mass Spectrometry , Peptides/chemistry , Workflow
18.
F1000Res ; 10: 897, 2021.
Article in English | MEDLINE | ID: mdl-34804501

ABSTRACT

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
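The six-stage life cycle described above maps naturally onto an ordered enumeration. The Python sketch below encodes only the stage order given in the abstract; the class and function names are hypothetical, and the tools and methods that facilitate each transition are not modeled.

```python
from enum import IntEnum
from typing import Optional

class WorkflowStage(IntEnum):
    """Stages of the workflow life cycle, in the order listed in the abstract."""
    SCIENTIFIC_QUESTION = 1  # scientific question or hypothesis
    CONCEPTUAL_WORKFLOW = 2
    ABSTRACT_WORKFLOW = 3
    CONCRETE_WORKFLOW = 4
    PRODUCTION_WORKFLOW = 5
    SCIENTIFIC_RESULTS = 6

def next_stage(stage: WorkflowStage) -> Optional[WorkflowStage]:
    """Return the following stage, or None once scientific results are reached."""
    if stage is WorkflowStage.SCIENTIFIC_RESULTS:
        return None
    return WorkflowStage(stage + 1)

def transitions() -> list[tuple[WorkflowStage, WorkflowStage]]:
    """Stage-to-stage transitions, i.e. the points where tools and methods apply."""
    pairs = []
    for stage in WorkflowStage:
        nxt = next_stage(stage)
        if nxt is not None:
            pairs.append((stage, nxt))
    return pairs

for src, dst in transitions():
    print(f"{src.name} -> {dst.name}")
```

Using `IntEnum` keeps the stages ordered and comparable, so a tool could, for example, check that it only moves a workflow forward (`WorkflowStage.ABSTRACT_WORKFLOW < WorkflowStage.CONCRETE_WORKFLOW`).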


Subject(s)
Biological Science Disciplines , Computational Biology , Benchmarking , Software , Workflow
19.
F1000Res ; 10: 103, 2021.
Article in English | MEDLINE | ID: mdl-34484688

ABSTRACT

The Earth Microbiome Project (EMP) aided in understanding the role of microbial communities and the influence of collective genetic material (the 'microbiome') and microbial diversity patterns across the habitats of our planet. With the evolution of new sequencing technologies, researchers can now investigate the microbiome and map its influence on the environment and human health. Advances in bioinformatics methods for next-generation sequencing (NGS) data analysis have helped researchers gain in-depth knowledge of the taxonomic and genetic composition of microbial communities. Metagenomic-based methods have been the most commonly used approaches for microbiome analysis; however, they primarily extract information about the taxonomic composition and genetic potential of the microbiome under study, lacking quantification of the gene products (RNA and proteins). By contrast, metatranscriptomics, the study of a microbial community's RNA expression, can reveal the dynamic gene expression of individual microbial populations and the community as a whole, ultimately providing information about the active pathways in the microbiome. To address the analysis of NGS data, the ASaiM analysis framework was previously developed and made available via the Galaxy platform. Although developed for both metagenomics and metatranscriptomics, the original publication demonstrated the use of ASaiM only for metagenomics, while thorough testing for metatranscriptomics data was lacking. In the current study, we have focused on validating and optimizing the tools within ASaiM for metatranscriptomics data. As a result, we deliver a robust workflow that will enable researchers to understand the dynamic functional response of the microbiome in a wide variety of metatranscriptomics studies. 
This improved and optimized ASaiM-metatranscriptomics (ASaiM-MT) workflow is publicly available via the ASaiM framework, documented and supported with training material so that users can interrogate and characterize metatranscriptomic data, as part of larger meta-omic studies of microbiomes.
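Quantifying gene products is what distinguishes metatranscriptomics from metagenomics in the abstract above. One widely used normalization for transcript read counts is transcripts per million (TPM): divide each gene's count by its length in kilobases, then rescale so the values sum to one million. The sketch below is a generic illustration with hypothetical gene names and counts; it is not claimed to be the normalization that ASaiM-MT applies.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from raw read counts and gene lengths in base pairs."""
    # Reads per kilobase: corrects for longer genes attracting more reads.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Rescale so the values sum to one million, expressing each gene as a proportion.
    return {g: v * 1e6 / total for g, v in rpk.items()}

# Hypothetical read counts and gene lengths for one metatranscriptome.
counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}
print(tpm(counts, lengths))  # {'geneA': 500000.0, 'geneB': 500000.0, 'geneC': 0.0}
```

Here geneB has three times the reads of geneA but is also three times longer, so after length correction the two are equally expressed.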


Subject(s)
Metagenomics , Microbiota , High-Throughput Nucleotide Sequencing , Humans , Metagenome , Microbiota/genetics , Workflow
20.
Chem Res Toxicol ; 34(7): 1769-1781, 2021 07 19.
Article in English | MEDLINE | ID: mdl-34110810

ABSTRACT

Humans are exposed to large numbers of electrophiles from their diet, the environment, and endogenous physiological processes. Adducts formed at the N-terminal valine of hemoglobin are often used as biomarkers of human exposure to electrophilic compounds. We previously reported the formation of hemoglobin N-terminal valine adducts (added mass, 106.042 Da) in the blood of human smokers and nonsmokers and identified their structure as 4-hydroxybenzyl-Val. In the present work, mass spectrometry-based proteomics was utilized to identify additional sites for 4-hydroxybenzyl adduct formation at internal nucleophilic amino acid side chains within hemoglobin. Hemoglobin isolated from human blood was treated with para-quinone methide (para-QM), followed by global nanoLC-MS/MS and targeted nanoLC-MS/MS to identify amino acid residues containing the 4-hydroxybenzyl modification. Our experiments revealed the formation of 4-hydroxybenzyl adducts at the αHis20, αTyr24, αTyr42, αHis45, βSer72, βThr84, βThr87, βSer89, βHis92, βCys93, βCys112, βThr123, and βHis143 residues (in addition to N-terminal valine) through characteristic MS/MS spectra. These amino acid side chains had variable reactivity toward para-QM, with αHis45, αTyr42, βCys93, βHis92, and βSer72 forming the largest numbers of adducts upon exposure to para-QM. Two additional mechanisms for formation of 4-hydroxybenzyl adducts in humans were investigated: exposure to 4-hydroxybenzaldehyde (4-HBA) followed by reduction, and UV-mediated reactions of hemoglobin with tyrosine. Exposure of hemoglobin to a 5-fold molar excess of 4-HBA followed by reduction with sodium cyanoborohydride produced 4-hydroxybenzyl adducts at several amino acid side chains, of which αHis20, αTyr24, αTyr42, αHis45, βSer44, βThr84, and βHis92 were verified in targeted mass spectrometry experiments. Similarly, exposure of human blood to ultraviolet radiation produced 4-hydroxybenzyl adducts at αHis20, αTyr24, αTyr42, αHis45, βSer44, βThr84, and βSer89. 
Overall, our results reveal that 4-hydroxybenzyl adducts form at multiple nucleophilic sites of hemoglobin and that para-QM is the most likely source of these adducts in humans.
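The 106.042 Da added mass quoted above can be checked from elemental composition. Assuming the adduct is a 4-hydroxybenzyl group (HO-C6H4-CH2-, formula C7H7O) that replaces one hydrogen on the nucleophilic residue, the net addition is C7H6O. The sketch below sums standard monoisotopic masses for that composition and prints the corresponding precursor m/z shift at a few charge states; the helper names are hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503223, "O": 15.99491461957}

def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C7H7O' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

# 4-hydroxybenzyl group (C7H7O) replacing one hydrogen on the residue.
added = monoisotopic_mass("C7H7O") - MONOISOTOPIC["H"]
print(f"added mass: {added:.3f} Da")  # added mass: 106.042 Da

# A modified peptide's precursor m/z shifts by the added mass divided by its charge.
for z in (1, 2, 3):
    print(f"z={z}: precursor m/z shift {added / z:.4f}")
```

The computed value (about 106.0419 Da) agrees with the 106.042 Da reported for the N-terminal valine adduct, consistent with the 4-hydroxybenzyl structure assignment.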


Subject(s)
Benzyl Compounds/chemistry , Hemoglobins/chemistry , Indolequinones/chemistry , Amino Acid Sequence , Amino Acids/chemistry , Humans , Models, Molecular