Results 1 - 20 of 95
1.
Nat Methods ; 20(12): 1883-1886, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37996752

ABSTRACT

Cardinal v.3 is an open-source software for reproducible analysis of mass spectrometry imaging experiments. A major update from its previous versions, Cardinal v.3 supports most mass spectrometry imaging workflows. Its analytical capabilities include advanced data processing such as mass recalibration, advanced statistical analyses such as single-ion segmentation and rough annotation-based classification, and memory-efficient analyses of large-scale multitissue experiments.


Subject(s)
Image Processing, Computer-Assisted , Software , Mass Spectrometry/methods
2.
Nat Methods ; 20(10): 1523-1529, 2023 10.
Article in English | MEDLINE | ID: mdl-37749212

ABSTRACT

Protein complexes are responsible for the enactment of most cellular functions. For the protein complex to form and function, its subunits often need to be present at defined quantitative ratios. Typically, global changes in protein complex composition are assessed with experimental approaches that tend to be time consuming. Here, we have developed a computational algorithm for the detection of altered protein complexes based on the systematic assessment of subunit ratios from quantitative proteomic measurements. We applied it to measurements from breast cancer cell lines and patient biopsies and were able to identify strong remodeling of HDAC2 epigenetic complexes in more aggressive forms of cancer. The presented algorithm is available as an R package and enables the inference of changes in protein complex states by extracting functionally relevant information from bottom-up proteomic datasets.
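As a rough illustration of comparing subunit ratios between conditions, the sketch below computes, for every pair of subunits, the mean log2 abundance ratio in two groups of samples and the shift between the groups. The data layout, the function names and the simple difference-of-means score are assumptions made for illustration; this is not the algorithm implemented in the R package.

```python
import math
from itertools import combinations


def mean_log2_ratio(group, s1, s2):
    """Average log2(s1 / s2) over the samples of one group."""
    return sum(math.log2(sample[s1] / sample[s2]) for sample in group) / len(group)


def subunit_ratio_shifts(group_a, group_b):
    """Shift in mean log2 subunit ratio from group_a to group_b, per subunit pair.

    Each group is a list of samples; each sample maps subunit name -> positive
    abundance on the linear scale. Returns {(s1, s2): shift}.
    """
    subunits = sorted(group_a[0])
    return {
        (s1, s2): mean_log2_ratio(group_b, s1, s2) - mean_log2_ratio(group_a, s1, s2)
        for s1, s2 in combinations(subunits, 2)
    }


# Hypothetical complex with subunits A, B, C; in group_b, C is under-represented
group_a = [{"A": 100, "B": 100, "C": 100}, {"A": 200, "B": 200, "C": 200}]
group_b = [{"A": 100, "B": 100, "C": 50}, {"A": 400, "B": 400, "C": 200}]
print(subunit_ratio_shifts(group_a, group_b))
# {('A', 'B'): 0.0, ('A', 'C'): 1.0, ('B', 'C'): 1.0}
```

The second sample of group_b contains more of the complex overall, but because ratios cancel the total amount, only the change in the share of C moves the scores.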


Subject(s)
Proteome , Proteomics , Humans , Proteome/metabolism , Algorithms , MCF-7 Cells , Computational Biology
3.
J Proteome Res ; 22(8): 2641-2659, 2023 08 04.
Article in English | MEDLINE | ID: mdl-37467362

ABSTRACT

Repeated measures experimental designs, which quantify proteins in biological subjects repeatedly over multiple experimental conditions or times, are commonly used in mass spectrometry-based proteomics. Such designs distinguish the biological variation within and between the subjects and increase the statistical power of detecting within-subject changes in protein abundance. Meanwhile, proteomics experiments increasingly incorporate tandem mass tag (TMT) labeling, a multiplexing strategy that gains both relative protein quantification accuracy and sample throughput. However, combining repeated measures and TMT multiplexing in a large-scale investigation presents statistical challenges due to unique interplays of between-mixture, within-mixture, between-subject, and within-subject variation. This manuscript proposes a family of linear mixed-effects models for differential analysis of proteomics experiments with repeated measures and TMT multiplexing. These models decompose the variation in the data into the contributions from its sources as appropriate for the specifics of each experiment, enable statistical inference of differential protein abundance, and recognize a difference in the uncertainty of between-subject versus within-subject comparisons. The proposed family of models is implemented in the R/Bioconductor package MSstatsTMT v2.2.0. Evaluations of four simulated datasets and four investigations answering diverse biological questions demonstrated the value of this approach as compared to the existing general-purpose approaches and implementations.
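The idea of decomposing variation by source can be illustrated with the simplest special case: a balanced one-way random-effects model, y_ij = mu + b_i + e_ij, with repeated measurements per subject. The sketch below estimates the between-subject and within-subject variance components with the classical ANOVA (method-of-moments) estimators. It is a deliberately simplified assumption; the models in MSstatsTMT additionally account for mixtures, TMT channels and conditions.

```python
def variance_components(subjects):
    """Method-of-moments estimates for y_ij = mu + b_i + e_ij.

    `subjects` is a list of equal-length lists: the repeated measurements
    (e.g. log2 abundances of one protein) of each subject.
    Returns (between_subject_var, within_subject_var).
    """
    k = len(subjects)
    n = len(subjects[0])
    if k < 2 or n < 2 or any(len(s) != n for s in subjects):
        raise ValueError("need a balanced design with >= 2 subjects and >= 2 repeats")
    means = [sum(s) / n for s in subjects]
    grand = sum(means) / k
    # Mean squares between and within subjects
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for s, m in zip(subjects, means) for y in s) / (k * (n - 1))
    between = max(0.0, (msb - msw) / n)  # negative estimates are truncated at zero
    return between, msw


print(variance_components([[1, 3], [5, 7]]))  # (7.0, 2.0)
```

In this model a within-subject difference has variance 2 * within, whereas a between-subject difference has variance 2 * (between + within), which is one concrete reason the uncertainty of the two kinds of comparison differs.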


Subject(s)
Research Design , Tandem Mass Spectrometry , Humans , Proteome/analysis
4.
Bioinformatics ; 39(39 Suppl 1): i494-i503, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387179

ABSTRACT

Causal query estimation in biomolecular networks commonly selects a 'valid adjustment set', i.e. a subset of network variables that eliminates the bias of the estimator. The same query may have multiple valid adjustment sets, each with a different variance. When networks are partially observed, current methods use graph-based criteria to find an adjustment set that minimizes asymptotic variance. Unfortunately, many models that share the same graph topology, and therefore the same functional dependencies, may differ in the processes that generate the observational data. In these cases, the topology-based criteria fail to distinguish the variances of the adjustment sets. This deficiency can lead to suboptimal adjustment sets and to mischaracterization of the effect of the intervention. We propose an approach for deriving 'optimal adjustment sets' that takes into account the nature of the data, bias and finite-sample variance of the estimator, and cost. It empirically learns the data-generating processes from historical experimental data, and characterizes the properties of the estimators by simulation. We demonstrate the utility of the proposed approach in four biomolecular case studies with different topologies and different data-generation processes. The implementation and reproducible case studies are available at https://github.com/srtaheri/OptimalAdjustmentSet.
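The claim that several valid adjustment sets can differ in variance can be checked by simulation. The sketch assumes a hypothetical linear Gaussian model with edges Z → X, Z → Y, X → Y and W → Y. Both {Z} and {Z, W} block the back-door path X ← Z → Y, so both are valid, and the simulation compares the spread of the ordinary-least-squares estimates of the X → Y effect that each set yields. The model, coefficients and sample sizes are illustrative assumptions, and this is not the procedure implemented in the OptimalAdjustmentSet repository.

```python
import random
import statistics


def ols(y, columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


def simulate(n=100, reps=200, effect=1.0, seed=0):
    """Mean and variance of the X -> Y estimate under two valid adjustment sets."""
    rng = random.Random(seed)
    est = {"{Z}": [], "{Z, W}": []}
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        w = [rng.gauss(0, 1) for _ in range(n)]
        x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
        y = [effect * xi + 0.8 * zi + 2.0 * wi + rng.gauss(0, 1)
             for xi, zi, wi in zip(x, z, w)]
        est["{Z}"].append(ols(y, [x, z])[1])        # coefficient of X
        est["{Z, W}"].append(ols(y, [x, z, w])[1])  # coefficient of X
    return {k: (statistics.mean(v), statistics.variance(v)) for k, v in est.items()}


print(simulate())
```

Both sets give estimates centered on the true effect, yet their variances differ several-fold because W explains outcome variance without confounding; this finite-sample variance is the kind of property the abstract proposes to estimate empirically.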


Subject(s)
Computational Biology , Computer Simulation
5.
J Proteome Res ; 22(5): 1466-1482, 2023 05 05.
Article in English | MEDLINE | ID: mdl-37018319

ABSTRACT

The MSstats R-Bioconductor family of packages is widely used for statistical analyses of quantitative bottom-up mass spectrometry-based proteomic experiments to detect differentially abundant proteins. It is applicable to a variety of experimental designs and data acquisition strategies and is compatible with many data processing tools used to identify and quantify spectral features. In the face of ever-increasing complexities of experiments and data processing strategies, the core package of the family, with the same name MSstats, has undergone a series of substantial updates. Its new version MSstats v4.0 improves the usability, versatility, and accuracy of statistical methodology, and the usage of computational resources. New converters integrate the output of upstream processing tools directly with MSstats, requiring less manual work by the user. The package's statistical models have been updated to a more robust workflow. Finally, MSstats' code has been substantially refactored to improve memory use and computation speed. Here we detail these updates, highlighting methodological differences between the new and old versions. An empirical comparison of MSstats v4.0 to its previous implementations, as well as to the packages MSqRob and DEqMS, on controlled mixtures and biological experiments demonstrated a stronger performance and better usability of MSstats v4.0 as compared to existing methods.


Subject(s)
Proteomics , Research Design , Proteomics/methods , Software , Mass Spectrometry/methods , Chromatography, Liquid/methods
6.
bioRxiv ; 2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36865170

ABSTRACT

Cardinal v3 is an open source software for reproducible analysis of mass spectrometry imaging experiments. A major update from its previous versions, Cardinal v3 supports most mass spectrometry imaging workflows. Its analytical capabilities include advanced data processing such as mass re-calibration, advanced statistical analyses such as single-ion segmentation and rough annotation-based classification, and memory-efficient analyses of large-scale multi-tissue experiments.

7.
Bioinformatics ; 39(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36744928

ABSTRACT

MOTIVATION: Mass Spectrometry Imaging (MSI) analyzes complex biological samples such as tissues. It simultaneously characterizes the ions present in the tissue in the form of mass spectra, and the spatial distribution of the ions across the tissue in the form of ion images. Unsupervised clustering of ion images facilitates interpretation in the spectral domain by identifying groups of ions with similar spatial distributions. Unfortunately, many current methods for clustering ion images ignore the spatial features of the images, and are therefore unable to learn these features for clustering purposes. Alternative methods extract spatial features using deep neural networks pre-trained on natural image tasks; however, this is often inadequate since ion images are substantially noisier than natural images. RESULTS: We contribute a deep clustering approach for ion images that accounts for both spatial contextual features and noise. In evaluations on a simulated dataset and on four experimental datasets of different tissue types, the proposed method grouped ions from the same source into the same cluster more frequently than existing methods. We further demonstrated that using ion image clustering as a pre-processing step facilitated the interpretation of a subsequent spatial segmentation as compared to using either all the ions or one ion at a time. As a result, the proposed approach facilitated the interpretability of MSI data in both the spectral domain and the spatial domain. AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/DanGuo1223/mzClustering. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer , Mass Spectrometry/methods , Cluster Analysis , Ions/analysis
9.
J Proteome Res ; 22(2): 551-556, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36622173

ABSTRACT

Liquid chromatography coupled with bottom-up mass spectrometry (LC-MS/MS)-based proteomics is a versatile technology for identifying and quantifying proteins in complex biological mixtures. Postidentification, analysis of changes in protein abundances between conditions requires increasingly complex and specialized statistical methods. Many of these methods, in particular the family of open-source Bioconductor packages MSstats, are implemented in a coding language such as R. To make the methods in MSstats accessible to users with limited programming and statistical background, we have created MSstatsShiny, an R-Shiny graphical user interface (GUI) integrated with MSstats, MSstatsTMT, and MSstatsPTM. The GUI provides a point-and-click analysis pipeline applicable to a wide variety of proteomics experimental types, including label-free data-dependent acquisitions (DDAs) or data-independent acquisitions (DIAs), or tandem mass tag (TMT)-based TMT-DDAs, answering questions such as relative changes in the abundance of peptides, proteins, or post-translational modifications (PTMs). To support reproducible research, the application saves the user's selections and builds an R script that programmatically recreates the analysis. MSstatsShiny can be installed locally via GitHub and Bioconductor, or utilized on the cloud at www.msstatsshiny.com. We illustrate the utility of the platform using two experimental data sets (MassIVE IDs MSV000086623 and MSV000085565).


Subject(s)
Proteomics , Software , Proteomics/methods , Chromatography, Liquid/methods , Tandem Mass Spectrometry/methods , Proteins/analysis
10.
Nat Protoc ; 18(3): 659-682, 2023 03.
Article in English | MEDLINE | ID: mdl-36526727

ABSTRACT

Proteins regulate biological processes by changing their structure or abundance to accomplish a specific function. In response to a perturbation, protein structure may be altered by various molecular events, such as post-translational modifications, protein-protein interactions, aggregation, allostery or binding to other molecules. The ability to probe these structural changes in thousands of proteins simultaneously in cells or tissues can provide valuable information about the functional state of biological processes and pathways. Here, we present an updated protocol for LiP-MS, a proteomics technique combining limited proteolysis with mass spectrometry, to detect protein structural alterations in complex backgrounds and on a proteome-wide scale. In LiP-MS, proteins undergo a brief proteolysis in native conditions followed by complete digestion in denaturing conditions, to generate structurally informative proteolytic fragments that are analyzed by mass spectrometry. We describe advances in the throughput and robustness of the LiP-MS workflow and implementation of data-independent acquisition-based mass spectrometry, which together achieve high reproducibility and sensitivity, even on large sample sizes. We introduce MSstatsLiP, an R package dedicated to the analysis of LiP-MS data for the identification of structurally altered peptides and differentially abundant proteins. The experimental procedures take 3 d; mass spectrometric measurement time and data processing depend on sample number; and statistical analysis typically requires ~1 d. These improvements expand the adaptability of LiP-MS and enable wide use in functional proteomics and translational applications.


Subject(s)
Protein Processing, Post-Translational , Proteome , Proteolysis , Proteome/analysis , Reproducibility of Results , Mass Spectrometry/methods
11.
Mol Cell Proteomics ; 22(1): 100477, 2023 01.
Article in English | MEDLINE | ID: mdl-36496144

ABSTRACT

Liquid chromatography coupled with bottom-up mass spectrometry (LC-MS/MS)-based proteomics is increasingly used to detect changes in posttranslational modifications (PTMs) in samples from different conditions. Analysis of data from such experiments faces numerous statistical challenges. These include the low abundance of modified proteoforms, the small number of observed peptides that span modification sites, and confounding between changes in the abundance of PTM and the overall changes in the protein abundance. Therefore, statistical approaches for detecting differential PTM abundance must integrate all the available information pertaining to a PTM site and consider all the relevant sources of confounding and variation. In this manuscript, we propose such a statistical framework, which is versatile, accurate, and leads to reproducible results. The framework requires an experimental design, which quantifies, for each sample, both peptides with PTMs and peptides from the same proteins with no modification sites. The proposed framework supports both label-free and tandem mass tag-based LC-MS/MS acquisitions. The statistical methodology separately summarizes the abundances of peptides with and without the modification sites, by fitting separate linear mixed effects models appropriate for the experimental design. Next, model-based inferences regarding the PTM and the protein-level abundances are combined to account for the confounding between these two sources. Evaluations on computer simulations, a spike-in experiment with known ground truth, and three biological experiments with different organisms, modification types, and data acquisition types demonstrate the improved fold change estimation and detection of differential PTM abundance, as compared to currently used approaches. The proposed framework is implemented in the free and open-source R/Bioconductor package MSstatsPTM.
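The step of combining PTM-level and protein-level inferences can be sketched as follows, assuming the two log2 fold changes are estimated independently by separate models. The adjusted fold change subtracts the protein-level change, the standard errors are combined in quadrature, and the degrees of freedom are approximated with the Welch–Satterthwaite formula. The function and argument names are hypothetical, and this is a simplified reading of the abstract rather than the exact MSstatsPTM implementation.

```python
import math


def adjust_ptm(log2fc_ptm, se_ptm, df_ptm, log2fc_protein, se_protein, df_protein):
    """Protein-adjusted PTM log2 fold change, assuming independent estimates.

    Returns (adjusted log2FC, standard error, t statistic, approximate df).
    """
    estimate = log2fc_ptm - log2fc_protein
    var = se_ptm ** 2 + se_protein ** 2  # variances add for a difference
    se = math.sqrt(var)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = var ** 2 / (se_ptm ** 4 / df_ptm + se_protein ** 4 / df_protein)
    return estimate, se, estimate / se, df


print(adjust_ptm(1.5, 0.3, 10, 0.5, 0.4, 10))
```

In this example a modified peptide that rises by 1.5 log2 units on a protein that itself rises by 0.5 yields an adjusted change of 1.0 with a standard error of 0.5, i.e. a t statistic of about 2.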


Subject(s)
Proteomics , Tandem Mass Spectrometry , Proteomics/methods , Chromatography, Liquid , Protein Processing, Post-Translational , Proteins , Peptides/chemistry
12.
Bioinformatics ; 38(Suppl 1): i350-i358, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758817

ABSTRACT

MOTIVATION: Estimating causal queries, such as changes in protein abundance in response to a perturbation, is a fundamental task in the analysis of biomolecular pathways. The estimation requires experimental measurements on the pathway components. However, in practice many pathway components are left unobserved (latent) because they are either unknown, or difficult to measure. Latent variable models (LVMs) are well-suited for such estimation. Unfortunately, LVM-based estimation of causal queries can be inaccurate when parameters of the latent variables are not uniquely identified, or when the number of latent variables is misspecified. This has limited the use of LVMs for causal inference in biomolecular pathways. RESULTS: In this article, we propose a general and practical approach for LVM-based estimation of causal queries. We prove that, despite the challenges above, LVM-based estimators of causal queries are accurate if the queries are identifiable according to Pearl's do-calculus and describe an algorithm for their estimation. We illustrate the breadth and the practical utility of this approach for estimating causal queries in four synthetic and two experimental case studies, where structures of biomolecular pathways challenge the existing methods for causal query estimation. AVAILABILITY AND IMPLEMENTATION: The code and the data documenting all the case studies are available at https://github.com/srtaheri/LVMwithDoCalculus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
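Identifiability under do-calculus can be illustrated with the front-door criterion, a textbook case in which a latent confounder U affects both X and Y, yet the effect of X on Y is still identifiable through a mediator M. The sketch builds a hypothetical model with binary variables, marginalizes U out to obtain the observational distribution P(x, m, y), and checks that the front-door formula recovers P(Y=1 | do(X=x)) as computed with access to U. All probabilities are arbitrary illustrative assumptions, and this is not the LVM-based estimator proposed in the article.

```python
from itertools import product

# Hypothetical model: U -> X, U -> Y, X -> M, M -> Y (U is latent)
P_U = {0: 0.6, 1: 0.4}
P_X1_GIVEN_U = {0: 0.2, 1: 0.7}
P_M1_GIVEN_X = {0: 0.1, 1: 0.8}
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}


def bern(p1, v):
    """Probability of value v for a binary variable with P(v=1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def observational_joint():
    """P(x, m, y) with the latent U summed out."""
    return {
        (x, m, y): sum(P_U[u] * bern(P_X1_GIVEN_U[u], x) * bern(P_M1_GIVEN_X[x], m)
                       * bern(P_Y1_GIVEN_MU[(m, u)], y) for u in (0, 1))
        for x, m, y in product((0, 1), repeat=3)
    }


def true_do(x):
    """P(Y=1 | do(X=x)) from the full model, including U."""
    return sum(P_U[u] * bern(P_M1_GIVEN_X[x], m) * P_Y1_GIVEN_MU[(m, u)]
               for u in (0, 1) for m in (0, 1))


def front_door(joint, x):
    """P(Y=1 | do(X=x)) = sum_m P(m|x) sum_x' P(Y=1|m,x') P(x'), observational data only."""
    def p(**fixed):
        return sum(v for (xx, mm, yy), v in joint.items()
                   if all({"x": xx, "m": mm, "y": yy}[k] == val for k, val in fixed.items()))
    total = 0.0
    for m in (0, 1):
        inner = sum(p(x=xp, m=m, y=1) / p(x=xp, m=m) * p(x=xp) for xp in (0, 1))
        total += p(x=x, m=m) / p(x=x) * inner
    return total


joint = observational_joint()
for x in (0, 1):
    print(x, round(front_door(joint, x), 6), round(true_do(x), 6))
# 0 0.306 0.306
# 1 0.628 0.628
```

With these numbers the naive conditional P(Y=1 | X=1) is about 0.72, because it is biased by U, whereas the identifiable interventional quantity is about 0.63.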


Subject(s)
Algorithms , Calculi , Humans , Models, Theoretical , Proteins
13.
Clin Proteomics ; 19(1): 8, 2022 Apr 19.
Article in English | MEDLINE | ID: mdl-35439943

ABSTRACT

BACKGROUND: Mass spectrometry imaging (MSI) derives spatial molecular distribution maps directly from clinical tissue specimens and thus bears great potential for assisting pathologists with diagnostic decisions or personalized treatments. Unfortunately, progress in translational MSI is often hindered by insufficient quality control and lack of reproducible data analysis. Raw data and analysis scripts are rarely publicly shared. Here, we demonstrate the application of the Galaxy MSI tool set for the reproducible analysis of a urothelial carcinoma dataset. METHODS: Tryptic peptides were imaged in a cohort of 39 formalin-fixed, paraffin-embedded human urothelial cancer tissue cores with a MALDI-TOF/TOF device. The complete data analysis was performed in a fully transparent and reproducible manner on the European Galaxy Server. Annotations of tumor and stroma were performed by a pathologist and transferred to the MSI data to allow for supervised classifications of tumor vs. stroma tissue areas as well as for muscle-infiltrating and non-muscle-infiltrating urothelial carcinomas. For putative peptide identifications, m/z features were matched to the MSiMass list. RESULTS: Rigorous quality control in combination with careful pre-processing enabled reduction of m/z shifts and intensity batch effects. High classification accuracy was found for both tumor vs. stroma and muscle-infiltrating vs. non-muscle-infiltrating urothelial tumors. Some of the most discriminative m/z features for each condition could be assigned a putative identity: stromal tissue was characterized by collagen peptides and tumor tissue by histone peptides. Immunohistochemistry confirmed an increased histone H2A abundance in the tumor compared to the stroma tissues.
The muscle-infiltration status was distinguished via MSI by peptides from intermediate filaments such as cytokeratin 7 in non-muscle infiltrating carcinomas and vimentin in muscle-infiltrating urothelial carcinomas, which was confirmed by immunohistochemistry. To make the study fully reproducible and to advocate the criteria of FAIR (findability, accessibility, interoperability, and reusability) research data, we share the raw data, spectra annotations as well as all Galaxy histories and workflows. Data are available via ProteomeXchange with identifier PXD026459 and Galaxy results via https://github.com/foellmelanie/Bladder_MSI_Manuscript_Galaxy_links . CONCLUSION: Here, we show that translational MSI data analysis in a fully transparent and reproducible manner is possible and we would like to encourage the community to join our efforts.

14.
J Proteome Res ; 21(1): 289-294, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34919405

ABSTRACT

Skyline Batch is a newly developed Windows Forms application that enables the easy and consistent reprocessing of data with Skyline. Skyline has made previous advances in this direction; however, none enable seamless automated reprocessing of local and remote files. Skyline keeps a log of all of the steps that were taken in the document; however, reproducing these steps takes time and allows room for human error. Skyline also has a command-line interface, enabling it to be run from a batch script, but using the program in this way requires expertise in editing these scripts. By formalizing the workflow of a highly used set of batch scripts into an intuitive and powerful user interface, Skyline Batch can reprocess data stored in remote repositories just by opening and running a Skyline Batch configuration file. When run, a Skyline Batch configuration downloads all necessary remote files and then runs a four-step Skyline workflow. By condensing the steps needed to reprocess the data into one file, Skyline Batch gives researchers the opportunity to publish their processing along with their data and other analysis files. These easily run configuration files will greatly increase the transparency and reproducibility of published work. Skyline Batch is freely available at https://skyline.ms/batch.url.


Subject(s)
Software , User-Computer Interface , Humans , Reproducibility of Results , Workflow
15.
Elife ; 10, 2021 06 04.
Article in English | MEDLINE | ID: mdl-34085925

ABSTRACT

Defective autophagy is strongly associated with chronic inflammation. Loss-of-function of the core autophagy gene Atg16l1 increases risk for Crohn's disease in part by enhancing innate immunity through myeloid cells such as macrophages. However, autophagy is also recognized as a mechanism for clearance of certain intracellular pathogens. These divergent observations prompted a re-evaluation of ATG16L1 in innate antimicrobial immunity. In this study, we found that loss of Atg16l1 in myeloid cells enhanced the killing of virulent Shigella flexneri (S. flexneri), a clinically relevant enteric bacterium that resides within the cytosol by escaping from membrane-bound compartments. Quantitative multiplexed proteomics of murine bone marrow-derived macrophages revealed that ATG16L1 deficiency significantly upregulated proteins involved in the glutathione-mediated antioxidant response to compensate for elevated oxidative stress, which simultaneously promoted S. flexneri killing. Consistent with this, myeloid-specific deletion of Atg16l1 in mice accelerated bacterial clearance in vitro and in vivo. Pharmacological induction of oxidative stress through suppression of cysteine import enhanced microbial clearance by macrophages. Conversely, antioxidant treatment of macrophages permitted S. flexneri proliferation. These findings demonstrate that control of oxidative stress by ATG16L1 and autophagy regulates antimicrobial immunity against intracellular pathogens.


Subject(s)
Autophagy-Related Proteins/deficiency , Autophagy , Dysentery, Bacillary/microbiology , Immunity, Innate , Macrophages/microbiology , Oxidative Stress , Proteome , Proteomics , Shigella flexneri/pathogenicity , Animals , Autophagy-Related Proteins/genetics , Cells, Cultured , Disease Models, Animal , Dysentery, Bacillary/immunology , Dysentery, Bacillary/metabolism , Host-Pathogen Interactions , Inflammation Mediators/metabolism , Macrophages/immunology , Macrophages/metabolism , Mice, Inbred C57BL , Mice, Knockout , Microbial Viability , Shigella flexneri/immunology , Shigella flexneri/metabolism , Virulence
16.
IEEE Trans Big Data ; 7(1): 25-37, 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-37981991

ABSTRACT

Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in the form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise, and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This article proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.
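The counterfactual step itself can be illustrated with Pearl's three-step procedure (abduction, action, prediction) on a toy linear structural causal model. The two-variable model X → Y with additive noise, its coefficients and the observed values below are hypothetical; the article instead derives the model structure from a knowledge graph and learns its parameters from data.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Toy SCM: X := U_X,  Y := beta * X + alpha + U_Y."""
    alpha: float
    beta: float

    def counterfactual_y(self, x_obs, y_obs, x_new):
        # Abduction: recover the exogenous noise consistent with the observation
        u_y = y_obs - (self.beta * x_obs + self.alpha)
        # Action: do(X := x_new) replaces the structural equation of X
        # Prediction: propagate through Y's equation, keeping the same noise u_y
        return self.beta * x_new + self.alpha + u_y


scm = LinearSCM(alpha=1.0, beta=2.0)
print(scm.counterfactual_y(x_obs=3.0, y_obs=8.5, x_new=1.0))  # 4.5
```

Keeping the individual noise term u_y fixed is what distinguishes a counterfactual ("what would this unit's Y have been") from an average interventional effect.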

17.
Bioinformatics ; 36(Suppl_2): i745-i753, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381824

ABSTRACT

MOTIVATION: Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra. RESULTS: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms. AVAILABILITY AND IMPLEMENTATION: https://github.com/shawn-peng/FDR-estimation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
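For reference, the target-decoy estimate that the abstract contrasts with can be sketched in a few lines: at a score threshold t, the FDR among target PSMs scoring at least t is estimated as the number of decoy PSMs at or above t divided by the number of target PSMs at or above t, and q-values are the running minimum of these estimates from the bottom of the ranking. This assumes a concatenated search against equal-sized target and decoy databases and ignores score ties; it is the conventional TDA, not the decoy-free mixture-model method proposed in the article.

```python
def tda_qvalues(psms):
    """Target-decoy q-values.

    psms: list of (score, is_decoy); higher scores are assumed to be better.
    Returns [(score, q_value)] for target PSMs, sorted by decreasing score.
    Ties in score are not grouped, a simplification.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append((score, is_decoy, decoys / max(targets, 1)))
    # q-value: smallest FDR threshold at which the PSM would still be accepted
    qvalues = []
    running_min = float("inf")
    for score, is_decoy, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        if not is_decoy:
            qvalues.append((score, running_min))
    return qvalues[::-1]


print(tda_qvalues([(10, False), (9, False), (8, True), (7, False), (6, True)]))
```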


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Databases, Protein , HeLa Cells , Humans , Peptides
18.
Nat Methods ; 17(10): 981-984, 2020 10.
Article in English | MEDLINE | ID: mdl-32929271

ABSTRACT

MassIVE.quant is a repository infrastructure and data resource for reproducible quantitative mass spectrometry-based proteomics, which is compatible with all mass spectrometry data acquisition types and computational analysis tools. A branch structure enables MassIVE.quant to systematically store raw experimental data, metadata of the experimental design, scripts of the quantitative analysis workflow, intermediate input and output files, as well as alternative reanalyses of the same dataset.


Subject(s)
Databases, Protein , Mass Spectrometry , Proteomics , Algorithms , Fungal Proteins/chemistry , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Software
19.
Cancer Epidemiol Biomarkers Prev ; 29(10): 1973-1982, 2020 10.
Article in English | MEDLINE | ID: mdl-32732250

ABSTRACT

BACKGROUND: We have verified a mass spectrometry (MS)-based targeted proteomics signature for the detection of malignant pleural mesothelioma (MPM) from the blood. METHODS: A seven-peptide biomarker MPM signature by targeted proteomics in serum was identified in a previous independent study. Here, we have verified the predictive accuracy of a reduced version of that signature, now composed of six peptide biomarkers. We have applied liquid chromatography-selected reaction monitoring (LC-SRM), also known as multiple-reaction monitoring (MRM), for the investigation of 402 serum samples from 213 patients with MPM and 189 cancer-free asbestos-exposed donors from the United States, Australia, and Europe. RESULTS: Each of the biomarkers composing the signature was independently informative, with no apparent functional or physical relation to each other. The multiplexing possibility offered by MS proteomics allowed their integration into a single signature with a higher discriminating capacity than that of the single biomarkers alone. In this way, the strategy increased their potential utility for clinical decisions. The signature discriminated between patients with MPM and asbestos-exposed donors with an AUC of 0.738. For early-stage MPM, the AUC was 0.765. This signature was also prognostic, and Kaplan-Meier analysis showed a significant difference between high- and low-risk groups with an HR of 1.659 (95% CI, 1.075-2.562; P = 0.021). CONCLUSIONS: Targeted proteomics allowed the development of a multianalyte signature with diagnostic and prognostic potential for MPM from the blood. IMPACT: The proteomic signature represents an additional diagnostic approach for informing clinical decisions for patients at risk for MPM.
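The reported AUC values summarize how well a continuous signature score separates the two groups. As a reminder of what the number means, the sketch computes the empirical AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half, which equals the normalized Mann–Whitney U statistic. The scores are made up for illustration and are unrelated to the study's data.

```python
def auc(case_scores, control_scores):
    """Empirical ROC AUC by pairwise comparison (O(n*m), fine for small cohorts)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


print(auc([0.9, 0.8, 0.4], [0.7, 0.4, 0.1]))  # 7.5 / 9, about 0.833
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation.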


Subject(s)
Mass Spectrometry/methods , Mesothelioma, Malignant/genetics , Pleural Neoplasms/genetics , Proteomics/methods , Aged , Female , Humans , Male , Middle Aged
20.
Bioinformatics ; 36(Suppl_1): i300-i308, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657378

ABSTRACT

MOTIVATION: Mass spectrometry imaging (MSI) characterizes the molecular composition of tissues at spatial resolution, and has a strong potential for distinguishing tissue types, or disease states. This can be achieved by supervised classification, which takes as input MSI spectra, and assigns class labels to subtissue locations. Unfortunately, developing such classifiers is hindered by the limited availability of training sets with subtissue labels as the ground truth. Subtissue labeling is prohibitively expensive, and only rough annotations of the entire tissues are typically available. Classifiers trained on data with approximate labels have sub-optimal performance. RESULTS: To alleviate this challenge, we contribute a semi-supervised approach mi-CNN. mi-CNN implements multiple instance learning with a convolutional neural network (CNN). The multiple instance aspect enables weak supervision from tissue-level annotations when classifying subtissue locations. The convolutional architecture of the CNN captures contextual dependencies between the spectral features. Evaluations on simulated and experimental datasets demonstrated that mi-CNN improved the subtissue classification as compared to traditional classifiers. We propose mi-CNN as an important step toward accurate subtissue classification in MSI, enabling rapid distinction between tissue types and disease states. AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/Vitek-Lab/mi-CNN_MSI.
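The multiple-instance idea can be illustrated with its standard assumption: a tissue (bag) is positive if at least one of its locations (instances) is positive, so a bag-level score can be obtained by max-pooling instance scores. The sketch shows only this pooling step and a simple, hypothetical heuristic for deriving instance pseudo-labels from a tissue-level annotation; the threshold and function names are assumptions, and precomputed instance scores stand in for the CNN used in mi-CNN.

```python
def bag_score(instance_scores):
    """Standard MI assumption: a bag is as positive as its most positive instance."""
    return max(instance_scores)


def instance_pseudo_labels(instance_scores, bag_label, threshold=0.5):
    """Pseudo-labels for the instances of one bag, given only the bag label.

    All instances of a negative bag are negative. In a positive bag, instances
    scoring above `threshold` are positive, and the top-scoring instance is
    forced positive so that the bag label is respected.
    """
    if not bag_label:
        return [0] * len(instance_scores)
    labels = [int(s > threshold) for s in instance_scores]
    if not any(labels):
        labels[instance_scores.index(max(instance_scores))] = 1
    return labels


print(bag_score([0.1, 0.7, 0.3]))                     # 0.7
print(instance_pseudo_labels([0.1, 0.2, 0.3], True))  # [0, 0, 1]
```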


Subject(s)
Neural Networks, Computer , Mass Spectrometry