ABSTRACT
Post-translational modifications (PTMs) play key roles in regulating cell signaling and physiology in both normal and cancer cells. Advances in mass spectrometry enable high-throughput, accurate, and sensitive measurement of PTM levels to better understand their role, prevalence, and crosstalk. Here, we analyze the largest collection of proteogenomics data from 1,110 patients with PTM profiles across 11 cancer types (10 from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium [CPTAC]). Our study reveals pan-cancer patterns of changes in protein acetylation and phosphorylation involved in hallmark cancer processes. These patterns revealed subsets of tumors, from different cancer types, including those with dysregulated DNA repair driven by phosphorylation, altered metabolic regulation associated with immune response driven by acetylation, affected kinase specificity by crosstalk between acetylation and phosphorylation, and modified histone regulation. Overall, this resource highlights the rich biology governed by PTMs and exposes potential new therapeutic avenues.
Subject(s)
Neoplasms , Protein Processing, Post-Translational , Proteomics , Humans , Acetylation , Histones/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Phosphorylation , Proteomics/methods
ABSTRACT
Cancer driver events refer to key genetic aberrations that drive oncogenesis; however, their exact molecular mechanisms remain insufficiently understood. Here, our multi-omics pan-cancer analysis uncovers insights into the impacts of cancer drivers by identifying their significant cis-effects and distal trans-effects quantified at the RNA, protein, and phosphoprotein levels. Salient observations include the association of point mutations and copy-number alterations with the rewiring of protein interaction networks, and notably, most cancer genes converge toward similar molecular states denoted by sequence-based kinase activity profiles. A correlation between predicted neoantigen burden and measured T cell infiltration suggests potential vulnerabilities for immunotherapies. Patterns of cancer hallmarks vary by polygenic protein abundance ranging from uniform to heterogeneous. Overall, our work demonstrates the value of comprehensive proteogenomics in understanding the functional states of oncogenic drivers and their links to cancer development, surpassing the limitations of studying individual cancer types.
Subject(s)
Neoplasms , Proteogenomics , Humans , Neoplasms/genetics , Oncogenes , Cell Transformation, Neoplastic/genetics , DNA Copy Number Variations
ABSTRACT
To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.
Subject(s)
Neoplasm Proteins/genetics , Neoplasms, Cystic, Mucinous, and Serous/genetics , Ovarian Neoplasms/genetics , Proteome , Acetylation , Chromosomal Instability , DNA Repair , DNA, Neoplasm , Female , Gene Dosage , Humans , Mass Spectrometry , Phosphoproteins/genetics , Protein Processing, Post-Translational , Survival Analysis
ABSTRACT
Single-cell proteomics is growing rapidly and has made several technological advancements. As most research has been focused on improving instrumentation and sample preparation methods, very little attention has been given to algorithms responsible for identifying and quantifying proteins. Given the inherent difference between bulk data and single-cell data, it is necessary to realize that current algorithms being employed on single-cell data were designed for bulk data and have underlying assumptions that may not hold true for single-cell data. In order to develop and optimize algorithms for single-cell data, we need to characterize the differences between single-cell data and bulk data and assess how current algorithms perform on single-cell data. Here, we present a review of algorithms responsible for identifying and quantifying peptides and proteins. We will give a review of how each type of algorithm works, assumptions it relies on, how it performs on single-cell data, and possible optimizations and solutions that could be used to address the differences in single-cell data.
Subject(s)
Proteins , Proteomics , Proteomics/methods , Peptides/chemistry , Algorithms
ABSTRACT
Single-cell analysis is an active area of research in many fields of biology. Measurements at single-cell resolution allow researchers to study diverse populations without losing biologically meaningful information to sample averages. Many technologies have been used to study single cells, including mass spectrometry-based single-cell proteomics (SCP). SCP has seen a lot of growth over the past couple of years through improvements in data acquisition and analysis, leading to greater proteomic depth. Because method development has been the main focus in SCP, biological applications have been sprinkled in only as proof-of-concept. However, SCP methods now provide significant coverage of the proteome and have been implemented in many laboratories. Thus, a primary question to address in our community is whether the current state of technology is ready for widespread adoption for biological inquiry. In this Perspective, we examine the potential for SCP in three thematic areas of biological investigation: cell annotation, developmental trajectories, and spatial mapping. We identify that the primary limitation of SCP is sample throughput. As proteome depth has been the primary target for method development to date, we advocate for a change in focus to facilitate measuring tens of thousands of single-cell proteomes to enable biological applications beyond proof-of-concept.
ABSTRACT
BACKGROUND: Human tear protein biomarkers are useful for detecting ocular and systemic diseases. Unfortunately, existing tear film sampling methods (Schirmer strip, SS; microcapillary tube, MCT) have significant drawbacks, such as pain, risk of injury, sampling difficulty, and proteomic disparities between methods. Here, we present an alternative tear protein sampling method using soft contact lenses (SCLs). RESULTS: We optimized the SCL protein sampling in vitro and performed in vivo studies in 6 subjects. Using Etafilcon A SCLs and 4 M guanidine-HCl for protein removal, we sampled an average of 60 ± 31 µg of protein per eye. We also performed objective and subjective assessments of all sampling methods. Signs of irritation post-sampling were observed with SS but not with MCT and SCLs. Proteomic analysis by mass spectrometry (MS) revealed that all sampling methods resulted in the detection of abundant tear proteins. However, smaller subsets of unique and shared proteins were identified, particularly for SS and MCT. Additionally, there was no significant intrasubject variation between MCT and SCL sampling. CONCLUSIONS: These experiments demonstrate that SCLs are an accessible tear-sampling method with the potential to surpass current methods in sampling basal tears.
ABSTRACT
Computer programming is a fundamental tool for life scientists, allowing them to carry out essential research tasks. However, despite various educational efforts, learning to write code can be a challenging endeavor for students and researchers in life-sciences disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists' efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such tool, OpenAI's ChatGPT, could successfully complete programming tasks. ChatGPT solved 139 (75.5%) of the exercises on its first attempt. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have implications for life-sciences education and research. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. For some programming tasks, researchers may be able to work in collaboration with machine-learning models to produce functional code.
ABSTRACT
In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.
Subject(s)
Machine Learning , Proteomics , Proteomics/methods , Algorithms , Mass Spectrometry
ABSTRACT
Recent developments in mass spectrometry-based single-cell proteomics (SCP) have resulted in dramatically improved sensitivity, yet the relatively low measurement throughput remains a limitation. Isobaric and isotopic labeling methods have been separately applied to SCP to increase throughput through multiplexing. Here we combined both forms of labeling to achieve multiplicative scaling for higher throughput. Two-plex stable isotope labeling of amino acids in cell culture (SILAC) and isobaric tandem mass tag (TMT) labeling enabled up to 28 single cells to be analyzed in a single liquid chromatography-mass spectrometry (LC-MS) analysis, in addition to carrier, reference, and negative control channels. A custom nested nanowell chip was used for nanoliter sample processing to minimize sample losses. Using a 145-min total LC-MS cycle time, ~280 single cells were analyzed per day. This measurement throughput could be increased to ~700 samples per day with a high-duty-cycle multicolumn LC system producing the same active gradient. The labeling efficiency and achievable proteome coverage were characterized for multiple analysis conditions.
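The throughput figures quoted above follow from simple arithmetic, which the sketch below reproduces. It is an illustration only: the function names are hypothetical, the split of 28 cells into 2 SILAC states × 14 TMT channels is an assumption (the abstract gives only the total), and the cycle time implied by ~700 samples per day is back-calculated here rather than reported in the abstract.

```python
# Minimal sketch of the multiplexing and throughput arithmetic.
# Assumption: each SILAC state uses the same number of single-cell TMT channels.

MINUTES_PER_DAY = 24 * 60


def multiplexed_cells(silac_plex: int, tmt_cell_channels: int) -> int:
    """Cells per LC-MS run when isotopic and isobaric labels are combined."""
    return silac_plex * tmt_cell_channels


def cells_per_day(cells_per_run: int, cycle_min: float) -> float:
    """Cells measured per day for a given total LC-MS cycle time (minutes)."""
    return cells_per_run * MINUTES_PER_DAY / cycle_min


def implied_cycle_min(target_cells_per_day: float, cells_per_run: int) -> float:
    """Cycle time (minutes) needed to reach a target daily throughput."""
    return cells_per_run * MINUTES_PER_DAY / target_cells_per_day


cells = multiplexed_cells(silac_plex=2, tmt_cell_channels=14)  # 14 is assumed
print(cells, "cells per run")
print(round(cells_per_day(cells, cycle_min=145)), "cells per day at 145 min")
print(round(implied_cycle_min(700, cells), 1), "min per run for 700 per day")
```

With a 145-min cycle the calculation gives roughly 278 cells per day, consistent with the ~280 stated in the abstract.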
Subject(s)
Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Chromatography, Liquid/methods , Proteome/analysis , Isotope Labeling
ABSTRACT
Single-cell measurements are uniquely capable of characterizing cell-to-cell heterogeneity and have been used to explore the large diversity of cell types and physiological functions present in tissues and other complex cell assemblies. An intriguing application of single-cell proteomics is the characterization of proteome dynamics during biological transitions, like cellular differentiation or disease progression. Time-course experiments, which regularly take measurements during state transitions, rely on the ability to detect dynamic trajectories in a data series. However, in a single-cell proteomics experiment, cell-to-cell heterogeneity complicates the confident identification of proteome dynamics as measurement variability may be higher than expected. Therefore, a critical question for these experiments is how many data points need to be acquired during the time course to enable robust statistical analysis. We present here an analysis of the most important variables that affect statistical confidence in the detection of proteome dynamics: fold change, measurement variability, and the number of cells measured during the time course. Importantly, we show that datasets with fewer than 16 measurements across the time domain suffer from low accuracy and a high false-positive rate. We also demonstrate how to balance competing demands in experimental design to achieve a desired result.
Subject(s)
Proteomics/methods , Animals , Cell Line , Mice , Sample Size , Single-Cell Analysis
ABSTRACT
INTRODUCTION: Distal radius fractures (DRFs) are common fractures requiring surgical fixation. The literature varies regarding opioid prescribing habits, opioid consumption, and postoperative pain scores. We hypothesized that the preoperative administration of a liposomal bupivacaine (LB) supraclavicular nerve block would be safe and effective in controlling postoperative pain. METHODS: A standardized pain management protocol was implemented at a single institution from July 2021 to March 2022 for patients undergoing open reduction internal fixation of DRF. Protocol elements included a preoperative LB supraclavicular nerve block and a multimodal postoperative pain regimen. Primary clinical outcomes included postoperative pain scores and number of opioid tablets consumed. RESULTS: Twenty patients underwent the newly implemented protocol. The average age was 56 years. Mean number of oxycodone 5-mg tablets consumed was 4.1 (median, 2.5), and mean visual analog scale pain score at first postoperative appointment was 2.8. There were no incidences of missed acute carpal tunnel syndrome postoperatively. When compared with an institutional historical control (n = 189), number of opioid pills prescribed was reduced by 60% (21.4 vs 8.6 tablets, P < 0.0001), and no patients had unscheduled health care contact because of uncontrolled pain (22% vs 0%, P < 0.016). CONCLUSIONS: Liposomal bupivacaine supraclavicular nerve blocks are safe and effective in the treatment of postoperative pain after open reduction internal fixation of DRF. Patients consumed <5 oxycodone tablets on average, which is less than many recommended prescription quantities (>20-30 tablets). Patients had low pain scores (2.8/10) at the first postoperative follow-up. To our knowledge, this is the first study demonstrating the utility of LB in this clinical setting.
Subject(s)
Nerve Block , Wrist Fractures , Humans , Middle Aged , Bupivacaine , Anesthetics, Local , Analgesics, Opioid/therapeutic use , Pain Management/methods , Oxycodone/therapeutic use , Practice Patterns, Physicians' , Pain, Postoperative/drug therapy , Pain, Postoperative/prevention & control , Nerve Block/methods , Liposomes/therapeutic use
ABSTRACT
BACKGROUND: Reduction mammaplasty is an effective and safe treatment option for adults with symptomatic macromastia, but there are few data regarding outcomes in adolescents. OBJECTIVES: The purpose of this study was to determine the short-term psychosocial impact, satisfaction, and safety of reduction mammaplasty when performed during adolescence. METHODS: A retrospective review was performed of a single pediatric plastic surgeon's experience with reduction mammaplasty from 2018 to 2021 in patients aged ≤18 years. Patients completed the preoperative and postoperative "Satisfaction with Breasts" and "Psychosocial Well-being" sections of the BREAST-Q survey. Clinical variables gathered included age, weight, BMI, complication profile, specimen resection weight, and follow-up duration. RESULTS: In total, 41 patients met inclusion criteria. The mean converted Rasch scores for BREAST-Q "Satisfaction with Breasts" and "Psychosocial Well-being" increased significantly following reduction mammaplasty ("Satisfaction with Breasts": preoperative, 24.1 vs postoperative, 92.6; "Psychosocial Well-being": preoperative, 37.7 vs postoperative, 90.4; P < .001). Obesity (BMI ≥ 30 kg/m2) was associated with lower preoperative "Psychosocial Well-being" scores (obese, 29.7 vs nonobese, 43.3; P < .001) but a greater improvement in score following surgery (obese, +63.9 vs nonobese, +44.9; P < .001). Specimen weight ≥1000 grams was also associated with greater improvement in score on the "Psychosocial Well-being" section (≥1000 grams, +58 vs <1000 grams, +49.7; P = .046). Overall complication rate was 31.7% while the major complication rate was 2.4%. Mean specimen resection weight was higher in patients who experienced complications (1141.3 grams vs 836.8 grams, P = .008). CONCLUSIONS: Reduction mammaplasty during adolescence predictably improves both short-term satisfaction with breasts and psychosocial well-being while demonstrating a favorable short-term complication profile.
Subject(s)
Mammaplasty , Patient Satisfaction , Adult , Female , Adolescent , Humans , Child , Mammaplasty/adverse effects , Mammaplasty/psychology , Breast/surgery , Hypertrophy/surgery , Hypertrophy/psychology , Retrospective Studies , Obesity/surgery , Treatment Outcome
ABSTRACT
We combined efficient sample preparation and ultra-low-flow liquid chromatography with a newly developed data acquisition and analysis scheme termed wide window acquisition (WWA) to quantify >3,000 proteins from single cells in rapid label-free analyses. WWA employs large isolation windows to intentionally co-isolate and co-fragment adjacent precursors along with the selected precursor. Optimized WWA increased the number of MS2-identified proteins by ≈40% relative to standard data-dependent acquisition. For a 40-min LC gradient operated at ≈15 nL/min, we identified an average of 3,524 proteins per single-cell-sized aliquot of protein digest. Reducing the active gradient to 20 min resulted in a modest 10% decrease in proteome coverage. Using this platform, we compared protein expression between single HeLa cells having an essential autophagy gene, atg9a, knocked out, with their isogenic WT parental line. Similar proteome coverage was observed, and 268 proteins were significantly up- or downregulated. Protein upregulation primarily related to innate immunity, vesicle trafficking and protein degradation.
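The co-isolation idea behind WWA can be illustrated with a toy calculation: widening the isolation window around a selected precursor pulls neighboring precursors into the same MS2 event. The sketch below is not the authors' acquisition scheme. The function name, the m/z values, and both window widths are hypothetical and chosen only to show the effect.

```python
# Toy illustration of precursor co-isolation as a function of window width.
# All m/z values and window widths are made-up example numbers.

def co_isolated(precursors_mz, selected_mz, window_width):
    """Return precursors inside an isolation window centered on selected_mz."""
    half = window_width / 2.0
    return [mz for mz in precursors_mz if abs(mz - selected_mz) <= half]


precursors = [499.2, 500.3, 501.9, 503.8, 510.0]

# A narrow window isolates only the selected precursor.
narrow = co_isolated(precursors, selected_mz=500.3, window_width=1.6)

# A wide window also captures adjacent precursors for co-fragmentation.
wide = co_isolated(precursors, selected_mz=500.3, window_width=8.0)

print(len(narrow), "precursor(s) in the narrow window")
print(len(wide), "precursor(s) in the wide window")
```

In this example the narrow window contains one precursor and the wide window four, so a single MS2 spectrum could in principle yield several peptide identifications if the search software can handle chimeric spectra.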
Subject(s)
Proteome , Proteomics , Humans , Proteome/analysis , HeLa Cells , Proteomics/methods , Chromatography, Liquid/methods
ABSTRACT
The goal of proteomics is to identify and quantify the complete set of proteins in a biological sample. Single-cell proteomics specializes in the identification and quantitation of proteins for individual cells, often used to elucidate cellular heterogeneity. The significant reduction in ions introduced into the mass spectrometer for single-cell samples could impact the features of MS2 fragmentation spectra. As all peptide identification software tools have been developed on spectra from bulk samples and the associated ion-rich spectra, the potential for spectral features to change is of great interest. We characterize the differences between single-cell spectra and bulk spectra by examining three fundamental spectral features that are likely to affect peptide identification performance. All features show significant changes in single-cell spectra, including the loss of annotated fragment ions, blurring signal and background peaks due to diminishing ion intensity, and distinct fragmentation pattern, compared to bulk spectra. As each of these features is a foundational part of peptide identification algorithms, it is critical to adjust algorithms to compensate for these losses.
Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Peptides/chemistry , Software
ABSTRACT
Metaproteomics has been increasingly utilized for high-throughput characterization of proteins in complex environments and has been demonstrated to provide insights into microbial composition and functional roles. However, significant challenges remain in metaproteomic data analysis, including creation of a sample-specific protein sequence database. A well-matched database is a requirement for successful metaproteomics analysis, and the accuracy and sensitivity of PSM identification algorithms suffer when the database is incomplete or contains extraneous sequences. When matched DNA sequencing data of the sample is unavailable or incomplete, creating the proteome database that accurately represents the organisms in the sample is a challenge. Here, we leverage a de novo peptide sequencing approach to identify the sample composition directly from metaproteomic data. First, we created a deep learning model, Kaiko, to predict the peptide sequences from mass spectrometry data and trained it on 5 million peptide-spectrum matches from 55 phylogenetically diverse bacteria. After training, Kaiko successfully identified organisms from soil isolates and synthetic communities directly from proteomics data. Finally, we created a pipeline for metaproteome database generation using Kaiko. We tested the pipeline on native soils collected in Kansas, showing that the de novo sequencing model can be employed as an alternative and complementary method to construct the sample-specific protein database instead of relying on (un)matched metagenomes. Our pipeline identified all highly abundant taxa from 16S rRNA sequencing of the soil samples and uncovered several additional species which were strongly represented only in proteomic data.
Subject(s)
Microbiota , Proteomics , Microbiota/genetics , Peptides/analysis , Peptides/genetics , Proteome/genetics , Proteomics/methods , RNA, Ribosomal, 16S/genetics , Soil
ABSTRACT
The ability to improve the data quality of ion mobility-mass spectrometry (IM-MS) measurements is of great importance for enabling modular and efficient computational workflows and gaining better qualitative and quantitative insights from complex biological and environmental samples. We developed the PNNL PreProcessor, a standalone and user-friendly software housing various algorithmic implementations to generate new MS-files with enhanced signal quality and in the same instrument format. Different experimental approaches are supported for IM-MS based on Drift-Tube (DT) and Structures for Lossless Ion Manipulations (SLIM), including liquid chromatography (LC) and infusion analyses. The algorithms extend the dynamic range of the detection system, while reducing file sizes for faster and memory-efficient downstream processing. Specifically, multidimensional smoothing improves peak shapes of poorly defined low-abundance signals, and saturation repair reconstructs the intensity profile of high-abundance peaks from various analyte types. Other functionalities are data compression and interpolation, IM demultiplexing, noise filtering by low intensity threshold and spike removal, and exporting of acquisition metadata. Several advantages of the tool are illustrated, including an increase of 19.4% in lipid annotations and a two-times faster processing of LC-DT IM-MS data-independent acquisition spectra from a complex lipid extract of a standard human plasma sample. The software is freely available at https://omics.pnl.gov/software/pnnl-preprocessor.
Subject(s)
Ion Mobility Spectrometry , Lipids , Chromatography, Liquid/methods , Humans , Ion Mobility Spectrometry/methods , Ions , Mass Spectrometry/methods , Workflow
ABSTRACT
MOTIVATION: Ion mobility spectrometry (IMS) separations are increasingly used in conjunction with mass spectrometry (MS) for separation and characterization of ionized molecular species. Information obtained from IMS measurements includes the ion's collision cross section (CCS), which reflects its size and structure and constitutes a descriptor for distinguishing similar species in mixtures that cannot be separated using conventional approaches. Incorporating CCS into MS-based workflows can improve the specificity and confidence of molecular identification. At present, there is no automated, open-source pipeline for determining CCS of analyte ions in both targeted and untargeted fashion, and intensive user-assisted processing with vendor software and manual evaluation is often required. RESULTS: We present AutoCCS, an open-source software to rapidly determine CCS values from IMS-MS measurements. We conducted various IMS experiments in different formats to demonstrate the flexibility of AutoCCS for automated CCS calculation: (i) stepped-field methods for drift tube-based IMS (DTIMS), (ii) single-field methods for DTIMS (supporting two calibration methods: a standard and a new enhanced method) and (iii) linear calibration for Bruker timsTOF and non-linear calibration methods for traveling wave based-IMS in Waters Synapt and Structures for Lossless Ion Manipulations. We demonstrated that AutoCCS offers an accurate and reproducible determination of CCS for both standard and unknown analyte ions in various IMS-MS platforms, IMS-field methods, ionization modes and collision gases, without requiring manual processing. AVAILABILITY AND IMPLEMENTATION: https://github.com/PNNL-Comp-Mass-Spec/AutoCCS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Demo datasets are publicly available at MassIVE (Dataset ID: MSV000085979).
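For readers unfamiliar with the stepped-field DTIMS case, the sketch below shows the general textbook calculation: arrival time is regressed on inverse drift voltage, the slope gives the mobility, and the Mason-Schamp equation converts reduced mobility to CCS. This is not AutoCCS code and does not use its API. The function names, parameter names, and the example instrument values in the usage note are all assumptions.

```python
# Textbook stepped-field CCS calculation (not the AutoCCS implementation).
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23          # Boltzmann constant, J/K
AMU = 1.66053906660e-27     # atomic mass unit, kg
N0 = 2.6867811e25           # gas number density at 273.15 K and 760 Torr, m^-3


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def stepped_field_ccs(voltages_v, arrival_times_s, length_m, pressure_torr,
                      temp_k, ion_mass_da, gas_mass_da, charge=1):
    """Return (CCS in square angstroms, reduced mobility K0 in m^2/(V s), t0 in s)."""
    # t_arrival = (L^2 / K) * (1 / V) + t0, so the slope against 1/V is L^2 / K.
    slope, t0 = fit_line([1.0 / v for v in voltages_v], arrival_times_s)
    k = length_m ** 2 / slope
    # Normalize the mobility to standard pressure and temperature.
    k0 = k * (pressure_torr / 760.0) * (273.15 / temp_k)
    # Reduced mass of the ion and drift-gas pair, in kg.
    mu = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * AMU
    # Mason-Schamp equation, result in m^2.
    ccs_m2 = (3.0 * charge * E_CHARGE / (16.0 * N0)) \
        * math.sqrt(2.0 * math.pi / (mu * K_B * temp_k)) / k0
    return ccs_m2 * 1e20, k0, t0
```

As a usage example, one might call `stepped_field_ccs(voltages, times, 0.78, 4.0, 300.0, 622.0, 28.0134)` for a singly charged ion in nitrogen. The tube length, pressure, and temperature shown are placeholder values, not settings taken from the abstract. The intercept `t0` absorbs the time the ion spends outside the drift region.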
Subject(s)
Ion Mobility Spectrometry , Software , Mass Spectrometry/methods , Ions
ABSTRACT
BACKGROUND: Complications of implant-based reconstruction have been shown to be related to increasing body mass index (BMI) and breast size. The impact of skin reducing mastectomy (SRM) with a dermal flap is examined. METHODS: A retrospective review of a single surgeon's experience with immediate submuscular tissue expander (TE) reconstruction from 2011 to 2019 was performed. The outcomes of SRM were compared with those of skin sparing mastectomy (SSM). RESULTS: A total of 162 patients (292 breasts) were identified. Mastectomy types were as follows: SRM, 73 (136 breasts) and SSM, 89 (156 breasts). Acellular dermal matrix (ADM) was used to supplement TE coverage in 65.4% of SRM cases. Mean BMI was 29.2 among SRM patients and 25.9 in SSM patients (P < 0.001). Obesity (BMI ≥ 30) was more prevalent in the SRM group (SRM, 38.4% vs SSM, 22.5%; P = 0.03). Mean mastectomy weight was higher in the SRM group (SRM, 833.6 g vs SSM, 425.6 g; P < 0.001). Mean BMI and mastectomy weight were lower in SRM patients who were reconstructed with ADM (ADM, 28.1 vs no ADM, 30.8; P = 0.01; ADM, 746.1 g vs no ADM, 1006.3 g; P < 0.001). Minor complications were more prevalent in the SRM group (SRM, 22.8% vs SSM, 4.5%; P < 0.001). Mastectomy skin flap necrosis (MSFN) was more common in the SRM group (SRM, 22.8% vs SSM, 7.7%; P < 0.001), but MSFN necessitating operative debridement was similarly low in both groups (SRM: 1.9% vs SSM: 4.5%). Major complication rates (SRM 11.0% vs SSM 10.9%) and reconstructive failure rates (SRM 5.9% vs SSM 5.1%) were similar between groups. Mastectomy weight 800 g or higher and BMI of 30 or higher were found to be risk factors for complications on analysis of the SRM cohort (P < 0.05). CONCLUSIONS: Mastectomy weight and BMI were positive predictors of complications after immediate TE reconstruction. Mastectomy skin flap necrosis is more common after SRM than SSM. 
The use of SRM with a dermal flap has a similar major complication rate as SSM despite its use in obese, large-breasted women. The dermal flap provides soft tissue coverage, which prevents implant exposure and seroma. The use of ADM does not adversely affect the complication rate of SRM.
Subject(s)
Acellular Dermis , Breast Implantation , Breast Neoplasms , Mammaplasty , Acellular Dermis/adverse effects , Breast Implantation/adverse effects , Breast Neoplasms/complications , Female , Humans , Mammaplasty/adverse effects , Mastectomy/adverse effects , Necrosis/etiology , Postoperative Complications/epidemiology , Postoperative Complications/etiology , Postoperative Complications/prevention & control , Retrospective Studies , Tissue Expansion Devices/adverse effects
ABSTRACT
Comprehensive cancer data sets recently generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) offer great potential for advancing our understanding of how to combat cancer. These data sets include DNA, RNA, protein, and clinical characterization for tumor and normal samples from large cohorts of many different cancer types. The raw data are publicly available at various Cancer Research Data Commons. However, widespread reuse of these data sets is also facilitated by easy access to the processed quantitative data tables. We have created a data application programming interface (API) to distribute these processed tables, implemented as a Python package called cptac. We implement it such that users who prefer to work in R can easily use our package for data access and then transfer the data into R for analysis. Our package distributes the finalized processed CPTAC data sets in a consistent, up-to-date format. This consistency makes it easy to integrate the data with common graphing, statistical, and machine-learning packages for advanced analysis. Additionally, consistent formatting across all cancer types promotes the investigation of pan-cancer trends. The data API structure of directly streaming data within a programming environment enhances the reproducibility. Finally, with the accompanying tutorials, this package provides a novel resource for cancer research education. View the software documentation at https://paynelab.github.io/cptac/. View the GitHub repository at https://github.com/PayneLab/cptac.
Subject(s)
Neoplasms , Proteogenomics , Humans , Neoplasms/genetics , Proteomics , Reproducibility of Results , Software
ABSTRACT
Recent advances in sample preparation and analysis have enabled direct profiling of protein expression in single mammalian cells and other trace samples. Several techniques to prepare and analyze low-input samples employ custom fluidics for nanoliter sample processing and manual sample injection onto a specialized separation column. While effective, these highly specialized systems require significant expertise to fabricate and operate, which has greatly limited implementation in most proteomic laboratories. Here, we report a fully automated platform termed autoPOTS (automated preparation in one pot for trace samples) that uses only commercially available instrumentation for sample processing and analysis. An unmodified, low-cost commercial robotic pipetting platform was utilized for one-pot sample preparation. We used low-volume 384-well plates and periodically added water or buffer to the microwells to compensate for limited evaporation during sample incubation. Prepared samples were analyzed directly from the well plate with a commercial autosampler that was modified with a 10-port valve for compatibility with 30 µm i.d. nanoLC columns. We used autoPOTS to analyze 1-500 HeLa cells and observed only a moderate reduction in peptide coverage for 150 cells and a 24% reduction in coverage for single cells compared to our previously developed nanoPOTS platform. To evaluate clinical feasibility, we identified an average of 1095 protein groups from ~130 sorted B or T lymphocytes. We anticipate that the straightforward implementation of autoPOTS will make it an attractive option for low-input and single-cell proteomics in many laboratories.