ABSTRACT
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related deaths worldwide and is mainly associated with liver cirrhosis. Current diagnostic methods for HCC have limited sensitivity and specificity, highlighting the need for improved early detection and intervention. In this study, we combined endogenous peptidomics with bioinformatics analysis to identify and evaluate potential biomarkers for HCC. Serum samples from 40 subjects, comprising 20 HCC cases and 20 patients with liver cirrhosis (CIRR), were analyzed. Among 2568 endogenous peptides, 67 showed significant differential expression between the HCC and CIRR groups. Further analysis revealed three endogenous peptides (VMHEALHNHYTQKSLSLSPG, NRFTQKSLSLSPG, and SARQSTLDKEL) that substantially outperformed AFP in terms of the area under the receiver operating characteristic curve (AUC), showcasing their potential as biomarkers for HCC. Additionally, the endogenous peptide IAVEWESNGQPENNYKT, derived from the precursor protein immunoglobulin heavy constant gamma 4, was detected in 100% of the HCC group and was completely absent in the CIRR group, suggesting a promising diagnostic biomarker. Gene ontology and pathway analysis revealed the potential involvement of these dysregulated peptides in HCC. These findings provide valuable insights into the molecular basis of HCC and may contribute to the development of improved diagnostic methods and therapeutic targets for HCC.
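To illustrate the AUC comparison, the sketch below computes an empirical AUC for a single peptide. It uses the Mann-Whitney interpretation: the AUC is the probability that a randomly chosen HCC sample has a higher intensity than a randomly chosen CIRR sample, with ties counted as one half. The function name and the intensity values are hypothetical and do not come from the study.

```python
def auc_mann_whitney(cases, controls):
    """Empirical AUC: fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical intensities of one peptide (not data from the study)
hcc = [8.1, 7.4, 9.0, 6.8]
cirr = [5.2, 6.9, 4.8, 5.5]
print(auc_mann_whitney(hcc, cirr))  # 15 of 16 pairs ranked correctly -> 0.9375
```

Under this definition, a peptide detected in every HCC sample and absent from every CIRR sample (absence coded as intensity 0) reaches an AUC of 1.0.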
Subject(s)
Tumor Biomarkers, Hepatocellular Carcinoma, Liver Cirrhosis, Liver Neoplasms, Peptides, Humans, Hepatocellular Carcinoma/blood, Hepatocellular Carcinoma/diagnosis, Liver Neoplasms/blood, Liver Neoplasms/diagnosis, Liver Cirrhosis/blood, Liver Cirrhosis/diagnosis, Tumor Biomarkers/blood, Peptides/blood, Male, Female, ROC Curve, Middle Aged, Computational Biology, Proteomics/methods, alpha-Fetoproteins/analysis, alpha-Fetoproteins/metabolism
ABSTRACT
Hepatocellular carcinoma (HCC) has been an approved indication for immunotherapy since 2017, but biomarkers that predict therapeutic response remain limited. Understanding and characterizing the tumor immune microenvironment enables better classification of these tumors and may reveal biomarkers that predict immunotherapeutic efficacy. In this paper, we applied a cell-type deconvolution algorithm to DNA methylation array data to investigate the composition of the tumor microenvironment in HCC. Using publicly available and in-house datasets with a total cohort of 57 patients, each with tumor and matched normal tissue samples, we identified key differences in immune cell composition. We found that NK cell abundance was significantly decreased in HCC tumors compared with adjacent normal tissue. We also applied DNA methylation "clocks", which estimate phenotypic aging, and compared these findings with expression-based determinations of cellular senescence. Senescence and epigenetic aging were significantly increased in HCC tumors, and the degree of age acceleration and senescence was strongly associated with decreased NK cell abundance. In summary, we found that NK cell infiltration in the tumor microenvironment is significantly diminished and that this loss of NK cell abundance is strongly associated with increased senescence and an age-related phenotype. These findings point to key interactions between NK cells and the senescent tumor microenvironment and offer insights into the pathogenesis of HCC as well as potential biomarkers of therapeutic efficacy.
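The abstract does not define "age acceleration". One common convention, assumed here, takes the residual from regressing DNA methylation (DNAm) age on chronological age. A positive residual means a sample looks epigenetically older than expected for its age. The sketch below fits ordinary least squares with the standard library. The function name and the ages are hypothetical.

```python
from statistics import fmean


def age_acceleration(chronological, dnam):
    """Residuals of DNAm age regressed on chronological age (ordinary least squares)."""
    mx, my = fmean(chronological), fmean(dnam)
    sxx = sum((x - mx) ** 2 for x in chronological)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chronological, dnam))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(chronological, dnam)]


# Hypothetical ages (years) for five samples
chronological = [50, 55, 60, 65, 70]
dnam_age = [52, 60, 61, 70, 72]
print(age_acceleration(chronological, dnam_age))  # [-1.0, 2.0, -2.0, 2.0, -1.0]
```

Comparing these residuals between tumor and matched normal samples is one way to ask whether tumors are epigenetically older than expected. The study's own clock and its association test with NK cell abundance are not reproduced here.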
Subject(s)
Hepatocellular Carcinoma, Liver Neoplasms, Humans, Hepatocellular Carcinoma/genetics, Liver Neoplasms/genetics, DNA Methylation/genetics, Tumor Microenvironment/genetics, Cellular Senescence/genetics, Tumor Biomarkers/genetics
ABSTRACT
The expression levels of human serum N-linked glycans change during disease progression. Their low abundance, structural diversity, and coexisting matrices hinder their detection in mass spectrometry analysis. Considering the hydrophilic nature of N-glycans, a cellulose/polymer (1,2-epoxy-5-hexene) nanohybrid is fabricated, with its oxirane groups functionalized with asparagine, to develop a solid-phase extraction-based hydrophilic interaction liquid chromatography sorbent (cellulose/1,2-epoxy-5-hexene/asparagine). The morphology, elemental composition, and surface properties are studied by scanning electron microscopy, energy-dispersive X-ray spectroscopy, and Fourier-transform infrared spectroscopy. The large surface area of the cellulose/polymer nanohybrid (2.09 × 10² m²/g) facilitates a high density of immobilized asparagine, resulting in better hydrophilic interaction liquid chromatography enrichment under optimized conditions. The enrichment capability of the nanohybrid/asparagine is assessed with N-linked glycans released from ovalbumin and immunoglobulin G, from which 23 and 13 N-glycans are detected, respectively. The nanohybrid/asparagine shows a selectivity of 1:1200 with spiked bovine serum albumin and sensitivity down to 100 attomoles. Human serum profiling for N-glycans identifies 52 glycan structures. This new enrichment strategy enriches serum N-linked glycans in the presence of salts, proteins, endogenous serum peptides, and other matrix components.
Subject(s)
Cellulose, Polymers, Humans, Asparagine
ABSTRACT
A new polymeric (methyl methacrylate/ethylene glycol dimethacrylate/1,2-epoxy-5-hexene) base/matrix has been fabricated and decorated with zwitterionic, hydrophilic cysteic acid (Cya) for the enrichment of intact N-glycopeptides from standards and biological samples. Terpolymer-Cya provides good enrichment efficiency, improved hydrophilicity, and selectivity by virtue of the large surface area (2.09 × 10² m²/g) provided by the terpolymer and the zwitterionic property offered by cysteic acid. The cysteic acid-functionalized polymeric hydrophilic interaction liquid chromatography (HILIC) sorbent enriches 35 and 24 N-linked glycopeptides in solid-phase extraction (SPE) mode from tryptic digests of the model glycoproteins immunoglobulin G (IgG) and horseradish peroxidase (HRP), respectively. The zwitterionic chemistry of cysteic acid helps achieve higher selectivity with BSA digest (1:200) and a lower detection limit, down to 100 attomoles, with a complete glycosylation profile of each standard digest. The recovery of 81% and good reproducibility support the application of terpolymer-Cya to complex samples such as serum. Analysis of human serum provides a profile of 807 intact N-linked glycopeptides via nano-liquid chromatography-tandem mass spectrometry (nLC-MS/MS). To the best of our knowledge, this is the highest number of glycopeptides enriched by any HILIC sorbent. Selected glycoproteins are evaluated in relation to various cancers, including breast, lung, and uterine cancers and melanoma, using single-nucleotide variants (BioMuta). This study demonstrates an in-house developed strategy as a successful tool to help analyze, relate, and answer glycoprotein-based clinical questions regarding cancers.
Subject(s)
Cysteic Acid, Glycopeptides, Glycopeptides/analysis, Glycoproteins, Humans, Hydrophobic and Hydrophilic Interactions, Reproducibility of Results, Tandem Mass Spectrometry
ABSTRACT
A three-step strategy is introduced to develop an inherently iminodiacetic acid (IDA)-functionalized nanopolymer. SEM micrographs show homogeneous spherical beads with a particle size of 500 nm. Further modification to a COOH-functionalized 1,2-epoxy-5-hexene/DVB mesoporous nanopolymer enables the enrichment of glycopeptides via hydrophilic interactions, followed by their MS determination. The very high BET surface area of 433.4336 m² g⁻¹ contributes to improved surface hydrophilicity, which is also reflected in the high concentration of ionizable carboxylic acids, 14.59 ± 0.25 mmol g⁻¹. The measured surface area is the highest among DVB-based polymers and is generally much higher than the previously reported BET surface areas of copolymers, terpolymers, MOFs, and graphene-based composites. Thirty-one, 19, and 16 N-glycopeptides are enriched and identified by the nanopolymer beads from tryptic digests of immunoglobulin G, horseradish peroxidase, and chicken avidin, respectively, without additional desalting steps. The material exhibits high selectivity (1:400 IgG:BSA), sensitivity (down to 0.1 fmol), regeneration ability for up to three cycles, and batch-to-batch reproducibility (RSD < 1%). Furthermore, from 1 µL of digested human serum, 343 N-glycopeptides from 134 glycoproteins, including 30 FDA-approved serum biomarkers, are identified via nano-LC-MS/MS. The developed strategy to self-generate IDA on a polymeric surface with improved surface area, porosity, and ordered morphology highlights its potential as a chromatographic tool for future large-scale biomedical glycoproteomics studies.
Subject(s)
Glycopeptides/chemistry, Imino Acids/chemistry, Nanostructures/chemistry, Polymers/chemistry, Humans, Hydrophobic and Hydrophilic Interactions, Scanning Electron Microscopy, Nanostructures/ultrastructure, Porosity, Surface Properties
ABSTRACT
INTRODUCTION: Metabolite annotation is a critical and challenging step in mass spectrometry-based metabolomic profiling. In a typical untargeted MS/MS-based metabolomic study, experimental MS/MS spectra are matched against those in spectral libraries for metabolite annotation. Yet existing spectral libraries cover only a small fraction of known compounds. OBJECTIVE: The objective was to develop a method that helps rank putative metabolite IDs for analytes whose reference MS/MS spectra are not present in spectral libraries. METHODS: We introduce MetFID, which uses an artificial neural network (ANN) trained to predict molecular fingerprints from experimental MS/MS data. To narrow the search space, MetFID retrieves candidates from metabolite databases using the molecular formula or the m/z value of the analyte's precursor ion. The candidate whose fingerprint is most similar to the predicted fingerprint is used for metabolite annotation. A comprehensive evaluation was performed by training MetFID on MS/MS spectra from the MoNA repository and the NIST library and testing it on structure-disjoint MS/MS spectra from the NIST library, the CASMI 2016 dataset, and in-house MS/MS data from a cancer biomarker discovery study. RESULTS: We observed that training separate models for distinct ranges of collision energies enhanced model performance compared with a single model covering a wide range of collision energies. Using MetaboQuest to retrieve candidates, MetFID ranked the correct putative ID first for about 50% of the test cases. On the independent test dataset, we demonstrated that MetFID has the potential to improve the accuracy of ranking putative metabolite IDs by more than 5% compared with other tools such as ChemDistiller, CSI:FingerID, and MetFrag. CONCLUSION: MetFID offers a promising opportunity to enhance the accuracy of metabolite annotation by using an ANN for molecular fingerprint prediction.
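The retrieve-then-rank step described in METHODS can be sketched as follows. This is a simplified version under stated assumptions:

- Candidates are first filtered by precursor m/z within a ppm tolerance.
- The ANN's predicted bit probabilities are binarized at 0.5.
- The remaining candidates are ranked by Tanimoto similarity to the binarized fingerprint.

The 10 ppm tolerance, the 0.5 cutoff, the Tanimoto score, and all names and values are assumptions chosen for illustration. They are not MetFID's actual parameters or API, and the neural network itself is not shown.

```python
def within_ppm(observed_mz, candidate_mz, tol_ppm):
    """True if the two m/z values agree within tol_ppm parts per million."""
    return abs(observed_mz - candidate_mz) / candidate_mz * 1e6 <= tol_ppm


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(precursor_mz, predicted_probs, candidates, tol_ppm=10.0, cutoff=0.5):
    """Rank (name, m/z, bits) candidates by similarity to a predicted fingerprint."""
    predicted = {i for i, prob in enumerate(predicted_probs) if prob >= cutoff}
    hits = [(name, tanimoto(predicted, bits))
            for name, mz, bits in candidates
            if within_ppm(precursor_mz, mz, tol_ppm)]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical ANN output (bit probabilities) and database candidates
probs = [0.9, 0.1, 0.8, 0.7, 0.2]
candidates = [
    ("cand_A", 180.0634, {0, 2, 3}),
    ("cand_B", 180.0640, {0, 1}),
    ("cand_C", 250.0000, {0, 2, 3}),  # removed by the m/z filter
]
print(rank_candidates(180.0636, probs, candidates))
# [('cand_A', 1.0), ('cand_B', 0.25)]
```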
Subject(s)
Metabolomics/methods, Algorithms, Factual Databases/standards, Humans, Computer Neural Networks, Reference Standards, Reference Values, Software, Electrospray Ionization Mass Spectrometry/methods, Tandem Mass Spectrometry/methods
ABSTRACT
A long-term hepatocyte culture that maintains liver-specific functions is essential both for basic research and for the development of bioartificial liver devices for clinical application. However, primary hepatocytes rapidly lose their proliferative capacity and hepatic functions within a few days in culture. This work aimed to establish an ornithine transcarbamylase deficiency (OTCD) patient-derived primary human hepatocyte (OTCD-PHH) culture with hepatic functions to provide an in vitro cell model. Liver tissue from an infant with OTCD was dispersed into single cells, which were cultured using conditional reprogramming. To characterize the cells, we assessed the activities and mRNA expression of CYP3A4, 1A1, and 2C9, as well as albumin and urea secretion. We found that the OTCD-PHH could be subpassaged for more than 15 passages. The cells do not express mRNA of fibroblast-specific markers, whereas they highly express markers of epithelial cells and hepatocytes. In addition, the OTCD-PHH retain native CYP3A4, 1A1, and 2C9 activities and albumin secretion function at early passages. The OTCD-PHH at passages 2, 6, 9, and 13 have DNA fingerprints identical to that of the original tissue. Furthermore, in a 3D culture environment, low urea production and hepatocyte marker staining of the OTCD-PHH were detected. The established OTCD-PHH maintain liver-specific functions at early passages and can be cultured long-term in vitro. We believe the established long-term OTCD-PHH culture is highly relevant for studying liver diseases, particularly in infants with OTCD.
Subject(s)
Hepatocytes/pathology, Liver Diseases/pathology, Liver/pathology, Ornithine Carbamoyltransferase Deficiency Disease/pathology, 3T3 Cells, Animals, Cell Line, Tumor Cell Line, Cytochrome P-450 CYP1A1/metabolism, Cytochrome P-450 CYP3A/metabolism, Epithelial Cells/metabolism, Hep G2 Cells, Hepatocytes/metabolism, Humans, Infant, Liver/metabolism, Liver Diseases/metabolism, Male, Mice, Ornithine Carbamoyltransferase Deficiency Disease/metabolism, Messenger RNA/metabolism
ABSTRACT
Hepatocellular carcinoma (HCC) causes more than half a million deaths annually worldwide. Understanding the mechanisms contributing to HCC development is highly desirable for improved surveillance, diagnosis, and treatment. Liver tissue metabolomics has the potential to reflect the physiological changes underlying HCC development. It also allows the identification of biomarker candidates for future evaluation in biofluids and the investigation of racial disparities in HCC. Tumor and nontumor tissues from 40 patients were analyzed on both gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS) platforms to increase metabolome coverage. The levels of metabolites extracted from solid liver tissue of the HCC area and the adjacent non-HCC area were compared. Among the analytes with significant alterations detected by GC-MS and LC-MS, 18 were selected based on biological relevance and confirmed metabolite identification. These metabolites belong to the TCA cycle, glycolysis, purine metabolism, and lipid metabolism and have been reported in previous liver metabolomic studies that imply a high correlation with HCC progression. We demonstrated that metabolites related to HCC pathogenesis can be identified through liver tissue metabolomic analysis. Additionally, this study enabled us to identify race-specific metabolites associated with HCC.
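The abstract does not name the statistical test used to compare tumor and adjacent non-tumor levels. As an assumption, a paired comparison is a natural choice here because each patient contributes both tissue types. The sketch below computes a paired t statistic and a log2 fold change of mean levels for one metabolite. The function name and the values are hypothetical. A p-value would then come from the t distribution with n - 1 degrees of freedom, which is not shown.

```python
from math import log2, sqrt
from statistics import fmean, stdev


def paired_comparison(tumor, nontumor):
    """Paired t statistic and log2 fold change of mean levels for one metabolite."""
    diffs = [t - n for t, n in zip(tumor, nontumor)]
    t_stat = fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    log2_fc = log2(fmean(tumor) / fmean(nontumor))
    return t_stat, log2_fc


# Hypothetical levels of one metabolite in three patients (tumor vs adjacent non-tumor)
t_stat, log2_fc = paired_comparison([4.0, 6.0, 8.0], [3.0, 4.0, 5.0])
print(round(t_stat, 3), round(log2_fc, 3))  # 3.464 0.585
```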
Subject(s)
Hepatocellular Carcinoma/metabolism, Liver Neoplasms/metabolism, Metabolome/genetics, Metabolomics, Tumor Biomarkers/genetics, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/pathology, Female, Gas Chromatography-Mass Spectrometry, Neoplastic Gene Expression Regulation/genetics, Humans, Lipid Metabolism/genetics, Liver/metabolism, Liver/pathology, Liver Neoplasms/genetics, Liver Neoplasms/pathology, Male, Middle Aged
ABSTRACT
In this paper, we introduce a novel computational method for constructing protein networks from reverse phase protein array (RPPA) data to identify complex patterns in protein signaling. The method is applied to phosphoproteomic profiles of the basal expression and activation/phosphorylation of 76 key signaling proteins in three breast cancer cell lines (MCF7, LCC1, and LCC9). Temporal RPPA data are acquired at 48 h, 96 h, and 144 h after knocking down four genes in separate experiments. These genes were selected in a previous study as important determinants of breast cancer survival. Interaction networks are constructed by analyzing the expression levels of protein pairs using a multivariate analysis of variance model. A new scoring criterion is introduced to determine relevant protein pairs. Through network topology-based analysis, we search for wiring patterns to identify key proteins associated with significant changes in expression levels across various experimental conditions.
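One simple way to flag key proteins from network topology compares each protein's degree (number of edges) between a control network and a knockdown network, then ranks proteins by the size of the change. This rule is assumed here for illustration and is not the paper's scoring criterion. It also assumes the edges have already been selected, for example by the MANOVA-based pair analysis. The protein names and edges are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Number of edges incident to each protein."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def rewiring(control_edges, perturbed_edges):
    """(protein, degree change) pairs, largest absolute change first, ties by name."""
    d0, d1 = degrees(control_edges), degrees(perturbed_edges)
    proteins = set(d0) | set(d1)
    return sorted(((p, d1[p] - d0[p]) for p in proteins), key=lambda t: (-abs(t[1]), t[0]))


# Hypothetical networks before and after a knockdown
control = [("AKT", "MTOR")]
knockdown = [("AKT", "MTOR"), ("AKT", "GSK3B"), ("AKT", "FOXO3")]
print(rewiring(control, knockdown))
# [('AKT', 2), ('FOXO3', 1), ('GSK3B', 1), ('MTOR', 0)]
```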
Subject(s)
Breast Neoplasms/genetics, Neoplastic Gene Expression Regulation, Gene Regulatory Networks, Neoplasm Proteins/genetics, Protein Array Analysis/statistics & numerical data, Post-Translational Protein Processing, ATPases Associated with Diverse Cellular Activities/antagonists & inhibitors, ATPases Associated with Diverse Cellular Activities/genetics, ATPases Associated with Diverse Cellular Activities/metabolism, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Tumor Cell Line, Cysteine-Rich Protein 61/antagonists & inhibitors, Cysteine-Rich Protein 61/genetics, Cysteine-Rich Protein 61/metabolism, Female, Humans, Intracellular Signaling Peptides and Proteins/antagonists & inhibitors, Intracellular Signaling Peptides and Proteins/genetics, Intracellular Signaling Peptides and Proteins/metabolism, MCF-7 Cells, Multivariate Analysis, Neoplasm Proteins/antagonists & inhibitors, Neoplasm Proteins/metabolism, Phosphorylation, Proteasome Endopeptidase Complex/genetics, Proteasome Endopeptidase Complex/metabolism, RNA Polymerase II/antagonists & inhibitors, RNA Polymerase II/genetics, RNA Polymerase II/metabolism, Small Interfering RNA/genetics, Small Interfering RNA/metabolism, Signal Transduction, Tumor Suppressor Proteins/antagonists & inhibitors, Tumor Suppressor Proteins/genetics, Tumor Suppressor Proteins/metabolism
ABSTRACT
BACKGROUND: Conventional differential gene expression analysis by methods such as Student's t-test, SAM, and Empirical Bayes typically searches for statistically significant genes without considering the interactions among them. Network-based approaches provide a natural way to study these interactions and to investigate rewired interactions in disease versus control groups. In this paper, we apply the weighted graphical LASSO (wgLASSO) algorithm to integrate a data-driven network model with prior biological knowledge (i.e., protein-protein interactions) for biological network inference. We propose a novel differentially weighted graphical LASSO (dwgLASSO) algorithm that builds group-specific networks and performs network-based differential gene expression analysis to select biomarker candidates by considering their topological differences between the groups. RESULTS: Through simulation, we showed that wgLASSO can achieve better performance in building biologically relevant networks than purely data-driven models (e.g., neighbor selection, graphical LASSO), even when only a moderate level of prior biological knowledge is available. We evaluated the performance of dwgLASSO for survival time prediction using two microarray breast cancer datasets previously reported by Bild et al. and van de Vijver et al. Compared with the top 10 significant genes selected by a conventional differential gene expression analysis method, the top 10 significant genes selected by dwgLASSO in the dataset from Bild et al. led to significantly improved survival time prediction in the independent dataset from van de Vijver et al. Among the 10 genes selected by dwgLASSO, UBE2S, SALL2, XBP1, and KIAA0922 have been confirmed by a literature survey to be highly relevant in breast cancer biomarker discovery studies.
Additionally, we tested dwgLASSO on TCGA RNA-seq data acquired from patients with hepatocellular carcinoma (HCC), using tumor samples and their corresponding non-tumorous liver tissues. Improved sensitivity, specificity, and area under the curve (AUC) were observed when comparing dwgLASSO with a conventional differential gene expression analysis method. CONCLUSIONS: The proposed network-based differential gene expression analysis algorithm dwgLASSO can achieve better performance than conventional differential gene expression analysis methods by integrating information at both the gene expression and network topology levels. The incorporation of prior biological knowledge can lead to the identification of biologically meaningful genes in cancer biomarker studies.
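For orientation, graphical LASSO methods with entry-specific penalties can be written as an L1-penalized Gaussian log-likelihood. The formula below is that generic weighted form. How prior protein-protein interactions set the per-pair penalties lambda_ij is an assumption here, not the exact wgLASSO/dwgLASSO weighting scheme.

```latex
\hat{\Theta} = \operatorname*{arg\,max}_{\Theta \succ 0}
  \Bigl\{ \log\det\Theta - \operatorname{tr}(S\Theta)
  - \sum_{i \neq j} \lambda_{ij} \, \lvert \theta_{ij} \rvert \Bigr\}
```

The symbols are:

- S is the sample covariance of gene expression within one group.
- Theta is the estimated precision matrix. Its nonzero off-diagonal entries theta_ij define the network edges.
- lambda_ij is the penalty on the pair (i, j). Giving a smaller lambda_ij to gene pairs supported by a known interaction lets the data retain those edges more easily.

Fitting the model separately for each group yields group-specific networks, whose topologies can then be compared.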
Subject(s)
Algorithms, Gene Expression Profiling/methods, Gene Regulatory Networks/genetics, Area Under Curve, Biomarkers/metabolism, Breast Neoplasms/diagnosis, Breast Neoplasms/genetics, Breast Neoplasms/pathology, Hepatocellular Carcinoma/diagnosis, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/pathology, Female, Humans, Liver Neoplasms/diagnosis, Liver Neoplasms/genetics, Liver Neoplasms/pathology, RNA/chemistry, RNA/isolation & purification, RNA/metabolism, ROC Curve, RNA Sequence Analysis
ABSTRACT
Smoking-related biomarkers for lung cancer and other diseases are needed to enhance early detection strategies and to provide a scientific basis for tobacco product regulation. An untargeted metabolomics approach using ultra-high-performance liquid chromatography-quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF MS), totaling 957 assays, was used in a novel experimental design in which 105 current smokers smoked two cigarettes 1 h apart. Blood was collected immediately before and after each cigarette, allowing within-subject replication. Dynamic changes in the metabolomic profiles of the smokers' four blood samples were observed, and biomarkers affected by cigarette smoking were identified. Thirty-one metabolites were definitively shown to be affected by the acute effects of cigarette smoking, uniquely including menthol glucuronide, a reduction in glutamate, oleamide, and 13 glycerophospholipids. This first identification of a menthol metabolite in smokers' blood serves as proof of principle for using metabolomics to identify new tobacco-exposure biomarkers and also provides new opportunities for studying menthol-containing tobacco products in humans. Gender and race differences were also observed. Network analysis revealed 12 molecules involved in cancer, notably inhibition of cAMP. These novel tobacco-related biomarkers provide new insights into the effects of smoking that may be important in carcinogenesis but have not previously been linked with tobacco-related diseases. © 2016 Wiley Periodicals, Inc.
Subject(s)
Glucuronates/blood, Menthol/analogs & derivatives, Metabolome, Smoking/blood, Adolescent, Adult, Aged, Biomarkers/blood, Biomarkers/metabolism, Female, Glucuronates/metabolism, Humans, Male, Menthol/blood, Menthol/metabolism, Metabolomics, Middle Aged, Smoking/metabolism, Young Adult
ABSTRACT
Differential expression (DE) analysis is commonly used to identify biomarker candidates that show significant changes in expression levels between distinct biological groups. One drawback of DE analysis is that it considers changes only at the level of single biomolecules. Recently, differential network (DN) analysis has become popular because it can measure changes at the level of biomolecule pairs. In DN analysis, a network is typically built based on correlation, and biomarker candidates are selected by investigating the network topology. However, correlation tends to generate overly complicated networks, and selecting biomarker candidates purely on the basis of network topology ignores changes at the single-biomolecule level. In this paper, we propose a novel approach, INDEED, that builds a sparse differential network based on partial correlation and integrates DE and DN analyses for biomarker discovery. We applied this approach to real proteomic and glycomic data generated by liquid chromatography coupled with mass spectrometry in a hepatocellular carcinoma (HCC) biomarker discovery study. For each omic data type, we used one dataset to select biomarker candidates and build a disease classifier, and we evaluated the performance of the classifier on an independent dataset. The biomarker candidates selected by INDEED were more reproducible across independent datasets and led to higher classification accuracy in predicting HCC cases and cirrhotic controls than those selected by separate DE and DN analyses. INDEED also identified some candidates previously reported to be relevant to HCC, such as intercellular adhesion molecule 2 (ICAM2) and C4b-binding protein alpha chain (C4BPA), which were missed by both DE and DN analyses. In addition, we applied INDEED to survival time prediction based on transcriptomic data acquired by analysis of samples from breast cancer patients.
We selected biomarker candidates and built a regression model for survival time prediction based on a gene expression dataset and patients' survival records. We evaluated the performance of the regression model on an independent dataset. Compared with the biomarker candidates selected by DE and DN analyses, those selected through INDEED led to more accurate survival time prediction.
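The partial-correlation step that INDEED builds on can be sketched in three steps:

1. Invert each group's covariance matrix to obtain the precision matrix theta.
2. Convert theta to partial correlations, rho_ij = -theta_ij / sqrt(theta_ii * theta_jj).
3. Flag pairs whose partial correlation changes between groups by more than a threshold.

This dense inversion is an assumption for illustration. It requires more samples than variables, whereas INDEED builds a sparse network, and INDEED's scoring that integrates the DE and DN results is not reproduced. Function names, the threshold, and the data are hypothetical.

```python
from statistics import fmean


def covariance(cols):
    """Sample covariance matrix; cols is a list of variables, each a list of samples."""
    n, p = len(cols[0]), len(cols)
    means = [fmean(c) for c in cols]
    return [[sum((cols[i][k] - means[i]) * (cols[j][k] - means[j]) for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting; assumes m is non-singular."""
    p = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [row[p:] for row in a]


def partial_correlations(cols):
    """Full-order partial correlations: rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)."""
    theta = invert(covariance(cols))
    p = len(cols)
    return [[1.0 if i == j else -theta[i][j] / (theta[i][i] * theta[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]


def differential_edges(cols_a, cols_b, threshold):
    """(i, j, change) for pairs whose partial correlation differs by more than threshold."""
    pa, pb = partial_correlations(cols_a), partial_correlations(cols_b)
    p = len(cols_a)
    return [(i, j, pb[i][j] - pa[i][j])
            for i in range(p) for j in range(i + 1, p)
            if abs(pb[i][j] - pa[i][j]) > threshold]


# Hypothetical data: x and y both track z; in group B they also share an extra component
z = [1, 2, 3, 4, 5, 6]
e1 = [1, -1, 0, 0, -1, 1]
e2 = [1, 1, -2, -2, 1, 1]
x = [a + b for a, b in zip(z, e1)]
y_a = [a + b for a, b in zip(z, e2)]
y_b = [a + b + 0.5 * c for a, b, c in zip(z, e1, e2)]
print(differential_edges([x, y_a, z], [x, y_b, z], threshold=0.3))
```

In group A, x and y are correlated only through z, so their partial correlation is about 0. In group B they share an extra component, so the pair (0, 1) appears among the differential edges.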
Subject(s)
CD Antigens/genetics, Tumor Biomarkers/genetics, Cell Adhesion Molecules/genetics, Complement C4b-Binding Protein/genetics, Proteomics/methods, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/metabolism, Liquid Chromatography, Neoplastic Gene Expression Regulation, Glycomics/methods, Humans, Liver Neoplasms/genetics, Liver Neoplasms/metabolism, Mass Spectrometry, Transcriptome/genetics
ABSTRACT
BACKGROUND: A fundamental challenge in the quantitation of biomolecules for cancer biomarker discovery arises from the heterogeneous nature of human biospecimens. Although this issue has been discussed in cancer genomic studies, it has not yet been rigorously investigated in mass spectrometry-based proteomic and metabolomic studies. Purification of mass spectrometric data is highly desirable prior to subsequent analysis, e.g., quantitative comparison of the abundance of biomolecules in biological samples. METHODS: We investigated topic models to computationally analyze mass spectrometric data considering both integrated peak intensities and scan-level features, i.e., extracted ion chromatograms (EICs). Probabilistic generative models enable a flexible representation of the data structure and infer sample-specific pure sources. Scan-level modeling helps alleviate information loss during data preprocessing. We evaluated the capability of the proposed models to capture the mixture proportions of contaminants and cancer profiles on LC-MS-based serum proteomic and GC-MS-based tissue metabolomic datasets acquired from patients with hepatocellular carcinoma (HCC) and liver cirrhosis, as well as on synthetic data that we generated from the serum proteomic data. RESULTS: The results obtained by analysis of the synthetic data demonstrated that both intensity-level and scan-level purification models can accurately infer the mixture proportions and the underlying true cancerous sources, with small average error ratios (<7%) between estimates and ground truth. By applying topic model-based purification to mass spectrometric data, we found more proteins and metabolites with significant changes between HCC cases and cirrhotic controls. Candidate biomarkers selected after purification yielded biologically meaningful pathway analysis results and improved disease discrimination power, in terms of the area under the ROC curve, compared with the results obtained prior to purification.
CONCLUSIONS: We investigated topic model-based inference methods to computationally address the heterogeneity issue in samples analyzed by LC/GC-MS. We observed that incorporating scan-level features has the potential to yield more accurate purification results by alleviating the information loss that results from integrating peaks. We believe that cancer biomarker discovery studies using mass spectrometric analysis of human biospecimens can greatly benefit from topic model-based purification of the data prior to statistical and pathway analyses.
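To show what "mixture proportion" and "purification" mean at the level of one sample, the sketch below uses plain linear unmixing. This is a simpler technique than the topic models in the study. It assumes the observed profile equals p * cancer + (1 - p) * contaminant with both pure profiles known, estimates p by least squares, and then subtracts the contaminant share. The topic models instead infer the pure sources themselves. Function names and intensities are hypothetical.

```python
def estimate_proportion(observed, cancer, contaminant):
    """Least-squares p in observed ~ p * cancer + (1 - p) * contaminant, clipped to [0, 1]."""
    d = [t - c for t, c in zip(cancer, contaminant)]
    r = [o - c for o, c in zip(observed, contaminant)]
    p = sum(a * b for a, b in zip(r, d)) / sum(a * a for a in d)
    return min(1.0, max(0.0, p))


def purify(observed, contaminant, p):
    """Remove the contaminant share from each feature, given the cancer proportion p."""
    if p <= 0:
        raise ValueError("p must be positive to recover the cancer profile")
    return [(o - (1 - p) * c) / p for o, c in zip(observed, contaminant)]


# Hypothetical intensities for four features (e.g., peptides or metabolites)
cancer = [10.0, 0.0, 5.0, 2.0]
contaminant = [2.0, 6.0, 1.0, 2.0]
observed = [0.3 * t + 0.7 * c for t, c in zip(cancer, contaminant)]
p_hat = estimate_proportion(observed, cancer, contaminant)
print(round(p_hat, 3))
```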
Subject(s)
Tumor Biomarkers/blood, Mass Spectrometry/statistics & numerical data, Neoplasms/blood, Proteomics/methods, Hepatocellular Carcinoma/blood, Hepatocellular Carcinoma/genetics, Humans, Liver Cirrhosis/blood, Liver Cirrhosis/genetics, Liver Neoplasms/blood, Liver Neoplasms/genetics, Metabolomics, Neoplasms/genetics
ABSTRACT
Associating changes in protein levels with the onset of cancer has been widely investigated to identify clinically relevant diagnostic biomarkers. In the present study, we analyzed sera from 205 patients recruited in the United States and Egypt for biomarker discovery using label-free proteomic analysis by LC-MS/MS. We performed untargeted proteomic analysis of sera to identify candidate proteins with statistically significant differences between patients with hepatocellular carcinoma (HCC) and patients with liver cirrhosis. We further evaluated the significance of 101 proteins in sera from the same 205 patients through targeted quantitation by MRM on a triple quadrupole mass spectrometer. This led to the identification of 21 candidate protein biomarkers that were significantly altered in both the United States and Egyptian cohorts. Among the 21 candidates, 10 were previously reported as HCC-associated proteins (8 exhibiting trends consistent with our observations), whereas 11 are new candidates discovered by this study. Pathway analysis based on the significant proteins revealed upregulation of the complement and coagulation cascades pathway and downregulation of the antigen processing and presentation pathway in HCC cases versus patients with liver cirrhosis. The results of this study demonstrate the power of combining untargeted and targeted quantitation methods for comprehensive serum proteomic analysis to evaluate changes in protein levels and discover novel diagnostic biomarkers. All MS data have been deposited in ProteomeXchange with identifier PXD001171 (http://proteomecentral.proteomexchange.org/dataset/PXD001171).
Subject(s)
Hepatocellular Carcinoma/metabolism, Liquid Chromatography/methods, Liver Neoplasms/metabolism, Proteomics/methods, Tandem Mass Spectrometry/methods, Female, Humans, Male, Middle Aged
ABSTRACT
BACKGROUND: Gas chromatography coupled with mass spectrometry (GC-MS) is widely used for qualitative and quantitative analysis of small molecules. In particular, GC coupled to single quadrupole MS can be used for targeted analysis by selected ion monitoring (SIM). However, to our knowledge, no software tools have been designed specifically for analysis of GC-SIM-MS data. In this paper, we introduce a new R/Bioconductor package called SIMAT for quantitative analysis of the levels of targeted analytes. SIMAT provides guidance in choosing fragments for a list of targets. It does this through an optimization algorithm that selects the most appropriate fragments from overlapping chromatographic peaks, based on a pre-specified library of background analytes. The tool also allows visualization of the total ion chromatograms (TIC) of runs and the extracted ion chromatograms (EIC) of analytes of interest. Moreover, retention index (RI) calibration can be performed, and raw GC-SIM-MS data can be imported in netCDF or NIST mass spectral library (MSL) formats. RESULTS: We evaluated the performance of SIMAT using two GC-SIM-MS datasets obtained by targeted analysis of: (1) plasma samples from 86 patients in a targeted metabolomic experiment; and (2) mixtures of internal standards spiked in plasma samples at varying concentrations in a method development study. Our results demonstrate that SIMAT offers an alternative to AMDIS and MetaboliteDetector for accurate detection of targets and estimation of their relative intensities from GC-SIM-MS data. CONCLUSIONS: We introduce a new R package called SIMAT that allows the selection of the optimal set of fragments and retention time windows for target analytes in GC-SIM-MS based analysis. The tool also implements various functions and algorithms to: (1) read and import raw data and spectral libraries; (2) perform GC-SIM-MS data preprocessing; and (3) plot and visualize EICs and TICs.
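The abstract names RI calibration but does not give the formula SIMAT uses. In temperature-programmed GC, a common approach is linear interpolation between the two n-alkane standards that bracket a peak (the van den Dool and Kratz form): RI = 100 × [n + (N - n)(t_x - t_n)/(t_N - t_n)], where n and N are the carbon numbers of the bracketing alkanes and t denotes retention time. The minimal sketch below assumes that formula. It is not taken from SIMAT's code, and the function name `retention_index` is hypothetical.

```python
import bisect


def retention_index(rt, alkane_rts, alkane_carbons):
    """Linear (van den Dool-Kratz style) retention index for a peak at time rt.

    Hypothetical helper, not SIMAT's API.
    alkane_rts: sorted retention times of the n-alkane standards (same units as rt).
    alkane_carbons: carbon numbers of those alkanes, in the same order.
    """
    if not alkane_rts[0] <= rt <= alkane_rts[-1]:
        raise ValueError("rt lies outside the calibrated alkane range")
    # Index of the alkane eluting at or just before rt.
    i = bisect.bisect_right(alkane_rts, rt) - 1
    # A peak exactly at the last alkane uses the final interval.
    i = min(i, len(alkane_rts) - 2)
    t_n, t_next = alkane_rts[i], alkane_rts[i + 1]
    n, n_next = alkane_carbons[i], alkane_carbons[i + 1]
    # The (n_next - n) factor also handles non-consecutive alkane standards.
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
```

For example, with C10, C11 and C12 eluting at 5.0, 7.0 and 9.0 min, a peak at 6.0 min falls halfway between C10 and C11 and receives an RI of 1050.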
Subject(s)
Software , Algorithms , Gas Chromatography-Mass Spectrometry , Internet , Metabolomics
ABSTRACT
Biological network inference is a major challenge in systems biology. Traditional correlation-based network analysis yields too many spurious edges, because correlation cannot distinguish between direct and indirect associations. To address this issue, Gaussian graphical models (GGM) were proposed and have been widely used. Although GGM can substantially reduce the number of spurious edges, they cannot faithfully uncover a network structure because they consider only the full-order partial correlation. Moreover, when the number of samples is smaller than the number of variables, an additional technique based on sparse regularization must be incorporated into GGM to solve the singular covariance inversion problem. In this paper, we propose an efficient and mathematically solid algorithm that infers biological networks by computing low-order partial correlations (LOPC) up to the second order. The bias introduced by the low-order constraint is minimal compared with the gain in reliability of the approximated network structure. In addition, the algorithm is suitable for datasets with a small sample size but a large number of variables. Simulation results show that LOPC yields far fewer spurious edges and works well under various conditions commonly seen in practice. An application to a real metabolomics dataset further validates the performance of LOPC and suggests its potential for detecting novel biomarkers for complex diseases.
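The abstract describes LOPC only at a high level. The minimal sketch below illustrates the idea behind low-order partial correlation. It assumes the textbook recursive formula for partial correlations. It also substitutes a fixed absolute threshold for whatever significance test LOPC actually applies, which is a simplifying assumption. An edge (i, j) is kept only if the zero-, first- and second-order (partial) correlations between variables i and j all stay above the threshold. The function names and the threshold value are hypothetical.

```python
import itertools

import numpy as np


def partial_corr_1(r, i, j, k):
    """First-order partial correlation r_{ij|k} from correlation matrix r."""
    num = r[i, j] - r[i, k] * r[j, k]
    den = np.sqrt((1 - r[i, k] ** 2) * (1 - r[j, k] ** 2))
    return num / den


def partial_corr_2(r, i, j, k, l):
    """Second-order partial correlation r_{ij|kl}, by recursion on order-1 terms."""
    r_ij_k = partial_corr_1(r, i, j, k)
    r_il_k = partial_corr_1(r, i, l, k)
    r_jl_k = partial_corr_1(r, j, l, k)
    return (r_ij_k - r_il_k * r_jl_k) / np.sqrt((1 - r_il_k ** 2) * (1 - r_jl_k ** 2))


def lopc_edges(data, threshold=0.1):
    """Hypothetical LOPC-style edge selection on a samples x variables matrix.

    Keeps edge (i, j) only if |correlation| exceeds the threshold at order 0
    and for every conditioning set of size 1 and 2 (fixed threshold assumed,
    in place of a statistical test).
    """
    r = np.corrcoef(data, rowvar=False)
    p = r.shape[0]
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        coeffs = [r[i, j]]
        coeffs += [partial_corr_1(r, i, j, k) for k in others]
        coeffs += [partial_corr_2(r, i, j, k, l)
                   for k, l in itertools.combinations(others, 2)]
        if min(abs(c) for c in coeffs) > threshold:
            edges.add((i, j))
    return edges
```

Consider a chain X0 → X1 → X2. The marginal correlation between X0 and X2 is large, but it vanishes once X1 is conditioned on, so only the direct edges (0, 1) and (1, 2) survive. The loop examines O(p^2) conditioning sets per pair, which stays tractable for small p and, unlike full-order partial correlation, needs no covariance inversion.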
Subject(s)
Biomarkers , Computational Biology/methods , Models, Theoretical , Systems Biology , Algorithms , Gene Expression Profiling , Humans , Normal Distribution
ABSTRACT
Defining clinically relevant biomarkers for early-stage hepatocellular carcinoma (HCC) in a high-risk population of cirrhotic patients could have far-reaching implications for disease management and patient health. Changes in glycan levels have been associated with the onset of numerous diseases, including cancer. In the present study, we used liquid chromatography coupled with electrospray ionization mass spectrometry (LC-ESI-MS) to analyze N-glycans in sera from 183 participants recruited in Egypt and the U.S., and we identified candidate biomarkers that distinguish HCC cases from cirrhotic controls. N-Glycans were released from serum proteins and permethylated prior to LC-ESI-MS analysis. Using two complementary LC-ESI-MS quantitation approaches, global profiling and targeted quantitation, we identified 11 N-glycans whose levels differed significantly between HCC cases and cirrhotic controls. These glycans fall into four structurally related clusters, consistent with the reported roles of key glycosyltransferases in cancer progression and metastasis. The results of this study illustrate the power of an integrative approach that combines complementary LC-ESI-MS based quantitation methods to investigate changes in N-glycan levels between HCC cases and patients with liver cirrhosis.
Subject(s)
Biomarkers, Tumor/blood , Carcinoma, Hepatocellular/diagnosis , Liver Cirrhosis/blood , Liver Neoplasms/diagnosis , Polysaccharides/blood , Carcinoma, Hepatocellular/blood , Carcinoma, Hepatocellular/etiology , Chromatography, Liquid , Egypt , Gene Expression Profiling/methods , Humans , Liver Cirrhosis/complications , Liver Neoplasms/blood , Liver Neoplasms/etiology , Mass Spectrometry , United States
ABSTRACT
Many complex diseases, such as cancer, are associated with changes in biological pathways and molecular networks rather than being caused by single gene alterations. A major challenge in the diagnosis and treatment of such diseases is to identify characteristic aberrations in pathway and molecular network activities and to elucidate their relationship to the disease. This review presents recent progress in using high-throughput biological assays to decipher aberrant pathway and network activities. In particular, it provides specific examples in which high-throughput data have been applied to identify relationships between diseases and aberrant pathway and network activities. Achievements in this field have been remarkable, but many challenges remain to be addressed.
Subject(s)
Gene Regulatory Networks , High-Throughput Screening Assays , Databases, Factual , Systems Biology
ABSTRACT
MOTIVATION: Liquid chromatography-mass spectrometry (LC-MS) has been widely used for profiling expression levels of biomolecules in various '-omic' studies, including proteomics, metabolomics and glycomics. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time (RT) alignment, which ensures that ion intensity measurements are comparable across multiple LC-MS runs, is one of the most important yet challenging preprocessing steps. Current alignment approaches estimate RT variability using either single chromatograms or detected peaks, but they do not simultaneously take into account the complementary information embedded in the entire LC-MS data. RESULTS: We propose a Bayesian alignment model for LC-MS data analysis. The alignment model provides estimates of the RT variability along with uncertainty measures. The model enables integration of multiple sources of information, including internal standards and clustered chromatograms, in a mathematically rigorous framework. We apply the model to LC-MS metabolomic, proteomic and glycomic data. The performance of the model is evaluated based on ground-truth data, by measuring correlation of variation, RT difference across runs and peak-matching performance. We demonstrate that the Bayesian alignment model significantly improves RT alignment performance through appropriate integration of relevant information. AVAILABILITY AND IMPLEMENTATION: MATLAB code, raw and preprocessed LC-MS data are available at http://omics.georgetown.edu/alignLCMS.html. CONTACT: hwr@georgetown.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
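The Bayesian model itself is beyond a short example. The sketch below is a much simpler, non-Bayesian stand-in and is not the authors' method. It uses only one of the information sources the abstract mentions, internal standards, and performs landmark-based alignment: each run's retention times are mapped onto a reference run by piecewise-linear interpolation between matched internal-standard peaks. It produces point estimates only, with no uncertainty measures, and `align_rt` is a hypothetical name.

```python
import numpy as np


def align_rt(rt, observed_is_rt, reference_is_rt):
    """Map retention times of one run onto a reference run's time axis.

    Hypothetical, non-Bayesian sketch: piecewise-linear warp anchored at
    matched internal-standard (IS) peaks (at least two are required).
    """
    obs = np.asarray(observed_is_rt, dtype=float)
    ref = np.asarray(reference_is_rt, dtype=float)
    order = np.argsort(obs)
    obs, ref = obs[order], ref[order]
    rt = np.atleast_1d(np.asarray(rt, dtype=float))
    aligned = np.interp(rt, obs, ref)
    # np.interp clamps outside [obs[0], obs[-1]], so extrapolate linearly
    # with the slope of the first or last segment instead.
    lo, hi = rt < obs[0], rt > obs[-1]
    slope_lo = (ref[1] - ref[0]) / (obs[1] - obs[0])
    slope_hi = (ref[-1] - ref[-2]) / (obs[-1] - obs[-2])
    aligned[lo] = ref[0] + slope_lo * (rt[lo] - obs[0])
    aligned[hi] = ref[-1] + slope_hi * (rt[hi] - obs[-1])
    return aligned
```

If a run drifts by a linear stretch and offset (for example, observed = 1.02 × reference + 0.5 min), three internal standards are enough to recover the reference times exactly. Real RT drift is usually nonlinear and noisy, which is where integrating additional information, as the Bayesian model does, becomes important.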
Subject(s)
Chromatography, Liquid/methods , Mass Spectrometry/methods , Algorithms , Bayes Theorem , Chromatography, Liquid/standards , Glycomics , Humans , Mass Spectrometry/standards , Metabolomics , Models, Statistical , Proteomics , Reference Standards
ABSTRACT
RNA-seq data analysis is a vital part of genomics research, turning large and complex datasets into meaningful biological insights. Because the field evolves rapidly, anyone seeking to unlock the potential of RNA-seq data needs a thorough understanding of it. In this chapter, we describe the landscape of RNA-seq data analysis and present a comprehensive pipeline that covers the entire process. Beginning with quality control, the chapter stresses the importance of ensuring the integrity of RNA-seq data, because it lays the groundwork for all subsequent analyses. Preprocessing is then addressed, in which the raw sequence data are modified and cleaned in preparation for the alignment phase. Alignment maps the processed sequences to a reference genome, a step that is pivotal for determining where the sequences originate and what functions they may have.
Moving to the heart of RNA-seq analysis, the chapter then explores differential expression analysis: the process of identifying genes whose expression levels differ across conditions or sample groups. Because interpreting these differentially expressed genes requires biological context, the chapter then turns to functional analysis, where methods and tools such as Gene Ontology and pathway analyses place the roles and interactions of the identified genes within broader biological frameworks. Beyond these conventional methods, the chapter embraces the evolving paradigms of data science and covers machine learning applications for RNA-seq data, introducing advanced techniques in dimension reduction as well as unsupervised and supervised learning. These approaches can reveal patterns and relationships in the data that traditional methods might miss.
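The chapter summary mentions dimension reduction without naming a specific method. The minimal sketch below assumes principal component analysis (PCA) on log-transformed counts per million (CPM), a common exploratory step for checking whether samples group by condition. The `log_cpm` helper is deliberately simplified: it adds a fixed pseudocount before scaling and does not reproduce any particular package's normalization. Both `log_cpm` and `pca_scores` are hypothetical names.

```python
import numpy as np


def log_cpm(counts, prior=1.0):
    """Simplified log2 counts-per-million for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total reads per sample
    return np.log2((counts + prior) / lib_size * 1e6)


def pca_scores(expr, n_components=2):
    """PCA of samples via SVD; expr is genes x samples.

    Returns (samples x n_components scores, explained-variance ratios).
    """
    x = expr.T  # samples x genes
    x = x - x.mean(axis=0)  # center each gene across samples
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = s ** 2 / np.sum(s ** 2)
    return scores, var_ratio[:n_components]
```

Plotting the first two columns of `scores` gives the familiar sample PCA plot. When a group of genes is strongly up-regulated in one condition, the first component usually separates the two sample groups. This step is exploratory and does not replace a count-based differential expression model.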