1.
Comput Stat Data Anal ; 132: 46-69, 2019 Apr.
Article in English | MEDLINE | ID: mdl-38774121

ABSTRACT

Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression-based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower-dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to a large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple cancer types.
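
The two clustering steps named in the abstract (a Laplacian embedding followed by a penalized k-means) can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the graph is already known rather than estimated by sparse regression, it uses the unnormalized Laplacian, and it substitutes a generic DP-means-style penalty for the paper's penalized k-means. The function names, the toy graph, and the penalty value are all hypothetical.

```python
import numpy as np

def laplacian_embedding(adjacency, dim):
    """Embed nodes using eigenvectors of the unnormalized Laplacian L = D - A.

    Assumes a connected graph, so the first eigenvector (eigenvalue 0) is
    constant and is skipped.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, 1:dim + 1]

def dp_means(points, penalty, max_iter=100):
    """Penalized k-means: open a new cluster whenever a point's squared
    distance to its nearest center exceeds `penalty`."""
    X = np.asarray(points, dtype=float)
    centers = [X.mean(axis=0)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        previous = labels.copy()
        for i, x in enumerate(X):
            sq_dist = [float(np.sum((x - c) ** 2)) for c in centers]
            nearest = int(np.argmin(sq_dist))
            if sq_dist[nearest] > penalty:
                centers.append(x.copy())
                nearest = len(centers) - 1
            labels[i] = nearest
        # Drop empty clusters, relabel to 0..K-1, and recompute the centers.
        used = np.unique(labels)
        labels = np.searchsorted(used, labels)
        centers = [X[labels == k].mean(axis=0) for k in range(len(used))]
        if np.array_equal(labels, previous):
            break
    return labels, np.array(centers)

# Toy graph (hypothetical): two 4-node cliques joined by the single edge 3-4.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

labels, centers = dp_means(laplacian_embedding(A, dim=1), penalty=0.05)
```

The number of clusters is not fixed in advance; it follows from the penalty, which is the feature that links this kind of objective to Dirichlet process mixtures. The result depends on the penalty scale and on the order in which points are visited.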

2.
bioRxiv ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38260566

ABSTRACT

Background: Principal component analysis (PCA), a standard approach to analysis and visualization of large datasets, is commonly used in biomedical research for detecting similarities and differences among groups of samples. We initially used conventional PCA as a tool for critical quality control of batch and trend effects in multi-omic profiling data produced by The Cancer Genome Atlas (TCGA) project of the NCI. We found, however, that conventional PCA visualizations were often hard to interpret when inter-batch differences were moderate in comparison with intra-batch differences; it was also difficult to quantify batch effects objectively. We, therefore, sought enhancements to make the method more informative in those and analogous settings. Results: We have developed algorithms and a toolbox of enhancements to conventional PCA that improve the detection, diagnosis, and quantitation of differences between or among groups, e.g., groups of molecularly profiled biological samples. The enhancements include (i) computed group centroids; (ii) sample-dispersion rays; (iii) differential coloring of centroids, rays, and sample data points; (iv) trend trajectories; and (v) a novel separation index (DSC) for quantitation of differences among groups. Conclusions: PCA-Plus has been our most useful single tool for analyzing, visualizing, and quantitating batch effects, trend effects, and class differences in molecular profiling data of many types: mRNA expression, microRNA expression, DNA methylation, and DNA copy number. An early version of PCA-Plus has been used as the central graphical visualization in our MBatch package for near-real-time surveillance of data for analysis working groups in more than 70 TCGA, PanCancer Atlas, PanCancer Analysis of Whole Genomes, and Genome Data Analysis Network projects of the NCI. The algorithms and software are generic, hence applicable more generally to other types of multivariate data as well.
PCA-Plus is freely available in a downloadable R package at our MBatch website.
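
To make the idea of group centroids and a separation index concrete, here is a minimal sketch in Python (the PCA-Plus software itself is an R package). The abstract does not define DSC, so the index below is a stand-in: the square root of between-group over within-group dispersion in the space of the leading principal components. All function names and the simulated batches are hypothetical.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Project mean-centered samples (rows) onto the leading principal components."""
    X = np.asarray(data, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def group_centroids(scores, groups):
    """Mean position of each group in the PCA score space."""
    groups = np.asarray(groups)
    return {str(g): scores[groups == g].mean(axis=0) for g in np.unique(groups)}

def separation_ratio(scores, groups):
    """Illustrative index (an assumption, not the published DSC formula):
    sqrt(between-group dispersion / within-group dispersion)."""
    groups = np.asarray(groups)
    overall = scores.mean(axis=0)
    between = 0.0
    within = 0.0
    for g, centroid in group_centroids(scores, groups).items():
        members = scores[groups == g]
        between += len(members) * np.sum((centroid - overall) ** 2)
        within += np.sum((members - centroid) ** 2)
    return float(np.sqrt(between / within))

# Simulated data (hypothetical): two batches of 20 samples with a large shift.
rng = np.random.default_rng(0)
batch_a = rng.normal(0.0, 1.0, size=(20, 5))
batch_b = rng.normal(10.0, 1.0, size=(20, 5))
scores = pca_scores(np.vstack([batch_a, batch_b]))

true_batches = ["A"] * 20 + ["B"] * 20
mixed_labels = ["A", "B"] * 20  # labels unrelated to the real batches

centroids = group_centroids(scores, true_batches)
separated = separation_ratio(scores, true_batches)
mixed = separation_ratio(scores, mixed_labels)
```

With the true batch labels the ratio is large, and with labels unrelated to the batches it is close to zero. An index of this kind gives a number to compare across datasets when the scatter plot alone is hard to read.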

3.
Res Sq ; 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38352620

ABSTRACT

Ion suppression is a major problem in mass spectrometry (MS)-based metabolomics; it can dramatically decrease measurement accuracy, precision, and signal-to-noise sensitivity. Here we report a new method, the IROA TruQuant Workflow, that uses a stable isotope-labeled internal standard (IROA-IS) plus novel companion algorithms to 1) measure and correct for ion suppression, and 2) perform Dual MSTUS normalization of MS metabolomic data. We have evaluated the method across ion chromatography (IC), hydrophilic interaction liquid chromatography (HILIC), and reverse phase liquid chromatography (RPLC)-MS systems in both positive and negative ionization modes, with clean and unclean ion sources, and across different biological matrices. Across the broad range of conditions tested, all detected metabolites exhibited ion suppression ranging from 1% to 90+% and coefficients of variation ranging from 1% to 20%, but the Workflow and companion algorithms were highly effective at nulling out that suppression and error. Overall, the Workflow corrects ion suppression across diverse analytical conditions and produces robust normalization of non-targeted metabolomic data.
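
For readers unfamiliar with MSTUS (MS total useful signal) normalization, the following sketch shows the plain, single-pass idea: scale each sample by the summed intensity of the features detected in every sample. It is not the Dual MSTUS procedure or the ion-suppression correction of the Workflow, whose details the abstract does not give. The function name, the rescaling to the mean total, and the example intensities are assumptions made for illustration.

```python
import numpy as np

def mstus_normalize(intensities):
    """Plain MSTUS-style normalization of a samples x features matrix.

    Undetected features are expected as 0 or NaN. Each sample is divided by
    its total over the features detected in every sample, then rescaled by
    the mean of those totals (the rescaling is an illustrative choice).
    """
    X = np.asarray(intensities, dtype=float)
    detected = np.nan_to_num(X, nan=0.0) > 0
    useful = detected.all(axis=0)  # features present in all samples
    if not useful.any():
        raise ValueError("no feature is detected in every sample")
    totals = X[:, useful].sum(axis=1)
    return X / totals[:, None] * totals.mean()

# Hypothetical intensities: sample 2 is sample 1 at twice the overall signal,
# plus one feature that is not detected in sample 1.
raw = np.array([
    [100.0, 50.0, 0.0],
    [200.0, 100.0, 30.0],
])
normalized = mstus_normalize(raw)
```

After normalization the two shared features have identical values in both samples, so the twofold difference in overall signal no longer appears as a difference in those metabolites.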
