1.
J Am Soc Mass Spectrom ; 35(6): 1089-1100, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38690775

ABSTRACT

Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for nonexperts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested our approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, surpassed standalone ML algorithms like SVM and k-Nearest Neighbors in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers. The effectiveness of Auto-sklearn is highlighted by its AUC scores of 0.97 for RCC and 0.85 for OC, obtained from the unseen test sets. Importantly, on most of the metrics considered, Auto-sklearn demonstrated a better classification performance, leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM3(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.
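
As a rough illustration of the global ranking mentioned in this abstract, the sketch below orders features by their mean absolute SHAP value across samples, a common way of aggregating per-sample SHAP values into a global importance score. The feature names and SHAP values are invented for illustration and are not taken from the study; `global_importance` is a hypothetical helper, not part of the authors' pipeline.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    shap_matrix: one row per sample, one column per feature.
    Returns (feature, mean_abs_value) pairs, most important first.
    """
    n_samples = len(shap_matrix)
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, mean_abs), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-sample SHAP values (3 samples x 3 features), made up for this sketch.
feature_names = ["metabolite_a", "metabolite_b", "metabolite_c"]
shap_matrix = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.15, 0.00],
    [0.40, -0.05, -0.10],
]

ranking = global_importance(shap_matrix, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise appear unimportant.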


Subject(s)
Kidney Neoplasms, Machine Learning, Metabolomics, Ovarian Neoplasms, Humans, Metabolomics/methods, Female, Ovarian Neoplasms/metabolism, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/blood, Kidney Neoplasms/metabolism, Kidney Neoplasms/diagnosis, Kidney Neoplasms/blood, Kidney Neoplasms/urine, Algorithms, Renal Cell Carcinoma/metabolism, Renal Cell Carcinoma/diagnosis, Tumor Biomarkers/blood, Tumor Biomarkers/analysis, Tumor Biomarkers/urine, Tumor Biomarkers/metabolism
2.
Cancer Epidemiol Biomarkers Prev ; 33(5): 681-693, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38412029

ABSTRACT

BACKGROUND: Distinguishing ovarian cancer from other gynecological malignancies is crucial for patient survival yet hindered by nonspecific symptoms and limited understanding of ovarian cancer pathogenesis. Accumulating evidence suggests a link between ovarian cancer and deregulated lipid metabolism. Most studies have small sample sizes, especially for early-stage cases, and lack racial/ethnic diversity, necessitating more inclusive research for improved ovarian cancer diagnosis and prevention. METHODS: Here, we profiled the serum lipidome of 208 patients with ovarian cancer, including 93 with early-stage disease, and 117 patients with nonovarian cancer (other gynecological malignancies), all of Korean descent. Serum samples were analyzed with a high-coverage liquid chromatography high-resolution mass spectrometry platform, and lipidome alterations were investigated via statistical and machine learning (ML) approaches. RESULTS: We found that lipidome alterations unique to ovarian cancer were already present in Korean women when the cancer was still localized, and that those changes increased in magnitude as the disease progressed. Analysis of relative lipid abundances revealed specific patterns for various lipid classes, with most classes showing decreased abundance in ovarian cancer in comparison with other gynecological diseases. ML methods selected a panel of 17 lipids that discriminated ovarian cancer from nonovarian cancer cases with an AUC value of 0.85 for an independent test set. CONCLUSIONS: This study provides a systemic analysis of lipidome alterations in human ovarian cancer, specifically in Korean women. IMPACT: Here, we show the potential of circulating lipids in distinguishing ovarian cancer from nonovarian cancer conditions.


Subject(s)
Lipidomics, Ovarian Neoplasms, Humans, Female, Ovarian Neoplasms/blood, Lipidomics/methods, Republic of Korea/epidemiology, Middle Aged, Tumor Biomarkers/blood, Adult, Aged, Lipid Metabolism, Lipids/blood
3.
bioRxiv ; 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37961534

ABSTRACT

Motivation: Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for non-experts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. Results: We tested our approach on two datasets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using auto-sklearn, surpassed standalone ML algorithms such as SVM and random forest in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers (Non-OC). Auto-sklearn employed a mix of algorithms and ensemble techniques, yielding a superior performance (AUC of 0.97 for RCC and 0.85 for OC). Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM3(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science. Availability: https://github.com/obifarin/automl-xai-metabolomics.

4.
PLoS One ; 18(5): e0284315, 2023.
Article in English | MEDLINE | ID: mdl-37141218

ABSTRACT

Machine learning (ML) models are used in clinical metabolomics studies, most notably for biomarker discovery, to identify metabolites that discriminate between a case and a control group. To improve understanding of the underlying biomedical problem and to bolster confidence in these discoveries, model interpretability is germane. In metabolomics, partial least squares discriminant analysis (PLS-DA) and its variants are widely used, partly due to the model's interpretability through Variable Influence in Projection (VIP) scores, a global interpretability method. Herein, Tree-based Shapley Additive explanations (SHAP), an interpretable ML method grounded in game theory, was used to explain ML models with local explanation properties. In this study, ML experiments (binary classification) were conducted for three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). Using one of the datasets, the PLS-DA model was explained using VIP scores, while one of the best-performing models, a random forest model, was interpreted using Tree SHAP. The results show that SHAP offers greater explanatory depth than PLS-DA's VIP, making it a powerful method for rationalizing machine learning predictions from metabolomics studies.
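
To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values for a single sample by enumerating every feature coalition. It is not the Tree SHAP algorithm used in the study, which exploits tree structure to avoid this exponential enumeration; the toy model, the baseline sample, and the function names are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline sample.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n_features, so this is only usable for tiny examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical model: one additive term plus one pairwise interaction."""
    return 2.0 * features[0] + features[1] * features[2]


x = [1.0, 2.0, 3.0]          # sample to explain (made-up values)
baseline = [0.0, 0.0, 0.0]   # reference sample (made-up values)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that lets SHAP explain individual predictions, whereas a VIP score only summarizes the model as a whole.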


Subject(s)
Biomedical Research, Control Groups, Discriminant Analysis, Machine Learning, Metabolomics
5.
J Proteome Res ; 22(6): 2092-2108, 2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37220064

ABSTRACT

Ovarian cancer (OC) is one of the deadliest cancers affecting the female reproductive system. It may present few or no symptoms at the early stages and typically nonspecific symptoms at later stages. High-grade serous ovarian cancer (HGSC) is the subtype responsible for most ovarian cancer deaths. However, very little is known about the metabolic course of this disease, particularly in its early stages. In this longitudinal study, we examined the temporal course of serum lipidome changes using a robust HGSC mouse model and machine learning data analysis. Early progression of HGSC was marked by increased levels of phosphatidylcholines and phosphatidylethanolamines. In contrast, later stages featured more diverse lipid alterations, including fatty acids and their derivatives, triglycerides, ceramides, hexosylceramides, sphingomyelins, lysophosphatidylcholines, and phosphatidylinositols. These alterations underscored unique perturbations in cell membrane stability, proliferation, and survival during cancer development and progression, offering potential targets for early detection and prognosis of human ovarian cancer.


Subject(s)
Serous Cystadenocarcinoma, Ovarian Neoplasms, Mice, Animals, Female, Humans, Lipidomics, Longitudinal Studies, Ovarian Neoplasms/metabolism, Sphingomyelins/metabolism, Serous Cystadenocarcinoma/metabolism
6.
bioRxiv ; 2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36711577

ABSTRACT

Ovarian cancer (OC) is one of the deadliest cancers affecting the female reproductive system. It may present few or no symptoms at the early stages, and typically nonspecific symptoms at later stages. High-grade serous ovarian cancer (HGSC) is the subtype responsible for most ovarian cancer deaths. However, very little is known about the metabolic course of this disease, particularly in its early stages. In this longitudinal study, we examined the temporal course of serum lipidome changes using a robust HGSC mouse model and machine learning data analysis. Early progression of HGSC was marked by increased levels of phosphatidylcholines and phosphatidylethanolamines. In contrast, later stages featured more diverse lipid alterations, including fatty acids and their derivatives, triglycerides, ceramides, hexosylceramides, sphingomyelins, lysophosphatidylcholines, and phosphatidylinositols. These alterations underscored unique perturbations in cell membrane stability, proliferation, and survival during cancer development and progression, offering potential targets for early detection and prognosis of human ovarian cancer. Teaser: Time-resolved lipidome remodeling in an ovarian cancer model is studied through lipidomics and machine learning.

7.
Cancers (Basel) ; 13(24), 2021 Dec 13.
Article in English | MEDLINE | ID: mdl-34944874

ABSTRACT

Urine metabolomics profiling has potential for non-invasive RCC staging, in addition to providing metabolic insights into disease progression. In this study, we utilized liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance (NMR), and machine learning (ML) for the discovery of urine metabolites associated with RCC progression. Two machine learning questions were posed in the study: binary classification into early RCC (stage I and II) and advanced RCC (stage III and IV), and RCC tumor size estimation through regression analysis. A total of 82 RCC patients with known tumor size and metabolomic measurements were used for the regression task, and 70 RCC patients with complete tumor-nodes-metastasis (TNM) staging information were used for the classification task, both under ten-fold cross-validation conditions. A voting ensemble regression model consisting of elastic net, ridge, and support vector regressor predicted RCC tumor size with an R² value of 0.58. A voting classifier model consisting of random forest, support vector machines, logistic regression, and adaptive boosting yielded an AUC of 0.96 and an accuracy of 87%. Some identified metabolites associated with renal cell carcinoma progression included 4-guanidinobutanoic acid, 7-aminomethyl-7-carbaguanine, 3-hydroxyanthranilic acid, lysyl-glycine, glycine, citrate, and pyruvate. Overall, we identified a urine metabolic phenotype associated with renal cell carcinoma stage, exploring the promise of a urine-based metabolomic assay for staging this disease.
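
A minimal sketch of the two ideas in the regression result: a voting ensemble that averages the predictions of several regressors, and the R² score used to evaluate it. Equal-weight averaging is assumed here; the abstract does not state how the authors weighted their models. All tumor sizes and predictions below are invented, and the three prediction lists merely stand in for the outputs of fitted models.

```python
def voting_average(predictions_per_model):
    """Average the predictions of several regressors, sample by sample."""
    n_models = len(predictions_per_model)
    return [sum(preds) / n_models for preds in zip(*predictions_per_model)]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical tumor sizes (cm) and per-model predictions, made up for this sketch.
tumor_size_cm = [2.0, 4.0, 6.0, 8.0]
model_a = [2.5, 3.5, 6.5, 7.5]   # errs in one direction on each sample
model_b = [1.5, 4.5, 5.5, 8.5]   # errs in the opposite direction
model_c = [2.0, 4.0, 6.0, 8.0]

ensemble = voting_average([model_a, model_b, model_c])
print("model_a R^2:", r_squared(tumor_size_cm, model_a))
print("ensemble R^2:", r_squared(tumor_size_cm, ensemble))
```

The toy errors are constructed to cancel exactly, which real models will not do, but the example shows why averaging models with different error patterns can score better than any single member.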

8.
J Proteome Res ; 20(7): 3629-3641, 2021 Jul 02.
Article in English | MEDLINE | ID: mdl-34161092

ABSTRACT

Renal cell carcinoma (RCC) is diagnosed through expensive cross-sectional imaging, frequently followed by renal mass biopsy, which is not only invasive but also prone to sampling errors. Hence, there is a critical need for a noninvasive diagnostic assay. RCC exhibits altered cellular metabolism, and the tumor(s) lie in close proximity to the urine in the kidney, suggesting that urine metabolomic profiling is an excellent choice for assay development. Here, we acquired liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) data followed by the use of machine learning (ML) to discover candidate metabolomic panels for RCC. The study cohort consisted of 105 RCC patients and 179 controls separated into two subcohorts: the model cohort and the test cohort. Univariate, wrapper, and embedded methods were used to select discriminatory features using the model cohort. Three ML techniques, each with different inductive biases, were used for training and hyperparameter tuning. RCC status prediction was evaluated using the test cohort with the selected biomarkers and the optimally tuned ML algorithms. A seven-metabolite panel predicted RCC in the test cohort with 88% accuracy, 94% sensitivity, 85% specificity, and 0.98 AUC. Metabolomics Workbench Study IDs are ST001705 and ST001706.
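
For readers less familiar with the four metrics reported for the test cohort, the sketch below shows how accuracy, sensitivity, specificity, and AUC can be computed from true labels and classifier scores. The labels, scores, and the 0.5 decision threshold are invented for illustration and do not reproduce the study's data or results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = case and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test cohort: 4 cases, 6 controls, with made-up classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)   # fraction of cases detected
specificity = tn / (tn + fp)   # fraction of controls correctly cleared
auc = roc_auc(y_true, scores)
print(accuracy, sensitivity, specificity, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why a panel can report a high AUC alongside a lower accuracy.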


Subject(s)
Renal Cell Carcinoma, Kidney Neoplasms, Renal Cell Carcinoma/diagnosis, Humans, Kidney Neoplasms/diagnostic imaging, Machine Learning, Mass Spectrometry, Metabolomics