Results 1 - 20 of 44
1.
JCO Precis Oncol ; 8: e2300718, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38976829

ABSTRACT

PURPOSE: To use modern machine learning approaches to enhance and automate the feature extraction from the longitudinal circulating tumor DNA (ctDNA) data and to improve the prediction of survival and disease progression, risk stratification, and treatment strategies for patients with 1L non-small cell lung cancer (NSCLC). MATERIALS AND METHODS: Using IMpower150 trial data on patients with untreated metastatic NSCLC treated with atezolizumab and chemotherapies, we developed a machine learning algorithm to extract predictive features from ctDNA kinetics, improving survival and progression prediction. We analyzed kinetic data from 17 ctDNA summary markers, including cell-free DNA concentration, allele frequency, tumor molecules in plasma, and mutation counts. RESULTS: Three hundred and ninety-eight patients with ctDNA data (206 in training and 192 in validation) were analyzed. Our models outperformed the existing workflow using conventional temporal ctDNA features, raising the overall survival (OS) concordance index to 0.72 and 0.71 from 0.67 and 0.63 for C3D1 and C4D1, respectively, and substantially improving the progression-free survival (PFS) concordance index to approximately 0.65 from the previous 0.54-0.58, a 12%-20% increase. Additionally, they enhanced risk stratification for patients with NSCLC, achieving clear OS and PFS separation. Distinct patterns of ctDNA kinetic characteristics (eg, baseline ctDNA markers, depth of ctDNA responses, and timing of ctDNA clearance) were revealed across the risk groups. Rapid and complete ctDNA clearance appears essential for long-term clinical benefit. CONCLUSION: Our machine learning approach offers a novel tool for analyzing ctDNA kinetics, extracting critical features from longitudinal data, improving our understanding of the link between ctDNA kinetics and progression/mortality risks, and optimizing personalized immunotherapies for 1L NSCLC.


Subject(s)
Non-Small-Cell Lung Carcinoma, Circulating Tumor DNA, Disease Progression, Immunotherapy, Lung Neoplasms, Machine Learning, Humans, Non-Small-Cell Lung Carcinoma/drug therapy, Non-Small-Cell Lung Carcinoma/genetics, Non-Small-Cell Lung Carcinoma/blood, Non-Small-Cell Lung Carcinoma/mortality, Non-Small-Cell Lung Carcinoma/pathology, Circulating Tumor DNA/blood, Lung Neoplasms/genetics, Lung Neoplasms/blood, Lung Neoplasms/drug therapy, Lung Neoplasms/pathology, Lung Neoplasms/mortality, Immunotherapy/methods, Male, Female, Middle Aged, Humanized Monoclonal Antibodies/therapeutic use, Aged, Progression-Free Survival
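
The abstract in record 1 reports model quality as a concordance index. As a reminder of what that number measures, here is a minimal pure-Python sketch of Harrell's C for right-censored data. The function name and the toy data are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs in which the patient
    with the shorter observed survival also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time is an observed event (not a censoring time).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): months of follow-up, event flag, predicted risk.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]
print(concordance_index(times, events, risk))  # 0.875
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.63 to 0.72 values sit.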
2.
Eur J Cancer ; 207: 114147, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38834016

ABSTRACT

BACKGROUND: We aim to compare the prognostic value of organ-specific dynamics with the sum of the longest diameter (SLD) dynamics in patients with metastatic colorectal cancer (mCRC). METHODS: All datasets are accessible in Project Data Sphere, an open-access platform. Tumor growth inhibition models based on organ-level SLD and overall SLD were used to estimate the organ-specific tumor growth rates (KGs) and SLD KG. The early tumor shrinkage (ETS) from baseline to the first measurement after treatment was also evaluated. The relationship between organ-specific dynamics, SLD dynamics, and survival outcomes (overall survival, OS; progression-free survival, PFS) was quantified using Kaplan-Meier analysis and Cox regression. RESULTS: This study included 3687 patients from 6 phase III mCRC trials. The liver emerged as the most frequent metastatic site (2901, 78.7%), with variable KGs across different organs in individual patients (liver 0.0243 > lung 0.0202 > lymph node 0.0127 > other 0.0118 [week⁻¹]). Notably, the dynamics for different organs did not equally contribute to predicting survival outcomes. In liver metastasis cases, liver KG proved to be a superior prognostic indicator for OS, surpassing the predictive performance of SLD KG (C-index: liver KG 0.610 vs SLD KG 0.606). A similar result was found for PFS. Moreover, liver ETS also outperformed SLD ETS in predicting survival. Cox regression analysis confirmed that liver KG is the most significant variable in survival prediction. CONCLUSIONS: In mCRC patients with liver metastasis, liver dynamics is the primary prognostic indicator for both PFS and OS. In future drug development for mCRC, greater emphasis should be directed towards understanding the dynamics of liver metastasis development.
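
The abstract in record 2 relies on Kaplan-Meier analysis to relate tumor dynamics to survival. The sketch below shows the product-limit estimator in plain Python. The function name and the toy data are assumptions for illustration only; the study's actual analysis code is not described in the abstract.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t), returned as (time, survival) pairs
    at each distinct time where at least one event was observed."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Events and total departures (events + censorings) at time t.
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data (hypothetical): follow-up in months, 1 = death, 0 = censored.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(f"t={t}: S(t)={s:.2f}")
```

Curves like this, computed separately for patients above and below a growth-rate cutoff, are what a Kaplan-Meier comparison of prognostic groups looks at.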

3.
Comput Biol Chem ; 109: 108009, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38219419

ABSTRACT

Many soft biclustering algorithms have been developed and applied to various biological and biomedical data analyses. However, few mutually exclusive (hard) biclustering algorithms have been proposed, which could better identify disease or molecular subtypes with survival significance based on genomic or transcriptomic data. In this study, we developed a novel mutually exclusive spectral biclustering (MESBC) algorithm based on a spectral method to detect mutually exclusive biclusters. MESBC simultaneously detects relevant feature (gene) and corresponding condition (patient) subgroups and, therefore, automatically uses the signature features for each subtype to perform the clustering. Extensive simulations revealed that MESBC provided superior accuracy in detecting pre-specified biclusters compared with the non-negative matrix factorization (NMF) and Dhillon's algorithm, particularly in very noisy data. Further analysis of the algorithm on real datasets obtained from the TCGA database showed that MESBC provided more accurate (i.e., smaller p-value) overall survival prediction in patients with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) when compared to the existing, gold-standard subtypes for lung cancers (integrative clustering). Furthermore, MESBC detected several genes with significant prognostic value in both LUAD and LUSC patients. External validation on an independent, unseen GEO dataset of LUAD showed that MESBC-derived clusters based on TCGA data still exhibited clear biclustering patterns and consistent, outstanding prognostic predictability, demonstrating robust generalizability of MESBC. Therefore, MESBC could potentially be used as a risk stratification tool to optimize the treatment for the patient, improve the selection of patients for clinical trials, and contribute to the development of novel therapeutic agents.


Subject(s)
Adenocarcinoma of Lung, Non-Small-Cell Lung Carcinoma, Squamous Cell Carcinoma, Lung Neoplasms, Humans, Oligonucleotide Array Sequence Analysis/methods, Gene Expression Profiling/methods, Algorithms, Lung Neoplasms/genetics
4.
JCO Clin Cancer Inform ; 8: e2300154, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38231003

ABSTRACT

PURPOSE: To apply deep learning algorithms to histopathology images, construct image-based subtypes independent of known clinical and molecular classifications for glioblastoma, and produce novel insights into molecular and immune characteristics of the glioblastoma tumor microenvironment. MATERIALS AND METHODS: Using whole-slide hematoxylin and eosin images from 214 patients with glioblastoma in The Cancer Genome Atlas (TCGA), a fine-tuned convolutional neural network model extracted deep learning features. Biclustering was used to identify subtypes and image feature modules. Prognostic value of image subtypes was assessed via Cox regression on survival outcomes and validated with 189 samples from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data set. Morphological, molecular, and immune characteristics of glioblastoma image subtypes were analyzed. RESULTS: Four distinct subtypes and modules (imClust1-4) were identified for the TCGA patients with glioblastoma on the basis of the image feature data. The glioblastoma image subtypes were significantly associated with overall survival (OS; P = .028) and progression-free survival (P = .003). An apparent association was also observed for disease-specific survival (P = .096). imClust2 had the best prognosis for all three survival end points (eg, after 25 months, imClust2 had >7% more surviving patients than the other subtypes). Examination of OS in the external validation using the unseen CPTAC data set showed consistent patterns. Multivariable Cox analyses confirmed that the image subtypes carry unique prognostic information independent of known clinical and molecular predictors. Molecular and immune profiling revealed distinct immune compositions of the tumor microenvironment in different image subtypes and may provide biologic explanations for the patterns in patients' outcomes.
CONCLUSION: Our image-based subtype classification on the basis of deep learning models is a novel tool to refine risk stratification in cancers. The image subtypes detected for glioblastoma represent a promising prognostic biomarker with distinct molecular and immune characteristics and may facilitate developing novel, individualized immunotherapies for glioblastoma.


Subject(s)
Biological Products, Deep Learning, Glioblastoma, Humans, Glioblastoma/diagnostic imaging, Prognosis, Proteomics, Tumor Microenvironment
5.
Clin Pharmacol Ther ; 115(4): 805-814, 2024 04.
Article in English | MEDLINE | ID: mdl-37724436

ABSTRACT

Pretreatment serum lactate dehydrogenase (LDH) levels have been associated with poor prognosis in several types of cancer, including metastatic colorectal cancer (mCRC). However, very few models link survival to longitudinal LDH measured repeatedly over time during treatment. We investigated the prognostic value of on-treatment LDH dynamics in mCRC. Using data from two large phase III studies (2L and 3L+ mCRC settings, n = 824 and 210, respectively), we found that integrating longitudinal LDH data with baseline risk factors significantly improved survival prediction. Current LDH values performed best, enhancing discrimination ability (area under the receiver operating characteristic curve) by 4.5%-15.4% and prediction accuracy (Brier score) by 3.9%-15.0% compared with baseline variables. Combining all longitudinal LDH markers further improved predictive performance. After controlling for baseline covariates and other longitudinal LDH indicators, current LDH levels remained a significant risk factor in mCRC, increasing mortality risk by over 90% (P < 0.001) in 2L patients and 60%-70% (P < 0.01) in 3L+ patients per unit increment in current log(LDH). Machine-learning techniques, such as functional principal component analysis (FPCA), extracted informative features from longitudinal LDH data, capturing over 99% of variability and allowing prediction of survival. Unsupervised clustering based on the extracted FPCA features stratified patients into three groups with distinct LDH dynamics and survival outcomes. Hence, our approaches offer a valuable and cost-effective way to stratify risk and improve survival prediction in mCRC using LDH trajectories.


Subject(s)
Colorectal Neoplasms, L-Lactate Dehydrogenase, p-Chloroamphetamine/analogs & derivatives, Humans, Prognosis, Risk Factors, Retrospective Studies
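
Record 5 mentions functional principal component analysis (FPCA) as the feature extractor for LDH trajectories. The sketch below shows the simplest discretized form of the idea using NumPy. It assumes every patient is observed on the same regular time grid, which real trial data rarely satisfy, so it should be read as an illustration of the concept and not as the paper's method. The function name and the synthetic curves are hypothetical.

```python
import numpy as np


def fpca(curves, n_components=2):
    """Discretized FPCA for curves sampled on one shared, regular grid.

    curves: array of shape (n_patients, n_timepoints).
    Returns the mean curve, the leading components (discretized
    eigenfunctions), per-patient scores, and explained-variance ratios.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # SVD of the centered data matrix; rows of vt are the components.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return mean_curve, vt[:n_components], scores, explained[:n_components]


# Synthetic log(LDH)-like trajectories: a patient-specific level plus slope.
grid = np.linspace(0.0, 1.0, 6)
rng = np.random.default_rng(0)
curves = 5.5 + rng.normal(size=(30, 1)) + rng.normal(size=(30, 1)) * grid
mean_curve, components, scores, explained = fpca(curves)
print(scores.shape, explained.round(3))
```

The per-patient scores are the low-dimensional features that could then feed a survival model or an unsupervised clustering step, as the abstract describes.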
6.
Am J Pathol ; 193(12): 2122-2132, 2023 12.
Article in English | MEDLINE | ID: mdl-37775043

ABSTRACT

In digital pathology tasks, transformers have achieved state-of-the-art results, surpassing convolutional neural networks (CNNs). However, transformers are usually complex and resource intensive. This study developed a novel and efficient digital pathology classifier called DPSeq to predict cancer biomarkers through fine-tuning a sequencer architecture integrating horizontal and vertical bidirectional long short-term memory networks. Using hematoxylin and eosin-stained histopathologic images of colorectal cancer from two international data sets (The Cancer Genome Atlas and Molecular and Cellular Oncology), the predictive performance of DPSeq was evaluated in a series of experiments. DPSeq demonstrated exceptional performance for predicting key biomarkers in colorectal cancer (microsatellite instability status, hypermutation, CpG island methylator phenotype status, BRAF mutation, TP53 mutation, and chromosomal instability), outperforming most published state-of-the-art classifiers in a within-cohort internal validation and a cross-cohort external validation. In addition, under the same experimental conditions using the same set of training and testing data sets, DPSeq surpassed four CNNs (ResNet18, ResNet50, MobileNetV2, and EfficientNet) and two transformer (Vision Transformer and Swin Transformer) models, achieving the highest area under the receiver operating characteristic curve and area under the precision-recall curve values in predicting microsatellite instability status, BRAF mutation, and CpG island methylator phenotype status. Furthermore, DPSeq required less time for both training and prediction because of its simple architecture. Therefore, DPSeq appears to be the preferred choice over transformer and CNN models for predicting cancer biomarkers.


Subject(s)
Tumor Biomarkers, Colorectal Neoplasms, Humans, Tumor Biomarkers/genetics, Proto-Oncogene Proteins B-raf/genetics, Microsatellite Instability, DNA Methylation/genetics, Colorectal Neoplasms/diagnosis, Colorectal Neoplasms/genetics, Colorectal Neoplasms/pathology, CpG Islands/genetics
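
Record 6 compares classifiers by area under the receiver operating characteristic curve (AUROC). A short pure-Python sketch of that metric follows, using its rank interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function name and the toy labels and scores are illustrative assumptions.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = biomarker positive, scores from a classifier.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.4]
print(round(auroc(labels, scores), 4))  # 0.8333
```

The quadratic pair loop is fine for small examples; a rank-based formula gives the same value more efficiently on large cohorts.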
7.
Nutrients ; 15(9)2023 May 04.
Article in English | MEDLINE | ID: mdl-37432361

ABSTRACT

Several studies have demonstrated that adhering to the Dietary Approaches to Stop Hypertension (DASH) diet may result in decreased blood pressure levels and hypertension risk. This may be an effect of a reduction in central obesity. In the current study, we explored the mediating role of multiple anthropometric measurements in the association between DASH score and hypertension risk, and we investigated potential common micro/macro nutrients that react with the obesity-reduction mechanism. Our study used data from the National Health and Nutrition Examination Survey (NHANES). Important demographic variables, such as gender, race, age, marital status, education attainment, poverty income ratio, and lifestyle habits such as smoking, alcohol drinking, and physical activity were collected. Various anthropometric measurements, including weight, waist circumference, body mass index (BMI), and waist-to-height ratio (WHtR) were also obtained from the official website. The nutrient intake of 8224 adults was quantified through a combination of interviews and laboratory tests. We conducted stepwise regression to filter the most important anthropometric measurements and performed a multiple mediation analysis to test whether the selected anthropometric measurements had mediation effects on the total effect of the DASH diet on hypertension. Random forest models were fitted to identify nutrient subsets associated with the DASH score and anthropometric measurements. Finally, associations between common nutrients and DASH score, anthropometric measurements, and risk of hypertension were respectively evaluated by a logistic regression model adjusting for possible confounders. Our study revealed that BMI and WHtR acted as full mediators between DASH score and high blood pressure levels. Together, they accounted for more than 45% of the variation in hypertension. Interestingly, WHtR was found to be the strongest mediator, explaining approximately 80% of the mediating effect.
Furthermore, we identified a group of three commonly consumed nutrients (sodium, potassium, and octadecatrienoic acid) that had opposing effects on DASH score and anthropometric measurements. These nutrients were also found to be associated with hypertension in the same way as BMI and WHtR in univariate regression models. The most important among these nutrients was sodium, which was negatively correlated with the DASH score (β = -0.53, 95% CI = -0.56 to -0.50, p < 0.001) and had a positive association with BMI (β = 0.04, 95% CI = 0.01 to 0.07, p = 0.02), WHtR (β = 0.06, 95% CI = 0.03 to 0.09, p < 0.001), and hypertension (OR = 1.09, 95% CI = 1.01 to 1.19, p = 0.037). Our investigation revealed that the WHtR exerts a greater mediating effect than BMI on the correlation between the DASH diet and hypertension. Notably, we identified a plausible nutrient intake pathway involving sodium, potassium, and octadecatrienoic acid. Our findings suggest that lifestyle modifications that emphasize the reduction of central obesity and the attainment of a well-balanced micro/macro nutrient profile, such as the DASH diet, could potentially be efficacious in managing hypertension.


Subject(s)
Dietary Approaches To Stop Hypertension, Hypertension, Adult, Humans, Nutrition Surveys, Abdominal Obesity/epidemiology, Diet, Eating, Hypertension/epidemiology, Obesity/epidemiology, Sodium
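
Record 7 rests on mediation analysis: how much of the effect of an exposure (DASH score) on an outcome passes through a mediator (such as WHtR). The sketch below illustrates the product-of-coefficients idea with a single mediator and a continuous outcome, using ordinary least squares in NumPy. This is a deliberate simplification: the study used multiple mediators and a binary hypertension outcome. All names and the simulated data are hypothetical.

```python
import numpy as np


def ols(y, *columns):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mediation(x, m, y):
    """Single-mediator decomposition: total = direct + indirect (a * b)."""
    a = ols(m, x)[1]             # exposure -> mediator
    _, direct, b = ols(y, x, m)  # direct effect, and mediator -> outcome
    total = ols(y, x)[1]         # exposure -> outcome, mediator omitted
    indirect = a * b
    return {
        "total": total,
        "direct": direct,
        "indirect": indirect,
        "proportion_mediated": indirect / total,
    }


# Simulated full mediation: x affects y only through m.
rng = np.random.default_rng(1)
x = rng.normal(size=500)             # stand-in for a diet score
m = -0.5 * x + rng.normal(size=500)  # stand-in for an anthropometric measure
y = 0.8 * m + rng.normal(size=500)   # stand-in for a blood pressure outcome
for name, value in mediation(x, m, y).items():
    print(f"{name}: {value:.3f}")
```

In this linear setting the total effect equals the direct effect plus the indirect effect exactly, which is what makes "proportion mediated" a well-defined summary; with a logistic outcome model that identity only holds approximately.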
8.
Clin Pharmacokinet ; 62(5): 705-713, 2023 05.
Article in English | MEDLINE | ID: mdl-36930421

ABSTRACT

BACKGROUND AND OBJECTIVE: The designs of first-in-human (FIH) studies in oncology (e.g., 3 + 3 dose escalation design) usually do not provide a sufficient sample size to determine the dose-response relationship for efficacy. This study aimed to assess the feasibility of using monoclonal antibody (mAb) clearance as a biomarker for efficacy to facilitate the identification of potentially efficacious doses across cancer types and drug targets. METHODS: We performed electronic searches of the Drugs@FDA website, the European Medicines Agency website, and PubMed to identify reports of FIH trials of approved mAbs in oncology. The clearance, half-life, and overall response rate (ORR) data for the mAbs at different dose levels were extracted. RESULTS: Twenty-five approved mAbs were included in this study. As expected, due to the small sample sizes in FIH studies, there was no clear dose-response for ORR. However, we found a clear negative association between mAb clearance and ORR across tumors/drug targets, and a clear negative dose-clearance relationship, with clearance decreasing and saturating at high dose levels. The approved mAb doses (1-25 mg/kg) are approximately 2-fold the saturation doses (1-10 mg/kg). The associated clearance values at the approved doses vary across different cancers and drug targets (0.17-1.56 L/day), but tend to be similar within a disease/drug target. Anti-CD20 mAbs for B-cell lymphomas show a higher clearance (~1 L/day) than other cancers and targets (e.g., ~0.3 L/day for anti-PD-1). CONCLUSIONS: Clearance of mAbs can be a tumor/drug target-agnostic biomarker for potential anti-tumor activity as clearance decreases with increasing ORR. Our findings provide important insights into target clearance values that may lead to desired efficacy for different cancers and drug targets, which can be used to guide dose selection for the future development of mAbs during FIH oncology studies.


Subject(s)
Monoclonal Antibodies, Neoplasms, Humans, Monoclonal Antibodies/therapeutic use, Neoplasms/drug therapy, Half-Life, Tumor Biomarkers
9.
Comput Med Imaging Graph ; 105: 102189, 2023 04.
Article in English | MEDLINE | ID: mdl-36739752

ABSTRACT

Self-attention mechanism-based algorithms are attractive in digital pathology due to their interpretability, but suffer from high computational complexity. This paper presents a novel, lightweight Attention-based Multiple Instance Mutation Learning (AMIML) model to allow small-scale attention operations for predicting gene mutations. Compared to the standard self-attention model, AMIML reduces the number of model parameters by approximately 70%. Using data for 24 clinically relevant genes from four cancer cohorts in TCGA studies (UCEC, BRCA, GBM, and KIRC), we compare AMIML with a standard self-attention model, five other deep learning models, and four traditional machine learning models. The results show that AMIML has excellent robustness and outperforms all the baseline algorithms in the vast majority of the tested genes. Conversely, the performance of the reference deep learning and machine learning models varies across different genes, and these models produce suboptimal predictions for certain genes. Furthermore, with the flexible and interpretable attention-based pooling mechanism, AMIML can further zero in on and detect predictive image patches.


Subject(s)
Algorithms, Machine Learning
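
Record 9 refers to an attention-based pooling mechanism that both aggregates patch features into a slide-level representation and highlights predictive patches. The NumPy sketch below shows a commonly used generic form of attention pooling for multiple-instance learning. It is not the AMIML architecture, whose details the abstract does not give; the weight matrices here are random stand-ins for parameters that would normally be learned, and all names are hypothetical.

```python
import numpy as np


def attention_pool(patch_features, V, w):
    """Generic attention pooling over a bag of patch features.

    patch_features: (n_patches, d) feature matrix for one slide.
    V: (d, h) projection and w: (h,) scoring vector (learned in practice).
    Returns the pooled slide embedding (d,) and attention weights (n_patches,).
    """
    hidden = np.tanh(patch_features @ V)  # (n_patches, h)
    logits = hidden @ w                   # one score per patch
    logits = logits - logits.max()        # stabilise the softmax
    weights = np.exp(logits) / np.exp(logits).sum()
    return weights @ patch_features, weights


# Random stand-ins for patch features and (untrained) attention parameters.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 16))
embedding, weights = attention_pool(
    features, rng.normal(size=(16, 8)), rng.normal(size=8)
)
top_patches = np.argsort(weights)[::-1][:5]  # most attended patch indices
print(embedding.shape, top_patches)
```

Because the weights are non-negative and sum to one, ranking patches by weight gives a direct, if approximate, view of which image regions drive a prediction.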
10.
J Pathol Clin Res ; 9(3): 223-235, 2023 05.
Article in English | MEDLINE | ID: mdl-36723384

ABSTRACT

Many artificial intelligence models have been developed to predict clinically relevant biomarkers for colorectal cancer (CRC), including microsatellite instability (MSI). However, existing deep learning networks require large training datasets, which are often hard to obtain. In this study, based on the latest Hierarchical Vision Transformer using Shifted Windows (Swin Transformer [Swin-T]), we developed an efficient workflow to predict biomarkers in CRC (MSI, hypermutation, chromosomal instability, CpG island methylator phenotype, and BRAF and TP53 mutation) that required relatively small datasets. Our Swin-T workflow achieved state-of-the-art (SOTA) predictive performance in an intra-study cross-validation experiment on the Cancer Genome Atlas colon and rectal cancer dataset (TCGA-CRC-DX). It also demonstrated excellent generalizability in cross-study external validation and delivered a SOTA area under the receiver operating characteristic curve (AUROC) of 0.90 for MSI, using the Molecular and Cellular Oncology dataset for training (N = 1,065) and the TCGA-CRC-DX (N = 462) for testing. A similar performance (AUROC = 0.91) was reported in a recent study, using ~8,000 training samples (ResNet18) on the same testing dataset. Swin-T was extremely efficient when using small training datasets and exhibited robust predictive performance with 200-500 training samples. Our findings indicate that Swin-T could be 5-10 times more efficient than existing algorithms for MSI prediction based on ResNet18 and ShuffleNet. Furthermore, the Swin-T models demonstrated their capability in accurately predicting MSI and BRAF mutation status, which could exclude and therefore reduce samples before subsequent standard testing in a cascading diagnostic workflow, in turn reducing turnaround time and costs.


Subject(s)
Colonic Neoplasms, Colorectal Neoplasms, Humans, Microsatellite Instability, Colorectal Neoplasms/diagnosis, Colorectal Neoplasms/genetics, Proto-Oncogene Proteins B-raf/genetics, Artificial Intelligence, DNA Methylation, Biomarkers, Colonic Neoplasms/genetics
11.
J Pathol Clin Res ; 9(1): 3-17, 2023 01.
Article in English | MEDLINE | ID: mdl-36376239

ABSTRACT

Deep learning models are increasingly being used to interpret whole-slide images (WSIs) in digital pathology and to predict genetic mutations. Currently, it is commonly assumed that tumor regions have most of the predictive power. However, it is reasonable to assume that other tissues from the tumor microenvironment may also provide important predictive information. In this paper, we propose an unsupervised clustering-based multiple-instance deep learning model for the prediction of genetic mutations using WSIs of three cancer types obtained from The Cancer Genome Atlas. Our proposed model facilitates the identification of spatial regions related to specific gene mutations and exclusion of patches that lack predictive information through the use of unsupervised clustering. This results in a more accurate prediction of gene mutations when compared with models using all image patches on WSIs and two recently published algorithms for all three different cancer types evaluated in this study. In addition, our study validates the hypothesis that the prediction of gene mutations solely based on tumor regions on WSI slides may not always provide the best performance. Other tissue types in the tumor microenvironment could provide a better prediction ability than tumor tissues alone. These results highlight the heterogeneity in the tumor microenvironment and the importance of identification of predictive image patches in digital pathology prediction tasks.


Subject(s)
Deep Learning, Humans, Cluster Analysis, Mutation, Tumor Microenvironment/genetics, Algorithms
12.
J Pathol Inform ; 13: 100115, 2022.
Article in English | MEDLINE | ID: mdl-36268072

ABSTRACT

Background: Due to the lack of annotated pathological images, transfer learning has been the predominant approach in the field of digital pathology. Pre-trained neural networks based on the ImageNet database are often used to extract "off-the-shelf" features, achieving great success in predicting tissue types, molecular features, and clinical outcomes, etc. We hypothesize that fine-tuning the pre-trained models using histopathological images could further improve feature extraction and downstream prediction performance. Methods: We used 100 000 annotated H&E image patches for colorectal cancer (CRC) to fine-tune a pre-trained Xception model via a 2-step approach. The features extracted from the fine-tuned Xception (FTX-2048) model and the ImageNet-pretrained (IMGNET-2048) model were compared through: (1) tissue classification for H&E images from CRC, the same image type that was used for fine-tuning; (2) prediction of immune-related gene expression, and (3) gene mutations for lung adenocarcinoma (LUAD). Five-fold cross validation was used for model performance evaluation. Each experiment was repeated 50 times. Findings: The extracted features from the fine-tuned FTX-2048 exhibited significantly higher accuracy (98.4%) for predicting tissue types of CRC compared to the "off-the-shelf" features directly from Xception based on the ImageNet database (96.4%) (P value = 2.2 × 10⁻⁶). Particularly, FTX-2048 markedly improved the accuracy for stroma from 87% to 94%. Similarly, features from FTX-2048 boosted the prediction of transcriptomic expression of immune-related genes in LUAD. For the genes that had significant relationships with image features (P < 0.05, n = 171), the features from the fine-tuned model improved the prediction for the majority of the genes (139; 81%). In addition, features from FTX-2048 improved prediction of mutation for 5 out of 9 most frequently mutated genes (STK11, TP53, LRP1B, NF1, and FAT1) in LUAD.
Conclusions: We proved the concept that fine-tuning the pretrained ImageNet neural networks with histopathology images can produce higher quality features and better prediction performance for not only the same-cancer tissue classification where similar images from the same cancer are used for fine-tuning, but also cross-cancer prediction for gene expression and mutation at patient level.

13.
Comput Biol Chem ; 99: 107697, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35636264

ABSTRACT

The naïve empirical Bayes method has been widely used as an ad hoc tool in fitting linear mixed-effect models, and it is much more computationally efficient than the maximum likelihood estimation method. However, the shrinkage effect of the empirical Bayes method causes bias in the estimates of the fixed effects. Bias-correction has been proposed for the mixed-effects model when only one covariate is present. In this paper, we derive the shrinkage factor of the empirical Bayes predictors of the random effects and the variance-covariance matrix of the corrected estimates when the model has more than one covariate. The empirical Bayes estimates and test statistics are then corrected using the derived factor. Theoretical derivations, simulation studies and a real data application demonstrate the validity of the proposed method in that the corrected estimates are unbiased and the corrected tests have correct p-values.


Subject(s)
Bayes Theorem, Computer Simulation, Linear Models
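
To make the shrinkage bias described in record 13 concrete, the following worked sketch covers the simplest case only: a balanced random-intercept model with one centered covariate and variance components treated as known. These are assumptions chosen for illustration; the paper's own derivation addresses the multi-covariate case and is not reproduced here.

```latex
% Assumed setting: balanced random-intercept model, one centered covariate x_i
y_{ij} = \mu + \beta x_i + \eta_i + \varepsilon_{ij}, \quad
\eta_i \sim N(0, \tau^2), \quad
\varepsilon_{ij} \sim N(0, \sigma^2), \quad j = 1, \dots, n

% Naive step 1: fit the model without x_i, so the random effect is
% b_i = \beta x_i + \eta_i, and its empirical Bayes predictor is shrunk by B
\hat{b}_i = B \, (\bar{y}_{i\cdot} - \hat{\mu}), \qquad
B = \frac{\omega^2}{\omega^2 + \sigma^2 / n}, \qquad
\omega^2 = \operatorname{Var}(b_i)

% Naive step 2: regress \hat{b}_i on x_i; the slope inherits the shrinkage
\mathbb{E}\bigl[\hat{\beta}_{\text{naive}}\bigr] \approx B \, \beta
\quad \Longrightarrow \quad
\hat{\beta}_{\text{corrected}} = \hat{\beta}_{\text{naive}} / B
```

Since 0 < B < 1, the naive slope is biased toward zero, and the bias grows as the residual variance increases or the number of observations per subject decreases.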
14.
J Cancer Res Clin Oncol ; 148(8): 1955-1963, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35332389

ABSTRACT

PURPOSE: Most Stage II/III colorectal cancer (CRC) patients can be cured by surgery alone, and only certain CRC patients benefit from adjuvant chemotherapy. Risk stratification based on deep-learning from haematoxylin and eosin (H&E) images has been postulated as a potential predictive biomarker for benefit from adjuvant chemotherapy. However, very limited success has been achieved in using biomarkers, including deep-learning-based markers, to facilitate the decision for adjuvant chemotherapy despite recent advances of artificial intelligence. METHODS: We trained and internally validated CRCNet using 780 Stage II/III CRC patients from Molecular and Cellular Oncology (MCO). Independent external validation of the model was performed using 337 Stage II/III CRC patients from The Cancer Genome Atlas (TCGA). RESULTS: CRCNet stratified the patients into high, medium, and low-risk subgroups. Multivariate Cox regression analyses confirmed that CRCNet risk groups were statistically significant after adjusting for existing risk factors. The high-risk subgroup benefited significantly from adjuvant chemotherapy. Hazard ratios (chemo-treated vs untreated) of 0.2 (95% confidence interval [CI], 0.05-0.65; P = 0.009) and 0.6 (95% CI, 0.42-0.98; P = 0.038) were observed in the TCGA and MCO fluorouracil-treated patients, respectively. Conversely, no significant benefit from chemotherapy was observed in the low- and medium-risk groups (P = 0.2-1). CONCLUSION: The retrospective analysis provides further evidence that H&E image-based biomarkers may potentially be of great use in delivering treatments following surgery for Stage II/III CRC, improving patient survival, and avoiding unnecessary treatment and associated toxicity, and warrants further validation on other datasets and prospective confirmation in clinical trials.


Subject(s)
Colorectal Neoplasms, Deep Learning, Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Artificial Intelligence, Tumor Biomarkers/genetics, Adjuvant Chemotherapy, Colorectal Neoplasms/pathology, Fluorouracil/therapeutic use, Humans, Neoplasm Staging, Prognosis, Prospective Studies, Retrospective Studies
15.
Genetica ; 149(5-6): 313-325, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34480683

ABSTRACT

Reducing false discoveries caused by population stratification (PS) has always been a challenge in genome-wide association studies (GWAS). The current literature has established several single-marker approaches, including genomic control (GC), EIGENSTRAT and the generalized linear mixed model association test (GMMAT), and multi-marker methods such as the LASSO mixed model (LASSOMM). However, the single-marker methods require prespecifying an arbitrary p value threshold in the selection process, likely resulting in suboptimal precision or recall. On the other hand, it appears that LASSOMM is extremely computationally intensive and may not be suitable for large-scale GWAS. In this paper, we propose a simple multi-marker approach (PCA-LASSO) combining principal component analysis (PCA) and least absolute shrinkage and selection operator (LASSO). We utilize PCA to correct for the confounding effects of PS and LASSO with built-in cross-validation for a data-driven selection. Compared to the current single-marker approaches, the proposed PCA-LASSO provides optimal balance between precision and recall, and consequently superior F1 scores. Similarly, compared to LASSOMM, PCA-LASSO markedly increases the precision while minimizing the loss of recall, and therefore improves the overall F1 score in the presence of PS. More importantly, PCA-LASSO drastically reduces the computational time by >1000 times when compared to LASSOMM. We applied PCA-LASSO to a real dataset of Alzheimer's disease and successfully identified SNP rs429358 (Gene APOE4), which has been widely reported to be associated with the onset and elevated risk of Alzheimer's disease. In conclusion, PCA-LASSO is a simple, fast, but accurate approach for GWAS in the presence of latent PS.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Alzheimer Disease/genetics , Datasets as Topic , Genomics , Humans , Principal Component Analysis , Time Factors
16.
Stat Methods Med Res ; 30(1): 233-243, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32838650

ABSTRACT

Nonlinear mixed-effects modeling is one of the most popular tools for analyzing repeated measurement data, particularly for applications in the biomedical fields. Multiple integration and nonlinear optimization are the two major challenges for likelihood-based methods in nonlinear mixed-effects modeling. To address these problems, approaches based on empirical Bayesian estimates have been proposed that break the problem into a nonlinear mixed-effects model with no covariates and a linear regression model without random effects. This approach is time-efficient because it involves no covariates in the nonlinear optimization. However, covariate effects based on empirical Bayesian estimates are underestimated, and the bias depends on the extent of shrinkage. A marginal correction method has been proposed to correct the bias caused by shrinkage to some extent. However, the marginal approach appears to be suboptimal when testing covariate effects on multiple model parameters, a situation that is often encountered in real-world data analysis. In addition, the marginal approach cannot correct the inaccuracy in the associated p-values. In this paper, we propose a simultaneous correction method (nSCEBE), which can handle the situation where covariate analysis is performed on multiple model parameters. Simulation studies and real data analysis showed that nSCEBE is accurate and efficient for both effect-size estimation and p-value calculation compared with the existing methods. Importantly, nSCEBE can be >2000 times faster than the standard mixed-effects models, potentially allowing its use for high-dimensional covariate analysis of longitudinal or repeatedly measured outcomes.
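
The shrinkage-induced underestimation described above can be illustrated in the simplest possible setting. The sketch below is not the nSCEBE method: it assumes a linear random-intercept model with known variance components, a single standard-normal covariate, and a balanced design, and all names and parameter values are illustrative assumptions. In that setting the empirical Bayesian estimate (EBE) of each subject's effect is the subject mean multiplied by a weight B < 1, so regressing the EBEs on the covariate recovers roughly B times the true slope.

```python
import random


def simulate_naive_slope(n_subjects=2000, n_obs=4, beta=1.0,
                         tau=0.5, sigma=2.0, seed=0):
    """Simulate y_ij = beta*x_i + b_i + e_ij and regress EBEs on x.

    Returns (weight, naive_slope, rescaled_slope), where weight is the
    factor B applied to subject means (1 - B is the shrinkage).
    """
    rng = random.Random(seed)
    # Variance of the subject-level effect beta*x + b when x ~ N(0, 1).
    omega2 = beta ** 2 + tau ** 2
    # EBE weight in the covariate-free model with known variances.
    weight = omega2 / (omega2 + sigma ** 2 / n_obs)

    xs, means = [], []
    for _ in range(n_subjects):
        x = rng.gauss(0.0, 1.0)
        effect = beta * x + rng.gauss(0.0, tau)
        obs = [effect + rng.gauss(0.0, sigma) for _ in range(n_obs)]
        xs.append(x)
        means.append(sum(obs) / n_obs)

    # Step 1 (covariate-free model): shrink subject means toward the grand mean.
    grand_mean = sum(means) / n_subjects
    ebes = [weight * (m - grand_mean) for m in means]

    # Step 2: ordinary least-squares slope of the EBEs on the covariate.
    x_bar = sum(xs) / n_subjects
    e_bar = sum(ebes) / n_subjects
    sxy = sum((x - x_bar) * (e - e_bar) for x, e in zip(xs, ebes))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    naive = sxy / sxx
    # Dividing by the weight undoes the attenuation in this toy setting only.
    return weight, naive, naive / weight


weight, naive, rescaled = simulate_naive_slope()
print(f"weight B={weight:.3f} naive slope={naive:.3f} rescaled={rescaled:.3f}")
```

With fewer observations per subject or larger residual variance, B decreases and the naive slope is attenuated further, which matches the statement that the bias depends on the extent of shrinkage.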


Subject(s)
Statistical Models , Nonlinear Dynamics , Algorithms , Bayes Theorem , Computer Simulation , Likelihood Functions
17.
Curr Genomics ; 22(5): 363-372, 2021 Dec 30.
Article in English | MEDLINE | ID: mdl-35283669

ABSTRACT

Background: In genetic association studies with quantitative trait loci (QTL), the association between a candidate genetic marker and the trait of interest is commonly examined by the omnibus F test or by the t-test corresponding to a given genetic model or mode of inheritance. It is known that the t-test with a correct model specification is more powerful than the F test. However, since the underlying genetic model is rarely known in practice, the use of a model-specific t-test may incur substantial power loss. Robust-efficient tests, such as the Maximin Efficiency Robust Test (MERT) and MAX3, have been proposed in the literature. Methods: In this paper, we propose a novel two-step robust-efficient approach, namely, the genetic model selection (GMS) method for quantitative trait analysis. GMS selects a genetic model by testing Hardy-Weinberg disequilibrium (HWD) with extremal samples of the population in the first step and then applies the corresponding genetic model-specific t-test in the second step. Results: Simulations show that GMS is not only more efficient than MERT and MAX3, but also has comparable power to the optimal t-test when the genetic model is known. Conclusion: Application to data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort demonstrates that the proposed approach can identify biologically meaningful SNPs on chromosome 19.
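
The two ingredients of the approach, an HWD measure and model-specific genotype codings, can be sketched as follows. This is a minimal illustration, not the GMS procedure: the function names, the raw-coefficient threshold, and the sign convention (positive disequilibrium suggests a recessive model, negative suggests dominant, as commonly described for HWD-based model selection in samples enriched for the trait) are assumptions made here. A real analysis would use a standardized test statistic and a critical value rather than a raw cutoff.

```python
# Scores assigned to genotypes carrying 0, 1, or 2 copies of the tested allele.
GENOTYPE_CODINGS = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def hwd_coefficient(n0, n1, n2):
    """Hardy-Weinberg disequilibrium coefficient from genotype counts.

    n0, n1, n2: counts of genotypes with 0, 1, 2 copies of the allele.
    Returns P(2 copies) - (allele frequency)^2, which is 0 under HWE.
    """
    total = n0 + n1 + n2
    p2 = n2 / total
    allele_freq = p2 + 0.5 * n1 / total
    return p2 - allele_freq ** 2


def select_model(delta, threshold=0.02):
    """Hypothetical selection rule based on the sign and size of delta."""
    if delta > threshold:
        return "recessive"
    if delta < -threshold:
        return "dominant"
    return "additive"


def code_genotypes(genotypes, model):
    """Map genotype copy counts (0/1/2) to scores under the chosen model."""
    coding = GENOTYPE_CODINGS[model]
    return [coding[g] for g in genotypes]


# Hypothetical genotype counts from an extreme-trait subsample.
delta = hwd_coefficient(20, 40, 40)
model = select_model(delta)
print(model, code_genotypes([0, 1, 2, 2], model))
```

The coded scores would then serve as the predictor in the model-specific t-test of the second step.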

18.
Brief Bioinform ; 22(3), 2021 May 20.
Article in English | MEDLINE | ID: mdl-32634825

ABSTRACT

Genome-wide association studies (GWAS) using longitudinal phenotypes collected over time are appealing because of their improved power. However, the computational burden has been a challenge because of the complex algorithms needed to model longitudinal data. Approximation methods based on empirical Bayesian estimates (EBEs) from mixed-effects modeling have been developed to expedite the analysis. However, our analysis demonstrated that bias in both association testing and effect estimation remains an issue for the existing EBE-based methods. We propose a very fast and unbiased method (simultaneous correction for EBE, SCEBE) that can correct the bias in the naive EBE approach and provide unbiased P-values and estimates of effect size. Through application to Alzheimer's Disease Neuroimaging Initiative data with 6,414,695 single nucleotide polymorphisms, we demonstrated that SCEBE can efficiently perform large-scale GWAS with longitudinal outcomes, providing a nearly 10,000-fold improvement in computational efficiency and shortening the computation time from months to minutes. The SCEBE package and the example datasets are available at https://github.com/Myuan2019/SCEBE.


Subject(s)
Algorithms , Alzheimer Disease/genetics , Single Nucleotide Polymorphism , Software , Genome-Wide Association Study , Humans
19.
J Hum Genet ; 66(5): 509-518, 2021 May.
Article in English | MEDLINE | ID: mdl-33177701

ABSTRACT

Mutual exclusivity analyses provide an effective tool to distinguish driver genes from passenger genes in cancer studies. Various algorithms have been developed for the detection of mutual exclusivity, but controlling false positives and improving accuracy remain challenging. In this paper, we propose a forward selection algorithm for identification of mutually exclusive gene sets (FSME). The method begins with an initial search for a seed pair of mutually exclusive (ME) genes and then adds more genes to the current ME set. Simulations demonstrated that, compared to recently published approaches (i.e., CoMEt, WExT, and MEGSA), FSME could provide a higher precision or recall rate in identifying ME gene sets, and had superior control of false-positive rates. With application to real TCGA data sets for AML, BRCA, and GBM, we confirmed that FSME can be used to discover cancer driver genes.
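
The seed-then-grow structure described above can be sketched as a greedy search. This is only an illustration of forward selection, not the FSME algorithm: the abstract does not state FSME's selection criterion, so the sketch swaps in a simple coverage-minus-overlap score (2 x number of samples covered by the set, minus the sum of per-gene mutation counts), and the gene names, data, and `max_size` parameter are hypothetical.

```python
from itertools import combinations


def me_score(genes, mutations):
    """Coverage-minus-overlap score for a gene set (stand-in criterion).

    mutations maps each gene to the set of samples in which it is mutated.
    The score equals the coverage when no sample carries two mutated genes.
    """
    covered = set().union(*(mutations[g] for g in genes))
    total = sum(len(mutations[g]) for g in genes)
    return 2 * len(covered) - total


def forward_select(mutations, max_size=5):
    """Greedy forward selection: best seed pair, then add genes one by one."""
    genes = sorted(mutations)
    # Step 1: search all pairs for the best-scoring seed.
    seed = max(combinations(genes, 2), key=lambda pair: me_score(pair, mutations))
    current = list(seed)
    best = me_score(current, mutations)
    # Step 2: add the gene that most improves the score; stop when none does.
    while len(current) < max_size:
        candidates = [g for g in genes if g not in current]
        if not candidates:
            break
        gene = max(candidates, key=lambda g: me_score(current + [g], mutations))
        new_score = me_score(current + [gene], mutations)
        if new_score <= best:
            break
        current.append(gene)
        best = new_score
    return current, best


# Hypothetical mutation data: G1, G2, G3 are exclusive; G4 overlaps G1 and G2.
toy = {
    "G1": {1, 2, 3},
    "G2": {4, 5, 6},
    "G3": {7},
    "G4": {1, 2, 4, 5},
}
print(forward_select(toy))
```

A statistical test of exclusivity would normally replace the raw score, both to choose the seed pair and to decide when to stop adding genes, which is where control of false positives enters.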


Subject(s)
Algorithms , Computational Biology/methods , Neoplastic Gene Expression Regulation , Neoplasms/genetics , Carcinogenesis/genetics , False Positive Reactions , Humans , Markov Chains , Monte Carlo Method , Mutagenesis/genetics , Oncogenes
20.
Comput Biol Chem ; 88: 107320, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32711355

ABSTRACT

Family-based multi-locus tests integrate information from individual loci by weighted averaging of the marginal statistics, and have been proven to be more efficient and robust than single-locus tests in genetic association studies. The power depends on how much information the weights can extract from the data. The currently published weighted sum methods are applicable only to either common or rare variants and may suffer from substantial power loss, especially for rare variants. In this paper, we propose a novel data-driven weight to improve power for both common and rare variants. We use the l1 regularization in Least Absolute Shrinkage and Selection Operator (LASSO) regression to construct the weight, which simultaneously serves as an adaptive marker selection process. Simulations for a dichotomous phenotype demonstrated that our LASSO-based approach outperformed the existing multi-locus methods, providing the highest statistical power while keeping the type I error rate well controlled under different scenarios. We also applied our methods to a real dataset for rheumatoid arthritis (GAW15 Problem 2). Two groups of alleles, in which individual SNPs had only modest and non-significant effects, were detected (P < 0.00001) using our proposed methods, whereas traditional multi-locus methods failed to identify them. In conclusion, the novel LASSO-based approach represents a superior weight-choosing strategy for multi-locus tests.
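
Why an l1 penalty doubles as marker selection can be seen from the soft-thresholding operator, which gives the LASSO solution exactly when the predictors are orthonormal: small coefficients are set to zero, and larger ones are shrunk toward zero. The sketch below applies that operator to per-locus effect estimates and uses the result as weights in a weighted sum of marginal statistics. It is an illustration under that orthonormality assumption, with hypothetical function names and numbers; the abstract does not specify how the paper constructs its weights.

```python
def soft_threshold(z, lam):
    """Soft-thresholding: shrink z toward zero by lam, zeroing small values."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_weights(marginal_effects, lam):
    """Per-locus weights from soft-thresholded effect estimates.

    Loci whose estimated effect is within lam of zero get weight 0,
    which is the marker-selection side of the l1 penalty.
    """
    return [soft_threshold(b, lam) for b in marginal_effects]


def weighted_statistic(weights, statistics):
    """Weighted sum of marginal (single-locus) test statistics."""
    return sum(w * t for w, t in zip(weights, statistics))


# Hypothetical effect estimates and marginal statistics for three loci.
weights = lasso_weights([2.0, 0.3, -1.5], lam=0.5)
print(weights, weighted_statistic(weights, [2.0, 5.0, -3.0]))
```

In this toy example the second locus drops out of the combined statistic entirely, while the two retained loci contribute in proportion to their shrunken effects.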


Subject(s)
Rheumatoid Arthritis/genetics , Logistic Models , Alleles , Humans , Phenotype , Single Nucleotide Polymorphism/genetics
...