Results 1 - 20 of 115
1.
Epidemiol Methods ; 13(1): 20230039, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38989109

ABSTRACT

Objectives: The addition of two-way interactions is a classic problem in statistics, and comes with the challenge of quadratically increasing dimension. We aim to a) devise an estimation method that can handle this challenge and b) aid interpretation of the resulting model by developing computational tools for quantifying variable importance. Methods: Existing strategies typically overcome the dimensionality problem by only allowing interactions between relevant main effects. Building on this philosophy, and aiming for settings with moderate n to p ratio, we develop a local shrinkage model that links the shrinkage of interaction effects to the shrinkage of their corresponding main effects. In addition, we derive a new analytical formula for the Shapley value, which allows rapid assessment of individual-specific variable importance scores and their uncertainties. Results: We empirically demonstrate that our approach provides accurate estimates of the model parameters and very competitive predictive accuracy. In our Bayesian framework, estimation inherently comes with inference, which facilitates variable selection. Comparisons with key competitors are provided. Large-scale cohort data are used to provide realistic illustrations and evaluations. The implementation of our method in RStan is relatively straightforward and flexible, allowing for adaptation to specific needs. Conclusions: Our method is an attractive alternative to existing strategies for handling interactions in epidemiological and/or clinical studies, as its linked local shrinkage can improve parameter accuracy, prediction and variable selection. Moreover, it provides appropriate inference and interpretation, and may compete well with less interpretable machine learners in terms of prediction.
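
As a generic illustration of individual-specific Shapley values, the sketch below computes them by brute-force enumeration for a hypothetical two-feature linear model with one interaction term. It is not the analytical formula derived in the paper. It assumes independent features, so features outside a coalition are replaced by their means, and all names, coefficients and data values are made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for j in players:
        others = [k for k in players if k != j]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


# Hypothetical model f(x) = b1*x1 + b2*x2 + b12*x1*x2 (illustrative values).
B1, B2, B12 = 1.0, -0.5, 2.0
MEANS = [0.2, 1.0]  # assumed feature means
X = [1.2, 3.0]      # the individual to explain


def model(x):
    return B1 * x[0] + B2 * x[1] + B12 * x[0] * x[1]


def value(coalition):
    # Features outside the coalition are set to their mean; for this
    # multilinear model with independent features that equals the
    # conditional expectation of the prediction.
    z = [X[k] if k in coalition else MEANS[k] for k in range(len(X))]
    return model(z)


phi = shapley_values(value, n_features=2)
```

In this toy setting the two scores sum to the difference between the individual's prediction and the prediction at the feature means, and the interaction contribution is shared between both features. Enumeration scales exponentially in the number of features, which is one reason an analytical formula is attractive.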

2.
Diabetologia ; 67(5): 885-894, 2024 May.
Article in English | MEDLINE | ID: mdl-38374450

ABSTRACT

AIMS/HYPOTHESIS: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value. METHODS: In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights) or penalised clinical variables, or without clinical variables. Model development was performed in one cohort and the model was applied in a second cohort. Model performance was evaluated using Harrell's C statistic. RESULTS: Of the 585 individuals from the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0-11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3-11.8 years). Overall, the clinical variables and proteins were selected in the different models most often, followed by the metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models (age, sex, BMI, HbA1c) including only clinical variables performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]). A more extensive model including HDL-cholesterol and C-peptide performed better in both cohorts (DCS, C 0.74 [95% CI 0.67, 0.81]; GoDARTS, C 0.73 [95% CI 0.69, 0.77]). Two proteins, lactadherin and proto-oncogene tyrosine-protein kinase receptor, were most consistently selected and slightly improved model performance. CONCLUSIONS/INTERPRETATION: Using machine learning approaches, we show that insulin requirement risk can be predicted moderately well by predominantly clinical variables. Inclusion of molecular markers improves the prognostic performance beyond that of clinical variables by up to 5%. Such prognostic models could be useful for identifying people with diabetes at high risk of progressing quickly to treatment intensification. DATA AVAILABILITY: Summary statistics of lipidomic, proteomic and metabolomic data are available from a Shiny dashboard at https://rhapdata-app.vital-it.ch .


Subject(s)
Type 2 Diabetes Mellitus, Humans, Type 2 Diabetes Mellitus/metabolism, Prospective Studies, C-Peptide, Proteomics, Insulin/therapeutic use, Biomarkers, Machine Learning, Cholesterol
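
The C statistic used to evaluate the models in this abstract can be sketched as follows. This is a minimal pairwise version of Harrell's C for right-censored data, not the study's code. The function name and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C for right-censored data (O(n^2) pairwise version).

    times:       follow-up time per individual
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time in a pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): the third individual is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1]
c_stat = harrell_c(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk, which gives context for the reported values around 0.71 to 0.74.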
3.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-37951587

ABSTRACT

MOTIVATION: In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provide an insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. RESULTS: We propose an approach for integrating multiple sources of such prior information into penalized regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in the R package transreg (https://github.com/lcsb-bds/transreg, https://cran.r-project.org/package=transreg).


Subject(s)
Software, Computer Simulation, Implementation Science
4.
J Comput Graph Stat ; 32(3): 950-960, 2023.
Article in English | MEDLINE | ID: mdl-38013849

ABSTRACT

Elastic net penalization is widely used in high-dimensional prediction and variable selection settings. Auxiliary information on the variables, for example, groups of variables, is often available. Group-adaptive elastic net penalization exploits this information to potentially improve performance by estimating group penalties, thereby penalizing important groups of variables less than other groups. Estimating these group penalties is, however, hard due to the high dimension of the data. Existing methods are computationally expensive or not generic in the type of response. Here we present a fast method for estimation of group-adaptive elastic net penalties for generalized linear models. We first derive a low-dimensional representation of the Taylor approximation of the marginal likelihood for group-adaptive ridge penalties, to efficiently estimate these penalties. Then we show by using asymptotic normality of the linear predictors that this marginal likelihood approximates that of elastic net models. The ridge group penalties are then transformed to elastic net group penalties by matching the ridge prior variance to the elastic net prior variance as a function of the group penalties. The method allows for overlapping groups and unpenalized variables, and is easily extended to other penalties. In a model-based simulation study and two cancer genomics applications, we demonstrate a substantially decreased computation time and improved or matching performance compared to other methods. Supplementary materials for this article are available online.

5.
BMC Bioinformatics ; 24(1): 172, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101151

ABSTRACT

BACKGROUND: High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, providing complementary data not on the samples, but on the variables. We consider adaptive ridge penalised generalised linear and Cox models, in which the variable-specific ridge penalties are adapted to the co-data to give a priori more weight to more important variables. The R-package ecpc originally accommodated various and possibly multiple co-data sources, including categorical co-data, i.e. groups of variables, and continuous co-data. Continuous co-data, however, were handled by adaptive discretisation, potentially inefficiently modelling and losing information. As continuous co-data such as external p values or correlations often arise in practice, more generic co-data models are needed. RESULTS: Here, we present an extension to the method and software for generic co-data models, particularly for continuous co-data. At the basis lies a classical linear regression model, regressing prior variance weights on the co-data. Co-data variables are then estimated with empirical Bayes moment estimation. After placing the estimation procedure in the classical regression framework, extension to generalised additive and shape constrained co-data models is straightforward. Besides, we show how ridge penalties may be transformed to elastic net penalties. In simulation studies we first compare various co-data models for continuous co-data from the extension to the original method. Secondly, we compare variable selection performance to other variable selection methods. The extension is faster than the original method and shows improved prediction and variable selection performance for non-linear co-data relations. Moreover, we demonstrate use of the package in several genomics examples throughout the paper. 
CONCLUSIONS: The R-package ecpc accommodates linear, generalised additive and shape constrained additive co-data models for the purpose of improved high-dimensional prediction and variable selection. The extended version of the package as presented here (version number 3.1.1 and higher) is available on CRAN (https://cran.r-project.org/web/packages/ecpc/).


Subject(s)
Genomics, Software, Bayes Theorem, Computer Simulation, Linear Models
6.
Oral Oncol ; 137: 106307, 2023 02.
Article in English | MEDLINE | ID: mdl-36657208

ABSTRACT

OBJECTIVES: Human papillomavirus (HPV)-positive oropharyngeal squamous cell carcinoma (OPSCC) differs biologically and clinically from HPV-negative OPSCC and has a better prognosis. This study aims to analyze the value of magnetic resonance imaging (MRI)-based radiomics in predicting HPV status in OPSCC and to develop a prognostic model in OPSCC including HPV status and MRI-based radiomics. MATERIALS AND METHODS: Manual delineation of 249 primary OPSCCs (91 HPV-positive and 159 HPV-negative) on pretreatment native T1-weighted MRIs was performed and used to extract 498 radiomic features per delineation. A logistic regression (LR) model and a random forest (RF) model were developed using univariate feature selection. Additionally, factor analysis was performed, and the derived factors were combined with clinical data in a predictive model to assess the performance in predicting HPV status. Factors were also combined with clinical parameters in a multivariable survival regression analysis. RESULTS: Both feature-based LR and RF models performed with an AUC of 0.79 in prediction of HPV status. Fourteen of the twenty most significant features were similar in both models, mainly concerning tumor sphericity, intensity variation, compactness, and tumor diameter. The model combining clinical data and radiomic factors (AUC = 0.89) outperformed the radiomics-only model in predicting OPSCC HPV status. Overall survival prediction was most accurate using the combination of clinical parameters and radiomic factors (C-index = 0.72). CONCLUSION: Predictive models based on MR-radiomic features were able to predict HPV status with sufficient performance, supporting the role of MRI-based radiomics as a potential imaging biomarker. Survival prediction improved by combining clinical features with MRI-based radiomics.


Subject(s)
Head and Neck Neoplasms, Oropharyngeal Neoplasms, Papillomavirus Infections, Humans, Squamous Cell Carcinoma of Head and Neck, Human Papillomavirus Viruses, Oropharyngeal Neoplasms/pathology, Papillomavirus Infections/complications, Papillomavirus Infections/diagnostic imaging, Papillomavirus Infections/pathology, Prognosis, Magnetic Resonance Imaging, Retrospective Studies, Papillomaviridae
7.
Neuroradiology ; 65(4): 855-863, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36574026

ABSTRACT

PURPOSE: Patients with vanishing white matter (VWM) experience unremitting chronic neurological decline and stress-provoked episodes of rapid, partially reversible decline. Cerebral white matter abnormalities are progressive, without improvement, and are therefore unlikely to be related to the episodes. We determined which radiological findings are related to episodic decline. METHODS: MRI scans of VWM patients were retrospectively analyzed. Patients were grouped into A (never episodes) and B (episodes). Signal abnormalities outside the cerebral white matter were rated as absent, mild, or severe. A sum score was developed with abnormalities only seen in group B. The temporal relationship between signal abnormalities and episodes was determined by subdividing scans into those made before, less than 3 months after, and more than 3 months after onset of an episode. RESULTS: Five hundred forty-three examinations of 298 patients were analyzed. Mild and severe signal abnormalities in the caudate nucleus, putamen, globus pallidus, thalamus, midbrain, medulla oblongata, and severe signal abnormalities in the pons were only seen in group B. The sum score, constructed with these abnormalities, depended on the timing of the scan (χ2(2, 400) = 22.8; p < .001): it was least often abnormal before, most often abnormal with the highest value shortly after, and lower longer than 3 months after an episode. CONCLUSION: In VWM, signal abnormalities in brainstem, thalamus, and basal ganglia are related to episodic decline and can improve. Knowledge of the natural MRI history in VWM is important for clinical interpretation of MRI findings and crucial in therapy trials.


Subject(s)
Leukoencephalopathies, White Matter, Leukoencephalopathies/diagnostic imaging, Leukoencephalopathies/pathology, White Matter/diagnostic imaging, White Matter/pathology, Humans, Male, Female, Newborn Infant, Infant, Preschool Child, Child, Adolescent, Young Adult, Adult, Middle Aged, Magnetic Resonance Imaging
8.
Eur Radiol ; 33(4): 2850-2860, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36460924

ABSTRACT

OBJECTIVES: To externally validate a pre-treatment MR-based radiomics model predictive of locoregional control in oropharyngeal squamous cell carcinoma (OPSCC) and to assess the impact of differences between datasets on the predictive performance. METHODS: Radiomic features, as defined in our previously published radiomics model, were extracted from the primary tumor volumes of 157 OPSCC patients in a different institute. The developed radiomics model was validated using this cohort. Additionally, the influence of patient subgroups, MRI acquisition, and post-processing steps on prediction performance was investigated. For this analysis, matched subgroups (based on human papillomavirus (HPV) status of the tumor, T-stage, and tumor subsite) and a subgroup with only patients with 4-mm slice thickness were studied. The influence of harmonization techniques (ComBat harmonization, quantile normalization) and the impact of feature stability across observers and centers were also studied. Model performances were assessed by area under the curve (AUC), sensitivity, and specificity. RESULTS: Performance of the published model (AUC/sensitivity/specificity: 0.74/0.75/0.60) drops when applied to the validation cohort (AUC/sensitivity/specificity: 0.64/0.68/0.60). Compared with the full validation cohort, performance improves slightly when the model is validated using a patient group with comparable HPV status of the tumor (AUC/sensitivity/specificity: 0.68/0.74/0.60), using patients acquired with a slice thickness of 4 mm (AUC/sensitivity/specificity: 0.67/0.73/0.57), or when quantile harmonization was performed (AUC/sensitivity/specificity: 0.66/0.69/0.60). CONCLUSION: The previously published model shows its generalizability and can be applied to data acquired from different vendors and protocols. Harmonization techniques and subgroup definition influence performance of predictive radiomics models. 
KEY POINTS: • Radiomics, a noninvasive quantitative image analysis technique, can support the radiologist by enhancing diagnostic accuracy and/or treatment decision-making. • A previously published model shows its generalizability and could be applied on data acquired from different vendors and protocols.


Subject(s)
Oropharyngeal Neoplasms, Papillomavirus Infections, Humans, Magnetic Resonance Imaging/methods, Sensitivity and Specificity, Oropharyngeal Neoplasms/diagnostic imaging, Retrospective Studies
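
The three performance measures reported in this abstract (AUC, sensitivity, specificity) can be computed as in the sketch below. It is a generic illustration, not the study's pipeline. The function names, the toy labels and scores, and the 0.5 threshold are assumptions chosen for the example.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores count as half
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive, then count the four outcomes."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up): 3 positives, 4 negatives.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
```

AUC does not depend on a threshold, whereas sensitivity and specificity do. This is one reason the three numbers can move differently when a model is transferred to a new cohort.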
9.
Int J Mol Sci ; 23(19)2022 Sep 22.
Article in English | MEDLINE | ID: mdl-36232432

ABSTRACT

Patients with inflammatory bowel disease (IBD) produce enhanced immunoglobulin A (IgA) against the microbiota compared to healthy individuals, which has been correlated with disease severity. Since IgA complexes can potently activate myeloid cells via the IgA receptor FcαRI (CD89), excessive IgA production may contribute to IBD pathology. However, the cellular mechanisms that contribute to dysregulated IgA production in IBD are poorly understood. Here, we demonstrate that intestinal FcαRI-expressing myeloid cells (i.e., monocytes and neutrophils) are in close contact with B lymphocytes in the lamina propria of IBD patients. Furthermore, stimulation of FcαRI on monocytes triggered production of cytokines and chemokines that regulate B-cell differentiation and migration, including interleukin-6 (IL6), interleukin-10 (IL10), tumour necrosis factor-α (TNFα), a proliferation-inducing ligand (APRIL), and chemokine ligand-20 (CCL20). In vitro, these cytokines promoted IgA isotype switching in human B cells. Moreover, when naïve B lymphocytes were cultured in vitro in the presence of FcαRI-stimulated monocytes, enhanced IgA isotype switching was observed compared to B cells that were cultured with non-stimulated monocytes. Taken together, FcαRI-activated monocytes produced a cocktail of cytokines, as well as chemokines, that stimulated IgA switching in B cells, and close contact between B cells and myeloid cells was observed in the colons of IBD patients. As such, we hypothesize that, in IBD, IgA complexes activate myeloid cells, which in turn can result in excessive IgA production, likely contributing to disease pathology. Interrupting this loop may, therefore, represent a novel therapeutic strategy.


Subject(s)
Inflammatory Bowel Diseases, Interleukin-10, B-Lymphocytes, Cytokines, Humans, Immunoglobulin A, Immunoglobulin Class Switching, Immunoglobulin Isotypes, Interleukin-6, Ligands, Monocytes, Tumor Necrosis Factor-alpha
10.
Biom J ; 64(7): 1289-1306, 2022 10.
Article in English | MEDLINE | ID: mdl-35730912

ABSTRACT

The features in a high-dimensional biomedical prediction problem are often well described by low-dimensional latent variables (or factors). We use this to include unlabeled features and additional information on the features when building a prediction model. Such additional feature information is often available in biomedical applications. Examples are annotation of genes, metabolites, or p-values from a previous study. We employ a Bayesian factor regression model that jointly models the features and the outcome using Gaussian latent variables. We fit the model using a computationally efficient variational Bayes method, which scales to high dimensions. We use the extra information to set up a prior model for the features in terms of hyperparameters, which are then estimated through empirical Bayes. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.


Subject(s)
Algorithms, Research Design, Bayes Theorem, Normal Distribution
11.
Epigenetics ; 17(10): 1057-1069, 2022 10.
Article in English | MEDLINE | ID: mdl-34605346

ABSTRACT

High levels of methylated DNA in urine represent an emerging biomarker for non-small cell lung cancer (NSCLC) detection and are the subject of ongoing research. This study aimed to investigate the circadian variation of urinary cell-free DNA (cfDNA) abundance and methylation levels of cancer-associated genes in NSCLC patients. In this prospective study of 23 metastatic NSCLC patients with active disease, patients were asked to collect six urine samples during the morning, afternoon, and evening of two subsequent days. Urinary cfDNA concentrations and methylation levels of CDO1, SOX17, and TAC1 were measured at each time point. Circadian variation and between- and within-subject variability were assessed using linear mixed models. Variability was estimated using the Intraclass Correlation Coefficient (ICC), representing reproducibility. No clear circadian patterns could be recognized for cfDNA concentrations or methylation levels across the different sampling time points. Significantly lower cfDNA concentrations were found in males (p=0.034). For cfDNA levels, the between- and within-subject variability were comparable, rendering an ICC of 0.49. For the methylation markers, ICCs varied considerably, ranging from 0.14 to 0.74. Test reproducibility could be improved by collecting multiple samples per patient. In conclusion, there is no preferred collection time for NSCLC detection in urine using methylation markers, but single measurements should be interpreted carefully, and serial sampling may increase test performance. This study contributes to the limited understanding of cfDNA dynamics in urine and the continued interest in urine-based liquid biopsies for cancer diagnostics.


Subject(s)
Non-Small-Cell Lung Carcinoma, Cell-Free Nucleic Acids, Lung Neoplasms, Tumor Biomarkers/genetics, Non-Small-Cell Lung Carcinoma/genetics, Non-Small-Cell Lung Carcinoma/pathology, Cell-Free Nucleic Acids/genetics, DNA, DNA Methylation, Humans, Lung Neoplasms/genetics, Male, Prospective Studies, Reproducibility of Results
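
The ICC reasoning in this abstract can be illustrated with a short sketch. The study estimated variability with linear mixed models. The code below instead uses the simpler one-way ANOVA estimator for balanced data, plus the Spearman-Brown formula for the reliability of a mean of repeated samples. Both are swapped-in textbook techniques, and the function names and toy measurements are hypothetical.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for balanced repeated measurements.

    groups: one list of measurements per subject, all of equal length.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of samples")
    grand = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def icc_of_mean(icc, m):
    """Spearman-Brown: reliability of the mean of m repeated samples."""
    return m * icc / (1 + (m - 1) * icc)


# Toy data (made up): 3 subjects, 2 samples each.
toy_icc = icc_oneway([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
```

Under the Spearman-Brown assumptions, averaging three samples would raise a single-sample ICC of 0.49 to about 0.74 (`icc_of_mean(0.49, 3)`), which is consistent with the abstract's point that collecting multiple samples per patient could improve reproducibility.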
12.
Proc Natl Acad Sci U S A ; 118(49)2021 12 07.
Article in English | MEDLINE | ID: mdl-34873056

ABSTRACT

Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors. TRANSACT performs favorably compared to four competing approaches, including two deep learning approaches, on a set of 23 drug prediction challenges on The Cancer Genome Atlas and 226 metastatic tumors from the Hartwig Medical Foundation. We demonstrate that response predictions deliver a robust performance for a number of therapies of high clinical importance: platinum-based chemotherapies, gemcitabine, and paclitaxel. In contrast to other approaches, we demonstrate the interpretability of the TRANSACT predictors by correctly identifying known biomarkers of targeted therapies, and we propose potential mechanisms that mediate the resistance to two chemotherapeutic agents.


Subject(s)
Antitumor Drug Screening Assays/methods, Gene Expression Profiling/methods, Animals, Antineoplastic Agents/therapeutic use, Pharmacological Biomarkers/metabolism, Tumor Cell Line/drug effects, Deep Learning, Animal Disease Models, Forecasting/methods, Heterografts, Humans, Theoretical Models
13.
Cells ; 10(11)2021 10 29.
Article in English | MEDLINE | ID: mdl-34831173

ABSTRACT

Hypertrophic Cardiomyopathy (HCM) is a common inherited heart disease with poor risk prediction due to incomplete penetrance and a lack of clear genotype-phenotype correlations. Advanced imaging techniques have shown altered myocardial energetics already in preclinical gene variant carriers. To determine whether disturbed myocardial energetics with the potential to serve as biomarkers are also reflected in the serum metabolome, we analyzed the serum metabolome of asymptomatic carriers in comparison to healthy controls and obstructive HCM patients (HOCM). We performed non-quantitative direct-infusion high-resolution mass spectrometry-based untargeted metabolomics on serum from fasted asymptomatic gene variant carriers, symptomatic HOCM patients and healthy controls (n = 31, 14 and 9, respectively). Biomarker panels that discriminated the groups were identified by performing multivariate modeling with gradient-boosting classifiers. For all three group-wise comparisons we identified a panel of 30 serum metabolites that best discriminated the groups. These metabolite panels performed equally well as advanced cardiac imaging modalities in distinguishing the groups. Seven metabolites were found to be predictive in two different comparisons and may play an important role in defining the disease stage. This study reveals unique metabolic signatures in serum of preclinical carriers and HOCM patients that may potentially be used for HCM risk stratification and precision therapeutics.


Subject(s)
Hypertrophic Cardiomyopathy/metabolism, Metabolomics, Adult, Energy Metabolism, Female, Humans, Male, Metabolome, Middle Aged, Multivariate Analysis, Mutation/genetics, Sarcomeres/genetics
14.
Nat Commun ; 12(1): 6106, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34671028

ABSTRACT

Deconvolution of bulk gene expression profiles into the cellular components is pivotal to portraying tissue's complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types that can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle > 20 types of cells due to the efficient variational inference. Throughout an intensive evaluation with > 700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, in particular, to reconstruct gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.


Subject(s)
Gene Expression Profiling/methods, Bayes Theorem, Computer Simulation, Humans, Mononuclear Leukocytes/cytology, Mononuclear Leukocytes/metabolism, Machine Learning, Statistical Models, Neoplasms/genetics, Neoplasms/pathology, RNA Sequence Analysis, Single-Cell Analysis, Transcriptome/genetics, Workflow
15.
Mol Oncol ; 15(12): 3348-3362, 2021 12.
Article in English | MEDLINE | ID: mdl-34510716

ABSTRACT

Consensus molecular subtypes (CMSs) can guide precision treatment of colorectal cancer (CRC). We aim to identify methylation markers to distinguish between CMS2 and CMS3 in patients with CRC, for which an easy test is currently lacking. To this aim, fresh-frozen tumor tissue of 239 patients with stage I-III CRC was analyzed. Methylation profiles were obtained using the Infinium HumanMethylation450 BeadChip. We performed adaptive group-regularized logistic ridge regression with post hoc group-weighted elastic net marker selection to build prediction models for classification of CMS2 and CMS3. The Cancer Genome Atlas (TCGA) data were used for validation. Group regularization of the probes was done based on their location either relative to a CpG island or relative to a gene present in the CMS classifier, resulting in two different prediction models and subsequently different marker panels. For both panels, even when using only five markers, accuracies were > 90% in our cohort and in the TCGA validation set. Our methylation marker panel accurately distinguishes between CMS2 and CMS3. This enables development of a targeted assay to provide a robust and clinically relevant classification tool for CRC patients.


Subject(s)
Colorectal Neoplasms, DNA Methylation, Tumor Biomarkers/genetics, Colorectal Neoplasms/genetics, Colorectal Neoplasms/pathology, CpG Islands/genetics, DNA Methylation/genetics, Humans
16.
Stat Med ; 40(26): 5910-5925, 2021 11 20.
Article in English | MEDLINE | ID: mdl-34438466

ABSTRACT

Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data are high-dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve predictions. Such complementary data, or co-data, provide information on the covariates, such as genomic location or P-values from external studies. We use multiple and various co-data to define possibly overlapping or hierarchically structured groups of covariates. These are then used to estimate adaptive multi-group ridge penalties for generalized linear and Cox models. Available group adaptive methods primarily target settings with few groups, and therefore likely overfit for non-informative, correlated or many groups, and do not account for known structure on group level. To handle these issues, our method combines empirical Bayes estimation of the hyperparameters with an extra level of flexible shrinkage. This renders a uniquely flexible framework as any type of shrinkage can be used on the group level. We describe various types of co-data and propose suitable forms of hypershrinkage. The method is very versatile, as it allows for integration and weighting of multiple co-data sets, inclusion of unpenalized covariates and posterior variable selection. For three cancer genomics applications we demonstrate improvements compared to other models in terms of performance, variable selection stability and validation.


Subject(s)
Genomics, Bayes Theorem, Humans, Proportional Hazards Models
17.
Biostatistics ; 22(4): 723-737, 2021 10 13.
Article in English | MEDLINE | ID: mdl-31886488

ABSTRACT

In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) p-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward. We propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical-variational Bayes framework. Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.


Subject(s)
Genomics, Neoplasms, Bayes Theorem, Humans, Logistic Models, Regression Analysis
18.
Bioinformatics ; 37(14): 2012-2016, 2021 Aug 04.
Article in English | MEDLINE | ID: mdl-32437519

ABSTRACT

MOTIVATION: Machine learning in the biomedical sciences should ideally provide predictive and interpretable models. When predicting outcomes from clinical or molecular features, applied researchers often want to know which features have effects, whether these effects are positive or negative and how strong these effects are. Regression analysis includes this information in the coefficients but typically renders less predictive models than more advanced machine learning techniques. RESULTS: Here, we propose an interpretable meta-learning approach for high-dimensional regression. The elastic net provides a compromise between estimating weak effects for many features and strong effects for some features. It has a mixing parameter to weight between ridge and lasso regularization. Instead of selecting one weighting by tuning, we combine multiple weightings by stacking. We do this in a way that increases predictivity without sacrificing interpretability. AVAILABILITY AND IMPLEMENTATION: The R package starnet is available on GitHub (https://github.com/rauschenberger/starnet) and CRAN (https://CRAN.R-project.org/package=starnet).
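
The role of the mixing parameter can be made concrete with a small sketch. The code below is not the starnet implementation: it fits elastic nets for several mixing values `alpha` by plain coordinate descent and then merges them with hand-picked, hypothetical weights, whereas stacking would learn those weights from cross-validated predictions. Because every base model is linear, a weighted combination of their predictions is again a single linear model, which is one way interpretability could be retained. All function names are assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso part of the update)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha = 1 gives the lasso, alpha = 0 gives ridge.
    Assumes no all-zero columns in X.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except j
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return beta


def predict(X, beta):
    """Linear predictor X b."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def combine(betas, weights):
    """Weighted sum of coefficient vectors, one per mixing value."""
    p = len(betas[0])
    return [sum(w * b[j] for w, b in zip(weights, betas)) for j in range(p)]


# Toy data: only the first feature has an effect.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.0, -2.0, 0.0]

alphas = [0.0, 0.5, 1.0]  # ridge, an equal mix, lasso
betas = [elastic_net(X, y, lam=0.1, alpha=a) for a in alphas]

# Hypothetical meta-weights; stacking would estimate these from held-out predictions.
weights = [1.0 / 3, 1.0 / 3, 1.0 / 3]
stacked_beta = combine(betas, weights)
```

The combined coefficients in `stacked_beta` can be read like ordinary regression coefficients: their signs and magnitudes describe the direction and strength of each feature's effect.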


Subject(s)
Machine Learning, Software, Humans, Regression Analysis
19.
Biom J ; 63(2): 289-304, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33155717

ABSTRACT

In precision medicine, a common problem is drug sensitivity prediction from cancer tissue cell lines. These types of problems entail modelling multivariate drug responses on high-dimensional molecular feature sets in typically >1000 cell lines. The dimensions of the problem require specialised models and estimation methods. In addition, external information on both the drugs and the features is often available. We propose to model the drug responses through a linear regression with shrinkage enforced through a normal inverse Gaussian prior. We let the prior depend on the external information, and estimate the model and external information dependence in an empirical-variational Bayes framework. We demonstrate the usefulness of this model in both a simulated setting and in the publicly available Genomics of Drug Sensitivity in Cancer data.


Subject(s)
Genomics, Pharmaceutical Preparations, Bayes Theorem, Normal Distribution, Precision Medicine
20.
Eur Radiol ; 30(11): 6311-6321, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32500196

ABSTRACT

OBJECTIVES: Head and neck squamous cell carcinoma (HNSCC) shows a remarkable heterogeneity between tumors, which may be captured by a variety of quantitative features extracted from diagnostic images, termed radiomics. The aim of this study was to develop and validate MRI-based radiomic prognostic models in oral and oropharyngeal cancer. MATERIALS AND METHODS: Native T1-weighted images of four independent, retrospective (2005-2013) patient cohorts (n = 102, n = 76, n = 89, and n = 56) were used to delineate primary tumors and to extract 545 quantitative features. Subsequently, redundancy filtering and factor analysis were performed to handle collinearity in the data. Next, radiomic prognostic models were trained and validated to predict overall survival (OS) and relapse-free survival (RFS). Radiomic features were compared to and combined with prognostic models based on standard clinical parameters. Performance was assessed by integrated area under the curve (iAUC). RESULTS: In oral cancer, the radiomic model showed an iAUC of 0.69 (OS) and 0.70 (RFS) in the validation cohort, whereas the iAUC in the oropharyngeal cancer validation cohort was 0.71 (OS) and 0.74 (RFS). By integration of radiomic and clinical variables, the most accurate models were defined (iAUC oral cavity, 0.72 (OS) and 0.74 (RFS); iAUC oropharynx, 0.81 (OS) and 0.78 (RFS)), and these combined models outperformed prognostic models based on standard clinical variables only (p < 0.001). CONCLUSIONS: MRI radiomics is feasible in HNSCC despite the known variability in MRI vendors and acquisition protocols, and radiomic features added information to prognostic models based on clinical parameters. KEY POINTS:
• MRI radiomics can predict overall survival and relapse-free survival in oral and HPV-negative oropharyngeal cancer.
• MRI radiomics provides additional prognostic information to known clinical variables, with the best performance of the combined models.
• Variation in MRI vendors and acquisition protocols did not influence performance of radiomic prognostic models.


Subject(s)
Head and Neck Neoplasms/diagnostic imaging, Magnetic Resonance Imaging, Local Neoplasm Recurrence/diagnostic imaging, Radiometry, Squamous Cell Carcinoma of Head and Neck/diagnostic imaging, Aged, Area Under Curve, Biomarkers, Comorbidity, Disease-Free Survival, Factor Analysis, Female, Humans, Kaplan-Meier Estimate, Male, Middle Aged, Observer Variation, Prognosis, Reproducibility of Results, Retrospective Studies, Treatment Outcome