Results 1 - 20 of 121
1.
Bioinform Adv ; 4(1): vbae021, 2024.
Article in English | MEDLINE | ID: mdl-38456127

ABSTRACT

Summary: In clinical and biomedical research, multiple high-dimensional datasets are now routinely collected from omics and imaging devices. Multivariate methods, such as Canonical Correlation Analysis (CCA), integrate two (or more) datasets to discover and understand underlying biological mechanisms. For an explorative method like CCA, interpretation is key. We present a sparse CCA method based on soft-thresholding that produces near-orthogonal components and allows for browsing over various sparsity levels and for permutation-based hypothesis testing. Our soft-thresholding approach avoids tuning of a penalty parameter. Such tuning is computationally burdensome and may yield unintelligible results. In addition, unlike alternative approaches, our method is less dependent on the initialization. We examined the performance of our approach with simulations and illustrated its use on real cancer genomics data from drug sensitivity screens. Moreover, we compared its performance to Penalized Matrix Analysis (PMA), which is a popular alternative for sparse CCA with a focus on yielding interpretable results. Compared to PMA, our method offers improved interpretability of the results, while not compromising, or even improving, signal discovery. Availability and implementation: The software and simulation framework are available at https://github.com/nuria-sv/toscca.
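
The soft-thresholding idea mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the toscca software: the function names and the rule of picking the threshold from a desired number of nonzero weights are assumptions made here for illustration only.

```python
def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`; values within the threshold become 0."""
    out = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


def keep_top_k(values, k):
    """Hypothetical helper: use the (k+1)-th largest absolute value as the
    threshold, so that at most k entries remain nonzero."""
    if k >= len(values):
        return [float(v) for v in values]
    ranked = sorted((abs(v) for v in values), reverse=True)
    return soft_threshold(values, ranked[k])


# Illustrative canonical weights (made-up numbers).
weights = [0.9, -0.1, 0.4, -0.7, 0.05]
sparse = keep_top_k(weights, 2)
```

Specifying the number of retained weights directly, rather than a penalty value, is one way to browse sparsity levels without a tuning grid; whether the published method does exactly this is not stated in the abstract.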

2.
Diabetologia ; 67(5): 885-894, 2024 May.
Article in English | MEDLINE | ID: mdl-38374450

ABSTRACT

AIMS/HYPOTHESIS: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value. METHODS: In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights) or penalised clinical variables, or without clinical variables. Model development was performed in one cohort and the model was applied in a second cohort. Model performance was evaluated using Harrell's C statistic. RESULTS: Of the 585 individuals from the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0-11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3-11.8 years). Overall, the clinical variables and proteins were selected in the different models most often, followed by the metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models (age, sex, BMI, HbA1c) including only clinical variables performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]). A more extensive model including HDL-cholesterol and C-peptide performed better in both cohorts (DCS, C 0.74 [95% CI 0.67, 0.81]; GoDARTS, C 0.73 [95% CI 0.69, 0.77]). Two proteins, lactadherin and proto-oncogene tyrosine-protein kinase receptor, were most consistently selected and slightly improved model performance. CONCLUSIONS/INTERPRETATION: Using machine learning approaches, we show that insulin requirement risk can be moderately well predicted by predominantly clinical variables. Inclusion of molecular markers improves the prognostic performance beyond that of clinical variables by up to 5%. Such prognostic models could be useful for identifying people with diabetes at high risk of progressing quickly to treatment intensification. DATA AVAILABILITY: Summary statistics of lipidomic, proteomic and metabolomic data are available from a Shiny dashboard at https://rhapdata-app.vital-it.ch.
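
For readers unfamiliar with the C statistic used to evaluate these models, the following is a simplified sketch of how a concordance index for time-to-event data can be computed. The data are made up, tied event times are ignored, and the function name is an assumption; it is not the study's code.

```python
def harrell_c(times, events, risk_scores):
    """Fraction of comparable pairs in which the subject with the earlier
    observed event also has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (years), event indicators (1 = insulin
# required, 0 = censored) and model risk scores.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_statistic = harrell_c(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.71 to 0.74 should be read.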


Subjects
Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/metabolism , Prospective Studies , C-Peptide , Proteomics , Insulin/therapeutic use , Biomarkers , Machine Learning , Cholesterol
3.
Bioinformatics ; 39(12), 2023 12 01.
Article in English | MEDLINE | ID: mdl-37951587

ABSTRACT

MOTIVATION: In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provides insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. RESULTS: We propose an approach for integrating multiple sources of such prior information into penalized regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in the R package transreg (https://github.com/lcsb-bds/transreg, https://cran.r-project.org/package=transreg).


Subjects
Software , Computer Simulation , Implementation Science
4.
J Comput Graph Stat ; 32(3): 950-960, 2023.
Article in English | MEDLINE | ID: mdl-38013849

ABSTRACT

Elastic net penalization is widely used in high-dimensional prediction and variable selection settings. Auxiliary information on the variables, for example, groups of variables, is often available. Group-adaptive elastic net penalization exploits this information to potentially improve performance by estimating group penalties, thereby penalizing important groups of variables less than other groups. Estimating these group penalties is, however, hard due to the high dimension of the data. Existing methods are computationally expensive or not generic in the type of response. Here we present a fast method for estimation of group-adaptive elastic net penalties for generalized linear models. We first derive a low-dimensional representation of the Taylor approximation of the marginal likelihood for group-adaptive ridge penalties, to efficiently estimate these penalties. Then we show by using asymptotic normality of the linear predictors that this marginal likelihood approximates that of elastic net models. The ridge group penalties are then transformed to elastic net group penalties by matching the ridge prior variance to the elastic net prior variance as a function of the group penalties. The method allows for overlapping groups and unpenalized variables, and is easily extended to other penalties. In a model-based simulation study and two cancer genomics applications we demonstrate a substantially decreased computation time and improved or matching performance compared to other methods. Supplementary materials for this article are available online.

5.
iScience ; 26(8): 107331, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37539043

ABSTRACT

To understand the clinical significance of the tumor microenvironment (TME), it is essential to study the interactions between malignant and non-malignant cells in clinical specimens. Here, we established a computational framework for a multiplex imaging system to comprehensively characterize spatial contexts of the TME at multiple scales, including close and long-distance spatial interactions between cell type pairs. We applied this framework to a total of 1,393 multiplex imaging data newly generated from 88 primary central nervous system lymphomas with complete follow-up data and identified significant prognostic subgroups mainly shaped by the spatial context. A supervised analysis confirmed a significant contribution of spatial context in predicting patient survival. In particular, we found an opposite prognostic value of macrophage infiltration depending on its proximity to specific cell types. Altogether, we provide a comprehensive framework to analyze spatial cellular interaction that can be broadly applied to other technologies and tumor contexts.

6.
BMC Bioinformatics ; 24(1): 172, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101151

ABSTRACT

BACKGROUND: High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, providing complementary data not on the samples, but on the variables. We consider adaptive ridge penalised generalised linear and Cox models, in which the variable-specific ridge penalties are adapted to the co-data to give a priori more weight to more important variables. The R-package ecpc originally accommodated various and possibly multiple co-data sources, including categorical co-data, i.e. groups of variables, and continuous co-data. Continuous co-data, however, were handled by adaptive discretisation, potentially inefficiently modelling and losing information. As continuous co-data such as external p values or correlations often arise in practice, more generic co-data models are needed. RESULTS: Here, we present an extension to the method and software for generic co-data models, particularly for continuous co-data. At the basis lies a classical linear regression model, regressing prior variance weights on the co-data. Co-data variables are then estimated with empirical Bayes moment estimation. After placing the estimation procedure in the classical regression framework, extension to generalised additive and shape constrained co-data models is straightforward. Besides, we show how ridge penalties may be transformed to elastic net penalties. In simulation studies we first compare various co-data models for continuous co-data from the extension to the original method. Secondly, we compare variable selection performance to other variable selection methods. The extension is faster than the original method and shows improved prediction and variable selection performance for non-linear co-data relations. Moreover, we demonstrate use of the package in several genomics examples throughout the paper. 
CONCLUSIONS: The R-package ecpc accommodates linear, generalised additive and shape constrained additive co-data models for the purpose of improved high-dimensional prediction and variable selection. The extended version of the package as presented here (version number 3.1.1 and higher) is available on CRAN (https://cran.r-project.org/web/packages/ecpc/).
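
The idea of regressing prior variance weights on continuous co-data and then deriving variable-specific ridge penalties can be sketched as follows. This is an illustration under stated assumptions, not the ecpc implementation: the linear co-data model, its coefficients, and the function names are hypothetical, and the only standard fact used is that a ridge penalty corresponds to a Gaussian prior whose variance is the noise variance divided by the penalty.

```python
def prior_variances(co_data, intercept, slope, floor=1e-8):
    """Hypothetical linear co-data model: the prior variance of each variable
    is a linear function of its co-data value, floored to stay positive."""
    return [max(intercept + slope * z, floor) for z in co_data]


def ridge_penalties(variances, noise_variance=1.0):
    """A larger prior variance (a variable expected to matter more) gives a
    smaller ridge penalty: penalty = noise variance / prior variance."""
    return [noise_variance / v for v in variances]


# Made-up continuous co-data, e.g. a transformed external p-value per variable.
co_data = [0.0, 1.0, 3.0]
variances = prior_variances(co_data, intercept=0.5, slope=0.5)
penalties = ridge_penalties(variances)
```

In the method described in the abstract, the co-data model coefficients are estimated by empirical Bayes moment estimation; here they are simply fixed by hand to keep the sketch short.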


Subjects
Genomics , Software , Bayes Theorem , Computer Simulation , Linear Models
7.
Oral Oncol ; 137: 106307, 2023 02.
Article in English | MEDLINE | ID: mdl-36657208

ABSTRACT

OBJECTIVES: Human papillomavirus (HPV)-positive oropharyngeal squamous cell carcinoma (OPSCC) differs biologically and clinically from HPV-negative OPSCC and has a better prognosis. This study aims to analyze the value of magnetic resonance imaging (MRI)-based radiomics in predicting HPV status in OPSCC and to develop a prognostic model in OPSCC including HPV status and MRI-based radiomics. MATERIALS AND METHODS: Manual delineation of 249 primary OPSCCs (91 HPV-positive and 159 HPV-negative) on pretreatment native T1-weighted MRIs was performed and used to extract 498 radiomic features per delineation. A logistic regression (LR) model and a random forest (RF) model were developed using univariate feature selection. Additionally, factor analysis was performed, and the derived factors were combined with clinical data in a predictive model to assess the performance on predicting HPV status. Factors were also combined with clinical parameters in a multivariable survival regression analysis. RESULTS: Both feature-based LR and RF models performed with an AUC of 0.79 in prediction of HPV status. Fourteen of the twenty most significant features were similar in both models, mainly concerning tumor sphericity, intensity variation, compactness, and tumor diameter. The model combining clinical data and radiomic factors (AUC = 0.89) outperformed the radiomics-only model in predicting OPSCC HPV status. Overall survival prediction was most accurate using the combination of clinical parameters and radiomic factors (C-index = 0.72). CONCLUSION: Predictive models based on MR-radiomic features were able to predict HPV status with sufficient performance, supporting the role of MRI-based radiomics as a potential imaging biomarker. Survival prediction improved by combining clinical features with MRI-based radiomics.


Subjects
Head and Neck Neoplasms , Oropharyngeal Neoplasms , Papillomavirus Infections , Humans , Squamous Cell Carcinoma of Head and Neck , Human Papillomavirus Viruses , Oropharyngeal Neoplasms/pathology , Papillomavirus Infections/complications , Papillomavirus Infections/diagnostic imaging , Papillomavirus Infections/pathology , Prognosis , Magnetic Resonance Imaging , Retrospective Studies , Papillomaviridae
8.
Neuroradiology ; 65(4): 855-863, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36574026

ABSTRACT

PURPOSE: Patients with vanishing white matter (VWM) experience unremitting chronic neurological decline and stress-provoked episodes of rapid, partially reversible decline. Cerebral white matter abnormalities are progressive, without improvement, and are therefore unlikely to be related to the episodes. We determined which radiological findings are related to episodic decline. METHODS: MRI scans of VWM patients were retrospectively analyzed. Patients were grouped into A (never episodes) and B (episodes). Signal abnormalities outside the cerebral white matter were rated as absent, mild, or severe. A sum score was developed with abnormalities only seen in group B. The temporal relationship between signal abnormalities and episodes was determined by subdividing scans into those made before, less than 3 months after, and more than 3 months after onset of an episode. RESULTS: Five hundred forty-three examinations of 298 patients were analyzed. Mild and severe signal abnormalities in the caudate nucleus, putamen, globus pallidus, thalamus, midbrain, medulla oblongata, and severe signal abnormalities in the pons were only seen in group B. The sum score, constructed with these abnormalities, depended on the timing of the scan (χ²(2, 400) = 22.8; p < .001): it was least often abnormal before, most often abnormal with the highest value shortly after, and lower longer than 3 months after an episode. CONCLUSION: In VWM, signal abnormalities in brainstem, thalamus, and basal ganglia are related to episodic decline and can improve. Knowledge of the natural MRI history in VWM is important for clinical interpretation of MRI findings and crucial in therapy trials.


Subjects
Leukoencephalopathies , White Matter , Leukoencephalopathies/diagnostic imaging , Leukoencephalopathies/pathology , White Matter/diagnostic imaging , White Matter/pathology , Humans , Male , Female , Infant, Newborn , Infant , Child, Preschool , Child , Adolescent , Young Adult , Adult , Middle Aged , Magnetic Resonance Imaging
9.
Eur Radiol ; 33(4): 2850-2860, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36460924

ABSTRACT

OBJECTIVES: To externally validate a pre-treatment MR-based radiomics model predictive of locoregional control in oropharyngeal squamous cell carcinoma (OPSCC) and to assess the impact of differences between datasets on the predictive performance. METHODS: Radiomic features, as defined in our previously published radiomics model, were extracted from the primary tumor volumes of 157 OPSCC patients in a different institute. The developed radiomics model was validated using this cohort. Additionally, the effects of parameters influencing prediction performance, such as patient subgroups, MRI acquisition, and post-processing steps, were investigated. For this analysis, matched subgroups (based on human papillomavirus (HPV) status of the tumor, T-stage, and tumor subsite) and a subgroup with only patients with 4-mm slice thickness were studied. The influence of harmonization techniques (ComBat harmonization, quantile normalization) and the impact of feature stability across observers and centers were also studied. Model performances were assessed by area under the curve (AUC), sensitivity, and specificity. RESULTS: Performance of the published model (AUC/sensitivity/specificity: 0.74/0.75/0.60) drops when applied on the validation cohort (AUC/sensitivity/specificity: 0.64/0.68/0.60). The performance of the full validation cohort improves slightly when the model is validated using a patient group with comparable HPV status of the tumor (AUC/sensitivity/specificity: 0.68/0.74/0.60), using patients acquired with a slice thickness of 4 mm (AUC/sensitivity/specificity: 0.67/0.73/0.57), or when quantile harmonization was performed (AUC/sensitivity/specificity: 0.66/0.69/0.60). CONCLUSION: The previously published model shows its generalizability and can be applied on data acquired from different vendors and protocols. Harmonization techniques and subgroup definition influence performance of predictive radiomics models.
KEY POINTS: • Radiomics, a noninvasive quantitative image analysis technique, can support the radiologist by enhancing diagnostic accuracy and/or treatment decision-making. • A previously published model shows its generalizability and could be applied on data acquired from different vendors and protocols.
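
The three performance measures reported in this abstract (AUC, sensitivity, specificity) can be computed as in the short sketch below. The labels, scores, and the 0.5 decision threshold are made up for illustration and are not taken from the study.

```python
def auc(labels, scores):
    """Probability that a randomly chosen positive case scores higher than a
    randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classify(scores, threshold):
    """Turn continuous scores into 0/1 predictions at a chosen cut-off."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes (1 = event) and model scores for five patients.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.3, 0.5, 0.2]
model_auc = auc(labels, scores)
sens, spec = sensitivity_specificity(labels, classify(scores, 0.5))
```

AUC does not depend on the cut-off, whereas sensitivity and specificity do, which is one reason the three numbers can move differently between cohorts.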


Subjects
Oropharyngeal Neoplasms , Papillomavirus Infections , Humans , Magnetic Resonance Imaging/methods , Sensitivity and Specificity , Oropharyngeal Neoplasms/diagnostic imaging , Retrospective Studies
10.
Int J Mol Sci ; 23(19), 2022 Sep 22.
Article in English | MEDLINE | ID: mdl-36232432

ABSTRACT

Patients with inflammatory bowel disease (IBD) produce enhanced immunoglobulin A (IgA) against the microbiota compared to healthy individuals, which has been correlated with disease severity. Since IgA complexes can potently activate myeloid cells via the IgA receptor FcαRI (CD89), excessive IgA production may contribute to IBD pathology. However, the cellular mechanisms that contribute to dysregulated IgA production in IBD are poorly understood. Here, we demonstrate that intestinal FcαRI-expressing myeloid cells (i.e., monocytes and neutrophils) are in close contact with B lymphocytes in the lamina propria of IBD patients. Furthermore, stimulation of FcαRI on monocytes triggered production of cytokines and chemokines that regulate B-cell differentiation and migration, including interleukin-6 (IL6), interleukin-10 (IL10), tumour necrosis factor-α (TNFα), a proliferation-inducing ligand (APRIL), and chemokine ligand-20 (CCL20). In vitro, these cytokines promoted IgA isotype switching in human B cells. Moreover, when naïve B lymphocytes were cultured in vitro in the presence of FcαRI-stimulated monocytes, enhanced IgA isotype switching was observed compared to B cells that were cultured with non-stimulated monocytes. Taken together, FcαRI-activated monocytes produced a cocktail of cytokines, as well as chemokines, that stimulated IgA switching in B cells, and close contact between B cells and myeloid cells was observed in the colons of IBD patients. As such, we hypothesize that, in IBD, IgA complexes activate myeloid cells, which in turn can result in excessive IgA production, likely contributing to disease pathology. Interrupting this loop may, therefore, represent a novel therapeutic strategy.


Subjects
Inflammatory Bowel Diseases , Interleukin-10 , B-Lymphocytes , Cytokines , Humans , Immunoglobulin A , Immunoglobulin Class Switching , Immunoglobulin Isotypes , Interleukin-6 , Ligands , Monocytes , Tumor Necrosis Factor-alpha
11.
Biom J ; 64(7): 1289-1306, 2022 10.
Article in English | MEDLINE | ID: mdl-35730912

ABSTRACT

The features in a high-dimensional biomedical prediction problem are often well described by low-dimensional latent variables (or factors). We use this to include unlabeled features and additional information on the features when building a prediction model. Such additional feature information is often available in biomedical applications. Examples are annotation of genes, metabolites, or p-values from a previous study. We employ a Bayesian factor regression model that jointly models the features and the outcome using Gaussian latent variables. We fit the model using a computationally efficient variational Bayes method, which scales to high dimensions. We use the extra information to set up a prior model for the features in terms of hyperparameters, which are then estimated through empirical Bayes. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.


Subjects
Algorithms , Research Design , Bayes Theorem , Normal Distribution
12.
Epigenetics ; 17(10): 1057-1069, 2022 10.
Article in English | MEDLINE | ID: mdl-34605346

ABSTRACT

High levels of methylated DNA in urine represent an emerging biomarker for non-small cell lung cancer (NSCLC) detection and are the subject of ongoing research. This study aimed to investigate the circadian variation of urinary cell-free DNA (cfDNA) abundance and methylation levels of cancer-associated genes in NSCLC patients. In this prospective study of 23 metastatic NSCLC patients with active disease, patients were asked to collect six urine samples during the morning, afternoon, and evening of two subsequent days. Urinary cfDNA concentrations and methylation levels of CDO1, SOX17, and TAC1 were measured at each time point. Circadian variation and between- and within-subject variability were assessed using linear mixed models. Variability was estimated using the Intraclass Correlation Coefficient (ICC), representing reproducibility. No clear circadian patterns could be recognized for cfDNA concentrations or methylation levels across the different sampling time points. Significantly lower cfDNA concentrations were found in males (p=0.034). For cfDNA levels, the between- and within-subject variability were comparable, rendering an ICC of 0.49. For the methylation markers, ICCs varied considerably, ranging from 0.14 to 0.74. Test reproducibility could be improved by collecting multiple samples per patient. In conclusion, there is no preferred collection time for NSCLC detection in urine using methylation markers, but single measurements should be interpreted carefully, and serial sampling may increase test performance. This study contributes to the limited understanding of cfDNA dynamics in urine and the continued interest in urine-based liquid biopsies for cancer diagnostics.
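
The link between comparable between- and within-subject variability, an ICC near 0.5, and the benefit of collecting multiple samples can be made concrete with a small sketch. The variance components below are hypothetical values chosen only so that the single-sample ICC equals the reported 0.49, and the sketch assumes that repeated samples from one patient are averaged.

```python
def icc(between_var, within_var, n_samples=1):
    """Intraclass correlation of the mean of `n_samples` measurements per
    subject: averaging divides the within-subject variance by n_samples."""
    return between_var / (between_var + within_var / n_samples)


# Hypothetical variance components (arbitrary units).
between, within = 0.49, 0.51
single_sample = icc(between, within)
three_samples = icc(between, within, n_samples=3)
```

Under these assumptions, averaging three urine samples would raise reproducibility from about 0.49 to about 0.74, which illustrates why serial sampling may increase test performance.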


Subjects
Carcinoma, Non-Small-Cell Lung , Cell-Free Nucleic Acids , Lung Neoplasms , Biomarkers, Tumor/genetics , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Cell-Free Nucleic Acids/genetics , DNA , DNA Methylation , Humans , Lung Neoplasms/genetics , Male , Prospective Studies , Reproducibility of Results
13.
Proc Natl Acad Sci U S A ; 118(49), 2021 12 07.
Article in English | MEDLINE | ID: mdl-34873056

ABSTRACT

Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors. TRANSACT performs favorably compared to four competing approaches, including two deep learning approaches, on a set of 23 drug prediction challenges on The Cancer Genome Atlas and 226 metastatic tumors from the Hartwig Medical Foundation. We demonstrate that response predictions deliver a robust performance for a number of therapies of high clinical importance: platinum-based chemotherapies, gemcitabine, and paclitaxel. In contrast to other approaches, we demonstrate the interpretability of the TRANSACT predictors by correctly identifying known biomarkers of targeted therapies, and we propose potential mechanisms that mediate the resistance to two chemotherapeutic agents.


Subjects
Drug Screening Assays, Antitumor/methods , Gene Expression Profiling/methods , Animals , Antineoplastic Agents/therapeutic use , Biomarkers, Pharmacological/metabolism , Cell Line, Tumor/drug effects , Deep Learning , Disease Models, Animal , Forecasting/methods , Heterografts , Humans , Models, Theoretical
14.
Cells ; 10(11), 2021 10 29.
Article in English | MEDLINE | ID: mdl-34831173

ABSTRACT

Hypertrophic Cardiomyopathy (HCM) is a common inherited heart disease with poor risk prediction due to incomplete penetrance and a lack of clear genotype-phenotype correlations. Advanced imaging techniques have shown altered myocardial energetics already in preclinical gene variant carriers. To determine whether disturbed myocardial energetics with the potential to serve as biomarkers are also reflected in the serum metabolome, we analyzed the serum metabolome of asymptomatic carriers in comparison to healthy controls and obstructive HCM patients (HOCM). We performed non-quantitative direct-infusion high-resolution mass spectrometry-based untargeted metabolomics on serum from fasted asymptomatic gene variant carriers, symptomatic HOCM patients and healthy controls (n = 31, 14 and 9, respectively). Biomarker panels that discriminated the groups were identified by performing multivariate modeling with gradient-boosting classifiers. For all three group-wise comparisons we identified a panel of 30 serum metabolites that best discriminated the groups. These metabolite panels performed equally well as advanced cardiac imaging modalities in distinguishing the groups. Seven metabolites were found to be predictive in two different comparisons and may play an important role in defining the disease stage. This study reveals unique metabolic signatures in serum of preclinical carriers and HOCM patients that may potentially be used for HCM risk stratification and precision therapeutics.


Subjects
Cardiomyopathy, Hypertrophic/metabolism , Metabolomics , Adult , Energy Metabolism , Female , Humans , Male , Metabolome , Middle Aged , Multivariate Analysis , Mutation/genetics , Sarcomeres/genetics
15.
Nat Commun ; 12(1): 6106, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34671028

ABSTRACT

Deconvolution of bulk gene expression profiles into the cellular components is pivotal to portraying tissue's complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types that can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle > 20 types of cells due to the efficient variational inference. Throughout an intensive evaluation with > 700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, in particular, to reconstruct gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.


Subjects
Gene Expression Profiling/methods , Bayes Theorem , Computer Simulation , Humans , Leukocytes, Mononuclear/cytology , Leukocytes, Mononuclear/metabolism , Machine Learning , Models, Statistical , Neoplasms/genetics , Neoplasms/pathology , Sequence Analysis, RNA , Single-Cell Analysis , Transcriptome/genetics , Workflow
16.
Mol Oncol ; 15(12): 3348-3362, 2021 12.
Article in English | MEDLINE | ID: mdl-34510716

ABSTRACT

Consensus molecular subtypes (CMSs) can guide precision treatment of colorectal cancer (CRC). We aim to identify methylation markers to distinguish between CMS2 and CMS3 in patients with CRC, for which an easy test is currently lacking. To this aim, fresh-frozen tumor tissue of 239 patients with stage I-III CRC was analyzed. Methylation profiles were obtained using the Infinium HumanMethylation450 BeadChip. We performed adaptive group-regularized logistic ridge regression with post hoc group-weighted elastic net marker selection to build prediction models for classification of CMS2 and CMS3. The Cancer Genome Atlas (TCGA) data were used for validation. Group regularization of the probes was done based on their location either relative to a CpG island or relative to a gene present in the CMS classifier, resulting in two different prediction models and subsequently different marker panels. For both panels, even when using only five markers, accuracies were > 90% in our cohort and in the TCGA validation set. Our methylation marker panel accurately distinguishes between CMS2 and CMS3. This enables development of a targeted assay to provide a robust and clinically relevant classification tool for CRC patients.


Subjects
Colorectal Neoplasms , DNA Methylation , Biomarkers, Tumor/genetics , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , CpG Islands/genetics , DNA Methylation/genetics , Humans
17.
Stat Med ; 40(26): 5910-5925, 2021 11 20.
Article in English | MEDLINE | ID: mdl-34438466

ABSTRACT

Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data is high-dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve predictions. Such complementary data, or co-data, provide information on the covariates, such as genomic location or P-values from external studies. We use multiple and various co-data to define possibly overlapping or hierarchically structured groups of covariates. These are then used to estimate adaptive multi-group ridge penalties for generalized linear and Cox models. Available group adaptive methods primarily target settings with few groups, and therefore likely overfit for non-informative, correlated or many groups, and do not account for known structure on group level. To handle these issues, our method combines empirical Bayes estimation of the hyperparameters with an extra level of flexible shrinkage. This renders a uniquely flexible framework as any type of shrinkage can be used on the group level. We describe various types of co-data and propose suitable forms of hypershrinkage. The method is very versatile, as it allows for integration and weighting of multiple co-data sets, inclusion of unpenalized covariates and posterior variable selection. For three cancer genomics applications we demonstrate improvements compared to other models in terms of performance, variable selection stability and validation.


Subjects
Genomics , Bayes Theorem , Humans , Proportional Hazards Models
18.
Biostatistics ; 22(4): 723-737, 2021 10 13.
Article in English | MEDLINE | ID: mdl-31886488

ABSTRACT

In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) p-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward. We propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical-variational Bayes framework. Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.


Subjects
Genomics , Neoplasms , Bayes Theorem , Humans , Logistic Models , Regression Analysis
19.
Bioinformatics ; 37(14): 2012-2016, 2021 08 04.
Article in English | MEDLINE | ID: mdl-32437519

ABSTRACT

MOTIVATION: Machine learning in the biomedical sciences should ideally provide predictive and interpretable models. When predicting outcomes from clinical or molecular features, applied researchers often want to know which features have effects, whether these effects are positive or negative and how strong these effects are. Regression analysis includes this information in the coefficients but typically renders less predictive models than more advanced machine learning techniques. RESULTS: Here, we propose an interpretable meta-learning approach for high-dimensional regression. The elastic net provides a compromise between estimating weak effects for many features and strong effects for some features. It has a mixing parameter to weight between ridge and lasso regularization. Instead of selecting one weighting by tuning, we combine multiple weightings by stacking. We do this in a way that increases predictivity without sacrificing interpretability. AVAILABILITY AND IMPLEMENTATION: The R package starnet is available on GitHub (https://github.com/rauschenberger/starnet) and CRAN (https://CRAN.R-project.org/package=starnet).
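
The role of the elastic net mixing parameter, and the idea of combining several mixing values instead of selecting one, can be sketched as follows. This is not the starnet implementation: the penalty uses the common glmnet-style parametrisation, and the fixed stacking weights are a hypothetical stand-in for weights that a meta-learner would estimate.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm);
    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def stack_predictions(predictions_by_alpha, weights):
    """Weighted combination of the predictions from base models fitted with
    different alpha values (weights are assumed given here)."""
    n = len(next(iter(predictions_by_alpha.values())))
    return [
        sum(weights[a] * preds[i] for a, preds in predictions_by_alpha.items())
        for i in range(n)
    ]


# Made-up coefficients and base-model predictions for two samples.
coefs = [1.0, -2.0]
lasso_pen = elastic_net_penalty(coefs, lam=1.0, alpha=1.0)
ridge_pen = elastic_net_penalty(coefs, lam=1.0, alpha=0.0)
stacked = stack_predictions({0.0: [1.0, 2.0], 1.0: [3.0, 4.0]}, {0.0: 0.5, 1.0: 0.5})
```

Because each base model is linear, a weighted combination of their predictions is again a linear model in the features, which is consistent with the abstract's claim that stacking need not sacrifice interpretability.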


Subjects
Machine Learning , Software , Humans , Regression Analysis
20.
Biom J ; 63(2): 289-304, 2021 02.
Article in English | MEDLINE | ID: mdl-33155717

ABSTRACT

In precision medicine, a common problem is drug sensitivity prediction from cancer tissue cell lines. These types of problems entail modelling multivariate drug responses on high-dimensional molecular feature sets in typically >1000 cell lines. The dimensions of the problem require specialised models and estimation methods. In addition, external information on both the drugs and the features is often available. We propose to model the drug responses through a linear regression with shrinkage enforced through a normal inverse Gaussian prior. We let the prior depend on the external information, and estimate the model and external information dependence in an empirical-variational Bayes framework. We demonstrate the usefulness of this model in both a simulated setting and in the publicly available Genomics of Drug Sensitivity in Cancer data.


Subjects
Genomics , Pharmaceutical Preparations , Bayes Theorem , Normal Distribution , Precision Medicine
...