Search | VHL Regional Portal

Identification of prognostic and predictive biomarkers in high-dimensional data with PPLasso.

Zhu, Wencan; Lévy-Leduc, Céline; Ternès, Nils.

BMC Bioinformatics ; 24(1): 25, 2023 Jan 23.

Article in English | MEDLINE | ID: mdl-36690931

ABSTRACT

In clinical trials, identification of prognostic and predictive biomarkers has become essential to precision medicine. Prognostic biomarkers can be useful for the prevention of the occurrence of the disease, and predictive biomarkers can be used to identify patients with potential benefit from the treatment. Previous research has mainly focused on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso, which integrates prognostic and predictive effects into one statistical model. PPLasso also takes into account the correlations between biomarkers that can alter the biomarker selection accuracy. Our method consists of transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso and other extensions on both prognostic and predictive biomarker identification in various scenarios. Finally, our method is applied to publicly available transcriptomic and proteomic data.

Subject(s)

Biomarkers, Tumor; Proteomics; Humans; Prognosis; Biomarkers; Models, Statistical; Genomics

A variable selection approach for highly correlated predictors in high-dimensional genomic data.

Zhu, Wencan; Lévy-Leduc, Céline; Ternès, Nils.

Bioinformatics ; 37(16): 2238-2244, 2021 Aug 25.

Article in English | MEDLINE | ID: mdl-33617644

ABSTRACT

MOTIVATION: In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models. However, these methods can fail in highly correlated settings. RESULTS: We propose a novel variable selection approach called WLasso, taking these correlations into account. It consists of rewriting the initial high-dimensional linear model to remove the correlation between the biomarkers (predictors) and applying the generalized Lasso criterion. The performance of WLasso is assessed using synthetic data in several scenarios and compared with recent alternative approaches. The results show that when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional frameworks. The method is also illustrated on publicly available gene expression data in breast cancer. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in the WLasso R package, which is available from the Comprehensive R Archive Network (CRAN). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
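
The decorrelation step described in the WLasso abstract above can be illustrated with a minimal Python sketch. This is not the WLasso R package: it assumes the predictor covariance matrix Sigma is known and has the hypothetical form Sigma[i, j] = rho^|i - j|, and every name in it (sym_power, X_tilde, beta_tilde) is illustrative. Writing X_tilde = X Sigma^(-1/2) and beta_tilde = Sigma^(1/2) beta leaves the linear predictor unchanged while making the transformed predictors uncorrelated. The generalized Lasso fit itself is omitted; it would presumably penalize Sigma^(-1/2) beta_tilde, which equals beta, so that sparsity is still imposed on the original coefficients.

```python
import numpy as np


def sym_power(sigma, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    return eigvecs @ np.diag(eigvals ** power) @ eigvecs.T


rng = np.random.default_rng(0)
n, p, rho = 30, 100, 0.9  # fewer samples than predictors (high-dimensional)

# Assumed (hypothetical) covariance of the predictors: Sigma[i, j] = rho^|i-j|
idx = np.arange(p)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])

sigma_sqrt = sym_power(sigma, 0.5)
sigma_inv_sqrt = sym_power(sigma, -0.5)

# Correlated predictors: each row of X has covariance Sigma
X = rng.normal(size=(n, p)) @ sigma_sqrt

# Sparse coefficient vector: only the first five predictors are active
beta = np.zeros(p)
beta[:5] = 1.0

# Rewritten model: X beta == X_tilde beta_tilde
X_tilde = X @ sigma_inv_sqrt
beta_tilde = sigma_sqrt @ beta

# Covariance of a row of X_tilde is Sigma^(-1/2) Sigma Sigma^(-1/2) = identity
whitened_cov = sigma_inv_sqrt @ sigma @ sigma_inv_sqrt
```

In practice Sigma is unknown and has to be estimated from the data, which this sketch does not attempt.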

A Quantitative Multivariate Model of Human Dendritic Cell-T Helper Cell Communication.

Grandclaudon, Maximilien; Perrot-Dockès, Marie; Trichot, Coline; Karpf, Léa; Abouzid, Omar; Chauvin, Camille; Sirven, Philémon; Abou-Jaoudé, Wassim; Berger, Frédérique; Hupé, Philippe; Thieffry, Denis; Sansonnet, Laure; Chiquet, Julien; Lévy-Leduc, Céline; Soumelis, Vassili.

Cell ; 179(2): 432-447.e21, 2019 Oct 03.

Article in English | MEDLINE | ID: mdl-31585082

ABSTRACT

Cell-cell communication involves a large number of molecular signals that function as words of a complex language whose grammar remains mostly unknown. Here, we describe an integrative approach involving (1) protein-level measurement of multiple communication signals coupled to output responses in receiving cells and (2) mathematical modeling to uncover input-output relationships and interactions between signals. Using human dendritic cell (DC)-T helper (Th) cell communication as a model, we measured 36 DC-derived signals and 17 Th cytokines broadly covering Th diversity in 428 observations. We developed a data-driven, computationally validated model capturing 56 already described and 290 potentially novel mechanisms of Th cell specification. By predicting context-dependent behaviors, we demonstrate a new function for IL-12p70 as an inducer of Th17 in an IL-1 signaling context. This work provides a unique resource to decipher the complex combinatorial rules governing DC-Th cell communication and guide their manipulation for vaccine design and immunotherapies.

Subject(s)

Cell Communication/immunology; Dendritic Cells/immunology; Interleukin-12/physiology; Th17 Cells/immunology; Adolescent; Adult; Aged; Cells, Cultured; Coculture Techniques; Healthy Volunteers; Humans; Interleukin-1/metabolism; Middle Aged; Models, Biological; Young Adult

A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data.

Perrot-Dockès, Marie; Lévy-Leduc, Céline; Chiquet, Julien; Sansonnet, Laure; Brégère, Margaux; Étienne, Marie-Pierre; Robin, Stéphane; Genta-Jouve, Grégory.

Stat Appl Genet Mol Biol ; 17(5), 2018 Sep 08.

Article in English | MEDLINE | ID: mdl-30205662

ABSTRACT

Omic data are characterized by the presence of strong dependence structures that result either from data acquisition or from some underlying biological processes. Applying statistical procedures that do not adjust the variable selection step to the dependence pattern may result in a loss of power and the selection of spurious variables. The goal of this paper is to propose a variable selection procedure within the multivariate linear model framework that accounts for the dependence between the multiple responses. We focus on a specific type of dependence, which consists of assuming that the responses of a given individual can be modelled as a time series. We propose a novel Lasso-based approach within the framework of the multivariate linear model that takes the dependence structure into account by using different types of stationary-process covariance structures for the random error matrix. Our numerical experiments show that including the estimation of the covariance matrix of the random error matrix in the Lasso criterion dramatically improves the variable selection performance. Our approach is successfully applied to an untargeted LC-MS (Liquid Chromatography-Mass Spectrometry) data set made of African copal samples. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).

Subject(s)

Biomarkers/metabolism; Chromatography, Liquid/methods; Data Interpretation, Statistical; Metabolomics/methods; Tandem Mass Spectrometry/methods; Humans; Linear Models; Metabolomics/statistics & numerical data
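
The idea in the abstract above of folding the error covariance into a Lasso criterion can be sketched in Python as follows. This is an illustration under stated assumptions, not the MultiVarSel package: the model is taken to be Y = X B + E with each row of E following a hypothetical AR(1) process of known parameter phi, and all variable names are invented for the example. Multiplying the model on the right by W = Sigma^(-1/2) whitens the errors, and the identity vec(X B W) = (W^T kron X) vec(B) turns the multivariate model into a single ordinary regression on which a standard Lasso solver could then be run.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 6, 3, 4  # samples, predictors, responses per sample

X = rng.normal(size=(n, p))  # design matrix
B = rng.normal(size=(p, q))  # coefficient matrix (dense here, for illustration)

# Assumed (hypothetical) AR(1) covariance across the q responses of one sample
phi = 0.7
lags = np.abs(np.arange(q)[:, None] - np.arange(q)[None, :])
sigma = phi ** lags / (1 - phi ** 2)

# W = Sigma^(-1/2), computed from the eigendecomposition of Sigma
eigvals, eigvecs = np.linalg.eigh(sigma)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Noise-free part of the whitened model: Y W = X B W + E W
signal_whitened = X @ B @ W

# Vectorized form (column-major vec): vec(X B W) = (W^T kron X) vec(B)
design = np.kron(W.T, X)  # shape (n * q, p * q)
vec_B = B.flatten(order="F")
vec_signal = signal_whitened.flatten(order="F")
```

Here phi is fixed by hand; the abstract indicates that the covariance of the error matrix is estimated, which this sketch leaves out.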

Two-dimensional segmentation for analyzing Hi-C data.

Lévy-Leduc, Céline; Delattre, M; Mary-Huard, T; Robin, S.

Bioinformatics ; 30(17): i386-92, 2014 Sep 01.

Article in English | MEDLINE | ID: mdl-25161224

ABSTRACT

MOTIVATION: The spatial conformation of the chromosome has a deep influence on gene regulation and expression. Hi-C technology allows the evaluation of the spatial proximity between any pair of loci along the genome. It results in a data matrix where blocks corresponding to (self-)interacting regions appear. The delimitation of such blocks is critical to better understand the spatial organization of the chromatin. From a computational point of view, it results in a 2D segmentation problem. RESULTS: We focus on the detection of cis-interacting regions, which appear to be prominent in observed data. We define a block-wise segmentation model for the detection of such regions. We prove that the maximization of the likelihood with respect to the block boundaries can be rephrased in terms of a 1D segmentation problem, for which the standard dynamic programming applies. The performance of the proposed methods is assessed by a simulation study on both synthetic and resampled data. A comparative study on public data shows good concordance with biologically confirmed regions. AVAILABILITY AND IMPLEMENTATION: The HiCseg R package is available from the Comprehensive R Archive Network and from the Web page of the corresponding author.

Subject(s)

Chromosomes, Mammalian/chemistry; Animals; Chromatin/chemistry; Chromosomes, Human/chemistry; High-Throughput Nucleotide Sequencing; Humans; Mice; Models, Statistical; Sequence Analysis, DNA
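
The "standard dynamic programming" for 1D segmentation mentioned in the abstract above can be sketched in Python. This is a generic illustration, not the HiCseg likelihood or package: it assumes a plain 1D signal and a least-squares cost (each segment is summarized by its mean), and the function name segment_1d is hypothetical. The recursion computes, for every prefix of the signal and every number of segments, the cheapest split, then backtracks to recover the change points.

```python
import numpy as np


def segment_1d(y, n_segments):
    """Split y into n_segments contiguous pieces minimizing squared error.

    Returns the change points (start index of every segment except the
    first) and the total cost of the optimal segmentation.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums give the cost of any segment in constant time
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        """Sum of squared deviations of y[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total ** 2 / (j - i)

    # best[k, j]: minimal cost of splitting y[:j] into k segments
    best = np.full((n_segments + 1, n + 1), np.inf)
    last = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1, i] + cost(i, j)
                if candidate < best[k, j]:
                    best[k, j] = candidate
                    last[k, j] = i  # start of the k-th segment

    # Backtrack from the full signal to recover segment starts
    starts = []
    j = n
    for k in range(n_segments, 0, -1):
        j = last[k, j]
        starts.append(int(j))
    change_points = sorted(starts)[1:]  # drop the leading 0
    return change_points, float(best[n_segments, n])


signal = [0, 0, 0, 5, 5, 5, 5, 1, 1]
change_points, total_cost = segment_1d(signal, 3)
```

The triple loop costs on the order of K n^2 operations for K segments and n points, which is what makes the reduction from a 2D to a 1D problem attractive.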
