1.
BMC Med Res Methodol ; 22(1): 335, 2022 12 28.
Article in English | MEDLINE | ID: mdl-36577946

ABSTRACT

BACKGROUND: An external control arm is a cohort of control patients collected from data external to a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients from the single arm and the external arm should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy from comparisons between the outcomes of single-arm patients and machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), and their evaluation for External Control Arm (ECA) analysis has been insufficient. METHODS: We used both numerical simulations and a trial replication procedure to evaluate the different statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five type 2 diabetes randomized clinical trials granted by the Yale University Open Data Access (YODA) project. From this pool of five trials, observational experiments are built artificially by replacing the control arm of one trial with an arm originating from another trial that contains similarly treated patients. RESULTS: In numerical simulations, DDML has the smallest bias, followed by G-computation. G-computation usually minimizes mean squared error, whereas the mean squared error of DDML varies across settings and improves with increasing sample size. For hypothesis testing, all methods control type I error, and DDML is the most conservative. G-computation has the best statistical power; DDML has comparable power at [Formula: see text] but lower power for smaller sample sizes. The replication procedure also indicates that G-computation minimizes mean squared error, whereas DDML performs between G-computation and the propensity score approaches. The confidence intervals of G-computation are the narrowest, whereas those of DDML are the widest for small sample sizes, confirming its conservative nature. CONCLUSIONS: For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared with propensity score approaches.
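
To make the contrast between the weighting and outcome-modelling estimators concrete, here is a minimal sketch (not the authors' code) comparing IPTW and G-computation estimates of an average treatment effect on simulated data; the data-generating model, covariate names and scikit-learn estimators are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): contrasting IPTW and
# G-computation estimates of a treatment effect on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # baseline covariates
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # confounded treatment assignment
y = 1.0 * t + X @ np.array([0.5, 0.3, 0, 0, 0]) + rng.normal(size=n)

# IPTW: reweight outcomes by the inverse of the estimated propensity score.
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
w = t / ps + (1 - t) / (1 - ps)
ate_iptw = (np.average(y[t == 1], weights=w[t == 1])
            - np.average(y[t == 0], weights=w[t == 0]))

# G-computation: fit an outcome model and average predicted counterfactuals.
outcome = LinearRegression().fit(np.column_stack([X, t]), y)
y1 = outcome.predict(np.column_stack([X, np.ones(n)]))
y0 = outcome.predict(np.column_stack([X, np.zeros(n)]))
ate_gcomp = (y1 - y0).mean()

print(f"IPTW ATE: {ate_iptw:.3f}, G-computation ATE: {ate_gcomp:.3f}")
```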


Subject(s)
Diabetes Mellitus, Type 2 , Humans , Bias , Computer Simulation , Diabetes Mellitus, Type 2/therapy , Machine Learning , Propensity Score , Research Design , Randomized Controlled Trials as Topic
2.
Hepatology ; 72(6): 2000-2013, 2020 12.
Article in English | MEDLINE | ID: mdl-32108950

ABSTRACT

BACKGROUND AND AIMS: Standardized and robust risk-stratification systems for patients with hepatocellular carcinoma (HCC) are required to improve therapeutic strategies and investigate the benefits of adjuvant systemic therapies after curative resection/ablation. APPROACH AND RESULTS: In this study, we used two deep-learning algorithms based on whole-slide digitized histological slides (whole-slide imaging; WSI) to build models for predicting survival of patients with HCC treated by surgical resection. Two independent series were investigated: a discovery set (Henri Mondor Hospital, n = 194) used to develop our algorithms and an independent validation set (The Cancer Genome Atlas [TCGA], n = 328). WSIs were first divided into small squares ("tiles"), and features were extracted with a pretrained convolutional neural network (preprocessing step). The first deep-learning-based algorithm ("SCHMOWDER") uses an attention mechanism on tumoral areas annotated by a pathologist whereas the second ("CHOWDER") does not require human expertise. In the discovery set, c-indices for survival prediction of SCHMOWDER and CHOWDER reached 0.78 and 0.75, respectively. Both models outperformed a composite score incorporating all baseline variables associated with survival. Prognostic value of the models was further validated in the TCGA data set, and, as observed in the discovery series, both models had a higher discriminatory power than a score combining all baseline variables associated with survival. Pathological review showed that the tumoral areas most predictive of poor survival were characterized by vascular spaces, the macrotrabecular architectural pattern, and a lack of immune infiltration. CONCLUSIONS: This study shows that artificial intelligence can help refine the prediction of HCC prognosis. It highlights the importance of pathologist/machine interactions for the construction of deep-learning algorithms that benefit from expert knowledge and allow a biological understanding of their output.
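
As a rough illustration of the weakly supervised setup described above, the following sketch implements a CHOWDER-style head in PyTorch: precomputed tile features are scored by a shared linear layer and the top and bottom scores are pooled into a slide-level prediction. The feature dimension, the value of k and the classifier layers are assumptions, not the published architecture.

```python
# Illustrative sketch of a CHOWDER-style weakly supervised head:
# each slide is a bag of precomputed tile features; a shared linear layer
# scores every tile, and the top/bottom-k scores are pooled into a
# slide-level prediction. Shapes and hyperparameters are assumptions.
import torch
import torch.nn as nn

class ChowderLikeHead(nn.Module):
    def __init__(self, feat_dim: int = 2048, k: int = 5):
        super().__init__()
        self.k = k
        self.tile_scorer = nn.Linear(feat_dim, 1)          # shared per-tile score
        self.classifier = nn.Sequential(
            nn.Linear(2 * k, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (n_tiles, feat_dim) extracted by a pretrained CNN
        scores = self.tile_scorer(tiles).squeeze(-1)        # (n_tiles,)
        top = torch.topk(scores, self.k).values
        bottom = torch.topk(-scores, self.k).values.neg()
        return self.classifier(torch.cat([top, bottom]))    # slide-level logit

slide_features = torch.randn(1000, 2048)                    # dummy tile features
print(ChowderLikeHead()(slide_features))
```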


Subject(s)
Carcinoma, Hepatocellular/mortality , Deep Learning , Hepatectomy/methods , Liver Neoplasms/mortality , Aged , Carcinoma, Hepatocellular/pathology , Carcinoma, Hepatocellular/surgery , Feasibility Studies , Female , Follow-Up Studies , Humans , Liver/pathology , Liver/surgery , Liver Neoplasms/pathology , Liver Neoplasms/surgery , Male , Middle Aged , Prognosis , Risk Assessment/methods , Survival Analysis , Treatment Outcome
3.
BMC Bioinformatics ; 15: 191, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24934562

ABSTRACT

BACKGROUND: Meganucleases are important tools for genome engineering, providing an efficient way to generate DNA double-strand breaks at specific loci of interest. Numerous experimental efforts, ranging from in vivo selection to in silico modeling, have been made to re-engineer meganucleases to target relevant DNA sequences. RESULTS: Here we present a novel in silico method for designing custom meganucleases based on a machine learning approach. We compared it with existing in silico physical models and high-throughput experimental screening. The machine learning model was used to successfully predict active meganucleases for 53 new DNA targets. CONCLUSIONS: This new method shows competitive performance compared with state-of-the-art in silico physical models, with up to a fourfold increase in design success rate. Compared with experimental high-throughput screening methods, it reduces the number of screening experiments needed by a factor of more than 100 without affecting final performance.
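
The abstract does not specify the model, so the following is only a toy sketch of the general idea: classify one-hot-encoded DNA targets as cleavable or not with an off-the-shelf classifier. The target length, the random forest choice and the activity labels are placeholders.

```python
# Toy sketch of the general idea (not the published model): classify
# DNA targets as cleavable or not from a one-hot encoding of the sequence.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def one_hot(seq: str) -> np.ndarray:
    table = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), [table[b] for b in seq]] = 1
    return out.ravel()

rng = np.random.default_rng(1)
targets = ["".join(rng.choice(list("ACGT"), 22)) for _ in range(300)]  # 22-bp targets
X = np.array([one_hot(s) for s in targets])
y = rng.binomial(1, 0.3, size=len(targets))   # placeholder activity labels

model = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())
```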


Subject(s)
Artificial Intelligence , Computer Simulation , DNA/genetics , High-Throughput Screening Assays/methods , Sequence Analysis, DNA/methods , DNA/chemistry
4.
BMC Mol Biol ; 15: 13, 2014 Jul 05.
Article in English | MEDLINE | ID: mdl-24997498

ABSTRACT

BACKGROUND: The past decade has seen the emergence of several molecular tools that make it possible to modify cellular functions through the accurate and easy addition, removal, or exchange of genomic DNA sequences. Among these technologies, transcription activator-like effectors (TALEs) have proven to be one of the most versatile and robust platforms for generating targeted molecular tools, as demonstrated by fusions to various domains such as transcription activators, repressors, and nucleases. RESULTS: In this study, we generated a novel nuclease architecture based on the transcription activator-like effector scaffold. In contrast to the existing Tail to Tail (TtT) and Head to Head (HtH) nuclease architectures, which are based on the symmetrical association of two TALE DNA binding domains fused to the C-terminal (TtT) or N-terminal (HtH) end of FokI, this novel architecture consists of the asymmetrical association of two different engineered TALE DNA binding domains fused to the N- and C-terminal ends of FokI (TALE::FokI and FokI::TALE scaffolds, respectively). Characterization of this novel Tail to Head (TtH) architecture in yeast enabled us to demonstrate its nuclease activity and define its optimal target configuration. We further showed that this architecture promotes substantial levels of targeted mutagenesis at three endogenous loci in two different mammalian cell lines. CONCLUSION: Our results demonstrate that this novel functional TtH architecture, which requires binding to only one DNA strand of a given endogenous locus, has the potential to extend the targeting possibilities of FokI-based TALE nucleases.


Subject(s)
Deoxyribonucleases, Type II Site-Specific/metabolism , Fungal Proteins/metabolism , Protein Engineering/methods , Recombinant Fusion Proteins/metabolism , Transcription Factors/metabolism , Yeasts/metabolism , Animals , Base Sequence , Binding Sites , Cell Line , DNA/metabolism , Deoxyribonucleases, Type II Site-Specific/chemistry , Deoxyribonucleases, Type II Site-Specific/genetics , Fungal Proteins/chemistry , Fungal Proteins/genetics , Gene Targeting/methods , Genetic Loci , Humans , Molecular Sequence Data , Mutagenesis , Protein Structure, Tertiary , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics , Sequence Alignment , Transcription Factors/chemistry , Transcription Factors/genetics , Yeasts/genetics
5.
Nucleic Acids Res ; 40(13): 6367-79, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22467209

ABSTRACT

The ability to specifically engineer the genome of living cells at precise locations using rare-cutting designer endonucleases has broad implications for biotechnology and medicine, particularly for functional genomics, transgenics, and gene therapy. However, the potential impact of chromosomal context and epigenetics on designer endonuclease-mediated genome editing is poorly understood. To address this question, we conducted a comprehensive analysis of the efficacy of 37 endonucleases derived from the quintessential I-CreI meganuclease, specifically designed to cleave 39 different genomic targets. The analysis revealed that the efficiency of targeted mutagenesis at a given chromosomal locus is predictive of that of homologous gene targeting. Consequently, a strong genome-wide correlation was apparent between the efficiency of targeted mutagenesis (≤ 0.1% to ≈ 6%) and that of homologous gene targeting (≤ 0.1% to ≈ 15%). In contrast, the efficiency of targeted mutagenesis or homologous gene targeting at a given chromosomal locus does not correlate with the activity of individual endonucleases on transiently transfected substrates. Finally, we demonstrate that chromatin accessibility modulates the efficacy of rare-cutting endonucleases, accounting for strong position effects. Thus, chromosomal context and epigenetic mechanisms may play a major role in the efficiency of rare-cutting endonuclease-induced genome engineering.


Subject(s)
Chromosomal Position Effects , DNA Restriction Enzymes/metabolism , Animals , CHO Cells , Cell Line , Cricetinae , Cricetulus , DNA Restriction Enzymes/chemistry , Gene Targeting , Genetic Engineering , Genome, Human , Humans , Mutagenesis
6.
Nat Commun ; 14(1): 3459, 2023 06 13.
Article in English | MEDLINE | ID: mdl-37311751

ABSTRACT

Two tumor (Classical/Basal) and stroma (Inactive/Active) subtypes of pancreatic adenocarcinoma (PDAC) with prognostic and theragnostic implications have been described. These molecular subtypes were defined by RNA-seq, a costly technique that is sensitive to sample quality and cellularity and is not used in routine practice. To enable rapid PDAC molecular subtyping and to study PDAC heterogeneity, we developed PACpAInt, a multi-step deep learning model. PACpAInt is trained on a multicentric cohort (n = 202) and validated on four independent cohorts including biopsies (surgical cohorts n = 148, 97, and 126; biopsy cohort n = 25), all with transcriptomic data (n = 598), to predict tumor tissue, tumor cells versus stroma, and their transcriptomic molecular subtypes, either at the whole-slide or tile level (112 µm squares). PACpAInt correctly predicts tumor subtypes at the whole-slide level on surgical and biopsy specimens and independently predicts survival. PACpAInt highlights the presence of a minor aggressive Basal contingent that negatively impacts survival in 39% of RNA-defined Classical cases. Tile-level analysis (>6 million tiles) redefines PDAC microheterogeneity, showing codependencies in the distribution of tumor and stroma subtypes, and demonstrates that, in addition to Classical and Basal tumors, there are Hybrid tumors that combine both subtypes and Intermediate tumors that may represent a transition state during PDAC evolution.
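
A hedged sketch of the tile-to-slide aggregation idea: tile-level subtype probabilities are summarized into a slide-level call, and slides with a minor Basal contingent are flagged. The 10% flagging threshold and the two-class simplification are assumptions, not the published procedure.

```python
# Illustrative sketch: aggregate tile-level subtype probabilities into a
# slide-level call and flag slides with a minor Basal contingent.
# The 10% threshold is an arbitrary illustration, not the published cutoff.
import numpy as np

def summarize_slide(tile_probs: np.ndarray, basal_fraction_cutoff: float = 0.10):
    # tile_probs: (n_tumor_tiles, 2) with columns [P(Classical), P(Basal)]
    tile_calls = tile_probs.argmax(axis=1)            # 0 = Classical, 1 = Basal
    basal_fraction = tile_calls.mean()
    slide_call = "Basal" if basal_fraction > 0.5 else "Classical"
    minor_basal = slide_call == "Classical" and basal_fraction >= basal_fraction_cutoff
    return slide_call, basal_fraction, minor_basal

probs = np.random.dirichlet([5, 1], size=4000)        # dummy tile predictions
print(summarize_slide(probs))
```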


Subject(s)
Adenocarcinoma , Deep Learning , Pancreatic Neoplasms , Humans , Adenocarcinoma/genetics , Pancreatic Neoplasms/genetics , Aggression
7.
Nat Med ; 29(1): 135-146, 2023 01.
Article in English | MEDLINE | ID: mdl-36658418

ABSTRACT

Triple-negative breast cancer (TNBC) is a rare cancer characterized by high metastatic potential and poor prognosis, with limited treatment options. The current standard of care in the nonmetastatic setting is neoadjuvant chemotherapy (NACT), but treatment efficacy varies substantially across patients. This heterogeneity is still poorly understood, partly due to the paucity of curated TNBC data. Here we investigate the use of machine learning (ML), leveraging whole-slide images and clinical information, to predict at diagnosis the histological response to NACT for women with early TNBC. To overcome the biases of small-scale studies while respecting data privacy, we conducted a multicentric TNBC study using federated learning, in which patient data remain secured behind hospital firewalls. We show that local ML models relying on whole-slide images can predict response to NACT, but that collaborative training of ML models further improves performance, on par with the best current approaches in which ML models are trained using time-consuming expert annotations. Our ML model is interpretable and is sensitive to specific histological patterns. This proof-of-concept study, in which federated learning is applied to real-world datasets, paves the way for future biomarker discovery using unprecedentedly large datasets.
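
To illustrate the federated training principle (only model parameters leave each hospital), here is a minimal federated-averaging sketch with synthetic centers and a linear model; it is not the study's training pipeline, and the number of rounds, centers and the SGD settings are arbitrary.

```python
# Minimal federated-averaging sketch (FedAvg): each center fits a local
# model on its own data and only the parameters leave the hospital.
# Centers, features and labels are synthetic placeholders.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
centers = [(rng.normal(size=(200, 30)), rng.binomial(1, 0.4, 200)) for _ in range(3)]

global_coef, global_intercept = np.zeros(30), 0.0
for _ in range(10):                                    # communication rounds
    coefs, intercepts = [], []
    for X, y in centers:                               # local updates stay on site
        clf = SGDClassifier(loss="log_loss", max_iter=5, tol=None, random_state=0)
        clf.fit(X, y,
                coef_init=global_coef.reshape(1, -1),
                intercept_init=np.array([global_intercept]))
        coefs.append(clf.coef_[0])
        intercepts.append(clf.intercept_[0])
    global_coef = np.mean(coefs, axis=0)               # server-side weight averaging
    global_intercept = float(np.mean(intercepts))

print(global_coef[:5], global_intercept)
```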


Subject(s)
Neoadjuvant Therapy , Triple Negative Breast Neoplasms , Humans , Female , Neoadjuvant Therapy/methods , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/pathology , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Treatment Outcome
8.
Eur Heart J Digit Health ; 3(1): 38-48, 2022 Mar.
Article in English | MEDLINE | ID: mdl-36713994

ABSTRACT

Aims: Through this proof of concept, we studied the potential added value of machine learning (ML) methods for building cardiovascular risk scores from structured data and the conditions under which they outperform linear statistical models. Methods and results: Relying on extensive cardiovascular clinical data from FOURIER, a randomized clinical trial testing evolocumab efficacy, we compared linear models, neural networks, random forests, and gradient boosting machines for predicting the risk of major adverse cardiovascular events. To study the relative strengths of each method, we extended the comparison to restricted subsets of the full FOURIER dataset, limiting either the number of available patients or the number of their characteristics. When using all 428 covariates available in the dataset, ML methods significantly outperformed linear models built from the same variables (c-index 0.67 vs. 0.62, P-value 2e-5), as well as a reference cardiovascular risk score based on only 10 variables (c-index 0.60). We showed that gradient boosting, the best-performing model in our setting, requires fewer patients and significantly outperforms linear models when using large numbers of variables. On the other hand, we illustrate how linear models suffer from being trained on too many variables, thus requiring more careful prior selection. These ML methods proved to consistently improve risk assessment, to be interpretable despite their complexity, and to help identify the minimal set of covariates necessary to achieve top performance. Conclusion: In the field of secondary cardiovascular event prevention, given the increasing availability of extensive electronic health records, ML methods could open the door to more powerful tools for patient risk stratification and treatment allocation strategies.
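
The following sketch reproduces the spirit of the comparison on synthetic data (the FOURIER data are not public): a penalized linear model versus gradient boosting on a wide covariate matrix, scored with ROC AUC, which coincides with the c-index for a binary endpoint. Dataset shape and hyperparameters are arbitrary.

```python
# Illustrative comparison (synthetic data, not FOURIER): a penalized linear
# model vs. gradient boosting on a wide covariate matrix, scored with ROC AUC,
# which equals the c-index for a binary endpoint.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=400, n_informative=30,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

linear = LogisticRegression(penalty="l2", C=1.0, max_iter=2000).fit(X_tr, y_tr)
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

for name, model in [("linear", linear), ("gradient boosting", gbm)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: c-index (AUC) = {auc:.3f}")
```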

9.
BMC Bioinformatics ; 11: 99, 2010 Feb 22.
Article in English | MEDLINE | ID: mdl-20175916

ABSTRACT

BACKGROUND: Predicting which molecules can bind to a given binding site of a protein with known 3D structure is important for deciphering the protein's function and useful in drug design. A classical assumption in structural biology is that proteins with similar 3D structures have related molecular functions and may therefore bind similar ligands. However, proteins that do not display any overall sequence or structure similarity may also bind similar ligands if they contain similar binding sites. Quantitatively assessing the similarity between binding sites may therefore be useful for proposing new ligands for a given pocket, based on those known for similar pockets. RESULTS: We propose a new method to quantify the similarity between binding pockets and explore its relevance for ligand prediction. We represent each pocket by a cloud of atoms and assess the similarity between two pockets by aligning their atoms in 3D space and comparing the resulting configurations with a convolution kernel. Pocket alignment and comparison are possible even when the corresponding proteins share no sequence or overall structure similarity. To predict ligands for a given target pocket, we compare it to an ensemble of pockets with known ligands to identify the most similar pockets. We discuss two criteria for evaluating the performance of a binding pocket similarity measure in the context of ligand prediction, namely area under the ROC curve (AUC scores) and classification-based scores. We show that the latter is better suited to evaluating methods with respect to ligand prediction, and we demonstrate the relevance of our new binding site similarity compared with existing similarity measures. CONCLUSIONS: This study demonstrates the relevance of the proposed method for identifying ligands that bind to known binding pockets. We also provide a new benchmark for future work in this field. The new method and the benchmark are available at http://cbio.ensmp.fr/paris/.
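
A minimal sketch of the comparison step only, assuming the two atom clouds are already aligned in a common frame: a sum-of-Gaussians kernel between pockets, normalized into a similarity score. The bandwidth and the omitted alignment step are assumptions.

```python
# Sketch of the comparison step only: a sum-of-Gaussians kernel between two
# binding pockets represented as 3D atom clouds, assuming they are already
# aligned. The bandwidth and the alignment procedure are assumptions here.
import numpy as np

def pocket_kernel(atoms_a: np.ndarray, atoms_b: np.ndarray, sigma: float = 2.0) -> float:
    # atoms_*: (n_atoms, 3) coordinates in a common frame
    d2 = ((atoms_a[:, None, :] - atoms_b[None, :, :]) ** 2).sum(-1)
    return float(np.exp(-d2 / (2 * sigma ** 2)).sum())

def normalized_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return pocket_kernel(a, b) / np.sqrt(pocket_kernel(a, a) * pocket_kernel(b, b))

rng = np.random.default_rng(0)
pocket1 = rng.normal(size=(40, 3))
pocket2 = pocket1 + rng.normal(scale=0.3, size=(40, 3))   # a perturbed copy
print(normalized_similarity(pocket1, pocket2))
```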


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Binding Sites , Databases, Protein , Ligands , Models, Molecular , Protein Conformation , Structure-Activity Relationship
10.
Bioinformatics ; 25(12): i259-67, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19477997

ABSTRACT

MOTIVATION: Aligning protein-protein interaction (PPI) networks of different species has drawn considerable interest recently. This problem is important for investigating evolutionarily conserved pathways or protein complexes across species, and for helping to identify functional orthologs through the detection of conserved interactions. It is, however, a difficult combinatorial problem, for which only heuristic methods have been proposed so far. RESULTS: We reformulate PPI network alignment as a graph matching problem and investigate how state-of-the-art graph matching algorithms can be used for that purpose. We differentiate between two alignment problems, depending on whether strict constraints on protein matches are given, based on sequence similarity, or whether the goal is instead to find an optimal compromise between sequence similarity and interaction conservation in the alignment. We propose new methods for both cases and assess their performance on the alignment of the yeast and fly PPI networks. The new methods consistently outperform state-of-the-art algorithms, retrieving in particular 78% more conserved interactions than IsoRank for a given level of sequence similarity. AVAILABILITY: All data and code are freely and publicly available upon request.
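
As a toy illustration of the graph-matching formulation, the sketch below matches proteins across two small random networks by maximizing a blend of sequence similarity and a crude interaction-conservation term, solved with the Hungarian algorithm; the published methods are considerably more elaborate, and the 0.5 trade-off weight is arbitrary.

```python
# Toy sketch of the formulation: match proteins across two PPI networks by
# maximizing a blend of sequence similarity and a crude interaction-conservation
# term, solved here with the Hungarian algorithm.
import numpy as np
import networkx as nx
from scipy.optimize import linear_sum_assignment

g1 = nx.gnp_random_graph(8, 0.4, seed=1)               # stand-in PPI networks
g2 = nx.gnp_random_graph(8, 0.4, seed=2)
seq_sim = np.random.default_rng(0).random((8, 8))      # stand-in sequence similarities

A1 = nx.to_numpy_array(g1)
A2 = nx.to_numpy_array(g2)
# Crude conservation term: penalize differences in interaction (degree) counts.
deg_sim = -np.abs(A1.sum(1)[:, None] - A2.sum(1)[None, :])

score = 0.5 * seq_sim + 0.5 * deg_sim                  # trade-off between criteria
rows, cols = linear_sum_assignment(-score)             # maximize total score
print(list(zip(rows, cols)))
```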


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Databases, Protein , Sequence Alignment/methods
11.
Nat Commun ; 11(1): 3877, 2020 08 03.
Article in English | MEDLINE | ID: mdl-32747659

ABSTRACT

Deep learning methods for digital pathology analysis are an effective way to address multiple clinical questions, from diagnosis to prediction of treatment outcomes. These methods have also been used to predict gene mutations from pathology images, but no comprehensive evaluation of their potential for extracting molecular features from histology slides has yet been performed. We show that HE2RNA, a model based on the integration of multiple data modes, can be trained to systematically predict RNA-Seq profiles from whole-slide images alone, without expert annotation. Through its interpretable design, HE2RNA provides a virtual spatialization of gene expression, as validated by CD3 and CD20 staining on an independent dataset. The transcriptomic representation learned by HE2RNA can also be transferred to other datasets, even of small size, to increase prediction performance for specific molecular phenotypes. We illustrate the use of this approach for clinical diagnostic purposes such as the identification of tumors with microsatellite instability.
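
A simplified sketch of the core regression idea, not the HE2RNA architecture: mean-pool precomputed tile features and regress them onto a vector of gene expression values. The feature dimension, gene count and pooling choice are assumptions.

```python
# Simplified sketch of the core idea (not the HE2RNA architecture): predict a
# vector of gene expression values from aggregated tile features of a slide.
# Feature dimension, gene count and the pooling choice are assumptions.
import torch
import torch.nn as nn

class SlideToExpression(nn.Module):
    def __init__(self, feat_dim: int = 2048, n_genes: int = 1000):
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, n_genes)
        )

    def forward(self, tile_features: torch.Tensor) -> torch.Tensor:
        # tile_features: (n_tiles, feat_dim); mean-pool then regress
        slide_embedding = tile_features.mean(dim=0)
        return self.regressor(slide_embedding)          # (n_genes,) predicted expression

model = SlideToExpression()
prediction = model(torch.randn(800, 2048))              # one slide with 800 tiles
print(prediction.shape)
```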


Subject(s)
Computational Biology/methods , Deep Learning , Gene Expression Regulation, Neoplastic , Image Processing, Computer-Assisted/methods , Neoplasms/genetics , RNA-Seq/methods , Algorithms , Gene Expression Profiling/methods , Humans , Microsatellite Instability , Models, Genetic , Neoplasms/diagnosis , Neoplasms/metabolism
12.
Nat Med ; 25(10): 1519-1525, 2019 10.
Article in English | MEDLINE | ID: mdl-31591589

ABSTRACT

Malignant mesothelioma (MM) is an aggressive cancer primarily diagnosed on the basis of histological criteria1. The 2015 World Health Organization classification subdivides mesothelioma tumors into three histological types: epithelioid, biphasic and sarcomatoid MM. MM is a highly complex and heterogeneous disease, rendering its diagnosis and histological typing difficult and leading to suboptimal patient care and decisions regarding treatment modalities2. Here we have developed a new approach based on deep convolutional neural networks, called MesoNet, to accurately predict the overall survival of mesothelioma patients from whole-slide digitized images, without any pathologist-provided locally annotated regions. We validated MesoNet on both an internal validation cohort from the French MESOBANK and an independent cohort from The Cancer Genome Atlas (TCGA). We also demonstrated that the model was more accurate in predicting patient survival than current pathology practices. Furthermore, unlike classical black-box deep learning methods, MesoNet identified regions contributing to patient outcome prediction. Strikingly, we found that these regions are mainly located in the stroma and correspond to histological features associated with inflammation, cellular diversity and vacuolization. These findings suggest that deep learning models can identify new features predictive of patient survival and potentially lead to new biomarker discoveries.
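
To illustrate how such a slide-level score can be evaluated against survival, here is a hedged sketch using a Cox model and the concordance index from the lifelines package; the scores and follow-up times below are random placeholders, not MesoNet outputs.

```python
# Illustrative sketch of the evaluation idea only: relate slide-level scores
# to overall survival with a Cox model and report the concordance index.
# The scores here are random placeholders, not MesoNet outputs.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "slide_score": rng.normal(size=300),       # e.g. a deep-learning risk score
    "duration_months": rng.exponential(24, 300),
    "event": rng.binomial(1, 0.7, 300),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration_months", event_col="event")
print(cph.concordance_index_)                   # c-index of the score
```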


Subject(s)
Lung Neoplasms/diagnosis , Lung Neoplasms/pathology , Mesothelioma/diagnosis , Mesothelioma/pathology , Prognosis , Deep Learning , Female , Humans , Lung Neoplasms/classification , Male , Mesothelioma/classification , Mesothelioma, Malignant , Neoplasm Grading , Neural Networks, Computer
13.
Nat Commun ; 10(1): 2674, 2019 06 17.
Article in English | MEDLINE | ID: mdl-31209238

ABSTRACT

The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca's large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and the results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including antagonism of ADAM17 inhibitors when combined with PIK3CB/D inhibition, in contrast to synergy when combined with other PI3K-pathway inhibitors in PIK3CA-mutant cells.
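
A toy sketch of the prediction task on synthetic data, in the spirit of the winning approaches that exploited prior drug-target knowledge: regress a synergy score on concatenated drug-target and cell-line mutation features. All features, labels and the random forest choice are placeholders.

```python
# Toy sketch of the prediction task (synthetic data): regress a synergy score
# on concatenated drug-target and cell-line mutation features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_pairs = 1000
drug_a_targets = rng.binomial(1, 0.05, (n_pairs, 50))   # binary drug-target profiles
drug_b_targets = rng.binomial(1, 0.05, (n_pairs, 50))
cell_mutations = rng.binomial(1, 0.1, (n_pairs, 100))   # cell-line mutation profile
X = np.hstack([drug_a_targets, drug_b_targets, cell_mutations])
synergy = rng.normal(size=n_pairs)                       # placeholder synergy scores

model = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(model, X, synergy, cv=3, scoring="r2").mean())
```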


Subject(s)
Antineoplastic Combined Chemotherapy Protocols/pharmacology , Computational Biology/methods , Neoplasms/drug therapy , Pharmacogenetics/methods , ADAM17 Protein/antagonists & inhibitors , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Benchmarking , Biomarkers, Tumor/genetics , Cell Line, Tumor , Computational Biology/standards , Datasets as Topic , Drug Antagonism , Drug Resistance, Neoplasm/drug effects , Drug Resistance, Neoplasm/genetics , Drug Synergism , Genomics/methods , Humans , Molecular Targeted Therapy/methods , Mutation , Neoplasms/genetics , Pharmacogenetics/standards , Phosphatidylinositol 3-Kinases/genetics , Phosphoinositide-3 Kinase Inhibitors , Treatment Outcome
14.
IEEE Trans Pattern Anal Mach Intell ; 31(12): 2227-42, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19834143

ABSTRACT

We propose a convex-concave programming approach for the labeled weighted graph matching problem. The convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a least-squares problem on the set of permutation matrices and relaxing it to two different optimization problems: a quadratic convex and a quadratic concave optimization problem on the set of doubly stochastic matrices. The concave relaxation has the same global minimum as the initial graph matching problem, but the search for its global minimum is also a hard combinatorial problem. We therefore construct an approximation of the concave problem's solution by following the solution path of a convex-concave problem obtained by linear interpolation of the convex and concave formulations, starting from the convex relaxation. This method makes it easy to integrate information on graph label similarities into the optimization problem and therefore to perform labeled weighted graph matching. The algorithm is compared with some of the best-performing graph matching methods on four datasets: simulated graphs, QAPLib, retina vessel images, and handwritten Chinese characters. In all cases, the results are competitive with the state of the art.
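
The sketch below shows two fragments of this approach only: the least-squares matching objective and the final rounding of a relaxed solution to a permutation via the Hungarian algorithm. The path-following between the convex and concave relaxations is omitted, and the relaxed solution here is a random stand-in.

```python
# Fragments of the approach only: the least-squares graph-matching objective and
# the rounding of a relaxed solution to a permutation via the Hungarian
# algorithm. The convex-concave path-following itself is omitted.
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_objective(A1: np.ndarray, A2: np.ndarray, P: np.ndarray) -> float:
    # ||A1 - P A2 P^T||_F^2 for a permutation matrix P
    return float(np.linalg.norm(A1 - P @ A2 @ P.T, ord="fro") ** 2)

def round_to_permutation(D: np.ndarray) -> np.ndarray:
    # Pick the permutation maximizing <P, D> (Frobenius projection onto permutations).
    rows, cols = linear_sum_assignment(-D)
    P = np.zeros_like(D)
    P[rows, cols] = 1.0
    return P

rng = np.random.default_rng(0)
A1 = (rng.random((6, 6)) < 0.4).astype(float)
A1 = np.triu(A1, 1) + np.triu(A1, 1).T                  # symmetric adjacency matrix
perm = rng.permutation(6)
A2 = A1[np.ix_(perm, perm)]                             # isomorphic copy of A1

D = rng.dirichlet(np.ones(6), size=6)                   # random stand-in for a relaxed solution
P = round_to_permutation(D)
print(matching_objective(A1, A2, P))
```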


Subject(s)
Algorithms , Pattern Recognition, Automated/statistics & numerical data , Artificial Intelligence , Humans , Image Processing, Computer-Assisted , Retinal Vessels/anatomy & histology