Results 1 - 14 of 14
1.
BMC Med Res Methodol ; 22(1): 335, 2022 12 28.
Article in English | MEDLINE | ID: mdl-36577946

ABSTRACT

BACKGROUND: An external control arm is a cohort of control patients collected from data external to a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients from the single and external arms should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy from comparisons between the outcomes of single-arm patients and machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), and their evaluation for External Control Arm (ECA) analysis has been insufficient. METHODS: We consider both numerical simulations and a trial replication procedure to evaluate the different statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five type 2 diabetes randomized clinical trials made available through the Yale University Open Data Access (YODA) project. From this pool of five trials, observational experiments are artificially built by replacing the control arm of one trial with an arm originating from another trial and containing similarly treated patients. RESULTS: Numerical simulations show that DDML has the smallest bias, followed by G-computation. G-computation usually minimizes the mean squared error, while the mean squared error of DDML varies but improves with increasing sample size. For hypothesis testing, all methods control the type I error, and DDML is the most conservative. G-computation is the best method in terms of statistical power; DDML has comparable power at [Formula: see text] but lower power for smaller sample sizes. The replication procedure also indicates that G-computation minimizes the mean squared error, whereas DDML performs in between G-computation and the propensity score approaches. The confidence intervals of G-computation are the narrowest, whereas those obtained with DDML are the widest for small sample sizes, confirming its conservative nature. CONCLUSIONS: For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared to propensity score approaches.
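A minimal sketch of the G-computation idea in an ECA setting, assuming a synthetic dataset with hypothetical covariates `age` and `hba1c`; this is illustrative only, not the YODA trial data or the authors' implementation.

```python
# G-computation sketch for an external-control-arm analysis: fit an outcome
# model on external controls only, predict counterfactual control outcomes for
# the single-arm patients, and contrast with their observed outcomes.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def g_computation_att(df: pd.DataFrame, covs: list) -> float:
    controls = df[df["treated"] == 0]
    treated = df[df["treated"] == 1]
    # Outcome model fitted on external controls only.
    outcome_model = LinearRegression().fit(controls[covs], controls["outcome"])
    # Predicted counterfactual (untreated) outcomes for the trial patients.
    y0_hat = outcome_model.predict(treated[covs])
    # Average treatment effect on the treated: observed minus predicted control outcome.
    return float(treated["outcome"].mean() - y0_hat.mean())

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.normal(60, 8, n),
    "hba1c": rng.normal(8, 1, n),
    "treated": rng.integers(0, 2, n),
})
df["outcome"] = -0.5 * df["treated"] - 0.1 * df["hba1c"] + rng.normal(0, 1, n)
print(g_computation_att(df, ["age", "hba1c"]))   # close to -0.5, the simulated effect
```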


Subject(s)
Diabetes Mellitus, Type 2; Humans; Bias; Computer Simulation; Diabetes Mellitus, Type 2/therapy; Machine Learning; Propensity Score; Research Design; Randomized Controlled Trials as Topic
2.
Hepatology ; 72(6): 2000-2013, 2020 12.
Article in English | MEDLINE | ID: mdl-32108950

ABSTRACT

BACKGROUND AND AIMS: Standardized and robust risk-stratification systems for patients with hepatocellular carcinoma (HCC) are required to improve therapeutic strategies and investigate the benefits of adjuvant systemic therapies after curative resection/ablation. APPROACH AND RESULTS: In this study, we used two deep-learning algorithms based on whole-slide digitized histological slides (whole-slide imaging; WSI) to build models predicting the survival of patients with HCC treated by surgical resection. Two independent series were investigated: a discovery set (Henri Mondor Hospital, n = 194) used to develop our algorithms and an independent validation set (The Cancer Genome Atlas [TCGA], n = 328). WSIs were first divided into small squares ("tiles"), and features were extracted with a pretrained convolutional neural network (preprocessing step). The first deep-learning-based algorithm ("SCHMOWDER") uses an attention mechanism on tumoral areas annotated by a pathologist, whereas the second ("CHOWDER") does not require human expertise. In the discovery set, c-indices for survival prediction reached 0.78 for SCHMOWDER and 0.75 for CHOWDER. Both models outperformed a composite score incorporating all baseline variables associated with survival. The prognostic value of the models was further validated in the TCGA dataset, where, as in the discovery series, both models had a higher discriminatory power than a score combining all baseline variables associated with survival. Pathological review showed that the tumoral areas most predictive of poor survival were characterized by vascular spaces, a macrotrabecular architectural pattern, and a lack of immune infiltration. CONCLUSIONS: This study shows that artificial intelligence can help refine the prediction of HCC prognosis. It highlights the importance of pathologist/machine interactions in constructing deep-learning algorithms that benefit from expert knowledge and allow a biological understanding of their output.
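A compact sketch of the weakly supervised tile-scoring idea behind CHOWDER-style models, assuming precomputed tile features; feature size, number of extreme tiles, and head architecture are illustrative assumptions, not the paper's exact configuration.

```python
# Each tile's pretrained CNN features receive a score; only the top and bottom
# scores of a slide feed a small head that outputs a slide-level risk score.
import torch
import torch.nn as nn

class ChowderLike(nn.Module):
    def __init__(self, n_features: int = 2048, n_extreme: int = 10):
        super().__init__()
        self.tile_scorer = nn.Linear(n_features, 1)   # one score per tile
        self.n_extreme = n_extreme
        self.head = nn.Sequential(nn.Linear(2 * n_extreme, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, tile_features: torch.Tensor) -> torch.Tensor:
        # tile_features: (n_tiles, n_features) for a single slide
        scores = self.tile_scorer(tile_features).squeeze(-1)
        top = scores.topk(self.n_extreme).values              # highest-scoring tiles
        bottom = (-scores).topk(self.n_extreme).values.neg()  # lowest-scoring tiles
        return self.head(torch.cat([top, bottom]))            # slide-level risk score

model = ChowderLike()
slide_risk = model(torch.randn(1500, 2048))   # 1500 tiles from one slide
print(slide_risk.item())
```

SCHMOWDER adds an attention mechanism restricted to pathologist-annotated tumoral tiles; the version above corresponds to the annotation-free variant.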


Subject(s)
Carcinoma, Hepatocellular/mortality; Deep Learning; Hepatectomy/methods; Liver Neoplasms/mortality; Aged; Carcinoma, Hepatocellular/pathology; Carcinoma, Hepatocellular/surgery; Feasibility Studies; Female; Follow-Up Studies; Humans; Liver/pathology; Liver/surgery; Liver Neoplasms/pathology; Liver Neoplasms/surgery; Male; Middle Aged; Prognosis; Risk Assessment/methods; Survival Analysis; Treatment Outcome
3.
BMC Bioinformatics ; 15: 191, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24934562

ABSTRACT

BACKGROUND: Meganucleases are important tools for genome engineering, providing an efficient way to generate DNA double-strand breaks at specific loci of interest. Numerous experimental efforts, ranging from in vivo selection to in silico modeling, have been made to re-engineer meganucleases to target relevant DNA sequences. RESULTS: Here we present a novel in silico method for designing custom meganucleases, based on a machine learning approach, and compare it with existing in silico physical models and high-throughput experimental screening. The machine learning model was used to successfully predict active meganucleases for 53 new DNA targets. CONCLUSIONS: This new method shows competitive performance compared with state-of-the-art in silico physical models, with up to a fourfold increase in design success rate. Compared to experimental high-throughput screening methods, it reduces the number of screening experiments needed by a factor of more than 100 without affecting final performance.
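An illustrative sketch of the design-as-classification idea: learn to predict whether an engineered meganuclease is active on a DNA target from simple sequence features. The one-hot featurization, random data, and random forest below are assumptions for illustration, not the paper's actual descriptors or model.

```python
# Train a classifier on (target sequence -> activity) pairs and use its
# predicted probability to rank candidate targets before screening.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    return np.array([[base == b for b in BASES] for base in seq], dtype=float).ravel()

rng = np.random.default_rng(0)
targets = ["".join(rng.choice(list(BASES), 22)) for _ in range(200)]  # 22-bp target sites
X = np.stack([one_hot(t) for t in targets])
y = rng.integers(0, 2, size=200)          # placeholder activity labels

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(model.predict_proba(X[:3])[:, 1])   # predicted activity probability for new targets
```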


Subject(s)
Artificial Intelligence; Computer Simulation; DNA/genetics; High-Throughput Screening Assays/methods; Sequence Analysis, DNA/methods; DNA/chemistry
4.
BMC Mol Biol ; 15: 13, 2014 Jul 05.
Article in English | MEDLINE | ID: mdl-24997498

ABSTRACT

BACKGROUND: The past decade has seen the emergence of several molecular tools that make it possible to modify cellular functions through the accurate and easy addition, removal, or exchange of genomic DNA sequences. Among these technologies, the transcription activator-like effector (TALE) scaffold has turned out to be one of the most versatile and robust platforms for generating targeted molecular tools, as demonstrated by its fusion to various domains such as transcription activators, repressors, and nucleases. RESULTS: In this study, we generated a novel nuclease architecture based on the transcription activator-like effector scaffold. In contrast to the existing Tail-to-Tail (TtT) and Head-to-Head (HtH) nuclease architectures, which are based on the symmetrical association of two TALE DNA binding domains fused to the C-terminal (TtT) or N-terminal (HtH) end of FokI, this novel architecture consists of the asymmetrical association of two different engineered TALE DNA binding domains fused to the N- and C-terminal ends of FokI (TALE::FokI and FokI::TALE scaffolds, respectively). Characterization of this novel Tail-to-Head (TtH) architecture in yeast enabled us to demonstrate its nuclease activity and define its optimal target configuration. We further showed that this architecture was able to promote substantial levels of targeted mutagenesis at three endogenous loci in two different mammalian cell lines. CONCLUSION: Our results demonstrate that this novel functional TtH architecture, which requires binding to only one DNA strand of a given endogenous locus, has the potential to extend the targeting possibilities of FokI-based TALE nucleases.


Subject(s)
Deoxyribonucleases, Type II Site-Specific/metabolism; Fungal Proteins/metabolism; Protein Engineering/methods; Recombinant Fusion Proteins/metabolism; Transcription Factors/metabolism; Yeasts/metabolism; Animals; Base Sequence; Binding Sites; Cell Line; DNA/metabolism; Deoxyribonucleases, Type II Site-Specific/chemistry; Deoxyribonucleases, Type II Site-Specific/genetics; Fungal Proteins/chemistry; Fungal Proteins/genetics; Gene Targeting/methods; Genetic Loci; Humans; Molecular Sequence Data; Mutagenesis; Protein Structure, Tertiary; Recombinant Fusion Proteins/chemistry; Recombinant Fusion Proteins/genetics; Sequence Alignment; Transcription Factors/chemistry; Transcription Factors/genetics; Yeasts/genetics
5.
Nucleic Acids Res ; 40(13): 6367-79, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22467209

ABSTRACT

The ability to specifically engineer the genome of living cells at precise locations using rare-cutting designer endonucleases has broad implications for biotechnology and medicine, particularly for functional genomics, transgenics, and gene therapy. However, the potential impact of chromosomal context and epigenetics on designer endonuclease-mediated genome editing is poorly understood. To address this question, we conducted a comprehensive analysis of the efficacy of 37 endonucleases derived from the quintessential I-CreI meganuclease, specifically designed to cleave 39 different genomic targets. The analysis revealed that the efficiency of targeted mutagenesis at a given chromosomal locus is predictive of that of homologous gene targeting. Consequently, a strong genome-wide correlation was apparent between the efficiency of targeted mutagenesis (≤ 0.1% to ≈ 6%) and that of homologous gene targeting (≤ 0.1% to ≈ 15%). In contrast, the efficiency of targeted mutagenesis or homologous gene targeting at a given chromosomal locus does not correlate with the activity of individual endonucleases on transiently transfected substrates. Finally, we demonstrate that chromatin accessibility modulates the efficacy of rare-cutting endonucleases, accounting for strong position effects. Thus, chromosomal context and epigenetic mechanisms may play a major role in the efficiency of rare-cutting endonuclease-induced genome engineering.


Subject(s)
Chromosomal Position Effects; DNA Restriction Enzymes/metabolism; Animals; CHO Cells; Cell Line; Cricetinae; Cricetulus; DNA Restriction Enzymes/chemistry; Gene Targeting; Genetic Engineering; Genome, Human; Humans; Mutagenesis
6.
Nat Commun ; 14(1): 3459, 2023 06 13.
Article in English | MEDLINE | ID: mdl-37311751

ABSTRACT

Two tumor (Classical/Basal) and two stroma (Inactive/Active) subtypes of pancreatic adenocarcinoma (PDAC) with prognostic and theragnostic implications have been described. These molecular subtypes were defined by RNA-seq, a costly technique that is sensitive to sample quality and cellularity and is not used in routine practice. To allow rapid PDAC molecular subtyping and to study PDAC heterogeneity, we develop PACpAInt, a multi-step deep learning model. PACpAInt is trained on a multicentric cohort (n = 202) and validated on four independent cohorts including biopsies (surgical cohorts n = 148; 97; 126 / biopsy cohort n = 25), all with transcriptomic data (n = 598), to predict tumor tissue, tumor cells versus stroma, and their transcriptomic molecular subtypes, either at the whole-slide or tile level (112 µm squares). PACpAInt correctly predicts tumor subtypes at the whole-slide level on surgical and biopsy specimens and independently predicts survival. PACpAInt highlights the presence of a minor aggressive Basal contingent that negatively impacts survival in 39% of RNA-defined Classical cases. Tile-level analysis (>6 million tiles) redefines PDAC microheterogeneity, showing codependencies in the distribution of tumor and stroma subtypes, and demonstrates that, in addition to Classical and Basal tumors, there are Hybrid tumors that combine both subtypes and Intermediate tumors that may represent a transition state during PDAC evolution.
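A simplified sketch of the two-level idea behind this kind of model: a tile-level classifier assigns each tile a molecular-subtype probability, and the slide-level call aggregates tile predictions. The logistic regression, random features, and aggregation rule below are stand-in assumptions, not the PACpAInt architecture.

```python
# Tile-level subtype probabilities aggregated into a whole-slide subtype call.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
tile_features = rng.normal(size=(5000, 256))      # pretrained tile features (placeholder)
tile_labels = rng.integers(0, 2, size=5000)       # 0 = Classical, 1 = Basal (placeholder)

tile_clf = LogisticRegression(max_iter=1000).fit(tile_features, tile_labels)

def slide_subtype(tiles: np.ndarray, threshold: float = 0.5):
    """Aggregate tile probabilities into a slide-level subtype call."""
    p_basal = tile_clf.predict_proba(tiles)[:, 1]
    basal_fraction = float((p_basal > threshold).mean())
    return ("Basal" if basal_fraction > 0.5 else "Classical", basal_fraction)

print(slide_subtype(tile_features[:200]))
```

Keeping the per-tile probabilities, rather than only the slide-level call, is what makes it possible to detect a minor Basal contingent inside an RNA-defined Classical slide.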


Subject(s)
Adenocarcinoma; Deep Learning; Pancreatic Neoplasms; Humans; Adenocarcinoma/genetics; Pancreatic Neoplasms/genetics; Aggression; Pancreatic Neoplasms
7.
Nat Med ; 29(1): 135-146, 2023 01.
Article in English | MEDLINE | ID: mdl-36658418

ABSTRACT

Triple-negative breast cancer (TNBC) is a rare cancer, characterized by high metastatic potential and poor prognosis, with limited treatment options. The current standard of care in nonmetastatic settings is neoadjuvant chemotherapy (NACT), but treatment efficacy varies substantially across patients. This heterogeneity is still poorly understood, partly due to the paucity of curated TNBC data. Here we investigate the use of machine learning (ML) leveraging whole-slide images and clinical information to predict, at diagnosis, the histological response to NACT in women with early-stage TNBC. To overcome the biases of small-scale studies while respecting data privacy, we conducted a multicentric TNBC study using federated learning, in which patient data remain secured behind hospital firewalls. We show that local ML models relying on whole-slide images can predict response to NACT, but that collaborative training of ML models further improves performance, on par with the best current approaches in which ML models are trained using time-consuming expert annotations. Our ML model is interpretable and is sensitive to specific histological patterns. This proof-of-concept study, in which federated learning is applied to real-world datasets, paves the way for future biomarker discovery using unprecedentedly large datasets.
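A minimal federated-averaging sketch of the collaborative training idea: each hospital updates a model on its own data and only the model weights are shared and averaged. The logistic-regression model, synthetic data, and number of rounds are illustrative assumptions, not the study's actual setup.

```python
# FedAvg-style loop: local gradient steps per site, then weight averaging.
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """A few epochs of logistic-regression gradient descent on local data."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
hospitals = [(rng.normal(size=(100, 20)), rng.integers(0, 2, 100)) for _ in range(3)]
global_w = np.zeros(20)

for _round in range(10):                                   # federated rounds
    local_ws = [local_update(global_w, X, y) for X, y in hospitals]
    global_w = np.mean(local_ws, axis=0)                   # FedAvg aggregation
print(global_w[:5])
```

Raw patient data never leave a site; only the weight vectors cross the firewall, which is what makes the multicentric training possible.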


Subject(s)
Neoadjuvant Therapy; Triple Negative Breast Neoplasms; Humans; Female; Neoadjuvant Therapy/methods; Triple Negative Breast Neoplasms/drug therapy; Triple Negative Breast Neoplasms/pathology; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Treatment Outcome
8.
Eur Heart J Digit Health ; 3(1): 38-48, 2022 Mar.
Article in English | MEDLINE | ID: mdl-36713994

ABSTRACT

Aims: Through this proof of concept, we studied the potential added value of machine learning (ML) methods in building cardiovascular risk scores from structured data and the conditions under which they outperform linear statistical models. Methods and results: Relying on extensive cardiovascular clinical data from FOURIER, a randomized clinical trial testing evolocumab efficacy, we compared linear models, neural networks, random forests, and gradient boosting machines for predicting the risk of major adverse cardiovascular events. To study the relative strengths of each method, we extended the comparison to restricted subsets of the full FOURIER dataset, limiting either the number of available patients or the number of their characteristics. When using all 428 covariates available in the dataset, ML methods significantly outperformed linear models built from the same variables (c-index 0.67 vs. 0.62, P = 2e-5), as well as a reference cardiovascular risk score based on only 10 variables (c-index 0.60). We showed that gradient boosting, the best-performing model in our setting, requires fewer patients and significantly outperforms linear models when using large numbers of variables. On the other hand, we illustrate how linear models suffer from being trained on too many variables, thus requiring more careful prior selection. These ML methods proved to consistently improve risk assessment, to be interpretable despite their complexity, and to help identify the minimal set of covariates necessary to achieve top performance. Conclusion: In the field of secondary cardiovascular event prevention, given the increased availability of extensive electronic health records, ML methods could open the door to more powerful tools for patient risk stratification and treatment allocation strategies.
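A sketch of the comparison described above: fit a gradient boosting machine and a linear model on the same covariates and compare discrimination. For a binary outcome the ROC AUC coincides with the c-index; the synthetic data and binary major-adverse-cardiovascular-event label are assumptions, the study itself works with the FOURIER time-to-event data.

```python
# Compare gradient boosting vs. a linear model on the same covariates.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))                                        # synthetic covariates
y = (X[:, :5].sum(axis=1) + rng.normal(size=2000) > 0).astype(int)     # placeholder MACE label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier().fit(X_tr, y_tr)
lin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("GBM c-index:   ", roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))
print("Linear c-index:", roc_auc_score(y_te, lin.predict_proba(X_te)[:, 1]))
```

Repeating the comparison on subsets of rows (patients) and columns (covariates) reproduces the kind of sample-size and dimensionality analysis described in the abstract.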

9.
BMC Bioinformatics ; 11: 99, 2010 Feb 22.
Article in English | MEDLINE | ID: mdl-20175916

ABSTRACT

BACKGROUND: Predicting which molecules can bind to a given binding site of a protein with known 3D structure is important for deciphering protein function and useful in drug design. A classical assumption in structural biology is that proteins with similar 3D structures have related molecular functions and therefore may bind similar ligands. However, proteins that do not display any overall sequence or structure similarity may also bind similar ligands if they contain similar binding sites. Quantitatively assessing the similarity between binding sites may therefore be useful to propose new ligands for a given pocket, based on those known for similar pockets. RESULTS: We propose a new method to quantify the similarity between binding pockets and explore its relevance for ligand prediction. We represent each pocket by a cloud of atoms, and assess the similarity between two pockets by aligning their atoms in 3D space and comparing the resulting configurations with a convolution kernel. Pocket alignment and comparison are possible even when the corresponding proteins share no sequence or overall structure similarities. In order to predict ligands for a given target pocket, we compare it to an ensemble of pockets with known ligands to identify the most similar pockets. We discuss two criteria to evaluate the performance of a binding pocket similarity measure in the context of ligand prediction, namely the area under the ROC curve (AUC) and classification-based scores. We show that the latter is better suited to evaluate the methods with respect to ligand prediction, and demonstrate the relevance of our new binding site similarity compared to existing similarity measures. CONCLUSIONS: This study demonstrates the relevance of the proposed method for identifying ligands that bind to known binding pockets. We also provide a new benchmark for future work in this field. The new method and the benchmark are available at http://cbio.ensmp.fr/paris/.
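A sketch of the comparison step, assuming the two pockets have already been aligned in 3D (the alignment step is omitted): a sum-of-Gaussians convolution kernel over all pairs of atoms scores their similarity. The bandwidth and random coordinates are illustrative assumptions, not the paper's parameters.

```python
# Gaussian convolution kernel between two atom clouds representing pockets.
import numpy as np

def pocket_kernel(atoms_a: np.ndarray, atoms_b: np.ndarray, sigma: float = 1.0) -> float:
    """Sum-of-Gaussians kernel between two (n, 3) atom coordinate clouds."""
    diff = atoms_a[:, None, :] - atoms_b[None, :, :]          # (na, nb, 3) pairwise offsets
    sq_dist = np.sum(diff ** 2, axis=-1)
    return float(np.exp(-sq_dist / (2 * sigma ** 2)).sum())

rng = np.random.default_rng(0)
pocket1 = rng.normal(size=(40, 3))                         # 40 atoms, assumed aligned
pocket2 = pocket1 + rng.normal(scale=0.2, size=(40, 3))    # a near-identical pocket
pocket3 = rng.normal(size=(40, 3))                         # an unrelated pocket
print(pocket_kernel(pocket1, pocket2))   # typically much larger
print(pocket_kernel(pocket1, pocket3))   # than the unrelated pair
```

Ranking the pockets of a reference library by this score against a query pocket is what allows ligands of the nearest pockets to be proposed for the query.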


Subject(s)
Computational Biology/methods; Proteins/chemistry; Proteins/metabolism; Binding Sites; Databases, Protein; Ligands; Models, Molecular; Protein Conformation; Structure-Activity Relationship
10.
Bioinformatics ; 25(12): i259-67, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19477997

ABSTRACT

MOTIVATION: Aligning protein-protein interaction (PPI) networks of different species has drawn considerable interest recently. This problem is important for investigating evolutionarily conserved pathways or protein complexes across species, and for helping identify functional orthologs through the detection of conserved interactions. It is, however, a difficult combinatorial problem, for which only heuristic methods have been proposed so far. RESULTS: We reformulate PPI network alignment as a graph matching problem and investigate how state-of-the-art graph matching algorithms can be used for that purpose. We differentiate between two alignment problems, depending on whether strict constraints on protein matches are given, based on sequence similarity, or whether the goal is instead to find an optimal compromise between sequence similarity and interaction conservation in the alignment. We propose new methods for both cases and assess their performance on the alignment of the yeast and fly PPI networks. The new methods consistently outperform state-of-the-art algorithms, retrieving in particular 78% more conserved interactions than IsoRank for a given level of sequence similarity. AVAILABILITY: All data and code are freely and publicly available upon request.
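To illustrate the graph matching reformulation, the toy sketch below aligns two small adjacency matrices so as to maximize conserved interactions, using SciPy's generic FAQ quadratic-assignment solver as a stand-in; the paper's algorithms differ and additionally balance sequence similarity against interaction conservation.

```python
# PPI alignment as a quadratic assignment problem on toy adjacency matrices.
import numpy as np
from scipy.optimize import quadratic_assignment

rng = np.random.default_rng(0)
A = (rng.random((12, 12)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T                                         # toy "yeast" PPI network
perm = rng.permutation(12)
B = A[np.ix_(perm, perm)]                           # toy "fly" network: same topology, relabeled

res = quadratic_assignment(A, B, method="faq", options={"maximize": True})
aligned_B = B[np.ix_(res.col_ind, res.col_ind)]     # reorder B nodes by the matching
conserved = int((A * aligned_B).sum() / 2)
print(f"conserved interactions recovered: {conserved} of {int(A.sum() / 2)}")
```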


Subject(s)
Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/chemistry; Algorithms; Databases, Protein; Sequence Alignment/methods
11.
Nat Commun ; 11(1): 3877, 2020 08 03.
Article in English | MEDLINE | ID: mdl-32747659

ABSTRACT

Deep learning methods for digital pathology analysis are an effective way to address multiple clinical questions, from diagnosis to prediction of treatment outcomes. These methods have also been used to predict gene mutations from pathology images, but no comprehensive evaluation of their potential for extracting molecular features from histology slides has yet been performed. We show that HE2RNA, a model based on the integration of multiple data modes, can be trained to systematically predict RNA-Seq profiles from whole-slide images alone, without expert annotation. Through its interpretable design, HE2RNA provides a virtual spatialization of gene expression, as validated by CD3 and CD20 staining on an independent dataset. The transcriptomic representation learned by HE2RNA can also be transferred to other datasets, even small ones, to increase prediction performance for specific molecular phenotypes. We illustrate the use of this approach for clinical diagnostic purposes, such as the identification of tumors with microsatellite instability.
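A strongly simplified sketch of the learning objective: regress per-gene expression from aggregated tile features of a whole-slide image. Mean pooling and a ridge regressor are simplifying assumptions; HE2RNA itself scores tiles individually and pools the best per-tile predictions, which is what yields the virtual spatialization.

```python
# Multi-output regression from pooled whole-slide features to RNA-Seq profiles.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_slides, n_tiles, n_feat, n_genes = 100, 500, 256, 50
slide_features = rng.normal(size=(n_slides, n_tiles, n_feat)).mean(axis=1)  # mean-pooled tiles
expression = rng.normal(size=(n_slides, n_genes))                           # placeholder log RNA-Seq

model = Ridge(alpha=1.0).fit(slide_features, expression)    # one regression per gene
predicted = model.predict(slide_features[:5])               # (5, n_genes) predicted profiles
print(predicted.shape)
```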


Subject(s)
Computational Biology/methods; Deep Learning; Gene Expression Regulation, Neoplastic; Image Processing, Computer-Assisted/methods; Neoplasms/genetics; RNA-Seq/methods; Algorithms; Gene Expression Profiling/methods; Humans; Microsatellite Instability; Models, Genetic; Neoplasms/diagnosis; Neoplasms/metabolism
12.
Nat Med ; 25(10): 1519-1525, 2019 10.
Article in English | MEDLINE | ID: mdl-31591589

ABSTRACT

Malignant mesothelioma (MM) is an aggressive cancer primarily diagnosed on the basis of histological criteria [1]. The 2015 World Health Organization classification subdivides mesothelioma tumors into three histological types: epithelioid, biphasic, and sarcomatoid MM. MM is a highly complex and heterogeneous disease, rendering its diagnosis and histological typing difficult and leading to suboptimal patient care and decisions regarding treatment modalities [2]. Here we have developed MesoNet, a new approach based on deep convolutional neural networks, to accurately predict the overall survival of mesothelioma patients from whole-slide digitized images, without any pathologist-provided locally annotated regions. We validated MesoNet on both an internal validation cohort from the French MESOBANK and an independent cohort from The Cancer Genome Atlas (TCGA). We also demonstrated that the model was more accurate in predicting patient survival than current pathology practices. Furthermore, unlike classical black-box deep learning methods, MesoNet identified regions contributing to patient outcome prediction. Strikingly, we found that these regions are mainly located in the stroma and correspond to histological features associated with inflammation, cellular diversity, and vacuolization. These findings suggest that deep learning models can identify new features predictive of patient survival and potentially lead to new biomarker discoveries.
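A small sketch of the interpretability step described above: once a model assigns each tile a contribution score to the predicted risk, the most extreme tiles are extracted for pathological review. The scores and coordinates below are placeholders, not MesoNet outputs.

```python
# Pick the highest- and lowest-scoring tiles of a slide for expert review.
import numpy as np

rng = np.random.default_rng(0)
tile_scores = rng.normal(size=10000)                   # per-tile contribution to predicted risk
tile_coords = rng.integers(0, 50000, size=(10000, 2))  # tile (x, y) positions on the slide

def extreme_tiles(scores: np.ndarray, coords: np.ndarray, k: int = 10):
    """Return coordinates of the k highest- and k lowest-scoring tiles."""
    order = np.argsort(scores)
    return coords[order[-k:]], coords[order[:k]]        # (poor-prognosis, good-prognosis) tiles

high_risk, low_risk = extreme_tiles(tile_scores, tile_coords)
print(high_risk.shape, low_risk.shape)
```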


Subject(s)
Lung Neoplasms/diagnosis; Lung Neoplasms/pathology; Mesothelioma/diagnosis; Mesothelioma/pathology; Prognosis; Deep Learning; Female; Humans; Lung Neoplasms/classification; Male; Mesothelioma/classification; Mesothelioma, Malignant; Neoplasm Grading; Neural Networks, Computer
13.
Nat Commun ; 10(1): 2674, 2019 06 17.
Article in English | MEDLINE | ID: mdl-31209238

ABSTRACT

The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca's large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, together with the results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. 160 teams participated, providing a comprehensive methodological development and benchmarking effort. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations; however, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including antagonism of ADAM17 inhibition when combined with PIK3CB/D inhibition, contrasting with synergy when combined with other PI3K-pathway inhibitors in PIK3CA-mutant cells.
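An illustrative sketch of the challenge task: predict a synergy score for a (drug A, drug B, cell line) triple from concatenated drug-target and cell-line genomic features. The binary feature encodings, random labels, and gradient boosting regressor are assumptions chosen for brevity, not any participant's method.

```python
# Regress synergy scores from drug-target profiles and cell-line mutations.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
drug_a_targets = rng.integers(0, 2, size=(n, 30))     # binary drug-target profiles
drug_b_targets = rng.integers(0, 2, size=(n, 30))
cell_mutations = rng.integers(0, 2, size=(n, 60))     # e.g. PIK3CA-mutant flags
X = np.hstack([drug_a_targets, drug_b_targets, cell_mutations])
y = rng.normal(size=n)                                # placeholder synergy scores

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```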


Subject(s)
Antineoplastic Combined Chemotherapy Protocols/pharmacology; Computational Biology/methods; Neoplasms/drug therapy; Pharmacogenetics/methods; ADAM17 Protein/antagonists & inhibitors; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Benchmarking; Biomarkers, Tumor/genetics; Cell Line, Tumor; Computational Biology/standards; Datasets as Topic; Drug Antagonism; Drug Resistance, Neoplasm/drug effects; Drug Resistance, Neoplasm/genetics; Drug Synergism; Genomics/methods; Humans; Molecular Targeted Therapy/methods; Mutation; Neoplasms/genetics; Pharmacogenetics/standards; Phosphatidylinositol 3-Kinases/genetics; Phosphoinositide-3 Kinase Inhibitors; Treatment Outcome
14.
IEEE Trans Pattern Anal Mach Intell ; 31(12): 2227-42, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19834143

ABSTRACT

We propose a convex-concave programming approach for the labeled weighted graph matching problem. The convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a least-squares problem on the set of permutation matrices and relaxing it to two different optimization problems: a quadratic convex and a quadratic concave optimization problem on the set of doubly stochastic matrices. The concave relaxation has the same global minimum as the initial graph matching problem, but the search for its global minimum is also a hard combinatorial problem. We therefore construct an approximation of the concave problem's solution by following the solution path of a convex-concave problem obtained by linear interpolation of the convex and concave formulations, starting from the convex relaxation. This method makes it easy to integrate information on graph label similarities into the optimization problem and therefore to perform labeled weighted graph matching. The algorithm is compared with some of the best-performing graph matching methods on four datasets: simulated graphs, QAPLib, retina vessel images, and handwritten Chinese characters. In all cases, the results are competitive with the state of the art.
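In symbols, with notation chosen here purely for illustration (the abstract does not fix a notation), the relaxation and path-following scheme can be written as follows, where A1 and A2 are the weighted adjacency matrices, P the permutation (or doubly stochastic) matrix, and F_cvx, F_ccv the convex and concave relaxed objectives:

```latex
% Least-squares graph matching over permutation matrices, relaxed to the set
% D of doubly stochastic matrices, with a linearly interpolated objective.
\begin{align*}
  &\min_{P \in \mathcal{P}} \; \lVert A_1 - P A_2 P^{\top} \rVert_F^2
  && \text{(combinatorial problem over permutations } \mathcal{P}\text{)} \\
  &F_\lambda(P) = (1-\lambda)\, F_{\mathrm{cvx}}(P) + \lambda\, F_{\mathrm{ccv}}(P),
  \quad P \in \mathcal{D},\; \lambda \in [0, 1]
  && \text{(interpolated objective on doubly stochastic } \mathcal{D}\text{)}
\end{align*}
```

The path-following algorithm starts from the minimizer of the convex relaxation at lambda = 0 and tracks a local minimizer of F_lambda as lambda increases to 1, where the concave objective attains its minimum at a permutation matrix; label similarities enter as an additional linear term in the objective.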


Subject(s)
Algorithms; Pattern Recognition, Automated/statistics & numerical data; Artificial Intelligence; Humans; Image Processing, Computer-Assisted; Retinal Vessels/anatomy & histology