Results 1 - 20 of 60
1.
Nat Methods ; 20(9): 1336-1345, 2023 09.
Article in English | MEDLINE | ID: mdl-37550579

ABSTRACT

Single-cell datasets are routinely collected to investigate changes in cellular state between control cells and the corresponding cells in a treatment condition, such as exposure to a drug or infection by a pathogen. To better understand heterogeneity in treatment response, it is desirable to deconvolve variations enriched in treated cells from those shared with controls. However, standard computational models of single-cell data are not designed to explicitly separate these variations. Here, we introduce contrastive variational inference (contrastiveVI; https://github.com/suinleelab/contrastiveVI ), a framework for deconvolving variations in treatment-control single-cell RNA sequencing (scRNA-seq) datasets into shared and treatment-specific latent variables. Using three treatment-control scRNA-seq datasets, we apply contrastiveVI to perform a variety of analysis tasks, including visualization, clustering and differential expression testing. We find that contrastiveVI consistently achieves results that agree with known ground truths and often highlights subtle phenomena that may be difficult to ascertain with standard workflows. We conclude by generalizing contrastiveVI to accommodate joint transcriptome and surface protein measurements.


Subjects
Gene Expression Profiling, Single-Cell Analysis, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Transcriptome, Cluster Analysis, Algorithms, Software
2.
Bioinformatics ; 36(Suppl_2): i573-i582, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381842

ABSTRACT

MOTIVATION: The increasing number of gene expression profiles has enabled the use of complex models, such as deep unsupervised neural networks, to extract a latent space from these profiles. However, expression profiles, especially when collected in large numbers, inherently contain variations introduced by technical artifacts (e.g. batch effects) and uninteresting biological variables (e.g. age) in addition to the true signals of interest. These sources of variation, called confounders, produce embeddings that fail to transfer to different domains, i.e. an embedding learned from one dataset with a specific confounder distribution does not generalize to different distributions. To remedy this problem, we attempt to disentangle confounders from true signals to generate biologically informative embeddings. RESULTS: In this article, we introduce the Adversarial Deconfounding AutoEncoder (AD-AE) approach to deconfounding gene expression latent spaces. The AD-AE model consists of two neural networks: (i) an autoencoder to generate an embedding that can reconstruct original measurements, and (ii) an adversary trained to predict the confounder from that embedding. We jointly train the networks to generate embeddings that can encode as much information as possible without encoding any confounding signal. By applying AD-AE to two distinct gene expression datasets, we show that our model can (i) generate embeddings that do not encode confounder information, (ii) conserve the biological signals present in the original space and (iii) generalize successfully across different confounder domains. We demonstrate that AD-AE outperforms the standard autoencoder and other deconfounding approaches. AVAILABILITY AND IMPLEMENTATION: Our code and data are available at https://gitlab.cs.washington.edu/abdincer/ad-ae. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
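
The two-network training described in this abstract can be pictured with a toy objective. The following is a minimal sketch, not the authors' implementation: it assumes mean squared error for both networks and a hypothetical trade-off weight `lam`, neither of which is stated in the abstract. Under those assumptions, the autoencoder's score improves when reconstruction error is low and the adversary's error in predicting the confounder is high.

```python
def mse(truth, pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def autoencoder_objective(x, x_hat, confounder, confounder_hat, lam=1.0):
    """Toy deconfounding objective (assumed form, lower is better):
    reconstruction error minus lam times the adversary's error."""
    return mse(x, x_hat) - lam * mse(confounder, confounder_hat)


# Perfect reconstruction, adversary off by 1.0 on the confounder.
score = autoencoder_objective([1.0, 2.0], [1.0, 2.0], [0.0], [1.0], lam=0.5)
print(score)
```

In this toy form, a larger `lam` puts more weight on hiding the confounder at the expense of reconstruction quality.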


Subjects
Computer Neural Networks, Gene Expression
3.
Nucleic Acids Res ; 47(10): e58, 2019 06 04.
Article in English | MEDLINE | ID: mdl-30869146

ABSTRACT

ChIP-seq is a technique to determine binding locations of transcription factors, which remains a central challenge in molecular biology. Current practice is to use a 'control' dataset to remove background signals from an immunoprecipitation (IP) 'target' dataset. We introduce the AIControl framework, which eliminates the need to obtain a control dataset and instead identifies binding peaks by estimating the distributions of background signals from many publicly available control ChIP-seq datasets. We thereby avoid the cost of running control experiments while simultaneously increasing the accuracy of binding location identification. Specifically, AIControl can (i) estimate background signals at fine resolution, (ii) systematically weigh the most appropriate control datasets in a data-driven way, (iii) capture sources of potential biases that may be missed by one control dataset and (iv) remove the need for costly and time-consuming control experiments. We applied AIControl to 410 IP datasets in the ENCODE ChIP-seq database, using 440 control datasets from 107 cell types to impute background signal. Without using matched control datasets, AIControl identified peaks that were more enriched for putative binding sites than those identified by other popular peak callers that used a matched control dataset. We also demonstrated that our framework identifies binding sites that recover documented protein interactions more accurately.


Subjects
Algorithms, Chromatin Immunoprecipitation/methods, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, Machine Learning, DNA Sequence Analysis/methods, Binding Sites, Humans, Protein Binding, Reproducibility of Results, Transcription Factors/metabolism
4.
Lancet ; 403(10428): 717, 2024 02 24.
Article in English | MEDLINE | ID: mdl-38401957
6.
PLoS Comput Biol ; 12(5): e1004888, 2016 05.
Article in English | MEDLINE | ID: mdl-27145341

ABSTRACT

We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred on the basis of mRNA expression datasets within distinct biological states. DISCERN takes two expression datasets as input: an expression dataset of diseased tissues from patients with a disease of interest and another expression dataset from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, that is, has distinct regulator connectivity in the inferred gene-regulator dependencies between the disease and normal conditions. This approach has distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes, where conditional dependence relationships discriminate the evidence for direct interactions from indirect interactions more precisely than pairwise correlation. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about accuracy of the specific edges inferred in a particular network. DISCERN identifies perturbed genes more accurately in synthetic data than existing methods to identify perturbed genes between distinct states. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, genes associated with the biological processes known to be important in the disease, and genes associated with patient prognosis, in the respective cancer. Finally, we show that DISCERN can uncover potential mechanisms underlying network perturbation by explaining observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods, based on the available epigenomic data from the ENCODE project.


Subjects
Gene Regulatory Networks, Genetic Models, Neoplasms/genetics, Breast Neoplasms/genetics, Computational Biology, Computer Simulation, Genetic Databases, Genetic Epigenesis, Female, Neoplastic Gene Expression Regulation, Humans, Acute Myeloid Leukemia/genetics, Likelihood Functions, Lung Neoplasms/genetics, Prognosis
7.
Nucleic Acids Res ; 43(3): 1332-44, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25583238

ABSTRACT

We define a new category of candidate tumor drivers in cancer genome evolution: 'selected expression regulators' (SERs), genes driving dysregulated transcriptional programs in cancer evolution. The SERs are identified from genome-wide tumor expression data with a novel method, namely SPARROW (SPARse selected expRessiOn regulators identified With penalized regression). SPARROW uncovers a previously unknown connection between cancer expression variation and driver events, by using a novel sparse regression technique. Our results indicate that SPARROW is a powerful complementary approach to identify candidate genes containing driver events that are hard to detect from sequence data, due to a large number of passenger mutations and lack of comprehensive sequence information from a sufficiently large number of samples. SERs identified by SPARROW reveal known driver mutations in multiple human cancers, along with known cancer-associated processes and survival-associated genes, better than popular methods for inferring gene expression networks. We demonstrate that when applied to acute myeloid leukemia expression data, SPARROW identifies an apoptotic biomarker (PYCARD) for an investigational drug obatoclax. The PYCARD and obatoclax association is validated in 30 AML patient samples.


Subjects
Brain Neoplasms/genetics, Gene Expression Profiling, Glioblastoma/genetics, Acute Myeloid Leukemia/genetics, Gene Regulatory Networks, Humans, Mutation
8.
J Natl Compr Canc Netw ; 14(1): 8-17, 2016 01.
Article in English | MEDLINE | ID: mdl-26733551

ABSTRACT

Accelerating cancer research is expected to require new types of clinical trials. This report describes the Intensive Trial of OMics in Cancer (ITOMIC) and a participant with triple-negative breast cancer metastatic to bone, who had markedly elevated circulating tumor cells (CTCs) that were monitored 48 times over 9 months. A total of 32 researchers from 14 institutions were engaged in the patient's evaluation; 20 researchers had no prior involvement in patient care and 18 were recruited specifically for this patient. Whole-exome sequencing of 3 bone marrow samples demonstrated a novel ROS1 variant that was estimated to be present in most or all tumor cells. After an initial response to cisplatin, a hypothesis of crizotinib sensitivity was disproven. Leukapheresis followed by partial CTC enrichment allowed for the development of a differential high-throughput drug screen and demonstrated sensitivity to investigational BH3-mimetic inhibitors of BCL-2 that could not be tested in the patient because requests to the pharmaceutical sponsors were denied. The number and size of CTC clusters correlated with clinical status and eventually death. Focusing the expertise of a distributed network of investigators on an intensively monitored patient with cancer can generate high-resolution views of the natural history of cancer and suggest new opportunities for therapy. Optimization requires access to investigational drugs.


Subjects
Community Networks, Research Personnel, Triple Negative Breast Neoplasms/diagnosis, Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Bone Neoplasms/secondary, Antineoplastic Drug Resistance, Antitumor Drug Screening Assays, Expert Testimony, Female, Follow-Up Studies, Humans, Leukapheresis, Longitudinal Studies, Middle Aged, Neoplasm Metastasis, Circulating Neoplastic Cells, Triple Negative Breast Neoplasms/pathology, Triple Negative Breast Neoplasms/therapy
9.
bioRxiv ; 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38559197

ABSTRACT

Clinically and biologically valuable information may reside untapped in large cancer gene expression data sets. Deep unsupervised learning has the potential to extract this information with unprecedented efficacy but has thus far been hampered by a lack of biological interpretability and robustness. Here, we present DeepProfile, a comprehensive framework that addresses current challenges in applying unsupervised deep learning to gene expression profiles. We use DeepProfile to learn low-dimensional latent spaces for 18 human cancers from 50,211 transcriptomes. DeepProfile outperforms existing dimensionality reduction methods with respect to biological interpretability. Using DeepProfile interpretability methods, we show that genes that are universally important in defining the latent spaces across all cancer types control immune cell activation, while cancer type-specific genes and pathways define molecular disease subtypes. By linking DeepProfile latent variables to secondary tumor characteristics, we discover that tumor mutation burden is closely associated with the expression of cell cycle-related genes. DNA mismatch repair and MHC class II antigen presentation pathway expression, on the other hand, are consistently associated with patient survival. We validate these results through Kaplan-Meier analyses and nominate tumor-associated macrophages as an important source of survival-correlated MHC class II transcripts. Our results illustrate the power of unsupervised deep learning for discovery of novel cancer biology from existing gene expression data.

10.
Water Res ; 262: 122086, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39032338

ABSTRACT

Artificial intelligence has been employed to simulate and optimize the performance of membrane capacitive deionization (MCDI), an emerging ion separation process. However, real-time control for optimal MCDI operation has not yet been investigated. In this study, we aimed to develop a reinforcement learning (RL)-based control model and investigate the model to find an energy-efficient MCDI operation strategy. To fulfill the objectives, we established three long short-term memory models to predict applied voltage, outflow pH, and outflow electrical conductivity. Also, four RL agents were trained to minimize outflow concentration and energy consumption simultaneously. Consequently, actor-critic (A2C) and proximal policy optimization (PPO2) achieved the ion separation goal (<0.8 mS/cm) as they determined the electrical current and pump speed to be low. Particularly, A2C kept the parameters consistent in charging MCDI, which caused lower energy consumption (0.0128 kWh/m³) than PPO2 (0.0363 kWh/m³). To understand the decision-making process of A2C, the Shapley additive explanation based on the decision tree model estimated the influence of input parameters on the control parameters. The results of this study demonstrate the feasibility of RL-based controls in MCDI operations. Thus, we expect that the RL-based control model can improve further and enhance the efficiency of water treatment technologies.


Subjects
Artificial Membranes, Water Purification/methods, Theoretical Models, Artificial Intelligence, Electric Conductivity
11.
Nat Med ; 30(4): 1154-1165, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38627560

ABSTRACT

Building trustworthy and transparent image-based medical artificial intelligence (AI) systems requires the ability to interrogate data and models at all stages of the development pipeline, from training models to post-deployment monitoring. Ideally, the data and associated AI systems could be described using terms already familiar to physicians, but this requires medical datasets densely annotated with semantically meaningful concepts. In the present study, we present a foundation model approach, named MONET (medical concept retriever), which learns how to connect medical images with text and densely scores images on concept presence to enable important tasks in medical AI development and deployment such as data auditing, model auditing and model interpretation. Dermatology provides a demanding use case for the versatility of MONET, due to the heterogeneity in diseases, skin tones and imaging modalities. We trained MONET based on 105,550 dermatological images paired with natural language descriptions from a large collection of medical literature. MONET can accurately annotate concepts across dermatology images as verified by board-certified dermatologists, competitively with supervised models built on previously concept-annotated dermatology datasets of clinical images. We demonstrate how MONET enables AI transparency across the entire AI system development pipeline, from building inherently interpretable models to dataset and model auditing, including a case study dissecting the results of an AI clinical trial.


Subjects
Artificial Intelligence, Physicians, Humans, Learning
12.
J Gastric Cancer ; 24(3): 341-352, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38960892

ABSTRACT

PURPOSE: Textbook outcome is a comprehensive measure used to assess surgical quality and is increasingly being recognized as a valuable evaluation tool. Delta-shaped anastomosis (DA), an intracorporeal gastroduodenostomy, is a viable option for minimally invasive distal gastrectomy in patients with gastric cancer. This study aims to evaluate the surgical outcomes and calculate the textbook outcome of DA. MATERIALS AND METHODS: In this retrospective study, the records of 4,902 patients who underwent minimally invasive distal gastrectomy for DA between 2009 and 2020 were reviewed. The data were categorized into three phases to analyze the trends over time. Surgical outcomes, including the operation time, length of post-operative hospital stay, and complication rates, were assessed, and the textbook outcome was calculated. RESULTS: Among 4,505 patients, the textbook outcome is achieved in 3,736 (82.9%). Post-operative complications affect the textbook outcome the most significantly (91.9%). The highest textbook outcome is achieved in phase 2 (85.0%), which surpasses the rates in phase 1 (81.7%) and phase 3 (82.3%). The post-operative complication rate within 30 days after surgery is 8.7%, and the rate of major complications exceeding the Clavien-Dindo classification grade 3 is 2.4%. CONCLUSIONS: Based on the outcomes of a large dataset, DA can be considered safe and feasible for gastric cancer.
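
The headline textbook outcome rate follows directly from the counts reported in RESULTS. A minimal sketch of that arithmetic, in which the function name is illustrative only and not taken from the study:

```python
def textbook_outcome_rate(achieved, total):
    """Percentage of patients who achieved the textbook outcome."""
    return 100.0 * achieved / total


# Counts as reported in the abstract: 3,736 of 4,505 patients.
rate = textbook_outcome_rate(3736, 4505)
print(f"{rate:.1f}%")
```

Rounded to one decimal place, this reproduces the 82.9% stated in the abstract.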


Subjects
Surgical Anastomosis, Gastrectomy, Minimally Invasive Surgical Procedures, Postoperative Complications, Stomach Neoplasms, Humans, Stomach Neoplasms/surgery, Stomach Neoplasms/pathology, Gastrectomy/methods, Gastrectomy/adverse effects, Female, Male, Retrospective Studies, Middle Aged, Surgical Anastomosis/methods, Aged, Minimally Invasive Surgical Procedures/methods, Minimally Invasive Surgical Procedures/adverse effects, Postoperative Complications/epidemiology, Postoperative Complications/etiology, Adult, Treatment Outcome, Length of Stay, Aged 80 and over, Operative Time
14.
Arterioscler Thromb Vasc Biol ; 32(12): 2821-35, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23087359

ABSTRACT

The combination of systems biology and large data sets offers new approaches to the study of cardiovascular diseases. These new approaches are especially important for the common cardiovascular diseases that have long been described as multifactorial. This promise is undermined by biologists' skepticism of the spider web-like network diagrams required to analyze these large data sets. Although these spider webs resemble composites of the familiar biochemical pathway diagrams, the complexity of the webs is overwhelming. As a result, biologists collaborate with data analysts whose mathematical methods seem much like those of experts using Ouija boards. To make matters worse, it is not evident how to design experiments when the network implies that many molecules must be part of the disease process. Our goal is to remove some of this mystery and suggest a simple experimental approach to the design of experiments appropriate for such analysis. We will attempt to explain how combinations of data sets that include all possible variables, graphical diagrams, complementation of different data sets, and Bayesian analyses now make it possible to determine the causes of multifactorial cardiovascular disease. We will describe this approach using the term causal analysis. Finally, we will describe how causal analysis is already being used to decipher the interactions among cytokines as causes of cardiovascular disease.


Subjects
Cardiovascular Diseases/epidemiology, Animals, Bayes Theorem, Cardiovascular Diseases/genetics, Causality, Gene Expression/genetics, Humans, Theoretical Models, Statistics as Topic
15.
Lancet Healthy Longev ; 4(12): e711-e723, 2023 12.
Article in English | MEDLINE | ID: mdl-37944549

ABSTRACT

BACKGROUND: Biological age is a measure of health that offers insights into ageing. The existing age clocks, although valuable, often trade off accuracy and interpretability. We introduce ExplaiNAble BioLogical Age (ENABL Age), a computational framework that combines machine-learning models with explainable artificial intelligence (XAI) methods to accurately estimate biological age with individualised explanations. METHODS: To construct the ENABL Age clock, we first predicted an age-related outcome (eg, all-cause or cause-specific mortality), and then rescaled these predictions to estimate biological age, using UK Biobank and National Health and Nutrition Examination Survey (NHANES) datasets. We adapted existing XAI methods to decompose individual ENABL Ages into contributing risk factors. For broad accessibility, we developed two versions: ENABL Age-L, based on blood tests, and ENABL Age-Q, based on questionnaire characteristics. Finally, we validated diverse ageing mechanisms captured by each ENABL Age clock through genome-wide association study (GWAS) analyses. FINDINGS: Our ENABL Age clock was significantly correlated with chronological age (r=0·7867, p<0·0001 for UK Biobank; r=0·7126, p<0·0001 for NHANES). These clocks distinguish individuals who are healthy (ie, their ENABL Age is lower than their chronological age) from those who are unhealthy (ie, their ENABL Age is higher than their chronological age), predicting mortality more effectively than existing clocks. Groups of individuals who were unhealthy showed approximately three to 12 times higher log hazard ratio than healthy groups, as per ENABL Age. The clocks achieved high mortality prediction power with an area under the receiver operating characteristic curve of 0·8179 for 5-year mortality and 0·8115 for 10-year mortality on the UK Biobank dataset, and 0·8935 for 5-year mortality and 0·9107 for 10-year mortality on the NHANES dataset. The individualised explanations that revealed the contribution of specific characteristics to ENABL Age provided insights into the important characteristics for ageing. An association analysis with risk factors and ageing-related morbidities and GWAS results on ENABL Age clocks trained on different mortality causes showed that each clock captures distinct ageing mechanisms. INTERPRETATION: ENABL Age brings an important leap forward in the application of XAI for interpreting biological age clocks. ENABL Age also carries substantial potential in practical settings, assisting medical professionals in untangling the complexity of ageing mechanisms, and potentially becoming a valuable tool in informed clinical decision-making processes. FUNDING: National Science Foundation and National Institutes of Health.
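
The healthy versus unhealthy distinction defined in FINDINGS amounts to comparing two ages. A minimal sketch, assuming a hypothetical helper `health_status`; the abstract does not say how an exact tie is treated, so the "undetermined" label is an assumption made here for completeness:

```python
def health_status(enabl_age, chronological_age):
    """Label an individual by comparing biological and chronological age."""
    if enabl_age < chronological_age:
        return "healthy"    # biological age lower than chronological age
    if enabl_age > chronological_age:
        return "unhealthy"  # biological age higher than chronological age
    return "undetermined"   # assumption: ties are not defined in the abstract


print(health_status(52.3, 60.0))
print(health_status(67.1, 60.0))
```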


Subjects
Artificial Intelligence, Genome-Wide Association Study, United States, Humans, Nutrition Surveys, Machine Learning, Aging/genetics
16.
Nat Commun ; 14(1): 2091, 2023 04 12.
Article in English | MEDLINE | ID: mdl-37045821

ABSTRACT

A prominent trend in single-cell transcriptomics is providing spatial context alongside a characterization of each cell's molecular state. This typically requires targeting an a priori selection of genes, often covering less than 1% of the genome, and a key question is how to optimally determine the small gene panel. We address this challenge by introducing a flexible deep learning framework, PERSIST, to identify informative gene targets for spatial transcriptomics studies by leveraging reference scRNA-seq data. Using datasets spanning different brain regions, species, and scRNA-seq technologies, we show that PERSIST reliably identifies panels that provide more accurate prediction of the genome-wide expression profile, thereby capturing more information with fewer genes. PERSIST can be adapted to specific biological goals, and we demonstrate that PERSIST's binarization of gene expression levels enables models trained on scRNA-seq data to generalize to spatial transcriptomics data, despite the complex shift between these technologies.
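
One way to picture the binarization mentioned in this abstract is a simple threshold on expression counts. This is a sketch under assumptions: the abstract does not state how PERSIST binarizes expression, so the threshold of zero and the function name `binarize_expression` are illustrative only.

```python
def binarize_expression(counts, threshold=0):
    """Map a cells-by-genes count matrix to 0/1 (expressed or not)."""
    return [[1 if value > threshold else 0 for value in cell] for cell in counts]


# Two cells, two genes; magnitudes differ but the on/off pattern is kept.
binary = binarize_expression([[0, 3], [5, 0]])
print(binary)
```

The intuition is that an on/off encoding discards technology-specific differences in count magnitude, which may be why it transfers between scRNA-seq and spatial measurements.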


Subjects
Single-Cell Analysis, Transcriptome, Transcriptome/genetics, Gene Expression Profiling, RNA Sequence Analysis
17.
medRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292705

ABSTRACT

Despite the proliferation and clinical deployment of artificial intelligence (AI)-based medical software devices, most remain black boxes that are uninterpretable to key stakeholders including patients, physicians, and even the developers of the devices. Here, we present a general model auditing framework that combines insights from medical experts with a highly expressive form of explainable AI that leverages generative models, to understand the reasoning processes of AI devices. We then apply this framework to generate the first thorough, medically interpretable picture of the reasoning processes of machine-learning-based medical image AI. In our synergistic framework, a generative model first renders "counterfactual" medical images, which in essence visually represent the reasoning process of a medical AI device, and then physicians translate these counterfactual images to medically meaningful features. As our use case, we audit five high-profile AI devices in dermatology, an area of particular interest since dermatology AI devices are beginning to achieve deployment globally. We reveal how dermatology AI devices rely both on features used by human dermatologists, such as lesional pigmentation patterns, as well as multiple, previously unreported, potentially undesirable features, such as background skin texture and image color balance. Our study also sets a precedent for the rigorous application of explainable AI to understand AI in any specialized domain and provides a means for practitioners, clinicians, and regulators to uncloak AI's powerful but previously enigmatic reasoning processes in a medically understandable way.

18.
medRxiv ; 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37398017

ABSTRACT

Building trustworthy and transparent image-based medical AI systems requires the ability to interrogate data and models at all stages of the development pipeline: from training models to post-deployment monitoring. Ideally, the data and associated AI systems could be described using terms already familiar to physicians, but this requires medical datasets densely annotated with semantically meaningful concepts. Here, we present a foundation model approach, named MONET (Medical cONcept rETriever), which learns how to connect medical images with text and generates dense concept annotations to enable tasks in AI transparency from model auditing to model interpretation. Dermatology provides a demanding use case for the versatility of MONET, due to the heterogeneity in diseases, skin tones, and imaging modalities. We trained MONET on the basis of 105,550 dermatological images paired with natural language descriptions from a large collection of medical literature. MONET can accurately annotate concepts across dermatology images as verified by board-certified dermatologists, outperforming supervised models built on previously concept-annotated dermatology datasets. We demonstrate how MONET enables AI transparency across the entire AI development pipeline from dataset auditing to model auditing to building inherently interpretable models.

19.
Nat Biomed Eng ; 2023 Dec 28.
Article in English | MEDLINE | ID: mdl-38155295

ABSTRACT

The inferences of most machine-learning models powering medical artificial intelligence are difficult to interpret. Here we report a general framework for model auditing that combines insights from medical experts with a highly expressive form of explainable artificial intelligence. Specifically, we leveraged the expertise of dermatologists for the clinical task of differentiating melanomas from melanoma 'lookalikes' on the basis of dermoscopic and clinical images of the skin, and the power of generative models to render 'counterfactual' images to understand the 'reasoning' processes of five medical-image classifiers. By altering image attributes to produce analogous images that elicit a different prediction by the classifiers, and by asking physicians to identify medically meaningful features in the images, the counterfactual images revealed that the classifiers rely both on features used by human dermatologists, such as lesional pigmentation patterns, and on undesirable features, such as background skin texture and colour balance. The framework can be applied to any specialized medical domain to make the powerful inference processes of machine-learning models medically understandable.

20.
Nat Biomed Eng ; 7(6): 811-829, 2023 06.
Article in English | MEDLINE | ID: mdl-37127711

ABSTRACT

Machine learning may aid the choice of optimal combinations of anticancer drugs by explaining the molecular basis of their synergy. By combining accurate models with interpretable insights, explainable machine learning promises to accelerate data-driven cancer pharmacology. However, owing to the highly correlated and high-dimensional nature of transcriptomic data, naively applying current explainable machine-learning strategies to large transcriptomic datasets leads to suboptimal outcomes. Here by using feature attribution methods, we show that the quality of the explanations can be increased by leveraging ensembles of explainable machine-learning models. We applied the approach to a dataset of 133 combinations of 46 anticancer drugs tested in ex vivo tumour samples from 285 patients with acute myeloid leukaemia and uncovered a haematopoietic-differentiation signature underlying drug combinations with therapeutic synergy. Ensembles of machine-learning models trained to predict drug combination synergies on the basis of gene-expression data may improve the feature attribution quality of complex machine-learning models.
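
The idea of combining explanations across an ensemble can be sketched as follows. The abstract does not say how attributions from individual models are combined, so the per-feature averaging below, and the function name `ensemble_attributions`, are assumptions chosen for illustration.

```python
def ensemble_attributions(per_model):
    """Average per-feature attribution scores across several models.

    per_model: one list of feature attributions per model, all the same length.
    """
    n_models = len(per_model)
    return [sum(scores) / n_models for scores in zip(*per_model)]


# Two hypothetical models scoring the same two features.
averaged = ensemble_attributions([[1.0, 0.0], [3.0, 2.0]])
print(averaged)
```

With highly correlated features, single models may split credit among them arbitrarily; averaging over many models is one plausible way to smooth out that instability.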


Subjects
Gene Expression Profiling, Machine Learning, Humans, Transcriptome