Results 1 - 11 of 11
1.
Comput Biol Med ; 174: 108398, 2024 May.
Article in English | MEDLINE | ID: mdl-38608322

ABSTRACT

The recurrence of low-stage lung cancer poses a challenge due to its unpredictable nature and diverse patient responses to treatments. Personalized care and patient outcomes heavily rely on early relapse identification, yet current predictive models, despite their potential, lack comprehensive genetic data. This inadequacy fuels our research focus: integrating specific genetic information, such as pathway scores, into clinical data. Our aim is to refine machine learning models for more precise relapse prediction in early-stage non-small cell lung cancer. To address the scarcity of genetic data, we employ imputation techniques, leveraging publicly available datasets such as The Cancer Genome Atlas (TCGA), integrating pathway scores into our patient cohort from the Cancer Long Survivor Artificial Intelligence Follow-up (CLARIFY) project. Through the integration of imputed pathway scores from the TCGA dataset with clinical data, our approach achieves notable strides in predicting relapse among a held-out test set of 200 patients. By training machine learning models on enriched knowledge graph data, inclusive of triples derived from pathway score imputation, we achieve a promising precision of 82% and specificity of 91%. These outcomes highlight the potential of our models as supplementary tools within tumour, node, and metastasis (TNM) classification systems, offering improved prognostic capabilities for lung cancer patients. In summary, our research underscores the significance of refining machine learning models for relapse prediction in early-stage non-small cell lung cancer. Our approach, centered on imputing pathway scores and integrating them with clinical data, not only enhances predictive performance but also demonstrates the promising role of machine learning in anticipating relapse and ultimately elevating patient outcomes.


Subjects
Non-Small-Cell Lung Carcinoma, Genomics, Lung Neoplasms, Machine Learning, Humans, Lung Neoplasms/genetics, Non-Small-Cell Lung Carcinoma/genetics, Genomics/methods, Local Neoplasm Recurrence/genetics, Female, Male, Genetic Databases
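The abstract above reports precision and specificity on a held-out test set. As a reminder of how those two metrics are defined, here is a minimal sketch in Python. The confusion-matrix counts are hypothetical and chosen only for illustration; the abstract does not report the underlying counts.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of patients predicted to relapse who actually relapsed."""
    return tp / (tp + fp)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-relapsing patients correctly predicted as non-relapsing."""
    return tn / (tn + fp)


# Hypothetical counts for a 200-patient test set (NOT taken from the paper).
tp, fp, tn, fn = 30, 10, 140, 20

print(f"precision   = {precision(tp, fp):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Both metrics share the false-positive count, so a model that rarely raises false alarms scores well on both even if it misses many true relapses; recall would be needed to see that.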
2.
JCO Clin Cancer Inform ; 7: e2200062, 2023 07.
Article in English | MEDLINE | ID: mdl-37428988

ABSTRACT

PURPOSE: Stratifying patients with cancer according to risk of relapse can personalize their care. In this work, we provide an answer to the following research question: How to use machine learning to estimate probability of relapse in patients with early-stage non-small-cell lung cancer (NSCLC)? MATERIALS AND METHODS: For predicting relapse in 1,387 patients with early-stage (I-II) NSCLC from the Spanish Lung Cancer Group data (average age 65.7 years, female 24.8%, male 75.2%), we train tabular and graph machine learning models. We generate automatic explanations for the predictions of such models. For models trained on tabular data, we adopt SHapley Additive exPlanations local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients. RESULTS: Machine learning models trained on tabular data exhibit a 76% accuracy for the random forest model at predicting relapse evaluated with a 10-fold cross-validation (the model was trained 10 times with different independent sets of patients in test, train, and validation sets, and the reported metrics are averaged over these 10 test sets). Graph machine learning reaches 68% accuracy over a held-out test set of 200 patients, calibrated on a held-out set of 100 patients. CONCLUSION: Our results show that machine learning models trained on tabular and graph data can enable objective, personalized, and reproducible prediction of relapse and, therefore, disease outcome in patients with early-stage NSCLC. With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding the use of adjuvant treatments in early-stage lung cancer.


Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, Male, Female, Aged, Non-Small-Cell Lung Carcinoma/diagnosis, Non-Small-Cell Lung Carcinoma/therapy, Lung Neoplasms/diagnosis, Lung Neoplasms/therapy, Local Neoplasm Recurrence/diagnosis, Machine Learning, Prognosis
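The 10-fold cross-validation described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: it only produces train/test index splits (the abstract also mentions a validation set, which is omitted here), and `fit_and_score` is a hypothetical callback standing in for model training and evaluation.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def mean_cv_score(n_samples: int, k: int, fit_and_score) -> float:
    """Average the score returned by fit_and_score(train_idx, test_idx) over k folds."""
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

With 1,387 patients and k = 10, each test fold holds 138 or 139 patients, and the reported metric is the mean over the ten test folds.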
3.
J Biomed Inform ; 144: 104424, 2023 08.
Article in English | MEDLINE | ID: mdl-37352900

ABSTRACT

OBJECTIVE: Lung cancer exhibits unpredictable recurrence in low-stage tumors and variable responses to different therapeutic interventions. Predicting relapse in early-stage lung cancer can facilitate precision medicine and improve patient survivability. While existing machine learning models rely on clinical data, incorporating genomic information could enhance their efficiency. This study aims to impute and integrate specific types of genomic data with clinical data to improve the accuracy of machine learning models for predicting relapse in early-stage, non-small cell lung cancer patients. METHODS: The study utilized a publicly available TCGA lung cancer cohort and imputed genetic pathway scores into the Spanish Lung Cancer Group (SLCG) data, specifically in 1348 early-stage patients. Initially, tumor recurrence was predicted without imputed pathway scores. Subsequently, the SLCG data were augmented with pathway scores imputed from TCGA. The integrative approach aimed to enhance relapse risk prediction performance. RESULTS: The integrative approach achieved improved relapse risk prediction with the following evaluation metrics: an area under the precision-recall curve (PR-AUC) score of 0.75, an area under the ROC (ROC-AUC) score of 0.80, an F1 score of 0.61, and a Precision of 0.80. The prediction explanation model SHAP (SHapley Additive exPlanations) was employed to explain the machine learning model's predictions. CONCLUSION: We conclude that our explainable predictive model is a promising tool for oncologists that addresses an unmet clinical need of post-treatment patient stratification based on the relapse risk while also improving the predictive power by incorporating proxy genomic data not available for specific patients.


Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Small Cell Lung Carcinoma, Humans, Non-Small-Cell Lung Carcinoma/diagnosis, Non-Small-Cell Lung Carcinoma/genetics, Lung Neoplasms/diagnosis, Lung Neoplasms/genetics, Local Neoplasm Recurrence/genetics, Lung
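The abstract above reports an F1 score and a precision but not the recall. Since F1 is the harmonic mean of precision and recall, the recall can be backed out by rearranging F1 = 2PR / (P + R) into R = F1 * P / (2P - F1). The sketch below assumes both figures were computed at the same decision threshold and ignores rounding in the published values, so the result is only approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 cannot be that high for the given precision")
    return f1 * precision / denominator


# Values as reported in the abstract (rounded to two decimals there).
implied_recall = recall_from_f1(f1=0.61, precision=0.80)
print(round(implied_recall, 2))  # 0.49
```

Under those assumptions the model identifies roughly half of the true relapses, which is the trade-off behind its comparatively high precision.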
4.
Neural Netw ; 156: 205-217, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36274527

ABSTRACT

The scarcity of high-quality annotations in many application scenarios has recently led to an increasing interest in devising learning techniques that combine unlabeled data with labeled data in a network. In this work, we focus on the label propagation problem in multilayer networks. Our approach is inspired by the heat diffusion model, which shows usefulness in machine learning problems such as classification and dimensionality reduction. We propose a novel boundary-based heat diffusion algorithm that guarantees a closed-form solution with an efficient implementation. We experimentally validated our method on synthetic networks and five real-world multilayer network datasets representing scientific coauthorship, spreading drug adoption among physicians, two bibliographic networks, and a movie network. The results demonstrate the benefits of the proposed algorithm, where our boundary-based heat diffusion dominates the performance of the state-of-the-art methods.


Subjects
Hot Temperature, Supervised Machine Learning, Algorithms, Machine Learning
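To give a feel for heat-diffusion label propagation as mentioned in the abstract above, here is a generic sketch on a single-layer graph. It is not the boundary-based, closed-form algorithm the authors propose: it uses plain iterative diffusion with labeled nodes held at fixed heat values, and the graph, step count and step size `alpha` are all assumptions made for illustration.

```python
def diffuse_labels(adj, seeds, steps=200, alpha=0.1):
    """Iteratively diffuse heat over an undirected graph.

    adj:   dict mapping node -> list of neighbouring nodes
    seeds: dict mapping labeled node -> fixed heat value (kept constant)
    alpha: step size; should satisfy alpha * max_degree < 1 for stability
    """
    heat = {node: seeds.get(node, 0.0) for node in adj}
    for _ in range(steps):
        updated = {}
        for node, neighbours in adj.items():
            if node in seeds:
                updated[node] = seeds[node]  # labeled nodes act as heat sources/sinks
            else:
                flow = sum(heat[m] - heat[node] for m in neighbours)
                updated[node] = heat[node] + alpha * flow
        heat = updated
    return heat


# Path graph a - b - c - d with 'a' labeled 1.0 and 'd' labeled 0.0 (toy example).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
heat = diffuse_labels(adj, {"a": 1.0, "d": 0.0})
print({node: round(value, 3) for node, value in heat.items()})
```

Unlabeled nodes settle at values that interpolate between the labeled ones, so `b` ends up closer to the label of `a` than `c` does; thresholding the final heat gives a predicted label.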
5.
AMIA Annu Symp Proc ; 2022: 1062-1071, 2022.
Article in English | MEDLINE | ID: mdl-37128408

ABSTRACT

Early-stage lung cancer is crucial clinically due to its insidious nature and rapid progression. Most of the prediction models designed to predict tumour recurrence in the early stage of lung cancer rely on the clinical or medical history of the patient. However, their performance could likely be improved if the input patient data contained genomic information. Unfortunately, such data is not always collected. This is the main motivation of our work, in which we have imputed and integrated a specific type of genomic data with clinical data to increase the accuracy of machine learning models for prediction of relapse in early-stage, non-small cell lung cancer patients. Using a publicly available TCGA lung adenocarcinoma cohort of 501 patients, their aneuploidy scores were imputed into similar records in the Spanish Lung Cancer Group (SLCG) data, more specifically a cohort of 1348 early-stage patients. First, the tumor recurrence in those patients was predicted without the imputed aneuploidy scores. Then, the SLCG data were enriched with the aneuploidy scores imputed from TCGA. This integrative approach improved the prediction of the relapse risk, achieving an area under the precision-recall curve (PR-AUC) score of 0.74 and an area under the ROC curve (ROC-AUC) score of 0.79. Using the prediction explanation model SHAP (SHapley Additive exPlanations), we further explained the predictions performed by the machine learning model. We conclude that our explainable predictive model is a promising tool for oncologists that addresses an unmet clinical need of post-treatment patient stratification based on the relapse risk, while also improving the predictive power by incorporating proxy genomic data not available for the actual specific patients.


Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, Non-Small-Cell Lung Carcinoma/genetics, Local Neoplasm Recurrence, Genomics
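The abstract above says aneuploidy scores were imputed from TCGA into "similar records" in the SLCG cohort but does not say how similarity was measured. One plausible reading is nearest-neighbour imputation on clinical features shared by both cohorts. The sketch below illustrates that idea only; the feature names, the Euclidean distance and all values are hypothetical, and a real pipeline would at least scale the features first.

```python
import math


def impute_from_nearest(target, donors, features, score_key):
    """Return score_key from the donor record closest to target on the shared features."""
    def distance(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    nearest = min(donors, key=lambda donor: distance(target, donor))
    return nearest[score_key]


# Hypothetical donor records (standing in for a cohort that has the genomic score).
donors = [
    {"age": 61, "stage": 1, "aneuploidy_score": 8},
    {"age": 72, "stage": 2, "aneuploidy_score": 15},
]

# Hypothetical clinical-only record that lacks the genomic score.
patient = {"age": 70, "stage": 2}
patient["aneuploidy_score"] = impute_from_nearest(
    patient, donors, features=["age", "stage"], score_key="aneuploidy_score"
)
print(patient)
```

The imputed value is a proxy borrowed from another patient, which is why the abstract describes it as proxy genomic data rather than a measurement.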
6.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1203-1213, 2022.
Article in English | MEDLINE | ID: mdl-33064647

ABSTRACT

Semi-Supervised Learning (SSL) is an approach to machine learning that makes use of unlabeled data for training together with a small amount of labeled data. In the context of molecular biology and pharmacology, one can take advantage of unlabeled data, for instance to identify drugs and targets where only a few genes are known to be associated with a specific drug target and are treated as labeled data. Labeling the genes requires laboratory verification and validation. This process is usually very time consuming and expensive. Thus, it is useful to estimate the functional role of drugs from unlabeled data using computational methods. To develop such a model, we used openly available data resources to create two bipartite graphs: (i) drugs and genes, and (ii) genes and diseases. We constructed the genetic embedding graph from the two bipartite graphs using Tensor Factorization methods. We integrated the genetic embedding graph with the publicly available protein functional association network. Our results show the usefulness of the integration by effectively predicting drug labels.


Subjects
Proteins, Supervised Machine Learning, Proteins/genetics, Proteins/metabolism
7.
Comput Biol Med ; 131: 104249, 2021 04.
Article in English | MEDLINE | ID: mdl-33561673

ABSTRACT

BACKGROUND: The COVID-19 pandemic is a significant public health crisis that is taking a heavy toll on people's health, well-being, and freedom of movement, and is affecting the global economy. Scientists worldwide are competing to develop therapeutics and vaccines; currently, three drugs and two vaccine candidates have been given emergency use authorization. However, there are still questions of efficacy with regard to specific subgroups of patients and the vaccine's scalability to the general public. Under such circumstances, understanding COVID-19 symptoms is vital in initial triage; it is crucial to distinguish the severity of cases for effective management and treatment. This study aimed to discover symptom patterns and overall symptom rules, including rules disaggregated by age, sex, chronic condition, and mortality status, among COVID-19 patients. METHODS: This study was a retrospective analysis of COVID-19 patient data made available online by the Wolfram Data Repository through May 27, 2020. We applied a widely used rule-based machine learning technique called association rule mining to identify frequent symptoms and define patterns in the rules discovered. RESULTS: In total, 1,560 patients with COVID-19 were included in the study, with a median age of 52 years. The most frequently occurring symptom was fever (67%), followed by cough (37%), malaise/body soreness (11%), pneumonia (11%), and sore throat (8%). Myocardial infarction, heart failure, and renal disease were present in less than 1% of patients. The top ten significant symptom rules (out of 71 generated) showed cough, septic shock, and respiratory distress syndrome as frequent consequents. If a patient had a breathing problem and sputum production, there was a higher confidence that the patient had a cough; if cardiac disease, renal disease, or pneumonia was present, there was a higher confidence of septic shock or respiratory distress syndrome. Symptom rules differed between younger and older patients and between male and female patients. Patients who had chronic conditions or died of COVID-19 had more severe symptom rules than patients who did not have chronic conditions or who survived COVID-19. Concerning chronic condition rules among 147 patients, if a patient had diabetes, prerenal azotemia, and coronary bypass surgery, there was a certainty of hypertension. CONCLUSION: The most frequently reported symptoms in patients with COVID-19 were fever, cough, pneumonia, and sore throat, while 1% had severe symptoms, such as septic shock, respiratory distress syndrome, and respiratory failure. Symptom rules differed by age and sex. Patients with chronic disease and patients who died of COVID-19 had more severe symptom rules, specifically cardiovascular-related symptoms accompanied by pneumonia, fever, and cough as consequents.


Subjects
COVID-19, Data Mining, Factual Databases, Computer-Assisted Diagnosis, Pandemics, SARS-CoV-2/metabolism, Biomarkers/metabolism, COVID-19/diagnosis, COVID-19/epidemiology, COVID-19/metabolism, Female, Humans, Male, Middle Aged, Retrospective Studies
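The abstract above describes symptom rules in terms of confidence. In association rule mining, the support of an itemset is the fraction of records containing it, and the confidence of a rule "antecedent implies consequent" is support(antecedent and consequent) divided by support(antecedent). A minimal sketch follows; the five patient records are invented for illustration and are not drawn from the study's data.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the records."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


# Toy symptom sets, one per patient (hypothetical).
patients = [
    {"fever", "cough"},
    {"fever"},
    {"breathing problem", "sputum", "cough"},
    {"breathing problem", "sputum", "cough", "fever"},
    {"breathing problem", "sputum"},
]

rule_confidence = confidence(patients, {"breathing problem", "sputum"}, {"cough"})
print(f"support(fever) = {support(patients, {'fever'}):.2f}")
print(f"confidence(breathing problem, sputum -> cough) = {rule_confidence:.2f}")
```

A rule with confidence 1.0 corresponds to what the abstract calls "a certainty": every record matching the antecedent also contains the consequent.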
8.
AMIA Annu Symp Proc ; 2021: 853-862, 2021.
Article in English | MEDLINE | ID: mdl-35308971

ABSTRACT

Early detection and mitigation of disease recurrence in non-small cell lung cancer (NSCLC) patients is a nontrivial problem that is typically addressed either by rather generic follow-up screening guidelines, self-reporting, simple nomograms, or by models that predict relapse risk in individual patients using statistical analysis of retrospective data. We posit that machine learning models trained on patient data can provide an alternative approach that allows for more efficient development of many complementary models at once, superior accuracy, less dependency on the data collection protocols and increased support for explainability of the predictions. In this preliminary study, we describe an experimental suite of various machine learning models applied on a patient cohort of 2442 early stage NSCLC patients. We discuss the promising results achieved, as well as the lessons we learned while developing this baseline for further, more advanced studies in this area.


Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Non-Small-Cell Lung Carcinoma/diagnosis, Non-Small-Cell Lung Carcinoma/pathology, Humans, Lung Neoplasms/diagnosis, Neoplasm Staging, Nomograms, Prognosis, Retrospective Studies
9.
BMC Bioinformatics ; 20(1): 462, 2019 Sep 09.
Article in English | MEDLINE | ID: mdl-31500564

ABSTRACT

BACKGROUND: Determining the association between a tumor sample and a gene is demanding because conducting genetic experiments is costly. The discovered association between tumor sample and gene further requires clinical verification and validation. This entire process is time-consuming and expensive. For this reason, predicting the association between tumor samples and genes remains a challenge in biomedicine. RESULTS: Here we present a computational model based on a heat diffusion algorithm which can predict the association between tumor samples and genes. We proposed a 2-layered graph. In the first layer, we constructed a graph of tumor samples and genes where these two types of nodes are connected by a "hasGene" relationship. In the second layer, the gene nodes are connected by an "interaction" relationship. We applied the heat diffusion algorithm to nine different variants of genetic interaction networks extracted from the STRING and BioGRID databases. The heat diffusion algorithm predicted the links between tumor samples and genes with a mean AUC-ROC score of 0.84. This score was obtained by using weighted genetic interactions of the fusion or co-occurrence channels from the STRING database. For the unweighted genetic interactions from the BioGRID database, the algorithm predicted the links with an AUC-ROC score of 0.74. CONCLUSIONS: We demonstrate that the gene-gene interaction scores could improve the predictive power of the heat diffusion model to predict the links between tumor samples and genes. We showed the efficient runtime of the heat diffusion algorithm in various genetic interaction networks. We statistically validated the prediction quality of the links between tumor samples and genes.


Subjects
Algorithms, Neoplasm Genes, Neoplasms/genetics, Area Under Curve, DNA Methylation/genetics, Factual Databases, Diffusion, Genetic Epistasis, Gene Regulatory Networks, Humans, ROC Curve, Reproducibility of Results
10.
Sci Rep ; 9(1): 10436, 2019 07 18.
Article in English | MEDLINE | ID: mdl-31320740

ABSTRACT

Identifying the unintended effects of drugs (side effects) is a very important issue in pharmacological studies. The laboratory verification of associations between drugs and side effects requires costly, time-intensive research. Thus, an approach to predicting drug side effects based on known side effects, using a computational model, is highly desirable. To provide such a model, we used openly available data resources to model drugs and side effects as a bipartite graph. The drug-drug network is constructed using the word2vec model where the edges between drugs represent the semantic similarity between them. We integrated the bipartite graph and the semantic similarity graph using a matrix factorization method and a diffusion based model. Our results show the effectiveness of this integration by computing weighted (i.e., ranked) predictions of initially unknown links between side effects and drugs.


Subjects
Drug-Related Side Effects and Adverse Reactions/etiology, Pharmaceutical Preparations/chemistry, Algorithms, Computational Biology/methods, Computer Simulation, Diffusion, Drug Discovery/methods, Humans, Semantics
11.
Int J Med Inform ; 127: 127-133, 2019 07.
Article in English | MEDLINE | ID: mdl-31128824

ABSTRACT

BACKGROUND: In general practice, many infections are treated empirically prior to or without microbiological confirmation. Prediction of antimicrobial susceptibility could optimise prescribing, thus improving patient outcomes. Decision tree models are a novel approach to predicting antimicrobial resistance (AMR) at the time of clinical presentation. This study aims to apply a prediction model using a decision tree approach to predict the AMR of pathogens causing urinary tract infections (UTI) in patients over 65 years based on pre-existing routine laboratory data. METHODS: Data were extracted from the database of the microbiological laboratory of the University Hospitals Galway (UHG). All urine results from patients over 65 years, their microbiological analysis and antimicrobial susceptibility testing (AST) results from January 2011 to December 2015 were included. The primary endpoint was culture result and resistance to antimicrobials (nitrofurantoin, trimethoprim, ciprofloxacin, co-amoxiclav, and amoxicillin) commonly used to treat UTI. A non-parametric regression tree analysis, i.e. a decision tree model, was generated with 75% of the dataset (training set) and validated with the remaining 25% (test set). The model performance was evaluated by measuring the area under the receiver operating characteristic curve (AUC-ROC). RESULTS: A total of 99,101 urine samples from patients over 65 years were submitted for culture over the five years, and 27% had significant bacteriuria (≥10^4 cfu/ml) and AST. The most commonly identified causative organisms were E. coli, Klebsiella spp. and Proteus spp. E. coli was more often resistant to amoxicillin (66%), followed by Proteus spp. (41%). Klebsiella spp. and Proteus spp. were more often resistant to trimethoprim (78% and 54% respectively). E. coli resistance to nitrofurantoin was low (<10%). The decision tree model showed an AUC-ROC score of 0.68 for culture and between 0.60 and 0.97 for antimicrobial resistance of the pathogens, with the inclusion of patient descriptors only. Including the uropathogen in the model did not change model performance. CONCLUSIONS: The decision tree models using patient descriptors available at the time of presentation showed fair to excellent performance in predicting culture and antimicrobial resistance. The presented models provide an alternative approach to decision making on antimicrobial prescribing for UTIs. Adding more predictors to the model could improve its performance. Prospective data collection, validation and feasibility testing of the model, including data from other laboratories, will progress the practical implementation of similar models.


Subjects
Anti-Bacterial Agents/therapeutic use, Bacterial Drug Resistance, Urinary Tract Infections/drug therapy, Aged, Decision Trees, Escherichia coli/drug effects, Escherichia coli Infections/drug therapy, Female, Humans, Male, Retrospective Studies
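The abstract above evaluates its decision tree models by AUC-ROC. One way to read that number: it is the probability that a randomly chosen positive case (for example, a resistant isolate) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are toy values, not data from the study, and the pairwise loop is quadratic, so it suits small examples only.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = resistant, 0 = susceptible; scores are model outputs.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, scores))  # 0.75
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range of 0.60 to 0.97 in context.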