1.
Artif Intell Med ; 143: 102625, 2023 09.
Article in English | MEDLINE | ID: mdl-37673566

ABSTRACT

The wide adoption of electronic health records (EHRs) offers immense potential as a source of support for clinical research. However, previous studies focused on extracting only a limited set of medical concepts to support information extraction in the cancer domain for the Spanish language. Building on the success of deep learning for processing natural language texts, this paper proposes a transformer-based approach to extract named entities from breast cancer clinical notes written in Spanish and compares several language models. To facilitate this approach, a schema for annotating clinical notes with breast cancer concepts is presented, and a corpus for breast cancer is developed. Results indicate that both BERT-based and RoBERTa-based language models demonstrate competitive performance in clinical Named Entity Recognition (NER). Specifically, BETO and multilingual BERT achieve F-scores of 93.71% and 94.63%, respectively. Additionally, RoBERTa Biomedical attains an F-score of 95.01%, while RoBERTa BNE achieves an F-score of 94.54%. The findings suggest that transformers can feasibly extract information in the clinical domain in the Spanish language, with the use of models trained on biomedical texts contributing to enhanced results. The proposed approach takes advantage of transfer learning techniques by fine-tuning language models to automatically represent text features and avoiding the time-consuming feature engineering process.
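The F-scores quoted above are the harmonic mean of precision and recall. The abstract does not say how entities were matched, so the following is only a minimal sketch assuming exact-match, entity-level scoring; the spans and labels (`DIAGNOSIS`, `DATE`, and so on) are hypothetical and are not taken from the paper's annotation schema.

```python
def entity_f_score(gold, predicted):
    """Entity-level precision, recall and F1 over (start, end, label) spans.

    Assumption: an entity counts as correct only if its boundaries and
    label both match a gold entity exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one note: (start_token, end_token, label)
gold = [(0, 2, "DIAGNOSIS"), (5, 6, "DATE"), (9, 11, "TREATMENT"), (14, 15, "STAGE")]
predicted = [(0, 2, "DIAGNOSIS"), (5, 6, "DATE"), (9, 10, "TREATMENT"), (14, 15, "STAGE")]

precision, recall, f1 = entity_f_score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the third predicted span has the wrong end boundary, so it counts as both a false positive and a false negative under exact matching.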


Subjects
Breast Neoplasms; Electronic Health Records; Multilingualism; Information Storage and Retrieval; Deep Learning; Natural Language Processing
2.
JCO Clin Cancer Inform ; 7: e2200062, 2023 07.
Article in English | MEDLINE | ID: mdl-37428988

ABSTRACT

PURPOSE: Stratifying patients with cancer according to risk of relapse can personalize their care. In this work, we provide an answer to the following research question: How to use machine learning to estimate probability of relapse in patients with early-stage non-small-cell lung cancer (NSCLC)? MATERIALS AND METHODS: For predicting relapse in 1,387 patients with early-stage (I-II) NSCLC from the Spanish Lung Cancer Group data (average age 65.7 years, female 24.8%, male 75.2%), we train tabular and graph machine learning models. We generate automatic explanations for the predictions of such models. For models trained on tabular data, we adopt SHapley Additive exPlanations local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients. RESULTS: Machine learning models trained on tabular data exhibit a 76% accuracy for the random forest model at predicting relapse evaluated with a 10-fold cross-validation (the model was trained 10 times with different independent sets of patients in test, train, and validation sets, and the reported metrics are averaged over these 10 test sets). Graph machine learning reaches 68% accuracy over a held-out test set of 200 patients, calibrated on a held-out set of 100 patients. CONCLUSION: Our results show that machine learning models trained on tabular and graph data can enable objective, personalized, and reproducible prediction of relapse and, therefore, disease outcome in patients with early-stage NSCLC. With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding the use of adjuvant treatments in early-stage lung cancer.


Subjects
Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Humans; Male; Female; Aged; Carcinoma, Non-Small-Cell Lung/diagnosis; Carcinoma, Non-Small-Cell Lung/therapy; Lung Neoplasms/diagnosis; Lung Neoplasms/therapy; Neoplasm Recurrence, Local/diagnosis; Machine Learning; Prognosis
3.
Cancers (Basel) ; 14(16)2022 Aug 22.
Article in English | MEDLINE | ID: mdl-36011034

ABSTRACT

BACKGROUND: Artificial intelligence (AI) has contributed substantially in recent years to the resolution of different biomedical problems, including cancer. However, AI tools with significant and widespread impact in oncology remain scarce. The goal of this study is to present an AI-based tool for analyzing cancer patient data that assists clinicians in identifying the clinical factors associated with poor prognosis, relapse and survival, and to develop a prognostic model that stratifies patients by risk. MATERIALS AND METHODS: We used clinical data from 5275 patients diagnosed with non-small cell lung cancer, breast cancer, and non-Hodgkin lymphoma at Hospital Universitario Puerta de Hierro-Majadahonda. Accessible clinical parameters measured with a wearable device and quality-of-life questionnaire data were also collected. RESULTS: Using an AI tool, data from 5275 cancer patients were analyzed, integrating clinical data, questionnaire data, and data collected from wearable devices. Descriptive analyses were performed to explore the patients' characteristics, survival probabilities were calculated, and a prognostic model identified low- and high-risk patient profiles. CONCLUSION: Overall, the reconstruction of the population's risk profile for the cancer-specific predictive model was achieved and proved useful in clinical practice using artificial intelligence. It has potential application in clinical settings to improve risk stratification, early detection, and surveillance management of cancer patients.

4.
PeerJ Comput Sci ; 8: e913, 2022.
Article in English | MEDLINE | ID: mdl-35494817

ABSTRACT

Detecting negation and uncertainty is crucial for medical text mining applications; otherwise, extracted information can be incorrectly identified as real or factual events. Although several approaches have been proposed to detect negation and uncertainty in clinical texts, most efforts have focused on the English language. Most proposals developed for Spanish have focused mainly on negation detection and do not deal with uncertainty. In this paper, we propose a deep learning-based approach for both negation and uncertainty detection in clinical texts written in Spanish. The proposed approach explores two deep learning methods to achieve this goal: (i) Bidirectional Long-Short Term Memory with a Conditional Random Field layer (BiLSTM-CRF) and (ii) Bidirectional Encoder Representation for Transformers (BERT). The approach was evaluated using NUBES and IULA, two public corpora for the Spanish language. The results obtained showed an F-score of 92% and 80% in the scope recognition task for negation and uncertainty, respectively. We also present the results of a validation process conducted using a real-life annotated dataset from clinical notes belonging to cancer patients. The proposed approach shows the feasibility of deep learning-based methods to detect negation and uncertainty in Spanish clinical texts. Experiments also highlighted that this approach improves performance in the scope recognition task compared to other proposals in the biomedical domain.

5.
Ther Adv Musculoskelet Dis ; 13: 1759720X211034063, 2021.
Article in English | MEDLINE | ID: mdl-34367344

ABSTRACT

INTRODUCTION: Rheumatic and musculoskeletal diseases (RMDs) have a significant impact on patients' health-related quality of life (HRQoL), exacerbating disability and reducing independence and work capacity, among other effects. Identifying the predictors that affect HRQoL could help direct efforts to minimize the deleterious impact of these conditions on patients' wellbeing. This study evaluates the influence of demographic and clinical predictors on the HRQoL of a cohort of RMD patients, measured using the Rosser classification index (RCI). METHODS: We included patients attending the Hospital Clínico San Carlos (HCSC) rheumatology outpatient clinic from 1 April 2007 to 30 November 2017. The primary outcome was the HRQoL assessed at each of the patient's visits using the RCI. Demographic and clinical variables extracted from a departmental electronic health record (EHR) were used as predictors: RMD diagnoses, treatments, comorbidities, and averaged HRQoL values from previous periods (for this last variable, values were imputed if no information was available). Association between predictors and HRQoL was analyzed using penalized generalized estimating equations (PGEEs). To account for imputation bias, the PGEE model was repeated excluding averaged HRQoL predictors, and common predictors were considered. DISCUSSION: A total of 18,187 outpatients with 95,960 visits were included. From 410 initial predictors, 19 were independently associated with patients' HRQoL in both PGEE models. Chronic kidney disease (CKD), an episode of prescription of third level analgesics, monoarthritis, and fibromyalgia diagnoses were associated with worse HRQoL. Conversely, the prescription in the previous visit of acid-lowering medication, colchicine, and third level analgesics was associated with better HRQoL. CONCLUSION: We have identified several diagnoses, treatments, and comorbidities independently associated with HRQoL in a cohort of outpatients attending a rheumatology clinic.

6.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32632447

ABSTRACT

Molecular classification of glioblastoma has enabled a deeper understanding of the disease. The four-subtype model (including Proneural, Classical, Mesenchymal and Neural) has been replaced by a model that discards the Neural subtype, found to be associated with samples with a high content of normal tissue. These samples can be misclassified preventing biological and clinical insights into the different tumor subtypes from coming to light. In this work, we present a model that tackles both the molecular classification of samples and discrimination of those with a high content of normal cells. We performed a transcriptomic in silico analysis on glioblastoma (GBM) samples (n = 810) and tested different criteria to optimize the number of genes needed for molecular classification. We used gene expression of normal brain samples (n = 555) to design an additional gene signature to detect samples with a high normal tissue content. Microdissection samples of different structures within GBM (n = 122) have been used to validate the final model. Finally, the model was tested in a cohort of 43 patients and confirmed by histology. Based on the expression of 20 genes, our model is able to discriminate samples with a high content of normal tissue and to classify the remaining ones. We have shown that taking into consideration normal cells can prevent errors in the classification and the subsequent misinterpretation of the results. Moreover, considering only samples with a low content of normal cells, we found an association between the complexity of the samples and survival for the three molecular subtypes.


Subjects
Biomarkers, Tumor; Brain Neoplasms; Brain; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Glioblastoma; Biomarkers, Tumor/biosynthesis; Biomarkers, Tumor/genetics; Brain/metabolism; Brain/pathology; Brain Neoplasms/classification; Brain Neoplasms/genetics; Brain Neoplasms/metabolism; Brain Neoplasms/pathology; Female; Glioblastoma/classification; Glioblastoma/genetics; Glioblastoma/metabolism; Glioblastoma/pathology; Humans; Male; Microdissection
7.
Artif Intell Med ; 105: 101860, 2020 05.
Article in English | MEDLINE | ID: mdl-32505419

ABSTRACT

The automatic extraction of a patient's natural history from Electronic Health Records (EHRs) is a critical step towards building intelligent systems that can reason about clinical variables and support decision making. Although EHRs contain a large amount of valuable information about the patient's medical care, this information can only be fully understood when analyzed in a temporal context. Any intelligent system should then be able to extract medical concepts, date expressions, temporal relations and the temporal ordering of medical events from the free texts of EHRs; yet, this task is hard to tackle, due to the domain specific nature of EHRs, writing quality and lack of structure of these texts, and more generally the presence of redundant information. In this paper, we introduce a new Natural Language Processing (NLP) framework, capable of extracting the aforementioned elements from EHRs written in Spanish using rule-based methods. We focus on building medical timelines, which include disease diagnosis and its progression over time. By using a large dataset of EHRs comprising information about patients suffering from lung cancer, we show that our framework has an adequate level of performance by correctly building the timeline for 843 patients from a pool of 989 patients, achieving a precision of 0.852.
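The reported precision can be checked from the two counts given in the abstract. This short sketch assumes that "precision" here means the share of patients whose timeline was built correctly; the abstract does not define it further.

```python
# Counts taken from the abstract; the interpretation as a simple ratio is an assumption.
correct_timelines = 843   # patients whose timeline was built correctly
total_patients = 989      # pool of patients evaluated

precision = correct_timelines / total_patients
print(f"{precision:.3f}")  # prints 0.852
```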


Subjects
Electronic Health Records; Lung Neoplasms; Humans; Natural Language Processing; Time
8.
Cancer Epidemiol ; 67: 101737, 2020 08.
Article in English | MEDLINE | ID: mdl-32450544

ABSTRACT

BACKGROUND: Biological differences between the sexes have a major impact on disease and treatment outcome. In this paper, we evaluate the prognostic value of sex in stage IV non-small-cell lung cancer (NSCLC) in the context of routine clinical data, and compare this information with other external datasets. METHODS: Clinical data from stage IV NSCLC patients from Hospital Puerta de Hierro (HPH) were retrieved from electronic health records using big data analytics (N = 397). In addition, data from the Spanish Lung Cancer Group (GECP) Tumor Registry (N = 1382) and from a published study available from the cBioPortal (MSK) (N = 601) were analyzed. Survival curves were estimated using the Kaplan-Meier method. A Cox proportional hazards regression model was used to assess the prognostic value of sex. A meta-analysis to compare the outcome for males and females in terms of overall survival (OS) and progression free survival (PFS) was performed. RESULTS: The median OS time was 12 months for males and 19 months for females (overall HR = 0.77; 95% CI: 0.68-0.87; P < 0.001). Similarly, females with stage IV NSCLC harboring an EGFR-sensitizing mutation lived significantly longer than males (median OS: males, 19 months; females, 32 months) with a lower risk of death compared with males (overall HR = 0.75; 95% CI: 0.67-0.84). In addition, female patients benefited more from EGFR inhibitors in terms of PFS and OS (overall HR = 0.45; 95% CI: 0.32-0.64, and HR = 0.62; 95% CI: 0.48-0.80, respectively). Median PFS was 21 months in females and 12 months in males (P < 0.001). CONCLUSIONS: Using routine clinical data we confirmed the previous finding that among stage IV NSCLC patients, females had a significantly better prognosis than males. The effect size of the sex was notable, highlighting the fact that survival rates are usually estimated and patients are generally managed without considering the sexes separately, which may lead to suboptimal results.
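The Kaplan-Meier method named in the abstract estimates survival as a running product over observed event times. The following is a minimal pure-Python sketch of that estimator and of how a median survival time is read from it. The follow-up data are invented for illustration and are not from the study; the helper names `kaplan_meier` and `median_survival` are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy follow-up data in months (invented, not from the study)
durations = [3, 5, 5, 8, 12, 12, 15, 20]
events = [1, 1, 0, 1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
print(median_survival(curve))  # prints 12
```

Censored patients contribute to the at-risk count until they leave observation, which is why the estimate differs from a simple fraction of patients alive. In a real analysis, separate curves would be estimated per group (for example, by sex) and compared.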


Subjects
Carcinoma, Non-Small-Cell Lung/mortality; Lung Neoplasms/mortality; Mutation; Adult; Aged; Carcinoma, Non-Small-Cell Lung/drug therapy; Carcinoma, Non-Small-Cell Lung/genetics; Carcinoma, Non-Small-Cell Lung/pathology; ErbB Receptors/antagonists & inhibitors; ErbB Receptors/genetics; Female; Humans; Lung Neoplasms/drug therapy; Lung Neoplasms/genetics; Lung Neoplasms/pathology; Male; Meta-Analysis as Topic; Middle Aged; Neoplasm Staging; Prognosis; Protein Kinase Inhibitors/therapeutic use; Sex Factors; Survival Rate
9.
PLoS One ; 8(8): e72045, 2013.
Article in English | MEDLINE | ID: mdl-23991036

ABSTRACT

Complex networks have been extensively used in the last decade to characterize and analyze complex systems, and they have been recently proposed as a novel instrument for the analysis of spectra extracted from biological samples. Yet, the high number of measurements composing spectra, and the consequent high computational cost, make a direct network analysis unfeasible. We here present a comparative analysis of three customary feature selection algorithms, including the binning of spectral data and the use of information theory metrics. Such algorithms are compared by assessing the score obtained in a classification task, where healthy subjects and people suffering from different types of cancers should be discriminated. Results indicate that a feature selection strategy based on Mutual Information outperforms the more classical data binning, while allowing a reduction of the dimensionality of the data set in two orders of magnitude.
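To illustrate feature selection based on Mutual Information, the sketch below scores discrete features by how much information they carry about the class label and ranks them. It assumes the spectral intensities have already been discretized into levels; the abstract does not describe the discretization or the estimator actually used, and the feature names `peak_a` and `peak_b` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information (in bits) between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = feature_counts[x] / n
        p_y = label_counts[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Toy data (invented): class labels and two discretized spectral features
labels = ["healthy", "healthy", "cancer", "cancer"]
features = {
    "peak_a": ["low", "low", "high", "high"],   # tracks the label exactly
    "peak_b": ["low", "high", "low", "high"],   # independent of the label
}

scores = {name: mutual_information(values, labels) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # prints ['peak_a', 'peak_b']
```

Keeping only the top-ranked features is what reduces the dimensionality before any network analysis or classification step.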


Subjects
Mass Spectrometry/methods; Neoplasms/metabolism; Neural Networks, Computer; Proteome/analysis; Algorithms; Humans; Neoplasms/diagnosis; Pattern Recognition, Automated/methods; Proteome/classification; ROC Curve; Reproducibility of Results