Results 1 - 9 of 9
1.
J Biomed Inform ; 95: 103219, 2019 07.
Article in English | MEDLINE | ID: mdl-31150777

ABSTRACT

Clinical narratives are a valuable source of information for both patient care and biomedical research. Given the unstructured nature of medical reports, specific automatic techniques are required to extract relevant entities from such texts. In the natural language processing (NLP) community, this task is often addressed by using supervised methods. To develop such methods, both reliably-annotated corpora and elaborately designed features are needed. Despite the recent advances on corpora collection and annotation, research on multiple domains and languages is still limited. In addition, to compute the features required for supervised classification, suitable language- and domain-specific tools are needed. In this work, we propose a novel application of recurrent neural networks (RNNs) for event extraction from medical reports written in Italian. To train and evaluate the proposed approach, we annotated a corpus of 75 cardiology reports for a total of 4365 mentions of relevant events and their attributes (e.g., the polarity). For the annotation task, we developed specific annotation guidelines, which are provided together with this paper. The RNN-based classifier was trained on a training set including 3335 events (60 documents). The resulting model was integrated into an NLP pipeline that uses a dictionary lookup approach to search for relevant concepts inside the text. A test set of 1030 events (15 documents) was used to evaluate and compare different pipeline configurations. As a main result, using the RNN-based classifier instead of the dictionary lookup approach allowed increasing recall from 52.4% to 88.9%, and precision from 81.1% to 88.2%. Further, using the two methods in combination, we obtained final recall, precision, and F1 score of 91.7%, 88.6%, and 90.1%, respectively. 
These experiments indicate that integrating a well-performing RNN-based classifier with a standard knowledge-based approach can be a good strategy to extract information from clinical text in non-English languages.
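The F1 score reported in this abstract is the harmonic mean of precision and recall, so it can be recomputed from the two reported figures. The sketch below does that for the three pipeline configurations. The function name `f1_score` and the configuration labels are illustrative choices, and the F1 values for the dictionary lookup and for the RNN classifier alone are derived here rather than reported in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported in the abstract for the 1030-event test set.
# The labels are illustrative names, not taken from the paper.
configurations = {
    "dictionary lookup": (0.811, 0.524),
    "RNN classifier": (0.882, 0.889),
    "combined": (0.886, 0.917),
}

for name, (precision, recall) in configurations.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.3f}")
```

For the combined configuration this gives about 0.901, which agrees with the 90.1% stated in the abstract.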


Subject(s)
Data Mining/methods , Electronic Health Records , Natural Language Processing , Heart Diseases , Humans , Italy , Neural Networks, Computer , Semantics
2.
BMJ Open ; 12(12): e058058, 2022 12 05.
Article in English | MEDLINE | ID: mdl-36576182

ABSTRACT

OBJECTIVES: Attention deficit hyperactivity disorder (ADHD) is a prevalent childhood disorder, but often goes unrecognised and untreated. To improve access to services, accurate predictions of populations at high risk of ADHD are needed for effective resource allocation. Using a unique linked health and education data resource, we examined how machine learning (ML) approaches can predict risk of ADHD. DESIGN: Retrospective population cohort study. SETTING: South London (2007-2013). PARTICIPANTS: n=56 258 pupils with linked education and health data. PRIMARY OUTCOME MEASURES: Using area under the curve (AUC), we compared the predictive accuracy of four ML models and one neural network for ADHD diagnosis. Ethnic group and language biases were weighted using a fair pre-processing algorithm. RESULTS: Random forest and logistic regression prediction models provided the highest predictive accuracy for ADHD in population samples (AUC 0.86 and 0.86, respectively) and clinical samples (AUC 0.72 and 0.70). Precision-recall curve analyses were less favourable. Sociodemographic biases were effectively reduced by a fair pre-processing algorithm without loss of accuracy. CONCLUSIONS: ML approaches using linked routinely collected education and health data offer accurate, low-cost and scalable prediction models of ADHD. These approaches could help identify areas of need and inform resource allocation. Introducing 'fairness weighting' attenuates some sociodemographic biases which would otherwise underestimate ADHD risk within minority groups.
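The models in this abstract are compared by area under the ROC curve (AUC). As a reminder of what that number means, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auc` and the toy labels and scores are invented for illustration and have no connection to the study data.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
print(auc(labels, scores))
```

The pairwise loop is quadratic in the number of cases, which is fine for a demonstration; for a cohort of tens of thousands of pupils a rank-based implementation would be the practical choice.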


Subject(s)
Attention Deficit Disorder with Hyperactivity , Humans , Child , Attention Deficit Disorder with Hyperactivity/diagnosis , Attention Deficit Disorder with Hyperactivity/epidemiology , Retrospective Studies , Cohort Studies , Schools , Delivery of Health Care , Machine Learning
3.
Sci Rep ; 11(1): 757, 2021 01 12.
Article in English | MEDLINE | ID: mdl-33436814

ABSTRACT

Receiving timely and appropriate treatment is crucial for better health outcomes, and research on the contribution of specific variables is essential. In the mental health domain, an important research variable is the date of psychosis symptom onset, as longer delays in treatment are associated with worse intervention outcomes. The growing adoption of electronic health records (EHRs) within mental health services provides an invaluable opportunity to study this problem at scale retrospectively. However, disease onset information is often only available in open text fields, requiring natural language processing (NLP) techniques for automated analyses. Since this variable can be documented at different points during a patient's care, NLP methods that model clinical and temporal associations are needed. We address the identification of psychosis onset by: 1) manually annotating a corpus of mental health EHRs with disease onset mentions, 2) modelling the underlying NLP problem as a paragraph classification approach, and 3) combining multiple onset paragraphs at the patient level to generate a ranked list of likely disease onset dates. For 22/31 test patients (71%) the correct onset date was found among the top-3 NLP predictions. The proposed approach was also applied at scale, allowing an onset date to be estimated for 2483 patients.
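Step 3 of this abstract, combining paragraph-level predictions into a ranked list of onset dates per patient, and the top-3 evaluation can be sketched as follows. The abstract does not say how paragraphs are combined, so summing classifier scores per candidate date is an assumption made here. The function names, patient identifiers, dates, and scores are all hypothetical.

```python
from collections import defaultdict


def rank_onset_dates(paragraph_predictions, k=3):
    """Sum scores per candidate date (an assumed aggregation) and return the k best dates."""
    totals = defaultdict(float)
    for date, score in paragraph_predictions:
        totals[date] += score
    # Highest total first; ties broken by date string for a deterministic order.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [date for date, _ in ranked[:k]]


def top_k_accuracy(predictions_by_patient, gold_by_patient, k=3):
    """Fraction of patients whose gold onset date appears among the top-k ranked dates."""
    hits = sum(
        gold_by_patient[patient] in rank_onset_dates(predictions, k)
        for patient, predictions in predictions_by_patient.items()
    )
    return hits / len(predictions_by_patient)


# Hypothetical paragraph-level output: (candidate onset date, classifier score).
predictions = {
    "patient_a": [("2010-03", 0.9), ("2011-01", 0.4), ("2010-03", 0.7), ("2009-11", 0.2)],
    "patient_b": [("2012-06", 0.8), ("2013-02", 0.6), ("2012-09", 0.5), ("2011-12", 0.3)],
}
gold = {"patient_a": "2010-03", "patient_b": "2011-12"}

print(rank_onset_dates(predictions["patient_a"]))
print(top_k_accuracy(predictions, gold))
```

With the same measure, the 22 of 31 test patients reported in the abstract correspond to a top-3 accuracy of about 0.71.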


Subject(s)
Electronic Health Records/statistics & numerical data , Mental Health Services/statistics & numerical data , Natural Language Processing , Psychotic Disorders/diagnosis , Symptom Assessment/methods , Humans , Mental Health , Retrospective Studies
4.
J Biomed Semantics ; 11(1): 2, 2020 03 10.
Article in English | MEDLINE | ID: mdl-32156302

ABSTRACT

BACKGROUND: Duration of untreated psychosis (DUP) is an important clinical construct in the field of mental health, as longer DUP can be associated with worse intervention outcomes. DUP estimation requires knowledge about when psychosis symptoms first started (symptom onset), and when psychosis treatment was initiated. Electronic health records (EHRs) represent a useful resource for retrospective clinical studies on DUP, but the core information underlying this construct is most likely to lie in free text, meaning it is not readily available for clinical research. Natural Language Processing (NLP) is a means to addressing this problem by automatically extracting relevant information in a structured form. As a first step, it is important to identify appropriate documents, i.e., those that are likely to include the information of interest. Next, temporal information extraction methods are needed to identify time references for early psychosis symptoms. This NLP challenge requires solving three different tasks: time expression extraction, symptom extraction, and temporal "linking". In this study, we focus on the first step, using two relevant EHR datasets. RESULTS: We applied a rule-based NLP system for time expression extraction that we had previously adapted to a corpus of mental health EHRs from patients with a diagnosis of schizophrenia (first referrals). We extended this work by applying this NLP system to a larger set of documents and patients, to identify additional texts that would be relevant for our long-term goal, and developed a new corpus from a subset of these new texts (early intervention services). Furthermore, we added normalized value annotations ("2011-05") to the annotated time expressions ("May 2011") in both corpora. The finalized corpora were used for further NLP development and evaluation, with promising results (normalization accuracy 71-86%). 
To highlight the specificities of our annotation task, we also applied the final adapted NLP system to a different temporally annotated clinical corpus. CONCLUSIONS: Developing domain-specific methods is crucial to address complex NLP tasks such as symptom onset extraction and retrospective calculation of duration of a preclinical syndrome. To the best of our knowledge, this is the first clinical text resource annotated for temporal entities in the mental health domain.
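The normalization step mentioned in this abstract maps a surface time expression such as "May 2011" to a normalized value such as "2011-05". The sketch below shows the idea for the single pattern "Month YYYY". It is not the rule-based system used in the study; the names `MONTHS`, `MONTH_YEAR`, and `normalize_month_year` and the example sentence are hypothetical.

```python
import re

# English month names mapped to month numbers.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

# Matches a month name followed by a four-digit year, e.g. "May 2011".
MONTH_YEAR = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def normalize_month_year(text):
    """Return (surface form, normalized YYYY-MM value) pairs for 'Month YYYY' expressions."""
    results = []
    for match in MONTH_YEAR.finditer(text):
        month = MONTHS[match.group(1).lower()]
        results.append((match.group(0), f"{match.group(2)}-{month:02d}"))
    return results


# Invented example sentence.
print(normalize_month_year("Symptoms first noted in May 2011, referred in January 2012."))
```

A real clinical system would also need to handle relative expressions ("two years ago"), partial dates, and ages, which is part of why the abstract reports normalization accuracy as a separate measure.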


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Mental Health , Psychotic Disorders , Humans , Natural Language Processing , Psychotic Disorders/diagnosis , Psychotic Disorders/therapy , Time Factors
5.
Stud Health Technol Inform ; 270: 98-102, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570354

ABSTRACT

Chronic fatigue syndrome (CFS) is a long-term illness with a wide range of symptoms and condition trajectories. To improve the understanding of these, automated analysis of large amounts of patient data holds promise. Routinely documented assessments are useful for large-scale analysis, however relevant information is mainly in free text. As a first step to extract symptom and condition trajectories, natural language processing (NLP) methods are useful to identify important textual content and relevant information. In this paper, we propose an agnostic NLP method of extracting segments of patients' clinical histories in CFS assessments. Moreover, we present initial results on the advantage of using these segments to quantify and analyse the presence of certain clinically relevant concepts.


Subject(s)
Electronic Health Records , Fatigue Syndrome, Chronic , Natural Language Processing , Data Collection , Humans
6.
NPJ Digit Med ; 3: 69, 2020.
Article in English | MEDLINE | ID: mdl-32435697

ABSTRACT

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
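The abstract does not name the measures used to check how much training data the model memorises. One simple proxy, shown here purely as an assumption, is the fraction of word n-grams in a generated document that also occur verbatim in the training documents. The function names and the two example sentences are invented.

```python
def ngrams(tokens, n):
    """Set of all contiguous n-token tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copied_ngram_fraction(generated, training_texts, n=4):
    """Fraction of distinct n-grams in `generated` that also occur in any training text."""
    generated_ngrams = ngrams(generated.lower().split(), n)
    if not generated_ngrams:
        return 0.0
    training_ngrams = set()
    for text in training_texts:
        training_ngrams |= ngrams(text.lower().split(), n)
    return len(generated_ngrams & training_ngrams) / len(generated_ngrams)


# Invented sentences, not real clinical text.
training = ["the patient was admitted with low mood and poor sleep"]
generated = "the patient was admitted with anxiety and poor appetite"
print(copied_ngram_fraction(generated, training, n=3))
```

A high fraction for long n-grams would suggest verbatim copying, while short n-grams overlap naturally in any domain, so the choice of `n` matters when interpreting the number.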

7.
Stud Health Technol Inform ; 264: 418-422, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437957

ABSTRACT

For patients with a diagnosis of schizophrenia, determining symptom onset is crucial for timely and successful intervention. In mental health records, information about early symptoms is often documented only in free text, and thus needs to be extracted to support clinical research. To achieve this, natural language processing (NLP) methods can be used. Development and evaluation of NLP systems requires manually annotated corpora. We present a corpus of mental health records annotated with temporal relations for psychosis symptoms. We propose a methodology for document selection and manual annotation to detect symptom onset information, and develop an annotated corpus. To assess the utility of the created corpus, we propose a pilot NLP system. To the best of our knowledge, this is the first temporally-annotated corpus tailored to a specific clinical use-case.


Subject(s)
Natural Language Processing , Psychotic Disorders , Electronic Health Records , Humans , Records
8.
Int J Med Inform ; 111: 140-148, 2018 03.
Article in English | MEDLINE | ID: mdl-29425625

ABSTRACT

OBJECTIVE: In this work, we propose an ontology-driven approach to identify events and their attributes from episodes of care included in medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily accessible. MATERIALS AND METHODS: The corpus considered in this work includes 5432 non-annotated medical reports belonging to patients with rare arrhythmias. To guide the information extraction process, we built a domain-specific ontology that includes the events and the attributes to be extracted, with related regular expressions. The ontology and the annotation system were constructed on a development set, while the performance was evaluated on an independent test set. As a gold standard, we considered a manually curated hospital database named TRIAD, which stores most of the information written in reports. RESULTS: The proposed approach performs well on the considered Italian medical corpus, with a percentage of correct annotations above 90% for most considered clinical events. We also assessed the possibility to adapt the system to the analysis of another language (i.e., English), with promising results. DISCUSSION AND CONCLUSION: Our annotation system relies on a domain ontology to extract and link information in clinical text. We developed an ontology that can be easily enriched and translated, and the system performs well on the considered task. In the future, it could be successfully used to automatically populate the TRIAD database.
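The core idea described in this abstract, an ontology whose events carry regular expressions used to find mentions in report text, can be sketched in a few lines. This is a minimal illustration and not the authors' system: the two event names, their Italian patterns, and the example report are hypothetical, and a real ontology would also model attributes and links between events.

```python
import re

# Hypothetical mini-ontology: event name -> regular expression for its mentions.
ONTOLOGY = {
    "atrial_fibrillation": re.compile(r"\bfibrillazione\s+atriale\b", re.IGNORECASE),
    "pacemaker_implant": re.compile(r"\bimpianto\s+di\s+pacemaker\b", re.IGNORECASE),
}


def extract_events(report):
    """Return (event, matched text, start offset) tuples sorted by position in the report."""
    found = []
    for event, pattern in ONTOLOGY.items():
        for match in pattern.finditer(report):
            found.append((event, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])


# Invented example report.
report = "Paziente con fibrillazione atriale. Eseguito impianto di pacemaker nel 2015."
for event in extract_events(report):
    print(event)
```

Keeping the patterns in a dictionary separate from the matching code mirrors the point made in the abstract that the ontology can be enriched or translated without changing the annotation system itself.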


Subject(s)
Documentation/methods , Information Storage and Retrieval , Medical Record Linkage/methods , Medical Records , Natural Language Processing , Databases, Factual , Humans , Italy
9.
Stud Health Technol Inform ; 247: 715-719, 2018.
Article in English | MEDLINE | ID: mdl-29678054

ABSTRACT

Medical reports often contain a lot of relevant information in the form of free text. To reuse these unstructured texts for biomedical research, it is important to extract structured data from them. In this work, we adapted a previously developed information extraction system to the oncology domain, to process a set of anatomic pathology reports in the Italian language. The information extraction system relies on a domain ontology, which was adapted and refined in an iterative way. The final output was evaluated by a domain expert, with promising results.


Subject(s)
Information Storage and Retrieval , Language , Natural Language Processing , Biomedical Research , Data Mining , Humans , Italy