Results 1 - 20 of 21
1.
J Biomed Inform ; 75S: S71-S84, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28576748

ABSTRACT

This paper presents a novel method for automatically recognizing symptom severity: natural language processing extracts features from psychiatric evaluation records, and machine learning techniques use these features to assign a severity score to each record evaluated in the 2016 RDoC for Psychiatry Challenge from CEGS/N-GRID. The natural language processing techniques focused on (a) discerning the discourse information expressed in questions and answers; (b) identifying medical concepts that relate to mental disorders; and (c) accounting for the role of negation. The machine learning techniques rely on the assumptions that (1) the severity of a patient's positive valence symptoms exists on a latent continuous spectrum and (2) all the patient's answers and narratives documented in the psychological evaluation records are informed by the patient's latent severity score along this spectrum. These assumptions motivated our two-step machine learning framework for automatically recognizing psychological symptom severity. In the first step, the latent continuous severity score is inferred from each record; in the second step, the severity score is mapped to one of the four discrete severity levels used in the CEGS/N-GRID challenge. We evaluated three methods for inferring the latent severity score associated with each record: (i) pointwise ridge regression; (ii) pairwise comparison-based classification; and (iii) a hybrid approach combining pointwise regression and the pairwise classifier. The second step was implemented using a tree of cascading support vector machine (SVM) classifiers. While the official evaluation results indicate that all three methods are promising, the hybrid approach not only outperformed the pairwise and pointwise methods, but also produced the second highest performance of all submissions to the CEGS/N-GRID challenge with a normalized MAE score of 84.093% (where higher numbers indicate better performance).
These evaluation results enabled us to observe that, for this task, considering pairwise information can produce more accurate severity scores than pointwise regression - an approach widely used in other systems for assigning severity scores. Moreover, our analysis indicates that using a cascading SVM tree outperforms traditional SVM classification methods for the purpose of determining discrete severity levels.
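The two-step framework described above lends itself to a miniature sketch. This is an illustrative stand-in, not the paper's implementation: the ridge/pairwise models and the SVM cascade are replaced by a toy win-count score and fixed thresholds, and all names and numbers are invented.

```python
# Step 1: infer a latent continuous severity score per record; a toy pairwise
# comparison (fraction of records judged less severe) stands in for the
# paper's ridge regression / pairwise classifier hybrid.
def pairwise_to_latent(record, cohort):
    wins = sum(1 for other in cohort
               if record["severity_signal"] > other["severity_signal"])
    return wins / max(len(cohort), 1)

# Step 2: map the latent score to one of four discrete levels; fixed
# thresholds stand in for the tree of cascading SVM classifiers.
def cascade_level(score):
    if score < 0.5:                                   # root split: low vs. high
        return "ABSENT" if score < 0.25 else "MILD"
    return "MODERATE" if score < 0.75 else "SEVERE"

records = [{"severity_signal": s} for s in (0.1, 0.4, 0.6, 0.9)]
levels = [cascade_level(pairwise_to_latent(r, records)) for r in records]
```

The cascade mirrors the intuition that coarse splits (low vs. high severity) are easier to decide than fine-grained ones.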


Subjects
Pattern Recognition, Automated; Psychological Tests; Humans; Machine Learning; Natural Language Processing; Severity of Illness Index
2.
Sci Data ; 9(1): 432, 2022 07 21.
Article in English | MEDLINE | ID: mdl-35864125

ABSTRACT

One of the effects of the COVID-19 pandemic is a rapidly growing and changing stream of publications to inform clinicians, researchers, policy makers, and patients about the health, socio-economic, and cultural consequences of the pandemic. Managing this information stream manually is not feasible. Automatic Question Answering can quickly bring the most salient points to the user's attention. Leveraging a collection of scientific articles, government websites, relevant news articles, curated social media posts, and questions asked by researchers, clinicians, and the general public, we developed a dataset to explore automatic Question Answering for multiple stakeholders. Analysis of questions asked by various stakeholders shows that while information needs of experts and the public may overlap, satisfactory answers to these questions often originate from different information sources or benefit from different approaches to answer generation. We believe that this dataset has the potential to support the development of question answering systems not only for epidemic questions, but for other domains with varying expertise such as legal or finance.


Subjects
COVID-19; Pandemics; Humans
3.
AMIA Jt Summits Transl Sci Proc ; 2021: 445-454, 2021.
Article in English | MEDLINE | ID: mdl-34457160

ABSTRACT

The objective of this study is to explore the role of structured and unstructured data for clinical phenotyping by determining which types of clinical phenotypes are best identified using unstructured data (e.g., clinical notes), structured data (e.g., laboratory values, vital signs), or their combination across 172 clinical phenotypes. Specifically, we used laboratory and chart measurements as well as clinical notes from the MIMIC-III critical care database and trained an LSTM using features extracted from each type of data to determine which categories of phenotypes were best identified by structured data, unstructured data, or both. We observed that textual features on their own outperformed structured features for 145 (84%) of phenotypes, and that Doc2Vec was the most effective representation of unstructured data for all phenotypes. When evaluating the impact of adding textual features to systems previously relying only on structured features, we found a statistically significant (p < 0.05) increase in phenotyping performance for 51 phenotypes (primarily involving the circulatory system, injury, and poisoning), one phenotype for which textual features degraded performance (diabetes without complications), and no statistically significant change in performance with the remaining 120 phenotypes. We provide analysis on which phenotypes are best identified by each type of data and guidance on which data sources to consider for future research on phenotype identification.
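The structured/unstructured comparison above can be caricatured in a few lines. This is an invented stand-in: the Doc2Vec representation becomes term counts, the LSTM becomes a linear scorer, and the vocabulary, vitals, and weights are all illustrative.

```python
# Stand-in for Doc2Vec: represent a clinical note as term counts over a small
# fixed vocabulary (the real system learns dense document embeddings).
def note_vector(note, vocab):
    words = note.lower().split()
    return [words.count(term) for term in vocab]

# Stand-in for the LSTM: a linear scorer over concatenated structured
# (labs/vitals) and textual features.
def phenotype_score(structured, textual, w_struct, w_text):
    return (sum(x * w for x, w in zip(structured, w_struct))
            + sum(x * w for x, w in zip(textual, w_text)))

vocab = ["tachycardia", "hypotension", "sepsis"]
note = "patient with sepsis and hypotension overnight"
vitals = [0.8, 0.3]   # e.g., scaled heart rate and mean arterial pressure
score = phenotype_score(vitals, note_vector(note, vocab),
                        w_struct=[1.0, -0.5], w_text=[0.2, 0.7, 1.1])
```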


Subjects
Critical Care; Electronic Health Records; Databases, Factual; Humans; Phenotype
4.
J Am Med Inform Assoc ; 27(4): 567-576, 2020 04 01.
Article in English | MEDLINE | ID: mdl-32065628

ABSTRACT

OBJECTIVE: Reliable longitudinal risk prediction for hospitalized patients is needed to provide quality care. Our goal is to develop a generalizable model capable of leveraging clinical notes to predict healthcare-associated diseases 24-96 hours in advance. METHODS: We developed a reCurrent Additive Network for Temporal RIsk Prediction (CANTRIP) to predict the risk of hospital acquired (occurring ≥ 48 hours after admission) acute kidney injury, pressure injury, or anemia ≥ 24 hours before it is implicated by the patient's chart, labs, or notes. We rely on the MIMIC III critical care database and extract distinct positive and negative cohorts for each disease. We retrospectively determine the date-of-event using structured and unstructured criteria and use it as a form of indirect supervision to train and evaluate CANTRIP to predict disease risk using clinical notes. RESULTS: Our experiments indicate that CANTRIP, operating on text alone, obtains 74%-87% area under the curve and 77%-85% Specificity. Baseline shallow models showed lower performance on all metrics, while bidirectional long short-term memory obtained the highest Sensitivity at the cost of significantly lower Specificity and Precision. DISCUSSION: Proper model architecture allows clinical text to be successfully harnessed to predict nosocomial disease, outperforming shallow models and obtaining similar performance to disease-specific models reported in the literature. CONCLUSION: Clinical text on its own can provide a competitive alternative to traditional structured features (eg, lab values, vital signs). CANTRIP is able to generalize across nosocomial diseases without disease-specific feature extraction and is available at https://github.com/h4ste/cantrip.
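The risk-prediction setup can be pictured as a recurrent accumulation of daily, note-derived signals. This sketch is an assumption-laden stand-in: CANTRIP's actual architecture operates on learned clinical-note representations, while here the encoder is skipped and the signals, decay factor, and sigmoid readout are invented.

```python
import math

# Toy stand-in for a recurrent additive risk model: each day's note-derived
# signal is added to a decayed running state, then squashed to a probability.
def accumulate_risk(daily_signals, decay=0.8):
    state = 0.0
    for signal in daily_signals:
        state = decay * state + signal        # recurrent additive update
    return 1.0 / (1.0 + math.exp(-state))     # sigmoid readout

# Three days of increasing signal (e.g., kidney-injury-related mentions).
risk = accumulate_risk([0.2, 0.5, 1.0])
```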


Subjects
Acute Kidney Injury; Anemia; Deep Learning; Iatrogenic Disease; Pressure Ulcer; Risk Assessment/methods; Area Under Curve; Critical Care; Electronic Health Records; Humans; Natural Language Processing; ROC Curve; Retrospective Studies; Support Vector Machine
5.
Article in English | MEDLINE | ID: mdl-33364628

ABSTRACT

Deep neural networks have demonstrated high performance on many natural language processing (NLP) tasks that can be answered directly from text, but have struggled to solve NLP tasks requiring external (e.g., world) knowledge. In this paper, we present OSCR (Ontology-based Semantic Composition Regularization), a method for injecting task-agnostic knowledge from an ontology or knowledge graph into a neural network during pre-training. We evaluated BERT pre-trained on Wikipedia with and without OSCR by fine-tuning on two question answering tasks involving world knowledge and causal reasoning and one requiring domain (healthcare) knowledge, and obtained 33.3%, 18.6%, and 4% improved accuracy compared to pre-training BERT without OSCR.
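The regularization idea can be illustrated with a toy loss term. The abstract does not give OSCR's actual objective, so the penalty here (pulling a concept's embedding toward its ontology parent) and the weight `lam` are purely illustrative assumptions about the general shape of such a regularizer.

```python
# Toy ontology-based regularizer: add a penalty tying a concept embedding to
# its parent concept in the ontology, scaled by lam. Illustrative only.
def regularized_loss(task_loss, concept_emb, parent_emb, lam=0.1):
    penalty = sum((c - p) ** 2 for c, p in zip(concept_emb, parent_emb))
    return task_loss + lam * penalty

# An invented concept drifts one unit from its invented ontology parent,
# adding a small penalty to the task loss.
loss = regularized_loss(1.0, concept_emb=[1.0, 0.0], parent_emb=[0.0, 0.0])
```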

6.
Proc Conf Empir Methods Nat Lang Process ; 2020: 3215-3226, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33364629

ABSTRACT

Automatic summarization research has traditionally focused on providing high quality general-purpose summaries of documents. However, there are many applications that require more specific summaries, such as supporting question answering or topic-based literature discovery. In this paper, we study the problem of conditional summarization in which content selection and surface realization are explicitly conditioned on an ad-hoc natural language question or topic description. Because of the difficulty in obtaining sufficient reference summaries to support arbitrary conditional summarization, we explore the use of multi-task fine-tuning (MTFT) on twenty-one natural language tasks to enable zero-shot conditional summarization on five tasks. We present four new summarization datasets, two novel "online" or adaptive task-mixing strategies, and report zero-shot performance using T5 and BART, demonstrating that MTFT can improve zero-shot summarization quality.
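One common way to implement task mixing for multi-task fine-tuning is temperature-scaled proportional sampling. The paper's two adaptive "online" strategies are not specified in the abstract, so the sketch below shows only this standard static baseline, with invented task names and sizes.

```python
import random

# Temperature-scaled proportional sampling over tasks: alpha=1 samples in
# proportion to task size, alpha=0 samples uniformly.
def mix_schedule(task_sizes, n_batches, alpha=0.5, seed=0):
    tasks = list(task_sizes)
    weights = [task_sizes[t] ** alpha for t in tasks]
    rng = random.Random(seed)
    return [rng.choices(tasks, weights=weights)[0] for _ in range(n_batches)]

schedule = mix_schedule({"summarize": 10000, "qa": 1000, "nli": 100},
                        n_batches=8)
```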

7.
Proc Int Conf Comput Ling ; 2020: 5640-5646, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33293900

ABSTRACT

Recent work has shown that pre-trained Transformers obtain remarkable performance on many natural language processing tasks, including automatic summarization. However, most work has focused on (relatively) data-rich single-document summarization settings. In this paper, we explore highly-abstractive multi-document summarization, where the summary is explicitly conditioned on a user-given topic statement or question. We compare the summarization quality produced by three state-of-the-art transformer-based models: BART, T5, and PEGASUS. We report the performance on four challenging summarization datasets: three from the general domain and one from consumer health in both zero-shot and few-shot learning settings. While prior work has shown significant differences in performance for these models on standard summarization tasks, our results indicate that with as few as 10 labeled examples, there is no statistically significant difference in summary quality, suggesting the need for more abstractive benchmark collections when determining state-of-the-art.

8.
AMIA Annu Symp Proc ; 2019: 467-476, 2019.
Article in English | MEDLINE | ID: mdl-32308840

ABSTRACT

Hospital acquired pneumonia (HAP) is the second most common nosocomial infection in the ICU and costs an estimated $3.1 billion annually. The ability to predict HAP could improve patient outcomes and reduce costs. Traditional pneumonia risk prediction models rely on a small number of hand-chosen signs and symptoms and have been shown to poorly discriminate between low and high risk individuals. Consequently, we wanted to investigate whether modern data-driven techniques applied to respective pneumonia cohorts could provide more robust and discriminative prognostication of pneumonia risk. In this paper we present a deep learning system for predicting imminent pneumonia risk one or more days into the future using clinical observations documented in ICU notes for an at-risk population (n = 1,467). We show how the system can be trained without direct supervision or feature engineering from sparse, noisy, and limited data to predict future pneumonia risk with 96% Sensitivity, 72% AUC, and 80% F1-measure, outperforming SVM approaches using the same features by 20% Accuracy (relative; 12% absolute).


Subjects
Deep Learning; Healthcare-Associated Pneumonia; Intensive Care Units; Risk Assessment/methods; Humans; Sensitivity and Specificity
9.
Stud Health Technol Inform ; 264: 25-29, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437878

ABSTRACT

This paper addresses the task of answering consumer health questions about medications. To better understand the challenge and needs in terms of methods and resources, we first introduce a gold standard corpus for Medication Question Answering created using real consumer questions. The gold standard (https://github.com/abachaa/Medication_QA_MedInfo2019) consists of six hundred and seventy-four question-answer pairs with annotations of the question focus and type and the answer source. We present the manual annotation and answering process. In the second part of this paper, we test the performance of recurrent and convolutional neural networks in question type identification and focus recognition. Finally, we discuss the research insights from both the dataset creation process and our experiments. This study provides new resources and experiments on answering consumers' medication questions and discusses the limitations and directions for future research efforts.


Subjects
Consumer Health Informatics; Delivery of Health Care; Trust; Problem Solving
10.
JAMIA Open ; 1(2): 265-275, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30474078

ABSTRACT

OBJECTIVE: We explored how judgements provided by physicians can be used to learn relevance models that enhance the quality of patient cohorts retrieved from Electronic Health Records (EHRs) collections. METHODS: A very large number of features were extracted from patient cohort descriptions as well as EHR collections. The features were used to investigate retrieving (1) neurology-specific patient cohorts from the de-identified Temple University Hospital electroencephalography (EEG) Corpus as well as (2) the more general cohorts evaluated in the TREC Medical Records Track (TRECMed) from the de-identified hospital records provided by the University of Pittsburgh Medical Center. The features informed a learning relevance model (LRM) that took advantage of relevance judgements provided by physicians. The LRM implements a pairwise learning-to-rank framework, which enables our learning patient cohort retrieval (L-PCR) system to learn from physicians' feedback. RESULTS AND DISCUSSION: We evaluated the L-PCR system against state-of-the-art traditional patient cohort retrieval systems, and observed a 27% improvement when operating on EEGs and a 53% improvement when operating on TRECMed EHRs, showing the promise of the L-PCR system. We also performed extensive feature analyses to reveal the most effective strategies for representing cohort descriptions as queries, encoding EHRs, and measuring cohort relevance. CONCLUSION: The L-PCR system has significant promise for reliably retrieving patient cohorts from EHRs in multiple settings when trained with relevance judgments. When provided with additional cohort descriptions, the L-PCR system will continue to learn, thus offering a potential solution to the performance barriers of current cohort retrieval systems.
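The pairwise learning-to-rank idea behind the L-PCR system — learn from physician judgements that one record is more relevant than another — can be sketched with a perceptron-style update. The real system's features and ranker are far richer; everything below is illustrative.

```python
# Train a linear scorer from pairwise preferences: whenever the model ranks a
# judged-relevant record at or below a judged-irrelevant one, nudge the
# weights toward the feature difference (perceptron-style update).
def train_pairwise(preference_pairs, n_features, lr=0.1, epochs=10):
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, other in preference_pairs:
            margin = sum(wi * (p - o) for wi, p, o in zip(w, preferred, other))
            if margin <= 0:
                w = [wi + lr * (p - o)
                     for wi, p, o in zip(w, preferred, other)]
    return w

def score(w, features):
    return sum(wi * x for wi, x in zip(w, features))

# A physician judged the first record more relevant than the second.
pairs = [([1.0, 0.0], [0.0, 1.0])]
w = train_pairwise(pairs, n_features=2)
```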

11.
Article in English | MEDLINE | ID: mdl-29888040

ABSTRACT

As medical science continues to advance, health care professionals and researchers are increasingly turning to clinical trials to obtain evidence supporting best-practice treatment options. While clinical trial registries such as ClinicalTrials.gov aim to facilitate these needs, it has been shown that many trials in the registry do not contain links to their published results. To address this problem, we present NCT Link, a system for automatically linking registered clinical trials to published MEDLINE articles reporting their results. NCT Link incorporates state-of-the-art deep learning and information retrieval techniques by automatically learning a Deep Highway Network (DHN) that estimates the likelihood that a MEDLINE article reports the results of a clinical trial. Our experimental results indicate that NCT Link obtains 30%-58% improved performance over previously reported automatic systems, suggesting that NCT Link could become a valuable tool for health care providers seeking to deliver best-practice medical care informed by evidence of clinical trials as well as (a) researchers investigating selective publication and reporting of clinical trial outcomes, and (b) study designers seeking to avoid unnecessary duplication of research efforts.
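A single highway layer computes y = T(x) · H(x) + (1 − T(x)) · x, where H is a nonlinear transform and T a learned sigmoid gate; a Deep Highway Network stacks such layers. The sketch below implements one layer with per-dimension scalar weights — illustrative values, not NCT Link's learned parameters.

```python
import math

# One highway layer with per-dimension scalar weights: a transform H (tanh)
# and a gate T (sigmoid) decide how much of the input passes through
# unchanged versus transformed.
def highway_layer(x, w_transform, w_gate):
    h = [math.tanh(w * xi) for w, xi in zip(w_transform, x)]
    t = [1.0 / (1.0 + math.exp(-w * xi)) for w, xi in zip(w_gate, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

# With a strongly negative gate weight the gate closes and the layer acts
# as a near-identity map over its input.
y = highway_layer([2.0], w_transform=[1.0], w_gate=[-10.0])
```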

12.
AMIA Jt Summits Transl Sci Proc ; 2017: 156-165, 2018.
Article in English | MEDLINE | ID: mdl-29888063

ABSTRACT

The automatic identification of relations between medical concepts in a large corpus of Electroencephalography (EEG) reports is an important step in the development of an EEG-specific patient cohort retrieval system as well as in the acquisition of EEG-specific knowledge from this corpus. EEG-specific relations involve medical concepts that are not typically mentioned in the same sentence or even the same section of a report, thus requiring extraction techniques that can handle such long-distance dependencies. To address this challenge, we present a novel framework which combines the advantages of a deep learning framework employing Dynamic Relational Memory (DRM) with active learning. While DRM enables the prediction of long-distance relations, active learning provides a mechanism for accurately identifying relations with minimal training data, obtaining a 5-fold cross-validation F1 score of 0.7475 on a set of 140 EEG reports selected with active learning. The results obtained with our novel framework show great promise.
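The active-learning component can be illustrated with uncertainty sampling, a standard selection strategy. The abstract does not name the exact selection criterion, so treat this as one plausible instantiation, with invented report IDs and scores.

```python
# Uncertainty sampling: ask annotators about the candidate relations whose
# predicted probability is closest to 0.5 (where the model is least sure).
def select_for_annotation(candidates, k):
    ranked = sorted(candidates, key=lambda pair: abs(pair[1] - 0.5))
    return [report_id for report_id, _ in ranked[:k]]

# (report id, predicted probability that the relation holds)
scored = [("r1", 0.95), ("r2", 0.52), ("r3", 0.10), ("r4", 0.47)]
picked = select_for_annotation(scored, k=2)
```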

13.
AMIA Annu Symp Proc ; 2017: 770-779, 2017.
Article in English | MEDLINE | ID: mdl-29854143

ABSTRACT

Successful diagnosis and management of neurological dysfunction relies on proper communication between the neurologist and the primary physician (or other specialists). Because this communication is documented within medical records, the ability to automatically infer the clinical correlations for a patient from his or her medical records would provide an important step towards enabling health care systems to automatically identify patients requiring additional follow-up as well as flagging any unexpected clinical correlations for review. In this paper, we present a Deep Section Recovery Model (DSRM) which applies deep neural learning on a large body of EEG reports in order to infer the expected clinical correlations for a patient from the information in a given EEG report by (1) automatically extracting word- and report-level features from the report and (2) inferring the most likely clinical correlations and expressing those clinical correlations in natural language. We evaluated the performance of the DSRM by removing the clinical correlation sections from EEG reports and measuring how well the model could recover that information from the remainder of the report. The DSRM obtained a 17% improvement over the top-performing baseline, highlighting not only the power of the DSRM but also the promise of automatically recognizing unexpected clinical correlations in the future.


Subjects
Deep Learning; Electroencephalography; Models, Neurological; Natural Language Processing; Nervous System Diseases/diagnosis; Female; Humans; Male; Neurology
14.
AMIA Jt Summits Transl Sci Proc ; 2017: 112-121, 2017.
Article in English | MEDLINE | ID: mdl-28815118

ABSTRACT

Secondary use of electronic health records (EHRs) often relies on the ability to automatically identify and extract information from EHRs. Unfortunately, EHRs are known to suffer from a variety of idiosyncrasies - most prevalently, they have been shown to often omit or underspecify information. Adapting traditional machine learning methods for inferring underspecified information relies on manually specifying features characterizing the specific information to recover (e.g. particular findings, test results, or physician's impressions). By contrast, in this paper, we present a method for jointly (1) automatically extracting word- and report-level features and (2) inferring underspecified information from EHRs. Our approach accomplishes these two tasks jointly by combining recent advances in deep neural learning with access to textual data in electroencephalogram (EEG) reports. We evaluate the performance of our model on the problem of inferring the neurologist's overall impression (normal or abnormal) from electroencephalogram (EEG) reports and report an accuracy of 91.4%, precision of 94.4%, recall of 91.2%, and F1 measure of 92.8% (a 40% improvement over the performance obtained using Doc2Vec). These promising results demonstrate the power of our approach, while error analysis reveals remaining obstacles as well as areas for future improvement.

15.
AMIA Jt Summits Transl Sci Proc ; 2017: 229-238, 2017.
Article in English | MEDLINE | ID: mdl-28815135

ABSTRACT

The annotation of a large corpus of Electroencephalography (EEG) reports is a crucial step in the development of an EEG-specific patient cohort retrieval system. The annotation of multiple types of EEG-specific medical concepts, along with their polarity and modality, is challenging, especially when automatically performed on Big Data. To address this challenge, we present a novel framework which combines the advantages of active and deep learning while producing annotations that capture a variety of attributes of medical concepts. Results obtained through our novel framework show great promise.

16.
AMIA Annu Symp Proc ; 2017: 1233-1242, 2017.
Article in English | MEDLINE | ID: mdl-29854192

ABSTRACT

While biomedical ontologies have traditionally been used to guide the identification of concepts or relations in biomedical data, recent advances in deep learning are able to capture high-quality knowledge from textual data and represent it in graphical structures. As opposed to the top-down methodology used in the generation of ontologies, which starts with the principled design of the upper ontology, the bottom-up methodology enabled by deep learning encodes the likelihood that concepts share certain relations, as evidenced by data. In this paper, we present a knowledge representation produced by deep learning methods, called Medical Knowledge Embeddings (MKE), that encode medical concepts related to the study of epilepsy and the relations between them. Many of the epilepsy-relevant medical concepts from MKE are not yet available in existing biomedical ontologies, but are mentioned in vast collections of epilepsy-related medical records which also imply their relationships. The evaluation of the MKE indicates high accuracy of the medical concepts automatically identified from clinical text as well as promising results in terms of correctness and completeness of relations produced by deep learning.
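The abstract does not say how the Medical Knowledge Embeddings score a (concept, relation, concept) triple. A widely used translation-based formulation, shown here purely as an assumption, treats a relation as a vector offset between concept embeddings; the 2-d vectors and concept names are invented.

```python
# Translation-style triple score: a triple (head, relation, tail) is
# plausible when the head embedding plus the relation embedding lands near
# the tail embedding; higher (closer to zero) means more plausible.
def triple_score(head, relation, tail):
    distance = sum((h + r - t) ** 2
                   for h, r, t in zip(head, relation, tail)) ** 0.5
    return -distance

# Invented embeddings: "carbamazepine" + "treats" should land on "epilepsy".
plausible = triple_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
implausible = triple_score([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```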


Subjects
Biological Ontologies; Deep Learning; Electroencephalography; Epilepsy; Data Accuracy; Humans; Medical Records
17.
LREC Int Conf Lang Resour Eval ; 2016: 4621-4628, 2016 May.
Article in English | MEDLINE | ID: mdl-28649676

ABSTRACT

Our ability to understand language often relies on common-sense knowledge - background information the speaker can assume is known by the reader. Similarly, our comprehension of the language used in complex domains relies on access to domain-specific knowledge. Capturing common-sense and domain-specific knowledge can be achieved by taking advantage of recent advances in open information extraction (IE) techniques and, more importantly, of knowledge embeddings, which are multi-dimensional representations of concepts and relations. Building a knowledge graph for representing common-sense knowledge in which concepts discerned from noun phrases are cast as vertices and lexicalized relations are cast as edges leads to learning the embeddings of common-sense knowledge accounting for semantic compositionality as well as implied knowledge. Common-sense knowledge is acquired from a vast collection of blogs and books as well as from WordNet. Similarly, medical knowledge is learned from two large sets of electronic health records. The evaluation results of these two forms of knowledge are promising: the same knowledge acquisition methodology based on learning knowledge embeddings works well both for common-sense knowledge and for medical knowledge. Interestingly, the common-sense knowledge that we have acquired was evaluated as being less neutral than the medical knowledge, as it often reflected the opinion of the knowledge utterer. In addition, the acquired medical knowledge was evaluated as more plausible than the common-sense knowledge, reflecting the complexity of acquiring common-sense knowledge due to the pragmatics and economicity of language.

18.
Article in English | MEDLINE | ID: mdl-27595044

ABSTRACT

The wealth of clinical information provided by the advent of electronic health records offers an exciting opportunity to improve the quality of patient care. Of particular importance are the risk factors, which indicate possible diagnoses, and the medications which treat them. By analysing which risk factors and medications were mentioned at different times in patients' EHRs, we are able to construct a patient's clinical chronology. This chronology enables us not only to predict how a new patient's risk factors may progress, but also to discover patterns of interactions between risk factors and medications. We present a novel probabilistic model of patients' clinical chronologies and demonstrate how this model can be used to (1) predict the way a new patient's risk factors may evolve over time, (2) identify patients with irregular chronologies, and (3) discover the interactions between pairs of risk factors, and between risk factors and medications over time. Moreover, the model proposed in this paper does not rely on (nor specify) any prior knowledge about any interactions between the risk factors and medications it represents. Thus, our model can be easily applied to any arbitrary set of risk factors and medications derived from a new dataset.
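A minimal way to picture a probabilistic chronology is a single Markov step over risk-factor states. The paper's model is richer (it also covers medications and irregular chronologies), and the states and transition probabilities below are invented.

```python
# One step of a toy Markov chronology: push today's state distribution
# through a transition table to get tomorrow's distribution.
def chronology_step(state_probs, transitions):
    return {nxt: sum(p * transitions[cur].get(nxt, 0.0)
                     for cur, p in state_probs.items())
            for nxt in transitions}

# Invented risk-factor progression probabilities.
transitions = {
    "hypertension": {"hypertension": 0.7, "heart_disease": 0.3},
    "heart_disease": {"heart_disease": 1.0},
}
probs = chronology_step({"hypertension": 1.0}, transitions)
```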

19.
AMIA Annu Symp Proc ; 2016: 1794-1803, 2016.
Article in English | MEDLINE | ID: mdl-28269938

ABSTRACT

Clinical electroencephalography (EEG) is the most important investigation in the diagnosis and management of epilepsies. An EEG records the electrical activity along the scalp and measures spontaneous electrical activity of the brain. Because the EEG signal is complex, its interpretation is known to produce moderate inter-observer agreement among neurologists. This problem can be addressed by providing clinical experts with the ability to automatically retrieve similar EEG signals and EEG reports through a patient cohort retrieval system operating on a vast archive of EEG data. In this paper, we present a multi-modal EEG patient cohort retrieval system called MERCuRY which leverages the heterogeneous nature of EEG data by processing both the clinical narratives from EEG reports as well as the raw electrode potentials derived from the recorded EEG signal data. At the core of MERCuRY is a novel multimodal clinical indexing scheme which relies on EEG data representations obtained through deep learning. The index is used by two clinical relevance models that we have generated for identifying patient cohorts satisfying the inclusion and exclusion criteria expressed in natural language queries. Evaluations of the MERCuRY system measured the relevance of the patient cohorts, obtaining a MAP score of 69.87% and an NDCG of 83.21%.


Subjects
Abstracting and Indexing; Electroencephalography; Epilepsy/physiopathology; Information Storage and Retrieval; Information Systems; Electrodes; Humans; Machine Learning; Natural Language Processing; Neural Networks, Computer
20.
Proc ACM Int Conf Inf Knowl Manag ; 2016: 297-306, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28758046

ABSTRACT

The goal of modern Clinical Decision Support (CDS) systems is to provide physicians with information relevant to their management of patient care. When faced with a medical case, a physician asks questions about the diagnosis, the tests, or treatments that should be administered. Recently, the TREC-CDS track has addressed this challenge by evaluating results of retrieving relevant scientific articles where the answers of medical questions in support of CDS can be found. Although retrieving relevant medical articles instead of identifying the answers was believed to be an easier task, state-of-the-art results are not yet sufficiently promising. In this paper, we present a novel framework for answering medical questions in the spirit of TREC-CDS by first discovering the answer and then selecting and ranking scientific articles that contain the answer. Answer discovery is the result of probabilistic inference which operates on a probabilistic knowledge graph, automatically generated by processing the medical language of large collections of electronic medical records (EMRs). The probabilistic inference of answers combines knowledge from medical practice (EMRs) with knowledge from medical research (scientific articles). It also takes into account the medical knowledge automatically discerned from the medical case description. We show that this novel form of medical question answering (Q/A) produces very promising results in (a) identifying accurately the answers and (b) improving medical article ranking by 40%.
