Results 1 - 20 of 107
1.
J Biomed Inform ; 150: 104586, 2024 02.
Article in English | MEDLINE | ID: mdl-38191011

ABSTRACT

BACKGROUND: Halbert L. Dunn's concept of wellness is multi-dimensional, encompassing social and mental well-being. Neglecting these dimensions over time can have a negative impact on an individual's mental health. The manual effort involved in in-person therapy sessions reveals that underlying factors of mental disturbance, if triggered, may lead to severe mental health disorders. OBJECTIVE: In our research, we introduce a fine-grained approach to identifying indicators of wellness dimensions and marking their presence in self-narrated writings on the Reddit social media platform. DESIGN AND METHOD: We present the MultiWD dataset, a curated collection of 3281 instances, specifically designed and annotated to facilitate the identification of multiple wellness dimensions in Reddit posts. We introduce the task of identifying wellness dimensions and apply state-of-the-art classifiers to this multi-label classification task. RESULTS: Our findings highlight the comparative performance of fine-tuned large language models against a fine-tuned BERT model. As such, we set BERT as a baseline model for tagging wellness dimensions in user-penned text, with an F1 score of 76.69. CONCLUSION: Our findings underscore the need for trustworthy, domain-specific knowledge infusion to develop more comprehensive and contextually aware AI models for tagging and extracting wellness dimensions.
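As an illustration only (not the authors' code), a minimal sketch of this kind of multi-label fine-tuning with the Hugging Face transformers library; the dimension names and example post are placeholders rather than the actual MultiWD schema.

```python
# Minimal sketch of multi-label fine-tuning in the spirit of the MultiWD task.
# The dimension names and the example post are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

DIMENSIONS = ["physical", "emotional", "social", "intellectual", "spiritual", "vocational"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(DIMENSIONS),
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss internally
)

post = "I have been skipping meals and avoiding my friends for weeks."
labels = torch.tensor([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]])  # multi-hot target

inputs = tokenizer(post, truncation=True, padding=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # one optimizer step would follow in a real training loop

# At inference, each dimension is tagged independently via a sigmoid threshold.
probs = torch.sigmoid(outputs.logits)
tagged = [d for d, p in zip(DIMENSIONS, probs[0]) if p > 0.5]
```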


Subject(s)
Mental Disorders, Social Media, Humans, Mental Health, Awareness
2.
J Biomed Inform ; 152: 104623, 2024 04.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for the federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranged from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranged from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across the four healthcare sites. For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-to-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using the FedFSA framework performed better than non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework for functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.
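As a hedged sketch of the federated idea behind FedFSA (the published framework uses the Flower library; the toy linear classifier below merely stands in for the federated BERT component), a minimal federated-averaging loop might look like this:

```python
# Minimal federated-averaging (FedAvg) sketch of the idea behind a federated
# impairment classifier; synthetic features stand in for each site's private data.
import copy
import torch
import torch.nn as nn

class ImpairmentClassifier(nn.Module):
    def __init__(self, dim=768, n_classes=2):
        super().__init__()
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        return self.head(x)

def local_update(global_model, features, labels, epochs=1):
    """Each site trains a copy of the global model on its private data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(features), labels).backward()
        opt.step()
    return model.state_dict()

def fed_avg(states, weights):
    """Server averages site weights, weighted by local sample counts."""
    total = sum(weights)
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = sum(w * s[key] for s, w in zip(states, weights)) / total
    return avg

global_model = ImpairmentClassifier()
sites = [(torch.randn(32, 768), torch.randint(0, 2, (32,))) for _ in range(4)]  # 4 synthetic sites
for _ in range(5):  # communication rounds
    states = [local_update(global_model, x, y) for x, y in sites]
    global_model.load_state_dict(fed_avg(states, [len(y) for _, y in sites]))
```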


Subject(s)
Activities of Daily Living, Functional Status, Humans, Aged, Learning, Information Storage and Retrieval, Natural Language Processing
3.
Circulation ; 145(12): 877-891, 2022 03 22.
Article in English | MEDLINE | ID: mdl-34930020

ABSTRACT

BACKGROUND: Sequencing Mendelian arrhythmia genes in individuals without an indication for arrhythmia genetic testing can identify carriers of pathogenic or likely pathogenic (P/LP) variants. However, the extent to which these variants are associated with clinically meaningful phenotypes before or after return of variant results is unclear. In addition, the majority of discovered variants are currently classified as variants of uncertain significance, limiting clinical actionability. METHODS: The eMERGE-III study (Electronic Medical Records and Genomics Phase III) is a multicenter prospective cohort that included 21 846 participants without previous indication for cardiac genetic testing. Participants were sequenced for 109 Mendelian disease genes, including 10 linked to arrhythmia syndromes. Variant carriers were assessed with electronic health record-derived phenotypes and follow-up clinical examination. Selected variants of uncertain significance (n=50) were characterized in vitro with automated electrophysiology experiments in HEK293 cells. RESULTS: As previously reported, 3.0% of participants had P/LP variants in the 109 genes. Herein, we report 120 participants (0.6%) with P/LP arrhythmia variants. Compared with noncarriers, arrhythmia P/LP carriers had a significantly higher burden of arrhythmia phenotypes in their electronic health records. Fifty-four participants had variant results returned. Nineteen of these 54 participants had inherited arrhythmia syndrome diagnoses (primarily long-QT syndrome), and 12 of these 19 diagnoses were made only after variant results were returned (0.05%). After in vitro functional evaluation of 50 variants of uncertain significance, we reclassified 11 variants: 3 to likely benign and 8 to P/LP. CONCLUSIONS: Genome sequencing in a large population without indication for arrhythmia genetic testing identified phenotype-positive carriers of variants in congenital arrhythmia syndrome disease genes. As the genomes of large numbers of people are sequenced, the disease risk from rare variants in arrhythmia genes can be assessed by integrating genomic screening, electronic health record phenotypes, and in vitro functional studies. REGISTRATION: URL: https://www.clinicaltrials.gov; Unique identifier: NCT03394859.


Subject(s)
Cardiac Arrhythmias, Genetic Testing, Cardiac Arrhythmias/diagnosis, Cardiac Arrhythmias/genetics, Genetic Predisposition to Disease, Genetic Testing/methods, Genomics, HEK293 Cells, Humans, Phenotype, Prospective Studies
4.
J Arthroplasty ; 38(10): 1948-1953, 2023 10.
Article in English | MEDLINE | ID: mdl-37619802

ABSTRACT

Total joint arthroplasty is becoming one of the most common surgeries within the United States, creating an abundance of analyzable data to improve patient experience and outcomes. Unfortunately, a large majority of this data is concealed in electronic health records, accessible only by manual extraction, which takes extensive time and resources. Natural language processing (NLP), a field within artificial intelligence, may offer a viable alternative to manual extraction. Using NLP, a researcher can analyze written and spoken data and extract information in an organized manner suitable for future research and clinical use. This article will first discuss common subtasks involved in an NLP pipeline, including data preparation, modeling, analysis, and external validation, followed by examples of NLP projects. Challenges and limitations of NLP will be discussed, closing with future directions of NLP projects, including large language models.
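To make the pipeline stages named above concrete, here is a minimal, hypothetical sketch using scikit-learn; the notes, labels, and external set are invented placeholders, not arthroplasty data.

```python
# Minimal sketch of the pipeline stages described above (data preparation,
# modeling, analysis, external validation); all text and labels are synthetic.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# 1) Data preparation: pair free-text notes with a label of interest.
train_notes = ["posterior approach, cementless fixation", "anterior approach, cemented stem"]
train_labels = ["cementless", "cemented"]

# 2) Modeling: vectorize the text and fit a classifier.
model = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(train_notes, train_labels)

# 3) Analysis and 4) external validation: score on data from another institution.
external_notes = ["direct anterior approach with cemented femoral component"]
external_labels = ["cemented"]
print(classification_report(external_labels, model.predict(external_notes), zero_division=0))
```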


Subject(s)
Artificial Intelligence, Natural Language Processing, Humans, Arthroplasty, Language, Electronic Health Records
5.
J Arthroplasty ; 38(10): 2081-2084, 2023 10.
Article in English | MEDLINE | ID: mdl-36280160

ABSTRACT

BACKGROUND: Natural language processing (NLP) systems are distinctive in their ability to extract critical information from raw text in electronic health records (EHR). We previously developed three algorithms for total hip arthroplasty (THA) operative notes with rules aimed at capturing (1) operative approach, (2) fixation method, and (3) bearing surface using inputs from a single institution. The purpose of this study was to externally validate and improve these algorithms as a prerequisite for broader adoption in automated registry data curation. METHODS: The previous NLP algorithms developed at Mayo Clinic were deployed and refined on EHRs from OrthoCarolina, evaluating 39 randomly selected primary THA operative reports from 2018 to 2021. Operative reports were available only in PDF format, requiring conversion to "readable" text with Adobe software. Accuracy statistics were calculated against manual chart review. RESULTS: The operative approach, fixation technique, and bearing surface algorithms all demonstrated perfect accuracy of 100%. By comparison, validated performance at the developing center yielded an accuracy of 99.2% for operative approach, 90.7% for fixation technique, and 95.8% for bearing surface. CONCLUSION: NLP algorithms applied to data from an external center demonstrated excellent accuracy in delineating common elements in THA operative notes. Notably, the algorithms had no functional problems evaluating scanned PDFs that were converted to "readable" text by common software. Taken together, these findings provide promise for NLP applied to scanned PDFs as a source to develop large registries by reliably extracting data of interest from very large unstructured data sets in an expeditious and cost-effective manner.
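For illustration, keyword rules of the kind such algorithms rely on might look like the sketch below; the patterns are guesses for demonstration, not the validated Mayo Clinic/OrthoCarolina rules.

```python
# Hedged sketch of keyword/regex rules for the three THA note elements named
# above; the patterns and example note are illustrative assumptions.
import re

RULES = {
    "operative_approach": {
        "anterior": r"\b(direct\s+)?anterior\s+approach\b",
        "posterior": r"\bposterior(olateral)?\s+approach\b",
        "lateral": r"\b(antero)?lateral\s+approach\b",
    },
    "fixation_technique": {
        "cemented": r"\bcemented\b",
        "cementless": r"\b(cementless|uncemented|press[- ]?fit)\b",
    },
    "bearing_surface": {
        "ceramic-on-poly": r"\bceramic\b.*\bpoly(ethylene)?\b",
        "metal-on-poly": r"\b(cobalt|metal)\b.*\bpoly(ethylene)?\b",
    },
}

def extract_elements(note_text: str) -> dict:
    """Apply each rule set and return the first matching value per element."""
    text = note_text.lower()
    found = {}
    for element, patterns in RULES.items():
        found[element] = next(
            (value for value, pat in patterns.items() if re.search(pat, text)), None)
    return found

note = ("A direct anterior approach was used; an uncemented stem and ceramic head "
        "on polyethylene liner were placed.")
print(extract_elements(note))
# {'operative_approach': 'anterior', 'fixation_technique': 'cementless', 'bearing_surface': 'ceramic-on-poly'}
```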


Subject(s)
Hip Replacement Arthroplasty, Humans, Natural Language Processing, Common Data Elements, Algorithms, Software, Electronic Health Records
6.
Genet Epidemiol ; 45(1): 4-15, 2021 02.
Article in English | MEDLINE | ID: mdl-32964493

ABSTRACT

Carotid artery atherosclerotic disease (CAAD) is a risk factor for stroke. We used a genome-wide association (GWAS) approach to discover genetic variants associated with CAAD in participants in the electronic Medical Records and Genomics (eMERGE) Network. We identified adult CAAD cases with unilateral or bilateral carotid artery stenosis and controls without evidence of stenosis from electronic health records at eight eMERGE sites. We performed GWAS with a model adjusting for age, sex, study site, and genetic principal components of ancestry. In eMERGE we found 1793 CAAD cases and 17,958 controls. Two loci reached genome-wide significance, on chr6 in LPA (rs10455872, odds ratio [OR] (95% confidence interval [CI]) = 1.50 (1.30-1.73), p = 2.1 × 10⁻⁸) and on chr7, an intergenic single nucleotide variant (SNV; rs6952610, OR (95% CI) = 1.25 (1.16-1.36), p = 4.3 × 10⁻⁸). The chr7 association remained significant in the presence of the LPA SNV as a covariate. The LPA SNV was also associated with coronary heart disease (CHD; 4199 cases and 11,679 controls) in this study (OR (95% CI) = 1.27 (1.13-1.43), p = 5 × 10⁻⁵) but the chr7 SNV was not (OR (95% CI) = 1.03 (0.97-1.09), p = .37). Both variants replicated in UK Biobank. Elevated lipoprotein(a) concentrations ([Lp(a)]) and LPA variants associated with elevated [Lp(a)] have previously been associated with CAAD and CHD, including rs10455872. With electronic health record phenotypes in eMERGE and UKB, we replicated a previously known association and identified a novel locus associated with CAAD.
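As a worked illustration of how a reported odds ratio and confidence interval relate to the underlying logistic-regression coefficient, the snippet below uses values chosen to roughly reproduce the rs10455872 result; the standard error is an assumption inferred from the reported CI.

```python
# Worked example: odds ratio and 95% CI from a log-odds coefficient.
# beta and se are illustrative values chosen to approximate OR 1.50 (1.30-1.73).
import math

beta = math.log(1.50)   # per-allele log-odds from the adjusted logistic model
se = 0.073              # assumed standard error of beta

or_point = math.exp(beta)
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)

print(f"OR = {or_point:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
# OR = 1.50, 95% CI = (1.30, 1.73)
```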


Subject(s)
Carotid Stenosis, Genome-Wide Association Study, Electronic Health Records, Genetic Predisposition to Disease, Genomics, Humans, Lipoprotein(a)/genetics, Genetic Models, Single Nucleotide Polymorphism
7.
J Biomed Inform ; 113: 103660, 2021 01.
Article in English | MEDLINE | ID: mdl-33321199

ABSTRACT

Coronavirus Disease 2019 has emerged as a significant global concern, triggering harsh public health restrictions in a successful bid to curb its exponential growth. As discussion shifts towards relaxation of these restrictions, there is significant concern about second-wave resurgence. The key to managing these outbreaks is early detection and intervention, and yet there is a significant lag time associated with the use of laboratory-confirmed cases for surveillance purposes. To address this, syndromic surveillance can be considered to provide a timelier alternative for first-line screening. Existing syndromic surveillance solutions are, however, typically focused on a known disease and have limited capability to distinguish between outbreaks of individual diseases sharing similar syndromes. This poses a challenge for surveillance of COVID-19, as its active periods tend to overlap temporally with other influenza-like illnesses. In this study we explore performing sentinel syndromic surveillance for COVID-19 and other influenza-like illnesses using a deep learning-based approach. Our method is based on aberration detection using autoencoders that leverage symptom prevalence distributions to distinguish outbreaks of two ongoing diseases that share similar syndromes, even if they occur concurrently. We first demonstrate that this approach works for detection of outbreaks of influenza, which has known temporal boundaries. We then demonstrate that the autoencoder can be trained not to alert on known and well-managed influenza-like illnesses such as the common cold and influenza. Finally, we applied our approach to 2019-2020 data in the context of a COVID-19 syndromic surveillance task to demonstrate how implementation of such a system could have provided early warning of an outbreak of a novel influenza-like illness that did not match the symptom prevalence profile of influenza and other known influenza-like illnesses.
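A minimal sketch of the aberration-detection idea (train an autoencoder on "normal" symptom-prevalence vectors, alert on high reconstruction error); the architecture, data, and threshold below are illustrative assumptions, not the authors' configuration.

```python
# Autoencoder-based aberration detection on daily symptom-prevalence vectors;
# everything here (dimensions, data, threshold) is an illustrative assumption.
import torch
import torch.nn as nn

N_SYMPTOMS = 20
autoencoder = nn.Sequential(
    nn.Linear(N_SYMPTOMS, 8), nn.ReLU(),    # encoder
    nn.Linear(8, N_SYMPTOMS), nn.Sigmoid()  # decoder (prevalences lie in [0, 1])
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train only on days from known, well-managed illness seasons ("normal" profile).
normal_days = torch.rand(500, N_SYMPTOMS) * 0.1
for _ in range(200):
    optimizer.zero_grad()
    loss_fn(autoencoder(normal_days), normal_days).backward()
    optimizer.step()

# Alert when a new day's symptom profile reconstructs poorly.
with torch.no_grad():
    errors = ((autoencoder(normal_days) - normal_days) ** 2).mean(dim=1)
    threshold = errors.mean() + 3 * errors.std()
    novel_day = torch.rand(1, N_SYMPTOMS) * 0.1
    novel_day[0, :3] += 0.5  # unusually prevalent symptoms not seen in training
    alert = ((autoencoder(novel_day) - novel_day) ** 2).mean() > threshold
print(bool(alert))
```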


Subject(s)
COVID-19/epidemiology, Human Influenza/epidemiology, Sentinel Surveillance, COVID-19/virology, Deep Learning, Disease Outbreaks, Humans, SARS-CoV-2/isolation & purification
8.
Int Psychogeriatr ; 33(10): 1105-1109, 2021 10.
Article in English | MEDLINE | ID: mdl-34551841

ABSTRACT

Delirium is reported to be one of the manifestations of coronavirus infectious disease 2019 (COVID-19) infection. COVID-19 hospitalized patients are at a higher risk of delirium. Pathophysiology behind the association of delirium and COVID-19 is uncertain. We analyzed the association of delirium occurrence with outcomes in hospitalized COVID-19 patients, across all age groups, at Mayo Clinic hospitals. A retrospective study of all hospitalized COVID-19 patients at Mayo Clinic between March 1, 2020 and December 31, 2020 was performed. Occurrence of delirium and outcomes of mortality, length of stay, readmission, and 30-day mortality after hospital discharge were measured. Chi-square test, student t-test, survival analysis, and logistic regression analysis were performed to measure and compare outcomes of delirium group adjusted for age, sex, Charlson comorbidity score, and COVID-19 severity with no-delirium group. A total of 4351 COVID-19 patients were included in the study. Delirium occurrence in the overall study population was noted to be 22.4%. The highest occurrence of delirium was also noted in patients with critical COVID-19 illness severity. A statistically significant OR 4.35 (3.27-5.83) for in-hospital mortality and an OR 4.54 (3.25-6.38) for 30-day mortality after discharge in the delirium group were noted. Increased hospital length of stay, 30-day readmission, and need for skilled nursing facility on discharge were noted in the delirium group. Delirium in hospitalized COVID-19 patients is a marker for increased mortality and morbidity. In this group, outcomes appear to be much worse when patients are older and have a critical severity of COVID-19 illness.


Subject(s)
COVID-19/mortality, Delirium/epidemiology, Hospitalization/statistics & numerical data, Adolescent, Adult, Aged, Aged 80 and over, COVID-19/complications, Child, Preschool Child, Delirium/complications, Humans, Infant, Newborn Infant, Intensive Care Units, Length of Stay, Male, Middle Aged, Minnesota/epidemiology, Retrospective Studies, SARS-CoV-2, Young Adult
9.
BMC Med Inform Decis Mak ; 21(Suppl 7): 272, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34753481

ABSTRACT

BACKGROUND: There is significant variability in guideline-concordant documentation in asthma care. However, assessing clinicians' documentation is not feasible using only structured data; it requires labor-intensive chart review of electronic health records (EHRs). Certain guideline elements among asthma control factors, such as review of inhaler technique, require contextual understanding to be captured correctly from EHR free text. METHODS: The study data consist of two sets: (1) manually chart-reviewed data: 1039 clinical notes of 300 patients with an asthma diagnosis, and (2) weakly labeled data (distant supervision): 27,363 clinical notes from 800 patients with an asthma diagnosis. A context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), was developed to identify inhaler techniques in EHR free text. Both the original BERT and clinical BioBERT (cBERT) were applied with cost-sensitivity to deal with imbalanced data. Distant supervision using weak labels generated by rules was also incorporated to augment the training set and alleviate the costly manual labeling process in the development of a deep learning algorithm. A hybrid approach using post-hoc rules was also explored to fix BERT model errors. The performance of BERT with and without distant supervision, hybrid, and rule-based models was compared in terms of precision, recall, F-score, and accuracy. RESULTS: The BERT models trained on the original data performed similarly to a rule-based model in F1-score (0.837, 0.845, and 0.838 for rules, BERT, and cBERT, respectively). The BERT models with distant supervision produced higher performance (0.853 and 0.880 for BERT and cBERT, respectively) than those without distant supervision and the rule-based model. The hybrid models performed best, with F1-scores of 0.877 and 0.904 over the distant-supervision BERT and cBERT models, respectively. CONCLUSIONS: The proposed BERT models with distant supervision demonstrated the capability to identify inhaler techniques in EHR free text and outperformed both the rule-based model and the BERT models trained on the original data. With a distant supervision approach, we may alleviate the costly manual chart review needed to generate the large training data required by most deep learning-based models. A hybrid model was able to fix BERT model errors and further improve performance.
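A hedged sketch of the two ingredients named above, rule-generated weak labels (distant supervision) and a cost-sensitive loss; the rule patterns and class weight are illustrative assumptions.

```python
# Rule-based weak labeling (distant supervision) plus a cost-sensitive loss for
# class imbalance; the patterns, notes, and weight are illustrative assumptions.
import re
import torch
import torch.nn as nn

def weak_label(note: str) -> int:
    """Rule-based weak labeler: 1 if the note documents an inhaler-technique review."""
    patterns = [r"inhaler technique (was )?(reviewed|demonstrated|assessed)",
                r"reviewed (the )?(proper )?use of (the )?inhaler"]
    return int(any(re.search(p, note.lower()) for p in patterns))

notes = ["Inhaler technique was reviewed with the patient today.",
         "Patient denies wheezing; continue current controller."]
weak_labels = torch.tensor([float(weak_label(n)) for n in notes])

# A classifier (BERT/cBERT in the paper) is then trained on these weak labels,
# with a positive-class weight to offset the rarity of documented reviews.
logits = torch.zeros(len(notes), requires_grad=True)          # stand-in for model output
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(8.0))  # cost-sensitive weighting
loss = loss_fn(logits, weak_labels)
loss.backward()
```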


Subject(s)
Asthma, Deep Learning, Algorithms, Asthma/diagnosis, Asthma/drug therapy, Electronic Health Records, Humans, Natural Language Processing
10.
BMC Med Inform Decis Mak ; 21(1): 310, 2021 11 08.
Article in English | MEDLINE | ID: mdl-34749701

ABSTRACT

BACKGROUND: A subgroup of patients with asthma has been reported to have an increased risk for asthma-associated infectious and inflammatory multimorbidities (AIMs). To systematically investigate the association of asthma with AIMs in a large patient cohort, it is desirable to leverage a broad range of electronic health record (EHR) data sources to automatically identify AIMs accurately and efficiently. METHODS: We established an expert consensus on an operational definition for each AIM from EHRs through a modified Delphi technique. A series of questions about the operational definitions of 19 AIMs (11 infectious diseases and 8 inflammatory diseases) was generated by a core team of experts who considered feasibility, the balance between sensitivity and specificity, and generalizability. Eight internal and 5 external expert panelists were invited to individually complete a series of online questionnaires and provide judgement and feedback throughout three sequential internal rounds and two external rounds. Panelists' responses were collected, descriptive statistics tabulated, and results reported back to the entire group. Following each round, the core team of experts made iterative edits to the operational definitions until a moderate (≥ 60%) or strong (≥ 80%) level of consensus among the panel was achieved. RESULTS: Response rates were 100% in all 5 Delphi rounds, with the following consensus levels achieved: (1) internal panel consensus: 100% for 8 definitions, 88% for 10 definitions, and 75% for 1 definition; (2) external panel consensus: 100% for 12 definitions and 80% for 7 definitions. CONCLUSIONS: The final operational definitions of AIMs established through a modified Delphi technique can serve as a foundation for developing computational algorithms to automatically identify AIMs from EHRs, enabling large-scale research studies on patients' multimorbidities associated with asthma.


Subject(s)
Asthma, Communicable Diseases, Algorithms, Asthma/diagnosis, Consensus, Delphi Technique, Humans
11.
J Arthroplasty ; 36(2): 688-692, 2021 02.
Article in English | MEDLINE | ID: mdl-32854996

ABSTRACT

BACKGROUND: Periprosthetic joint infection (PJI) data elements are contained in both structured and unstructured documents in electronic health records and require manual data collection. The goal of this study is to develop a natural language processing (NLP) algorithm to replicate manual chart review for PJI data elements. METHODS: PJI was identified among all total joint arthroplasty (TJA) procedures performed at a single academic institution between 2000 and 2017. Data elements that comprise the Musculoskeletal Infection Society (MSIS) criteria were manually extracted and used as the gold standard for validation. A training sample of 1208 TJA surgeries (170 PJI cases) was randomly selected to develop the prototype NLP algorithms and an additional 1179 surgeries (150 PJI cases) were randomly selected as the test sample. The algorithms were applied to all consultation notes, operative notes, pathology reports, and microbiology reports to predict the correct status of PJI based on MSIS criteria. RESULTS: The algorithm, which identified patients with PJI based on MSIS criteria, achieved an f1-score (harmonic mean of precision and recall) of 0.911. Algorithm performance in extracting the presence of sinus tract, purulence, pathologic documentation of inflammation, and growth of cultured organisms from the involved TJA achieved f1-scores that ranged from 0.771 to 0.982, sensitivity that ranged from 0.730 to 1.000, and specificity that ranged from 0.947 to 1.000. CONCLUSION: NLP-enabled algorithms have the potential to automate data collection for PJI diagnostic elements, which could directly improve patient care and augment cohort surveillance and research efforts. Further validation is needed in other hospital settings. LEVEL OF EVIDENCE: Level III, Diagnostic.
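For reference, an F1-score such as the 0.911 reported above is the harmonic mean of precision and recall computed from prediction counts; the counts in this small example are invented.

```python
# How an F1-score is computed from prediction counts; the counts below are
# made-up illustration, not the study's confusion matrix.
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# e.g. 134 true PJI cases found, 12 false alarms, 14 missed cases -> F1 ≈ 0.91
print(precision_recall_f1(tp=134, fp=12, fn=14))
```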


Subject(s)
Infectious Arthritis, Prosthesis-Related Infections, Arthroplasty, Electronic Health Records, Humans, Natural Language Processing, Prosthesis-Related Infections/diagnosis, Prosthesis-Related Infections/epidemiology, Prosthesis-Related Infections/etiology
12.
J Arthroplasty ; 36(3): 922-926, 2021 03.
Article in English | MEDLINE | ID: mdl-33051119

ABSTRACT

BACKGROUND: Natural language processing (NLP) methods have the capability to process clinical free text in electronic health records, decreasing the need for costly manual chart review and improving data quality. We developed rule-based NLP algorithms to automatically extract surgery-specific data elements from knee arthroplasty operative notes. METHODS: Within a cohort of 20,000 knee arthroplasty operative notes from 2000 to 2017 at a large tertiary institution, we randomly selected independent pairs of training and test sets to develop and evaluate NLP algorithms to detect five major data elements. The sizes of the training and test datasets were similar, ranging from 420 to 1592 surgeries. Expert rules using keywords in operative notes were used to implement NLP algorithms capturing: (1) category of surgery (total knee arthroplasty, unicompartmental knee arthroplasty, patellofemoral arthroplasty), (2) laterality of surgery, (3) constraint type, (4) presence of patellar resurfacing, and (5) implant model (catalog numbers). We used institutional registry data as our gold standard to evaluate the NLP algorithms. RESULTS: NLP algorithms to detect the category of surgery, laterality, constraint, and patellar resurfacing achieved 98.3%, 99.5%, 99.2%, and 99.4% accuracy on the test datasets, respectively. The implant model algorithm achieved an F1-score (harmonic mean of precision and recall) of 99.9%. CONCLUSIONS: NLP algorithms are a promising alternative to costly manual chart review for automating the extraction of embedded information within knee arthroplasty operative notes. Further validation in other hospital settings will enhance widespread implementation and efficiency in data capture for research and clinical purposes. LEVEL OF EVIDENCE: Level III.
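As an illustration of rule-based capture of the implant-model and laterality elements, a hypothetical sketch follows; the catalog-number format and note text are assumptions, not the study's rules.

```python
# Hedged sketch of extracting implant catalog numbers and laterality with
# keyword rules; the catalog-number format and note text are assumptions.
import re

note = ("RIGHT total knee arthroplasty. Femoral component catalog number "
        "5766-B-205, tibial baseplate REF 71230940, patella resurfaced.")

laterality = ("right" if re.search(r"\bright\b", note, re.I)
              else "left" if re.search(r"\bleft\b", note, re.I)
              else "unknown")

# Catalog numbers assumed to be digit/letter groups, possibly hyphenated,
# following a "catalog number" or "REF" cue in the note text.
catalog_numbers = re.findall(
    r"\b(?:catalog number|ref)\b\s*[:#]?\s*([A-Z0-9]+(?:-[A-Z0-9]+)*)", note, re.I)

print(laterality, catalog_numbers)  # right ['5766-B-205', '71230940']
```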


Subject(s)
Knee Replacement Arthroplasty, Algorithms, Common Data Elements, Electronic Health Records, Humans, Natural Language Processing
13.
J Biomed Inform ; 109: 103526, 2020 09.
Article in English | MEDLINE | ID: mdl-32768446

ABSTRACT

BACKGROUND: Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.


Subject(s)
Information Storage and Retrieval, Natural Language Processing, Bibliometrics, Research Design
14.
BMC Med Inform Decis Mak ; 19(Suppl 3): 73, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30943952

ABSTRACT

BACKGROUND: Osteoporosis has become an important public health issue. Most of the population, particularly elderly people, are at some degree of risk of osteoporosis-related fractures. Accurate identification and surveillance of patient populations with fractures has a significant impact on reducing the cost of care by preventing future fractures and their corresponding complications. METHODS: In this study, we developed a rule-based natural language processing (NLP) algorithm for identification of twenty skeletal site-specific fractures from radiology reports. The rule-based NLP algorithm was based on regular expressions developed using MedTagger, an NLP tool of the Apache Unstructured Information Management Architecture (UIMA) pipeline, to facilitate information extraction from clinical narratives. Radiology notes were retrieved from the Mayo Clinic electronic health records data warehouse. We developed rules for identifying each fracture type according to physicians' knowledge and experience, and refined these rules via verification with physicians. This study was approved by the institutional review board (IRB) for human subject research. RESULTS: We validated the NLP algorithm using the radiology reports of a community-based cohort at Mayo Clinic, with a gold standard constructed by medical experts. The micro-averaged sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score of the proposed NLP algorithm were 0.930, 1.0, 1.0, 0.941, and 0.961, respectively. The F1-score was 1.0 for 8 fractures and above 0.9 for a total of 17 out of 20 fractures (85%). CONCLUSIONS: The results verified the effectiveness of the proposed rule-based NLP algorithm in automatic identification of osteoporosis-related skeletal site-specific fractures from radiology reports. The NLP algorithm could be utilized to accurately identify patients with fractures and those who are also at high risk of future fractures due to osteoporosis. Appropriate care interventions for those patients, not only the most at-risk patients but also those with emerging risk, would significantly reduce future fractures.
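For reference, micro-averaging pools true/false positives and negatives across all site-specific extractors before computing the metrics; the per-site counts in this sketch are synthetic.

```python
# Micro-averaging across fracture sites: counts are pooled over all
# site-specific extractors before computing metrics. Counts are synthetic.
per_site = {                     # site: (tp, fp, fn, tn)
    "vertebral": (40, 0, 2, 500),
    "hip": (25, 0, 3, 510),
    "wrist": (30, 0, 1, 480),
}

tp = sum(c[0] for c in per_site.values())
fp = sum(c[1] for c in per_site.values())
fn = sum(c[2] for c in per_site.values())
tn = sum(c[3] for c in per_site.values())

sensitivity = tp / (tp + fn)     # micro-averaged recall
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
f1 = 2 * tp / (2 * tp + fp + fn)
print(round(sensitivity, 3), round(specificity, 3), round(ppv, 3), round(f1, 3))
```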


Subject(s)
Bone Fractures/classification, Natural Language Processing, Radiology, Aged, Algorithms, Cohort Studies, Electronic Health Records, Female, Humans, Information Storage and Retrieval
15.
BMC Med Inform Decis Mak ; 19(Suppl 4): 149, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31391041

ABSTRACT

BACKGROUND: The aging population has led to an increase in cognitive impairment (CI), resulting in significant costs to patients, their families, and society. Research in a large cohort to better understand the frequency and severity of CI is urgently needed to respond to the health needs of this population. However, little is known about temporal trends in patient health functions (i.e., activities of daily living [ADL]) and how these trends are associated with the onset of CI in elderly patients. Also, the use of the rich source of clinical free text in electronic health records (EHRs) to facilitate CI research has not been well explored. The aim of this study is to characterize and better understand early signals of elderly patients' CI by examining temporal trends in patient ADL and analyzing topics in patients' medical conditions in clinical free text using topic models. METHODS: The study cohort consists of physician-diagnosed CI patients (n = 1,435) and cognitively unimpaired (CU) patients (n = 1,435) matched by age and sex, selected from patients 65 years of age or older at the time of enrollment in the Mayo Clinic Biobank. A corpus analysis was performed to examine the basic statistics of event types and practice settings where the physician first diagnosed CI. We analyzed the distribution of ADL in three different age groups over time before the development of CI. Furthermore, we applied three different topic modeling approaches to clinical free text to examine how patients' medical conditions change over time as they approach CI diagnosis. RESULTS: The trajectories of ADL deterioration became steeper in CI patients than in CU patients approximately 1 to 1.5 years before the actual physician diagnosis of CI. The topic modeling showed that the topic terms were mostly correlated and captured the underlying semantics relevant to CI when approaching CI diagnosis. CONCLUSIONS: There are notable differences in temporal trends of basic and instrumental ADL between CI and CU patients. The trajectories of certain individual ADL, such as bathing and responsibility for one's own medication, were closely associated with CI development. The topic terms obtained by topic modeling from clinical free text have the potential to show how CI patients' conditions evolve and to reveal overlooked conditions as they approach CI diagnosis.
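As a generic illustration of topic modeling on clinical free text (the abstract does not restate which three approaches were used), a minimal LDA sketch with scikit-learn; the note snippets and topic count are placeholders.

```python
# Generic illustration of fitting a topic model (LDA) to short clinical text
# snippets; the snippets and the number of topics are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

notes = [
    "memory loss, needs reminders for medication, difficulty bathing",
    "independent in dressing and bathing, walks daily, no memory complaints",
    "forgetful, missed appointments, family manages finances",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(notes)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```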


Subject(s)
Activities of Daily Living, Cognitive Dysfunction/epidemiology, Age Factors, Aged, Aged 80 and over, Cognitive Dysfunction/complications, Cognitive Dysfunction/psychology, Cohort Studies, Electronic Health Records, Female, Humans, Male, Time Factors
16.
BMC Med Inform Decis Mak ; 19(1): 1, 2019 01 07.
Article in English | MEDLINE | ID: mdl-30616584

ABSTRACT

BACKGROUND: Automatic clinical text classification is a natural language processing (NLP) technology that unlocks information embedded in clinical narratives. Machine learning approaches have been shown to be effective for clinical text classification tasks. However, a successful machine learning model usually requires extensive human effort to create labeled training data and conduct feature engineering. In this study, we propose a clinical text classification paradigm using weak supervision and deep representation to reduce these human efforts. METHODS: We develop a rule-based NLP algorithm to automatically generate labels for the training data, and then use pre-trained word embeddings as deep representation features for training machine learning models. Since the machine learning models are trained on labels generated by the automatic NLP algorithm, this training process is called weak supervision. We evaluate the effectiveness of the paradigm on two institutional case studies at Mayo Clinic: smoking status classification and proximal femur (hip) fracture classification, and one case study using a public dataset: the i2b2 2006 smoking status classification shared task. We test four widely used machine learning models, namely Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron Neural Networks (MLPNN), and Convolutional Neural Networks (CNN), using this paradigm. Precision, recall, and F1 score are used as metrics to evaluate performance. RESULTS: CNN achieves the best performance in both institutional tasks (F1 score: 0.92 for Mayo Clinic smoking status classification and 0.97 for fracture classification). We show that word embeddings significantly outperform tf-idf and topic modeling features in the paradigm, and that CNN captures additional patterns from the weak supervision compared to the rule-based NLP algorithms. We also observe two drawbacks of the proposed paradigm: CNN is more sensitive to the size of the training data, and the paradigm might not be effective for complex multiclass classification tasks. CONCLUSION: The proposed clinical text classification paradigm could reduce the human effort of labeled training data creation and feature engineering when applying machine learning to clinical text classification by leveraging weak supervision and deep representation. The experiments validated the effectiveness of the paradigm on two institutional and one shared clinical text classification tasks.
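A minimal sketch of the paradigm's ingredients, rule-generated (weak) labels and word-embedding features feeding a classical classifier (SVM here); the random embedding table is a placeholder for real pretrained word vectors.

```python
# Weak supervision sketch: rule-generated labels plus averaged word embeddings
# as features for an SVM. The random embedding table is a placeholder for
# pretrained vectors; all notes are invented examples.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=50) for w in
             ["patient", "smokes", "daily", "denies", "tobacco", "use", "former", "smoker"]}

def embed(text: str) -> np.ndarray:
    """Average the embeddings of known words (zero vector if none match)."""
    vecs = [embedding[w] for w in text.lower().split() if w in embedding]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

def weak_label(text: str) -> int:
    """Rule-based labeler standing in for the paper's rule-based NLP algorithm."""
    return int("smokes" in text.lower() or "smoker" in text.lower())

notes = ["Patient smokes daily", "Denies tobacco use", "Former smoker", "Denies use"]
X = np.stack([embed(n) for n in notes])
y = [weak_label(n) for n in notes]

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([embed("patient smokes")]))
```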


Subject(s)
Algorithms, Electronic Health Records, Machine Learning, Natural Language Processing, Neural Networks (Computer), Datasets as Topic, Hip Fractures/classification, Humans, Smoking
17.
J Arthroplasty ; 34(10): 2216-2219, 2019 10.
Article in English | MEDLINE | ID: mdl-31416741

ABSTRACT

BACKGROUND: Manual chart review is labor-intensive and requires specialized knowledge possessed by highly trained medical professionals. The cost and infrastructure challenges required to implement this are prohibitive for most hospitals. Natural language processing (NLP) tools are distinctive in their ability to extract critical information from unstructured text in the electronic health records. As a simple proof-of-concept for the potential application of NLP technology in total hip arthroplasty (THA), we examined its ability to identify periprosthetic femur fractures (PPFFx), followed by the more complex Vancouver classification. METHODS: PPFFx were identified among all THAs performed at a single academic institution between 1998 and 2016. A randomly selected training cohort (1538 THAs with 89 PPFFx cases) was used to develop the prototype NLP algorithm, and an additional randomly selected cohort (2982 THAs with 84 PPFFx cases) was used to further validate the algorithm. Keywords to identify, and subsequently classify, Vancouver-type PPFFx about THA were defined. The gold standard was confirmed by experienced orthopedic surgeons using chart and radiographic review. The algorithm was applied to consult and operative notes to evaluate language used by surgeons as a means to predict the correct pathology in the absence of a listed, precise diagnosis. Given the variability inherent to fracture descriptions by different surgeons, an iterative process was used to improve the algorithm during the training phase following error identification. Validation statistics were calculated using manual chart review as the gold standard. RESULTS: In distinguishing PPFFx, the NLP algorithm demonstrated 100% sensitivity and 99.8% specificity. Among 84 PPFFx test cases, the algorithm demonstrated 78.6% sensitivity and 94.8% specificity in determining the correct Vancouver classification. CONCLUSION: NLP-enabled algorithms are a promising alternative to manual chart review for identifying THA outcomes. NLP algorithms applied to surgeon notes demonstrated excellent accuracy in delineating PPFFx, but accuracy was lower for Vancouver classification subtypes. This proof-of-concept study supports the use of NLP technology to extract THA-specific data elements from the unstructured text in electronic health records in an expeditious and cost-effective manner. LEVEL OF EVIDENCE: Level III.


Subject(s)
Electronic Health Records, Femoral Fractures/diagnosis, Natural Language Processing, Periprosthetic Fractures/diagnosis, Aged, Aged 80 and over, Algorithms, Cohort Studies, Female, Humans, Language, Male, Orthopedics, Proof of Concept Study, Sensitivity and Specificity, Surgeons
18.
J Biomed Inform ; 83: 167-177, 2018 07.
Article in English | MEDLINE | ID: mdl-29883623

ABSTRACT

Sequences of events have often been modeled with computational techniques, but typical preprocessing steps and problem settings do not explicitly address the ramifications of timestamped events. Clinical data, such as is found in electronic health records (EHRs), typically comes with timestamp information. In this work, we define event sequences and their properties: synchronicity, evenness, and co-cardinality; we then show how asynchronous, uneven, and multi-cardinal problem settings can support explicit accountings of relative time. Our evaluation uses the temporally sensitive clinical use case of pediatric asthma, which is a chronic disease with symptoms (and lack thereof) evolving over time. We show several approaches to explicitly incorporating relative time into a recurrent neural network (RNN) model that improve the overall classification of patients into those with no asthma, those with persistent asthma, those in long-term remission, and those who have experienced relapse. We also compare and contrast these results with those in an inpatient intensive care setting.
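One simple way to make relative time explicit, in the spirit of the approaches described above, is to append the gap since the previous event to each event's feature vector before the RNN; the sketch below illustrates that idea and is not the authors' exact architecture.

```python
# Appending relative-time deltas to event features before an LSTM; the feature
# sizes, log compression, and four-class head are illustrative assumptions.
import torch
import torch.nn as nn

events = torch.randn(1, 6, 8)                               # (batch, events, event features)
timestamps = torch.tensor([[0., 2., 3., 30., 31., 400.]])   # days since the first event

deltas = torch.diff(timestamps, prepend=timestamps[:, :1])  # time since the previous event
deltas = torch.log1p(deltas).unsqueeze(-1)                  # compress very long gaps

lstm = nn.LSTM(input_size=8 + 1, hidden_size=16, batch_first=True)
classifier = nn.Linear(16, 4)   # e.g. no asthma / persistent / remission / relapse

_, (h_n, _) = lstm(torch.cat([events, deltas], dim=-1))
logits = classifier(h_n[-1])    # (batch, 4)
```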


Subject(s)
Asthma/classification, Electronic Health Records, Neural Networks (Computer), Child, Preschool Child, Computer Simulation, Humans, Infant, Intensive Care Units/statistics & numerical data, Recurrence
19.
J Biomed Inform ; 77: 34-49, 2018 01.
Article in English | MEDLINE | ID: mdl-29162496

ABSTRACT

BACKGROUND: With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component that facilitates the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES: In this literature review, we present a review of recently published research on clinical information extraction (IE) applications. METHODS: A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS: A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies and clinical workflow optimization. CONCLUSIONS: Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge it.


Subject(s)
Electronic Health Records, Information Storage and Retrieval/methods, Medical Informatics/trends, Humans, Meaningful Use, Natural Language Processing, Research Design
20.
Am J Respir Crit Care Med ; 196(4): 430-437, 2017 08 15.
Article in English | MEDLINE | ID: mdl-28375665

ABSTRACT

RATIONALE: Difficulty of asthma ascertainment and its associated methodologic heterogeneity have created significant barriers to asthma care and research. OBJECTIVES: We evaluated the validity of an existing natural language processing (NLP) algorithm for asthma criteria to enable an automated chart review using electronic medical records (EMRs). METHODS: The study was designed as a retrospective birth cohort study using a random sample of 500 subjects from the 1997-2007 Mayo Birth Cohort who were born at Mayo Clinic and enrolled in primary pediatric care at Mayo Clinic Rochester. Performance of NLP-based asthma ascertainment using predetermined asthma criteria was assessed by determining both criterion validity (chart review of EMRs by an abstractor as a gold standard) and construct validity (association with known risk factors for asthma, such as allergic rhinitis). MEASUREMENTS AND MAIN RESULTS: After excluding three subjects whose respiratory symptoms could be attributed to other conditions (e.g., tracheomalacia), among the remaining 497 eligible subjects, 51% were male, 77% were white, and the median age at the last follow-up date was 11.5 years. The asthma prevalence was 31% in the study cohort. Sensitivity, specificity, positive predictive value, and negative predictive value for the NLP algorithm in predicting asthma status were 97%, 95%, 90%, and 98%, respectively. The risk factors for asthma (e.g., allergic rhinitis) that were identified by either NLP or the abstractor were the same. CONCLUSIONS: Asthma ascertainment through NLP should be considered in the era of EMRs because it can enable large-scale clinical studies in a more time-efficient manner and improve the recognition and care of childhood asthma in practice.
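For reference, the reported PPV and NPV follow from the sensitivity, specificity, and 31% prevalence via Bayes' rule; a short arithmetic check:

```python
# Checking that the reported PPV and NPV are consistent with the sensitivity,
# specificity, and 31% prevalence via Bayes' rule.
sens, spec, prev = 0.97, 0.95, 0.31

ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
npv = (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)

print(f"PPV ≈ {ppv:.2f}, NPV ≈ {npv:.2f}")  # close to the reported 90% and 98%
```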


Subject(s)
Asthma/epidemiology, Electronic Health Records/statistics & numerical data, Natural Language Processing, Adolescent, Child, Preschool Child, Cohort Studies, Female, Humans, Male, Minnesota/epidemiology, Prevalence, Reproducibility of Results, Retrospective Studies, Risk Factors, Sensitivity and Specificity