1.
Eur Radiol ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38842692

ABSTRACT

OBJECTIVES: To develop an automated pipeline for extracting prostate cancer-related information from clinical notes. MATERIALS AND METHODS: This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compilation of multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). The patient-level classification performance was evaluated on the patient-level test set created by radiologists by reviewing the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists. RESULTS: Sentence-level AUCs were ≥ 0.94. The pipeline showed higher patient-level sensitivity for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%, p < 0.001), but lower accuracy in classifying pre-MRI prostate pathology (92.5% vs. 95.9%, p = 0.002) and treatment history of prostate cancer (95.5% vs. 97.7%, p = 0.03) than radiologists, respectively. CONCLUSION: The proposed pipeline showed promising performance, especially for extracting cancer risk factors from patient's clinical notes. 
CLINICAL RELEVANCE STATEMENT: The natural language processing pipeline showed a higher sensitivity for extracting prostate cancer risk factors than radiologists and may help efficiently gather relevant text information when interpreting prostate MRI. KEY POINTS: When interpreting prostate MRI, it is necessary to extract prostate cancer-related information from clinical notes. This pipeline extracted the presence of prostate cancer risk factors with higher sensitivity than radiologists. Natural language processing may help radiologists efficiently gather relevant prostate cancer-related text information.
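The first step described in the abstract above, keeping only sentences that contain pre-defined keywords, might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' code: the keyword list, the naive sentence splitter, and the function names are all hypothetical, since the abstract does not give them.

```python
import re

# Hypothetical keyword list; the abstract does not list the actual keywords.
KEYWORDS = ["family history", "digital rectal exam", "biopsy", "prostatectomy"]


def split_sentences(note):
    """Naively split a note into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def extract_keyword_sentences(note, keywords=KEYWORDS):
    """Return the sentences that mention at least one keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [s for s in split_sentences(note) if pattern.search(s)]


# Made-up example note, for illustration only.
note = "Patient seen today. Family history of prostate cancer in father. No complaints."
matches = extract_keyword_sentences(note)
print(matches)
```

In the study, sentences selected this way were then passed to fine-tuned BERT-based classifiers, and the sentence-level outputs were combined per patient with tree-based models; neither of those steps is sketched here.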

2.
J Biomed Inform ; 152: 104623, 2024 04.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and open-source Flower and PyTorch library for federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across four healthcare sites. 
For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using a FedFSA framework performed higher than non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework in functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.


Subject(s)
Activities of Daily Living , Functional Status , Humans , Aged , Learning , Information Storage and Retrieval , Natural Language Processing
3.
Endocr Pract ; 30(1): 31-35, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37805101

ABSTRACT

OBJECTIVE: Thyroid palpation is a common clinical practice to detect thyroid abnormalities. However, its accuracy and potential for additional findings remain unclear. This study aimed to assess the diagnostic accuracy of physical exams in detecting thyroid nodules. METHODS: A retrospective observational study was conducted on a random sample of adult patients who underwent their first-time thyroid ultrasound between January 2015 and September 2017, following a documented thyroid physical exam. The study assessed the performance of thyroid palpation in detecting 1 or multiple thyroid nodules, as well as the proportion of additional findings on ultrasounds due to false positive thyroid palpation. RESULTS: We included 327 patients, mostly female (65.1%), white (84.1%), and treated in a primary care setting (54.4%) with a mean age of 50.8 years (SD 16.9). For solitary thyroid nodules, the physical exam had a sensitivity of 20.3%, specificity of 79.1%, an accuracy of 68.5%, negative predictive value of 81.8%, and positive predictive value of 17.6%. For detecting a multinodular goiter, physical exams demonstrated a sensitivity of 10.8%, specificity of 96.5%, accuracy of 55.4%, negative predictive value of 53.9%, and positive predictive value of 73.9%. Among 154 cases with palpable nodules, 60% had additional nodules found in subsequent thyroid ultrasound. CONCLUSION: Thyroid physical exam has limited diagnostic performance and leads to additional findings when followed by a thyroid ultrasound. Future efforts should be directed at improving the accuracy of thyroid physical exams or re-evaluating its routine use.
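For readers less familiar with the measures reported above, the sketch below shows how sensitivity, specificity, accuracy, and the predictive values are derived from a 2x2 table. The function name and the counts are hypothetical and chosen only for illustration; the abstract does not report the underlying counts.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic accuracy measures from 2x2 table counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all with the condition
        "specificity": tn / (tn + fp),  # true negatives among all without it
        "accuracy": (tp + tn) / total,  # all correct calls among all cases
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only; NOT taken from the study.
m = diagnostic_metrics(tp=20, fp=10, fn=30, tn=140)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

A test can have high specificity and still a low positive predictive value when the condition is uncommon, which is the pattern the abstract reports for solitary nodules.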


Subject(s)
Goiter , Thyroid Neoplasms , Thyroid Nodule , Adult , Female , Humans , Male , Middle Aged , Palpation , Predictive Value of Tests , Retrospective Studies , Sensitivity and Specificity , Thyroid Neoplasms/diagnosis , Thyroid Nodule/diagnostic imaging , Ultrasonography , Aged
4.
Endocr Pract ; 29(12): 948-954, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37722595

ABSTRACT

OBJECTIVE: Excessive use of thyroid ultrasound (TUS) contributes to the overdiagnosis of thyroid nodules and thyroid cancer. In this study, we evaluated drivers of and clinical trajectories following TUS orders. METHODS: We conducted a retrospective review of 500 adult patients who underwent an initial TUS between 2015 and 2017 at Mayo Clinic in Rochester, MN. A framework was employed to classify the indication for TUS, and it was characterized as inappropriate when ordered without a guideline-based indication. Medical records were reviewed for up to 12 months following the TUS, and clinical outcomes were evaluated. RESULTS: The mean age (SD) was 53.6 (16.6) years; 63.8% of patients were female and 86.6% were white. TUS orders were triggered by incidental findings on unrelated imaging (31.6%), thyroid symptoms (20.4%), thyroid abnormalities on routine physical examination (17.2%), and thyroid dysfunction workup (11.8%). In females and males, the most common reason was incidental findings on imaging (female, 91/319, 28.5% and male, 67/181, 37.0%). In primary care practice, TUS orders were mostly triggered by symptoms (71/218, 32.5%), while thyroid dysfunction workup was the primary reason in endocrinology (28/100, 28.0%). We classified 11.2% (56/500) of TUS orders as likely to have been ordered inappropriately based on current guidelines. Finally, 119 patients (119/500, 23.8%) had a thyroid biopsy, of whom 14 (14/119, 11.8%) had thyroid cancer. CONCLUSIONS: Incidental findings on imaging, symptoms, and routine physical exam findings in asymptomatic patients were the most prevalent drivers of TUS. Furthermore, 1 in 10 TUS were likely inappropriately ordered based on current practice guidelines.


Subject(s)
Thyroid Neoplasms , Thyroid Nodule , Adult , Humans , Male , Female , Middle Aged , Retrospective Studies , Thyroid Nodule/pathology , Thyroid Neoplasms/pathology , Biopsy , Ultrasonography
5.
J Med Internet Res ; 24(1): e17273, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35014964

ABSTRACT

BACKGROUND: Patient-clinician secure messaging is an important function in patient portals and enables patients and clinicians to communicate on a wide spectrum of issues in a timely manner. With its growing adoption and patient engagement, it is time to comprehensively study the secure messages and user behaviors in order to improve patient-centered care. OBJECTIVE: The aim of this paper was to analyze the secure messages sent by patients and clinicians in a large multispecialty health system at Mayo Clinic, Rochester. METHODS: We performed message-based, sender-based, and thread-based analyses of more than 5 million secure messages between 2010 and 2017. We summarized the message volumes, patient and clinician population sizes, message counts per patient or clinician, as well as the trends of message volumes and user counts over the years. In addition, we calculated the time distribution of clinician-sent messages to understand their workloads at different times of a day. We also analyzed the time delay in clinician responses to patient messages to assess their communication efficiency and the back-and-forth rounds to estimate the communication complexity. RESULTS: During 2010-2017, the patient portal at Mayo Clinic, Rochester experienced a significant growth in terms of the count of patient users and the total number of secure messages sent by patients and clinicians. Three clinician categories, namely "physician-primary care," "registered nurse-specialty," and "physician-specialty," bore the majority of message volume increase. The patient portal also demonstrated growing trends in message counts per patient and clinician. The "nurse practitioner or physician assistant-primary care" and "physician-primary care" categories had the heaviest per-clinician workload each year. Most messages by the clinicians were sent from 7 AM to 5 PM during a day. 
Yet, between 5 PM and 7 PM, the physicians sent 7.0% (95,785/1,377,006) of their daily messages, and the nurse practitioner or physician assistant sent 5.4% (22,121/408,526) of their daily messages. The clinicians replied to 72.2% (1,272,069/1,761,739) patient messages within 1 day and 90.6% (1,595,702/1,761,739) within 3 days. In 95.1% (1,499,316/1,576,205) of the message threads, the patients communicated with their clinicians back and forth for no more than 4 rounds. CONCLUSIONS: Our study found steady increases in patient adoption of the secure messaging system and the average workload per clinician over 8 years. However, most clinicians responded timely to meet the patients' needs. Our study also revealed differential patient-clinician communication patterns across different practice roles and care settings. These findings suggest opportunities for care teams to optimize messaging tasks and to balance the workload for optimal efficiency.
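The proportions quoted above can be re-derived from the counts given in the abstract. The short check below is a sketch for the reader's convenience; the labels and variable names are ours, while the numerators, denominators, and reported percentages are copied from the text.

```python
# (label, numerator, denominator, percentage reported in the abstract)
reported = [
    ("physician messages sent 5-7 PM", 95_785, 1_377_006, 7.0),
    ("NP/PA messages sent 5-7 PM", 22_121, 408_526, 5.4),
    ("patient messages answered within 1 day", 1_272_069, 1_761_739, 72.2),
    ("patient messages answered within 3 days", 1_595_702, 1_761_739, 90.6),
    ("threads with at most 4 rounds", 1_499_316, 1_576_205, 95.1),
]

for label, numerator, denominator, pct in reported:
    computed = round(100 * numerator / denominator, 1)
    print(f"{label}: computed {computed}%, reported {pct}%")
```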


Subject(s)
Medicine , Patient Portals , Communication , Humans , Patient Participation , Retrospective Studies
6.
J Biomed Inform ; 113: 103660, 2021 01.
Article in English | MEDLINE | ID: mdl-33321199

ABSTRACT

Coronavirus Disease 2019 has emerged as a significant global concern, triggering harsh public health restrictions in a successful bid to curb its exponential growth. As discussion shifts towards relaxation of these restrictions, there is significant concern of second-wave resurgence. The key to managing these outbreaks is early detection and intervention, and yet there is a significant lag time associated with usage of laboratory confirmed cases for surveillance purposes. To address this, syndromic surveillance can be considered to provide a timelier alternative for first-line screening. Existing syndromic surveillance solutions are however typically focused around a known disease and have limited capability to distinguish between outbreaks of individual diseases sharing similar syndromes. This poses a challenge for surveillance of COVID-19 as its active periods tend to overlap temporally with other influenza-like illnesses. In this study we explore performing sentinel syndromic surveillance for COVID-19 and other influenza-like illnesses using a deep learning-based approach. Our methods are based on aberration detection utilizing autoencoders that leverages symptom prevalence distributions to distinguish outbreaks of two ongoing diseases that share similar syndromes, even if they occur concurrently. We first demonstrate that this approach works for detection of outbreaks of influenza, which has known temporal boundaries. We then demonstrate that the autoencoder can be trained to not alert on known and well-managed influenza-like illnesses such as the common cold and influenza. Finally, we applied our approach to 2019-2020 data in the context of a COVID-19 syndromic surveillance task to demonstrate how implementation of such a system could have provided early warning of an outbreak of a novel influenza-like illness that did not match the symptom prevalence profile of influenza and other known influenza-like illnesses.


Subject(s)
COVID-19/epidemiology , Influenza, Human/epidemiology , Sentinel Surveillance , COVID-19/virology , Deep Learning , Disease Outbreaks , Humans , SARS-CoV-2/isolation & purification
7.
J Biomed Inform ; 58 Suppl: S164-S170, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26279500

ABSTRACT

In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics have opened up new possibilities of using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.
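The reported F1 score is the harmonic mean of the reported precision and recall, which can be checked directly. The snippet below is a small sketch; the function name is ours, and the two input values are the ones quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(precision=0.8972, recall=0.9409)
print(round(f1, 4))
```

Rounded to four decimals, the result agrees with the overall F1 of 0.9185 given in the abstract.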


Subject(s)
Cardiovascular Diseases/epidemiology , Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Aged , California/epidemiology , Cardiovascular Diseases/diagnosis , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , Pattern Recognition, Automated/methods , Risk Assessment/methods , Vocabulary, Controlled
8.
BMC Med Inform Decis Mak ; 15 Suppl 1: S2, 2015.
Article in English | MEDLINE | ID: mdl-26045009

ABSTRACT

BACKGROUND: Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain. METHODS: In this study, we investigated the performance of three state-of-the-art parsers: the Stanford parser, the Bikel parser, and the Charniak parser, using the following two datasets: (1) A Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank based guideline; and (2) the MiPACQ Treebank, which is developed based on pathology notes and clinical notes, containing 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using the 10-fold cross validation method. Finally, we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank. RESULTS: Our results showed that the original parsers achieved lower performance in clinical text (Bracketing F-measure in the range of 66.6%-70.3%) compared to general English text. After retraining on the clinical Treebank, all parsers achieved better performance, with the best performance from the Stanford parser that reached the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross validation. 
When the combined clinical Treebanks and Penn Treebank were used, of the three parsers, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus. CONCLUSIONS: Our study demonstrates that re-training using clinical Treebanks is critical for improving general English parsers' performance on clinical text, and combining clinical and open domain corpora might achieve optimal performance for parsing clinical text.


Subject(s)
Linguistics/methods , Medical Informatics/methods , Natural Language Processing , Humans
9.
J Am Med Inform Assoc ; 31(8): 1714-1724, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38934289

ABSTRACT

OBJECTIVES: The surge in patient portal messages (PPMs) with increasing needs and workloads for efficient PPM triage in healthcare settings has spurred the exploration of AI-driven solutions to streamline the healthcare workflow processes, ensuring timely responses to patients to satisfy their healthcare needs. However, there has been less focus on isolating and understanding patient primary concerns in PPMs-a practice which holds the potential to yield more nuanced insights and enhances the quality of healthcare delivery and patient-centered care. MATERIALS AND METHODS: We propose a fusion framework to leverage pretrained language models (LMs) with different language advantages via a Convolution Neural Network for precise identification of patient primary concerns via multi-class classification. We examined 3 traditional machine learning models, 9 BERT-based language models, 6 fusion models, and 2 ensemble models. RESULTS: The outcomes of our experimentation underscore the superior performance achieved by BERT-based models in comparison to traditional machine learning models. Remarkably, our fusion model emerges as the top-performing solution, delivering a notably improved accuracy score of 77.67 ± 2.74% and an F1 score of 74.37 ± 3.70% in macro-average. DISCUSSION: This study highlights the feasibility and effectiveness of multi-class classification for patient primary concern detection and the proposed fusion framework for enhancing primary concern detection. CONCLUSIONS: The use of multi-class classification enhanced by a fusion of multiple pretrained LMs not only improves the accuracy and efficiency of patient primary concern identification in PPMs but also aids in managing the rising volume of PPMs in healthcare, ensuring critical patient communications are addressed promptly and accurately.


Subject(s)
Machine Learning , Patient Portals , Humans , Neural Networks, Computer , Natural Language Processing
10.
medRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826441

ABSTRACT

The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (SDoH) information is often buried within clinical narrative text in electronic health records (EHRs), necessitating natural language processing (NLP) methods to automatically extract these details. Most current NLP efforts for SDoH extraction have been limited, investigating limited types of SDoH elements, deriving data from a single institution, focusing on specific patient cohorts or note types, with reduced focus on generalizability. This study aims to address these issues by creating cross-institutional corpora spanning different note types and healthcare systems, and developing and evaluating the generalizability of classification models, including novel large language models (LLMs), for detecting SDoH factors from diverse types of notes from four institutions: Harris County Psychiatric Center, University of Texas Physician Practice, Beth Israel Deaconess Medical Center, and Mayo Clinic. Four corpora of deidentified clinical notes were annotated with 21 SDoH factors at two levels: level 1 with SDoH factor types only and level 2 with SDoH factors along with associated values. Three traditional classification algorithms (XGBoost, TextCNN, Sentence BERT) and an instruction tuned LLM-based approach (LLaMA) were developed to identify multiple SDoH factors. Substantial variation was noted in SDoH documentation practices and label distributions based on patient cohorts, note types, and hospitals. The LLM achieved top performance with micro-averaged F1 scores over 0.9 on level 1 annotated corpora and an F1 over 0.84 on level 2 annotated corpora. 
While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. To foster collaboration, access to partial annotated corpora and models trained by merging all annotated datasets will be made available on the PhysioNet repository.

11.
Mayo Clin Proc Digit Health ; 2(1): 67-74, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38501072

ABSTRACT

Objective: To address thyroid cancer overdiagnosis, we aim to develop a natural language processing (NLP) algorithm to determine the appropriateness of thyroid ultrasounds (TUS). Patients and Methods: Between 2017 and 2021, we identified 18,000 TUS patients at Mayo Clinic and selected 628 for chart review to create a ground truth dataset based on consensus. We developed a rule-based NLP pipeline to identify TUS as appropriate TUS (aTUS) or inappropriate TUS (iTUS) using patients' clinical notes and additional meta information. In addition, we designed an abbreviated NLP pipeline (aNLP) solely focusing on labels from TUS order requisitions to facilitate deployment at other health care systems. Our dataset was split into a training set of 468 (75%) and a test set of 160 (25%), using the former for rule development and the latter for performance evaluation. Results: There were 449 (95.9%) patients identified as aTUS and 19 (4.06%) as iTUS in the training set; 155 (96.88%) patients were identified as aTUS and 5 (3.12%) as iTUS in the test set. In the training set, the pipeline achieved a sensitivity of 0.99, specificity of 0.95, and positive predictive value of 1.0 for detecting aTUS. The testing cohort revealed a sensitivity of 0.96, specificity of 0.80, and positive predictive value of 0.99. Similar performance metrics were observed in the aNLP pipeline. Conclusion: The NLP models can accurately identify the appropriateness of a thyroid ultrasound from clinical documentation and order requisition information, a critical initial step toward evaluating the drivers and outcomes of TUS use and subsequent thyroid cancer overdiagnosis.

12.
Mayo Clin Proc Digit Health ; 2(2): 270-279, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38938930

ABSTRACT

This study aimed to review the application of natural language processing (NLP) in thyroid-related conditions and to summarize current challenges and potential future directions. We performed a systematic search of databases for studies describing NLP applications in thyroid conditions published in English between January 1, 2012 and November 4, 2022. In addition, we used a snowballing technique to identify studies missed in the initial search or published after our search timeline until April 1, 2023. For included studies, we extracted the NLP method (eg, rule-based, machine learning, deep learning, or hybrid), NLP application (eg, identification, classification, and automation), thyroid condition (eg, thyroid cancer, thyroid nodule, and functional or autoimmune disease), data source (eg, electronic health records, health forums, medical literature databases, or genomic databases), performance metrics, and stages of development. We identified 24 eligible NLP studies focusing on thyroid-related conditions. Deep learning-based methods were the most common (38%), followed by rule-based (21%), and traditional machine learning (21%) methods. Thyroid nodules (54%) and thyroid cancer (29%) were the primary conditions under investigation. Electronic health records were the dominant data source (17/24, 71%), with imaging reports being the most frequently used (15/17, 88%). There is increasing interest in NLP applications for thyroid-related studies, mostly addressing thyroid nodules and using deep learning-based methodologies with limited external validation. However, none of the reviewed NLP applications have reached clinical practice. Several limitations, including inconsistent clinical documentation and model portability, need to be addressed to promote the evaluation and implementation of NLP applications to support patient care in thyroidology.

13.
Front Digit Health ; 5: 958338, 2023.
Article in English | MEDLINE | ID: mdl-37168528

ABSTRACT

Chronic pain (CP) lasts for more than 3 months, causing prolonged physical and mental burdens to patients. According to the US Centers for Disease Control and Prevention, CP contributes to more than 500 billion US dollars yearly in direct medical cost plus the associated productivity loss. CP is complex in etiology and can occur anywhere in the body, making it difficult to treat and manage. There is a pressing need for research to better summarize the common health issues faced by consumers living with CP and their experience in accessing over-the-counter analgesics or therapeutic devices. Modern online shopping platforms offer a broad array of opportunities for the secondary use of consumer-generated data in CP research. In this study, we performed an exploratory data mining study that analyzed CP-related Amazon product reviews. Our descriptive analyses characterized the review language, the reviewed products, the representative topics, and the network of comorbidities mentioned in the reviews. The results indicated that most of the reviews were concise yet rich in terms of representing the various health issues faced by people with CP. Despite the noise in the online reviews, we see potential in leveraging the data to capture certain consumer-reported outcomes or to identify shortcomings of the available products.

14.
JMIR AI ; 2: e41818, 2023 Jun 20.
Article in English | MEDLINE | ID: mdl-38875580

ABSTRACT

BACKGROUND: Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can yield multiple answers to a single question and multiple focus points in 1 question, which are lacking in existing data sets for the development of artificial intelligence solutions. OBJECTIVE: This study aimed to create a data set for developing and evaluating clinical EQA systems that can handle natural multianswer and multifocus questions. METHODS: We leveraged the annotated relations from the 2018 National NLP Clinical Challenges corpus to generate an EQA data set. Specifically, the 1-to-N, M-to-1, and M-to-N drug-reason relations were included to form the multianswer and multifocus question-answering entries, which represent more complex and natural challenges in addition to the basic 1-drug-1-reason cases. A baseline solution was developed and tested on the data set. RESULTS: The derived RxWhyQA data set contains 96,939 QA entries. Among the answerable questions, 25% of them require multiple answers, and 2% of them ask about multiple drugs within 1 question. Frequent cues were observed around the answers in the text, and 90% of the drug and reason terms occurred within the same or an adjacent sentence. The baseline EQA solution achieved a best F1-score of 0.72 on the entire data set, and on specific subsets, it was 0.93 for the unanswerable questions, 0.48 for single-drug questions versus 0.60 for multidrug questions, and 0.54 for the single-answer questions versus 0.43 for multianswer questions. CONCLUSIONS: The RxWhyQA data set can be used to train and evaluate systems that need to handle multianswer and multifocus questions. Specifically, multianswer EQA appears to be challenging and therefore warrants more investment in research. 
We created and shared a clinical EQA data set with multianswer and multifocus questions that would channel future research efforts toward more realistic scenarios.

15.
Thyroid ; 33(8): 903-917, 2023 08.
Article in English | MEDLINE | ID: mdl-37279303

ABSTRACT

Background: The use of artificial intelligence (AI) in health care has grown exponentially with the promise of facilitating biomedical research and enhancing diagnosis, treatment, monitoring, disease prevention, and health care delivery. We aim to examine the current state, limitations, and future directions of AI in thyroidology. Summary: AI has been explored in thyroidology since the 1990s, and currently, there is an increasing interest in applying AI to improve the care of patients with thyroid nodules (TNODs), thyroid cancer, and functional or autoimmune thyroid disease. These applications aim to automate processes, improve the accuracy and consistency of diagnosis, personalize treatment, decrease the burden for health care professionals, improve access to specialized care in areas lacking expertise, deepen the understanding of subtle pathophysiologic patterns, and accelerate the learning curve of less experienced clinicians. There are promising results for many of these applications. Yet, most are in the validation or early clinical evaluation stages. Only a few are currently adopted for risk stratification of TNODs by ultrasound and determination of the malignant nature of indeterminate TNODs by molecular testing. Challenges of the currently available AI applications include the lack of prospective and multicenter validations and utility studies, small and low diversity of training data sets, differences in data sources, lack of explainability, unclear clinical impact, inadequate stakeholder engagement, and inability to use outside of the research setting, which might limit the value of their future adoption. Conclusions: AI has the potential to improve many aspects of thyroidology; however, addressing the limitations affecting the suitability of AI interventions in thyroidology is a prerequisite to ensure that AI provides added value for patients with thyroid disease.


Subject(s)
Hashimoto Disease , Thyroid Nodule , Humans , Artificial Intelligence , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/therapy , Ultrasonography , Multicenter Studies as Topic
16.
PLoS One ; 18(3): e0283800, 2023.
Article in English | MEDLINE | ID: mdl-37000801

ABSTRACT

BACKGROUND: The incorporation of information from clinical narratives is critical for computational phenotyping. The accurate interpretation of clinical terms highly depends on their associated context, especially the corresponding clinical section information. However, the heterogeneity across different Electronic Health Record (EHR) systems poses challenges in utilizing the section information. OBJECTIVES: Leveraging the eMERGE heart failure (HF) phenotyping algorithm, we assessed the heterogeneity quantitatively by comparing the performance of machine learning (ML) classifiers that map clinical sections containing HF-relevant terms across different EHR systems to standard sections in Health Level 7 (HL7) Clinical Document Architecture (CDA). METHODS: We experimented with both random forest models with sentence-embedding features and bidirectional encoder representations from transformers (BERT) models. We trained the ML models using an automatically labeled corpus from an EHR system that adopted the HL7 CDA standard. We assessed the performance using a blind test set (n = 300) from the same EHR system and a gold standard (n = 900) manually annotated from three other EHR systems. RESULTS: The F-measure of those ML models varied widely (0.00-0.91%), indicating that ML models with a single set of tuning parameters were insufficient to capture sections across different EHR systems. The error analysis indicates that sections do not always comply with the corresponding standardized sections, leading to low performance. CONCLUSIONS: We presented the potential use of ML techniques to map the sections containing HF-relevant terms in multiple EHR systems to standard sections. However, the findings suggested that the quality and heterogeneity of section structure across different EHRs affect applications due to the poor adoption of documentation standards.


Subject(s)
Electronic Health Records , Heart Failure , Humans , Software , Algorithms , Machine Learning
17.
Stud Health Technol Inform ; 290: 794-798, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673127

ABSTRACT

Patient portals have been widely used by patients to enable timely communications with their providers via secure messaging for various issues including transportation barriers. The large volume of portal messages offers an invaluable opportunity for studying transportation barriers reported by patients. In this work, we explored the feasibility of cutting-edge deep learning techniques for identifying transportation issues mentioned in patient portal messages with deep semantic embeddings. The successful creation of an annotated corpus and identification of 7 transportation issues showed the feasibility of this strategy. The developed annotated corpus could aid in developing an artificial intelligence tool to automatically identify transportation issues from millions of patient portal messages. The identified specific transportation issues and the analysis of patient demographics could shed light on how to reduce transportation gaps for patients.


Subject(s)
Patient Portals , Artificial Intelligence , Cluster Analysis , Communication , Humans , Semantics
18.
Stud Health Technol Inform ; 290: 173-177, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672994

ABSTRACT

Reproducibility is an important quality criterion for the secondary use of electronic health records (EHRs). However, multiple barriers to reproducibility are embedded in the heterogeneous EHR environment. These barriers include complex processes for collecting and organizing EHR data and dynamic multi-level interactions occurring during information use (e.g., inter-personal, inter-system, and cross-institutional). To ensure reproducible use of EHRs, we investigated four information quality (IQ) dimensions and examined the implications for reproducibility based on a real-world EHR study. Four types of IQ measurements suggested that barriers to reproducibility occurred at all stages of secondary use of EHR data. We discussed our recommendations and emphasized the importance of promoting transparent, high-throughput, and accessible data infrastructures and implementation best practices (e.g., data quality assessment, reporting standards).


Subject(s)
Electronic Health Records , Reproducibility of Results
19.
JMIR Hum Factors ; 9(2): e35187, 2022 May 05.
Article in English | MEDLINE | ID: mdl-35171108

ABSTRACT

BACKGROUND: During the COVID-19 pandemic, patient portals and their message platforms allowed remote access to health care. Utilization patterns in patient messaging during the COVID-19 crisis have not been studied thoroughly. In this work, we propose characterizing patients and their use of asynchronous virtual care for COVID-19 via a retrospective analysis of patient portal messages. OBJECTIVE: This study aimed to perform a retrospective analysis of portal messages to probe asynchronous patient responses to the COVID-19 crisis. METHODS: We collected over 2 million patient-generated messages (PGMs) at Mayo Clinic during February 1 to August 31, 2020. We analyzed descriptive statistics on PGMs related to COVID-19 and incorporated patients' sociodemographic factors into the analysis. We analyzed the PGMs on COVID-19 in terms of COVID-19-related care (eg, COVID-19 symptom self-assessment and COVID-19 tests and results) and other health issues (eg, appointment cancellation, anxiety, and depression). RESULTS: The majority of PGMs on COVID-19 pertained to COVID-19 symptom self-assessment (42.50%) and COVID-19 tests and results (30.84%). The PGMs related to COVID-19 symptom self-assessment and COVID-19 test results had dynamic patterns and peaks similar to the newly confirmed cases in the United States and in Minnesota. The trend of PGMs related to COVID-19 care plans paralleled trends in newly hospitalized cases and deaths. After an initial peak in March, the PGMs on issues such as appointment cancellations and anxiety regarding COVID-19 displayed a declining trend. The majority of message senders were 30-64 years old, married, female, White, or urban residents. This majority was an even higher proportion among patients who sent portal messages on COVID-19. CONCLUSIONS: During the COVID-19 pandemic, patients increased portal messaging utilization to address health care issues about COVID-19 (in particular, symptom self-assessment and tests and results). Trends in message usage closely followed national trends in new cases and hospitalizations. There is a wide disparity for minority and rural populations in the use of PGMs for addressing the COVID-19 crisis.

20.
Front Digit Health ; 4: 958539, 2022.
Article in English | MEDLINE | ID: mdl-36238199

ABSTRACT

The secondary use of electronic health records (EHRs) faces challenges in the form of varying data quality-related issues. To address that, we retrospectively assessed the quality of functional status documentation in the EHRs of persons participating in the Mayo Clinic Study of Aging (MCSA). We used a convergent parallel design to collect quantitative and qualitative data and independently analyzed the findings. We discovered a heterogeneous documentation process, where the care practice teams, institutions, and EHR systems all play an important role in how text data is documented and organized. Four prevalent instrument-assisted documentation (iDoc) expressions were identified based on three distinct instruments: Epic smart form, questionnaire, and occupational therapy and physical therapy templates. We found strong differences in the usage, information quality (intrinsic and contextual), and naturality of language among different types of iDoc expressions. These variations can be caused by different source instruments, information providers, practice settings, care events, and institutions. In addition, iDoc expressions are context specific and thus should not be viewed and processed uniformly. We recommend conducting data quality assessment of unstructured EHR text prior to using the information.
