ABSTRACT
From the start of the coronavirus disease 2019 (COVID-19) pandemic, researchers have looked to electronic health record (EHR) data as a way to study possible risk factors and outcomes. To ensure the validity and accuracy of research using these data, investigators need to be confident that the phenotypes they construct are reliable and accurate, reflecting the healthcare settings from which they are ascertained. We developed a COVID-19 registry at a single academic medical center and used data from March 1 to June 5, 2020 to assess differences in population-level characteristics in pandemic and non-pandemic years respectively. Median EHR length, previously shown to impact phenotype performance in type 2 diabetes, was significantly shorter in the SARS-CoV-2 positive group relative to a 2019 influenza tested group (median 3.1 years vs 8.7; Wilcoxon rank sum P = 1.3e-52). Using three phenotyping methods of increasing complexity (billing codes alone and domain-specific algorithms provided by an EHR vendor and clinical experts), common medical comorbidities were abstracted from COVID-19 EHRs, defined by the presence of a positive laboratory test (positive predictive value 100%, recall 93%). After combining performance data across phenotyping methods, we observed significantly lower false negative rates for those records billed for a comprehensive care visit (p = 4e-11) and those with complete demographics data recorded (p = 7e-5). In an early COVID-19 cohort, we found that phenotyping performance of nine common comorbidities was influenced by median EHR length, consistent with previous studies, as well as by data density, which can be measured using portable metrics including CPT codes. Here we present those challenges and potential solutions to creating deeply phenotyped, acute COVID-19 cohorts.
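The validation figures quoted above (positive predictive value, recall, false negative rate) all derive from the same confusion counts. A minimal sketch of the arithmetic follows; the counts are hypothetical, chosen only so that the output matches the quoted 100% PPV and 93% recall, and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one phenotype definition against chart review."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def ppv(c: Counts) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else float("nan")


def recall(c: Counts) -> float:
    """Recall (sensitivity): TP / (TP + FN)."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else float("nan")


def false_negative_rate(c: Counts) -> float:
    """FNR = FN / (TP + FN), i.e. 1 - recall."""
    return 1.0 - recall(c)


# Hypothetical counts, for illustration only.
example = Counts(tp=93, fp=0, fn=7)
print(f"PPV={ppv(example):.2f} recall={recall(example):.2f} "
      f"FNR={false_negative_rate(example):.2f}")
```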
Subject(s)
COVID-19/diagnosis; Electronic Health Records; Phenotype; Comorbidity; Diabetes Mellitus, Type 2; Global Health; Humans; Influenza, Human; Likelihood Functions; Pandemics
ABSTRACT
OBJECTIVE: Identifying symptoms and characteristics highly specific to coronavirus disease 2019 (COVID-19) would improve the clinical and public health response to this pandemic challenge. Here, we describe a high-throughput approach - Concept-Wide Association Study (ConceptWAS) - that systematically scans a disease's clinical manifestations from clinical notes. We used this method to identify symptoms specific to COVID-19 early in the course of the pandemic. METHODS: We created a natural language processing pipeline to extract concepts from clinical notes in a local ER corresponding to the PCR testing date for patients who had a COVID-19 test and evaluated these concepts as predictors for developing COVID-19. We identified predictors using Firth's logistic regression adjusted for age, gender, and race. We also performed ConceptWAS using cumulative data every two weeks to identify the timeline for recognition of early COVID-19-specific symptoms. RESULTS: We processed 87,753 notes from 19,692 patients subjected to COVID-19 PCR testing between March 8, 2020, and May 27, 2020 (1,483 COVID-19-positive). We found 68 concepts significantly associated with a positive COVID-19 test. We identified symptoms associated with increased risk of COVID-19, including "anosmia" (odds ratio [OR] = 4.97, 95% confidence interval [CI] = 3.21-7.50), "fever" (OR = 1.43, 95% CI = 1.28-1.59), "cough with fever" (OR = 2.29, 95% CI = 1.75-2.96), and "ageusia" (OR = 5.18, 95% CI = 3.02-8.58). Using ConceptWAS, we were able to detect loss of smell and loss of taste three weeks prior to their inclusion as symptoms of the disease by the Centers for Disease Control and Prevention (CDC). CONCLUSION: ConceptWAS, a high-throughput approach for exploring specific symptoms and characteristics of a disease like COVID-19, shows promise for enabling EHR-powered early identification of disease manifestations.
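To make the odds ratios and confidence intervals above concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% interval from a 2x2 table. This is a deliberately simpler stand-in, not the Firth's logistic regression adjusted for age, gender, and race that the abstract describes. The function name and the example counts are hypothetical.

```python
import math


def odds_ratio_wald_ci(a: float, b: float, c: float, d: float, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: concept present, test positive   b: concept present, test negative
    c: concept absent,  test positive   d: concept absent,  test negative
    """
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_wald_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen level. A penalized method such as Firth's is generally preferred when a concept is rare, because the plain estimate becomes unstable with sparse cells.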
Subject(s)
COVID-19/diagnosis; Natural Language Processing; Symptom Assessment/methods; Adult; Ageusia; COVID-19 Nucleic Acid Testing; Cough; Female; Fever; Humans; Male; Middle Aged; Pandemics; United States
ABSTRACT
OBJECTIVE: During the COVID-19 pandemic, health systems postponed non-essential medical procedures to accommodate a surge of critically ill patients. The long-term consequences of delaying procedures in response to COVID-19 remain unknown. We developed a high-throughput approach to understand the impact of delaying procedures on patient health outcomes using electronic health record (EHR) data. MATERIALS AND METHODS: We used EHR data from Vanderbilt University Medical Center's (VUMC) Research and Synthetic Derivatives. Elective procedures and non-urgent visits were suspended at VUMC between March 18, 2020 and April 24, 2020. Surgical procedure data from this period were compared to a similar timeframe in 2019. Potential adverse impact of delay in cardiovascular and cancer-related procedures was evaluated using EHR data collected from January 1, 1993 to March 17, 2020. For surgical procedure delay, outcomes included length of hospitalization (days), mortality during hospitalization, and readmission within six months. For screening procedure delay, outcomes included 5-year survival and cancer stage at diagnosis. RESULTS: We identified 416 surgical procedures that were negatively impacted during the COVID-19 pandemic compared to the same timeframe in 2019. Using retrospective data, we found 27 significant associations between procedure delay and adverse patient outcomes. Clinician review indicated that 88.9% of the significant associations were plausible and potentially clinically significant. Analytic pipelines for this study are available online. CONCLUSION: Our approach enables health systems to identify medical procedures affected by the COVID-19 pandemic and evaluate the effect of delay, enabling them to communicate effectively with patients and prioritize rescheduling to minimize adverse patient outcomes.
Subject(s)
COVID-19/epidemiology; Cardiovascular Diseases/diagnosis; Cardiovascular Diseases/surgery; Neoplasms/diagnosis; Neoplasms/surgery; Pandemics; Time-to-Treatment; Adult; COVID-19/virology; Female; Humans; Male; Middle Aged; Retrospective Studies; SARS-CoV-2/isolation & purification
ABSTRACT
PURPOSE: The purpose of this study was to evaluate the safety and efficacy of intra-articular injections of autologous peripheral blood stem cells (PBSCs) plus hyaluronic acid (HA) after arthroscopic subchondral drilling into massive chondral defects of the knee joint and to determine whether PBSC therapy can improve functional outcome and reduce pain of the knee joint better than HA plus physiotherapy. METHODS: This is a dual-center randomized controlled trial (RCT). Sixty-nine patients aged 18 to 55 years with International Cartilage Repair Society grade 3 and 4 chondral lesions (size ≥3 cm²) of the knee joint were randomized equally into (1) a control group receiving intra-articular injections of HA plus physiotherapy and (2) an intervention group receiving arthroscopic subchondral drilling into chondral defects and postoperative intra-articular injections of PBSCs plus HA. The coprimary efficacy endpoints were subjective International Knee Documentation Committee (IKDC) and Knee Injury and Osteoarthritis Outcome Score (KOOS)-pain subdomain measured at month 24. The secondary efficacy endpoints included all other KOOS subdomains, Numeric Rating Scale (NRS), and Magnetic Resonance Observation of Cartilage Repair Tissue (MOCART) scores. RESULTS: At 24 months, the mean IKDC scores for the control and intervention groups were 48.1 and 65.6, respectively (P < .0001). The mean KOOS-pain subdomain scores were 59.0 (control) and 86.0 (intervention) with P < .0001. All other KOOS subdomain, NRS, and MOCART scores also differed significantly between groups (P < .0001) at month 24. Moreover, for the intervention group, 70.8% of patients had IKDC and KOOS-pain subdomain scores exceeding the minimal clinically important difference values, indicating clinical significance. There were no notable adverse events that were unexpected and related to the study drug or procedures. CONCLUSIONS: Arthroscopic marrow stimulation with subchondral drilling into massive chondral defects of the knee joint followed by postoperative intra-articular injections of autologous PBSCs plus HA is safe and showed a significant improvement of clinical and radiologic scores compared with HA plus physiotherapy. LEVEL OF EVIDENCE: Level I, RCT.
Subject(s)
Arthroplasty, Subchondral; Cartilage, Articular; Peripheral Blood Stem Cells; Cartilage, Articular/surgery; Follow-Up Studies; Humans; Hyaluronic Acid; Knee Joint/surgery; Physical Therapy Modalities
ABSTRACT
PURPOSE: The primary objective of this study was to reproduce and validate the harvest, processing and storage of peripheral blood stem cells for a subsequent cartilage repair trial, evaluating safety, reliability, and potential to produce viable, sterile stem cells. METHODS: Ten healthy subjects (aged 19-44 years) received 3 consecutive daily doses of filgrastim followed by an apheresis harvest of mononuclear cells on a fourth day. In a clean room, the apheresis product was prepared for cryopreservation and processed into 4 mL aliquots. Sterility and qualification testing were performed pre-processing and post-processing at multiple time points out to 2 years. Eight samples were shipped internationally to validate cell transport potential. One sample from each participant was cultured to test proliferative potential with a colony-forming unit (CFU) assay. Five samples from 5 participants were tested for differentiation potential, including chondrogenic, adipogenic, osteogenic, endoderm, and ectoderm assays. RESULTS: Fresh aliquots contained an average of 532.9 ± 166. × 10⁶ total viable cells/4 mL vial and 2.1 ± 1.0 × 10⁶ CD34+ cells/4 mL vial. After processing for cryopreservation, the average cell count decreased to 331.3 ± 79. × 10⁶ total viable cells/4 mL vial and 1.5 ± 0.7 × 10⁶ CD34+ cells/4 mL vial. Preprocessing viability averaged 99% and postprocessing 88%. Viability remained constant after cryopreservation at all subsequent time points. All sterility testing was negative. All samples showed proliferative potential, with average CFU count 301.4 ± 63.9. All samples were pluripotent. CONCLUSIONS: Peripheral blood stem cells are pluripotent and can be safely harvested/stored with filgrastim, apheresis, clean-room processing, and cryopreservation. These cells can be stored for 2 years and shipped without loss of viability. CLINICAL RELEVANCE: This method represents an accessible stem cell therapy in development to augment cartilage repair.
Subject(s)
Blood Component Removal; Peripheral Blood Stem Cells; Cartilage; Colony-Forming Units Assay; Humans; Reproducibility of Results
ABSTRACT
This Viewpoint posits that to improve public understanding of the system, the Vaccine Adverse Event Reporting System (VAERS) could use a more accurate name, well-defined guidance about the reporting system's nature and use, and comprehensible information about an event's verification status.
Subject(s)
Adverse Drug Reaction Reporting Systems; Communication; Vaccines; United States; Vaccines/adverse effects
ABSTRACT
Biofilm organisms such as diatoms are potential regulators of global macrofouling dispersal because they ubiquitously colonize submerged surfaces, resist antifouling efforts and frequently alter larval recruitment. Although ships continually deliver biofilms to foreign ports, it is unclear how transport shapes biofilm microbial structure and subsequent macrofouling colonization. This study demonstrates that different ship hull coatings and transport methods change diatom assemblage composition in transported coastal marine biofilms. Assemblages carried on the hull experienced significant cell losses and changes in composition through hydrodynamic stress, whereas those that underwent sheltered transport, even through freshwater, were largely unaltered. Coatings and their associated biofilms shaped distinct macrofouling communities and affected recruitment for one third of all species, while biofilms from different transport treatments had little effect on macrofouling colonization. These results demonstrate that transport conditions can shape diatom assemblages in biofilms carried by ships, but the properties of the underlying coatings are mainly responsible for subsequent macrofouling. The methods by which organisms colonize and are transferred by ships have implications for their distribution, establishment and invasion success.
Subject(s)
Biofilms/growth & development; Biofouling/prevention & control; Diatoms/growth & development; Ships; Diatoms/physiology; Florida; Fresh Water/chemistry; Hydrodynamics; Salinity; Seawater/chemistry; Stress, Physiological
ABSTRACT
Adults with type 2 diabetes (T2DM) and low socioeconomic status (SES) have high rates of medication nonadherence, and, in turn, suboptimal glycemic control (hemoglobin A1c [HbA1c]). We tested the initial efficacy of a short message service (SMS) text messaging and interactive voice response (IVR) intervention to promote adherence among this high-risk group. Eighty low SES, diverse adults with T2DM used the MEssaging for Diabetes (MED) SMS/IVR intervention for 3 months. We used a pre-post single group design to explore adherence changes over 3 months, and a quasi-experimental design to test the impact of MED on HbA1c among the intervention group relative to a matched, archival control group. Compared to baseline, adherence improved at one (AOR 3.88, 95 % CI 1.79, 10.86) and at 2 months (AOR 3.76, 95 % CI 1.75, 17.44), but not at 3 months. HbA1c remained stable, with no differences at 3 months between the intervention group and the control group. MED had a positive, short-term impact on adherence, which did not translate to improvements in HbA1c. Future research should explore the longer-term impact of SMS/IVR interventions on the medication adherence of high risk adults with T2DM.
Subject(s)
Medication Adherence; Text Messaging; Adult; Case-Control Studies; Diabetes Mellitus, Type 2/blood; Diabetes Mellitus, Type 2/drug therapy; Female; Glycated Hemoglobin/metabolism; Humans; Male; Middle Aged; Social Class
ABSTRACT
BACKGROUND: Asthma is one of the most common childhood illnesses. Guideline-driven clinical care positively affects patient outcomes. There are several asthma guidelines and reminder methods for implementation to help integrate them into clinical workflow. Our goal is to determine the most prevalent method of guideline implementation; establish which methods significantly improved clinical care; and identify the factors most commonly associated with a successful and sustainable implementation. METHODS: We searched PUBMED (MEDLINE), OVID CINAHL, ISI Web of Science, and EMBASE. STUDY SELECTION: Studies were included if they evaluated an asthma protocol or prompt, evaluated an intervention, were a clinical trial of a protocol implementation, or were a qualitative study conducted as part of a protocol intervention. Studies were excluded if they had non-human subjects, were studies on efficacy and effectiveness of drugs, did not include an evaluation component, studied an educational intervention only, or were a case report, survey, editorial, or letter to the editor. RESULTS: From 14,478 abstracts, we included 101 full-text articles in the analysis. The most frequent study design was pre-post, followed by prospective, population based case series or consecutive case series, and randomized trials. Paper-based reminders were the most frequent, followed by fully computerized, computer-generated, and other modalities. No study reported a decrease in health care practitioner performance or declining patient outcomes. The most common primary outcome measures were compliance with provided or prescribing guidelines, key clinical indicators such as patient outcomes or quality of life, and length of stay. CONCLUSIONS: Paper-based implementations are by far the most popular approach to implement a guideline or protocol. The number of publications on asthma protocol reminder systems is increasing. The number of computerized and computer-generated studies is also increasing. Asthma guidelines generally improved patient care and practitioner performance regardless of the implementation method.
Subject(s)
Asthma; Clinical Protocols; Humans; Asthma/therapy; Practice Guidelines as Topic; Reminder Systems/statistics & numerical data
ABSTRACT
Light and nitrogen availability are basic requirements for photosynthesis. Changes in light intensity and nitrogen concentration may require adaptive physiological and life process changes in phytoplankton cells. Our previous study demonstrated that two Thalassiosira species exhibited distinct physiological responses to light and nitrogen stresses. Transcriptomic analyses were employed to investigate the mechanisms behind the different physiological responses observed in two diatom species of the genus Thalassiosira. The results indicate that the congeneric species are different in their cellular responses to the same shifting light and nitrogen conditions. When conditions changed to high light with low nitrate (HLLN), the large-celled T. punctigera was photodamaged. Thus, the photosynthesis pathway and carbon fixation related genes were significantly down-regulated. In contrast, the small-celled T. pseudonana sacrificed cellular processes, especially amino acid metabolisms, to overcome the photodamage. When changing to high light with high nitrate (HLHN) conditions, the additional nitrogen appeared to compensate for the photodamage in the large-celled T. punctigera, with the tricarboxylic acid cycle (TCA cycle) and carbon fixation significantly boosted. Consequently, the growth rate of T. punctigera increased, which suggests that the larger-celled species is adapted for forming post-storm algal blooms. The impact of high light stress on the small-celled T. pseudonana was not mitigated by elevated nitrate levels, and photodamage persisted.
ABSTRACT
BACKGROUND: Electronic health records (EHRs) present navigation challenges due to time-consuming searches across segmented data. Voice assistants can improve clinical workflows by allowing natural language queries and contextually aware navigation of the EHR. OBJECTIVES: To develop a voice-mediated EHR assistant and interview providers to inform its future refinement. METHODS: The Vanderbilt EHR Voice Assistant (VEVA) was developed as a responsive web application and designed to accept voice inputs and execute the appropriate EHR commands. Fourteen providers from Vanderbilt Medical Center were recruited to participate in interactions with VEVA and to share their experience with the technology. The purpose was to evaluate VEVA's overall usability, gather qualitative feedback, and detail suggestions for enhancing its performance. RESULTS: VEVA's mean system usability scale score was 81 based on the 14 providers' evaluations, which was above the standard 50th percentile score of 68. For all five summaries evaluated (overview summary, A1C results, blood pressure, weight, and health maintenance), most providers offered a positive review of VEVA. Several providers suggested modifications to make the technology more useful in their practice, ranging from summarizing current medications to changing VEVA's speech rate. Eight of the providers (64%) reported they would be willing to use VEVA in its current form. CONCLUSION: Our EHR voice assistant technology was deemed usable by most providers. With further improvements, voice assistant tools such as VEVA have the potential to improve workflows and serve as a useful adjunct tool in health care.
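For readers unfamiliar with how a System Usability Scale (SUS) score such as the 81 reported above is derived, the standard scoring rule can be sketched as follows. Each respondent answers 10 items on a 1 to 5 scale. Odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. The example responses are hypothetical and are not the study's data.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one respondent's 10 SUS answers (each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item_number % 2 == 1 else (5 - response)
    return total * 2.5


# Hypothetical respondents, for illustration only.
respondents = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],  # best possible answers
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],  # neutral on every item
]
scores = [sus_score(r) for r in respondents]
print(scores, mean(scores))
```

A study-level SUS score is the mean of the per-respondent scores.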
Subject(s)
Electronic Health Records; Software; Language; Technology
ABSTRACT
OBJECTIVE: The American Medical Informatics Association (AMIA) Task Force on Diversity, Equity, and Inclusion (DEI) was established to address systemic racism and health disparities in biomedical and health informatics, aligning with AMIA's mission to transform healthcare. AMIA's DEI initiatives were spurred by member voices responding to police brutality and COVID-19's impact on Black/African American communities. MATERIALS AND METHODS: The Task Force, consisting of 20 members across 3 groups aligned with AMIA's 2020-2025 Strategic Plan, met biweekly to develop DEI recommendations with the help of 16 additional volunteers. These recommendations were reviewed, prioritized, and presented to the AMIA Board of Directors for approval. RESULTS: In 9 months, the Task Force (1) created a logic model to support workforce diversity and raise AMIA's DEI awareness, (2) conducted an environmental scan of other associations' DEI activities, (3) developed a DEI framework for AMIA meetings, (4) gathered member feedback, (5) cultivated DEI educational resources, (6) created a Board nominations and diversity session, (7) reviewed the Board's Strategic Planning for DEI alignment, (8) led a program to increase diversity at the 2020 AMIA Virtual Annual Symposium, and (9) standardized socially-assigned race and ethnicity data collection. DISCUSSION: The Task Force proposed actionable recommendations that focused on AMIA's role in addressing systemic racism and health equity, helping the organization understand its member diversity. CONCLUSION: This work supported marginalized groups, broadened the research agenda, and positioned AMIA as a DEI leader while reinforcing the need for ongoing transformation within informatics.
ABSTRACT
OBJECTIVE: Large-language models (LLMs) can potentially revolutionize health care delivery and research, but risk propagating existing biases or introducing new ones. In epilepsy, social determinants of health are associated with disparities in care access, but their impact on seizure outcomes among those with access remains unclear. Here we (1) evaluated our validated, epilepsy-specific LLM for intrinsic bias, and (2) used LLM-extracted seizure outcomes to determine if different demographic groups have different seizure outcomes. MATERIALS AND METHODS: We tested our LLM for differences and equivalences in prediction accuracy and confidence across demographic groups defined by race, ethnicity, sex, income, and health insurance, using manually annotated notes. Next, we used LLM-classified seizure freedom at each office visit to test for demographic outcome disparities, using univariable and multivariable analyses. RESULTS: We analyzed 84 675 clinic visits from 25 612 unique patients seen at our epilepsy center. We found little evidence of bias in the prediction accuracy or confidence of outcome classifications across demographic groups. Multivariable analysis indicated worse seizure outcomes for female patients (OR 1.33, P ≤ .001), those with public insurance (OR 1.53, P ≤ .001), and those from lower-income zip codes (OR ≥1.22, P ≤ .007). Black patients had worse outcomes than White patients in univariable but not multivariable analysis (OR 1.03, P = .66). CONCLUSION: We found little evidence that our LLM was intrinsically biased against any demographic group. Seizure freedom extracted by LLM revealed disparities in seizure outcomes across several demographic groups. These findings quantify the critical need to reduce disparities in the care of people with epilepsy.
Subject(s)
Epilepsy; Healthcare Disparities; Seizures; Humans; Female; Male; Adult; Middle Aged; Natural Language Processing; Social Determinants of Health; Adolescent; Young Adult; Language
ABSTRACT
Importance: The Sentinel System is a key component of the US Food and Drug Administration (FDA) postmarketing safety surveillance commitment and uses clinical health care data to conduct analyses to inform drug labeling and safety communications, FDA advisory committee meetings, and other regulatory decisions. However, observational data are frequently deemed insufficient for reliable evaluation of safety concerns owing to limitations in underlying data or methodology. Advances in large language models (LLMs) provide new opportunities to address some of these limitations. However, careful consideration is necessary for how and where LLMs can be effectively deployed for these purposes. Observations: LLMs may provide new avenues to support signal-identification activities to identify novel adverse event signals from narrative text of electronic health records. These algorithms may be used to support epidemiologic investigations examining the causal relationship between exposure to a medical product and an adverse event through development of probabilistic phenotyping of health outcomes of interest and extraction of information related to important confounding factors. LLMs may perform like traditional natural language processing tools by annotating text with controlled vocabularies with additional tailored training activities. LLMs offer opportunities for enhancing information extraction from adverse event reports, medical literature, and other biomedical knowledge sources. There are several challenges that must be considered when leveraging LLMs for postmarket surveillance. Prompt engineering is needed to ensure that LLM-extracted associations are accurate and specific. LLMs require extensive infrastructure to use, which many health care systems lack, and this can impact diversity, equity, and inclusion, and result in obscuring significant adverse event patterns in some populations. LLMs are known to generate nonfactual statements, which could lead to false positive signals and downstream evaluation activities by the FDA and other entities, incurring substantial cost. Conclusions and Relevance: LLMs represent a novel paradigm that may facilitate generation of information to support medical product postmarket surveillance activities that have not been possible. However, additional work is required to ensure LLMs can be used in a fair and equitable manner, minimize false positive findings, and support the necessary rigor of signal detection needed for regulatory activities.
Subject(s)
Natural Language Processing; Product Surveillance, Postmarketing; United States Food and Drug Administration; Product Surveillance, Postmarketing/methods; Humans; United States; Electronic Health Records
ABSTRACT
OBJECTIVES: Efforts to reduce documentation burden (DocBurden) for all health professionals (HP) are aligned with national initiatives to improve clinician wellness and patient safety. Yet DocBurden has not been precisely defined, limiting national conversations and rigorous, reproducible, and meaningful measures. Increasing attention to DocBurden motivated this work to establish a standard definition of DocBurden, with the emergence of excessive DocBurden as a term. METHODS: We conducted a scoping review of DocBurden definitions and descriptions, searching six databases for scholarly, peer-reviewed, and gray literature sources, using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extensions for Scoping Review guidance. For the concept clarification phase of work, we used the American Nursing Informatics Association's Six Domains of Burden Framework. RESULTS: A total of 153 articles were included based on a priori criteria. Most articles described a focus on DocBurden, but only 18% (n = 28) provided a definition. We define excessive DocBurden as the stress and unnecessarily heavy work an HP or health care team experiences when usability of documentation systems and documentation activities (i.e., generation, review, analysis, and synthesis of patient data) are not aligned in support of care delivery. A negative connotation was attached to burden without a neutral state in included sources, which does not align with dictionary definitions of burden. CONCLUSION: Existing literature does not distinguish between a baseline or required task load to conduct patient care resulting from usability issues (DocBurden), and the unnecessarily heavy tasks and requirements that contribute to excessive DocBurden. Our definition of excessive DocBurden explicitly acknowledges this distinction, to support development of meaningful measures for understanding and intervening on excessive DocBurden locally, nationally, and internationally.
Subject(s)
Documentation; Health Personnel; Humans; Workload
ABSTRACT
Postmarketing safety surveillance depends in part on the ability to detect concerning clinical events at scale. Spontaneous reporting might be an effective component of safety surveillance, but it requires awareness and understanding among healthcare professionals to achieve its potential. Reliance on readily available structured data such as diagnostic codes risks under-coding and imprecision. Clinical textual data might bridge these gaps, and natural language processing (NLP) has been shown to aid in scalable phenotyping across healthcare records in multiple clinical domains. In this study, we developed and validated a novel incident phenotyping approach using unstructured clinical textual data agnostic to Electronic Health Record (EHR) and note type. It is based on a published, validated approach (PheRe) used to ascertain social determinants of health and suicidality across entire healthcare records. To demonstrate generalizability, we validated this approach on two separate phenotypes that share common challenges with respect to accurate ascertainment: (1) suicide attempt; (2) sleep-related behaviors. With samples of 89,428 records and 35,863 records for suicide attempt and sleep-related behaviors, respectively, we conducted silver standard (diagnostic coding) and gold standard (manual chart review) validation. We observed an area under the precision-recall curve (AUPR) of ~0.77 (95% CI 0.75-0.78) for suicide attempt and ~0.31 (95% CI 0.28-0.34) for sleep-related behaviors. We also evaluated performance by coded race and demonstrated that differences in performance by race varied across phenotypes. Scalable phenotyping models, like most healthcare AI, require algorithmovigilance and debiasing prior to implementation.
Subject(s)
Electronic Health Records; Natural Language Processing; Humans; Models, Statistical; Female; Male; Suicide, Attempted; Adult; Middle Aged
ABSTRACT
OBJECTIVES: Automated phenotyping algorithms can reduce development time and operator dependence compared to manually developed algorithms. One such approach, PheNorm, has performed well for identifying chronic health conditions, but its performance for acute conditions is largely unknown. Herein, we implement and evaluate PheNorm applied to symptomatic COVID-19 disease to investigate its potential feasibility for rapid phenotyping of acute health conditions. MATERIALS AND METHODS: PheNorm is a general-purpose automated approach to creating computable phenotype algorithms based on natural language processing, machine learning, and (low cost) silver-standard training labels. We applied PheNorm to cohorts of potential COVID-19 patients from 2 institutions and used gold-standard manual chart review data to investigate the impact on performance of alternative feature engineering options and implementing externally trained models without local retraining. RESULTS: Models at each institution achieved AUC, sensitivity, and positive predictive value of 0.853, 0.879, 0.851 and 0.804, 0.976, and 0.885, respectively, at quantiles of model-predicted risk that maximize F1. We report performance metrics for all combinations of silver labels, feature engineering options, and models trained internally versus externally. DISCUSSION: Phenotyping algorithms developed using PheNorm performed well at both institutions. Performance varied with different silver-standard labels and feature engineering options. Models developed locally at one site also worked well when implemented externally at the other site. CONCLUSION: PheNorm models successfully identified an acute health condition, symptomatic COVID-19. The simplicity of the PheNorm approach allows it to be applied at multiple study sites with substantially reduced overhead compared to traditional approaches.
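The operating point described above, a cutoff on model-predicted risk that maximizes F1, can be illustrated with a small sketch. This is not PheNorm itself and not the authors' code. It assumes a list of predicted risks with gold-standard 0/1 labels and scans the observed score values as candidate cutoffs, whereas the abstract refers to quantiles of predicted risk. All names and data are hypothetical.

```python
def best_f1_threshold(scores: list[float], labels: list[int]):
    """Return (threshold, f1) for the cutoff that maximizes F1.

    A record is called positive when its score is >= the threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    best_threshold, best_f1 = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives.
        f1 = (2 * tp / denominator) if denominator else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Hypothetical predicted risks and chart-review labels, for illustration only.
risks = [0.10, 0.40, 0.35, 0.80]
truth = [0, 0, 1, 1]
print(best_f1_threshold(risks, truth))
```

Sensitivity and positive predictive value reported "at" such a cutoff are then computed from the same true-positive, false-positive, and false-negative counts.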