ABSTRACT
BACKGROUND: Electronic health records (EHRs) are a rich source of health information; however, social determinants of health, including incarceration, and their effects on health and health care disparities can be hard to extract. OBJECTIVE: The main objective of this study was to compare the sensitivity and specificity of various methods of identifying incarceration exposure using the EHR against patient self-report. RESEARCH DESIGN: Validation study using multiple data sources and types. SUBJECTS: Participants of the Veterans Aging Cohort Study (VACS), a national observational cohort based on data from the Veterans Health Administration (VHA) EHR that includes all human immunodeficiency virus-infected patients in care (47,805) and uninfected patients (99,060) matched on region, age, race/ethnicity, and sex. MEASURES AND DATA SOURCES: Self-reported incarceration history compared with: (1) VHA EHR data linked to administrative data from a state Department of Correction (DOC), (2) VHA EHR data linked to administrative data on incarceration from the Centers for Medicare and Medicaid Services (CMS), (3) VHA EHR-specific identifier codes indicative of receipt of VHA incarceration reentry services, and (4) natural language processing (NLP) applied to unstructured text in the VHA EHR. RESULTS: Linking the EHR to DOC data: sensitivity 2.5%, specificity 100%; linking the EHR to CMS data: sensitivity 7.9%, specificity 99.3%; VHA EHR-specific identifier for receipt of reentry services: sensitivity 7.3%, specificity 98.9%; and NLP: sensitivity 63.5%, specificity 95.9%. CONCLUSIONS: NLP tools hold promise as a feasible and valid method to identify individuals with exposure to incarceration in the EHR. Future work should expand this approach using a larger body of documents and refine the methods, which may further improve the operating characteristics of this approach.
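The operating characteristics reported above follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The following is a minimal sketch of that arithmetic, assuming self-report serves as the reference standard and each EHR-based method yields a yes/no flag per participant. The function name and the toy labels are hypothetical and are not study data.

```python
def sensitivity_specificity(reference, candidate):
    """Return (sensitivity, specificity) of `candidate` flags against `reference` flags.

    Both arguments are equal-length sequences of booleans, one entry per participant.
    Assumption: `reference` plays the role of self-reported incarceration history.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")

    pairs = list(zip(reference, candidate))
    tp = sum(1 for r, c in pairs if r and c)          # exposed, flagged
    fn = sum(1 for r, c in pairs if r and not c)      # exposed, missed
    tn = sum(1 for r, c in pairs if not r and not c)  # unexposed, not flagged
    fp = sum(1 for r, c in pairs if not r and c)      # unexposed, flagged

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy, made-up labels for ten hypothetical participants (not study data).
self_report = [True, True, True, True, False, False, False, False, False, False]
method_flag = [True, False, False, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(self_report, method_flag)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A method that rarely flags anyone, as in the toy labels, can reach high specificity while missing most exposed participants, which is the pattern the linkage-based methods show in the results.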
Subject(s)
Administrative Claims, Healthcare/statistics & numerical data, Electronic Health Records/statistics & numerical data, Natural Language Processing, Prisoners/statistics & numerical data, Self Report, Veterans/statistics & numerical data, Adult, Cohort Studies, Ethnicity, Female, Humans, Information Storage and Retrieval, Male, Medicare/statistics & numerical data, Middle Aged, Sensitivity and Specificity, United States, United States Department of Veterans Affairs
ABSTRACT
Introduction. The risk of infectious disease transmission, including COVID-19, is disproportionately high in correctional facilities due to close living conditions, relatively low levels of vaccination, and reduced access to testing and treatment. While much progress has been made on describing and mitigating COVID-19 and other infectious disease risk in jails and prisons, there are open questions about which data can best predict future outbreaks. Methods. We used facility data and demographic and health data collected from 24 prison facilities in the Pennsylvania Department of Corrections from March 2020 to May 2021 to determine which sources of data best predict a coming COVID-19 outbreak in a prison facility. We used machine learning methods to cluster the prisons into groups based on similar facility-level characteristics, including size, rurality, and demographics of incarcerated people. We developed logistic regression classification models to predict for each cluster, before and after vaccine availability, whether there would be no cases, an outbreak defined as 2 or more cases, or a large outbreak, defined as 10 or more cases in the next 1, 2, and 3 d. We compared these predictions to data on outbreaks that occurred. Results. Facilities were divided into 8 clusters of sizes varying from 1 to 7 facilities per cluster. We trained 60 logistic regressions; 20 had test sets with between 35% and 65% of days with outbreaks detected. Of these, 8 logistic regressions correctly predicted the occurrence of an outbreak more than 55% of the time. The most common predictive feature was incident cases among the incarcerated population from 2 to 32 d prior. Other predictive features included the number of tests administered from 1 to 33 d prior, total population, test positivity rate, and county deaths, hospitalizations, and incident cases. 
Cumulative cases, vaccination rates, and race, ethnicity, or age statistics for incarcerated populations were generally not predictive. Conclusions. County-level measures of COVID-19, facility population, and test positivity rate appear to be promising predictors of COVID-19 outbreaks in correctional facilities, suggesting that correctional facilities should monitor community transmission in addition to facility transmission to inform future outbreak response decisions. These efforts should not be limited to COVID-19 but should include any large-scale infectious disease outbreak that may involve institution-community transmission. Highlights: The risk of infectious disease transmission, including COVID-19, is disproportionately high in correctional facilities. We used machine learning methods with data collected from 24 prison facilities in the Pennsylvania Department of Corrections to determine which sources of data best predict a coming COVID-19 outbreak in a prison facility. Key predictors included county-level measures of COVID-19, facility population, and the test positivity rate in a facility. Fortifying correctional facilities with the ability to monitor local community rates of infection (e.g., through improved interagency collaboration and data sharing) along with continued testing of incarcerated people and staff can help correctional facilities better predict, and respond to, future infectious disease outbreaks.
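The outcome definition in the Methods (no cases, an outbreak of 2 or more cases, or a large outbreak of 10 or more cases in the next 1, 2, or 3 d) can be sketched as a labeling step over a facility's daily incident case counts. This is an illustration only, under stated assumptions: cases are summed over the look-ahead window, a window with exactly 1 case (which the abstract does not classify) receives its own label, and the function name and toy counts are hypothetical rather than taken from the study.

```python
def label_horizon(daily_cases, day, horizon):
    """Label the `horizon` days that follow index `day` in `daily_cases`.

    Returns "large_outbreak" (>= 10 cases), "outbreak" (>= 2 cases),
    "no_cases" (0 cases), "single_case" (exactly 1 case; an assumption,
    since the abstract does not classify this situation), or None when
    fewer than `horizon` future days are available.
    """
    window = daily_cases[day + 1 : day + 1 + horizon]
    if len(window) < horizon:
        return None  # not enough future data to label this day

    total = sum(window)  # assumption: cases are summed across the window
    if total >= 10:
        return "large_outbreak"
    if total >= 2:
        return "outbreak"
    if total == 0:
        return "no_cases"
    return "single_case"


# Toy, made-up daily incident case counts for one hypothetical facility.
cases = [0, 0, 1, 1, 0, 12, 0, 0, 0]

for horizon in (1, 2, 3):
    labels = [label_horizon(cases, d, horizon) for d in range(len(cases))]
    print(horizon, labels)
```

Labels produced this way would form the target variable for a classifier such as the logistic regressions described above; the features (lagged cases, tests, county measures) would be assembled separately.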