ABSTRACT
Deep learning models have shown promise in histopathology image analysis, but their opaque decision-making process poses challenges in high-risk medical scenarios. Here we introduce HIPPO, an explainable AI method that interrogates attention-based multiple instance learning (ABMIL) models in computational pathology by generating counterfactual examples through tissue patch modifications in whole slide images. Applying HIPPO to ABMIL models trained to detect breast cancer metastasis reveals that they may overlook small tumors and can be misled by non-tumor tissue, while attention maps, which are widely used for interpretation, often highlight regions that do not directly influence predictions. By interpreting ABMIL models trained on a prognostic prediction task, HIPPO identified tissue areas with stronger prognostic effects than high-attention regions, which sometimes showed counterintuitive influences on risk scores. These findings demonstrate HIPPO's capacity for comprehensive model evaluation, bias detection, and quantitative hypothesis testing. HIPPO greatly expands the capabilities of explainable AI tools to assess the trustworthy and reliable development, deployment, and regulation of weakly-supervised models in computational pathology.
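The counterfactual idea in this abstract can be illustrated with a toy example. The sketch below is not HIPPO's implementation: it assumes hypothetical one-dimensional patch embeddings, a hand-written attention-pooling function (`abmil_predict`), and made-up weights. It only shows how removing a patch from the bag and re-running the model measures that patch's effect on the prediction, which is a different quantity from its attention weight.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def abmil_predict(bag, attn_w=1.0, clf_w=2.0, clf_b=-1.0):
    """Toy attention-based MIL model (all weights are hypothetical).

    Computes an attention-weighted mean of 1-D patch embeddings and
    passes it through a logistic classifier.
    """
    attn = softmax([attn_w * x for x in bag])
    pooled = sum(a * x for a, x in zip(attn, bag))
    prob = 1.0 / (1.0 + math.exp(-(clf_w * pooled + clf_b)))
    return prob, attn

def patch_effects(bag):
    """Change in predicted probability when each patch is removed in turn."""
    base, _ = abmil_predict(bag)
    effects = []
    for i in range(len(bag)):
        reduced = bag[:i] + bag[i + 1:]  # counterfactual bag without patch i
        prob, _ = abmil_predict(reduced)
        effects.append(base - prob)
    return effects

# Hypothetical bag: three "normal-like" patches and one "tumor-like" patch.
bag = [0.1, 0.2, 0.15, 3.0]
prob, attention = abmil_predict(bag)
effects = patch_effects(bag)
```

In this toy bag, removing the last patch lowers the predicted probability substantially, while removing any other patch changes it only slightly. Comparing `effects` with `attention` is the kind of check the abstract describes, here under simplified assumptions.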
ABSTRACT
Large-scale, multi-site collaboration is becoming indispensable for a wide range of research and clinical activities in oncology. To facilitate the next generation of advances in cancer biology, precision oncology and the population sciences, it will be necessary to develop and implement data management and analytic tools that empower investigators to reliably and objectively detect, characterize and chronicle the phenotypic and genomic changes that occur during the transformation from the benign to cancerous state and throughout the course of disease progression. To facilitate these efforts, it is incumbent upon the informatics community to establish the workflows and architectures that automate the aggregation and organization of a growing range and number of clinical data types and modalities, ranging from new molecular and laboratory tests to sophisticated diagnostic imaging studies. In an attempt to meet those challenges, leading health care centers across the country are making steep investments to establish enterprise-wide data warehouses. A significant limitation of many data warehouses, however, is that they are designed to support only alphanumeric information. In contrast to those traditional designs, the system that we have developed supports automated collection and mining of multimodal data including genomics, digital pathology and radiology images. In this paper, our team describes the design, development and implementation of a multi-modal Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide actionable insight into the underlying characteristics of the tumor environment that would not be revealed using standard methods and tools. The system features a flexible Extract, Transform and Load (ETL) interface that enables it to aggregate data originating from different clinical and research sources, depending on the specific EHR and other data sources utilized at a given deployment site.
ABSTRACT
Digital pathology has seen a proliferation of deep learning models in recent years, but many models are not readily reusable. To address this challenge, we developed WSInfer: an open-source software ecosystem designed to streamline the sharing and reuse of deep learning models for digital pathology. The increased access to trained models can augment research on the diagnostic, prognostic, and predictive capabilities of digital pathology.
ABSTRACT
Inflammatory bowel disease (IBD) is characterized by chronic, dysregulated inflammation in the gastrointestinal tract. The heterogeneity of IBD is reflected through two major subtypes, Crohn's Disease (CD) and Ulcerative Colitis (UC). CD and UC differ across symptomatic presentation, histology, immune responses, and treatment. While colitis mouse models have been influential in deciphering IBD pathogenesis, no single model captures the full heterogeneity of clinical disease. The translational capacity of mouse models may be augmented by shifting to multi-mouse model studies that aggregate analysis across various well-controlled phenotypes. Here, we evaluate the value of histology in multi-mouse model characterizations by building upon a previous pipeline that detects histological disease classes in hematoxylin and eosin (H&E)-stained murine colons. Specifically, we map immune marker positivity across serially-sectioned slides to H&E histological classes across the dextran sodium sulfate (DSS) chemical induction model and the intestinal epithelium-specific, inducible Villin-CreERT2;Klf5fl/fl (Klf5ΔIND) genetic model. In this study, we construct the beginning frameworks to define H&E-patch-based immunophenotypes based on IHC-H&E mappings.
Subject(s)
Ulcerative Colitis, Colitis, Crohn Disease, Inflammatory Bowel Diseases, Animals, Mice, Colitis/chemically induced, Phenotype, Inflammation, Animal Disease Models
ABSTRACT
Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. These factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.
Subject(s)
COVID-19, Natural Language Processing, Humans, Electronic Health Records, Algorithms
ABSTRACT
BACKGROUND AND OBJECTIVE: Histopathology is the gold standard for diagnosis of many cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for many tasks, including the detection of immune cells and microsatellite instability. However, it remains difficult to identify optimal models and training configurations for different histopathology classification tasks due to the abundance of available architectures and the lack of systematic evaluations. Our objective in this work is to present a software tool that addresses this need and enables robust, systematic evaluation of neural network models for patch classification in histology in a lightweight, easy-to-use package for both algorithm developers and biomedical researchers. METHODS: Here we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible evaluation toolkit that is a one-stop shop to train and evaluate deep neural networks for patch classification. ChampKit curates a broad range of public datasets. It enables training and evaluation of models supported by timm directly from the command line, without the need for users to write any code. External models are enabled through a straightforward API and minimal coding. As a result, ChampKit facilitates the evaluation of existing and new models and deep learning architectures on pathology datasets, making it more accessible to the broader scientific community. To demonstrate the utility of ChampKit, we establish baseline performance for a subset of possible models that could be employed with ChampKit, focusing on several popular deep learning models, namely ResNet18, ResNet50, and R26-ViT, a hybrid vision transformer. In addition, we compare each model trained either from random weight initialization or with transfer learning from ImageNet pretrained models. For ResNet18, we also consider transfer learning from a self-supervised pretrained model. RESULTS: The main result of this paper is the ChampKit software. Using ChampKit, we were able to systematically evaluate multiple neural networks across six datasets. We observed mixed results when evaluating the benefits of pretraining versus random initialization, with no clear benefit except in the low data regime, where transfer learning was found to be beneficial. Surprisingly, we found that transfer learning from self-supervised weights rarely improved performance, which runs counter to findings in other areas of computer vision. CONCLUSIONS: Choosing the right model for a given digital pathology dataset is nontrivial. ChampKit provides a valuable tool to fill this gap by enabling the evaluation of hundreds of existing (or user-defined) deep learning models across a variety of pathology tasks. Source code and data for the tool are freely accessible at https://github.com/SBU-BMI/champkit.
Subject(s)
Neoplasms, Computer Neural Networks, Humans, Algorithms, Software, Histological Techniques
ABSTRACT
BACKGROUND: AKI is associated with mortality in patients hospitalized with coronavirus disease 2019 (COVID-19); however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied. METHODS: Electronic health record data were obtained from 53 health systems in the United States in the National COVID Cohort Collaborative. We selected hospitalized adults diagnosed with COVID-19 between March 6, 2020, and January 6, 2022. AKI was determined with serum creatinine and diagnosis codes. Time was divided into 16-week periods (P1-6) and geographical regions into Northeast, Midwest, South, and West. Multivariable models were used to analyze the risk factors for AKI or mortality. RESULTS: Of a total cohort of 336,473, 129,176 (38%) patients had AKI. Fifty-six thousand three hundred and twenty-two (17%) lacked a diagnosis code but had AKI based on the change in serum creatinine. Similar to patients coded for AKI, these patients had higher mortality compared with those without AKI. The incidence of AKI was highest in P1 (47%; 23,097/48,947), lower in P2 (37%; 12,102/32,513), and relatively stable thereafter. Compared with the Midwest, the Northeast, South, and West had higher adjusted odds of AKI in P1. Subsequently, the South and West regions continued to have the highest relative AKI odds. In multivariable models, AKI defined by either serum creatinine or diagnostic code and the severity of AKI was associated with mortality. CONCLUSIONS: The incidence and distribution of COVID-19-associated AKI changed since the first wave of the pandemic in the United States. PODCAST: This article contains a podcast at https://dts.podtrac.com/redirect.mp3/www.asn-online.org/media/podcast/CJASN/2023_08_08_CJN0000000000000192.mp3.
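The percentages reported in the RESULTS can be reproduced from the counts given in the abstract. The short check below uses only those counts; the helper name `pct` is an illustrative choice. It also makes explicit that the 17% figure for uncoded AKI is a share of the full cohort, not of the patients with AKI.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

overall = pct(129_176, 336_473)  # patients with AKI in the full cohort
uncoded = pct(56_322, 336_473)   # AKI by serum creatinine only, no diagnosis code
p1 = pct(23_097, 48_947)         # AKI incidence in period P1
p2 = pct(12_102, 32_513)         # AKI incidence in period P2

print(overall, uncoded, p1, p2)
```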
Subject(s)
Acute Kidney Injury, COVID-19, Adult, Humans, COVID-19/complications, COVID-19/epidemiology, Retrospective Studies, Creatinine, Risk Factors, Acute Kidney Injury/diagnosis, Hospital Mortality
ABSTRACT
Background: Acute kidney injury (AKI) is associated with mortality in patients hospitalized with COVID-19; however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied. Methods: Electronic health record data were obtained from 53 health systems in the United States (US) in the National COVID Cohort Collaborative (N3C). We selected hospitalized adults diagnosed with COVID-19 between March 6, 2020, and January 6, 2022. AKI was determined with serum creatinine (SCr) and diagnosis codes. Time was divided into 16-week periods (P1-6) and geographical regions into Northeast, Midwest, South, and West. Multivariable models were used to analyze the risk factors for AKI or mortality. Results: Out of a total cohort of 306,061, 126,478 (41.0%) patients had AKI. Among these, 17.9% lacked a diagnosis code but had AKI based on the change in SCr. Similar to patients coded for AKI, these patients had higher mortality compared to those without AKI. The incidence of AKI was highest in P1 (49.3%), lower in P2 (40.6%), and relatively stable thereafter. Compared to the Midwest, the Northeast, South, and West had higher adjusted AKI incidence in P1; subsequently, the South and West regions continued to have the highest relative incidence. In multivariable models, AKI defined by either SCr or diagnostic code, and the severity of AKI, were associated with mortality. Conclusions: Uncoded cases of COVID-19-associated AKI are common and associated with mortality. The incidence and distribution of COVID-19-associated AKI have changed since the first wave of the pandemic in the US.
ABSTRACT
Inflammatory bowel disease (IBD) is a chronic immune-mediated disease of the gastrointestinal tract. While therapies exist, response can be limited within the patient population. Researchers have thus studied mouse models of colitis to further understand pathogenesis and identify new treatment targets. Flow cytometry and RNA-sequencing can phenotype immune populations with single-cell resolution but provide no spatial context. Spatial context may be particularly important in colitis mouse models, due to the simultaneous presence of colonic regions that are involved or uninvolved with disease. These regions can be identified on hematoxylin and eosin (H&E)-stained colonic tissue slides based on the presence of abnormal or normal histology. However, detection of such regions requires expert interpretation by pathologists. This can be a tedious process that may be difficult to perform consistently across experiments. To this end, we trained a deep learning model to detect 'Involved' and 'Uninvolved' regions from H&E-stained colonic tissue slides. Our model was trained on specimens from controls and three mouse models of colitis: the dextran sodium sulfate (DSS) chemical induction model, the recently established intestinal epithelium-specific, inducible Klf5ΔIND (Villin-CreERT2;Klf5fl/fl) genetic model, and one that combines both induction methods. Image patches predicted to be 'Involved' and 'Uninvolved' were extracted across mice to cluster and identify histological classes. We quantified the proportion of 'Uninvolved' patches and 'Involved' patch classes in murine Swiss-rolled colons. Furthermore, we trained linear discriminant analysis classifiers on these patch proportions to predict mouse model and clinical score bins in a prospectively treated cohort of mice. Such a pipeline has the potential to reveal histological links and improve synergy between various colitis mouse model studies to identify new therapeutic targets and pathophysiological mechanisms.
Subject(s)
Colitis, Deep Learning, Animals, Colon/pathology, Dextran Sulfate/toxicity, Animal Disease Models, Humans, Mice, Inbred C57BL Mice
ABSTRACT
Background: Deep learning methods have demonstrated remarkable performance in pathology image analysis, but they are computationally very demanding. The aim of our study is to reduce their computational cost to enable their use with large tissue image datasets. Methods: We propose a method called Network Auto-Reduction (NAR) that simplifies a Convolutional Neural Network (CNN) by reducing the network to minimize the computational cost of making a prediction. NAR performs a compound scaling in which the width, depth, and resolution dimensions of the network are reduced together to maintain a balance among them in the resulting simplified network. We compare our method with a state-of-the-art solution called ResRep. The evaluation is carried out with popular CNN architectures and a real-world application that identifies distributions of tumor-infiltrating lymphocytes in tissue images. Results: The experimental results show that both ResRep and NAR are able to generate simplified, more efficient versions of ResNet50 V2. The simplified versions by ResRep and NAR require 1.32× and 3.26× fewer floating-point operations (FLOPs), respectively, than the original network without a loss in classification power as measured by the Area under the Curve (AUC) metric. When applied to a deeper and more computationally expensive network, Inception V4, NAR is able to generate a version that requires 4× fewer FLOPs than the original version with the same AUC performance. Conclusions: NAR is able to achieve substantial reductions in the execution cost of two popular CNN architectures, while resulting in small or no loss in model accuracy. Such cost savings can significantly improve the use of deep learning methods in digital pathology. They can enable studies with larger tissue image datasets and facilitate the use of less expensive and more accessible graphics processing units (GPUs), thus reducing the computing costs of a study.
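To see why reducing width, depth, and resolution together can yield large savings, the sketch below uses a widely used approximation for convolutional networks: FLOPs grow linearly with depth and quadratically with width and input resolution. This is not the NAR algorithm, and the scaling factors are hypothetical values chosen only for illustration.

```python
def flops_ratio(depth_scale, width_scale, resolution_scale):
    """Approximate FLOPs of a scaled CNN relative to the original.

    Assumes FLOPs are proportional to depth * width^2 * resolution^2,
    a common rule of thumb for convolutional layers.
    """
    return depth_scale * width_scale ** 2 * resolution_scale ** 2

# Hypothetical example: shrink each dimension to 80% of its original size.
ratio = flops_ratio(0.8, 0.8, 0.8)
reduction = 1.0 / ratio  # how many times fewer FLOPs the scaled network needs

print(f"relative FLOPs: {ratio:.3f}, reduction: {reduction:.2f}x")
```

Under this approximation, a modest 20% reduction along each dimension compounds to roughly a threefold reduction in FLOPs, which is the same order of magnitude as the reductions reported in the abstract.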
ABSTRACT
BACKGROUND: Population-based state cancer registries are an authoritative source for cancer statistics in the United States. They routinely collect a variety of data, including patient demographics, primary tumor site, stage at diagnosis, first course of treatment, and survival, on every cancer case that is reported across all U.S. states and territories. The goal of our project is to enrich NCI's Surveillance, Epidemiology, and End Results (SEER) registry data with high-quality population-based biospecimen data in the form of digital pathology, machine-learning-based classifications, and quantitative histopathology imaging feature sets (referred to here as Pathomics features). MATERIALS AND METHODS: As part of the project, the underlying informatics infrastructure was designed, tested, and implemented through close collaboration with several participating SEER registries to ensure consistency with registry processes, computational scalability, and ability to support creation of population cohorts that span multiple sites. Utilizing computational imaging algorithms and methods to both generate indices and search for matches makes it possible to reduce inter- and intra-observer inconsistencies and to improve the objectivity with which large image repositories are interrogated. RESULTS: Our team has created and continues to expand a well-curated repository of high-quality digitized pathology images corresponding to subjects whose data are routinely collected by the collaborating registries. Our team has systematically deployed and tested key visual analytic methods to facilitate automated creation of population cohorts for epidemiological studies and tools to support visualization of feature clusters and evaluation of whole-slide images. As part of these efforts, we are developing and optimizing advanced search and matching algorithms to facilitate automated, content-based retrieval of digitized specimens based on their underlying image features and staining characteristics. CONCLUSION: To meet the challenges of this project, we established the analytic pipelines, methods, and workflows to support the expansion and management of a growing repository of high-quality digitized pathology and information-rich population cohorts containing objective imaging and clinical attributes to facilitate studies that seek to discriminate among different subtypes of disease, stratify patient populations, and perform comparisons of tumor characteristics within and across patient cohorts. We have also successfully developed a suite of tools based on a deep-learning method to perform quantitative characterizations of tumor regions, assess infiltrating lymphocyte distributions, and generate objective nuclear feature measurements. As part of these efforts, our team has implemented reliable methods that enable investigators to systematically search through large repositories to automatically retrieve digitized pathology specimens and correlated clinical data based on their computational signatures.
ABSTRACT
BACKGROUND AND OBJECTIVE: Computerized pathology image analysis is an important tool in research and clinical settings, which enables quantitative tissue characterization and can assist a pathologist's evaluation. The aim of our study is to systematically quantify and minimize uncertainty in the output of computer-based pathology image analysis. METHODS: Uncertainty quantification (UQ) and sensitivity analysis (SA) methods, such as Variance-Based Decomposition (VBD) and Morris One-At-a-Time (MOAT), are employed to track and quantify uncertainty in a real-world application with large Whole Slide Imaging datasets: 943 Breast Invasive Carcinoma (BRCA) and 381 Lung Squamous Cell Carcinoma (LUSC) patients. Because these studies are compute intensive, high-performance computing systems and efficient UQ/SA methods were combined to provide efficient execution. UQ/SA has been able to highlight parameters of the application that impact the results, as well as nuclear features that carry most of the uncertainty. Using this information, we built a method for selecting stable features that minimize application output uncertainty. RESULTS: The results show that input parameter variations significantly impact all stages (segmentation, feature computation, and survival analysis) of the use case application. We then identified and classified features according to their robustness to parameter variation. Using the proposed feature selection strategy, patient grouping stability in survival analysis, for instance, has been improved by 17% and 34% for BRCA and LUSC, respectively. CONCLUSIONS: This strategy created more robust analyses, demonstrating that SA and UQ are important methods that may increase confidence in digital pathology.
Subject(s)
Computer-Assisted Image Processing, Humans, Uncertainty
ABSTRACT
Importance: The National COVID Cohort Collaborative (N3C) is a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative COVID-19 cohort to date. This multicenter data set can support robust evidence-based development of predictive and diagnostic tools and inform clinical care and policy. Objectives: To evaluate COVID-19 severity and risk factors over time and assess the use of machine learning to predict clinical severity. Design, Setting, and Participants: In a retrospective cohort study of 1,926,526 US adults with SARS-CoV-2 infection (polymerase chain reaction >99% or antigen <1%) and adult patients without SARS-CoV-2 infection who served as controls from 34 medical centers nationwide between January 1, 2020, and December 7, 2020, patients were stratified using a World Health Organization COVID-19 severity scale and demographic characteristics. Differences between groups over time were evaluated using multivariable logistic regression. Random forest and XGBoost models were used to predict severe clinical course (death, discharge to hospice, invasive ventilatory support, or extracorporeal membrane oxygenation). Main Outcomes and Measures: Patient demographic characteristics and COVID-19 severity using the World Health Organization COVID-19 severity scale and differences between groups over time using multivariable logistic regression. Results: The cohort included 174,568 adults who tested positive for SARS-CoV-2 (mean [SD] age, 44.4 [18.6] years; 53.2% female) and 1,133,848 adult controls who tested negative for SARS-CoV-2 (mean [SD] age, 49.5 [19.2] years; 57.1% female). Of the 174,568 adults with SARS-CoV-2, 32,472 (18.6%) were hospitalized, and 6,565 (20.2%) of those had a severe clinical course (invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice). Of the hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March to April 2020 to 8.6% in September to October 2020 (P = .002 for monthly trend). Using 64 inputs available on the first hospital day, this study predicted a severe clinical course using random forest and XGBoost models (area under the receiver operating characteristic curve = 0.87 for both) that were stable over time. The factor most strongly associated with clinical severity was pH; this result was consistent across machine learning methods. In a separate multivariable logistic regression model built for inference, age (odds ratio [OR], 1.03 per year; 95% CI, 1.03-1.04), male sex (OR, 1.60; 95% CI, 1.51-1.69), liver disease (OR, 1.20; 95% CI, 1.08-1.34), dementia (OR, 1.26; 95% CI, 1.13-1.41), African American (OR, 1.12; 95% CI, 1.05-1.20) and Asian (OR, 1.33; 95% CI, 1.12-1.57) race, and obesity (OR, 1.36; 95% CI, 1.27-1.46) were independently associated with higher clinical severity. Conclusions and Relevance: This cohort study found that COVID-19 mortality decreased over time during 2020 and that patient demographic characteristics and comorbidities were associated with higher clinical severity. The machine learning models accurately predicted ultimate clinical severity using commonly collected clinical data from the first 24 hours of a hospital admission.
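A per-year odds ratio of 1.03 looks small, but it compounds. The sketch below assumes age enters the logistic regression as a linear term, so that odds ratios multiply across years; the abstract does not state this explicitly. It also rechecks two of the reported percentages from the counts given. The function name `odds_ratio_over` is an illustrative choice.

```python
def odds_ratio_over(per_unit_or, units):
    """Odds ratio for a difference of `units`, given a per-unit odds ratio.

    Assumes a log-linear effect, so per-unit odds ratios multiply.
    """
    return per_unit_or ** units

decade_or = odds_ratio_over(1.03, 10)                 # 10-year age difference
hospitalized_pct = round(100 * 32_472 / 174_568, 1)   # hospitalized among positives
severe_pct = round(100 * 6_565 / 32_472, 1)           # severe course among hospitalized

print(round(decade_or, 2), hospitalized_pct, severe_pct)
```

Under that assumption, a 10-year age difference corresponds to roughly a one-third increase in the odds of a severe clinical course, comparable in size to the reported odds ratio for obesity.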
Subject(s)
COVID-19, Factual Databases, Forecasting, Hospitalization, Biological Models, Severity of Illness Index, Adult, Aged, Aged 80 and over, COVID-19/ethnology, COVID-19/mortality, Comorbidity, Ethnicity, Extracorporeal Membrane Oxygenation, Female, Humans, Hydrogen-Ion Concentration, Male, Middle Aged, Pandemics, Artificial Respiration, Retrospective Studies, Risk Factors, SARS-CoV-2, United States, Young Adult
ABSTRACT
Machine learning (ML)- and deep learning (DL)-based imaging modalities have exhibited the capacity to handle extremely high dimensional data for a number of computer vision tasks. While these approaches have been applied to numerous data types, this capacity can be especially leveraged by application on histopathological images, which capture cellular and structural features with their high-resolution, microscopic perspectives. Already, these methodologies have demonstrated promising performance in a variety of applications like disease classification, cancer grading, structure and cellular localizations, and prognostic predictions. A wide range of pathologies requiring histopathological evaluation exist in gastroenterology and hepatology, indicating these as disciplines highly targetable for integration of these technologies. Gastroenterologists have also already been primed to consider the impact of these algorithms, as development of real-time endoscopic video analysis software has been an active and popular field of research. This heightened clinical awareness will likely be important for future integration of these methods and to drive interdisciplinary collaborations on emerging studies. To provide an overview on the application of these methodologies for gastrointestinal and hepatological histopathological slides, this review will discuss general ML and DL concepts, introduce recent and emerging literature using these methods, and cover challenges moving forward to further advance the field.
Subject(s)
Deep Learning, Algorithms, Humans, Machine Learning
ABSTRACT
Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. The worldwide scientific community is forging ahead to characterize a wide range of outcomes associated with SARS-CoV-2 infection; however, the underlying assumptions in these studies have varied so widely that the resulting data are difficult to compare. Formal definitions are needed in order to design robust and consistent studies of Long COVID that consistently capture variation in long-term outcomes. Even the condition itself goes by three terms, most widely "Long COVID", but also "post-acute COVID-19 syndrome (PACS)" or "post-acute sequelae of SARS-CoV-2 infection (PASC)". In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic itself. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.
ABSTRACT
OBJECTIVE: The United States is experiencing an opioid epidemic. In recent years, there were more than 10 million opioid misusers aged 12 years or older annually. Identifying patients at high risk of opioid use disorder (OUD) can enable early clinical interventions to reduce the risk of OUD. Our goal is to develop and evaluate models to predict OUD for patients on opioid medications using electronic health records and deep learning methods. The resulting models help us to better understand OUD, providing new insights on the opioid epidemic. Further, these models provide a foundation for clinical tools to predict OUD before it occurs, permitting early interventions. METHODS: Electronic health records of patients who had been prescribed medications containing active opioid ingredients were extracted from Cerner's Health Facts database for encounters between January 1, 2008, and December 31, 2017. Long short-term memory (LSTM) models were applied to predict OUD risk based on the five most recent encounters before the target encounter and compared with logistic regression, random forest, decision tree, and dense neural network (DNN) models. Prediction performance was assessed using F1 score, precision, recall, and area under the receiver-operating characteristic curve (AUROC). RESULTS: The LSTM model provided promising prediction results that outperformed the other methods, with an F1 score of 0.8023 (about 0.016 higher than DNN) and an AUROC of 0.9369 (about 0.145 higher than DNN). CONCLUSIONS: LSTM-based sequential deep learning models can accurately predict OUD using a patient's history of electronic health records, with minimal prior domain knowledge. This tool has the potential to improve clinical decision support for early intervention and prevention to combat the opioid epidemic.
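For readers less familiar with the metrics named in METHODS, the sketch below computes precision, recall, and F1 score for binary labels. The labels and predictions are made-up values for illustration and have no connection to the study data; the function name is likewise an illustrative choice.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical labels: 1 = develops OUD, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which makes it a stricter summary than accuracy when positive cases are rare.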
Subject(s)
Deep Learning, Opioid-Related Disorders, Opioid Analgesics/adverse effects, Factual Databases, Electronic Health Records, Humans, Opioid-Related Disorders/epidemiology, United States/epidemiology
ABSTRACT
Opioid overdose-related deaths have increased dramatically in recent years. Combating the opioid epidemic requires better understanding of the epidemiology of opioid poisoning (OP). To discover trends and patterns of opioid poisoning and the demographic and regional disparities, we analyzed large-scale patient visit data in New York State (NYS). Demographic, spatial, temporal and correlation analyses were performed for all OP patients extracted from the claims data in the New York Statewide Planning and Research Cooperative System (SPARCS) from 2010 to 2016, along with Decennial US Census and American Community Survey zip code level data. A total of 58,481 patients with at least one OP diagnosis and a valid NYS zip code address were included. Main outcomes and measures include OP patient counts and rates per 100,000 population, patient-level factors (gender, age, race and ethnicity, residential zip code), and zip code level sociodemographic factors. The results showed that the OP rate increased by 364.6% overall, and by 741.5% for the age group > 65 years. There were wide disparities among groups by race and ethnicity in rates and age distributions of OP. Heroin and non-heroin based OP rates demonstrated distinct temporal trends as well as major geospatial variation. The findings highlighted strong demographic disparities among OP patients, evolving patterns and substantial geospatial variation.
Subject(s)
Analgesics, Opioid/adverse effects; Drug Overdose/epidemiology; Heroin/adverse effects; Opioid-Related Disorders/epidemiology; Adolescent; Adult; Age Distribution; Aged; Drug Overdose/pathology; Epidemics; Female; Humans; Male; Middle Aged; Opioid-Related Disorders/pathology; Retrospective Studies; Young Adult
ABSTRACT
The US is experiencing an opioid epidemic, and opioid overdose causes more than 100 deaths per day. Early identification of patients at high risk of opioid overdose (OD) can enable targeted preventive interventions. We aim to build a deep learning model that can predict which patients are at high risk of opioid overdose and identify the most relevant features. The study included 5,231,614 patients from the Health Facts database with at least one opioid prescription between January 1, 2008 and December 31, 2017. Potential predictors (n = 1185) were extracted to build a feature matrix for prediction. Long Short-Term Memory (LSTM) based models were built to predict overdose risk at the next hospital visit. Prediction performance was compared with that of other machine learning methods using machine learning metrics. Our sequential deep learning models built upon LSTM outperformed the other methods on opioid overdose prediction. LSTM with an attention mechanism achieved the highest F1 score (F1 score: 0.7815, AUROC: 0.8449). The model can also reveal top-ranked predictive features, including medications and vital signs, through the permutation importance method. This study demonstrates that a temporal deep learning-based predictive model can achieve promising results in identifying patients' risk of opioid overdose from their electronic health record history. It provides an alternative informatics-based approach to improving clinical decision support for possible early detection and intervention to reduce opioid overdose.
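Permutation importance, the feature-ranking method mentioned above, can be sketched in a few lines: shuffle one feature column at a time and record how much a performance metric drops. The example is a generic illustration under stated assumptions, not the study's implementation. The threshold rule `toy_model` stands in for the trained LSTM, accuracy stands in for the study's metrics, and the data are invented.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(model: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(model: Callable[[Row], int], X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical stand-in model: uses feature 0 only and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Invented data; labels are generated by the toy model so baseline accuracy is 1.0.
X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9], [0.8, 0.5], [0.3, 0.2]]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
```

In this sketch the ignored feature scores exactly zero, while the feature the model relies on scores above zero. Features can then be ranked by these scores.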
Subject(s)
Deep Learning; Opiate Overdose; Analgesics, Opioid/adverse effects; Electronic Health Records; Humans; Prescriptions
ABSTRACT
Background: The majority of U.S. reports of COVID-19 clinical characteristics, disease course, and treatments are from single health systems or focused on one domain. Here we report the creation of the National COVID Cohort Collaborative (N3C), a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative U.S. cohort of COVID-19 cases and controls to date. This multi-center dataset supports robust evidence-based development of predictive and diagnostic tools and informs critical care and policy. Methods and Findings: In a retrospective cohort study of 1,926,526 patients from 34 medical centers nationwide, we stratified patients using a World Health Organization COVID-19 severity scale and demographics; we then evaluated differences between groups over time using multivariable logistic regression. We established vital signs and laboratory values among COVID-19 patients with different severities, providing the foundation for predictive analytics. The cohort included 174,568 adults with severe acute respiratory syndrome associated with SARS-CoV-2 (PCR >99% or antigen <1%) as well as 1,133,848 adult patients who served as lab-negative controls. Among 32,472 hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March/April 2020 to 8.6% in September/October 2020 (p = 0.002 monthly trend). In a multivariable logistic regression model, age, male sex, liver disease, dementia, African-American and Asian race, and obesity were independently associated with higher clinical severity. To demonstrate the utility of the N3C cohort for analytics, we used machine learning (ML) to predict clinical severity and risk factors over time. Using 64 inputs available on the first hospital day, we predicted a severe clinical course (death, discharge to hospice, invasive ventilation, or extracorporeal membrane oxygenation) using random forest and XGBoost models (AUROC 0.86 and 0.87 respectively) that were stable over time. The most powerful predictors in these models are patient age and widely available vital sign and laboratory values. The established expected trajectories for many vital signs and laboratory values among patients with different clinical severities validate observations from smaller studies and provide comprehensive insight into COVID-19 characterization in U.S. patients. Conclusions: This is the first description of an ongoing longitudinal observational study of patients seen in diverse clinical settings and geographical regions and is the largest COVID-19 cohort in the United States. Such data are the foundation for ML models that can be the basis for generalizable clinical decision support tools. The N3C Data Enclave is unique in providing transparent, reproducible, easily shared, versioned, and fully auditable data and analytic provenance for national-scale patient-level EHR data. The N3C is built for intensive ML analyses by academic, industry, and citizen scientists internationally. Many observational correlations can inform trial designs and care guidelines for this new disease.
ABSTRACT
OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.