ABSTRACT
COVID-19 manifests with a wide spectrum of clinical phenotypes that are characterized by exaggerated and misdirected host immune responses1-6. Although pathological innate immune activation is well-documented in severe disease1, the effect of autoantibodies on disease progression is less well-defined. Here we use a high-throughput autoantibody discovery technique known as rapid extracellular antigen profiling7 to screen a cohort of 194 individuals infected with SARS-CoV-2, comprising 172 patients with COVID-19 and 22 healthcare workers with mild disease or asymptomatic infection, for autoantibodies against 2,770 extracellular and secreted proteins (members of the exoproteome). We found that patients with COVID-19 exhibit marked increases in autoantibody reactivities as compared to uninfected individuals, and show a high prevalence of autoantibodies against immunomodulatory proteins (including cytokines, chemokines, complement components and cell-surface proteins). We established that these autoantibodies perturb immune function and impair virological control by inhibiting immunoreceptor signalling and by altering peripheral immune cell composition, and found that mouse surrogates of these autoantibodies increase disease severity in a mouse model of SARS-CoV-2 infection. Our analysis of autoantibodies against tissue-associated antigens revealed associations with specific clinical characteristics. Our findings suggest a pathological role for exoproteome-directed autoantibodies in COVID-19, with diverse effects on immune functionality and associations with clinical outcomes.
Subject(s)
Autoantibodies/analysis , Autoantibodies/immunology , COVID-19/immunology , COVID-19/metabolism , Proteome/immunology , Proteome/metabolism , Animals , Antigens, Surface/immunology , COVID-19/pathology , COVID-19/physiopathology , Case-Control Studies , Complement System Proteins/immunology , Cytokines/immunology , Disease Models, Animal , Disease Progression , Female , Humans , Male , Mice , Organ Specificity/immunology
ABSTRACT
Recent studies have provided insights into the pathogenesis of coronavirus disease 2019 (COVID-19)1-4. However, the longitudinal immunological correlates of disease outcome remain unclear. Here we serially analysed immune responses in 113 patients with moderate or severe COVID-19. Immune profiling revealed an overall increase in innate cell lineages, with a concomitant reduction in T cell number. An early elevation in cytokine levels was associated with worse disease outcomes. Following an early increase in cytokines, patients with moderate COVID-19 displayed a progressive reduction in type 1 (antiviral) and type 3 (antifungal) responses. By contrast, patients with severe COVID-19 maintained these elevated responses throughout the course of the disease. Moreover, severe COVID-19 was accompanied by an increase in multiple type 2 (anti-helminth) effectors, including interleukin-5 (IL-5), IL-13, immunoglobulin E and eosinophils. Unsupervised clustering analysis identified four immune signatures, representing growth factors (A), type-2/3 cytokines (B), mixed type-1/2/3 cytokines (C), and chemokines (D), that correlated with three distinct disease trajectories. The immune profiles of patients who recovered from moderate COVID-19 were enriched in the tissue-reparative growth factor signature A, whereas the profiles of those who developed severe disease had elevated levels of all four signatures. Thus, we have identified a maladapted immune response profile associated with severe COVID-19 and poor clinical outcome, as well as early immune signatures that correlate with divergent disease trajectories.
Subject(s)
Coronavirus Infections/immunology , Coronavirus Infections/physiopathology , Cytokines/analysis , Pneumonia, Viral/immunology , Pneumonia, Viral/physiopathology , Adult , Aged , Aged, 80 and over , COVID-19 , Cluster Analysis , Cytokines/immunology , Eosinophils/immunology , Female , Humans , Immunoglobulin E/analysis , Immunoglobulin E/immunology , Interleukin-13/analysis , Interleukin-13/immunology , Interleukin-5/analysis , Interleukin-5/immunology , Male , Middle Aged , Pandemics , T-Lymphocytes/cytology , T-Lymphocytes/immunology , Viral Load , Young Adult
ABSTRACT
There is increasing evidence that coronavirus disease 2019 (COVID-19) produces more severe symptoms and higher mortality among men than among women1-5. However, whether immune responses against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) differ between sexes, and whether such differences correlate with the sex difference in the disease course of COVID-19, is currently unknown. Here we examined sex differences in viral loads, SARS-CoV-2-specific antibody titres, plasma cytokines and blood-cell phenotyping in patients with moderate COVID-19 who had not received immunomodulatory medications. Male patients had higher plasma levels of innate immune cytokines such as IL-8 and IL-18, along with more robust induction of non-classical monocytes. By contrast, female patients had more robust T cell activation than male patients during SARS-CoV-2 infection. Notably, we found that a poor T cell response negatively correlated with patients' age and was associated with worse disease outcome in male patients, but not in female patients. By contrast, higher levels of innate immune cytokines were associated with worse disease progression in female patients, but not in male patients. These findings provide a possible explanation for the observed sex biases in COVID-19, and provide an important basis for the development of a sex-based approach to the treatment and care of male and female patients with COVID-19.
Subject(s)
COVID-19/immunology , Cytokines/immunology , Immunity, Innate/immunology , SARS-CoV-2/immunology , Sex Characteristics , T-Lymphocytes/immunology , COVID-19/blood , COVID-19/virology , Chemokines/blood , Chemokines/immunology , Cohort Studies , Cytokines/blood , Disease Progression , Female , Humans , Lymphocyte Activation , Male , Monocytes/immunology , Phenotype , Prognosis , RNA, Viral/analysis , SARS-CoV-2/pathogenicity , Viral Load
ABSTRACT
BACKGROUND: The respective contributions of variant-specific immune evasion and waning protection to declining coronavirus disease 2019 (COVID-19) vaccine effectiveness (VE) remain unclear. Using whole-genome sequencing (WGS), we examined the contribution of these factors to the decline in VE that followed the introduction of the Delta variant. Furthermore, we evaluated calendar-period-based classification as an alternative to WGS. METHODS: We conducted a test-negative case-control study among people tested for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) between 1 April and 24 August 2021. Variants were classified using WGS and calendar period. RESULTS: We included 2,029 cases (positive, sequenced samples) and 343,727 controls (negative tests). VE 14-89 days after the second dose was significantly higher against Alpha (84.4%; 95% confidence interval [CI], 75.6%-90.0%) than Delta infection (68.9%; 95% CI, 58.0%-77.1%). The odds of Delta infection were significantly higher 90-149 days than 14-89 days after the second dose (P = .003). Calendar-period-classified VE estimates approximated WGS-classified estimates; however, calendar-period-based classification was subject to misclassification (35% Alpha, 4% Delta). CONCLUSIONS: Both waning protection and variant-specific immune evasion contributed to the lower effectiveness against Delta. While calendar-period-classified VE estimates mirrored WGS-classified estimates, our analysis highlights the need for WGS when variants are cocirculating and misclassification is likely.
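For readers unfamiliar with the test-negative design used above, the VE figures follow from the adjusted odds ratio of vaccination among cases versus controls; the relation below is the standard estimator for this design, stated here as background rather than quoted from the study.

```latex
\mathrm{VE} \;=\; \bigl(1 - \mathrm{OR}_{\mathrm{adj}}\bigr) \times 100\%,
\qquad
\mathrm{OR}_{\mathrm{adj}} \;=\; \exp\!\bigl(\hat{\beta}_{\mathrm{vaccinated}}\bigr)
```

For example, the reported VE of 84.4% against Alpha corresponds to an adjusted odds ratio of roughly 1 - 0.844 = 0.156.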
Subject(s)
COVID-19 , Hepatitis D , Humans , COVID-19 Vaccines , Case-Control Studies , Immune Evasion , SARS-CoV-2 , Vaccine Efficacy
ABSTRACT
Graph data models are an emerging approach to structuring clinical and biomedical information. These models offer intriguing opportunities for novel approaches in healthcare, such as disease phenotyping, risk prediction, and personalized precision care. The combination of data and information in a graph model to create knowledge graphs has rapidly expanded in biomedical research, but the integration of real-world data from the electronic health record (EHR) has been limited. To broadly apply knowledge graphs to EHR and other real-world data, a deeper understanding of how to represent these data in a standardized graph model is needed. We provide an overview of the state-of-the-art research for clinical and biomedical data integration and summarize the potential to accelerate healthcare and precision medicine research through insight generation from integrated knowledge graphs.
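To make the idea of a standardized graph model concrete, the sketch below builds a tiny EHR-derived knowledge graph with networkx. The toolkit, node identifiers, and edge types are illustrative assumptions, not a schema prescribed by the review.

```python
# Minimal sketch of representing EHR facts as a knowledge graph.
# networkx and the node/edge schema here are illustrative choices,
# not a standard prescribed by the review; identifiers are hypothetical.
import networkx as nx

kg = nx.MultiDiGraph()

# Nodes carry a type label.
kg.add_node("patient:123", type="Patient")
kg.add_node("SNOMED:38341003", type="Condition", label="Hypertension")
kg.add_node("RxNorm:197361", type="Drug", label="Amlodipine")

# Edges encode clinical relations, with provenance kept as edge attributes.
kg.add_edge("patient:123", "SNOMED:38341003", key="has_diagnosis",
            source="EHR problem list", date="2019-02-10")
kg.add_edge("RxNorm:197361", "SNOMED:38341003", key="indicated_for",
            source="drug ontology")

# A simple traversal: for each condition linked to the patient,
# list drugs connected to that condition by an "indicated_for" edge.
for _, condition, relation in kg.out_edges("patient:123", keys=True):
    drugs = [d for d, _, k in kg.in_edges(condition, keys=True)
             if k == "indicated_for"]
    print(condition, relation, "candidate drugs:", drugs)
```

In practice, node and edge types would be anchored to standard terminologies (for example, SNOMED CT and RxNorm) so that graphs built from different EHR sources remain interoperable.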
Subject(s)
Algorithms , Biomedical Research , Humans , Pattern Recognition, Automated , Phenotype , Precision Medicine
ABSTRACT
BACKGROUND: The benefit of primary and booster vaccination in people who experienced a prior Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection remains unclear. The objective of this study was to estimate the effectiveness of primary (two-dose series) and booster (third dose) mRNA vaccination against Omicron (lineage BA.1) infection among people with a prior documented infection. METHODS AND FINDINGS: We conducted a test-negative case-control study of reverse transcription PCRs (RT-PCRs) analyzed with the TaqPath (Thermo Fisher Scientific) assay and recorded in the Yale New Haven Health system from November 1, 2021, to April 30, 2022. Overall, 11,307 cases (positive TaqPath-analyzed RT-PCRs with S-gene target failure [SGTF]) and 130,041 controls (negative TaqPath-analyzed RT-PCRs) were included (median age: cases, 35 years; controls, 39 years). Among cases and controls, 5.9% and 8.1%, respectively, had a documented prior infection (positive SARS-CoV-2 test record ≥90 days prior to the included test). We estimated the effectiveness of primary and booster vaccination against SGTF-defined Omicron (lineage BA.1) infection using logistic regression adjusted for date of test, age, sex, race/ethnicity, insurance, comorbidities, social vulnerability index, municipality, and healthcare utilization. The effectiveness of primary vaccination 14 to 149 days after the second dose was 41.0% (95% confidence interval (CI): 14.1% to 59.4%, p = 0.006) and 27.1% (95% CI: 18.7% to 34.6%, p < 0.001) for people with and without a documented prior infection, respectively. The effectiveness of booster vaccination (≥14 days after booster dose) was 47.1% (95% CI: 22.4% to 63.9%, p = 0.001) and 54.1% (95% CI: 49.2% to 58.4%, p < 0.001) in people with and without a documented prior infection, respectively. To test whether booster vaccination reduced the risk of infection beyond that of the primary series, we compared the odds of infection among boosted (≥14 days after booster dose) and booster-eligible people (≥150 days after second dose). The odds ratio (OR) comparing boosted and booster-eligible people with a documented prior infection was 0.79 (95% CI: 0.54 to 1.16, p = 0.222), whereas the OR comparing boosted and booster-eligible people without a documented prior infection was 0.54 (95% CI: 0.49 to 0.59, p < 0.001). This study's limitations include the risk of residual confounding, the use of data from a single health system, and the reliance on TaqPath-analyzed RT-PCR results. CONCLUSIONS: In this study, we observed that primary vaccination provided significant but limited protection against Omicron (lineage BA.1) infection among people with and without a documented prior infection. While booster vaccination was associated with additional protection against Omicron BA.1 infection in people without a documented prior infection, it was not associated with additional protection among people with a documented prior infection. These findings support primary vaccination in people regardless of documented prior infection status but suggest that infection history may affect the relative benefit of booster doses.
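As a rough illustration of the adjusted analysis described above, the sketch below fits the test-negative logistic regression with statsmodels and converts the vaccination coefficient into VE. The DataFrame, column names, and covariate encodings are hypothetical; only the covariate list follows the abstract.

```python
# Sketch of the adjusted test-negative analysis: logistic regression of test
# positivity on vaccination status plus the covariates listed in the abstract.
# The DataFrame, column names, and categorical encodings are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def estimate_ve(df: pd.DataFrame):
    """Return VE (%) and its 95% CI for primary vaccination vs unvaccinated."""
    model = smf.logit(
        "case ~ vaccinated + C(test_week) + age + C(sex) + C(race_ethnicity)"
        " + C(insurance) + comorbidity_count + svi + C(municipality)"
        " + prior_encounters",
        data=df,
    ).fit(disp=False)
    beta = model.params["vaccinated"]
    lo, hi = model.conf_int().loc["vaccinated"]
    # VE = (1 - adjusted OR) x 100; the CI bounds swap because of the sign flip.
    return (1 - np.exp(beta)) * 100, ((1 - np.exp(hi)) * 100, (1 - np.exp(lo)) * 100)
```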
Subject(s)
COVID-19 , Humans , Adult , COVID-19/epidemiology , COVID-19/prevention & control , SARS-CoV-2/genetics , Case-Control Studies , Odds Ratio , Vaccination
ABSTRACT
BACKGROUND: Modern artificial intelligence (AI) and machine learning (ML) methods are now capable of completing tasks with performance characteristics comparable to those of expert human operators. As a result, many areas throughout healthcare are incorporating these technologies, including in vitro diagnostics and, more broadly, laboratory medicine. However, reviews of the current landscape, likely future, and challenges of applying AI/ML in laboratory medicine remain limited. CONTENT: In this review, we begin with a brief introduction to AI and its subfield of ML. The ensuing sections describe ML systems that are currently in clinical laboratory practice or have been proposed for such use in the recent literature, ML systems that use laboratory data outside the clinical laboratory, challenges to the adoption of ML, and future opportunities for ML in laboratory medicine. SUMMARY: AI and ML have influenced, and will continue to influence, the practice and scope of laboratory medicine dramatically. This has been made possible by advancements in modern computing and the widespread digitization of health information. These technologies are being rapidly developed and described, but their implementation thus far has been comparatively modest. To spur the implementation of reliable and sophisticated ML-based technologies, we need to further establish best practices and improve our information systems and communication infrastructure. The participation of the clinical laboratory community is essential to ensure that laboratory data are sufficiently available and incorporated conscientiously into robust, safe, and clinically effective ML-supported clinical diagnostics.
Subject(s)
Artificial Intelligence , Medicine , Delivery of Health Care , Humans , Laboratories , Machine Learning
ABSTRACT
BACKGROUND: Clinical babesiosis is diagnosed, and parasite burden is determined, by microscopic inspection of a thick or thin Giemsa-stained peripheral blood smear. However, quantitative analysis by manual microscopy is subject to error. As such, methods for the automated measurement of percent parasitemia in digital microscopic images of peripheral blood smears could improve clinical accuracy relative to the predicate method. METHODS: Individual erythrocyte images were manually labeled as "parasite" or "normal" and were used to train a model for binary image classification. The best model was then used to calculate percent parasitemia from a clinical validation dataset, and values were compared to a clinical reference value. Lastly, model interpretability was examined using integrated gradients to identify the pixels most likely to influence classification decisions. RESULTS: The precision and recall of the model during development testing were 0.92 and 1.00, respectively. In clinical validation, the model returned an increasing positive signal with increasing mean reference value. However, the model returned 2 highly erroneous false positive values, and it incorrectly assessed 3 cases well above the clinical threshold of 10%. The integrated gradients suggested potential sources of false positives, including rouleaux formations, cell boundaries, and precipitate, as deterministic factors in negative erythrocyte images. CONCLUSIONS: While the model demonstrated highly accurate single-cell classification and correctly assessed most slides, several false positives were highly incorrect. This project highlights the need for integrated testing of machine learning-based models, even when models perform well in the development phase.
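The aggregation step that turns per-cell predictions into a slide-level result is simple enough to show directly. The sketch below is a minimal, assumed implementation: any binary classifier with probability outputs could feed it, and the 0.5 cutoff is an illustrative choice rather than the authors' operating point.

```python
# Sketch of the slide-level aggregation: per-erythrocyte "parasite" probabilities
# are thresholded and summarized as percent parasitemia. The 0.5 cutoff and the
# simulated probabilities are illustrative assumptions, not the authors' model.
import numpy as np

def percent_parasitemia(cell_probs: np.ndarray, threshold: float = 0.5) -> float:
    """cell_probs: per-cell probability of the 'parasite' class for one slide."""
    calls = cell_probs >= threshold
    return 100.0 * calls.sum() / len(cell_probs)

# Example: 2,000 segmented cells, 60 of which (3%) score above the threshold.
rng = np.random.default_rng(0)
probs = np.concatenate([rng.uniform(0.0, 0.3, 1940), rng.uniform(0.7, 1.0, 60)])
print(f"Estimated parasitemia: {percent_parasitemia(probs):.2f}%")  # 3.00%
```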
Subject(s)
Babesia , Parasitemia , Erythrocytes , Humans , Microscopy/methods , Neural Networks, Computer , Parasitemia/diagnosis
ABSTRACT
BACKGROUND: The electronic health record (EHR) holds the prospect of providing more complete and timely access to clinical information for biomedical research, quality assessments, and quality improvement compared to other data sources, such as administrative claims. In this study, we sought to assess the completeness and timeliness of structured diagnoses in the EHR compared to computed diagnoses for hypertension (HTN), hyperlipidemia (HLD), and diabetes mellitus (DM). METHODS: We determined the amount of time for a structured diagnosis to be recorded in the EHR from when an equivalent diagnosis could be computed from other structured data elements, such as vital signs and laboratory results. We used EHR data for encounters from January 1, 2012 through February 10, 2019 from an academic health system. Diagnoses for HTN, HLD, and DM were computed for patients with at least two observations above threshold separated by at least 30 days, where the thresholds were outpatient blood pressure of ≥ 140/90 mmHg, any low-density lipoprotein ≥ 130 mg/dl, or any hemoglobin A1c ≥ 6.5%, respectively. The primary measure was the length of time between the computed diagnosis and the time at which a structured diagnosis could be identified within the EHR history or problem list. RESULTS: We found that 39.8% of those with HTN, 21.6% with HLD, and 5.2% with DM did not receive a corresponding structured diagnosis recorded in the EHR. For those who received a structured diagnosis, a mean of 389, 198, and 166 days elapsed before the patient had the corresponding diagnosis of HTN, HLD, or DM, respectively, recorded in the EHR. CONCLUSIONS: We found a marked temporal delay between when a diagnosis can be computed or inferred and when an equivalent structured diagnosis is recorded within the EHR. These findings demonstrate the continued need for additional study of the EHR to avoid bias when using observational data and reinforce the need for computational approaches to identify clinical phenotypes.
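A minimal sketch of the computed-diagnosis rule described above is given below, assuming a per-patient observation table with hypothetical column names; the thresholds and the 30-day separation follow the abstract.

```python
# Sketch of the computed-diagnosis rule: at least two above-threshold observations
# separated by >= 30 days. The per-patient DataFrame layout and column names are
# hypothetical; thresholds follow the abstract (e.g., LDL >= 130 mg/dl).
from typing import Optional

import pandas as pd

def computed_diagnosis_date(obs: pd.DataFrame, threshold: float) -> Optional[pd.Timestamp]:
    """obs: columns ['date', 'value'] for one patient and one measurement type."""
    above = obs.loc[obs["value"] >= threshold].sort_values("date")
    if len(above) < 2:
        return None
    first = above["date"].iloc[0]
    later = above.loc[above["date"] >= first + pd.Timedelta(days=30), "date"]
    # The diagnosis becomes computable at the second qualifying observation.
    return later.iloc[0] if len(later) else None

def days_to_structured_diagnosis(computed: pd.Timestamp, structured: pd.Timestamp) -> int:
    """Primary measure: lag from the computed diagnosis to the structured EHR diagnosis."""
    return (structured - computed).days
```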
Subject(s)
Diabetes Mellitus , Hypertension , Diabetes Mellitus/diagnosis , Diabetes Mellitus/epidemiology , Electronic Health Records , Humans , Hypertension/diagnosis , Hypertension/epidemiology , Information Storage and Retrieval , Outpatients
ABSTRACT
Pathogen-reduced (PR) platelets are routinely used in many countries. Some studies have reported changes in platelet and red blood cell (RBC) transfusion requirements in patients who received PR platelets when compared to conventional (CONV) platelets. Over a 28-month period, we retrospectively analysed platelet utilisation, RBC transfusion trends, and transfusion reaction rates for all adult patients transfused at the Yale-New Haven Hospital, New Haven, CT, USA. We determined the number of RBC and platelet components administered between 2 and 24, 48, 72 or 96 h. A total of 3,767 patients received 21,907 platelet components (CONV = 8,912; PR = 12,995); 1,087 patients received only CONV platelets (1,578 components) and 1,466 patients received only PR platelets (2,604 components). The number of subsequently transfused platelet components was slightly higher following PR platelet components (P < 0·05); however, fewer RBCs were transfused following PR platelet administration (P < 0·05). The mean time-to-next platelet component transfusion was slightly shorter following PR platelet transfusion (P = 0·002). The rate of non-septic transfusion reactions did not differ (all P > 0·05). Septic transfusion reactions (N = 5) were seen only after CONV platelet transfusions (P = 0·011). These results provide evidence for comparable clinical efficacy of PR and CONV platelets. PR platelets eliminated septic transfusion reactions without an increased risk of other types of transfusion reactions and with only a slight increase in platelet utilisation.
Subject(s)
Blood Platelets , Disinfection , Platelet Transfusion/adverse effects , Transfusion Reaction/epidemiology , Adult , Female , Humans , Male , Middle Aged
ABSTRACT
STUDY OBJECTIVE: The goal of this study is to create a predictive, interpretable model of early hospital respiratory failure among emergency department (ED) patients admitted with coronavirus disease 2019 (COVID-19). METHODS: This was an observational, retrospective cohort study from a 9-ED health system of admitted adult patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and an oxygen requirement less than or equal to 6 L/min. We sought to predict respiratory failure within 24 hours of admission, as defined by an oxygen requirement of greater than 10 L/min by low-flow device, high-flow device, noninvasive or invasive ventilation, or death. Predictive models were compared with the Elixhauser Comorbidity Index, quick Sequential [Sepsis-related] Organ Failure Assessment, and the CURB-65 pneumonia severity score. RESULTS: During the study period, from March 1 to April 27, 2020, 1,792 patients were admitted with COVID-19, 620 (35%) of whom had respiratory failure in the ED. Of the remaining 1,172 admitted patients, 144 (12.3%) met the composite endpoint within the first 24 hours of hospitalization. On the independent test cohort, both a novel bedside scoring system, the quick COVID-19 Severity Index (area under the receiver operating characteristic curve mean 0.81 [95% confidence interval {CI} 0.73 to 0.89]), and a machine-learning model, the COVID-19 Severity Index (mean 0.76 [95% CI 0.65 to 0.86]), outperformed the Elixhauser mortality index (mean 0.61 [95% CI 0.51 to 0.70]), CURB-65 (0.50 [95% CI 0.40 to 0.60]), and quick Sequential [Sepsis-related] Organ Failure Assessment (0.59 [95% CI 0.50 to 0.68]). A low quick COVID-19 Severity Index score was associated with a less than 5% risk of respiratory decompensation in the validation cohort. CONCLUSION: A significant proportion of admitted COVID-19 patients progress to respiratory failure within 24 hours of admission. These events are accurately predicted using bedside respiratory examination findings within a simple scoring system.
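The model-comparison step (discrimination on an independent test cohort) can be sketched with scikit-learn as below. The score columns are placeholders; this is not the quick COVID-19 Severity Index point assignment itself, which is defined in the source paper.

```python
# Sketch of the discrimination comparison on a held-out test cohort: area under
# the ROC curve for each candidate score against the 24-hour respiratory-failure
# outcome. The DataFrame and score columns are placeholders.
from sklearn.metrics import roc_auc_score

def compare_scores(test_df, outcome_col="resp_failure_24h",
                   score_cols=("qcsi", "csi_ml", "elixhauser", "curb65", "qsofa")):
    y = test_df[outcome_col]
    return {col: roc_auc_score(y, test_df[col]) for col in score_cols}
```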
Subject(s)
Coronavirus Infections/complications , Coronavirus Infections/diagnosis , Emergency Service, Hospital , Pneumonia, Viral/complications , Pneumonia, Viral/diagnosis , Respiratory Insufficiency/virology , Severity of Illness Index , Adolescent , Adult , Aged , Betacoronavirus , COVID-19 , COVID-19 Testing , Clinical Laboratory Techniques , Coronavirus Infections/therapy , Female , Humans , Male , Middle Aged , Oxygen Inhalation Therapy , Pandemics , Pneumonia, Viral/therapy , Respiratory Insufficiency/therapy , Retrospective Studies , Risk Assessment/methods , SARS-CoV-2 , Young Adult
ABSTRACT
The ongoing coronavirus disease outbreak demonstrates the need for novel applications of real-time data to produce timely information about incident cases. Using health information technology (HIT) and real-world data, we sought to produce an interface that could, in near real time, identify patients presenting with suspected respiratory tract infection and enable monitoring of test results related to specific pathogens, including severe acute respiratory syndrome coronavirus 2. This tool was built upon our computational health platform, which provides access to near real-time data from disparate HIT sources across our health system. This combination of technology allowed us to rapidly prototype, iterate, and deploy a platform to support a cohesive organizational response to a rapidly evolving outbreak. Platforms that allow for agile analytics are needed to keep pace with evolving needs within the health care system.
Subject(s)
Betacoronavirus , Coronavirus Infections/epidemiology , Delivery of Health Care/statistics & numerical data , Medical Informatics/methods , Pneumonia, Viral/epidemiology , Public Health Surveillance/methods , COVID-19 , Disease Outbreaks/statistics & numerical data , Humans , Pandemics , SARS-CoV-2 , Time Factors
ABSTRACT
OBJECTIVES: To assess the safety and efficacy of a Food and Drug Administration-approved pathogen-reduced platelet (PLT) product in children, because questions regarding their use in this population remain. STUDY DESIGN: We report findings from a quality assurance review of PLT utilization, associated red blood cell transfusion trends, and short-term safety of conventional vs pathogen-reduced PLTs over a 21-month period while transitioning from conventional to pathogen-reduced PLTs at a large, tertiary care hospital. We assessed utilization in neonatal intensive care unit (NICU) patients, infants 0-1 year not in the NICU, and children aged 1-18 years (PED). RESULTS: In the 48 hours after an index conventional or pathogen-reduced platelet transfusion, respectively, NICU patients received 1.0 ± 1.4 (n = 91 transfusions) compared with 1.2 ± 1.3 (n = 145) additional platelet doses (P = .29); infants 0-1 year not in the NICU received 2.8 ± 3.0 (n = 125) vs 2.6 ± 2.6 (n = 254) additional platelet doses (P = .57); and PED patients received 0.9 ± 1.6 (n = 644) vs 1.4 ± 2.2 (n = 673) additional doses (P < .001). Time to subsequent transfusion and red cell utilization were similar in every group (P > .05). The number and type of transfusion reactions did not vary significantly by PLT type, and no rashes were reported in NICU patients receiving phototherapy and pathogen-reduced PLTs. CONCLUSIONS: Conventional and pathogen-reduced PLTs had similar utilization patterns in our pediatric populations. A small, but statistically significant, increase in transfusions was noted following pathogen-reduced PLT transfusion in PED patients, but not in other groups. Red cell utilization and transfusion reactions were similar for both products in all age groups.
Subject(s)
Platelet Transfusion/adverse effects , Transfusion Reaction/epidemiology , Adolescent , Bacterial Infections/prevention & control , Child , Child, Preschool , Humans , Infant , Infant, Newborn , Infection Control , Platelet Transfusion/statistics & numerical data , Procedures and Techniques Utilization/statistics & numerical data , Virus Diseases/prevention & control
ABSTRACT
BACKGROUND: Health care data are increasing in volume and complexity. Storing and analyzing these data to implement precision medicine initiatives and data-driven research has exceeded the capabilities of traditional computer systems. Modern big data platforms must be adapted to the specific demands of health care and designed for scalability and growth. OBJECTIVE: The objectives of our study were to (1) demonstrate the implementation of a data science platform built on open source technology within a large, academic health care system and (2) describe 2 computational health care applications built on such a platform. METHODS: We deployed a data science platform based on several open source technologies to support real-time, big data workloads. We developed data-acquisition workflows for Apache Storm and NiFi in Java and Python to capture patient monitoring and laboratory data for downstream analytics. RESULTS: Emerging data management approaches, along with open source technologies such as Hadoop, can be used to create integrated data lakes to store large, real-time datasets. This infrastructure also provides a robust analytics platform where health care and biomedical research data can be analyzed in near real time for precision medicine and computational health care use cases. CONCLUSIONS: The implementation and use of integrated data science platforms offer organizations the opportunity to combine traditional datasets, including data from the electronic health record, with emerging big data sources, such as continuous patient monitoring and real-time laboratory results. These platforms can enable cost-effective and scalable analytics for the information that will be key to the delivery of precision medicine initiatives. Organizations that can take advantage of the technical advances found in data science platforms will have the opportunity to provide comprehensive access to health care data for computational health care and precision medicine research.
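As a purely illustrative sketch of a real-time acquisition workflow of the kind described, the snippet below consumes streaming records and lands them in a date-partitioned data-lake directory. Kafka (via kafka-python) stands in for the Apache Storm/NiFi pipelines named above, and the topic name, brokers, and lake path are assumptions.

```python
# Illustrative acquisition sketch only: the platform described uses Apache Storm
# and NiFi; here a Kafka consumer (kafka-python) stands in as a generic streaming
# source. The topic name, brokers, and data-lake path are assumptions.
import json
from datetime import datetime, timezone
from pathlib import Path

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "lab-results",                             # hypothetical topic
    bootstrap_servers=["broker:9092"],         # hypothetical brokers
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

LAKE = Path("/data/lake/lab_results")          # hypothetical data-lake root

for message in consumer:
    record = message.value
    record["ingested_at"] = datetime.now(timezone.utc).isoformat()
    # Partition by ingestion date so downstream analytics can scan narrow slices.
    day_dir = LAKE / datetime.now(timezone.utc).strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    with open(day_dir / "records.jsonl", "a") as fh:
        fh.write(json.dumps(record) + "\n")
```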
Subject(s)
Data Science/methods , Delivery of Health Care/methods , Medical Informatics/methods , Precision Medicine/methods , Humans
ABSTRACT
BACKGROUND: The current acute kidney injury (AKI) risk prediction model for patients undergoing percutaneous coronary intervention (PCI) from the American College of Cardiology (ACC) National Cardiovascular Data Registry (NCDR) employed regression techniques. This study aimed to evaluate whether models using machine learning techniques could significantly improve AKI risk prediction after PCI. METHODS AND FINDINGS: We used the same cohort and candidate variables used to develop the current NCDR CathPCI Registry AKI model, including 947,091 patients who underwent PCI procedures between June 1, 2009, and June 30, 2011. The mean age of these patients was 64.8 years, and 32.8% were women, with a total of 69,826 (7.4%) AKI events. We replicated the current AKI model as the baseline model and compared it with a series of new models. Temporal validation was performed using data from 970,869 patients undergoing PCIs between July 1, 2016, and March 31, 2017, with a mean age of 65.7 years; 31.9% were women, and 72,954 (7.5%) had AKI events. Each model was derived by implementing one of two strategies for preprocessing candidate variables (preselecting and transforming candidate variables or using all candidate variables in their original forms), one of three variable-selection methods (stepwise backward selection, lasso regularization, or permutation-based selection), and one of two methods to model the relationship between variables and outcome (logistic regression or gradient descent boosting). The cohort was divided into different training (70%) and test (30%) sets using 100 different random splits, and the performance of the models was evaluated internally in the test sets. The best model, according to the internal evaluation, was derived by using all available candidate variables in their original form, permutation-based variable selection, and gradient descent boosting. Compared with the baseline model that uses 11 variables, the best model used 13 variables and achieved a significantly better area under the receiver operating characteristic curve (AUC) of 0.752 (95% confidence interval [CI] 0.749-0.754) versus 0.711 (95% CI 0.708-0.714), a significantly better Brier score of 0.0617 (95% CI 0.0615-0.0618) versus 0.0636 (95% CI 0.0634-0.0638), and a better calibration slope of observed versus predicted rate of 1.008 (95% CI 0.988-1.028) versus 1.036 (95% CI 1.015-1.056). The best model also had a significantly wider predictive range (25.3% versus 21.6%, p < 0.001) and was more accurate in stratifying AKI risk for patients. Evaluated on a more contemporary CathPCI cohort (July 1, 2015-March 31, 2017), the best model consistently achieved significantly better performance than the baseline model in AUC (0.785 versus 0.753), Brier score (0.0610 versus 0.0627), calibration slope (1.003 versus 1.062), and predictive range (29.4% versus 26.2%). The current study does not address implementation for risk calculation at the point of care, and potential challenges include the availability and accessibility of the predictors. CONCLUSIONS: Machine learning techniques and data-driven approaches resulted in improved prediction of AKI risk after PCI. The results support the potential of these techniques for improving risk prediction models and identification of patients who may benefit from risk-mitigation strategies.
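The winning modelling recipe (all candidate variables in raw form, permutation-based selection, gradient boosting, 70/30 splits) can be approximated as below, with scikit-learn standing in for the authors' implementation; the data layout and hyperparameters are assumptions.

```python
# Sketch of the best-performing recipe: gradient boosting on raw candidate
# variables with permutation-based variable selection, evaluated by AUC and
# Brier score on a 30% held-out split. scikit-learn stands in for the authors'
# implementation; the feature DataFrame and hyperparameters are assumptions.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

def fit_aki_model(X, y, n_keep=13, random_state=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state)

    # Step 1: fit on all candidate variables, rank them by permutation importance
    # computed on the training split (keeps the test split untouched).
    base = HistGradientBoostingClassifier(random_state=random_state).fit(X_tr, y_tr)
    imp = permutation_importance(base, X_tr, y_tr, scoring="roc_auc",
                                 n_repeats=5, random_state=random_state)
    keep = np.argsort(imp.importances_mean)[::-1][:n_keep]

    # Step 2: refit on the selected variables and evaluate on the held-out split.
    model = HistGradientBoostingClassifier(random_state=random_state)
    model.fit(X_tr.iloc[:, keep], y_tr)
    p = model.predict_proba(X_te.iloc[:, keep])[:, 1]
    return model, keep, {"auc": roc_auc_score(y_te, p),
                         "brier": brier_score_loss(y_te, p)}
```

Repeating this over many random splits, as the study did with 100 splits, would yield the distribution of AUC and Brier scores used for the internal comparison.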
Subject(s)
Acute Kidney Injury/etiology , Data Mining/methods , Decision Support Techniques , Machine Learning , Percutaneous Coronary Intervention/adverse effects , Acute Kidney Injury/diagnosis , Acute Kidney Injury/prevention & control , Aged , Clinical Decision-Making , Female , Humans , Male , Middle Aged , Protective Factors , Registries , Reproducibility of Results , Retrospective Studies , Risk Assessment , Risk Factors , Time Factors , Treatment Outcome
ABSTRACT
BACKGROUND: Morphologic profiling of the erythrocyte population is a widely used and clinically valuable diagnostic modality, but one that relies on a slow manual process associated with significant labor cost and limited reproducibility. Automated profiling of erythrocytes from digital images by capable machine learning approaches would augment the throughput and value of morphologic analysis. To this end, we sought to evaluate the performance of leading implementation strategies for convolutional neural networks (CNNs) when applied to the classification of erythrocytes based on morphology. METHODS: Erythrocytes were manually classified into 1 of 10 classes using a custom-developed Web application. Using recent literature to guide architectural considerations for neural network design, we implemented a "very deep" CNN, consisting of >150 layers, with dense shortcut connections. RESULTS: The final database comprised 3737 labeled cells. On unseen data, ensemble model predictions achieved harmonic means of recall and precision across classes of 92.70% and 89.39%, respectively. Of the 748 cells in the test set, 23 misclassification errors were made, with a correct classification frequency of 90.60%, represented as a harmonic mean across the 10 morphologic classes. CONCLUSIONS: These findings indicate that erythrocyte morphology profiles could be measured with a high degree of accuracy with "very deep" CNNs. Further, these data support future efforts to expand classes and optimize practical performance in a clinical environment as a prelude to full implementation as a clinical tool.
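For orientation, the sketch below shows a small shortcut-connected CNN in Keras. It is deliberately shallow and is not the authors' >150-layer architecture; only the use of residual (shortcut) connections and the 10 output classes follow the abstract.

```python
# Minimal shortcut-connected CNN in Keras, deliberately shallow: this is not the
# authors' >150-layer architecture, only an illustration of residual (shortcut)
# connections with 10 output classes. Input size and filter counts are assumptions.
from tensorflow.keras import layers, models

def residual_block(x, filters):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:  # 1x1 conv so the channel counts match
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

def build_model(input_shape=(72, 72, 3), n_classes=10, widths=(32, 64, 128)):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    for filters in widths:
        x = residual_block(x, filters)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```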
Subject(s)
Erythrocytes/cytology , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Databases, Factual , Erythrocytes/pathology , Humans
ABSTRACT
While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences.
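A minimal version of the kind of benchmark described above is sketched below: the same rsID lookup timed against MongoDB (pymongo) and MySQL (mysql-connector). Connection details, database and table names, indexes, and the query itself are assumptions, and a real comparison would also time bulk loading and index construction as the study did.

```python
# Illustrative timing harness for the comparison described above: the same
# dbSNP-style lookup issued against MongoDB (pymongo) and MySQL (mysql-connector).
# Connection details, database/collection/table names, and the rsID are assumptions.
import time

import mysql.connector
import pymongo

def mean_seconds(fn, repeats=100):
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

mongo_coll = pymongo.MongoClient("mongodb://localhost:27017")["dbsnp"]["annotations"]
mysql_conn = mysql.connector.connect(host="localhost", user="bench",
                                     password="bench", database="dbsnp")

def mongo_lookup():
    mongo_coll.find_one({"rsid": "rs12345"})

def mysql_lookup():
    cur = mysql_conn.cursor()
    cur.execute("SELECT * FROM annotations WHERE rsid = %s", ("rs12345",))
    cur.fetchall()
    cur.close()

print("MongoDB mean lookup (s):", mean_seconds(mongo_lookup))
print("MySQL mean lookup (s):", mean_seconds(mysql_lookup))
```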