Results 1 - 20 of 57
1.
Ann Thorac Surg ; 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38065331

ABSTRACT

BACKGROUND: We previously showed that machine learning-based methodologies of optimal classification trees (OCTs) can accurately predict risk after congenital heart surgery and assess case-mix-adjusted performance after benchmark procedures. We extend this methodology to provide interpretable, easily accessible, and actionable hospital performance analysis across all procedures. METHODS: The European Congenital Heart Surgeons Association Congenital Cardiac Database data subset of 172,888 congenital cardiac surgical procedures performed in European centers between 1989 and 2022 was analyzed. OCT models (decision trees) were built predicting hospital mortality (area under the curve [AUC], 0.866), prolonged postoperative mechanical ventilatory support time (AUC, 0.851), or hospital length of stay (AUC, 0.818), thereby establishing case-adjusted benchmarking standards reflecting the overall performance of all participating hospitals, designated as the "virtual hospital." OCT analysis of virtual hospital aggregate data yielded predicted expected outcomes (both aggregate and for risk-matched patient cohorts) for the individual hospital's own specific case-mix, readily available on-line. RESULTS: Raw average rates were hospital mortality, 4.9%; mechanical ventilatory support time, 14.5%; and length of stay, 15.0%. Of 146 participating centers, compared with each hospital's overall case-adjusted predicted hospital mortality benchmark, 20.5% statistically (<90% CI) overperformed and 20.5% underperformed. An interactive tool based on the OCT analysis automatically reveals 14 hospital-specific patient cohorts, simultaneously assessing overperformance or underperformance, and enabling further analysis of cohort strata in any chosen time frame. 
CONCLUSIONS: Machine learning-based OCT benchmarking analysis provides automatic assessment of hospital-specific case-adjusted performance after congenital heart surgery, not only overall but importantly, also by similar risk patient cohorts. This is a tool for hospital self-assessment, particularly facilitated by the user-accessible online-platform.

2.
Surgery ; 174(6): 1302-1308, 2023 12.
Article in English | MEDLINE | ID: mdl-37778969

ABSTRACT

BACKGROUND: Existent methodologies for benchmarking the quality of surgical care are linear and fail to capture the complex interactions of preoperative variables. We sought to leverage novel nonlinear artificial intelligence methodologies to benchmark emergency surgical care. METHODS: Using a nonlinear but interpretable artificial intelligence methodology called optimal classification trees, first, the overall observed mortality rate at the index hospital's emergency surgery population (index cohort) was compared to the risk-adjusted expected mortality rate calculated by the optimal classification trees from the American College of Surgeons National Surgical Quality Improvement Program database (benchmark cohort). Second, the artificial intelligence optimal classification trees created different "nodes" of care representing specific patient phenotypes defined by the artificial intelligence optimal classification trees without human interference to optimize prediction. These nodes capture multiple iterative risk-adjusted comparisons, permitting the identification of specific areas of excellence and areas for improvement. RESULTS: The index and benchmark cohorts included 1,600 and 637,086 patients, respectively. The observed and risk-adjusted expected mortality rates of the index cohort calculated by optimal classification trees were similar (8.06% [95% confidence interval: 6.8-9.5] vs 7.53%, respectively, P = .42). Two areas of excellence and 4 for improvement were identified. For example, the index cohort had lower-than-expected mortality when patients were older than 75 and in respiratory failure and septic shock preoperatively but higher-than-expected mortality when patients had respiratory failure preoperatively and were thrombocytopenic, with an international normalized ratio ≤1.7. CONCLUSION: We used artificial intelligence methodology to benchmark the quality of emergency surgical care. 
Such nonlinear and interpretable methods promise a more comprehensive evaluation and a deeper dive into areas of excellence versus suboptimal care.
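
The overall comparison described in this abstract (an observed mortality rate with a 95% confidence interval, set against a model-derived expected rate) can be sketched in a few lines. This is a minimal illustration, not the authors' method: the death count of 129 is an assumption back-calculated from 8.06% of 1,600 patients, the Wilson score interval is only one of several reasonable interval choices, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumption: 129 deaths, inferred from the reported 8.06% of 1,600 patients.
observed_deaths, n_patients = 129, 1600
# Risk-adjusted expected mortality rate reported in the abstract.
expected_rate = 0.0753

low, high = wilson_interval(observed_deaths, n_patients)
within_expected = low <= expected_rate <= high

print(f"observed {observed_deaths / n_patients:.2%}, 95% CI {low:.1%}-{high:.1%}")
print("expected rate inside the interval:", within_expected)
```

Under these assumptions the interval contains the expected rate, which is consistent with the abstract's statement that the observed and expected rates were similar. The same check could be repeated within each tree node to flag areas of excellence or areas for improvement.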


Subjects
Emergency Medical Services; Respiratory Insufficiency; Humans; Artificial Intelligence; Benchmarking; Databases, Factual
3.
JCO Clin Cancer Inform ; 7: e2300026, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37843071

ABSTRACT

PURPOSE: Abundant literature and clinical trials indicate that routine cancer screenings decrease patient mortality for several common cancers. However, current national cancer screening guidelines heavily rely on patient age as the predominant factor in deciding cancer screening timing, neglecting other important medical characteristics of individual patients. This approach either delays screening or prescribes excessive screenings. Another disadvantage of the current approach is its inability to combine information across hospital systems because of the lack of a coherent records system. METHODS: We propose to use claims data and medical insurance transactions that use consistent and pre-established sets of codes for diagnosis, procedures, and medications to develop a clinical support tool to supply supplemental insights and precautions for physicians to make more informed decisions. Furthermore, we propose a novel machine learning framework to recommend personalized, data-driven, and dynamic screening decisions. RESULTS: We apply this new method to the study of breast cancer mammograms using claims data from 378,840 female patients to demonstrate that across different risk populations, personalized screening reduces the average delay in a cancer diagnosis by 2-3 months with statistical significance, with even stronger benefits for individual patients up to 10 months. CONCLUSION: Incorporating personal medical characteristics using claims data and novel machine learning methodologies into breast cancer screening improves screening delay by more dynamically considering changing patient risks. Future incorporation of the proposed methodology in health care settings could be provided as a potential support tool for clinicians.


Subjects
Breast Neoplasms; Physicians; Humans; Female; Breast Neoplasms/diagnosis; Breast Neoplasms/prevention & control; Early Detection of Cancer; Mammography
4.
EClinicalMedicine ; 64: 102200, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37731933

ABSTRACT

Background: There are several models that predict the risk of recurrence following resection of localised, primary gastrointestinal stromal tumour (GIST). However, assessment of calibration is not always feasible and when performed, calibration of current GIST models appears to be suboptimal. We aimed to develop a prognostic model to predict the recurrence of GIST after surgery with both good discrimination and calibration by uncovering and harnessing the non-linear relationships among variables that predict recurrence. Methods: In this observational cohort study, the data of 395 adult patients who underwent complete resection (R0 or R1) of a localised, primary GIST in the pre-imatinib era at Memorial Sloan Kettering Cancer Center (NY, USA) (recruited 1982-2001) and a European consortium (Spanish Group for Research in Sarcomas, 80 sites) (recruited 1987-2011) were used to train an interpretable Artificial Intelligence (AI)-based model called Optimal Classification Trees (OCT). The OCT predicted the probability of recurrence after surgery by capturing non-linear relationships among predictors of recurrence. The data of an additional 596 patients from another European consortium (Polish Clinical GIST Registry, 7 sites) (recruited 1981-2013) who were also treated in the pre-imatinib era were used to externally validate the OCT predictions with regard to discrimination (Harrell's C-index and Brier score) and calibration (calibration curve, Brier score, and Hosmer-Lemeshow test). The calibration of the Memorial Sloan Kettering (MSK) GIST nomogram was used as a comparative gold standard. We also evaluated the clinical utility of the OCT and the MSK nomogram by performing a Decision Curve Analysis (DCA). Findings: The internal cohort included 395 patients (median [IQR] age, 63 [54-71] years; 214 men [54.2%]) and the external cohort included 556 patients (median [IQR] age, 60 [52-68] years; 308 men [55.4%]). 
The Harrell's C-index of the OCT in the external validation cohort was greater than that of the MSK nomogram (0.805 (95% CI: 0.803-0.808) vs 0.788 (95% CI: 0.786-0.791), respectively). In the external validation cohort, the slope and intercept of the calibration curve of the main OCT were 1.041 and 0.038, respectively. In comparison, the slope and intercept of the calibration curve for the MSK nomogram was 0.681 and 0.032, respectively. The MSK nomogram overestimated the recurrence risk throughout the entire calibration curve. Of note, the Brier score was lower for the OCT compared to the MSK nomogram (0.147 vs 0.564, respectively), and the Hosmer-Lemeshow test was insignificant (P = 0.087) for the OCT model but significant (P < 0.001) for the MSK nomogram. Both results confirmed the superior discrimination and calibration of the OCT over the MSK nomogram. A decision curve analysis showed that the AI-based OCT model allowed for superior decision making compared to the MSK nomogram for both patients with 25-50% recurrence risk as well as those with >50% risk of recurrence. Interpretation: We present the first prognostic models of recurrence risk in GIST that demonstrate excellent discrimination, calibration, and clinical utility on external validation. Additional studies for further validation are warranted. With further validation, these tools could potentially improve patient counseling and selection for adjuvant therapy. Funding: The NCI SPORE in Soft Tissue Sarcoma and NCI Cancer Center Support Grants.
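
The Brier score used in this abstract is the mean squared difference between each predicted probability and the observed binary outcome, so lower is better. The sketch below shows the calculation on made-up data; `brier_score` is a hypothetical helper name and the toy outcomes and probabilities are assumptions for illustration only, not values from the study.

```python
def brier_score(outcomes, predicted):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(outcomes) != len(predicted):
        raise ValueError("outcomes and predicted must have the same length")
    return sum((p - y) ** 2 for y, p in zip(outcomes, predicted)) / len(outcomes)


# Toy data (assumed): 1 = recurrence, 0 = no recurrence.
toy_outcomes = [1, 0, 1, 0]
toy_predicted = [0.9, 0.2, 0.6, 0.4]

score = brier_score(toy_outcomes, toy_predicted)
# Reference point: always predicting 0.5 scores 0.25 on any binary outcome set.
uninformative = brier_score(toy_outcomes, [0.5] * len(toy_outcomes))

print(f"toy model: {score:.4f}, constant 0.5 forecast: {uninformative:.4f}")
```

A model that always predicts 0.5 is a common reference point when reading Brier scores for binary outcomes, because it scores 0.25 whatever the outcomes are.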

5.
JAMA Surg ; 158(11): 1126-1132, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37703025

ABSTRACT

Importance: There is variability in practice and imaging usage to diagnose cervical spine injury (CSI) following blunt trauma in pediatric patients. Objective: To develop a prediction model to guide imaging usage, identify trends in imaging, and evaluate the PEDSPINE model. Design, Setting, and Participants: This cohort study included pediatric patients (<3 years) following blunt trauma between January 2007 and July 2017. Of 22 centers in PEDSPINE, 15 centers, comprising level 1 and 2 stand-alone pediatric hospitals, level 1 and 2 pediatric hospitals within an adult hospital, and level 1 adult hospitals, were included. Patients who died prior to obtaining cervical spine imaging were excluded. Descriptive analysis was performed to describe the population, use of imaging, and injury patterns. PEDSPINE model validation was performed. A new algorithm was derived using clinical criteria and formulation of a multiclass classification problem. Analysis took place from January to October 2022. Exposure: Blunt trauma. Main Outcomes and Measures: Primary outcome was CSI. The primary and secondary objectives were predetermined. Results: The current study, PEDSPINE II, included 9389 patients, of whom 128 (1.36%) had CSI, twice the rate in PEDSPINE (0.66%). The mean (SD) age was 1.3 (0.9) years, and 70 patients (54.7%) were male. Overall, 7113 children (80%) underwent cervical spine imaging, compared with 7882 (63%) in PEDSPINE. Several candidate models were fitted for the multiclass classification problem. After comparative analysis, the multinomial regression model was chosen with one-vs-rest area under the curve (AUC) of 0.903 (95% CI, 0.836-0.943) and was able to discriminate between bony and ligamentous injury. The abilities of the PEDSPINE and PEDSPINE II models to identify CSI were compared. 
In predicting the presence of any injury, PEDSPINE II obtained a one-vs-rest AUC of 0.885 (95% CI, 0.804-0.934), outperforming the PEDSPINE score (AUC, 0.845; 95% CI, 0.769-0.915). Conclusion and Relevance: This study found wide clinical variability in the evaluation of pediatric trauma patients with increased use of cervical spine imaging. This has implications of increased cost, increased radiation exposure, and a potential for overdiagnosis. This prediction tool could help to decrease the use of imaging, aid in clinical decision-making, and decrease hospital resource use and cost.


Subjects
Spinal Injuries; Wounds, Nonpenetrating; Adult; Child; Humans; Male; Infant; Female; Cohort Studies; Spinal Injuries/diagnostic imaging; Spinal Injuries/etiology; Wounds, Nonpenetrating/diagnostic imaging; Wounds, Nonpenetrating/complications; Cervical Vertebrae/diagnostic imaging; Cervical Vertebrae/injuries; Tomography, X-Ray Computed; Retrospective Studies; Trauma Centers
6.
JAMA Surg ; 158(10): 1088-1095, 2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37610746

ABSTRACT

Importance: The use of artificial intelligence (AI) in clinical medicine risks perpetuating existing bias in care, such as disparities in access to postinjury rehabilitation services. Objective: To leverage a novel, interpretable AI-based technology to uncover racial disparities in access to postinjury rehabilitation care and create an AI-based prescriptive tool to address these disparities. Design, Setting, and Participants: This cohort study used data from the 2010-2016 American College of Surgeons Trauma Quality Improvement Program database for Black and White patients with a penetrating mechanism of injury. An interpretable AI methodology called optimal classification trees (OCTs) was applied in an 80:20 derivation/validation split to predict discharge disposition (home vs postacute care [PAC]). The interpretable nature of OCTs allowed for examination of the AI logic to identify racial disparities. A prescriptive mixed-integer optimization model using age, injury, and gender data was allowed to "fairness-flip" the recommended discharge destination for a subset of patients while minimizing the ratio of imbalance between Black and White patients. Three OCTs were developed to predict discharge disposition: the first 2 trees used unadjusted data (one without and one with the race variable), and the third tree used fairness-adjusted data. Main Outcomes and Measures: Disparities and the discriminative performance (C statistic) were compared among fairness-adjusted and unadjusted OCTs. Results: A total of 52 468 patients were included; the median (IQR) age was 29 (22-40) years, 46 189 patients (88.0%) were male, 31 470 (60.0%) were Black, and 20 998 (40.0%) were White. A total of 3800 Black patients (12.1%) were discharged to PAC, compared with 4504 White patients (21.5%; P < .001). Examining the AI logic uncovered significant disparities in PAC discharge destination access, with race playing the second most important role. 
The prescriptive fairness adjustment recommended flipping the discharge destination of 4.5% of the patients, with the performance of the adjusted model increasing from a C statistic of 0.79 to 0.87. After fairness adjustment, disparities disappeared, and a similar percentage of Black and White patients (15.8% vs 15.8%; P = .87) had a recommended discharge to PAC. Conclusions and Relevance: In this study, we developed an accurate, machine learning-based, fairness-adjusted model that can identify barriers to discharge to postacute care. Instead of accidentally encoding bias, interpretable AI methodologies are powerful tools to diagnose and remedy system-related bias in care, such as disparities in access to postinjury rehabilitation care.

7.
Int J Radiat Oncol Biol Phys ; 117(3): 738-749, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37451472

ABSTRACT

PURPOSE: The manual segmentation of organ structures in radiation oncology treatment planning is a time-consuming and highly skilled task, particularly when treating rare tumors like sacral chordomas. This study evaluates the performance of automated deep learning (DL) models in accurately segmenting the gross tumor volume (GTV) and surrounding muscle structures of sacral chordomas. METHODS AND MATERIALS: An expert radiation oncologist contoured 5 muscle structures (gluteus maximus, gluteus medius, gluteus minimus, paraspinal, piriformis) and sacral chordoma GTV on computed tomography images from 48 patients. We trained 6 DL auto-segmentation models based on 3-dimensional U-Net and residual 3-dimensional U-Net architectures. We then implemented an average and an optimally weighted average ensemble to improve prediction performance. We evaluated algorithms with the average and standard deviation of the volumetric Dice similarity coefficient, surface Dice similarity coefficient with 2- and 3-mm thresholds, and average symmetric surface distance. One independent expert radiation oncologist assessed the clinical viability of the DL contours and determined the necessary amount of editing before they could be used in clinical practice. RESULTS: Quantitatively, the ensembles performed the best across all structures. The optimal ensemble (volumetric Dice similarity coefficient, average symmetric surface distance) was (85.5 ± 6.4, 2.6 ± 0.8; GTV), (94.4 ± 1.5, 1.0 ± 0.4; gluteus maximus), (92.6 ± 0.9, 0.9 ± 0.1; gluteus medius), (85.0 ± 2.7, 1.1 ± 0.3; gluteus minimus), (92.1 ± 1.5, 0.8 ± 0.2; paraspinal), and (78.3 ± 5.7, 1.5 ± 0.6; piriformis). The qualitative evaluation suggested that the best model could reduce the total muscle and tumor delineation time to a 19-minute average. CONCLUSIONS: Our methodology produces expert-level muscle and sacral chordoma tumor segmentation using DL and ensemble modeling. 
It can substantially augment the streamlining and accuracy of treatment planning and represents a critical step toward automated delineation of the clinical target volume in sarcoma and other disease sites.
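
The volumetric Dice similarity coefficient reported in this abstract measures the overlap between two segmentations as twice the shared volume divided by the sum of the two volumes. A minimal sketch follows, assuming each contour is represented as a set of voxel coordinates; `dice_coefficient` is a hypothetical helper name and the voxel sets are toy data, not the study's contours.

```python
def dice_coefficient(mask_a, mask_b):
    """Volumetric Dice overlap between two voxel sets: 2|A n B| / (|A| + |B|)."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Convention assumed here: two empty masks count as perfect agreement.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy voxel coordinates (assumed): expert contour vs. model contour.
expert_voxels = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
model_voxels = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}

dice = dice_coefficient(expert_voxels, model_voxels)
print(f"Dice: {dice:.3f}  ({dice * 100:.1f} on the 0-100 scale)")
```

The abstract reports Dice values on a 0 to 100 scale, which corresponds to multiplying this ratio by 100.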


Subjects
Chordoma; Deep Learning; Humans; Chordoma/diagnostic imaging; Chordoma/radiotherapy; Tomography, X-Ray Computed/methods; Algorithms; Muscles; Image Processing, Computer-Assisted/methods
8.
J Trauma Acute Care Surg ; 95(4): 565-572, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37314698

ABSTRACT

BACKGROUND: Artificial intelligence (AI) risk prediction algorithms such as the smartphone-available Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) for emergency general surgery (EGS) are superior to traditional risk calculators because they account for complex nonlinear interactions between variables, but how they compare to surgeons' gestalt remains unknown. Herein, we sought to: (1) compare POTTER to surgeons' surgical risk estimation and (2) assess how POTTER influences surgeons' risk estimation. STUDY DESIGN: A total of 150 patients who underwent EGS at a large quaternary care center between May 2018 and May 2019 were prospectively followed up for 30-day postoperative outcomes (mortality, septic shock, ventilator dependence, bleeding requiring transfusion, pneumonia), and clinical cases were systematically created representing their initial presentation. POTTER's outcome predictions for each case were also recorded. Thirty acute care surgeons with diverse practice settings and levels of experience were then randomized into two groups: 15 surgeons (SURG) were asked to predict the outcomes without access to POTTER's predictions while the remaining 15 (SURG-POTTER) were asked to predict the same outcomes after interacting with POTTER. Comparing to actual patient outcomes, the area under the curve (AUC) methodology was used to assess the predictive performance of (1) POTTER versus SURG, and (2) SURG versus SURG-POTTER. RESULTS: POTTER outperformed SURG in predicting all outcomes (mortality-AUC: 0.880 vs. 0.841; ventilator dependence-AUC: 0.928 vs. 0.833; bleeding-AUC: 0.832 vs. 0.735; pneumonia-AUC: 0.837 vs. 0.753) except septic shock (AUC: 0.816 vs. 0.820). SURG-POTTER outperformed SURG in predicting mortality (AUC: 0.870 vs. 0.841), bleeding (AUC: 0.811 vs. 0.735), pneumonia (AUC: 0.803 vs. 0.753) but not septic shock (AUC: 0.712 vs. 0.820) or ventilator dependence (AUC: 0.834 vs. 0.833). 
CONCLUSION: The AI risk calculator POTTER outperformed surgeons' gestalt in predicting the postoperative mortality and outcomes of EGS patients, and when used, improved the individual surgeons' risk prediction. Artificial intelligence algorithms, such as POTTER, could prove useful as a bedside adjunct to surgeons when preoperatively counseling patients. LEVEL OF EVIDENCE: Prognostic and Epidemiological; Level II.
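
The AUC comparisons in this abstract can be read as the probability that a randomly chosen patient who had the outcome received a higher predicted risk than a randomly chosen patient who did not. The sketch below computes that quantity by direct pairwise comparison on toy data; `auc` is a hypothetical helper name, and both sets of risk estimates are assumptions for illustration, not POTTER output or surgeon responses from the study.

```python
def auc(outcomes, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(outcomes, scores) if y == 1]
    neg = [s for y, s in zip(outcomes, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = outcome occurred within 30 days.
toy_outcomes = [1, 0, 1, 0, 0]
toy_model_risk = [0.9, 0.3, 0.6, 0.7, 0.1]    # hypothetical algorithm estimates
toy_surgeon_risk = [0.8, 0.5, 0.4, 0.6, 0.2]  # hypothetical clinician estimates

model_auc = auc(toy_outcomes, toy_model_risk)
surgeon_auc = auc(toy_outcomes, toy_surgeon_risk)
print(f"model AUC {model_auc:.3f} vs surgeon AUC {surgeon_auc:.3f}")
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of 150 cases; a rank-based formulation would be the usual choice for larger data sets.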


Subjects
Artificial Intelligence; Surgeons; Humans; Postoperative Complications/epidemiology; Postoperative Complications/etiology; Risk Assessment/methods; Prognosis
10.
Am J Surg ; 226(1): 115-121, 2023 07.
Article in English | MEDLINE | ID: mdl-36948897

ABSTRACT

BACKGROUND: New methods such as machine learning could provide accurate predictions with few statistical assumptions. We seek to develop a prediction model of pediatric surgical complications based on the pediatric National Surgical Quality Improvement Program (NSQIP). METHODS: All 2012-2018 pediatric NSQIP procedures were reviewed. The primary outcome was defined as 30-day postoperative morbidity/mortality. Morbidity was further classified as any, major, and minor. Models were developed using 2012-2017 data; 2018 data were used for independent performance evaluation. RESULTS: A total of 431,148 patients were included in the 2012-2017 training set and 108,604 in the 2018 testing set. Our prediction models had high performance in mortality prediction, with an AUC of 0.94 in the testing set. Our models outperformed the ACS-NSQIP Calculator in all categories for morbidity (AUC 0.90 for major, 0.86 for any, and 0.69 for minor complications). CONCLUSIONS: We developed a high-performing pediatric surgical risk prediction model. This tool could potentially be used to improve the quality of surgical care.


Subjects
Postoperative Complications; Quality of Health Care; Humans; Child; Risk Assessment/methods; Postoperative Complications/etiology; Quality Improvement; Machine Learning; Risk Factors; Retrospective Studies
11.
Ann Surg ; 277(1): e8-e15, 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-33378309

ABSTRACT

OBJECTIVE: We sought to assess the performance of the Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) tool in elderly emergency surgery (ES) patients. SUMMARY BACKGROUND DATA: The POTTER tool was derived using a novel Artificial Intelligence (AI)-methodology called optimal classification trees and validated for prediction of ES outcomes. POTTER outperforms all existent risk-prediction models and is available as an interactive smartphone application. Predicting outcomes in elderly patients has been historically challenging and POTTER has not yet been tested in this population. METHODS: All patients ≥65 years who underwent ES in the ACS-NSQIP 2017 database were included. POTTER's performance for 30-day mortality and 18 postoperative complications (eg, respiratory or renal failure) was assessed using c-statistic methodology, with planned sub-analyses for patients 65 to 74, 75 to 84, and 85+ years. RESULTS: A total of 29,366 patients were included, with mean age 77, 55.8% females, and 62% who underwent emergency general surgery. POTTER predicted mortality accurately in all patients over 65 (c-statistic 0.80). Its best performance was in patients 65 to 74 years (c-statistic 0.84), and its worst in patients ≥85 years (c-statistic 0.71). POTTER had the best discrimination for predicting septic shock (c-statistic 0.90), respiratory failure requiring mechanical ventilation for ≥48 hours (c-statistic 0.86), and acute renal failure (c-statistic 0.85). CONCLUSIONS: POTTER is a novel, interpretable, and highly accurate predictor of in-hospital mortality in elderly ES patients up to age 85 years. POTTER could prove useful for bedside counseling and for benchmarking of ES care.


Subjects
Artificial Intelligence; Postoperative Complications; Female; Humans; Aged; Aged, 80 and over; Male; Risk Assessment/methods; Postoperative Complications/epidemiology; Hospital Mortality; Databases, Factual; Risk Factors
12.
Commun Med (Lond) ; 2(1): 136, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36352249

ABSTRACT

BACKGROUND: During the COVID-19 pandemic there has been a strong interest in forecasts of the short-term development of epidemiological indicators to inform decision makers. In this study we evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland for the period from January through April 2021. METHODS: We evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland. These were issued by 15 different forecasting models, run by independent research teams. Moreover, we study the performance of combined ensemble forecasts. Evaluation of probabilistic forecasts is based on proper scoring rules, along with interval coverage proportions to assess calibration. The presented work is part of a pre-registered evaluation study. RESULTS: We find that many, though not all, models outperform a simple baseline model up to four weeks ahead for the considered targets. Ensemble methods show very good relative performance. The addressed time period is characterized by rather stable non-pharmaceutical interventions in both countries, making short-term predictions more straightforward than in previous periods. However, major trend changes in reported cases, like the rebound in cases due to the rise of the B.1.1.7 (Alpha) variant in March 2021, prove challenging to predict. CONCLUSIONS: Multi-model approaches can help to improve the performance of epidemiological forecasts. However, while death numbers can be predicted with some success based on current case and hospitalization data, predictability of case numbers remains low beyond quite short time horizons. Additional data sources including sequencing and mobility data, which were not extensively used in the present study, may help to improve performance.


We compare forecasts of weekly case and death numbers for COVID-19 in Germany and Poland based on 15 different modelling approaches. These cover the period from January to April 2021 and address numbers of cases and deaths one and two weeks into the future, along with the respective uncertainties. We find that combining different forecasts into one forecast can enable better predictions. However, case numbers over longer periods were challenging to predict. Additional data sources, such as information about different versions of the SARS-CoV-2 virus present in the population, might improve forecasts in the future.
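
The abstract evaluates probabilistic forecasts with proper scoring rules and interval coverage. One widely used proper score for a central prediction interval is the interval score, which adds the interval width to a penalty for observations that fall outside it; coverage is simply the share of observations that fall inside. The sketch below is an illustration under assumptions: the abstract does not name the specific scoring rule used, `interval_score` and `coverage` are hypothetical helper names, and the weekly counts are invented.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) prediction interval; lower is better."""
    width = upper - lower
    penalty_below = (2 / alpha) * max(lower - observed, 0)
    penalty_above = (2 / alpha) * max(observed - upper, 0)
    return width + penalty_below + penalty_above


def coverage(intervals, observations):
    """Share of observations that fall inside their prediction interval."""
    hits = sum(low <= y <= high for (low, high), y in zip(intervals, observations))
    return hits / len(observations)


# Toy 90% intervals (alpha = 0.1) and observed weekly counts, all assumed.
toy_intervals = [(100, 200), (150, 250), (120, 180)]
toy_observed = [150, 260, 170]

scores = [interval_score(lo, hi, y, alpha=0.1)
          for (lo, hi), y in zip(toy_intervals, toy_observed)]
empirical_coverage = coverage(toy_intervals, toy_observed)

print("interval scores:", scores)
print(f"empirical coverage: {empirical_coverage:.2f} (nominal 0.90)")
```

A well-calibrated forecaster's empirical coverage should approach the nominal level over many forecasts; three toy weeks are far too few to judge calibration and serve only to show the arithmetic.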

13.
NPJ Digit Med ; 5(1): 149, 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36127417

ABSTRACT

Artificial intelligence (AI) systems hold great promise to improve healthcare over the next decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on HAIM-MIMIC-MM, a multimodal clinical database (N = 34,537 samples) containing 7279 unique hospitalizations and 6485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text, and images), 11 unique data sources and 12 predictive tasks. We show that this framework can consistently and robustly produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6-33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48 h mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data modality importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of our Holistic AI in Medicine (HAIM) framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.
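
The abstract attributes model performance to individual modalities using Shapley values. As a rough illustration of the idea, the sketch below computes exact Shapley values for a two-modality example, where the "value" of a subset of modalities is the performance of a model trained on that subset. It is not the HAIM implementation: `shapley_values` is a hypothetical helper name and the performance numbers are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Probability that exactly the players in s precede p in a random order.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Assumed performance (e.g. AUC) of a model trained on each subset of modalities.
toy_performance = {
    frozenset(): 0.5,
    frozenset({"tabular"}): 0.7,
    frozenset({"image"}): 0.6,
    frozenset({"tabular", "image"}): 0.8,
}

contributions = shapley_values(["tabular", "image"], toy_performance.__getitem__)
print(contributions)
```

The contributions sum to the gain of the full model over the empty baseline, which is the property that makes Shapley values attractive for comparing modalities. Exact enumeration grows exponentially with the number of players, so larger settings typically rely on approximations.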

15.
JAMA Surg ; 157(8): e221819, 2022 08 01.
Article in English | MEDLINE | ID: mdl-35648428

ABSTRACT

Importance: In patients with resectable colorectal cancer liver metastases (CRLM), the choice of surgical technique and resection margin are the only variables that are under the surgeon's direct control and may influence oncologic outcomes. There is currently no consensus on the optimal margin width. Objective: To determine the optimal margin width in CRLM by using artificial intelligence-based techniques developed by the Massachusetts Institute of Technology and to assess whether optimal margin width should be individualized based on patient characteristics. Design, Setting, and Participants: The internal cohort of the study included patients who underwent curative-intent surgery for KRAS-variant CRLM between January 1, 2000, and December 31, 2017, at Johns Hopkins Hospital, Baltimore, Maryland, Memorial Sloan Kettering Cancer Center, New York, New York, and Charité-University of Berlin, Berlin, Germany. Patients from institutions in France, Norway, the US, Austria, Argentina, and Japan were retrospectively identified from institutional databases and formed the external cohort of the study. Data were analyzed from April 15, 2019, to November 11, 2021. Exposures: Hepatectomy. Main Outcomes and Measures: Patients with KRAS-variant CRLM who underwent surgery between 2000 and 2017 at 3 tertiary centers formed the internal cohort (training and testing). In the training cohort, an artificial intelligence-based technique called optimal policy trees (OPTs) was used by building on random forest (RF) predictive models to infer the margin width associated with the maximal decrease in death probability for a given patient (ie, optimal margin width). The RF component was validated by calculating its area under the curve (AUC) in the testing cohort, whereas the OPT component was validated by a game theory-based approach called Shapley additive explanations (SHAP). 
Patients from international institutions formed an external validation cohort, and a new RF model was trained to externally validate the OPT-based optimal margin values. Results: This cohort study included a total of 1843 patients (internal cohort, 965; external cohort, 878). The internal cohort included 386 patients (median [IQR] age, 58.3 [49.0-68.7] years; 200 men [51.8%]) with KRAS-variant tumors. The AUC of the RF counterfactual model was 0.76 in both the internal training and testing cohorts, which is the highest ever reported. The recommended optimal margin widths for patient subgroups A, B, C, and D were 6, 7, 12, and 7 mm, respectively. The SHAP analysis largely confirmed this by suggesting 6 to 7 mm for subgroup A, 7 mm for subgroup B, 7 to 8 mm for subgroup C, and 7 mm for subgroup D. The external cohort included 375 patients (median [IQR] age, 61.0 [53.0-70.0] years; 218 men [58.1%]) with KRAS-variant tumors. The new RF model had an AUC of 0.78, which allowed for a reliable external validation of the OPT-based optimal margin. The external validation was successful as it confirmed the association of the optimal margin width of 7 mm with a considerable prolongation of survival in the external cohort. Conclusions and Relevance: This cohort study used artificial intelligence-based methodologies to provide a possible resolution to the long-standing debate on optimal margin width in CRLM.


Subjects
Colorectal Neoplasms , Liver Neoplasms , Artificial Intelligence , Cohort Studies , Colorectal Neoplasms/pathology , Hepatectomy/methods , Humans , Liver Neoplasms/secondary , Male , Margins of Excision , Middle Aged , Prognosis , Proto-Oncogene Proteins p21(ras) , Retrospective Studies
16.
Katharine Sherratt; Hugo Gruson; Rok Grah; Helen Johnson; Rene Niehus; Bastian Prasse; Frank Sandman; Jannik Deuschel; Daniel Wolffram; Sam Abbott; Alexander Ullrich; Graham Gibson; Evan L Ray; Nicholas G Reich; Daniel Sheldon; Yijin Wang; Nutcha Wattanachit; Lijing Wang; Jan Trnka; Guillaume Obozinski; Tao Sun; Dorina Thanou; Loic Pottier; Ekaterina Krymova; Maria Vittoria Barbarossa; Neele Leithauser; Jan Mohring; Johanna Schneider; Jaroslaw Wlazlo; Jan Fuhrmann; Berit Lange; Isti Rodiah; Prasith Baccam; Heidi Gurung; Steven Stage; Bradley Suchoski; Jozef Budzinski; Robert Walraven; Inmaculada Villanueva; Vit Tucek; Martin Smid; Milan Zajicek; Cesar Perez Alvarez; Borja Reina; Nikos I Bosse; Sophie Meakin; Pierfrancesco Alaimo Di Loro; Antonello Maruotti; Veronika Eclerova; Andrea Kraus; David Kraus; Lenka Pribylova; Bertsimas Dimitris; Michael Lingzhi Li; Soni Saksham; Jonas Dehning; Sebastian Mohr; Viola Priesemann; Grzegorz Redlarski; Benjamin Bejar; Giovanni Ardenghi; Nicola Parolini; Giovanni Ziarelli; Wolfgang Bock; Stefan Heyder; Thomas Hotz; David E. Singh; Miguel Guzman-Merino; Jose L Aznarte; David Morina; Sergio Alonso; Enric Alvarez; Daniel Lopez; Clara Prats; Jan Pablo Burgard; Arne Rodloff; Tom Zimmermann; Alexander Kuhlmann; Janez Zibert; Fulvia Pennoni; Fabio Divino; Marti Catala; Gianfranco Lovison; Paolo Giudici; Barbara Tarantino; Francesco Bartolucci; Giovanna Jona Lasinio; Marco Mingione; Alessio Farcomeni; Ajitesh Srivastava; Pablo Montero-Manso; Aniruddha Adiga; Benjamin Hurt; Bryan Lewis; Madhav Marathe; Przemyslaw Porebski; Srinivasan Venkatramanan; Rafal Bartczuk; Filip Dreger; Anna Gambin; Krzysztof Gogolewski; Magdalena Gruziel-Slomka; Bartosz Krupa; Antoni Moszynski; Karol Niedzielewski; Jedrzej Nowosielski; Maciej Radwan; Franciszek Rakowski; Marcin Semeniuk; Ewa Szczurek; Jakub Zielinski; Jan Kisielewski; Barbara Pabjan; Kirsten Holger; Yuri Kheifetz; Markus Scholz; Marcin Bodych; Maciej Filinski; Radoslaw Idzikowski; Tyll Krueger; Tomasz Ozanski; Johannes Bracher; Sebastian Funk.
Preprint in English | medRxiv | ID: ppmedrxiv-22276024

ABSTRACT

Background: Short-term forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here we report on the performance of ensembles in predicting COVID-19 cases and deaths across Europe between 08 March 2021 and 07 March 2022. Methods: We used open-source tools to develop a public European COVID-19 Forecast Hub. We invited groups globally to contribute weekly forecasts for COVID-19 cases and deaths reported from a standardised source over the next one to four weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equally-weighted average (initially the mean and then from 26th July the median) of all individual models' predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), comparing models' forecast accuracy relative to all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models' past predictive performance. Results: Over 52 weeks we collected and combined up to 28 forecast models for 32 countries. We found a weekly ensemble had a consistently strong performance across countries over time. Across all horizons and locations, the ensemble performed better on relative WIS than 84% of participating models' forecasts of incident cases (with a total N=862), and 92% of participating models' forecasts of deaths (N=746). Across a one to four week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over four weeks for incident death forecasts. 
In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among several choices of ensemble methods we found that the most influential and best choice was to use a median average of models instead of using the mean, regardless of methods of weighting component forecast models. Conclusions: Our results support combining forecasts from individual models into an ensemble in order to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance than ones based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than incident case forecasts at forecast horizons greater than two weeks. Code and data availability: All data and code are publicly available on GitHub: covid19-forecast-hub-europe/euro-hub-ensemble.
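
The quantile-wise ensemble and the interval score described in this abstract can be illustrated with a minimal sketch. It is not the Forecast Hub's code: the function names and the dictionary layout (quantile level mapped to forecast value) are assumptions made for illustration, and it computes the plain WIS for one forecast, leaving out the pairwise "relative" scaling against other models that the abstract mentions.

```python
from statistics import mean, median


def ensemble_quantiles(model_forecasts, combine=median):
    """Combine forecasts level by level with equal weights.

    model_forecasts: dict of model name -> {quantile level: value}.
    (Hypothetical layout; every model must report the same levels.)
    """
    levels = sorted(next(iter(model_forecasts.values())))
    return {q: combine([f[q] for f in model_forecasts.values()]) for q in levels}


def interval_score(lower, upper, observed, alpha):
    """Score a central (1 - alpha) interval: width plus miss penalties."""
    score = upper - lower
    if observed < lower:
        score += (2 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2 / alpha) * (observed - upper)
    return score


def weighted_interval_score(quantiles, observed):
    """WIS for one forecast given symmetric quantile levels plus the median."""
    lower_levels = sorted(q for q in quantiles if q < 0.5)
    upper_levels = sorted((q for q in quantiles if q > 0.5), reverse=True)
    total = 0.5 * abs(observed - quantiles[0.5])
    # Pair the outermost lower level with the outermost upper level, and so on.
    for lo, hi in zip(lower_levels, upper_levels):
        alpha = 2 * lo
        total += (alpha / 2) * interval_score(quantiles[lo], quantiles[hi], observed, alpha)
    return total / (len(lower_levels) + 0.5)


# Made-up forecasts from three hypothetical models; model_c is an outlier.
forecasts = {
    "model_a": {0.25: 80, 0.5: 100, 0.75: 120},
    "model_b": {0.25: 90, 0.5: 110, 0.75: 150},
    "model_c": {0.25: 60, 0.5: 400, 0.75: 900},
}
median_ensemble = ensemble_quantiles(forecasts, combine=median)
mean_ensemble = ensemble_quantiles(forecasts, combine=mean)
print(weighted_interval_score(median_ensemble, observed=105))
print(weighted_interval_score(mean_ensemble, observed=105))
```

In this toy case the single outlying model pulls the mean ensemble far from the observation while the median ensemble is unaffected, so the median ensemble receives the lower (better) score. The numbers are invented and say nothing about the study's actual results.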

17.
J Law Biosci ; 9(1): lsac012, 2022.
Article in English | MEDLINE | ID: mdl-35496981

ABSTRACT

The distribution of crucial medical goods and services in conditions of scarcity is among the most important, albeit contested, areas of public policy development. Policymakers must strike a balance between multiple efficiency and fairness objectives, while reconciling disparate value judgments from a diverse set of stakeholders. We present a general framework for combining ethical theory, data modeling, and stakeholder input in this process and illustrate through a case study on designing organ transplant allocation policies. We develop a novel analytical tool, based on machine learning and optimization, designed to facilitate efficient and wide-ranging exploration of policy outcomes across multiple objectives. Such a tool enables all stakeholders, regardless of their technical expertise, to more effectively engage in the policymaking process by developing evidence-based value judgments based on relevant tradeoffs.

18.
Surgery ; 172(1): 470-475, 2022 07.
Article in English | MEDLINE | ID: mdl-35489978

ABSTRACT

BACKGROUND: Delays in admitting high-risk emergency surgery patients to the intensive care unit result in worse outcomes and increased health care costs. We aimed to use interpretable artificial intelligence technology to create a preoperative predictor for postoperative intensive care unit need in emergency surgery patients. METHODS: A novel, interpretable artificial intelligence technology called optimal classification trees was leveraged in an 80:20 train:test split of adult emergency surgery patients in the 2007-2017 American College of Surgeons National Surgical Quality Improvement Program database. Demographics, comorbidities, and laboratory values were used to develop, train, and then validate optimal classification tree algorithms to predict the need for postoperative intensive care unit admission. The latter was defined as postoperative death or the development of 1 or more postoperative complications warranting critical care (eg, unplanned intubation, ventilator requirement ≥48 hours, cardiac arrest requiring cardiopulmonary resuscitation, and septic shock). An interactive and user-friendly application was created. C statistics were used to measure performance. RESULTS: A total of 464,861 patients were included. The mean age was 55 years, 48% were male, and 11% developed severe postoperative complications warranting critical care. The Predictive OpTimal Trees in Emergency Surgery Risk Intensive Care Unit application was created as the user-friendly interface of the complex optimal classification tree algorithms. The number of questions (ie, tree depths) needed to predict intensive care unit admission ranged from 2 to 11. The Predictive OpTimal Trees in Emergency Surgery Risk Intensive Care Unit application had excellent discrimination for predicting the need for intensive care unit admission (C statistics: 0.89 train, 0.88 test). 
CONCLUSION: We recommend the Predictive OpTimal Trees in Emergency Surgery Risk Intensive Care Unit application as an accurate, artificial intelligence-based tool for predicting severe complications warranting intensive care unit admission after emergency surgery. The Predictive OpTimal Trees in Emergency Surgery Risk Intensive Care Unit application can prove useful to triage patients to the intensive care unit and to potentially decrease failure to rescue in emergency surgery patients.
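
The evaluation set-up named in this abstract (an 80:20 train:test split scored with the C statistic) can be sketched as follows. This is not the optimal classification tree algorithm or the authors' code; the function names are hypothetical, and the C statistic is computed directly from its definition as the share of (event, non-event) pairs in which the event case received the higher predicted risk, with ties counted as one half.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def c_statistic(labels, scores):
    """Concordance between binary outcomes (1 = event) and predicted risks."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Made-up outcomes and predicted risks, for illustration only.
train, test = train_test_split(range(10))
print(len(train), len(test))
print(c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration but would be replaced by a rank-based computation on a cohort of the size reported here.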


Subjects
Artificial Intelligence , Smartphone , Adult , Critical Care , Female , Humans , Intensive Care Units , Male , Middle Aged , Postoperative Complications/epidemiology , Postoperative Complications/etiology , Retrospective Studies
20.
Article in English | MEDLINE | ID: mdl-37965645

ABSTRACT

To promote healthy behaviors, many mobile health applications provide message-based interventions, such as tips, motivational messages, or suggestions for healthy activities. Ideally, the intervention policies should be carefully designed so that users obtain the benefits without being overwhelmed by overly frequent messages. As part of the HeartSteps physical-activity intervention, users receive messages intended to disrupt sedentary behavior. HeartSteps uses an algorithm to uniformly spread out the daily message budget over time, but does not attempt to maximize treatment effects. This limitation motivates constructing a policy to optimize the message delivery decisions for more effective treatments. Moreover, the learned policy needs to be interpretable to enable behavioral scientists to examine it and to inform future theorizing. We address this problem by learning an effective and interpretable policy that reduces sedentary behavior. We propose Optimal Policy Trees + (OPT+), an innovative batch off-policy learning method that combines personalized threshold learning with an extension of Optimal Policy Trees under a budget-constrained setting. We implement and test the method using data collected in HeartSteps V2/V3. Computational results demonstrate a significant reduction in sedentary behavior with a lower delivery budget. OPT+ produces a highly interpretable and stable output decision tree, thus enabling theoretical insights to guide future research.

...