Búsqueda | Portal Regional de la BVS

1.

Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.

Schaekermann, Mike; Spitz, Terry; Pyles, Malcolm; Cole-Lewis, Heather; Wulczyn, Ellery; Pfohl, Stephen R; Martin, Donald; Jaroensri, Ronnachai; Keeling, Geoff; Liu, Yuan; Farquhar, Stephanie; Xue, Qinghan; Lester, Jenna; Hughes, Cían; Strachan, Patricia; Tan, Fraser; Bui, Peggy; Mermel, Craig H; Peng, Lily H; Matias, Yossi; Corrado, Greg S; Webster, Dale R; Virmani, Sunny; Semturs, Christopher; Liu, Yun; Horn, Ivor; Cameron Chen, Po-Hsuan.

EClinicalMedicine ; 70: 102479, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-38685924

RESUMEN

Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods: Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients of 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p (R >0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case. Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritizing AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritizing AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritizing AI performance of age subpopulations based on DALYs. Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes. Funding: Google LLC.

2.

Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan.

Kiraly, Atilla P; Cunningham, Corbin A; Najafi, Ryan; Nabulsi, Zaid; Yang, Jie; Lau, Charles; Ledsam, Joseph R; Ye, Wenxing; Ardila, Diego; McKinney, Scott M; Pilgrim, Rory; Liu, Yun; Saito, Hiroaki; Shimamura, Yasuteru; Etemadi, Mozziyar; Melnick, David; Jansen, Sunny; Corrado, Greg S; Peng, Lily; Tse, Daniel; Shetty, Shravya; Prabhakara, Shruthi; Naidich, David P; Beladia, Neeral; Eswaran, Krish.

Radiol Artif Intell ; 6(3): e230079, 2024 05.

Artículo en Inglés | MEDLINE | ID: mdl-38477661

RESUMEN

Purpose To evaluate the impact of an artificial intelligence (AI) assistant for lung cancer screening on multinational clinical workflows. Materials and Methods An AI assistant for lung cancer screening was evaluated on two retrospective randomized multireader multicase studies where 627 (141 cancer-positive cases) low-dose chest CT cases were each read twice (with and without AI assistance) by experienced thoracic radiologists (six U.S.-based or six Japan-based radiologists), resulting in a total of 7524 interpretations. Positive cases were defined as those within 2 years before a pathology-confirmed lung cancer diagnosis. Negative cases were defined as those without any subsequent cancer diagnosis for at least 2 years and were enriched for a spectrum of diverse nodules. The studies measured the readers' level of suspicion (on a 0-100 scale), country-specific screening system scoring categories, and management recommendations. Evaluation metrics included the area under the receiver operating characteristic curve (AUC) for level of suspicion and sensitivity and specificity of recall recommendations. Results With AI assistance, the radiologists' AUC increased by 0.023 (0.70 to 0.72; P = .02) for the U.S. study and by 0.023 (0.93 to 0.96; P = .18) for the Japan study. Scoring system specificity for actionable findings increased 5.5% (57% to 63%; P < .001) for the U.S. study and 6.7% (23% to 30%; P < .001) for the Japan study. There was no evidence of a difference in corresponding sensitivity between unassisted and AI-assisted reads for the U.S. (67.3% to 67.5%; P = .88) and Japan (98% to 100%; P > .99) studies. Corresponding stand-alone AI AUC system performance was 0.75 (95% CI: 0.70, 0.81) and 0.88 (95% CI: 0.78, 0.97) for the U.S.- and Japan-based datasets, respectively. Conclusion The concurrent AI interface improved lung cancer screening specificity in both U.S.- and Japan-based reader studies, meriting further study in additional international screening environments. Keywords: Assistive Artificial Intelligence, Lung Cancer Screening, CT Supplemental material is available for this article. Published under a CC BY 4.0 license.

Asunto(s)

Inteligencia Artificial , Detección Precoz del Cáncer , Neoplasias Pulmonares , Tomografía Computarizada por Rayos X , Humanos , Neoplasias Pulmonares/diagnóstico , Neoplasias Pulmonares/epidemiología , Japón , Estados Unidos/epidemiología , Estudios Retrospectivos , Detección Precoz del Cáncer/métodos , Femenino , Masculino , Persona de Mediana Edad , Anciano , Sensibilidad y Especificidad , Interpretación de Imagen Radiográfica Asistida por Computador/métodos

3.

Lisocabtagene maraleucel for second-line relapsed or refractory large B-cell lymphoma: patient-reported outcomes from the PILOT study.

Gordon, Leo I; Liu, Fei Fei; Braverman, Julia; Hoda, Daanish; Ghosh, Nilanjan; Hamadani, Mehdi; Hildebrandt, Gerhard C; Peng, Lily; Guo, Shien; Shi, Ling; Sehgal, Alison.

Haematologica ; 109(3): 857-866, 2024 Mar 01.

Artículo en Inglés | MEDLINE | ID: mdl-37646670

RESUMEN

In the single-arm, open-label, multicenter, phase II PILOT study, second-line treatment with the chimeric antigen receptor (CAR) T-cell therapy lisocabtagene maraleucel (liso-cel) in patients with relapsed or refractory (R/R) large B-cell lymphoma (LBCL) for whom hematopoietic stem cell transplantation (HSCT) was not intended resulted in high response rates, durable responses, and a safety profile consistent with previous reports. Here, we analyzed changes in health-related quality of life (HRQOL) in patients who received liso-cel in PILOT. Patients received liso-cel, an autologous, CD19-directed, 4-1BB CAR T-cell product administered at equal target doses of CD8+ and CD4+ CAR+ T cells, for a total target dose of 100×106 CAR+ T cells. HRQOL, a secondary endpoint of PILOT, was assessed as prespecified using three patient-reported outcome instruments (EORTC QLQ-C30; FACT-LymS; EQ-5D-5L). Evaluable datasets for the EORTC QLQ-C30, FACT-LymS, and EQ-5D-5L health utility index, and visual analog scale (EQ-VAS) included 56 (92%), 49 (80%), 55 (90%), and 54 (89%) patients, respectively. Clinically meaningful improvement was achieved across most post-treatment visits for EORTC QLQ-C30 fatigue and FACT-LymS. Overall mean changes from baseline through day 545 showed significant improvements in EORTC QLQ-C30 fatigue, pain, and appetite loss, FACT-LymS, and EQ VAS. In within-patient analyses, clinically meaningful improvements or maintenance in scores were observed in most patients at days 90, 180, 270, and 365. HRQOL was maintained or improved in patients who received liso-cel as second-line therapy in PILOT. These findings support liso-cel as a preferred second-line treatment in patients with R/R LBCL not intended for HSCT (clinicaltrials gov. Identifier: NCT03483103).

Asunto(s)

Linfoma de Células B Grandes Difuso , Calidad de Vida , Humanos , Proyectos Piloto , Linfoma de Células B Grandes Difuso/terapia , Fatiga , Medición de Resultados Informados por el Paciente

4.

Lisocabtagene Maraleucel in Relapsed/Refractory Mantle Cell Lymphoma: Primary Analysis of the Mantle Cell Lymphoma Cohort From TRANSCEND NHL 001, a Phase I Multicenter Seamless Design Study.

Wang, Michael; Siddiqi, Tanya; Gordon, Leo I; Kamdar, Manali; Lunning, Matthew; Hirayama, Alexandre V; Abramson, Jeremy S; Arnason, Jon; Ghosh, Nilanjan; Mehta, Amitkumar; Andreadis, Charalambos; Solomon, Scott R; Kostic, Ana; Dehner, Christine; Espinola, Ricardo; Peng, Lily; Ogasawara, Ken; Chattin, Amy; Eliason, Laurie; Palomba, M Lia.

J Clin Oncol ; 42(10): 1146-1157, 2024 Apr 01.

Artículo en Inglés | MEDLINE | ID: mdl-38072625

RESUMEN

PURPOSE: To report the primary analysis results from the mantle cell lymphoma (MCL) cohort of the phase I seamless design TRANSCEND NHL 001 (ClinicalTrials.gov identifier: NCT02631044) study. METHODS: Patients with relapsed/refractory (R/R) MCL after ≥two lines of previous therapy, including a Bruton tyrosine kinase inhibitor (BTKi), an alkylating agent, and a CD20-targeted agent, received lisocabtagene maraleucel (liso-cel) at a target dose level (DL) of 50 × 106 (DL1) or 100 × 106 (DL2) chimeric antigen receptor-positive T cells. Primary end points were adverse events (AEs), dose-limiting toxicities, and objective response rate (ORR) by independent review committee per Lugano criteria. RESULTS: Of 104 leukapheresed patients, liso-cel was infused into 88. Median (range) number of previous lines of therapy was three (1-11) with 30% receiving ≥five previous lines of therapy, 73% of patients were age 65 years and older, 69% had refractory disease, 53% had BTKi refractory disease, 23% had TP53 mutation, and 8% had secondary CNS lymphoma. Median (range) on-study follow-up was 16.1 months (0.4-60.5). In the efficacy set (n = 83; DL1 + DL2), ORR was 83.1% (95% CI, 73.3 to 90.5) and complete response (CR) rate was 72.3% (95% CI, 61.4 to 81.6). Median duration of response was 15.7 months (95% CI, 6.2 to 24.0) and progression-free survival was 15.3 months (95% CI, 6.6 to 24.9). Most common grade ≥3 treatment-emergent AEs were neutropenia (56%), anemia (37.5%), and thrombocytopenia (25%). Cytokine release syndrome (CRS) was reported in 61% of patients (grade 3/4, 1%; grade 5, 0), neurologic events (NEs) in 31% (grade 3/4, 9%; grade 5, 0), grade ≥3 infections in 15%, and prolonged cytopenia in 40%. CONCLUSION: Liso-cel demonstrated high CR rate and deep, durable responses with low incidence of grade ≥3 CRS, NE, and infections in patients with heavily pretreated R/R MCL, including those with high-risk, aggressive disease.

Asunto(s)

Antineoplásicos , Linfoma de Células B Grandes Difuso , Linfoma de Células del Manto , Neutropenia , Adulto , Anciano , Humanos , Antineoplásicos/efectos adversos , Inmunoterapia Adoptiva/efectos adversos , Linfoma de Células B Grandes Difuso/tratamiento farmacológico , Recurrencia Local de Neoplasia/tratamiento farmacológico , Neutropenia/inducido químicamente

5.

The effects of extinction and an explicitly unpaired treatment on the reinforcing properties of a Pavlovian conditioned stimulus.

Kennedy, Nicholas G W; Holmes, Nathan M; Peng, Lily W T; Frederick Westbrook, R.

Neurobiol Learn Mem ; 207: 107879, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-38081536

RESUMEN

This series of experiments examined the effects of extinction and an explicitly unpaired treatment on the ability of a conditioned stimulus (CS) to function as a reinforcer. Rats were trained to lever press for food, exposed to pairings of a noise CS and food, and, finally, tested for their willingness to lever press for the CS in the absence of the food. Experiment 1 provided a demonstration of conditioned reinforcement (using controls that were only exposed to unpaired presentations of the CS and food) and showed that it was equivalent after one or four sessions of CS-food pairings. Experiments 2 and 3 showed that, after one session of CS-food pairings, repeated presentations of the CS alone reduced its reinforcing properties; but after four sessions of CS-food pairings, repeated presentations of the CS alone had no effect on these properties. Experiment 4 showed that, after four sessions of CS-food pairings, explicitly unpaired presentations of the CS and food completely undermined conditioned reinforcement. Finally, Experiment 5 provided within-experiment evidence that, after four sessions of CS-food pairings, the reinforcing properties of the CS were disrupted by explicitly unpaired presentations of the CS and food but spared by repeated presentations of the CS alone. Together, these findings indicate that the effectiveness of extinction in undermining the reinforcing properties of a CS depends on its level of conditioning; and that, where extinction fails to disrupt these properties, they are successfully undermined by an explicitly unpaired treatment. They are discussed with respect to findings in the literature on Pavlovian-to-instrumental transfer; and the Rescorla-Wagner model, which anticipates that an explicitly unpaired treatment will be more effective than extinction in reversing the effects of conditioning.

Asunto(s)

Condicionamiento Operante , Refuerzo en Psicología , Ratas , Animales , Condicionamiento Clásico , Extinción Psicológica

6.

Risk Stratification for Diabetic Retinopathy Screening Order Using Deep Learning: A Multicenter Prospective Study.

Bora, Ashish; Tiwari, Richa; Bavishi, Pinal; Virmani, Sunny; Huang, Rayman; Traynis, Ilana; Corrado, Greg S; Peng, Lily; Webster, Dale R; Varadarajan, Avinash V; Pattanapongpaiboon, Warisara; Chopra, Reena; Ruamviboonsuk, Paisan.

Transl Vis Sci Technol ; 12(12): 11, 2023 12 01.

Artículo en Inglés | MEDLINE | ID: mdl-38079169

RESUMEN

Purpose: Real-world evaluation of a deep learning model that prioritizes patients based on risk of progression to moderate or worse (MOD+) diabetic retinopathy (DR). Methods: This nonrandomized, single-arm, prospective, interventional study included patients attending DR screening at four centers across Thailand from September 2019 to January 2020, with mild or no DR. Fundus photographs were input into the model, and patients were scheduled for their subsequent screening from September 2020 to January 2021 in order of predicted risk. Evaluation focused on model sensitivity, defined as correctly ranking patients that developed MOD+ within the first 50% of subsequent screens. Results: We analyzed 1,757 patients, of which 52 (3.0%) developed MOD+. Using the model-proposed order, the model's sensitivity was 90.4%. Both the model-proposed order and mild/no DR plus HbA1c had significantly higher sensitivity than the random order (P < 0.001). Excluding one major (rural) site that had practical implementation challenges, the remaining sites included 567 patients and 15 (2.6%) developed MOD+. Here, the model-proposed order achieved 86.7% versus 73.3% for the ranking that used DR grade and hemoglobin A1c. Conclusions: The model can help prioritize follow-up visits for the largest subgroups of DR patients (those with no or mild DR). Further research is needed to evaluate the impact on clinical management and outcomes. Translational Relevance: Deep learning demonstrated potential for risk stratification in DR screening. However, real-world practicalities must be resolved to fully realize the benefit.

Asunto(s)

Aprendizaje Profundo , Diabetes Mellitus , Retinopatía Diabética , Humanos , Retinopatía Diabética/diagnóstico , Retinopatía Diabética/epidemiología , Estudios Prospectivos , Hemoglobina Glucada , Medición de Riesgo

7.

Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging.

Azizi, Shekoofeh; Culp, Laura; Freyberg, Jan; Mustafa, Basil; Baur, Sebastien; Kornblith, Simon; Chen, Ting; Tomasev, Nenad; Mitrovic, Jovana; Strachan, Patricia; Mahdavi, S Sara; Wulczyn, Ellery; Babenko, Boris; Walker, Megan; Loh, Aaron; Chen, Po-Hsuan Cameron; Liu, Yuan; Bavishi, Pinal; McKinney, Scott Mayer; Winkens, Jim; Roy, Abhijit Guha; Beaver, Zach; Ryan, Fiona; Krogue, Justin; Etemadi, Mozziyar; Telang, Umesh; Liu, Yun; Peng, Lily; Corrado, Greg S; Webster, Dale R; Fleet, David; Hinton, Geoffrey; Houlsby, Neil; Karthikesalingam, Alan; Norouzi, Mohammad; Natarajan, Vivek.

Nat Biomed Eng ; 7(6): 756-779, 2023 06.

Artículo en Inglés | MEDLINE | ID: mdl-37291435

RESUMEN

Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such 'out of distribution' performance problem and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1-33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.

Asunto(s)

Aprendizaje Automático , Aprendizaje Automático Supervisado , Diagnóstico por Imagen

8.

Lessons learned from translating AI from development to deployment in healthcare.

Widner, Kasumi; Virmani, Sunny; Krause, Jonathan; Nayar, Jay; Tiwari, Richa; Pedersen, Elin Rønby; Jeji, Divleen; Hammel, Naama; Matias, Yossi; Corrado, Greg S; Liu, Yun; Peng, Lily; Webster, Dale R.

Nat Med ; 29(6): 1304-1306, 2023 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-37248297

Asunto(s)

Inteligencia Artificial , Atención a la Salud

9.

Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning.

Krogue, Justin D; Azizi, Shekoofeh; Tan, Fraser; Flament-Auvigne, Isabelle; Brown, Trissia; Plass, Markus; Reihs, Robert; Müller, Heimo; Zatloukal, Kurt; Richeson, Pema; Corrado, Greg S; Peng, Lily H; Mermel, Craig H; Liu, Yun; Chen, Po-Hsuan Cameron; Gombar, Saurabh; Montine, Thomas; Shen, Jeanne; Steiner, David F; Wulczyn, Ellery.

Commun Med (Lond) ; 3(1): 59, 2023 Apr 24.

Artículo en Inglés | MEDLINE | ID: mdl-37095223

RESUMEN

BACKGROUND: Presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors. METHODS: Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathological variables. We then analyze performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables. RESULTS: The machine-learned extracted features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III). CONCLUSION: This work demonstrates an effective approach to combine deep learning with established clinicopathologic factors in order to identify independently informative features associated with LNM. Further work building on these specific results may have important impact in prognostication and therapeutic decision making for LNM. Additionally, this general computational approach may prove useful in other contexts.

When colorectal cancers spread to the lymph nodes, it can indicate a poorer prognosis. However, detecting lymph node metastasis (spread) can be difficult and depends on a number of factors such as how samples are taken and processed. Here, we show that machine learning, which involves computer software learning from patterns in data, can predict lymph node metastasis in patients with colorectal cancer from the microscopic appearance of their primary tumor and the clinical characteristics of the patients. We also show that the same approach can predict patient survival. With further work, our approach may help clinicians to inform patients about their prognosis and decide on appropriate treatments.

10.

A deep learning model for novel systemic biomarkers in photographs of the external eye: a retrospective study.

Babenko, Boris; Traynis, Ilana; Chen, Christina; Singh, Preeti; Uddin, Akib; Cuadros, Jorge; Daskivich, Lauren P; Maa, April Y; Kim, Ramasamy; Kang, Eugene Yu-Chuan; Matias, Yossi; Corrado, Greg S; Peng, Lily; Webster, Dale R; Semturs, Christopher; Krause, Jonathan; Varadarajan, Avinash V; Hammel, Naama; Liu, Yun.

Lancet Digit Health ; 5(5): e257-e264, 2023 05.

Artículo en Inglés | MEDLINE | ID: mdl-36966118

RESUMEN

BACKGROUND: Photographs of the external eye were recently shown to reveal signs of diabetic retinal disease and elevated glycated haemoglobin. This study aimed to test the hypothesis that external eye photographs contain information about additional systemic medical conditions. METHODS: We developed a deep learning system (DLS) that takes external eye photographs as input and predicts systemic parameters, such as those related to the liver (albumin, aspartate aminotransferase [AST]); kidney (estimated glomerular filtration rate [eGFR], urine albumin-to-creatinine ratio [ACR]); bone or mineral (calcium); thyroid (thyroid stimulating hormone); and blood (haemoglobin, white blood cells [WBC], platelets). This DLS was trained using 123 130 images from 38 398 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles county, CA, USA. Evaluation focused on nine prespecified systemic parameters and leveraged three validation sets (A, B, C) spanning 25 510 patients with and without diabetes undergoing eye screening in three independent sites in Los Angeles county, CA, and the greater Atlanta area, GA, USA. We compared performance against baseline models incorporating available clinicodemographic variables (eg, age, sex, race and ethnicity, years with diabetes). FINDINGS: Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST >36·0 U/L, calcium <8·6 mg/dL, eGFR <60·0 mL/min/1·73 m2, haemoglobin <11·0 g/dL, platelets <150·0 × 103/µL, ACR ≥300 mg/g, and WBC <4·0 × 103/µL on validation set A (a population resembling the development datasets), with the area under the receiver operating characteristic curve (AUC) of the DLS exceeding that of the baseline by 5·3-19·9% (absolute differences in AUC). On validation sets B and C, with substantial patient population differences compared with the development datasets, the DLS outperformed the baseline for ACR ≥300·0 mg/g and haemoglobin <11·0 g/dL by 7·3-13·2%. INTERPRETATION: We found further evidence that external eye photographs contain biomarkers spanning multiple organ systems. Such biomarkers could enable accessible and non-invasive screening of disease. Further work is needed to understand the translational implications. FUNDING: Google.

Asunto(s)

Aprendizaje Profundo , Retinopatía Diabética , Humanos , Estudios Retrospectivos , Calcio , Retinopatía Diabética/diagnóstico , Biomarcadores , Albúminas

11.

Pathologist Validation of a Machine Learning-Derived Feature for Colon Cancer Risk Stratification.

L'Imperio, Vincenzo; Wulczyn, Ellery; Plass, Markus; Müller, Heimo; Tamini, Nicolò; Gianotti, Luca; Zucchini, Nicola; Reihs, Robert; Corrado, Greg S; Webster, Dale R; Peng, Lily H; Chen, Po-Hsuan Cameron; Lavitrano, Marialuisa; Liu, Yun; Steiner, David F; Zatloukal, Kurt; Pagni, Fabio.

JAMA Netw Open ; 6(3): e2254891, 2023 03 01.

Artículo en Inglés | MEDLINE | ID: mdl-36917112

RESUMEN

Importance: Identifying new prognostic features in colon cancer has the potential to refine histopathologic review and inform patient care. Although prognostic artificial intelligence systems have recently demonstrated significant risk stratification for several cancer types, studies have not yet shown that the machine learning-derived features associated with these prognostic artificial intelligence systems are both interpretable and usable by pathologists. Objective: To evaluate whether pathologist scoring of a histopathologic feature previously identified by machine learning is associated with survival among patients with colon cancer. Design, Setting, and Participants: This prognostic study used deidentified, archived colorectal cancer cases from January 2013 to December 2015 from the University of Milano-Bicocca. All available histologic slides from 258 consecutive colon adenocarcinoma cases were reviewed from December 2021 to February 2022 by 2 pathologists, who conducted semiquantitative scoring for tumor adipose feature (TAF), which was previously identified via a prognostic deep learning model developed with an independent colorectal cancer cohort. Main Outcomes and Measures: Prognostic value of TAF for overall survival and disease-specific survival as measured by univariable and multivariable regression analyses. Interpathologist agreement in TAF scoring was also evaluated. Results: A total of 258 colon adenocarcinoma histopathologic cases from 258 patients (138 men [53%]; median age, 67 years [IQR, 65-81 years]) with stage II (n = 119) or stage III (n = 139) cancer were included. Tumor adipose feature was identified in 120 cases (widespread in 63 cases, multifocal in 31, and unifocal in 26). For overall survival analysis after adjustment for tumor stage, TAF was independently prognostic in 2 ways: TAF as a binary feature (presence vs absence: hazard ratio [HR] for presence of TAF, 1.55 [95% CI, 1.07-2.25]; P = .02) and TAF as a semiquantitative categorical feature (HR for widespread TAF, 1.87 [95% CI, 1.23-2.85]; P = .004). Interpathologist agreement for widespread TAF vs lower categories (absent, unifocal, or multifocal) was 90%, corresponding to a κ metric at this threshold of 0.69 (95% CI, 0.58-0.80). Conclusions and Relevance: In this prognostic study, pathologists were able to learn and reproducibly score for TAF, providing significant risk stratification on this independent data set. Although additional work is warranted to understand the biological significance of this feature and to establish broadly reproducible TAF scoring, this work represents the first validation to date of human expert learning from machine learning in pathology. Specifically, this validation demonstrates that a computationally identified histologic feature can represent a human-identifiable, prognostic feature with the potential for integration into pathology practice.

Asunto(s)

Adenocarcinoma , Neoplasias del Colon , Masculino , Humanos , Anciano , Neoplasias del Colon/diagnóstico , Patólogos , Inteligencia Artificial , Aprendizaje Automático , Medición de Riesgo

12.

Deep Learning Detection of Active Pulmonary Tuberculosis at Chest Radiography Matched the Clinical Performance of Radiologists.

Kazemzadeh, Sahar; Yu, Jin; Jamshy, Shahar; Pilgrim, Rory; Nabulsi, Zaid; Chen, Christina; Beladia, Neeral; Lau, Charles; McKinney, Scott Mayer; Hughes, Thad; Kiraly, Atilla P; Kalidindi, Sreenivasa Raju; Muyoyeta, Monde; Malemela, Jameson; Shih, Ting; Corrado, Greg S; Peng, Lily; Chou, Katherine; Chen, Po-Hsuan Cameron; Liu, Yun; Eswaran, Krish; Tse, Daniel; Shetty, Shravya; Prabhakara, Shruthi.

Radiology ; 306(1): 124-137, 2023 01.

Artículo en Inglés | MEDLINE | ID: mdl-36066366

RESUMEN

Background The World Health Organization (WHO) recommends chest radiography to facilitate tuberculosis (TB) screening. However, chest radiograph interpretation expertise remains limited in many regions. Purpose To develop a deep learning system (DLS) to detect active pulmonary TB on chest radiographs and compare its performance to that of radiologists. Materials and Methods A DLS was trained and tested using retrospective chest radiographs (acquired between 1996 and 2020) from 10 countries. To improve generalization, large-scale chest radiograph pretraining, attention pooling, and semisupervised learning ("noisy-student") were incorporated. The DLS was evaluated in a four-country test set (China, India, the United States, and Zambia) and in a mining population in South Africa, with positive TB confirmed with microbiological tests or nucleic acid amplification testing (NAAT). The performance of the DLS was compared with that of 14 radiologists. The authors studied the efficacy of the DLS compared with that of nine radiologists using the Obuchowski-Rockette-Hillis procedure. Given WHO targets of 90% sensitivity and 70% specificity, the operating point of the DLS (0.45) was prespecified to favor sensitivity. Results A total of 165 754 images in 22 284 subjects (mean age, 45 years; 21% female) were used for model development and testing. In the four-country test set (1236 subjects, 17% with active TB), the receiver operating characteristic (ROC) curve of the DLS was higher than those for all nine India-based radiologists, with an area under the ROC curve of 0.89 (95% CI: 0.87, 0.91). Compared with these radiologists, at the prespecified operating point, the DLS sensitivity was higher (88% vs 75%, P < .001) and specificity was noninferior (79% vs 84%, P = .004). Trends were similar within other patient subgroups, in the South Africa data set, and across various TB-specific chest radiograph findings. In simulations, the use of the DLS to identify likely TB-positive chest radiographs for NAAT confirmation reduced the cost by 40%-80% per TB-positive patient detected. Conclusion A deep learning method was found to be noninferior to radiologists for the determination of active tuberculosis on digital chest radiographs. © RSNA, 2022 Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.

Asunto(s)

Aprendizaje Profundo , Tuberculosis Pulmonar , Humanos , Femenino , Persona de Mediana Edad , Masculino , Radiografía Torácica/métodos , Estudios Retrospectivos , Radiografía , Tuberculosis Pulmonar/diagnóstico por imagen , Radiólogos , Sensibilidad y Especificidad

13.

A mobile-optimized artificial intelligence system for gestational age and fetal malpresentation assessment.

Gomes, Ryan G; Vwalika, Bellington; Lee, Chace; Willis, Angelica; Sieniek, Marcin; Price, Joan T; Chen, Christina; Kasaro, Margaret P; Taylor, James A; Stringer, Elizabeth M; McKinney, Scott Mayer; Sindano, Ntazana; Dahl, George E; Goodnight, William; Gilmer, Justin; Chi, Benjamin H; Lau, Charles; Spitz, Terry; Saensuksopa, T; Liu, Kris; Tiyasirichokchai, Tiya; Wong, Jonny; Pilgrim, Rory; Uddin, Akib; Corrado, Greg; Peng, Lily; Chou, Katherine; Tse, Daniel; Stringer, Jeffrey S A; Shetty, Shravya.

Commun Med (Lond) ; 2: 128, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-36249461

RESUMEN

Background: Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption in low-to-middle-income countries. This study investigated the use of artificial intelligence for fetal ultrasound in under-resourced settings. Methods: Blind sweep ultrasounds, consisting of six freehand ultrasound sweeps, were collected by sonographers in the USA and Zambia, and novice operators in Zambia. We developed artificial intelligence (AI) models that used blind sweeps to predict gestational age (GA) and fetal malpresentation. AI GA estimates and standard fetal biometry estimates were compared to a previously established ground truth, and evaluated for difference in absolute error. Fetal malpresentation (non-cephalic vs cephalic) was compared to sonographer assessment. On-device AI model run-times were benchmarked on Android mobile phones. Results: Here we show that GA estimation accuracy of the AI model is non-inferior to standard fetal biometry estimates (error difference -1.4 ± 4.5 days, 95% CI -1.8, -0.9, n = 406). Non-inferiority is maintained when blind sweeps are acquired by novice operators performing only two of six sweep motion types. Fetal malpresentation AUC-ROC is 0.977 (95% CI, 0.949, 1.00, n = 613), sonographers and novices have similar AUC-ROC. Software run-times on mobile phones for both diagnostic models are less than 3 s after completion of a sweep. Conclusions: The gestational age model is non-inferior to the clinical standard and the fetal malpresentation model has high AUC-ROCs across operators and devices. Our AI models are able to run on-device, without internet connectivity, and provide feedback scores to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.

14.

Deep learning models for histologic grading of breast cancer and association with disease prognosis.

Jaroensri, Ronnachai; Wulczyn, Ellery; Hegde, Narayan; Brown, Trissia; Flament-Auvigne, Isabelle; Tan, Fraser; Cai, Yuannan; Nagpal, Kunal; Rakha, Emad A; Dabbs, David J; Olson, Niels; Wren, James H; Thompson, Elaine E; Seetao, Erik; Robinson, Carrie; Miao, Melissa; Beckers, Fabien; Corrado, Greg S; Peng, Lily H; Mermel, Craig H; Liu, Yun; Steiner, David F; Chen, Po-Hsuan Cameron.

NPJ Breast Cancer ; 8(1): 113, 2022 Oct 04.

Artículo en Inglés | MEDLINE | ID: mdl-36192400

RESUMEN

Histologic grading of breast cancer involves review and scoring of three well-established morphologic features: mitotic count, nuclear pleomorphism, and tubule formation. Taken together, these features form the basis of the Nottingham Grading System which is used to inform breast cancer characterization and prognosis. In this study, we develop deep learning models to perform histologic scoring of all three components using digitized hematoxylin and eosin-stained slides containing invasive breast carcinoma. We first evaluate model performance using pathologist-based reference standards for each component. To complement this typical approach to evaluation, we further evaluate the deep learning models via prognostic analyses. The individual component models perform at or above published benchmarks for algorithm-based grading approaches, achieving high concordance rates with pathologist grading. Further, prognostic performance using deep learning-based grading is on par with that of pathologists performing review of matched slides. By providing scores for each component feature, the deep-learning based approach also provides the potential to identify the grading components contributing most to prognostic value. This may enable optimized prognostic models, opportunities to improve access to consistent grading, and approaches to better understand the links between histologic features and clinical outcomes in breast cancer.

15.

Real-time diabetic retinopathy screening by deep learning in a multisite national screening programme: a prospective interventional cohort study.

Ruamviboonsuk, Paisan; Tiwari, Richa; Sayres, Rory; Nganthavee, Variya; Hemarat, Kornwipa; Kongprayoon, Apinpat; Raman, Rajiv; Levinstein, Brian; Liu, Yun; Schaekermann, Mike; Lee, Roy; Virmani, Sunny; Widner, Kasumi; Chambers, John; Hersch, Fred; Peng, Lily; Webster, Dale R.

Lancet Digit Health ; 4(4): e235-e244, 2022 04.

Artículo en Inglés | MEDLINE | ID: mdl-35272972

RESUMEN

BACKGROUND: Diabetic retinopathy is a leading cause of preventable blindness, especially in low-income and middle-income countries (LMICs). Deep-learning systems have the potential to enhance diabetic retinopathy screenings in these settings, yet prospective studies assessing their usability and performance are scarce. METHODS: We did a prospective interventional cohort study to evaluate the real-world performance and feasibility of deploying a deep-learning system into the health-care system of Thailand. Patients with diabetes and listed on the national diabetes registry, aged 18 years or older, able to have their fundus photograph taken for at least one eye, and due for screening as per the Thai Ministry of Public Health guidelines were eligible for inclusion. Eligible patients were screened with the deep-learning system at nine primary care sites under Thailand's national diabetic retinopathy screening programme. Patients with a previous diagnosis of diabetic macular oedema, severe non-proliferative diabetic retinopathy, or proliferative diabetic retinopathy; previous laser treatment of the retina or retinal surgery; other non-diabetic retinopathy eye disease requiring referral to an ophthalmologist; or inability to have fundus photograph taken of both eyes for any reason were excluded. Deep-learning system-based interpretations of patient fundus images and referral recommendations were provided in real time. As a safety mechanism, regional retina specialists over-read each image. Performance of the deep-learning system (accuracy, sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were measured against an adjudicated reference standard, provided by fellowship-trained retina specialists. This study is registered with the Thai national clinical trials registry, TCRT20190902002. FINDINGS: Between Dec 12, 2018, and March 29, 2020, 7940 patients were screened for inclusion. 7651 (96·3%) patients were eligible for study analysis, and 2412 (31·5%) patients were referred for diabetic retinopathy, diabetic macular oedema, ungradable images, or low visual acuity. For vision-threatening diabetic retinopathy, the deep-learning system had an accuracy of 94·7% (95% CI 93·0-96·2), sensitivity of 91·4% (87·1-95·0), and specificity of 95·4% (94·1-96·7). The retina specialist over-readers had an accuracy of 93·5 (91·7-95·0; p=0·17), a sensitivity of 84·8% (79·4-90·0; p=0·024), and specificity of 95·5% (94·1-96·7; p=0·98). The PPV for the deep-learning system was 79·2 (95% CI 73·8-84·3) compared with 75·6 (69·8-81·1) for the over-readers. The NPV for the deep-learning system was 95·5 (92·8-97·9) compared with 92·4 (89·3-95·5) for the over-readers. INTERPRETATION: A deep-learning system can deliver real-time diabetic retinopathy detection capability similar to retina specialists in community-based screening settings. Socioenvironmental factors and workflows must be taken into consideration when implementing a deep-learning system within a large-scale screening programme in LMICs. FUNDING: Google and Rajavithi Hospital, Bangkok, Thailand. TRANSLATION: For the Thai translation of the abstract see Supplementary Materials section.

Asunto(s)

Aprendizaje Profundo , Diabetes Mellitus , Retinopatía Diabética , Edema Macular , Estudios de Cohortes , Retinopatía Diabética/diagnóstico , Humanos , Edema Macular/diagnóstico , Estudios Prospectivos , Tailandia

16.

Detection of signs of disease in external photographs of the eyes via deep learning.

Babenko, Boris; Mitani, Akinori; Traynis, Ilana; Kitade, Naho; Singh, Preeti; Maa, April Y; Cuadros, Jorge; Corrado, Greg S; Peng, Lily; Webster, Dale R; Varadarajan, Avinash; Hammel, Naama; Liu, Yun.

Nat Biomed Eng ; 6(12): 1370-1383, 2022 12.

Artículo en Inglés | MEDLINE | ID: mdl-35352000

RESUMEN

Retinal fundus photographs can be used to detect a range of retinal conditions. Here we show that deep-learning models trained instead on external photographs of the eyes can be used to detect diabetic retinopathy (DR), diabetic macular oedema and poor blood glucose control. We developed the models using eye photographs from 145,832 patients with diabetes from 301 DR screening sites and evaluated the models on four tasks and four validation datasets with a total of 48,644 patients from 198 additional screening sites. For all four tasks, the predictive performance of the deep-learning models was significantly higher than the performance of logistic regression models using self-reported demographic and medical history data, and the predictions generalized to patients with dilated pupils, to patients from a different DR screening programme and to a general eye care programme that included diabetics and non-diabetics. We also explored the use of the deep-learning models for the detection of elevated lipid levels. The utility of external eye photographs for the diagnosis and management of diseases should be further validated with images from different cameras and patient populations.

Asunto(s)

Aprendizaje Profundo , Retinopatía Diabética , Enfermedades de la Retina , Humanos , Sensibilidad y Especificidad , Retinopatía Diabética/diagnóstico por imagen , Fondo de Ojo

17.

Deep Learning to Detect OCT-derived Diabetic Macular Edema from Color Retinal Photographs: A Multicenter Validation Study.

Liu, Xinle; Ali, Tayyeba K; Singh, Preeti; Shah, Ami; McKinney, Scott Mayer; Ruamviboonsuk, Paisan; Turner, Angus W; Keane, Pearse A; Chotcomwongse, Peranut; Nganthavee, Variya; Chia, Mark; Huemer, Josef; Cuadros, Jorge; Raman, Rajiv; Corrado, Greg S; Peng, Lily; Webster, Dale R; Hammel, Naama; Varadarajan, Avinash V; Liu, Yun; Chopra, Reena; Bavishi, Pinal.

Ophthalmol Retina ; 6(5): 398-410, 2022 05.

Artículo en Inglés | MEDLINE | ID: mdl-34999015

RESUMEN

PURPOSE: To validate the generalizability of a deep learning system (DLS) that detects diabetic macular edema (DME) from 2-dimensional color fundus photographs (CFP), for which the reference standard for retinal thickness and fluid presence is derived from 3-dimensional OCT. DESIGN: Retrospective validation of a DLS across international datasets. PARTICIPANTS: Paired CFP and OCT of patients from diabetic retinopathy (DR) screening programs or retina clinics. The DLS was developed using data sets from Thailand, the United Kingdom, and the United States and validated using 3060 unique eyes from 1582 patients across screening populations in Australia, India, and Thailand. The DLS was separately validated in 698 eyes from 537 screened patients in the United Kingdom with mild DR and suspicion of DME based on CFP. METHODS: The DLS was trained using DME labels from OCT. The presence of DME was based on retinal thickening or intraretinal fluid. The DLS's performance was compared with expert grades of maculopathy and to a previous proof-of-concept version of the DLS. We further simulated the integration of the current DLS into an algorithm trained to detect DR from CFP. MAIN OUTCOME MEASURES: The superiority of specificity and noninferiority of sensitivity of the DLS for the detection of center-involving DME, using device-specific thresholds, compared with experts. RESULTS: The primary analysis in a combined data set spanning Australia, India, and Thailand showed the DLS had 80% specificity and 81% sensitivity, compared with expert graders, who had 59% specificity and 70% sensitivity. Relative to human experts, the DLS had significantly higher specificity (P = 0.008) and noninferior sensitivity (P < 0.001). In the data set from the United Kingdom, the DLS had a specificity of 80% (P < 0.001 for specificity of >50%) and a sensitivity of 100% (P = 0.02 for sensitivity of > 90%). CONCLUSIONS: The DLS can generalize to multiple international populations with an accuracy exceeding that of experts. The clinical value of this DLS to reduce false-positive referrals, thus decreasing the burden on specialist eye care, warrants a prospective evaluation.

Asunto(s)

Aprendizaje Profundo , Diabetes Mellitus , Retinopatía Diabética , Edema Macular , Retinopatía Diabética/complicaciones , Retinopatía Diabética/diagnóstico , Humanos , Edema Macular/diagnóstico , Edema Macular/etiología , Estudios Retrospectivos , Tomografía de Coherencia Óptica/métodos , Estados Unidos

18.

Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge.

Bulten, Wouter; Kartasalo, Kimmo; Chen, Po-Hsuan Cameron; Ström, Peter; Pinckaers, Hans; Nagpal, Kunal; Cai, Yuannan; Steiner, David F; van Boven, Hester; Vink, Robert; Hulsbergen-van de Kaa, Christina; van der Laak, Jeroen; Amin, Mahul B; Evans, Andrew J; van der Kwast, Theodorus; Allan, Robert; Humphrey, Peter A; Grönberg, Henrik; Samaratunga, Hemamali; Delahunt, Brett; Tsuzuki, Toyonori; Häkkinen, Tomi; Egevad, Lars; Demkin, Maggie; Dane, Sohier; Tan, Fraser; Valkonen, Masi; Corrado, Greg S; Peng, Lily; Mermel, Craig H; Ruusuvuori, Pekka; Litjens, Geert; Eklund, Martin.

Nat Med ; 28(1): 154-163, 2022 01.

Artículo en Inglés | MEDLINE | ID: mdl-35027755

RESUMEN

Artificial intelligence (AI) has shown promise for diagnosing prostate cancer in biopsies. However, results have been limited to individual studies, lacking validation in multinational settings. Competitions have been shown to be accelerators for medical imaging innovations, but their impact is hindered by lack of reproducibility and independent validation. With this in mind, we organized the PANDA challenge-the largest histopathology competition to date, joined by 1,290 developers-to catalyze development of reproducible AI algorithms for Gleason grading using 10,616 digitized prostate biopsies. We validated that a diverse set of submitted algorithms reached pathologist-level performance on independent cross-continental cohorts, fully blinded to the algorithm developers. On United States and European external validation sets, the algorithms achieved agreements of 0.862 (quadratically weighted κ, 95% confidence interval (CI), 0.840-0.884) and 0.868 (95% CI, 0.835-0.900) with expert uropathologists. Successful generalization across different patient populations, laboratories and reference standards, achieved by a variety of algorithmic approaches, warrants evaluating AI-based Gleason grading in prospective clinical trials.

Asunto(s)

Clasificación del Tumor , Neoplasias de la Próstata/patología , Algoritmos , Biopsia , Estudios de Cohortes , Humanos , Masculino , Neoplasias de la Próstata/diagnóstico , Reproducibilidad de los Resultados

19.

Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19.

Nabulsi, Zaid; Sellergren, Andrew; Jamshy, Shahar; Lau, Charles; Santos, Edward; Kiraly, Atilla P; Ye, Wenxing; Yang, Jie; Pilgrim, Rory; Kazemzadeh, Sahar; Yu, Jin; Kalidindi, Sreenivasa Raju; Etemadi, Mozziyar; Garcia-Vicente, Florencia; Melnick, David; Corrado, Greg S; Peng, Lily; Eswaran, Krish; Tse, Daniel; Beladia, Neeral; Liu, Yun; Chen, Po-Hsuan Cameron; Shetty, Shravya.

Sci Rep ; 11(1): 15523, 2021 09 01.

Artículo en Inglés | MEDLINE | ID: mdl-34471144

RESUMEN

Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to detect every possible condition by building multiple separate systems, each of which detects one or more pre-specified conditions. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For training and tuning the system, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system trained using a large dataset containing a diverse array of CXR abnormalities generalizes to new patient populations and unseen diseases. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist. Lastly, to facilitate the continued development of AI models for CXR, we release our collected labels for the publicly available dataset.

Asunto(s)

COVID-19/diagnóstico por imagen , Interpretación de Imagen Radiográfica Asistida por Computador/métodos , Tuberculosis/diagnóstico por imagen , Adulto , Anciano , Algoritmos , Estudios de Casos y Controles , China , Aprendizaje Profundo , Femenino , Humanos , India , Masculino , Persona de Mediana Edad , Radiografía Torácica , Estados Unidos

20.

Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence.

Collins, Gary S; Dhiman, Paula; Andaur Navarro, Constanza L; Ma, Jie; Hooft, Lotty; Reitsma, Johannes B; Logullo, Patricia; Beam, Andrew L; Peng, Lily; Van Calster, Ben; van Smeden, Maarten; Riley, Richard D; Moons, Karel Gm.

BMJ Open ; 11(7): e048008, 2021 07 09.

Artículo en Inglés | MEDLINE | ID: mdl-34244270

RESUMEN

INTRODUCTION: The Transparent Reporting of a multivariable prediction model of Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) were both published to improve the reporting and critical appraisal of prediction model studies for diagnosis and prognosis. This paper describes the processes and methods that will be used to develop an extension to the TRIPOD statement (TRIPOD-artificial intelligence, AI) and the PROBAST (PROBAST-AI) tool for prediction model studies that applied machine learning techniques. METHODS AND ANALYSIS: TRIPOD-AI and PROBAST-AI will be developed following published guidance from the EQUATOR Network, and will comprise five stages. Stage 1 will comprise two systematic reviews (across all medical fields and specifically in oncology) to examine the quality of reporting in published machine-learning-based prediction model studies. In stage 2, we will consult a diverse group of key stakeholders using a Delphi process to identify items to be considered for inclusion in TRIPOD-AI and PROBAST-AI. Stage 3 will be virtual consensus meetings to consolidate and prioritise key items to be included in TRIPOD-AI and PROBAST-AI. Stage 4 will involve developing the TRIPOD-AI checklist and the PROBAST-AI tool, and writing the accompanying explanation and elaboration papers. In the final stage, stage 5, we will disseminate TRIPOD-AI and PROBAST-AI via journals, conferences, blogs, websites (including TRIPOD, PROBAST and EQUATOR Network) and social media. TRIPOD-AI will provide researchers working on prediction model studies based on machine learning with a reporting guideline that can help them report key details that readers need to evaluate the study quality and interpret its findings, potentially reducing research waste. We anticipate PROBAST-AI will help researchers, clinicians, systematic reviewers and policymakers critically appraise the design, conduct and analysis of machine learning based prediction model studies, with a robust standardised tool for bias evaluation. ETHICS AND DISSEMINATION: Ethical approval has been granted by the Central University Research Ethics Committee, University of Oxford on 10-December-2020 (R73034/RE001). Findings from this study will be disseminated through peer-review publications. PROSPERO REGISTRATION NUMBER: CRD42019140361 and CRD42019161764.

Asunto(s)

Inteligencia Artificial , Lista de Verificación , Sesgo , Humanos , Pronóstico , Proyectos de Investigación , Medición de Riesgo

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA