Search | Biblioteca Virtual en Salud Odontología (Virtual Health Library, Dentistry). Uruguay

1.

International evaluation of an AI system for breast cancer screening.

McKinney, Scott Mayer; Sieniek, Marcin; Godbole, Varun; Godwin, Jonathan; Antropova, Natasha; Ashrafian, Hutan; Back, Trevor; Chesus, Mary; Corrado, Greg S; Darzi, Ara; Etemadi, Mozziyar; Garcia-Vicente, Florencia; Gilbert, Fiona J; Halling-Brown, Mark; Hassabis, Demis; Jansen, Sunny; Karthikesalingam, Alan; Kelly, Christopher J; King, Dominic; Ledsam, Joseph R; Melnick, David; Mostofi, Hormuz; Peng, Lily; Reicher, Joshua Jay; Romera-Paredes, Bernardino; Sidebottom, Richard; Suleyman, Mustafa; Tse, Daniel; Young, Kenneth C; De Fauw, Jeffrey; Shetty, Shravya.

Nature ; 577(7788): 89-94, 2020 01.

Article in English | MEDLINE | ID: mdl-31894144

ABSTRACT

Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful¹. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives². Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.

Subject(s)

Artificial Intelligence/standards , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer/methods , Early Detection of Cancer/standards , Female , Humans , Mammography/standards , Reproducibility of Results , United Kingdom , United States

2.

Large-scale machine-learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology.

Alipanahi, Babak; Hormozdiari, Farhad; Behsaz, Babak; Cosentino, Justin; McCaw, Zachary R; Schorsch, Emanuel; Sculley, D; Dorfman, Elizabeth H; Foster, Paul J; Peng, Lily H; Phene, Sonia; Hammel, Naama; Carroll, Andrew; Khawaja, Anthony P; McLean, Cory Y.

Am J Hum Genet ; 108(7): 1217-1230, 2021 07 01.

Article in English | MEDLINE | ID: mdl-34077760

ABSTRACT

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci lie near genes that are involved in neuronal and synaptic biology or that harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.

Subject(s)

Machine Learning , Optic Disk/anatomy & histology , Datasets as Topic , Fluorescein Angiography , Genome-Wide Association Study , Glaucoma, Open-Angle/diagnostic imaging , Humans , Models, Anatomic , Optic Disk/diagnostic imaging , Phenotype , Risk Assessment

3.

The effects of extinction and an explicitly unpaired treatment on the reinforcing properties of a Pavlovian conditioned stimulus.

Kennedy, Nicholas G W; Holmes, Nathan M; Peng, Lily W T; Westbrook, R Frederick.

Neurobiol Learn Mem ; 207: 107879, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38081536

ABSTRACT

This series of experiments examined the effects of extinction and an explicitly unpaired treatment on the ability of a conditioned stimulus (CS) to function as a reinforcer. Rats were trained to lever press for food, exposed to pairings of a noise CS and food, and, finally, tested for their willingness to lever press for the CS in the absence of the food. Experiment 1 provided a demonstration of conditioned reinforcement (using controls that were only exposed to unpaired presentations of the CS and food) and showed that it was equivalent after one or four sessions of CS-food pairings. Experiments 2 and 3 showed that, after one session of CS-food pairings, repeated presentations of the CS alone reduced its reinforcing properties; but after four sessions of CS-food pairings, repeated presentations of the CS alone had no effect on these properties. Experiment 4 showed that, after four sessions of CS-food pairings, explicitly unpaired presentations of the CS and food completely undermined conditioned reinforcement. Finally, Experiment 5 provided within-experiment evidence that, after four sessions of CS-food pairings, the reinforcing properties of the CS were disrupted by explicitly unpaired presentations of the CS and food but spared by repeated presentations of the CS alone. Together, these findings indicate that the effectiveness of extinction in undermining the reinforcing properties of a CS depends on its level of conditioning; and that, where extinction fails to disrupt these properties, they are successfully undermined by an explicitly unpaired treatment. They are discussed with respect to findings in the literature on Pavlovian-to-instrumental transfer; and the Rescorla-Wagner model, which anticipates that an explicitly unpaired treatment will be more effective than extinction in reversing the effects of conditioning.

Subject(s)

Conditioning, Operant , Reinforcement, Psychology , Rats , Animals , Conditioning, Classical , Extinction, Psychological

4.

Lisocabtagene maraleucel for second-line relapsed or refractory large B-cell lymphoma: patient-reported outcomes from the PILOT study.

Gordon, Leo I; Liu, Fei Fei; Braverman, Julia; Hoda, Daanish; Ghosh, Nilanjan; Hamadani, Mehdi; Hildebrandt, Gerhard C; Peng, Lily; Guo, Shien; Shi, Ling; Sehgal, Alison.

Haematologica ; 109(3): 857-866, 2024 Mar 01.

Article in English | MEDLINE | ID: mdl-37646670

ABSTRACT

In the single-arm, open-label, multicenter, phase II PILOT study, second-line treatment with the chimeric antigen receptor (CAR) T-cell therapy lisocabtagene maraleucel (liso-cel) in patients with relapsed or refractory (R/R) large B-cell lymphoma (LBCL) for whom hematopoietic stem cell transplantation (HSCT) was not intended resulted in high response rates, durable responses, and a safety profile consistent with previous reports. Here, we analyzed changes in health-related quality of life (HRQOL) in patients who received liso-cel in PILOT. Patients received liso-cel, an autologous, CD19-directed, 4-1BB CAR T-cell product administered at equal target doses of CD8+ and CD4+ CAR+ T cells, for a total target dose of 100 × 10⁶ CAR+ T cells. HRQOL, a secondary endpoint of PILOT, was assessed as prespecified using three patient-reported outcome instruments (EORTC QLQ-C30; FACT-LymS; EQ-5D-5L). Evaluable datasets for the EORTC QLQ-C30, FACT-LymS, and EQ-5D-5L health utility index, and visual analog scale (EQ-VAS) included 56 (92%), 49 (80%), 55 (90%), and 54 (89%) patients, respectively. Clinically meaningful improvement was achieved across most post-treatment visits for EORTC QLQ-C30 fatigue and FACT-LymS. Overall mean changes from baseline through day 545 showed significant improvements in EORTC QLQ-C30 fatigue, pain, and appetite loss, FACT-LymS, and EQ-VAS. In within-patient analyses, clinically meaningful improvements or maintenance in scores were observed in most patients at days 90, 180, 270, and 365. HRQOL was maintained or improved in patients who received liso-cel as second-line therapy in PILOT. These findings support liso-cel as a preferred second-line treatment in patients with R/R LBCL not intended for HSCT (ClinicalTrials.gov identifier: NCT03483103).

Subject(s)

Lymphoma, Large B-Cell, Diffuse , Quality of Life , Humans , Pilot Projects , Lymphoma, Large B-Cell, Diffuse/therapy , Fatigue , Patient Reported Outcome Measures

5.

Deep Learning Detection of Active Pulmonary Tuberculosis at Chest Radiography Matched the Clinical Performance of Radiologists.

Kazemzadeh, Sahar; Yu, Jin; Jamshy, Shahar; Pilgrim, Rory; Nabulsi, Zaid; Chen, Christina; Beladia, Neeral; Lau, Charles; McKinney, Scott Mayer; Hughes, Thad; Kiraly, Atilla P; Kalidindi, Sreenivasa Raju; Muyoyeta, Monde; Malemela, Jameson; Shih, Ting; Corrado, Greg S; Peng, Lily; Chou, Katherine; Chen, Po-Hsuan Cameron; Liu, Yun; Eswaran, Krish; Tse, Daniel; Shetty, Shravya; Prabhakara, Shruthi.

Radiology ; 306(1): 124-137, 2023 01.

Article in English | MEDLINE | ID: mdl-36066366

ABSTRACT

Background The World Health Organization (WHO) recommends chest radiography to facilitate tuberculosis (TB) screening. However, chest radiograph interpretation expertise remains limited in many regions. Purpose To develop a deep learning system (DLS) to detect active pulmonary TB on chest radiographs and compare its performance to that of radiologists. Materials and Methods A DLS was trained and tested using retrospective chest radiographs (acquired between 1996 and 2020) from 10 countries. To improve generalization, large-scale chest radiograph pretraining, attention pooling, and semisupervised learning ("noisy-student") were incorporated. The DLS was evaluated in a four-country test set (China, India, the United States, and Zambia) and in a mining population in South Africa, with positive TB confirmed with microbiological tests or nucleic acid amplification testing (NAAT). The performance of the DLS was compared with that of 14 radiologists. The authors studied the efficacy of the DLS compared with that of nine radiologists using the Obuchowski-Rockette-Hillis procedure. Given WHO targets of 90% sensitivity and 70% specificity, the operating point of the DLS (0.45) was prespecified to favor sensitivity. Results A total of 165 754 images in 22 284 subjects (mean age, 45 years; 21% female) were used for model development and testing. In the four-country test set (1236 subjects, 17% with active TB), the receiver operating characteristic (ROC) curve of the DLS was higher than those for all nine India-based radiologists, with an area under the ROC curve of 0.89 (95% CI: 0.87, 0.91). Compared with these radiologists, at the prespecified operating point, the DLS sensitivity was higher (88% vs 75%, P < .001) and specificity was noninferior (79% vs 84%, P = .004). Trends were similar within other patient subgroups, in the South Africa data set, and across various TB-specific chest radiograph findings. 
In simulations, the use of the DLS to identify likely TB-positive chest radiographs for NAAT confirmation reduced the cost by 40%-80% per TB-positive patient detected. Conclusion A deep learning method was found to be noninferior to radiologists for the determination of active tuberculosis on digital chest radiographs. © RSNA, 2022 Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.

Subject(s)

Deep Learning , Tuberculosis, Pulmonary , Humans , Female , Middle Aged , Male , Radiography, Thoracic/methods , Retrospective Studies , Radiography , Tuberculosis, Pulmonary/diagnostic imaging , Radiologists , Sensitivity and Specificity

6.

Persistence of oral pre-exposure prophylaxis (PrEP) among adolescent girls and young women initiating PrEP for HIV prevention in Kenya.

de Dieu Tapsoba, Jean; Zangeneh, Sahar Z; Appelmans, Eline; Pasalar, Siavash; Mori, Kira; Peng, Lily; Tao, Janice; Drain, Paul; Okomo, Gordon; Bii, Stanley; Mukabi, James; Zobrist, Stephanie; Brady, Martha; Obanda, Rael; Madiang, Daniel Oluoch; Cover, Jane; Duerr, Ann; Chen, Ying Qing; Obong'o, Christopher.

AIDS Care ; 33(6): 712-720, 2021 06.

Article in English | MEDLINE | ID: mdl-32951437

ABSTRACT

The Determined, Resilient, Empowered, AIDS-free, Mentored, and Safe (DREAMS) Initiative aims to reduce HIV infections among adolescent girls and young women (AGYW) in Africa. Oral pre-exposure prophylaxis (PrEP) is offered through DREAMS in Kenya to eligible AGYW in high burden counties including Kisumu and Homa Bay. This study examines PrEP persistence among AGYW in high burden community-based PrEP delivery settings. We evaluated PrEP persistence among AGYW in the DREAMS PrEP program in Kisumu and Homa Bay using survival analysis and programmatic PrEP refill data collected from March through December 2017. Among 1,259 AGYW who initiated PrEP during the study period, the median persistence time in the program was 56 days (95% CI: 49-58 days) and the proportion who persisted 3 months later was 37% (95% CI: 34-40%). Persistence varied by county (p < 0.001), age at PrEP initiation (p = 0.002), marital status (p = 0.008), transactional sex (p = 0.002), gender-based violence (GBV) experience (p = 0.009) and current school attendance (p = 0.001) at DREAMS enrollment. Persistence did not vary with orphan status, food insecurity, condom use, age at first sexual encounter or engagement in age-disparate sex at DREAMS enrollment. Targeted strategies are needed to improve AGYW retention in the PrEP program.

Subject(s)

Anti-HIV Agents , HIV Infections , Pre-Exposure Prophylaxis , Adolescent , Anti-HIV Agents/therapeutic use , Female , HIV Infections/drug therapy , HIV Infections/prevention & control , Humans , Kenya , Mentors , Sexual Behavior

7.

Addendum: International evaluation of an AI system for breast cancer screening.

McKinney, Scott Mayer; Sieniek, Marcin; Godbole, Varun; Godwin, Jonathan; Antropova, Natasha; Ashrafian, Hutan; Back, Trevor; Chesus, Mary; Corrado, Greg S; Darzi, Ara; Etemadi, Mozziyar; Garcia-Vicente, Florencia; Gilbert, Fiona J; Halling-Brown, Mark; Hassabis, Demis; Jansen, Sunny; Karthikesalingam, Alan; Kelly, Christopher J; King, Dominic; Ledsam, Joseph R; Melnick, David; Mostofi, Hormuz; Peng, Lily; Reicher, Joshua Jay; Romera-Paredes, Bernardino; Sidebottom, Richard; Suleyman, Mustafa; Tse, Daniel; Young, Kenneth C; De Fauw, Jeffrey; Shetty, Shravya.

Nature ; 586(7829): E19, 2020 10.

Article in English | MEDLINE | ID: mdl-33057216

8.

Deep Learning and Glaucoma Specialists: The Relative Importance of Optic Disc Features to Predict Glaucoma Referral in Fundus Photographs.

Phene, Sonia; Dunn, R Carter; Hammel, Naama; Liu, Yun; Krause, Jonathan; Kitade, Naho; Schaekermann, Mike; Sayres, Rory; Wu, Derek J; Bora, Ashish; Semturs, Christopher; Misra, Anita; Huang, Abigail E; Spitze, Arielle; Medeiros, Felipe A; Maa, April Y; Gandhi, Monica; Corrado, Greg S; Peng, Lily; Webster, Dale R.

Ophthalmology ; 126(12): 1627-1639, 2019 12.

Article in English | MEDLINE | ID: mdl-31561879

ABSTRACT

PURPOSE: To develop and validate a deep learning (DL) algorithm that predicts referable glaucomatous optic neuropathy (GON) and optic nerve head (ONH) features from color fundus images, to determine the relative importance of these features in referral decisions by glaucoma specialists (GSs) and the algorithm, and to compare the performance of the algorithm with eye care providers. DESIGN: Development and validation of an algorithm. PARTICIPANTS: Fundus images from screening programs, studies, and a glaucoma clinic. METHODS: A DL algorithm was trained using a retrospective dataset of 86 618 images, assessed for glaucomatous ONH features and referable GON (defined as ONH appearance worrisome enough to justify referral for comprehensive examination) by 43 graders. The algorithm was validated using 3 datasets: dataset A (1205 images, 1 image/patient; 18.1% referable), images adjudicated by panels of GSs; dataset B (9642 images, 1 image/patient; 9.2% referable), images from a diabetic teleretinal screening program; and dataset C (346 images, 1 image/patient; 81.7% referable), images from a glaucoma clinic. MAIN OUTCOME MEASURES: The algorithm was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity for referable GON and glaucomatous ONH features. RESULTS: The algorithm's AUC for referable GON was 0.945 (95% confidence interval [CI], 0.929-0.960) in dataset A, 0.855 (95% CI, 0.841-0.870) in dataset B, and 0.881 (95% CI, 0.838-0.918) in dataset C. Algorithm AUCs ranged between 0.661 and 0.973 for glaucomatous ONH features. The algorithm showed significantly higher sensitivity than 7 of 10 graders not involved in determining the reference standard, including 2 of 3 GSs, and showed higher specificity than 3 graders (including 1 GS), while remaining comparable to others. 
For both GSs and the algorithm, the most crucial features related to referable GON were: presence of vertical cup-to-disc ratio of 0.7 or more, neuroretinal rim notching, retinal nerve fiber layer defect, and bared circumlinear vessels. CONCLUSIONS: A DL algorithm trained on fundus images alone can detect referable GON with higher sensitivity than and comparable specificity to eye care providers. The algorithm maintained good performance on an independent dataset with diagnoses based on a full glaucoma workup.

Subject(s)

Deep Learning , Glaucoma, Open-Angle/diagnosis , Ophthalmologists , Optic Disk/pathology , Optic Nerve Diseases/diagnosis , Specialization , Aged , Area Under Curve , Datasets as Topic , Female , Humans , Male , Middle Aged , Nerve Fibers/pathology , ROC Curve , Referral and Consultation , Retinal Ganglion Cells/pathology , Retrospective Studies , Sensitivity and Specificity

9.

Using a Deep Learning Algorithm and Integrated Gradients Explanation to Assist Grading for Diabetic Retinopathy.

Sayres, Rory; Taly, Ankur; Rahimy, Ehsan; Blumer, Katy; Coz, David; Hammel, Naama; Krause, Jonathan; Narayanaswamy, Arunachalam; Rastegar, Zahra; Wu, Derek; Xu, Shawn; Barb, Scott; Joseph, Anthony; Shumski, Michael; Smith, Jesse; Sood, Arjun B; Corrado, Greg S; Peng, Lily; Webster, Dale R.

Ophthalmology ; 126(4): 552-564, 2019 04.

Article in English | MEDLINE | ID: mdl-30553900

ABSTRACT

PURPOSE: To understand the impact of deep learning diabetic retinopathy (DR) algorithms on physician readers in computer-assisted settings. DESIGN: Evaluation of diagnostic technology. PARTICIPANTS: One thousand seven hundred ninety-six retinal fundus images from 1612 diabetic patients. METHODS: Ten ophthalmologists (5 general ophthalmologists, 4 retina specialists, 1 retina fellow) read images for DR severity based on the International Clinical Diabetic Retinopathy disease severity scale in each of 3 conditions: unassisted, grades only, or grades plus heatmap. Grades-only assistance comprised a histogram of DR predictions (grades) from a trained deep-learning model. For grades plus heatmap, we additionally showed explanatory heatmaps. MAIN OUTCOME MEASURES: For each experiment arm, we computed sensitivity and specificity of each reader and the algorithm for different levels of DR severity against an adjudicated reference standard. We also measured accuracy (exact 5-class level agreement and Cohen's quadratically weighted κ), reader-reported confidence (5-point Likert scale), and grading time. RESULTS: Readers graded more accurately with model assistance than without for the grades-only condition (P < 0.001). Grades plus heatmaps improved accuracy for patients with DR (P < 0.001), but reduced accuracy for patients without DR (P = 0.006). Both forms of assistance increased readers' sensitivity for moderate-or-worse DR (unassisted: mean, 79.4% [95% confidence interval (CI), 72.3%-86.5%]; grades only: mean, 87.5% [95% CI, 85.1%-89.9%]; grades plus heatmap: mean, 88.7% [95% CI, 84.9%-92.5%]) without a corresponding drop in specificity (unassisted: mean, 96.6% [95% CI, 95.9%-97.4%]; grades only: mean, 96.1% [95% CI, 95.5%-96.7%]; grades plus heatmap: mean, 95.5% [95% CI, 94.8%-96.1%]). Algorithmic assistance increased the accuracy of retina specialists above that of the unassisted reader or model alone; and increased grading confidence and grading time across all readers. 
For most cases, grades plus heatmap was only as effective as grades only. Over the course of the experiment, grading time decreased across all conditions, although most sharply for grades plus heatmap. CONCLUSIONS: Deep learning algorithms can improve the accuracy of, and confidence in, DR diagnosis in an assisted read setting. They also may increase grading time, although these effects may be ameliorated with experience.

Subject(s)

Algorithms , Deep Learning , Diabetic Retinopathy/classification , Diabetic Retinopathy/diagnosis , Diagnosis, Computer-Assisted/methods , Female , Humans , Male , Ophthalmologists/standards , Photography/methods , ROC Curve , Reference Standards , Reproducibility of Results , Sensitivity and Specificity

10.

How to Read Articles That Use Machine Learning: Users' Guides to the Medical Literature.

Liu, Yun; Chen, Po-Hsuan Cameron; Krause, Jonathan; Peng, Lily.

JAMA ; 322(18): 1806-1816, 2019 11 12.

Article in English | MEDLINE | ID: mdl-31714992

ABSTRACT

In recent years, many new clinical diagnostic tools have been developed using complicated machine learning methods. Irrespective of how a diagnostic tool is derived, it must be evaluated using a 3-step process of deriving, validating, and establishing the clinical effectiveness of the tool. Machine learning-based tools should also be assessed for the type of machine learning model used and its appropriateness for the input data type and data set size. Machine learning models also generally have additional prespecified settings called hyperparameters, which must be tuned on a data set independent of the validation set. On the validation set, the outcome against which the model is evaluated is termed the reference standard. The rigor of the reference standard must be assessed, such as against a universally accepted gold standard or expert grading.

Subject(s)

Machine Learning , Models, Theoretical , Algorithms , Humans , Publications , Sensitivity and Specificity
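
The abstract of record 10 states that hyperparameters must be tuned on a data set independent of the validation set. A minimal Python sketch of one way to keep the three sets disjoint follows. It is an illustration only, not the article's procedure: the function name `three_way_split`, the default fractions, and the seed are assumptions introduced here.

```python
import random


def three_way_split(items, tune_fraction=0.1, validation_fraction=0.2, seed=0):
    """Shuffle items once and cut them into disjoint training, tuning, and
    validation lists. The fractions are illustrative assumptions, not values
    taken from the article."""
    if not 0 < tune_fraction + validation_fraction < 1:
        raise ValueError("tuning and validation fractions must sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_validation = int(len(shuffled) * validation_fraction)
    n_tuning = int(len(shuffled) * tune_fraction)
    validation = shuffled[:n_validation]
    tuning = shuffled[n_validation:n_validation + n_tuning]
    training = shuffled[n_validation + n_tuning:]
    return training, tuning, validation
```

Hyperparameters would be chosen using only the tuning list, and the validation list would be scored once against the reference standard. In practice the split is often made per patient rather than per image so that one patient's images do not appear in more than one set; that choice is also an assumption of this note.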

11.

Grader Variability and the Importance of Reference Standards for Evaluating Machine Learning Models for Diabetic Retinopathy.

Krause, Jonathan; Gulshan, Varun; Rahimy, Ehsan; Karth, Peter; Widner, Kasumi; Corrado, Greg S; Peng, Lily; Webster, Dale R.

Ophthalmology ; 125(8): 1264-1272, 2018 08.

Article in English | MEDLINE | ID: mdl-29548646

ABSTRACT

PURPOSE: Use adjudication to quantify errors in diabetic retinopathy (DR) grading based on individual graders and majority decision, and to train an improved automated algorithm for DR grading. DESIGN: Retrospective analysis. PARTICIPANTS: Retinal fundus images from DR screening programs. METHODS: Images were each graded by the algorithm, U.S. board-certified ophthalmologists, and retinal specialists. The adjudicated consensus of the retinal specialists served as the reference standard. MAIN OUTCOME MEASURES: For agreement between different graders as well as between the graders and the algorithm, we measured the (quadratic-weighted) kappa score. To compare the performance of different forms of manual grading and the algorithm for various DR severity cutoffs (e.g., mild or worse DR, moderate or worse DR), we measured area under the curve (AUC), sensitivity, and specificity. RESULTS: Of the 193 discrepancies between adjudication by retinal specialists and majority decision of ophthalmologists, the most common were missing microaneurysms (MAs) (36%), artifacts (20%), and misclassified hemorrhages (16%). Relative to the reference standard, the kappa for individual retinal specialists, ophthalmologists, and algorithm ranged from 0.82 to 0.91, 0.80 to 0.84, and 0.84, respectively. For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981. The algorithm had a sensitivity of 0.971, specificity of 0.923, and AUC of 0.986. For mild or worse DR, the algorithm had a sensitivity of 0.970, specificity of 0.917, and AUC of 0.986. By using a small number of adjudicated consensus grades as a tuning dataset and higher-resolution images as input, the algorithm improved in AUC from 0.934 to 0.986 for moderate or worse DR. CONCLUSIONS: Adjudication reduces the errors in DR grading. A small set of adjudicated DR grades allows substantial improvements in algorithm performance. 
The resulting algorithm's performance was on par with that of individual U.S. board-certified ophthalmologists and retinal specialists.

Subject(s)

Algorithms , Clinical Competence/standards , Diabetic Retinopathy/diagnosis , Machine Learning , Mass Screening/standards , Ophthalmologists/standards , Female , Humans , Male , Middle Aged , ROC Curve , Reference Standards , Retrospective Studies
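
Record 11 reports agreement as a quadratic-weighted kappa score. As a rough illustration of how that statistic is commonly computed for ordinal grades, here is a minimal Python sketch of the standard textbook formula. It is not the authors' code; the function name and the assumption that grades are integers from 0 to `num_classes - 1` are introduced here.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Cohen's kappa with quadratic disagreement weights.

    Grades are assumed to be integers in 0..num_classes-1 (for example,
    a 5-level DR severity scale would use num_classes=5).
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (grade_a, grade_b)
    hist_a = Counter(rater_a)                  # marginal counts per rater
    hist_b = Counter(rater_b)
    max_distance = (num_classes - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / max_distance
            observed_disagreement += weight * observed[(i, j)] / n
            # Expected joint frequency if the two raters were independent.
            expected_disagreement += weight * hist_a[i] * hist_b[j] / (n * n)

    if expected_disagreement == 0:
        return 1.0  # both raters used one and the same grade throughout
    return 1.0 - observed_disagreement / expected_disagreement
```

Identical gradings give a kappa of 1, and a disagreement of several severity levels is penalized more heavily than an adjacent-level disagreement, which is why the weighted form suits ordinal scales.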

12.

Combined targeting of MEK and PI3K/mTOR effector pathways is necessary to effectively inhibit NRAS mutant melanoma in vitro and in vivo.

Posch, Christian; Moslehi, Homayoun; Feeney, Luzviminda; Green, Gary A; Ebaee, Anoosheh; Feichtenschlager, Valentin; Chong, Kim; Peng, Lily; Dimon, Michelle T; Phillips, Thomas; Daud, Adil I; McCalmont, Timothy H; LeBoit, Philip E; Ortiz-Urda, Susana.

Proc Natl Acad Sci U S A ; 110(10): 4015-20, 2013 Mar 05.

Article in English | MEDLINE | ID: mdl-23431193

ABSTRACT

Activating mutations in the neuroblastoma rat sarcoma viral oncogene homolog (NRAS) gene are common genetic events in malignant melanoma being found in 15-25% of cases. NRAS is thought to activate both mitogen activated protein kinase (MAPK) and PI3K signaling in melanoma cells. We studied the influence of different components on the MAP/extracellular signal-regulated (ERK) kinase (MEK) and PI3K/mammalian target of rapamycin (mTOR)-signaling cascade in NRAS mutant melanoma cells. In general, these cells were more sensitive to MEK inhibition compared with inhibition in the PI3K/mTOR cascade. Combined targeting of MEK and PI3K was superior to MEK and mTOR1,2 inhibition in all NRAS mutant melanoma cell lines tested, suggesting that PI3K signaling is more important for cell survival in NRAS mutant melanoma when MEK is inhibited. However, targeting of PI3K/mTOR1,2 in combination with MEK inhibitors is necessary to effectively abolish growth of NRAS mutant melanoma cells in vitro and regress xenografted NRAS mutant melanoma. Furthermore, we showed that MEK and PI3K/mTOR1,2 inhibition is synergistic. Expression analysis confirms that combined MEK and PI3K/mTOR1,2 inhibition predominantly influences genes in the rat sarcoma (RAS) pathway and growth factor receptor pathways, which signal through MEK/ERK and PI3K/mTOR, respectively. Our results suggest that combined targeting of the MEK/ERK and PI3K/mTOR pathways has antitumor activity and might serve as a therapeutic option in the treatment of NRAS mutant melanoma, for which there are currently no effective therapies.

Subject(s)

GTP Phosphohydrolases/genetics , MAP Kinase Signaling System/drug effects , Melanoma/drug therapy , Melanoma/metabolism , Membrane Proteins/genetics , Phosphoinositide-3 Kinase Inhibitors , TOR Serine-Threonine Kinases/antagonists & inhibitors , Animals , Antineoplastic Agents/administration & dosage , Antineoplastic Agents/pharmacology , Apoptosis/drug effects , Cell Line, Tumor , Drug Synergism , Female , Humans , Melanoma/genetics , Melanoma/pathology , Mice , Mice, Nude , Mutation , Protein Kinase Inhibitors/administration & dosage , Protein Kinase Inhibitors/pharmacology , Signal Transduction/drug effects , Xenograft Model Antitumor Assays

13.

Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.

Gulshan, Varun; Peng, Lily; Coram, Marc; Stumpe, Martin C; Wu, Derek; Narayanaswamy, Arunachalam; Venugopalan, Subhashini; Widner, Kasumi; Madams, Tom; Cuadros, Jorge; Kim, Ramasamy; Raman, Rajiv; Nelson, Philip C; Mega, Jessica L; Webster, Dale R.

JAMA ; 316(22): 2402-2410, 2016 12 13.

Article in English | MEDLINE | ID: mdl-27898976

ABSTRACT

Importance: Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation. Objective: To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs. Design and Setting: A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency. Exposure: Deep learning-trained algorithm. Main Outcomes and Measures: The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity. Results: The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). 
For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%. Conclusions and Relevance: In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment.

Subject(s)

Algorithms , Diabetic Retinopathy/diagnostic imaging , Fundus Oculi , Machine Learning , Macular Edema/diagnostic imaging , Neural Networks, Computer , Photography , Female , Humans , Male , Middle Aged , Observer Variation , Ophthalmologists , Sensitivity and Specificity

14.

How to develop machine learning models for healthcare.

Chen, Po-Hsuan Cameron; Liu, Yun; Peng, Lily.

Nat Mater ; 18(5): 410-414, 2019 05.

Article in English | MEDLINE | ID: mdl-31000806

Subject(s)

Diagnosis, Computer-Assisted/methods , Drug Discovery , Machine Learning , Algorithms , Cardiovascular Diseases/diagnosis , Diabetic Retinopathy/diagnosis , Diagnostic Errors/prevention & control , Echocardiography/methods , Humans , Image Processing, Computer-Assisted , Medical Informatics/methods , Models, Theoretical , Predictive Value of Tests , Prognosis , Radiotherapy Planning, Computer-Assisted , Reproducibility of Results

15.

Lisocabtagene Maraleucel in Relapsed/Refractory Mantle Cell Lymphoma: Primary Analysis of the Mantle Cell Lymphoma Cohort From TRANSCEND NHL 001, a Phase I Multicenter Seamless Design Study.

Wang, Michael; Siddiqi, Tanya; Gordon, Leo I; Kamdar, Manali; Lunning, Matthew; Hirayama, Alexandre V; Abramson, Jeremy S; Arnason, Jon; Ghosh, Nilanjan; Mehta, Amitkumar; Andreadis, Charalambos; Solomon, Scott R; Kostic, Ana; Dehner, Christine; Espinola, Ricardo; Peng, Lily; Ogasawara, Ken; Chattin, Amy; Eliason, Laurie; Palomba, M Lia.

J Clin Oncol ; 42(10): 1146-1157, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38072625

ABSTRACT

PURPOSE: To report the primary analysis results from the mantle cell lymphoma (MCL) cohort of the phase I seamless design TRANSCEND NHL 001 (ClinicalTrials.gov identifier: NCT02631044) study. METHODS: Patients with relapsed/refractory (R/R) MCL after ≥two lines of previous therapy, including a Bruton tyrosine kinase inhibitor (BTKi), an alkylating agent, and a CD20-targeted agent, received lisocabtagene maraleucel (liso-cel) at a target dose level (DL) of 50 × 10^6 (DL1) or 100 × 10^6 (DL2) chimeric antigen receptor-positive T cells. Primary end points were adverse events (AEs), dose-limiting toxicities, and objective response rate (ORR) by independent review committee per Lugano criteria. RESULTS: Of 104 leukapheresed patients, liso-cel was infused into 88. Median (range) number of previous lines of therapy was three (1-11) with 30% receiving ≥five previous lines of therapy, 73% of patients were age 65 years and older, 69% had refractory disease, 53% had BTKi refractory disease, 23% had TP53 mutation, and 8% had secondary CNS lymphoma. Median (range) on-study follow-up was 16.1 months (0.4-60.5). In the efficacy set (n = 83; DL1 + DL2), ORR was 83.1% (95% CI, 73.3 to 90.5) and complete response (CR) rate was 72.3% (95% CI, 61.4 to 81.6). Median duration of response was 15.7 months (95% CI, 6.2 to 24.0) and progression-free survival was 15.3 months (95% CI, 6.6 to 24.9). Most common grade ≥3 treatment-emergent AEs were neutropenia (56%), anemia (37.5%), and thrombocytopenia (25%). Cytokine release syndrome (CRS) was reported in 61% of patients (grade 3/4, 1%; grade 5, 0), neurologic events (NEs) in 31% (grade 3/4, 9%; grade 5, 0), grade ≥3 infections in 15%, and prolonged cytopenia in 40%. CONCLUSION: Liso-cel demonstrated high CR rate and deep, durable responses with low incidence of grade ≥3 CRS, NE, and infections in patients with heavily pretreated R/R MCL, including those with high-risk, aggressive disease.

Subject(s)

Antineoplastic Agents , Lymphoma, Large B-Cell, Diffuse , Lymphoma, Mantle-Cell , Neutropenia , Adult , Aged , Humans , Antineoplastic Agents/adverse effects , Immunotherapy, Adoptive/adverse effects , Lymphoma, Large B-Cell, Diffuse/drug therapy , Neoplasm Recurrence, Local/drug therapy , Neutropenia/chemically induced

16.

Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan.

Kiraly, Atilla P; Cunningham, Corbin A; Najafi, Ryan; Nabulsi, Zaid; Yang, Jie; Lau, Charles; Ledsam, Joseph R; Ye, Wenxing; Ardila, Diego; McKinney, Scott M; Pilgrim, Rory; Liu, Yun; Saito, Hiroaki; Shimamura, Yasuteru; Etemadi, Mozziyar; Melnick, David; Jansen, Sunny; Corrado, Greg S; Peng, Lily; Tse, Daniel; Shetty, Shravya; Prabhakara, Shruthi; Naidich, David P; Beladia, Neeral; Eswaran, Krish.

Radiol Artif Intell ; 6(3): e230079, 2024 05.

Article in English | MEDLINE | ID: mdl-38477661

ABSTRACT

Purpose To evaluate the impact of an artificial intelligence (AI) assistant for lung cancer screening on multinational clinical workflows. Materials and Methods An AI assistant for lung cancer screening was evaluated on two retrospective randomized multireader multicase studies where 627 (141 cancer-positive cases) low-dose chest CT cases were each read twice (with and without AI assistance) by experienced thoracic radiologists (six U.S.-based or six Japan-based radiologists), resulting in a total of 7524 interpretations. Positive cases were defined as those within 2 years before a pathology-confirmed lung cancer diagnosis. Negative cases were defined as those without any subsequent cancer diagnosis for at least 2 years and were enriched for a spectrum of diverse nodules. The studies measured the readers' level of suspicion (on a 0-100 scale), country-specific screening system scoring categories, and management recommendations. Evaluation metrics included the area under the receiver operating characteristic curve (AUC) for level of suspicion and sensitivity and specificity of recall recommendations. Results With AI assistance, the radiologists' AUC increased by 0.023 (0.70 to 0.72; P = .02) for the U.S. study and by 0.023 (0.93 to 0.96; P = .18) for the Japan study. Scoring system specificity for actionable findings increased 5.5% (57% to 63%; P < .001) for the U.S. study and 6.7% (23% to 30%; P < .001) for the Japan study. There was no evidence of a difference in corresponding sensitivity between unassisted and AI-assisted reads for the U.S. (67.3% to 67.5%; P = .88) and Japan (98% to 100%; P > .99) studies. Corresponding stand-alone AI AUC system performance was 0.75 (95% CI: 0.70, 0.81) and 0.88 (95% CI: 0.78, 0.97) for the U.S.- and Japan-based datasets, respectively. 
Conclusion The concurrent AI interface improved lung cancer screening specificity in both U.S.- and Japan-based reader studies, meriting further study in additional international screening environments. Keywords: Assistive Artificial Intelligence, Lung Cancer Screening, CT. Supplemental material is available for this article. Published under a CC BY 4.0 license.
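
The AUC reported for the readers' level of suspicion can be read as the probability that a randomly chosen cancer-positive case receives a higher suspicion score than a randomly chosen negative case. The short Python sketch below computes that pairwise form of the AUC. It is a generic illustration with made-up scores on a 0-100 scale, not the multireader multicase analysis used in the study, and the function name is hypothetical.

```python
from typing import Sequence


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative case count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical level-of-suspicion scores (0-100) for illustration only.
cancer_cases = [90, 80, 40]
negative_cases = [70, 30, 20]
print(pairwise_auc(cancer_cases, negative_cases))
```

Comparing this quantity for assisted and unassisted reads of the same cases gives the kind of AUC difference the abstract reports, although the study's significance testing is not reproduced here.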

Subject(s)

Artificial Intelligence , Early Detection of Cancer , Lung Neoplasms , Tomography, X-Ray Computed , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/epidemiology , Japan , United States/epidemiology , Retrospective Studies , Early Detection of Cancer/methods , Female , Male , Middle Aged , Aged , Sensitivity and Specificity , Radiographic Image Interpretation, Computer-Assisted/methods

17.

Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.

Schaekermann, Mike; Spitz, Terry; Pyles, Malcolm; Cole-Lewis, Heather; Wulczyn, Ellery; Pfohl, Stephen R; Martin, Donald; Jaroensri, Ronnachai; Keeling, Geoff; Liu, Yuan; Farquhar, Stephanie; Xue, Qinghan; Lester, Jenna; Hughes, Cían; Strachan, Patricia; Tan, Fraser; Bui, Peggy; Mermel, Craig H; Peng, Lily H; Matias, Yossi; Corrado, Greg S; Webster, Dale R; Virmani, Sunny; Semturs, Christopher; Liu, Yun; Horn, Ivor; Cameron Chen, Po-Hsuan.

EClinicalMedicine ; 70: 102479, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38685924

ABSTRACT

Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods: Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients of 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p (R >0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). 
Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case. Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritizing AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritizing AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritizing AI performance of age subpopulations based on DALYs. Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes. Funding: Google LLC.
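
The HEAL metric described in this abstract, p(R > 0) estimated by bootstrap, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the resampling scheme (cases resampled with replacement), the helper names, and the toy data are all hypothetical, and `outcome_score` is assumed to be oriented so that a higher value means better health outcomes (for example, a negated DALY or YLL figure).

```python
import random
from collections import defaultdict
from statistics import mean


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation (Pearson on ranks); None if either side is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def heal_metric(cases, outcome_score, n_boot=1000, seed=0):
    """Estimate p(R > 0), where R is the negated Spearman correlation between
    subpopulation health outcomes and subpopulation AI performance.

    cases: list of (group, correct) pairs, correct being True when the model's
    top-3 predictions agree with the reference diagnosis.
    """
    rng = random.Random(seed)
    groups = sorted(outcome_score)
    positive = valid = 0
    for _ in range(n_boot):
        hits = defaultdict(list)
        for group, correct in rng.choices(cases, k=len(cases)):
            hits[group].append(1.0 if correct else 0.0)
        if any(g not in hits for g in groups):
            continue  # a subpopulation vanished from this resample
        rho = spearman([outcome_score[g] for g in groups],
                       [mean(hits[g]) for g in groups])
        if rho is None:
            continue
        valid += 1
        positive += (-rho) > 0
    return positive / valid if valid else float("nan")


# Toy data: group A has the worst outcomes and the best model performance.
toy_cases = ([("A", True)] * 50 + [("B", True)] * 25 + [("B", False)] * 25
             + [("C", False)] * 50)
print(heal_metric(toy_cases, {"A": 1.0, "B": 2.0, "C": 3.0}, n_boot=200))
```

With this toy data the model always performs best for the subpopulation with the worst outcomes, so the estimate is 100%; reversing the outcome scores drives it to 0%, which mirrors how the abstract's 0.0% result for age subpopulations should be read.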

18.

Risk Stratification for Diabetic Retinopathy Screening Order Using Deep Learning: A Multicenter Prospective Study.

Bora, Ashish; Tiwari, Richa; Bavishi, Pinal; Virmani, Sunny; Huang, Rayman; Traynis, Ilana; Corrado, Greg S; Peng, Lily; Webster, Dale R; Varadarajan, Avinash V; Pattanapongpaiboon, Warisara; Chopra, Reena; Ruamviboonsuk, Paisan.

Transl Vis Sci Technol ; 12(12): 11, 2023 12 01.

Article in English | MEDLINE | ID: mdl-38079169

ABSTRACT

Purpose: Real-world evaluation of a deep learning model that prioritizes patients based on risk of progression to moderate or worse (MOD+) diabetic retinopathy (DR). Methods: This nonrandomized, single-arm, prospective, interventional study included patients attending DR screening at four centers across Thailand from September 2019 to January 2020, with mild or no DR. Fundus photographs were input into the model, and patients were scheduled for their subsequent screening from September 2020 to January 2021 in order of predicted risk. Evaluation focused on model sensitivity, defined as correctly ranking patients that developed MOD+ within the first 50% of subsequent screens. Results: We analyzed 1,757 patients, of which 52 (3.0%) developed MOD+. Using the model-proposed order, the model's sensitivity was 90.4%. Both the model-proposed order and mild/no DR plus HbA1c had significantly higher sensitivity than the random order (P < 0.001). Excluding one major (rural) site that had practical implementation challenges, the remaining sites included 567 patients and 15 (2.6%) developed MOD+. Here, the model-proposed order achieved 86.7% versus 73.3% for the ranking that used DR grade and hemoglobin A1c. Conclusions: The model can help prioritize follow-up visits for the largest subgroups of DR patients (those with no or mild DR). Further research is needed to evaluate the impact on clinical management and outcomes. Translational Relevance: Deep learning demonstrated potential for risk stratification in DR screening. However, real-world practicalities must be resolved to fully realize the benefit.
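
The sensitivity definition used in this abstract (the share of patients who developed MOD+ that fall within the first 50% of the screening order) can be made concrete with a small Python sketch. The function name and the toy values are hypothetical; rounding down the cutoff for odd-sized cohorts is an assumption, since the abstract does not say how that case was handled.

```python
from typing import Sequence


def top_fraction_sensitivity(
    risk: Sequence[float], progressed: Sequence[int], fraction: float = 0.5
) -> float:
    """Share of progressors ranked within the first `fraction` of the order.

    Patients are ordered from highest to lowest predicted risk; the cutoff
    is rounded down when the cohort size is odd (an assumption).
    """
    total = sum(progressed)
    if total == 0:
        raise ValueError("no patient progressed; sensitivity is undefined")
    order = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)
    cutoff = int(len(order) * fraction)
    caught = sum(progressed[i] for i in order[:cutoff])
    return caught / total


# Hypothetical predicted risks and outcomes (1 = developed MOD+).
toy_risk = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
toy_progressed = [1, 0, 0, 0, 1, 1]
print(top_fraction_sensitivity(toy_risk, toy_progressed))
```

A random screening order would be expected to place about half of the progressors in the first half, which is the baseline the abstract's comparison against random order refers to.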

Subject(s)

Deep Learning , Diabetes Mellitus , Diabetic Retinopathy , Humans , Diabetic Retinopathy/diagnosis , Diabetic Retinopathy/epidemiology , Prospective Studies , Glycated Hemoglobin , Risk Assessment

19.

A deep learning model for novel systemic biomarkers in photographs of the external eye: a retrospective study.

Babenko, Boris; Traynis, Ilana; Chen, Christina; Singh, Preeti; Uddin, Akib; Cuadros, Jorge; Daskivich, Lauren P; Maa, April Y; Kim, Ramasamy; Kang, Eugene Yu-Chuan; Matias, Yossi; Corrado, Greg S; Peng, Lily; Webster, Dale R; Semturs, Christopher; Krause, Jonathan; Varadarajan, Avinash V; Hammel, Naama; Liu, Yun.

Lancet Digit Health ; 5(5): e257-e264, 2023 05.

Article in English | MEDLINE | ID: mdl-36966118

ABSTRACT

BACKGROUND: Photographs of the external eye were recently shown to reveal signs of diabetic retinal disease and elevated glycated haemoglobin. This study aimed to test the hypothesis that external eye photographs contain information about additional systemic medical conditions. METHODS: We developed a deep learning system (DLS) that takes external eye photographs as input and predicts systemic parameters, such as those related to the liver (albumin, aspartate aminotransferase [AST]); kidney (estimated glomerular filtration rate [eGFR], urine albumin-to-creatinine ratio [ACR]); bone or mineral (calcium); thyroid (thyroid stimulating hormone); and blood (haemoglobin, white blood cells [WBC], platelets). This DLS was trained using 123 130 images from 38 398 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles county, CA, USA. Evaluation focused on nine prespecified systemic parameters and leveraged three validation sets (A, B, C) spanning 25 510 patients with and without diabetes undergoing eye screening in three independent sites in Los Angeles county, CA, and the greater Atlanta area, GA, USA. We compared performance against baseline models incorporating available clinicodemographic variables (eg, age, sex, race and ethnicity, years with diabetes). FINDINGS: Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST >36·0 U/L, calcium <8·6 mg/dL, eGFR <60·0 mL/min/1·73 m^2, haemoglobin <11·0 g/dL, platelets <150·0 × 10^3/µL, ACR ≥300 mg/g, and WBC <4·0 × 10^3/µL on validation set A (a population resembling the development datasets), with the area under the receiver operating characteristic curve (AUC) of the DLS exceeding that of the baseline by 5·3-19·9% (absolute differences in AUC).
On validation sets B and C, with substantial patient population differences compared with the development datasets, the DLS outperformed the baseline for ACR ≥300·0 mg/g and haemoglobin <11·0 g/dL by 7·3-13·2%. INTERPRETATION: We found further evidence that external eye photographs contain biomarkers spanning multiple organ systems. Such biomarkers could enable accessible and non-invasive screening of disease. Further work is needed to understand the translational implications. FUNDING: Google.

Subject(s)

Deep Learning , Diabetic Retinopathy , Humans , Retrospective Studies , Calcium , Diabetic Retinopathy/diagnosis , Biomarkers , Albumins

20.

Pathologist Validation of a Machine Learning-Derived Feature for Colon Cancer Risk Stratification.

L'Imperio, Vincenzo; Wulczyn, Ellery; Plass, Markus; Müller, Heimo; Tamini, Nicolò; Gianotti, Luca; Zucchini, Nicola; Reihs, Robert; Corrado, Greg S; Webster, Dale R; Peng, Lily H; Chen, Po-Hsuan Cameron; Lavitrano, Marialuisa; Liu, Yun; Steiner, David F; Zatloukal, Kurt; Pagni, Fabio.

JAMA Netw Open ; 6(3): e2254891, 2023 03 01.

Article in English | MEDLINE | ID: mdl-36917112

ABSTRACT

Importance: Identifying new prognostic features in colon cancer has the potential to refine histopathologic review and inform patient care. Although prognostic artificial intelligence systems have recently demonstrated significant risk stratification for several cancer types, studies have not yet shown that the machine learning-derived features associated with these prognostic artificial intelligence systems are both interpretable and usable by pathologists. Objective: To evaluate whether pathologist scoring of a histopathologic feature previously identified by machine learning is associated with survival among patients with colon cancer. Design, Setting, and Participants: This prognostic study used deidentified, archived colorectal cancer cases from January 2013 to December 2015 from the University of Milano-Bicocca. All available histologic slides from 258 consecutive colon adenocarcinoma cases were reviewed from December 2021 to February 2022 by 2 pathologists, who conducted semiquantitative scoring for tumor adipose feature (TAF), which was previously identified via a prognostic deep learning model developed with an independent colorectal cancer cohort. Main Outcomes and Measures: Prognostic value of TAF for overall survival and disease-specific survival as measured by univariable and multivariable regression analyses. Interpathologist agreement in TAF scoring was also evaluated. Results: A total of 258 colon adenocarcinoma histopathologic cases from 258 patients (138 men [53%]; median age, 67 years [IQR, 65-81 years]) with stage II (n = 119) or stage III (n = 139) cancer were included. Tumor adipose feature was identified in 120 cases (widespread in 63 cases, multifocal in 31, and unifocal in 26). 
For overall survival analysis after adjustment for tumor stage, TAF was independently prognostic in 2 ways: TAF as a binary feature (presence vs absence: hazard ratio [HR] for presence of TAF, 1.55 [95% CI, 1.07-2.25]; P = .02) and TAF as a semiquantitative categorical feature (HR for widespread TAF, 1.87 [95% CI, 1.23-2.85]; P = .004). Interpathologist agreement for widespread TAF vs lower categories (absent, unifocal, or multifocal) was 90%, corresponding to a κ metric at this threshold of 0.69 (95% CI, 0.58-0.80). Conclusions and Relevance: In this prognostic study, pathologists were able to learn and reproducibly score for TAF, providing significant risk stratification on this independent data set. Although additional work is warranted to understand the biological significance of this feature and to establish broadly reproducible TAF scoring, this work represents the first validation to date of human expert learning from machine learning in pathology. Specifically, this validation demonstrates that a computationally identified histologic feature can represent a human-identifiable, prognostic feature with the potential for integration into pathology practice.
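
The abstract reports both raw interpathologist agreement (90%) and a κ value (0.69); κ is lower because it discounts the agreement expected by chance. The Python sketch below shows the standard Cohen's kappa calculation for two raters on toy binary ratings (1 = widespread TAF, 0 = lower category). The ratings are invented for illustration and the function name is hypothetical; the study's own confidence interval method is not reproduced.

```python
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(
        (list(rater_a).count(c) / n) * (list(rater_b).count(c) / n)
        for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


# Toy ratings: the two raters agree on 8 of 10 cases.
toy_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(toy_a, toy_b), 3))
```

In the toy example raw agreement is 80% but kappa is noticeably lower, because two raters who each assign the rarer category 30% of the time would already agree on 58% of cases by chance.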

Subject(s)

Adenocarcinoma , Colonic Neoplasms , Male , Humans , Aged , Colonic Neoplasms/diagnosis , Pathologists , Artificial Intelligence , Machine Learning , Risk Assessment
