Search | VHL Regional Portal

Artificial Intelligence Screening of Medical School Applications: Development and Validation of a Machine-Learning Algorithm.

Triola, Marc M; Reinstein, Ilan; Marin, Marina; Gillespie, Colleen; Abramson, Steven; Grossman, Robert I; Rivera, Rafael.

Acad Med ; 98(9): 1036-1043, 2023 09 01.

Article in English | MEDLINE | ID: mdl-36888969

ABSTRACT

PURPOSE: To explore whether a machine-learning algorithm could accurately perform the initial screening of medical school applications. METHOD: Using application data and faculty screening outcomes from the 2013 to 2017 application cycles (n = 14,555 applications), the authors created a virtual faculty screener algorithm. A retrospective validation using 2,910 applications from the 2013 to 2017 cycles and a prospective validation using 2,715 applications during the 2018 application cycle were performed. To test the validated algorithm, a randomized trial was performed in the 2019 cycle, with 1,827 eligible applications being reviewed by faculty and 1,873 by algorithm. RESULTS: The retrospective validation yielded area under the receiver operating characteristic (AUROC) values of 0.83, 0.64, and 0.83 and area under the precision-recall curve (AUPRC) values of 0.61, 0.54, and 0.65 for the invite for interview, hold for review, and reject groups, respectively. The prospective validation yielded AUROC values of 0.83, 0.62, and 0.82 and AUPRC values of 0.66, 0.47, and 0.65 for the invite for interview, hold for review, and reject groups, respectively. The randomized trial found no significant differences in overall interview recommendation rates according to faculty or algorithm and among female or underrepresented in medicine applicants. In underrepresented in medicine applicants, there were no significant differences in the rates at which the admissions committee offered an interview (70 of 71 in the faculty reviewer arm and 61 of 65 in the algorithm arm; P = .14). No difference in the rate of the committee agreeing with the recommended interview was found among female applicants (224 of 229 in the faculty reviewer arm and 220 of 227 in the algorithm arm; P = .55). CONCLUSIONS: The virtual faculty screener algorithm successfully replicated faculty screening of medical school applications and may aid in the consistent and reliable review of medical school applicants.

Subject(s)

Artificial Intelligence, Medical Schools, Humans, Female, Retrospective Studies, Algorithms, Machine Learning
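The abstract above reports a separate AUROC for each of the three screening outcomes (invite for interview, hold for review, reject). It does not say how the per-class values were computed; a common approach, assumed here, is one-vs-rest scoring, where each class in turn is treated as the positive label. The sketch below uses the rank-based (Mann-Whitney) definition of AUROC. The function names and the toy predictions are hypothetical and are not study data.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(prob_rows, true_classes, classes):
    """Compute one AUROC per class by treating that class as the positive label."""
    result = {}
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if t == cls else 0 for t in true_classes]
        result[cls] = auroc(scores, labels)
    return result


# Toy, made-up predicted probabilities (NOT study data).
# Columns follow the order of CLASSES.
CLASSES = ["invite", "hold", "reject"]
probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
truth = ["invite", "invite", "hold", "reject", "reject", "hold"]
per_class = one_vs_rest_auroc(probs, truth, CLASSES)
```

The pairwise loop is quadratic in the number of applications, which is fine for illustration; a sort-based rank computation would be the usual choice at the scale of several thousand applications.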

A New Tool for Holistic Residency Application Review: Using Natural Language Processing of Applicant Experiences to Predict Interview Invitation.

Mahtani, Arun Umesh; Reinstein, Ilan; Marin, Marina; Burk-Rafel, Jesse.

Acad Med ; 98(9): 1018-1021, 2023 09 01.

Article in English | MEDLINE | ID: mdl-36940395

ABSTRACT

PROBLEM: Reviewing residency application narrative components is time intensive and has contributed to nearly half of applications not receiving holistic review. The authors developed a natural language processing (NLP)-based tool to automate review of applicants' narrative experience entries and predict interview invitation. APPROACH: Experience entries (n = 188,500) were extracted from 6,403 residency applications across 3 application cycles (2017-2019) at 1 internal medicine program, combined at the applicant level, and paired with the interview invitation decision (n = 1,224 invitations). NLP identified important words (or word pairs) with term frequency-inverse document frequency, which were used to predict interview invitation using logistic regression with L1 regularization. Terms remaining in the model were analyzed thematically. Logistic regression models were also built using structured application data and a combination of NLP and structured data. Model performance was evaluated on never-before-seen data using area under the receiver operating characteristic and precision-recall curves (AUROC, AUPRC). OUTCOMES: The NLP model had an AUROC of 0.80 (vs chance decision of 0.50) and AUPRC of 0.49 (vs chance decision of 0.19), showing moderate predictive strength. Phrases indicating active leadership, research, or work in social justice and health disparities were associated with interview invitation. The model's detection of these key selection factors demonstrated face validity. Adding structured data to the model significantly improved prediction (AUROC 0.92, AUPRC 0.73), as expected given reliance on such metrics for interview invitation. NEXT STEPS: This model represents a first step in using NLP-based artificial intelligence tools to promote holistic residency application review. The authors are assessing the practical utility of using this model to identify applicants screened out using traditional metrics. Generalizability must be determined through model retraining and evaluation at other programs. Work is ongoing to thwart model "gaming," improve prediction, and remove unwanted biases introduced during model training.

Subject(s)

Internship and Residency, Humans, Natural Language Processing, Artificial Intelligence, Personnel Selection, Leadership
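The abstract above describes weighting words and word pairs by term frequency-inverse document frequency (TF-IDF) before fitting an L1-regularized logistic regression. The sketch below illustrates only the TF-IDF step. The exact variant the authors used is not stated, so the formula here (tf = count / document length, idf = ln(N / document frequency)) is an assumption. The function names are hypothetical, and the sample entries are invented for illustration.

```python
import math
import re
from collections import Counter


def terms(text):
    """Lowercase a text and return its words plus adjacent word pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(N / document frequency).
    """
    term_lists = [terms(doc) for doc in documents]
    doc_freq = Counter()
    for term_list in term_lists:
        doc_freq.update(set(term_list))  # count each term once per document
    n_docs = len(documents)
    weights = []
    for term_list in term_lists:
        counts = Counter(term_list)
        total = len(term_list)
        weights.append({
            t: (c / total) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        })
    return weights


# Hypothetical experience entries, invented for illustration only.
entries = [
    "Led a student clinic and led research on health disparities",
    "Member of a student interest group",
    "Research assistant in a cardiology lab",
]
weights = tfidf(entries)
top_terms = sorted(weights[0], key=weights[0].get, reverse=True)[:5]
```

Terms that appear in every document (here, "a") receive a weight of zero. In the pipeline the abstract describes, the resulting weights would serve as features for the classifier, with L1 regularization shrinking most coefficients to exactly zero so that only a small set of terms remains for thematic analysis.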

Identifying Meaningful Patterns of Internal Medicine Clerkship Grading Distributions: Application of Data Science Techniques Across 135 U.S. Medical Schools.

Burk-Rafel, Jesse; Reinstein, Ilan; Park, Yoon Soo.

Acad Med ; 98(3): 337-341, 2023 03 01.

Article in English | MEDLINE | ID: mdl-36484555

ABSTRACT

PROBLEM: Residency program directors use clerkship grades for high-stakes selection decisions despite substantial variability in grading systems and distributions. The authors apply clustering techniques from data science to identify groups of schools for which grading distributions were statistically similar in the internal medicine clerkship. APPROACH: Grading systems (e.g., honors/pass/fail) and distributions (i.e., percent of students in each grade tier) were tabulated for the internal medicine clerkship at U.S. MD-granting medical schools by manually reviewing Medical Student Performance Evaluations (MSPEs) in the 2019 and 2020 residency application cycles. Grading distributions were analyzed using k-means cluster analysis, with the optimal number of clusters selected using model fit indices. OUTCOMES: Among the 145 medical schools with available MSPE data, 64 distinct grading systems were reported. Among the 135 schools reporting a grading distribution, the median percent of students receiving the highest and lowest tier grade was 32% (range: 2%-66%) and 2% (range: 0%-91%), respectively. A four-cluster solution was optimal (η² = 0.8): cluster 1 (45% [highest grade tier]-45% [middle tier]-10% [lowest tier], n = 64 [47%] schools), cluster 2 (25%-30%-45%, n = 40 [30%] schools), cluster 3 (20%-75%-5%, n = 25 [19%] schools), and cluster 4 (15%-25%-25%-25%-10%, n = 6 [4%] schools). The findings suggest internal medicine clerkship grading systems may be more comparable across institutions than previously thought. NEXT STEPS: The authors will prospectively review reported clerkship grading approaches across additional specialties and are conducting a mixed-methods analysis, incorporating a sequential explanatory model, to interview stakeholder groups on the use of the patterns identified.

Subject(s)

Clinical Clerkship, Medical Students, Humans, Educational Measurement/methods, Medical Schools, Data Science
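The abstract above groups schools by k-means clustering of their grading distributions and reports η² as a fit index. The sketch below shows the standard Lloyd's algorithm on three-tier distributions (percent highest, middle, lowest). It assumes that η² is the share of total variance explained by cluster membership, which the abstract does not spell out. The number of clusters is fixed here rather than selected by fit indices, and the six school distributions are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = mean_point(members)
    return assignment, centroids


def eta_squared(points, assignment, centroids):
    """Assumed definition: 1 - (within-cluster SS / total SS)."""
    grand = mean_point(points)
    total = sum(squared_distance(p, grand) for p in points)
    within = sum(squared_distance(p, centroids[a])
                 for p, a in zip(points, assignment))
    return 1 - within / total


# Hypothetical schools: (% highest tier, % middle tier, % lowest tier).
schools = [
    (45, 45, 10), (50, 40, 10), (40, 50, 10),
    (20, 75, 5), (25, 70, 5), (15, 80, 5),
]
# Deterministic start: one seed point taken from each apparent group.
assignment, centroids = kmeans(schools, [schools[0], schools[3]])
fit = eta_squared(schools, assignment, centroids)
```

In practice the algorithm would be run for several candidate values of k (and several random starts), with the fit index compared across solutions to choose the number of clusters.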

Mapping hospital data to characterize residents' educational experiences.

Rhee, David W; Reinstein, Ilan; Jrada, Morris; Pendse, Jay; Cocks, Patrick; Stern, David T; Sartori, Daniel J.

BMC Med Educ ; 22(1): 496, 2022 Jun 25.

Article in English | MEDLINE | ID: mdl-35752814

ABSTRACT

BACKGROUND: Experiential learning through patient care is fundamental to graduate medical education. Despite this, the actual content to which trainees are exposed in clinical practice is difficult to quantify and is poorly characterized. There remains an unmet need to define precisely how residents' patient care activities inform their educational experience. METHODS: Using a recently-described crosswalk tool, we mapped principal ICD-10 discharge diagnosis codes to American Board of Internal Medicine (ABIM) content at four training hospitals of a single Internal Medicine (IM) Residency Program over one academic year to characterize and compare residents' clinical educational experiences. Frequencies of broad content categories and more specific condition categories were compared across sites to profile residents' aggregate inpatient clinical experiences and drive curricular change. RESULTS: There were 18,604 discharges from inpatient resident teams during the study period. The crosswalk captured > 95% of discharges at each site. Infectious Disease (ranging 17.4 to 39.5% of total discharges) and Cardiovascular Disease (15.8 to 38.2%) represented the most common content categories at each site. Several content areas (Allergy/Immunology, Dermatology, Obstetrics/Gynecology, Ophthalmology, Otolaryngology/Dental Medicine) were notably underrepresented (≤ 1% at each site). There were significant differences in the frequencies of conditions within most content categories, suggesting that residents experience distinct site-specific clinical content during their inpatient training. CONCLUSIONS: There were substantial differences in the clinical content experienced by our residents across hospital sites, prompting several important programmatic and curricular changes to enrich our residents' hospital-based educational experiences.

Subject(s)

Internship and Residency, Clinical Competence, Curriculum, Graduate Medical Education, Teaching Hospitals, Humans, Internal Medicine/education, United States

Development and Validation of a Machine Learning Model for Automated Assessment of Resident Clinical Reasoning Documentation.

Schaye, Verity; Guzman, Benedict; Burk-Rafel, Jesse; Marin, Marina; Reinstein, Ilan; Kudlowitz, David; Miller, Louis; Chun, Jonathan; Aphinyanaphongs, Yindalon.

J Gen Intern Med ; 37(9): 2230-2238, 2022 07.

Article in English | MEDLINE | ID: mdl-35710676

ABSTRACT

BACKGROUND: Residents receive infrequent feedback on their clinical reasoning (CR) documentation. While machine learning (ML) and natural language processing (NLP) have been used to assess CR documentation in standardized cases, no studies have described similar use in the clinical environment. OBJECTIVE: The authors developed and validated, using Kane's framework, an ML model for automated assessment of CR documentation quality in residents' admission notes. DESIGN, PARTICIPANTS, MAIN MEASURES: Internal medicine residents' and subspecialty fellows' admission notes at one medical center from July 2014 to March 2020 were extracted from the electronic health record. Using a validated CR documentation rubric, the authors rated 414 notes for the ML development dataset. Notes were truncated to isolate the relevant portion; NLP software (cTAKES) extracted disease/disorder named entities and human review generated CR terms. The final model had three input variables and classified notes as demonstrating low- or high-quality CR documentation. The ML model was applied to a retrospective dataset (9591 notes) for human validation and data analysis. Reliability between human and ML ratings was assessed on 205 of these notes with Cohen's kappa. CR documentation quality by post-graduate year (PGY) was evaluated by the Mantel-Haenszel test of trend. KEY RESULTS: The top-performing logistic regression model had an area under the receiver operating characteristic curve of 0.88, a positive predictive value of 0.68, and an accuracy of 0.79. Cohen's kappa was 0.67. Of the 9591 notes, 31.1% demonstrated high-quality CR documentation; quality increased from 27.0% (PGY1) to 31.0% (PGY2) to 39.0% (PGY3) (p < .001 for trend). Validity evidence was collected in each domain of Kane's framework (scoring, generalization, extrapolation, and implications). CONCLUSIONS: The authors developed and validated a high-performing ML model that classifies CR documentation quality in resident admission notes in the clinical environment, a novel application of ML and NLP with many potential use cases.

Subject(s)

Clinical Reasoning, Documentation, Electronic Health Records, Humans, Machine Learning, Natural Language Processing, Reproducibility of Results, Retrospective Studies
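The abstract above uses Cohen's kappa to measure agreement between human and ML ratings of note quality. Kappa corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal implementation for two raters and any number of categories. The function name and the ten toy ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy ratings (NOT study data): note quality as judged by a human and a model.
human = ["high", "high", "low", "low", "low", "high", "low", "low", "high", "low"]
model = ["high", "low", "low", "low", "low", "high", "low", "high", "high", "low"]
kappa = cohens_kappa(human, model)
```

In this toy example the raters agree on 8 of 10 notes (p_o = 0.80) while chance agreement is 0.52, giving a kappa of about 0.58, which shows how raw agreement overstates reliability when one category dominates.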

Development and Validation of a Machine Learning-Based Decision Support Tool for Residency Applicant Screening and Review.

Burk-Rafel, Jesse; Reinstein, Ilan; Feng, James; Kim, Moosun Brad; Miller, Louis H; Cocks, Patrick M; Marin, Marina; Aphinyanaphongs, Yindalon.

Acad Med ; 96(11S): S54-S61, 2021 11 01.

Article in English | MEDLINE | ID: mdl-34348383

ABSTRACT

PURPOSE: Residency programs face overwhelming numbers of residency applications, limiting holistic review. Artificial intelligence techniques have been proposed to address this challenge but have not been created. Here, a multidisciplinary team sought to develop and validate a machine learning (ML)-based decision support tool (DST) for residency applicant screening and review. METHOD: Categorical applicant data from the 2018, 2019, and 2020 residency application cycles (n = 8,243 applicants) at one large internal medicine residency program were downloaded from the Electronic Residency Application Service and linked to the outcome measure: interview invitation by human reviewers (n = 1,235 invites). An ML model using gradient boosting was designed using training data (80% of applicants) with over 60 applicant features (e.g., demographics, experiences, academic metrics). Model performance was validated on held-out data (20% of applicants). Sensitivity analysis was conducted without United States Medical Licensing Examination (USMLE) scores. An interactive DST incorporating the ML model was designed and deployed that provided applicant- and cohort-level visualizations. RESULTS: The ML model areas under the receiver operating characteristic and precision recall curves were 0.95 and 0.76, respectively; these changed to 0.94 and 0.72, respectively, with removal of USMLE scores. Applicants' medical school information was an important driver of predictions (which had face validity based on the local selection process), but numerous predictors contributed. Program directors used the DST in the 2021 application cycle to select 20 applicants for interview that had been initially screened out during human review. CONCLUSIONS: The authors developed and validated an ML algorithm for predicting residency interview offers from numerous application elements with high performance, even when USMLE scores were removed. Model deployment in a DST highlighted its potential for screening candidates and helped quantify and mitigate biases existing in the selection process. Further work will incorporate unstructured textual data through natural language processing methods.

Subject(s)

Decision Support Techniques, Internship and Residency, Machine Learning, Personnel Selection/methods, School Admission Criteria, Humans, United States
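The abstract above reports an area under the precision-recall curve of 0.76. Unlike AUROC, whose chance level is 0.50, the chance level for a precision-recall curve is roughly the share of positives in the data. The sketch below computes average precision, one common estimator of that area (whether the authors used it is an assumption), and the prevalence baseline implied by the full-cohort counts in the abstract (1,235 invites among 8,243 applicants). The prevalence in the held-out 20% may differ slightly. The function name and the toy scores are hypothetical.

```python
def average_precision(scores, labels):
    """Mean of the precision measured at the rank of each true positive.

    Items are ranked by descending score; tied scores keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` items
    return total / n_pos


# Toy scores and invitation labels (NOT study data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 0, 1]
toy_ap = average_precision(toy_scores, toy_labels)

# Chance-level baseline from the cohort counts given in the abstract.
invites, applicants = 1235, 8243
baseline = invites / applicants
```

With these counts the baseline is about 0.15, so a reported value of 0.76 sits far above what ranking applicants at random would achieve.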

Multi-level longitudinal learning curve regression models integrated with item difficulty metrics for deliberate practice of visual diagnosis: groundwork for adaptive learning.

Reinstein, Ilan; Hill, Jennifer; Cook, David A; Lineberry, Matthew; Pusic, Martin V.

Adv Health Sci Educ Theory Pract ; 26(3): 881-912, 2021 08.

Article in English | MEDLINE | ID: mdl-33646468

ABSTRACT

Visual diagnosis of radiographs, histology and electrocardiograms lends itself to deliberate practice, facilitated by large online banks of cases. Which cases to supply to which learners in which order is still to be worked out, with there being considerable potential for adapting the learning. Advances in statistical modeling, based on an accumulating learning curve, offer methods for more effectively pairing learners with cases of known calibrations. Using demonstration radiograph and electrocardiogram datasets, the advantages of moving from traditional regression to multilevel methods for modeling growth in ability or performance are demonstrated, with a final step of integrating case-level item-response information based on diagnostic grouping. This produces more precise individual-level estimates that can eventually support learner adaptive case selection. The progressive increase in model sophistication is not simply statistical but rather brings the models into alignment with core learning principles including the importance of taking into account individual differences in baseline skill and learning rate as well as the differential interaction with cases of varying diagnosis and difficulty. The developed approach can thus give researchers and educators a better basis on which to anticipate learners' pathways and individually adapt their future learning.

Subject(s)

Benchmarking, Learning Curve, Clinical Competence, Educational Measurement, Humans, Statistical Models

A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients.

Razavian, Narges; Major, Vincent J; Sudarshan, Mukund; Burk-Rafel, Jesse; Stella, Peter; Randhawa, Hardev; Bilaloglu, Seda; Chen, Ji; Nguy, Vuthy; Wang, Walter; Zhang, Hao; Reinstein, Ilan; Kudlowitz, David; Zenger, Cameron; Cao, Meng; Zhang, Ruina; Dogra, Siddhant; Harish, Keerthi B; Bosworth, Brian; Francois, Fritz; Horwitz, Leora I; Ranganath, Rajesh; Austrian, Jonathan; Aphinyanaphongs, Yindalon.

NPJ Digit Med ; 3: 130, 2020.

Article in English | MEDLINE | ID: mdl-33083565

ABSTRACT

The COVID-19 pandemic has challenged front-line clinical decision-making, leading to numerous published prognostic tools. However, few models have been prospectively validated and none report implementation in practice. Here, we use 3345 retrospective and 474 prospective hospitalizations to develop and validate a parsimonious model to identify patients with favorable outcomes within 96 h of a prediction, based on real-time lab values, vital signs, and oxygen support variables. In retrospective and prospective validation, the model achieves high average precision (88.6% [95% CI: 88.4-88.7] and 90.8% [90.8-90.8]) and discrimination (95.1% [95.1-95.2] and 86.8% [86.8-86.9]), respectively. We implemented and integrated the model into the EHR, achieving a positive predictive value of 93.3% with 41% sensitivity. Preliminary results suggest clinicians are adopting these scores into their clinical workflows.
