ABSTRACT
The metaverse refers to a collective virtual space that combines physical and digital realities to create immersive, interactive environments. This space is powered by technologies such as augmented reality (AR), virtual reality (VR), artificial intelligence (AI) and blockchain. In healthcare, the metaverse can offer many applications. Specifically in surgery, potential uses of the metaverse include the possibility of conducting immersive surgical training in a VR or AR setting, and enhancing surgical planning through the adoption of three-dimensional virtual models and simulated procedures. At the intraoperative level, AR-guided surgery can assist the surgeon in real time to increase surgical precision in tumour identification and selective management of vessels. In post-operative care, potential uses of the metaverse include recovery monitoring and patient education. In urology, AR and VR have been widely explored in the past decade, mainly for surgical navigation in prostate and kidney cancer surgery, whereas only anecdotal metaverse experiences have been reported to date, specifically in partial nephrectomy. In the future, further integration of AI will improve the metaverse experience, potentially increasing the possibility of carrying out surgical navigation, data collection and virtual trials within the metaverse. However, challenges concerning data security and regulatory compliance must be addressed before the metaverse can be used to improve patient care.
ABSTRACT
With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
ABSTRACT
Generative AI (GAI) models like ChatGPT are becoming widely discussed and utilized tools in medical education. For example, ChatGPT can be used to assist with studying for exams and has been shown capable of passing the USMLE board examinations. However, concerns have been expressed regarding its fair and ethical use. In May 2023, we designed an electronic survey for students across North American medical colleges to gauge their views on, and current use of, ChatGPT and similar technologies. Overall, 415 students from at least 28 medical schools completed the questionnaire; 96% of respondents had heard of ChatGPT and 52% had used it for medical school coursework. The most common uses in the pre-clerkship and clerkship phases were asking for explanations of medical concepts and assisting with diagnosis/treatment plans, respectively. The most common use in academic research was for proofreading and grammar edits. Respondents recognized the potential limitations of ChatGPT, including inaccurate responses, patient privacy concerns, and plagiarism. Students recognized the importance of regulations to ensure proper use of this novel technology. Understanding the views of students is essential to crafting workable instructional courses, guidelines, and regulations that ensure the safe, productive use of generative AI in medical school.
ABSTRACT
OBJECTIVE: To evaluate whether earlier administration of adjuvant chemotherapy (AC) can significantly improve survival in muscle-invasive bladder cancer (MIBC). METHODS: We systematically searched the PubMed, Cochrane Central, Scopus, and Web of Science databases for original articles that examined time to AC after radical cystectomy. Heterogeneity was assessed using the Higgins I² statistic, with values over 50% considered heterogeneous and analyzed with a random-effects model; otherwise, a fixed-effects model was used. Studies were stratified based on the cutoff time used for administering AC. Two primary cutoffs were employed: 45 days and 90 days. Immediate AC was defined as chemotherapy administered before the predefined cutoff, while delayed AC was defined as chemotherapy administered after this cutoff. Comparisons were made between immediate and delayed AC. RESULTS: A total of 5 studies were included. Overall survival (OS) was reported in all of the studies. The meta-analysis showed that immediate AC significantly improved OS, with a hazard ratio (HR) of 1.20 [1.06, 1.36], P = .004. When stratifying by the timing of therapy, starting chemotherapy within 45 days resulted in a greater improvement in survival (HR 1.27 [1.02, 1.59], P = .03) compared to starting within 90 days (HR 1.17 [1.00, 1.36], P = .04). CONCLUSION: The findings of this systematic review and meta-analysis emphasize that the timing of AC after radical cystectomy significantly influences survival outcomes in patients with MIBC. The benefits of early AC initiation underscore its potential for mitigating disease progression and improving long-term survival.
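The heterogeneity-driven choice between fixed- and random-effects pooling described above can be sketched as follows. This is a minimal illustration of inverse-variance pooling of log hazard ratios with Cochran's Q, Higgins I², and a DerSimonian-Laird random-effects fallback; the study values in the example are placeholders, not the data of the five included studies.

```python
import math

def pool_hazard_ratios(studies):
    """Pool (HR, 95% CI low, 95% CI high) tuples on the log scale.

    Uses inverse-variance (fixed-effects) weights, computes Cochran's Q
    and Higgins I^2, and switches to DerSimonian-Laird random effects
    when I^2 exceeds 50%, mirroring the rule stated in the abstract.
    """
    # Standard error recovered from the 95% CI on the log scale.
    logs = [(math.log(hr), (math.log(hi) - math.log(lo)) / (2 * 1.96))
            for hr, lo, hi in studies]
    w = [1 / se ** 2 for _, se in logs]
    fixed = sum(wi * t for (t, _), wi in zip(logs, w)) / sum(w)
    # Cochran's Q and Higgins I^2 quantify between-study heterogeneity.
    q = sum(wi * (t - fixed) ** 2 for (t, _), wi in zip(logs, w))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    if i2 > 50:
        # Heterogeneous: DerSimonian-Laird between-study variance tau^2.
        tau2 = max(0.0, (q - df) / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))
        w = [1 / (se ** 2 + tau2) for _, se in logs]
        pooled = sum(wi * t for (t, _), wi in zip(logs, w)) / sum(w)
    else:
        pooled = fixed
    return math.exp(pooled), i2

# Illustrative studies only (hazard ratio, CI low, CI high).
hr, i2 = pool_hazard_ratios([(1.27, 1.02, 1.59), (1.17, 1.00, 1.36)])
```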
ABSTRACT
OBJECTIVE: To evaluate the learning curve of a transperineal (TP) magnetic resonance imaging (MRI) and transrectal ultrasound (TRUS) fusion prostate biopsy (PBx). MATERIALS AND METHODS: Consecutive patients undergoing MRI followed by TP PBx from May 2017 to January 2023 were prospectively enrolled (IRB# HS-13-00663). All participants underwent MRI followed by a 12 to 14 core systematic PBx (SB), with at least 2 additional targeted biopsy (TB) cores per PIRADS ≥3 lesion. The biopsies were performed transperineally using an organ-tracking image-fusion system. The cohort was divided into chronological quintiles. An inflection point analysis was performed to determine proficiency. Operative time was defined from insertion to removal of the TRUS probe from the patient's rectum. Grade Group ≥2 defined clinically significant prostate cancer (CSPCa). Statistical significance was defined as P < 0.05. RESULTS: A total of 370 patients were included and divided into quintiles of 74 patients. MRI findings and PIRADS distribution were similar between quintiles (P = 0.08). CSPCa detection with SB+TB was consistent across quintiles: PIRADS 1 and 2 (range, 0%-18%; P = 0.25); PIRADS 3 to 5 (range, 46%-70%; P = 0.12). CSPCa detection on PIRADS 3 to 5 TB alone, for quintiles 1 to 5, was 44%, 58%, 66%, 41%, and 53%, respectively (P = 0.08). The median operative time significantly decreased for PIRADS 1 and 2 (33 min to 13 min; P < 0.01) and PIRADS 3 to 5 (48 min to 19 min; P < 0.01), reaching a plateau after 156 cases. Complications were not significantly different across quintiles (range, 0%-5.4%; P = 0.3). CONCLUSIONS: CSPCa detection remained consistently satisfactory throughout the learning curve of the transperineal MRI/TRUS fusion prostate biopsy, while operative time significantly decreased, with proficiency achieved after 156 cases.
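The abstract does not specify how the inflection point analysis was carried out; one common approach for locating a learning-curve plateau is segmented (two-piece) linear regression, choosing the breakpoint that minimizes total squared error. A minimal sketch under that assumption, regressing operative time on case number:

```python
def find_inflection(x, y):
    """Grid-search a two-segment least-squares fit over candidate
    breakpoints; the breakpoint with the lowest combined squared error
    approximates the case number where proficiency is reached."""
    def sse(xs, ys):
        # Sum of squared residuals of a simple least-squares line.
        n = len(xs)
        if n < 2:
            return 0.0
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((v - mx) ** 2 for v in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx else 0.0
        return sum((b - (my + slope * (a - mx))) ** 2
                   for a, b in zip(xs, ys))
    best = min(range(2, len(x) - 2),
               key=lambda k: sse(x[:k], y[:k]) + sse(x[k:], y[k:]))
    return x[best]
```

Applied to per-case operative times, the returned case number would mark the start of the plateau (156 in the study above).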
ABSTRACT
Large Language Models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present TRIPOD-LLM, an extension of the TRIPOD+AI statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight, and task-specific performance reporting. We also introduce an interactive website ( https://tripod-llm.vercel.app/ ) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility, and clinical applicability of LLM research in healthcare through comprehensive reporting. COI: DSB: Editorial, unrelated to this work: Associate Editor of Radiation Oncology, HemOnc.org (no financial compensation); Research funding, unrelated to this work: American Association for Cancer Research; Advisory and consulting, unrelated to this work: MercurialAI. DDF: Editorial, unrelated to this work: Associate Editor of JAMIA, Editorial Board of Scientific Data, Nature; Funding, unrelated to this work: the intramural research program at the U.S. National Library of Medicine, National Institutes of Health. JWG: Editorial, unrelated to this work: Editorial Board of Radiology: Artificial Intelligence, British Journal of Radiology AI journal and NEJM AI. All other authors declare no conflicts of interest.
ABSTRACT
PURPOSE: This cross-sectional study assessed a generative AI platform to automate the creation of accurate, appropriate, and compelling social media (SoMe) posts from urological journal articles. MATERIALS AND METHODS: One hundred SoMe posts from the X (Twitter) profiles of the top 3 journals in urology were collected from August 2022 to October 2023. A freeware GPT tool was developed to auto-generate SoMe posts, which included title summarization, key findings, pertinent emojis, hashtags, and DOI links to the article. Three physicians independently evaluated GPT-generated posts for achieving a tetrafecta of accuracy and appropriateness criteria. Fifteen scenarios were created from 5 randomly selected posts from each journal. Each scenario contained both the original and the GPT-generated post for the same article. Five questions were formulated to investigate the posts' likability, shareability, engagement, understandability, and comprehensiveness. The paired posts were then randomized and presented to blinded academic authors and to the general public through Amazon Mechanical Turk (AMT) responders for preference evaluation. RESULTS: The median (IQR) time for post auto-generation was 10.2 seconds (8.5-12.5). Of the 150 rated GPT-generated posts, 115 (76.6%) met the correctness tetrafecta: 144 (96%) accurately summarized the title, 147 (98%) accurately presented the article's main findings, 131 (87.3%) appropriately used emojis, and 138 (92%) appropriately used hashtags. A total of 258 academic urologists and 493 AMT responders answered the surveys, wherein the GPT-generated posts consistently outperformed the original journals' posts for both academicians and AMT responders (P < .05). CONCLUSIONS: Generative AI can automate the creation of SoMe posts from urology journal abstracts that are both accurate and preferred by the academic community and the general public.
ABSTRACT
PURPOSE: To compare transperineal (TP) vs transrectal (TR) magnetic resonance imaging (MRI) and transrectal ultrasound (TRUS) fusion-guided prostate biopsy (PBx) in a large, ethnically diverse and multiracial cohort. MATERIALS AND METHODS: Consecutive patients who underwent multiparametric (mp) MRI followed by TP or TR TRUS-fusion guided PBx were identified from a prospective database (IRB #HS-13-00663). All patients underwent mpMRI followed by a 12-14 core systematic PBx. A minimum of two additional target-biopsy cores were taken per PIRADS ≥3 lesion. The endpoint was the detection of clinically significant prostate cancer (CSPCa; Grade Group, GG ≥2). Statistical significance was defined as p<0.05. RESULTS: A total of 1491 patients met inclusion criteria, with 480 undergoing TP and 1011 TR PBx. Overall, 11% of patients were Asian, 5% African American, 14% Hispanic, 14% other, and 56% White, with similar distributions between TP and TR (p=0.4). For PIRADS 3-5, the TP PBx CSPCa detection was significantly higher (61% vs 54%, p=0.03) than TR PBx, but not for PIRADS 1-2 (13% vs 13%, p=1.0). After adjusting for confounders on multivariable analysis, Black race, but not the PBx approach (TP vs TR), was an independent predictor of CSPCa detection. The median maximum cancer core length (11 vs 8 mm; p<0.001) and percent (80% vs 60%; p<0.001) were greater for TP PBx even after adjusting for confounders. CONCLUSIONS: In a large and diverse cohort, Black race, but not the biopsy approach, was an independent predictor of CSPCa detection. After adjustment, TP and TR PBx yielded similar CSPCa detection rates; however, TP PBx was histologically more informative.
Subject(s)
Image-Guided Biopsy, Prostate, Prostatic Neoplasms, Interventional Ultrasonography, Humans, Male, Prostatic Neoplasms/pathology, Prostatic Neoplasms/diagnostic imaging, Image-Guided Biopsy/methods, Middle Aged, Aged, Interventional Ultrasonography/methods, Prostate/pathology, Prostate/diagnostic imaging, Perineum, Interventional Magnetic Resonance Imaging/methods, Neoplasm Grading, Multiparametric Magnetic Resonance Imaging/methods, Reproducibility of Results
ABSTRACT
BACKGROUND: Inguinal lymph node dissection plays an important role in the management of melanoma, penile and vulval cancer. Inguinal lymph node dissection is associated with various intraoperative and postoperative complications, with significant heterogeneity in their classification and reporting. This lack of standardization challenges efforts to study and report inguinal lymph node dissection outcomes. The aim of this study was to devise a system to standardize the classification and reporting of inguinal lymph node dissection perioperative complications by creating a worldwide collaborative, the complications and adverse events in lymphadenectomy of the inguinal area (CALI) group. METHODS: A modified 3-round Delphi consensus approach surveyed a worldwide group of experts in inguinal lymph node dissection for melanoma, penile and vulval cancer. The group of experts included general surgeons, urologists and oncologists (gynaecological and surgical). The survey assessed expert agreement on inguinal lymph node dissection perioperative complications. Panel interrater agreement and consistency were assessed as the overall percentage agreement and Cronbach's α. RESULTS: Forty-seven experienced consultants were enrolled: 26 (55.3%) urologists, 11 (23.4%) surgical oncologists, 6 (12.8%) general surgeons and 4 (8.5%) gynaecology oncologists. Based on their expertise, 31 (66%), 10 (21.3%) and 22 (46.8%) of the participants treat penile cancer, vulval cancer and melanoma, respectively, using inguinal lymph node dissection; 89.4% (42 of 47) agreed with the definitions and their inclusion as part of the inguinal lymph node dissection intraoperative complication group, while 93.6% (44 of 47) agreed that postoperative complications should be subclassified into five macrocategories. Unanimous agreement (100%, 37 of 37) was achieved for the final standardized classification system for reporting inguinal lymph node dissection complications in melanoma, vulval cancer and penile cancer.
CONCLUSION: The complications and adverse events in lymphadenectomy of the inguinal area classification system has been developed as a tool to standardize the assessment and reporting of complications during inguinal lymph node dissection for the treatment of melanoma, vulval and penile cancer.
Subject(s)
Consensus, Delphi Technique, Inguinal Canal, Lymph Node Excision, Melanoma, Penile Neoplasms, Postoperative Complications, Vulvar Neoplasms, Humans, Lymph Node Excision/adverse effects, Lymph Node Excision/methods, Female, Male, Penile Neoplasms/surgery, Penile Neoplasms/pathology, Postoperative Complications/etiology, Postoperative Complications/epidemiology, Vulvar Neoplasms/surgery, Vulvar Neoplasms/pathology, Melanoma/surgery, Melanoma/pathology, Inguinal Canal/surgery, Surveys and Questionnaires
ABSTRACT
BACKGROUND AND OBJECTIVE: Readability of patient education materials is of utmost importance to ensure understandability and dissemination of health care information in uro-oncology. We aimed to investigate the readability of the official patient education materials of the European Association of Urology (EAU) and the American Urological Association (AUA). METHODS: Patient education materials for prostate, bladder, kidney, testicular, penile, and urethral cancers were retrieved from the respective organizations. Readability was assessed via the WebFX online tool for the Flesch Kincaid Reading Ease Score (FRES) and for reading grade levels by the Flesch Kincaid Grade Level (FKGL), Gunning Fog Score (GFS), Smog Index (SI), Coleman Liau Index (CLI), and Automated Readability Index (ARI). Layperson readability was defined as a FRES ≥70, with the other readability indices <7, in line with European Union recommendations. This study assessed only objective readability and no other metrics such as understandability. KEY FINDINGS AND LIMITATIONS: Most patient education materials failed to meet the recommended threshold for laypersons. The mean readability for EAU patient education material was as follows: FRES 50.9 (standard error [SE]: 3.0), with FKGL, GFS, SI, CLI, and ARI all scoring ≥7. The mean readability for AUA patient material was as follows: FRES 64.0 (SE: 1.4), with FKGL, GFS, SI, and ARI all scoring ≥7. Only 13 of 70 (18.6%) patient education material paragraphs met the readability requirements. The mean readability for bladder cancer patient education materials was the lowest, with a FRES of 36.7 (SE: 4.1). CONCLUSIONS AND CLINICAL IMPLICATIONS: Patient education materials from leading urological associations show reading levels above the recommended thresholds for laypersons and may not be understood easily by patients. There is a need for more patient-friendly reading materials.
PATIENT SUMMARY: This study checked whether health information about different cancers was easy to read. Most of it was too hard for patients to understand.
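The readability indices above are closed-form formulas over sentence, word, and syllable counts. A minimal sketch of the FRES and FKGL formulas, using a naive vowel-group syllable heuristic (dedicated tools such as the WebFX checker count syllables more carefully, so exact scores will differ):

```python
import re

def count_syllables(word):
    """Approximate syllables as vowel groups, discounting a silent
    trailing 'e'; a heuristic, not a dictionary-based count."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(1, n)

def flesch_scores(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Short sentences of one-syllable words score near the top of the FRES scale, while dense clinical prose falls well below the layperson threshold of 70.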
ABSTRACT
Prostate cancer is one of the most frequently occurring cancers in men, with a low survival rate if not diagnosed early. PI-RADS reading has a high false positive rate, thus increasing diagnostic costs and patient discomfort. Deep learning (DL) models achieve high segmentation performance, but require a large model size and high complexity. DL models also lack feature interpretability and are perceived as "black boxes" in the medical field. In this work we propose the PCa-RadHop pipeline, which aims to provide a more transparent feature extraction process using a linear model. It adopts the recently introduced Green Learning (GL) paradigm, which offers a small model size and low complexity. PCa-RadHop consists of two stages: stage 1 extracts data-driven radiomics features from the bi-parametric magnetic resonance imaging (bp-MRI) input and predicts an initial heatmap. To reduce the false positive rate, a subsequent stage 2 refines the predictions by including more contextual information and radiomics features from each already detected region of interest (ROI). Experiments on the largest publicly available dataset, PI-CAI, show that the proposed method performs competitively with other DL models, achieving an area under the curve (AUC) of 0.807 in a cohort of 1,000 patients, while maintaining a model size and complexity that are orders of magnitude smaller.
Subject(s)
Magnetic Resonance Imaging, Prostatic Neoplasms, Humans, Prostatic Neoplasms/diagnostic imaging, Male, Magnetic Resonance Imaging/methods, Deep Learning, Computer-Assisted Image Interpretation/methods, Algorithms
ABSTRACT
Purpose: To create a nomogram to predict the absence of clinically significant prostate cancer (CSPCa) in males with non-suspicious multiparametric magnetic resonance imaging (mpMRI) undergoing prostate biopsy (PBx). Materials and Methods: We identified consecutive patients who underwent 3T mpMRI followed by PBx for suspicion of PCa or surveillance follow-up. All patients had a Prostate Imaging Reporting and Data System (PIRADS) score of 1-2 (negative mpMRI). CSPCa was defined as Grade Group ≥2. Multivariate logistic regression analysis was performed via backward elimination. Discrimination was evaluated with the area under the receiver operating characteristic curve (AUROC). Internal validation was performed with 1,000 bootstrap resamples to estimate the optimism-corrected AUROC. Results: In total, 327 patients met the inclusion criteria. The median (IQR) age and PSA density (PSAD) were 64 years (58-70) and 0.10 ng/mL² (0.07-0.15), respectively. Biopsy history was as follows: 117 (36%) males were PBx-naive, 130 (40%) had a previous negative PBx and 80 (24%) had a previous positive PBx. The majority were White (65%); 6% of males self-reported as Black. Overall, 44 (13%) patients were diagnosed with CSPCa on PBx. Black race, history of previous negative PBx and PSAD ≥0.15 ng/mL² were independent predictors of CSPCa on PBx and were included in the nomogram. The AUROC of the nomogram was 0.78 and the optimism-corrected AUROC was 0.75. Conclusions: Our nomogram facilitates evaluation of the individual probability of CSPCa on PBx in males with PIRADS 1-2 mpMRI and may be used to identify those in whom PBx may be safely avoided. Black males have an increased risk of CSPCa on PBx, even in the setting of PIRADS 1-2 mpMRI.
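The optimism-corrected AUROC reported above follows the standard bootstrap-validation recipe: refit the model on each bootstrap sample, measure the gap between its apparent performance and its performance on the original sample, and subtract the average gap from the apparent AUROC. A minimal sketch with a pluggable fit function; the model here is a stand-in, not the published nomogram:

```python
import random

def auroc(y, s):
    """AUROC via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [si for si, yi in zip(s, y) if yi == 1]
    neg = [si for si, yi in zip(s, y) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def optimism_corrected_auc(x, y, fit, n_boot=200, seed=0):
    """Harrell-style bootstrap optimism correction.

    `fit(x, y)` must return a scoring function; each bootstrap round
    contributes (apparent AUROC on the bootstrap sample) minus
    (AUROC of the refitted model on the original sample).
    """
    rng = random.Random(seed)
    model = fit(x, y)
    apparent = auroc(y, [model(v) for v in x])
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(len(x)) for _ in x]
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # need both classes to fit and score
        m = fit(xb, yb)
        optimism += auroc(yb, [m(v) for v in xb]) - auroc(y, [m(v) for v in x])
    return apparent - optimism / n_boot
```

With the real nomogram, `fit` would rerun the backward-elimination logistic regression on each bootstrap sample, so that variable selection itself is included in the optimism estimate.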
ABSTRACT
BACKGROUND: Generative Pretrained Transformer (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer related questions from both the physician and public perspective while optimizing outputs for patient consumption. METHODS: Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity using a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk). Participants rated the clarity and demonstrated their understanding through a multiple-choice question. RESULTS: GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across the 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Mean readability of the layperson summaries was higher than that of the original GPT outputs (original ChatGPT vs. simplified ChatGPT, mean (SD): Flesch Reading Ease 36.5 (9.1) vs. 70.2 (11.2), p < 0.0001; Gunning Fog 15.8 (1.7) vs. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level 12.8 (1.2) vs. 7.4 (1.7), p < 0.0001; Coleman-Liau 13.7 (2.1) vs. 8.6 (2.4), p = 0.0002; SMOG Index 11.8 (1.2) vs. 6.7 (1.8), p < 0.0001; Automated Readability Index 13.1 (1.4) vs. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5%-95.7%) and correctly understood the content (63.0%-87.4%). CONCLUSION: GPT shows promise for correct patient education on prostate cancer-related content, but the technology is not specifically designed for delivering information to patients. Prompting the model to respond with accuracy, completeness, clarity and readability may enhance its utility when used for GPT-powered medical chatbots.
ABSTRACT
Generative artificial intelligence (GAI) is able to collect, extract, digest, and generate information in a way that is understandable to humans. As the first surgical applications of GAI emerge, this perspective paper aims to provide a comprehensive overview of current applications and future perspectives for GAI in surgery, from preoperative planning to training. GAI can be used before surgery for planning and decision support, by extracting patient information and providing patients with information about, and simulation of, the procedure. Intraoperatively, GAI can document data that are normally not captured, such as intraoperative adverse events, or provide information to support decision-making. Postoperatively, GAI can help with patient discharge and follow-up. The ability to provide real-time feedback and store it for later review is an important capability of GAI. GAI applications are emerging as highly specialized, task-specific tools for tasks such as data extraction, synthesis, presentation, and communication within the realm of surgery. GAI has the potential to play a pivotal role in facilitating interaction between surgeons and artificial intelligence.