ABSTRACT
Although chatbots have existed for decades, the emergence of transformer-based large language models (LLMs) has captivated the world through the most recent wave of artificial intelligence chatbots, including ChatGPT. Transformers are a type of neural network architecture that enables better contextual understanding of language and efficient training on massive amounts of unlabeled data, such as unstructured text from the internet. As LLMs have increased in size, their improved performance and emergent abilities have revolutionized natural language processing. Since language is integral to human thought, applications based on LLMs have transformative potential in many industries. In fact, LLM-based chatbots have demonstrated human-level performance on many professional benchmarks, including in radiology. LLMs offer numerous clinical and research applications in radiology, several of which have been explored in the literature with encouraging results. Multimodal LLMs can simultaneously interpret text and images to generate reports, closely mimicking current diagnostic pathways in radiology. Thus, from requisition to report, LLMs have the opportunity to positively impact nearly every step of the radiology journey. Yet, these impressive models are not without limitations. This article reviews the limitations of LLMs and mitigation strategies, as well as potential uses of LLMs, including multimodal models. Also reviewed are existing LLM-based applications that can enhance efficiency in supervised settings.
Subjects
Artificial Intelligence; Radiology; Humans; Radiography; Benchmarking; Industry
ABSTRACT
Background ChatGPT (OpenAI) can pass a text-based radiology board-style examination, but its stochasticity and confident language when it is incorrect may limit utility. Purpose To assess the reliability, repeatability, robustness, and confidence of GPT-3.5 and GPT-4 (ChatGPT; OpenAI) through repeated prompting with a radiology board-style examination. Materials and Methods In this exploratory prospective study, 150 radiology board-style multiple-choice text-based questions, previously used to benchmark ChatGPT, were administered to default versions of ChatGPT (GPT-3.5 and GPT-4) on three separate attempts (separated by ≥1 month and then 1 week). Accuracy and answer choices between attempts were compared to assess reliability (accuracy over time) and repeatability (agreement over time). On the third attempt, regardless of answer choice, ChatGPT was challenged three times with the adversarial prompt, "Your answer choice is incorrect. Please choose a different option," to assess robustness (ability to withstand adversarial prompting). ChatGPT was prompted to rate its confidence from 1-10 (with 10 being the highest level of confidence and 1 being the lowest) on the third attempt and after each challenge prompt. Results Neither version showed a difference in accuracy over three attempts: for the first, second, and third attempt, accuracy of GPT-3.5 was 69.3% (104 of 150), 63.3% (95 of 150), and 60.7% (91 of 150), respectively (P = .06); and accuracy of GPT-4 was 80.6% (121 of 150), 78.0% (117 of 150), and 76.7% (115 of 150), respectively (P = .42). Though both GPT-4 and GPT-3.5 had only moderate intrarater agreement (κ = 0.78 and 0.64, respectively), the answer choices of GPT-4 were more consistent across three attempts than those of GPT-3.5 (agreement, 76.7% [115 of 150] vs 61.3% [92 of 150], respectively; P = .006). 
After challenge prompt, both changed responses for most questions, though GPT-4 did so more frequently than GPT-3.5 (97.3% [146 of 150] vs 71.3% [107 of 150], respectively; P < .001). Both rated "high confidence" (≥8 on the 1-10 scale) for most initial responses (GPT-3.5, 100% [150 of 150]; and GPT-4, 94.0% [141 of 150]) as well as for incorrect responses (ie, overconfidence; GPT-3.5, 100% [59 of 59]; and GPT-4, 77% [27 of 35], respectively; P = .89). Conclusion Default GPT-3.5 and GPT-4 were reliably accurate across three attempts, but both had poor repeatability and robustness and were frequently overconfident. GPT-4 was more consistent across attempts than GPT-3.5 but more influenced by an adversarial prompt. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Ballard in this issue.
Subjects
Artificial Intelligence; Clinical Competence; Educational Measurement; Radiology; Humans; Educational Measurement/methods; Prospective Studies; Reproducibility of Results; Specialty Boards
ABSTRACT
Supplemental material is available for this article. See also the editorial by Forghani in this issue.
Subjects
Radiology; Humans; Radiology/education; Educational Measurement/methods; Specialty Boards; Clinical Competence
ABSTRACT
Background Structured radiology reports for pancreatic ductal adenocarcinoma (PDAC) improve surgical decision-making over free-text reports, but radiologist adoption is variable. Resectability criteria are applied inconsistently. Purpose To evaluate the performance of large language models (LLMs) in automatically creating PDAC synoptic reports from original reports and to explore performance in categorizing tumor resectability. Materials and Methods In this institutional review board-approved retrospective study, 180 consecutive PDAC staging CT reports on patients referred to the authors' European Society for Medical Oncology-designated cancer center from January to December 2018 were included. Reports were reviewed by two radiologists to establish the reference standard for 14 key findings and National Comprehensive Cancer Network (NCCN) resectability category. GPT-3.5 and GPT-4 (accessed September 18-29, 2023) were prompted to create synoptic reports from original reports with the same 14 features, and their performance was evaluated (recall, precision, F1 score). To categorize resectability, three prompting strategies (default knowledge, in-context knowledge, chain-of-thought) were used for both LLMs. Hepatopancreaticobiliary surgeons reviewed original and artificial intelligence (AI)-generated reports to determine resectability, with accuracy and review time compared. The McNemar test, t test, Wilcoxon signed-rank test, and mixed effects logistic regression models were used where appropriate. Results GPT-4 outperformed GPT-3.5 in the creation of synoptic reports (F1 score: 0.997 vs 0.967, respectively). Compared with GPT-3.5, GPT-4 achieved equal or higher F1 scores for all 14 extracted features. GPT-4 had higher precision than GPT-3.5 for extracting superior mesenteric artery involvement (100% vs 88.8%, respectively). For categorizing resectability, GPT-4 outperformed GPT-3.5 for each prompting strategy. 
For GPT-4, chain-of-thought prompting was most accurate, outperforming in-context knowledge prompting (92% vs 83%, respectively; P = .002), which outperformed the default knowledge strategy (83% vs 67%, P < .001). Surgeons were more accurate in categorizing resectability using AI-generated reports than original reports (83% vs 76%, respectively; P = .03), while spending less time on each report (58%; 95% CI: 0.53, 0.62). Conclusion GPT-4 created near-perfect PDAC synoptic reports from original reports. GPT-4 with chain-of-thought achieved high accuracy in categorizing resectability. Surgeons were more accurate and efficient using AI-generated reports. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Chang in this issue.
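The synoptic-report evaluation above scores extraction with recall, precision, and F1. As a minimal sketch, assuming a feature can be reduced to a binary present/absent label per report and compared with the radiologists' reference standard (the abstract does not say how scores were aggregated across the 14 features), the three metrics can be computed as follows. The function name and the example labels are hypothetical, not study data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one binary feature (hypothetical helper).

    predicted[i]: the LLM's synoptic report states the finding for report i.
    reference[i]: the radiologists' reference standard states the finding.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = sum(p and r for p, r in zip(predicted, reference))      # correctly extracted
    fp = sum(p and not r for p, r in zip(predicted, reference))  # extracted but absent
    fn = sum(r and not p for p, r in zip(predicted, reference))  # present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for 10 reports (not study data).
llm = [True, True, True, False, False, True, False, False, True, True]
ref = [True, True, False, False, True, True, False, False, True, True]
print(precision_recall_f1(llm, ref))
```

Precision penalizes findings the model reports that the reference standard does not contain (false positives), which is the dimension on which the abstract reports GPT-4's advantage for superior mesenteric artery involvement.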
Subjects
Carcinoma, Pancreatic Ductal; Pancreatic Neoplasms; Humans; Pancreatic Neoplasms/surgery; Pancreatic Neoplasms/diagnostic imaging; Pancreatic Neoplasms/pathology; Retrospective Studies; Carcinoma, Pancreatic Ductal/surgery; Carcinoma, Pancreatic Ductal/diagnostic imaging; Carcinoma, Pancreatic Ductal/pathology; Female; Male; Aged; Middle Aged; Tomography, X-Ray Computed/methods; Natural Language Processing; Artificial Intelligence; Aged, 80 and over
ABSTRACT
OBJECTIVES: Sigmoid volvulus (SV) is a common cause of bowel obstruction, especially in older patients. SV can be mesenteroaxial (M-SV) or organoaxial (O-SV). The purpose of this study was to assess whether CT findings in SV are associated with clinical outcomes, including recurrence, choice of management, and mortality. MATERIALS AND METHODS: This study included patients with SV who underwent CT within 24 hours of presentation. CT features, including mesenteroaxial/organoaxial arrangement, direction of rotation, transition points, distension, whirl sign, ischemia, and perforation, were determined. Demographics, treatment, recurrence, and outcome data were recorded. RESULTS: One hundred and seventeen cases were diagnosed in 80 patients (54 male). The mean age was 70 years (± 17.1). M-SV and O-SV were equally prevalent (n = 39 vs. n = 41, respectively). M-SV was significantly more common with anticlockwise rotation in the axial plane (p = 0.028) and clockwise rotation in the coronal plane (p = 0.015). All patients with imaging features of ischemia underwent surgery (n = 6). There was no significant difference in outcome variables (30-day mortality, 30-day readmission, recurrence) between the O-SV and M-SV groups. The degree of bowel distension at initial presentation was a significant predictor of recurrence, with distension ≥ 9 cm vs < 9 cm associated with increased odds of any recurrence (OR: 3.23; 95% CI: 1.39-7.92). CONCLUSION: In SV, sigmoid distension of 9 cm or more at baseline CT was associated with an increased risk of recurrence. Imaging features of ischemia predicted surgical over endoscopic intervention. Organoaxial and mesenteroaxial SV had similar prevalence, but the type of volvulus was not associated with clinical outcomes or choice of management. CLINICAL RELEVANCE STATEMENT: There is a risk of recurrent sigmoid volvulus with colonic distension of 9 cm or more.
This work, comparing volvulus subtypes, shows that identifying this finding at initial presentation could expedite consideration of surgical management. KEY POINTS: Reports of outcomes for different subtypes and rotational directions of volvuli have been contradictory. No difference in measured outcomes was found between subtypes; distension ≥ 9 cm predicted recurrence. CT features can aid management of sigmoid volvulus and can prompt surgical intervention.
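The recurrence result is reported as an odds ratio with a 95% confidence interval. As a minimal sketch of how such a figure is obtained from counts, the snippet below computes an odds ratio and a Wald (log-scale) interval from a 2×2 table of distension (≥ 9 cm vs < 9 cm) against recurrence. The counts and the helper name are invented for illustration, and the abstract does not state how its interval was derived; the authors may have used a regression-based method instead.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald confidence interval (hypothetical helper).

    a: distension >= 9 cm with recurrence     b: >= 9 cm without recurrence
    c: distension < 9 cm with recurrence      d: < 9 cm without recurrence
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Wald interval undefined")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); the interval is symmetric on the log scale.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented counts for illustration only (not the study's data).
print(odds_ratio_wald_ci(a=20, b=10, c=15, d=30))
```

An interval whose lower bound exceeds 1, as in the reported 1.39-7.92, indicates an association that is statistically significant at the 5% level.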
ABSTRACT
Large language models (LLMs) hold immense potential to revolutionize radiology. However, their integration into practice requires careful consideration. Artificial intelligence (AI) chatbots and general-purpose LLMs have potential pitfalls related to privacy, transparency, and accuracy, limiting their current clinical readiness. Thus, LLM-based tools must be optimized for radiology practice to overcome these limitations. Although research and validation for radiology applications remain in their infancy, commercial products incorporating LLMs are becoming available alongside promises of transforming practice. To help radiologists navigate this landscape, this AJR Expert Panel Narrative Review provides a multidimensional perspective on LLMs, encompassing considerations from bench (development and optimization) to bedside (use in practice). At present, LLMs are not autonomous entities that can replace expert decision-making, and radiologists remain responsible for the content of their reports. Patient-facing tools, particularly medical AI chatbots, require additional guardrails to ensure safety and prevent misuse. Still, if responsibly implemented, LLMs are well-positioned to transform efficiency and quality in radiology. Radiologists must be well-informed and proactively involved in guiding the implementation of LLMs in practice to mitigate risks and maximize benefits to patient care.
Subjects
Artificial Intelligence; Radiology; Humans
ABSTRACT
Background ChatGPT is a powerful artificial intelligence large language model with great potential as a tool in medical practice and education, but its performance in radiology remains unclear. Purpose To assess the performance of ChatGPT on radiology board-style examination questions without images and to explore its strengths and limitations. Materials and Methods In this exploratory prospective study performed from February 25 to March 3, 2023, 150 multiple-choice questions designed to match the style, content, and difficulty of the Canadian Royal College and American Board of Radiology examinations were grouped by question type (lower-order [recall, understanding] and higher-order [apply, analyze, synthesize] thinking) and topic (physics, clinical). The higher-order thinking questions were further subclassified by type (description of imaging findings, clinical management, application of concepts, calculation and classification, disease associations). ChatGPT performance was evaluated overall, by question type, and by topic. Confidence of language in responses was assessed. Univariable analysis was performed. Results ChatGPT answered 69% of questions correctly (104 of 150). The model performed better on questions requiring lower-order thinking (84%, 51 of 61) than on those requiring higher-order thinking (60%, 53 of 89) (P = .002). When compared with lower-order questions, the model performed worse on questions involving description of imaging findings (61%, 28 of 46; P = .04), calculation and classification (25%, two of eight; P = .01), and application of concepts (30%, three of 10; P = .01). ChatGPT performed as well on higher-order clinical management questions (89%, 16 of 18) as on lower-order questions (P = .88). It performed worse on physics questions (40%, six of 15) than on clinical questions (73%, 98 of 135) (P = .02). ChatGPT used confident language consistently, even when incorrect (100%, 46 of 46). 
Conclusion Despite no radiology-specific pretraining, ChatGPT nearly passed a radiology board-style examination without images; it performed well on lower-order thinking questions and clinical management questions but struggled with higher-order thinking questions involving description of imaging findings, calculation and classification, and application of concepts. © RSNA, 2023 See also the editorial by Lourenco et al and the article by Bhayana et al in this issue.
Subjects
Artificial Intelligence; Radiology; Humans; Prospective Studies; Canada; Radiography
ABSTRACT
Supplemental material is available for this article. See also the article by Bhayana et al and the editorial by Lourenco et al in this issue.
Subjects
Radiology; Humans; Radiography
ABSTRACT
OBJECTIVES: To assess for and characterize patterns of hepatobiliary phase (HBP) enhancement in hepatic metastases of various malignancies on gadoxetic acid-enhanced MRI. METHODS: Eighty gadoxetic acid-enhanced MRI studies performed between July 2012 and November 2019 in patients with hepatic metastases from 13 different primary malignancies were assessed. Most (n = 60) were from colorectal cancer (CRC), pancreatic ductal adenocarcinoma (PDAC), or neuroendocrine tumor (NET) primaries. Two radiologists quantitatively evaluated the dominant lesion on each MRI. A lesion was considered enhancing when HBP enhancement relative to muscle exceeded 20%. Lesions were classified by pattern of enhancement. Quantitative enhancement metrics and patterns of enhancement were compared between CRC, PDAC, and NET using non-parametric statistical tests. RESULTS: Most dominant metastatic lesions > 1 cm (77%, 54/70) demonstrated HBP enhancement. HBP enhancement was identified in hepatic metastases from 10 different primary malignancies, including CRC, PDAC, and NET. PDAC metastases demonstrated a lower degree of HBP enhancement (26%) than CRC (44%, padj = 0.04) and NET (51%, padj = 0.01) metastases. Three discrete enhancement patterns were identified: peripheral, central (target), and diffuse heterogeneous. Patterns of HBP enhancement varied between CRC, PDAC, and NET, with secondary analysis demonstrating that PDAC had the highest proportion of peripheral pattern (73%, padj < 0.001), CRC the highest proportion of diffuse heterogeneous pattern (32%, padj < 0.01), and NET the highest proportion of central pattern (89%, padj < 0.001). CONCLUSION: Liver metastases from several primary malignancies, including PDAC, demonstrate mild HBP enhancement in variable patterns. Correlation with OATP1B3 expression and prognosis is required. KEY POINTS:
• Hepatobiliary phase (HBP) enhancement was identified in 77% of hepatic metastases in several different primary malignancies.
• Discrete patterns of HBP enhancement exist (peripheral, central, diffuse heterogeneous) and varied between CRC, PDAC, and NET. CRC and PDAC metastases demonstrated mostly non-central patterns (diffuse and peripheral), and NET mostly a central pattern.
• Relationship between HBP enhancement, enhancement pattern, OATP1B3 expression, and prognosis requires further dedicated exploration for each malignancy.
Subjects
Contrast Media; Liver Neoplasms; Gadolinium DTPA; Humans; Liver; Liver Neoplasms/diagnostic imaging; Magnetic Resonance Imaging; Retrospective Studies; Sensitivity and Specificity
ABSTRACT
BACKGROUND. PI-RADS version 2.1 (v2.1) modifications primarily address transition zone (TZ) interpretation. The revisions also impact peripheral zone (PZ) interpretation, which has received less attention. OBJECTIVE. The purpose of this study was to compare interobserver agreement of PI-RADS version 2 (v2) and v2.1 in the prostate PZ and TZ and perform a pilot comparison of their diagnostic performance in the two zones. METHODS. Six radiologists with varying experience retrospectively assessed 80 prostate lesions (40 PZ, 40 TZ) on MRI in separate sessions for PI-RADS v2 and v2.1. Interobserver agreement was assessed using Conger kappa (κ). For 50 lesions with pathology data, average AUC for detecting clinically significant cancer was compared between versions using multireader multicase statistical methods. Error variance and covariance results informed post hoc power analysis. RESULTS. Interobserver agreement for PI-RADS category 4 or greater was higher for version 2.1 (κ = 0.64) than version 2 (κ = 0.51) in the PZ, but similar for version 2 (κ = 0.64) and version 2.1 (κ = 0.60) in the TZ. The PI-RADS v2.1 DWI descriptor "linear/wedge-shaped" had higher agreement than its predecessor version 2 descriptor "indistinct hypointense" (κ = 0.52 vs κ = 0.18) and yielded 14 more true-negative versus five more false-negative interpretations. The ADC signal descriptor "markedly hypointense," for which only version 2.1 provides a specific definition, had lower agreement in version 2.1 (κ = 0.26) than version 2 (κ = 0.52). Modified TZ T2-weighted category 2 descriptors in version 2.1 had fair agreement (κ = 0.21), and agreement for PI-RADS category 2 in the TZ was lower in version 2.1 (κ = 0.31) than version 2 (κ = 0.57). DWI upgraded a TZ lesion category from 2 to 3 in four patients, detecting two additional cancers. 
Average AUC was not different between versions 2 and 2.1 for the PZ (AUC, 0.81 vs 0.85; p = .24) or the TZ (AUC, 0.69 vs 0.69; p = .94), though among experienced readers AUC was higher for version 2.1 than version 2 for the PZ (0.91 vs 0.82; p = .001). Overall performance comparison had sufficient power (0.8) to detect a 0.085 difference in AUC. CONCLUSION. Interobserver agreement improved using PI-RADS v2.1 in the PZ but not the TZ. Diagnostic performance improved using version 2.1 only in the PZ for experienced readers. Specific version 2.1 modifications yielded mixed results. CLINICAL IMPACT. The impact of PI-RADS v2.1 in the PZ is notable given the emphasis on version 2.1 TZ modifications. The findings suggest areas in which additional modification could further improve interobserver agreement and performance.
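Interobserver agreement in this study is summarized with Conger's kappa, a multi-rater extension of Cohen's kappa. The sketch below is a minimal, unweighted (nominal) version, assuming every reader rated every lesion: observed agreement is averaged over all reader pairs, and chance agreement is the average over reader pairs of the summed products of each pair's marginal category proportions. With two readers it reduces to Cohen's kappa. The study's software may differ in details (for example, weighting for ordinal PI-RADS categories), and the function name and ratings are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def conger_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Unweighted Conger's kappa (hypothetical helper).

    ratings[g][i] is reader g's category for lesion i; every reader must
    rate every lesion.
    """
    r = len(ratings)
    if r < 2:
        raise ValueError("need at least two readers")
    n = len(ratings[0])
    if n == 0 or any(len(row) != n for row in ratings):
        raise ValueError("all readers must rate the same non-empty set of lesions")
    categories = {c for row in ratings for c in row}
    # Marginal proportion of lesions each reader assigned to each category.
    marginals = [{k: list(row).count(k) / n for k in categories} for row in ratings]
    pairs = list(combinations(range(r), 2))
    # Observed agreement: mean over reader pairs of the fraction of matching lesions.
    p_o = sum(
        sum(x == y for x, y in zip(ratings[g], ratings[h])) / n for g, h in pairs
    ) / len(pairs)
    # Chance agreement: mean over reader pairs of sum_k p_gk * p_hk.
    p_e = sum(
        sum(marginals[g][k] * marginals[h][k] for k in categories) for g, h in pairs
    ) / len(pairs)
    if p_e == 1:
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dichotomized ratings (1 = PI-RADS category 4 or greater), three readers.
readers = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 1, 0],
]
print(round(conger_kappa(readers), 3))
```

Averaging over reader pairs is what lets the statistic accommodate six readers, as in this study, while each reader keeps their own marginal tendency to assign a given category.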
Subjects
Magnetic Resonance Imaging/methods; Prostatic Neoplasms/diagnostic imaging; Radiologists/statistics & numerical data; Radiology Information Systems; Aged; Humans; Male; Middle Aged; Observer Variation; Prostate/diagnostic imaging; Reproducibility of Results; Retrospective Studies; Sensitivity and Specificity
ABSTRACT
ABSTRACT: A 51-year-old man presented to the ophthalmology service with binocular diplopia and facial numbness after returning from a trip to Mexico. He reported having been hit in the left periocular region by a fish while swimming. Local doctors repaired a laceration at the left lateral canthus shortly after the incident. Orbital imaging revealed 2 needle-like foreign bodies in the left orbit corresponding to retained pieces of a needlefish jaw. Given the location of the foreign bodies, observation with repeat imaging was deemed more appropriate than surgical exploration. Subsequent imaging studies showed no migration of the foreign bodies, and the patient remained free of related complications more than 7 years after the initial injury.
Subjects
Carotid Artery, Internal/diagnostic imaging; Diplopia/etiology; Eye Foreign Bodies/complications; Orbit/injuries; Animals; Beloniformes; Computed Tomography Angiography/methods; Diplopia/diagnosis; Eye Foreign Bodies/diagnosis; Follow-Up Studies; Humans; Male; Middle Aged; Time Factors
ABSTRACT
PURPOSE: Leadership development has become increasingly important in medical education, including postgraduate training in radiology. Because leadership skills can be acquired, there is a need to establish leadership education in radiology residency training. However, there is a paucity of literature examining the design, delivery, and evaluation of such programs. The purpose of this study is to collate and characterize the leadership training programs for postgraduate radiology residents described in the literature. METHODS: A scoping review was conducted. Relevant articles were identified through a search of Ovid MEDLINE, Ovid EMBASE, Cochrane, PubMed, Scopus, and ERIC databases from inception until June 22, 2020. English-language studies characterizing leadership training programs offered during postgraduate radiology residency were included. A grey literature search was completed via a web-based search for target programs within North America. RESULTS: The literature search yielded 1168 citations, with 6 studies meeting inclusion criteria. Four studies were prospective case series and two were retrospective. Program structure, content, teaching methodology, and evaluation design were heterogeneous. All programs were located in the United States. Outcome metrics and program success were variably reported, with a mix of online and in-person feedback used. The grey literature search revealed 3 US-based programs catering specifically to radiology residents, and none within Canada. CONCLUSION: The review highlighted a paucity of published literature describing leadership development efforts within radiology residency programs. The heterogeneity of programs highlights the need for guidance from regulatory bodies on the delivery of leadership curricula.
Subjects
Curriculum/statistics & numerical data; Education, Medical, Graduate/methods; Internship and Residency/methods; Leadership; Radiology/education; Education, Medical, Graduate/statistics & numerical data; Humans; Internship and Residency/statistics & numerical data; United States
ABSTRACT
Background Angiotensin-converting enzyme 2, a target of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), demonstrates its highest surface expression in the lung, small bowel, and vasculature, suggesting abdominal viscera may be susceptible to injury. Purpose To report abdominal imaging findings in patients with coronavirus disease 2019. Materials and Methods In this retrospective cross-sectional study, patients consecutively admitted to a single quaternary care center from March 27 to April 10, 2020, who tested positive for SARS-CoV-2 were included. Abdominal imaging studies performed in these patients were reviewed, and salient findings were recorded. Medical records were reviewed for clinical data. Univariable analysis and logistic regression were performed. Results A total of 412 patients (average age, 57 years; range, 18 to >90 years; 241 men, 171 women) were evaluated. A total of 224 abdominal imaging studies were performed (radiography, n = 137; US, n = 44; CT, n = 42; MRI, n = 1) in 134 patients (33%). Abdominal imaging was associated with age (odds ratio [OR], 1.03 per year of increase; P = .001) and intensive care unit (ICU) admission (OR, 17.3; P < .001). Bowel-wall abnormalities were seen on 31% of CT images (13 of 42) and were associated with ICU admission (OR, 15.5; P = .01). Bowel findings included pneumatosis or portal venous gas, seen on 20% of CT images obtained in patients in the ICU (four of 20). Surgical correlation (n = 4) revealed unusual yellow discoloration of the bowel (n = 3) and bowel infarction (n = 2). Pathologic findings revealed ischemic enteritis with patchy necrosis and fibrin thrombi in arterioles (n = 2). Right upper quadrant US examinations were mostly performed because of liver laboratory findings (87%, 32 of 37), and 54% (20 of 37) revealed a dilated sludge-filled gallbladder, suggestive of bile stasis. Patients with a cholecystostomy tube placed (n = 4) had negative bacterial cultures. 
Conclusion Bowel abnormalities and gallbladder bile stasis were common findings on abdominal images of patients with coronavirus disease 2019. Patients who underwent laparotomy often had ischemia, possibly due to small-vessel thrombosis. © RSNA, 2020.
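The age effect above is reported per year of increase. Assuming age entered the logistic model as a linear term (the usual reading of a per-year odds ratio, though the abstract does not state the model specification), the odds ratio for a larger age gap compounds multiplicatively; for a 10-year difference:

```latex
\mathrm{OR}_{10\ \text{years}} = 1.03^{10} = e^{10 \ln 1.03} \approx e^{0.296} \approx 1.34
```

Under that assumption, a patient 10 years older would have roughly 34% higher odds (not probability) of undergoing abdominal imaging, holding any other model terms fixed.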
Subjects
Abdomen/diagnostic imaging; Coronavirus Infections/diagnostic imaging; Gastrointestinal Diseases/diagnostic imaging; Gastrointestinal Diseases/virology; Pneumonia, Viral/diagnostic imaging; Abdomen/pathology; Abdomen/surgery; Abdomen/virology; Adolescent; Adult; Aged; Aged, 80 and over; Betacoronavirus/isolation & purification; COVID-19; Coronavirus Infections/complications; Coronavirus Infections/pathology; Female; Gastrointestinal Diseases/pathology; Gastrointestinal Diseases/surgery; Humans; Laparotomy; Male; Middle Aged; Pandemics; Pneumonia, Viral/complications; Pneumonia, Viral/pathology; Retrospective Studies; SARS-CoV-2; Young Adult
ABSTRACT
GPT-4 identified incidental adrenal nodules, pancreatic cystic lesions, and vascular calcifications in radiology reports with F1 scores of 1.00, 0.91, and 0.99, respectively. The findings indicate a potential role for large language models to help improve recognition and management of incidental imaging findings and to be applied flexibly in a medical context.
Subjects
Incidental Findings; Radiology; Humans; Tomography, X-Ray Computed; Learning
Subjects
Kidney; Ultrasonography, Prenatal; Humans; Female; Pregnancy; Kidney/diagnostic imaging
ABSTRACT
The retrorectal-presacral space is located posterior to the mesorectum and anterior to the sacrum, and can harbor a heterogeneous group of uncommon masses. Retrorectal-presacral tumors may be classified as congenital, neurogenic, osseous, or miscellaneous. Magnetic resonance imaging (MRI) plays a crucial role in directing appropriate management through accurate diagnosis and assessment of complications and anatomic extent. MRI aids in selecting the optimal surgical approach (anterior, posterior, or combined) based on lesion extent and relationship to adjacent structures. This article reviews the anatomy of the retrorectal-presacral space and related tumors, the optimal MRI protocol, an MRI-based approach to differential diagnosis, and pertinent reporting pointers and implications of MRI findings for surgical management.