ABSTRACT
OBJECTIVE: To evaluate the performance of an artificial intelligence (AI) large language model, ChatGPT (version 4.0), for common retinal diseases, in accordance with the American Academy of Ophthalmology (AAO) Preferred Practice Pattern (PPP) guidelines. DESIGN: A cross-sectional survey study design was employed to compare the responses made by ChatGPT to established clinical guidelines. PARTICIPANTS: Responses by the AI were reviewed by a panel of three vitreoretinal specialists for evaluation. METHODS: To investigate ChatGPT's comprehension of clinical guidelines, we designed 130 questions covering a broad spectrum of topics within 12 AAO PPP domains of retinal disease. These questions were crafted to encompass diagnostic criteria, treatment guidelines, and management strategies, including both medical and surgical aspects of retinal care. A panel of 3 retinal specialists independently evaluated responses on a Likert scale from 1 to 5 based on their relevance, accuracy, and adherence to AAO PPP guidelines. Response readability was evaluated using Flesch Reading Ease and Flesch-Kincaid Grade Level scores. RESULTS: ChatGPT achieved an overall average score of 4.9/5.0, suggesting high alignment with the AAO PPP guidelines. Scores varied across domains, with the lowest in the surgical management of disease. The responses had a low reading ease score and required a college-to-graduate level of comprehension. Identified errors were related to diagnostic criteria, treatment options, and methodological procedures. CONCLUSION: ChatGPT 4.0 demonstrated significant potential in generating guideline-concordant responses, particularly for common medical retinal diseases. However, its performance slightly decreased in surgical retina, highlighting the ongoing need for clinician input, further model refinement, and improved comprehensibility.
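The two readability measures named in the abstract above follow standard published formulas. The sketch below is illustrative only: it takes word, sentence, and syllable counts as inputs, since the abstract does not say which tool produced the counts, and the example counts are hypothetical rather than study data.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a single response (not data from the study).
    words, sentences, syllables = 100, 5, 150
    print("Reading ease:", flesch_reading_ease(words, sentences, syllables))
    print("Grade level:", flesch_kincaid_grade(words, sentences, syllables))
```

Longer sentences and more syllables per word lower the reading ease score and raise the grade level, which is consistent with the college-to-graduate reading level reported.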
ABSTRACT
Importance: Workforce diversity is integral to optimal function within health care teams. Objective: To analyze gender, race, and ethnicity trends in rank and leadership among US full-time academic ophthalmology faculty and department chairs between 1966 and 2021. Design, Setting, and Participants: This cohort study included full-time US academic ophthalmology faculty and department chairs registered in the Association of American Medical Colleges. Study data were analyzed in September 2023. Exposure: Identifying with an underrepresented in medicine (URiM) group. Main Outcomes and Measures: The main outcome measures were demographic (ie, gender, race, and ethnicity) changes among academic faculty and department chairs, assessed in 5-year intervals. The term minoritized race refers to any racial group other than White race. Results: There were 221 academic physicians in 1966 (27 women [12.2%]; 38 minoritized race [17.2%]; 8 Hispanic, Latino, or Spanish ethnicity [3.6%]) and 3158 academic faculty by 2021 (1320 women [41.8%]; 1298 minoritized race [41.1%]; 147 Hispanic, Latino, or Spanish ethnicity [4.7%]). The annual proportional change for women, minoritized race, and Hispanic, Latino, or Spanish ethnicity was +0.63% per year (95% CI, 0.53%-0.72%), +0.54% per year (95% CI, 0.36%-0.72%), and -0.01% per year (95% CI, -0.03% to 0%), respectively. Women were underrepresented across academic ranks and increasingly so at higher echelons, ranging from nonprofessor/instructor roles (period-averaged mean difference [PA-MD], 19.88%; 95% CI, 16.82%-22.94%) to professor (PA-MD, 81.33%; 95% CI, 78.80%-83.86%). The number of department chairs grew from 77 in 1977 (0 women; 7 minoritized race [9.09%]; 2 Hispanic, Latino, or Spanish ethnicity [2.60%]) to 104 by 2021 (17 women [16.35%]; 22 minoritized race [21.15%]; 4 Hispanic, Latino, or Spanish ethnicity [3.85%]).
For department chairs, the annual rate of change in the proportion of women, minoritized race, and Hispanic, Latino, or Spanish ethnicity was +0.32% per year (95% CI, 0.20%-0.44%), +0.34% per year (95% CI, 0.19%-0.49%), and +0.05% per year (95% CI, 0.02%-0.08%), respectively. In both faculty and department chairs, the proportion of URiM groups (American Indian or Alaska Native, Black or African American, Hispanic, and Native Hawaiian or Other Pacific Islander) grew the least. Intersectionality analysis suggested that men and non-URiM status were associated with greater representation across ophthalmology faculty and department chairs. However, among ophthalmology faculty, URiM women and men did not significantly differ across strata of academic ranks, whereas for department chairs, no difference was observed in representation between URiM men and non-URiM women. Conclusion & Relevance: Results of this cohort study revealed that since 1966, workforce diversity progressed slowly and was limited to lower academic ranks and leadership positions. Intersectionality of URiM status and gender persisted in representation trends. These findings suggest further advocacy and intervention are needed to increase workforce diversity.
Subjects
Ethnicity , Medical Faculty , Leadership , Ophthalmology , Humans , Female , United States , Male , Medical Faculty/statistics & numerical data , Ethnicity/statistics & numerical data , Racial Groups/statistics & numerical data , Academic Medical Centers , Cultural Diversity , Sex Distribution , Women Physicians/statistics & numerical data , Retrospective Studies
ABSTRACT
PURPOSE: This cross-sectional study evaluated the prevalence of inclusive author submission guidelines across ophthalmology journals. METHODS: Journals were identified from the 2021 Journal Citation Reports (Clarivate Analytics). Independent reviewers rated each author submission guideline as "inclusive" if it satisfied at least one of six criteria: i) included examples of gender-inclusive language; ii) recommended the use of gender-inclusive language; iii) distinguished between sex and gender; iv) provided educational resources on gender-inclusive language; v) provided a policy permitting name changes (e.g., in case of gender and name transition); and/or vi) provided a statement of commitment to inclusivity. The primary objective was to investigate the proportion of journals with "gender-inclusive" author submission guidelines and the elements of the gender-inclusive content within these guidelines. A secondary objective was to review the association of "gender-inclusivity" in author submission guidelines with publisher, origin country, and journal/source/influence metrics (Clarivate Analytics). RESULTS: Across 94 journals, 29.8% of journals were rated as inclusive. Inclusive journals had significantly higher relative impact factor, citations, and article influence scores compared to non-inclusive journals. Among the inclusive journals, the three most common domains were inclusion of an inclusivity statement (71.4% of inclusive journals), distinguishing between sex and gender (67.9%), and provision of additional educational resources on gender reporting for authors (60.7%). CONCLUSION: A minority of ophthalmology journals have gender-inclusive author submission guidelines. Ophthalmology journals should update their submission guidelines to advance gender equity of both authors and study participants and promote the inclusion of gender-diverse communities.
Subjects
Local Anesthetics , Blepharoplasty , Pain Measurement , Postoperative Pain , Humans , Blepharoplasty/methods , Local Anesthetics/administration & dosage , Postoperative Pain/drug therapy , Postoperative Pain/diagnosis , Local Anesthesia/methods , Eyelids/surgery , Randomized Controlled Trials as Topic , Eye Pain/diagnosis , Eye Pain/etiology
ABSTRACT
Integrating large language models (LLMs) like GPT-4 into medical ethics is a novel concept, and understanding the effectiveness of these models in aiding ethicists with decision-making can have significant implications for the healthcare sector. Thus, the objective of this study was to evaluate the performance of GPT-4 in responding to complex medical ethical vignettes and to gauge its utility and limitations for aiding medical ethicists. Using a mixed-methods, cross-sectional survey approach, a panel of six ethicists assessed LLM-generated responses to eight ethical vignettes. The main outcomes measured were relevance, reasoning, depth, technical and non-technical clarity, as well as acceptability of GPT-4's responses. The readability of the responses was also assessed. Across the six metrics evaluating the effectiveness of GPT-4's responses, the overall mean score was 4.1/5. GPT-4 was rated highest in providing technical (4.7/5) and non-technical clarity (4.4/5), whereas the lowest rated metrics were depth (3.8/5) and acceptability (3.8/5). There was poor-to-moderate inter-rater reliability, characterised by an intraclass correlation coefficient of 0.54 (95% CI: 0.30 to 0.71). Based on panellist feedback, GPT-4 was able to identify and articulate key ethical issues but struggled to appreciate the nuanced aspects of ethical dilemmas and misapplied certain moral principles. This study reveals limitations in the ability of GPT-4 to appreciate the depth and nuanced acceptability of real-world ethical dilemmas, particularly those that require a thorough understanding of relational complexities and context-specific values. Ongoing evaluation of LLM capabilities within medical ethics remains paramount, and further refinement is needed before it can be used effectively in clinical settings.
Subjects
Ethicists , Medical Ethics , Humans , Cross-Sectional Studies , Reproducibility of Results , Problem Solving
ABSTRACT
PURPOSE: To assess the accuracy and readability of responses generated by the artificial intelligence model, ChatGPT (version 4.0), to questions related to 10 essential domains of orbital and oculofacial disease. METHODS: A set of 100 questions related to the diagnosis, treatment, and interpretation of orbital and oculofacial diseases was posed to ChatGPT 4.0. Responses were evaluated by a panel of 7 experts based on appropriateness and accuracy, with performance scores measured on a 7-item Likert scale. Inter-rater reliability was determined via the intraclass correlation coefficient. RESULTS: The artificial intelligence model demonstrated accurate and consistent performance across all 10 domains of orbital and oculofacial disease, with an average appropriateness score of 5.3/6.0 ("mostly appropriate" to "completely appropriate"). Domains of cavernous sinus fistula, retrobulbar hemorrhage, and blepharospasm had the highest domain scores (average scores of 5.5 to 5.6), while the proptosis domain had the lowest (average score of 5.0/6.0). The intraclass correlation coefficient was 0.64 (95% CI: 0.52 to 0.74), reflecting moderate inter-rater reliability. The responses exhibited a high reading-level complexity, representing the comprehension levels of a college or graduate education. CONCLUSIONS: This study demonstrates the potential of ChatGPT 4.0 to provide accurate information in the field of ophthalmology, specifically orbital and oculofacial disease. However, challenges remain in ensuring accurate and comprehensive responses across all disease domains. Future improvements should focus on refining the model's correctness and eventually expanding the scope to visual data interpretation. Our results highlight the vast potential for artificial intelligence in educational and clinical ophthalmology contexts.
Subjects
Blepharospasm , Cavernous Sinus , Humans , Artificial Intelligence , Comprehension , Reproducibility of Results
ABSTRACT
PURPOSE: The Isabel differential diagnosis generator is one of the most widely known electronic diagnosis decision support tools. The authors prospectively evaluated the utility of Isabel for orbital disease differential diagnosis. METHODS: The terms "proptosis," "lid retraction," "orbit inflammation," "orbit tumour," "orbit tumor, infiltrative" and "orbital tumor, well-circumscribed" were separately input into Isabel and the results were tabulated. Then the clinical details (patient age, gender, signs, symptoms, and imaging findings) of 25 orbital cases from a textbook of orbital surgery were entered into Isabel. The top 10 differential diagnoses generated by Isabel were compared with the correct diagnosis. RESULTS: Isabel identified hyperthyroidism and Graves ophthalmopathy as the leading causes of lid retraction, but many common causes of proptosis and orbital tumors were not correctly elucidated. Of the textbook cases, Isabel listed the correct diagnosis among its top 10 differential diagnoses in 4/25 (16%), and the median rank of the correct diagnosis was 6/10. Thirty-two percent of the output diagnoses were unlikely to cause orbital disease. CONCLUSION: Isabel is currently of limited value in mainstream orbital differential diagnosis. The incorporation of anatomic localizations and imaging findings may help increase the accuracy of orbital diagnosis.
Subjects
Exophthalmos , Eyelid Diseases , Graves Ophthalmopathy , Orbital Diseases , Orbital Neoplasms , Humans , Differential Diagnosis , Graves Ophthalmopathy/diagnosis , Orbit/diagnostic imaging , Orbit/surgery , Exophthalmos/etiology , Orbital Diseases/diagnosis , Orbital Diseases/complications , Orbital Neoplasms/diagnosis , Orbital Neoplasms/complications , Eyelid Diseases/diagnosis
ABSTRACT
PURPOSE: To describe a transconjunctival technique for full-thickness (excisional) optic nerve biopsy. METHOD: A medial transconjunctival approach to the optic nerve with disinsertion of the medial rectus is used. A small right-angle Mixter forceps is used to clamp the optic nerve far posteriorly, and then a microscalpel is directed metal-on-metal to cut the posterior optic nerve. The cut nerve is then rotated anteriorly to complete the proximal nerve cut. RESULT: A full-thickness specimen of 11 mm or more can be obtained without undue traction on the globe. The globe remains viable. CONCLUSION: A long, excisional optic nerve biopsy can be readily and safely performed without endoscopic techniques.
Subjects
Oculomotor Muscles , Optic Nerve , Humans , Optic Nerve/surgery , Biopsy , Endoscopy/methods , Neurosurgical Procedures
ABSTRACT
Equity, diversity and inclusion (EDI) are increasingly important directives in medicine that add further complexity to adjudications. The analytic hierarchy process is proposed as a tool for multicriteria decision-making that can facilitate the incorporation of EDI directives, especially for collective, group determinations.
Subjects
Analytic Hierarchy Process , Cultural Diversity , Health Equity , Humans
ABSTRACT
PURPOSE: The editorship of medical journals is a leadership role that can affect recognition and career advancement. We determine the gender representation of the editorial boards of oculoplastic surgery journals in comparison to the proportion of women in oculoplastics societies. METHODS: The gender composition of the American, European and Asia-Pacific societies of oculoplastic and reconstructive surgery and the editorial boards of their respective society journals were determined with online searches in March 2021. Statistical tests for the equality of proportions were performed. RESULTS: Excluding 44 individuals with missing gender data, the three combined oculoplastics societies comprised 1,230 distinct members, with 29% women. The editorial review boards of the three official society publications comprised 59 medical editors, 22% of whom were women. There was no statistically significant difference in the proportion of women editors versus women OPRS members (p = .201), but the study was underpowered to detect a 7% difference. A sensitivity analysis with the missing data did not alter the conclusions. The mean h-index/m-quotient of the women editors was 20.50/0.87 and for the men 21.05/0.84, with no statistically significant difference (p = .903/.851). CONCLUSION: Women are underrepresented on the editorial boards of oculoplastic journals. Possible methods to improve gender balance include objective multicriteria decision-making for editor nominations, mentoring peer reviewers who are women, and appointing a journal editor for equity, diversity and inclusion.
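The abstract above reports a test for the equality of proportions without naming the procedure. One common choice is the pooled two-proportion z-test, sketched below with the standard library only. The counts 13/59 and 357/1,230 are assumptions reconstructed from the rounded percentages (22% and 29%), so the result need not reproduce the reported p = .201.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sided z-test for the equality of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


if __name__ == "__main__":
    # Assumed counts: about 22% of 59 editors and about 29% of 1,230 members.
    z, p = two_proportion_z_test(13, 59, 357, 1230)
    print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

With only 59 editors, the standard error is dominated by the smaller group, which illustrates why a 7-percentage-point gap may fail to reach statistical significance.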
Subjects
Women Physicians , Asia , Female , Humans , Male , United States
ABSTRACT
BACKGROUND: Authorship is a pinnacle activity in academic medicine that often involves collaboration and a mentor-mentee relationship. The International Committee of Medical Journal Editors criteria for authorship (ICMJEc) are intended to prevent abuses of authorship and are used by more than 5500 medical journals. However, the binary ICMJEc have not yet been quantified. AIM: To develop a numeric scoring rubric for the ICMJEc to corroborate the authenticity of authorship claims. METHODS: The four ICMJEc were separated into the nine authorship components of conception, design, data acquisition, data analysis, interpretation of data, draft, revision, final approval and accountability. In spring 2021, members of an international association of medical editors rated the importance of each authorship component using an 11-point Likert scale ranging from 0 (no importance) to 10 (most important). The median component scores were used to calibrate the pairwise comparisons in an analytic hierarchy process (AHP). The AHP priority weights were multiplied against a four-level perceived effort/capability grade to calculate an authorship score. RESULTS: Sixty-six decision-making medical editors completed the survey. The components had the median scores/AHP weights: conception 7.5/5.3%; design 8/8.9%; data acquisition 7/3.6%; data analysis 7/3.6%; interpretation of data 8/8.9%; draft 8/8.9%; revision 8/8.9%; final approval 9/20.1%; and accountability 10/31.8%, with Kruskal-Wallis χ² = 65.11, p < 0.001. CONCLUSION: The editors rated accountability as the most important component of authorship, followed by the final approval of the manuscript; data acquisition had the lowest median importance score for authorship.
The scoring rubric (https://tinyurl.com/eyu86y96) transforms the binary tetrad ICMJEc into 9 quantifiable components of authorship, providing a transparent method to objectively assess authorship contributions, determine authorship order and potentially decrease the abuse of authorship. If desired, individual journals can survey their editorial boards and use the AHP method to derive customized weightings for an ICMJEc-based authorship index.
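The scoring step described in this abstract (AHP priority weights multiplied by a four-level effort/capability grade) can be sketched as below. The weights are those reported in the abstract. The 0-to-3 grade coding, the normalization to a 0-100 scale, and the function name `authorship_score` are assumptions made for illustration, and the published rubric may define them differently.

```python
# AHP priority weights (percent) as reported in the abstract.
AHP_WEIGHTS = {
    "conception": 5.3,
    "design": 8.9,
    "data acquisition": 3.6,
    "data analysis": 3.6,
    "interpretation of data": 8.9,
    "draft": 8.9,
    "revision": 8.9,
    "final approval": 20.1,
    "accountability": 31.8,
}

MAX_GRADE = 3  # assumed coding of the four-level grade: 0, 1, 2, 3


def authorship_score(grades: dict[str, int]) -> float:
    """Weighted sum of grades, scaled so that full marks give about 100."""
    total = 0.0
    for component, weight in AHP_WEIGHTS.items():
        grade = grades.get(component, 0)  # ungraded components count as 0
        if not 0 <= grade <= MAX_GRADE:
            raise ValueError(f"grade for {component!r} must be 0..{MAX_GRADE}")
        total += weight * grade / MAX_GRADE
    return total


if __name__ == "__main__":
    # Hypothetical contributor who only approved the final draft and accepted accountability.
    example = {"final approval": 3, "accountability": 3}
    print(f"Authorship score: {authorship_score(example):.1f}")
```

Because accountability and final approval together carry just over half of the total weight, a contributor graded highly on those two components alone would already score above 50 under this assumed scaling.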