ABSTRACT
BACKGROUND: The release of ChatGPT (OpenAI) in November 2022 drastically reduced the barrier to using artificial intelligence by allowing a simple web-based text interface to a large language model (LLM). One use case where ChatGPT could be useful is in triaging patients at the site of a disaster using the Simple Triage and Rapid Treatment (START) protocol. However, LLMs experience several common errors including hallucinations (also called confabulations) and prompt dependency. OBJECTIVE: This study addresses the research problem: "Can ChatGPT adequately triage simulated disaster patients using the START protocol?" by measuring three outcomes: repeatability, reproducibility, and accuracy. METHODS: Nine prompts were developed by 5 disaster medicine physicians. A Python script queried ChatGPT Version 4 for each prompt combined with 391 validated simulated patient vignettes. Ten repetitions of each combination were performed for a total of 35,190 simulated triages. A reference standard START triage code for each simulated case was assigned by 2 disaster medicine specialists (JMF and MV), with a third specialist (LC) added if the first two did not agree. Results were evaluated using a gage repeatability and reproducibility study (gage R and R). Repeatability was defined as variation due to repeated use of the same prompt. Reproducibility was defined as variation due to the use of different prompts on the same patient vignette. Accuracy was defined as agreement with the reference standard. RESULTS: Although 35,102 (99.7%) queries returned a valid START score, there was considerable variability. Repeatability (use of the same prompt repeatedly) was 14% of the overall variation. Reproducibility (use of different prompts) was 4.1% of the overall variation. The accuracy of ChatGPT for START was 63.9% with a 32.9% overtriage rate and a 3.1% undertriage rate. Accuracy varied by prompt with a maximum of 71.8% and a minimum of 46.7%. 
CONCLUSIONS: This study indicates that ChatGPT version 4 is insufficient to triage simulated disaster patients via the START protocol. It demonstrated suboptimal repeatability and reproducibility, and the overall accuracy of triage was only 63.9%. Health care professionals are advised to exercise caution when using commercial LLMs for vital medical determinations, given that these tools may commonly produce inaccurate data, colloquially referred to as hallucinations or confabulations. Artificial intelligence-guided tools should undergo rigorous statistical evaluation, using methods such as gage R and R, before implementation in clinical settings.
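The accuracy, overtriage, and undertriage figures reported above can be illustrated with a minimal sketch. It assumes each triage result is compared with the reference standard on an ordinal urgency scale; the `rank` mapping, the function name, and the sample labels are hypothetical and are not taken from the study's Python script.

```python
from collections import Counter

# Query count stated in the abstract: 9 prompts x 391 vignettes x 10 repetitions.
TOTAL_QUERIES = 9 * 391 * 10  # 35,190 simulated triages


def triage_summary(assigned, reference, rank):
    """Return the fractions of accurate, overtriaged, and undertriaged cases.

    `rank` maps each triage category to an urgency level (higher = more urgent).
    The ordering is an assumption supplied by the caller, not part of the abstract.
    """
    if len(assigned) != len(reference) or not assigned:
        raise ValueError("inputs must be non-empty and of equal length")
    counts = Counter()
    for given, truth in zip(assigned, reference):
        if rank[given] == rank[truth]:
            counts["accurate"] += 1
        elif rank[given] > rank[truth]:
            counts["overtriage"] += 1   # assigned a more urgent category than the reference
        else:
            counts["undertriage"] += 1  # assigned a less urgent category than the reference
    n = len(assigned)
    return {key: counts[key] / n for key in ("accurate", "overtriage", "undertriage")}


# Hypothetical example data, for illustration only.
example_rank = {"green": 0, "yellow": 1, "red": 2}
summary = triage_summary(
    ["red", "red", "green", "yellow"],
    ["red", "yellow", "yellow", "yellow"],
    example_rank,
)
```

The three fractions sum to 1 for valid responses, which matches the way the reported rates (63.9%, 32.9%, 3.1%) partition the evaluated triages up to rounding.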
Subjects
Triage, Triage/methods, Humans, Reproducibility of Results, Patient Simulation, Disaster Medicine/methods, Disasters
ABSTRACT
OBJECTIVE: A systematic literature review (SLR) was performed to elucidate the current triage and treatment of an entrapped or mangled extremity in resource-scarce environments (RSEs). METHODS: A lead researcher carried out the search strategy, applying the inclusion and exclusion criteria. Sources were randomly assigned to a first reviewer (FR). One of the 2 lead researchers acted as the second reviewer (SR). Each reviewer determined the level of evidence (LOE) and quality of evidence (QE) of each source. Any differing opinions between the FR and SR were discussed between them, and if differences remained, a third reviewer (the other lead researcher) joined the discussion of the article until a consensus was reached. The final opinion on each article was entered for analysis. RESULTS: Fifty-eight (58) articles were entered into the final study. One study was determined to be LOE 1, 29 LOE 2, and 28 LOE 3, with 15 determined to achieve QE 1, 37 QE 2, and 6 QE 3. CONCLUSION: This SLR showed that there is a lack of studies producing strong evidence to support the triage and treatment of the mangled extremity in RSEs. Therefore, a Delphi process is suggested to adapt and modify current civilian and military triage and treatment guidelines to the RSE.
Subjects
Mass Casualty Incidents, Military Personnel, Consensus, Extremities, Humans, Triage
ABSTRACT
INTRODUCTION: Emerging evidence is guiding changes in prehospital management of potential spinal injuries. Most current recommendations relate to resource-rich environments (RREs), whereas there is a lack of guidance on the provision of spinal motion restriction (SMR) in resource-scarce environments (RSEs), such as mass-casualty incidents (MCIs), low- and middle-income countries, complex humanitarian emergencies, conflict zones, and prolonged transport times. The application of Translational Science (TS) in the Disaster Medicine (DM) context was used to develop this study, leading to statements that can be used in the creation of evidence-based clinical guidelines (CGs). OBJECTIVE: What is appropriate SMR in RSEs? METHODS: The first round of this modified Delphi (mD) study was a structured focus group conducted at the World Association for Disaster and Emergency Medicine (WADEM) Congress in Brisbane, Australia, on May 9, 2019. The focus group discussion of open-ended questions produced ten statements that were added to ten statements derived from Fischer (2018) to create the second mD round questionnaire. Academic researchers and educators, operational first responders, and first receivers of patients with suspected spinal injuries were identified as mD experts. Experts rated their agreement with each statement on a seven-point linear numeric scale. Consensus amongst experts was defined as a standard deviation ≤1.0. Statements on which experts agreed and reached consensus were included in the final report; those on which experts reached consensus but did not agree were removed from further consideration. Those not reaching consensus advanced to the third mD round. For subsequent rounds, experts were shown the mean response and their own response for each of the remaining statements and asked to reconsider their rating. As above, those that did not reach consensus advanced to the next round until consensus was reached for each statement.
RESULTS: Twenty-two experts agreed to participate with 19 completing the second mD round and 16 completing the third mD round. Eleven statements reached consensus. Nine statements did not reach consensus. CONCLUSIONS: Experts reached consensus offering 11 statements to be incorporated into the creation of SMR CGs in RSEs. The nine statements that did not reach consensus can be further studied and potentially modified to determine if these can be considered in SMR CGs in RSEs.
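The consensus rule described in the methods can be sketched as a small classifier. The SD threshold of 1.0 comes from the abstract; the agreement cut-off (`AGREEMENT_MEAN`), the use of the sample standard deviation, and the function name are assumptions made for illustration, since the abstract does not state how "in agreement" was defined.

```python
from statistics import mean, stdev

CONSENSUS_SD = 1.0    # from the abstract: consensus means SD <= 1.0
AGREEMENT_MEAN = 5.0  # hypothetical cut-off on the seven-point scale (not in the abstract)


def classify_statement(ratings):
    """Classify one statement from a list of expert ratings (1 to 7).

    Returns "advance" (no consensus, goes to the next round),
    "include" (consensus and agreement), or "remove" (consensus without agreement).
    """
    # Sample SD is assumed here; the abstract does not say which SD was used.
    if stdev(ratings) > CONSENSUS_SD:
        return "advance"
    return "include" if mean(ratings) >= AGREEMENT_MEAN else "remove"


# Hypothetical ratings for three statements.
included = classify_statement([6, 7, 6, 7, 6])
removed = classify_statement([2, 1, 2, 2, 1])
advanced = classify_statement([1, 7, 4, 2, 6])
```

Under this reading, a statement leaves the process as soon as the ratings are tightly clustered, whether the cluster sits at the agreeing or the disagreeing end of the scale.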
Subjects
Emergency Medical Services, Immobilization/standards, Spinal Injuries/therapy, Translational Biomedical Research, Delphi Technique, Focus Groups, Humans
ABSTRACT
BACKGROUND: The fatality rate of coronavirus disease (COVID-19) in Italy is controversial and is greatly affecting discussion on the impact of containment measures that are straining the world's social and economic fabric, such as instigating large-scale isolation and quarantine, closing borders, imposing limits on public gatherings, and implementing nationwide lockdowns. OBJECTIVE: The scientific community, citizens, politicians, and mass media are expressing concerns regarding data suggesting that the number of COVID-19-related deaths in Italy is significantly higher than in the rest of the world. Moreover, Italian citizens have misleading perceptions related to the number of swab tests that have actually been performed. Citizens and mass media are denouncing the coverage of COVID-19 swab testing in Italy, claiming that it is not in line with that in other countries worldwide. METHODS: In this paper, we attempt to clarify the aspects of COVID-19 fatalities and testing in Italy by performing a set of statistical analyses that highlight the actual numbers in Italy and compare them with official worldwide data. RESULTS: The analysis clearly shows that the Italian COVID-19 fatality and mortality rates are in line with the official world scenario, as are the numbers of COVID-19 tests performed in Italy and in the Lombardy region. CONCLUSIONS: This up-to-date analysis may elucidate the evolution of the COVID-19 pandemic in Italy.
Subjects
Clinical Laboratory Techniques/statistics & numerical data, Coronavirus Infections/diagnosis, Coronavirus Infections/mortality, Viral Pneumonia/diagnosis, Viral Pneumonia/mortality, Betacoronavirus/isolation & purification, COVID-19, COVID-19 Testing, Coronavirus Infections/psychology, Coronavirus Infections/therapy, Data Analysis, Humans, Italy/epidemiology, Mortality, Pandemics, Perception, Viral Pneumonia/psychology, Viral Pneumonia/therapy, SARS-CoV-2
ABSTRACT
Objective measurement of simulation performance requires a validated and reliable tool. However, no published Italian language assessment tool is available. Translation of a published English language tool, the Ottawa Crisis Resource Management Global Rating Scale (GRS), may lead to a validated and reliable tool. After developing an Italian language translation of the English language tool, the study measured the reliability of the new tool by comparison with the English language tool used independently in the same simulation scenarios. In addition, the validity of the Italian language tool was measured by comparison with a skills score that was also applied independently. The correlation coefficient between the Italian language overall GRS and the English language overall GRS was 0.82 (adjusted 95% confidence interval: 0.62-0.92). The correlation coefficient between the Italian language overall GRS and the skills score was 0.85 (adjusted 95% confidence interval: 0.68-0.94). This study demonstrated that the Italian language GRS has acceptable reliability when compared with the English language tool, suggesting that it can be used reliably to evaluate performance during simulated emergencies. The study also suggests that the tool has acceptable validity for assessing simulation performance when compared with the skills score. Data suggest that the instrument is adequately reliable for informal and formative examinations, but may require further confirmation before use in high-stakes examinations such as licensing.
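As an illustration of how a correlation coefficient and an interval around it can be computed, the sketch below uses Pearson's r with the standard Fisher z-transformation. This is an assumption: the abstract does not specify which correlation coefficient was used or how the "adjusted" interval was obtained, and the sample size in the example is hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale (requires n > 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical paired scores and sample size, for illustration only.
r_example = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
lower, upper = fisher_ci(0.82, n=30)
```

The interval is asymmetric around r because the transformation is applied on the z scale and then mapped back, which is why reported intervals for high correlations extend further below r than above it.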
Subjects
Educational Measurement/methods, Simulation Training/standards, Work Performance/standards, Adult, Clinical Competence/standards, Continuing Medical Education/methods, Female, Humans, Internship and Residency/methods, Italy, Male, Reproducibility of Results, Surveys and Questionnaires, Translating
ABSTRACT
INTRODUCTION: Surge capacity, or the ability to manage an extraordinary volume of patients, is fundamental for hospital management of mass-casualty incidents. However, quantification of surge capacity is difficult and no universal standard for its measurement has emerged, nor has a standardized statistical method been advocated. As mass-casualty incidents are rare, simulation may represent a viable alternative to measure surge capacity. HYPOTHESIS/PROBLEM: The objective of the current study was to develop a statistical method for the quantification of surge capacity using a combination of computer simulation and simple process-control statistical tools. Length-of-stay (LOS) and patient volume (PV) were used as metrics. The use of this method was then demonstrated on a subsequent computer simulation of an emergency department (ED) response to a mass-casualty incident. METHODS: In the derivation phase, 357 participants in five countries performed 62 computer simulations of an ED response to a mass-casualty incident. Benchmarks for ED response were derived from these simulations, including LOS and PV metrics for triage, bed assignment, physician assessment, and disposition. In the application phase, 13 students of the European Master in Disaster Medicine (EMDM) program completed the same simulation scenario, and the results were compared to the standards obtained in the derivation phase. RESULTS: Patient-volume metrics included number of patients to be triaged, assigned to rooms, assessed by a physician, and disposed. Length-of-stay metrics included median time to triage, room assignment, physician assessment, and disposition. Simple graphical methods were used to compare the application phase group to the derived benchmarks using process-control statistical tools. The group in the application phase failed to meet the indicated standard for LOS from admission to disposition decision.
CONCLUSIONS: This study demonstrates how simulation software can be used to derive values for objective benchmarks of ED surge capacity using PV and LOS metrics. These objective metrics can then be applied to other simulation groups using simple graphical process-control tools to provide a numeric measure of surge capacity. Repeated use in simulations of actual EDs may represent a potential means of objectively quantifying disaster management surge capacity. It is hoped that the described statistical method, which is simple and reusable, will be useful for investigators in this field to apply to their own research.
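The benchmarking idea can be sketched with generic process-control limits. The abstract does not state which chart type or limits were used, so the mean ± 3 SD rule, the function names, and the LOS values below are assumptions for illustration only.

```python
from statistics import mean, stdev


def control_limits(values, k=3.0):
    """Return (lower, center, upper) limits from derivation-phase values.

    Uses center +/- k sample standard deviations, a common process-control
    convention; the study's actual limits are not given in the abstract.
    """
    center = mean(values)
    spread = stdev(values)
    return center - k * spread, center, center + k * spread


def meets_benchmark(observed, limits):
    """True if an application-phase value falls within the derived limits."""
    lower, _, upper = limits
    return lower <= observed <= upper


# Hypothetical median LOS values (minutes) from derivation-phase simulations.
derivation_los = [40, 42, 38, 41, 39]
limits = control_limits(derivation_los)
within = meets_benchmark(43, limits)
outside = meets_benchmark(55, limits)
```

A group whose value falls outside the limits, as in the second call, would be flagged as not meeting the benchmark for that metric.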