ABSTRACT
Introduction Proper application of clinical reasoning skills is essential to reduce diagnostic and management errors. Explicit training and assessment of clinical reasoning skills is therefore a pressing need. The study aimed to measure the clinical reasoning skills of second-phase undergraduate students in a medical college in West Bengal, India, and their distribution across several individual variables. Methods The clinical reasoning skills of 105 undergraduate medical students were assessed in a cross-sectional exploratory study using key feature questions (KFQs) with a partial credit scoring system. Six case vignettes aligned to the core competencies in pharmacology, pathology, and microbiology were designed and validated by subject matter experts for this purpose. The responses of the participants were collected through Google Forms (Google, Mountain View, CA) after obtaining written informed consent. The scores obtained in all KFQs were added and expressed as a percentage of the maximum attainable score. Results The mean (±SD) clinical reasoning score of the participants was 42.5 (±12.6). Only 29.6% of respondents scored ≥ 50. Students with higher subjective economic status (p = 0.01) and perceived autonomy (p < 0.001) were more likely to have higher clinical reasoning scores. The marks obtained in previous summative examinations were significantly correlated with clinical reasoning scores. Conclusion The average score below 50.0, and the inability of more than two-thirds of the participants to score ≥ 50.0, reflect a deficit in the clinical reasoning skills of second-phase MBBS students. The association of clinical reasoning skills with economic status, autonomy, and previous academic performance needs further exploration.
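The scoring arithmetic described above (partial credits on each KFQ summed and expressed as a percentage of the maximum attainable score) can be sketched as follows. The function name and the example scores are hypothetical illustrations, not values from the study.

```python
# Hypothetical sketch of partial-credit KFQ scoring: each key feature earns
# a fraction of that question's credit, and the total is reported as a
# percentage of the maximum attainable score.

def kfq_percentage(earned, maximums):
    """earned[i] = partial credit on question i; maximums[i] = full credit."""
    if len(earned) != len(maximums):
        raise ValueError("mismatched score lists")
    return 100.0 * sum(earned) / sum(maximums)

# e.g. six case vignettes, each scored out of 5 points
score = kfq_percentage([2.5, 3, 1.5, 4, 2, 3], [5] * 6)  # 16 of 30 points
```

A student's percentage can then be compared against a pass mark such as the ≥ 50 threshold used in the study.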
ABSTRACT
PURPOSE: To evaluate the accuracy of large language models (LLMs) in answering ophthalmology board-style questions. DESIGN: Meta-analysis. METHODS: A literature search was conducted using PubMed and Embase in March 2024. We included full-length articles and research letters published in English that reported the accuracy of LLMs in answering ophthalmology board-style questions. Data on LLM performance, including the number of questions submitted and correct responses generated, were extracted for each question set from individual studies. Pooled accuracy was calculated using a random-effects model. Subgroup analyses were performed based on the LLMs used and the specific ophthalmology topics assessed. RESULTS: Among the 14 studies retrieved, 13 (93%) tested LLMs on multiple ophthalmology topics. ChatGPT-3.5, ChatGPT-4, Bard, and Bing Chat were assessed in 12 (86%), 11 (79%), 4 (29%), and 4 (29%) studies, respectively. The overall pooled accuracy of LLMs was 0.65 (95% CI: 0.61-0.69). Among the different LLMs, ChatGPT-4 achieved the highest pooled accuracy at 0.74 (95% CI: 0.73-0.79), while ChatGPT-3.5 recorded the lowest at 0.52 (95% CI: 0.51-0.54). LLMs performed best in "pathology" (0.78 [95% CI: 0.70-0.86]) and worst in "fundamentals and principles of ophthalmology" (0.52 [95% CI: 0.48-0.56]). CONCLUSIONS: The overall accuracy of LLMs in answering ophthalmology board-style questions was acceptable but not exceptional, with ChatGPT-4 and Bing Chat being the top-performing models. Performance varied significantly across the specific ophthalmology topics tested. These inconsistent performances are of concern, highlighting the need for future studies to include ophthalmology board-style questions with images to more comprehensively examine the competency of LLMs.
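Pooled accuracy under a random-effects model is conventionally computed with the DerSimonian-Laird estimator, which inflates each study's variance by an estimated between-study variance before weighting. The abstract does not specify the software or estimator used, so this is only an illustrative sketch with made-up per-study accuracies and sample sizes.

```python
import math

def dersimonian_laird(estimates, variances):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# hypothetical per-study accuracies p with binomial variances p(1-p)/n
studies = [(0.74, 200), (0.52, 150), (0.65, 120)]
ests = [p for p, _ in studies]
vars_ = [p * (1 - p) / n for p, n in studies]
pooled, ci = dersimonian_laird(ests, vars_)
```

The pooled estimate always lies between the smallest and largest study estimates, and the 95% interval widens as between-study heterogeneity (tau²) grows.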
Subjects
Ophthalmology ; Humans ; Language ; Educational Measurement/methods ; Education, Medical, Graduate/methods ; Specialty Boards
ABSTRACT
Extended Matching Questions (EMQs), or R-type questions, are a selected-response format. Validity evidence for this format is crucial, but misunderstandings about validity have been reported. It is unclear what kinds of evidence should be presented, and how, to support their educational impact. This review explores the pattern and quality of reporting of the sources of validity evidence for EMQs in health professions education, encompassing content, response process, internal structure, relationship to other variables, and consequences. A systematic search of electronic databases, including MEDLINE via PubMed, Scopus, Web of Science, CINAHL, and ERIC, was conducted to extract studies that utilize EMQs. The framework for a unitary concept of validity was applied to extract data. A total of 218 titles were initially selected; the final number of included titles was 19. The most frequently reported piece of evidence was the reliability coefficient, followed by the relationship to other variables. Additionally, the adopted definition of validity was mostly the old tripartite concept. This study found that the reporting and presentation of validity evidence appeared to be deficient. The available evidence can hardly provide a strong validity argument that supports the educational impact of EMQs. This review calls for more work on developing a tool to measure the reporting and presentation of validity evidence.
Subjects
Health Occupations ; Humans ; Health Occupations/education ; Reproducibility of Results ; Educational Measurement/methods ; Educational Measurement/standards
ABSTRACT
Practice questions are highly sought out as a study tool among medical students in undergraduate medical education. At the same time, it remains unknown how medical students use and incorporate practice questions and their rationales into their studies. To explore this heavily relied-upon study strategy, semi-structured interviews were conducted with second-year medical students to assess how they approach using practice questions. Qualitative thematic analysis revealed several recurrent themes: (1) medical students use practice questions for primary learning, (2) medical students place more importance on the rationale of a practice question than on selecting the right answer, and (3) medical students view practice questions as single-use resources. Together, these themes provide insight into how medical students use practice questions to study, which may guide medical educators in their creation of practice questions with appropriate rationales and provide foundational data for future mixed-methods analyses seeking to generalize these findings.
Subjects
Education, Medical, Undergraduate ; Qualitative Research ; Students, Medical ; Humans ; Students, Medical/psychology ; Female ; Interviews as Topic ; Male ; Educational Measurement
ABSTRACT
Active learning and peer instruction contribute to positive learning outcomes. We developed a 25-week, question-based program for first-year medical students (MS1). Senior students developed weekly question and answer sets. Second-year peer educators helped MS1s learn to collaboratively problem solve by working through the questions and answer explanations. Controlling for average MCAT 2015 score, attendance significantly correlated with improved academic performance in basic science coursework (beta = 0.196, p < .001) and one organ systems module (beta = 0.104, p = .033). Academic outcomes and an 83% sustained participation rate point to the benefits of peer-led, positive learning environments.
ABSTRACT
Introduction: Legislative assemblies often provide a platform for legislators to question the government during question hours, which are crucial for governance. However, question hours remain understudied, especially with respect to health policy and systems issues in lower- and middle-income countries. This study assesses the questions of the 14th Kerala Legislative Assembly, focusing on health-related areas to provide insights for health policy formulation and decision-making processes. Materials and Methods: We sourced and transcribed all starred questions (346) related to health that were answered by the health minister in the 14th Legislative Assembly between 2016 and 2021 from the archives of the assembly website. We conducted a thematic analysis of these questions and mapped them into various themes, guided by the World Health Organization building blocks framework. Results: About 7.8% of all questions (N = 4404) were related to health (N = 346). Of these, the majority were directly related to service delivery (43.4%), followed by health information (16.5%). Health financing, food safety, and human resources were the least discussed topics throughout the assessed period within the state. The legislators primarily focused on health services and health information, with less attention given to health financing, food safety, and human resources, regardless of constituency or political affiliation. Discussion: This study underscores the need for a balanced approach to public health issues, highlighting the importance of legislators prioritizing health services and information while also addressing health financing, food safety, and human resources. This would enable a robust and resilient public health system that can effectively address diverse health concerns.
ABSTRACT
We describe validation of a COVID-19 antibody test for detection of anti-SARS-CoV-2 receptor-binding domain (RBD) IgG antibodies in blood plasma utilizing ethically sourced reagents not derived from aborted fetal cell lines. The test demonstrated a specificity of 100% (95% confidence interval 77.2-100%) and a sensitivity of 100% (95% confidence interval 79.6-100%) when evaluating blood specimens previously determined to be negative (n = 13) or positive for anti-SARS-CoV-2 RBD IgG antibodies due to natural SARS-CoV-2 exposure (n = 13) or COVID-19 vaccination (n = 15). The test was used to screen 230 blood specimens from individuals with unknown SARS-CoV-2 exposure (n = 103) or who were naturally exposed to SARS-CoV-2 (n = 44), received a COVID-19 vaccine (n = 66), or received a COVID-19 vaccine before or after SARS-CoV-2 exposure (n = 17). Ninety-nine percent (95% confidence interval 95.7-100%) of the 127 blood specimens from individuals who were naturally exposed, vaccinated, or both were positive for anti-SARS-CoV-2 RBD IgG, which is consistent with the high sensitivity of our test. This COVID-19 antibody test, now named the PL COVID-19 RBD IgG antibody test, represents an effective and ethical alternative to commercially available COVID-19 antibody tests that utilize reagents derived from aborted fetal cell lines.
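Confidence intervals for proportions like the 100% specificity on 13 negative specimens are often computed with the Wilson score interval (the lower bound of which for 13/13 is about 77.2%, matching the value reported above, though the abstract does not name its method). A minimal sketch, assuming the Wilson interval:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return max(0.0, centre - half), min(1.0, centre + half)

# e.g. all 13 known-negative specimens correctly classified (specificity)
lo, hi = wilson_ci(13, 13)
```

Unlike the naive Wald interval, the Wilson interval remains informative at observed proportions of 0 or 1, which is exactly the situation with small perfect-classification samples.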
ABSTRACT
Medical schools are required to assess and evaluate their curricula and to develop exam questions with strong reliability and validity evidence, often based on data derived from statistically small samples of medical students. Achieving a large enough sample to reliably and validly evaluate courses, assessments, and exam questions would require extensive data collection over many years, which is inefficient, especially in the fast-changing educational environment of medical schools. This article demonstrates how advanced quantitative methods, such as bootstrapping, can provide reliable data by resampling a single dataset to create many simulated samples. This economical approach, among others, allows for the creation of confidence intervals and, consequently, the accurate evaluation of exam questions as well as broader course and curriculum assessments. Bootstrapping offers a robust alternative to traditional methods, improving the psychometric quality of exam questions and contributing to fair and valid assessments in medical education.
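The resampling idea described above can be sketched with a percentile bootstrap: resample the observed dataset with replacement many times, recompute the statistic each time, and take percentiles of the resampled statistics as the confidence interval. This is only an illustrative sketch; the article does not prescribe a specific implementation, and the item scores below are hypothetical.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of a small sample."""
    rng = random.Random(seed)
    n = len(data)
    boots = sorted(stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# hypothetical item scores (1 = correct, 0 = incorrect) from 20 students
item = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
lo, hi = bootstrap_ci(item)  # CI for the item's difficulty (proportion correct)
```

The same function works for other exam-question statistics (e.g. discrimination indices) simply by passing a different `stat` callable.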
ABSTRACT
The examination for the Medical Intern Resident (MIR) is a multiple-choice test aimed at ranking candidates for specialized medical training positions in Spain. The objective of this study is to provide an objective analysis of the 2022 edition of this test as an evaluative tool for discrimination, with a particular focus on the field of radiology and nuclear medicine. The clinical cases associated with radiology or nuclear medicine images pose greater difficulty than the rest of the MIR exam questions. Of the 14 questions related to radiological or nuclear medicine images, six exhibit high difficulty, and only five demonstrate good or excellent discriminatory capacity. While the MIR exam proves to be an excellent discriminatory tool in psychometric terms, the image-related questions show significant potential for improvement. For an image-associated question to exhibit appropriate discrimination, it is essential to minimize irrelevant information, ensure that the image complements the clinical information provided in the text without contradicting it, represent the characteristic imaging finding of the disease, utilize the appropriate imaging modality, maintain a moderate difficulty level, and ensure that the distractors are clearly false.
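Discriminatory capacity of the kind assessed above is classically quantified with a point-biserial (item-total) correlation: how strongly answering the item correctly tracks examinees' overall scores. The study's actual psychometric procedure is not specified here, so this is only an illustrative sketch with hypothetical scores.

```python
import math
import statistics

def point_biserial(item_scores, total_scores):
    """Classical discrimination index: correlation between a dichotomous
    item (1 = correct, 0 = incorrect) and examinees' total exam scores.
    Requires the item to have both correct and incorrect responses."""
    mean_t = statistics.mean(total_scores)
    sd_t = statistics.pstdev(total_scores)
    p = sum(item_scores) / len(item_scores)  # item difficulty (proportion correct)
    mean_correct = statistics.mean(
        t for i, t in zip(item_scores, total_scores) if i == 1
    )
    return (mean_correct - mean_t) / sd_t * math.sqrt(p / (1 - p))

# hypothetical: 5 examinees' responses to one item, and their total scores
r = point_biserial([1, 1, 1, 0, 0], [10, 9, 8, 5, 4])
```

Values near zero (or negative) flag items with poor discrimination, such as the image-based questions criticized above.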
Subjects
Internship and Residency ; Spain ; Radiology/education ; Humans ; Internal Medicine/education ; Nuclear Medicine ; Educational Measurement/methods
ABSTRACT
Background: The rising prominence of artificial intelligence in healthcare has revolutionized patient access to medical information. This cross-sectional study sought to assess if ChatGPT could satisfactorily address common patient questions about total shoulder arthroplasty (TSA). Methods: Ten commonly encountered questions in TSA practice were selected and posed to ChatGPT. Each response was assessed for accuracy and clarity using the Mika et al. scoring system, which ranges from "excellent response not requiring clarification" to "unsatisfactory response requiring substantial clarification," and a modified DISCERN score. The readability was further evaluated using the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level. Results: The mean Mika et al. score was 2.93, corresponding to an overall subjective rating of "satisfactory but requiring moderate clarification." The mean DISCERN score was 46.60, which is considered "fair." The readability analysis suggested that the responses were at a college-graduate level, higher than the recommended level for patient educational materials. Discussion: Our results suggest that ChatGPT has the potential to supplement the collaborative decision-making process between patients and experienced orthopedic surgeons for TSA-related inquiries. Ultimately, while tools like ChatGPT can enhance traditional patient education methods, they should not replace direct consultations with medical professionals.
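The two readability indices used above are simple closed-form formulas over word, sentence, and syllable counts (the counts in the example below are made up for illustration):

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level, an approximate US school-grade equivalent."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# e.g. a passage of 100 words across 5 sentences with 170 syllables
fre = flesch_reading_ease(100, 5, 170)
fkg = flesch_kincaid_grade(100, 5, 170)
```

A grade level well above the commonly recommended sixth-to-eighth-grade range for patient materials is what led to the "college-graduate level" finding above.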
ABSTRACT
BACKGROUND: The emergence of artificial intelligence (AI) has allowed users to access large sources of information in a chat-like manner. We therefore sought to evaluate the accuracy of ChatGPT-4's responses to the 10 patient most frequently asked questions (FAQs) regarding anterior cruciate ligament (ACL) surgery. METHODS: A list of the top 10 FAQs pertaining to ACL surgery was created after conducting a search through all Sports Medicine Fellowship Institutions listed on the Arthroscopy Association of North America (AANA) and American Orthopaedic Society of Sports Medicine (AOSSM) websites. A Likert scale was used to grade response accuracy by two sports medicine fellowship-trained surgeons. Cohen's kappa was used to assess inter-rater agreement. Reproducibility of the responses over time was also assessed. RESULTS: Five of the 10 responses received a 'completely accurate' grade from both fellowship-trained surgeons, with three additional replies receiving a 'completely accurate' status from at least one. Moreover, the inter-rater reliability assessment revealed moderate agreement between the fellowship-trained attending physicians (weighted kappa = 0.57, 95% confidence interval 0.15-0.99). Additionally, 80% of the responses were reproducible over time. CONCLUSION: ChatGPT can be considered an accurate additional tool for answering general patient questions regarding ACL surgery. Nonetheless, patient-surgeon interaction should not be deferred and must continue to be the driving force for information retrieval. Thus, the general recommendation is to address any questions in the presence of a qualified specialist.
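Weighted kappa, as reported above, penalizes rater disagreements in proportion to their distance on the ordinal scale. A stdlib sketch follows; linear weights are assumed for illustration (the abstract does not state the weighting scheme), and the rating lists are hypothetical.

```python
def weighted_kappa(r1, r2, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    obs = [[0.0] * k for _ in range(k)]          # observed joint proportions
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    p1 = [sum(row) for row in obs]               # rater 1 marginals
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals

    def w(i, j):                                  # disagreement weight
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    po = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    pe = sum(w(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1 - po / pe

# hypothetical Likert ratings (1-5) of the same responses by two surgeons
kappa = weighted_kappa([5, 4, 4, 3, 5, 2, 4], [5, 4, 3, 3, 4, 2, 4], [1, 2, 3, 4, 5])
```

Perfect agreement yields kappa = 1; values around 0.4-0.6, like the 0.57 reported above, are conventionally read as moderate agreement.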
ABSTRACT
BACKGROUND: Suicide and suicidal behaviors pose significant global public health challenges, especially among young individuals. Effective screening strategies are crucial for addressing this crisis, with depression screening and suicide-specific tools being common approaches. This study compares their effectiveness by evaluating the Ask Suicide-Screening Questions (ASQ) against item 9 of the Patient Health Questionnaire-A (PHQ-A). METHODS: This study is a secondary analysis of the Argentinean-Spanish version of the ASQ validation study, an observational, cross-sectional, multicenter study conducted in medical settings in Buenos Aires, Argentina. A convenience sample of pediatric outpatients and inpatients aged 10 to 18 years completed the ASQ, PHQ-A, and Suicide Ideation Questionnaire (SIQ), along with clinical and sociodemographic questions. RESULTS: A sample of 267 children and adolescents was included in this secondary analysis. The ASQ exhibited higher sensitivity (95.1%; 95% CI: 83%-99%) than PHQ-A item 9 (73.1%; 95% CI: 57%-85%) and superior performance in identifying suicide risk in youth. LIMITATIONS: The study used convenience sampling and was geographically restricted to Buenos Aires, Argentina. It also lacked longitudinal follow-up to assess the predictive validity of these screening tools for suicide risk. CONCLUSION: The study highlights the ASQ's effectiveness in identifying suicide risk among youth, emphasizing the importance of specialized screening tools over depression screening tools alone for accurate risk assessment in this population.
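When two screeners are administered to the same patients, their sensitivities are paired, and a standard way to compare them is an exact McNemar test on the discordant pairs. The abstract does not state which test the authors used, so this is only an illustrative sketch with hypothetical counts.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar p-value from discordant pairs:
    b = at-risk cases flagged only by screener 1,
    c = at-risk cases flagged only by screener 2."""
    n = b + c
    k = min(b, c)
    # two-sided: double the one-tailed binomial(n, 1/2) tail, capped at 1
    return min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)

# hypothetical: among true at-risk youth, 8 flagged only by the ASQ,
# 1 flagged only by PHQ-A item 9
p = mcnemar_exact(1, 8)
```

Only the discordant pairs carry information here; cases both screeners flag (or both miss) cancel out of the comparison.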
ABSTRACT
BACKGROUND: Health information consumers increasingly rely on question-and-answer (Q&A) communities to address their health concerns. However, the quality of questions posted significantly impacts the likelihood and relevance of received answers. OBJECTIVE: This study aims to improve our understanding of the quality of health questions within web-based Q&A communities. METHODS: We develop a novel framework for defining and measuring question quality within web-based health communities, incorporating content- and language-based variables. This framework leverages k-means clustering and establishes automated metrics to assess overall question quality. To validate our framework, we analyze questions related to kidney disease from expert-curated and community-based Q&A platforms. Expert evaluations confirm the validity of our quality construct, while regression analysis helps identify key variables. RESULTS: High-quality questions were more likely to include demographic and medical information than lower-quality questions (P<.001). In contrast, asking questions at the various stages of disease development was less likely to reflect high-quality questions (P<.001). Low-quality questions were generally shorter with lengthier sentences than high-quality questions (P<.01). CONCLUSIONS: Our findings empower consumers to formulate more effective health information questions, ultimately leading to better engagement and more valuable insights within web-based Q&A communities. Furthermore, our findings provide valuable insights for platform developers and moderators seeking to enhance the quality of user interactions and foster a more trustworthy and informative environment for health information exchange.
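The clustering step of a framework like the one described above can be sketched with a minimal k-means over per-question feature vectors. The function and the features (question length, presence of demographic and medical information) are illustrative assumptions, not the authors' implementation.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means over feature vectors, e.g. (question length,
    has_demographics, has_medical_history) per posted question."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [  # recompute centroids; keep old one if a cluster empties
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# two obviously separable groups of hypothetical question-feature vectors
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cents, cls = kmeans(pts, 2)
```

In practice the resulting clusters would then be labeled (e.g. high- vs low-quality questions) by expert review, as the study's validation step does.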
Subjects
Consumer Health Information ; Humans ; Consumer Health Information/standards ; Language ; Internet ; Surveys and Questionnaires/standards
ABSTRACT
OBJECTIVE: The objective was to compare the average number of mistakes made on multiple-choice questions (MCQs) and fill-in-the-blank questions (FIBs) in anatomy lab exams. METHODS: The study was conducted retrospectively; every exam had both MCQs and FIBs. The answer sheets were divided into 3 tiers based on the number and percentage of mistakes: low (21-32 mistakes, >40%), middle (11-20, 20%-40%), and high (1-9, <20%). The study used an independent 2-sample t test to compare the number of mistakes between MCQs and FIBs overall and per tier, and a 1-way analysis of variance to compare the number of mistakes in both formats across the 3 tiers. RESULTS: There was a significant difference in the number of mistakes between the 2 formats overall, with more mistakes found on FIBs (p < .001). The number of mistakes made in the high and middle tiers differed significantly between formats, being higher on MCQs (p < .001). There was no significant difference in the number of mistakes made in the low tier between formats (p > .05). Furthermore, the study found significant differences in the number of mistakes made on MCQs and FIBs across the 3 tiers, being highest in the low-tier group (p < .001). CONCLUSION: There were fewer mistakes on the MCQ than the FIB format overall. The findings also suggest that, for low-tier answer sheets, both formats could be used to identify students at academic risk who need more attention.
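The independent two-sample t test used above reduces to a single statistic over the two groups of mistake counts. A pooled-variance sketch with hypothetical counts follows (the |t| value is then compared against the critical value for the appropriate degrees of freedom; the study's exact data are not reproduced here).

```python
import math
import statistics

def pooled_t_stat(a, b):
    """Independent two-sample t statistic with pooled variance,
    e.g. for comparing MCQ vs FIB mistake counts per answer sheet."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

# hypothetical mistake counts per answer sheet
mcq = [3, 4, 2, 5, 3, 4]
fib = [6, 7, 5, 8, 6, 7]
t = pooled_t_stat(mcq, fib)  # compare |t| with the critical value for df = 10
```

A strongly negative t here reflects fewer mistakes on MCQs than FIBs, the direction reported in the study's overall comparison.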
ABSTRACT
OBJECTIVE: Pharmacists are often the last line of defense against medical errors caused by inaccurate calculations. Effective teaching and assessment of pharmaceutical calculations is essential in preparing students for successful pharmacy careers. This study aimed to elucidate the potential benefit of self-testing practice questions on final examination performance in a first-year pharmaceutical calculations course. METHODS: One hundred sixteen students across the classes of 2026 and 2027 were given access to 110 online practice calculation questions eight days prior to the final examination. Retrospective analysis using Pearson's correlation coefficient and an unpaired t-test was used to assess the effect of self-study practice questions on exam performance. RESULTS: A correlation between higher quiz scores and higher final examination scores was observed for both the class of 2026 and the class of 2027. A greater number of attempts on practice quiz questions correlated with a higher score on the final examination for the class of 2026, but not the class of 2027. An earlier first access date was also associated with higher final examination scores, specifically for the class of 2026. CONCLUSION: This retrospective study evaluated the effect of practice calculation questions on final examination performance; the results reveal that the use of practice calculation questions positively correlates with improved final examination performance, notably in the class of 2026 but not in 2027. These findings suggest the potential efficacy of this preparatory method across various pharmaceutical courses and other calculation-based disciplines internationally.
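Pearson's correlation coefficient, as used above, is directly computable from the paired quiz and exam scores. A stdlib sketch with hypothetical score pairs:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired observations, e.g. practice-quiz
    scores and final-examination scores for the same students."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical quiz scores vs final exam scores for four students
r = pearson_r([60, 70, 80, 90], [65, 68, 85, 88])
```

Values near +1, as in this toy example, correspond to the positive quiz-exam association the study reports.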
ABSTRACT
INTRODUCTION: ChatGPT 4.0, a large language model (LLM) developed by OpenAI, has demonstrated the capability to pass Japan's national medical examination and other medical assessments. However, the impact of imaging-based questions and different question types on its performance has not been thoroughly examined. This study evaluated ChatGPT 4.0's performance on Japan's national examination for physical therapists, particularly its ability to handle complex questions involving images and tables. The study also assessed the model's potential in the field of rehabilitation and its performance with Japanese-language inputs. METHODS: The evaluation utilized 1,000 questions from the 54th to 58th national exams for physical therapists in Japan, comprising 160 general questions and 40 practical questions per exam. All questions were input in Japanese and included additional information such as images or tables. The answers generated by ChatGPT were then compared with the official correct answers. ANALYSIS: ChatGPT's performance was evaluated based on accuracy rates using several criteria: general versus practical questions were analyzed with Fisher's exact test, while A-type (single correct answer) versus X2-type (two correct answers) questions, text-only questions versus questions with images and tables, and different question lengths were compared using Student's t-test. RESULTS: ChatGPT 4.0 met the passing criteria with an overall accuracy of 73.4%. The accuracy rates for general and practical questions were 80.1% and 46.6%, respectively. No significant difference was found between the accuracy rates for A-type (74.3%) and X2-type (67.4%) questions. However, a significant difference was observed between the accuracy rates for text-only questions (80.5%) and questions with images and tables (35.4%). DISCUSSION: The results indicate that ChatGPT 4.0 satisfies the passing criteria for the national exam and demonstrates adequate knowledge and application skills. However, its performance on practical questions and those with images and tables is lower, indicating areas for improvement. Its effective handling of Japanese inputs suggests potential use in non-English-speaking regions. CONCLUSION: ChatGPT 4.0 can pass the national examination for physical therapists, particularly on text-based questions. However, improvements are needed for specialized practical questions and those involving images and tables. The model shows promise for supporting clinical rehabilitation and medical education in Japanese-speaking contexts, though further enhancements are required for comprehensive application.
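Fisher's exact test, used above to compare accuracy on general versus practical questions, sums the hypergeometric probabilities of all 2x2 tables at least as extreme as the observed one. A stdlib sketch (the counts in the usage example are hypothetical, not the study's data):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]],
    e.g. rows = general/practical questions, cols = correct/incorrect."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def p_table(x):  # hypergeometric probability of a table with cell (1,1) = x
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # two-sided: sum probabilities of all tables no more likely than the observed
    return sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs + 1e-12)

# hypothetical: 16/20 general items correct vs 8/20 practical items correct
p = fisher_exact_two_sided(16, 4, 8, 12)
```

Unlike a chi-squared approximation, this exact computation remains valid for the small per-exam question counts described above.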
ABSTRACT
Objective Emergency Medicine (EM) clerkships often use a written exam to assess the knowledge gained over the course of an EM rotation in medical school. Clerkship Directors (CDs) may choose the National Board of Medical Examiners (NBME) EM Advanced Clinical Science Subject Exam (ACE), the Society for Academic Emergency Medicine (SAEM) M4 exam, which has two versions, the SAEM M3 exam, or departmental exams. There are currently no published guidelines or consensus regarding their utility. This survey-based study was designed to collect data on current practices of EM clerkship exam usage, to analyze trends and variability in which exams are used and how. Methods The authors designed a cross-sectional observational survey to collect data from EM CDs on exam utilization in clerkships. The survey population consisted of clerkship directors, assistant clerkship directors, or faculty familiar with assessments in their EM clerkship. Initial dissemination was by electronic distribution to subscribers of the Clerkship Directors in Emergency Medicine (CDEM) listserv on the SAEM website. Subsequently, contact information for CDs at institutions that had not responded was obtained by manual search of the Emergency Medicine Residents' Association (EMRA) Match website, and individual correspondence was sent at regular intervals. Data obtained include clerkship characteristics, the exam used, the weight of the exam relative to the overall grade, and alternatives if the preferred exam had previously been taken. Results Eighty-seven programs (42% response rate) completed the survey between August 2019 and February 2021. Of the 87 responses, 71 (82%) were completed by a CD. Forty-six (53%) institutions required an EM rotation. Students were tested in 34 (74%) required EM clerkships and 48 of 70 (69%) EM electives. In required rotations that used an exam, 20 (59%) used the NBME EM ACE, while 28 of 46 (61%) EM electives that reported an exam used the SAEM M4 exam. Five (15%) of the required clerkships used a departmental exam. Of clerkships requiring an exam, 46 (57%) weighed the score at 11-30% of the final grade. Data for extramural rotations mirrored those of EM electives. One-third of respondents indicated they do not inquire about previously taken exams. Conclusion This survey demonstrates significant variability in the type of exam, the weighting of the score, and the alternatives offered if the preferred exam was previously taken. The lack of a consistent approach to how these exams are used in determining students' final EM grades diminishes the reliability of the EM clerkship grade as a factor used by residency directors in choosing future residents. Further research on the optimal usage of these exams is needed.
ABSTRACT
Lung cancer is the leading cause of cancer death in the US and globally. The mortality from lung cancer has been declining, due to a reduction in incidence and advances in treatment. Although recent success in developing targeted and immunotherapies for lung cancer has benefitted patients, it has also expanded the complexity of potential treatment options for health care providers. To aid in reducing such complexity, experts in oncology convened a conference (Bridging the Gaps in Lung Cancer) to identify current knowledge gaps and controversies in the diagnosis, treatment, and outcomes of various lung cancer scenarios, as described here. Such scenarios relate to biomarkers and testing in lung cancer, small cell lung cancer, EGFR mutations and targeted therapy in non-small cell lung cancer (NSCLC), early-stage NSCLC, KRAS/BRAF/MET and other genomic alterations in NSCLC, and immunotherapy in advanced NSCLC.
ABSTRACT
BACKGROUND AND OBJECTIVES: Addressing the issues of workplace advancement, resilience, and retention within medicine is crucial for creating a culture of equity, respect, and inclusivity, especially towards women and nonbinary (WNB) providers, including advanced practice providers (APPs) and most notably those from marginalized groups. This also directly impacts healthcare quality, patient outcomes, and overall patient and employee satisfaction. The purpose of this study was to amplify WNB providers' voices on the challenges they face within a pediatric academic healthcare organization, to rank workplace interventions addressing advancement, resilience, and retention, highlighting the urgency of addressing these issues, and, lastly, to provide suggestions on how to improve inclusivity. METHODS: Participants were self-identified WNB providers employed by a pediatric healthcare organization and its affiliated medical university. An eligibility screener was completed by 150 qualified respondents, and 40 WNB providers participated in study interviews. Interviews were conducted using a semi-structured interview guide to rank interventions targeted at improving equity, with time allotted for interviewees to discuss their personal lives and how individual circumstances affected their professional experiences. RESULTS: WNB providers called for efficient workflows and a reduction in uncompensated job demands. Support for family responsibilities, flexible financial/compensation models, and improved job resources were all endorsed similarly. Participants ranked direct supervisor and leader support substantially lower than other interventions. CONCLUSIONS: Career mentorship and academic support for WNB individuals are recognized interventions for advancement and retention but were not ranked as top priorities. Respondents focused on personal supports related to family, job resources, and flexible compensation models. Future studies should focus on implementing realistic expectations and structures that support whole lives, including professional ambitions, time with family, personal pursuits, and self-care.
ABSTRACT
AIMS: The aims of this study were to identify alcohol-related population surveys administered in the Americas, determine which alcohol-related measures are examined, and identify coverage gaps in alcohol-related measures. METHODS: As part of the Global Information System on Alcohol and Health study, a systematic search was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses criteria to identify regionally or nationally representative survey reports of the general population from 1 January 2010 to 6 August 2019. Alcohol-related measures extracted from the surveys were categorized into 10 domains: alcohol consumption status; alcohol consumption; unrecorded alcohol consumption; drinking patterns; symptoms of dependence and/or harmful use; drinking during pregnancy; treatment coverage; second-hand harms; economic; and other. RESULTS: The systematic search identified 7417 survey reports, 94 of which were new and included in this study, with an additional 11 studies included from a previous systematic study of alcohol surveys. In total, 94 unique surveys and 161 unique survey waves were located, representing 105 unique survey questionnaires covering 30 countries. No population surveys were found for five member states: Antigua and Barbuda, Dominica, Haiti, Saint Vincent and the Grenadines, and Saint Kitts and Nevis. All countries with population-based alcohol surveys had had a survey probing alcohol use in the past year/month. Questions regarding heavy episodic drinking, alcohol use disorders, treatment-seeking for alcohol use, drinking during pregnancy, harms to others, and the amounts spent on alcohol were asked in 26, 25, 10, 6, 22, and 11 countries, respectively. CONCLUSIONS: The heterogeneity of alcohol-related population surveys in the Americas from 2010 to 2019 limits their comparability across countries and over time.
Future surveys should implement a standardized set of core questions to provide consistency in the monitoring of alcohol consumption and alcohol-related harms.