ABSTRACT
Digital health research often relies on case vignettes (descriptions of fictitious or real patients) to navigate ethical and practical challenges. Despite their utility, the quality and lack of standardization of these vignettes have often been criticized, especially in studies on symptom-assessment applications (SAAs) and self-triage decision-making. To address this, our paper introduces a method to refine an existing set of vignettes, drawing on principles from classical test theory. First, we removed any vignette with an item difficulty of zero and an item-total correlation below zero. Second, we stratified the remaining vignettes to reflect the natural base rates of symptoms that SAAs are typically approached with, selecting those vignettes with the highest item-total correlation in each quota. Although this two-step procedure reduced the size of the original vignette set by 40%, self-triage performance on the reduced set correlated strongly with performance on the original set (r = 0.747 to r = 0.997, p < .001). This indicates that our refinement method helps identify vignettes with high predictive power for an agent's self-triage performance while simultaneously increasing the cost-efficiency of vignette-based evaluation studies. This might ultimately lead to higher research quality and more reliable results.
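The first refinement step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names and the toy data are hypothetical, the item-total correlation is computed against the total of the remaining items (a "corrected" variant, which the abstract does not specify), and a vignette is dropped if either criterion applies, which is one possible reading of the abstract. The stratified selection of the second step is not shown.

```python
from statistics import mean, pstdev

def item_difficulty(responses, item):
    """Share of agents who solved the vignette (0 = nobody, 1 = everybody)."""
    return mean(row[item] for row in responses)

def item_total_correlation(responses, item):
    """Pearson correlation between the item score and the sum of all other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return None  # correlation is undefined without variance
    m_item, m_rest = mean(item_scores), mean(rest_scores)
    cov = mean((x - m_item) * (y - m_rest)
               for x, y in zip(item_scores, rest_scores))
    return cov / (sd_item * sd_rest)

def keep_vignettes(responses):
    """Indices of vignettes that survive the first filtering step."""
    kept = []
    for item in range(len(responses[0])):
        if item_difficulty(responses, item) == 0:
            continue  # nobody solved it: carries no information
        itc = item_total_correlation(responses, item)
        if itc is not None and itc < 0:
            continue  # solving it goes along with worse overall performance
        kept.append(item)
    return kept

# Hypothetical data: rows are agents, columns are vignettes, 1 = solved.
responses = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(keep_vignettes(responses))  # [0, 1, 2]
```

In the toy data, vignette 3 is never solved and vignette 4 is solved only by the weakest agents, so both are removed.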
ABSTRACT
Real-world evidence (RWE) trials have a key advantage over conventional randomized controlled trials (RCTs) due to their potentially better generalizability. High generalizability of study results facilitates new biological insights and enables targeted therapeutic strategies. Random sampling of RWE trial participants is regarded as the gold standard for generalizability. Additionally, the use of sample correction procedures can increase the generalizability of trial results, even when using nonrandomly sampled real-world data (RWD). This study presents descriptive evidence on the extent to which the design of currently planned or already conducted RWE trials takes sampling into account. It also examines whether random sampling or procedures for correcting nonrandom samples are considered. Based on text mining of publicly available metadata provided during registrations of RWE trials on clinicaltrials.gov, EU-PAS, and the OSF-RWE registry, it is shown that the share of RWE trial registrations with information on sampling increased from 65.27% in 2002 to 97.43% in 2022, with a corresponding increase from 14.79% to 28.30% for trials with random samples. For RWE trials with nonrandom samples, there is an increase from 0.00% to 0.95% of trials in which sample correction procedures are used. We conclude that the potential benefits of RWD in terms of generalizing trial results are not yet being fully realized.
Subjects
Data Mining , Research Design , Humans , Data Mining/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Registries/statistics & numerical data , Clinical Trials as Topic/statistics & numerical data , Pragmatic Clinical Trials as Topic/methods , Metadata/statistics & numerical data
ABSTRACT
BACKGROUND: The question of the utility of face masks in preventing acute respiratory infections has received renewed attention during the COVID-19 pandemic. However, given the inconclusive evidence from existing randomized controlled trials, evidence based on real-world data with high external validity is missing. OBJECTIVE: To add real-world evidence, this study aims to examine whether mask mandates in 51 countries and mask recommendations in 10 countries increased self-reported face mask use and reduced SARS-CoV-2 reproduction numbers and COVID-19 case growth rates. METHODS: We applied an event study approach to data pooled from four sources: (1) country-level information on self-reported mask use was obtained from the COVID-19 Trends and Impact Survey, (2) data from the Oxford COVID-19 Government Response Tracker provided information on face mask mandates and recommendations and any other nonpharmacological interventions implemented, (3) mobility indicators from Google's Community Mobility Reports were also included, and (4) SARS-CoV-2 reproduction numbers and COVID-19 case growth rates were retrieved from the Our World in Data-COVID-19 data set. RESULTS: Mandates increased mask use by 8.81 percentage points (P=.006) on average, and SARS-CoV-2 reproduction numbers declined on average by -0.31 units (P=.008). Although no significant average effect of mask mandates was observed for growth rates of COVID-19 cases (-0.98 percentage points; P=.56), the results indicate incremental effects on days 26 (-1.76 percentage points; P=.04), 27 (-1.89 percentage points; P=.05), 29 (-1.78 percentage points; P=.04), and 30 (-2.14 percentage points; P=.02) after mandate implementation. For self-reported face mask use and reproduction numbers, incremental effects are seen 6 and 13 days after mandate implementation. Both incremental effects persist for >30 days. Furthermore, mask recommendations increased self-reported mask use on average (5.84 percentage points; P<.001). 
However, there were no effects of recommendations on SARS-CoV-2 reproduction numbers or COVID-19 case growth rates (-0.06 units; P=.70 and -2.45 percentage points; P=.59). Single incremental effects on self-reported mask use were observed on days 11 (3.96 percentage points; P=.04), 13 (3.77 percentage points; P=.04) and 25 to 27 (4.20 percentage points; P=.048 and 5.91 percentage points; P=.01) after recommendation. Recommendations also affected reproduction numbers on days 0 (-0.07 units; P=.03) and 1 (-0.07 units; P=.03) and between days 21 (-0.09 units; P=.04) and 28 (-0.11 units; P=.05) and case growth rates between days 1 and 4 (-1.60 percentage points; P=.03 and -2.19 percentage points; P=.03) and on day 23 (-2.83 percentage points; P=.05) after publication. CONCLUSIONS: Unlike mask recommendations, mask mandates can be used as an effective measure to reduce SARS-CoV-2 reproduction numbers. However, mandates alone are not sufficient to reduce growth rates of COVID-19 cases. Our study adds external validity to the existing randomized controlled trials on the effectiveness of face masks to reduce the spread of SARS-CoV-2.
Subjects
COVID-19 , Humans , COVID-19/epidemiology , COVID-19/prevention & control , SARS-CoV-2 , Pandemics/prevention & control , Retrospective Studies , Masks
ABSTRACT
Objective: To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. Methods: We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers. Results: In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality. Conclusions: A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
ABSTRACT
BACKGROUND: Previous studies have revealed that users of symptom checkers (SCs, apps that support self-diagnosis and self-triage) are predominantly female, are younger than average, and have higher levels of formal education. Little data are available for Germany, and no study has so far compared usage patterns with people's awareness of SCs and the perception of usefulness. OBJECTIVE: We explored the sociodemographic and individual characteristics that are associated with the awareness, usage, and perceived usefulness of SCs in the German population. METHODS: We conducted a cross-sectional online survey among 1084 German residents in July 2022 regarding personal characteristics and people's awareness and usage of SCs. Using random sampling from a commercial panel, we collected participant responses stratified by gender, state of residence, income, and age to reflect the German population. We analyzed the collected data exploratively. RESULTS: Of all respondents, 16.3% (177/1084) were aware of SCs and 6.5% (71/1084) had used them before. Those aware of SCs were younger (mean 38.8, SD 14.6 years, vs mean 48.3, SD 15.7 years), were more often female (107/177, 60.5%, vs 453/907, 49.9%), and had higher formal education levels (eg, 72/177, 40.7%, vs 238/907, 26.2%, with a university/college degree) than those unaware. The same observation applied to users compared to nonusers. It disappeared, however, when comparing users to nonusers who were aware of SCs. Among users, 40.8% (29/71) considered these tools useful. Those considering them useful reported higher self-efficacy (mean 4.21, SD 0.66, vs mean 3.63, SD 0.81, on a scale of 1-5) and a higher net household income (mean EUR 2591.63, SD EUR 1103.96 [mean US $2798.96, SD US $1192.28], vs mean EUR 1626.60, SD EUR 649.05 [mean US $1756.73, SD US $700.97]) than those who considered them not useful. More women considered SCs unhelpful (13/44, 29.5%) compared to men (4/26, 15.4%). 
CONCLUSIONS: Concurring with studies from other countries, our findings show associations between sociodemographic characteristics and SC usage in a German sample: users were on average younger, of higher socioeconomic status, and more commonly female compared to nonusers. However, usage cannot be explained by sociodemographic differences alone. Rather, sociodemographics seem to explain who is or is not aware of the technology, whereas those who are aware of SCs are equally likely to use them, independently of sociodemographic differences. Although in some groups (eg, people with anxiety disorder) more participants reported knowing and using SCs, they tended to perceive them as less useful. In other groups (eg, male participants), fewer respondents were aware of SCs, but those who used them perceived them to be more useful. Thus, SCs should be designed to fit specific user needs, and strategies should be developed to help reach individuals who could benefit but are not aware of SCs yet.
Subjects
Public Health , Telemedicine , Female , Humans , Male , Cross-Sectional Studies , Germany , Surveys and Questionnaires , Information Seeking Behavior
ABSTRACT
The prevalence of mental health app use by people suffering from mental health disorders is rapidly growing. The integration of mental health apps shows promise in increasing the accessibility and quality of treatment. However, a lack of continued engagement is one of the significant challenges of such implementation. In response, the M-health Index and Navigation Database (MIND), derived from the American Psychiatric Association's app evaluation framework, was created to support patient autonomy and enhance engagement. This study aimed to identify factors influencing engagement with mental health apps and explore how MIND may affect user engagement around selected apps. We conducted a longitudinal online survey over six weeks after participants were instructed to find mental health apps using MIND. The survey included demographic information, technology usage, access to healthcare, app selection information, System Usability Scale, the Digital Working Alliance Inventory, and the General Self-Efficacy Scale questions. Quantitative analysis was performed to analyze the data. A total of 321 surveys were completed (178 at the initial, 90 at the 2-week mark, and 53 at the 6-week mark). The most influential factors when choosing mental health apps included cost (76%), condition supported by the app (59%), and app features offered (51%), while privacy and clinical foundation to support app claims were among the least selected filters. The top ten apps selected by participants were analyzed for engagement. Rates of engagement among the top ten apps decreased on average by 43% from the initial survey to week two and by 22% from week two to week six. In the context of overall low engagement with mental health apps, implementation of mental health app databases like MIND can play an essential role in maintaining higher engagement and satisfaction. Together, this study offers early data on how educational approaches like MIND may help bolster engagement with mental health apps.
ABSTRACT
BACKGROUND: Although medical decision-making may be thought of as a task involving health professionals, many decisions, including critical health-related decisions, are made by laypersons alone. Specifically, as the first step to most care episodes, it is the patient who determines whether and where to seek health care (triage). Overcautious self-assessments (ie, overtriaging) may lead to overutilization of health care facilities and overcrowded emergency departments, whereas imprudent decisions (ie, undertriaging) constitute a risk to the patient's health. Recently, patient-facing decision support systems, commonly known as symptom checkers, have been developed to assist laypersons in these decisions. OBJECTIVE: The purpose of this study is to identify factors influencing laypersons' ability to self-triage and their risk averseness in self-triage decisions. METHODS: We analyzed publicly available data on 91 laypersons appraising 45 short fictitious patient descriptions (case vignettes; N=4095 appraisals). Using signal detection theory and descriptive and inferential statistics, we explored whether the type of medical decision laypersons face, their confidence in their decision, and sociodemographic factors influence their triage accuracy and the type of errors they make. We distinguished between 2 decisions: whether emergency care was required (decision 1) and whether self-care was sufficient (decision 2). RESULTS: The accuracy of detecting emergencies (decision 1) was higher (mean 82.2%, SD 5.9%) than that of deciding whether any type of medical care is required (decision 2, mean 75.9%, SD 5.25%; t(90)=8.4; P<.001; Cohen d=0.9). Sensitivity for decision 1 was lower (mean 67.5%, SD 16.4%) than its specificity (mean 89.6%, SD 8.6%), whereas sensitivity for decision 2 was higher (mean 90.5%, SD 8.3%) than its specificity (mean 46.7%, SD 15.95%). 
Female participants were more risk averse and overtriaged more often than male participants, but age and level of education showed no association with participants' risk averseness. Participants' triage accuracy was higher when they were certain about their appraisal (2114/3381, 62.5%) than when they were uncertain (378/714, 52.9%). However, most errors occurred when participants were certain of their decision (1267/1603, 79%). Participants were more commonly certain of their overtriage errors (mean 80.9%, SD 23.8%) than their undertriage errors (mean 72.5%, SD 30.9%; t(89)=3.7; P<.001; d=0.39). CONCLUSIONS: Our study suggests that laypersons are overcautious in deciding whether they require medical care at all, but they miss identifying a considerable portion of emergencies. Our results further indicate that women are more risk averse than men in both types of decisions. Layperson participants made most triage errors when they were certain of their own appraisal. Thus, they might not follow or even seek advice (eg, from symptom checkers) in most instances where advice would be useful.
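The certainty figures reported in this abstract are linked by simple arithmetic, which the following short sketch works through. It uses only the counts stated in the abstract; the variable names and the `pct` helper are introduced here for illustration.

```python
# Counts as reported in the abstract.
correct_certain, total_certain = 2114, 3381      # appraisals made with certainty
correct_uncertain, total_uncertain = 378, 714    # appraisals made with uncertainty

# Errors are the appraisals that were not correct.
errors_certain = total_certain - correct_certain
errors_uncertain = total_uncertain - correct_uncertain
errors_total = errors_certain + errors_uncertain

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(total_certain + total_uncertain)       # all appraisals (91 laypersons x 45 vignettes)
print(pct(correct_certain, total_certain))      # accuracy when certain
print(pct(correct_uncertain, total_uncertain))  # accuracy when uncertain
print(errors_certain, errors_total)             # errors made with certainty, all errors
print(pct(errors_certain, errors_total))        # share of errors made with certainty
```

The sketch shows why both statements hold at once: accuracy is higher under certainty, yet because far more appraisals were made with certainty, most errors still fall into that group.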
ABSTRACT
BACKGROUND: Due to the increasing use of online health information, symptom checkers have been developed to provide an individualized assessment of health complaints and provide potential diagnoses and an urgency estimation. It is assumed that they support patient empowerment and have a positive impact on patient-physician interaction and satisfaction with care. Particularly in the emergency department (ED), symptom checkers could be integrated to bridge waiting times, and patients as well as physicians could take advantage of potential positive effects. Our study therefore aims to assess the impact of symptom assessment application (SAA) usage compared to no SAA usage on the patient-physician interaction in self-referred walk-in patients in the ED population. METHODS: In this multi-center, 1:1 randomized, controlled, parallel-group superiority trial, 440 self-referred adult walk-in patients with a non-urgent triage category will be recruited in three EDs in Berlin. Eligible participants in the intervention group will use an SAA directly after initial triage. The control group receives standard care without using an SAA. The primary endpoint is patients' satisfaction with the patient-physician interaction assessed by the Patient Satisfaction Questionnaire. DISCUSSION: The results of this trial could influence the implementation of SAAs in acute care to improve satisfaction with the patient-physician interaction. TRIAL REGISTRATION: German Clinical Trials Registry DRKS00028598. Registered on 25.03.2022.
Subjects
Emergency Service, Hospital , Physicians , Adult , Equivalence Trials as Topic , Humans , Multicenter Studies as Topic , Patient Satisfaction , Randomized Controlled Trials as Topic , Symptom Assessment , Triage
ABSTRACT
BACKGROUND: Symptom checker apps are patient-facing decision support systems aimed at providing advice to laypersons on whether, where, and how to seek health care (disposition advice). Such advice can improve laypersons' self-assessment and ultimately improve medical outcomes. Past research has mainly focused on the accuracy of symptom checker apps' suggestions. To support decision-making, such apps need to provide not only accurate but also trustworthy advice. To date, only few studies have addressed the question of the extent to which laypersons trust symptom checker app advice or the factors that moderate their trust. Studies on general decision support systems have shown that framing automated systems (anthropomorphic or emphasizing expertise), for example, by using icons symbolizing artificial intelligence (AI), affects users' trust. OBJECTIVE: This study aims to identify the factors influencing laypersons' trust in the advice provided by symptom checker apps. Primarily, we investigated whether designs using anthropomorphic framing or framing the app as an AI increases users' trust compared with no such framing. METHODS: Through a web-based survey, we recruited 494 US residents with no professional medical training. The participants had to first appraise the urgency of a fictitious patient description (case vignette). Subsequently, a decision aid (mock symptom checker app) provided disposition advice contradicting the participants' appraisal, and they had to subsequently reappraise the vignette. Participants were randomized into 3 groups: 2 experimental groups using visual framing (anthropomorphic, 160/494, 32.4%, vs AI, 161/494, 32.6%) and a neutral group without such framing (173/494, 35%). RESULTS: Most participants (384/494, 77.7%) followed the decision aid's advice, regardless of its urgency level. 
Neither anthropomorphic framing (odds ratio 1.120, 95% CI 0.664-1.897) nor framing as AI (odds ratio 0.942, 95% CI 0.565-1.570) increased behavioral or subjective trust (P=.99) compared with the no-frame condition. Even participants who were extremely certain in their own decisions (ie, 100% certain) commonly changed it in favor of the symptom checker's advice (19/34, 56%). Propensity to trust and eHealth literacy were associated with increased subjective trust in the symptom checker (propensity to trust b=0.25; eHealth literacy b=0.2), whereas sociodemographic variables showed no such link with either subjective or behavioral trust. CONCLUSIONS: Contrary to our expectation, neither the anthropomorphic framing nor the emphasis on AI increased trust in symptom checker advice compared with that of a neutral control condition. However, independent of the interface, most participants trusted the mock app's advice, even when they were very certain of their own assessment. Thus, the question arises as to whether laypersons use such symptom checkers as substitutes rather than as aids in their own decision-making. With trust in symptom checkers already high at baseline, the benefit of symptom checkers depends on interface designs that enable users to adequately calibrate their trust levels during usage. TRIAL REGISTRATION: Deutsches Register Klinischer Studien DRKS00028561; https://tinyurl.com/rv4utcfb (retrospectively registered).
ABSTRACT
BACKGROUND: Symptom checkers are digital tools assisting laypersons in self-assessing the urgency and potential causes of their medical complaints. They are widely used but face concerns from both patients and health care professionals, especially regarding their accuracy. A 2015 landmark study substantiated these concerns using case vignettes to demonstrate that symptom checkers commonly err in their triage assessment. OBJECTIVE: This study aims to revisit the landmark index study to investigate whether and how symptom checkers' capabilities have evolved since 2015 and how they currently compare with laypersons' stand-alone triage appraisal. METHODS: In early 2020, we searched for smartphone and web-based applications providing triage advice. We evaluated these apps on the same 45 case vignettes as the index study. Using descriptive statistics, we compared our findings with those of the index study and with publicly available data on laypersons' triage capability. RESULTS: We retrieved 22 symptom checkers providing triage advice. The median triage accuracy in 2020 (55.8%, IQR 15.1%) was close to that in 2015 (59.1%, IQR 15.5%). The apps in 2020 were less risk averse (odds 1.11:1, the ratio of overtriage errors to undertriage errors) than those in 2015 (odds 2.82:1), missing >40% of emergencies. Few apps outperformed laypersons in either deciding whether emergency care was required or whether self-care was sufficient. No apps outperformed the laypersons on both decisions. CONCLUSIONS: Triage performance of symptom checkers has, on average, not improved over the course of 5 years. It decreased in 2 use cases (advice on when emergency care is required and when no health care is needed for the moment). However, triage capability varies widely within the sample of symptom checkers. Whether it is beneficial to seek advice from symptom checkers depends on the app chosen and on the specific question to be answered. 
Future research should develop resources (eg, case vignette repositories) to audit the capabilities of symptom checkers continuously and independently and provide guidance on when and to whom they should be recommended.
Subjects
Emergency Medical Services , Mobile Applications , Data Collection , Follow-Up Studies , Humans , Self Care , Triage
ABSTRACT
BACKGROUND: During the COVID-19 pandemic, medical laypersons with symptoms indicative of a COVID-19 infection commonly sought guidance on whether and where to find medical care. Numerous web-based decision support tools (DSTs) have been developed, both by public and commercial stakeholders, to assist their decision making. Though most of the DSTs' underlying algorithms are similar and simple decision trees, their mode of presentation differs: some DSTs present a static flowchart, while others are designed as a conversational agent, guiding the user through the decision tree's nodes step-by-step in an interactive manner. OBJECTIVE: This study aims to investigate whether interactive DSTs provide greater decision support than noninteractive (ie, static) flowcharts. METHODS: We developed mock interfaces for 2 DSTs (1 static, 1 interactive), mimicking patient-facing, freely available DSTs for COVID-19-related self-assessment. Their underlying algorithm was identical and based on the Centers for Disease Control and Prevention's guidelines. We recruited adult US residents online in November 2020. Participants appraised the appropriate social and care-seeking behavior for 7 fictitious descriptions of patients (case vignettes). Participants in the experimental groups received either the static or the interactive mock DST as support, while the control group appraised the case vignettes unsupported. We determined participants' accuracy, decision certainty (after deciding), and mental effort to measure the quality of decision support. Participants' ratings of the DSTs' usefulness, ease of use, trust, and future intention to use the tools served as measures to analyze differences in participants' perception of the tools. We used ANOVAs and t tests to assess statistical significance. RESULTS: Our survey yielded 196 responses. 
The mean number of correct assessments was higher in the intervention groups (interactive DST group: mean 11.71, SD 2.37; static DST group: mean 11.45, SD 2.48) than in the control group (mean 10.17, SD 2.00). Decisional certainty was significantly higher in the experimental groups (interactive DST group: mean 80.7%, SD 14.1%; static DST group: mean 80.5%, SD 15.8%) compared to the control group (mean 65.8%, SD 20.8%). The differences in these measures proved statistically significant in t tests comparing each intervention group with the control group (P<.001 for all 4 t tests). ANOVA detected no significant differences regarding mental effort between the 3 study groups. Differences between the 2 intervention groups were of small effect sizes and nonsignificant for all 3 measures of the quality of decision support and most measures of participants' perception of the DSTs. CONCLUSIONS: When the decision space is limited, as is the case in common COVID-19 self-assessment DSTs, static flowcharts might prove as beneficial in enhancing decision quality as interactive tools. Given that static flowcharts reveal the underlying decision algorithm more transparently and require less effort to develop, they might prove more efficient in providing guidance to the public. Further research should validate our findings on different use cases, elaborate on the trade-off between transparency and convenience in DSTs, and investigate whether subgroups of users benefit more with 1 type of user interface than the other. TRIAL REGISTRATION: Deutsches Register Klinischer Studien DRKS00028136; https://tinyurl.com/4bcfausx (retrospectively registered).