1.
Nature ; 620(7972): 172-180, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37438534

ABSTRACT

Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model [1] (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM [2], on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA [3], MedMCQA [4], PubMedQA [5] and Measuring Massive Multitask Language Understanding (MMLU) clinical topics [6]), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17 percentage points. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.


Subjects
Benchmarking , Computer Simulation , Knowledge , Medicine , Natural Language Processing , Bias , Clinical Competence , Comprehension , Datasets as Topic , Licensure , Medicine/methods , Medicine/standards , Patient Safety , Physicians
3.
J Med Internet Res ; 20(3): e97, 2018 03 21.
Article in English | MEDLINE | ID: mdl-29563076

ABSTRACT

BACKGROUND: The rise in usage of and access to new technologies in recent years has led to a growth in digital health behavior change interventions. As the shift to digital platforms continues to grow, it is increasingly important to consider how the field of information architecture (IA) can inform the development of digital health interventions. IA is the way in which digital content is organized and displayed, which strongly impacts users' ability to find and use content. While many information architecture best practices exist, there is a lack of empirical evidence on the role it plays in influencing behavior change and health outcomes. OBJECTIVE: Our aim was to conduct a systematic review synthesizing the existing literature on website information architecture and its effect on health outcomes, behavioral outcomes, and website engagement. METHODS: To identify all existing information architecture and health behavior literature, we searched articles published in English in the following databases (no date restrictions imposed): ACM Digital Library, CINAHL, Cochrane Library, Google Scholar, Ebsco, and PubMed. The search terms used included information terms (eg, information architecture, interaction design, persuasive design), behavior terms (eg, health behavior, behavioral intervention, ehealth), and health terms (eg, smoking, physical activity, diabetes). The search results were reviewed to determine if they met the inclusion and exclusion criteria created to identify empirical research that studied the effect of IA on health outcomes, behavioral outcomes, or website engagement. Articles that met inclusion criteria were assessed for study quality. Then, data from the articles were extracted using a priori categories established by 3 reviewers. However, the limited health outcome data gathered from the studies precluded a meta-analysis. 
RESULTS: The initial literature search yielded 685 results, which were narrowed down to three publications that examined the effect of information architecture on health outcomes, behavioral outcomes, or website engagement. One publication studied the isolated impact of information architecture on outcomes of interest (ie, website use and engagement; health-related knowledge, attitudes, and beliefs; and health behaviors), while the other two publications studied the impact of information architecture, website features (eg, interactivity, email prompts, and forums), and tailored content on these outcomes. The paper that investigated IA exclusively found that a tunnel IA improved site engagement and behavior knowledge, but it decreased users' perceived efficiency. The first study that did not isolate IA found that the enhanced site condition improved site usage but not the amount of content viewed. The second study that did not isolate IA found that a tailored site condition improved site usage, behavior knowledge, and some behavior outcomes. CONCLUSIONS: No clear conclusion can be made about the relationship between IA and health outcomes, given limited evidence in the peer-reviewed literature connecting IA to behavioral outcomes and website engagement. Only one study reviewed solely manipulated IA, and we therefore recommend improving the scientific evidence base such that additional empirical studies investigate the impact of IA in isolation. Moreover, information from the gray literature and expert opinion might be identified and added to the evidence base, in order to lay the groundwork for hypothesis generation to improve empirical evidence on information architecture and health and behavior outcomes.


Subjects
Health Behavior , Internet/instrumentation , Quality of Health Care/standards , Female , Humans , Male , Middle Aged
4.
J Biomed Inform ; 76: 1-8, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28974460

ABSTRACT

OBJECTIVE: To outline new design directions for informatics solutions that facilitate personal discovery with self-monitoring data. We investigate this question in the context of chronic disease self-management with the focus on type 2 diabetes. MATERIALS AND METHODS: We conducted an observational qualitative study of discovery with personal data among adults attending a diabetes self-management education (DSME) program that utilized a discovery-based curriculum. The study included observations of class sessions, and interviews and focus groups with the educator and attendees of the program (n = 14). RESULTS: The main discovery in diabetes self-management revolved around discovering patterns of association between characteristics of individuals' activities and changes in their blood glucose levels that the participants referred to as "cause and effect". This discovery empowered individuals to actively engage in self-management and provided a desired flexibility in selection of personalized self-management strategies. We show that discovery of cause and effect involves four essential phases: (1) feature selection, (2) hypothesis generation, (3) feature evaluation, and (4) goal specification. Further, we identify opportunities to support discovery at each stage with informatics and data visualization solutions by providing assistance with: (1) active manipulation of collected data (e.g., grouping, filtering and side-by-side inspection), (2) hypothesis formulation (e.g., using natural language statements or constructing visual queries), (3) inference evaluation (e.g., through aggregation and visual comparison, and statistical analysis of associations), and (4) translation of discoveries into actionable goals (e.g., tailored selection from computable knowledge sources of effective diabetes self-management behaviors).
DISCUSSION: The study suggests that discovery of cause and effect in diabetes can be a powerful approach to helping individuals to improve their self-management strategies, and that self-monitoring data can serve as a driving engine for personal discovery that may lead to sustainable behavior changes. CONCLUSIONS: Enabling personal discovery is a promising new approach to enhancing chronic disease self-management with informatics interventions.


Subjects
Diabetes Mellitus, Type 2/therapy , Self Care , Self Efficacy , Behavior Therapy , Blood Glucose Self-Monitoring , Diabetes Mellitus, Type 2/psychology , Female , Humans , Male , Middle Aged , Patient Education as Topic
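The abstract lists grouping, side-by-side inspection and aggregation as ways to help people evaluate a "cause and effect" hypothesis from their own data. A minimal Python sketch of that aggregation step follows, assuming a simple log format; the field names, values and the hypothesis are hypothetical illustrations, not data or software from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical self-monitoring log: each entry records features of one
# activity and the blood glucose change (mg/dL) observed afterwards.
log = [
    {"meal": "pasta", "walk_after": False, "glucose_change": 60},
    {"meal": "pasta", "walk_after": True, "glucose_change": 35},
    {"meal": "salad", "walk_after": False, "glucose_change": 20},
    {"meal": "salad", "walk_after": True, "glucose_change": 10},
]


def group_changes(entries, feature):
    """Group glucose changes by the value of one selected feature."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[feature]].append(entry["glucose_change"])
    return groups


def compare(entries, feature):
    """Side-by-side summary: (mean change, number of entries) per feature value."""
    return {
        value: (mean(changes), len(changes))
        for value, changes in sorted(group_changes(entries, feature).items())
    }


# Hypothetical hypothesis: "walking after a meal is followed by a smaller glucose rise."
print(compare(log, "walk_after"))
print(compare(log, "meal"))
```

Comparing group means like this only surfaces an association; it does not establish that the selected feature causes the glucose change.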
5.
Tob Control ; 26(6): 683-689, 2017 11.
Article in English | MEDLINE | ID: mdl-27852892

ABSTRACT

OBJECTIVE: This observational study highlights key insights related to participant engagement and cessation among adults who voluntarily subscribed to the nationwide US-based SmokefreeTXT program, a 42-day mobile phone text message smoking cessation program. METHODS: Point prevalence abstinence rates were calculated for subscribers who initiated treatment in the program (n=18 080). The primary outcomes for this study were treatment completion and point prevalence abstinence rate at the end of the 42-day treatment. Secondary outcomes were point prevalence abstinence rates at 7 days postquit, 3 months post-treatment and 6 months post-treatment, as well as response rates to point prevalence abstinence assessments. RESULTS: Over half the sample completed the 42-day treatment (n=9686). The end-of-treatment point prevalence abstinence for subscribers who initiated treatment was 7.2%. Among those who completed the entire 42 days of treatment, the end-of-treatment point prevalence abstinence was 12.9%. For subscribers who completed treatment, point prevalence abstinence results varied: 7 days postquit (23.7%), 3 months post-treatment (7.3%) and 6 months post-treatment (3.7%). Response rates for abstinence assessment messages ranged from 4.36% to 34.48%. CONCLUSIONS: Findings from this study illuminate the need to more deeply understand reasons for subscriber non-response and opt-out and, in turn, improve program engagement and our ability to increase the likelihood that participants stop smoking and to measure long-term outcomes. Patterns of opt-out for the program mirror the relapse curve generally observed for smoking cessation, thus highlighting time points at which to increase efforts to retain participants and provide additional support or incentives.


Subjects
Smoking Cessation/methods , Smoking/epidemiology , Text Messaging/statistics & numerical data , Adolescent , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Prevalence , Self Report , Treatment Outcome , United States , Young Adult
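The abstract reports two end-of-treatment abstinence rates, 7.2% among all subscribers who initiated treatment and 12.9% among those who completed it, and the two differ in their denominators. A short arithmetic sketch makes the effect of that choice concrete. The counts are hypothetical, and the sketch assumes, for illustration only, that every abstinent subscriber is also a completer.

```python
def point_prevalence_abstinence(n_abstinent, n_denominator):
    """Percentage of the denominator group reporting abstinence at one time point."""
    if n_denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * n_abstinent / n_denominator


# Hypothetical counts (not from the study): 400 subscribers initiate treatment,
# 250 complete it, and 30 (all of them completers, by assumption) report
# abstinence at the end of treatment.
initiated, completed, abstinent = 400, 250, 30

# Denominator = everyone who initiated treatment; non-completers count as not abstinent.
rate_initiators = point_prevalence_abstinence(abstinent, initiated)
# Denominator = completers only.
rate_completers = point_prevalence_abstinence(abstinent, completed)
print(f"{rate_initiators:.1f}% vs {rate_completers:.1f}%")
```

The same 30 abstinent subscribers yield 7.5% or 12.0% depending on the denominator, which is why completer-only rates tend to look more favourable than rates over everyone who started.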
6.
J Behav Med ; 40(1): 6-22, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27481101

ABSTRACT

A central goal of behavioral medicine is the creation of evidence-based interventions for promoting behavior change. Scientific knowledge about behavior change could be more effectively accumulated using "ontologies." In information science, an ontology is a systematic method for articulating a "controlled vocabulary" of agreed-upon terms and their inter-relationships. It involves three core elements: (1) a controlled vocabulary specifying and defining existing classes; (2) specification of the inter-relationships between classes; and (3) codification in a computer-readable format to enable knowledge generation, organization, reuse, integration, and analysis. This paper introduces ontologies, provides a review of current efforts to create ontologies related to behavior change interventions and suggests future work. This paper was written by behavioral medicine and information science experts and was developed in partnership between the Society of Behavioral Medicine's Technology Special Interest Group (SIG) and the Theories and Techniques of Behavior Change Interventions SIG. In recent years significant progress has been made in the foundational work needed to develop ontologies of behavior change. Ontologies of behavior change could facilitate a transformation of behavioral science from a field in which data from different experiments are siloed into one in which data across experiments could be compared and/or integrated. This could facilitate new approaches to hypothesis generation and knowledge discovery in behavioral science.


Subjects
Biomedical Research/standards , Computational Biology/methods , Medical Informatics Computing , Vocabulary, Controlled , Databases, Factual , Humans , Semantics , Software
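The three core elements of an ontology named in this abstract (a controlled vocabulary of defined classes, inter-relationships between classes, and a computer-readable codification) map onto a small data structure. The Python sketch below is illustrative only. The `EX:` identifiers, labels and definitions are hypothetical, and real ontologies are usually encoded in standard languages such as OWL rather than ad hoc JSON.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OntologyClass:
    # Element 1: a controlled-vocabulary term with an agreed definition.
    id: str
    label: str
    definition: str
    # Element 2: relationships to other classes as (relation, target_id) pairs.
    relations: list = field(default_factory=list)


class Ontology:
    def __init__(self):
        self.classes = {}

    def add(self, cls):
        # Only allow links to classes that are already defined.
        for _, target in cls.relations:
            if target not in self.classes:
                raise KeyError(f"unknown class: {target}")
        self.classes[cls.id] = cls

    def ancestors(self, class_id):
        """Follow 'is_a' links upward and return every parent class id."""
        found, frontier = [], [class_id]
        while frontier:
            current = frontier.pop()
            for relation, target in self.classes[current].relations:
                if relation == "is_a" and target not in found:
                    found.append(target)
                    frontier.append(target)
        return found

    def to_json(self):
        # Element 3: codification in a computer-readable format.
        return json.dumps([asdict(c) for c in self.classes.values()], indent=2)


# Hypothetical terms and ids, for illustration only.
onto = Ontology()
onto.add(OntologyClass("EX:0001", "behavior change technique",
                       "An observable, replicable component of an intervention."))
onto.add(OntologyClass("EX:0002", "goal setting",
                       "Agreeing on a goal to be achieved.",
                       [("is_a", "EX:0001")]))
onto.add(OntologyClass("EX:0003", "goal setting (behavior)",
                       "Agreeing on a behavior to be performed.",
                       [("is_a", "EX:0002")]))
print(onto.ancestors("EX:0003"))
```

Because relationships are explicit, a query such as "all interventions using any kind of goal setting" can follow `is_a` links instead of relying on matching free-text descriptions across studies.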
7.
J Med Internet Res ; 19(3): e96, 2017 03 31.
Article in English | MEDLINE | ID: mdl-28363881

ABSTRACT

BACKGROUND: Health risk assessments (HRAs), which often screen for depressive symptoms, are administered to millions of employees and health plan members each year. HRA data provide an opportunity to examine longitudinal trends in depressive symptomatology, as researchers have done previously with other populations. OBJECTIVE: The primary research questions were: (1) Can we observe longitudinal trajectories in HRA populations like those observed in other study samples? (2) Do HRA variables, which primarily reflect modifiable health risks, help us to identify predictors associated with these trajectories? (3) Can we make meaningful recommendations for population health management, applicable to HRA participants, based on predictors we identify? METHODS: This study used growth mixture modeling (GMM) to examine longitudinal trends in depressive symptomatology among 22,963 participants in a Web-based HRA used by US employers and health plans. The HRA assessed modifiable health risks and variables such as stress, sleep, and quality of life. RESULTS: Five classes were identified: A "minimal depression" class (63.91%, 14,676/22,963) whose scores were consistently low across time, a "low risk" class (19.89%, 4568/22,963) whose condition remained subthreshold, a "deteriorating" class (3.15%, 705/22,963) who began at subthreshold but approached severe depression by the end of the study, a "chronic" class (4.71%, 1081/22,963) who remained highly depressed over time, and a "remitting" class (8.42%, 1933/22,963) who had moderate depression to start, but crossed into minimal depression by the end. Among those with subthreshold symptoms, individuals who were male (P<.001) and older (P=.01) were less likely to show symptom deterioration, whereas current depression treatment (P<.001) and surprisingly, higher sleep quality (P<.001) were associated with increased probability of membership in the "deteriorating" class as compared with "low risk." 
Among participants with greater symptomatology to start, those in the "severe" class tended to be younger than the "remitting" class (P<.001). Lower baseline sleep quality (P<.001), quality of life (P<.001), stress level (P<.001), and current treatment involvement (P<.001) were all predictive of membership in the "severe" class. CONCLUSIONS: The trajectories identified were consistent with trends in previous research. The results identified some key predictors: we discuss those that mirror prior studies and offer some hypotheses as to why others did not. The finding that 1 in 5 HRA participants with subthreshold symptoms deteriorated to the point of clinical distress during succeeding years underscores the need to learn more about such individuals. We offer additional recommendations for follow-up research, which should be designed to reflect changes in health plan demographics and HRA delivery platforms. In addition to utilizing additional variables such as cognitive style to refine predictive models, future research could also begin to test the impact of more aggressive outreach strategies aimed at participants who are likely to deteriorate or remain significantly depressed over time.


Subjects
Depression/psychology , Internet/statistics & numerical data , Adolescent , Adult , Aged , Depression/epidemiology , Female , Humans , Longitudinal Studies , Male , Middle Aged , Quality of Life , Risk Assessment , Young Adult
8.
J Med Internet Res ; 18(8): e205, 2016 08 02.
Article in English | MEDLINE | ID: mdl-27485315

ABSTRACT

BACKGROUND: Social media platforms are increasingly being used to support individuals in behavior change attempts, including smoking cessation. Examining the interactions of participants in health-related social media groups can help inform our understanding of how these groups can best be leveraged to facilitate behavior change. OBJECTIVE: The aim of this study was to analyze patterns of participation, self-reported smoking cessation length, and interactions within the National Cancer Institute's Facebook community for smoking cessation support. METHODS: Our sample consisted of approximately 4243 individuals who interacted (eg, posted, commented) on the public Smokefree Women Facebook page during the time of data collection. In Phase 1, social network visualizations and centrality measures were used to evaluate network structure and engagement. In Phase 2, an inductive, thematic qualitative content analysis was conducted with a subsample of 500 individuals, and correlational analysis was used to determine how participant engagement was associated with self-reported cessation length. RESULTS: Between February 2013 and March 2014, there were 875 posts and 4088 comments from approximately 4243 participants. Social network visualizations revealed the moderator's role in keeping the community together and distributing the most active participants. Correlation analyses suggest that engagement in the network was significantly inversely associated with cessation status (Spearman correlation coefficient = -0.14, P=.03, N=243). The content analysis of 1698 posts from 500 randomly selected participants identified the most frequent interactions in the community as providing support (43%, n=721) and announcing number of days smoke free (41%, n=689). CONCLUSIONS: These findings highlight the importance of the moderator for network engagement and provide helpful insights into the patterns and types of interactions participants are engaging in.
This study adds knowledge of how the social network of a smoking cessation community behaves within the confines of a Facebook group.


Subjects
Smoking Cessation/methods , Social Behavior , Social Media/statistics & numerical data , Social Networking , Social Support , Adult , Data Collection , Female , Humans , Smoking Cessation/statistics & numerical data
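The abstract says that centrality measures were used to evaluate network structure but does not name which ones. As one common choice, degree centrality can be computed from (commenter, post author) pairs. The Python sketch below assumes an undirected network and degree centrality specifically; the names and interactions are hypothetical, not data from the study.

```python
from collections import defaultdict


def build_network(interactions):
    """Undirected network: each comment links the commenter to the post author."""
    neighbors = defaultdict(set)
    for commenter, post_author in interactions:
        if commenter != post_author:  # ignore comments on one's own post
            neighbors[commenter].add(post_author)
            neighbors[post_author].add(commenter)
    return neighbors


def degree_centrality(neighbors):
    """Number of distinct contacts divided by the maximum possible (n - 1)."""
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Hypothetical (commenter, post author) pairs.
interactions = [
    ("ana", "moderator"),
    ("ben", "moderator"),
    ("cai", "moderator"),
    ("ben", "ana"),
]
centrality = degree_centrality(build_network(interactions))
print(max(centrality, key=centrality.get))
```

In this toy network the moderator touches every other participant and so has the highest centrality. This matches the kind of hub role the abstract attributes to the moderator, though the study's actual measures may differ.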
9.
Annu Rev Public Health ; 36: 393-415, 2015 Mar 18.
Article in English | MEDLINE | ID: mdl-25785892

ABSTRACT

The aim of this systematic review of reviews is to identify mobile text-messaging interventions designed for health improvement and behavior change and to derive recommendations for practice. We have compiled and reviewed existing systematic research reviews and meta-analyses to organize and summarize the text-messaging intervention evidence base, identify best-practice recommendations based on findings from multiple reviews, and explore implications for future research. Our review found that the majority of published text-messaging interventions were effective when addressing diabetes self-management, weight loss, physical activity, smoking cessation, and medication adherence for antiretroviral therapy. However, we found limited evidence across the population of studies and reviews to inform recommended intervention characteristics. Although strong evidence supports the value of integrating text-messaging interventions into public health practice, additional research is needed to establish longer-term intervention effects, identify recommended intervention characteristics, and explore issues of cost-effectiveness.


Subjects
Health Promotion/methods , Text Messaging , Cell Phone , Humans , Medication Adherence , Program Evaluation , Weight Reduction Programs/methods
10.
J Med Internet Res ; 17(8): e208, 2015 Aug 25.
Article in English | MEDLINE | ID: mdl-26307512

ABSTRACT

BACKGROUND: Electronic cigarettes (e-cigarettes) continue to be a growing topic among social media users, especially on Twitter. The ability to analyze conversations about e-cigarettes in real-time can provide important insight into trends in the public's knowledge, attitudes, and beliefs surrounding e-cigarettes, and subsequently guide public health interventions. OBJECTIVE: Our aim was to establish a supervised machine learning algorithm to build predictive classification models that assess Twitter data for a range of factors related to e-cigarettes. METHODS: Manual content analysis was conducted for 17,098 tweets. These tweets were coded for five categories: e-cigarette relevance, sentiment, user description, genre, and theme. Machine learning classification models were then built for each of these five categories, and word groupings (n-grams) were used to define the feature space for each classifier. RESULTS: Predictive performance scores for classification models indicated that the models correctly labeled the tweets with the appropriate variables between 68.40% and 99.34% of the time, and the percentage of maximum possible improvement over a random baseline that was achieved by the classification models ranged from 41.59% to 80.62%. Classifiers with the highest performance scores that also achieved the highest percentage of the maximum possible improvement over a random baseline were Policy/Government (performance: 0.94; % improvement: 80.62%), Relevance (performance: 0.94; % improvement: 75.26%), Ad or Promotion (performance: 0.89; % improvement: 72.69%), and Marketing (performance: 0.91; % improvement: 72.56%). The most appropriate word-grouping unit (n-gram) was 1 for the majority of classifiers. Performance continued to marginally increase with the size of the training dataset of manually annotated data, but eventually leveled off. Even at low dataset sizes of 4000 observations, performance characteristics were fairly sound. 
CONCLUSIONS: Social media outlets like Twitter can uncover real-time snapshots of personal sentiment, knowledge, attitudes, and behavior that are not as accessible, at this scale, through any other offline platform. Using the vast data available through social media presents an opportunity for social science and public health methodologies to utilize computational methodologies to enhance and extend research and practice. This study was successful in automating a complex five-category manual content analysis of e-cigarette-related content on Twitter using machine learning techniques. The study details machine learning model specifications that provided the best accuracy for data related to e-cigarettes, as well as a replicable methodology to allow extension of these methods to additional topics.


Subjects
Algorithms , Electronic Nicotine Delivery Systems , Machine Learning , Social Media , Attitude to Health , Humans , Marketing , Public Health
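This abstract describes word n-gram feature spaces and a "percentage of maximum possible improvement over a random baseline" but gives no formula for the latter. The Python sketch below assumes the common definition (score - baseline) / (1 - baseline), meaning the share of the gap between the baseline and a perfect score that the classifier closes. It also uses plain whitespace tokenization. Both are assumptions, and neither is the study's code.

```python
from collections import Counter


def ngrams(text, n=1):
    """Count word n-grams after lowercasing and splitting on whitespace."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def improvement_over_baseline(score, baseline):
    """Assumed definition: fraction of the gap from baseline to a perfect 1.0 that was closed."""
    if not 0.0 <= baseline < 1.0:
        raise ValueError("baseline must be in [0, 1)")
    return (score - baseline) / (1.0 - baseline)


# Unigram features (n = 1) for a made-up tweet.
print(ngrams("Vape shop opening near me vape deals"))
# A hypothetical classifier scoring 0.94 against a hypothetical 0.70 random baseline.
print(round(improvement_over_baseline(0.94, 0.70), 2))
```

Under this definition, two classifiers with the same raw score can differ in improvement when their categories have different baselines. That is why the abstract reports both numbers.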
11.
J Med Internet Res ; 17(10): e243, 2015 Oct 27.
Article in English | MEDLINE | ID: mdl-26508089

ABSTRACT

BACKGROUND: Electronic cigarette (e-cigarette) use has increased in the United States, leading to active debate in the public health sphere regarding e-cigarette use and regulation. To better understand trends in e-cigarette attitudes and behaviors, public health and communication professionals can turn to the dialogue taking place on popular social media platforms such as Twitter. OBJECTIVE: The objective of this study was to conduct a content analysis to identify key conversation trends and patterns over time using historical Twitter data. METHODS: A 5-category content analysis was conducted on a random sample of tweets chosen from all publicly available tweets sent between May 1, 2013, and April 30, 2014, that matched strategic keywords related to e-cigarettes. Relevant tweets were isolated from the random sample of approximately 10,000 tweets and classified according to sentiment, user description, genre, and theme. Descriptive analyses including univariate and bivariate associations, as well as correlation analyses were performed on all categories in order to identify patterns and trends. RESULTS: The analysis revealed an increase in e-cigarette-related tweets from May 2013 through April 2014, with tweets generally being positive; 71% of the sample tweets were classified as having a positive sentiment. The top two user categories were everyday people (65%) and individuals who are part of the e-cigarette community movement (16%). These two user groups were responsible for a majority of informational (79%) and news tweets (75%), compared to reputable news sources and foundations or organizations, which combined provided 5% of informational tweets and 12% of news tweets. Personal opinion (28%), marketing (21%), and first person e-cigarette use or intent (20%) were the three most common genres of tweets, which tended to have a positive sentiment. 
Marketing was the most common theme (26%), and policy and government was the second most common theme (20%), with 86% of these tweets coming from everyday people and the e-cigarette community movement combined, compared to 5% of policy and government tweets coming from government, reputable news sources, and foundations or organizations combined. CONCLUSIONS: Everyday people and the e-cigarette community are dominant forces across several genres and themes, warranting continued monitoring to understand trends and their implications regarding public opinion, e-cigarette use, and smoking cessation. Analyzing social media trends is a meaningful way to inform public health practitioners of current sentiments regarding e-cigarettes, and this study contributes a replicable methodology.


Subjects
Electronic Nicotine Delivery Systems/statistics & numerical data , Internet/statistics & numerical data , Social Media/statistics & numerical data , Female , Humans , Public Opinion , Smoking Cessation , United States
13.
EBioMedicine ; 102: 105075, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38565004

ABSTRACT

BACKGROUND: AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that aren't yet known to experts. METHODS: In this paper, we present a workflow for generating hypotheses to understand which visual signals in images are correlated with a classification model's predictions for a given task. This approach leverages an automatic visual explanation algorithm followed by interdisciplinary expert review. We propose the following 4 steps: (i) Train a classifier to perform a given task to assess whether the imagery indeed contains signals relevant to the task; (ii) Train a StyleGAN-based image generator with an architecture that enables guidance by the classifier ("StylEx"); (iii) Automatically detect, extract, and visualize the top visual attributes that the classifier is sensitive towards. For visualization, we independently modify each of these attributes to generate counterfactual visualizations for a set of images (i.e., what the image would look like with the attribute increased or decreased); (iv) Formulate hypotheses for the underlying mechanisms, to stimulate future research. Specifically, present the discovered attributes and corresponding counterfactual visualizations to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health (e.g., whether the attributes correspond to known patho-physiological or socio-cultural phenomena, or could be novel discoveries). 
FINDINGS: To demonstrate the broad applicability of our approach, we present results on eight prediction tasks across three medical imaging modalities: retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples where many of the automatically learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). We further show that our method reveals a number of physiologically plausible, previously unknown attributes based on the literature (e.g., differences in the fundus associated with self-reported sex, which were previously unknown). INTERPRETATION: Our approach enables hypothesis generation via attribute visualizations and has the potential to enable researchers to better understand, improve their assessment, and extract new knowledge from AI-based models, as well as debug and design better datasets. Though not designed to infer causality, importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real-world nature of healthcare delivery and socio-cultural factors, and hence interdisciplinary perspectives are critical in these investigations. Finally, we will release code to help researchers train their own StylEx models and analyze their predictive tasks of interest, and use the methodology presented in this paper for responsible interpretation of the revealed attributes. FUNDING: Google.


Subjects
Algorithms , Cataract , Humans , Cardiomegaly , Fundus Oculi , Artificial Intelligence
14.
Lancet Digit Health ; 6(2): e126-e130, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38278614

ABSTRACT

Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how best to design general-purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well-intentioned attempts to prevent the upstream components (GPPEs) from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. Building on previously published data, we present reasons to support the view that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.


Subjects
Delivery of Health Care, Machine Learning, Humans, Bias, Algorithms
15.
EClinicalMedicine ; 70: 102479, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38685924

ABSTRACT

Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods: Here, we propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, which quantitatively assesses the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted by primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes than for others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as P(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes than for others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and as years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case. Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% for prioritising AI performance of sex and age subpopulations, respectively, based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared with a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs. Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes. Funding: Google LLC.
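The HEAL metric described above, P(R > 0) with R the negated Spearman rank correlation between subgroup AI performance and subgroup health outcome, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`top3_agreement`, `average_ranks`, `spearman`, `heal_metric`) are hypothetical, the bootstrap scheme (resampling cases with replacement within each subgroup) is an assumption, and the `outcome` values are assumed to be coded so that higher means better health (a burden measure such as DALYs or YLLs would need to be negated first).

```python
import random
from statistics import mean


def top3_agreement(ranked_predictions, reference):
    """Return 1 if the reference diagnosis is among the model's top-3 predictions, else 0."""
    return int(reference in ranked_predictions[:3])


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their 1-based positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Returns 0.0 when either side is constant (an assumed convention; rho is undefined there).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def heal_metric(case_scores, outcome, n_boot=1000, seed=0):
    """Estimate P(R > 0), where R = -spearman(subgroup performance, subgroup outcome).

    case_scores: {subgroup: [0/1 per case, e.g. top-3 agreement]}
    outcome: {subgroup: health outcome, ASSUMED coded so that higher = better health}
    """
    rng = random.Random(seed)
    groups = sorted(case_scores)
    hits = 0
    for _ in range(n_boot):
        perf = []
        for g in groups:
            cases = case_scores[g]
            # Assumed bootstrap: resample this subgroup's cases with replacement.
            sample = [rng.choice(cases) for _ in cases]
            perf.append(mean(sample))
        r = -spearman(perf, [outcome[g] for g in groups])
        hits += r > 0  # count bootstrap replicates in which R is positive
    return hits / n_boot
```

For example, with two subgroups where the group with worse health outcomes always has top-3 agreement and the other never does, `heal_metric({"A": [1, 1, 1, 1], "B": [0, 0, 0, 0]}, {"A": 1.0, "B": 2.0})` returns 1.0 (i.e., 100%). Computing Spearman's rho from average ranks keeps tied performance values from biasing the correlation.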

16.
Nat Med ; 29(11): 2929-2938, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37884627

ABSTRACT

Artificial intelligence as a medical device is increasingly being applied to healthcare for diagnosis, risk stratification and resource allocation. However, a growing body of evidence has highlighted the risk of algorithmic bias, which may perpetuate existing health inequity. This problem arises in part because of systemic inequalities in dataset curation, unequal opportunity to participate in research and inequalities of access. This study aims to explore existing standards, frameworks and best practices for ensuring adequate data diversity in health datasets. Exploring the body of existing literature and expert views is an important step towards the development of consensus-based guidelines. The study comprises two parts: a systematic review of existing standards, frameworks and best practices for healthcare datasets; and a survey and thematic analysis of stakeholder views of bias, health equity and best practices for artificial intelligence as a medical device. We found that the need for dataset diversity was well described in literature, and experts generally favored the development of a robust set of guidelines, but there were mixed views about how these could be implemented practically. The outputs of this study will be used to inform the development of standards for transparency of data diversity in health datasets (the STANDING Together initiative).


Subjects
Artificial Intelligence, Delivery of Health Care, Humans, Consensus, Systematic Reviews as Topic
17.
J Health Commun ; 17 Suppl 1: 62-6, 2012.
Article in English | MEDLINE | ID: mdl-22548600

ABSTRACT

The field of mHealth has made significant advances in a short period of time, demanding a more thorough and scientific approach to understanding and evaluating its progress. A recent review of mHealth literature identified two primary research needs for mHealth to strengthen health systems and promote healthy behaviors, namely, health outcomes and cost-benefits (Mechael et al., 2010). In direct response to these gaps, this paper presents the study design, key observations, and next steps from an evaluation of the mHealth activities within the electronic health (eHealth) architecture implemented by the Millennium Villages Project (MVP). The evaluation leverages data generated through the mobile technology itself, alongside complementary qualitative research and costing assessments. The study, funded by the International Development Research Centre (IDRC) as part of the Open Architecture Standards and Information Systems research project (OASIS II) (Sinha, 2009), is being implemented on data generated by 14 MVP sites in 10 Sub-Saharan African countries, with more in-depth research in Ghana, Rwanda, Tanzania, and Uganda. Specific components of the study include rigorous quantitative case-control analyses and other epidemiological approaches (such as survival analysis), supplemented by in-depth qualitative interviews spread over 18 months, as well as a costing study, to assess the impact of mHealth on health outcomes, service delivery, and efficiency.


Subjects
Community Health Services/organization & administration, Efficiency, Organizational, Outcome Assessment, Health Care, Telemedicine/methods, Africa South of the Sahara, Case-Control Studies, Cost-Benefit Analysis, Humans, Program Evaluation, Qualitative Research, Research Design, Telemedicine/economics
18.
Int J Qual Health Care ; 23(3): 258-68, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21531989

ABSTRACT

OBJECTIVE: The aim of this study was to develop two brief questionnaires for assessing patient experiences with hospital and outpatient care in a low-income setting, and to assess their validity and reliability. DESIGN: Using a literature review and data from focus groups (n = 14), we developed questionnaires to assess patient experiences with inpatient (I-PAHC) and outpatient (O-PAHC) care in a low-income setting. Questionnaires were administered in person by trained interviewers. Construct validity was assessed with factor analysis; convergent validity was assessed by correlating summary scores for each scale with overall patient evaluations; and reliability was assessed with Cronbach's alpha coefficients. SETTING: Eight health facilities in Ethiopia. PARTICIPANTS: Patients older than 18 years who had a hospital stay longer than 1 day (n = 230), and patients who received outpatient care (n = 486). MAIN OUTCOME MEASURES: Patient evaluations of health care experiences. RESULTS: The factor analysis revealed 12 items that loaded on five factors for the I-PAHC questionnaire. The O-PAHC showed similar results, with 13 items that loaded on four factors. Summary scores for nearly all factors were significantly associated (P < 0.05) with the patient's overall evaluation score. Cronbach's alpha coefficients showed good to excellent internal consistency for all scales. CONCLUSIONS: The I-PAHC and O-PAHC questionnaires can be useful in assessing patients' evaluations of care delivery in low-income settings. The questionnaires are brief and can be integrated into health systems strengthening efforts with the support of leadership at the health facility and country levels.
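The reliability analysis above relies on Cronbach's alpha, α = k/(k − 1) · (1 − Σσᵢ² / σ²_total), where k is the number of items in a scale, σᵢ² is the variance of item i, and σ²_total is the variance of respondents' total scores. A minimal Python sketch on toy data follows; the function name `cronbach_alpha` is hypothetical, and it assumes each item is supplied as a list of one numeric score per respondent (same respondents, same order) and uses sample variances. It does not reproduce the study's data or results.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: list of items, each a list with one score per respondent.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("all items must have the same number of respondents")
    # Each respondent's total score across all items in the scale.
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    total_var = variance(totals)  # sample variance (n - 1 denominator)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

For example, two items answered identically by three respondents, `cronbach_alpha([[1, 2, 3], [1, 2, 3]])`, give 1.0 (perfect internal consistency), while adding an item that does not vary, `cronbach_alpha([[1, 2, 3], [2, 2, 2]])`, gives 0.0.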


Subjects
Hospitals/standards, Professional-Patient Relations, Quality of Health Care, Surveys and Questionnaires, Adolescent, Adult, Aged, Aged, 80 and over, Ambulatory Care/standards, Attitude to Health, Ethiopia, Female, Focus Groups, Health Care Surveys, Humans, Male, Middle Aged, Poverty, Young Adult
19.
Transl Behav Med ; 11(2): 495-503, 2021 03 16.
Article in English | MEDLINE | ID: mdl-32320039

ABSTRACT

Digital health promises to increase intervention reach and effectiveness for a range of behavioral health outcomes. Behavioral scientists have a unique opportunity to infuse their expertise into all phases of a digital health intervention, from design to implementation. The aim of this study was to assess behavioral scientists' interests and needs with respect to digital health endeavors, as well as to gather expert insight into the role of behavioral science in the evolution of digital health. The study used a two-phase approach: (a) a survey of behavioral scientists' current needs and interests with respect to digital health endeavors (n = 346); and (b) a series of interviews with digital health stakeholders for their expert insight on the evolution of the digital health field (n = 15). In terms of current needs and interests, the large majority of surveyed behavioral scientists (77%) already participate in digital health projects, and of those who have not yet done so, the majority (65%) reported intending to do so in the future. In terms of the expected evolution of the digital health field, interviewed stakeholders anticipated a number of changes, ranging from overall landscape changes through evolving models of reimbursement to more significant oversight and regulation. These findings provide timely insight into behavioral scientists' current needs, barriers, and attitudes toward the use of technology in health care and public health. The results might also highlight areas where behavioral scientists can leverage their expertise both to enhance digital health's potential to improve health and to prevent the unintended consequences that can emerge from scaling the use of technology in health care.


Subjects
Behavioral Sciences, Attitude, Delivery of Health Care, Humans, Public Health, Surveys and Questionnaires
20.
Epidemiol Rev ; 32: 56-69, 2010.
Article in English | MEDLINE | ID: mdl-20354039

ABSTRACT

Mobile phone text messaging is a potentially powerful tool for behavior change because it is widely available, inexpensive, and instant. This systematic review provides an overview of behavior change interventions for disease management and prevention delivered through text messaging. Evidence on behavior change and clinical outcomes was compiled from randomized or quasi-experimental controlled trials of text message interventions published in peer-reviewed journals by June 2009. Only interventions using text messages as the primary mode of communication were included. Study quality was assessed using a standardized measure. Seventeen articles representing 12 studies (5 disease prevention and 7 disease management) were included. Intervention length ranged from 3 to 12 months; none had long-term follow-up, and message frequency varied. Of 9 sufficiently powered studies, 8 found evidence to support text messaging as a tool for behavior change. Effects were observed across age, minority status, and nationality. Nine countries are represented in this review, but only one is a developing country, which is problematic given the potential benefits of such a widely accessible, relatively inexpensive tool for health behavior change. Methodological issues and gaps in the literature are highlighted, and recommendations for future studies are provided.


Subjects
Cell Phones/statistics & numerical data, Disease Management, Risk Reduction Behavior, Humans, User-Computer Interface