Results 1 - 20 of 89
1.
Proc Natl Acad Sci U S A ; 119(1)2022 01 04.
Article in English | MEDLINE | ID: mdl-34969837

ABSTRACT

The recent emergence of machine-manipulated media raises an important societal question: How can we know whether a video that we watch is real or fake? In two online studies with 15,016 participants, we present authentic videos and deepfakes and ask participants to identify which is which. We compare the performance of ordinary human observers with the leading computer vision deepfake detection model and find them similarly accurate, while making different kinds of mistakes. Together, participants with access to the model's prediction are more accurate than either alone, but inaccurate model predictions often decrease participants' accuracy. To probe the relative strengths and weaknesses of humans and machines as detectors of deepfakes, we examine human and machine performance across video-level features, and we evaluate the impact of preregistered randomized interventions on deepfake detection. We find that manipulations designed to disrupt visual processing of faces hinder human participants' performance while mostly not affecting the model's performance, suggesting a role for specialized cognitive capacities in explaining human deepfake detection performance.


Subject(s)
Artificial Intelligence , Communication , Deception , Facial Recognition , Forensic Sciences , Humans , Social Media , Video Recording
2.
Psychol Med ; 53(7): 3124-3132, 2023 May.
Article in English | MEDLINE | ID: mdl-34937601

ABSTRACT

BACKGROUND: Predicting future states of psychopathology such as depressive episodes has been a hallmark initiative in mental health research. Dynamical systems theory has proposed that rises in certain 'early warning signals' (EWSs) in time-series data (e.g. auto-correlation, temporal variance, network connectivity) may precede impending changes in disorder severity. The current study investigates whether rises in these EWSs over time are associated with future changes in disorder severity among a group of patients with major depressive disorder (MDD). METHODS: Thirty-one patients with MDD completed the study, which consisted of daily smartphone-delivered surveys over 8 weeks. Daily positive and negative affect were collected for the time-series analyses. A rolling window approach was used to determine whether rises in auto-correlation of total affect, temporal standard deviation of total affect, and overall network connectivity in individual affect items were predictive of increases in depression symptoms. RESULTS: Results suggested that rises in auto-correlation were significantly associated with worsening in depression symptoms (r = 0.41, p = 0.02). Results indicated that neither rises in temporal standard deviation (r = -0.23, p = 0.23) nor in network connectivity (r = -0.12, p = 0.59) were associated with changes in depression symptoms. CONCLUSIONS: This study more rigorously examines whether rises in EWSs were associated with future depression symptoms in a larger group of patients with MDD. Results indicated that rises in auto-correlation were the only EWS that was associated with worsening future changes in depression.
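
The rolling-window approach named in the METHODS can be illustrated with a minimal sketch. It is not the authors' code: the function names, the window length, and the autocorrelation estimator (lag-1, normalized by the window's total sum of squares) are assumptions chosen for illustration only.

```python
from statistics import mean


def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a sequence; returns 0.0 when the variance is zero."""
    m = mean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    if denom == 0:
        return 0.0
    # Sum of products of consecutive deviations from the window mean.
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(len(xs) - 1))
    return num / denom


def rolling_autocorr(series, window):
    """Lag-1 autocorrelation inside each sliding window of length `window`."""
    return [
        lag1_autocorr(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


# Hypothetical daily affect scores; a rising trend in the output would be the
# kind of early warning signal the abstract describes.
daily_affect = [3, 4, 3, 5, 4, 6, 5, 7, 6, 8]
print(rolling_autocorr(daily_affect, window=5))
```

A rise in the returned values over successive windows is what the study tests against later symptom change; the correlation reported in the RESULTS is computed across patients, which this sketch does not attempt.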


Subject(s)
Depression , Depressive Disorder, Major , Humans , Depression/psychology , Depressive Disorder, Major/psychology , Psychopathology , Time Factors , Systems Analysis
7.
J Pineal Res ; 70(3): e12720, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33523499

ABSTRACT

Appropriate synchronization of the timing of behaviors with the circadian clock and adequate sleep are both important for almost every physiological process. The timing of the circadian clock relative to social (ie, local) clock time and the timing of sleep can vary greatly among individuals. Whether the timing of these processes is stable within an individual is not well understood. We examined the stability of circadian-controlled melatonin timing, sleep timing, and their interaction across ~100 days in 15 students at a single university. At three time points ~35 days apart, circadian timing was determined from the dim-light melatonin onset (DLMO). Sleep behaviors (timing and duration) and chronotype (ie, mid-sleep time on free days corrected for sleep loss on school/work days) were determined via actigraphy and analyzed in ~1-month bins. Melatonin timing was stable, with an almost perfect relationship strength as determined via intraclass correlation coefficients (ICC = 0.85); average DLMO timing across all participants only changed from the first month by 21 minutes in month 2 and 5 minutes in month 3. Sleep behaviors also demonstrated high stability, with ICC relationship strengths ranging from substantial to almost perfect (ICCs = 0.65-0.85). Average DLMO was significantly associated with average chronotype (r² = 0.53, P < .01), with chronotype displaying substantial stability across months (ICC = 0.61). These findings of a robust stability in melatonin timing and sleep behaviors in young adults living in real-world settings hold promise for a better understanding of the reliability of previous cross-sectional reports and for future individualized strategies to combat circadian-associated disease and impaired safety (ie, "chronomedicine").


Subject(s)
Activity Cycles , Circadian Rhythm , Melatonin/metabolism , Sleep , Students , Adolescent , Age Factors , Biomarkers/metabolism , Female , Humans , Male , Saliva/metabolism , Time Factors , Young Adult
8.
Stress ; 22(4): 408-413, 2019 07.
Article in English | MEDLINE | ID: mdl-30945584

ABSTRACT

Life stress is a well-established risk factor for a variety of mental and physical health problems, including anxiety disorders, depression, chronic pain, heart disease, asthma, autoimmune diseases, and neurodegenerative disorders. The purpose of this article is to describe emerging approaches for assessing stress using speech, which we do by reviewing the methodological advantages of these digital health tools, and the validation, ethical, and privacy issues raised by these technologies. As we describe, it is now possible to assess stress via the speech signal using smartphones and smart speakers that employ software programs and artificial intelligence to analyze several features of speech and speech acoustics, including pitch, jitter, energy, rate, and length and number of pauses. Because these digital devices are ubiquitous, we can now assess individuals' stress levels in real time in almost any natural environment in which people speak. These technologies thus have great potential for advancing digital health initiatives that involve continuously monitoring changes in psychosocial functioning and disease risk over time. However, speech-based indices of stress have yet to be well-validated against stress biomarkers (e.g., cortisol, cytokines) that predict disease risk. In addition, acquiring speech samples raises the possibility that conversations intended to be private could one day be made public; moreover, obtaining real-time psychosocial risk information prompts ethical questions regarding how these data should be used for medical, commercial, and personal purposes. Although assessing stress using speech thus has enormous potential, there are critical validation, privacy, and ethical issues that must be addressed.


Subject(s)
Speech , Stress, Psychological/psychology , Depression , Humans , Hydrocortisone , Longitudinal Studies , Privacy
9.
J Med Internet Res ; 21(1): e11683, 2019 01 03.
Article in English | MEDLINE | ID: mdl-30609986

ABSTRACT

BACKGROUND: Encouraging individuals to report daily information such as unpleasant disease symptoms, daily activities and behaviors, or aspects of their physical and emotional state is difficult but necessary for many studies and clinical trials that rely on patient-reported data as primary outcomes. Use of paper diaries is the traditional method of completing daily diaries, but digital surveys are becoming the new standard because of their increased compliance; however, they still fall short of desired compliance levels. OBJECTIVE: Mobile games using in-game rewards offer the opportunity to increase compliance above the rates of digital diaries and paper diaries. We conducted a 5-week randomized controlled trial to compare the completion rates of a daily diary across 3 conditions: a paper-based participant-reported outcome diary (Paper PRO), an electronic-based participant-reported outcome diary (ePRO), and a novel ePRO diary with in-game rewards (Game-Motivated ePRO). METHODS: We developed a novel mobile game that is a combination of the idle and pet collection genres to reward individuals who complete a daily diary with an in-game reward. Overall, 197 individuals aged 6 to 24 years (male: 100 and female: 97) were enrolled in a 5-week study after being randomized into 1 of the 3 methods of daily diary completion. Moreover, 157 participants (male: 84 and female: 69) completed at least one diary and were subsequently included in analysis of compliance rates. RESULTS: We observed a significant difference (F(2,124)=6.341; P=.002) in compliance with filling out daily diaries, with the Game-Motivated ePRO group having the highest compliance (mean completion 86.4%, SD 19.6%), followed by the ePRO group (mean completion 77.7%, SD 24.1%), and finally, the Paper PRO group (mean completion 70.6%, SD 23.4%). The Game-Motivated ePRO (P=.002) significantly improved compliance rates above the Paper PRO. 
In addition, the Game-Motivated ePRO resulted in higher compliance rates than the rates of ePRO alone (P=.09). Equally important, even though we observed significant differences in completion of daily diaries between groups, we did not observe any statistically significant differences in association between the responses to a daily mood question and study group, the average diary completion time (P=.52), or the System Usability Scale score (P=.88). CONCLUSIONS: The Game-Motivated ePRO system encouraged individuals to complete the daily diaries above the compliance rates of the Paper PRO and ePRO without altering the participants' responses. TRIAL REGISTRATION: ClinicalTrials.gov NCT03738254; http://clinicaltrials.gov/ct2/show/NCT03738254 (Archived by WebCite at http://www.webcitation.org/74T1p8u52).


Subject(s)
Mobile Applications/trends , Self Report/standards , Video Games/psychology , Adolescent , Adult , Child , Female , Humans , Male , Motivation , Patient Compliance , Reward , Surveys and Questionnaires , Young Adult
11.
Epilepsia ; 59(5): 1020-1026, 2018 05.
Article in English | MEDLINE | ID: mdl-29604050

ABSTRACT

OBJECTIVE: Common data elements (CDEs) are currently unavailable for mobile health (mHealth) in epilepsy devices and related applications. As a result, despite expansive growth of new digital services for people with epilepsy, information collected is often not interoperable or directly comparable. We aim to correct this problem through development of industry-wide standards for mHealth epilepsy data. METHODS: Using a group of stakeholders from industry, academia, and patient advocacy organizations, we offer a consensus statement for the elements that may facilitate communication among different systems. RESULTS: A consensus statement is presented for epilepsy mHealth CDEs. SIGNIFICANCE: Although it is not exclusive, we believe that the use of a minimal common information denominator, specifically these CDEs, will promote innovation, accelerate scientific discovery, and enhance clinical usage across applications and devices in the epilepsy mHealth space. As a consequence, people with epilepsy will have greater flexibility and ultimately more powerful tools to improve their lives.


Subject(s)
Common Data Elements/standards , Epilepsy , Neurology/standards , Telemedicine/standards , Terminology as Topic , Humans
12.
Depress Anxiety ; 35(7): 601-608, 2018 07.
Article in English | MEDLINE | ID: mdl-29637663

ABSTRACT

BACKGROUND: To examine whether there are subtypes of suicidal thinking using real-time digital monitoring, which allows for the measurement of such thoughts with greater temporal granularity than ever before possible. METHODS: We used smartphone-based real-time monitoring to assess suicidal thoughts four times per day in two samples: Adults who attempted suicide in the past year recruited from online forums (n = 51 participants with a total of 2,889 responses, surveyed over 28 days; ages ranged from 18 to 38 years) and psychiatric inpatients with recent suicidal ideation or attempts (n = 32 participants with a total of 640 responses, surveyed over the duration of inpatient treatment [mean stay = 8.79 days]; ages ranged from 23 to 68 years). Latent profile analyses were used to identify distinct phenotypes of suicidal thinking based on the frequency, intensity, and variability of such thoughts. RESULTS: Across both samples, five distinct phenotypes of suicidal thinking emerged that differed primarily on the intensity and variability of suicidal thoughts. Participants whose profile was characterized by more severe, persistent suicidal thoughts (i.e., higher mean and lower variability around the mean) were most likely to have made a recent suicide attempt. CONCLUSIONS: Suicidal thinking has historically been studied as a homogeneous construct, but using newly available monitoring technology we discovered five profiles of suicidal thinking. Key questions for future research include how these phenotypes prospectively relate to future suicidal behaviors, and whether they remain stable or trait-like over longer periods.


Subject(s)
Ecological Momentary Assessment , Smartphone , Suicidal Ideation , Suicide, Attempted/psychology , Adolescent , Adult , Aged , Female , Humans , Inpatients , Male , Middle Aged , Outpatients , Phenotype , Psychiatric Department, Hospital , Surveys and Questionnaires , Young Adult
13.
J Med Internet Res ; 20(2): e49, 2018 02 09.
Article in English | MEDLINE | ID: mdl-29426812

ABSTRACT

We describe an initiative to bring mental health researchers, computer scientists, human-computer interaction researchers, and other communities together to address the challenges of the global mental ill health epidemic. Two face-to-face events and one special issue of the Journal of Medical Internet Research were organized. The works presented in these events and publication reflect key state-of-the-art research in this interdisciplinary collaboration. We summarize the special issue articles and contextualize them to present a picture of the most recent research. In addition, we describe a series of collaborative activities held during the second symposium, where the community identified 5 challenges and their possible solutions.


Subject(s)
Biomedical Research/methods , Interdisciplinary Placement/methods , Mental Health/standards , Humans
14.
J Med Internet Res ; 20(6): e210, 2018 06 08.
Article in English | MEDLINE | ID: mdl-29884610

ABSTRACT

BACKGROUND: Wearable and mobile devices that capture multimodal data have the potential to identify risk factors for high stress and poor mental health and to provide information to improve health and well-being. OBJECTIVE: We developed new tools that provide objective physiological and behavioral measures using wearable sensors and mobile phones, together with methods that improve their data integrity. The aim of this study was to examine, using machine learning, how accurately these measures could identify conditions of self-reported high stress and poor mental health and which of the underlying modalities and measures were most accurate in identifying those conditions. METHODS: We designed and conducted the 1-month SNAPSHOT study that investigated how daily behaviors and social networks influence self-reported stress, mood, and other health or well-being-related factors. We collected over 145,000 hours of data from 201 college students (age: 18-25 years, male:female=1.8:1) at one university, all recruited within self-identified social groups. Each student filled out standardized pre- and postquestionnaires on stress and mental health; during the month, each student completed twice-daily electronic diaries (e-diaries), wore two wrist-based sensors that recorded continuous physical activity and autonomic physiology, and installed an app on their mobile phone that recorded phone usage and geolocation patterns. We developed tools to make data collection more efficient, including data-check systems for sensor and mobile phone data and an e-diary administrative module for study investigators to locate possible errors in the e-diaries and communicate with participants to correct their entries promptly, which reduced the time taken to clean e-diary data by 69%. 
We constructed features and applied machine learning to the multimodal data to identify factors associated with self-reported poststudy stress and mental health, including behaviors that can be possibly modified by the individual to improve these measures. RESULTS: We identified the physiological sensor, phone, mobility, and modifiable behavior features that were best predictors for stress and mental health classification. In general, wearable sensor features showed better classification performance than mobile phone or modifiable behavior features. Wearable sensor features, including skin conductance and temperature, reached 78.3% (148/189) accuracy for classifying students into high or low stress groups and 87% (41/47) accuracy for classifying high or low mental health groups. Modifiable behavior features, including number of naps, studying duration, calls, mobility patterns, and phone-screen-on time, reached 73.5% (139/189) accuracy for stress classification and 79% (37/47) accuracy for mental health classification. CONCLUSIONS: New semiautomated tools improved the efficiency of long-term ambulatory data collection from wearable and mobile devices. Applying machine learning to the resulting data revealed a set of both objective features and modifiable behavioral features that could classify self-reported high or low stress and mental health groups in a college student population better than previous studies and showed new insights into digital phenotyping.


Subject(s)
Cell Phone/instrumentation , Mental Health/standards , Wearable Electronic Devices/psychology , Adolescent , Adult , Female , Humans , Male , Observational Studies as Topic , Self Report , Young Adult
15.
Sensors (Basel) ; 18(4)2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29621133

ABSTRACT

Smartphones and wearable sensors have enabled unprecedented data collection, with many products now providing feedback to users about recommended step counts or sleep durations. However, these recommendations do not provide personalized insights that have been shown to be best suited for a specific individual. A scientific way to find individualized recommendations and causal links is to conduct experiments using single-case experimental design; however, properly designed single-case experiments are not easy to conduct on oneself. We designed, developed, and evaluated a novel platform, QuantifyMe, for novice self-experimenters to conduct proper-methodology single-case self-experiments in an automated and scientific manner using their smartphones. We provide software for the platform that we used (available for free on GitHub), which provides the methodological elements to run many kinds of customized studies. In this work, we evaluate its use with four different kinds of personalized investigations, examining how variables such as sleep duration and regularity, activity, and leisure time affect personal happiness, stress, productivity, and sleep efficiency. We conducted a six-week pilot study (N = 13) to evaluate QuantifyMe. We describe the lessons learned developing the platform and recommendations for its improvement, as well as its potential for enabling personalized insights to be scientifically evaluated in many individuals, reducing the high administrative cost for advancing human health and wellbeing.

16.
Epilepsia ; 58(11): 1870-1879, 2017 11.
Article in English | MEDLINE | ID: mdl-28980315

ABSTRACT

OBJECTIVE: New devices are needed for monitoring seizures, especially those associated with sudden unexpected death in epilepsy (SUDEP). They must be unobtrusive and automated, and provide false alarm rates (FARs) bearable in everyday life. This study quantifies the performance of new multimodal wrist-worn convulsive seizure detectors. METHODS: Hand-annotated video-electroencephalographic seizure events were collected from 69 patients at six clinical sites. Three different wristbands were used to record electrodermal activity (EDA) and accelerometer (ACM) signals, obtaining 5,928 h of data, including 55 convulsive epileptic seizures (six focal tonic-clonic seizures and 49 focal to bilateral tonic-clonic seizures) from 22 patients. Recordings were analyzed offline to train and test two new machine learning classifiers and a published classifier based on EDA and ACM. Moreover, wristband data were analyzed to estimate seizure-motion duration and autonomic responses. RESULTS: The two novel classifiers consistently outperformed the previous detector. The most efficient (Classifier III) yielded sensitivity of 94.55%, and an FAR of 0.2 events/day. No nocturnal seizures were missed. Most patients had <1 false alarm every 4 days, with an FAR below their seizure frequency. When increasing the sensitivity to 100% (no missed seizures), the FAR is up to 13 times lower than with the previous detector. Furthermore, all detections occurred before the seizure ended, providing reasonable latency (median = 29.3 s, range = 14.8-151 s). Automatically estimated seizure durations were correlated with true durations, enabling reliable annotations. Finally, EDA measurements confirmed the presence of postictal autonomic dysfunction, exhibiting a significant rise in 73% of the convulsive seizures. 
SIGNIFICANCE: The proposed multimodal wrist-worn convulsive seizure detectors provide seizure counts that are more accurate than previous automated detectors and typical patient self-reports, while maintaining a tolerable FAR for ambulatory monitoring. Furthermore, the multimodal system provides an objective description of motor behavior and autonomic dysfunction, aimed at enriching seizure characterization, with potential utility for SUDEP warning.
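
The two headline metrics above, sensitivity and false alarm rate (FAR), follow from simple ratios, sketched below. The function names are hypothetical, and the count of 52 detected seizures is an assumption: the abstract does not state it, but it is the integer consistent with the reported 94.55% of 55 convulsive seizures.

```python
def sensitivity(detected, total):
    """Fraction of true seizures that the detector flagged."""
    return detected / total


def false_alarm_rate_per_day(false_alarms, recording_hours):
    """False alarms per 24 hours of recording."""
    return false_alarms / (recording_hours / 24.0)


# Assumed count: 52 of the 55 convulsive seizures detected.
print(round(100 * sensitivity(52, 55), 2))

# 5,928 hours of recording corresponds to 247 days, so an FAR of 0.2 events/day
# would imply roughly 49 false alarms if it were computed over the full recording
# (an assumption; the abstract does not say which recordings the FAR covers).
print(5928 / 24.0)
```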


Subject(s)
Electroencephalography/methods , Monitoring, Ambulatory/methods , Seizures/diagnosis , Seizures/physiopathology , Adolescent , Adult , Child , Child, Preschool , Electroencephalography/instrumentation , Female , Humans , Male , Middle Aged , Monitoring, Ambulatory/instrumentation , Retrospective Studies , Wrist , Young Adult
17.
J Med Internet Res ; 17(3): e72, 2015 Mar 30.
Article in English | MEDLINE | ID: mdl-25835472

ABSTRACT

BACKGROUND: Self-guided, Web-based interventions for depression show promising results but suffer from high attrition and low user engagement. Online peer support networks can be highly engaging, but they show mixed results and lack evidence-based content. OBJECTIVE: Our aim was to introduce and evaluate a novel Web-based, peer-to-peer cognitive reappraisal platform designed to promote evidence-based techniques, with the hypotheses that (1) repeated use of the platform increases reappraisal and reduces depression and (2) that the social, crowdsourced interactions enhance engagement. METHODS: Participants aged 18-35 were recruited online and were randomly assigned to the treatment group, "Panoply" (n=84), or an active control group, online expressive writing (n=82). Both are fully automated Web-based platforms. Participants were asked to use their assigned platform for a minimum of 25 minutes per week for 3 weeks. Both platforms involved posting descriptions of stressful thoughts and situations. Participants on the Panoply platform additionally received crowdsourced reappraisal support immediately after submitting a post (median response time=9 minutes). Panoply participants could also practice reappraising stressful situations submitted by other users. Online questionnaires administered at baseline and 3 weeks assessed depression symptoms, reappraisal, and perseverative thinking. Engagement was assessed through self-report measures, session data, and activity levels. RESULTS: The Panoply platform produced significant improvements from pre to post for depression (P=.001), reappraisal (P<.001), and perseverative thinking (P<.001). The expressive writing platform yielded significant pre to post improvements for depression (P=.02) and perseverative thinking (P<.001), but not reappraisal (P=.45). 
The two groups did not diverge significantly at post-test on measures of depression or perseverative thinking, though Panoply users had significantly higher reappraisal scores (P=.02) than expressive writing. We also found significant group by treatment interactions. Individuals with elevated depression symptoms showed greater comparative benefit from Panoply for depression (P=.02) and perseverative thinking (P=.008). Individuals with baseline reappraisal deficits showed greater comparative benefit from Panoply for depression (P=.002) and perseverative thinking (P=.002). Changes in reappraisal mediated the effects of Panoply, but not the expressive writing platform, for both outcomes of depression (ab=-1.04, SE 0.58, 95% CI -2.67 to -.12) and perseverative thinking (ab=-1.02, SE 0.61, 95% CI -2.88 to -.20). Dropout rates were similar for the two platforms; however, Panoply yielded significantly more usage activity (P<.001) and significantly greater user experience scores (P<.001). CONCLUSIONS: Panoply engaged its users and was especially helpful for depressed individuals and for those who might ordinarily underutilize reappraisal techniques. Further investigation is needed to examine the long-term effects of such a platform and whether the benefits generalize to a more diverse population of users. TRIAL REGISTRATION: ClinicalTrials.gov NCT02302248; https://clinicaltrials.gov/ct2/show/NCT02302248 (Archived by WebCite at http://www.webcitation.org/6Wtkj6CXU).


Subject(s)
Crowdsourcing/methods , Depression/therapy , Internet , Adolescent , Adult , Depression/psychology , Female , Humans , Male , Surveys and Questionnaires , Young Adult
18.
medRxiv ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38798669

ABSTRACT

Work is ongoing to advance seizure forecasting, but the performance metrics used to evaluate model effectiveness can sometimes lead to misleading outcomes. For example, some metrics improve when tested on patients with a particular range of seizure frequencies (SF). This study illustrates the connection between SF and metrics. Additionally, we compared benchmarks for testing performance: a moving average (MA) or the commonly used permutation benchmark. Three data sets were used for the evaluations: (1) Self-reported seizure diaries of 3,994 Seizure Tracker patients; (2) Automatically detected (and sometimes manually reported or edited) generalized tonic-clonic seizures from 2,350 Empatica Embrace 2 and Mate App seizure diary users, and (3) Simulated datasets with varying SFs. Metrics of calibration and discrimination were computed for each dataset, comparing MA and permutation performance across SF values. Most metrics were found to depend on SF. The MA model outperformed or matched the permutation model in all cases. The findings highlight SF's role in seizure forecasting accuracy and the MA model's suitability as a benchmark. This underscores the need for considering patient SF in forecasting studies and suggests the MA model may provide a better standard for evaluating future seizure forecasting models.
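
A moving average benchmark of the kind described can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the window length, the daily binary event encoding, and the use of the Brier score as the calibration metric are all choices made here for the example.

```python
def moving_average_forecast(events, window):
    """For each day t >= window, forecast the seizure probability as the
    fraction of the previous `window` days on which a seizure occurred."""
    return [
        sum(events[t - window:t]) / window
        for t in range(window, len(events))
    ]


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical diary: 1 = seizure day, 0 = seizure-free day.
diary = [1, 0, 1, 0, 1, 0]
window = 2
forecasts = moving_average_forecast(diary, window)
print(forecasts)
print(brier_score(forecasts, diary[window:]))
```

Because the forecast tracks each patient's recent seizure frequency, a benchmark like this already captures the SF dependence that the abstract identifies, which is one reason it is a harder baseline to beat than shuffled forecasts.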

19.
J Autism Dev Disord ; 2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38613592

ABSTRACT

PURPOSE: Non-verbal utterances are an important tool of communication for individuals who are non- or minimally-speaking. While these utterances are typically understood by caregivers, they can be challenging to interpret by their larger community. To date, there has been little work done to detect and characterize the vocalizations produced by non- or minimally-speaking individuals. This paper aims to characterize five categories of utterances across a set of 7 non- or minimally-speaking individuals. METHODS: The characterization is accomplished using a correlation structure methodology, acting as a proxy measurement for motor coordination, to localize similarities and differences to specific speech production systems. RESULTS: We specifically find that frustrated and dysregulated utterances show similar correlation structure outputs, especially when compared to self-talk, request, and delighted utterances. We additionally witness higher complexity of coordination between articulatory and respiratory subsystems and lower complexity of coordination between laryngeal and respiratory subsystems in frustration and dysregulation as compared to self-talk, request, and delight. Finally, we observe lower complexity of coordination across all three speech subsystems in the request utterances as compared to self-talk and delight. CONCLUSION: The insights from this work aid in understanding of the modifications made by non- or minimally-speaking individuals to accomplish specific goals in non-verbal communication.

20.
Nat Med ; 30(2): 573-583, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38317019

ABSTRACT

Although advances in deep learning systems for image-based medical diagnosis demonstrate their potential to augment clinical decision-making, the effectiveness of physician-machine partnerships remains an open question, in part because physicians and algorithms are both susceptible to systematic errors, especially for diagnosis of underrepresented populations. Here we present results from a large-scale digital experiment involving board-certified dermatologists (n = 389) and primary-care physicians (n = 459) from 39 countries to evaluate the accuracy of diagnoses submitted by physicians in a store-and-forward teledermatology simulation. In this experiment, physicians were presented with 364 images spanning 46 skin diseases and asked to submit up to four differential diagnoses. Specialists and generalists achieved diagnostic accuracies of 38% and 19%, respectively, but both specialists and generalists were four percentage points less accurate for the diagnosis of images of dark skin as compared to light skin. Fair deep learning system decision support improved the diagnostic accuracy of both specialists and generalists by more than 33%, but exacerbated the gap in the diagnostic accuracy of generalists across skin tones. These results demonstrate that well-designed physician-machine partnerships can enhance the diagnostic accuracy of physicians, illustrating that success in improving overall diagnostic accuracy does not necessarily address bias.


Subject(s)
Deep Learning , Skin Diseases , Humans , Skin Pigmentation , Skin Diseases/diagnosis , Algorithms , Diagnosis, Differential