Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 43
Filtrar
1.
Med Image Comput Comput Assist Interv ; 13438: 130-139, 2022 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-36342887

RESUMO

Parkinson's disease (PD) is a neurological disorder that has a variety of observable motor-related symptoms such as slow movement, tremor, muscular rigidity, and impaired posture. PD is typically diagnosed by evaluating the severity of motor impairments according to scoring systems such as the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). Automated severity prediction using video recordings of individuals provides a promising route for non-intrusive monitoring of motor impairments. However, the limited size of PD gait data hinders model ability and clinical potential. Because of this clinical data scarcity and inspired by the recent advances in self-supervised large-scale language models like GPT-3, we use human motion forecasting as an effective self-supervised pre-training task for the estimation of motor impairment severity. We introduce GaitForeMer, Gait Forecasting and impairment estimation transforMer, which is first pre-trained on public datasets to forecast gait movements and then applied to clinical data to predict MDS-UPDRS gait impairment severity. Our method outperforms previous approaches that rely solely on clinical data by a large margin, achieving an F1 score of 0.76, precision of 0.79, and recall of 0.75. Using GaitForeMer, we show how public human movement data repositories can assist clinical use cases through learning universal motion representations. The code is available at https://github.com/markendo/GaitForeMer.

2.
Proc Natl Acad Sci U S A ; 119(39): e2115730119, 2022 09 27.
Artigo em Inglês | MEDLINE | ID: mdl-36122244

RESUMO

Regardless of how much data artificial intelligence agents have available, agents will inevitably encounter previously unseen situations in real-world deployments. Reacting to novel situations by acquiring new information from other people-socially situated learning-is a core faculty of human development. Unfortunately, socially situated learning remains an open challenge for artificial intelligence agents because they must learn how to interact with people to seek out the information that they lack. In this article, we formalize the task of socially situated artificial intelligence-agents that seek out new information through social interactions with people-as a reinforcement learning problem where the agent learns to identify meaningful and informative questions via rewards observed through social interaction. We manifest our framework as an interactive agent that learns how to ask natural language questions about photos as it broadens its visual intelligence on a large photo-sharing social network. Unlike active-learning methods, which implicitly assume that humans are oracles willing to answer any question, our agent adapts its behavior based on observed norms of which questions people are or are not interested to answer. Through an 8-mo deployment where our agent interacted with 236,000 social media users, our agent improved its performance at recognizing new visual information by 112%. A controlled field experiment confirmed that our agent outperformed an active-learning baseline by 25.6%. This work advances opportunities for continuously improving artificial intelligence (AI) agents that better respect norms in open social environments.


Assuntos
Inteligência Artificial , Reforço Psicológico , Interação Social , Humanos , Recompensa , Normas Sociais
3.
Artigo em Inglês | MEDLINE | ID: mdl-36624800

RESUMO

Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.

4.
Artigo em Inglês | MEDLINE | ID: mdl-34776724

RESUMO

Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features from extraneous variables or multiple distributions. Such extra variables, referred to as metadata here, may create bias or confounding effects (e.g., race when classifying gender from face images). We introduce the Metadata Normalization (MDN) layer, a new batch-level operation which can be used end-to-end within the training framework, to correct the influence of metadata on feature distributions. MDN adopts a regression analysis technique traditionally used for preprocessing to remove (regress out) the metadata effects on model features during training. We utilize a metric based on distance correlation to quantify the distribution bias from the metadata and demonstrate that our method successfully removes metadata effects on four diverse settings: one synthetic, one 2D image, one video, and one 3D medical image dataset.

5.
Nat Commun ; 12(1): 5721, 2021 10 06.
Artigo em Inglês | MEDLINE | ID: mdl-34615862

RESUMO

The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control, remain elusive, because performing large-scale in silico experiments on evolution and learning is challenging. Here, we introduce Deep Evolutionary Reinforcement Learning (DERL): a computational framework which can evolve diverse agent morphologies to learn challenging locomotion and manipulation tasks in complex environments. Leveraging DERL we demonstrate several relations between environmental complexity, morphological intelligence and the learnability of control. First, environmental complexity fosters the evolution of morphological intelligence as quantified by the ability of a morphology to facilitate the learning of novel tasks. Second, we demonstrate a morphological Baldwin effect i.e., in our simulations evolution rapidly selects morphologies that learn faster, thereby enabling behaviors learned late in the lifetime of early ancestors to be expressed early in the descendants lifetime. Third, we suggest a mechanistic basis for the above relationships through the evolution of morphologies that are more physically stable and energy efficient, and can therefore facilitate learning and control.


Assuntos
Evolução Biológica , Aprendizado Profundo , Recompensa , Animais , Simulação por Computador
6.
IEEE Winter Conf Appl Comput Vis ; 2021: 2512-2522, 2021 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-34522832

RESUMO

Presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications that has alluded to pivotal debates in recent years. Such challenges range from spurious associations between variables in medical studies to the bias of race in gender or face recognition systems. Controlling for all types of biases in the dataset curation stage is cumbersome and sometimes impossible. The alternative is to use the available data and build models incorporating fair representation learning. In this paper, we propose such a model based on adversarial training with two competing objectives to learn features that have (1) maximum discriminative power with respect to the task and (2) minimal statistical mean dependence with the protected (bias) variable(s). Our approach does so by incorporating a new adversarial loss function that encourages a vanished correlation between the bias and the learned features. We apply our method to synthetic data, medical images (containing task bias), and a dataset for gender classification (containing dataset bias). Our results show that the learned features by our method not only result in superior prediction performance but also are unbiased.

7.
Med Image Anal ; 73: 102179, 2021 10.
Artigo em Inglês | MEDLINE | ID: mdl-34340101

RESUMO

Parkinson's disease (PD) is a brain disorder that primarily affects motor function, leading to slow movement, tremor, and stiffness, as well as postural instability and difficulty with walking/balance. The severity of PD motor impairments is clinically assessed by part III of the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a universally-accepted rating scale. However, experts often disagree on the exact scoring of individuals. In the presence of label noise, training a machine learning model using only scores from a single rater may introduce bias, while training models with multiple noisy ratings is a challenging task due to the inter-rater variabilities. In this paper, we introduce an ordinal focal neural network to estimate the MDS-UPDRS scores from input videos, to leverage the ordinal nature of MDS-UPDRS scores and combat class imbalance. To handle multiple noisy labels per exam, the training of the network is regularized via rater confusion estimation (RCE), which encodes the rating habits and skills of raters via a confusion matrix. We apply our pipeline to estimate MDS-UPDRS test scores from their video recordings including gait (with multiple Raters, R=3) and finger tapping scores (single rater). On a sizable clinical dataset for the gait test (N=55), we obtained a classification accuracy of 72% with majority vote as ground-truth, and an accuracy of ∼84% of our model predicting at least one of the raters' scores. Our work demonstrates how computer-assisted technologies can be used to track patients and their motor impairments, even when there is uncertainty in the clinical ratings. The latest version of the code will be available at https://github.com/mlu355/PD-Motor-Severity-Estimation.


Assuntos
Doença de Parkinson , Humanos , Testes de Estado Mental e Demência , Movimento , Doença de Parkinson/diagnóstico por imagem , Índice de Gravidade de Doença , Incerteza
8.
Lancet Digit Health ; 3(2): e115-e123, 2021 02.
Artigo em Inglês | MEDLINE | ID: mdl-33358138

RESUMO

Ambient intelligence is increasingly finding applications in health-care settings, such as helping to ensure clinician and patient safety by monitoring staff compliance with clinical best practices or relieving staff of burdensome documentation tasks. Ambient intelligence involves using contactless sensors and contact-based wearable devices embedded in health-care settings to collect data (eg, imaging data of physical spaces, audio data, or body temperature), coupled with machine learning algorithms to efficiently and effectively interpret these data. Despite the promise of ambient intelligence to improve quality of care, the continuous collection of large amounts of sensor data in health-care settings presents ethical challenges, particularly in terms of privacy, data management, bias and fairness, and informed consent. Navigating these ethical issues is crucial not only for the success of individual uses, but for acceptance of the field as a whole.


Assuntos
Inteligência Ambiental , Temas Bioéticos , Gerenciamento de Dados/ética , Assistência ao Paciente/ética , Telemedicina/ética , Telemetria/ética , Algoritmos , Coleta de Dados , Tecnologia Digital , Documentação/métodos , Pessoal de Saúde , Humanos , Consentimento Livre e Esclarecido , Aprendizado de Máquina , Assistência ao Paciente/métodos , Segurança do Paciente , Guias de Prática Clínica como Assunto , Privacidade , Qualidade da Assistência à Saúde , Telemedicina/métodos , Telemetria/métodos , Dispositivos Eletrônicos Vestíveis
9.
Med Image Comput Comput Assist Interv ; 12263: 637-647, 2020 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-33103164

RESUMO

Parkinson's disease (PD) is a progressive neurological disorder primarily affecting motor function resulting in tremor at rest, rigidity, bradykinesia, and postural instability. The physical severity of PD impairments can be quantified through the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a widely used clinical rating scale. Accurate and quantitative assessment of disease progression is critical to developing a treatment that slows or stops further advancement of the disease. Prior work has mainly focused on dopamine transport neuroimaging for diagnosis or costly and intrusive wearables evaluating motor impairments. For the first time, we propose a computer vision-based model that observes non-intrusive video recordings of individuals, extracts their 3D body skeletons, tracks them through time, and classifies the movements according to the MDS-UPDRS gait scores. Experimental results show that our proposed method performs significantly better than chance and competing methods with an F 1-score of 0.83 and a balanced accuracy of 81%. This is the first benchmark for classifying PD patients based on MDS-UPDRS gait severity and could be an objective biomarker for disease severity. Our work demonstrates how computer-assisted technologies can be used to non-intrusively monitor patients and their motor impairments. The code is available at https://github.com/mlu355/PD-Motor-Severity-Estimation.

10.
Nature ; 585(7824): 193-202, 2020 09.
Artigo em Inglês | MEDLINE | ID: mdl-32908264

RESUMO

Advances in machine learning and contactless sensors have given rise to ambient intelligence-physical spaces that are sensitive and responsive to the presence of humans. Here we review how this technology could improve our understanding of the metaphorically dark, unobserved spaces of healthcare. In hospital spaces, early applications could soon enable more efficient clinical workflows and improved patient safety in intensive care units and operating rooms. In daily living spaces, ambient intelligence could prolong the independence of older individuals and improve the management of individuals with a chronic disease by understanding everyday behaviour. Similar to other technologies, transformation into clinical applications at scale must overcome challenges such as rigorous clinical validation, appropriate data privacy and model transparency. Thoughtful use of this technology would enable us to understand the complex interplay between the physical environment and health-critical human behaviours.


Assuntos
Inteligência Ambiental , Atenção à Saúde/métodos , Monitoramento Ambiental/métodos , Algoritmos , Doença Crônica/terapia , Atenção à Saúde/normas , Unidades Hospitalares , Humanos , Saúde Mental , Segurança do Paciente , Privacidade
11.
J Am Med Inform Assoc ; 27(8): 1316-1320, 2020 08 01.
Artigo em Inglês | MEDLINE | ID: mdl-32712656

RESUMO

OBJECTIVE: Hand hygiene is essential for preventing hospital-acquired infections but is difficult to accurately track. The gold-standard (human auditors) is insufficient for assessing true overall compliance. Computer vision technology has the ability to perform more accurate appraisals. Our primary objective was to evaluate if a computer vision algorithm could accurately observe hand hygiene dispenser use in images captured by depth sensors. MATERIALS AND METHODS: Sixteen depth sensors were installed on one hospital unit. Images were collected continuously from March to August 2017. Utilizing a convolutional neural network, a machine learning algorithm was trained to detect hand hygiene dispenser use in the images. The algorithm's accuracy was then compared with simultaneous in-person observations of hand hygiene dispenser usage. Concordance rate between human observation and algorithm's assessment was calculated. Ground truth was established by blinded annotation of the entire image set. Sensitivity and specificity were calculated for both human and machine-level observation. RESULTS: A concordance rate of 96.8% was observed between human and algorithm (kappa = 0.85). Concordance among the 3 independent auditors to establish ground truth was 95.4% (Fleiss's kappa = 0.87). Sensitivity and specificity of the machine learning algorithm were 92.1% and 98.3%, respectively. Human observations showed sensitivity and specificity of 85.2% and 99.4%, respectively. CONCLUSIONS: A computer vision algorithm was equivalent to human observation in detecting hand hygiene dispenser use. Computer vision monitoring has the potential to provide a more complete appraisal of hand hygiene activity in hospitals than the current gold-standard given its ability for continuous coverage of a unit in space and time.


Assuntos
Algoritmos , Higiene das Mãos , Processamento de Imagem Assistida por Computador , Gravação em Vídeo , California , Infecção Hospitalar/prevenção & controle , Hospitais Pediátricos , Humanos , Controle de Infecções , Aprendizado de Máquina , Redes Neurais de Computação , Recursos Humanos em Hospital
12.
Trends Cogn Sci ; 24(9): 675-678, 2020 09.
Artigo em Inglês | MEDLINE | ID: mdl-32624386

RESUMO

We propose that developmental cognitive science should invest in an online CRADLE, a Collaboration for Reproducible and Distributed Large-Scale Experiments that crowdsources data from families participating on the internet. Here, we discuss how the field can work together to further expand and unify current prototypes for the benefit of researchers, science, and society.


Assuntos
Internet , Pesquisadores , Humanos
13.
NPJ Digit Med ; 3: 82, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-32550644

RESUMO

Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements in accuracy are needed before widespread use. Our HIPAA-compliant automatic speech recognition system demonstrated a transcription word error rate of 25%. For depression-related utterances, sensitivity was 80% and positive predictive value was 83%. For clinician-identified harm-related sentences, the word error rate was 34%. These results suggest that automatic speech recognition may support understanding of language patterns and subgroup variation in existing treatments but may not be ready for individual-level safety surveillance.

14.
NPJ Digit Med ; 2: 11, 2019.
Artigo em Inglês | MEDLINE | ID: mdl-31304360

RESUMO

Early and frequent patient mobilization substantially mitigates risk for post-intensive care syndrome and long-term functional impairment. We developed and tested computer vision algorithms to detect patient mobilization activities occurring in an adult ICU. Mobility activities were defined as moving the patient into and out of bed, and moving the patient into and out of a chair. A data set of privacy-safe-depth-video images was collected in the Intermountain LDS Hospital ICU, comprising 563 instances of mobility activities and 98,801 total frames of video data from seven wall-mounted depth sensors. In all, 67% of the mobility activity instances were used to train algorithms to detect mobility activity occurrence and duration, and the number of healthcare personnel involved in each activity. The remaining 33% of the mobility instances were used for algorithm evaluation. The algorithm for detecting mobility activities attained a mean specificity of 89.2% and sensitivity of 87.2% over the four activities; the algorithm for quantifying the number of personnel involved attained a mean accuracy of 68.8%.

15.
Proc IEEE Int Conf Comput Vis ; 2019: 2580-2590, 2019.
Artigo em Inglês | MEDLINE | ID: mdl-32218709

RESUMO

Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and using textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few' labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@ 100 for PREDCLS. In our limited label setting, we define a complexity metric for relationships that serves as an indicator (R2 = 0.778) for conditions under which our method succeeds over transfer learning, the de-facto approach for training with limited labels.

17.
Elife ; 72018 03 07.
Artigo em Inglês | MEDLINE | ID: mdl-29513219

RESUMO

Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information.


Assuntos
Encéfalo/fisiologia , Rede Nervosa/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Percepção Visual/fisiologia , Adulto , Encéfalo/diagnóstico por imagem , Mapeamento Encefálico/métodos , Feminino , Humanos , Imageamento por Ressonância Magnética , Masculino , Rede Nervosa/diagnóstico por imagem , Estimulação Luminosa , Semântica
18.
Proc Natl Acad Sci U S A ; 114(50): 13108-13113, 2017 12 12.
Artigo em Inglês | MEDLINE | ID: mdl-29183967

RESUMO

The United States spends more than $250 million each year on the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed several years. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may become an increasingly practical supplement to the ACS. Here, we present a method that estimates socioeconomic characteristics of regions spanning 200 US cities by using 50 million images of street scenes gathered with Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22 million automobiles in total (8% of all automobiles in the United States), were used to accurately estimate income, race, education, and voting patterns at the zip code and precinct level. (The average US precinct contains ∼1,000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographics may effectively complement labor-intensive approaches, with the potential to measure demographics with fine spatial resolution, in close to real time.


Assuntos
Automóveis/estatística & dados numéricos , Demografia/métodos , Aprendizado de Máquina , População , Imagens de Satélites/métodos , Humanos , Fatores Socioeconômicos , Estados Unidos
19.
Neuroimage ; 155: 422-436, 2017 07 15.
Artigo em Inglês | MEDLINE | ID: mdl-28343000

RESUMO

A long-standing core question in cognitive science is whether different modalities and representation types (pictures, words, sounds, etc.) access a common store of semantic information. Although different input types have been shown to activate a shared network of brain regions, this does not necessitate that there is a common representation, as the neurons in these regions could still differentially process the different modalities. However, multi-voxel pattern analysis can be used to assess whether, e.g., pictures and words evoke a similar pattern of activity, such that the patterns that separate categories in one modality transfer to the other. Prior work using this method has found support for a common code, but has two limitations: they have either only examined disparate categories (e.g. animals vs. tools) that are known to activate different brain regions, raising the possibility that the pattern separation and inferred similarity reflects only large scale differences between the categories or they have been limited to individual object representations. By using natural scene categories, we not only extend the current literature on cross-modal representations beyond objects, but also, because natural scene categories activate a common set of brain regions, we identify a more fine-grained (i.e. higher spatial resolution) common representation. Specifically, we studied picture- and word-based representations of natural scene stimuli from four different categories: beaches, cities, highways, and mountains. Participants passively viewed blocks of either phrases (e.g. "sandy beach") describing scenes or photographs from those same scene categories. To determine whether the phrases and pictures evoke a common code, we asked whether a classifier trained on one stimulus type (e.g. phrase stimuli) would transfer (i.e. cross-decode) to the other stimulus type (e.g. picture stimuli). The analysis revealed cross-decoding in the occipitotemporal, posterior parietal and frontal cortices. This similarity of neural activity patterns across the two input types, for categories that co-activate local brain regions, provides strong evidence of a common semantic code for pictures and words in the brain.


Assuntos
Mapeamento Encefálico/métodos , Córtex Cerebral/fisiologia , Idioma , Reconhecimento Visual de Modelos/fisiologia , Adulto , Córtex Cerebral/diagnóstico por imagem , Feminino , Humanos , Imageamento por Ressonância Magnética/métodos , Masculino , Semântica , Adulto Jovem
20.
J Vis ; 17(1): 21, 2017 01 01.
Artigo em Inglês | MEDLINE | ID: mdl-28114496

RESUMO

Traditional models of recognition and categorization proceed from registering low-level features, perceptually organizing that input, and linking it with stored representations. Recent evidence, however, suggests that this serial model may not be accurate, with object and category knowledge affecting rather than following early visual processing. Here, we show that the degree to which an image exemplifies its category influences how easily it is detected. Participants performed a two-alternative forced-choice task in which they indicated whether a briefly presented image was an intact or phase-scrambled scene photograph. Critically, the category of the scene is irrelevant to the detection task. We nonetheless found that participants "see" good images better, more accurately discriminating them from phase-scrambled images than bad scenes, and this advantage is apparent regardless of whether participants are asked to consider category during the experiment or not. We then demonstrate that good exemplars are more similar to same-category images than bad exemplars, influencing behavior in two ways: First, prototypical images are easier to detect, and second, intact good scenes are more likely than bad to have been primed by a previous trial.


Assuntos
Comportamento de Escolha/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Análise e Desempenho de Tarefas , Feminino , Humanos , Masculino , Adulto Jovem
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA