Results 1-4 of 4
1.
Sci Robot ; 7(68): eabn1944, 2022 07 13.
Article in English | MEDLINE | ID: mdl-35857575

ABSTRACT

Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions. These "self-models" allow robots to consider outcomes of multiple possible future actions without trying them out in physical reality. Recent progress in fully data-driven self-modeling has enabled machines to learn their own forward kinematics directly from task-agnostic interaction data. However, forward kinematics models can only predict limited aspects of the morphology, such as the position of end effectors or the velocity of joints and masses. A key challenge is to model the entire morphology and kinematics without prior knowledge of which aspects of the morphology will be relevant to future tasks. Here, we propose that instead of directly modeling forward kinematics, a more useful form of self-modeling is one that can answer space occupancy queries, conditioned on the robot's state. Such query-driven self-models are continuous in the spatial domain, memory efficient, fully differentiable, and kinematics-aware, and they can be used across a broader range of tasks. In physical experiments, we demonstrate how a visual self-model is accurate to about 1% of the workspace, enabling the robot to perform various motion planning and control tasks. Visual self-modeling can also allow the robot to detect, localize, and recover from real-world damage, leading to improved machine resiliency.
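As a rough sketch of the query-driven self-model idea described above, the snippet below conditions an occupancy prediction on the robot's joint state: a small PyTorch MLP maps a 3D query point plus joint angles to the probability that the point is occupied by the robot's body. The architecture, joint count, and dimensions are illustrative assumptions, not the authors' model.

```python
# Minimal sketch of a query-driven self-model: an MLP that maps a 3D query
# point plus the robot's joint state to an occupancy probability.
# Architecture and dimensions are illustrative, not the paper's exact model.
import torch
import torch.nn as nn

class OccupancyQueryModel(nn.Module):
    def __init__(self, num_joints: int = 4, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + num_joints, hidden),  # (x, y, z) + joint angles
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # occupancy logit for the query point
        )

    def forward(self, points: torch.Tensor, joints: torch.Tensor) -> torch.Tensor:
        # points: (B, 3) spatial queries; joints: (B, num_joints) robot state
        return torch.sigmoid(self.net(torch.cat([points, joints], dim=-1)))

model = OccupancyQueryModel()
pts = torch.rand(8, 3)   # sample query points in the workspace
q = torch.rand(8, 4)     # a hypothetical joint configuration
occ = model(pts, q)      # probability each point is occupied by the body
```

Because the model is differentiable in both the query point and the joint state, occupancy gradients can in principle be used directly inside motion-planning or damage-detection objectives.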


Subjects
Robotics, Animals, Biomechanical Phenomena, Knowledge, Learning, Motion (Physics)
2.
Sci Rep ; 11(1): 424, 2021 01 11.
Article in English | MEDLINE | ID: mdl-33431917

ABSTRACT

Behavior modeling is an essential cognitive ability that underlies many aspects of human and animal social behavior (Watson in Psychol Rev 20:158, 1913), and an ability with which we would like to endow robots. Most studies of machine behavior modeling, however, rely on symbolic or selected parametric sensory inputs and built-in knowledge relevant to a given task. Here, we propose that an observer can model the behavior of an actor through visual processing alone, without any prior symbolic information or assumptions about relevant inputs. To test this hypothesis, we designed a non-verbal, non-symbolic robotic experiment in which an observer must visualize future plans of an actor robot, based only on an image depicting the initial scene of the actor robot. We found that an AI observer is able to visualize the future plans of the actor with 98.5% success across four different activities, even when the activity is not known a priori. We hypothesize that such visual behavior modeling is an essential cognitive ability that will allow machines to understand and coordinate with surrounding agents, while sidestepping the notorious symbol grounding problem. Through a false-belief test, we suggest that this approach may be a precursor to Theory of Mind, one of the distinguishing hallmarks of primate social cognition.
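A minimal sketch of the visual behavior-modeling setup, assuming a PyTorch encoder-decoder: the observer network takes a single image of the actor's initial scene and outputs a predicted future frame (the "visualized plan"). The layer sizes and image resolution are illustrative, not the architecture used in the paper.

```python
# Minimal sketch of visual behavior modeling: an image-to-image network that
# maps a picture of the actor's initial scene to a predicted future frame.
# The encoder-decoder below is illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class ObserverNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, scene: torch.Tensor) -> torch.Tensor:
        # scene: (B, 3, 64, 64) image of the initial scene;
        # returns a predicted future frame of the same shape
        return self.decoder(self.encoder(scene))

observer = ObserverNet()
future = observer(torch.rand(2, 3, 64, 64))  # visualized plan, same image shape
```

The key property is that both input and output stay in pixel space, so no symbolic description of the actor's task is ever required.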


Assuntos
Reconhecimento Automatizado de Padrão , Robótica , Teoria da Mente/fisiologia , Percepção Visual/fisiologia , Animais , Redes de Comunicação de Computadores , Simulação por Computador , Humanos , Aprendizado de Máquina , Reconhecimento Automatizado de Padrão/métodos , Robótica/métodos , Robótica/tendências
3.
IEEE Trans Pattern Anal Mach Intell ; 42(2): 502-508, 2020 02.
Article in English | MEDLINE | ID: mdl-30802849

ABSTRACT

We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3-second videos poses many challenges: meaningful events include not only people but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse), and either transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing separately, and jointly, three modalities: spatial, temporal, and auditory. The Moments in Time dataset, designed to have large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.
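As an illustration of a spatial-only baseline of the kind the abstract mentions, the sketch below averages per-frame CNN features over time and projects them onto the 339 activity classes. The ResNet-18 backbone and frame-sampling scheme are assumptions made for this example, not the paper's exact baselines.

```python
# Minimal sketch of a spatial-only baseline for Moments in Time: classify a
# 3-second clip by averaging per-frame CNN features over time and projecting
# to the 339 activity classes. The backbone choice is illustrative only.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FrameAverageBaseline(nn.Module):
    def __init__(self, num_classes: int = 339):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep 512-d frame features
        self.backbone = backbone
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) frames sampled from a 3-second video
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        return self.classifier(feats.mean(dim=1))  # temporal average pooling

logits = FrameAverageBaseline()(torch.rand(2, 8, 3, 224, 224))  # (2, 339)
```

Temporal and auditory baselines would replace the frame-averaging step with, for example, 3D convolutions over time or a spectrogram encoder.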


Subjects
Databases, Factual, Video Recording, Animals, Human Activities/classification, Humans, Image Processing, Computer-Assisted, Pattern Recognition, Automated
4.
IEEE Trans Pattern Anal Mach Intell ; 40(10): 2303-2314, 2018 10.
Article in English | MEDLINE | ID: mdl-28922114

ABSTRACT

People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize scenes well, they also learn an intermediate representation that is not aligned across modalities, which is undesirable for cross-modal transfer applications. We present methods to regularize cross-modal convolutional neural networks so that they have a shared representation that is agnostic to the modality. Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval. Moreover, our visualizations suggest that units emerge in the shared representation that tend to activate on consistent concepts independently of the modality.
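A minimal sketch of one way to regularize cross-modal networks toward a shared, modality-agnostic representation: two modality-specific encoders feed one shared classifier, and an alignment penalty pulls the intermediate features of paired examples together. The MSE alignment term, feature size, and class count are illustrative assumptions, not necessarily the regularizers proposed in the paper.

```python
# Minimal sketch of cross-modal regularization: modality-specific encoders feed
# a shared scene classifier, and an alignment penalty pulls the intermediate
# representations of paired examples together. Losses and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

enc_photo, enc_sketch = ModalityEncoder(), ModalityEncoder()
shared_classifier = nn.Linear(512, 397)  # scene-category count is illustrative

photos, sketches = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
labels = torch.randint(0, 397, (4,))     # same scene label for each pair

z_p, z_s = enc_photo(photos), enc_sketch(sketches)
cls_loss = F.cross_entropy(shared_classifier(z_p), labels) + \
           F.cross_entropy(shared_classifier(z_s), labels)
align_loss = F.mse_loss(z_p, z_s)        # encourage modality-agnostic features
loss = cls_loss + 0.1 * align_loss
loss.backward()
```

Because both modalities share the classifier and are pulled toward a common feature space, features from one modality can be used to retrieve examples from another.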
