ABSTRACT
Contemporary pose estimation methods enable precise measurements of behavior via supervised deep learning with hand-labeled video frames. Although effective in many cases, the supervised approach requires extensive labeling and often produces outputs that are unreliable for downstream analyses. Here, we introduce 'Lightning Pose', an efficient pose estimation package with three algorithmic contributions. First, in addition to training on a few labeled video frames, we use many unlabeled videos and penalize the network whenever its predictions violate motion continuity, multiple-view geometry and posture plausibility (semi-supervised learning). Second, we introduce a network architecture that resolves occlusions by predicting pose on any given frame using surrounding unlabeled frames. Third, we refine the pose predictions post hoc by combining ensembling and Kalman smoothing. Together, these components render pose trajectories more accurate and scientifically usable. We released a cloud application that allows users to label data, train networks and process new videos directly from the browser.
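To make the semi-supervised idea concrete, below is a minimal sketch of one unsupervised penalty in the spirit of the motion-continuity term described above; the function name, tensor shapes, and pixel allowance are illustrative assumptions, not the Lightning Pose API.

```python
import torch

def temporal_continuity_penalty(keypoints: torch.Tensor, allowance_px: float = 5.0) -> torch.Tensor:
    """Hinge penalty on implausibly large frame-to-frame jumps in predicted keypoints.

    keypoints: (T, K, 2) tensor -- T unlabeled frames, K keypoints, (x, y) in pixels.
    allowance_px: displacement allowed between consecutive frames before a penalty accrues.
    """
    # Per-keypoint Euclidean displacement between consecutive frames: shape (T-1, K).
    displacement = torch.linalg.norm(keypoints[1:] - keypoints[:-1], dim=-1)
    # Zero penalty for plausible motion, linear penalty beyond the allowance.
    return torch.clamp(displacement - allowance_px, min=0.0).mean()

# On unlabeled video batches, a term like this would be added to the supervised keypoint loss.
predicted = torch.cumsum(torch.randn(200, 17, 2), dim=0)  # fake trajectory: 200 frames, 17 keypoints
penalty = temporal_continuity_penalty(predicted)
```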
Subject(s)
Algorithms; Bayes Theorem; Video Recording; Animals; Video Recording/methods; Supervised Machine Learning; Cloud Computing; Software; Posture/physiology; Deep Learning; Image Processing, Computer-Assisted/methods; Behavior, Animal
ABSTRACT
Recent neuroscience studies demonstrate that a deeper understanding of brain function requires a deeper understanding of behavior. Detailed behavioral measurements are now often collected using video cameras, resulting in an increased need for computer vision algorithms that extract useful information from video data. Here we introduce a new video analysis tool that combines the output of supervised pose estimation algorithms (e.g. DeepLabCut) with unsupervised dimensionality reduction methods to produce interpretable, low-dimensional representations of behavioral videos that extract more information than pose estimates alone. We demonstrate this tool by extracting interpretable behavioral features from videos of three different head-fixed mouse preparations, as well as a freely moving mouse in an open field arena, and show how these interpretable features can facilitate downstream behavioral and neural analyses. We also show how the behavioral features produced by our model improve the precision and interpretation of these downstream analyses compared to using the outputs of either fully supervised or fully unsupervised methods alone.
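As a rough illustration of combining supervised pose estimates with unsupervised dimensionality reduction, the sketch below conditions a small frame autoencoder on keypoints so the learned latents only capture what the pose misses; the architecture and names are assumptions for illustration, not the tool's actual model.

```python
import torch
import torch.nn as nn

class PoseConditionedAutoencoder(nn.Module):
    """Toy model: reconstruct (flattened) video frames from supervised pose estimates
    plus a few unsupervised latents that absorb variability the keypoints do not explain."""

    def __init__(self, frame_dim: int, n_keypoints: int, n_latents: int = 4):
        super().__init__()
        pose_dim = 2 * n_keypoints  # (x, y) per keypoint, e.g. from DeepLabCut
        self.encoder = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_latents))
        self.decoder = nn.Sequential(nn.Linear(pose_dim + n_latents, 256), nn.ReLU(),
                                     nn.Linear(256, frame_dim))

    def forward(self, frames: torch.Tensor, pose: torch.Tensor):
        latents = self.encoder(frames)                      # unsupervised behavioral features
        recon = self.decoder(torch.cat([pose, latents], dim=-1))
        return recon, latents

# Training would minimize a frame reconstruction loss, e.g. ((recon - frames) ** 2).mean().
model = PoseConditionedAutoencoder(frame_dim=64 * 64, n_keypoints=8)
recon, latents = model(torch.rand(32, 64 * 64), torch.rand(32, 16))
```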
Subject(s)
Algorithms; Artificial Intelligence/statistics & numerical data; Behavior, Animal; Video Recording; Animals; Computational Biology; Computer Simulation; Markov Chains; Mice; Models, Statistical; Neural Networks, Computer; Supervised Machine Learning/statistics & numerical data; Unsupervised Machine Learning/statistics & numerical data; Video Recording/statistics & numerical data
ABSTRACT
The activity of sensory cortical neurons is not only driven by external stimuli but also shaped by other sources of input to the cortex. Unlike external stimuli, these other sources of input are challenging to experimentally control, or even observe, and as a result contribute to variability of neural responses to sensory stimuli. However, such sources of input are likely not "noise" and may play an integral role in sensory cortex function. Here we introduce the rectified latent variable model (RLVM) in order to identify these sources of input using simultaneously recorded cortical neuron populations. The RLVM is novel in that it employs nonnegative (rectified) latent variables and places far fewer mathematical constraints on solutions, owing to the use of an autoencoder neural network to initialize model parameters. We show that the RLVM outperforms principal component analysis, factor analysis, and independent component analysis, using simulated data across a range of conditions. We then apply this model to two-photon imaging of hundreds of simultaneously recorded neurons in mouse primary somatosensory cortex during a tactile discrimination task. Across many experiments, the RLVM identifies latent variables related to both the tactile stimulation and nonstimulus aspects of the behavioral task, with a majority of activity explained by the latter. These results suggest that properly identifying such latent variables is necessary for a full understanding of sensory cortical function and demonstrate novel methods for leveraging large population recordings to this end.
NEW & NOTEWORTHY The rapid development of neural recording technologies presents new opportunities for understanding patterns of activity across neural populations. Here we show how a latent variable model with appropriate nonlinear form can be used to identify sources of input to a neural population and infer their time courses. Furthermore, we demonstrate how these sources are related to behavioral contexts outside of direct experimental control.
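A minimal sketch of the nonnegative-latent idea, assuming a plain autoencoder with a ReLU bottleneck trained on a time-by-neurons activity matrix; this illustrates the spirit of the RLVM rather than the published fitting procedure.

```python
import torch
import torch.nn as nn

class RectifiedAutoencoder(nn.Module):
    """Autoencoder whose bottleneck is rectified, so inferred latent time courses are nonnegative."""

    def __init__(self, n_neurons: int, n_latents: int):
        super().__init__()
        self.encode = nn.Linear(n_neurons, n_latents)
        self.decode = nn.Linear(n_latents, n_neurons)

    def forward(self, activity: torch.Tensor):
        latents = torch.relu(self.encode(activity))  # nonnegative (rectified) latent variables
        return self.decode(latents), latents

# activity: (time, neurons), e.g. deconvolved two-photon fluorescence traces
activity = torch.rand(2000, 300)
model = RectifiedAutoencoder(n_neurons=300, n_latents=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):  # minimal reconstruction-loss training loop
    recon, latents = model(activity)
    loss = ((recon - activity) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```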
Subject(s)
Cerebral Cortex/physiology; Models, Neurological; Neural Networks, Computer; Neurons/physiology; Animals; Data Interpretation, Statistical; Mice; Somatosensory Cortex/physiology; Touch Perception
ABSTRACT
Action segmentation of behavioral videos is the process of labeling each frame as belonging to one or more discrete classes, and is a crucial component of many studies that investigate animal behavior. A wide range of algorithms exist to automatically parse discrete animal behavior, encompassing supervised, unsupervised, and semi-supervised learning paradigms. These algorithms, which include tree-based models, deep neural networks, and graphical models, differ widely in their structure and in their assumptions about the data. Using four datasets spanning multiple species (fly, mouse, and human), we systematically study how the outputs of these various algorithms align with manually annotated behaviors of interest. Along the way, we introduce a semi-supervised action segmentation model that bridges the gap between supervised deep neural networks and unsupervised graphical models. We find that fully supervised temporal convolutional networks, with temporal information added to the observations, perform best on our supervised metrics across all datasets.
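For concreteness, here is a minimal dilated temporal convolutional network of the kind used by the supervised baselines above: per-frame features (e.g., pose estimates) in, per-frame behavior-class logits out. Layer sizes and names are illustrative assumptions, not any benchmarked model's exact configuration.

```python
import torch
import torch.nn as nn

class DilatedTCN(nn.Module):
    """Per-frame action segmentation: features (batch, time, n_features) -> logits (batch, time, n_classes)."""

    def __init__(self, n_features: int, n_classes: int, hidden: int = 64, n_layers: int = 4):
        super().__init__()
        layers, channels = [], n_features
        for i in range(n_layers):
            dilation = 2 ** i  # exponentially growing temporal receptive field
            layers += [nn.Conv1d(channels, hidden, kernel_size=3,
                                 padding=dilation, dilation=dilation),
                       nn.ReLU()]
            channels = hidden
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, n_classes, kernel_size=1)  # per-frame class scores

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        x = self.backbone(features.transpose(1, 2))  # Conv1d expects (batch, channels, time)
        return self.head(x).transpose(1, 2)

# Trained with per-frame cross-entropy against manually annotated behavior labels.
logits = DilatedTCN(n_features=24, n_classes=5)(torch.randn(2, 500, 24))
```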
ABSTRACT
Animals coordinate their behavior with each other during both cooperative and agonistic social interactions. Such coordination often adopts the form of "turn taking", in which the interactive partners alternate the performance of a behavior. Apart from acoustic communication, how turn taking between animals is coordinated is not well understood. Furthermore, the neural substrates that regulate persistence in engaging in social interactions are poorly studied. Here, we use Siamese fighting fish (Betta splendens) to study visually-driven turn-taking aggressive behavior. Using encounters with conspecifics and with animations, we characterize the dynamic visual features of an opponent and the behavioral sequences that drive turn taking. Through a brain-wide screen of neuronal activity during coordinated and persistent aggressive behavior, followed by targeted brain lesions, we find that the caudal portion of the dorsomedial telencephalon, an amygdala-like region, promotes persistent participation in aggressive interactions, yet is not necessary for coordination. Our work highlights how dynamic visual cues shape the rhythm of social interactions at multiple timescales, and points to the pallial amygdala as a region controlling engagement in such interactions. These results suggest an evolutionarily conserved role of the vertebrate pallial amygdala in regulating the persistence of emotional states.
ABSTRACT
What are the spatial and temporal scales of brainwide neuronal activity? We used swept confocally-aligned planar excitation (SCAPE) microscopy to image all cells in a large volume of the brain of adult Drosophila with high spatiotemporal resolution while flies engaged in a variety of spontaneous behaviors. This revealed neural representations of behavior on multiple spatial and temporal scales. The activity of most neurons correlated (or anticorrelated) with running and flailing over timescales that ranged from seconds to a minute. Grooming elicited a weaker global response. Significant residual activity not directly correlated with behavior was high dimensional and reflected the activity of small clusters of spatially organized neurons that may correspond to genetically defined cell types. These clusters participate in the global dynamics, indicating that neural activity reflects a combination of local and broadly distributed components. This suggests that microcircuits with highly specified functions are provided with knowledge of the larger context in which they operate.
Subject(s)
Brain; Neurons; Animals; Drosophila; Grooming; Knowledge
ABSTRACT
Many aspects of brain function arise from the coordinated activity of large populations of neurons. Recent developments in neural recording technologies are providing unprecedented access to the activity of such populations during increasingly complex experimental contexts; however, extracting scientific insights from such recordings requires the concurrent development of analytical tools that relate this population activity to system-level function. This is a primary motivation for latent variable models, which seek to provide a low-dimensional description of population activity that can be related to experimentally controlled variables, as well as uncontrolled variables such as internal states (e.g. attention and arousal) and elements of behavior. While deriving an understanding of function from traditional latent variable methods relies on low-dimensional visualizations, new approaches are targeting more interpretable descriptions of the components underlying system-level function.
Subject(s)
Nervous System Physiological Phenomena; Neurons
ABSTRACT
Sensory neurons often have variable responses to repeated presentations of the same stimulus, which can significantly degrade the stimulus information contained in those responses. This information can in principle be preserved if variability is shared across many neurons, but preservation depends on the structure of the shared variability and its relationship to sensory encoding at the population level. The structure of this shared variability in neural activity can be characterized by latent variable models, although they have thus far typically been used under restrictive mathematical assumptions, such as assuming linear transformations between the latent variables and neural activity. Here we introduce two nonlinear latent variable models for analyzing large-scale neural recordings. We first present a general nonlinear latent variable model that is agnostic to the stimulus tuning properties of the individual neurons, and is hence well suited for exploring neural populations whose tuning properties are not well characterized. This motivates a second class of model, the Generalized Affine Model, which simultaneously determines each neuron's stimulus selectivity and a set of latent variables that modulate these stimulus-driven responses both additively and multiplicatively. While these approaches can detect very general nonlinear relationships in shared neural variability, we find that neural activity recorded in anesthetized primary visual cortex (V1) is best described by a single additive and single multiplicative latent variable, i.e., an "affine model". In contrast, application of the same models to recordings in awake macaque prefrontal cortex discovers more general nonlinearities that compactly describe the population response variability. These results thus demonstrate how nonlinear latent variable models can be used to describe population variability, and suggest that a range of methods is necessary to study different brain regions under different experimental conditions.
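The affine structure described above can be written compactly. The notation below is chosen here for illustration, with a single shared multiplicative latent g_t and additive latent h_t plus per-neuron coupling weights; it paraphrases the abstract rather than reproducing the paper's exact parameterization.

```latex
% r_{nt}: response of neuron n at time t      f_n(s_t): its stimulus-driven response
% g_t, h_t: shared multiplicative / additive latent time courses
% w_n, v_n: per-neuron coupling weights       \epsilon_{nt}: private (unshared) noise
\[
  r_{nt} \;=\; \bigl(1 + w_n\, g_t\bigr)\, f_n(s_t) \;+\; v_n\, h_t \;+\; \epsilon_{nt}
\]
```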
ABSTRACT
Natural sounds have rich spectrotemporal dynamics. Spectral information is spatially represented in the auditory cortex (ACX) via large-scale maps. However, the representation of temporal information, e.g., sound offset, is unclear. We perform multiscale imaging of neuronal and thalamic activity evoked by sound onset and offset in awake mouse ACX. ACX areas differ in onset responses (On-Rs) and offset responses (Off-Rs). Most excitatory L2/3 neurons show either On-Rs or Off-Rs, and ACX areas are characterized by differing fractions of On-R and Off-R neurons. Somatostatin and parvalbumin interneurons show distinct temporal dynamics, potentially amplifying Off-Rs. Functional network analysis shows that ACX areas contain distinct parallel onset and offset networks. Thalamic (MGB) terminals show either On-Rs or Off-Rs, indicating a thalamic origin of On-R and Off-R pathways. Thus, ACX areas spatially represent temporal features, and this representation is created by spatial convergence and co-activation of distinct MGB inputs and is refined by specific intracortical connectivity.