Results 1 - 20 of 41
1.
Sci Robot ; 9(89): eadi9579, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630806

ABSTRACT

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist the elderly at home, and colonize new planets. Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesized that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We trained our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed it to the real world zero-shot. Our controller could walk over various outdoor terrains, was robust to external disturbances, and could adapt in context.
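
The abstract gives no implementation details beyond this description; as a rough, hypothetical sketch of such a controller (a causal transformer that maps a history of proprioceptive observations and actions to the next action), something like the following PyTorch snippet could serve, with all dimensions, layer counts, and names invented for illustration.

```python
# Hypothetical sketch of a causal-transformer locomotion policy (dimensions and
# layer sizes are invented; this is not the authors' implementation).
import torch
import torch.nn as nn

class CausalTransformerPolicy(nn.Module):
    def __init__(self, obs_dim=47, act_dim=19, d_model=192,
                 n_layers=4, n_heads=4, ctx_len=16):
        super().__init__()
        self.ctx_len = ctx_len
        # Each timestep embeds the concatenated (observation, action) pair.
        self.embed = nn.Linear(obs_dim + act_dim, d_model)
        self.pos = nn.Parameter(torch.zeros(ctx_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, obs_hist, act_hist):
        # obs_hist: (B, T, obs_dim), act_hist: (B, T, act_dim), T <= ctx_len
        x = self.embed(torch.cat([obs_hist, act_hist], dim=-1))
        x = x + self.pos[: x.shape[1]]
        # Causal mask: each step may attend only to itself and the past.
        t = x.shape[1]
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.head(h[:, -1])  # predicted next action

policy = CausalTransformerPolicy()
obs = torch.randn(1, 16, 47)
act = torch.randn(1, 16, 19)
print(policy(obs, act).shape)  # torch.Size([1, 19])
```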


Subjects
Robotics, Humans, Aged, Robotics/methods, Locomotion, Walking, Learning, Reinforcement, Psychology
2.
Sci Robot ; 8(79): eadf6991, 2023 Jun 28.
Article in English | MEDLINE | ID: mdl-37379376

ABSTRACT

Semantic navigation is necessary to deploy mobile robots in uncontrolled environments such as homes or hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches reactively map sensor inputs to actions with deep neural networks, whereas modular learning approaches enrich the classical pipeline with learning-based semantic sensing and exploration. However, learned visual navigation policies have predominantly been evaluated in sim, with little known about what works on a robot. We present a large-scale empirical study of semantic visual navigation methods comparing representative methods with classical, modular, and end-to-end learning approaches across six homes with no prior experience, maps, or instrumentation. We found that modular learning works well in the real world, attaining a 90% success rate. In contrast, end-to-end learning does not, dropping from 77% sim to a 23% real-world success rate because of a large image domain gap between sim and reality. For practitioners, we show that modular learning is a reliable approach to navigate to objects: Modularity and abstraction in policy design enable sim-to-real transfer. For researchers, we identify two key issues that prevent today's simulators from being reliable evaluation benchmarks-a large sim-to-real gap in images and a disconnect between sim and real-world error modes-and propose concrete steps forward.

3.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 8754-8765, 2022 Dec.
Article in English | MEDLINE | ID: mdl-30762530

ABSTRACT

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view. We do so by reformulating view consistency using a differentiable ray consistency (DRC) term. We show that this formulation can be incorporated in a learning framework to leverage different types of multi-view observations, e.g., foreground masks, depth, color images, and semantics, as supervision for learning single-view 3D prediction. We present an empirical analysis of our technique in a controlled setting. We also show that this approach allows us to improve over existing techniques for single-view reconstruction of objects from the PASCAL VOC dataset.
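
A minimal sketch of a differentiable ray consistency term of this kind, written for a single ray with assumed per-voxel occupancy probabilities and event costs (not the paper's implementation), might look as follows.

```python
# Sketch of a differentiable ray-consistency term for one ray: event
# probabilities are computed from per-voxel occupancies, and the expected cost
# is differentiable w.r.t. the occupancies. Costs here are illustrative.
import torch

def ray_consistency_loss(occupancy, costs):
    """occupancy: (N,) per-voxel occupancy probabilities along one ray.
    costs: (N + 1,) cost of the ray stopping in each voxel, plus escaping."""
    pass_through = torch.cumprod(1.0 - occupancy, dim=0)      # P(ray passes voxels 0..i)
    reach = torch.cat([torch.ones(1), pass_through[:-1]])     # P(ray reaches voxel i)
    p_stop = reach * occupancy                                # P(ray terminates at voxel i)
    p_escape = pass_through[-1:]                              # P(ray exits the grid)
    probs = torch.cat([p_stop, p_escape])
    return (probs * costs).sum()                              # expected ray cost

occ = torch.rand(8, requires_grad=True)
# Example observation: a foreground mask says the ray should hit the object,
# so escaping (the last event) is the only costly outcome.
costs = torch.cat([torch.zeros(8), torch.ones(1)])
loss = ray_consistency_loss(occ, costs)
loss.backward()
print(occ.grad)   # gradients of the consistency term w.r.t. the occupancies
```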

4.
IEEE Trans Pattern Anal Mach Intell ; 42(6): 1348-1361, 2020 Jun.
Article in English | MEDLINE | ID: mdl-30714908

ABSTRACT

Recently, Convolutional Neural Networks have shown promising results for 3D geometry prediction. They can make predictions from very little input data such as a single color image. A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well. We propose a general framework, called hierarchical surface prediction (HSP), which facilitates prediction of high resolution voxel grids. The main insight is that it is sufficient to predict high resolution voxels around the predicted surfaces. The exterior and interior of the objects can be represented with coarse resolution voxels. This allows us to predict significantly higher resolution voxel grids around the surface, from which triangle meshes can be extracted. Additionally, it allows us to predict properties, such as surface color, that are only defined on the surface. Our approach is not dependent on a specific input type. We show results for geometry prediction from color images and depth images. Our analysis shows that our high resolution predictions are more accurate than low resolution predictions.
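
To illustrate the stated insight, that only cells near the surface need high-resolution prediction, here is a toy coarse-to-fine refinement sketch; the block classifier is a stand-in (an analytic sphere), not the paper's network.

```python
# Toy coarse-to-fine refinement in the spirit of hierarchical surface
# prediction: only cells labeled 'boundary' are subdivided further.
import numpy as np

def refine(center, size, predict, depth, max_depth):
    """Return a list of (center, size, occupancy) leaf cells."""
    label = predict(center, size)          # 'free', 'occupied', or 'boundary'
    if label != 'boundary' or depth == max_depth:
        occ = {'free': 0.0, 'occupied': 1.0, 'boundary': 0.5}[label]
        return [(center, size, occ)]
    leaves = []
    for dx in (-0.25, 0.25):
        for dy in (-0.25, 0.25):
            for dz in (-0.25, 0.25):
                child = center + size * np.array([dx, dy, dz])
                leaves += refine(child, size / 2.0, predict, depth + 1, max_depth)
    return leaves

# Stand-in predictor: a unit sphere; cells crossing its surface are 'boundary'.
def sphere_predict(center, size):
    r = np.linalg.norm(center)
    half_diag = size * np.sqrt(3) / 2.0
    if r + half_diag < 1.0:
        return 'occupied'
    if r - half_diag > 1.0:
        return 'free'
    return 'boundary'

leaves = refine(np.zeros(3), 2.0, sphere_predict, depth=0, max_depth=4)
print(len(leaves), "leaf cells; the fine cells concentrate near the surface")
```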

5.
Proc Natl Acad Sci U S A ; 116(45): 22737-22745, 2019 11 05.
Article in English | MEDLINE | ID: mdl-31636195

ABSTRACT

Computed tomography (CT) of the head is used worldwide to diagnose neurologic emergencies. However, expertise is required to interpret these scans, and even highly trained experts may miss subtle life-threatening findings. For head CT, a unique challenge is to identify, with perfect or near-perfect sensitivity and very high specificity, often small subtle abnormalities on a multislice cross-sectional (three-dimensional [3D]) imaging modality that is characterized by poor soft tissue contrast, low signal-to-noise using current low radiation-dose protocols, and a high incidence of artifacts. We trained a fully convolutional neural network with 4,396 head CT scans performed at the University of California at San Francisco and affiliated hospitals and compared the algorithm's performance to that of 4 American Board of Radiology (ABR) certified radiologists on an independent test set of 200 randomly selected head CT scans. Our algorithm demonstrated the highest accuracy to date for this clinical application, with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.991 ± 0.006 for identification of examinations positive for acute intracranial hemorrhage, and also exceeded the performance of 2 of 4 radiologists. We demonstrate an end-to-end network that performs joint classification and segmentation with examination-level classification comparable to experts, in addition to robust localization of abnormalities, including some that are missed by radiologists, both of which are critically important elements for this application.
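
As a schematic illustration only, not the published architecture, a network can couple per-pixel segmentation and examination-level classification by deriving the exam score from the predicted probability map, for example by max pooling, as sketched below.

```python
# Schematic sketch of joint segmentation and classification (not the published
# model): an exam-level score is pooled from the per-pixel probability map.
import torch
import torch.nn as nn

class TinyJointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(32, 1, 1)       # per-pixel hemorrhage logit

    def forward(self, x):
        seg_prob = torch.sigmoid(self.seg_head(self.features(x)))
        # Examination-level score: the maximum pixel probability.
        exam_score = seg_prob.flatten(1).max(dim=1).values
        return seg_prob, exam_score

net = TinyJointNet()
ct_slice = torch.randn(1, 1, 64, 64)              # toy stand-in for a CT slice
seg, score = net(ct_slice)
print(seg.shape, float(score))                    # (1, 1, 64, 64) and a scalar score
```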


Subjects
Deep Learning, Intracranial Hemorrhages/diagnostic imaging, Tomography, X-Ray Computed/methods, Acute Disease, Algorithms, Humans, Neural Networks, Computer
6.
IEEE Trans Pattern Anal Mach Intell ; 39(1): 128-140, 2017 01.
Article in English | MEDLINE | ID: mdl-26955014

ABSTRACT

We propose a unified approach for bottom-up hierarchical image segmentation and object proposal generation for recognition, called Multiscale Combinatorial Grouping (MCG). For this purpose, we first develop a fast normalized cuts algorithm. We then propose a high-performance hierarchical segmenter that makes effective use of multiscale information. Finally, we propose a grouping strategy that combines our multiscale regions into highly-accurate object proposals by exploring efficiently their combinatorial space. We also present Single-scale Combinatorial Grouping (SCG), a faster version of MCG that produces competitive proposals in under five seconds per image. We conduct an extensive and comprehensive empirical validation on the BSDS500, SegVOC12, SBD, and COCO datasets, showing that MCG produces state-of-the-art contours, hierarchical regions, and object proposals.

7.
IEEE Trans Pattern Anal Mach Intell ; 39(4): 627-639, 2017 04.
Article in English | MEDLINE | ID: mdl-27295654

ABSTRACT

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse spatially to allow precise localization. In contrast, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation, where we improve the state of the art from 49.7 mean APr to 62.4; keypoint localization, where we get a 3.3-point boost over a strong regression baseline using CNN features; and part labeling, where we show a 6.6-point gain over a strong baseline.
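
A minimal sketch of hypercolumn extraction as defined above, assuming a generic torchvision backbone and an arbitrary choice of intermediate layers, is shown below.

```python
# Sketch of hypercolumn extraction: upsample feature maps from several layers
# to the input resolution and stack them per pixel. Backbone and layer choice
# are illustrative, not those of the paper.
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet18()
backbone.eval()
image = torch.randn(1, 3, 224, 224)

# Collect intermediate activations from a few stages with forward hooks.
feats = []
hooks = [m.register_forward_hook(lambda mod, inp, out: feats.append(out))
         for m in (backbone.layer1, backbone.layer2, backbone.layer3)]
with torch.no_grad():
    backbone(image)
for h in hooks:
    h.remove()

# Upsample every feature map to the image resolution and stack along channels.
upsampled = [F.interpolate(f, size=image.shape[-2:], mode="bilinear",
                           align_corners=False) for f in feats]
hypercolumns = torch.cat(upsampled, dim=1)        # (1, 64 + 128 + 256, 224, 224)
pixel_descriptor = hypercolumns[0, :, 100, 100]   # hypercolumn at pixel (100, 100)
print(hypercolumns.shape, pixel_descriptor.shape)
```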

8.
IEEE Trans Pattern Anal Mach Intell ; 39(4): 719-731, 2017 04.
Article in English | MEDLINE | ID: mdl-27254860

ABSTRACT

We address the problem of fully automatic object localization and reconstruction from a single image. This is both a very challenging and very important problem which has, until recently, received limited attention due to difficulties in segmenting objects and predicting their poses. Here we leverage recent advances in learning convolutional networks for object detection and segmentation and introduce a complementary network for the task of camera viewpoint prediction. These predictors are very powerful, but still not perfect given the stringent requirements of shape reconstruction. Our main contribution is a new class of deformable 3D models that can be robustly fitted to images based on noisy pose and silhouette estimates computed upstream and that can be learned directly from 2D annotations available in object detection datasets. Our models capture top-down information about the main global modes of shape variation within a class providing a "low-frequency" shape. In order to capture fine instance-specific shape details, we fuse it with a high-frequency component recovered from shading cues. A comprehensive quantitative analysis and ablation study on the PASCAL 3D+ dataset validates the approach as we show fully automatic reconstructions on PASCAL VOC as well as large improvements on the task of viewpoint prediction.

9.
IEEE Trans Pattern Anal Mach Intell ; 39(3): 546-560, 2017 03.
Article in English | MEDLINE | ID: mdl-27101598

ABSTRACT

Light-field cameras are quickly becoming commodity items, with consumer and industrial applications. They capture many nearby views simultaneously using a single image with a micro-lens array, thereby providing a wealth of cues for depth recovery: defocus, correspondence, and shading. In particular, apart from conventional image shading, one can refocus images after acquisition, and shift one's viewpoint within the sub-apertures of the main lens, effectively obtaining multiple views. We present a principled algorithm for dense depth estimation that combines defocus and correspondence metrics. We then extend our analysis to the additional cue of shading, using it to refine fine details in the shape. By exploiting an all-in-focus image, in which pixels are expected to exhibit angular coherence, we define an optimization framework that integrates photo consistency, depth consistency, and shading consistency. We show that combining all three sources of information: defocus, correspondence, and shading, outperforms state-of-the-art light-field depth estimation algorithms in multiple scenarios.

10.
IEEE Trans Pattern Anal Mach Intell ; 38(4): 690-703, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26959674

ABSTRACT

In this paper, we present a technique for recovering a model of shape, illumination, reflectance, and shading from a single image taken from an RGB-D sensor. To do this, we extend the SIRFS ("shape, illumination and reflectance from shading") model, which recovers intrinsic scene properties from a single image. Though SIRFS works well on neatly segmented images of objects, it performs poorly on images of natural scenes which often contain occlusion and spatially-varying illumination. We therefore present Scene-SIRFS, a generalization of SIRFS in which we model a scene using a mixture of shapes and a mixture of illuminations, where those mixture components are embedded in a "soft" segmentation-like representation of the input image. We use the noisy depth maps provided by RGB-D sensors (such as the Microsoft Kinect) to guide and improve shape estimation. Our model takes as input a single RGB-D image and produces as output an improved depth map, a set of surface normals, a reflectance image, a shading image, and a spatially varying model of illumination. The output of our model can be used for graphics applications such as relighting and retargeting, or for more broad applications (recognition, segmentation) involving RGB-D images.

11.
IEEE Trans Pattern Anal Mach Intell ; 38(6): 1155-69, 2016 06.
Article in English | MEDLINE | ID: mdl-26372203

ABSTRACT

Light-field cameras have now become available in both consumer and industrial applications, and recent papers have demonstrated practical algorithms for depth recovery from a passive single-shot capture. However, current light-field depth estimation methods are designed for Lambertian objects and fail or degrade for glossy or specular surfaces. The standard Lambertian photoconsistency measure considers the variance of different views, effectively enforcing point-consistency, i.e., that all views map to the same point in RGB space. This variance or point-consistency condition is a poor metric for glossy surfaces. In this paper, we present a novel theory of the relationship between light-field data and reflectance from the dichromatic model. We present a physically-based and practical method to estimate the light source color and separate specularity. We present a new photo consistency metric, line-consistency, which represents how viewpoint changes affect specular points. We then show how the new metric can be used in combination with the standard Lambertian variance or point-consistency measure to give us results that are robust against scenes with glossy surfaces. With our analysis, we can also robustly estimate multiple light source colors and remove the specular component from glossy objects. We show that our method outperforms current state-of-the-art specular removal and depth estimation algorithms in multiple real world scenarios using the consumer Lytro and Lytro Illum light field cameras.
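
The contrast between the two measures can be illustrated numerically: under the dichromatic assumption, views of a glossy point vary along a line in RGB space, so point-consistency (variance) is large while line-consistency (residual off the best-fit line) stays small. The data below are synthetic, not light-field measurements.

```python
# Toy numerical contrast between point-consistency and line-consistency for a
# glossy surface point; colors and specular weights are synthetic.
import numpy as np

rng = np.random.default_rng(0)
diffuse = np.array([0.6, 0.3, 0.2])              # diffuse (body) color
light = np.array([1.0, 1.0, 0.9])                # light-source color
# Views of a glossy point: diffuse color plus a view-dependent specular term.
views = diffuse + np.outer(rng.random(25), light) + rng.normal(0.0, 0.01, (25, 3))

def point_consistency(v):
    # Classic measure: variance of the views around their mean.
    return float(np.var(v, axis=0).mean())

def line_consistency(v):
    # Residual energy outside the best-fit line (first principal direction).
    centered = v - v.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return float((s[1:] ** 2).sum() / len(v))

print("point-consistency:", point_consistency(views))  # large: fooled by specularity
print("line-consistency: ", line_consistency(views))   # small: views lie near a line
```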

12.
IEEE Trans Pattern Anal Mach Intell ; 38(1): 142-58, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26656583

ABSTRACT

Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
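
A heavily simplified sketch of this recipe (crop each bottom-up proposal, warp it to a fixed size, extract CNN features, and score it with per-class linear classifiers) is given below; the proposals and classifier weights are random stand-ins rather than selective-search boxes and trained SVMs.

```python
# Simplified R-CNN-style scoring loop; proposals and classifiers are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

cnn = torchvision.models.alexnet()
cnn.classifier[-1] = nn.Identity()              # keep fc7 features, drop the 1000-way layer
cnn.eval()
num_classes = 20
svm_weights = torch.randn(num_classes, 4096)    # stand-in for per-class SVMs

image = torch.randn(3, 480, 640)
proposals = [(50, 60, 200, 220), (300, 100, 460, 300)]   # (x1, y1, x2, y2) stand-ins

scores = []
with torch.no_grad():
    for (x1, y1, x2, y2) in proposals:
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)
        warped = F.interpolate(crop, size=(224, 224), mode="bilinear",
                               align_corners=False)
        feat = cnn(warped)                      # (1, 4096) region features
        scores.append(feat @ svm_weights.T)     # per-class scores for this region
print(torch.cat(scores).shape)                  # (num_proposals, num_classes)
```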

13.
Environ Toxicol Pharmacol ; 40(2): 645-9, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26363987

ABSTRACT

The potential impact of subchronic exposure to aflatoxin B1 on the pharmacokinetic disposition of enrofloxacin was investigated in broiler chickens. Broiler chickens given either a normal or an aflatoxin B1 (750 µg/kg diet) supplemented diet for 6 weeks received a single oral dose of enrofloxacin (10 mg/kg body wt). Blood samples were drawn from the brachial vein at predetermined time intervals after drug administration. Enrofloxacin plasma concentrations analyzed by RP-HPLC were significantly lower in aflatoxin B1-exposed broiler chickens at 0.167, 0.5, and 1.0 h after drug administration. In aflatoxin B1-exposed broiler chickens, the absorption rate constant (ka) of enrofloxacin (0.20 ± 0.05 h⁻¹) was significantly decreased as compared to the unexposed birds (0.98 ± 0.31 h⁻¹). The values of Cmax, tmax, and AUC(0-∞) of enrofloxacin were nonsignificantly increased by 17%, 26%, and 17%, respectively, in aflatoxin-exposed broiler chickens. Subchronic aflatoxin B1 exposure markedly decreased the initial absorption of enrofloxacin without significantly influencing other pharmacokinetic parameters in broiler chickens.
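
For readers unfamiliar with these parameters, the worked sketch below uses a standard one-compartment oral-absorption model to show how ka relates to Cmax, tmax, and AUC; only ka differs between the two curves, and the dose/volume and elimination constants are arbitrary values, not the study's estimates.

```python
# One-compartment oral-absorption sketch; parameter values are arbitrary and
# only illustrate how Cmax, tmax, and AUC are derived from the model.
import numpy as np

def conc(t, dose_per_vd, ka, ke, F=1.0):
    """Plasma concentration under first-order absorption and elimination."""
    return F * dose_per_vd * ka / (ka - ke) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0.0, 24.0, 2401)                  # hours
for label, ka in [("ka = 0.98/h", 0.98), ("ka = 0.20/h", 0.20)]:
    c = conc(t, dose_per_vd=2.0, ka=ka, ke=0.15)  # arbitrary dose/Vd and ke
    cmax, tmax = c.max(), t[c.argmax()]
    auc = np.trapz(c, t)                          # trapezoidal AUC over 0-24 h
    print(f"{label}: Cmax = {cmax:.2f}, tmax = {tmax:.2f} h, AUC = {auc:.1f}")
```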


Subjects
Aflatoxin B1/administration & dosage, Fluoroquinolones/pharmacokinetics, Administration, Oral, Aflatoxin B1/toxicity, Animals, Chickens, Chromatography, High Pressure Liquid, Dietary Supplements, Enrofloxacin, Fluoroquinolones/administration & dosage, Fluoroquinolones/blood, Toxicity Tests, Subchronic
14.
IEEE Trans Pattern Anal Mach Intell ; 37(8): 1670-87, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26353003

ABSTRACT

A fundamental problem in computer vision is that of inferring the intrinsic, 3D structure of the world from flat, 2D images of that world. Traditional methods for recovering scene properties such as shape, reflectance, or illumination rely on multiple observations of the same scene to overconstrain the problem. Recovering these same properties from a single image seems almost impossible in comparison: there are an infinite number of shapes, paint, and lights that exactly reproduce a single image. However, certain explanations are more likely than others: surfaces tend to be smooth, paint tends to be uniform, and illumination tends to be natural. We therefore pose this problem as one of statistical inference and define an optimization problem that searches for the most likely explanation of a single image. Our technique can be viewed as a superset of several classic computer vision problems (shape-from-shading, intrinsic images, color constancy, illumination estimation, etc.) and outperforms all previous solutions to those constituent problems.

15.
J Struct Biol ; 187(1): 66-75, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24694675

ABSTRACT

Tilted electron microscope images are routinely collected for an ab initio structure reconstruction as a part of the Random Conical Tilt (RCT) or Orthogonal Tilt Reconstruction (OTR) methods, as well as for various applications using the "free-hand" procedure. These procedures all require identification of particle pairs in two corresponding images as well as accurate estimation of the tilt axis used to rotate the electron microscope (EM) grid. Here we present a computational approach, PCT (particle correspondence from tilted pairs), based on tilt-invariant context and projection matching that addresses both problems. The method benefits from treating the two problems as a single optimization task. It automatically finds corresponding particle pairs and accurately computes the tilt-axis direction, even in cases where the EM grid is not perfectly planar.


Subjects
IMP Dehydrogenase/ultrastructure, Image Processing, Computer-Assisted/statistics & numerical data, Imaging, Three-Dimensional/statistics & numerical data, Ribosomes/ultrastructure, Cryoelectron Microscopy/instrumentation, Desulfovibrio vulgaris/chemistry, Escherichia coli/chemistry, Imaging, Three-Dimensional/instrumentation, Imaging, Three-Dimensional/methods
16.
IEEE Trans Pattern Anal Mach Intell ; 36(6): 1187-200, 2014 Jun.
Article in English | MEDLINE | ID: mdl-26353280

ABSTRACT

Motion is a strong cue for unsupervised object-level grouping. In this paper, we demonstrate that motion is exploited most effectively if it is considered over larger time windows. In contrast to classical two-frame optical flow, point trajectories that span hundreds of frames are less susceptible to the short-term variations that hinder separating different objects. As a positive side effect, the resulting groupings are temporally consistent over a whole video shot, a property that requires tedious post-processing in the vast majority of existing approaches. We suggest working with a paradigm that starts with semi-dense motion cues first and fills up textureless areas afterwards based on color. This paper also contributes the Freiburg-Berkeley motion segmentation (FBMS) dataset, a large, heterogeneous benchmark with 59 sequences and pixel-accurate ground-truth annotation of moving objects.
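
A toy sketch of the underlying idea, grouping tracked points by the similarity of their motion over many frames rather than by two-frame flow, is shown below; the trajectories are synthetic and the clustering choice (spectral clustering on a velocity affinity) is illustrative rather than the paper's method.

```python
# Toy trajectory-based motion grouping on synthetic tracks.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
T = 50

def make_traj(n, velocity):
    """n point trajectories translating with a fixed velocity over T frames."""
    start = rng.random((n, 2)) * 100.0
    steps = np.arange(T)[None, :, None] * np.asarray(velocity)[None, None, :]
    return start[:, None, :] + steps + rng.normal(0.0, 0.2, (n, T, 2))

traj = np.concatenate([make_traj(30, (1.0, 0.0)), make_traj(30, (-0.5, 0.8))])

# Pairwise motion distance: mean difference of per-frame velocities.
vel = np.diff(traj, axis=1)                              # (N, T-1, 2)
d = np.linalg.norm(vel[:, None] - vel[None, :], axis=-1).mean(axis=-1)
affinity = np.exp(-d ** 2)

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)   # the two translating groups fall into two clusters
```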

17.
IEEE Trans Pattern Anal Mach Intell ; 35(1): 66-77, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22392703

ABSTRACT

We show that a class of nonlinear kernel SVMs admits approximate classifiers with runtime and memory complexity that is independent of the number of support vectors. This class of kernels, which we refer to as additive kernels, includes widely used kernels for histogram-based image comparison like intersection and chi-squared kernels. Additive kernel SVMs can offer significant improvements in accuracy over linear SVMs on a wide variety of tasks while having the same runtime, making them practical for large-scale recognition or real-time detection tasks. We present experiments on a variety of datasets, including the INRIA person, Daimler-Chrysler pedestrians, UIUC Cars, Caltech-101, MNIST, and USPS digits, to demonstrate the effectiveness of our method for efficient evaluation of SVMs with additive kernels. Since its introduction, our method has become integral to various state-of-the-art systems for PASCAL VOC object detection/image classification, ImageNet Challenge, TRECVID, etc. The techniques we propose can also be applied to settings where evaluation of weighted additive kernels is required, which include kernelized versions of PCA, LDA, regression, k-means, as well as speeding up the inner loop of SVM classifier training algorithms.
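
The key idea behind the speedup can be sketched in a few lines: for an additive kernel such as histogram intersection, the decision function decomposes per dimension, so each one-dimensional component can be tabulated once and then evaluated by interpolation, at a cost independent of the number of support vectors. The data and coefficients below are random stand-ins, not a trained classifier.

```python
# Fast evaluation of an intersection-kernel SVM via per-dimension lookup tables.
import numpy as np

rng = np.random.default_rng(0)
n_sv, n_dims, n_bins = 500, 8, 32
support = rng.random((n_sv, n_dims))          # support vectors (histograms in [0, 1])
alpha = rng.normal(size=n_sv)                 # signed dual coefficients (stand-ins)
bias = 0.1

def exact_decision(x):
    # Intersection kernel: K(x, z) = sum_d min(x_d, z_d); cost grows with n_sv.
    return alpha @ np.minimum(support, x).sum(axis=1) + bias

# Precompute per-dimension tables h_d(v) = sum_i alpha_i * min(v, z_{i,d}).
grid = np.linspace(0.0, 1.0, n_bins)
tables = np.array([[alpha @ np.minimum(support[:, d], v) for v in grid]
                   for d in range(n_dims)])   # (n_dims, n_bins)

def fast_decision(x):
    # Piecewise-linear interpolation per dimension; cost independent of n_sv.
    return sum(np.interp(x[d], grid, tables[d]) for d in range(n_dims)) + bias

x = rng.random(n_dims)
print(exact_decision(x), fast_decision(x))    # the two values agree closely
```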


Subjects
Algorithms, Artificial Intelligence, Decision Support Techniques, Image Interpretation, Computer-Assisted/methods, Models, Theoretical, Pattern Recognition, Automated/methods, Support Vector Machine, Computer Simulation
18.
Proc IEEE Int Conf Comput Vis ; 2013: 3448-3455, 2013 Dec.
Article in English | MEDLINE | ID: mdl-26029008

ABSTRACT

We present an algorithm for the per-voxel semantic segmentation of a three-dimensional volume. At the core of our algorithm is a novel "pyramid context" feature, a descriptive representation designed such that exact per-voxel linear classification can be made extremely efficient. This feature not only allows for efficient semantic segmentation but enables other aspects of our algorithm, such as novel learned features and a stacked architecture that can reason about self-consistency. We demonstrate our technique on 3D fluorescence microscopy data of Drosophila embryos for which we are able to produce extremely accurate semantic segmentations in a matter of minutes, and for which other algorithms fail due to the size and high-dimensionality of the data, or due to the difficulty of the task.

19.
Environ Toxicol Pharmacol ; 33(2): 121-6, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22209724

ABSTRACT

The impact of subchronic exposure to aflatoxin B1 on the tissue residues of enrofloxacin and its metabolite ciprofloxacin was examined in broiler chickens. Broiler chickens given either normal or aflatoxin B1 (750 µg/kg diet) supplemented diets for 6 weeks received enrofloxacin (10 mg/kg/day, p.o.) for 4 days, and residue levels were determined thereafter. Aflatoxin B1 induced alterations in serum marker enzymes. Compared with unexposed broiler chickens, enrofloxacin concentrations in aflatoxin B1-exposed broiler chickens were significantly higher in all tissues analyzed (0.62-4.53 µg/g) except muscle at 24 h after termination of enrofloxacin administration. Ciprofloxacin was detectable only in tissues of mycotoxin-exposed broiler chickens. Enrofloxacin residues in liver, kidney, and skin plus fat persisted for 10 days in mycotoxin-exposed broiler chickens, whereas they were detectable only in the liver of unexposed broiler chickens. Our results indicate that subchronic aflatoxin B1 exposure markedly influences the residue levels of enrofloxacin and ciprofloxacin in tissues of broiler chickens.


Subjects
Aflatoxin B1/administration & dosage, Animal Feed, Anti-Infective Agents/pharmacokinetics, Ciprofloxacin/pharmacokinetics, Fluoroquinolones/pharmacokinetics, Adipose Tissue/drug effects, Adipose Tissue/metabolism, Alanine Transaminase/blood, Alkaline Phosphatase/blood, Animals, Aspartate Aminotransferases/blood, Chickens, Drug Residues, Enrofloxacin, Kidney/drug effects, Kidney/metabolism, Liver/drug effects, Liver/metabolism, Muscle, Skeletal/drug effects, Muscle, Skeletal/metabolism, Skin/drug effects, Skin/metabolism, Tissue Distribution
20.
Med Image Comput Comput Assist Interv ; 15(Pt 3): 345-52, 2012.
Article in English | MEDLINE | ID: mdl-23286149

ABSTRACT

In low-resource areas, the most common method of tuberculosis (TB) diagnosis is visual identification of rod-shaped TB bacilli in microscopic images of sputum smears. We present an algorithm for automated TB detection using images from digital microscopes such as CellScope, a novel, portable device capable of brightfield and fluorescence microscopy. Automated processing on such platforms could save lives by bringing healthcare to rural areas with limited access to laboratory-based diagnostics. Our algorithm applies morphological operations and template matching with a Gaussian kernel to identify candidate TB-objects. We characterize these objects using Hu moments, geometric and photometric features, and histograms of oriented gradients and then perform support vector machine classification. We test our algorithm on a large set of CellScope images (594 images corresponding to 290 patients) from sputum smears collected at clinics in Uganda. Our object-level classification performance is highly accurate, with average precision of 89.2% ± 2.1%. For slide-level classification, our algorithm performs at the level of human readers, demonstrating the potential for making a significant impact on global healthcare.
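
A loose sketch of these stages (Gaussian matched filtering to propose candidates, per-candidate features, SVM classification) is given below; the image, features, and training labels are synthetic placeholders rather than the paper's Hu-moment, geometric, photometric, and HOG descriptors.

```python
# Toy candidate-detection-and-classification pipeline on a synthetic image.
import numpy as np
from scipy import ndimage
from sklearn.svm import SVC

rng = np.random.default_rng(1)
image = rng.normal(0.0, 0.1, (128, 128))
image[40:44, 60:70] += 1.0                      # a fake bright rod-shaped object

# 1) Candidate detection: correlate with a Gaussian kernel (a crude matched
#    filter) and keep local maxima above a threshold.
response = ndimage.gaussian_filter(image, sigma=2.0)
peaks = (response == ndimage.maximum_filter(response, size=9)) & (response > 0.2)
candidates = np.argwhere(peaks)                 # (row, col) candidate centers

# 2) Feature extraction: toy per-candidate descriptors (placeholders for the
#    Hu-moment, geometric, photometric, and HOG features used in the paper).
def describe(center, img, half=6):
    r, c = center
    patch = img[max(r - half, 0):r + half, max(c - half, 0):c + half]
    return np.array([patch.mean(), patch.std(), patch.max()])

X = np.array([describe(c, image) for c in candidates])

# 3) Classification: an SVM trained on made-up feature clusters purely to show
#    the interface; real training would use labeled candidate objects.
pos = rng.normal([0.3, 0.4, 1.0], 0.05, (20, 3))
neg = rng.normal([0.0, 0.1, 0.3], 0.05, (20, 3))
clf = SVC(kernel="rbf").fit(np.vstack([pos, neg]),
                            np.r_[np.ones(20), np.zeros(20)])
print("candidate predictions:", clf.predict(X))
```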


Subjects
Microscopy, Fluorescence/instrumentation, Mycobacterium tuberculosis/cytology, Pattern Recognition, Automated/methods, Sputum/cytology, Sputum/microbiology, Tuberculosis/microbiology, Tuberculosis/pathology, Equipment Design, Equipment Failure Analysis, Humans, Point-of-Care Systems, Reproducibility of Results, Sensitivity and Specificity