Pesquisa | BVS - MINISTÉRIO DA SAÚDE

1.

Impact of data on generalization of AI for surgical intelligence applications.

Bar, Omri; Neimark, Daniel; Zohar, Maya; Hager, Gregory D; Girshick, Ross; Fried, Gerald M; Wolf, Tamir; Asselmann, Dotan.

Sci Rep ; 10(1): 22208, 2020 12 17.

Artigo em Inglês | MEDLINE | ID: mdl-33335191

RESUMO

AI is becoming ubiquitous, revolutionizing many aspects of our lives. In surgery, it is still a promise. AI has the potential to improve surgeon performance and impact patient care, from post-operative debrief to real-time decision support. But, how much data is needed by an AI-based system to learn surgical context with high fidelity? To answer this question, we leveraged a large-scale, diverse, cholecystectomy video dataset. We assessed surgical workflow recognition and report a deep learning system, that not only detects surgical phases, but does so with high accuracy and is able to generalize to new settings and unseen medical centers. Our findings provide a solid foundation for translating AI applications from research to practice, ushering in a new era of surgical intelligence.

2.

Mask R-CNN.

He, Kaiming; Gkioxari, Georgia; Dollar, Piotr; Girshick, Ross.

IEEE Trans Pattern Anal Mach Intell ; 42(2): 386-397, 2020 02.

Artigo em Inglês | MEDLINE | ID: mdl-29994331

RESUMO

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron.

3.

Focal Loss for Dense Object Detection.

Lin, Tsung-Yi; Goyal, Priya; Girshick, Ross; He, Kaiming; Dollar, Piotr.

IEEE Trans Pattern Anal Mach Intell ; 42(2): 318-327, 2020 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-30040631

RESUMO

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.

4.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

Ren, Shaoqing; He, Kaiming; Girshick, Ross; Sun, Jian.

IEEE Trans Pattern Anal Mach Intell ; 39(6): 1137-1149, 2017 06.

Artigo em Inglês | MEDLINE | ID: mdl-27295650

RESUMO

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features-using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3] , our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

5.

Object Instance Segmentation and Fine-Grained Localization Using Hypercolumns.

Hariharan, Bharath; Arbelaez, Pablo; Girshick, Ross; Malik, Jitendra.

IEEE Trans Pattern Anal Mach Intell ; 39(4): 627-639, 2017 04.

Artigo em Inglês | MEDLINE | ID: mdl-27295654

RESUMO

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse spatially to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation, where we improve state-of-the-art from 49.7 mean APr to 62.4, keypoint localization, where we get a 3.3 point boost over a strong regression baseline using CNN features, and part labeling, where we show a 6.6 point gain over a strong baseline.

6.

Object Detection Networks on Convolutional Feature Maps.

Girshick, Ross.

IEEE Trans Pattern Anal Mach Intell ; 39(7): 1476-1481, 2017 07.

Artigo em Inglês | MEDLINE | ID: mdl-27541490

RESUMO

Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them "Networks on Convolutional feature maps" (NoCs). We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier. We show by experiments that despite the effective ResNets and Faster R-CNN systems, the design of NoCs is an essential element for the 1st-place winning entries in ImageNet and MS COCO challenges 2015.

7.

Region-Based Convolutional Networks for Accurate Object Detection and Segmentation.

Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra.

IEEE Trans Pattern Anal Mach Intell ; 38(1): 142-58, 2016 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-26656583

RESUMO

Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012-achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.

8.

Generalized Sparselet Models for Real-Time Multiclass Object Recognition.

Song, Hyun Oh; Girshick, Ross; Zickler, Stefan; Geyer, Christopher; Felzenszwalb, Pedro; Darrell, Trevor.

IEEE Trans Pattern Anal Mach Intell ; 37(5): 1001-12, 2015 May.

Artigo em Inglês | MEDLINE | ID: mdl-26353324

RESUMO

The problem of real-time multiclass object recognition is of great practical importance in object recognition. In this paper, we describe a framework that simultaneously utilizes shared representation, reconstruction sparsity, and parallelism to enable real-time multiclass object detection with deformable part models at 5Hz on a laptop computer with almost no decrease in task performance. Our framework is trained in the standard structured output prediction formulation and is generically applicable for speeding up object recognition systems where the computational bottleneck is in multiclass, multi-convolutional inference. We experimentally demonstrate the efficiency and task performance of our method on PASCAL VOC, subset of ImageNet, Caltech101 and Caltech256 dataset.

9.

Efficient human pose estimation from single depth images.

Shotton, Jamie; Girshick, Ross; Fitzgibbon, Andrew; Sharp, Toby; Cook, Mat; Finocchio, Mark; Moore, Richard; Kohli, Pushmeet; Criminisi, Antonio; Kipman, Alex; Blake, Andrew.

IEEE Trans Pattern Anal Mach Intell ; 35(12): 2821-40, 2013 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-24136424

RESUMO

We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, field-of-view cropping, and clothing. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features and parallelizable decision forests, both approaches can run super-real time on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Results on silhouettes suggest broader applicability to other imaging modalities.

Assuntos

Algoritmos , Imageamento Tridimensional , Humanos , Interpretação de Imagem Assistida por Computador , Análise de Regressão

10.

Object detection with discriminatively trained part-based models.

Felzenszwalb, Pedro F; Girshick, Ross B; McAllester, David; Ramanan, Deva.

IEEE Trans Pattern Anal Mach Intell ; 32(9): 1627-45, 2010 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-20634557

RESUMO

We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI--SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.

Assuntos

Algoritmos , Inteligência Artificial , Interpretação de Imagem Assistida por Computador/métodos , Imageamento Tridimensional/métodos , Reconhecimento Automatizado de Padrão/métodos , Análise Discriminante , Aumento da Imagem/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA