Results 1 - 11 of 11
1.
Sensors (Basel) ; 20(20), 2020 Oct 18.
Article in English | MEDLINE | ID: mdl-33080979

ABSTRACT

In order to enable timely actions to prevent major losses of crops caused by lack of nutrients and, hence, increase the potential yield throughout the growing season while at the same time preventing excess fertilization with detrimental environmental consequences, early, non-invasive, and on-site detection of nutrient deficiency is required. Current non-invasive methods for assessing the nutrient status of crops deal in most cases with nitrogen (N) deficiency only, and optical sensors used to diagnose N deficiency, such as chlorophyll meters or canopy reflectance sensors, do not monitor N, but instead measure changes in leaf spectral properties that may or may not be caused by N deficiency. In this work, we study how well nutrient deficiency symptoms can be recognized in RGB images of sugar beets. To this end, we collected the Deep Nutrient Deficiency for Sugar Beet (DND-SB) dataset, which contains 5648 images of sugar beets growing on a long-term fertilizer experiment with nutrient deficiency plots comprising N, phosphorus (P), and potassium (K) deficiency, as well as the omission of liming (Ca), full fertilization, and no fertilization at all. We use the dataset to analyse the performance of five convolutional neural networks for recognizing nutrient deficiency symptoms and discuss their limitations.


Subjects
Beta vulgaris , Deep Learning , Food Analysis/methods , Fertilizers , Nutrients , Sugars
2.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6647-6658, 2023 Jun.
Article in English | MEDLINE | ID: mdl-32886607

ABSTRACT

With the success of deep learning in classifying short trimmed videos, more attention has been focused on temporally segmenting and classifying activities in long untrimmed videos. State-of-the-art approaches for action segmentation utilize several layers of temporal convolution and temporal pooling. Despite the capabilities of these approaches in capturing temporal dependencies, their predictions suffer from over-segmentation errors. In this paper, we propose a multi-stage architecture for the temporal action segmentation task that overcomes the limitations of the previous approaches. The first stage generates an initial prediction that is refined by the next ones. In each stage we stack several layers of dilated temporal convolutions covering a large receptive field with few parameters. While this architecture already performs well, lower layers still suffer from a small receptive field. To address this limitation, we propose a dual dilated layer that combines both large and small receptive fields. We further decouple the design of the first stage from the refining stages to address the different requirements of these stages. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our models achieve state-of-the-art results on three datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
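
The claim that stacked dilated temporal convolutions cover a large receptive field with few parameters can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state the dilation schedule, so the doubling schedule (dilation 2**l at layer l), the kernel size of 3, and the function names are assumptions.

```python
def receptive_field(num_layers, kernel_size=3, dilation_base=2):
    """Number of input frames that influence one output frame.

    Assumption (not stated in the abstract): layer l uses stride 1 and
    dilation dilation_base ** l.
    """
    field = 1
    for layer in range(num_layers):
        dilation = dilation_base ** layer
        field += (kernel_size - 1) * dilation
    return field


def dilated_conv1d(signal, weights, dilation):
    """Same-length 1-D dilated convolution with zero padding (pure Python)."""
    half = len(weights) // 2
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(weights):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(signal):
                total += weight * signal[idx]
        output.append(total)
    return output
```

Under these assumptions, ten layers with three weights per filter already see 2047 frames, whereas ten undilated layers of the same size would see only 21.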

3.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6196-6208, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34125671

ABSTRACT

Action segmentation is the task of predicting the actions for each frame of a video. As obtaining the full annotation of videos for action segmentation is expensive, weakly supervised approaches that can learn only from transcripts are appealing. In this paper, we propose a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network. The two branches of our network predict two redundant but different representations for action segmentation and we propose a novel mutual consistency (MuCon) loss that enforces the consistency of the two redundant representations. Using the MuCon loss together with a loss for transcript prediction, our proposed approach achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference. The MuCon loss proves beneficial even in the fully supervised setting.
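
To make the idea of two redundant representations concrete, the sketch below converts between a frame-wise labeling and a segment-wise one (ordered actions with lengths) and measures how often two frame-wise labelings agree. It is not the MuCon loss, which is a differentiable training objective; the function names and the agreement measure are assumptions made for illustration.

```python
from itertools import groupby


def segments_to_frames(segments):
    """Expand [(label, length), ...] into one label per frame."""
    frames = []
    for label, length in segments:
        frames.extend([label] * length)
    return frames


def frames_to_segments(frames):
    """Collapse per-frame labels into [(label, length), ...] runs."""
    return [(label, len(list(run))) for label, run in groupby(frames)]


def frame_agreement(frames_a, frames_b):
    """Fraction of frames on which two labelings agree."""
    if len(frames_a) != len(frames_b):
        raise ValueError("labelings must cover the same number of frames")
    if not frames_a:
        return 1.0
    matches = sum(a == b for a, b in zip(frames_a, frames_b))
    return matches / len(frames_a)
```

A consistency objective would push the agreement between the two branches toward 1; here it is only computed, not optimized.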


Subjects
Algorithms , Supervised Machine Learning , Neural Networks, Computer
4.
Article in English | MEDLINE | ID: mdl-34914599

ABSTRACT

Many point-based semantic segmentation methods have been designed for indoor scenarios, but they struggle if they are applied to point clouds that are captured by a light detection and ranging (LiDAR) sensor in an outdoor environment. In order to make these methods more efficient and robust such that they can handle LiDAR data, we introduce the general concept of reformulating 3-D point-based operations such that they can operate in the projection space. While we show by means of three point-based methods that the reformulated versions are between 300 and 400 times faster and achieve higher accuracy, we furthermore demonstrate that the concept of reformulating 3-D point-based operations makes it possible to design new architectures that unify the benefits of point-based and image-based methods. As an example, we introduce a network that integrates reformulated 3-D point-based operations into a 2-D encoder-decoder architecture that fuses the information from different 2-D scales. We evaluate the approach on four challenging datasets for semantic LiDAR point cloud segmentation and show that leveraging reformulated 3-D point-based operations with 2-D image-based operations achieves very good results for all four datasets.
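
One common way to obtain a projection space for LiDAR data is a spherical projection of each 3-D point onto a 2-D range image. The abstract does not say which projection is used, so the sketch below is an assumption, as are the image size (64 x 2048), the vertical field of view (+3 to -25 degrees), and the function names.

```python
import math


def project_point(x, y, z, width=2048, height=64,
                  fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map a 3-D point to (row, col, depth) in a spherical range image.

    All sensor parameters are hypothetical defaults, not values from the paper.
    """
    depth = math.sqrt(x * x + y * y + z * z)
    if depth == 0.0:
        raise ValueError("cannot project the sensor origin")
    yaw = math.atan2(y, x)        # horizontal angle
    pitch = math.asin(z / depth)  # vertical angle
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    u = 0.5 * (1.0 - yaw / math.pi) * width        # column coordinate
    v = (1.0 - (pitch - fov_down) / fov) * height  # row coordinate
    col = min(width - 1, max(0, int(math.floor(u))))
    row = min(height - 1, max(0, int(math.floor(v))))
    return row, col, depth


def build_range_image(points, **kwargs):
    """Keep the nearest depth per pixel; returns {(row, col): depth}."""
    image = {}
    for x, y, z in points:
        row, col, depth = project_point(x, y, z, **kwargs)
        if (row, col) not in image or depth < image[(row, col)]:
            image[(row, col)] = depth
    return image
```

Once points live on such a grid, neighborhood queries become fixed-size window lookups instead of searches over the full point cloud, which is one plausible source of the kind of speed-up the abstract reports.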

5.
IEEE Trans Pattern Anal Mach Intell ; 42(2): 413-429, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30418898

ABSTRACT

Since annotating and curating large datasets is very expensive, there is a need to transfer the knowledge from existing annotated datasets to unlabelled data. Data that is relevant for a specific application, however, usually differs from publicly available datasets since it is sampled from a different domain. While domain adaptation methods compensate for such a domain shift, they assume that all categories in the target domain are known and match the categories in the source domain. Since this assumption is violated under real-world conditions, we propose an approach for open set domain adaptation where the target domain contains instances of categories that are not present in the source domain. The proposed approach achieves state-of-the-art results on various datasets for image classification and action recognition. Since the approach can be used for open set and closed set domain adaptation, as well as unsupervised and semi-supervised domain adaptation, it is a versatile tool for many applications.

6.
IEEE Trans Pattern Anal Mach Intell ; 42(4): 765-779, 2020 Apr.
Article in English | MEDLINE | ID: mdl-30582525

ABSTRACT

Action recognition has become a rapidly developing research field within the last decade. But with the increasing demand for large-scale data, the reliance on hand-annotated training data becomes more and more impractical. One way to avoid frame-based human annotation is the use of action order information to learn the respective action classes. In this context, we propose a hierarchical approach to address the problem of weakly supervised learning of human actions from ordered action labels by structuring recognition in a coarse-to-fine manner. Given a set of videos and an ordered list of the occurring actions, the task is to infer start and end frames of the related action classes within the video and to train the respective action classifiers without any need for hand-labeled frame boundaries. We address this problem by combining a framewise RNN model with a coarse probabilistic inference. This combination allows for the temporal alignment of long sequences and thus for an iterative training of both elements. While this system alone already generates good results, we show that the performance can be further improved by adapting the number of subactions to the characteristics of the different action classes as well as by the introduction of a regularizing length prior. The proposed system is evaluated on two benchmark datasets, the Breakfast and the Hollywood extended dataset, showing a competitive performance on various weak learning tasks such as temporal action segmentation and action alignment.

7.
IEEE Trans Pattern Anal Mach Intell ; 38(3): 490-503, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27046493

ABSTRACT

Large image datasets such as ImageNet or open-ended photo websites like Flickr are revealing new challenges to image classification that were not apparent in smaller, fixed sets. In particular, the efficient handling of dynamically growing datasets, where not only the amount of training data but also the number of classes increases over time, is a relatively unexplored problem. In this challenging setting, we study how two variants of Random Forests (RF) perform under four strategies to incorporate new classes while avoiding retraining the RFs from scratch. The various strategies account for different trade-offs between classification accuracy and computational efficiency. In our extensive experiments, we show that both RF variants, one based on Nearest Class Mean classifiers and the other on SVMs, outperform conventional RFs and are well suited for incrementally learning new classes. In particular, we show that RFs initially trained with just 10 classes can be extended to 1,000 classes with an acceptable loss of accuracy compared to training from the full data and with great computational savings compared to retraining for each new batch of classes.
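
A plain Nearest Class Mean (NCM) classifier shows why this family of models suits growing label sets: adding a class only requires computing one more mean. The sketch below is a standalone NCM classifier, not the random-forest variant studied in the paper; the class and method names are hypothetical.

```python
import math


class NearestClassMean:
    """Minimal NCM classifier that accepts new classes at any time."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension sum of samples
        self.counts = {}  # label -> number of samples seen

    def add_samples(self, label, samples):
        """Update the running mean of `label`; creates the class if new."""
        for sample in samples:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(sample)
                self.counts[label] = 0
            self.sums[label] = [s + x for s, x in zip(self.sums[label], sample)]
            self.counts[label] += 1

    def mean(self, label):
        count = self.counts[label]
        return [s / count for s in self.sums[label]]

    def predict(self, sample):
        """Return the label whose mean is closest in Euclidean distance."""
        return min(self.sums,
                   key=lambda label: math.dist(sample, self.mean(label)))
```

For example, a model holding "cat" and "dog" can take `add_samples("bird", ...)` later without revisiting the cat or dog data, which is the kind of computational saving the abstract refers to.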

8.
IEEE Trans Pattern Anal Mach Intell ; 36(11): 2131-43, 2014 Nov.
Article in English | MEDLINE | ID: mdl-26353056

ABSTRACT

In this work, we address the problem of estimating 2D human pose from still images. Articulated body pose estimation is challenging due to the large variation in body poses and appearances of the different body parts. Recent methods that rely on the pictorial structure framework have been shown to be very successful in solving this task. They model the body part appearances using discriminatively trained, independent part templates and the spatial relations of the body parts using a tree model. Within such a framework, we address the problem of obtaining better part templates which are able to handle a very high variation in appearance. To this end, we introduce parts dependent body joint regressors which are random forests that operate over two layers. While the first layer acts as an independent body part classifier, the second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This helps to overcome typical ambiguities of tree structures, such as self-similarities of legs and arms. In addition, we introduce a novel data set termed FashionPose that contains over 7,000 images with a challenging variation of body part appearances due to a large variation of dressing styles. In the experiments, we demonstrate that the proposed parts dependent joint regressors outperform independent classifiers or regressors. The method also performs better than or similar to the state-of-the-art in terms of accuracy, while running at a couple of frames per second.


Subjects
Image Processing, Computer-Assisted/methods , Joints/physiology , Posture/physiology , Algorithms , Databases, Factual , Humans
9.
IEEE Trans Pattern Anal Mach Intell ; 35(11): 2720-35, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24051731

ABSTRACT

Capturing the skeleton motion and detailed time-varying surface geometry of multiple, closely interacting people is a very challenging task, even in a multicamera setup, due to frequent occlusions and ambiguities in feature-to-person assignments. To address this task, we propose a framework that exploits multiview image segmentation. To this end, a probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Given the articulated template models of each person and the labeled pixels, a combined optimization scheme, which splits the skeleton pose optimization problem into a local one and a lower dimensional global one, is applied one by one to each individual, followed by surface estimation to capture detailed nonrigid deformations. We show on various sequences that our approach can capture the 3D motion of humans accurately even if they move rapidly, if they wear wide apparel, and if they are engaged in challenging multiperson motions, including dancing, wrestling, and hugging.


Subjects
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Movement/physiology , Pattern Recognition, Automated/methods , Whole Body Imaging/methods , Humans
10.
IEEE Trans Pattern Anal Mach Intell ; 33(11): 2188-202, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21464503

ABSTRACT

The paper introduces Hough forests, which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve high detection accuracy since the entries of such codebooks are optimized to cast Hough votes with small variance and since their efficiency permits dense sampling of local image patches or video cuboids during detection. The efficacy of Hough forests for a set of computer vision tasks is validated through experiments on a large set of publicly available benchmark data sets and comparisons with the state-of-the-art.
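
The voting step of a generalized Hough transform can be sketched in a few lines: each local patch casts weighted votes for possible object centers, and the accumulator maximum is taken as the detection hypothesis. In a Hough forest the offsets and weights would come from the leaves reached by each patch; in the sketch below they are supplied directly, and the data layout and function names are assumptions.

```python
from collections import Counter


def cast_votes(patches):
    """Accumulate votes for object centers.

    `patches` is an iterable of ((x, y), [((dx, dy), weight), ...]), where
    each offset points from the patch position to a candidate center.
    """
    accumulator = Counter()
    for (x, y), offsets in patches:
        for (dx, dy), weight in offsets:
            accumulator[(x + dx, y + dy)] += weight
    return accumulator


def best_hypothesis(accumulator):
    """Return (center, score) for the accumulator cell with the most votes."""
    return max(accumulator.items(), key=lambda item: item[1])
```

Votes with small variance, as the abstract describes, concentrate in few accumulator cells and therefore produce a sharper maximum.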

11.
IEEE Trans Pattern Anal Mach Intell ; 32(3): 402-15, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20075468

ABSTRACT

In this paper, we propose the combined use of complementary concepts for 3D tracking: region fitting on one side and dense optical flow as well as tracked SIFT features on the other. Both concepts are chosen such that they can compensate for the shortcomings of each other. While tracking by the object region can prevent the accumulation of errors, optical flow and SIFT can handle larger transformations. Whereas segmentation works best in case of homogeneous objects, optical flow computation and SIFT tracking rely on sufficiently structured objects. We show that a sensible combination yields a general tracking system that can be applied in a large variety of scenarios without the need to manually adjust weighting parameters.


Subjects
Image Processing, Computer-Assisted/methods , Movement/physiology , Algorithms , Humans , Models, Statistical