ABSTRACT
This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset, and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. These requirements form the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most sophisticatedly constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested in the VOT2014 challenge on the new dataset with 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art, since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.
ABSTRACT
This paper proposes a method that localizes two surveillance cameras and simultaneously reconstructs object trajectories in 3D space. The method is an extension of the Direct Reference Plane method, which formulates the localization and the reconstruction as a system of linear equations that is globally solvable by Singular Value Decomposition. The method assumes static, synchronized cameras, smooth trajectories, known camera internal parameters, and a known rotation between the cameras in a world coordinate system. The paper describes the method in the context of self-calibrating cameras, where the internal parameters and the rotation can be jointly obtained assuming a man-made scene with orthogonal structures. Experiments with synthetic and real-image data show that the method can recover the camera centers with an error of less than half a meter, even in the presence of a 4-meter gap between the fields of view.