
Context-aware and Attentional Visual Object Tracking


Visual object tracking, i.e., consistently inferring the motion of a desired target from an image sequence, is an essential component bridging low-level image processing and high-level video content analysis. It has been an active and fruitful research topic in the computer vision community for decades, owing both to its versatile practical applications, e.g., human-computer interaction, security surveillance, robotics, medical imaging and multimedia, and to its rich theoretical connections, e.g., Bayesian inference on graphical models, particle filtering, kernel density estimation and machine learning. However, long-term robust tracking in unconstrained environments remains very challenging, and the real-world difficulties are far from conquered. The task faces two core challenges: the computational efficiency constraint, and the enormous, unpredictable variations of targets caused by lighting changes, deformations, partial occlusions, camouflage, rapid motion, imperfect image quality, etc. More critically, tracking algorithms have to cope with these variations in an unsupervised manner. Because target variations in on-line applications are unpredictable, it is extremely hard, if not impossible, to design universal observation models, target-specific or not, in advance. These challenges therefore call for non-stationary target observation models and agile motion estimation paradigms that adapt intelligently to different scenarios.

In this thesis, we focus on enhancing the generality and reliability of object-level visual tracking, striving to handle these enormous variations while respecting the computational efficiency constraint. We first present an in-depth analysis of the chicken-and-egg nature of adapting target observation models on-line directly from previous tracking results.
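Among the theoretical tools named above, particle filtering is the canonical motion-estimation paradigm for trackers of this kind. Purely as a point of reference, a minimal bootstrap particle filter for a one-dimensional state can be sketched as follows; the random-walk dynamics, the Gaussian likelihood and all parameter values are illustrative assumptions, not the models used in the thesis:

```python
import math
import random

def particle_filter_step(particles, weights, measurement,
                         motion_std=0.5, meas_std=0.5):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # Predict: diffuse each particle under a random-walk motion model.
    particles = [p + random.gauss(0.0, motion_std) for p in particles]
    # Update: reweight by a Gaussian measurement likelihood.
    weights = [w * math.exp(-0.5 * ((p - measurement) / meas_std) ** 2)
               for p, w in zip(particles, weights)]
    total = sum(weights) or 1e-300      # guard against total weight collapse
    weights = [w / total for w in weights]
    # Resample (multinomial) to fight weight degeneracy.
    n = len(particles)
    picked = random.choices(range(n), weights=weights, k=n)
    particles = [particles[i] for i in picked]
    weights = [1.0 / n] * n
    return particles, weights

def estimate(particles, weights):
    """Posterior-mean state estimate."""
    return sum(p * w for p, w in zip(particles, weights))

random.seed(0)
particles = [random.gauss(0.0, 1.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for z in range(10):                     # the target drifts one unit per frame
    particles, weights = particle_filter_step(particles, weights, float(z))
print(estimate(particles, weights))     # stays close to, but lags slightly behind, 9
```

The observation model here is a fixed Gaussian; the thesis's point is precisely that such a stationary likelihood breaks down under real-world appearance variations, which motivates the adaptive models discussed next.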
We then propose two novel ideas to combat unpredictable variations: context-aware tracking and attentional tracking. In context-aware tracking, the tracker automatically discovers auxiliary objects whose short-term motion is correlated with the target's; these auxiliary objects serve as spatial context to enhance the target observation model and to verify the tracking results. The attentional tracking algorithms strengthen the observation models by selectively focusing on discriminative regions inside the target, or by adaptively tuning the feature granularity and model elasticity. Context-aware tracking searches for informative context external to the target, whereas attentional tracking identifies discriminative characteristics internal to it, so the two approaches are complementary. Together they tolerate many typical difficult variations and thus greatly enhance the robustness of region-based object trackers. Beyond single-object tracking, we also introduce a game-theoretic view of multiple-target tracking that links joint motion estimation to the Nash equilibrium of a particular game and has linear complexity in the number of targets. Extensive experiments on challenging real-world video sequences demonstrate the strong, promising performance of the proposed object-level visual tracking algorithms.
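To make the context-aware idea concrete, the sketch below, a loose illustration rather than the thesis's actual algorithm, scores candidate auxiliary objects by the similarity of their recent frame-to-frame displacements to the target's, and lets the sufficiently correlated ones predict the target's current position through their spatial offset. The trajectories, the cosine-similarity score and the 0.9 threshold are all assumptions made for this example:

```python
import math

def displacements(traj):
    """Frame-to-frame (dx, dy) displacements of a trajectory."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(traj, traj[1:])]

def motion_correlation(traj_a, traj_b):
    """Cosine similarity of stacked displacement vectors: a cheap
    stand-in for a short-term motion-correlation measure."""
    a = [c for d in displacements(traj_a) for c in d]
    b = [c for d in displacements(traj_b) for c in d]
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def predict_from_context(target_traj, aux_trajs, threshold=0.9):
    """Predict the target's current position from auxiliary objects whose
    past motion correlates with it; each votes via its previous-frame offset."""
    votes = []
    for aux in aux_trajs:
        if motion_correlation(target_traj[:-1], aux[:-1]) >= threshold:
            off_x = target_traj[-2][0] - aux[-2][0]
            off_y = target_traj[-2][1] - aux[-2][1]
            votes.append((aux[-1][0] + off_x, aux[-1][1] + off_y))
    if not votes:
        return None                  # no trustworthy context this frame
    n = len(votes)
    return (sum(v[0] for v in votes) / n, sum(v[1] for v in votes) / n)

# A rigidly co-moving auxiliary object predicts the target exactly,
# while an uncorrelated one is filtered out by the threshold.
target = [(0, 0), (1, 1), (2, 2), (3, 3)]
co_moving = [(5, 0), (6, 1), (7, 2), (8, 3)]
random_walk = [(0, 5), (1, 4), (0, 5), (1, 4)]
print(predict_from_context(target, [co_moving, random_walk]))  # (3.0, 3.0)
```

A prediction like this can then be fused with, or used to verify, the tracker's own estimate; when no auxiliary object passes the correlation threshold, the tracker simply falls back on its internal observation model.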

Last modified: 09/06/2018