STARS - 2016


Section: New Results

Multi-Object Tracking of Pedestrian Driven by Context

Participants: Thi Lan Anh Nguyen, François Brémond, Jana Trojanova.

Keywords: Tracklet fusion, Multi-object tracking

Multi-object tracking (MOT) is essential to many applications in computer vision. Because so many trackers have been proposed in the past, one might expect the tracking task to be solved. This is true for scenarios with a solid background, a low number of objects and few interactions. However, scenarios with appearance changes due to pose variation, abrupt motion changes, and occlusion still represent a major challenge.

In the state of the art, two families of efficient methods address this challenge: data association (local and global) and tracking parameter adaptation. A very popular method for local data association is bipartite matching, whose exact solution can be found with the Hungarian algorithm [85]. Such local methods are computationally inexpensive, but can deal only with short-term occlusion. An example of a global method is the extension of bipartite matching into a network flow [104]. Given the object detections at each frame, a directed acyclic graph is formed and the solution is found with a minimum-cost flow algorithm. Such algorithms reduce trajectory fragments and improve trajectory consistency, but lack robustness to identity switches between close or intersecting trajectories.
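To make the local association step concrete, the following is a minimal sketch of bipartite matching between existing tracks and new detections, solved with a textbook O(n^3) formulation of the Hungarian algorithm. It is an illustration only, not the implementation used in [85] or in this work. The function name `hungarian_assign` and the cost values (imagined here as distances between predicted track positions and detections) are assumptions introduced for the example.

```python
def hungarian_assign(cost):
    """Minimum-cost assignment of rows (tracks) to columns (detections).

    cost is a rectangular list of lists with len(cost) <= len(cost[0]).
    Returns a list where entry i is the column assigned to row i.
    Textbook potential-based Hungarian algorithm, 1-indexed internally.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)   # predecessor column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that at least one new edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


if __name__ == "__main__":
    # Hypothetical track-to-detection costs for three tracks and three detections.
    costs = [[4, 1, 3],
             [2, 0, 5],
             [3, 2, 2]]
    print(hungarian_assign(costs))  # [1, 0, 2], total cost 5
```

Because each frame is matched only to the next one, such an association has no memory of objects that disappear for many frames, which is consistent with the short-term occlusion limitation noted above.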

Figure 8. Our proposed framework.
IMG/framework_lan.png

Another set of methods for MOT is online parameter adaptation [56]. These methods automatically tune the tracking parameters based on context information, whereas the methods mentioned above use one appearance and/or one motion feature for the whole video. In [56], the authors learn the parameters for each scene context offline. In the online phase, the tracking parameters are selected from a database based on the current context of the scene. These parameters are applied to all objects in the scene. Such an approach assumes discriminative appearance and trajectories among individuals, which is not always the case in real scenarios.
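The online selection step can be pictured as a nearest-neighbour lookup in a database of contexts learned offline. The sketch below is only an illustration of that idea: the context descriptor (object density, occlusion level), the parameter names and the Euclidean distance are all assumptions made for the example and are not taken from [56].

```python
import math

# Hypothetical database learned offline: each entry pairs a context
# descriptor (object density, occlusion level) with tracking parameters.
CONTEXT_DB = [
    ((0.1, 0.0), {"appearance_weight": 0.8, "motion_weight": 0.2}),
    ((0.7, 0.5), {"appearance_weight": 0.3, "motion_weight": 0.7}),
]


def select_parameters(context, database=CONTEXT_DB):
    """Return the parameters of the learned context closest to `context`."""
    _, params = min(database, key=lambda entry: math.dist(entry[0], context))
    return params


if __name__ == "__main__":
    # A crowded, partly occluded scene maps to the second learned context.
    print(select_parameters((0.6, 0.4)))
```

In this scheme a single parameter set is returned per scene, which is exactly why it applies to all objects at once.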

To overcome these limitations, we propose a new long-term tracking framework. Its main contributions are:

  • We introduce a new long-term tracking framework that combines short-term data association with online parameter tuning for individual tracklets, in contrast to previous methods, which used the same settings for all tracklets.

  • We show that a large number of parameters can be tuned efficiently via multiple simulated annealing, whereas the previous method could tune only a limited number of parameters and had to fix the rest to make exhaustive search feasible.

  • We define the surrounding context of each tracklet and a similarity metric among tracklets, which allows us to match the learned context with an unseen video set.
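As an illustration of the parameter tuning mentioned in the second contribution, here is a minimal sketch of simulated annealing repeated from several random starting points. It assumes that "multiple simulated annealing" can be read as several independent annealing runs, which the text does not confirm. The function names, the cooling schedule and the toy cost function (standing in for a tracking error measured on training video) are all hypothetical.

```python
import math
import random


def simulated_annealing(cost, start, bounds, rng, steps=2000,
                        t0=1.0, cooling=0.995, step_size=0.1):
    """One annealing run; returns (best_params, best_cost)."""
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(steps):
        # Perturb every parameter and clip it to its allowed range.
        candidate = [
            min(hi, max(lo, x + rng.gauss(0.0, step_size * (hi - lo))))
            for x, (lo, hi) in zip(current, bounds)
        ]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling
    return best, best_cost


def multi_start_annealing(cost, bounds, restarts=5, seed=0, **kwargs):
    """Run several annealing passes from random starts and keep the best."""
    rng = random.Random(seed)
    results = []
    for _ in range(restarts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        results.append(simulated_annealing(cost, start, bounds, rng, **kwargs))
    return min(results, key=lambda result: result[1])


def toy_tracking_error(params):
    """Hypothetical stand-in for a tracking error; minimum at (0.3, 0.7, 0.5)."""
    target = (0.3, 0.7, 0.5)
    return sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    bounds = [(0.0, 1.0)] * 3
    params, error = multi_start_annealing(toy_tracking_error, bounds)
    print(params, error)
```

Unlike an exhaustive grid search, whose cost grows exponentially with the number of parameters, each annealing run evaluates a fixed number of candidates regardless of dimensionality, which is what makes tuning many parameters tractable.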

The proposed framework was trained on 9 public video sequences and tested on 3 unseen sets. It outperforms state-of-the-art pedestrian trackers in scenarios with motion changes, appearance changes and occlusion of objects, as shown in Table 4. The paper has been accepted at the AVSS 2016 conference [39].

Table 4. Tracking performance. The best values are printed in red.
Dataset         Method                                                       MOTA  MOTP  GT  MT (%)  PT (%)  ML (%)
PETS2009        Shitrit et al. [52]                                          0.81  0.58  21  -       -       -
                Bae et al., global association [50]                          0.73  0.69  23  100     0       0.0
                Chau et al. [57]                                             0.62  0.63  21  -       -       -
                Chau [58] ([57] + parameter tuning for whole video context)  0.85  0.71  21  -       -       -
                Ours ([57] + proposed approach)                              0.86  0.73  21  76.2    14.3    9.5
TUD-Stadtmitte  Andriyenko et al. [47]                                       0.62  0.63  9   60.0    20.0    10.0
                Milan et al. [81]                                            0.71  0.65  9   70.0    20.0    0.0
                Chau et al. [57]                                             0.45  0.62  10  60.0    40.0    0.0
                Chau [58] ([57] + parameter tuning for whole video context)  -     -     10  70.0    10.0    20.0
                Ours ([57] + proposed approach)                              0.47  0.65  10  70.0    30.0    0.0
TUD-Crossing    Tang et al. [96]                                             -     -     11  53.8    38.4    7.8
                Chau et al. [57]                                             0.69  0.65  11  46.2    53.8    0.0
                Ours ([57] + proposed approach)                              0.72  0.67  11  53.8    46.2    0.0
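The text does not define the column abbreviations. Assuming they follow the conventions commonly used in MOT benchmarks (MOTA as one minus the ratio of false negatives, false positives and identity switches to ground-truth detections; GT as the number of ground-truth trajectories; MT, PT and ML as the percentages of mostly tracked, partially tracked and mostly lost trajectories), they could be computed roughly as sketched below. The 80% and 20% thresholds and both function names are assumptions, and exact definitions vary slightly between evaluation tools.

```python
def mota(false_negatives, false_positives, id_switches, num_gt_detections):
    """Multi-object tracking accuracy under the usual CLEAR MOT definition."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_detections


def trajectory_breakdown(coverage):
    """Percentages of mostly tracked, partially tracked and mostly lost.

    coverage holds, for each ground-truth trajectory, the fraction of its
    frames in which the object was tracked. Thresholds are assumptions:
    MT if at least 80% is covered, ML if at most 20% is covered.
    """
    total = len(coverage)
    mostly_tracked = sum(1 for c in coverage if c >= 0.8)
    mostly_lost = sum(1 for c in coverage if c <= 0.2)
    partially_tracked = total - mostly_tracked - mostly_lost
    counts = (("MT", mostly_tracked), ("PT", partially_tracked), ("ML", mostly_lost))
    return {name: 100.0 * count / total for name, count in counts}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the experiments reported above.
    print(mota(10, 5, 5, 100))
    print(trajectory_breakdown([0.9, 0.95, 0.5, 0.1]))
```

Under these definitions the three percentages of a row sum to 100, and a higher MT with a lower ML indicates longer uninterrupted trajectories.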