Object Tracking

Paper: SiamRPN++

  1. propose a new model architecture to perform layer-wise and depth-wise aggregations, which not only further improves the accuracy but also reduces the model size.
  2. provide a deep analysis of Siamese trackers and prove that, when using deep networks, the decrease in accuracy comes from the destruction of strict translation invariance.
  3. present a simple yet effective sampling strategy to break the spatial invariance restriction, which makes it possible to successfully train a Siamese tracker driven by a ResNet architecture.
  4. propose a layer-wise feature aggregation structure for the cross-correlation operation, which helps the tracker predict the similarity map from features learned at multiple levels.
  5. propose a depth-wise separable correlation structure to enhance the cross-correlation to produce multiple similarity maps associated with different semantic meanings.

Research Objective

  • Application Area: object tracking, velocity measurement, multi-object analysis

Problem Statement

previous work:

  • Siamese trackers formulate the visual object tracking problem as learning a general similarity map by cross-correlation between the feature representations learned for the target template and the search region.
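The cross-correlation view can be made concrete with a small sketch. The snippet below is a minimal, hypothetical PyTorch illustration (tensor names and sizes are assumptions, not taken from the paper): each template feature map is slid over its search-region feature map to produce a single-channel similarity map, in the spirit of SiamFC/SiamRPN.

```python
import torch
import torch.nn.functional as F

def cross_correlation(template_feat, search_feat):
    """Slide the template features over the search-region features.

    template_feat: (B, C, h, w)  features of the target template (exemplar)
    search_feat:   (B, C, H, W)  features of the search region (H >= h, W >= w)
    returns:       (B, 1, H-h+1, W-w+1)  one similarity map per sample
    """
    b, c, h, w = template_feat.shape
    # Fold the batch into the channel dimension and use grouped convolution so
    # each sample's template only correlates with its own search region.
    search = search_feat.reshape(1, b * c, *search_feat.shape[-2:])
    kernel = template_feat                          # (B, C, h, w): one kernel per sample
    response = F.conv2d(search, kernel, groups=b)   # (1, B, H-h+1, W-w+1)
    return response.permute(1, 0, 2, 3)             # (B, 1, H-h+1, W-w+1)

# Toy usage: 256-channel features, 6x6 template, 22x22 search region.
z = torch.randn(2, 256, 6, 6)
x = torch.randn(2, 256, 22, 22)
print(cross_correlation(z, x).shape)  # torch.Size([2, 1, 17, 17])
```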

Methods

  • Problem Formulation:

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20200422120358518.png

【Question 1】strict translation invariance

  • the spatial-aware sampling strategy effectively alleviates the breaking of the strict translation invariance property caused by networks with padding (see the sketch after the figure below).

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20200422120557975.png
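A minimal sketch of the spatial-aware sampling idea, assuming the common implementation of jittering the search-crop center by a uniform random offset; the function name is illustrative, and the ±64-pixel shift range is one of the settings explored in the paper:

```python
import random

def spatial_aware_shift(center_x, center_y, max_shift=64):
    """Jitter the search-crop center by a uniform random offset so the target
    is no longer always at the center of the search region (and hence of the
    response map). `max_shift` is a hyper-parameter; the paper studies shift
    ranges such as 0, 16, 32, and 64 pixels."""
    dx = random.uniform(-max_shift, max_shift)
    dy = random.uniform(-max_shift, max_shift)
    return center_x + dx, center_y + dy

# In the data loader, crop the search region around the jittered center
# instead of the exact ground-truth target center.
crop_cx, crop_cy = spatial_aware_shift(320.0, 240.0, max_shift=64)
```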

【Question 2】how to transfer a deep network into the tracking algorithm

  • propose the SiamRPN++ network.

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20200422120847811.png

  • layer-wise aggregation: compounding and aggregating representations from multiple levels improves the inference of recognition and localization.

    • features from earlier layers mainly focus on low-level information such as color and shape, which is essential for localization, while features from the latter layers carry rich semantic information that helps in challenging scenarios such as motion blur and huge deformation.
    • the outputs of the three RPN modules share the same spatial resolution, so a weighted sum is applied directly to the RPN outputs (see the fusion sketch after this list): $S_{all}=\sum_{l=3}^{5}\alpha_l S_l,\quad B_{all}=\sum_{l=3}^{5}\beta_l B_l$
  • Depth-wise cross-correlation: objects of the same category have high responses on the same channels (see the correlation sketch after this list).

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20200422121835492.png
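A minimal sketch of the weighted fusion $S_{all}$/$B_{all}$ above, assuming learnable per-level weights held in an nn.Parameter; the softmax normalisation and the class name are assumptions rather than details taken from the paper:

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Weighted sum of same-resolution RPN outputs from conv3/4/5:
    S_all = sum_l alpha_l * S_l (and likewise B_all with beta_l).
    The softmax normalisation of the weights is an assumption."""

    def __init__(self, num_levels=3):
        super().__init__()
        # One learnable combination weight per backbone level,
        # optimised end-to-end together with the rest of the network.
        self.weights = nn.Parameter(torch.ones(num_levels))

    def forward(self, maps):
        alpha = torch.softmax(self.weights, dim=0)
        return sum(a * m for a, m in zip(alpha, maps))

fuse_cls = WeightedFusion()
s3, s4, s5 = (torch.randn(1, 10, 25, 25) for _ in range(3))
s_all = fuse_cls([s3, s4, s5])  # same shape as each input: (1, 10, 25, 25)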
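And a sketch of the depth-wise cross-correlation itself: each channel of the template is correlated only with the matching channel of the search features (grouped convolution), so the output keeps one response map per channel. Tensor names and sizes are illustrative:

```python
import torch
import torch.nn.functional as F

def depthwise_cross_correlation(template_feat, search_feat):
    """Correlate channel i of the template with channel i of the search
    features only, keeping one response map per channel.

    template_feat: (B, C, h, w)
    search_feat:   (B, C, H, W)
    returns:       (B, C, H-h+1, W-w+1)
    """
    b, c, h, w = template_feat.shape
    search = search_feat.reshape(1, b * c, *search_feat.shape[-2:])
    kernel = template_feat.reshape(b * c, 1, h, w)      # one single-channel kernel per channel
    response = F.conv2d(search, kernel, groups=b * c)   # channel-by-channel correlation
    return response.reshape(b, c, *response.shape[-2:])

z = torch.randn(2, 256, 6, 6)
x = torch.randn(2, 256, 22, 22)
print(depthwise_cross_correlation(z, x).shape)  # torch.Size([2, 256, 17, 17])
```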

Notes (follow up to deepen understanding)
