Object Tracking
Paper: SiamRPN++
- propose a new model architecture to perform layer-wise and depth-wise aggregations, which not only further improves the accuracy but also reduces the model size.
- provide a deep analysis of Siamese trackers and prove that, when using deep networks, the decrease in accuracy comes from the destruction of strict translation invariance.
- present a simple yet effective sampling strategy that breaks the spatial invariance restriction, which makes it possible to train a Siamese tracker driven by a ResNet architecture.
- propose a layer-wise feature aggregation structure for the cross-correlation operation, which helps the tracker predict the similarity map from features learned at multiple levels.
- propose a depth-wise separable correlation structure to enhance the cross-correlation to produce multiple similarity maps associated with different semantic meanings.
Research Objective
- Application Area: object tracking, velocity measurement, multi-object analysis
Problem Statement
Previous work:
- Siamese trackers formulate the visual object tracking problem as learning a general similarity map by cross-correlation between the feature representations learned for the target template and the search region.
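The similarity map can be illustrated with a sliding-window cross-correlation. This is a minimal sketch in plain Python, assuming single-channel feature maps stored as nested lists. The names `cross_correlate` and `peak_location` are hypothetical, and a real tracker would correlate multi-channel CNN features rather than toy grids.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (valid positions only) and return
    the map of inner products, i.e. the similarity map."""
    th, tw = len(template), len(template[0])
    out_h = len(search) - th + 1
    out_w = len(search[0]) - tw + 1
    return [
        [
            sum(
                search[i + u][j + v] * template[u][v]
                for u in range(th)
                for v in range(tw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def peak_location(similarity):
    """Return (row, col) of the highest response in the similarity map."""
    positions = (
        (r, c)
        for r in range(len(similarity))
        for c in range(len(similarity[0]))
    )
    return max(positions, key=lambda rc: similarity[rc[0]][rc[1]])


# Toy features: the template pattern sits at rows 1-2, cols 2-3 of the search map.
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

similarity = cross_correlate(search, template)  # 3x3 response map
best = peak_location(similarity)                # where the target is predicted
```

The peak of the response map marks the window of the search region that best matches the template, which is how the tracker localizes the target.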
Methods
- Problem Formulation:
【Question 1】 how to handle the strict translation invariance restriction
- the spatial-aware sampling strategy effectively alleviates the break of the strict translation invariance property caused by networks with padding.
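A rough sketch of what such a sampling step could look like: during training the search crop is moved by a random offset, so the target does not sit at the crop center in every sample. The function names, the crop size of 255, and the shift range of 64 pixels are illustrative assumptions for this sketch, not values taken from these notes.

```python
import random


def sample_shift(max_shift, rng):
    """Draw an integer (dx, dy) offset uniformly from [-max_shift, max_shift]."""
    return (rng.randint(-max_shift, max_shift),
            rng.randint(-max_shift, max_shift))


def shifted_crop_box(target_center, crop_size, max_shift, rng):
    """Return (x0, y0, x1, y1) of a search crop whose center is moved away
    from the target by a random offset."""
    dx, dy = sample_shift(max_shift, rng)
    cx, cy = target_center[0] + dx, target_center[1] + dy
    half = crop_size / 2
    return (cx - half, cy - half, cx + half, cy + half)


def target_offset_in_crop(target_center, box):
    """Position of the target relative to the crop center."""
    crop_cx = (box[0] + box[2]) / 2
    crop_cy = (box[1] + box[3]) / 2
    return (target_center[0] - crop_cx, target_center[1] - crop_cy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
target = (320, 240)     # hypothetical target center in image coordinates
boxes = [
    shifted_crop_box(target, crop_size=255, max_shift=64, rng=rng)
    for _ in range(1000)
]
offsets = [target_offset_in_crop(target, b) for b in boxes]
```

With `max_shift=0` every sample keeps the target at the crop center, which is the setting in which a padded network can learn a center bias. A larger shift spreads the target positions over the crop.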
【Question 2】 how to transfer a deep network into our tracking algorithms
- propose a SiamRPN++ network.
- layer-wise aggregation: compounding and aggregating the representations from different layers improves inference for recognition and localization.
- features from earlier layers mainly focus on low-level information such as color and shape, which is essential for localization; later layers carry rich semantic information, which helps in challenging scenarios such as motion blur and huge deformation.
- the outputs of the three RPN modules have the same spatial resolution, so a weighted sum is applied directly to the RPN outputs: $S_{all}=\sum_{l=3}^{5}a_lS_l,\quad B_{all}=\sum_{l=3}^{5}b_lB_l$
- depthwise cross-correlation: objects in the same category have high responses on the same channels.
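The two aggregation ideas above can be sketched together in plain Python. `depthwise_xcorr` correlates each channel of the search features only with the matching channel of the template features, and `weighted_sum` fuses same-sized maps from several layers. All names are hypothetical, the data are toy values, and normalizing the layer weights with a softmax is an assumption of this sketch rather than something stated in these notes.

```python
import math


def xcorr2d(search, kernel):
    """Valid cross-correlation of one 2-D search map with one 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(search[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(len(search[0]) - kw + 1)
        ]
        for i in range(len(search) - kh + 1)
    ]


def depthwise_xcorr(search, template):
    """Correlate channel c of `search` only with channel c of `template`,
    giving one response map per channel (no summation across channels)."""
    assert len(search) == len(template), "channel counts must match"
    return [xcorr2d(s, t) for s, t in zip(search, template)]


def softmax(values):
    """Normalize raw weights so they are positive and sum to 1 (assumption)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(maps, raw_weights):
    """Fuse same-sized 2-D maps: out = sum_l w_l * maps[l]."""
    weights = softmax(raw_weights)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[r][c] for w, m in zip(weights, maps)) for c in range(cols)]
        for r in range(rows)
    ]


# Depthwise correlation on a 2-channel toy example (3x3 search, 2x2 template).
search = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # channel 0: main diagonal
    [[0, 0, 1], [0, 1, 0], [1, 0, 0]],  # channel 1: anti-diagonal
]
template = [
    [[1, 0], [0, 1]],                   # channel 0
    [[0, 1], [1, 0]],                   # channel 1
]
responses = depthwise_xcorr(search, template)

# Layer-wise fusion of three toy score maps with equal raw weights.
s3, s4, s5 = [[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]
s_all = weighted_sum([s3, s4, s5], raw_weights=[0.0, 0.0, 0.0])
```

Each channel keeps its own response map, so different channels can respond to different semantic patterns. The same `weighted_sum` call would apply to the box-regression maps with a separate set of weights.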
Notes (to study further)
- https://lb1100.github.io/SiamRPN++. Open-source code: pysot