Dynamic Head: Unified Detection Head
TBC
The CNN features fed to the head have shape LCHW, where L is the number of feature levels (layers) that are aggregated, H and W give the spatial size, and C is the number of channels.
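Dynamic Head [1] achieved state-of-the-art object detection results when it came out by combining three attention mechanisms, each acting on one dimension of this tensor: scale-aware attention (L dimension), spatial-aware attention (HW dimension), and task-aware attention (C dimension).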
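To make the LCHW layout concrete, here is a minimal NumPy sketch of how a feature pyramid might be brought into that shape. It assumes the levels are resized to the size of the middle level before stacking; the helper names `resize_nearest` and `stack_pyramid`, the nearest-neighbour resize, and the toy sizes are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, h, w) array to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]

def stack_pyramid(levels):
    """Resize every level to the middle level's size and stack to (L, C, H, W)."""
    _, H, W = levels[len(levels) // 2].shape
    return np.stack([resize_nearest(f, H, W) for f in levels], axis=0)

# Toy pyramid (assumed sizes): 3 levels, 8 channels, 32x32 / 16x16 / 8x8.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, s, s)) for s in (32, 16, 8)]

feats = stack_pyramid(pyramid)             # shape (L, C, H, W)
L, C, H, W = feats.shape

# Flattening H and W into one spatial axis S = H * W gives an (L, S, C) view,
# so each attention can be described as acting on exactly one axis.
flat = feats.reshape(L, C, H * W).transpose(0, 2, 1)
print(feats.shape, flat.shape)
```

With this layout, scale-aware attention reweights along axis L, spatial-aware attention along the flattened HW axis, and task-aware attention along C.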
Scale Aware Attention
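In [1], scale-aware attention averages each level's features over the spatial and channel dimensions, passes the result through a linear function, and squashes it with a hard sigmoid, max(0, min(1, (x + 1) / 2)), to obtain one weight per level. The sketch below is a simplified NumPy illustration of that idea, not the paper's implementation: the linear function is modelled as a per-level scale and shift, and the parameters `weight` and `bias` are hypothetical stand-ins for learned values.

```python
import numpy as np

def hard_sigmoid(x):
    """Hard sigmoid: max(0, min(1, (x + 1) / 2))."""
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def scale_aware_attention(feats, weight, bias):
    """Reweight each level of a (L, C, H, W) tensor by a scalar gate.

    weight, bias: arrays of shape (L,), hypothetical stand-ins for the
    learned linear function applied to the pooled features.
    """
    pooled = feats.mean(axis=(1, 2, 3))            # (L,) mean over C, H, W
    gate = hard_sigmoid(weight * pooled + bias)    # (L,) one weight per level
    return feats * gate[:, None, None, None], gate

# Toy input (assumed): 3 levels of constant features.
feats = np.ones((3, 2, 4, 4))
weight = np.array([1.0, 0.0, -5.0])
bias = np.zeros(3)

out, gate = scale_aware_attention(feats, weight, bias)
print(gate)
```

Because the gate is a single scalar per level, this step only changes how much each scale contributes; it does not alter the spatial or channel structure within a level.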
Spatial Aware Attention
Task Aware Attention
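In [1], task-aware attention acts per channel as the maximum of two affine functions of the feature, max(α1·F_c + β1, α2·F_c + β2), whose coefficients are predicted from the features themselves by a small learned network. The sketch below illustrates only the per-channel max in NumPy; the coefficients are passed in directly as hypothetical inputs rather than predicted, so this is an assumption-laden illustration and not the paper's implementation.

```python
import numpy as np

def task_aware_attention(feats, alpha1, beta1, alpha2, beta2):
    """Apply a per-channel max of two affine branches to a (L, C, H, W) tensor.

    alpha1, beta1, alpha2, beta2: arrays of shape (C,). In the paper these are
    predicted from the features; here they are supplied by the caller.
    """
    shape = (1, -1, 1, 1)                          # broadcast over L, H, W
    branch1 = alpha1.reshape(shape) * feats + beta1.reshape(shape)
    branch2 = alpha2.reshape(shape) * feats + beta2.reshape(shape)
    return np.maximum(branch1, branch2)

# Toy input (assumed): 3 levels, 4 channels, 5x5 spatial size.
C = 4
x = np.random.default_rng(0).standard_normal((3, C, 5, 5))
ones, zeros = np.ones(C), np.zeros(C)

# With (alpha1, beta1, alpha2, beta2) = (1, 0, 0, 0) the operation is a plain ReLU.
relu_like = task_aware_attention(x, ones, zeros, zeros, zeros)
print(np.allclose(relu_like, np.maximum(x, 0.0)))
```

Other coefficient choices switch channels on or off or change their slope, which is how the same head can favour different channels for different tasks.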
References
[1] Dai, Xiyang, et al. “Dynamic head: Unifying object detection heads with attentions.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.