Dynamic Head: Unified Detection Head

Looking Back to Look Forward

Aug 21, 2023


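The feature pyramid produced by a CNN backbone can be treated as one tensor of shape LCHW, where L is the number of pyramid levels being aggregated, HW is the spatial size and C is the number of channels. Because each level has a different resolution, [1] first resizes every level to the spatial size of the median level so the levels can be stacked, and then flattens H×W into S = H×W, giving an L×S×C tensor.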
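The stacking step can be sketched in numpy. This is a minimal illustration, not the code from [1]: it assumes nearest-neighbour resizing as a stand-in for the up/down-sampling used in the paper, and `resize_nearest` and `stack_pyramid` are hypothetical helper names.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) map to (C, out_h, out_w)."""
    c, h, w = x.shape
    rows = (np.arange(out_h) * h) // out_h   # source row for each output row
    cols = (np.arange(out_w) * w) // out_w   # source column for each output column
    return x[:, rows[:, None], cols[None, :]]

def stack_pyramid(levels):
    """Resize every (C, H_l, W_l) level to the median level's size.

    Returns the stacked (L, C, H, W) tensor and its (L, S, C) view, S = H * W.
    """
    sizes = sorted(f.shape[1:] for f in levels)
    h, w = sizes[len(sizes) // 2]                               # median spatial size
    F = np.stack([resize_nearest(f, h, w) for f in levels])     # (L, C, H, W)
    L, C = F.shape[:2]
    F_lsc = F.reshape(L, C, h * w).transpose(0, 2, 1)           # (L, S, C)
    return F, F_lsc
```

For five levels of 64×64 down to 4×4 with 8 channels, the median size is 16×16, so `F` has shape (5, 8, 16, 16) and `F_lsc` has shape (5, 256, 8).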

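When it came out, Dynamic Head [1] achieved state-of-the-art results on COCO object detection by applying three attentions one after another, each acting on a single dimension of the feature tensor: scale-aware attention (L dimension), spatial-aware attention (HW dimension) and task-aware attention (C dimension).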
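Written with the notation of [1], where F is the L×S×C feature tensor and π_L, π_S, π_C are the scale-, spatial- and task-aware attention functions, the combined attention is a nested composition. Read it from the inside out: the scale-aware stage runs first, the spatial-aware stage operates on its output, and the task-aware stage comes last.

```latex
W(F) = \pi_C\Big(\pi_S\big(\pi_L(F)\cdot F\big)\cdot F\Big)\cdot F
```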

Scale Aware Attention
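Following [1], scale-aware attention averages F over its S and C dimensions to get one value per level, passes that through a linear function f (a 1×1 convolution in [1]) and a hard sigmoid σ(x) = max(0, min(1, (x + 1)/2)), and uses the result to re-weight the level: π_L(F)·F = σ(f(mean over S,C of F))·F. The sketch below is a minimal numpy version. It assumes f is a scalar affine map whose `weight` and `bias` are hypothetical stand-ins for the learned parameters.

```python
import numpy as np

def hard_sigmoid(x):
    # sigma(x) = max(0, min(1, (x + 1) / 2)), the hard sigmoid written in [1]
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def scale_aware_attention(F, weight=1.0, bias=0.0):
    """Re-weight each level of an (L, S, C) tensor by one scalar gate.

    `weight` and `bias` are hypothetical placeholders for the learned
    linear function f; here f is assumed to be a scalar affine map.
    """
    pooled = F.mean(axis=(1, 2))                 # (L,): average over S and C
    attn = hard_sigmoid(weight * pooled + bias)  # (L,): one gate per level
    return attn[:, None, None] * F               # broadcast the gate over S and C
```

Each level gets a single scalar gate, so this attention can only emphasize or suppress whole pyramid levels according to their pooled response.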

Spatial Aware Attention
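[1] implements spatial-aware attention as a deformable convolution followed by averaging across levels: π_S(F)·F = (1/L) Σ_l Σ_k w_{l,k} · F(l; p_k + Δp_k; c) · Δm_k, where the K sampling locations p_k are shifted by learned offsets Δp_k and scaled by learned importance scalars Δm_k, both predicted from the median-level features. The sketch below evaluates that sum at one output location using bilinear sampling. It is a simplification under stated assumptions: the offsets, masks and weights are passed in as hypothetical inputs instead of being predicted, and w_{l,k} is a scalar here, whereas a real deformable convolution applies a channel-mixing weight at each kernel position.

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    """Sample a (C, H, W) map at fractional (y, x); points outside count as zero."""
    C, H, W = fmap.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C)
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < H and 0 <= xx < W:
                wgt = (1 - abs(y - yy)) * (1 - abs(x - xx))  # bilinear weight
                out += wgt * fmap[:, yy, xx]
    return out

def spatial_aware_at(F, p, kernel, offsets, masks, weights):
    """Spatial-aware attention output at one location p = (y, x).

    F:       (L, C, H, W) stacked pyramid.
    kernel:  (K, 2) regular sampling positions relative to p, e.g. a 3x3 grid.
    offsets: (K, 2) shifts Delta p_k (hypothetical inputs, learned in [1]).
    masks:   (K,)   importance scalars Delta m_k (hypothetical inputs).
    weights: (L, K) scalar weights w_{l,k} (a simplification).
    Returns a (C,) vector averaged over the L levels.
    """
    L, C = F.shape[:2]
    out = np.zeros(C)
    for l in range(L):
        for k in range(len(kernel)):
            y = p[0] + kernel[k, 0] + offsets[k, 0]
            x = p[1] + kernel[k, 1] + offsets[k, 1]
            out += weights[l, k] * masks[k] * bilinear_sample(F[l], y, x)
    return out / L
```

Because only K points per level are sampled, the attention stays sparse, and the final division by L is the cross-level averaging step.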

Task Aware Attention
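Task-aware attention in [1] is a dynamic-ReLU-style gate on each channel: π_C(F)·F = max(α¹(F)·F_c + β¹(F), α²(F)·F_c + β²(F)), where the per-channel α¹, α², β¹, β² come from a hyper function θ that global-average-pools F over L and S, applies two fully connected layers and a normalization layer, and maps the result to [-1, 1] with a shifted sigmoid. A minimal numpy sketch, assuming a ReLU between the two layers, omitting the normalization layer, and using hypothetical parameter arrays W1, b1, W2, b2:

```python
import numpy as np

def shifted_sigmoid(x):
    # 2 * sigmoid(x) - 1 maps the real line to (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def task_aware_attention(F, W1, b1, W2, b2):
    """Per-channel max of two affine responses for an (L, S, C) tensor.

    W1: (H, C), b1: (H,), W2: (4C, H), b2: (4C,) are hypothetical
    parameters of the two fully connected layers in theta.
    """
    C = F.shape[-1]
    pooled = F.mean(axis=(0, 1))                # (C,): global average over L and S
    hidden = np.maximum(W1 @ pooled + b1, 0.0)  # first FC layer + ReLU (assumed)
    theta = shifted_sigmoid(W2 @ hidden + b2)   # (4C,), each value in (-1, 1)
    alpha1, alpha2, beta1, beta2 = theta.reshape(4, C)
    return np.maximum(alpha1 * F + beta1, alpha2 * F + beta2)  # broadcast over L, S
```

With α¹ = 1, α² = -1 and β¹ = β² = 0 the gate reduces to |F_c|, and with α¹ = 1, α² = 0 and zero betas it is an ordinary ReLU, so the learned θ lets each channel switch between such piecewise-linear responses.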


[1] Dai, Xiyang, et al. “Dynamic Head: Unifying Object Detection Heads with Attentions.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.