Dynamic Head: Unified Detection Head

Looking Back to Look Forward

Aug 21, 2023

TBC

The CNN features that Dynamic Head operates on have shape LCHW, where L is the number of feature levels (layers) being aggregated, H and W give the spatial size, and C is the number of channels.
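To make the layout concrete, here is a minimal NumPy sketch of building such a tensor and flattening the spatial dimensions into a single axis S = H * W, which is the L x S x C view used in [1]. The sizes are hypothetical, and the levels are assumed to have already been resized to a common H x W; random arrays stand in for real backbone features.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
L, C, H, W = 3, 8, 4, 4

# Stand-ins for feature levels already resized to a common H x W (assumption).
levels = [np.random.rand(C, H, W) for _ in range(L)]

# Stack the levels into one LCHW tensor.
features = np.stack(levels, axis=0)                      # (L, C, H, W)

# Flatten space into S = H * W and move channels last.
flat = features.reshape(L, C, H * W).transpose(0, 2, 1)  # (L, S, C)

print(features.shape)  # (3, 8, 4, 4)
print(flat.shape)      # (3, 16, 8)
```

In this view each of the three axes (L, S, C) lines up with one of the three attentions.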

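Dynamic Head [1] achieved state-of-the-art object detection results when it was published by combining three attention mechanisms in a single head: scale-aware attention (over the L dimension), spatial-aware attention (over the HW dimensions), and task-aware attention (over the C dimension).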
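As a rough illustration of how three attentions over separate axes can be chained, the sketch below applies them one after another (L, then S, then C) to a tensor of shape (L, S, C) with S = H * W. It is a simplified stand-in, not the implementation from [1]: the scale gate uses hypothetical scalars `w` and `b` in place of a learned linear layer, the spatial step is a plain per-location sigmoid gate rather than the deformable convolution used in the paper, and the task step takes fixed per-channel coefficients `a1`, `b1`, `a2`, `b2` where the paper predicts them from the features.

```python
import numpy as np


def hard_sigmoid(x):
    # sigma(x) = max(0, min(1, (x + 1) / 2))
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)


def scale_attention(f, w, b):
    # f has shape (L, S, C). Pool over S and C to get one value per level,
    # apply a linear map (hypothetical scalars w, b), then gate each level.
    pooled = f.mean(axis=(1, 2))                  # (L,)
    gate = hard_sigmoid(w * pooled + b)           # (L,)
    return f * gate[:, None, None]


def spatial_attention(f):
    # Simplified stand-in, NOT deformable convolution: one sigmoid gate per
    # (level, location), computed from the channel mean.
    gate = 1.0 / (1.0 + np.exp(-f.mean(axis=2)))  # (L, S)
    return f * gate[:, :, None]


def task_attention(f, a1, b1, a2, b2):
    # Per-channel max of two linear functions (dynamic-ReLU style).
    # The coefficients have shape (C,) and are fixed here (assumption).
    return np.maximum(a1 * f + b1, a2 * f + b2)


def dynamic_head_block(f, w, b, a1, b1, a2, b2):
    # Apply the three attentions sequentially: L, then S, then C.
    f = scale_attention(f, w, b)
    f = spatial_attention(f)
    return task_attention(f, a1, b1, a2, b2)


# Hypothetical sizes and random features, for illustration only.
L, S, C = 3, 16, 8
rng = np.random.default_rng(0)
f = rng.standard_normal((L, S, C))

out = dynamic_head_block(
    f, w=1.0, b=0.0,
    a1=np.ones(C), b1=np.zeros(C), a2=np.zeros(C), b2=np.zeros(C),
)
print(out.shape)  # (3, 16, 8)
```

With `a1 = 1` and the other coefficients set to zero, the task step reduces to an ordinary ReLU, which makes the sketch easy to sanity-check. Each step keeps the (L, S, C) shape, so blocks like this can be stacked.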

Scale Aware Attention

Spatial Aware Attention

Task Aware Attention

References

[1] Dai, Xiyang, et al. “Dynamic Head: Unifying Object Detection Heads with Attentions.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.