Rethinking SENet: Squeeze-and-Excitation Network
SENet [1] is one of the pioneering works that introduced the attention mechanism into the modeling of vision data.
So what is the attention mechanism? In mathematical terms, attention is all about "weighting": some features are weighted more heavily than others. The larger a feature's weight, the more it contributes to the aggregated features of the next layer. Put another way, the network attends more to the features with higher weights and largely neglects the features with lower weights.
If the weights are learned (transformed) from the features themselves, we call it "self-attention".
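As a toy illustration of this idea (not the SENet formulation itself), assume a small set of feature vectors; a single learned linear layer scores each feature, a softmax turns the scores into weights, and the output is the weighted aggregation. The shapes and the scoring layer here are arbitrary choices made only for this sketch:

```python
import torch
import torch.nn as nn

# Toy sketch of "weights learned from the features themselves":
# a learned scoring layer rates each feature, softmax normalizes the scores
# into weights, and the aggregation is dominated by the higher-weighted features.
torch.manual_seed(0)
x = torch.randn(5, 8)               # 5 features, 8 dimensions each (arbitrary sizes)
score = nn.Linear(8, 1)             # learned scoring function (illustrative)
w = torch.softmax(score(x), dim=0)  # one weight per feature, summing to 1
aggregated = (w * x).sum(dim=0)     # weighted aggregation of the input features
print(w.squeeze(-1), aggregated.shape)
```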
SENet learns to focus on the most information-rich channels of the incoming feature map.
The incoming feature map is first squeezed into a vector with one entry per channel (via global average pooling). This vector is then further transformed by a small learned bottleneck of fully connected layers and passed through a sigmoid gate, producing one weight in [0, 1] for each channel.
The original feature map is then multiplied by this vector: each learned scale is broadcast across all spatial positions of the channel with the corresponding index.
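A minimal sketch of such a block in PyTorch might look like the following. The `SEBlock` name, layer sizes, and the reduction ratio of 16 (the paper's default) are illustrative assumptions, not the authors' exact code:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal sketch of a Squeeze-and-Excitation block (channel attention)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling collapses each channel to one scalar.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a bottleneck MLP followed by a per-channel sigmoid gate.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = self.pool(x).view(b, c)        # squeeze: (B, C)
        w = self.fc(s).view(b, c, 1, 1)    # excitation: weights in [0, 1] per channel
        return x * w                       # broadcast each weight over its channel

# Usage sketch: rescale a 64-channel feature map.
feat = torch.randn(2, 64, 32, 32)
out = SEBlock(channels=64)(feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because the gate is a sigmoid rather than a softmax, each channel is scaled independently; the weights do not compete for a fixed budget, they simply suppress or preserve individual channels.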
The superior performance reported in the SENet paper echoes countless experimental results showing that many channels are redundant.
References
[1] Hu, Jie, Li Shen, and Gang Sun. “Squeeze-and-excitation networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.