Neville

Sep 29

Cluster-GCN and Swin Transformer

Looking Back to Look Forward — Long before Swin Transformer [1], Cluster-GCN [2] already used inter-cluster links inside a mini-batch, which facilitates information flow between clusters. TBC [1] Liu, Ze, et al. "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021. [2] Chiang, Wei-Lin, et al. "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019.
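
A rough sketch of the mini-batch idea, using a toy adjacency matrix and a made-up clustering (nothing here comes from the official Cluster-GCN code): sampling several clusters into one batch and slicing the adjacency matrix keeps the edges that run between the sampled clusters.

import numpy as np

# Toy graph: 6 nodes with a few edges, split into two precomputed clusters.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],   # node 1 links cluster 0 to cluster 1
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
clusters = [[0, 1, 2], [3, 4, 5]]

# A mini-batch is the union of several sampled clusters; slicing A with that
# union keeps the between-cluster edges (here the 1-3 edge), so messages can
# still flow across cluster boundaries during training.
batch = clusters[0] + clusters[1]
A_batch = A[np.ix_(batch, batch)]
print(A_batch)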

Swin Transformer

1 min read

Aug 31

Rethinking SENet: Squeeze and Excitation Network

Looking Back to Look Forward — SENet [1] is one of the pioneering works that introduced the attention mechanism into the modeling of vision data. So what is the attention mechanism? In mathematical terms, attention is all about "weighting": it weights some features more than others. The larger the weight, the more influence they will contribute to…
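
A minimal sketch of this "weighting" idea, written as a hypothetical PyTorch squeeze-and-excitation block (the reduction ratio and shapes are illustrative, not taken from the post):

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: re-weight channels by a learned importance score."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),          # weights in (0, 1): larger weight -> more influence
        )

    def forward(self, x):          # x: (N, C, H, W)
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))     # squeeze: global average pooling -> (N, C)
        w = self.fc(s).view(n, c, 1, 1)   # excitation: per-channel weights
        return x * w               # re-weight the features channel-wise

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)        # torch.Size([2, 64, 32, 32])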

Self Attention

2 min read


Aug 30

XGBoost Explained

Looking Back to Look Forward — XGBoost [1] and its successor LightGBM [2] are the de facto industrial standards for coping with many real-life machine learning problems that involve tabular data, like CTR prediction, weather prediction and fraud detection, just to name a few, even in the era of Deep Learning. Efficient algorithm and system design share the…
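
As a hedged illustration of how such a model is typically applied to tabular data (synthetic features and labels, roughly default parameters; not taken from the post):

import numpy as np
from xgboost import XGBClassifier   # scikit-learn style wrapper

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                # synthetic tabular features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic binary label (e.g. click / no-click)

model = XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
model.fit(X, y)
print(model.predict_proba(X[:3]))              # per-row class probabilities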

Xgboost

1 min read


Aug 29

Why Swin Transformer?

Looking Back to Look Forward — The debut of Vision Transformer [1] confirmed the effectiveness of handling vision data with Transformer [3]-like architectures. The success of ViT is attributed to its long-range relationship modeling ability, which can capture the interactions of different image parts from very early on. ViT sparked great enthusiasm among the Computer…
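
To make "long-range modeling from very early on" concrete, here is a small sketch of global self-attention over patch tokens (the token count and embedding size are illustrative assumptions):

import torch
import torch.nn as nn

# An image split into 14 x 14 = 196 patch tokens of dimension 192 (toy sizes).
tokens = torch.randn(1, 196, 192)

# In ViT every block runs global self-attention, so already in layer 1 each patch
# attends to every other patch, no matter how far apart they are in the image.
attn = nn.MultiheadAttention(embed_dim=192, num_heads=3, batch_first=True)
out, weights = attn(tokens, tokens, tokens)
print(out.shape, weights.shape)   # (1, 196, 192) and (1, 196, 196): all-pairs interactions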

Vision Transformer

4 min read


Aug 28

Revisiting OCR: Object Contextual Representations

Looking Back to Look Forward — OCR [1] was mainly motivated by the fact that the label of a pixel is the category of the object that the pixel belongs to. So OCR augments pixel representations with object region representations while, at the same time, learning the weights between the pixels and the object region representations. Logically, OCR augments pixels in three steps: 1. construct a soft region generator by learning from ground-truth segmentation labels
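
A rough sketch of this augmentation idea under made-up shapes (K soft regions, C-dimensional pixel features); it is only meant to mirror the three steps, not the authors' implementation:

import torch

C, K, HW = 256, 19, 64 * 64           # channels, soft regions (classes), pixels (illustrative)
pixels = torch.randn(HW, C)            # pixel representations
regions = torch.softmax(torch.randn(HW, K), dim=1)   # step 1: soft region assignment per pixel

# step 2: object region representations = region-weighted average of pixel features
region_repr = regions.t() @ pixels / regions.sum(dim=0, keepdim=True).t()   # (K, C)

# step 3: weights between pixels and region representations, then augment the pixels
w = torch.softmax(pixels @ region_repr.t() / C ** 0.5, dim=1)               # (HW, K)
augmented = torch.cat([pixels, w @ region_repr], dim=1)                     # (HW, 2C)
print(augmented.shape)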

Object Contextual Rep

1 min read


Aug 25

SAM-Segment Anything

One more step for zero-shot segmentation — SAM tried to build a foundation model for segmentation, which consists of three interconnected parts. Promptable segmentation task: prompts can be fed into the model in various forms, e.g. points, bounding boxes, free-form text or masks; prompts and image labels guide the model to produce reliable and precise segmentation masks. Segmentation model: In…
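
A hedged structural sketch of the three parts, with placeholder modules and shapes of my own choosing (this is not the released SAM code, only the image-encoder / prompt-encoder / mask-decoder flow):

import torch
import torch.nn as nn

# Placeholder modules standing in for SAM's three parts (toy shapes).
image_encoder  = nn.Conv2d(3, 256, kernel_size=16, stride=16)   # heavy image encoder -> image embedding
prompt_encoder = nn.Linear(2, 256)                               # light encoder for e.g. a point prompt
mask_decoder   = nn.Conv2d(256, 1, kernel_size=1)                # light mask decoder

image = torch.randn(1, 3, 256, 256)
point = torch.tensor([[0.3, 0.7]])                               # a point prompt in normalized coords

img_emb     = image_encoder(image)                               # (1, 256, 16, 16)
prompt_emb  = prompt_encoder(point).view(1, 256, 1, 1)           # (1, 256, 1, 1)
mask_logits = mask_decoder(img_emb + prompt_emb)                 # fuse prompt with image embedding
print(mask_logits.shape)                                         # (1, 1, 16, 16) low-res mask logits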

Segmentation

1 min read


Aug 23

A Tour of MAE: Masked AutoEncoder

Looking Back to Look Forward — Why did the progress of self-supervised learning, like masked auto-encoding, in Computer Vision lag behind that of Natural Language Processing (NLP)? Information density: words are information-dense, while pictures are information-redundant, and some missing information can be inferred from nearby pixels. So MAE employs a radically higher mask ratio, like 75%, to…
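
A small sketch of 75% random masking over patch tokens (the ViT-style token count and dimension are illustrative assumptions):

import torch

tokens = torch.randn(1, 196, 768)            # 196 patch tokens (toy ViT sizes)
mask_ratio = 0.75
num_keep = int(196 * (1 - mask_ratio))       # only 49 visible patches go to the encoder

perm = torch.randperm(196)
keep_idx = perm[:num_keep]                   # random subset of visible patches
visible = tokens[:, keep_idx, :]             # the encoder sees just these; the decoder
print(visible.shape)                         # (1, 49, 768) must reconstruct the other 75%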

Mae

1 min read


Aug 22

Grounding DINO Explained

A Small Step Towards Open-Set Object Detection — Open-set detection leverages the learning of language-aware region embeddings, so that each region can be classified into novel categories in a language-aware semantic space [1]. TBC Bullet points: the success of Grounding DINO [2] is attributed to the effective fusion of vision and language modalities from very early on; feature enhancement; language-guided query selection.
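
A hedged sketch of what language-guided query selection might look like: keep the image tokens that align best with the text tokens as decoder queries (shapes and the scoring rule are purely illustrative, not the paper's exact formulation):

import torch

img_tokens  = torch.randn(1, 900, 256)   # flattened image features (toy sizes)
text_tokens = torch.randn(1, 12, 256)    # encoded text / category tokens
num_queries = 100

# Score each image token by its best similarity to any text token,
# then keep the top-scoring tokens as decoder queries.
sim = img_tokens @ text_tokens.transpose(1, 2)            # (1, 900, 12)
scores, _ = sim.max(dim=-1)                               # (1, 900)
topk = scores.topk(num_queries, dim=-1).indices           # indices of the selected queries
queries = torch.gather(img_tokens, 1, topk.unsqueeze(-1).expand(-1, -1, 256))
print(queries.shape)                                      # (1, 100, 256)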

Open Set Object Detection

1 min read


Aug 21

Dynamic Head: Unified Detection Head

Looking Back to Look Forward — TBC CNN features are of shape L×C×H×W, where L is the number of aggregated layers, H×W gives the spatial size and C is the number of channels. Dynamic Head [1] achieved state of the art when it came out by combining scale-aware attention (L dimension), spatial-aware attention (HW dimension) and task-aware attention (C dimension). Scale-Aware Attention Spatial-Aware Attention
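
As an illustration of attending along the three dimensions of an L×C×H×W tensor, here is a toy sketch with made-up weights (not the paper's actual attention modules):

import torch

L, C, H, W = 4, 256, 32, 32
feats = torch.randn(L, C, H, W)               # features from L aggregated pyramid levels

# Scale-aware: one weight per level (L dimension).
scale_w = torch.softmax(torch.randn(L), dim=0).view(L, 1, 1, 1)
# Spatial-aware: one weight per location (H, W dimensions).
spatial_w = torch.sigmoid(torch.randn(1, 1, H, W))
# Task-aware: one weight per channel (C dimension).
task_w = torch.sigmoid(torch.randn(1, C, 1, 1))

out = feats * scale_w * spatial_w * task_w    # sequentially re-weighted along L, HW and C
print(out.shape)                              # torch.Size([4, 256, 32, 32])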

Dynamic Head

1 min read


Aug 20

Rethinking Bilinear CNN

Looking Back to Look Forward — Fine-Grained Image Analysis requires modeling the interactions between various parts of the image. Bilinear CNN [1] can be seen as a special variation of the attention mechanism. TBC References: [1] Lin, Tsung-Yu, Aruni RoyChowdhury, and Subhransu Maji. "Bilinear CNN Models for Fine-Grained Visual Recognition." Proceedings of the IEEE International Conference on Computer Vision, 2015.
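
A minimal sketch of bilinear pooling as "parts interacting with parts", with toy shapes (not the authors' code):

import torch

# Two feature maps over the same image, e.g. from two CNN streams (toy shapes).
HW, C1, C2 = 49, 128, 128
fa = torch.randn(HW, C1)
fb = torch.randn(HW, C2)

# Bilinear pooling: the sum of outer products over all spatial locations.
# Every channel of stream A is paired with every channel of stream B,
# which is why it can be read as a variation of the attention mechanism.
bilinear = fa.t() @ fb / HW        # (C1, C2) pooled second-order descriptor
print(bilinear.shape)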

Fgia

1 min read

