Post

10. 2D CNN Case Study

10. 2D CNN Case Study

2D CNN Case Study


Models

1
2
3
4
5
1. AlexNet
2. ZFNet
3. VGGNet
4. GoogLeNet (Inception v1)
5. ResNet

AlexNet

1. Structure

AlexNet

\begin{table}[h] \centering \begin{tabular}{llll} \hline Layer & Input & Filters & Output
\hline CONV1 & 224x224x3 & 96 filters 11x11, stride 4, pad 1.5 & 55x55x96
POOL1 & 55x55x96 & 3x3, stride 2 & 27x27x96
NORM1 & 27x27x96 & Normalization & 27x27x96
CONV2 & 27x27x96 & 256 filters 5x5, stride 1, pad 2 & 27x27x256
POOL2 & 27x27x256 & 3x3, stride 2 & 13x13x256
NORM2 & 13x13x256 & Normalization & 13x13x256
CONV3 & 13x13x256 & 384 filters 3x3, stride 1, pad 1 & 13x13x384
CONV4 & 13x13x384 & 384 filters 3x3, stride 1, pad 1 & 13x13x384
CONV5 & 13x13x384 & 256 filters 3x3, stride 1, pad 1 & 13x13x256
POOL3 & 13x13x256 & 3x3, stride 2 & 6x6x256
FC6 & 6x6x256 & Fully connected & 4096
FC7 & 4096 & Fully connected & 4096
FC8 & 4096 & Fully connected & 1000
\hline \end{tabular} \caption{AlexNet Architecture} \end{table}

2. Main Idea

  • This model is innovative structure on that years. The main point of model is using non-linear activation, Dropout, parallel learning with GPU and Data Augmentation.

ZFNet

1. Structure

ZFNet

2. Main Idea

  • This model is similiar to AlexNet. The main point of model found that large stride and kernel size is not efficient to performance.

VGGNet

1. Structure

VGGNet

$No\; longer\; 11\; or\; 5\;size\; filters \rightarrow 3\;size\; conv\; filters\; only. $

2. Main Idea

  • This model is similiar to AlexNet. The main point of model found that a number of 3x3 kernel size is more better than 5x5 or 11x11 kernel size because it is more efficient aspect to computational source.

GoogLeNet

1. Structure

GoogLeNet

Inception Structure

Parallel branches:

  • 1×1
  • 3×3
  • 5×5
  • 3×3 MaxPool

Concatenate outputs.

2. Main Idea

  • The main point of model found that a number of 3x3 kernel size makes too much huge parameters and lack of memory when having many layers.
ModelYearDepthParameters (approx.)Key Characteristics
VGG-16201416 layers138 millionRepeated 3×3 convolutions, very large fully connected layers
VGG-19201419 layers144 millionDeeper version of VGG-16
GoogLeNet (Inception v1)201422 layers6.8 millionUses Inception modules with 1×1 convolutions for dimensionality reduction

ResNet

1. Structure

ResNet

Instead of learning H(x), learn residual:

\[H(x) = F(x) + x\]

Residual block:

\[y = F(x) + x\]

Where:

\[F(x) = W_2 \sigma(W_1 x)\]

2. Main Idea

ResNet enables very deep networks by adding identity shortcut connections. If a layer cannot learn a useful transformation, it can simply learn a near-zero residual and behave like an identity mapping. This helps preserve information and allows gradients to flow more easily, mitigating the vanishing gradient problem. Now we can enable very deep networks (100+ layers)

This post is licensed under CC BY 4.0 by the author.