Comprehensive Guide on Convolutional Neural Network
What is a convolutional neural network?
A convolutional neural network, or CNN for short, is a variant of the standard artificial neural network that is often applied to image processing. Contrary to popular belief, standard neural networks can also handle images, but they do so by flattening the input array. For example, a 28-pixel by 28-pixel two-dimensional array is flattened into a one-dimensional vector of size 784. This one-dimensional vector is then fed into the neural network for training.
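The flattening step can be sketched in NumPy. This is a minimal illustration that uses a made-up 28x28 array (pixel values 0 to 783) in place of a real image. Because NumPy flattens row by row, horizontal neighbors stay adjacent in the vector, but vertical neighbors end up 28 positions apart.

```python
import numpy as np

# A stand-in 28x28 grayscale "image" whose pixel values are just 0, 1, ..., 783
image = np.arange(28 * 28).reshape(28, 28)

# Flatten it into a 1-D vector, as a standard fully connected network requires
vector = image.reshape(-1)
print(vector.shape)  # (784,)

# NumPy flattens row by row, so pixel (i, j) moves to index i * 28 + j
i, j = 10, 5
print(vector[i * 28 + j] == image[i, j])  # True

# Horizontal neighbors stay adjacent, but vertical neighbors end up 28 apart
print(np.flatnonzero(vector == image[i, j + 1])[0] - (i * 28 + j))  # 1
print(np.flatnonzero(vector == image[i + 1, j])[0] - (i * 28 + j))  # 28
```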
The problem with this is that we discard positional information. Pixels that are close to one another tend to be related, while pixels that are far apart usually are not, and flattening throws away the information about which pixels are neighbors. A CNN instead preserves the shape of the data: both the input and the output of a convolutional layer are 3-dimensional arrays, whose three dimensions are:
height
width
color channel (3 for RGB and 1 for grayscale)
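In NumPy, such images are simply 3-D arrays. The sketch below uses hypothetical zero-filled images and assumes the (height, width, channel) order listed above. It also shows how to move the channel axis to the front, since the filter shape (FN, C, FH, FW) used later in this guide is channel-first (the layout PyTorch, for example, expects).

```python
import numpy as np

# A hypothetical RGB image stored as (height, width, channel)
rgb = np.zeros((32, 32, 3), dtype=np.float32)
# A hypothetical grayscale image keeps an explicit channel axis of size 1
gray = np.zeros((28, 28, 1), dtype=np.float32)

# Reorder the axes to channel-first: (channel, height, width)
rgb_chw = rgb.transpose(2, 0, 1)
gray_chw = gray.transpose(2, 0, 1)

print(rgb_chw.shape)   # (3, 32, 32)
print(gray_chw.shape)  # (1, 28, 28)
```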
Diagram of convolutional neural network
Here is some terminology:
The input of a convolution layer is called an input feature map
The output of a convolution layer is called an output feature map
The input and output of a convolution layer are collectively called feature maps
Filters
Consider the following parameters:
$H$ is the height of the original input
$W$ is the width of the original input
$P$ is the padding, i.e. the number of rows and columns added to each side of the input
$FH$ is the height of the filter
$FW$ is the width of the filter
$S$ is the stride
The size of the output, $(OH, OW)$, is then:
$$OH = \frac{H + 2P - FH}{S} + 1, \qquad OW = \frac{W + 2P - FW}{S} + 1$$
The parameters must be chosen so that $OH$ and $OW$ are integers. Some deep learning frameworks do not raise an error otherwise, and instead silently round the result (typically down).
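The standard output-size rule, $OH = (H + 2P - FH)/S + 1$ and analogously for $OW$, can be sketched as a small helper. The function name `conv_output_size` is hypothetical, and unlike the rounding behavior mentioned above, this version deliberately raises an error when the result is not an integer.

```python
def conv_output_size(H, W, FH, FW, P=0, S=1):
    """Compute (OH, OW); raise ValueError when the result is not a valid integer size."""
    oh_num = H + 2 * P - FH  # numerator of the OH formula
    ow_num = W + 2 * P - FW  # numerator of the OW formula
    if oh_num < 0 or ow_num < 0:
        raise ValueError("filter is larger than the padded input")
    if oh_num % S != 0 or ow_num % S != 0:
        raise ValueError("OH and OW would not be integers; adjust P, S, FH or FW")
    return oh_num // S + 1, ow_num // S + 1

print(conv_output_size(28, 28, 5, 5))            # (24, 24)
print(conv_output_size(28, 28, 3, 3, P=1, S=1))  # (28, 28)
print(conv_output_size(7, 7, 3, 3, P=0, S=2))    # (3, 3)
```

For instance, a 28x28 input with a 5x5 filter and stride 2 gives $(28 - 5)/2 + 1 = 12.5$, so the helper raises an error where a framework might quietly produce 12.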
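With a single filter, the output is a two-dimensional array of shape $(OH, OW)$, even when the input has several channels: the filter has the same number of channels as the input, and at each position the products over all channels are summed into a single number. To produce several output channels, we apply multiple filters.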
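As a rough illustration, the following NumPy sketch computes the output of a single filter with explicit loops. It assumes a channel-first layout, an input `x` of shape (C, H, W) and a filter `w` of shape (C, FH, FW); the function name `conv2d_single_filter` is hypothetical. Like most deep learning libraries, it does not flip the filter (strictly speaking, it computes a cross-correlation), and it rounds $OH$ and $OW$ down.

```python
import numpy as np

def conv2d_single_filter(x, w, P=0, S=1):
    """Naive convolution of x (C, H, W) with one filter w (C, FH, FW) -> (OH, OW)."""
    C, H, W = x.shape
    _, FH, FW = w.shape
    # Zero-pad only the spatial axes (height and width), P on each side
    x = np.pad(x, ((0, 0), (P, P), (P, P)))
    OH = (H + 2 * P - FH) // S + 1
    OW = (W + 2 * P - FW) // S + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            patch = x[:, i * S:i * S + FH, j * S:j * S + FW]
            # Multiply element-wise across all channels, then sum to one number
            out[i, j] = np.sum(patch * w)
    return out

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # C=2, H=W=4
w = np.ones((2, 3, 3))                                   # one 3x3 filter spanning both channels
out = conv2d_single_filter(x, w)
print(out.shape)  # (2, 2)
```

Although the input has two channels, the result is a single 2-D map.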
Applying $FN$ filters and stacking their outputs yields a block of shape (FN, OH, OW), which is passed on to the next layer. Accordingly, with $C$ denoting the number of input channels, the shape of the filters is written as:
(FN, C, FH, FW)
For instance, if the input has 3 channels and we use fifty 5x5 filters, the shape would be:
(50, 3, 5, 5)
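The filter shape maps directly onto a 4-D NumPy array. This minimal sketch assumes stride 1, no padding and a hypothetical random 3x32x32 input. It builds the fifty 5x5 filters from the example above, counts their weights, and stacks the fifty 2-D outputs into a (FN, OH, OW) block using `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later) and `np.einsum`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Fifty 5x5 filters over a 3-channel input: shape (FN, C, FH, FW)
FN, C, FH, FW = 50, 3, 5, 5
filters = rng.standard_normal((FN, C, FH, FW))
print(filters.shape)  # (50, 3, 5, 5)
print(filters.size)   # 3750 weights in total

# A hypothetical 3-channel 32x32 input in (C, H, W) layout
x = rng.standard_normal((C, 32, 32))

# Every 5x5 window (stride 1, no padding): shape (C, OH, OW, FH, FW) = (3, 28, 28, 5, 5)
windows = sliding_window_view(x, (FH, FW), axis=(1, 2))

# For each filter f, sum over channels and window positions -> one (OH, OW) map per filter
out = np.einsum("chwij,fcij->fhw", windows, filters)
print(out.shape)  # (50, 28, 28)
```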
Just like traditional neural networks, convolutional layers also add a bias to the result of the convolution.
The bias holds a single value per output channel, that is, per filter, and that value is added to every element of the corresponding output channel. Therefore, if a layer has 3 filters (and thus 3 output channels), the bias is a vector of size 3 with possibly three different values.
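A small sketch of how the bias could be added in NumPy, assuming the convolution output has shape (FN, OH, OW) and the bias `b` has one entry per filter. The convolution output is a zero placeholder and the bias values are made up.

```python
import numpy as np

FN, OH, OW = 3, 4, 4
conv_out = np.zeros((FN, OH, OW))   # placeholder convolution output of 3 filters
b = np.array([0.1, -0.2, 0.5])      # one bias value per filter / output channel

# Reshape to (FN, 1, 1) so NumPy broadcasting adds b[k] to every position of channel k
out = conv_out + b.reshape(FN, 1, 1)
print(out.shape)     # (3, 4, 4)
print(out[2, 0, 0])  # 0.5
print(out[1, 3, 2])  # -0.2
```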
Just like standard neural networks, CNNs can process data in batches. The input data then becomes 4-dimensional, with shape $(N, C, H, W)$ where $N$ is the number of samples in the batch, and the output of a convolutional layer has shape $(N, FN, OH, OW)$.
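Putting the pieces together, here is a sketch of a batched forward pass in NumPy (1.20 or later). The function name `conv2d_forward` is hypothetical and this is not any framework's API. It assumes input `x` of shape (N, C, H, W), filters `w` of shape (FN, C, FH, FW), a bias `b` of shape (FN,), zero padding, and $OH$/$OW$ rounded down.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_forward(x, w, b, P=0, S=1):
    """Batched convolution: x (N, C, H, W), w (FN, C, FH, FW), b (FN,) -> (N, FN, OH, OW)."""
    FN, C, FH, FW = w.shape
    # Zero-pad height and width by P on each side
    x = np.pad(x, ((0, 0), (0, 0), (P, P), (P, P)))
    # All FH x FW windows: shape (N, C, H + 2P - FH + 1, W + 2P - FW + 1, FH, FW)
    windows = sliding_window_view(x, (FH, FW), axis=(2, 3))
    # Keep every S-th window along height and width to apply the stride
    windows = windows[:, :, ::S, ::S]
    # For each sample and filter, sum over channels and window positions
    out = np.einsum("nchwij,fcij->nfhw", windows, w)
    # Add one bias value per filter, broadcast over the batch and spatial axes
    return out + b.reshape(1, FN, 1, 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3, 28, 28))  # hypothetical batch of 8 RGB 28x28 inputs
w = rng.standard_normal((50, 3, 5, 5))   # fifty 5x5 filters
b = rng.standard_normal(50)              # one bias per filter
y = conv2d_forward(x, w, b)
print(y.shape)  # (8, 50, 24, 24)
```

Slicing the windows with `::S` keeps $\lfloor (H + 2P - FH)/S \rfloor + 1$ positions, which matches the output-size formula with rounding down.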