Exploring advancements in lightweight CNNs: diving into the roles of Residual Blocks, Residual Connections, Dilated & Grouped Convolutions

Table of Contents

1.1. Shortcut Connection

He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). cited 179932

From the perspective of performance enhancement rather than resolving flaws in previous research: the block output is changed from f(x) to f(x) + x:

  • It addresses the vanishing/exploding gradient problem, where gradients keep shrinking or growing as the network gets deeper.
  • This leads to performance improvements in deep networks, efficient convergence, and compatibility with various network architectures.

However:

  • Since it is a technique intended for deep networks, the advantages are not clearly evident in our wearable-robot research, which uses only six layers.

Experience with implementation: Yes. From the perspective of applying to our research:

  • There is potential for improved generalization performance. (It is only considered potential because the results vary depending on the dataset and the specific problem.)

[Figure: residual connection / shortcut connection]
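As a concrete illustration, here is a minimal sketch of a shortcut connection in PyTorch (the module name `ShortcutBlock` and the channel count are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class ShortcutBlock(nn.Module):
    """A minimal residual unit: output = f(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                       # the shortcut path
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))    # f(x)
        return self.relu(out + identity)   # f(x) + x

# usage: output shape matches the input shape
x = torch.randn(1, 32, 56, 56)
y = ShortcutBlock(32)(x)
```

Because the shortcut is an identity mapping, the gradient can flow through the addition unchanged, which is what keeps very deep stacks trainable.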

1.2. Residual Block

He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). cited 179932

From the perspective of performance enhancement rather than addressing the shortcomings of previous research: a bottleneck structure in which the number of channels is first reduced and then expanded back (wide → narrow → wide):

  • It reduces computational demands and the number of parameters.

Implementation experience: Yes. From our research application perspective:

  • Ultimately, the paper proposed this technique in order to stack more than 100 layers. In the six-layer structure we are dealing with, performance improvements may therefore not be observed.

[Figure: residual block]
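For reference, a minimal sketch of the bottleneck idea in PyTorch; the `Bottleneck` module and the reduction factor here are illustrative (the paper uses, for example, 256 → 64 → 256):

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand, plus shortcut."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction                # the narrow middle
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # the expensive 3x3 convolution runs on the reduced channel count
        return self.relu(self.body(x) + x)
```

The point of the design is that the costly 3×3 convolution operates only on the narrow middle, which is what cuts parameters and computation.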

2. Dilated Convolution

Yu, F. and Koltun, V., 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122. cited 9156
Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F. and Adam, H., 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV) (pp. 801-818). cited 11675

Dilated Convolution: A Quick Dive

When diving into advanced neural networks, you might come across a term called “dilated convolution”. It might sound complex, but it’s a nifty way to make your network see more without adding more layers.

How Does It Work? Imagine a regular convolution sliding over an image. Now, instead of looking at pixels right next to each other, the convolution skips some pixels in between: with a dilation rate of r, the kernel samples every r-th pixel. This ‘skip’ or ‘dilation’ allows the network to capture a broader context without increasing the filter size or the number of weights.

Why Use Dilated Convolution?

  1. Wider Receptive Field: Without adding more layers or bigger filters, you can make your network capture a more extensive part of the input.
  2. Preserve Resolution: Often, deep networks reduce resolution through pooling. Dilated convolutions can help keep the resolution intact.

In a Nutshell: Dilated convolution is like giving your network binoculars. It allows it to see a more extensive part of the picture without moving closer.
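A minimal PyTorch sketch, mainly to show where the dilation parameter goes (the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 covers a 5x5 region while still using only 9 weights.
# Setting padding equal to the dilation keeps the spatial resolution unchanged.
dilated = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3,
                    dilation=2, padding=2)

x = torch.randn(1, 16, 64, 64)
print(dilated(x).shape)   # torch.Size([1, 16, 64, 64]) -> resolution preserved
```

Stacking such layers with increasing dilation rates grows the receptive field exponentially while the parameter count grows only linearly.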

3. Depthwise Separable Convolution

Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. and Adam, H., 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. cited 21086

What Is It? Depthwise separable convolution, as the name suggests, splits the convolution along the depth (channel) dimension of the input. Instead of performing the convolution in one go, it operates in two steps: a depthwise convolution that filters each channel separately across height and width, followed by a 1×1 pointwise convolution that mixes information across channels.

Why It Matters?

  1. Reduced Computation: Splitting the convolution this way cuts the computation substantially, by roughly a factor of 8 to 9 for 3×3 kernels (see the parameter count in the sketch below).
  2. Efficiency with Less: It allows networks to achieve similar accuracy as famous models but with fewer multiplications and parameters.
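A rough sketch in PyTorch (channel counts are illustrative), with a quick parameter count that shows where the 8-9x figure comes from for 3×3 kernels:

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

cin, cout = 64, 128

# Standard 3x3 convolution
standard = nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False)

# Depthwise separable: per-channel 3x3 (groups=cin) followed by a 1x1 pointwise conv
separable = nn.Sequential(
    nn.Conv2d(cin, cin, kernel_size=3, padding=1, groups=cin, bias=False),  # depthwise
    nn.Conv2d(cin, cout, kernel_size=1, bias=False),                        # pointwise
)

print(count_params(standard))   # 64 * 128 * 9      = 73728
print(count_params(separable))  # 64 * 9 + 64 * 128 = 8768  (about 8.4x fewer)
```

The same ratio applies to multiply-adds, since both layers slide over the same spatial positions.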

Our Experience & Research Application:

  • With the same number of parameters, it’s possible to widen the channels.
  • Increasing the layer count isn’t feasible since performance drops with more than six convolution layers.
  • Especially in constrained computing environments like Jetson Nano, utilizing depthwise convolution can reduce computation and provide the option to expand layers, pointing towards potential performance improvements.

In Short: Depthwise convolution is like an efficiency expert for neural networks, allowing them to do more with less!

[Figure: depthwise separable convolution (dw)]

4. Grouped Convolution

Zhang, X., Zhou, X., Lin, M. and Sun, J., 2018. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6848-6856). cited 6628
Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25. cited 120329

Convolutional operations are central to the field of deep learning, particularly in the area of computer vision. The classic convolution, while powerful, often demands a lot of computational resources. Enter: Grouped Convolution. This technique holds the potential to transform the way convolution operations are carried out.

What is Grouped Convolution? Grouped convolution involves splitting the channels into multiple groups and then performing convolution independently within each group.
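In PyTorch, this corresponds to the `groups` argument of `nn.Conv2d`; a minimal sketch with illustrative sizes:

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

cin, cout, g = 64, 64, 4

standard = nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False)
grouped  = nn.Conv2d(cin, cout, kernel_size=3, padding=1, groups=g, bias=False)

print(count_params(standard))  # 64 * 64 * 9      = 36864
print(count_params(grouped))   # (64/4) * 64 * 9  = 9216  -> roughly 1/g of the standard conv
```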

Benefits:

  1. Reduced Computational Demand: The primary advantage is the reduction in the amount of computation. By limiting the convolution to independent groups, we drastically cut down on the number of operations.
  2. Parallel Processing: This approach lends itself nicely to parallel processing. If one has access to multiple GPUs, it’s possible to process each group on a separate GPU.
  3. Fewer Parameters and Operations: Compared to a standard convolution with the same input and output channels, a grouped convolution with g groups needs roughly 1/g of the parameters and operations.

Challenges: The main challenge with grouped convolution is the absence of information exchange between the groups. This might cause loss of some important feature representations.

The Solution: Channel Shuffle Algorithm: To combat the potential loss of feature representations between groups, the channel shuffle algorithm can be employed. The idea is to interleave the channels across groups so that the next grouped convolution sees information from every group. This helps ensure that there is no significant loss of information when using grouped convolution.
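A minimal sketch of the channel shuffle idea (the reshape/transpose implementation and the tensor sizes here are illustrative):

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups: (N, C, H, W) -> same shape, channels permuted."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # swap the group and per-group channel dims
    return x.view(n, c, h, w)                  # flatten back to (N, C, H, W)

x = torch.randn(1, 8, 4, 4)
y = channel_shuffle(x, groups=2)               # channel order becomes 0,4,1,5,2,6,3,7
```

Placed between two grouped convolutions, this permutation gives each group access to features computed by every other group.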

Personal Note: While I haven’t had hands-on experience implementing this, from a research application perspective, the ability to reduce computation means we can have broader channels for the same number of parameters.

[Figure: grouped convolution]