2 Comments

Congrats on the great work you are doing!

In typical diagrams of CNNs, it seems that the sliding window that scans the input image and feeds a conv layer in the first group of layers uses the same set of weights for each feature map. For example, a 4x4 sliding window requires 16 weights. Assuming 20 feature maps in the first conv group, we have 4x4x20=320 weights, right?
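The arithmetic above can be sanity-checked with a quick sketch. This assumes a single-channel (grayscale) input, which the comment implies, and ignores bias terms (which would add one extra parameter per feature map):

```python
# Weights in a first conv layer = kernel_h * kernel_w * in_channels * out_channels.
# Assumption: single-channel input; bias terms ignored.
kernel_h, kernel_w = 4, 4
in_channels = 1       # grayscale input (assumed)
out_channels = 20     # 20 feature maps in the first conv group
weights = kernel_h * kernel_w * in_channels * out_channels
print(weights)  # 320
```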

Then, in the second group of conv layers it seems that each layer gets input from a volume (3D) of neurons that includes part of every layer of the previous group of layers. Right? What happens with the weights there? Reading 20 layers at the same time with a, say, 3x3 sliding window results in 3x3x20=180 weights. Having, say, 20 layers in the second conv group gives 180x20=3600 weights. Is that right or am I missing something?

For example, in the visualisation, a layer in conv_1_2 gets input from 10 feature maps of the previous stage. Each map there is scanned by a 3x3 sliding window, that is, 3x3x10=90 weights per filter. Given that conv_1_2 has 10 feature maps, we have 90x10=900 trainable parameters between conv_1_1 and conv_1_2. Right?
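Both counts above follow the same formula: each of the out_channels filters spans the full depth (in_channels) of the previous layer's output volume. A minimal sketch, again ignoring bias terms:

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Weights in a conv layer: each filter covers kernel_h x kernel_w
    spatial positions across all in_channels input maps."""
    return kernel_h * kernel_w * in_channels * out_channels

# Second conv group from the comment: 3x3 window over 20 maps, 20 filters.
print(conv_weights(3, 3, 20, 20))  # 3600

# conv_1_1 -> conv_1_2 from the visualisation: 3x3 window over 10 maps, 10 filters.
print(conv_weights(3, 3, 10, 10))  # 900
```

With bias terms included, each layer would gain out_channels additional parameters (e.g. 900 + 10 = 910 for conv_1_2).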


One of the most informative newsletters I have ever subscribed to.

Thanks Avi!
