11 Comments

There are two ways it can be handled. The dropout paper says that during inference, the activations of that layer are multiplied by the retention probability p. In contrast, implementations in libraries such as PyTorch compensate during training by scaling the activations. Great write-up. So true that many don't know about the scaling factors, and many senior people who have read the paper say that only the inference-time compensation is correct.
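
A minimal sketch of the two conventions (hypothetical helper functions, not the paper's or PyTorch's actual source), assuming p denotes the retention probability as in the paper:

```python
import torch

def paper_style_dropout(x, keep_p, training):
    # Original-paper convention: drop units during training with no scaling,
    # then compensate at inference by multiplying the activations by keep_p.
    if training:
        mask = (torch.rand_like(x) < keep_p).float()
        return x * mask
    return x * keep_p

def inverted_dropout(x, drop_p, training):
    # Library-style "inverted" dropout: survivors are scaled by 1 / (1 - drop_p)
    # during training, so inference needs no adjustment at all.
    if training:
        mask = (torch.rand_like(x) >= drop_p).float()
        return x * mask / (1.0 - drop_p)
    return x
```

In expectation both produce the same activations, which is why either compensation scheme is valid.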

Totally, Jaiprasad. Both ways are pretty widely used. Thanks for the appreciation :)

Avi, TBH it would be very interesting if you'd create a blog post listing the practical ML questions (with answers) you've asked candidates during interviews!

Never knew this. Thanks a lot for the insights. Amazing that these techniques are implemented in the train and eval methods.

The best ML newsletter! Thanks Avi

Thanks so much, Damien :)

Hello,

As mentioned by Jaiprasad R in the comments, the dropout paper describes multiplying w by p during the evaluation phase (all the neurons are present during evaluation, but the weights w are multiplied by p). I am new to PyTorch and TensorFlow, so I am not sure how they do it.
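
(A minimal sketch, using only the public torch.nn.Dropout API, of how the libraries handle this: the compensation happens during training, so evaluation mode is a no-op.)

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # here p is the probability of zeroing a unit
x = torch.ones(8)

drop.train()               # training mode: some entries become 0.0,
print(drop(x))             # survivors are scaled to 1 / (1 - p) = 2.0

drop.eval()                # evaluation mode: dropout does nothing,
print(drop(x))             # all entries remain 1.0
```

So the w * p multiplication from the paper never appears explicitly; the equivalent compensation has already happened during training.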

Been following dailydoseofds for a long time. Thanks for all the great work!

Can you please tell me which online tool you are using to create these beautiful and eye-catching images?

Great article. Didn't know this concept. Thanks, Avi.

Great write-up.

I like this article. Can you suggest some resources from which we can gain a comprehensive understanding of ML? Maybe a book or something.

Great write-up, Avi!
