- How did we get to a single vector of activations in the CNNs used for MNIST in previous chapters? Why isn’t that suitable for Imagenette?
- What do we do for Imagenette instead?
- What is “adaptive pooling”?
- What is “average pooling”?
- Why do we need `Flatten` after an adaptive average pooling layer?
- What is a “skip connection”?
- Why do skip connections allow us to train deeper models?
- What is “identity mapping”?
- What is the basic equation for a ResNet block (ignoring batchnorm and ReLU layers)?
- What do ResNets have to do with residuals?
- How do we deal with the skip connection when there is a stride-2 convolution? How about when the number of filters changes?
- How can we express a 1×1 convolution in terms of a vector dot product?
- Create a 1×1 convolution with `nn.Conv2d` and apply it to an image. What happens to the shape of the image?
- What does the `noop` function return?
- Explain what is shown in <>.
- What is the “stem” of a CNN?
- Why do we use plain convolutions in the CNN stem, instead of ResNet blocks?
- How does a bottleneck block differ from a plain ResNet block?
- Why is a bottleneck block faster?
- How do fully convolutional nets (and nets with adaptive pooling in general) allow for progressive resizing?
- Try creating a fully convolutional net with adaptive average pooling for MNIST (note that you’ll need fewer stride-2 layers). How does it compare to a network without such a pooling layer?
- In <> we introduce Einstein summation notation. Skip ahead to see how this works, and then write an implementation of the 1×1 convolution operation using `torch.einsum`. Compare it to the same operation using `torch.conv2d`.
- Write a "top-5 accuracy" function using plain PyTorch or plain Python.
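As a hint for the adaptive pooling questions: adaptive average pooling collapses a feature map of any spatial size down to a fixed size, which is why it needs a `Flatten` before the final linear layer. A minimal sketch in plain PyTorch (the sizes here are illustrative, not from the book):

```python
import torch
import torch.nn as nn

# AdaptiveAvgPool2d(1) collapses any HxW feature map to 1x1, so the
# head works for any input size; Flatten then drops the trailing 1x1
# dims so a Linear layer can consume the activations.
pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
x = torch.randn(2, 512, 7, 7)   # batch of 2, 512 channels, 7x7 grid
out = pool(x)
print(out.shape)                # torch.Size([2, 512])
```

Without the `Flatten`, the output would be `(2, 512, 1, 1)`, which a `Linear` layer cannot take directly.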
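For the ResNet block questions, the basic equation (ignoring batchnorm) is `output = x + conv2(conv1(x))`. One possible minimal sketch, with padding chosen so the skip connection is a plain addition:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResBlock(nn.Module):
    """Minimal residual block: x + conv2(relu(conv1(x))).
    Batchnorm is omitted for clarity; padding=1 keeps the spatial
    size unchanged so the identity path can simply be added."""
    def __init__(self, nf):
        super().__init__()
        self.conv1 = nn.Conv2d(nf, nf, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(nf, nf, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(x)))

block = BasicResBlock(8)
x = torch.randn(1, 8, 16, 16)
print(block(x).shape)   # torch.Size([1, 8, 16, 16])
```

When the stride or the number of filters changes, the identity path no longer matches the conv path's output, which is where average pooling and 1×1 convolutions on the skip connection come in.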
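For the Einstein-summation exercise: a 1×1 convolution is, at each pixel, a dot product between the channel vector and each filter, i.e. a matrix multiply over the channel dimension, which `torch.einsum` can express directly. A sketch of the comparison (random tensors, shapes chosen for illustration):

```python
import torch

x = torch.randn(2, 16, 8, 8)    # batch, in-channels, H, W
w = torch.randn(32, 16, 1, 1)   # out-channels, in-channels, 1, 1

# einsum: contract over the input-channel index i at every pixel
via_einsum = torch.einsum('bihw,oi->bohw', x, w.squeeze())
# reference: the same 1x1 convolution via torch.conv2d
via_conv = torch.conv2d(x, w)
print(torch.allclose(via_einsum, via_conv, atol=1e-5))  # True
```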
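One way to approach the final exercise, using only `topk` from plain PyTorch (the function name and shapes are this sketch's own choices):

```python
import torch

def top5_accuracy(preds, targets):
    """Fraction of rows where the target class is among the five
    highest-scoring predictions (plain PyTorch, no fastai)."""
    top5 = preds.topk(5, dim=1).indices            # (n, 5)
    hits = (top5 == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

preds = torch.randn(100, 10)                # 100 rows, 10 classes
targets = torch.randint(0, 10, (100,))
acc = top5_accuracy(preds, targets)
print(0.0 <= acc <= 1.0)                    # True
```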