- How did we get to a single vector of activations in the CNNs used for MNIST in previous chapters? Why isn’t that suitable for Imagenette?
- What do we do for Imagenette instead?
- What is “adaptive pooling”?
- What is “average pooling”?
- Why do we need `Flatten` after an adaptive average pooling layer?
- What is a “skip connection”?
- Why do skip connections allow us to train deeper models?
- What is “identity mapping”?
- What is the basic equation for a ResNet block (ignoring batchnorm and ReLU layers)?
- What do ResNets have to do with residuals?
- How do we deal with the skip connection when there is a stride-2 convolution? How about when the number of filters changes?
- How can we express a 1×1 convolution in terms of a vector dot product?
- Create a 1×1 convolution with `nn.Conv2d` and apply it to an image. What happens to the shape of the image?
- What does the `noop` function return?
- Explain what is shown in <>.
- What is the “stem” of a CNN?
- Why do we use plain convolutions in the CNN stem, instead of ResNet blocks?
- How does a bottleneck block differ from a plain ResNet block?
- Why is a bottleneck block faster?
- How do fully convolutional nets (and nets with adaptive pooling in general) allow for progressive resizing?
- Try creating a fully convolutional net with adaptive average pooling for MNIST (note that you’ll need fewer stride-2 layers). How does it compare to a network without such a pooling layer?
- In <> we introduce Einstein summation notation. Skip ahead to see how this works, and then write an implementation of the 1×1 convolution operation using `torch.einsum`. Compare it to the same operation using `torch.conv2d`.
- Write a "top-5 accuracy" function using plain PyTorch or plain Python.
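As a hint for the adaptive pooling questions: adaptive average pooling collapses a feature map of any spatial size down to a fixed size, which is why it needs a `Flatten` before the final linear layer. A minimal sketch in plain PyTorch (the sizes here are illustrative, not from the book):

```python
import torch
import torch.nn as nn

# AdaptiveAvgPool2d(1) collapses any HxW feature map to 1x1, so the
# head works for any input size; Flatten then drops the trailing 1x1
# dims so a Linear layer can consume the activations.
pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
x = torch.randn(2, 512, 7, 7)   # batch of 2, 512 channels, 7x7 grid
out = pool(x)
print(out.shape)                # torch.Size([2, 512])
```

Without the `Flatten`, the output would be `(2, 512, 1, 1)`, which a `Linear` layer cannot take directly.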
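For the ResNet block questions, the basic equation (ignoring batchnorm) is `output = x + conv2(conv1(x))`. One possible minimal sketch, with padding chosen so the skip connection is a plain addition:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResBlock(nn.Module):
    """Minimal residual block: x + conv2(relu(conv1(x))).
    Batchnorm is omitted for clarity; padding=1 keeps the spatial
    size unchanged so the identity path can simply be added."""
    def __init__(self, nf):
        super().__init__()
        self.conv1 = nn.Conv2d(nf, nf, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(nf, nf, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(x)))

block = BasicResBlock(8)
x = torch.randn(1, 8, 16, 16)
print(block(x).shape)   # torch.Size([1, 8, 16, 16])
```

When the stride or the number of filters changes, the identity path no longer matches the conv path's output, which is where average pooling and 1×1 convolutions on the skip connection come in.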
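For the Einstein-summation exercise: a 1×1 convolution is, at each pixel, a dot product between the channel vector and each filter, i.e. a matrix multiply over the channel dimension, which `torch.einsum` can express directly. A sketch of the comparison (random tensors, shapes chosen for illustration):

```python
import torch

x = torch.randn(2, 16, 8, 8)    # batch, in-channels, H, W
w = torch.randn(32, 16, 1, 1)   # out-channels, in-channels, 1, 1

# einsum: contract over the input-channel index i at every pixel
via_einsum = torch.einsum('bihw,oi->bohw', x, w.squeeze())
# reference: the same 1x1 convolution via torch.conv2d
via_conv = torch.conv2d(x, w)
print(torch.allclose(via_einsum, via_conv, atol=1e-5))  # True
```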
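One way to approach the final exercise, using only `topk` from plain PyTorch (the function name and shapes are this sketch's own choices):

```python
import torch

def top5_accuracy(preds, targets):
    """Fraction of rows where the target class is among the five
    highest-scoring predictions (plain PyTorch, no fastai)."""
    top5 = preds.topk(5, dim=1).indices            # (n, 5)
    hits = (top5 == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

preds = torch.randn(100, 10)                # 100 rows, 10 classes
targets = torch.randint(0, 10, (100,))
acc = top5_accuracy(preds, targets)
print(0.0 <= acc <= 1.0)                    # True
```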