With deeper layers, we still want the gradients, but they won’t just be equal to the weights anymore; we have to calculate them. PyTorch calculates the gradients of every layer for us during the backward pass, but they’re not stored (except for tensors where requires_grad is True). We can, however, register a hook on the backward pass, which PyTorch will call with the gradients as a parameter, so we can store them there. For this we will use a HookBwd class that works like Hook, but intercepts and stores gradients instead of activations:

    In [ ]:
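A minimal sketch of such a HookBwd class (the original cell was lost, so the exact code here is an assumption). It mirrors Hook, but registers on the backward pass; register_full_backward_hook is used since the older register_backward_hook is deprecated:

```python
import torch

class HookBwd():
    def __init__(self, m):
        # Call hook_func with the gradients during the backward pass
        self.hook = m.register_full_backward_hook(self.hook_func)
    def hook_func(self, m, gi, go):
        # go[0] is the gradient with respect to this module's output
        self.stored = go[0].detach().clone()
    def __enter__(self, *args): return self
    def __exit__(self, *args): self.hook.remove()
```

Like Hook, it is a context manager, so the hook is removed automatically when the `with` block ends.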

    Then for the class index 1 (for True, which is “cat”) we intercept the features of the last convolutional layer as before, and compute the gradients of the output activations of our class. We can’t just call output.backward(), because gradients only make sense with respect to a scalar (which is normally our loss), and output is a rank-2 tensor. But if we pick a single image (we’ll use 0) and a single class (we’ll use 1), then we can calculate the gradients of any weight or activation we like with respect to that single value, using output[0,cls].backward(). Our hook intercepts the gradients, which we’ll use as weights:
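The scalar requirement can be seen in isolation with a toy rank-2 output (a sketch with made-up tensors, not the model above):

```python
import torch

# Toy stand-ins: 2 "images", 4 features, a 2-class linear head
x = torch.randn(2, 4, requires_grad=True)
head = torch.randn(4, 2)
output = x @ head          # rank-2 tensor of shape (2, 2)

# output.backward() would raise an error: gradients need a scalar
# starting point. Indexing one image (0) and one class (1) gives one:
output[0, 1].backward()

# Only image 0 influenced that scalar, so image 1's gradient is zero
print(torch.all(x.grad[1] == 0))  # tensor(True)
```

Calling backward on that single value populates gradients everywhere upstream of it, which is exactly what the backward hook will capture.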

    In [ ]:

    cls = 1
    with HookBwd(learn.model[0]) as hookg:
        with Hook(learn.model[0]) as hook:
            output = learn.model.eval()(x.cuda())
            act = hook.stored
        output[0,cls].backward()
    grad = hookg.stored

    In [ ]:

    w = grad[0].mean(dim=[1,2], keepdim=True)
    cam_map = (w * act[0]).sum(0)

    In [ ]:

    _,ax = plt.subplots()
    x_dec.show(ctx=ax)
    ax.imshow(cam_map.detach().cpu(), alpha=0.6, extent=(0,224,224,0),
              interpolation='bilinear', cmap='magma');

    The novelty with Grad-CAM is that we can use it on any layer. For example, here we use it on the output of the second-to-last ResNet group:

    In [ ]:

    w = grad[0].mean(dim=[1,2], keepdim=True)
    cam_map = (w * act[0]).sum(0)
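The same weighting can be checked shape-wise with stand-in tensors (the sizes here are hypothetical, for illustration only):

```python
import torch

# Stand-ins for the hooked values of one image (shapes assumed):
act  = torch.randn(1, 512, 7, 7)   # activations of the hooked layer
grad = torch.randn(1, 512, 7, 7)   # gradients of output[0,cls] w.r.t. them

# Average each channel's gradient over the spatial dimensions...
w = torch.mean(grad[0], dim=[1, 2], keepdim=True)   # (512, 1, 1)
# ...then use those per-channel weights to collapse the activations
cam_map = (w * act[0]).sum(0)                       # (7, 7)
print(cam_map.shape)  # torch.Size([7, 7])
```

Each spatial location's score is thus a gradient-weighted sum over channels, so the map highlights where the hooked layer's activations push the chosen class score up.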

    And we can now view the activation map for this layer:

    In [ ]:

    (Figure: Gradient CAM activation map for this layer, overlaid on the image)