We saw in <> that we can use a confusion matrix to see where our model is doing well, and where it’s doing badly:

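    Something like the following would go in that cell (a sketch, assuming the fine-tuned pet-breeds model is stored in a Learner called learn; the figure size and dpi are just illustrative choices):

    In [ ]:

        from fastai.vision.all import *

        # Build an interpretation object from the trained learner and plot the
        # full confusion matrix for all 37 pet breeds.
        interp = ClassificationInterpretation.from_learner(learn)
        interp.plot_confusion_matrix(figsize=(12,12), dpi=60)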

    Oh dear! In this case, a confusion matrix is very hard to read. We have 37 different breeds of pet, which means we have 37×37 entries in this giant matrix! Instead, we can use the most_confused method, which just shows us the cells of the confusion matrix with the most incorrect predictions (here, those with at least 5):

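    A sketch of the cell that belongs here, reusing the interp object from the previous cell; min_val=5 keeps only the cells of the confusion matrix with at least 5 incorrect predictions:

    In [ ]:

        # Return (actual, predicted, count) tuples for the most-confused
        # breed pairs, filtered to those occurring at least 5 times.
        interp.most_confused(min_val=5)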

    Since we are not pet breed experts, it is hard for us to know whether these category errors reflect actual difficulties in recognizing breeds. So again, we turn to Google. A little bit of Googling tells us that the most common category errors shown here are actually breed differences that even expert breeders sometimes disagree about. So this gives us some comfort that we are on the right track.

    We seem to have a good baseline. What can we do now to make it even better?