Well actually, there’s no log here, since we’re using the same definition as PyTorch. That means we need to put the log together with softmax:
In [ ]:
def log_softmax(x): return (x.exp()/(x.exp().sum(-1,keepdim=True))).log()

sm = log_softmax(r); sm[0][0]
Out[ ]:
tensor(-1.2790, grad_fn=<SelectBackward>)
Combining these gives us our cross-entropy loss:
In [ ]:
loss = nll(sm, yb)
loss
Out[ ]:
Note that the formula:
\log \left ( \frac{a}{b} \right ) = \log(a) - \log(b)
gives a simplification when we compute the log softmax: dividing by the sum of exponentials becomes subtracting the log of that sum.
In [ ]:
def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()
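As a quick sanity check (using a small random matrix, since the activations `r` aren't reproduced here), we can verify numerically that taking the log of the softmax and applying the log-quotient rule give the same result:

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 5)  # a small batch of 3 rows, 5 "classes"

# Naive form: compute the softmax, then take its log.
naive = (x.exp() / x.exp().sum(-1, keepdim=True)).log()

# Simplified form using log(a/b) = log(a) - log(b).
simplified = x - x.exp().sum(-1, keepdim=True).log()

print(torch.allclose(naive, simplified))  # → True
```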
Then there is a more stable way to compute the log of the sum of exponentials, called the LogSumExp trick. The idea is to use the following formula:
\log \left ( \sum_{j=1}^{n} e^{x_{j}} \right ) = \log \left ( e^{a} \sum_{j=1}^{n} e^{x_{j}-a} \right ) = a + \log \left ( \sum_{j=1}^{n} e^{x_{j}-a} \right )
where $a$ is the maximum of $x_{j}$.
Here’s the same thing in code:
In [ ]:
x = torch.rand(5)
a = x.max()
x.exp().sum().log() == a + (x-a).exp().sum().log()
Out[ ]:
tensor(True)
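The reason the subtraction matters is numerical stability: for large activations, `exp` overflows `float32` to `inf`, while the shifted version keeps every exponent at or below zero. A small illustration (the values are chosen only to trigger the overflow):

```python
import torch

x = torch.tensor([100., 101., 102.])

# Direct computation overflows: exp(100) exceeds the float32 range.
print(x.exp().sum().log())  # → tensor(inf)

# Subtracting the max first keeps every exponent <= 0, so nothing overflows.
a = x.max()
print(a + (x - a).exp().sum().log())  # → tensor(102.4076)
```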
We’ll put that into a function:
In [ ]:
def logsumexp(x):
    m = x.max(-1)[0]
    return m + (x-m[:,None]).exp().sum(-1).log()

logsumexp(r)[0]
Out[ ]:
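PyTorch ships the same computation as `torch.logsumexp`; as a quick check (again with a stand-in random matrix for `r`), our hand-written version matches it:

```python
import torch

def logsumexp(x):
    # Subtract the per-row max before exponentiating, then add it back.
    m = x.max(-1)[0]
    return m + (x - m[:, None]).exp().sum(-1).log()

torch.manual_seed(0)
r = torch.randn(4, 10)
print(torch.allclose(logsumexp(r), torch.logsumexp(r, -1)))  # → True
```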
So we can use it for our log_softmax function:
In [ ]:
def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)
Which gives the same result as before:
In [ ]:
sm = log_softmax(r); sm[0][0]
Out[ ]:
tensor(-1.2790, grad_fn=<SelectBackward>)
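This version also agrees with PyTorch's built-in `F.log_softmax`, which we can confirm on a random matrix standing in for `r`:

```python
import torch
import torch.nn.functional as F

# Our stable log_softmax, built on the tensor's logsumexp method.
def log_softmax(x): return x - x.logsumexp(-1, keepdim=True)

torch.manual_seed(0)
r = torch.randn(4, 10)
print(torch.allclose(log_softmax(r), F.log_softmax(r, dim=-1)))  # → True
```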
We can use these to create cross_entropy:
In [ ]:
def cross_entropy(preds, yb): return nll(log_softmax(preds), yb)
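As a sketch of the composed loss, we can check it against PyTorch's built-in `F.cross_entropy` (restating `nll` here as the standard negative log likelihood — index out each row's target log-probability and negate the mean — which is assumed to match the earlier definition; the data is random stand-in values):

```python
import torch
import torch.nn.functional as F

def log_softmax(x): return x - x.logsumexp(-1, keepdim=True)

# Negative log likelihood: pick the log-probability of the target class
# for each row, then take the negative mean over the batch.
def nll(input, target): return -input[range(target.shape[0]), target].mean()

def cross_entropy(preds, yb): return nll(log_softmax(preds), yb)

torch.manual_seed(0)
preds = torch.randn(8, 10)
yb = torch.randint(0, 10, (8,))
print(torch.allclose(cross_entropy(preds, yb), F.cross_entropy(preds, yb)))  # → True
```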
Let’s now combine all those pieces together.