Well actually, there’s no log here, since we’re using the same definition as PyTorch. That means we need to put the log together with softmax:
In [ ]:
def log_softmax(x): return (x.exp()/(x.exp().sum(-1,keepdim=True))).log()

sm = log_softmax(r); sm[0][0]
Out[ ]:
tensor(-1.2790, grad_fn=<SelectBackward>)
Combining these gives us our cross-entropy loss:
In [ ]:
loss = nll(sm, yb)
loss
Out[ ]:
Note that the formula:
\log \left ( \frac{a}{b} \right ) = \log(a) - \log(b)
gives a simplification when we compute the log softmax: dividing by the sum of exponentials becomes subtracting the log of that sum.
In [ ]:
def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()
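As a quick sanity check (using a small random matrix, since the activations `r` aren't reproduced here), we can verify numerically that taking the log of the softmax and applying the log-quotient rule give the same result:

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 5)  # a small batch of 3 rows, 5 "classes"

# Naive form: compute the softmax, then take its log.
naive = (x.exp() / x.exp().sum(-1, keepdim=True)).log()

# Simplified form using log(a/b) = log(a) - log(b).
simplified = x - x.exp().sum(-1, keepdim=True).log()

print(torch.allclose(naive, simplified))  # → True
```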
Then there is a more stable way to compute the log of the sum of exponentials, called the LogSumExp trick. The idea is to use the following formula:
\log \left ( \sum_{j=1}^{n} e^{x_{j}} \right ) = \log \left ( e^{a} \sum_{j=1}^{n} e^{x_{j}-a} \right ) = a + \log \left ( \sum_{j=1}^{n} e^{x_{j}-a} \right )
where $a$ is the maximum of $x_{j}$.
Here’s the same thing in code:
In [ ]:
x = torch.rand(5)
a = x.max()
x.exp().sum().log() == a + (x-a).exp().sum().log()
Out[ ]:
tensor(True)
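The reason the subtraction matters is numerical stability: for large activations, `exp` overflows `float32` to `inf`, while the shifted version keeps every exponent at or below zero. A small illustration (the values are chosen only to trigger the overflow):

```python
import torch

x = torch.tensor([100., 101., 102.])

# Direct computation overflows: exp(100) exceeds the float32 range.
print(x.exp().sum().log())  # → tensor(inf)

# Subtracting the max first keeps every exponent <= 0, so nothing overflows.
a = x.max()
print(a + (x - a).exp().sum().log())  # → tensor(102.4076)
```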
We’ll put that into a function:
In [ ]:
def logsumexp(x):
    m = x.max(-1)[0]
    return m + (x-m[:,None]).exp().sum(-1).log()

logsumexp(r)[0]
Out[ ]:
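PyTorch ships the same computation as `torch.logsumexp`; as a quick check (again with a stand-in random matrix for `r`), our hand-written version matches it:

```python
import torch

def logsumexp(x):
    # Subtract the per-row max before exponentiating, then add it back.
    m = x.max(-1)[0]
    return m + (x - m[:, None]).exp().sum(-1).log()

torch.manual_seed(0)
r = torch.randn(4, 10)
print(torch.allclose(logsumexp(r), torch.logsumexp(r, -1)))  # → True
```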
So we can use it for our log_softmax function:
In [ ]:
def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)
Which gives the same result as before:
In [ ]:
sm = log_softmax(r); sm[0][0]
Out[ ]:
tensor(-1.2790, grad_fn=<SelectBackward>)
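This version also agrees with PyTorch's built-in `F.log_softmax`, which we can confirm on a random matrix standing in for `r`:

```python
import torch
import torch.nn.functional as F

# Our stable log_softmax, built on the tensor's logsumexp method.
def log_softmax(x): return x - x.logsumexp(-1, keepdim=True)

torch.manual_seed(0)
r = torch.randn(4, 10)
print(torch.allclose(log_softmax(r), F.log_softmax(r, dim=-1)))  # → True
```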
We can use these to create cross_entropy:
In [ ]:
def cross_entropy(preds, yb): return nll(log_softmax(preds), yb)
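As a sketch of the composed loss, we can check it against PyTorch's built-in `F.cross_entropy` (restating `nll` here as the standard negative log likelihood — index out each row's target log-probability and negate the mean — which is assumed to match the earlier definition; the data is random stand-in values):

```python
import torch
import torch.nn.functional as F

def log_softmax(x): return x - x.logsumexp(-1, keepdim=True)

# Negative log likelihood: pick the log-probability of the target class
# for each row, then take the negative mean over the batch.
def nll(input, target): return -input[range(target.shape[0]), target].mean()

def cross_entropy(preds, yb): return nll(log_softmax(preds), yb)

torch.manual_seed(0)
preds = torch.randn(8, 10)
yb = torch.randint(0, 10, (8,))
print(torch.allclose(cross_entropy(preds, yb), F.cross_entropy(preds, yb)))  # → True
```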
Let’s now combine all those pieces together.