
    Well actually, there’s no log here, since we’re using the same definition as PyTorch. That means we need to put the log together with softmax:
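    The log_softmax used in the next cell is presumably defined earlier in the notebook and isn't shown in this excerpt. As a minimal sketch, a naive version that literally takes the log of the softmax (assuming x is a batch of raw activations, one row per sample) might look like this:

    import torch

    def log_softmax_naive(x):
        # softmax: exponentiate, then normalize each row so it sums to 1;
        # taking the log afterwards works, but is not numerically stable
        return (x.exp() / x.exp().sum(-1, keepdim=True)).log()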

    In [ ]:

    sm = log_softmax(r); sm[0][0]

    Out[ ]:

    tensor(-1.2790, grad_fn=<SelectBackward>)
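    The nll used below is also presumably defined earlier and not shown here. A common minimal sketch of negative log likelihood on log-probabilities, assuming targ holds integer class indices, is:

    def nll(inp, targ):
        # pick out the log-probability of the correct class for each row,
        # then average and negate
        return -inp[range(targ.shape[0]), targ].mean()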

    Combining these gives us our cross-entropy loss:

    In [ ]:

    loss = nll(sm, yb)
    loss

    Out[ ]:

      Note that the formula:

      \log \left ( \frac{a}{b} \right ) = \log(a) - \log(b)

      gives a simplification when we compute the log softmax: instead of taking the log of a quotient, we can subtract the log of the sum of exponentials directly from the activations.

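      The cell that applied this simplification isn't shown here; a sketch of the simplified (but still not numerically stable) version, mathematically equivalent to the naive form above:

      def log_softmax_simplified(x):
          # log(exp(x) / sum(exp(x))) == x - log(sum(exp(x)))
          return x - x.exp().sum(-1, keepdim=True).log()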

      Then, there is a more stable way to compute the log of the sum of exponentials, called the LogSumExp trick. It avoids the overflow that can occur when exponentiating large activations. The idea is to use the following formula:

      \log \left ( \sum_{j=1}^{n} e^{x_{j}} \right ) = \log \left ( e^{a} \sum_{j=1}^{n} e^{x_{j}-a} \right ) = a + \log \left ( \sum_{j=1}^{n} e^{x_{j}-a} \right )

      where $a$ is the maximum of $x_{j}$.

      Here’s the same thing in code:

      In [ ]:

      x = torch.rand(5)
      a = x.max()
      x.exp().sum().log() == a + (x-a).exp().sum().log()

      Out[ ]:

      tensor(True)

      We’ll put that into a function:

      In [ ]:

      def logsumexp(x):
          # subtract the row-wise maximum before exponentiating to avoid overflow
          m = x.max(-1)[0]
          return m + (x-m[:,None]).exp().sum(-1).log()

      logsumexp(r)[0]

      Out[ ]:

      So we can use it for our log_softmax function:

      In [ ]:

      # Tensor.logsumexp is PyTorch's built-in, numerically stabilized version of the trick above
      def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)

      Which gives the same result as before:

      In [ ]:

      sm = log_softmax(r); sm[0][0]

      Out[ ]:

      tensor(-1.2790, grad_fn=<SelectBackward>)

      We can use these to create cross_entropy:

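      The original cell is missing here; a minimal sketch of a cross_entropy built from the two pieces above (the parameter names are illustrative):

      def cross_entropy(preds, targ):
          # cross entropy = negative log likelihood of the log softmax
          return nll(log_softmax(preds), targ)

      # should match the loss computed earlier from sm and yb
      cross_entropy(r, yb)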

        Let’s now combine all those pieces together to create a .