Before we move on, let’s talk about timing, or how a video player knows the right time to play a frame.

    In the last example, we saved a few frames.

    A video player has to show each frame at a given pace; otherwise the video would play too fast or too slow. We therefore need some logic to play each frame smoothly. To that end, each frame has a presentation timestamp (PTS): an increasing number expressed in units of a timebase, which is a rational number (whose denominator is known as the timescale) chosen so that the timescale is divisible by the frame rate (fps).

    This is easier to understand with some examples, so let’s simulate a few scenarios.

    For fps=60/1 and timebase=1/60000, each PTS increases by timescale / fps = 1000, so the real time for each frame’s PTS could be (supposing it started at 0):

    • frame=0, PTS = 0, PTS_TIME = 0
    • frame=1, PTS = 1000, PTS_TIME = PTS * timebase = 0.016
    • frame=2, PTS = 2000, PTS_TIME = PTS * timebase = 0.033
    • frame=3, PTS = 3000, PTS_TIME = PTS * timebase = 0.050

    For fps=25/1 and timebase=1/75, each PTS increases by timescale / fps = 3, and the PTS time could be:

    • frame=0, PTS = 0, PTS_TIME = 0
    • frame=1, PTS = 3, PTS_TIME = PTS * timebase = 0.04
    • frame=3, PTS = 9, PTS_TIME = PTS * timebase = 0.12
    • frame=24, PTS = 72, PTS_TIME = PTS * timebase = 0.96
    • frame=4064, PTS = 12192, PTS_TIME = PTS * timebase = 162.56
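The arithmetic in both scenarios can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `Rational` type and the function names `pts_step`, `frame_pts` and `pts_to_seconds` are made up for this sketch (libav has its own rational type, `AVRational`), the stream is assumed to start at PTS 0, and the frame rate is assumed to be constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rational type for this sketch (num / den). */
typedef struct {
    int num;
    int den;
} Rational;

/* PTS ticks between two consecutive frames: (1 / fps) / timebase.
 * Integer division: exact only when the timescale is divisible by fps. */
int64_t pts_step(Rational fps, Rational timebase)
{
    assert(fps.num > 0 && timebase.num > 0);
    return ((int64_t)fps.den * timebase.den) / ((int64_t)fps.num * timebase.num);
}

/* PTS of the n-th frame, assuming the first frame has PTS 0. */
int64_t frame_pts(int64_t frame, Rational fps, Rational timebase)
{
    return frame * pts_step(fps, timebase);
}

/* PTS_TIME in seconds: PTS * timebase. */
double pts_to_seconds(int64_t pts, Rational timebase)
{
    assert(timebase.den != 0);
    return (double)pts * timebase.num / timebase.den;
}
```

With fps=25/1 and timebase=1/75, `pts_step` gives 3 and `frame_pts(4064, ...)` gives 12192, which `pts_to_seconds` converts to 162.56 seconds, matching the list above.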

    With the pts_time we can now find a way to render each frame in sync with the audio pts_time or with a system clock. FFmpeg’s libav provides this information through its API.
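One possible way to sync against a clock is to compare the time at which a frame is due with the time already elapsed on that clock, and wait for the difference. The sketch below assumes a hypothetical `Rational` type and hypothetical helper names (`pts_to_us`, `wait_before_present_us`); it is not how any particular player does it, and it leaves out reading the clock and sleeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rational type for this sketch (num / den). */
typedef struct {
    int num;
    int den;
} Rational;

/* Microseconds after stream start at which a frame with this PTS is due.
 * Note: pts * 1000000 can overflow for very large PTS values; this sketch
 * ignores that case. */
int64_t pts_to_us(int64_t pts, Rational timebase)
{
    assert(timebase.den > 0);
    return pts * 1000000 * timebase.num / timebase.den;
}

/* How long to keep waiting before showing the frame.
 * elapsed_us is the time since playback started, read from whichever clock
 * is the reference (system clock or audio position).
 * Returns 0 when the frame is already due or late. */
int64_t wait_before_present_us(int64_t pts, Rational timebase, int64_t elapsed_us)
{
    int64_t due_us = pts_to_us(pts, timebase);
    return due_us > elapsed_us ? due_us - elapsed_us : 0;
}
```

For example, with timebase=1/75 a frame with PTS 3 is due at 40000 microseconds, so 10000 microseconds into playback the player would still wait 30000. For the rescaling step itself, libav offers `av_rescale_q`, which is designed to avoid the overflow mentioned in the comment.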

    As a curiosity, the frames we saved were sent in DTS (decoding timestamp) order (frames: 1,6,4,2,3,5) but played in PTS order (frames: 1,2,3,4,5,6). Also, notice how cheap B-frames are in comparison to P- or I-frames.
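To make the difference between the two orders concrete, here is a small sketch that takes frames in the arrival order quoted above and sorts them by PTS. The `Frame` struct, the function names and the timestamp values are hypothetical and chosen only for illustration; a real decoder reorders with a small buffer rather than sorting a whole list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical frame record for this sketch. */
typedef struct {
    int     number; /* position of the frame when displayed */
    int64_t dts;    /* decoding timestamp: order in which it arrives */
    int64_t pts;    /* presentation timestamp: order in which it is shown */
} Frame;

/* qsort comparator: ascending PTS, written to avoid overflow on subtraction. */
int compare_by_pts(const void *a, const void *b)
{
    const Frame *fa = a;
    const Frame *fb = b;
    return (fa->pts > fb->pts) - (fa->pts < fb->pts);
}

/* Reorder frames from decode (DTS) order into presentation (PTS) order. */
void sort_for_presentation(Frame *frames, size_t count)
{
    assert(frames != NULL || count == 0);
    qsort(frames, count, sizeof *frames, compare_by_pts);
}
```

Given frames that arrive as 1,6,4,2,3,5, with each frame's PTS equal to its display number, `sort_for_presentation` leaves them as 1,2,3,4,5,6.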