Before we move on, let's talk about timing, or how a video player knows the right time to play a frame.
In the last example, we saved some frames that can be seen here:
A video player needs to show each frame at a given pace, otherwise the video plays too fast or too slow, so we need to introduce some logic to play each frame smoothly. For that matter, each frame has a presentation timestamp (PTS), an increasing number expressed in units of a timebase: a rational number (whose denominator is known as the timescale) divisible by the frame rate (fps).
It’s easier to understand when we look at some examples, let’s simulate some scenarios.
For a fps=60/1 and timebase=1/60000, each PTS will increase by timescale / fps = 1000, therefore the real PTS time for each frame could be (supposing it started at 0):
frame=0, PTS = 0, PTS_TIME = 0
frame=1, PTS = 1000, PTS_TIME = PTS * timebase = 0.016
frame=2, PTS = 2000, PTS_TIME = PTS * timebase = 0.033
frame=3, PTS = 3000, PTS_TIME = PTS * timebase = 0.050
For a fps=25/1 and timebase=1/75, each PTS will increase by timescale / fps = 3 and the PTS times could be:
frame=0, PTS = 0, PTS_TIME = 0
frame=1, PTS = 3, PTS_TIME = PTS * timebase = 0.04
frame=3, PTS = 9, PTS_TIME = PTS * timebase = 0.12
frame=24, PTS = 72, PTS_TIME = PTS * timebase = 0.96
…
frame=4064, PTS = 12192, PTS_TIME = PTS * timebase = 162.56
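The arithmetic behind those tables is simply pts_time = PTS * timebase, with PTS advancing by timescale / fps per frame. Here is a minimal sketch (plain C with hypothetical names, no libav involved yet) that reproduces both scenarios:

```c
#include <stdio.h>

// Sketch: pts_time = PTS * timebase, where each new frame
// advances PTS by timescale / fps.
static void print_pts_times(int fps, int timescale, int frames)
{
    double timebase = 1.0 / timescale;  // e.g. 1/60000 or 1/75
    int pts_step = timescale / fps;     // e.g. 60000/60 = 1000 or 75/25 = 3

    for (int frame = 0; frame < frames; frame++) {
        long pts = (long) frame * pts_step;
        printf("frame=%d, PTS=%ld, PTS_TIME=%.3f\n", frame, pts, pts * timebase);
    }
}

int main(void)
{
    print_pts_times(60, 60000, 4); // fps=60/1, timebase=1/60000 -> 0, 0.016, 0.033, 0.050
    print_pts_times(25, 75, 4);    // fps=25/1, timebase=1/75    -> 0, 0.04,  0.08,  0.12
    return 0;
}
```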
Now with the pts_time we can find a way to render this synced with the audio pts_time or with a system clock. FFmpeg libav provides this information through its API:
- fps = `AVStream->avg_frame_rate`
- tbr = `AVStream->r_frame_rate`
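As a sketch of reading them (assuming an `AVFormatContext` that was already opened and a known video stream index; error handling omitted), these values are `AVRational` structs, and the stream's `time_base` can be turned into a plain double with `av_q2d`:

```c
#include <stdio.h>
#include <libavformat/avformat.h>
#include <libavutil/rational.h>

// Sketch: assumes fmt_ctx was opened with avformat_open_input and
// video_stream_index was found (e.g. via av_find_best_stream).
static void print_timing_info(AVFormatContext *fmt_ctx, int video_stream_index)
{
    AVStream *stream = fmt_ctx->streams[video_stream_index];

    AVRational fps = stream->avg_frame_rate; // fps
    AVRational tbr = stream->r_frame_rate;   // tbr
    AVRational tbn = stream->time_base;      // the stream timebase

    printf("fps=%d/%d tbr=%d/%d tbn=%d/%d\n",
           fps.num, fps.den, tbr.num, tbr.den, tbn.num, tbn.den);

    // Once a frame is decoded, its presentation time in seconds is
    // pts * timebase, e.g.: frame->pts * av_q2d(stream->time_base)
}
```

A player would then typically wait until the audio clock (or the wall clock) reaches that pts_time before presenting the frame.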
Just out of curiosity, the frames we saved were sent in DTS order (frames: 1,6,4,2,3,5) but played back in PTS order (frames: 1,2,3,4,5). Also, notice how cheap B-Frames are in comparison to P-Frames or I-Frames.
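If you want to observe that reordering yourself, one rough way (a sketch reusing the send/receive decoding loop idea, with the format and codec contexts assumed to be already set up) is to log each packet's dts as it is fed to the decoder and each frame's pts as it comes out:

```c
#include <inttypes.h>
#include <stdio.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

// Sketch: packets are read in decoding (DTS) order, while the decoder
// returns frames already reordered into presentation (PTS) order.
static void log_dts_vs_pts(AVFormatContext *fmt_ctx, AVCodecContext *codec_ctx,
                           int video_stream_index)
{
    AVPacket *pkt = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();

    while (av_read_frame(fmt_ctx, pkt) >= 0) {
        if (pkt->stream_index == video_stream_index) {
            printf("packet dts=%" PRId64 "\n", pkt->dts);        // decoding order
            avcodec_send_packet(codec_ctx, pkt);
            while (avcodec_receive_frame(codec_ctx, frame) >= 0)
                printf("  frame pts=%" PRId64 "\n", frame->pts); // presentation order
        }
        av_packet_unref(pkt);
    }

    av_frame_free(&frame);
    av_packet_free(&pkt);
}
```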