Improved video compression is important for delivering digital video files more quickly and with higher quality, while using less bandwidth and storage. Everything from 4K movie streaming to smartphone video chat to laptop screen sharing can be enhanced by making the video files smaller through better compression codecs.
The Alliance for Open Media — a consortium founded in 2015 and made up of video-on-demand providers including Amazon, Facebook, Google, Microsoft and Netflix, along with web browser developers and semiconductor firms — has just released AV1 (also known as AOMedia Video 1), a new open, royalty-free video format that promises to be a significant step forward in compression efficiency.
We tested the new codec under conditions that closely match the most common real-world use cases for Facebook video. Our test compared AV1 with open source video encoders that can be deployed in a production system, rather than only with standard reference software encoders (e.g., the H.264/AVC Joint Model, or JM). By structuring the test this way, we were able to show how the codec will perform in a true production environment compared with current widely used alternatives, such as x264 and libvpx-vp9. In designing our comparison tests, we aligned our approach closely with previous work by Netflix that compared x264, x265 and libvpx.
Our testing shows AV1 surpasses its stated goal of 30% better compression than VP9, and achieves gains of 50.3%, 46.2% and 34.0%, compared to x264 main profile, x264 high profile and libvpx-vp9, respectively. The new codec requires longer encoding times vs. current alternatives, however, due to increased complexity.
Our tests were conducted primarily with Standard Definition (SD) and High Definition (HD) video files, because those are currently the most popular video formats on Facebook. But because AV1’s performance increased as video resolution increased, we conclude the new compression codec will likely deliver even higher efficiency gains with UHD/4K and 8K content.
With the official public release of AV1 on March 28, 2018, these results should foster confidence that the technology is capable of significant gains in compression in real-world implementations.
The specifics of our testing process and results below will help engineers evaluate AV1 compression performance in detail.
Test Methodology and Evaluation Setup
Instead of using uncompressed test video sequences, such as the common test sequences used in video standard quality evaluation or the public test sequences at https://media.xiph.org/video/derf/, we selected 400 top-viewed public videos from Facebook (FB) Pages. Those videos had the following characteristics:
- Most videos were recorded on smartphones
- They were already compressed on the client side before being uploaded to Facebook servers
- Most were SD or HD, instead of UHD/4K or 8K.
As these criteria make clear, the test content was quite different from that in video standard test conditions, where uncompressed and UHD test sequences are essential for recent video standard quality evaluation. The already-compressed test content was decompressed first and then re-compressed by all tested encoders. Again, this approach allowed us to gauge how AV1 would perform in a real-world production environment.
To measure how representative these videos were, we conducted a content analysis according to ITU-T P.910, "Subjective video quality assessment methods for multimedia applications." This analysis shows the relative spatial and temporal information found in the videos, which matters because compression difficulty is directly related to the spatial and temporal information of a video.
Because scenes can change within a video, we calculated the median values of the spatial and temporal information over time, in addition to the maximum values of the standard deviation recommended in ITU-T P.910.
Figure 1 shows scatter plots of the spatial and temporal information for all 400 FB top videos (the first 10 seconds). The plots show a wide spread of content coverage, including slow/fast motion and low/high spatial complexity.
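As a rough illustration of the content analysis described above, the following Python sketch computes per-frame spatial information (SI) and temporal information (TI) in the style of ITU-T P.910, then reports both the maximum and the median over time. It is a minimal sketch under stated assumptions, not the tool used to produce Figure 1: the function names are hypothetical, frames are assumed to be small 2D lists of luma values, border pixels are skipped by the Sobel filter, and the use of the population standard deviation is an assumption.

```python
import statistics


def sobel_magnitude(frame):
    """Sobel gradient magnitude for every interior pixel of a 2D luma frame."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append((gx * gx + gy * gy) ** 0.5)
    return out


def si_ti(frames):
    """Return max and median SI/TI over a sequence of at least two frames.

    SI per frame: spatial std-dev of the Sobel-filtered frame.
    TI per frame pair: spatial std-dev of the pixel-wise frame difference.
    Population std-dev is an assumption of this sketch.
    """
    si = [statistics.pstdev(sobel_magnitude(f)) for f in frames]
    ti = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [b - a for row_a, row_b in zip(prev, cur)
                for a, b in zip(row_a, row_b)]
        ti.append(statistics.pstdev(diff))
    return {
        "si_max": max(si), "si_median": statistics.median(si),
        "ti_max": max(ti), "ti_median": statistics.median(ti),
    }


# Toy input: two blank 4x4 frames, then a frame whose right half turns bright.
blank = [[0] * 4 for _ in range(4)]
split = [[0, 0, 100, 100] for _ in range(4)]
print(si_ti([blank, blank, split]))
```

With a scene change in the sequence, the maximum TI is dominated by the single cut, while the median reflects the typical motion between frames, which is why both values are useful.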
Encoder Implementations
For AV1 encodings, we used a snapshot version of AOM AV1 reference software. For H.264/AVC and VP9 encodings, we used ffmpeg version 3.3.3, with corresponding libx264 and libvpx-vp9 libraries. Table 1 lists the video codec versions used in our test setup.
Here are details on the three codecs used in our test:
AV1
This snapshot version was fetched from the AOM AV1 repository when the AV1 specification was officially released on March 28, 2018. The coding performance of AV1 should remain stable from this snapshot onward, and the main focus of current AV1 development is on speed optimization to make it practical for use in production systems.
x264
x264 is a well-known video encoder for H.264/AVC and provides best-in-class performance, compression, and features, with ~24% better encoding than the second-place encoder in the MSU Sixth MPEG-4 AVC/H.264 Video Codecs Comparison. x264 is widely used at the core of many web video services, including Facebook's, and has been adopted by television broadcasters and ISPs.
libvpx-vp9
The free software video codec library libvpx was developed by Google and serves as the reference software implementation for the video coding formats VP8 and VP9. With the release of versions 1.5 and 1.6, libvpx-vp9 delivered significant speedups for both encoding and decoding, which made it practical for use in production systems.
Encoder configurations
To determine the bit rates in a content-adaptive way, each video was first encoded in Constant Rate Factor (CRF) or Quantization Parameter (QP) mode with six CRF/QP values; the output bit rates from the CRF/QP encoding stage were then used as the target bit rates for 2-pass Average Bit Rate (ABR) encoding. To match the quality/bit rate range across codecs, the following CRF/QP values were used:
x264 CRF = {19, 23, 27, 31, 35, 39}, VP9/AV1 CRF/QP = {27, 33, 39, 45, 51, 57}
The CRF/QP and ABR configurations are as follows:
We chose settings that reflect the most common x264 and libvpx-vp9 encoding settings used in Facebook Video On Demand (VOD) applications. Since both x264's main profile and high profile are used in Facebook video encoding, they are reported separately. The AV1 settings were chosen to match the x264 and libvpx-vp9 encoding settings. Note: In order to match the other codecs' settings, our test used "--kf-max-dist=60 --kf-min-dist=60" for AV1, instead of the setting originally recommended by Google's WebM team ("--kf-max-dist=150 --kf-min-dist=0").
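The two-stage procedure described in this section (a CRF/QP encode first, then a 2-pass encode targeting the bit rate that the first stage produced) can be sketched as follows. This is an illustrative sketch only and does not reproduce the configurations used in the test: the helper names, file names and log prefix are hypothetical, the ffmpeg options shown are a small generic subset, the fixed 60-frame keyframe interval mirrors the AV1 setting quoted above, and the first pass discards its output to /dev/null, which assumes a POSIX system.

```python
def bitrate_kbps(size_bytes, duration_s):
    """Average bit rate of an encoded file, in kilobits per second."""
    return size_bytes * 8 / duration_s / 1000


def _common(src, codec, profile, gop):
    """Arguments shared by both stages: no audio, fixed keyframe interval."""
    args = ["ffmpeg", "-y", "-i", src, "-an", "-c:v", codec,
            "-g", str(gop), "-keyint_min", str(gop)]
    if profile:  # e.g. "main" or "high" for libx264
        args += ["-profile:v", profile]
    return args


def crf_command(src, dst, codec, crf, profile=None, gop=60):
    """Stage 1: constant-quality encode at the given CRF value."""
    args = _common(src, codec, profile, gop) + ["-crf", str(crf)]
    if codec == "libvpx-vp9":
        args += ["-b:v", "0"]  # libvpx-vp9 constant-quality mode needs -b:v 0
    return args + [dst]


def two_pass_commands(src, dst, codec, kbps, profile=None, gop=60, log="pass"):
    """Stage 2: two-pass encode targeting the bit rate measured in stage 1."""
    base = _common(src, codec, profile, gop) + [
        "-b:v", f"{round(kbps)}k", "-passlogfile", log]
    first = base + ["-pass", "1", "-f", "null", "/dev/null"]
    second = base + ["-pass", "2", dst]
    return first, second


# Example: a 10-second CRF encode that came out at 1,000,000 bytes.
target = bitrate_kbps(size_bytes=1_000_000, duration_s=10)
print(" ".join(crf_command("in.mp4", "crf23.mp4", "libx264", 23, profile="main")))
for cmd in two_pass_commands("in.mp4", "abr.mp4", "libx264", target, profile="main"):
    print(" ".join(cmd))
```

The functions only build argument lists, so they can be inspected or passed to a process runner. Repeating the procedure for each of the six CRF/QP values yields six content-adaptive target bit rates per video.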
Experimental Results
Compression efficiency was measured by the Bjontegaard-Delta rate (BD-rate) metric, which calculates the average bit rate difference between two Rate-Distortion (R-D) curves at the same distortion, e.g., at the same Peak Signal-to-Noise Ratio (PSNR) or Structural Similarity (SSIM). Note that negative BD-rate values indicate bit rate savings. As outlined above, the test used 400 FB videos at different resolutions (360p/480p/720p/1080p), all at 30 fps with a 16:9 aspect ratio, 1:1 pixel aspect ratio and 8-bit depth. The first 10 seconds were extracted from each video for the encoding tests.
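To make the BD-rate idea concrete, here is a simplified Python sketch. It averages the difference in log bit rate between two R-D curves over their common quality range, then converts that average into a percentage. As a loudly stated assumption, it uses piecewise-linear interpolation between the measured points in place of the cubic polynomial fit used in the usual Bjontegaard calculation, so its results will differ somewhat from standard BD-rate tools. The function names and sample numbers are hypothetical, and each curve is assumed to have strictly increasing quality values.

```python
import math


def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the interpolation range")


def _mean_log_rate(points, lo, hi):
    """Average of log(bit rate) over the quality interval [lo, hi]."""
    pts = sorted(points, key=lambda p: p[1])
    quality = [q for _, q in pts]
    log_rate = [math.log(r) for r, _ in pts]
    # Integrate exactly over the piecewise-linear curve using its own knots.
    knots = [lo] + [q for q in quality if lo < q < hi] + [hi]
    vals = [_interp(k, quality, log_rate) for k in knots]
    area = sum((knots[i + 1] - knots[i]) * (vals[i] + vals[i + 1]) / 2
               for i in range(len(knots) - 1))
    return area / (hi - lo)


def bd_rate(anchor, test):
    """Approximate BD-rate in percent; negative means `test` saves bits.

    Each curve is a list of (bitrate_kbps, quality) points, where quality
    is e.g. PSNR in dB.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("the curves have no overlapping quality range")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1) * 100


# Made-up curves: the test encoder needs half the bits at every quality level.
anchor = [(400, 32.0), (800, 35.0), (1600, 38.0), (3200, 41.0)]
test = [(rate / 2, q) for rate, q in anchor]
print(round(bd_rate(anchor, test), 2))
```

In this made-up case the result is about -50, i.e. a 50% bit rate saving at equal quality. In a test like the one described here, a value would be computed per video from its six rate points and then averaged across videos.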
Experimental Results for CRF/QP
Figure 2 and Figure 3 show AV1 BD-rate savings for CRF/QP mode against x264 main, x264 high and libvpx-vp9. In terms of PSNR, the average BD-rate savings of AV1 relative to x264 main, x264 high and libvpx-vp9 were 50.0%, 45.8% and 32.9%, respectively. In terms of SSIM, the average BD-rate savings of AV1 relative to x264 main, x264 high and libvpx-vp9 were 49.8%, 45.7% and 40.5%, respectively.
On the other hand, AV1's encoding computational complexity (measured as encoding run time) in CRF/QP mode was 5721.5x, 5869.9x and 658.5x that of x264 main, x264 high and libvpx-vp9, respectively, as shown in Figure 4.
To summarize the BD-rate performance of all tested encoders for CRF/QP mode, Table 3 provides an overview of cross-codec comparisons in terms of PSNR and SSIM.
Experimental Results for ABR
Figure 5 and Figure 6 show AV1 BD-rate savings for ABR mode against x264 main, x264 high and libvpx-vp9. In terms of PSNR, the average BD-rate savings of AV1 relative to x264 main, x264 high and libvpx-vp9 were 51.0%, 47.0% and 29.9%, respectively. In terms of SSIM, the average BD-rate savings of AV1 relative to x264 main, x264 high and libvpx-vp9 were 50.3%, 46.3% and 32.5%, respectively.
However, AV1 also required far more encoding computation than x264 main, x264 high and libvpx-vp9 in ABR mode. Its encoding run time was 9226.4x, 8139.2x and 667.1x theirs, respectively, as shown in Figure 7.
To summarize the BD-rate performance of all tested encoders for ABR mode, Table 4 provides an overview of cross-codec comparisons in terms of PSNR and SSIM.
Next Steps
These results should give engineers confidence in how AV1 performs and speed up the adoption of AV1 in production systems. Based on our findings, software developers can look to add support for AV1 knowing it outperforms its efficiency targets in these real-world conditions.
Facebook will continue to promote the adoption of AV1 in our production systems. We plan to gradually serve AV1 content on the web for popular Facebook videos once major web browsers such as Chrome and Firefox implement AV1 support. Users watching AV1 content will enjoy better quality at the same bit rate or see 30% to 50% less buffering at the same quality compared with VP9 or H.264/AVC content.