Compression Algorithms

Data Compression, PHP, and the Web

Data compression is an extremely important topic in modern computing, networking, and software engineering. Sharing information faster and in smaller sizes across a network is a boundary that will continue to be pushed as long as computers and the internet exist. Large companies like Google, along with some very smart people, have continuously refined existing algorithms and created new ones to make things smaller. Better compression algorithms not only improve company profits but also have implications for low-bandwidth users, critical health data, financial data, and more. The topic is so important that HBO even created a show about it!

Let’s discuss some compression basics as they relate to the web, networking, and PHP.

By far the most widely used compression technique is DEFLATE, which powers zip, gzip, and zlib. Gzip-compressed data can be decompressed by modern browsers on the fly. Gzip compression is lossless, meaning the original data can be fully recovered during decompression. Due to its power and widespread browser support, it’s almost a standard that we must gzip a website’s contents before returning them to the browser. Here’s how that typically looks:

Browser requests web page -> Nginx receives request -> PHP output is generated/static file is returned -> Nginx gzips the output and responds to the browser -> browser decompresses the data for the end user

Here’s a useful article on enabling compression in nginx.
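
To make the "server gzips, browser decompresses" steps of that flow concrete, here is a minimal sketch in Python (used purely for illustration; it is not how nginx is implemented). The function name gzip_response and the sample page are hypothetical, and the Accept-Encoding parsing is deliberately simplified: it ignores q-values.

```python
import gzip

def gzip_response(body: bytes, accept_encoding: str):
    """Return (body, headers), gzipping only if the client advertises gzip."""
    # Simplified parsing: split on commas and drop any ";q=..." parameters.
    offered = {part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")}
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers

# Hypothetical page content; repetitive markup compresses well.
page = b"<html><body>" + b"<p>hello world</p>" * 100 + b"</body></html>"

# A client that advertises gzip gets a compressed body.
compressed_body, compressed_headers = gzip_response(page, "gzip, deflate, br")
# A client that sends no Accept-Encoding header gets the page untouched.
plain_body, plain_headers = gzip_response(page, "")

# The "browser" side: decompression recovers the original bytes exactly.
restored = gzip.decompress(compressed_body)
print(len(page), "bytes ->", len(compressed_body), "bytes on the wire")
```

Because gzip is lossless, restored is byte-for-byte identical to page, which is what lets the browser render the response as if it had never been compressed.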

Recently, Google and Facebook have released their own compression algorithms. Google’s algorithm, Brotli, is another lossless compression solution. In a paper comparing compression algorithms, Brotli’s compressed output (at its maximum level) was about 30% smaller than gzip’s.

However, when looking at compression algorithms, we can’t just look at density (also referred to as compression ratio). We also have to consider compression and decompression speed. If our algorithm produces denser data but takes a month to compress, what have we really accomplished? In the paper referenced above, Brotli performs about the same as gzip in compression and decompression time.
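
The ratio-versus-speed trade-off can be observed with a few lines of Python. This is only a rough sketch using zlib (the DEFLATE implementation in the standard library) at three levels; the sample data is made up, and timings will vary by machine and input, so treat the numbers as indicative rather than as a benchmark.

```python
import time
import zlib

# Hypothetical sample input; real pages will behave differently.
data = b"".join(b"<div class='row' id='r%d'>row number %d</div>\n" % (i, i)
                for i in range(20000))

results = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Compression ratio = original size / compressed size.
    ratio = len(data) / len(compressed)
    results[level] = (ratio, elapsed)
    print(f"level {level}: ratio {ratio:.2f}, {elapsed * 1000:.1f} ms")
```

Higher levels typically buy a somewhat better ratio at the cost of more CPU time, which is why both numbers need to be reported together.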

Zstandard is a lossless compression algorithm announced by Facebook in August 2016. Facebook is touting Zstandard as a solid balance between compression ratio, compression speed, and decompression speed, and as a big step forward in modern computing.

Let’s look at some benchmarks. The table columns, in order, are Plugin, Codec, Level, Compression Ratio, Compression Speed, and Decompression Speed.
[Image: benchmarks table]

This benchmark was produced by Squash Compression Benchmark on a 122 KB text file.

The results show Brotli has the best file density (compression ratio) while Zstandard has the worst. Zstandard has the fastest compression speed by far while Brotli has the slowest. I ran some of my own tests locally just on compression ratio:

Page         Original     Gzip (level 9)   Brotli (level 11)   Zstandard (level 22)
Webpage 1    44.05 KB     14.45 KB         12.67 KB            14.05 KB
Webpage 2    176.26 KB    175.98 KB        176.27 KB           176.28 KB
Webpage 3    208.38 KB    57.09 KB         47.76 KB            52.14 KB
Webpage 4    237.4 KB     39.07 KB         29.81 KB            33.28 KB
Webpage 5    191.72 KB    35.97 KB         28.64 KB            32.38 KB
Webpage 6    113.45 KB    16.22 KB         12.88 KB            15.05 KB
Webpage 7    533.23 KB    106.87 KB        84.02 KB            92.93 KB
Webpage 8    146.41 KB    27.86 KB         22.59 KB            25.08 KB
Webpage 9    30.54 KB     6.69 KB          5.4 KB              6.53 KB
Webpage 10   47.92 KB     10.23 KB         8.22 KB             9.86 KB
Webpage 11   116.57 KB    22.35 KB         18.55 KB            20.87 KB
Webpage 12   217.89 KB    36.57 KB         26.93 KB            30.44 KB

Average gzip compression ratio: 4.73
Average Brotli compression ratio: 5.91
Average Zstandard compression ratio: 5.21
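
For anyone who wants to check the arithmetic, the sketch below recomputes these averages from the table, assuming each average is the mean of the per-page ratios (original size divided by compressed size). That assumption is mine, but it does reproduce the figures above to within about 0.01; the small differences presumably come from the sizes in the table being rounded to two decimals.

```python
# Sizes in KB copied from the table: (original, gzip, Brotli, Zstandard).
sizes = [
    (44.05, 14.45, 12.67, 14.05),
    (176.26, 175.98, 176.27, 176.28),
    (208.38, 57.09, 47.76, 52.14),
    (237.4, 39.07, 29.81, 33.28),
    (191.72, 35.97, 28.64, 32.38),
    (113.45, 16.22, 12.88, 15.05),
    (533.23, 106.87, 84.02, 92.93),
    (146.41, 27.86, 22.59, 25.08),
    (30.54, 6.69, 5.4, 6.53),
    (47.92, 10.23, 8.22, 9.86),
    (116.57, 22.35, 18.55, 20.87),
    (217.89, 36.57, 26.93, 30.44),
]

def average_ratio(column: int) -> float:
    """Mean of per-page ratios: original size / compressed size."""
    return sum(row[0] / row[column] for row in sizes) / len(sizes)

averages = {
    name: average_ratio(column)
    for column, name in enumerate(("gzip", "Brotli", "Zstandard"), start=1)
}
for name, value in averages.items():
    print(f"Average {name} compression ratio: {value:.2f}")
```

Note that Webpage 2 barely compresses with any of the three algorithms (its ratio is about 1), which pulls every average down.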

So what does this all mean and how does it relate to the web, networking, and PHP?

Well, in the context of serving assets on the web, it’s unlikely that anything will unseat gzip without offering a better compression ratio. Therefore, while Zstandard’s compression speed is very impressive, it is not useful for serving websites. Moreover, modern browsers can all decompress gzip on the fly, whereas there is no browser support for Zstandard. That being said, one can still use PHP and the zstd extension to compress and decompress files server side.

Brotli, on the other hand, does have a better compression ratio than gzip (and Zstandard). Google claims Brotli’s ratio is about 20-30% higher, although the improvement depends heavily on the type of file being compressed. The tests I ran (table above) show an average compression ratio improvement of about 25% (5.91 versus 4.73). Brotli’s compression speed is only about half that of gzip, but for smaller files such as web pages, the compression ratio improvement outweighs the loss in compression speed. Brotli is superior to gzip for serving web assets.

Brotli, unlike gzip, is not universally supported by browsers. In fact, as of now it is not supported by Safari or IE/Edge, only by recent versions of Chrome and Firefox. Also, browsers will only decode Brotli properly when it is served over HTTPS. There is a PHP extension for compressing with Brotli as well as an nginx module.

As of today, based on my tests, Brotli is ready for production use and worth adopting. We can use PHP to compress page-cached files and decompress them on the fly (perhaps an addition to Simple Cache), or use nginx to detect browser capabilities and serve Brotli-compressed files accordingly. The nginx method is an easy win since all we need to do is compile the Brotli module into nginx and tweak our configuration file.
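
The "detect browser capabilities" step boils down to reading the request's Accept-Encoding header. Here is a rough Python sketch of that decision, preferring Brotli over HTTPS and falling back to gzip, then to the uncompressed file. The function names and the ".br"/".gz" suffixes for precompressed cache files are assumptions for illustration, not part of nginx or any PHP extension.

```python
def parse_accept_encoding(header: str) -> dict:
    """Map each offered content coding to its q-value (default 1.0)."""
    offered = {}
    for part in header.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # Treat a malformed q-value as "not acceptable".
        offered[token] = q
    return offered

def pick_variant(accept_encoding: str, is_https: bool):
    """Return (file suffix, Content-Encoding value or None) for a response."""
    offered = parse_accept_encoding(accept_encoding)
    # Prefer Brotli, but only over HTTPS, as noted above.
    if is_https and offered.get("br", 0) > 0:
        return ".br", "br"
    # Fall back to gzip for browsers without Brotli support.
    if offered.get("gzip", 0) > 0:
        return ".gz", "gzip"
    # Otherwise serve the uncompressed file.
    return "", None

print(pick_variant("gzip, deflate, br", is_https=True))
print(pick_variant("gzip, deflate", is_https=True))
```

A coding listed with q=0 is treated as explicitly refused, so a header like "br;q=0, gzip" falls through to gzip even over HTTPS.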

Shawn Maust wrote a nice article on compiling nginx with the Brotli module. I also wrote an nginx config file that lets you enable Brotli with PHP7 FPM but fall back to gzip for non-supporting browsers.

Compression algorithms will continue to be iterated on and improved. For now, we can improve experiences for users and decrease bandwidth usage with Brotli.

Edit: The compression levels used for my tests were 9 for gzip, 11 for Brotli, and 22 for Zstandard.


5 thoughts on “Data Compression, PHP, and the Web”

  1. Xianjin YE says:

    Hi, which version of gzip, brotli, zstd and compression levels are you using during the test?

    According to lzbench, https://github.com/inikep/lzbench, zstd should have a slightly better compression ratio when operating at a similar compression speed.

    Of course, the compression ratio varies with the content of the file. You can try lzbench by supplying your own input.

      • Xianjin YE says:

        Hi, thanks for the info.

        However, the highest compression level is usually impractical in real-world usage as it consumes too much CPU. Maybe comparing compression ratios when the algorithms operate at a similar and suitable compression speed (normally we use the gzip default compression level: 6?) is more appropriate?

  2. Hello, it might be worth mentioning that Brotli performs so well, because it uses a default pre-shared dictionary that has been trained on text based web content.
    There’s no reason why ZStandard couldn’t do the same, it is just a matter of specifying it in the RFC.

    For reference, see Appendix A in the Brotli RFC: https://tools.ietf.org/html/rfc7932

    In addition, a generic protocol for pre-shared dictionaries is on the way: https://en.wikipedia.org/wiki/SDCH

  3. Valérie Martin says:

    The main problem with Brotli is that it comes with a large dictionary that holds mostly English words. That’s fine in English-speaking countries and when compressing English text, but on the World Wide Web it may prove more limited when dealing with pages in Chinese, Russian, Greek or even Spanish. Therefore, if your test pages are only in English, your results are biased. Pick some pages of similar size but in different languages (on Wikipedia, for instance) and rerun your tests… the results could be quite different.
