mod-gzip-disk-ng

Every day, web servers read content from disk, compress it and send it to clients, over and over again. Why not store the compressed data on disk and send it to the client as-is, without the overhead of compression? Not every response will be held in a web cache, and some will be requested several times over because of differing headers or query strings. So why not store the content in its final form on disk? That saves space, CPU and disk I/O, and the lower disk I/O and CPU load in turn improve latency.

Why not indeed.

As a rough guide, 91% of the requests to my site included the Accept-Encoding: gzip header.

Building

apt-get install git cargo apache2-dev libapr1-dev libclang-dev
git clone https://gitlab.com/edneville/gzip-disk.git
cd gzip-disk
cargo build --release
please cp target/release/libmod_gzip_disk_ng.so /usr/lib/apache2/modules

Enabling

Add to directories where static content will be served, within Location blocks or within your vhost:

SetHandler gzip-disk-ng

(test the config now)

please apache2ctl configtest

echo 'LoadModule gzip_disk_ng_module /usr/lib/apache2/modules/libmod_gzip_disk_ng.so' | please tee /etc/apache2/mods-available/gzip_disk_ng_module.load

please a2enmod gzip_disk_ng_module
please apache2ctl configtest && please apache2ctl restart

Responses served directly from pre-compressed data carry the header:

Mod-Gzip-Disk-Ng: :-)

Responses that had to be decompressed for the client carry:

Mod-Gzip-Disk-Ng: :-(
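
One way to see which path a request took is to look for this header in the response. The following is a minimal sketch, assuming bash and curl are available; the helper name gzip_disk_status is hypothetical and not part of the module.

```shell
#!/usr/bin/env bash
# gzip_disk_status (hypothetical helper): read HTTP response headers on stdin
# and report what the Mod-Gzip-Disk-Ng header says.
gzip_disk_status() {
  local line value
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}                 # strip the CR of a CRLF line ending
    case "$line" in
      [Mm]od-[Gg]zip-[Dd]isk-[Nn]g:*)  # header names are case-insensitive
        value=${line#*:}               # keep everything after the first colon
        value=${value# }               # drop one leading space, if present
        case "$value" in
          ':-)') echo "pre-compressed" ;;
          ':-(') echo "decompressed" ;;
          *)     echo "unknown: $value" ;;
        esac
        return 0
        ;;
    esac
  done
  echo "header absent"
  return 1
}
```

For example, against the test URL used in the benchmarks on this page:

curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://localhost/test/index.html | gzip_disk_status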

Once the module is in place, publish your files as .gz and rsync them to the destination or, if that's not possible, atomically compress them in place with gzip, brotli and zstd:

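# Requires bash (process substitution). Each compressor writes to a temporary
# name and renames it on success, so a partial file never appears under its
# final name. The original is removed once tee has finished reading it.
$ find . -type f -name '*.html' -print0 | while IFS= read -r -d '' NAME; do
  tee < "$NAME" \
    >(gzip -9 > "$NAME.gz.$$" && mv "$NAME.gz.$$" "$NAME.gz") \
    >(brotli -9 > "$NAME.br.$$" && mv "$NAME.br.$$" "$NAME.br") \
    >(zstd -9 > "$NAME.zstd.$$" && mv "$NAME.zstd.$$" "$NAME.zstd") \
    >/dev/null && rm "$NAME"
done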

Ideally, keep your publishing system as close to unchanged as possible: stage your files locally, compress them in the staging area, and rsync that to the web server. How far you take this depends on how big or small the operation is.
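
Since the originals may no longer exist once the compressed copies are published, it may be worth checking the staging area before the rsync. The following is a minimal sketch, assuming bash and GNU gzip; the function name check_gz_tree is hypothetical, and it only covers the .gz files.

```shell
#!/usr/bin/env bash
# check_gz_tree (hypothetical helper): test every .gz file under a directory
# with `gzip -t` and list the ones that fail the integrity check.
check_gz_tree() {
  local dir=$1 bad=0 file
  while IFS= read -r -d '' file; do
    if ! gzip -t "$file" 2>/dev/null; then
      echo "corrupt: $file"
      bad=1
    fi
  done < <(find "$dir" -type f -name '*.gz' -print0)
  return "$bad"                        # 0 if every file passed, 1 otherwise
}
```

It could then gate the upload, where the staging directory and the rsync destination are placeholders:

check_gz_tree staging/ && rsync -a staging/ webhost:/var/www/site/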

Notice how much less space your site needs now!

directory indexes/default documents

RewriteCond %{REQUEST_FILENAME}/index.html.gz -f
RewriteRule ^ %{REQUEST_FILENAME}/index.html

selectively enabling

Another way to use the module is to enable the handler only when the requested file is absent and its .gz counterpart is present:

RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME}.gz -f
RewriteRule ^ - [H=gzip-disk-ng]

This way SetHandler does not need to be enabled and the web server functions entirely as before, with one exception: if the requested file is not present but its compressed version is, the compressed version is sent via the handler. For complex configurations this is probably the best option.

tests

The test outputs below show that even in the worst case, when the client does not accept gzip encoding and the pages have to be decompressed, serving is much faster than compressing the page on every request.

Nearly 9k requests/sec with pre-compressed files vs 53 requests/sec with on-the-fly compression, on a Ryzen 3. Forget about speed for a moment and think about energy.
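
To put those figures side by side, the ratios can be worked out from the ab outputs below. This is a small sketch, assuming bash and awk; the helper name ratio is hypothetical, and the numbers are copied from the "Requests per second" and "Document Length" lines of the test runs.

```shell
#!/usr/bin/env bash
# ratio (hypothetical helper): print a / b to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Requests/sec: pre-compressed handler vs on-the-fly compression.
echo "handler vs on-the-fly:       $(ratio 8959.38 53.22)x"
# Requests/sec: handler decompressing for a non-gzip client vs on-the-fly compression.
echo "decompressing vs on-the-fly: $(ratio 622.43 53.22)x"
# Bytes sent for the gzip response as a percentage of the uncompressed page.
echo "compressed size:             $(ratio "$((70050 * 100))" 222057)%"
```

On these numbers the handler comes out at roughly 168 times the request rate of on-the-fly compression, the decompressing worst case at roughly 12 times, and the compressed page at a little under a third of the original size.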

On a directory with the handler:

$ ab -n 10000 -H 'Accept-Encoding:gzip' http://localhost/test/index.html  
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        Apache/2.4.57
Server Hostname:        localhost
Server Port:            80

Document Path:          /test/index.html
Document Length:        70050 bytes

Concurrency Level:      1
Time taken for tests:   1.116 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      702750000 bytes
HTML transferred:       700500000 bytes
Requests per second:    8959.38 [#/sec] (mean)
Time per request:       0.112 [ms] (mean)
Time per request:       0.112 [ms] (mean, across all concurrent requests)
Transfer rate:          614863.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.0      0       0
Waiting:        0    0   0.0      0       0
Total:          0    0   0.0      0       0

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      0 (longest request)

Same directory without gzip encoding:

$ ab -n 10000 http://localhost/test/index.html
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        Apache/2.4.57
Server Hostname:        localhost
Server Port:            80

Document Path:          /test/index.html
Document Length:        222057 bytes

Concurrency Level:      1
Time taken for tests:   16.066 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2222810000 bytes
HTML transferred:       2220570000 bytes
Requests per second:    622.43 [#/sec] (mean)
Time per request:       1.607 [ms] (mean)
Time per request:       1.607 [ms] (mean, across all concurrent requests)
Transfer rate:          135112.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    2   0.5      2       3
Waiting:        0    1   0.2      1       2
Total:          1    2   0.5      2       3

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      2
  90%      2
  95%      2
  98%      2
  99%      2
 100%      3 (longest request)

On a directory where gzip is permitted but data has to be compressed on the fly:

$ ab -n 10000 -H 'Accept-Encoding:gzip' http://localhost/test-nohandler/index.html
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        Apache/2.4.57
Server Hostname:        localhost
Server Port:            80

Document Path:          /test-nohandler/index.html
Document Length:        70729 bytes

Concurrency Level:      1
Time taken for tests:   187.908 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      710100000 bytes
HTML transferred:       707290000 bytes
Requests per second:    53.22 [#/sec] (mean)
Time per request:       18.791 [ms] (mean)
Time per request:       18.791 [ms] (mean, across all concurrent requests)
Transfer rate:          3690.40 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     8   19   7.6     23      38
Waiting:        3   15   6.2     19      31
Total:          8   19   7.6     23      38

Percentage of the requests served within a certain time (ms)
  50%     23
  66%     25
  75%     25
  80%     25
  90%     26
  95%     28
  98%     29
  99%     29
 100%     38 (longest request)

no gzip and no handler

Here we can see that more data is transferred, and that requests are not served as fast as in the pre-compressed gzip case:

$ ab -n 10000 http://localhost/test-nohandler/index.html
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        Apache/2.4.57
Server Hostname:        localhost
Server Port:            80

Document Path:          /test-nohandler/index.html
Document Length:        222057 bytes

Concurrency Level:      1
Time taken for tests:   1.404 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2223330000 bytes
HTML transferred:       2220570000 bytes
Requests per second:    7122.17 [#/sec] (mean)
Time per request:       0.140 [ms] (mean)
Time per request:       0.140 [ms] (mean, across all concurrent requests)
Transfer rate:          1546379.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.0      0       1
Waiting:        0    0   0.0      0       1
Total:          0    0   0.0      0       1

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      1 (longest request)