RewriteMaps are extremely useful, especially if you have a large database of links that you're moving from an old site structure. RewriteMaps allow indexed (hashed) lookups of data through DBM files, or lookups through external programs.
introduction
This page documents how to implement a DBM RewriteMap in the simplest form for regular modifications (at least, this is the way that I found simplest).
The rewrite map has to be defined first in your main server configuration (server or virtual host context) and cannot be defined within .htaccess files. I find it easiest to reference a DBM file that is in user space, but you might want to restrict it to an area that is writable only by a superuser.
RewriteMap redirects-map dbm:/var/www/sites/example.test/maps/redirects
The above tells Apache that the rewrite map is named redirects-map, is in DBM format, and is located at /var/www/sites/example.test/maps/redirects.
The map source is a text file of key/value pairs, one pair per line with the key and value separated by whitespace, laid out like so:
main /index.php?article=main
search /index.php?article=search
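Before converting a text map, it can help to check that every line really is a key followed by a value and that no key appears twice. The following is a minimal sketch in Python, not part of Apache; the function name parse_map and the sample data are made up for illustration, and the handling of comments and extra columns is a simplifying assumption rather than a description of what httxt2dbm does.

```python
def parse_map(text):
    """Parse 'key value' lines into a dict, rejecting duplicate keys."""
    entries = {}
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no entry.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) < 2:
            raise ValueError(f"line {number}: expected 'key value', got {line!r}")
        # Assumption: only the first two whitespace-separated fields matter.
        key, value = parts[0], parts[1]
        if key in entries:
            raise ValueError(f"line {number}: duplicate key {key!r}")
        entries[key] = value
    return entries


sample = """\
# old page -> new location
main /index.php?article=main
search /index.php?article=search
"""
print(parse_map(sample))
```

Running a check like this before rebuilding the DBM file catches a malformed line early, rather than discovering it as a missing redirect later.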
updating
To make frequent updates, I have found it easiest to use a Makefile, which lets me define which httxt2dbm binary to run from the command line:
HTTXT2DBM=/usr/sbin/httxt2dbm
all:
	$(HTTXT2DBM) -i links.txt -o links
	$(HTTXT2DBM) -i redirects.txt -o redirects
In this example two maps are built: links and redirects.
.htaccess
To use the map within .htaccess configuration, reference it like so:
RewriteCond ${redirects-map:$1} >"" [NC]
RewriteRule ^(.*) ${redirects-map:$1} [R=301,L]
This tells Apache that if the map contains the URI as a key and its value is not "" (an empty string), the RewriteRule should be applied. The second line tells Apache to redirect to the value of the key matched in the previous line. The redirect code is 301 (permanent redirect), and L makes this the last rule processed.
It's possible to redirect to the value of the key that matches the URI in a single line (omitting the RewriteCond), but that only works if every request has a matching key.
It's also possible to use the same logic to redirect all matching URIs to a single location:
RewriteCond ${redirects-map:$1} >"" [NC]
RewriteRule ^(.*) /update_your_bookmarks.html [R=301,L]
The condition only requires that the key's value is not blank; when that is the case, the RewriteRule that follows is applied.
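The decision the two rule sets make can be modelled in a few lines of Python. This is only a sketch of the logic, not of how mod_rewrite is implemented: the function names and sample map are hypothetical, and it assumes the .htaccess file sits in the document root so that the captured path has no leading slash.

```python
def redirect_target(path, redirects):
    """Model of the first rule set: redirect to the map value, if any."""
    # A key that is missing from the map expands to an empty string,
    # so "missing" and "empty" are treated the same way here.
    target = redirects.get(path, "")
    return target if target else None


def fixed_target(path, redirects, location="/update_your_bookmarks.html"):
    """Model of the second rule set: any known key goes to one location."""
    return location if redirects.get(path, "") else None


redirects = {
    "main": "/index.php?article=main",
    "search": "/index.php?article=search",
}
for path in ("main", "search", "contact"):
    print(path, redirect_target(path, redirects), fixed_target(path, redirects))
```

A return value of None stands for "the condition failed, so the request carries on to whatever rules follow".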
lowercase keys
Should you want to store only lowercase keys (which might be advisable for ease of maintenance), you can use the internal tolower function. Just add the line below to your main Apache configuration:
RewriteMap lowercase int:tolower
This is then accessible in the rewrite functions, such as in the example below:
RewriteCond ${lowercase:$1} ^(.+)$ [NC]
RewriteCond ${redirects-map:%1} >"" [NC]
RewriteRule ^(.*) ${redirects-map:%1} [R=301,L]
The first line puts the result of the lowercase map into %1, which is referenced in the second line. The %1 value is also used in the third and final line of this block.
trailing slash
Rather than inserting double the number of URIs to account for a trailing slash, you can compensate for it in the RewriteCond:
RewriteCond ${lowercase:$1} ^(.+)/$ [NC]
RewriteCond ${redirects-map:%1} >""
RewriteRule ^(.*) ${redirects-map:%1} [R=301,L]
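In the lowercase condition the pattern ends with a / outside the capture group, so %1 holds the key without the trailing slash. As written, the condition only matches requests that end in a slash; a pattern such as ^(.+?)/?$ would make the slash optional.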
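To see what key ends up being looked up, here is a small Python sketch of the lowercase and trailing-slash handling. It uses the pattern ^(.+?)/?$, where the slash is optional; that is a variation on the condition above, not the rule as written. The function name normalise_key is hypothetical, and Python's lower() also folds non-ASCII letters, which may differ from Apache's tolower.

```python
import re

# Optional trailing slash: the lazy group leaves a final "/" out of the key.
KEY_PATTERN = re.compile(r"^(.+?)/?$")


def normalise_key(path):
    """Lowercase the path and drop one trailing slash, as %1 would hold it."""
    match = KEY_PATTERN.match(path.lower())
    return match.group(1) if match else None


for path in ("Main", "Main/", "SEARCH/"):
    print(repr(path), "->", repr(normalise_key(path)))
```

With this variation a single lowercase entry such as main covers Main, main/ and MAIN/ alike.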
block lists
After receiving a lot of spam to a site, I decided to start blocking those requests and storing them in a log.
This evolved into a RewriteMap:
RewriteMap ipblock-map dbm:/var/www/sites/ip_block
The block list text file should contain key/value pairs in the following syntax:
127.0.0.1 block
RewriteCond ${ipblock-map:%{REMOTE_ADDR}} =block
RewriteRule ^ - [F,L]
This block rule checks for a key whose value is the word 'block' and responds with a 403 Forbidden.
This is great for single IP addresses. The following snippet works for a /24 CIDR range (if you want more complex rules, take a look at mid_cidr).
RewriteCond %{REMOTE_ADDR} ^(\d+)\.(\d+)\.(\d+)\.
RewriteCond ${ipblock-map:%1.%2.%3} =block
RewriteRule ^ - [F,L]
The entries in the block list need to be stored in the following format for /24 addresses:
127.0.0 block
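The two lookups (full address, then the first three octets) can be sketched in Python to check which keys a given client would hit. This is an illustrative model only; the function names and the sample block list are hypothetical, and it handles IPv4 addresses only, like the /24 rule above.

```python
import ipaddress


def block_keys(remote_addr):
    """Return the keys the two rule sets would look up for an IPv4 address."""
    # IPv4Address raises ValueError for anything that is not valid IPv4.
    address = ipaddress.IPv4Address(remote_addr)
    octets = str(address).split(".")
    return [str(address), ".".join(octets[:3])]


def is_blocked(remote_addr, blocklist):
    """True if either the address or its /24 prefix maps to 'block'."""
    return any(blocklist.get(key) == "block" for key in block_keys(remote_addr))


# Hypothetical block list: one single address and one /24 range.
blocklist = {"127.0.0.1": "block", "192.0.2": "block"}
for addr in ("127.0.0.1", "192.0.2.77", "198.51.100.1"):
    print(addr, block_keys(addr), is_blocked(addr, blocklist))
```

Because both kinds of key live in the same map, single addresses and /24 prefixes can be mixed freely in one text file.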
host redirects
Another useful feature is simple redirects based on the HTTP_HOST variable. The first VirtualHost configuration is the 'catch all' container, so you can add a rewrite map to it such as this:
RewriteMap lowercase int:tolower
RewriteMap host-map dbm:/var/www/sites/maps/host-map
RewriteCond ${lowercase:%{HTTP_HOST}} ^(.+)$ [NC]
RewriteCond ${host-map:%1} >""
RewriteRule ^ ${host-map:%1} [R=301,L]
All you need to do now is add host redirects to the host-map DBM file. If you're running a very large redirect server, chances are that you don't want many virtual host containers, as each one consumes some memory. If the redirects are instead handled by a dynamic script, that script has to be invoked for every request that goes through the redirect server. Using a RewriteMap avoids both costs.
rewritemap programs
We can take the above example and improve it somewhat by using an external program to receive the lookup key and respond with a value. This is really handy, since external programs can do computations that would be near impossible with mod_rewrite alone. Here is a rather simple example:
RewriteMap beanmap prg:/home/perl/rewritemaps/beanmap.pl
RewriteCond ${beanmap:%{REMOTE_ADDR}} ^(\S+)\s(\S+)\s(\S+)\s(\S+)
RewriteRule ^ - [E=V1:%1,E=V2:%2,E=V3:%3,E=V4:%4]
In the above example, we expect the program to return four values separated by whitespace; otherwise the values are discarded altogether and the RewriteRule has no effect. The input (key) sent to the program is the remote IP address of the connection.
The program which responds to this input can be as simple as the following:
#!/usr/bin/perl
use strict;
use warnings;
# It is essential to turn off output buffering; otherwise Apache can hang
# waiting for a reply that is still sitting in this program's buffer.
$| = 1;
while ( my $line = <STDIN> ) {
    chomp $line;
    # Loose check: four groups of hex digits separated by dots or colons.
    if ( $line =~ /^([0-9a-f]+)[.:]([0-9a-f]+)[.:]([0-9a-f]+)[.:]([0-9a-f]+)/i ) {
        print "Fi\tFy\tFo\tThumb\n";
        next;
    }
    # Anything else is echoed back unchanged.
    print $line, "\n";
}
In the above, if the input looks like an IPv4 or IPv6 address, we write the four values as output. This is matched by the RewriteCond, and the environment variables are set.
The program is a bit silly and doesn't do very much at all of value other than serve as an example to show how the output from this program can become environment variables through RewriteCond matching and RewriteRule setting.
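A map program can be written in any language that reads one key per line on standard input and writes one line of reply. As a rough illustration, here is the same idea sketched in Python. The names respond and serve are made up, the address check is a loose one (four groups of hex digits separated by dots or colons), and the demonstration at the end feeds the loop from in-memory streams rather than from Apache.

```python
import io
import re

# Loose check: four groups of hex digits separated by dots or colons.
FOUR_GROUPS = re.compile(
    r"^[0-9a-f]+[.:][0-9a-f]+[.:][0-9a-f]+[.:][0-9a-f]+", re.IGNORECASE
)


def respond(key):
    """Return the reply line (without newline) for one lookup key."""
    if FOUR_GROUPS.match(key):
        return "Fi\tFy\tFo\tThumb"
    # Anything else is echoed back unchanged.
    return key


def serve(infile, outfile):
    """Answer one key per input line, flushing after every reply."""
    for line in infile:
        outfile.write(respond(line.rstrip("\n")) + "\n")
        # Flushing matters: the caller waits for each reply before going on.
        outfile.flush()


# Demonstration with in-memory streams; a real map program would instead
# call serve(sys.stdin, sys.stdout) after importing sys.
demo_out = io.StringIO()
serve(io.StringIO("192.0.2.10\nnot-an-address\n"), demo_out)
print(repr(demo_out.getvalue()))
```

Keeping the reply logic in its own function makes it possible to test the program without Apache in the loop.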