hosted services

FastCGI is a neat method of keeping an application running in memory to service web requests without restarting the whole application for each one. Given that a Perl application can incur a lengthy start-up delay while the interpreter compiles it and loads its libraries, it is sensible to keep the application resident with everything loaded and simply iterate the main() loop.

flush

One of the things I noticed immediately with fcgid was that it didn't appear to have a method of flushing the output from the user application. The workaround was to add the following to the <VirtualHost> section.

FcgidOutputBufferSize 0
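
For context, the directive just sits alongside the rest of the mod_fcgid setup in the vhost, something like the sketch below (the server name, paths and .pl handler are hypothetical; adjust to your own layout):

<VirtualHost *:80>
  ServerName example.com
  DocumentRoot /var/www/example

  # hand .pl scripts to mod_fcgid, and disable its output buffering
  AddHandler fcgid-script .pl
  FcgidOutputBufferSize 0

  <Directory "/var/www/example">
    Options +ExecCGI
    Require all granted
  </Directory>
</VirtualHost>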

(I'm not sure whether this incurs any performance penalty; I haven't tested it in any scientific way.) Once the config is reloaded, run the following to see the result:

use FCGI;

my $count   = 0;
my $request = FCGI::Request();

while( $request->Accept() >= 0 ) {
  # $count persists between requests because the process stays resident
  print( "Content-type: text/html\r\n\r\n", ++$count );
  for( my $i = 0; $i < 3; $i++ ) {
    print "Hi!<br />\n";
    $request->Flush();   # push the buffered output to the client now
    sleep( 1 );
  }
}

You should then see the lines appear one second apart.

Of course, there is more than one way to skin a cat.

cgi parameters with fcgid

Often, if you're using CGI.pm (as I do), you may find that values returned by param() don't update after the first page access, because CGI.pm caches the query parameters. You need to reset them each time the application's while loop runs.

You may find it helpful to write your while loop something like this:

#!/usr/bin/perl

use strict;
use warnings;
use FCGI;
use CGI;

sub main {
  my $request = FCGI::Request();

  while( $request->Accept() >= 0 ) {
    my $e = $request->GetEnvironment();    # per-request environment hash
    undef @CGI::QUERY_PARAM;               # clear CGI.pm's cached parameters
    my $q = CGI->new();                    # now re-reads this request's input
    # ...
  }
}

main;

There appear to be two main ways to interact with FCGI from a Perl script: CGI::Fast and FCGI. I have found CGI::Fast to be rather limited, whilst FCGI offers the majority of what CGI::Fast does; everything, that is, except the pre-loaded CGI.pm integration. Do yourself a favour and use FCGI rather than CGI::Fast.
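
For comparison, the equivalent loop with CGI::Fast looks something like this minimal sketch (the 'foo' parameter is just for illustration); note that you get the CGI.pm object directly, but lose access to the underlying request object and its handles:

#!/usr/bin/perl

use strict;
use warnings;
use CGI::Fast;

# CGI::Fast hands you a fresh CGI.pm object per request,
# hiding the FCGI request object and its stream handles.
while ( my $q = CGI::Fast->new() ) {
  print $q->header( -type => 'text/html' );
  print "param foo = ", ( $q->param('foo') // '' ), "<br />\n";
}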

memory

The trouble with mod_fcgid.so is that request bodies larger than FcgidMaxRequestInMem are written to disk and then passed to the application, while smaller ones are read into memory. Neither option is ideal: memory can be saturated, and in the disk case the I/O is wasted simply shovelling data around.
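
If you stay with mod_fcgid you can at least tune where that threshold sits; the values here are only illustrative:

# keep request bodies up to 256KB in memory; spool anything larger to disk
FcgidMaxRequestInMem 262144
# reject requests larger than 10MB outright
FcgidMaxRequestLen 10485760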

If you want to run an application such as a file uploader, you can use mod_proxy_fcgi and send the traffic directly to the application without that disk or memory consumption. You'll get a faster response for your users as a result, and other applications on the system will not run the risk of memory or disk starvation.

ProxyPass "/myapplication/" "fcgi://localhost:9000/"

Beware though: if you use CGI;, then the moment you call new you'll be reading the whole request into memory again. Instead, if you want to stream the POST data, you can do something similar to this:

use FCGI;

my %request_params;    # filled with each request's environment by Accept()
my $socket  = FCGI::OpenSocket( ":9000", 3 );   # listen on port 9000, backlog 3
my $request = FCGI::Request(
  \*STDIN, \*STDOUT, \*STDERR, \%request_params, $socket );

while( $request->Accept() >= 0 ) {
  print( "Content-type: text/plain\r\n\r\n" );

  # stream the request body straight to disk, 16KB at a time,
  # rather than letting CGI.pm slurp it all into memory
  my ($in, $out, $err) = $request->GetHandles();
  open( my $f, ">", "/tmp/destination_post_file" ) or die "open: $!";
  binmode($in);
  binmode($f);
  my $buffer;
  while( my $br = sysread( $in, $buffer, 16384 ) ) {
    syswrite( $f, $buffer, $br );
  }
  close($in);
  close($f);
}
FCGI::CloseSocket( $socket );
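
To test it, you can post a file through the ProxyPass mapping from earlier (the filename is just an example):

curl --data-binary @bigfile.bin http://localhost/myapplication/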

If you're running within docker, you can run one application per container with a large swarm for redundancy, or use a watchdog program in your entrypoint to ensure that the application is restarted should it die. Unlike with mod_fcgid, you have to manage the worker processes yourself, although mod_proxy_balancer can help you here if you make a pool of workers (docker swarm, for instance).
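
As a sketch, the watchdog entrypoint needn't be anything more than a shell loop (the script path is hypothetical):

#!/bin/sh
# keep the FCGI application alive; restart it whenever it exits
while true; do
  /usr/bin/perl /usr/local/bin/microservice.pl
  echo "application exited with status $?, restarting" >&2
  sleep 1
done

And on the Apache side, a worker pool might look something like this with mod_proxy_balancer (the worker hostnames are placeholders, e.g. swarm service replicas):

<Proxy "balancer://fcgipool">
  BalancerMember "fcgi://worker1:9000"
  BalancerMember "fcgi://worker2:9000"
</Proxy>
ProxyPass "/myapplication/" "balancer://fcgipool/"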

python

Things are a little different with Python. Rather than speaking FCGI directly, the application implements WSGI, and flup's WSGIServer handles the FastCGI protocol on its behalf. To start, you'll need this boilerplate code:

from flup.server.fcgi import WSGIServer

def app(environ, start_response):
  start_response('200 OK', [('Content-Type', 'text/html')])
  # yielding hands the body back a piece at a time instead of as one string
  yield '<html><head><title></title></head>\n' \
    '<body>\n' \
    '<p>hello world</p>\n'

WSGIServer( app, bindAddress=( '0', 9001 ) ).run()
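
You can then point Apache at it in the same way as the Perl service (the path prefix is just an example):

ProxyPass "/pyapp/" "fcgi://localhost:9001/"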

An interesting difference between Python's FCGI handling, and its CGI handling in general, compared with Perl's is that the input data is not written to disk. What's also interesting with the yield approach is that the connection is set to close and chunked encoding is often not used (though I've had mixed results).

spawn-fcgi

A great way to keep an FCGI application running (or a microservice, if you want to call it that) is to combine spawn-fcgi and multiwatch. This combination lets you expose a daemon that speaks the FCGI protocol and keep a number of copies of it running.

/usr/bin/spawn-fcgi -n -p 9000 -u nobody -- \
/usr/bin/multiwatch -f 5 -- \
/usr/bin/python /usr/local/bin/microservice.py

I find this a good balance. We could use docker swarm to keep a service running with a number of workers, but that could be more overhead than we need; multiwatch can do that work for us without requiring additional containers.

Docker swarm is still useful with this setup, though, as it performs a cluster-management role which is outside what multiwatch can do for us.