Now that kiwidoc has been released, I can share how I configured the system in 'production'. kiwidoc is hosted at Rackspace on 2 machines: a small one for the load balancer (apache web server) and a bigger one for the main application (tomcat). Configuring it was quite a challenge, and I just want to share how I did it. Note that the instructions are for Ubuntu 9.04 with a stock installation of apache (2.2.11-2ubuntu2.3) and tomcat (6.0.18-0ubuntu6) using the standard apt* commands.

What did I want to achieve?

My main application is a web application and is deployed in tomcat under [/java]. The load balancer (apache) should be able to direct traffic to multiple instances of tomcat when the need arises. I also wanted http://www.kiwidoc.com/ (in other words, [/]) to be redirected to [/java/], which is my main entry point. The catch is that some pages need to be served by apache itself (like some error pages), and this was not easy to configure.

Configuring tomcat (Part I)

On the tomcat side, I set up a new connector for ajp (file /etc/tomcat6/server.xml):
<!-- Define an AJP 1.3 Connector on port 8009 -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8010"
           proxyName="www.kiwidoc.com" proxyPort="80"
           URIEncoding="UTF-8"/>
I chose ajp because it is supposed to be much faster than standard http. So far the configuration is not too difficult. proxyName and proxyPort (the attribute names are case sensitive) are set so that the methods ServletRequest.getServerName() and ServletRequest.getServerPort() return the public host name and port rather than the ones tomcat itself listens on.

Configuring apache

On the apache side, I enabled 3 modules, which gives the following 4 symlinks (directory /etc/apache2/mods-enabled):
proxy_ajp.load      -> ../mods-available/proxy_ajp.load
proxy.load          -> ../mods-available/proxy.load
proxy.conf          -> ../mods-available/proxy.conf
proxy_balancer.load -> ../mods-available/proxy_balancer.load
Then under /etc/apache2/sites-enabled, I have the following file (which I called 100-lb):
<VirtualHost *:80>
##########################
# DocumentRoot
DocumentRoot /var/www
<Directory /var/www/>
  Options FollowSymLinks MultiViews
  AllowOverride None
  Order allow,deny
  allow from all
</Directory>
##########################
# Error handling
ErrorLog /var/log/apache2/error.log
LogLevel warn
CustomLog /var/log/apache2/access.log combined
ErrorDocument 503 /errors/error_503.html
ErrorDocument 404 /errors/error_404.html
##########################
# Proxy
ProxyRequests Off
<Proxy *>
  Order deny,allow
  Allow from all
</Proxy>
<Proxy balancer://kiwidoc>
  BalancerMember ajp://123.123.123.123:8009
  BalancerMember ajp://123.123.123.124:8009
</Proxy>
ProxyPass /errors !
ProxyPass /images !
ProxyPass / balancer://kiwidoc/
</VirtualHost>
Let's cover each section:
  • The 'Document Root' section defines where to find the static html on the file system (/var/www is the standard by default).
  • The 'Error handling' section defines where the log goes and assigns the 503 and 404 error codes to static html pages that I have created and saved under /var/www/errors. I will explain why I did it this way.
  • The 'Proxy' section configures the proxy. Note that it is ok to have only one member. The last 3 lines define the rules for load balancing:
    • The first 2 lines define an exclusion rule: all requests to [/errors] and [/images] will be served by apache and not forwarded. ProxyPass rules are checked in configuration order and the first match wins, so the exclusions must come before the 3rd line, which would otherwise forward everything.
    • The last line sends all traffic under [/] to the balancer.
  • The reason why I need to have some pages served by apache is twofold:
    1. The static content should be served by apache (which does it very efficiently).
    2. If all tomcat instances are unreachable, then apache will issue a 503 error code, which gets mapped to [/errors/error_503.html]. Without the exclusion rule, apache would try to fetch that page from tomcat (which we know is unreachable...). This use case happens, for example, when I need to shut down the main application for maintenance: you see a nice maintenance page.

Configuring tomcat (Part II)

We are almost there. The issue now is that [/] goes to tomcat which needs to handle it properly. So here is what I did:

Under /var/lib/tomcat6/webapps/ROOT (which is what tomcat uses for [/]), I have a mini webapp:

    WEB-INF/web.xml
    ---------------
    <web-app xsi:schemaLocation='http://java.sun.com/xml/ns/j2ee 
                     http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd'
             version='2.4' 
             xmlns='http://java.sun.com/xml/ns/j2ee'
             xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
    <error-page>
      <error-code>404</error-code>
      <location>/errors/error_404.html</location>
    </error-page>
    </web-app>
    
    index.jsp
    ---------
    <% response.sendRedirect("http://www.kiwidoc.com/java/"); %>
    
    errors/error_404.html
    

Let's cover each section:
  • The web.xml is required to define a nice 404 error page: the main application (under [/java]) already handles 404 errors nicely, but a request to [/donotexist], for example, is going to be handled by the ROOT webapp and I wanted the same error page.
  • The index.jsp page simply does the redirect and is served automatically by tomcat with no other configuration.
Conclusion

First of all, it works, achieves what I described earlier, and I believe it covers all cases, like maintenance mode and 'not found' error pages. Nonetheless, I wish I had found a cleaner/simpler way to do it, or in other words, a way to not have to create a ROOT webapp on the tomcat side. The main issue stems from the fact that I was totally unable to express in apache the simple rule: redirect [/] ONLY to [/java/], because a rule on [/] is treated as [/*]. I would be surprised if it was not possible; it is just very hard to find the documentation that explains how to do it.
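
For what it's worth, here is an untested sketch of one way that rule might be expressed in apache. It assumes mod_alias is enabled (it normally is in a stock install) and it narrows the ProxyPass rule to [/java] so that nothing else reaches tomcat. Both the RedirectMatch line and the narrower ProxyPass path are assumptions of this sketch, not what kiwidoc runs.

```apache
# Sketch only: would replace the three ProxyPass lines in the virtual host above.
# Assumes mod_alias is enabled; not the configuration kiwidoc runs.

# Redirect the bare root, and only the bare root, to the entry point.
RedirectMatch ^/$ http://www.kiwidoc.com/java/

# Forward only the application's own path to the balancer.
# With no rule on [/], /errors and /images need no exclusion lines.
ProxyPass /java balancer://kiwidoc/java
```

With this variant, a request to [/donotexist] should never reach tomcat: apache looks under /var/www, finds nothing, and the existing ErrorDocument 404 line takes over, so the ROOT webapp would no longer be needed.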