Race against the clock with Smalltalk
September 16, 2004, 8:26:30 am
Last week Wizard launched http://www.wizardpolling.com.au. This website is a polling website for the Australian Federal Election 2004. It's the brainchild of Australian National University (ANU) Innovation, and several other partners are involved, such as Nine Networks and The Bulletin magazine.
The idea is fairly simple. Lots of media coverage, good demographics while preserving privacy, university researchers to analyse the results, media to then publish the results on a day-to-day basis. It's a win-win situation for all.... but....
This is where my story begins. We had a visit from the CEO of ANU Innovation asking us to build him his software. At this point the Federal Election had not been called. We didn't know when it was going to be called - but we knew one thing was for sure, we didn't have much time!
We decided on a plan of action - 6 weeks to build, test, QA and implement the entire project. Now, granted, a system like this should be simple and straightforward, but anybody who has been hit by the clue stick a few times knows that nothing in the IT industry is simple. There are the requirements gathering phases, contract and IP resolutions, the cycling around bugs and testing, actual QA and the constantly changing requirements that happen regardless of how well your upfront gathering phase went.
As a rule, no project should be scheduled for any time less than three months. Naturally, we had to break this rule. This should have been a fatal blunder - except that we were programming in Smalltalk.
Smalltalk allowed us to implement, test and deploy in a rapid fashion. We got constant, immediate feedback from our QA team as to the quality of the product. We were also able to implement a framework that was geared specifically toward the requirements of the system.
Another important rule in software development is never to optimise too early (or ever at all?) - but you can certainly design around some inefficiencies you might have. Always profile: because your assumption on what is 'slow' is nearly always wrong.
As it was in this case. The original design had Apache at the front of the system, directing only dynamic page requests through to the back end Smalltalk software - where we assumed it would be slow. We therefore needed some sort of load balancing system.
In reality, the Smalltalk code could deliver dynamic pages faster than Apache could deliver the static pages! How is this possible? Well, Apache is designed to be a generalised web server - while our software was designed to keep everything it needed cached in memory. In fact, we compiled our HTML with substitutions into Smalltalk code, compiled the Smalltalk code and then let the JIT in the VM compile it to native machine code. Put simply, you cannot get it faster than that! :)
The next mis-assumption in this concoction was the load balancing. Apache had to do the load balancing in some form or another. This, according to the Apache manuals, should take the form of some Perl. Now.. this generally means dynamically executing Perl, having it compile Perl, run Perl.. etc. Perl isn't even jitted.. so where exactly was I going to get my speed benefits from?
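To make the "compile the HTML into code" idea concrete, here is a minimal sketch in Python rather than Smalltalk. It is not the code from the project: the `{{name}}` placeholder syntax, the function names and the `PAGE_CACHE` dictionary are all assumptions made for illustration, and CPython compiles to bytecode rather than to native code through a JIT. The point is only that the template is parsed once, turned into a function, and kept in memory, so serving a page is a function call plus string joins.

```python
import re

# Assumed placeholder syntax for this sketch: {{name}}
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def compile_template(html):
    """Parse the HTML once and return a render(values) function for it."""
    # re.split with one capture group yields: literal, name, literal, name, ...
    parts = _PLACEHOLDER.split(html)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            if part:
                pieces.append(repr(part))  # literal HTML, baked into the source
        else:
            pieces.append(f"str(values[{part!r}])")  # substitution lookup
    source = (
        "def render(values):\n"
        "    return ''.join([" + ", ".join(pieces) + "])\n"
    )
    namespace = {}
    exec(compile(source, "<template>", "exec"), namespace)
    return namespace["render"]


# Hypothetical in-memory cache: page name -> compiled render function
PAGE_CACHE = {}


def render_page(name, html, values):
    """Compile the page on first use, then serve it from memory."""
    render = PAGE_CACHE.get(name)
    if render is None:
        render = PAGE_CACHE[name] = compile_template(html)
    return render(values)
```

Note that this sketch does no HTML escaping of the substituted values; anything user-supplied would need escaping before it goes near a real page.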
It began to dawn on us that a general web engine like Apache is not the answer to speed. We needed something to sit in front of our program to do load balancing and fail-over, but it had to be lightweight, fast and unobtrusive. We had two choices: squid or pound.
Now, I already knew someone who used squid for load balancing and fail-over as well as caching. So I knew there was an answer in sight. But.. when I started playing with squid, nothing came easily. It was not at all evident how these jobs were done. I remembered that Bruce Badger actually had one of the core squid engineers under his wing to help him - I didn't have said luxury, nor said time for said luxury.
So I settled on pound. I'm glad I did too: it's exactly what I needed. It even works as an SSL front-end, handling certificates, it routes based on URLs and other HTTP headers and it can even remember state information based on cookies, parameters or other things. If you need HTTP based routing and fail-over, I recommend taking a look at pound. Inter-machine routing was especially important in case we discovered one of our machines was too weak during the actual live run of the site. We needed a way to expand our infrastructure quickly, on the spot, without having to write more code (as a result, one of the key features of the software design was to be as stateless and sessionless as possible so that any request can be completed by any backend server on any backend machine).
The world started to change shape rapidly. We knew we were building a system that needed to withstand a slashdotting (or more) every single day (sometimes many times a day). Worse, our partners were trying to get into a contract that we'd have 24/7 uptime and then they wanted us to tell them how much traffic we could handle - more on this later. Apache was starting to look like a weak link.
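The reason statelessness matters here can be shown with a small sketch. This is not how pound is implemented; it is a hypothetical round-robin pool in Python (the class name, method names and backend addresses are all made up for illustration). Because no request depends on a session held by a particular backend, the picker is free to skip a dead backend or start using a newly added one without any extra bookkeeping.

```python
class BackendPool:
    """Hypothetical round-robin picker with fail-over for stateless backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()   # backends currently considered dead
        self._next = 0      # running counter used for round-robin

    def add(self, backend):
        """Expand the pool on the spot, e.g. when a machine proves too weak."""
        self.backends.append(backend)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def pick(self):
        """Return the next live backend, skipping any that are marked down."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no backend available")
```

For example, a pool built with `BackendPool(["a:8001", "a:8002"])` alternates between the two addresses, and after `mark_down("a:8001")` every call to `pick()` returns `"a:8002"` until the first backend is marked up again.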
The end result now has images and CSS routed through pound to Apache (for HTTP/1.1 headers - I ran out of QA time to throw that one in, but I did have enough time to throw in compressed content to save bandwidth); the rest goes back to as many Smalltalk images as we want to have running. It turns out that one image was more than enough to handle well over a thousand connections per second. For the hell of it, we run two - that way we can take one down, upgrade it, then swap their roles and do an incremental upgrade.
On the last two days we had several critical changes requested by the client. And boy was I glad we were programming in Smalltalk! We could put those changes through, run our tests, deploy a development image into the QA environment, have QA test it, all the while getting interactive debuggers.. and in the space of two hours we had all the changes completed.
To keep the production deployment simple, we decided to skip this whole "packaging" concept entirely and just deploy development images. We only ran into one snag here: because we're programming in VisualAge Smalltalk, the development environment must have some sort of repository to link to for its source code.
We had naively left it connecting to the development ENVY repository. When we moved to the production environment, the firewall blocked access. This wasn't a big deal: we now have two repositories - a dev one and one on the production machine. This also limits access to other source and helps isolate upgrades.
The final last-minute problems in the system actually came down to the shiny new fibre-optic line we had installed for the system. We wanted a large amount of unshared bandwidth to use. Our existing lines weren't big enough for the expected traffic. So now we have a nice 10/10 line coming in over fibre, dedicated to the site, going through a nice new shiny PIX firewall.
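The take-one-down, upgrade, swap routine can be sketched as a loop. This is an illustration in Python under stated assumptions, not our deployment script: `in_rotation` stands for whatever set of backends the front end is currently routing to, and `upgrade` is a hypothetical callable that performs the upgrade of a single backend.

```python
def rolling_upgrade(in_rotation, upgrade):
    """Upgrade each backend in turn while the remaining ones keep serving.

    in_rotation: set of backends currently receiving traffic (mutated in place).
    upgrade: hypothetical callable that upgrades one backend.
    """
    if len(in_rotation) < 2:
        raise ValueError("need at least two backends to upgrade without downtime")
    for backend in sorted(in_rotation):
        in_rotation.discard(backend)  # stop routing requests to this backend
        upgrade(backend)              # if this raises, the backend stays out
                                      # of rotation and the rollout stops
        in_rotation.add(backend)      # back into rotation before the next one
```

With two backends there is always exactly one serving traffic while the other is being upgraded, which is why running a second image is worthwhile even when one can carry the whole load.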
About the speed.. to pick up on the 'more on that later' - most sites are now linking directly to www.wizardpolling.com.au instead of to bulletin.ninemsn.com.au or news.ninemsn.com.au because those two sites are slower than our site. Now, I'm not going to say we get as much traffic volume as they do, or that their pages aren't more complex than ours.. but, I wager our approach/architecture, etc, could continue to expand to their complexity and still serve up the results faster. That's something to be proud of in a geeky sense.
So in the end, Smalltalk not only exceeded expectations, but also saved us in the last few days of the project before going live. In all honesty, I highly doubt I could have done this entire project (we had three other developers working on this, two Smalltalkers, one web developer) in a language other than Smalltalk in six weeks. It's not just the dynamic nature of the language, the lack of static typing, but more the environment, the tools, the integrated unit testing, so on and so forth.
There are so many technical nuggets I could throw into this blog entry about this project - such as database session pooling and other such nice things, but the idea behind this blog post was to give yet another story of "How Smalltalk actually helps when doing a project".
By Sean Malloy on September 16, 2004, 8:32:15 pm
By Michael Lucas-Smith on September 17, 2004, 4:50:01 am