Saturday, December 15, 2012

Amazon CloudFormation (DSL anyone?)


It has been quite a while since I have posted anything to a blog, but I thought I would get started writing again.

I wanted to talk a little about Amazon CloudFormation. (And plug something that I wrote recently.) I have worked with it for several months now, and I have to say that overall I am quite impressed.
In the past, I worked on a product that used fog and chef and some other ruby scripting glue to deploy resources in AWS. Overall, the system worked quite well, and I was able to deploy some fairly complicated systems with very little effort, but there were a few drawbacks.
One of the most significant things about this system (that's right, I'm not naming it...) was that it effectively used the database on the chef master as a kind of registry of the resources available to a cluster. In one way, this was a very powerful move: it allowed clustered resources to be aware of their surroundings in a way that was previously difficult to do. For instance, if I was setting up a Hadoop cluster in this manner, I could tell the machine running the namenode that it was part of a Hadoop cluster, and it could use an API call to the chef master to figure out where all of the datanodes were, and vice versa, allowing both types of machines to set up the correct configuration. So what is not to love? Well, the chef master server was something of a point of failure for the Hadoop cluster. In practice, for a Hadoop cluster this is not all that much of a problem, as the chef master only has to be around when significant configuration changes occur - like adding more datanodes to the cluster. The system in question was not using autoscaling groups, so configuration changes pretty much only happened during manual intervention, and the dependency on the chef master was not really much of a burden.
Life is different when you are putting together an autoscaling group of web servers. The idea is that you tell AWS how to bootstrap a machine to be a web server. You also tell it that http requests coming to a particular address need to be load balanced across the machines in the group. Then you tell the autoscaling group that when the load on your website exceeds a certain threshold, it should spin up some more servers, and that if the traffic falls below some other level, it is ok to kill off a few servers. In this framework, automatic configuration of cloud resources can become a critical operation - if you can't get new servers up and running to meet rising demand, your web service could fail to meet its SLAs. If you are using chef (or puppet for that matter) as a part of the bootstrap process, a failure on the master could precipitate a larger system failure. There are of course ways to mitigate this risk, but let's move forward. Even though both of these systems are fantastic at configuring what goes inside a virtual box, neither of them (to my imperfect knowledge) is particularly good at making sure that external systems are configured.
Enter CloudFormation. CloudFormation gives you an entry point into the AWS ecosystem where you can orchestrate the provisioning of a suite of AWS resources. AWS is API driven, and just about all of the API is accessible to you through CloudFormation templates - including autoscaling groups, which are not supported on the web console. There is something magic about being able to set up a data store, a dns record, a load balanced set of servers that will automatically scale with load, and other resources from a single call to AWS. I cannot say enough good things about the service, and I am sure that they have all been said before, so I will stop trying to sell it. I have had a lot of success with it so far, and I plan to use it again in future deployments.
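To give a feel for what "a single call" covers, here is a rough sketch of such a template, built as a ruby hash and printed as JSON. The resource types and property names are CloudFormation's own, but the logical names (WebLoadBalancer, WebServerGroup and so on), the AMI id, and all of the sizes and thresholds are placeholders made up for illustration. The small ref_targets helper is also just an illustration: it checks locally that every Ref points at a resource that exists in this template (real templates can also Ref parameters, which this sketch ignores).

```ruby
require 'json'

# All logical names, the AMI id, and the numbers below are placeholders.
resources = {
  'WebLoadBalancer' => {
    'Type' => 'AWS::ElasticLoadBalancing::LoadBalancer',
    'Properties' => {
      'AvailabilityZones' => { 'Fn::GetAZs' => '' },
      'Listeners' => [
        { 'LoadBalancerPort' => '80', 'InstancePort' => '80', 'Protocol' => 'HTTP' }
      ]
    }
  },
  'WebLaunchConfig' => {
    'Type' => 'AWS::AutoScaling::LaunchConfiguration',
    'Properties' => { 'ImageId' => 'ami-00000000', 'InstanceType' => 'm1.small' }
  },
  'WebServerGroup' => {
    'Type' => 'AWS::AutoScaling::AutoScalingGroup',
    'Properties' => {
      'AvailabilityZones' => { 'Fn::GetAZs' => '' },
      'LaunchConfigurationName' => { 'Ref' => 'WebLaunchConfig' },
      'LoadBalancerNames' => [{ 'Ref' => 'WebLoadBalancer' }],
      'MinSize' => '2',
      'MaxSize' => '10'
    }
  },
  'ScaleUpPolicy' => {
    'Type' => 'AWS::AutoScaling::ScalingPolicy',
    'Properties' => {
      'AutoScalingGroupName' => { 'Ref' => 'WebServerGroup' },
      'AdjustmentType' => 'ChangeInCapacity',
      'ScalingAdjustment' => '1',
      'Cooldown' => '300'
    }
  },
  'HighCpuAlarm' => {
    'Type' => 'AWS::CloudWatch::Alarm',
    'Properties' => {
      'Namespace' => 'AWS/EC2',
      'MetricName' => 'CPUUtilization',
      'Statistic' => 'Average',
      'Period' => '300',
      'EvaluationPeriods' => '2',
      'Threshold' => '75',
      'ComparisonOperator' => 'GreaterThanThreshold',
      'Dimensions' => [
        { 'Name' => 'AutoScalingGroupName', 'Value' => { 'Ref' => 'WebServerGroup' } }
      ],
      'AlarmActions' => [{ 'Ref' => 'ScaleUpPolicy' }]
    }
  }
}

# Collect the target of every {"Ref": "..."} found anywhere in a nested structure.
def ref_targets(node)
  case node
  when Hash
    own = node.key?('Ref') ? [node['Ref']] : []
    own + node.values.flat_map { |v| ref_targets(v) }
  when Array
    node.flat_map { |v| ref_targets(v) }
  else
    []
  end
end

# Refs that do not name a resource defined above.
dangling = ref_targets(resources).uniq - resources.keys
warn "dangling references: #{dangling.join(', ')}" unless dangling.empty?

template = { 'AWSTemplateFormatVersion' => '2010-09-09', 'Resources' => resources }
puts JSON.pretty_generate(template)
```

A scale-down policy and a matching low-traffic alarm would follow the same pattern with a negative ScalingAdjustment.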
One of the ways that CloudFormation wins over my previous approach of using the chef master as the registry for clustered systems is that you can embed configuration information into the template that builds the cluster. The machines in the cluster are able to get that data out at configuration time (this is what the AWS bootstrapping tool cfn-init does when it runs), or you can set up a process on the machine that polls for changes to it (this is what cfn-hup does, I believe). These are roughly analogous to an initial chef run and a periodic update call to chef-client. In actuality, by putting the cluster configuration information into a CloudFormation template, I have transferred the potential failure from a chef master to some nebulous parts of the Amazon API, but I am pretty comfortable with the redundancy that I get from multiple availability zones, so I think that this is still probably a win. Of course, there is no reason why you can't engineer a solution that uses both, so this is not by any means an argument to use CloudFormation to replace chef functionality.
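As a minimal sketch of what "embedding configuration" looks like, the ruby below builds a template with one EC2 instance whose Metadata section carries an AWS::CloudFormation::Init block, which is the key that cfn-init reads. The namenode host name, the file path, the logical resource name, and the AMI id are all invented for the example; a real template would more likely fill them in from parameters or references to other resources.

```ruby
require 'json'

# Hypothetical cluster setting, hard-coded here to keep the sketch short.
namenode_host = 'namenode.example.internal'

# An instance whose Metadata carries the configuration that cfn-init
# applies on the machine: here, a single file to be written to disk.
datanode = {
  'Type' => 'AWS::EC2::Instance',
  'Metadata' => {
    'AWS::CloudFormation::Init' => {
      'config' => {
        'files' => {
          '/etc/cluster/namenode' => {          # made-up path
            'content' => "#{namenode_host}\n",
            'mode'    => '000644',
            'owner'   => 'root',
            'group'   => 'root'
          }
        }
      }
    }
  },
  'Properties' => {
    'ImageId'      => 'ami-00000000',           # placeholder AMI id
    'InstanceType' => 'm1.small'
  }
}

template = {
  'AWSTemplateFormatVersion' => '2010-09-09',
  'Resources' => { 'DataNode' => datanode }
}

puts JSON.pretty_generate(template)
```

The instance still has to run cfn-init at boot (typically from its user data) for the metadata to have any effect; that part is left out here.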
However, I have to say that I have a couple of complaints about CloudFormation. The first one is that debug cycles for CloudFormation stacks are long. It can take several minutes for resources to get created. In some cases, you get warned early that there are problems with your template, but if there are problems getting all of the resources that you install on EC2 instances working, you may have to wait 10 minutes to find out that something is wrong, and then have to tear the whole thing down and start again (just a single API call, but a long wait). I suppose that this is the nature of the beast, but it gets infuriating at times. (I remind myself at these times that if I were waiting for an IT staff to set up my cluster of 50 servers, I would be waiting for a week, not for half an hour, and I would have to pay them even when the machines were not running...)
My real complaint is that the language used for the templates is ghastly. From Amazon's architectural point of view, it makes a lot of sense to represent this language as a giant JSON object - everything else about their API is in JSON and there is no good reason to make an exception for this. However, from my point of view as a human (well, half, anyway), JSON is a terrible language. I find the syntax for function calls confusing. I get lost in the long lists of things. In an especially large template file I find that I have trouble figuring out where one resource ends and the next one begins. You can do a certain amount of organization with white space: you can indent objects - that helps some. If you could include comments, I suspect that many of my complaints would evaporate entirely, as I could then add the kind of commenting structure that I am used to in regular programming languages to help me navigate and to explain particularly tricky parts of the code, etc.
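To show what the function call syntax looks like, here is a small sketch that builds "http://" plus a load balancer's DNS name plus ":" plus a port, using CloudFormation's Ref, Fn::GetAtt and Fn::Join. The three ruby helpers are just conveniences invented for this example, and WebLoadBalancer and WebPort are made-up logical names. The point is the contrast between the one-line ruby expression and the nested JSON that it prints.

```ruby
require 'json'

# Helpers (invented for this sketch) that build CloudFormation's
# intrinsic-function objects as plain ruby hashes.
def ref(name)
  { 'Ref' => name }
end

def fn_get_att(resource, attribute)
  { 'Fn::GetAtt' => [resource, attribute] }
end

# Fn::Join takes a two-element array: the delimiter, then the list of parts.
def fn_join(delimiter, *parts)
  { 'Fn::Join' => [delimiter, parts] }
end

# "http://" + the load balancer's DNS name + ":" + the WebPort parameter
url = fn_join('', 'http://', fn_get_att('WebLoadBalancer', 'DNSName'), ':', ref('WebPort'))

# Print the JSON that would have to be typed by hand in a template.
puts JSON.pretty_generate(url)
```

Even for a four-part string, the JSON form is an object wrapping an array wrapping another array with two more objects inside it, and there is nowhere to put a comment explaining what it builds.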
As I was plowing through a particularly complicated CloudFormation template the other day, it struck me that there was a real lack of tools to help you build CloudFormation templates. There are a few - Amazon has a tool called CloudFormer. Basically, you fire up an EC2 instance with CloudFormer on it, and it helps you piece together a template from your existing resources. This is somewhat helpful, but it means that you have to set up the thing that you want first. If you need to modify the result, you have to wade into the JSON yourself. There were a couple of other things out there too: AWS gives you command line tools for manipulating your stacks, fog has some support for doing the same, etc. I didn't really see anything that substantially reduced the drudgery of creating and maintaining the awful JSON that comprises the template itself.
Just because the native version of a language is hard to use does not mean you should give up the tool. I hardly ever code in assembly these days - or even C, for that matter - we have computer programs to write our computer programs in those languages, and most people try to write in something that comes easier to them. Remember how, a couple of years ago, everyone was talking about building ruby domain specific languages (DSLs) for everything? Chef and Puppet both have a DSL wrapped around them, and they are quite useful. As I thought about this, I thought that surely someone had already put the two ideas together (that is, a ruby DSL and CloudFormation), but after a quick web search I found that I was wrong. Nobody had.

I wrote one this week.
It is still pretty raw, but feel free to take a look at it on github (https://github.com/howech/cfndsl) and rubygems.org (http://rubygems.org/gems/cfndsl). It probably requires ruby 1.9, and it probably still has some horrible bugs in it.
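For a flavor of the general idea only, here is a toy sketch of how a ruby DSL can sit on top of the JSON. This is NOT cfndsl's actual syntax or implementation. The ToyTemplate class and its resource and ref methods are invented for this illustration, as are the logical names and the AMI id. It relies on nothing more than instance_eval and ruby hashes.

```ruby
require 'json'

# Toy template builder. Not cfndsl's API; every name here is made up.
class ToyTemplate
  def initialize(&block)
    @resources = {}
    # Run the block with this object as self, so that bare calls to
    # `resource` and `ref` inside the block land on this instance.
    instance_eval(&block) if block
  end

  # Declare a resource by logical name and CloudFormation type.
  def resource(name, type, properties = {})
    @resources[name] = { 'Type' => type, 'Properties' => properties }
  end

  # Shorthand for the {"Ref": "..."} intrinsic function.
  def ref(name)
    { 'Ref' => name }
  end

  # The template as a plain hash, ready to be serialized.
  def to_h
    {
      'AWSTemplateFormatVersion' => '2010-09-09',
      'Resources' => @resources
    }
  end
end

template = ToyTemplate.new do
  # Comments are allowed here, unlike in the JSON itself.
  resource 'WebSecurityGroup', 'AWS::EC2::SecurityGroup',
           'GroupDescription' => 'Allow HTTP'

  resource 'WebServer', 'AWS::EC2::Instance',
           'ImageId'        => 'ami-00000000', # placeholder AMI id
           'InstanceType'   => 'm1.small',
           'SecurityGroups' => [ref('WebSecurityGroup')]
end

puts JSON.pretty_generate(template.to_h)
```

Because the template is now ordinary ruby, you also get variables, loops, and methods for free, which is most of what makes the approach attractive.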

Next posting (soon) will talk a little bit about cfndsl.
