Sunday, December 16, 2012

Introducing cfndsl

Last posting I ranted a little about what I like and don't like about [AWS CloudFormation](http://aws.amazon.com/cloudformation/). This time, I am going to do something about it.

AWS CloudFormation Templates

If you are using AWS for anything substantial and you are not using CloudFormation, you should think about it. It gives you a way to launch a whole bunch of AWS resources in a well-defined and repeatable fashion. In my mind, there is really only one drawback: the template language is awful.

What do I mean?

Here is a template I have been playing with this afternoon:

{
   "AWSTemplateFormatVersion" : "2010-09-09"
   "Parameters" : {
      "BucketName" : {
         "Type" : "String",
         "Default" : "MyBucket",
         "Description" : "Name of the bucket to grant read access to."
      },
      "Folder" : {
         "Type" : "String",
         "Default" : "myFolder",
         "MinLength" : 2,
         "Description" : "Name of a folder in the bucket to grant read access to."
      }
   },
   "Resources" : {
      "ReadBucketIProfile" : {
         "Type" : "AWS::IAM::InstanceProfile",
         "Properties" : {
            "Roles" : [{"Ref" : "ReadBucketRole"}],
            "Path" : "/"
         }
      },
      "ReadBucketRole" : {
         "Type" : "AWS::IAM::Role",
         "Properties" : {
            "AssumeRolePolicyDocument" : {
               "Statement" : [
                  {
                     "Effect" : "Allow",
                     "Action" : ["sts:AssumeRole"                     ],
                     "Principal" : { "Service" : ["ec2.amazonaws.com"]
                     }
                  }
               ]
            },
            "Policies" : [
               {
                  "PolicyDocument" : {
                     "Statement" : [
                        {
                           "Effect" : "Allow",
                           "Resource" : {
                              "Fn::Join" : [
                                 "",
                                 [
                                    "arn:aws:s3:::",
                                    {
                                       "Ref" : "BucketName"
                                    },
                                    "/",
                                    {
                                       "Ref" : "Folder"
                                    },
                                    "/*"
                                 ]
                              ]
                           },
                           "Action" : [
                              "s3:GetObject",
                              "s3:GetObjectVersion"
                           ]
                        }
                     ]
                  },
                  "PolicyName" : "readBucket"
               }
            ],
            "Path" : "/"
         }
      }
   }
}

As you can see, the language is ghastly. This particular template sets up an IAM Role that allows the machines it is associated with to read the files stored in a particular folder of a particular S3 bucket. It has two input parameters, and it creates two resources. The input parameters let you tell it which bucket and folder you want the created role to be able to access. The resources are an IAM Role and an instance profile. The role contains a policy definition granting read access to the S3 bucket, while the profile object gathers roles together so they can be associated with EC2 instances. This probably is not a template that is going to be of much use to you, or anyone else, but when you run it, it does create resources in AWS (I think that they are all free!)

Anyway, here goes launching a stack with command line tools:

 
chris@frankentstein:~$ cfn-create-stack TestStack --template-file rr.template -p "BucketName=deathbydatadata;Folder=cfn" -n arn:aws:sns:us-east-1:99999999999:deathbydatadata
arn:aws:cloudformation:us-east-1:999999999999:stack/TestStack/e9dae100-47d8-11e2-b6f5-5081c366858d

One thing I have learned about setting up CFN stacks is that it is really useful to set up an SNS topic for them to talk to using the "-n arn:aws:sns:..." notation, especially when you are writing a new stack. CloudFormation will send very detailed notifications about what is going on, and sometimes this is about the only way to diagnose what went wrong when a stack folds on creation because of some failure to create a resource. You can also, of course, watch the progression of stack creation through the events tab on the CloudFormation web panel, or you can periodically ask for updates from the API:

chris@frankentstein:~$ cfn-describe-stack-events -s TestStack
STACK_EVENT  TestStack  TestStack           AWS::CloudFormation::Stack  2012-12-16T23:35:47Z  CREATE_COMPLETE     
STACK_EVENT  TestStack  ReadBucketIProfile  AWS::IAM::InstanceProfile   2012-12-16T23:35:46Z  CREATE_COMPLETE     
STACK_EVENT  TestStack  ReadBucketIProfile  AWS::IAM::InstanceProfile   2012-12-16T23:33:26Z  CREATE_IN_PROGRESS  
STACK_EVENT  TestStack  ReadBucketRole      AWS::IAM::Role              2012-12-16T23:33:26Z  CREATE_COMPLETE     
STACK_EVENT  TestStack  ReadBucketRole      AWS::IAM::Role              2012-12-16T23:33:13Z  CREATE_IN_PROGRESS  
STACK_EVENT  TestStack  TestStack           AWS::CloudFormation::Stack  2012-12-16T23:32:53Z  CREATE_IN_PROGRESS  User Initiated

Yay, it looks as if the stack worked (the CREATE_COMPLETE event for the AWS::CloudFormation::Stack at the top of the list is a good indicator of this). One of the nice things about parameterizing stacks is that you can often update the parameters without destroying everything (although sometimes updates can be pretty ugly in terms of rebuilding resources - read the docs about your resources carefully before you do this to a production-level system...). Here is an update, changing the Folder parameter to "dsl" instead of "cfn":

chris@frankentstein:~$ cfn-update-stack TestStack --template-file rr.template -p "BucketName=deathbydatadata;Folder=dsl"
arn:aws:cloudformation:us-east-1:999999999999:stack/TestStack/e9dae100-47d8-11e2-b6f5-5081c366858d
chris@frankentstein:~$ cfn-describe-stack-events -s TestStack
STACK_EVENT  TestStack  TestStack           AWS::CloudFormation::Stack  2012-12-16T23:56:36Z  UPDATE_COMPLETE                      
STACK_EVENT  TestStack  TestStack           AWS::CloudFormation::Stack  2012-12-16T23:56:32Z  UPDATE_COMPLETE_CLEANUP_IN_PROGRESS  
STACK_EVENT  TestStack  ReadBucketRole      AWS::IAM::Role              2012-12-16T23:56:28Z  UPDATE_COMPLETE                      
STACK_EVENT  TestStack  ReadBucketRole      AWS::IAM::Role              2012-12-16T23:56:11Z  UPDATE_IN_PROGRESS                   
STACK_EVENT  TestStack  TestStack           AWS::CloudFormation::Stack  2012-12-16T23:55:58Z  UPDATE_IN_PROGRESS                   User Initiated

Note that in this case, all CloudFormation needed to touch was the Role object, and it was able to update it in place. Nifty. If I had machines associated with this role, they would now be able to read the "dsl" folder, but not the "cfn" folder, of the deathbydatadata bucket. Outstanding!


A Snippet of CfnDsl

Now, go back to the template for a minute. The list of parameters is not too bad, but the resources section is ugly. Even though there are only two resources present, I get lost looking at the resource definitions. The real culprit here is the "readBucket" policy object, and it is especially bad because we are building up an ARN string out of its components. The template language does have a tool that is sufficient for this kind of work in the form of the built-in function Fn::Join. It works a lot like you would expect a join command to work if you have used JavaScript, Perl or Ruby - it builds a string by concatenating an array of strings, interspersed with a separator string. Here it is in detail:

"Fn::Join" : [ "",
                 [ "arn:aws:s3:::",
                   { "Ref" : "BucketName" },
                   "/",
                   { "Ref" : "Folder" },
                   "/*"
                 ]
               ]
}

You know, the "Ref"s don't help a whole lot either when you are trying to read this thing. So, what does this same thing look like in cfndsl?

FnFormat("arn:aws:s3:::%0/%1/*", Ref("BucketName"), Ref("Folder") )

Ref("BucketName") is ultimately going to turn into a "Ref" style JSON object. What's up with the FnFormat? It will ultimately resolve to a string with instances of %0 replaced with the value of the first parameter after the format string, %1 replaced by the second after the format string, etc. AWS doesn't have one of those! Of course, it doesnt need it as you can do the same thing with Fn::Join. If you use FnFormat, the ruby DSL will take care of figuring out how to write it into Fn::Join notation. So far, FnFormat is the only extra function that I have written for the DSL. The AWS builtin functions are all available by their Amazon names, with the "::" removed. FnJoin(...) produces {"Fn::Join":,,,}.


A whole Template in CfnDsl

Ok, so now that you have had a taste of it, here is the whole read bucket template written in cfndsl:

CloudFormation {
  AWSTemplateFormatVersion "2010-09-09"
  Parameter("BucketName") {
    Type :String
    Default "MyBucket"
    Description "Name of the bucket to grant read access to."
  }

  Parameter("Folder") {
    Type :String
    Default "myFolder"
    MinLength 2
    Description "Name of a folder in the bucket to grant read access to."
  }
  
  Resource("ReadBucketRole") {
    Type "AWS::IAM::Role"
    Property( "AssumeRolePolicyDocument", {
                "Statement" => [ {
                                   "Effect" => "Allow",
                                   "Principal"=> 
                                   {
                                     "Service" => [ "ec2.amazonaws.com" ]
                                   },
                                   "Action" => [ "sts:AssumeRole" ]
                                 } ]
              })
    Property("Path", "/")
    Property("Policies", 
             [ 
              { "PolicyName"=> "readBucket",
                "PolicyDocument"=> 
                {
                  "Statement" => 
                  [ 
                   {
                     "Effect" => "Allow",
                     "Action" => ["s3:GetObject","s3:GetObjectVersion"],
                     "Resource" => FnFormat("arn:aws:s3:::%0/%1/*", 
                                            Ref("BucketName"),
                                            Ref("Folder") )
                   }
                  ]
                }
              }
             ]
             )
  }
  
  Resource( "ReadBucketIProfile") {
    Type "AWS::IAM::InstanceProfile"
    Property( "Path", "/")
    Property( "Roles", [ Ref("ReadBucketRole") ] )
  }
}

Easier to read? I think so, but it is probably a matter of opinion. I like that the resources are declared individually, rather than as one long list. Resource properties are usually pretty simple, so I declare them here by just giving the value as the second parameter to Property - you could use the block form just as easily.

I have defined special objects for handling most of the top-level stuff in a template - the template itself, Parameters, Resources, Mappings, Outputs, Metadata, and Resource Properties. I also have in place a means for dealing with function calls, discussed above. There is, of course, a whole lot more to a template than these things, as many of the resources have complicated, dedicated data types used to specify their inner workings. While I eventually plan to capture more of these in the same object notation, it is not always convenient to do so. When a particular type structure has not been explicitly implemented in the DSL, template authors can always fall back on creating Ruby hashes and arrays that parallel the JSON notation for the structure they are building, and the DSL will handle them appropriately (the AssumeRolePolicyDocument in the template above is handled exactly this way).


Running CfnDsl

How do you turn this Ruby thing into something that AWS understands? Ah - simplicity itself (assuming that you have Ruby 1.9). First, you need to get yourself set up with the cfndsl Ruby gem:

chris@frankentstein:~$ sudo gem install cfndsl
Fetching: cfndsl-0.0.4.gem (100%)
Successfully installed cfndsl-0.0.4
1 gem installed
Installing ri documentation for cfndsl-0.0.4...
Installing RDoc documentation for cfndsl-0.0.4...

Then you just run cfndsl on the Ruby file:

chris@frankentstein:~$ cfndsl rr.rb 
{"AWSTemplateFormatVersion":"2010-09-09","Parameters":{"BucketName":{"Type":"String","Default":"MyBucket","Description":"Name of the bucket to grant read access to."},"Folder":{"Type":"String","Default":"myFolder","Description":"Name of a folder in the bucket to grant read access to.","MinLength":2}},"Resources":{"ReadBucketRole":{"Type":"AWS::IAM::Role","Properties":{"AssumeRolePolicyDocument":{"Statement":[{"Effect":"Allow","Principal":{"Service":["ec2.amazonaws.com"]},"Action":["sts:AssumeRole"]}]},"Path":"/","Policies":[{"PolicyName":"readBucket","PolicyDocument":{"Statement":[{"Effect":"Allow","Action":["s3:GetObject","s3:GetObjectVersion"],"Resource":{"Fn::Join":["",["arn:aws:s3:::",{"Ref":"BucketName"},"/",{"Ref":"Folder"},"/*"]]}}]}}]}},"ReadBucketIProfile":{"Type":"AWS::IAM::InstanceProfile","Properties":{"Path":"/","Roles":[{"Ref":"ReadBucketRole"}]}}}}

There it is, ready to build resources with! How do you build a stack with it? Well, so far I have just been redirecting the output of cfndsl into a file and then running cfn-create-stack on the result. There may be better ways to hook these tools together.
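
For example, something along these lines does the trick (using the same stack name and parameters as before):

chris@frankentstein:~$ cfndsl rr.rb > rr.template
chris@frankentstein:~$ cfn-create-stack TestStack --template-file rr.template -p "BucketName=deathbydatadata;Folder=cfn"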

It could be that these few improvements alone are enough to justify having a Ruby DSL behind what is effectively a JSON DSL. As I said before, I believe that the DSL representation of my template is a little nicer. However, Ruby allows some other things that we have not explored yet. Not the least of these is comments - sometimes a small (or a large) amount of commenting is what keeps a piece of code maintainable, and the JSON templates give you nowhere at all to put comments.
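
For instance, nothing stops you from annotating the DSL version of the Folder parameter with ordinary Ruby comments (the comment text here is just an illustration):

# Read access is deliberately scoped to a single folder in the bucket -
# widen it only if you really mean to.
Parameter("Folder") {
  Type :String
  Default "myFolder"
  MinLength 2
  Description "Name of a folder in the bucket to grant read access to."
}

Of course, Ruby lets you do much more than that, but I will save it for next time.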

Saturday, December 15, 2012

Amazon CloudFormation (DSL anyone?)


It has been quite a while since I have posted anything to a blog, but I thought I would get started writing again.

I wanted to talk a little about Amazon CloudFormation. (And plug something that I wrote recently.) I have worked with it for several months now, and I have to say that overall I am quite impressed.
In the past, I worked on a product that used fog and chef and some other ruby scripting glue to deploy resources in AWS. Overall, the system worked quite well, and I was able to deploy some fairly complicated systems with very little effort, but there were a few drawbacks.
One of the most significant things about this system (that's right, I'm not naming it...) was that it effectively used the database on the chef master as a kind of registry of available cluster resources. In one way, this was a very powerful move, in that it allowed clustered resources to be aware of their surroundings in a way that was previously difficult to do. For instance, if I was setting up a Hadoop cluster in this manner, I could tell the machine that was running the namenode that it was part of a hadoop cluster, and it could use an api call to the chef master to figure out where all of the datanodes were, and vice versa, allowing both types of machines to set up the correct configuration. So what is not to love? Well, the chef master became something of a single point of failure for the hadoop cluster. In practice this is not all that much of a problem for a hadoop cluster, as the chef master only has to be around when significant configuration changes occur - like adding more datanodes to the cluster. The system in question was not using autoscaling groups, so configuration changes pretty much only happened during manual intervention, and the dependency on the chef master was not really much of a burden.
Life is different when you are putting together an autoscaling group of webservers. The idea is that you tell AWS how to bootstrap a machine to be a web server. You also tell it that machines in such a group need to have http requests coming to a particular address load balanced across the set. Then you tell the autoscaling group that when the load on your website exceeds a certain threshold, it should spin up some more servers, and if the traffic falls below some other level, it is ok to kill off a few. In this framework, automatic configuration of cloud resources becomes a critical operation - if you can't manage to get new servers up and running to meet rising demand, your web service could potentially fail to meet its SLAs. If you are using chef (or puppet, for that matter) as part of the bootstrap process, a failure on the master could precipitate a larger system failure. There are of course ways to mitigate this risk, but let's move forward - even though both of these systems are fantastic at configuring what goes inside a virtual box, neither of them (to my imperfect knowledge) is particularly good at making sure that external systems are configured.
Enter CloudFormation. CloudFormation gives you an entry point into the AWS ecology where you can orchestrate the provisioning of a suite of AWS resources. AWS is api driven, and just about all of the api is accessible to you through CloudFormation templates - including autoscaling groups, which are not supported on the web console. There is something magic about being able to set up a data store, a dns record, a load balanced set of servers that will automatically scale with load, and other resources from a single call to AWS. I can not say enough good things about the service, and I am sure that they have all been said before, so I will stop trying to sell it. I have had a lot of success with it so far, and I plan to use it again in future deployments.
One of the ways that CloudFormation wins over my previous approach of using the chef master as the registry for clustered systems is that you can embed configuration information into the script that builds the cluster, and the machines in the cluster are able to get that data out at configuration time (this is what the AWS bootstrapping tool cfn-init does when it runs), or you can set up a process on the machine that polls for changes to it (this is what cfn-hup does, I believe). These are roughly analogous to an initial chef run and a periodic update call to chef-client. In actuality, by putting the cluster configuration information into a CloudFormation template, I have transferred the potential failure from a chef master to some nebulous parts of the Amazon API, but I am pretty comfortable with the redundancy that I get from multiple availability zones, so I think that this is still probably a win. Of course, there is no reason why you can't engineer a solution that uses both, so this is not by any means an argument to use CloudFormation to replace chef functionality.
However, I have to say that I have a couple of complaints about CloudFormation. The first one is that debug cycles for CloudFormation stacks are long. It can take several minutes for resources to get created. In some cases, you get warned early that there are problems with your template, but if there are problems getting all of the resources that you install on EC2 instances working, you may have to wait 10 minutes to find out that something is wrong, and then have to tear the whole thing down and start again (just a single API call, but a long wait). I suppose that this is the nature of the beast, but it gets infuriating at times. (I remind myself at these times that if I were waiting for an IT staff to set up my cluster of 50 servers, I would be waiting for a week, not for half an hour, and I would have to pay them even when the machines were not running...)
My real complaint is that the language used for the templates is ghastly. From Amazon's architectural point of view, it makes a lot of sense to represent this language as a giant JSON object - everything else about their API is in JSON and there is no good reason to make an exception for this. However, from my point of view as a human (well, half, anyway), JSON is a terrible language. I find the syntax for function calls confusing. I get lost in the long lists of things. In an especially large template file I find that I have trouble figuring out where one resource ends and the next one begins. You can do a certain amount of organization with white space: you can indent objects - that helps some. If you could include comments, I suspect that many of my complaints would evaporate entirely, as I could then add the kind of commenting structure that I am used to in regular programming languages to help me navigate and to explain particularly tricky parts of the code, etc.
As I was plowing through a particularly complicated CloudFormation template the other day, it struck me that there is a real lack of tools to help you build CloudFormation templates. There are a few - Amazon has a tool called CloudFormer. Basically, you fire up an EC2 instance with CloudFormer on it, and it helps you piece together a template from your existing resources. This is somewhat helpful, but it means that you have to set up the thing that you want first, and if you need to modify the result, you have to wade into the JSON yourself. There are a couple of other things out there too: AWS gives you command line tools for manipulating your stacks, Fog has some support for doing the same, and so on. I didn't really see anything that substantially changed the drudgery of creating and maintaining the awful JSON that comprises the template itself.
Just because the native version of a language is hard to use does not mean you should give up on the tool. I hardly ever code in assembly these days - or even C, for that matter - we have computer programs to write our computer programs in those languages, and most people try to write in something that comes more easily to them. Remember a couple of years ago, when everyone was talking about building Ruby domain specific languages (DSLs) for everything? Chef and Puppet both have a DSL hiding inside them, and they are quite useful. As I thought about this, I figured that surely someone had already put the two ideas together (that is, a Ruby DSL and CloudFormation), but after a quick web search I found that I was wrong. Nobody had.

I wrote one this week.
It is still pretty raw, but feel free to take a look at it on github (https://github.com/howech/cfndsl) and rubygems.org (http://rubygems.org/gems/cfndsl). It probably requires ruby 1.9, and it probably still has some horrible bugs in it.

Next posting (soon) will talk a little bit about cfndsl.