As you know, we run our servers on Amazon's AWS.
In the past, when we needed a new server, we bundled all of our packages, applications and magic into an Amazon AMI. Add some home-cooked deploy scripts that read from an SVN server and a GlusterFS server holding common config files, and that was our server setup and config management system.
This system quickly started to fall apart whenever we changed our technology or upgraded servers. Recently two things killed us: upgrading JBoss and switching monitoring systems. In both cases we had to rebuild our AMIs, and to do it cleanly we would have had to relaunch the whole production network from scratch. Ugggh.
So we wanted a strategy built on really raw AMIs, basically just the OS. Then we'd use "something" to install all of our packages and do everything else needed to turn a raw server into something useful.
Chef caught my eye a while back, so I decided to try it out. Well, three days later, Chef has delivered everything I hoped for… and even a bit more. We can now launch an EC2 instance from an image that contains only Ubuntu and Chef. Chef then installs everything we need:
-JBoss
-Java
-GlusterFS
-users, groups, SSH keys
-SNMP and all our crazy SNMP scripts for monitoring
-and so on…
It's pretty amazing how powerful it is now that it's running. I can launch a new node in a few minutes, and I can easily upgrade it along the way (as I learned by making many mistakes on the first server I launched!)
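What makes "upgrade it along the way" safe is that a configuration run can be repeated: each resource checks the node's current state and changes something only when it differs from what was declared. Chef recipes are written in a Ruby DSL, and the snippet below is not Chef's API. It is a minimal Python sketch of that idea, assuming a toy in-memory `Node` in place of a real server; every name in it (`Node`, `ensure_package`, `ensure_user`, `ensure_file`, `converge`, the package names, the user and the file contents) is hypothetical.

```python
# Conceptual sketch only: this is NOT Chef's API (Chef recipes are a Ruby DSL).
# A toy in-memory Node stands in for a real server; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Node:
    packages: set = field(default_factory=set)   # installed package names
    users: dict = field(default_factory=dict)    # user name -> set of groups
    files: dict = field(default_factory=dict)    # file path -> file content


def ensure_package(node, name, log):
    # Install only if the package is missing.
    if name not in node.packages:
        node.packages.add(name)
        log.append(f"installed {name}")


def ensure_user(node, name, groups, log):
    # Create or fix the user only if its group membership differs.
    wanted = set(groups)
    if node.users.get(name) != wanted:
        node.users[name] = wanted
        log.append(f"configured user {name}")


def ensure_file(node, path, content, log):
    # Rewrite the file only if its content differs from the declared content.
    if node.files.get(path) != content:
        node.files[path] = content
        log.append(f"wrote {path}")


def converge(node):
    # Declare the desired state; return the list of actions actually taken.
    log = []
    ensure_package(node, "java", log)
    ensure_package(node, "jboss", log)
    ensure_package(node, "glusterfs-client", log)
    ensure_user(node, "deploy", ["jboss", "admin"], log)
    ensure_file(node, "/etc/snmp/snmpd.conf", "# placeholder config\n", log)
    return log


if __name__ == "__main__":
    node = Node()
    print(converge(node))  # first run on a raw node: five actions
    print(converge(node))  # second run: nothing to do
```

On a raw node the first `converge` performs all five actions and the second performs none. Changing one declared value (say, the config file content) and converging again would touch only that one resource, which is the property that lets you keep re-running your configuration against live servers.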
My one annoyance was that our tech stack differs a bit from what the shared cookbooks from Opscode, 37signals and others cover. We run JBoss, for instance, and there was nothing publicly available for it. I'd love to see how folks handle cluster.xml config, log4j config and so on in their cookbooks. I'll put mine up on GitHub so y'all can see what we've done. Also, all the monitoring cookbooks were for Nagios or Ganglia. We use Zenoss with SNMP agents, and again, there wasn't much available to help us.
It took me a good three days straight to set this up. Part of that was learning Chef as a tool, but a big part was having forgotten how we originally set up certain things. I lost a ton of time re-figuring out our GlusterFS config, which we hadn't touched in months, so I could bring it under Chef.
If any of you folks are running lean and agile production networks, I definitely suggest looking at this tool.