For those of you following along, we just had our best month of service to date!
Service was at 99.998%, which works out to roughly 12 minutes of lost service per customer (our goal has been 99.9% availability, which corresponds to about 1 hour per customer). We would have had our first 100% month if it weren’t for one little slip-up on my part. We needed to add more memory to one of our servers, and I configured it in one place but not the other (bad config management on our part). So eventually that server maxed out as load grew and then did what most servers do when they run out of memory… it crashed.
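For anyone who wants to play with the arithmetic, converting between an availability percentage and minutes of downtime is simple. Here is a minimal sketch, assuming a 30-day month and a single, uniformly measured service. The function names are made up for illustration, and the per-customer figures above may be weighted differently, so don't expect an exact match with them.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime implied by an availability percentage over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1.0 - availability_pct / 100.0)


def availability_pct(downtime_min: float, period_days: float = 30.0) -> float:
    """Availability percentage implied by minutes of downtime over a period."""
    total_minutes = period_days * MINUTES_PER_DAY
    if not 0.0 <= downtime_min <= total_minutes:
        raise ValueError("downtime_min must fit inside the period")
    return 100.0 * (1.0 - downtime_min / total_minutes)


if __name__ == "__main__":
    # A 99.9% target over an assumed 30-day month allows roughly 43.2 minutes.
    print(downtime_minutes(99.9))
    # The same target over an assumed 365-day year allows roughly 525.6 minutes.
    print(downtime_minutes(99.9, period_days=365))
```

The period length is the assumption that matters most: the same percentage means very different downtime budgets per month versus per year, so it is worth stating which one a target refers to.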
As we’ve been growing, there has been a huge shift in focus toward network ops and areas like security, capacity management, infrastructure management, and so on. Operations is a far less discussed world in high tech. In development, there are several competing ideas on how to do each part of the job – coding, building, testing, etc. Agile, extreme, waterfall, RUP, DOORS – there is a ton of energy put into maturing software development. Annoyingly, there is far less discussion and idea generation in the world of operations – deployments, monitoring, system administration, auditing, capacity management, etc.
So, I’m going to try to engage the tech community in some spirited debate in this forgotten realm of the tech sphere over the next few months.

Maj | 06-Jul-09 at 10:58 am
Coding sprints are great because everyone knows that programmers are not humans, they’re machines, and they don’t mind coding for 48 hours straight because their manager has a piece of paper that says this sprint should be done and the next one started tomorrow.