Monthly Archives: December 2011

Anatomy of a Change Request – The Basics

In most IT environments you’ll find some kind of Change Request (CR) form.  Some of them are simple forms for simple workflows and some of them…well, aren’t.  What does a typical CR look like?  If you’re creating a Change Management (CM) process for your organization (and you should have one!), what should your CR look like?

In this post I’ll talk about the very basic information that should be in every CR. In a subsequent post I’ll go through some of the optional information that more heavyweight CM processes may use.

A minimal CR

Any CR should have at least the following information:

  • Title
  • Requestor
  • Executor
  • Execution Time
  • Purpose
  • Procedure (including execution, validation, and rollback)
  • Results

Let’s go through these one by one:

Title
This is a short (less than one line) summary of the CR, used mainly for displaying CRs in lists.
Requestor
Who asked for this change? This is important to have in case there are any questions about what should be done or decisions that need to be made about different options that can be chosen. If you don’t know who requested it, you can’t get answers to those questions.
Executor
Who is actually doing the change? This is important to know for later troubleshooting purposes – if something goes wrong you’ll want to consult the person who made the change as they will have the best knowledge of what happened and if anything strange occurred.
Execution time
For troubleshooting it is critical to know exactly when changes took place, so you can correlate with service impacts or other important events. (Your CM process may record execution time as part of the change workflow itself, in which case it’s not critical to have it actually in the CR – but it needs to be somewhere).
Purpose
Why is this change being made? What is the business value of doing this? This is the field I see missing most often. Everyone involved in the CM process should understand the reason why changes are being made – and those reasons should be tied to the needs of the business. This understanding allows everyone to make informed decisions at every stage about priorities, strategies, tactics, etc. Without this understanding, the people making the changes are disconnected from the business and become disengaged and jaded, eventually leading to poor decisions.
Procedure (execution, validation, rollback)
What are you going to do? What order are you going to do it in? How are you going to make sure it worked, and didn’t break anything else? What are you going to do if something goes wrong? There are many different viewpoints on what level of detail and rigor this procedure needs to have – there is no one right answer but I always think of every CR as a candidate for future automation, and the more detailed, specific, and complete the procedural section of the CR is, the easier it will be to automate in the future.
Results
What happened when the change was executed? Typically this part of the CR will contain pasted output from execution or validation commands, or screenshots showing the effective change, etc. If there are any problems later this prevents wasted time while people ask “did you do _____” or “what does ______ command show?” A tiny amount of work to cut’n’paste some info here can save a huge amount of heartache later.
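
If you ever want to move the CR out of a document and into something a script can check, the fields above map naturally onto a small record. The following is only a rough sketch in Python: the class names, field names, and the `missing_before_execution` helper are all made up for illustration and are not part of any particular ticketing system or CM tool.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class Procedure:
    """The three parts of the procedure section (hypothetical layout)."""
    execution: str = ""
    validation: str = ""
    rollback: str = ""


@dataclass
class ChangeRequest:
    """A minimal CR holding the seven fields discussed above."""
    title: str = ""
    requestor: str = ""
    executor: str = ""
    purpose: str = ""
    procedure: Procedure = field(default_factory=Procedure)
    execution_time: Optional[datetime] = None  # recorded when the change runs
    results: str = ""                          # pasted output, filled in afterwards


def missing_before_execution(cr: ChangeRequest) -> List[str]:
    """Return the names of fields that should be filled in before executing.

    Execution time and results are skipped because they only exist
    once the change has actually been made.
    """
    missing = [
        name
        for name in ("title", "requestor", "executor", "purpose")
        if not getattr(cr, name).strip()
    ]
    missing += [
        f"procedure.{f.name}"
        for f in fields(cr.procedure)
        if not getattr(cr.procedure, f.name).strip()
    ]
    return missing


# Example: a CR with no purpose and no rollback plan (the usual suspects).
cr = ChangeRequest(
    title="Rotate TLS certificate on the load balancer",
    requestor="alice",
    executor="bob",
    procedure=Procedure(execution="install new cert", validation="check expiry date"),
)
print(missing_before_execution(cr))
```

A check like this is not a substitute for someone actually reading the CR, but it does catch the empty Purpose field before the change goes out.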

This may seem like a lot of information for a simple CR, but in practice it doesn’t take very long to fill these out for simple changes. And for complicated changes, you shouldn’t be worried about the extra overhead of typing – if you’re not thinking through and planning your complicated changes, you’re taking big risks with your business.

Where does a CR form live?

When your CM process gets started, CR forms will often be simple documents – they could be in GDocs (this is how we do it at my company today), they could be in a wiki, or they could live directly in the ticketing system that manages your CM workflow (if you have one). What’s important is that the CRs be easy to fill out and easy to find later.

How do I start using a CR form?

Once you’ve created your CR form, the next step is simple. Just start using it for your changes! Ideally the person in charge of your infrastructure already understands the value of CM, and will be eager to have everyone start using the CR. If that’s not the case, use the CR form yourself, and ask others to use it. Even if no one else does, at some point there will be an incident that will make the value of using CRs obvious to everyone – and when that happens you’ll be ready.

Why Change Management?

Recently I had the opportunity to create a template for infrastructure change requests at work. Based on the reaction from some of my co-workers, I thought it might be valuable to explain what change requests are for. In a subsequent post I’ll go through what a basic change request looks like.

Change Requests are part of the Change Management (CM) process. Now don’t get freaked out, that doesn’t mean we need forms filled out in triplicate sent through multiple people for review and approval. Processes can have as much or as little heft as required to meet the needs of your organization. But if your infrastructure’s availability is important to you, you should have a CM process. We are a small startup, so our CM process is very lightweight. Here are the main tenets:

  1. Think about a change before you start executing it
  2. If something is high-risk, test it before you do it for real
  3. Know how you’re going to handle it if something goes horribly wrong
  4. Record that you made the change so people can find it later if they need to (for example, when troubleshooting a problem)

Point 1 (think before you execute) is really philosophical. After many years of doing production web operations, I’m convinced based on the empirical evidence that you’re far more likely to screw something up if you cowboy your way through a change than if you plan it ahead of time. You see this point of view in other contexts as well (“plan your flight, fly your plan”). Many times while planning a change, I have thought of something that I would otherwise have first encountered during execution – something that in the heat of the moment would have caused me great panic. Better to hit that and work through it when you’re not stressed out in the middle of a big production change. For me one of the most important parts of having a written Change Request is that it forces you to think through a change before you execute it.

Point 2 (test high-risk changes) may sound obvious but there are certainly nuances. How do you determine what’s high-risk and where do you draw the line? How much time do you spend doing testing vs simply rolling back a change if it does cause problems? I’ve found that it’s best to leave these decisions in the hands of the people executing the changes – but your CM process needs to remind them to ask these questions, think about the answers, and use their best judgment.

Point 3 (how to handle problems) is not theoretical. If your job is web operations, you will be involved with a change that goes horribly wrong. It just happens. When it happens, if you have not thought about it ahead of time you will be up a smelly brown creek without a paddle. This is when panic sets in, and in the heat of those moments some spectacularly bad decisions can be made which could make the situation even worse. Spending some time prior to execution thinking through potential failure scenarios allows you to execute your rollback plan calmly and effectively. Which way do you prefer?
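
One way to picture “know how you’re going to handle it” is to write the change so that the rollback is decided before the first step runs. Here is a minimal sketch in Python, assuming each step, the validation check, and the rollback can be expressed as plain functions; `run_change` and everything passed to it are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence


def run_change(
    steps: Sequence[Callable[[], None]],
    validate: Callable[[], bool],
    rollback: Callable[[], None],
    log: List[str],
) -> bool:
    """Run the steps in order, validate, and roll back if anything fails.

    Returns True if the change was applied and validated, False if it
    was rolled back. Notes about what happened are appended to `log`.
    """
    try:
        for step in steps:
            step()
        if validate():
            log.append("validated")
            return True
        log.append("validation failed")
    except Exception as exc:  # a step blew up part-way through
        log.append(f"step failed: {exc}")
    # The rollback was planned ahead of time, so there is nothing to decide here.
    rollback()
    log.append("rolled back")
    return False


# Example: a pretend config change whose validation fails.
state = {"version": 1}
notes: List[str] = []
ok = run_change(
    steps=[lambda: state.update(version=2)],
    validate=lambda: False,  # pretend the health check failed
    rollback=lambda: state.update(version=1),
    log=notes,
)
print(ok, state, notes)
```

The point is not the code itself but the shape: the rollback is an argument you have to supply up front, not something you improvise once things have gone wrong.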

Point 4 (change recording) is absolutely critical unless you a) never forget anything and b) are the only person involved in the support of your infrastructure. In my experience, the majority of thorny production problems are caused by changes, usually when they introduce latent faults that don’t manifest as incidents for a while. When diagnosing such a problem, it is critical that you know what changed when, and that is precisely the purpose of change recording. There are a million ways to do this, from sending emails to a “changelog” alias or putting change summaries in IRC to having a CMDB with change records in it. Less important than the specific mechanism(*) is that you have a mechanism, that people use it religiously, that it’s easy to search for changes at particular times and to particular systems, and that everyone knows where to find it and how to use it. What seems like busywork when you’re performing a change (“Why do I have to write this down? It’s already done!”) will pay giant dividends when it prevents someone from spending tons of time reverse engineering what happened while the service is down.
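
To make the “easy to search” part concrete, here is a small sketch in Python of the question you end up asking during an incident: what changed on this system in the hours before things broke? The `ChangeRecord` layout and the `changes_before` function are assumptions for illustration; a real changelog might live in email, IRC logs, or a CMDB and would need its own way of loading records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One entry in a change log (hypothetical layout)."""
    when: datetime
    system: str
    summary: str


def changes_before(
    records: Iterable[ChangeRecord],
    incident_at: datetime,
    lookback: timedelta,
    system: Optional[str] = None,
) -> List[ChangeRecord]:
    """Return changes made within `lookback` before the incident, newest first.

    If `system` is given, only changes to that system are returned.
    """
    start = incident_at - lookback
    hits = [
        r
        for r in records
        if start <= r.when <= incident_at
        and (system is None or r.system == system)
    ]
    return sorted(hits, key=lambda r: r.when, reverse=True)


# Example data, made up for illustration.
changelog = [
    ChangeRecord(datetime(2011, 12, 1, 9, 0), "db01", "raised max connections"),
    ChangeRecord(datetime(2011, 12, 3, 14, 30), "web01", "deployed new vhost config"),
    ChangeRecord(datetime(2011, 12, 3, 16, 0), "db01", "added index on orders table"),
]
incident = datetime(2011, 12, 3, 18, 0)
recent = changes_before(changelog, incident, timedelta(hours=24), system="db01")
for r in recent:
    print(r.when, r.system, r.summary)
```

Note that a short lookback window will miss the latent faults mentioned above, so in practice you want to be able to widen it easily.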

(*) – Note: one thing you really should leverage is version control for your CM and recording processes – it’s invaluable for being able to track a sequence of changes and to easily pull back a previously working configuration.