Tag Archives: change management

Change Management Workflows

In a previous post I covered the basic fields you need in a CR (Change Request) and promised to look at optional info that more heavyweight CM processes might use. In this post I’m going to focus on workflows: approvals, reviews, and validations. There are other fields you can find on a CR, but it’s really the workflows that are the critical part of CM.

The basic CR we looked at last time could be used even if only one person is involved in the entire process – but typically multiple people are involved. One person might request a change, someone else might execute it, someone else might validate that it worked, etc. If you have workflows like these that involve multiple people, you really want to have those workflows recorded in your ticketing system, so that you can see when one step is completed (and by whom) and the next has begun, and so that you can tell what stage a ticket is in and who currently owns it.

Please note that I’m not saying you HAVE to incorporate these flows into your CM process – they are appropriate in some places but not in all. As I said in earlier posts, you should have the amount of process and paperwork that’s appropriate for your environment – no more, no less.

Here in fantastic ASCII format is a basic CM process, showing how review/approval/validation workflows might be linked:

Create --> Review --> Approve --> Schedule --> Execute --> Validate

Each of these steps could potentially be performed by a different person or role.  If this seems crazy to you, note that in some cases it may be required for compliance or governance reasons that different people be involved in the various stages of the process.
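
If you want to enforce that kind of separation in tooling, here is a minimal Python sketch of the pipeline above. The stage names come from the diagram; the function names, the `owners` mapping, and the list of stage pairs that must be separated are all made up for illustration, and your own compliance rules will decide which pairs actually matter.

```python
# Stage names follow the diagram above; everything else is illustrative.
STAGES = ["create", "review", "approve", "schedule", "execute", "validate"]


def next_stage(stage):
    """Return the stage that follows `stage`, or None once the CR is finished."""
    i = STAGES.index(stage)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None


def duty_conflicts(owners, separated):
    """Return the stage pairs that must have different owners but don't.

    owners:    dict mapping stage name -> person
    separated: list of (stage_a, stage_b) pairs required to differ (an assumption;
               the actual pairs depend on your governance requirements)
    """
    return [
        (a, b)
        for a, b in separated
        if a in owners and b in owners and owners[a] == owners[b]
    ]


owners = {"create": "alice", "review": "alice", "execute": "bob", "validate": "carol"}
must_differ = [("create", "review"), ("execute", "validate")]
print(duty_conflicts(owners, must_differ))
```

In this example alice both wrote and reviewed the procedure, so that pair is reported as a conflict, while execution and validation pass.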

Create

A request for a change often comes from a non-technical resource. It might be a customer, a project manager, a product manager, etc. The request will not have any execution details in it; it will just say what is being requested and why. This will then be picked up by a technical resource who will create the actual procedure to be executed, including rollback and validation steps – hopefully with an eye towards automating the change in the future.  🙂

Reviews

A technical review step is generally performed by a peer of the person who created the execution procedure and/or will be executing it.  The purpose of the review is to ensure that the procedure is complete and correct and doesn’t require any special or unique knowledge on the part of the executor.

There may also be a business review step, where a peer of the original requestor evaluates the “why” behind the change request to ensure that it is appropriate to execute.

The output of a review step may involve changes to the procedure itself or sending the CR back to the original author for modification, in which case it will come back through the review step after those modifications are made.
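
The review loop is easy to get wrong in a ticketing system, so here is a small Python sketch of it. The outcome names (`accepted`, `changes_requested`) and the stage names are assumptions for the example, not anything a particular tool defines.

```python
def after_review(outcome):
    """Map a review outcome to the stage the CR moves to next."""
    if outcome == "accepted":
        return "approve"
    if outcome == "changes_requested":
        # Back to the author; the CR passes through review again afterwards.
        return "create"
    raise ValueError(f"unknown review outcome: {outcome!r}")


def stages_visited(outcomes):
    """Replay a sequence of review outcomes and list the stages the CR passes through."""
    path = ["create"]
    for outcome in outcomes:
        path.append("review")
        path.append(after_review(outcome))
        if path[-1] == "approve":
            break
    return path


print(stages_visited(["changes_requested", "accepted"]))
```

A CR that is sent back once visits create and review twice before it reaches approval.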

Approval

The approval step is a gate prior to execution. It is often combined with the review step above, but may be separate. Usually the two are split when technical reviews are done by SMEs in the relevant systems or technologies who may not have wider knowledge of all the systems that could be impacted. In such cases there will typically be a separate approval (which may be done by a committee like a CAB, or Change Advisory Board) signifying that the change has been reviewed in a larger context and is deemed safe and appropriate for the system as a whole.

Schedule

Changes may be executed at the discretion of the executor, or there may be more central scheduling for some or all changes. One typical case is when there is a defined maintenance window where changes that impact customer-facing services are performed. These windows are often planned and scheduled by a specific person or role to ensure that the limited time of the window is used most efficiently while guarding against change collisions. (Executing multiple changes simultaneously can lead to great difficulties with diagnosis if something goes wrong, since you’re not sure which of the changes is the cause of the problem.) Even outside of maintenance windows, changes may be centrally scheduled for other reasons.
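
A collision check is simple to automate. The Python sketch below treats any two changes whose planned time ranges overlap as a collision. The dictionary keys (`id`, `start`, `end`) and the CR numbers are invented for the example, and a real scheduler might also want to compare which systems each change touches.

```python
from datetime import datetime
from itertools import combinations


def find_collisions(changes):
    """Return (id_a, id_b) pairs of changes whose planned time ranges overlap.

    Each change is a dict with 'id', 'start' and 'end' keys (hypothetical names).
    Ranges that merely touch (one ends exactly when the next starts) don't count.
    """
    collisions = []
    for a, b in combinations(changes, 2):
        if a["start"] < b["end"] and b["start"] < a["end"]:
            collisions.append((a["id"], b["id"]))
    return collisions


window = [
    {"id": "CR-1", "start": datetime(2024, 1, 10, 22, 0), "end": datetime(2024, 1, 10, 22, 30)},
    {"id": "CR-2", "start": datetime(2024, 1, 10, 22, 15), "end": datetime(2024, 1, 10, 23, 0)},
    {"id": "CR-3", "start": datetime(2024, 1, 10, 23, 0), "end": datetime(2024, 1, 10, 23, 30)},
]
print(find_collisions(window))
```

Here CR-1 and CR-2 overlap by fifteen minutes and are flagged; CR-3 starts exactly when CR-2 ends and is not.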

Execution

Someone actually performs the change.  Nothing special to see here.

Validation

After the change procedure has been performed, there may be a handover to another individual (typically a peer of the executor) to perform technical validation of the change. A second pair of eyes can sometimes catch unintended effects that otherwise would have been missed, and it also removes any temptation to “paper over” what appears to be an inconsequential deviation from the CR (“it was just a one character typo!”). This validation should happen immediately after the change is executed, because until it is complete the system is potentially in a bad state.

Separately, there may be a business validation step to ensure that the intended effect has occurred.  This could potentially be done at some remove in time from the execution and technical validation.

Record your workflows!

Arguably the most valuable part of using a ticketing system to track CRs is how easy it makes it to handle workflows.  You can see at a glance where something is in the process and who owns it.  You can record the results of each step in a workflow, including who performed it and timestamps.  And you can use the ticketing system to actively manage the workflow, including auto-assignment and sending reminders or SLA warnings if appropriate.
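
To make that concrete, here is a rough Python sketch of what a ticketing system records per workflow step and how an SLA warning could be derived from it. The `StepRecord` fields, the helper names, and the SLA values are assumptions for illustration; real ticketing systems have their own schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StepRecord:
    """One workflow step: who owns it, when it started and ended, what happened."""
    stage: str
    owner: str
    started: datetime
    finished: Optional[datetime] = None
    result: str = ""


def current_step(records):
    """Return the first unfinished step (where the CR is now), or None if all are done."""
    for record in records:
        if record.finished is None:
            return record
    return None


def is_overdue(step, now, sla):
    """True if an open step has waited longer than the SLA for its stage.

    sla maps stage name -> timedelta; stages without an entry never warn.
    """
    limit = sla.get(step.stage)
    return limit is not None and step.finished is None and now - step.started > limit


history = [
    StepRecord("create", "alice", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 10, 0), "procedure written"),
    StepRecord("review", "bob", datetime(2024, 1, 8, 10, 0)),
]
step = current_step(history)
print(step.stage, step.owner)
print(is_overdue(step, datetime(2024, 1, 9, 12, 0), {"review": timedelta(hours=24)}))
```

The history shows at a glance that the CR is sitting in review with bob, and the second check tells you whether it is time to send him a reminder.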


Anatomy of a Change Request – The Basics

In most IT environments you’ll find some kind of Change Request (CR) form.  Some of them are simple forms for simple workflows and some of them…well, aren’t.  What does a typical CR look like?  If you’re creating a Change Management (CM) process for your organization (and you should have one!), what should your CR look like?

In this post I’ll talk about the very basic information that should be in every CR. In a subsequent post I’ll go through some of the optional information that more heavyweight CM processes may use.

A minimal CR

Any CR should have at least the following information:

  • Title
  • Requestor
  • Executor
  • Execution Time
  • Purpose
  • Procedure (including execution, validation, and rollback)
  • Results

Let’s go through these one by one:

Title
This is a short (less than one line) summary of the CR, used mainly for displaying CRs in lists.
Requestor
Who asked for this change? This is important to have in case there are any questions about what should be done or decisions that need to be made about different options that can be chosen. If you don’t know who requested it, you can’t get answers to those questions.
Executor
Who is actually doing the change? This is important to know for later troubleshooting purposes – if something goes wrong you’ll want to consult the person who made the change as they will have the best knowledge of what happened and if anything strange occurred.
Execution Time
For troubleshooting it is critical to know exactly when changes took place, so you can correlate with service impacts or other important events. (Your CM process may record execution time as part of the change workflow itself, in which case it’s not critical to have it actually in the CR – but it needs to be somewhere.)
Purpose
Why is this change being made? What is the business value of doing this? This is the field I see missing most often. Everyone involved in the CM process should understand the reason why changes are being made – and those reasons should be tied to the needs of the business. This understanding allows everyone to make informed decisions at every stage about priorities, strategies, tactics, etc. Without this understanding, the people making the changes are disconnected from the business and become disengaged and jaded, eventually leading to poor decisions.
Procedure (execution, validation, rollback)
What are you going to do? What order are you going to do it in? How are you going to make sure it worked, and didn’t break anything else? What are you going to do if something goes wrong? There are many different viewpoints on what level of detail and rigor this procedure needs to have, and there is no one right answer. I always think of every CR as a candidate for future automation: the more detailed, specific, and complete the procedural section of the CR is, the easier it will be to automate in the future.
Results
What happened when the change was executed? Typically this part of the CR will contain pasted output from execution or validation commands, or screenshots showing the effective change, etc. If there are any problems later this prevents wasted time while people ask “did you do _____” or “what does ______ command show?” A tiny amount of work to cut’n’paste some info here can save a huge amount of heartache later.
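
If your CRs live somewhere scriptable, the fields above map naturally onto a small record type. This is only a sketch in Python: the class, the field types, and the rule about which fields may stay empty until after execution are my assumptions, not a standard.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    """The minimal CR fields described above, all as free text for simplicity."""
    title: str = ""
    requestor: str = ""
    executor: str = ""
    execution_time: str = ""
    purpose: str = ""
    procedure: str = ""  # execution, validation, and rollback steps
    results: str = ""


def missing_fields(cr, executed=False):
    """List the fields that are still empty.

    Before execution, execution_time and results are allowed to be blank
    (an assumption for this sketch); after execution every field is required.
    """
    optional = set() if executed else {"execution_time", "results"}
    return [
        f.name
        for f in fields(cr)
        if f.name not in optional and not getattr(cr, f.name).strip()
    ]


cr = ChangeRequest(
    title="Rotate TLS certificate on the load balancer",
    requestor="product manager",
    executor="on-call engineer",
    purpose="Current certificate expires next week",
    procedure="1. Install new cert  2. Verify with a test request  3. Rollback: restore old cert",
)
print(missing_fields(cr))
print(missing_fields(cr, executed=True))
```

A check like this can run when a CR is submitted and again when it is closed, so an empty Purpose or Results field gets caught before anyone needs it.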

This may seem like a lot of information for a simple CR, but in practice it doesn’t take very long to fill these out for simple changes. And for complicated changes, you shouldn’t be worried about the extra overhead of typing – if you’re not thinking through and planning your complicated changes, you’re taking big risks with your business.

Where does a CR form live?

When your CM process gets started, CR forms will often be simple documents – they could be in GDocs (this is how we do it at my company today), they could be in a wiki, or they could live directly in the ticketing system that manages your CM workflow (if you have one). What’s important is that the CRs be easy to fill out and easy to find later.

How do I start using a CR form?

Once you’ve created your CR form, the next step is simple. Just start using it for your changes! Ideally the person in charge of your infrastructure already understands the value of CM, and will be eager to have everyone start using the CR. If that’s not the case, use the CR form yourself, and ask others to use it. Even if no one else does, at some point there will be an incident that will make the value of using CRs obvious to everyone – and when that happens you’ll be ready.


Why Change Management?

Recently I had the opportunity to create a template for infrastructure change requests at work. Based on the reaction from some of my co-workers, I thought it might be valuable to explain what change requests are for. In a subsequent post I’ll go through what a basic change request looks like.

Change Requests are part of the Change Management (CM) process. Now don’t get freaked out, that doesn’t mean we need forms filled out in triplicate sent through multiple people for review and approval. Processes can have as much or as little heft as required to meet the needs of your organization. But if your infrastructure’s availability is important to you, you should have a CM process. We are a small startup, so our CM process is very lightweight. Here are the main tenets:

  1. Think about a change before you start executing it
  2. If something is high-risk, test it before you do it for real
  3. Know how you’re going to handle it if something goes horribly wrong
  4. Record that you made the change so people can find it later if they need to (for example, when troubleshooting a problem)

Point 1 (think before you execute) is really philosophical. After many years of doing production web operations, I’m convinced based on the empirical evidence that you’re far more likely to screw something up if you just start cowboying your way through a change rather than planning it ahead of time. You see this point of view in other contexts as well (“plan your flight, fly your plan”). Many times when planning a change, I have thought of something new as I’m doing the planning that I would otherwise have encountered during execution – something that in the heat of the moment would have caused me great panic. Better to hit that and work through it when you’re not stressed out in the middle of a big production change. For me one of the most important parts of having a written Change Request is that it enforces thinking through a change before you execute it.

Point 2 (test high-risk changes) may sound obvious but there are certainly nuances. How do you determine what’s high-risk and where do you draw the line? How much time do you spend doing testing vs simply rolling back a change if it does cause problems? I’ve found that it’s best to leave these decisions in the hands of the people executing the changes – but your CM process needs to remind them to ask these questions, think about the answers, and use their best judgment.

Point 3 (how to handle problems) is not theoretical. If your job is web operations, you will be involved with a change that goes horribly wrong. It just happens. When it happens, if you have not thought about it ahead of time you will be up a smelly brown creek without a paddle. This is when panic sets in, and in the heat of those moments some spectacularly bad decisions can be made which could make the situation even worse. Spending some time prior to execution thinking through potential failure scenarios allows you to execute your rollback plan calmly and effectively. Which way do you prefer?

Point 4 (change recording) is absolutely critical unless you a) never forget anything and b) are the only person involved in the support of your infrastructure. In my experience, the majority of thorny production problems are caused by changes, usually when they introduce latent faults that don’t manifest as incidents for a while. When diagnosing such a problem, it is critical that you know what changed when, and that is precisely the purpose of change recording. There are a million ways to do this, from sending emails to a “changelog” alias or putting change summaries in IRC to having a CMDB with change records in it. Less important than the specific mechanism(*) is that you have a mechanism, that people use it religiously, that it’s easy to search for changes at particular times and to particular systems, and that everyone knows where to find it and how to use it. What seems like busywork when you’re performing a change (“Why do I have to write this down? It’s already done!”) will pay giant dividends when it prevents someone from spending tons of time reverse engineering what happened while the service is down.

(*) – Note: one thing you really should leverage is version control for your CM and recording processes – it’s invaluable for being able to track a sequence of changes and to easily pull back a previously working configuration.
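
Whatever mechanism you pick, the search requirement from Point 4 looks roughly like the Python sketch below. The log format (a list of dicts with `when`, `systems`, and `summary` keys) and the sample entries are invented for the example; the point is only that filtering by time range and by system should be this easy.

```python
from datetime import datetime


def find_changes(log, system=None, start=None, end=None):
    """Return change records matching an optional system and time range, oldest first.

    log entries are dicts with 'when' (datetime), 'systems' (list of names)
    and 'summary' keys; these names are assumptions for this sketch.
    """
    matches = []
    for entry in log:
        if system is not None and system not in entry["systems"]:
            continue
        if start is not None and entry["when"] < start:
            continue
        if end is not None and entry["when"] > end:
            continue
        matches.append(entry)
    return sorted(matches, key=lambda e: e["when"])


changelog = [
    {"when": datetime(2024, 1, 9, 14, 0), "systems": ["db"], "summary": "Raised connection limit"},
    {"when": datetime(2024, 1, 8, 11, 0), "systems": ["web", "lb"], "summary": "Deployed new TLS cert"},
    {"when": datetime(2024, 1, 2, 16, 0), "systems": ["web"], "summary": "Upgraded web server package"},
]
recent_web = find_changes(changelog, system="web", start=datetime(2024, 1, 5))
print([c["summary"] for c in recent_web])
```

When the web tier starts misbehaving, a query like this answers "what changed on web since last Friday?" in seconds instead of requiring someone to reverse engineer it.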