TL;DR: Sometimes you might want to pause deployment of changes to minimise risk of unexpected bugs or a potential outage.
What is a "change freeze"?
It's a period during which no changes are to be deployed to a system (generally this means not deploying or promoting code changes to Production).
Note: Some organisations have different names for change freezes. At Amazon, we used the term "black days" and then later "blocked days".
What's the benefit?
All change introduces risk. A change freeze is just another risk mitigation strategy that tries to minimise risk by minimising change. In theory, without changing anything, nothing should break (unless it's already broken).
When would I want a change freeze?
There are a few different scenarios in which you might want a change freeze. Some might be temporal (tied to a certain time period) whereas others might be tied to a project or some other factor. Here are a few examples where you might consider a freeze:
- Ahead of high traffic periods, to ensure the customer experience is not compromised when customers rely on the system most (e.g. for an e-commerce site, you might want to enact a change freeze in the lead-up to Christmas and during the Boxing Day sales, when you anticipate a higher sales volume).
- When trying to deprecate a service or system, as a means of encouraging teams to "work around" the frozen system.
- When staffing / support capability is low and there is nobody to respond to any issues or outages (e.g. during and ahead of a public holiday weekend).
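For the temporal case, the freeze can be written down as data that a deployment script checks before promoting anything. The following is a minimal sketch, not a definitive implementation: `FreezeWindow`, `active_freeze`, and the dates in `FREEZES` are all hypothetical names and values made up for illustration, and it assumes every timestamp is timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FreezeWindow:
    """A hypothetical change-freeze period: start inclusive, end exclusive."""
    name: str
    start: datetime
    end: datetime

    def covers(self, moment: datetime) -> bool:
        # Assumes all datetimes are timezone-aware so they compare safely.
        return self.start <= moment < self.end


def active_freeze(windows, moment):
    """Return the first freeze window covering `moment`, or None."""
    for window in windows:
        if window.covers(moment):
            return window
    return None


# Illustrative dates only; real windows would come from your own calendar.
FREEZES = [
    FreezeWindow(
        name="Christmas and Boxing Day sales",
        start=datetime(2024, 12, 16, tzinfo=timezone.utc),
        end=datetime(2024, 12, 28, tzinfo=timezone.utc),
    ),
]
```

A deployment script could call `active_freeze(FREEZES, datetime.now(timezone.utc))` and refuse to deploy to Production whenever the result is not `None`.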
This section is a collection of things that might come to mind when thinking about a change freeze and how to implement one.
So... do we keep developing?
Yes! Keep developing, just don't deploy!
But what do I do with my queued changes?
There are two approaches I see here:
- Get the changes reviewed, but don't merge/push them into the pipeline
- Merge the changes in, but block them from deploying to Prod
The advantage of the first approach is that the pipeline is in a stable state and if an emergency change needs to come through, you can use the pipeline to deploy and test the change without much fuss.
The advantage of the second approach is that you can get your changes merged in, tested and "ready for deployment". The drawback is that it adds complexity if you need an emergency change. Do you roll back the queued changes? Do you deploy out of band? Do you deploy the emergency fix together with the queued changes?
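As a rough illustration of the second approach, the gate in front of Prod can be a small decision function that the pipeline consults before each deployment. This is a sketch under assumptions: `may_deploy`, the `"prod"` environment name, and the `emergency` flag are hypothetical and not taken from any particular CI/CD tool. It also does not answer what happens to the queued changes when an emergency fix goes through; that still needs a decision from you.

```python
def may_deploy(target_env: str, freeze_active: bool, emergency: bool = False) -> bool:
    """Decide whether a pipeline stage may deploy to `target_env`.

    Hypothetical policy: pre-production environments are never blocked,
    and Production is blocked during a freeze unless the change has been
    flagged as an approved emergency.
    """
    if target_env != "prod":
        # Merged changes keep flowing through test and staging environments.
        return True
    if not freeze_active:
        # Outside a freeze, Production deployments proceed as normal.
        return True
    # During a freeze, only emergency changes reach Production.
    return emergency
```

Keeping the rule in one place makes it easy to see, and to audit, exactly when a change was allowed through during the freeze.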
One of the things to consider with a change freeze is the "exit strategy". How do you go back to normal after the change freeze period is over?
Some examples of things I've seen in the past:
- Deploying different services on different days after the freeze (as opposed to "opening the flood gates" and deploying all services at the same time).
- Deploying groups of changes separately (e.g. if you have two new large features developed in a service, you might want to deploy one at a time).
Ultimately, there are a few different strategies and approaches that you can take, but the main callout here is that it's something worth thinking about.
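A staggered exit like the ones listed above can be planned ahead of time. Here is a minimal sketch, assuming a hypothetical helper `stagger_deployments` and made-up service names; it simply spreads deployments over consecutive calendar days and ignores weekends, dependencies between services, and so on.

```python
from datetime import date, timedelta


def stagger_deployments(services, first_day, per_day=1):
    """Assign each service a deployment day, `per_day` services at a time.

    Services keep the order they were given in, so put the lowest-risk
    (or most urgent) ones first.
    """
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    schedule = {}
    for index, service in enumerate(services):
        # Integer division groups the services into consecutive days.
        day = first_day + timedelta(days=index // per_day)
        schedule.setdefault(day, []).append(service)
    return schedule


# Hypothetical services leaving a freeze on 2 January, two per day.
plan = stagger_deployments(
    ["checkout", "search", "payments"],
    first_day=date(2025, 1, 2),
    per_day=2,
)
```

The same idea works for groups of changes within a single service: pass in feature names instead of service names.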
Anti-Pattern? What about CI/CD?
You might be thinking: "But we have all these CI/CD processes in place. Do we still need the change freeze? Isn't it a bit of an anti-pattern?"
I agree. In an ideal world, organisations would roll changes out to users gradually, with ample monitoring, automatic rollback and so on. In practice, I've found most organisations don't have CI/CD pipelines that are sophisticated enough for this. As such, a change freeze is generally a reasonable approach.