BoxBoat Blog
Service updates, customer stories, and tips and tricks for effective DevOps
Canary Deployments
by Christopher Andrews | Tuesday, Mar 15, 2022 | GitLab DevOps CI/CD
In DevOps, there is a multitude of CI/CD deployment strategies and methodologies. Each comes with its pros and its cons, some more than others. In this edition of our Deployment Methodologies blog post series, we are going to cover “Canary” Deployments, otherwise known as “Incremental Rollouts”.
What are Canary deployments, or Incremental Rollouts?
Canary deployments are named after the phrase “canary in the coal mine”. Miners took a canary into a coal mine to see whether it would succumb to poisonous gas, giving the humans time to escape to safety (kind of mean, right?). A canary's physiology made it succumb to poisonous gases such as carbon monoxide much more quickly than a human would.
Well, what the heck does that have to do with deploying applications?
First, I'll explain the technical definition of Canary Deployments / Incremental Rollouts.
We have an upgrade to an application that has passed testing in the Staging environment; the last step before it goes into Production.
We want to roll out the new version to Production incrementally now, and eventually cut over to it completely.
There are two ways to roll out incrementally. The first is to forward a percentage of the traffic to containers/instances running the new code. We can begin with 10%, then move to 25%, 50%, and finally 100%, provided each increment passes without unexpected failures of any kind. The second is to load the new code onto a portion of the containers/instances, growing that portion in the same incremental fashion.
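To make the first approach concrete, here is a minimal sketch using the canary annotations of the ingress-nginx controller. It is one common way to split traffic by weight, not what GitLab Auto DevOps itself generates; the host, Ingress, and Service names are hypothetical, and it assumes a primary Ingress for the same host already routes to the stable Service.

```yaml
# Hypothetical canary Ingress for ingress-nginx. It only takes effect
# alongside a primary (non-canary) Ingress for the same host.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-canary
  annotations:
    # Mark this Ingress as the canary for the host below.
    nginx.ingress.kubernetes.io/canary: "true"
    # Send roughly 10% of requests to the canary Service.
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-canary   # Service in front of the new version
                port:
                  number: 80
```

Stepping canary-weight through 10, 25, 50, and 100 mirrors the increments described above.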
So what about the canaries? In this analogy, the canary is simply one increment of the containers or users. If the application dies for those users, it is easy to revert to the stable version, or in other words, to pull the people out of the coal mine.
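The second approach can be sketched the same way. In this hypothetical Kubernetes setup (again, not Auto DevOps output), a stable and a canary Deployment sit behind a single Service that selects only on the shared app label. With 9 stable replicas and 1 canary replica, roughly 10% of requests reach the new code, assuming load is spread evenly across pods. Pulling the canary out of the coal mine is then a matter of scaling the canary Deployment to 0 or deleting it. Names, ports, and the image are placeholders.

```yaml
# Service: selects on "app" only, so it load-balances across
# both stable and canary pods.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
---
# Canary Deployment. The stable Deployment (not shown) is identical apart
# from its name, track label, image tag, and replicas: 9.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1   # 1 of 10 pods in total, so roughly 10% of traffic
  selector:
    matchLabels:
      app: my-app
      track: canary
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2   # hypothetical new version
          ports:
            - containerPort: 8080
```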
What would happen otherwise? Coal miners would be sent into possibly toxic conditions with no warning that the Silent Killer is hunting them. Now imagine one of your favorite applications suddenly falling out of the sky, with no way to recover it quickly. That erodes the trust of the application's users and clients, which drives them to look for better alternatives and, in the end, costs business.
Canary Deployments Pros and Cons
Pros
- When applied correctly, canary deployments reduce or eliminate downtime, because the stable version in production never has to be taken down.
- In the case of ‘something bad happening’, the canaries can be easily pulled out of the coal mine.
- Not very resource-intensive, since there is only one production environment.
Cons
- Can be hard to configure and script.
- Involves some level of testing in production, which can result in some dead birds.
How can we use Canary Deployments/Incremental Rollouts?
GitLab gives us an easy way to implement incremental rollouts. Before implementing a canary deployment via GitLab, please fulfill the prerequisites listed in GitLab's requirements.
Enabling Auto DevOps
In the GitLab project where you want to use a canary deployment, go to Settings -> CI/CD -> Auto DevOps, check Default to Auto DevOps pipeline, then for Deployment strategy, choose Continuous deployment to production using timed incremental rollout, otherwise known as a ‘Canary Deployment’.
Save your changes and exit.
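If you prefer configuration as code over the settings page, GitLab also lets you include the Auto DevOps template from .gitlab-ci.yml and pick the rollout mode with the INCREMENTAL_ROLLOUT_MODE variable ("timed" or "manual"). A minimal sketch, assuming a GitLab version whose Auto DevOps template supports this variable:

```yaml
# Pull in the full Auto DevOps pipeline.
include:
  - template: Auto-DevOps.gitlab-ci.yml

variables:
  # "timed": rollout jobs start automatically after a delay.
  # "manual": rollout jobs wait for someone to press play.
  INCREMENTAL_ROLLOUT_MODE: timed
```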
Usage
Go to your IDE, or simply use GitLab's Web IDE, and change a string somewhere, perhaps in an ‘echo’ or a print() statement. Write a simple commit message and push the change. NOTE: In a sample project meant only to test incremental rollouts with Auto DevOps, pushing directly to ‘main’ is fine. In an actual working repository, it is better to create a feature branch with a concise but precise name and merge it upstream after reviews and tests, so that changes are deployed deliberately.
Once the changes are pushed, hover over CI/CD in the left side pane of GitLab and click Pipelines. Click the pipeline ID (it looks something like #143). In this view, jobs are organized into stages. Look over the stages and jobs that ‘Auto DevOps - Continuous deployment to production using timed incremental rollout’ defines for us.
Let's dig into the picture above. We see four stages defined for us:
- Build
- Test
- Production
- Performance
We are going to focus primarily on the 1st stage, ‘Build’, and the 3rd, ‘Production’, since those drive the incremental rollout we are studying. The ‘Test’ and ‘Performance’ stages are less specific to canary deployments and can vary widely depending on the purpose of the application and its requirements.
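A stage appears in the pipeline graph only if at least one of its jobs is added to the pipeline, so the four stages in the picture are not necessarily every stage the template declares; for example, the canary stage used by the job below only shows up when that job's rules match. A simplified, hypothetical stages declaration for a custom pipeline along these lines (the real Auto DevOps template declares more stages) could look like this:

```yaml
# Stages run in the order listed. A stage whose jobs are all excluded
# by their rules is not shown in the pipeline graph.
stages:
  - build
  - test
  - canary
  - production
  - performance
```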
Let's look at this example of a canary deployment job defined in a .gitlab-ci.yml file:
```yaml
canary:
  extends: .auto-deploy
  stage: canary
  allow_failure: true
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context || true
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy canary 50
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    - if: '$CANARY_ENABLED'
      when: manual
```
Let's break some of this down.
```yaml
extends: .auto-deploy
```
This means the job inherits the configuration of the hidden .auto-deploy job, which runs GitLab Auto Deploy. We define it in our .gitlab-ci.yml file as follows:
```yaml
variables:
  AUTO_DEPLOY_IMAGE_VERSION: 'v2.18.1'

.auto-deploy:
  image: "registry.gitlab.com/gitlab-org/cluster-integration/auto-deploy-image:${AUTO_DEPLOY_IMAGE_VERSION}"
  dependencies: []
```
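This pins the version of ‘auto-deploy-image’ that the deploy jobs run in, and dependencies: [] tells GitLab not to download artifacts from earlier jobs into them.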
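Conceptually, extends merges the keys of the hidden .auto-deploy job into canary, with keys defined on canary itself taking precedence. After that merge, the canary job behaves roughly as if it had been written out like this (a sketch for illustration only; the rules are omitted):

```yaml
canary:
  # Inherited from .auto-deploy
  image: "registry.gitlab.com/gitlab-org/cluster-integration/auto-deploy-image:${AUTO_DEPLOY_IMAGE_VERSION}"
  dependencies: []
  # Defined on the canary job itself
  stage: canary
  allow_failure: true
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context || true
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy canary 50
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
```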
In the above ‘canary’ snip, we then see:
```yaml
canary:
  extends: .auto-deploy
  stage: canary
  allow_failure: true
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context || true
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy canary 50
```
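The ‘auto-deploy’ tool runs a series of steps, and the job fails if any of them fails: it checks that a Kubernetes ingress domain is configured, downloads the Helm chart, selects the Kubernetes context (the || true keeps a failure in that one step from stopping the job), ensures the namespace exists, initializes Tiller, creates a Kubernetes secret, and finally deploys the canary track. Note: If Helm 2 is still configured, there is a way to upgrade to Helm 3 and get rid of Tiller.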
```yaml
environment:
  name: production
  url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
rules:
  - if: '$CI_DEPLOY_FREEZE != null'
    when: never
  - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
    when: never
  - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
    when: never
  - if: '$CANARY_ENABLED'
    when: manual
```
The production environment defined here appears in GitLab once a job deploys to it; you can find it in the left side pane under ‘Deployments’ -> ‘Environments’.
Let's dig into the rules section.
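GitLab evaluates the rules in order and applies the first one that matches. The first rule skips the job during a deploy freeze. The second skips it when there is no active Kubernetes cluster (or KUBECONFIG) to deploy to. The third skips it on any branch other than the default branch. If none of those match, the last rule applies: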
```yaml
- if: '$CANARY_ENABLED'
  when: manual
```
This rule adds the canary job only when the CANARY_ENABLED variable is set, for example in the left side pane under ‘Settings’ -> ‘CI/CD’ -> ‘Variables’. If no rule matches, the job is left out of the pipeline entirely.
Because of when: manual, the job shows up with a play button in the pipeline graph, so you can start the canary deployment yourself once the previous stages have passed.
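The variable can also be set in .gitlab-ci.yml instead of on the CI/CD variables page. Because the rule is written as if: '$CANARY_ENABLED', any non-empty value makes it match; the value "true" below is just a convention, not something the rule checks for. A minimal sketch:

```yaml
variables:
  # Any non-empty value satisfies `if: '$CANARY_ENABLED'`. On the default
  # branch, with a cluster available and no deploy freeze, the canary job
  # is then added as a manual (play button) job.
  CANARY_ENABLED: "true"
```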
Now that the canary job is set up and defined, let's look at the rollout jobs, starting with the rollout_template below:
```yaml
.rollout: &rollout_template
  extends: .auto-deploy
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy canary $ROLLOUT_PERCENTAGE
    - auto-deploy persist_environment_url
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
  artifacts:
    paths: [environment_url.txt, tiller.log]
    when: always
```
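Here auto-deploy runs the same sequence of steps as in the canary job, except that it deploys the canary track at $ROLLOUT_PERCENTAGE and then persists the environment URL. The environment is again production, and environment_url.txt and tiller.log are always uploaded as job artifacts, even when the job fails.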
The ‘manual’ rollout jobs are built on the template below. Note that if you don't want them to be manual (waiting for someone to hit the play button), you can take that step out of your configuration, so that the rollout starts as soon as the previous stages pass.
Here is the manual_rollout_template:
```yaml
.manual_rollout_template: &manual_rollout_template
  <<: *rollout_template
  stage: production
  resource_group: production
  allow_failure: true
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
      when: never
    - if: '$INCREMENTAL_ROLLOUT_MODE == "timed"'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    # $INCREMENTAL_ROLLOUT_ENABLED is for compatibility with pre-GitLab 11.4 syntax
    - if: '$INCREMENTAL_ROLLOUT_MODE == "manual" || $INCREMENTAL_ROLLOUT_ENABLED'
      when: manual
```
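Besides the deploy freeze, cluster, and default-branch checks we saw in the canary job, this template has a rule that skips the manual jobs entirely when INCREMENTAL_ROLLOUT_MODE is "timed", since timed mode uses its own delayed jobs. Otherwise, the jobs are added as manual jobs when INCREMENTAL_ROLLOUT_MODE is "manual" (or when the legacy INCREMENTAL_ROLLOUT_ENABLED variable is set).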
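Since the strategy chosen in the settings earlier was the timed incremental rollout, it helps to see what a timed job can look like. The sketch below is hypothetical and simplified (it has fewer rules than the real template); it uses GitLab's when: delayed with start_in, and the job name, stage name, and 5-minute delay are assumptions. Each percentage gets its own stage, declared under stages:, so the delayed jobs run one after another instead of all at once.

```yaml
# Hypothetical timed counterpart of "rollout 25%": it starts on its own
# 5 minutes after the previous stage succeeds instead of waiting for a click.
timed rollout 25%:
  extends: .rollout                # hidden job defined as &rollout_template above
  stage: incremental rollout 25%   # must also be listed under stages:
  resource_group: production       # one production deploy at a time
  variables:
    ROLLOUT_PERCENTAGE: 25
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    - if: '$INCREMENTAL_ROLLOUT_MODE == "timed"'
      when: delayed
      start_in: 5 minutes
```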
```yaml
rollout 25%:
  <<: *manual_rollout_template
  variables:
    ROLLOUT_PERCENTAGE: 25
```
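The jobs for the other percentages follow the same pattern and differ only in ROLLOUT_PERCENTAGE. Here they are sketched with extends: instead of the <<: merge key used above; for these jobs the two approaches behave the same, and extends keeps the fragment valid YAML on its own:

```yaml
# Same template, different percentage. Every job inherits stage: production
# and resource_group: production, so they are manual jobs in one stage.
# Nothing enforces their order: you press play on each in turn, and
# resource_group only guarantees that two of them never run concurrently.
rollout 10%:
  extends: .manual_rollout_template
  variables:
    ROLLOUT_PERCENTAGE: 10

rollout 50%:
  extends: .manual_rollout_template
  variables:
    ROLLOUT_PERCENTAGE: 50
```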
The above shows how the manual rollout template is reused for each percentage. The CI file has jobs for 10%, 25%, and 50%, plus a rollout 100% job that extends the same template and also makes use of the .production_template:
```yaml
.production: &production_template
  extends: .auto-deploy
  stage: production
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy
    - auto-deploy delete canary
    - auto-deploy persist_environment_url
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
  artifacts:
    paths: [environment_url.txt, tiller.log]
    when: always
```
This template runs the same checks as before, but this time auto-deploy deploy (with no track argument) deploys the main, stable release, and auto-deploy delete canary removes the canary release we deployed earlier. That completes the transition to a full 100% deployment within the ‘production’ environment, and the canary is no longer needed.
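The same delete canary command is also what pulls the canaries out of the coal mine if an increment misbehaves. Below is a hypothetical manual abort job; it is not part of the Auto DevOps template, it reuses only auto-deploy commands that appear above, and it assumes the stable release is still running so that traffic falls back to it once the canary is gone.

```yaml
# Hypothetical job (not in the Auto DevOps template): removes the canary
# release so only the stable release keeps serving production traffic.
abort canary:
  extends: .auto-deploy
  stage: production
  resource_group: production   # never runs at the same time as a rollout job
  script:
    - auto-deploy use_kube_context || true
    - auto-deploy ensure_namespace
    - auto-deploy delete canary
  environment:
    name: production
  rules:
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    - when: manual             # only runs when someone presses play
  allow_failure: true          # does not block the rest of the pipeline
```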