BoxBoat Blog

Canary Deployments

by Christopher Andrews | Tuesday, Mar 15, 2022 | GitLab DevOps CI/CD

In DevOps, there is a multitude of CI/CD deployment strategies and methodologies. Each comes with its pros and its cons, some more than others. In this edition of our Deployment Methodologies blog post series, we are going to cover “Canary” Deployments, otherwise known as “Incremental Rollouts”.

What are Canary deployments, or Incremental Rollouts?

Canary deployments are named after and modeled on the phrase “canary in the coal mine”, in which a canary was taken into a coal mine to see if it would succumb to poisonous gas, giving the humans time to escape to safety (kind of mean, right?). The canary's physiology caused it to succumb to poisonous gas (such as carbon monoxide) much more quickly than a human would.

Well, what the heck does that have to do with deploying applications?

First, I'll explain the technical definition of Canary Deployments / Incremental Rollouts.

We have an upgrade to an application that has passed testing in the Staging environment, the last step before it goes into Production.

We want to incrementally roll out the application to Production now, and down the road cut over to the new version completely.

We can roll out incrementally in two different ways. First, by forwarding a percentage of the traffic to containers/instances running the new code. We can begin with 10%, then move to 25%, 50%, and finally 100%, provided that each increment passes without unexpected failures of any kind. Second, by loading the new code onto a portion of the containers/instances, in a similar incremental fashion.

So what about canaries? In this analogy, the canary is just an increment of the containers or users. If the application dies for those users, it is easy to revert to the stable version, or, in other words, to pull the people out of the coal mine.

What would happen otherwise? Coal miners would be sent into possibly toxic conditions with no warning that the Silent Killer is hunting them. Imagine any of your favorite applications just falling from the sky at any point, not to be recovered immediately. This creates bad sentiment and a lack of trust among the users/clients of the application, which will cause people to find better alternatives and, in turn, cost the business.

Canary Deployments Pros and Cons

Pros

  • When applied correctly, canary deployments reduce or eliminate downtime, because the stable version in production does not have to be killed.
  • In the case of ‘something bad happening’, the canaries can be easily pulled out of the coal mine.
  • Not very resource intensive, since there is only one production environment.

Cons

  • Might be hard to configure/script.
  • There will be some level of testing in production, which can result in some dead birds.

How can we use Canary Deployments/Incremental Rollouts?

GitLab gives us an easy way to implement incremental rollouts. Please fulfill the following prerequisites before continuing with the implementation of a canary deployment via GitLab: requirements

Enabling Auto DevOps

In the same GitLab project where you want to benefit from a canary deployment, go to Settings -> CI/CD -> Auto DevOps, click Default to Auto DevOps pipeline, then for Deployment Strategy, choose Continuous deployment to production using timed incremental rollout, otherwise known as a ‘Canary Deployment’.

Save your changes and exit.
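The same choice can likely be made in the CI file rather than in the UI. The fragment below is a minimal sketch, not a verified setup: it assumes the project includes GitLab's Auto DevOps template, and that the INCREMENTAL_ROLLOUT_MODE variable (which appears in the rules later in this post) is what the Deployment Strategy setting controls.

```yaml
# Sketch only: pull in the Auto DevOps pipeline and select the timed rollout mode.
# Assumption: INCREMENTAL_ROLLOUT_MODE is the variable behind the UI setting.
include:
  - template: Auto-DevOps.gitlab-ci.yml

variables:
  INCREMENTAL_ROLLOUT_MODE: timed
```

Keeping the setting in the repository means the deployment strategy is reviewed and versioned along with the rest of the pipeline.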

Usage

Go to your IDE, or simply use the Web IDE in GitLab, and make a change to a string somewhere, perhaps in an ‘echo’ or a print() statement. Write a simple commit message and push the change. NOTE: In a sample project used just to test incremental rollouts with Auto DevOps, pushing directly to ‘main’ is fine. In an actual working repository, it would be best to create a feature branch with a concise but precise name, and to merge upstream only after reviews and tests, so that changes are merged and deployed strategically.

Once those changes are pushed, navigate to the left side pane of GitLab, hover over CI/CD, and click on Pipelines. Click on the pipeline ID (it usually looks something like #143). From this view, jobs are organized into stages. View the stages and procedures that ‘Auto DevOps - Continuous deployment to production using timed incremental rollout’ defines for us.

Canary

Let's dig into the picture above. We see 4 stages defined for us:

  1. Build
  2. Test
  3. Production
  4. Performance

We are going to focus primarily on the 1st stage, ‘Build’, and the 3rd, ‘Production’, since those carry out the incremental rollout we are studying. The ‘Test’ and ‘Performance’ stages are vaguer and can vary widely depending on the purpose of the application and its requirements.

Let's look at this example of a canary deployment job defined in a .gitlab-ci.yml to strengthen the argument we've made above.

canary:
  extends: .auto-deploy
  stage: canary
  allow_failure: true
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context || true
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy canary 50
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    - if: '$CANARY_ENABLED'
      when: manual

Let's break some of this down.

  extends: .auto-deploy

This means we will be extending GitLab Auto Deploy. We define this in our .gitlab-ci.yml file as such:

variables:
  AUTO_DEPLOY_IMAGE_VERSION: 'v2.18.1'

.auto-deploy:
  image: "registry.gitlab.com/gitlab-org/cluster-integration/auto-deploy-image:${AUTO_DEPLOY_IMAGE_VERSION}"
  dependencies: []

This defines the version of the ‘auto-deploy-image’ we want.

In the above ‘canary’ snip, we then see:

canary:
  extends: .auto-deploy
  stage: canary
  allow_failure: true
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context || true
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy canary 50

The ‘auto-deploy’ tool runs a series of checks and setup steps, each of which fails the job if it is not satisfied, and then deploys the canary. Note: If Helm 2 is still configured, there is a tool to upgrade to Helm 3 to get rid of Tiller.

  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    - if: '$CANARY_ENABLED'
      when: manual

The environment defined here, production, can be viewed in GitLab by going to the left side pane to ‘Deployments’ -> ‘Environments’.

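Let's dig into the rules section. Rules are evaluated in order, and the first one that matches decides what happens to the job. The first rule skips the job during a deployment freeze. The second rule skips it when there is no active Kubernetes cluster to deploy to. The third rule skips it when the commit is not on the default branch. If none of those match, the last rule applies: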

    - if: $CANARY_ENABLED
      when: manual

This rule checks whether the CANARY_ENABLED variable is set, which can be done in the left side pane under ‘Settings’ -> ‘CI/CD’ -> ‘Variables’. When it is set, the canary job becomes a manual step.

The manual step means that there will be a play button in the pipeline graph to start the incremental rollout once the previous stages have passed.
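As an alternative to the Settings page, the variable could presumably be set in the CI file itself. This is a small sketch under that assumption; the value "true" is arbitrary, since the rule only tests whether CANARY_ENABLED is set to a non-empty value.

```yaml
# Sketch only: define CANARY_ENABLED in .gitlab-ci.yml instead of the UI.
# Any non-empty value satisfies the rule `if: '$CANARY_ENABLED'`.
variables:
  CANARY_ENABLED: "true"
```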

Now that canary is set up and defined, let's look into the rollout section. See the rollout_template below:

.rollout: &rollout_template
  extends: .auto-deploy
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy canary $ROLLOUT_PERCENTAGE
    - auto-deploy persist_environment_url
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
  artifacts:
    paths: [environment_url.txt, tiller.log]
    when: always

After auto-deploy performs a series of operations similar to those in the canary job, we see the defined environment with its identifying name and URL, with the artifacts defined below it.

The ‘manual’ rollout section is defined in the snip below. Note that if you don't want it to be manual (hitting the play button), you can always take that step out of your configuration, so that when the previous stages pass, the rollout will start.

Here we see the manual_rollout_template:

.manual_rollout_template: &manual_rollout_template
  <<: *rollout_template
  stage: production
  resource_group: production
  allow_failure: true
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
      when: never
    - if: '$INCREMENTAL_ROLLOUT_MODE == "timed"'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    # $INCREMENTAL_ROLLOUT_ENABLED is for compatibility with pre-GitLab 11.4 syntax
    - if: '$INCREMENTAL_ROLLOUT_MODE == "manual" || $INCREMENTAL_ROLLOUT_ENABLED'
      when: manual

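We see some of the same rules as in the canary job, plus a rule that skips the manual rollout when INCREMENTAL_ROLLOUT_MODE is set to "timed". After that, we check to make sure we are on the default branch. The final rule makes the job manual when the mode is "manual".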
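For comparison, a timed counterpart to this template might look like the sketch below. It is an illustration rather than the exact template GitLab ships: the names .timed_rollout_template and ‘timed rollout 25%’ and the 5 minute delay are assumptions, and the fragment relies on the rollout_template anchor defined earlier in the same file. The `when: delayed` and `start_in` keywords are what replace the play button.

```yaml
# Sketch only: hypothetical timed variant of the manual rollout template.
# Must live in the same file as the &rollout_template anchor.
.timed_rollout_template: &timed_rollout_template
  <<: *rollout_template
  stage: production
  resource_group: production
  allow_failure: true
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
      when: never
    # Mirror image of the manual template: skip when the mode is "manual".
    - if: '$INCREMENTAL_ROLLOUT_MODE == "manual"'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    # Start automatically after a delay instead of waiting for a click.
    - if: '$INCREMENTAL_ROLLOUT_MODE == "timed"'
      when: delayed
      start_in: 5 minutes   # assumed delay, adjust to taste

# Hypothetical job using the timed template.
timed rollout 25%:
  <<: *timed_rollout_template
  variables:
    ROLLOUT_PERCENTAGE: 25
```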

rollout 25%:
  <<: *manual_rollout_template
  variables:
    ROLLOUT_PERCENTAGE: 25
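The other increments presumably follow the same pattern. A sketch, assuming the job names mirror the one above and that only the percentage changes:

```yaml
# Sketch only: sibling jobs for the remaining increments.
# Relies on the &manual_rollout_template anchor defined earlier in the same file.
rollout 10%:
  <<: *manual_rollout_template
  variables:
    ROLLOUT_PERCENTAGE: 10

rollout 50%:
  <<: *manual_rollout_template
  variables:
    ROLLOUT_PERCENTAGE: 50
```

Because each job only overrides ROLLOUT_PERCENTAGE, adding or removing an increment is a three-line change.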

The above defines how we use the rollout template for different percentages. The CI file has one such job for each of 10%, 25%, and 50%, plus a ‘rollout 100%’ job that makes use of the production_template:

.production: &production_template
  extends: .auto-deploy
  stage: production
  script:
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy use_kube_context
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy
    - auto-deploy delete canary
    - auto-deploy persist_environment_url
  environment:
    name: production
    url: http://$CI_PROJECT_PATH_SLUG.$KUBE_INGRESS_BASE_DOMAIN
  artifacts:
    paths: [environment_url.txt, tiller.log]
    when: always

From the above, we see some of the same checks as before. However, this template deploys the new version in full and then deletes the canary release we deployed, finally transitioning our application to a full 100% deployment within the ‘production’ environment. Thus, the canary is no longer needed.
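To tie this back to the rollout jobs, the final ‘rollout 100%’ step could be wired to this template roughly as follows. This is a sketch under assumptions: the rules are copied from the manual rollout template shown earlier, and `allow_failure: false` is a guess at how the last step is made blocking. It relies on the production_template anchor defined in the same file.

```yaml
# Sketch only: final manual step that performs the full deploy
# and removes the canary via the production template.
rollout 100%:
  <<: *production_template
  allow_failure: false   # assumption: the last step should block the pipeline
  rules:
    - if: '$CI_DEPLOY_FREEZE != null'
      when: never
    - if: '($CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == "") && ($KUBECONFIG == null || $KUBECONFIG == "")'
      when: never
    - if: '$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: never
    - if: '$INCREMENTAL_ROLLOUT_MODE == "manual" || $INCREMENTAL_ROLLOUT_ENABLED'
      when: manual
```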