BoxBoat Blog

Service updates, customer stories, and tips and tricks for effective DevOps


Getting Stuff Done With Argo Workflows

by Bryton Hall | Monday, Feb 10, 2020 | Kubernetes


What is Argo?

Argoproj (or more commonly Argo) is a collection of open source tools to help “get stuff done” in Kubernetes. This includes Argo Workflows, Argo CD, Argo Events, and Argo Rollouts.

What is Argo Workflows?

Argo Workflows is a Kubernetes-native workflow engine for complex job orchestration, including serial and parallel execution. Argo Workflows simplifies the process of leveraging Kubernetes to help deploy these workflows.

How Argo Works

Argo adds a new object to Kubernetes called a Workflow, which we can create and modify like any other Kubernetes object (such as a Pod or Deployment). A Workflow is, in fancy speak, a directed acyclic graph (DAG) of “steps”.

For example, if we want an arbitrary step D to depend on steps A and (B or C), we might create a Workflow that looks like the following:

[Figure: an example DAG]

With Argo, each “step” executes in a pod and can run in parallel with, or as a dependency of, any number of other steps.
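As a rough sketch, the A-and-(B-or-C) example above might be written as the following Workflow. Note that the `depends` field with boolean logic is only available in newer Argo versions; older releases use a `dependencies` list, which can only express AND relationships. The template and task names here are illustrative.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-example-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: A
        template: echo
        arguments:
          parameters: [{name: message, value: A}]
      - name: B
        template: echo
        arguments:
          parameters: [{name: message, value: B}]
      - name: C
        template: echo
        arguments:
          parameters: [{name: message, value: C}]
      - name: D
        depends: "A && (B || C)"   # newer Argo syntax; older: dependencies: [A, B]
        template: echo
        arguments:
          parameters: [{name: message, value: D}]
  # a reusable template: one pod that echoes its input parameter
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3
      command: [echo, "{{inputs.parameters.message}}"]
```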

Some of Argo's features include:

  • parametrization and conditional execution
  • passing artifacts between steps
  • timeouts and retry logic
  • recursion and flow control
  • suspend, resume, and cancellation
  • memoized resubmission
  • and much more!
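To give a flavor of the first two features, here is a minimal sketch of parametrization and conditional execution, assuming an illustrative parameter name (`should-run`) of our own choosing:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: conditional-
spec:
  entrypoint: main
  arguments:
    parameters:
    - name: should-run      # hypothetical workflow-level parameter
      value: "true"
  templates:
  - name: main
    steps:
    - - name: maybe-run
        template: whalesay
        # the step only executes when this expression evaluates to true
        when: "{{workflow.parameters.should-run}} == true"
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["conditionally executed"]
```

Parameters can be overridden at submission time, so the same spec can drive different runs.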

A Simple Workflow

With that said, let's take a brief look at just about the simplest possible Workflow object.

apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # new type of k8s spec
metadata:
  generateName: hello-world-    # name of the workflow spec
spec:
  entrypoint: whalesay          # invoke the whalesay template
  templates:
  - name: whalesay              # name of the template
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]

Here, we see that we have a new kind under the argoproj.io/v1alpha1 API. Looking at the inline comments, we'll notice that there is only one step in this workflow (defined under the spec.templates key). That step runs a single container (docker/whalesay) and executes cowsay "hello world" inside that container.

This workflow is largely analogous to running the following command locally:

$ docker container run docker/whalesay cowsay "hello world"

While this example isn't very exciting – fear not! Workflows can quickly get complicated to suit our needs. There are several examples of the features listed above in the official docs.

Why Argo?

Why do we want to use Argo? Why not use another tool like Airflow, or hack something up on our existing Jenkins cluster?

Because Kubernetes!

Argo doesn't reinvent what Kubernetes already provides. If we know how to attach a volume to a pod, we know how to attach a volume to a step in our workflow. The same applies to networking, environment variables, resource requests/limits, service accounts, node/pod (anti-)affinities, and everything else a pod can define.

This is possible because Workflows build on the same pod template mechanism that vanilla Kubernetes objects like Deployments and DaemonSets use.
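As an illustrative sketch (the volume name and node label below are made up), a step can declare resource requests, node selection, and volume mounts exactly as a plain pod would:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pod-fields-
spec:
  entrypoint: heavy-step
  volumes:
  - name: workdir            # hypothetical scratch volume
    emptyDir: {}
  templates:
  - name: heavy-step
    nodeSelector:            # ordinary pod-level scheduling field
      disktype: ssd          # hypothetical node label
    container:
      image: alpine:3
      command: [sh, -c, "echo processing > /work/out.txt"]
      resources:             # standard container resource requests
        requests:
          cpu: 500m
          memory: 512Mi
      volumeMounts:
      - name: workdir
        mountPath: /work
```

Everything under `container` and the pod-level fields is plain Kubernetes; Argo adds the orchestration around it.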

Using Argo to Analyze Climate Data

CIRA (Cooperative Institute for Research in the Atmosphere) is a research institute of Colorado State University. Their mandate is to link climate models with climate data.

[Image: CIRA logo]

BoxBoat worked with CIRA on two projects: CloudSat and GeoCarb (planned for launch in the 2020s). We helped design and build out a proof of concept to accelerate CIRA's adoption of Kubernetes, with Argo as their distributed data science workflow engine.

CIRA had previously utilized a rudimentary form of orchestration built around Docker Compose. Workload scaling was largely a manual process of creating long-running daemons which poll for new tasks. Creating and maintaining custom orchestration logic was becoming an increasingly large burden on their team.

CIRA’s new cloud native platform is an event-driven system of workflows that can scale from zero to tens of nodes on-prem and into the thousands of nodes across clouds all while reducing management overhead and manual intervention.

Running Argo locally

To quickly demo Argo (or most any tool on Kubernetes), we recommend using either

  • kind - for a full-fledged Kubernetes distribution or
  • k3d - for a light-weight k3s distribution

Once you have one of those installed (as well as docker and kubectl), setting up Argo can be done in just a few steps:

  1. First, create a Kubernetes cluster.

    $ kind create cluster
    
  2. Then create a namespace for our Argo deployment.

    $ kubectl create namespace argo
    
  3. Since Argo creates pods to execute the steps of a workflow, we must create a rolebinding that grants the Argo workflow controller permission to run pods in the default namespace.

    $ kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
    
  4. Since kind uses containerd, we need to configure the workflow controller to use the PNS (Process Namespace Sharing) executor:

    $ kubectl create configmap -n argo workflow-controller-configmap --from-literal=config="containerRuntimeExecutor: pns"
    
  5. Now we can deploy Argo!

    $ kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.4.3/manifests/install.yaml
    
  6. Run a test workflow.

    $ kubectl create -f https://raw.githubusercontent.com/argoproj/argo/master/examples/coinflip.yaml
    
  7. Expose the Argo UI on an available local port (we're using port 8080 here) and then open our browser to http://localhost:8080.

    $ kubectl port-forward -n argo svc/argo-ui 8080:80

[Screenshot: an example workflow in the Argo UI]

Argo also has a CLI to make interacting with the workflow engine a little nicer. We can now try most of the examples from the official docs on our local machine (though features like artifact storage will need a bit more configuration)!
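A few of the core CLI commands look like this (replace the placeholder with a name taken from `argo list`):

```
# Submit a workflow and watch it run to completion
$ argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/coinflip.yaml

# List workflows, then inspect and fetch logs for one of them
$ argo list
$ argo get <workflow-name>
$ argo logs <workflow-name>
```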

Once we're done, we can delete the cluster with

$ kind delete cluster

The Argoproj suite

Argo Workflows isn't the only tool in the Argoproj portfolio.

Argoproj - Get stuff done with Kubernetes

We encourage you to take a look at some of the other awesome work they've done in extending the Kubernetes API:

  • Argo Events - The Event-Based Dependency Manager for Kubernetes
  • Argo CD - Declarative Continuous Delivery for Kubernetes

At BoxBoat, we've helped teams build workflows to get things done on Kubernetes. We're a team of experts driven to help you build value with whatever tools fit into your ecosystem.

Contact us and we'll be glad to help you discover which tools work for you!