Getting Stuff Done With Argo Workflows
What is Argo?
Argoproj (or more commonly Argo) is a collection of open source tools to help “get stuff done” in Kubernetes. This includes Argo Workflows, Argo CD, Argo Events, and Argo Rollouts.
What is Argo Workflows?
Argo Workflows is a Kubernetes-native workflow engine for complex job orchestration, including serial and parallel execution. Argo Workflows simplifies the process of leveraging Kubernetes to help deploy these workflows.
How Argo Works
Argo adds a new object to Kubernetes called a Workflow, which we can create and modify like any other Kubernetes object. A Workflow is, in fancy speak, a directed acyclic graph of “steps”.
For example, we might want an arbitrary step D to depend on step A and on a group of other steps that includes C; a Workflow lets us declare that dependency graph directly.
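To make the idea of a dependency graph concrete, here is a minimal sketch of a Workflow that uses a DAG template. The exact shape of the graph is an assumption for illustration: it is a simple diamond in which B and C both wait for A, and D waits for both B and C. The `dag-sketch-` name prefix, the `echo` template, and the `alpine:3.7` image are likewise illustrative choices.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-sketch-        # hypothetical name prefix
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: A                    # no dependencies, so A runs first
        template: echo
        arguments:
          parameters: [{name: message, value: A}]
      - name: B
        dependencies: [A]          # B waits for A
        template: echo
        arguments:
          parameters: [{name: message, value: B}]
      - name: C
        dependencies: [A]          # C also waits for A, so B and C may run in parallel
        template: echo
        arguments:
          parameters: [{name: message, value: C}]
      - name: D
        dependencies: [B, C]       # D runs only after both B and C have finished
        template: echo
        arguments:
          parameters: [{name: message, value: D}]
  - name: echo                     # reusable step: prints the message it is given
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
```

Each task reuses the same `echo` template, so the graph structure lives entirely in the `dependencies` lists.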
With Argo, each “step” executes in a pod and can run in parallel with, or as a dependency of, any number of other steps.
Some of Argo's features include:
- parametrization and conditional execution
- passing artifacts between steps
- timeouts and retry logic
- recursion and flow control
- suspend, resume, and cancellation
- memoized resubmission
- and much more!
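As a rough sketch of how a few of these features appear in a spec, the Workflow below combines a parameter, retry logic, and a timeout on a single step. The name prefix, parameter value, and limits are assumptions chosen for illustration, not recommended settings.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: features-sketch-   # hypothetical name prefix
spec:
  entrypoint: print-message
  arguments:
    parameters:
    - name: message                # workflow-level parameter, passed to the entrypoint
      value: hello argo
  templates:
  - name: print-message
    inputs:
      parameters:
      - name: message
    retryStrategy:
      limit: 3                     # retry a failed step up to 3 times
    activeDeadlineSeconds: 60      # fail the step if it runs longer than 60 seconds
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
```

The default value of `message` can typically be overridden at submission time, for instance with the `-p` flag of the Argo CLI's `argo submit` command.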
A Simple Workflow
That said, let's take a brief look at possibly the simplest Workflow:
apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # new type of k8s spec
metadata:
  generateName: hello-world-    # name of the workflow spec
spec:
  entrypoint: whalesay          # invoke the whalesay template
  templates:
  - name: whalesay              # name of the template
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
Here, we see a new kind, Workflow, under the argoproj.io/v1alpha1 API version.
Looking at the inline comments, we'll notice that there is only one step in this workflow (defined under the templates key).
That step runs a single container (docker/whalesay) and executes cowsay "hello world" inside that container.
This workflow is largely analogous to running the following command locally:
$ docker container run docker/whalesay cowsay "hello world"
While this example isn't very exciting – fear not! Workflows can quickly get complicated to suit our needs. There are several examples of the features listed above in the official docs.
Why do we want to use Argo? Why not use another tool like Airflow, or hack something up on our existing Jenkins cluster?
Argo doesn't reinvent what Kubernetes already provides. If we know how to attach a volume to a pod, we know how to attach a volume to a step in our workflow. The same applies to networking, environment variables, resource requests/limits, service accounts, node/pod (anti-)affinities, and everything else a pod can define.
This is possible because Workflows rely on the same mechanism as vanilla Kubernetes Deployments or DaemonSets: a pod template.
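As a hedged illustration of that point, the sketch below attaches a volume, an environment variable, resource requests and limits, a service account, and a node selector to a step using the same field names a pod spec would use. The specific names and values (`scratch`, `GREETING`, the resource figures, the node label) are assumptions for the example.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pod-fields-sketch-   # hypothetical name prefix
spec:
  entrypoint: main
  serviceAccountName: default        # service account used by the workflow's pods
  volumes:
  - name: scratch                    # declared like a pod volume
    emptyDir: {}
  templates:
  - name: main
    nodeSelector:
      kubernetes.io/os: linux        # schedule the step's pod onto Linux nodes
    container:
      image: alpine:3.7
      command: [sh, -c]
      args: ["echo $GREETING > /scratch/out.txt && cat /scratch/out.txt"]
      env:
      - name: GREETING               # plain environment variable, as in a pod
        value: hello
      resources:
        requests:
          cpu: 100m
          memory: 64Mi
        limits:
          memory: 128Mi
      volumeMounts:
      - name: scratch                # mounted like any pod volume
        mountPath: /scratch
```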
Using Argo to Analyze Climate Data
CIRA (Cooperative Institute for Research in the Atmosphere) is a research institute of Colorado State University. Their mandate is to link climate models with climate data.
BoxBoat worked with CIRA on two projects: CloudSat and GeoCarb (planned for launch in the 2020s). At CIRA, we helped design and build out a proof of concept to accelerate their adoption of Kubernetes with Argo as their distributed data science workflow engine.
CIRA had previously utilized a rudimentary form of orchestration built around Docker Compose. Workload scaling was largely a manual process of creating long-running daemons which poll for new tasks. Creating and maintaining custom orchestration logic was becoming an increasingly large burden on their team.
CIRA’s new cloud-native platform is an event-driven system of workflows that can scale from zero to tens of nodes on-prem and into the thousands of nodes across clouds, all while reducing management overhead and manual intervention.
Running Argo locally
To quickly demo Argo (or almost any tool on Kubernetes), we recommend using a local Kubernetes cluster tool; the steps below use kind.
Once you have it installed (as well as kubectl), setting up Argo can be done in just a few steps:
First, create a Kubernetes cluster.
$ kind create cluster
Then create a namespace for our Argo deployment.
$ kubectl create namespace argo
Since Argo creates pods to execute the steps of a workflow, we must create a rolebinding that gives the default service account in the default namespace permission to run pods there.
$ kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
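For readers who prefer declarative manifests, the command above should be roughly equivalent to applying the following RoleBinding. This is a sketch that assumes the current kubectl context points at the default namespace, which is where the command would create the binding.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-admin
  namespace: default               # assumption: the binding lives in the default namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin                      # built-in admin ClusterRole, scoped here to one namespace
subjects:
- kind: ServiceAccount
  name: default                    # the default service account...
  namespace: default               # ...in the default namespace
```

Binding the admin role is convenient for a local demo; a shared cluster would usually call for a narrower role.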
Since kind uses containerd, we need to configure the workflow controller to use the PNS (Process Namespace Sharing) executor:
$ kubectl create configmap -n argo workflow-controller-configmap --from-literal=config="containerRuntimeExecutor: pns"
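The same configuration can be sketched as a manifest. The ConfigMap below should be functionally equivalent to the one the command creates: a single `config` key whose value is a small YAML document read by the workflow controller.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  # The value is itself YAML; the block scalar keeps it readable.
  config: |
    containerRuntimeExecutor: pns
```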
Now we can deploy Argo!
$ kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.4.3/manifests/install.yaml
Run a test workflow.
$ kubectl create -f https://raw.githubusercontent.com/argoproj/argo/master/examples/coinflip.yaml
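To give a sense of what a coin flip workflow involves, here is a simplified sketch of conditional execution. It is not a verbatim copy of the linked file, which may differ; the images and messages are assumptions for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: coinflip-sketch-   # hypothetical name prefix
spec:
  entrypoint: coinflip
  templates:
  - name: coinflip
    steps:
    - - name: flip-coin            # first stage: flip the coin
        template: flip-coin
    - - name: heads                # second stage: run only the matching branch
        template: heads
        when: "{{steps.flip-coin.outputs.result}} == heads"
      - name: tails
        template: tails
        when: "{{steps.flip-coin.outputs.result}} == tails"
  - name: flip-coin
    script:                        # the script's stdout becomes outputs.result
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        print("heads" if random.randint(0, 1) == 0 else "tails")
  - name: heads
    container:
      image: alpine:3.6
      command: [sh, -c]
      args: ["echo \"it was heads\""]
  - name: tails
    container:
      image: alpine:3.6
      command: [sh, -c]
      args: ["echo \"it was tails\""]
```

Only one of the `heads` and `tails` steps runs on each submission; the other is skipped because its `when` expression evaluates to false.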
Expose the Argo UI on an available local port (we're using port 8080 here) and then open our browser to http://localhost:8080.
$ kubectl port-forward -n argo svc/argo-ui 8080:80
Argo also has a CLI that makes interacting with the workflow engine a little nicer. We can now try most of the examples from the official docs on our local machine! A few, such as those that use artifact storage, need a bit more configuration.
Once we're done, we can delete the cluster with:
$ kind delete cluster
The Argoproj suite
Workflows isn't the only tool in the Argoproj portfolio.
We encourage you to take a look at some of the other awesome work they've done in extending the Kubernetes API:
- Argo Events - The Event-Based Dependency Manager for Kubernetes
- Argo CD - Declarative Continuous Delivery for Kubernetes
At BoxBoat, we've helped teams build workflows to get things done on Kubernetes. We're a team of experts driven to help you build value with whatever tools fit into your ecosystem.
Contact us and we'll be glad to help you discover which tools work for you!