The Grype Admission Controller
Today I want to write about the grype admission controller. I wrote it. I am proud of it. I think it solves a really uncomfortable problem in DevSecOps.
Security has a big problem. On one hand, security teams are responsible for making everything secure. That's their job. On the other hand, they need to do that job without being directly involved in producing the code. That creates a very sticky wicket. Security ends up governing through rigid policies, and the security team usually gets blamed for being inflexible and draconian.
How can we empower the security team to have control, without causing them to become those blockers we all know and (don't) love? DevSecOps, of course!
In DevOps, we wouldn't traditionally say “well, we do it that way because of corporate policy”. That's a bad answer, and it keeps people from innovating and doing their best work. We want people to innovate, and we want the corporate stuff to get out of their way so they can do awesome jobs and shine. The mechanism of communication and record in serious DevOps shops is the merge request, so we need to bring the security team to the table in a dynamic, yet secure way.
We're going to explore software designed to help keep us secure, the social factors that make this workflow successful, and learn quite a bit about Kubernetes along the way.
The backbone of our product is grype. Grype is cool - it provides a way to scan containers, and it will tell you when a container is vulnerable. The problem with this approach is that it does not quite go far enough. We don't want to scan the container only when we build it, any more than we would just install Windows and declare our work done. Security is ongoing, and the environment keeps changing. For our project, we want to run grype whenever we spin up a container. Scanning in our CI/CD pipeline keeps us secure at build time, and scanning again at startup keeps us secure going forward. If there is a new CVE published tomorrow, we don't want to run the same containers we did yesterday. We need to throw a flag.
Kubernetes is great. Out of the box it does a lot of stuff. It also gives us a great mechanism to make it do more stuff! The Kubernetes mechanism which allows us to extend it this way is an admission controller. We have made a new admission controller, which runs grype each time someone wants to run a container. If the container has vulnerabilities greater than “medium” (by default), then grype will signal to Kubernetes that grype is not OK running this container. The container will not be permitted to run in the cluster.
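To make the mechanism concrete, here is a minimal Python sketch of the decision a validating webhook like this one has to make. It is not the project's actual code. The function names `blocking_matches` and `review_response` are hypothetical, and the sketch assumes grype's JSON report, where each entry in `matches` carries `vulnerability.id` and `vulnerability.severity`. The reply follows the Kubernetes `admission.k8s.io/v1` AdmissionReview shape.

```python
# Severity names as grype reports them, ordered from least to most severe.
SEVERITY_ORDER = ["Negligible", "Low", "Medium", "High", "Critical"]


def blocking_matches(grype_report, threshold="Medium"):
    """Return IDs of vulnerabilities rated strictly above the threshold."""
    limit = SEVERITY_ORDER.index(threshold)
    blocked = []
    for match in grype_report.get("matches", []):
        vuln = match["vulnerability"]
        severity = vuln.get("severity", "Unknown")
        # Severities outside the list (e.g. "Unknown") are not ranked here;
        # a real policy has to decide how to treat them.
        if severity in SEVERITY_ORDER and SEVERITY_ORDER.index(severity) > limit:
            blocked.append(vuln["id"])
    return blocked


def review_response(uid, blocked):
    """Build the AdmissionReview reply that Kubernetes expects from a webhook."""
    response = {"uid": uid, "allowed": not blocked}
    if blocked:
        # This message is what kubectl shows to the person deploying.
        response["status"] = {
            "code": 403,
            "message": "grype found vulnerabilities above the threshold: "
            + ", ".join(blocked),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# A hand-written stand-in for a grype report (assumption, not real scan output).
report = {
    "matches": [
        {"vulnerability": {"id": "CVE-2008-4318", "severity": "High"}},
        {"vulnerability": {"id": "CVE-2017-41432", "severity": "Low"}},
    ]
}
reply = review_response("example-uid", blocking_matches(report))
print(reply["response"]["allowed"])  # False
```

The `status.message` field is the part that matters to the person deploying: whatever the webhook puts there is what they read when the pod is rejected.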
Clone the repo:
git clone https://github.com/boxboat/grypeadmissioncontroller.git
cd grypeadmissioncontroller
Now run the generation script:
You should see the following output:
Generating RSA private key, 2048 bit long modulus
................+++
..................+++
e is 65537 (0x10001)
Generating RSA private key, 2048 bit long modulus
............................................................................+++
............+++
e is 65537 (0x10001)
Signature ok
subject=/CN=grypy.default.svc
Getting CA Private Key
To install the grype validating webhook run:
kubectl apply -f manifest.yaml
After the grypy pod is running, set the webhook to Fail rather than Ignore:
kubectl patch validatingwebhookconfigurations grypy --patch-file webhookpatch.yaml
Nice instructions! You have a valid kubeconfig and you're ready to run, right?
Now install the admission controller:
kubectl apply -f manifest.yaml
You're just about there. Now we have to enable enforcement of the results!
kubectl patch validatingwebhookconfigurations grypy --patch-file webhookpatch.yaml
Now test your installation by applying app_ok.yaml, which will succeed, and app_wrong.yaml, which will fail. Check grype's pod logs for the grype output, or your cluster's logging solution, which should have also captured it.
When you are done, clean up:
kubectl delete -f manifest.yaml -f app_ok.yaml -f app_wrong.yaml
That's all there is to it!
If you performed the quickstart, you might have noticed something important: kubectl gave you a visible error to tell you why it wasn't going to run your pod. I think this is a very important piece of the puzzle in the “good neighbors” philosophy. Suddenly we've gone from “why didn't that work” to “I am being told exactly why this didn't work”. Kubernetes is hard. We don't want to make it harder. Advertising why something didn't happen the way we expected is an important part of not being that blocker.
This is where the magic happens. Let's say you're a DevOps DevOperator and the cool app you spent a lot of time hacking on has been found to have some problems. That's OK. Hopefully you're using grype in your CI/CD system and your build pipeline alerted you to the problem before you deployed it. You worked with your security team to evaluate your options and they gave you the green light to press onward! How do we get our admission controller to allow our pods to run?
Hopefully you noticed the _manifest_.yaml in the repo directory. This is the template from which we generated manifest.yaml. It creates a bunch of things grype depends on to operate as an admission controller, and among them is a configMap. That configMap contains the config.yaml for grype. It can be viewed here, but you're going to have your own stored in your own repository.
At the top of that YAML document are comments, and the comments explain how to configure grype. In this case, we need to read up on how to specify matches to ignore. Looking at the grype example, we see how we need to change our configMap. Make your configMap look like this:
# https://github.com/anchore/grype#configuration
# Fill out .grype/config.yaml with some nice-to-haves
# Ignore rules for whitelisting: https://github.com/anchore/grype#specifying-matches-to-ignore
apiVersion: v1
kind: ConfigMap
metadata:
  name: grypeconfig
data:
  config.yaml: |
    check-for-app-update: true
    output: "table"
    quiet: false
    db:
      auto-update: true
      cache-dir: "/tmp"
    ignore:
      # This is the full set of supported rule fields:
      - vulnerability: CVE-2008-4318
        fix-state: unknown
        package:
          name: libcurl
          version: 1.5.1
          type: npm
          location: "/usr/local/lib/node_modules/**"
      # We can make rules to match just by vulnerability ID:
      - vulnerability: CVE-2017-41432
      # ...or just by a single package field:
      - package:
          type: gem
Slam dunk! Now open a merge request against your security repository. This is where the security team gets to fill the role of DevSecOps.
Did the whitelist only apply to the problem? Is the whitelist too broad? Does security approve of these changes? All of these discussions and events generate an audit trail in your CI/CD system and will make passing an audit a breeze later. Additionally, this helps security transition from “policy” to “partner” in the development process.
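Part of the "is the whitelist too broad?" question can be checked mechanically before a human reviews the merge request. The sketch below is one possible approach, not something the project ships. The function name `broad_rules` is hypothetical, and it works on the `ignore` list after it has already been parsed into Python dictionaries (parsing the YAML itself would need a third-party library and is left out). It flags any rule that does not name a specific vulnerability, since such a rule ignores every finding that matches its package fields.

```python
def broad_rules(ignore_rules):
    """Return the ignore rules that do not pin a specific vulnerability ID."""
    return [rule for rule in ignore_rules if not rule.get("vulnerability")]


# The three rules from the example configMap, written out as parsed data.
ignore = [
    {
        "vulnerability": "CVE-2008-4318",
        "fix-state": "unknown",
        "package": {
            "name": "libcurl",
            "version": "1.5.1",
            "type": "npm",
            "location": "/usr/local/lib/node_modules/**",
        },
    },
    {"vulnerability": "CVE-2017-41432"},
    {"package": {"type": "gem"}},
]

for rule in broad_rules(ignore):
    print("needs a closer look:", rule)
```

Here only the last rule is flagged, because it ignores findings for every gem package regardless of CVE. A check like this does not replace the review; it just points the reviewer at the rules most worth discussing.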
If everything passes inspection, you can apply that configMap to your cluster, and remember to delete any running grype pods so they pick up the new configuration. If you simply delete the admission controller, you'll turn off the security for a moment and that is not something we want to do. Bear in mind that this whitelist will apply to all the pods in your cluster.
If you want to whitelist entire namespaces, that is also supported, by changing the ValidatingWebhookConfiguration. This may seem heavy-handed, but it is useful for clusters where namespaces are used for builds, ETL jobs, or similar workflows.
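One way to express a namespace exemption is a `namespaceSelector` on the webhook entry. The Python sketch below builds a JSON patch that adds one. It rests on several assumptions: the webhook to change is the first entry in `webhooks`, the cluster labels each namespace with `kubernetes.io/metadata.name` (recent Kubernetes versions set this automatically), and the namespace names `builds` and `etl` are placeholders. The function name `exempt_namespaces_patch` is hypothetical.

```python
import json


def exempt_namespaces_patch(namespaces, webhook_index=0):
    """Build a JSON patch that makes the webhook skip the given namespaces."""
    selector = {
        "matchExpressions": [
            {
                # Only namespaces whose name is NOT in the list are scanned.
                "key": "kubernetes.io/metadata.name",
                "operator": "NotIn",
                "values": sorted(namespaces),
            }
        ]
    }
    return [
        {
            "op": "add",
            "path": f"/webhooks/{webhook_index}/namespaceSelector",
            "value": selector,
        }
    ]


patch = exempt_namespaces_patch(["builds", "etl"])
print(json.dumps(patch))
```

The printed JSON could then be handed to `kubectl patch validatingwebhookconfigurations grypy --type=json -p '<output>'`. As with the ignore rules, a change like this belongs in a merge request so the security team can weigh in on which namespaces are exempt.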
Hopefully you enjoyed learning how admission controllers can help you secure your clusters and how we can use technology to help us interact and become better partners. If you have problems running the admission controller or getting your applications built in the best way, don't forget you can always contact us and we will see how we can help.