Getting started with Logging and Kubernetes (Part 2)
In our previous post on logging, we took a look at enhancing Kubernetes’ native functionality by adding log aggregation. Along the way (and perhaps more importantly) we discussed how to wade through the overwhelming number of options available and zero in on a single solution.
While getting the cluster logs aggregated to one spot is a great first step, in this post we discuss taking it to the next level. Agile development shops need to be able to react quickly. To do so, they must be able to curate and interrogate data about their applications quickly and easily.
The de facto solution for this (on-prem anyway) is typically the combination of Elasticsearch fronted by a dashboard. Elasticsearch provides a scalable, battle-tested solution for indexing data (even non-log data), and a dashboard provides a user-friendly interface for querying.
In this blog post, we'll walk through the different ways to implement this type of monitoring stack, and we'll actually go through a deployment with you step-by-step.
Related: Need your own Kubernetes cluster for this walkthrough? Try Getting Started with Rancher
ELK, EFK, What Does it all Mean?
As before, we have to start by making some decisions.
If you've researched container logging solutions, you've probably come across references to the ELK stack, and more recently the EFK stack. Luckily, these are the two primary options to choose from:
ELK: Elasticsearch, Logstash, Kibana
EFK: Elasticsearch, Fluentd, Kibana
As mentioned, Elasticsearch is where logs are stored and indexed. Kibana provides a dashboard for querying and visualizations. The variable part, Logstash/Fluentd, provides aggregation and forwarding.
Seems simple enough. However, there's a small catch. There is actually another letter hiding out in those acronyms, and really they could be written:
ELfK: Elasticsearch, Logstash, Filebeat, Kibana
EFfK: Elasticsearch, Fluentd, Fluent Bit, Kibana
(Yup, that's right, the f is different for each stack 😭)
And just to complicate things further, sometimes when bloggers reference EFK, they're actually referring to Elasticsearch, Fluent Bit/Filebeat, and Kibana, leaving Logstash/Fluentd out altogether.
Why is this ELK/EFK Stuff so Complicated?
Why so many options? And how is it that we can leave out the aggregation layer?
To answer, let's have a quick (and riveting!) history recap on logging:
Logstash was written in JRuby to parse and forward logs to Elasticsearch. It's flexible and performant, but since it runs on the JVM, it's a bit of a memory hog.
Later, Fluentd was released. It's also written in Ruby, and it also parses and forwards logs to Elasticsearch, but since it runs on MRI Ruby rather than the JVM, it consumes less memory than Logstash.
As time goes on and cluster sizes increase, a realization grows: memory consumption across the cluster can be reduced by replacing the heavyweight aggregators on every node with lightweight processes that forward logs to just a few central aggregators.
To address this, Elastic releases Filebeat; the Fluentd project releases Fluent Bit.
As time goes on - and this bit is crucial - more and more features are added to Filebeat and Fluent Bit. In fact, they reach the point that for many organizations, the lightweight forwarders' native parsing abilities are sufficient. So they get updated to forward to Elasticsearch directly - bypassing the aggregation layer altogether.
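To make that concrete, here's a minimal sketch of what a direct-to-Elasticsearch Fluent Bit configuration looks like. The hostname and paths are illustrative assumptions, not values from any particular deployment:

```ini
# fluent-bit.conf (illustrative sketch)
[INPUT]
    # Tail container logs from the node's filesystem
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*

[FILTER]
    # Enrich records with pod/namespace metadata from the Kubernetes API
    Name              kubernetes
    Match             kube.*

[OUTPUT]
    # Ship parsed records straight to Elasticsearch - no aggregator in between
    Name              es
    Match             *
    Host              elasticsearch-master
    Port              9200
    Logstash_Format   On
```

Note there's no Fluentd (or Logstash) anywhere in that pipeline: the forwarder tails, parses, enriches, and ships on its own.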
Which Kubernetes Logging Stack is Right for You?
The good news is, you can't really make a wrong choice - their functionality is too similar.
If you have older, or more esoteric apps deployed, it may be worth checking the plugin list for each stack (ELK vs EFK). One may have better support for your applications than the other.
That said, for many organizations running modern workloads, an EFK stack is sufficient. And for the rest of this post, we're actually going to go with the Elasticsearch, Fluent Bit, Kibana stack.
How to Deploy the EFK Stack to Kubernetes
Decision made. Now for the easy part!
How to Deploy Elasticsearch
We'll first deploy Elasticsearch, since the other two components depend on it. We'll use Helm to make things super simple.
Here is the Elasticsearch Helm Chart. We'll use it to deploy Elasticsearch to our Kubernetes cluster.
$ helm repo add elastic https://helm.elastic.co
$ helm repo update
$ helm install elastic/elasticsearch \
    --name elasticsearch \
    --namespace logging \
    --set replicas=1 \
    --set minimumMasterNodes=1 \
    --set volumeClaimTemplate.accessModes[0]=ReadWriteMany \
    --set volumeClaimTemplate.resources.requests.storage=100Gi
And to just confirm we've successfully deployed:
$ kubectl get pods -n logging
NAME                     READY   STATUS    RESTARTS   AGE
elasticsearch-master-0   1/1     Running   0          10s
How to Deploy Fluent Bit
$ helm install stable/fluent-bit \
    --name fluent-bit \
    --namespace logging \
    --set backend.type=es \
    --set backend.es.host=elasticsearch-master
And to just confirm we've successfully deployed:
$ kubectl get pods -n logging -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
fluent-bit-7sbpf   1/1     Running   0          20s   10.244.3.187   node-1   <none>           <none>
fluent-bit-7z8zj   1/1     Running   0          20s   10.244.2.138   node-2   <none>           <none>
fluent-bit-bqq65   1/1     Running   0          20s   10.244.0.233   node-3   <none>           <none>
How to Deploy Kibana
$ helm install elastic/kibana \
    --name kibana \
    --namespace logging \
    --set ingress.enabled=true \
    --set ingress.hosts[0]=kibana.pv \
    --set service.externalPort=80
And just to confirm we've successfully deployed:
$ kubectl get pods -n logging
NAME                     READY   STATUS    RESTARTS   AGE
kibana-c597fd4d5-mwsl9   1/1     Running   0          5s
And that's it. We can fire up our favorite web browser and go to the ingress URL we used when deploying Kibana above to check out our logs.
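Under the hood, Kibana is just issuing search requests against Elasticsearch's REST API, so you can also query your logs from a script. Here's a sketch of the kind of request body involved; the index pattern and field names are assumptions based on Fluent Bit's default Logstash-style indices and Kubernetes metadata enrichment, and the pod name is hypothetical:

```python
import json

# Fetch the 50 newest log lines from one pod over the last 15 minutes.
# Field names assume Fluent Bit's kubernetes filter metadata
# ("kubernetes.pod_name") and a "@timestamp" field per record.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"kubernetes.pod_name": "my-app-0"}},      # hypothetical pod
                {"range": {"@timestamp": {"gte": "now-15m"}}},
            ]
        }
    },
    "sort": [{"@timestamp": {"order": "desc"}}],
    "size": 50,
}

print(json.dumps(query, indent=2))
```

You'd POST that body to something like `http://elasticsearch-master:9200/logstash-*/_search` (the `logstash-*` pattern matches Fluent Bit's default `Logstash_Format` index names).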
With that you've got your logs not only centrally located, but curated and searchable.
Terms and Conditions May Apply
Now for some disclaimers.
The above installs represent the simplest possible setup. While fine for trying things out, a production deployment would be more complex.
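For a taste of the difference, a production Elasticsearch install would typically move from `--set` flags to a values file pinning multiple replicas, explicit resources, JVM heap sizing, and anti-affinity. The specific numbers and storage class below are illustrative assumptions, not recommendations:

```yaml
# values-prod.yaml (illustrative sketch, for the elastic/elasticsearch chart)
replicas: 3                      # quorum-capable cluster instead of a single node
minimumMasterNodes: 2
resources:
  requests:
    cpu: "1"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 4Gi
esJavaOpts: "-Xms2g -Xmx2g"      # JVM heap sized to roughly half the container memory
antiAffinity: "hard"             # never co-locate two Elasticsearch pods on one node
volumeClaimTemplate:
  storageClassName: fast-ssd     # hypothetical storage class name
  resources:
    requests:
      storage: 500Gi
```

You'd then install with `helm install elastic/elasticsearch --name elasticsearch --namespace logging -f values-prod.yaml`, keeping the configuration in version control rather than scattered across flags.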
If you follow the links to the individual Helm charts, you'll also discover there are a ton of options available to configure. This is actually great in some respects. Helm charts provide a consistent, opinionated framework that makes discovering and keeping track of all the config options manageable. However, for organizations that are new to Kubernetes, it can be daunting.
In the course of composing this post, I actually learned something new myself. What I thought was the official home of the Helm charts for Elasticsearch and Kibana has actually been deprecated. Make no mistake - the projects themselves are still active. But the Helm charts have a new, official (yet oddly still beta?) home under the Elastic GitHub repo.
Keeping Track of It All
Hopefully these posts have helped illustrate the decision-making steps in choosing and deploying an open source logging solution. If it seems a bit overwhelming, quite frankly, that's because it is - at first. There's definitely a learning curve to overcome when starting out with Kubernetes - but the payoff in the long run makes it enormously worthwhile.
And of course, if you need help getting over that initial curve, we're happy to help.