Getting started with Logging and Kubernetes
We're all pretty well aware of the need for logging in our environments. Logs help us understand what is going on in our cluster today, and provide a record of events if we need it down the road.
But as is often the case with Kubernetes, there are a lot of competing options available. In this post, we'll look at how to pare those options down and discuss some important features to look for along the way.
Wait… Doesn't Kubernetes already handle logging?
It does! In fact, the essential kubectl command provides us with a simple, easy-to-use interface for accessing application logs.
We don't have to log into some host VM. We don't have to recall some app-specific log location.
Just to demonstrate: retrieving the logs of some hypothetical web application takes nothing more than a single kubectl logs command.
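A minimal sketch of such a call, wrapped in a small shell function for reuse. The helper name, the default namespace, and the tail length are hypothetical placeholders, not values from this post:

```shell
# Print the most recent lines from a single pod's logs.
# Hypothetical helper: the default namespace and line count are illustrative.
pod_logs() {
  local pod="$1"
  local namespace="${2:-default}"
  local lines="${3:-100}"
  # kubectl finds the pod for us; we never need to know which node it runs on
  # or where the container runtime keeps the underlying log file.
  kubectl logs "$pod" --namespace "$namespace" --tail "$lines"
}
```

For example, pod_logs web-app-5c7d8f9b6-x2k4q (a made-up pod name) prints the last 100 lines from that pod in the default namespace.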
As long as we have the proper credentials, we can see our application's logs without having to figure out where it's running in the cluster.
For a single developer working on a single feature, this method can work great. But things get a little more complicated when we move that app to production. As we move to higher environments, that single app instance becomes a small part of a more complex system.
Not only do we have more components, but each component will be scaled to run in multiple instances for performance and resiliency. If we need to troubleshoot a problem and diagnose which layer the error is occurring at, we're no longer running one kubectl command but many: one for every instance of every component.
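Here's a sketch of what that looks like in practice, assuming a hypothetical three-tier app (say, two frontend replicas, two API replicas, and one worker). The helper name, namespace, and pod names are all illustrative:

```shell
# Fetch logs from every replica of every component, one kubectl call per pod.
# Hypothetical helper: pass the namespace first, then the pod names.
logs_for_pods() {
  local namespace="$1"
  shift
  local pod
  for pod in "$@"; do
    # Label each section so we can tell which instance emitted what.
    echo "===== $pod ====="
    # --timestamps helps line events up across pods, since each pod's
    # output is printed one after another rather than interleaved.
    kubectl logs "$pod" --namespace "$namespace" --timestamps
  done
}
```

Usage would be something like logs_for_pods prod frontend-7f9c-abcde frontend-7f9c-fghij api-5d8b-klmno api-5d8b-pqrst worker-6c4d-uvwxy. kubectl logs also accepts a label selector (-l), which trims the pod list, but we're still issuing one command per component, stitching separate streams together by hand, and chasing pod names that change whenever a pod is rescheduled.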
Wouldn't it be great if we could access all these logs from a single source?
It's not unreasonable to think the de facto tool for managing containers at scale would provide functionality to handle logging at scale as well. But Kubernetes is designed with flexibility in mind, and this extends to logging. To allow for any number of solutions, Kubernetes typically relies on whatever logging functionality is provided by the container runtime (e.g., Docker) to handle container logs.
It's left to us to decide which logging solution to layer over this to best meet our needs.
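On a node running Docker with its default json-file log driver, we can see this hand-off for ourselves. The sketch below assumes shell access to the node and the docker CLI; the helper name is hypothetical. It asks the runtime which log driver it uses and where it writes a given container's log file:

```shell
# Show which logging driver the Docker daemon uses, and where it writes
# the log file for one container. Hypothetical helper; run it on a node.
runtime_log_info() {
  local container="$1"
  # Daemon-wide default driver (json-file unless someone changed it).
  echo "driver: $(docker info --format '{{.LoggingDriver}}')"
  # Per-container log file path as reported by the runtime.
  echo "path: $(docker inspect --format '{{.LogPath}}' "$container")"
}
```

The kubelet also symlinks these files under /var/log/containers on each node, which is where node-level agents typically pick them up.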
Obviously we're not the first to face this dilemma, and it turns out there is an answer to this problem.
Actually, it turns out, there are a LOT of answers to this problem.
This is just the tip of the logging iceberg!
Great. So how do we choose?
So there's basically an explosion of options available, to the point that it's overwhelming. When trying to digest all the choices, it's important to remember that there's no one correct solution, and there's a lot of overlap in the features the different providers offer. For the sake of this discussion, we're going to knock out a number of contenders by ruling out SaaS/cloud offerings. That's not to say they can't be a good choice for getting up and running, but they do come at a cost, both financially and in the form of vendor lock-in.
So let's remove those and focus on on-prem offerings.
That's a little better, but we're still left with a lot to choose from.
To keep things simple, let's focus on our original problem: getting logs from our containers to a central repository. We're not going to worry about sophisticated searching or analytics (we'll leave that for a later post).
Now we're getting down to the basics. On the left we have our forwarding agents; on the right, our aggregators. In a typical cluster setup, you'll have one forwarding agent per node. Each agent is responsible for collecting logs for all the containers deployed to that node. These logs are forwarded to a centralized aggregator. Pretty straightforward.
But even in this simplified architecture, we still have a few options to choose from. You may have noticed that two of those options, Logstash and Fluentd, are included on both sides of our architecture. Originally, these were the two primary choices for Kubernetes log forwarding, and you may come across dated guides that still use them this way. However, both are a bit “heavyweight” (i.e., they consume a substantial amount of memory and compute resources), since they are designed not only to forward logs but to process them as well. A better approach is to just forward from your nodes and leave any processing to the aggregation point. To this end, Elastic, the company behind Logstash, developed a lightweight forwarder, Filebeat, while the Fluentd project created the forwarder Fluent Bit. So we'll remove Logstash and Fluentd from the agent half of the equation.
Syslog might seem like a good option; it is one of the most battle-tested log agents out there, and it may already be in use within your organization. However, it was never designed with containerized environments in mind. Syslog typically monitors static file paths, as opposed to the ephemeral log paths that appear and disappear as containers come and go. It is also not capable of processing or forwarding container metadata, which can be quite useful. So let's go ahead and rule it out as well.
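To see what a container-aware agent deals with, consider the file names the kubelet creates under /var/log/containers, which follow the pattern <pod>_<namespace>_<container>-<container-id>.log. The sketch below is a hypothetical helper, not how any particular agent implements it; it pulls that metadata back out of a path. Agents such as Fluent Bit go further and enrich each record with labels and other details fetched from the Kubernetes API.

```shell
# Split a kubelet container log path into its metadata parts.
# Expected file name: <pod>_<namespace>_<container>-<container-id>.log
# (pod and namespace names cannot contain underscores, so "_" is a safe separator).
parse_container_log_path() {
  local file="${1##*/}"            # drop the directory
  file="${file%.log}"              # drop the extension
  local pod="${file%%_*}"          # everything before the first "_"
  local rest="${file#*_}"
  local namespace="${rest%%_*}"    # between the first and second "_"
  rest="${rest#*_}"
  local container="${rest%-*}"     # container names may contain "-", so cut at the last one
  local container_id="${rest##*-}"
  printf 'pod=%s namespace=%s container=%s id=%s\n' \
    "$pod" "$namespace" "$container" "$container_id"
}
```

Because every new container gets a new ID, and therefore a new file name, the set of paths changes each time a pod restarts or is rescheduled. That's exactly what a static list of syslog file paths can't keep up with.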
With the agent side narrowed down, that leaves the aggregator: basically a choice between Logstash and Fluentd, and really you can't go wrong with either one. Both have loads of plugins and great community support, and both are built on Ruby. Since Logstash runs under JRuby (i.e., on the JVM) and generally consumes a bit more memory, let's go with Fluentd for our hypothetical cluster.
Wait! What happened to Docker?
So far, every diagram has not only included Docker in the picture, but clearly shown that it is capable of logging to a variety of targets.
So why not just configure Docker to forward logs to the appropriate destination?
One less thing to install should mean reduced complexity, and the variety of targets means flexibility. Seems like a no-brainer.
However, it's not quite so simple. Changing the Docker log driver from the default has a major drawback if you're not paying for an enterprise license: you will no longer be able to access your logs locally (via kubectl logs). If something happens to our aggregation point, we're basically left with no logs, and a really hard time trying to troubleshoot.
Local log access is a powerful enough feature that losing it is a deal-breaker.
Whew. Fluent Bit and Fluentd it is, then.
Who would have thought there'd be so much to logging?
But now we're ready for the easy part. We just need to deploy instances of Fluent Bit to collect and forward our container logs, and an instance of Fluentd to receive them.
We'll start with Fluentd, which will run as a Kubernetes Deployment. Normally it's configured to forward to Elasticsearch, but to keep things simple for this post, we'll just override the default configuration and have it save logs to a local file.
# Custom configuration for Fluentd
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd
  namespace: logging
data:
  fluent.conf: |-
    <source>
      @type forward
      port 24284
    </source>
    <filter **>
      @type stdout
    </filter>
    <match **>
      @type file
      path /fluentd/log
      symlink_path /fluentd/log/cluster.log
      append true
      format json
    </match>
---
# Fluentd Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluentd
  namespace: logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluentd:latest
        volumeMounts:
        - name: config-volume
          mountPath: /fluentd/etc
      volumes:
      - name: config-volume
        configMap:
          name: fluentd
---
# Fluentd Kubernetes Service (port must match the forward <source> port above)
apiVersion: v1
kind: Service
metadata:
  name: fluentd
  namespace: logging
spec:
  ports:
  - port: 24284
    protocol: TCP
  selector:
    app: fluentd
  type: ClusterIP
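Above is the YAML that we'll use to configure and run Fluentd. We save it as fluentd-deployment.yaml and deploy it with the following, creating the logging namespace first, since kubectl apply won't create it for us:
kubectl create namespace logging
kubectl apply -f fluentd-deployment.yaml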
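Since it doesn't need any customizations, deploying Fluent Bit is even easier. We can use Helm (a package manager for Kubernetes) with the following command, written in Helm 2 syntax (Helm 3 dropped the --name flag and takes the release name as a positional argument instead):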
helm install stable/fluent-bit --name fluent-bit --namespace logging
Checking our pod status shows we've got a single instance of Fluentd. Since one instance can handle ~5000 messages/second, this is fine for our needs. Fluent Bit is deployed as a DaemonSet, so we automatically get one instance for each node of our cluster.
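Rather than eyeballing the pod list, a small check can compare how many Fluent Bit pods the DaemonSet wants against how many are ready. This is a hypothetical helper, and it assumes the chart's DaemonSet is named fluent-bit in the logging namespace, which is what we'd expect from the release name used above; adjust if yours differs.

```shell
# Compare desired vs. ready pods for the Fluent Bit DaemonSet.
# Assumption: DaemonSet "fluent-bit" in namespace "logging" (override via args).
check_daemonset_ready() {
  local namespace="${1:-logging}" name="${2:-fluent-bit}"
  local desired ready
  desired="$(kubectl get daemonset "$name" --namespace "$namespace" \
    -o jsonpath='{.status.desiredNumberScheduled}')"
  ready="$(kubectl get daemonset "$name" --namespace "$namespace" \
    -o jsonpath='{.status.numberReady}')"
  if [ "$desired" = "$ready" ]; then
    echo "ready: $ready/$desired"
  else
    echo "not ready: $ready/$desired"
    return 1
  fi
}
```

desiredNumberScheduled already accounts for nodes the DaemonSet can't run on (tainted control-plane nodes, for example), so it's a better baseline than a raw node count. To see which node each agent landed on, kubectl get pods --namespace logging -o wide lists them alongside their nodes.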
And with that, we're done.
You might be wondering how the Fluent Bit agents know where to forward to. In configuring Fluentd, we chose values that match the Fluent Bit defaults: namely, the service hostname (fluentd) and the port Fluent Bit expects out of the box. These defaults can be found in the Helm chart for Fluent Bit.
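If the agents don't seem to deliver anything, a quick sanity check is to confirm that the port the fluentd Service exposes is the port Fluent Bit forwards to. The helper is hypothetical, and its 24284 default simply mirrors the forward port in our Fluentd configuration; treat the Fluent Bit chart's values file as the authority on what the agents actually use.

```shell
# Verify the fluentd Service exposes the port Fluent Bit forwards to.
# Assumptions: Service "fluentd" in namespace "logging"; expected port 24284.
check_forward_port() {
  local namespace="${1:-logging}" expected="${2:-24284}"
  local actual
  actual="$(kubectl get service fluentd --namespace "$namespace" \
    -o jsonpath='{.spec.ports[0].port}')"
  if [ "$actual" = "$expected" ]; then
    echo "ok: fluentd.$namespace:$actual"
  else
    echo "mismatch: service exposes $actual, agents expect $expected"
    return 1
  fi
}
```

Because Fluent Bit runs in the same logging namespace, the bare hostname fluentd resolves to this Service through cluster DNS; agents in another namespace would need fluentd.logging (or, with the default cluster domain, fluentd.logging.svc.cluster.local).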
Wait… What was the original problem?
Oh yeah: can we access the logs for all of our containers in a single place? It seems we've had to go through a lot of steps. But note: the hard part has been paring down the landscape of options available to us. When it came to implementing our final choice, Kubernetes made that quite easy.
But let's confirm.
Recall that we specified the log file Fluentd writes to (/fluentd/log/cluster.log), so we know its location. We can use kubectl to access this log file, and since everything in our cluster now goes to one place, we'll grep out the bits we're interested in.
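A minimal sketch of that step as a reusable helper. The function name is hypothetical, and it assumes the Deployment is named fluentd in the logging namespace and writes to /fluentd/log/cluster.log as configured in the YAML above. The deploy/fluentd target lets kubectl pick one of the Deployment's pods; older kubectl versions may need the pod name instead.

```shell
# Search the aggregated cluster log inside the Fluentd pod.
# Assumptions: Deployment "fluentd" in namespace "logging", log file at
# /fluentd/log/cluster.log (per our Fluentd ConfigMap).
grep_cluster_logs() {
  local pattern="$1" namespace="${2:-logging}"
  # "--" stops option parsing, so patterns starting with "-" are treated literally.
  kubectl exec --namespace "$namespace" deploy/fluentd -- \
    grep -- "$pattern" /fluentd/log/cluster.log
}
```

For example, grep_cluster_logs web-app (a made-up app name) returns every line in the aggregated log that mentions it, whichever node or pod it came from.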
And voilà, we can see that logs from multiple applications are available in this one location.
A second important takeaway: if you look closely, a tremendous amount of container metadata has been preserved as well. This can be vital for diagnosing problems with a service that seem inconsistent or intermittent. The metadata can make patterns more apparent, and help surface whether the issue is with a single worker node, one half of a blue-green deployment, etc.
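As a sketch of how that metadata can surface a pattern: Fluent Bit's Kubernetes filter records the node name as host under the kubernetes key, so counting matching log lines per node takes one pipeline. The helper name is hypothetical, and the grep-based field extraction is a shortcut that assumes the one-JSON-object-per-line format our Fluentd configuration writes; a JSON-aware tool such as jq would be more robust.

```shell
# Count log lines per Kubernetes node, busiest node first.
# Reads JSON lines from the files given as arguments, or from stdin.
count_by_node() {
  # -h omits file names and -o prints only each "host":"<node>" match;
  # cut keeps the node name; sort/uniq tally them; sort -rn puts the busiest first.
  grep -h -o '"host":"[^"]*"' "$@" |
    cut -d '"' -f 4 |
    sort | uniq -c | sort -rn
}
```

Piping in just the error lines, for example kubectl exec --namespace logging deploy/fluentd -- grep -- ERROR /fluentd/log/cluster.log | count_by_node, makes it easy to spot whether one node accounts for most of the errors, which points toward the node rather than the application.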
There's something different about these logs…
It's true. All that metadata, while valuable, is also overwhelming, and these logs are definitely a bit harder to read. Even though we've glossed over the analytics portion of logging, it is actually a key component, and it's what really brings out the value of a good logging solution.
So to that end, be on the lookout for part 2 of this series, where we'll dive into the analytics portion of logging.