BoxBoat Blog

What’s New in Docker 1.12 – Part 3

by Brandon Mitchell | Monday, Sep 19, 2016 | Docker

This is the last in a three-part series covering the changes in the Docker 1.12 release. Part 1 covered changes to building images, distribution, and logging. Part 2 covered changes to networking, the new plugin interface, and the client API. This post covers changes to the runtime, swarm, volumes, and deprecated features.

Runtime

The runtime gets to the core of Docker, with changes made to components like storage drivers, the engine restart behavior, events, systemd integration, namespaces, and container specific kernel settings.

Live Restore

Pull request #23213

Live restore allows you to stop the Docker daemon without stopping the running containers, and automatically reconnect to those containers when the daemon restarts. Because of the FIFO buffers used for container output, container processes may hang during an extended daemon outage if they fill up this pipe. Swarm mode also doesn't support the live restore option. The built-in orchestration of swarm mode allows you to gracefully migrate containers to another host in the swarm before stopping or restarting the engine, so this is a minor limitation.

To enable this, pass the --live-restore flag to the daemon when starting it, or modify the configuration of a running daemon by editing /etc/docker/daemon.json to include:

{
  "live-restore": true
}

and then signaling dockerd to reload its configuration with killall -HUP dockerd. More details on this feature are available in the Docker docs.

Runtime command

Pull request #22983

The Docker engine doesn't include code to run containers. Wait, what? Don't panic: this is a somewhat old change, dating from when the containerd and runc code was split out, and from the user's perspective everything still works the same.

The change for 1.12 is the ability to update the daemon with information on other runtimes. And with the daemon configured, each container you launch can now be served by a different runtime. This opens the doors for containers to be run in a VM to give even further isolation, possibly with a microkernel to keep it extremely lightweight, while still running your existing containers in runc. I suspect this change is in preparation for the Windows native container support. Some example scenarios look like:

# configure the runtime directly on the daemon command line
$ dockerd --add-runtime "oci=/usr/local/sbin/runc" \
  --add-runtime "vm=/usr/local/bin/vm-manager"

# or you can configure via daemon.json
$ cat /etc/docker/daemon.json
{
  "runtimes": {
    "oci": {
      "path": "/usr/bin/docker-runc",
      "runtimeArgs": [
        "--debug"
      ]
    },
    "vm": {
      "path": "/usr/local/bin/vm-manager",
      "runtimeArgs": [
        "--debug"
      ]
    }
  }
}

# then each container can use a different runtime
$ docker run --runtime=vm <image-name>

Filesystem Drivers

For this release, the filesystem drivers received a few bug fixes and minor features:

  • Added the overlay2 driver, which resolves the inode exhaustion issues seen with overlay (pull request #22126)
  • Btrfs adds the daemon option --storage-opt btrfs.min_space=10G to set the minimum size of the subvolumes it creates; with that setting, docker run --storage-opt size=5G fails for being too small (pull request #19651)
  • ZFS adds the daemon option --storage-opt size=2G for the container block size, which acts as an upper quota on the RW layer of a container (pull request #21946)
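The Btrfs rule above (a requested size below btrfs.min_space is rejected) can be sketched as a small size comparison. This is only an illustrative model, not Docker's code: the helper names to_bytes and size_meets_minimum are hypothetical, and treating the K/M/G/T suffixes as binary multiples is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: convert sizes such as 512M, 5G, or 1T to bytes.
# Binary multiples (1G = 1024^3) are an assumption of this sketch.
to_bytes() {
  num=${1%[KMGTkmgt]}
  case "$1" in
    *[Kk]) echo $(( num * 1024 )) ;;
    *[Mm]) echo $(( num * 1024 * 1024 )) ;;
    *[Gg]) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *[Tt]) echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
    *)     echo "$num" ;;
  esac
}

# Hypothetical helper: succeeds when the requested size ($1) is at least
# the configured minimum ($2).
size_meets_minimum() {
  [ "$(to_bytes "$1")" -ge "$(to_bytes "$2")" ]
}

min_space=10G
for requested in 5G 10G 20G; do
  if size_meets_minimum "$requested" "$min_space"; then
    echo "size=$requested: accepted (btrfs.min_space=$min_space)"
  else
    echo "size=$requested: rejected, below btrfs.min_space=$min_space"
  fi
done
```

With a 10G minimum, the 5G request from the bullet above is the only one of the three that is rejected.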

Events

Load and save operations now emit events as of pull request #22137. The example below shows the new behavior:

$ docker save busybox:latest > busybox.tar

$ docker events --since 5m --until 0m
2016-08-05T16:56:33.609312884-04:00 image save sha256:2b8fd9751c4c0f5dd266fcae00707e67a2545ef34f9a29354585f93dac906749 (name=sha256:2b8fd9751c4c0f5dd266fcae00707e67a2545ef34f9a29354585f93dac906749)

As of pull request #22590, a daemon reload generates an event. Note that stop and start still do not generate any event. An event is only generated when a configuration file exists (/etc/docker/daemon.json by default), since otherwise no configuration change would occur. Here's an example of both creating that configuration file and seeing the event from the reload:

$ echo '{ "labels": ["foo=bar", "env=laptop"] }' | \
  sudo tee /etc/docker/daemon.json

$ sudo systemctl reload docker

$ docker events --since 0 --until 0s --filter "type=daemon"
2016-08-31T17:30:56.763501090-04:00 daemon reload .... (cluster-advertise=, cluster-store=, cluster-store-opts={}, debug=false, default-runtime=runc, labels=["foo=bar","env=laptop"], max-concurrent-downloads=3, max-concurrent-uploads=5, name=docker-demo, runtimes=runc:{docker-runc []})

$ docker info | grep -i -A 2 Labels
Labels:
 foo=bar
 env=laptop

Container detach now generates an event, mirroring the existing attach event, as of pull request #22898. When attached to a container with a tty (-t), you can detach with the Ctrl-P Ctrl-Q key combination. In practice this looks like:

$ docker run -itd --name test-attach busybox /bin/sh
5d4c5ed10a6c160e12afdacda3e2ac776ad75158917db22e82aabebe3b6b961d

$ docker attach test-attach
/ #
# Ctrl-P Ctrl-Q entered to detach back to host

$ docker events --since 1m --until 0s --filter event=detach
2016-08-31T17:57:07.881459110-04:00 container detach .... (image=busybox, name=test-attach)

Systemd configuration

The supplied systemd unit file now supports reloading the daemon. This simply adds an ExecReload unit file entry, as described in pull request #22446.

In my own environments, I've copied /lib/systemd/system/docker.service to /etc/systemd/system/docker.service and made my local configuration changes there, where automated upgrades won't overwrite them. So the change is visible in a diff before I copy the new line into my local configuration:

# Before updating config
$ sudo systemctl reload docker
Failed to reload docker.service: Job type reload is not applicable for unit docker.service.

$ diff /lib/systemd/system/docker.service /etc/systemd/system/docker.service
12,17c12,20
< ExecReload=/bin/kill -s HUP $MAINPID
# other local changes for PKI certificates were also listed

$ sudo vi /etc/systemd/system/docker.service
# Added the ExecReload line from the above diff

$ sudo systemctl daemon-reload

$ sudo systemctl reload docker
# no errors, containers remain up

Pid Namespace

Docker added the ability to join the pid namespace of another container with docker run --pid container:<name|id> in pull request #22481. This can be useful for so-called “sidecar” containers, which can be run and managed individually, unlike a docker exec command, which has no visibility in docker ps. Here's an example of this in the lab:

$ docker run --name test-tail -itd busybox tail -f /dev/null
35fc6df2e27ee42eac5a506b8ee36a504e535e36838f53111b9cbfe6e191c348

$ docker run --name test-pid-ns --pid container:test-tail -it busybox /bin/sh
/ # ps -ef
PID   USER     TIME   COMMAND
    1 root       0:00 tail -f /dev/null
    6 root       0:00 /bin/sh
   12 root       0:00 ps -ef

Setting sysctls per container

Docker added a new flag, --sysctl, for passing namespaced kernel settings. This is only supported for values that are namespaced into the container and so won't impact the host or other containers. At present, the “net.” values can be set when not running with --net=host, and the following IPC values can be defined: kernel.msgmax, kernel.msgmnb, kernel.msgmni, kernel.sem, kernel.shmall, kernel.shmmax, kernel.shmmni, kernel.shm_rmid_forced, and sysctls beginning with fs.mqueue. More details are available in the docker run reference and pull request #19265. The below example shows how shmmax can be adjusted:

$ cat /proc/sys/kernel/shmmax
18446744073692774399

$ docker run -it --rm debian cat /proc/sys/kernel/shmmax
18446744073692774399

# lower the available shared memory, could also increase it
$ docker run -it --rm --sysctl "kernel.shmmax=100" debian \
  cat /proc/sys/kernel/shmmax
100
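The list of supported names above can be turned into a quick pre-flight check before building a docker run command. This is a minimal sketch based only on the names listed in this post; the function name sysctl_allowed and its second argument are hypothetical, and Docker's own validation may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the sysctl name is in the namespaced set
# listed in this post. Pass "yes" as the second argument when the container
# would run with --net=host, in which case net.* values are refused.
sysctl_allowed() {
  name=$1
  host_net=${2:-no}
  case "$name" in
    kernel.msgmax|kernel.msgmnb|kernel.msgmni|kernel.sem)
      return 0 ;;
    kernel.shmall|kernel.shmmax|kernel.shmmni|kernel.shm_rmid_forced)
      return 0 ;;
    fs.mqueue.*)
      return 0 ;;
    net.*)
      if [ "$host_net" = "yes" ]; then return 1; fi
      return 0 ;;
    *)
      return 1 ;;
  esac
}

for s in kernel.shmmax net.core.somaxconn fs.mqueue.msg_max vm.swappiness; do
  if sysctl_allowed "$s"; then
    echo "$s: allowed"
  else
    echo "$s: rejected"
  fi
done
```

A setting such as vm.swappiness is rejected here because it is not namespaced and would affect the whole host.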

Support setting size of root filesystem

Docker added the ability to define the size of the root filesystem per container for those not running AUFS in pull request #19367. This means a single large container on the host won't result in all the other containers using a similarly large chunk of disk; you can now set a low starting threshold and increase or decrease it for specific containers. This is implemented with docker run --storage-opt size=1G, which sets the size to 1 GB for that specific container. Since I'm running AUFS on all of my lab systems, I don't have a demo for this.

Adjust the out-of-memory score

If you ever run out of memory on a server, the kernel has its choice of what to kill. You can adjust the score it uses to avoid having the Docker daemon itself killed with the new dockerd --oom-score-adjust=-500 option from pull request #24516. Possible values for this option range from -1000 to 1000, and a lower score makes a process less likely to be killed. Ordinary processes start at 0, so setting the daemon to -500 means it will be killed after other processes on the host but before any OS services that protect themselves with an even lower score (e.g. dbus and sshd). This also falls into the “didn't demo” category because I didn't want to crash a server.
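Since the accepted range is -1000 to 1000, a value can be sanity-checked before it goes into a daemon command line. The sketch below only prints the command it would use and never starts dockerd; the function name valid_oom_score_adjust is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the argument is an integer
# within the -1000..1000 range described above.
valid_oom_score_adjust() {
  case "$1" in
    ''|-|*[!0-9-]*|?*-*) return 1 ;;   # empty, lone dash, non-digits, misplaced dash
  esac
  [ "$1" -ge -1000 ] && [ "$1" -le 1000 ]
}

for value in -500 0 500 2000 abc; do
  if valid_oom_score_adjust "$value"; then
    echo "dockerd --oom-score-adjust=$value"
  else
    echo "rejected: $value is not an integer between -1000 and 1000" >&2
  fi
done
```

On a running Linux host, the value currently in effect for any process can be read from /proc/<pid>/oom_score_adj.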

Don't show created containers with exited filter

Docker fixed a bug where created containers would be included when filtering for the exited status in pull request #21947. Created containers now show up under status=created but not under exited=0:

$ docker create --name=test-create busybox
40c48518b53536538f85206d1a8386619d441aa5cabd511b89de2bb7df356374

$ docker ps -a --filter exited=0
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                   PORTS               NAMES
a6b3db79dc52        busybox             "env"                    9 hours ago         Exited (0) 9 hours ago                       test_test_run_5

$ docker ps -a --filter status=created
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
40c48518b535        busybox             "sh"                About a minute ago   Created                                 test-create

Detach Key Sequence

A popular tip for new Docker users is the Ctrl-P Ctrl-Q key sequence, which detaches from a tty connection to a container while leaving it running. Docker supports providing your own key sequence to perform the detach, and in pull request #22943 they fixed a bug where a multi-key sequence would drop the initial keys from a partial key sequence. So before, if you said “a-b-c” was your sequence to detach, and then typed “a-b-d”, the A and B would be dropped instead of passed to your container.

If you haven't tried --detach-keys before, use a comma to separate multiple keys in a sequence, and ctrl-x for control characters. Note that the sequence defaults back to ctrl-p,ctrl-q when you reattach, so you need to provide your personal sequence each time you run a docker run or docker attach. Also, note that docker run --rm is not compatible with --detach-keys. The key sequence will detach you from the container, but the remove will hang waiting for the container to exit and you won't get a shell prompt.

$ docker run --detach-keys=a,b,c -it --name test-detach busybox
# note that "abd" don't appear until the "d" is pressed in an invalid sequence
/ # echo abd
abd
/ # echo aa
aa
# entering "abc" on the next line drops to the host bash prompt "$"
/ # echo $

Handle “on-failure” counts on daemon restart

If you have docker configured to restart your containers on failures with a limit, you may have noticed that docker resets that limit when the docker engine is restarted. In pull request #20853 that behavior was corrected to honor the failure count across daemon restarts and leave a failing container down if it reaches its limit. The below example shows a simple container that does nothing but fail:

$ docker run -tid --name test-restart --restart on-failure:3 busybox sh -c "sleep 30; exit 127"

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                    NAMES
9f0299f0b09a        busybox             "sh -c 'sleep 30; exi"   5 seconds ago        Up 3 seconds                                 test-restart

$ docker inspect -f '{{.RestartCount}}' test-restart
0

# after 100 seconds, all 3 retries are done
$ sleep 100

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                                PORTS                    NAMES
9f0299f0b09a        busybox             "sh -c 'sleep 30; exi"   2 minutes ago       Exited (127) Less than a second ago                            test-restart

$ docker inspect -f '{{.RestartCount}}' test-restart
3

# bounce the docker daemon to verify the container no longer restarts
$ sudo systemctl restart docker

# before 1.12, the container would be trying to restart
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                                PORTS                    NAMES
9f0299f0b09a        busybox             "sh -c 'sleep 30; exi"   3 minutes ago       Exited (127) About a minute ago                                test-restart

$ docker inspect -f '{{.RestartCount}}' test-restart
3

Fix stats when network is shared

Docker allows you to remove isolation of containers in many ways, or to merge parts of containers together. One of those options is to share the network stack between two containers. In pull request #21904 docker fixed a bug where a shared network stack would show 0s for stats of the network on the second container. The below example shows this when connecting the network to a preexisting container:

# registry container is already running locally
$ docker run -d --net container:registry --name test-net busybox tail -f /dev/null

$ docker stats
CONTAINER           CPU %               MEM USAGE / LIMIT      MEM %               NET I/O             BLOCK I/O           PIDS
cc6d49863395        0.01%               284 KiB / 5.75 GiB     0.00%               23.19 kB / 648 B    69.63 kB / 0 B      0
13062f97a75c        0.01%               15.52 MiB / 5.75 GiB   0.26%               23.19 kB / 648 B    12.95 MB / 0 B      0

Other runtime changes

Some other notable runtime changes include:

  • Automatically update the seccomp profile when containers are given specific capabilities in pull request #22554
  • Un-deprecated the docker run -c shorthand for --cpu-shares since it is still widely used despite warnings since 1.9 release in pull request #22621
  • Order nested tmpfs mounts to avoid mounting a child directory before the parent in pull request #22329
  • Recover from crash during container removal by changing to a dead state in pull request #22423
  • Return error code 400 instead of 500 when a container is created without a command in pull request #22762
  • Ignore SIGPIPE in the daemon so a journald restart doesn't silently crash the engine in pull request #22460
  • Show containers with failing stats requests as error instead of hiding the container from the stats output in pull request #20835

Swarm

There's no way I'm getting through a 1.12 “what's new” blog post without mentioning the integrated swarm mode. There's still a standalone solution that uses containers. However, the new swarm mode is completely integrated with the docker engine and CLI, and it includes high availability, orchestration, and service discovery out of the box. The high availability is implemented with a quorum algorithm, so with 3 managers you can lose one and still have a majority to maintain the state. The orchestration allows you to define a target state and the swarm will constantly correct for any failed containers or nodes to bring up the desired number of instances of your services. And the service discovery uses a routing mesh to connect a port opened on all nodes in the swarm to your containers running the attached service, regardless of where the containers are running. This topic is big enough to deserve a longer discussion than this post has space for, so be sure to check out docker's blog post about the release for more details.
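The majority rule behind that “3 managers, lose one” statement is simple arithmetic, sketched below for a few manager counts. The function names quorum_size and fault_tolerance are hypothetical; the formula is the standard majority calculation, not something taken from the swarm code.

```shell
#!/bin/sh
# Majority needed for a group of N managers: floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Managers that can fail while a majority survives: N minus the quorum.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for managers in 1 3 5 7; do
  echo "managers=$managers quorum=$(quorum_size "$managers") tolerated_failures=$(fault_tolerance "$managers")"
done
```

An even manager count adds no extra tolerance: four managers still only survive the loss of one, which is why odd counts are the usual choice.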

Merging the swarmkit codebase

Pull request #23361, pull request #23362, and pull request #23363

This group of three pull requests was the big one, with hundreds of files added or changed to add the new feature. It pulls in the swarmkit codebase, where Docker developed the feature, adding the new engine code, API calls, and CLI. For the user, this means the docker swarm, docker service, and docker node commands. Below is the final version of these CLIs on a 1.12.1 system:

$ docker swarm --help
...
Commands:
  init        Initialize a swarm
  join        Join a swarm as a node and/or manager
  join-token  Manage join tokens
  update      Update the swarm
  leave       Leave a swarm

$ docker service --help
...
Commands:
  create      Create a new service
  inspect     Display detailed information on one or more services
  ps          List the tasks of a service
  ls          List services
  rm          Remove one or more services
  scale       Scale one or multiple services
  update      Update a service

$ docker node --help
...
Commands:
  demote      Demote one or more nodes from manager in the swarm
  inspect     Display detailed information on one or more nodes
  ls          List nodes in the swarm
  promote     Promote one or more nodes to manager in the swarm
  rm          Remove one or more nodes from the swarm
  ps          List tasks running on a node
  update      Update a node

Switching swarm from secrets to tokens

Pull request #24823

This change removes the user generated secrets used in non-GA releases in favor of machine generated tokens. It uses a different token for the worker vs the manager instead of auto accepting into one role based on a single secret. And tokens may be rotated at any time for either role separately.

$ docker swarm join-token --help

Usage:  docker swarm join-token [-q] [--rotate] (worker|manager)

Manage join tokens

Options:
      --help     Print usage
  -q, --quiet    Only display token
      --rotate   Rotate join token

$ docker swarm join-token worker
To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-36nk7elqghdm34gcnd7dan7ss4wa41wmj8wdifnif4xl7c1wsr-aikk431lw4xnh3q9i1b65i75u \
    192.168.2.3:2377

$ docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-36nk7elqghdm34gcnd7dan7ss4wa41wmj8wdifnif4xl7c1wsr-cieriqr5lp7hza3aa65vjarty \
    192.168.2.3:2377

$ docker swarm join-token --rotate manager
Succesfully rotated manager join token.

To add a manager to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-36nk7elqghdm34gcnd7dan7ss4wa41wmj8wdifnif4xl7c1wsr-4pnhnqrsgzljl8n19z4jh5dxx \
    192.168.2.3:2377

Change swarm commands to ps instead of tasks

Pull request #25140

If you checked some of the early releases, you may have seen commands like docker service tasks. These have now changed from tasks to ps in docker service, docker node, and docker stack (stack is still experimental). From the pull requests, there are hints that “tasks” may be reused in a future release, so I'm avoiding the term:

Rather than conflict with the unexposed task model, change the names of the object-oriented task display to docker <object> ps. The command works identically to docker service tasks. This change is superficial.

This provides a more sensical docker experience while not trampling on the task model that may be introduced as a top-level command at a later date.

$ docker service create --name pause busybox tail -f /dev/null

$ docker service ls
ID            NAME   REPLICAS  IMAGE    COMMAND
d8adqe8izzrt  pause  1/1       busybox  tail -f /dev/null

$ docker service ps pause
ID                         NAME     IMAGE    NODE          DESIRED STATE  CURRENT STATE          ERROR
bble60yby7bffjem9r76p4ho4  pause.1  busybox  docker-demo   Running        Running 7 seconds ago

$ docker node ps self
ID                         NAME     IMAGE    NODE          DESIRED STATE  CURRENT STATE           ERROR
bble60yby7bffjem9r76p4ho4  pause.1  busybox  docker-demo   Running        Running 24 seconds ago

Added “docker stack” commands to experimental

Pull request #23522

For those on the experimental releases, you have a docker stack command. This, combined with the docker-compose bundle command, is Docker's version of compose inside of the new swarm. It allows a group of services to be deployed as a single “stack”. DAB files, or bundles, are a subset of compose and currently don't support volumes and networks, so those currently using compose may want to hold off on using the new swarm mode for these tasks. The DAB files are also considered experimental and likely to change in the future.

Volumes

Changes to volumes were relatively minor in this release, mostly focused around features needed by volume drivers and bug fixes.

Volume Scopes

Pull request #22077

Volume scopes let a volume driver identify itself as “local” or “global”. Local volumes affect a single Docker host, while global volumes are used across an entire swarm. This is similar to the network scope capability and is visible in the docker volume inspect output.

$ docker volume inspect test
[
    {
        "Name": "test",
        "Driver": "local",
        "Mountpoint": "/home/var-docker/volumes/test/_data",
        "Labels": null,
        "Scope": "local"
    }
]

Support name and driver filters

Pull request #21361

Docker added support to filter volumes by name and driver. This support includes partial name matches with regex support to match specific patterns. The below example shows a listing with all volumes that end in “certs”.

$ docker volume ls -f 'name=.*certs'
DRIVER              VOLUME NAME
local               dm-certs
local               sm-certs

Other changes to volumes

Other changes to volumes include:

  • Add a status field for volume drivers to report back in inspect (pull request #21006)
  • Generate a unique id for mount/umount operations to differentiate callers (pull request #21015)
  • Cleanup volume mounts when container fails to start (pull request #22103)
  • Create missing host paths on Windows, mirroring the existing Linux behavior (pull request #22094)

Deprecation

Docker had previously announced that many old behaviors and syntax were deprecated and would be removed in 1.12. Each of these has had a replacement and a long transition period.

CLI changes

  • Previously deprecated environment variables DOCKER_CONTENT_TRUST_OFFLINE_PASSPHRASE and DOCKER_CONTENT_TRUST_TAGGING_PASSPHRASE have been removed; use DOCKER_CONTENT_TRUST_ROOT_PASSPHRASE and DOCKER_CONTENT_TRUST_REPOSITORY_PASSPHRASE instead. Pull request #22574
  • Previously deprecated options syslog-tag, gelf-tag, and fluentd-tag have been removed; instead use --log-opt tag=name to change from the default container id tags included in logfile entries. Pull request #22620
  • The force option on docker tag has been removed; docker tag now defaults to forcing all tag operations, making it consistent with other commands. Pull request #23090
  • The docker ps options --before and --since have been removed. Use docker ps --filter=before=... and --filter=since=... instead. Pull request #22138
  • The three argument form of docker import file repository tag has been removed; instead use a colon between the repository and tag: file repository:tag. Pull request #23273

API changes

  • The previously deprecated feature of passing HostConfig on container start has been removed in API 1.24; this value must be passed on container create instead. Pull request #22570
  • The API /containers/(id or name)/copy has been removed, use /containers/(id or name)/archive instead. Pull request #22149

Other stuff

To channel my inner Billy Mays, “but wait, that's not all!” At the same time Docker announced 1.12 at DockerCon 2016, they also released a few other things that aren't listed in the release notes.

  • Docker for Windows is now stable, using Hyper-V
  • Docker for Mac is now stable, using xhyve
  • Docker for AWS and Docker for Azure, both based on Docker 1.12, are entering beta
  • Docker Store is entering beta, which is a commercial/paid image registry

All of this leaves me thinking that this is a great time to be working with Docker and that it's going to be tough keeping up with how much they change in these releases. In fact, the 1.12.1 release was already made before we had a chance to finish up this three-part series.