BoxBoat Blog
Service updates, customer stories, and tips and tricks for effective DevOps
What’s New in Docker 1.12 – Part 3
by Brandon Mitchell | Monday, Sep 19, 2016 | Docker
This is the last in a 3-part series covering all the changes in the Docker 1.12 release. Part 1 covered changes introduced to building images, distribution, and logging. Part 2 covered changes introduced to networking, the new plugin interface, and the client API. This post covers changes to the runtime, swarm, volumes, and deprecated features.
Runtime
The runtime gets to the core of Docker, with changes made to components like storage drivers, the engine restart behavior, events, systemd integration, namespaces, and container specific kernel settings.
Live Restore
Pull request #23213
Live restore allows you to stop the Docker daemon without stopping the running containers, and automatically reconnects to those containers when the daemon restarts. Because container output is buffered in FIFO pipes, container processes may hang during an extended daemon outage if they fill up those pipes. Swarm mode also doesn't support the live restore option; however, the built-in orchestration of swarm mode allows you to gracefully migrate containers to another host in the swarm before stopping or restarting the engine, so this is a minor limitation.
To enable this, pass the --live-restore flag to the daemon when starting it, or modify the configuration of a running daemon by changing /etc/docker/daemon.json to include:
{
"live-restore": true
}
and then signal dockerd to reload its configuration with killall -HUP dockerd. More details on this feature are available in the docker docs.
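If a daemon.json with other settings already exists, overwriting the whole file would drop them. Here's a small sketch that merges in just the one key instead; the ./daemon.json path is a stand-in for the real /etc/docker/daemon.json, which requires root to modify:

```shell
# Sketch: merge "live-restore": true into an existing daemon.json without
# clobbering other settings. CONF defaults to a local copy for this demo;
# point it at /etc/docker/daemon.json (as root) on a real host.
CONF=${CONF:-./daemon.json}
[ -f "$CONF" ] || echo '{ "labels": ["env=demo"] }' > "$CONF"
python3 - "$CONF" <<'EOF'
import json, sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)
cfg["live-restore"] = True   # add or overwrite only this key
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
cat "$CONF"
```

After updating the real file, the same killall -HUP dockerd reload applies.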
Runtime command
Pull request #22983
The Docker engine doesn't include code to run containers. Wait, what? Don't panic, this is a somewhat old change when they split out the containerd and runc code, and from the user perspective, everything still works the same.
The change for 1.12 is the ability to update the daemon with information on other runtimes. And with the daemon configured, each container you launch can now be served by a different runtime. This opens the doors for containers to be run in a VM to give even further isolation, possibly with a microkernel to keep it extremely lightweight, while still running your existing containers in runc. I suspect this change is in preparation for the Windows native container support. Some example scenarios look like:
# configure the runtime directly on the daemon command line
$ dockerd --add-runtime "oci=/usr/local/sbin/runc" \
--add-runtime "vm=/usr/local/bin/vm-manager"
# or you can configure via daemon.json
$ cat /etc/docker/daemon.json
{
"runtimes": {
"oci": {
"path": "/usr/bin/docker-runc",
"runtimeArgs": [
"--debug"
]
},
"vm": {
"path": "/usr/local/bin/vm-manager",
"runtimeArgs": [
"--debug"
]
}
}
}
# then each container can use a different runtime
$ docker run --runtime=vm <image-name>
Filesystem Drivers
For this release, the filesystem drivers received a few bug fixes and minor features:
- Added overlay2 driver that resolves inode exhaustion issues from overlay (pull request #22126)
- Btrfs adds the daemon option --storage-opt btrfs.min_space=10G to set a minimum size for the subvolumes it creates; docker run --storage-opt size=5G would then fail for being too small (pull request #19651)
- Zfs adds the daemon option --storage-opt size=2G for the container block size; this is an upper quota for the RW layer of a container (pull request #21946)
Events
docker load and docker save now generate events, added in pull request #22137. You can see the new behavior in the below example:
$ docker save busybox:latest > busybox.tar
$ docker events --since 5m --until 0m
2016-08-05T16:56:33.609312884-04:00 image save sha256:2b8fd9751c4c0f5dd266fcae00707e67a2545ef34f9a29354585f93dac906749 (name=sha256:2b8fd9751c4c0f5dd266fcae00707e67a2545ef34f9a29354585f93dac906749)
A daemon reload will now generate an event, from pull request #22590. Note that daemon stop and start still do not generate events. This does require a configuration file, /etc/docker/daemon.json by default, for an event to be generated; otherwise no configuration change would occur. Here's an example of both creating that configuration file and seeing the event from the reload:
$ echo '{ "labels": ["foo=bar", "env=laptop"] }' | \
sudo tee /etc/docker/daemon.json
$ sudo systemctl reload docker
$ docker events --since 0 --until 0s --filter "type=daemon"
2016-08-31T17:30:56.763501090-04:00 daemon reload .... (cluster-advertise=, cluster-store=, cluster-store-opts={}, debug=false, default-runtime=runc, labels=["foo=bar","env=laptop"], max-concurrent-downloads=3, max-concurrent-uploads=5, name=docker-demo, runtimes=runc:{docker-runc []})
$ docker info | grep -i -A 2 Labels
Labels:
foo=bar
env=laptop
Container detach now generates an event, mirroring the attach event that already existed, from pull request #22898. When attached to a container with a tty (-t), you can detach with the ctrl-p ctrl-q key combination. In practice this looks like:
$ docker run -itd --name test-attach busybox /bin/sh
5d4c5ed10a6c160e12afdacda3e2ac776ad75158917db22e82aabebe3b6b961d
$ docker attach test-attach
/ #
# ctrl-p ctrl-q entered to detach back to host
$ docker events --since 1m --until 0s --filter event=detach
2016-08-31T17:57:07.881459110-04:00 container detach .... (image=busybox, name=test-attach)
Systemd configuration
The supplied systemd unit file now supports reloading the daemon. This simply adds an ExecReload unit file entry as described in pull request #22446.
In my own environments, I've copied /lib/systemd/system/docker.service to /etc/systemd/system/docker.service and made my local configuration changes there, where automated upgrades won't overwrite them. That way the change is visible in a diff before I copy the new line into my local configuration:
# Before updating config
$ sudo systemctl reload docker
Failed to reload docker.service: Job type reload is not applicable for unit docker.service.
$ diff /lib/systemd/system/docker.service /etc/systemd/system/docker.service
12,17c12,20
< ExecReload=/bin/kill -s HUP $MAINPID
# other local changes for PKI certificates were also listed
$ sudo vi /etc/systemd/system/docker.service
# Added the ExecReload line from the above diff
$ sudo systemctl daemon-reload
$ sudo systemctl reload docker
# no errors, containers remain up
Pid Namespace
Docker added the ability to join the pid namespace of another container with docker run --pid container:<name|id> in pull request #22481. This can be useful for so-called "sidecar" containers that can be run and managed individually, unlike a docker exec command, which has no visibility from docker ps. Here's an example of this in the lab:
$ docker run --name test-tail -itd busybox tail -f /dev/null
35fc6df2e27ee42eac5a506b8ee36a504e535e36838f53111b9cbfe6e191c348
$ docker run --name test-pid-ns --pid container:test-tail -it busybox /bin/sh
/ # ps -ef
PID USER TIME COMMAND
1 root 0:00 tail -f /dev/null
6 root 0:00 /bin/sh
12 root 0:00 ps -ef
Setting sysctls per container
Docker added a new flag, --sysctl, for passing namespaced kernel settings. This is only supported for values that are namespaced into the container and so won't impact the host or other containers. At present, the "net." values can be set when not running with --net=host, sysctls beginning with fs.mqueue. are allowed, and the following IPC values can be defined: kernel.msgmax, kernel.msgmnb, kernel.msgmni, kernel.sem, kernel.shmall, kernel.shmmax, kernel.shmmni, and kernel.shm_rmid_forced. More details are available in the docker run reference and pull request #19265. The below example shows how shmmax can be adjusted:
$ cat /proc/sys/kernel/shmmax
18446744073692774399
$ docker run -it --rm debian cat /proc/sys/kernel/shmmax
18446744073692774399
# lower the available shared memory, could also increase it
$ docker run -it --rm --sysctl "kernel.shmmax=100" debian \
cat /proc/sys/kernel/shmmax
100
Support setting size of root filesystem
Docker added the ability to define the size of the root filesystem per container for those not running AUFS in pull request #19367. This means one large container on the host no longer forces every other container to get a similarly large chunk of disk; you can now set a low default threshold and increase or decrease it for specific containers. This is implemented with docker run --storage-opt size=1G to change the limit to 1 GB for that specific container. Since I'm running AUFS on all of my lab systems, I don't have a demo for this.
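Lacking a non-AUFS lab host, here's a sketch of what it could look like on a devicemapper system (the sizes below are illustrative and untested here): the daemon sets a small default base size, and an individual container is then granted more:

```shell
# Daemon-wide default size for container filesystems (devicemapper-specific)
$ dockerd --storage-opt dm.basesize=10G
# Grant a single container a larger root filesystem than the default
$ docker run --storage-opt size=20G <image-name>
```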
Adjust the out-of-memory score
If you ever run out of memory on a server, the kernel has its choice of what to kill. You can adjust the score it uses, to avoid having the docker daemon itself killed, with the new dockerd --oom-score-adjust=500 option in pull request #24516. Possible values for this option range from -1000 to 1000, with a default of 0; setting it to 500 means the daemon will be killed after other processes on the host but before critical OS services that adjust their own score to protect themselves (e.g. dbus and sshd). This also falls into the "didn't demo" category because I didn't want to crash a server.
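The configuration itself is simple, though. As a sketch (the value is illustrative), the setting can also live in daemon.json so it persists across service restarts:

```shell
$ cat /etc/docker/daemon.json
{
  "oom-score-adjust": 500
}
$ sudo systemctl restart docker
# verify the adjustment took effect for the daemon process
$ cat /proc/$(pidof dockerd)/oom_score_adj
500
```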
Don't show created containers with exited filter
Docker fixed a bug where created containers would be included when filtering for the exited status in pull request #21947. Now they show with status=created but not with exited=0:
$ docker create --name=test-create busybox
40c48518b53536538f85206d1a8386619d441aa5cabd511b89de2bb7df356374
$ docker ps -a --filter exited=0
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a6b3db79dc52 busybox "env" 9 hours ago Exited (0) 9 hours ago test_test_run_5
$ docker ps -a --filter status=created
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
40c48518b535 busybox "sh" About a minute ago Created test-create
Detach Key Sequence
A popular tip for new Docker users is using the ctrl-p ctrl-q key sequence in tty connections to detach from a running container while leaving it running. Docker supports the ability to provide your own key sequence to perform the detach, and in pull request #22943 they fixed a bug where a multi-key sequence would drop the initial keys from a partial match. So before, if you said "a-b-c" was your sequence to detach, and then typed "a-b-d", the a and b would be dropped instead of passed to your container.
If you haven't tried --detach-keys before, use a , to separate multiple keys in a sequence, and ctrl-x for control characters. Note that the sequence defaults back to ctrl-p,ctrl-q when you reattach, so you need to provide your personal sequence each time you run a docker run or docker attach. Also, note that docker run --rm is not compatible with --detach-keys. The key sequence will detach you from the container, but the remove will hang waiting for the container to exit and you won't get a shell prompt.
$ docker run --detach-keys=a,b,c -it --name test-detach busybox
# note that "abd" don't appear until the "d" is pressed in an invalid sequence
/ # echo abd
abd
/ # echo aa
aa
# entering "abc" on the next line drops to the host bash prompt "$"
/ # echo $
Handle “on-failure” counts on daemon restart
If you have docker configured to restart your containers on failures with a limit, you may have noticed that docker resets that limit when the docker engine is restarted. In pull request #20853 that behavior was corrected to honor the failure count across daemon restarts and leave a failing container down if it reaches its limit. The below example shows a simple container that does nothing but fail:
$ docker run -tid --name test-restart --restart on-failure:3 busybox sh -c "sleep 30; exit 127"
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f0299f0b09a busybox "sh -c 'sleep 30; exi" 5 seconds ago Up 3 seconds test-restart
$ docker inspect -f '{{.RestartCount}}' test-restart
0
# after 100 seconds, all 3 retries are done
$ sleep 100
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f0299f0b09a busybox "sh -c 'sleep 30; exi" 2 minutes ago Exited (127) Less than a second ago test-restart
$ docker inspect -f '{{.RestartCount}}' test-restart
3
# bounce the docker daemon to verify the container no longer restarts
$ sudo systemctl restart docker
# before 1.12, the container would be trying to restart
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f0299f0b09a busybox "sh -c 'sleep 30; exi" 3 minutes ago Exited (127) About a minute ago test-restart
$ docker inspect -f '{{.RestartCount}}' test-restart
3
Fix stats when network is shared
Docker allows you to remove isolation of containers in many ways, or to merge parts of containers together. One of those options is to share the network stack between two containers. In pull request #21904 docker fixed a bug where a shared network stack would show 0s for stats of the network on the second container. The below example shows this when connecting the network to a preexisting container:
# registry container is already running locally
$ docker run -d --net container:registry --name test-net busybox tail -f /dev/null
$ docker stats
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
cc6d49863395 0.01% 284 KiB / 5.75 GiB 0.00% 23.19 kB / 648 B 69.63 kB / 0 B 0
13062f97a75c 0.01% 15.52 MiB / 5.75 GiB 0.26% 23.19 kB / 648 B 12.95 MB / 0 B 0
Other runtime changes
Some other notable runtime changes include:
- Automatically update the seccomp profile when containers are given specific capabilities in pull request #22554
- Un-deprecated the docker run -c shorthand for --cpu-shares since it is still widely used despite warnings since the 1.9 release in pull request #22621
- Order nested tmpfs mounts to avoid mounting a child directory before the parent in pull request #22329
- Recover from crash during container removal by changing to a dead state in pull request #22423
- Return error code 400 instead of 500 when a container is created without a command in pull request #22762
- Ignore SIGPIPE in the daemon so a journald restart doesn't silently crash the engine in pull request #22460
- Show containers with failing stats requests as error instead of hiding the container from the stats output in pull request #20835
Swarm
There's no way I'm getting through a 1.12 “what's new” blog post without mentioning the integrated swarm mode. There's still a standalone solution that uses containers. However, the new swarm mode is completely integrated with the docker engine and CLI, and it includes high availability, orchestration, and service discovery out of the box. The high availability is implemented with a quorum algorithm, so with 3 managers you can lose one and still have a majority to maintain the state. The orchestration allows you to define a target state and the swarm will constantly correct for any failed containers or nodes to bring up the desired number of instances of your services. And the service discovery uses a routing mesh to connect a port opened on all nodes in the swarm to your containers running the attached service, regardless of where the containers are running. This topic is big enough to deserve a longer discussion than this post has space for, so be sure to check out docker's blog post about the release for more details.
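As a taste of the workflow (a sketch only; the address, token, image, and port are placeholders), three commands take you from a single engine to a swarm running a replicated, load-balanced service:

```shell
# On the first node, initialize a swarm and become its manager
$ docker swarm init --advertise-addr 192.168.2.3
# On each additional node, join using the token printed by the init
$ docker swarm join --token <worker-token> 192.168.2.3:2377
# Back on the manager, run three replicas behind the routing mesh on port 8080
$ docker service create --name web --replicas 3 -p 8080:80 nginx
```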
Merging the swarmkit codebase
Pull request #23361, pull request #23362, and pull request #23363
This group of 3 pull requests was the big one: hundreds of files were added or changed to add the new feature. This is pulling in the swarmkit codebase where Docker developed the new feature. In these pulls, the new engine code, API calls, and CLI interface were added. For the user, this is the docker swarm, docker service, and docker node commands. Below is the final version of these CLIs on a 1.12.1 system:
$ docker swarm --help
...
Commands:
init Initialize a swarm
join Join a swarm as a node and/or manager
join-token Manage join tokens
update Update the swarm
leave Leave a swarm
$ docker service --help
...
Commands:
create Create a new service
inspect Display detailed information on one or more services
ps List the tasks of a service
ls List services
rm Remove one or more services
scale Scale one or multiple services
update Update a service
$ docker node --help
...
Commands:
demote Demote one or more nodes from manager in the swarm
inspect Display detailed information on one or more nodes
ls List nodes in the swarm
promote Promote one or more nodes to manager in the swarm
rm Remove one or more nodes from the swarm
ps List tasks running on a node
update Update a node
Switching swarm from secrets to tokens
Pull request #24823
This change removes the user generated secrets used in non-GA releases in favor of machine generated tokens. It uses a different token for the worker vs the manager instead of auto accepting into one role based on a single secret. And tokens may be rotated at any time for either role separately.
$ docker swarm join-token --help
Usage: docker swarm join-token [-q] [--rotate] (worker|manager)
Manage join tokens
Options:
--help Print usage
-q, --quiet Only display token
--rotate Rotate join token
$ docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-36nk7elqghdm34gcnd7dan7ss4wa41wmj8wdifnif4xl7c1wsr-aikk431lw4xnh3q9i1b65i75u \
192.168.2.3:2377
$ docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-36nk7elqghdm34gcnd7dan7ss4wa41wmj8wdifnif4xl7c1wsr-cieriqr5lp7hza3aa65vjarty \
192.168.2.3:2377
$ docker swarm join-token --rotate manager
Successfully rotated manager join token.
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-36nk7elqghdm34gcnd7dan7ss4wa41wmj8wdifnif4xl7c1wsr-4pnhnqrsgzljl8n19z4jh5dxx \
192.168.2.3:2377
Change swarm commands to ps instead of tasks
Pull request #25140
If you checked some of the early releases, you may have seen commands like docker service tasks. These have now changed from tasks to ps in docker service, docker node, and docker stack (stack is still experimental). From the pull requests, there are hints that "tasks" may be reused in a future release, so I'm avoiding the term:
Rather than conflict with the unexposed task model, change the names of the object-oriented task display to docker <object> ps. The command works identically to docker service tasks. This change is superficial.
This provides a more sensical docker experience while not trampling on the task model that may be introduced as a top-level command at a later date.
$ docker service create --name pause busybox tail -f /dev/null
$ docker service ls
ID NAME REPLICAS IMAGE COMMAND
d8adqe8izzrt pause 1/1 busybox tail -f /dev/null
$ docker service ps pause
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
bble60yby7bffjem9r76p4ho4 pause.1 busybox docker-demo Running Running 7 seconds ago
$ docker node ps self
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
bble60yby7bffjem9r76p4ho4 pause.1 busybox docker-demo Running Running 24 seconds ago
Added “docker stack” commands to experimental
Pull request #23522
For those on the experimental releases, you have a docker stack command. This, combined with docker-compose bundle, is docker's version of compose inside the new swarm. It allows a group of services to be deployed as a single "stack". DAB files, or bundles, are a subset of compose and currently don't support volumes or networks, so those currently using compose may want to hold off on using the new swarm mode for these tasks. The DAB format is also considered experimental and likely to change in the future.
Volumes
Changes to volumes were relatively minor in this release, mostly focused around features needed by volume drivers and bug fixes.
Volume Scopes
Pull request #22077
Volume scopes give a volume driver the ability to identify itself as "local" or "global". Local volumes affect a single docker host, while global volumes are used across an entire swarm. This is similar to the network scope capability and is visible in the docker volume inspect output.
$ docker volume inspect test
[
{
"Name": "test",
"Driver": "local",
"Mountpoint": "/home/var-docker/volumes/test/_data",
"Labels": null,
"Scope": "local"
}
]
Support name and driver filters
Pull request #21361
Docker added support to filter volumes by name and driver. This includes partial name matches using regex patterns. The below example shows a listing of all volumes whose names end in "certs".
$ docker volume ls -f name=.*certs
DRIVER VOLUME NAME
local dm-certs
local sm-certs
Other changes to volumes
Other changes to volumes include:
- Add a status field for volume drivers to report back in inspect (pull request #21006)
- Generate a unique id for mount/umount operations to differentiate callers (pull request #21015)
- Cleanup volume mounts when container fails to start (pull request #22103)
- Create missing host paths on Windows, mirroring the existing Linux behavior (pull request #22094)
Deprecation
Docker had previously announced that many old behaviors and syntaxes were deprecated and would be removed in 1.12. Each has had a replacement and a long transition period.
CLI changes
- Previously deprecated environment variables DOCKER_CONTENT_TRUST_OFFLINE_PASSPHRASE and DOCKER_CONTENT_TRUST_TAGGING_PASSPHRASE have been removed, use DOCKER_CONTENT_TRUST_ROOT_PASSPHRASE and DOCKER_CONTENT_TRUST_REPOSITORY_PASSPHRASE instead. Pull request #22574
- Previously deprecated options syslog-tag, gelf-tag, and fluentd-tag have been removed; instead use --log-opt tag=name to change from the default container id tags included in logfile entries. Pull request #22620
- The force option on docker tag has been removed; docker tag now defaults to forcing all tag operations, making it consistent with other commands. Pull request #23090
- The docker ps options --before and --since have been removed. Use docker ps --filter=before=... and --filter=since=... instead. Pull request #22138
- The three argument form of docker import file repository tag has been removed; instead use a colon between the repository and tag: docker import file repository:tag. Pull request #23273
API changes
- The previously deprecated feature of passing HostConfig on container start has been removed in API version 1.24; this value must be passed on container create instead. Pull request #22570
- The API endpoint /containers/(id or name)/copy has been removed; use /containers/(id or name)/archive instead. Pull request #22149
Other stuff
To channel my inner Billy Mays, “but wait, that's not all!” At the same time Docker announced 1.12 at DockerCon 2016, they also released a few other things that aren't listed in the release notes.
- Docker for Windows is now stable, using HyperV
- Docker for Mac is now stable, using xhyve
- Docker for AWS and Azure are entering beta, based on Docker 1.12
- Docker Store is entering beta, which is a commercial/paid image registry
All of this leaves me thinking that this is a great time to be working with Docker and that it's going to be tough keeping up with how much they change in these releases. In fact, the 1.12.1 release was already made before we had a chance to finish up this three part series.