Volumes and Dockerfiles Don’t Mix
Docker volumes are a popular topic, particularly on the forums and Q&A sites I follow. Much of the confusion I see there comes from the many different ways to create a volume. From these interactions, I've settled on a best practice of my own:
Don't Create Volumes Inside a Dockerfile
When creating an image, specifying your volumes in the image seems like a basic step that better defines how your image should be used. However, this practice has no real upside and creates plenty of problems.
Issue #1: Anonymous Volumes
Defining a volume inside the image tells Docker to store that directory's data separately from the rest of the container, even if you don't define the volume when you start the container. Docker does this by creating a local volume without a human-readable name: its name is a long, unique ID string that contains no reference to the image or container it's attached to. And unless you explicitly tell Docker to remove volumes when you remove the container, these volumes remain, unlikely to ever be used again:
$ docker volume ls
DRIVER VOLUME NAME
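To see this for yourself, here is a minimal sketch, assuming a local Docker Engine and the public alpine image; the image name `volume-demo` and container name `vd` are hypothetical. It builds an image that declares a volume, runs a container without any `-v` flag, and shows that the anonymous volume outlives the container unless it is removed with `docker rm -v` (or the container is started with `--rm`).

```shell
#!/bin/sh
# Sketch: show the anonymous volume created by a VOLUME line in a Dockerfile.
# Assumes a local Docker Engine; "volume-demo" and "vd" are hypothetical names.
demo_dir=$(mktemp -d)

# An image that declares /data as a volume.
cat > "$demo_dir/Dockerfile" <<'EOF'
FROM alpine:3.19
VOLUME /data
EOF

docker build -t volume-demo "$demo_dir"

# Run a container without -v: Docker still creates a volume for /data,
# named with a long random ID.
docker run --name vd volume-demo true
docker inspect --format '{{range .Mounts}}{{.Name}} -> {{.Destination}}{{"\n"}}{{end}}' vd

# Removing the container alone leaves the volume behind as "dangling".
docker rm vd
docker volume ls --filter dangling=true

# Ways to avoid the leftovers next time:
#   docker rm -v <container>   removes the container's anonymous volumes too
#   docker run --rm ...        does the same automatically when it exits
# Volumes that are already orphaned can be cleaned up with: docker volume prune
```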
In environments running many images, tracking down where these anonymous volumes come from can quickly turn into a search through every image in use, and sometimes every docker-compose.yml as well.
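When you do need to track one down, a sketch like the following may help. It assumes a local Docker Engine, and the helper names `images_with_volumes` and `containers_using_volume` are hypothetical. The first lists every local image whose config declares volumes; the second lists containers, running or stopped, that still mount a given volume. Once the owning container has been removed, the second check comes back empty, which is exactly why orphaned anonymous volumes are so hard to trace.

```shell
#!/bin/sh
# Hypothetical helpers for tracing anonymous volumes; assumes a local Docker Engine.

# Print each local image that declares volumes, along with the declared paths.
images_with_volumes() {
  for id in $(docker image ls --quiet | sort -u); do
    docker image inspect --format '{{.RepoTags}} {{json .Config.Volumes}}' "$id"
  done | grep -v ' null$'
}

# Print the containers (including stopped ones) that mount the given volume.
containers_using_volume() {
  docker ps --all --filter "volume=$1" --format '{{.Names}} ({{.Image}})'
}

# Example usage:
#   images_with_volumes
#   containers_using_volume <volume-name>   # a name taken from `docker volume ls`
```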
Issue #2: You Cannot Modify a Volume Once Created, Sometimes
Docker's documentation states that once a volume has been declared in a Dockerfile, later changes to it during the build are discarded:
Note: If any build steps change the data within the volume after it has been declared, those changes will be discarded.
In practice, this is sometimes true, but not always. The inconsistency has caused plenty of confusion: someone builds an image and then finds files missing from the volume directory that they know they added during the build. The fix is easy enough: move the volume definition to the end of your Dockerfile, unless…
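Here is a minimal sketch of that fix, assuming a local Docker Engine; the `/data` path, the `settings.conf` file, and the `seeded-volume` tag are hypothetical. Every step that writes into `/data` runs before the `VOLUME` line, so none of those writes depend on how the builder treats changes to an already-declared volume. The final `docker run` lists the directory so you can confirm the files made it into the image.

```shell
#!/bin/sh
# Sketch: declare VOLUME only after all build steps that write to it.
# Assumes a local Docker Engine; the tag, file, and paths are hypothetical.
build_dir=$(mktemp -d)
echo "default settings" > "$build_dir/settings.conf"

cat > "$build_dir/Dockerfile" <<'EOF'
FROM alpine:3.19
# Populate /data first...
RUN mkdir -p /data && echo "seed" > /data/seed.txt
COPY settings.conf /data/settings.conf
# ...and declare it as a volume last, so none of the writes above are at risk.
VOLUME /data
EOF

docker build -t seeded-volume "$build_dir"

# A new anonymous volume is initialized from the image's /data contents;
# --rm removes both the container and that volume when it exits.
docker run --rm seeded-volume ls /data
```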
Issue #3: Derived Images Cannot Modify the Volume Either
If you are unlucky enough to base your image on another that declares a volume in its Dockerfile, there's no reliable way to update the contents of that volume. The choices are very limited and not pretty. They include changing the application to use a different directory for its data, writing an entrypoint that modifies the directory each time the container is started, or recreating and maintaining your own copy of the upstream image without the volume definition. This last option tends to be the most common solution, but it means losing some of the benefit of shared image layers and taking on more maintenance to keep your images up to date.
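The entrypoint workaround can be sketched as follows, assuming a local Docker Engine. The `upstream-with-volume` image stands in for a hypothetical upstream image that declares `VOLUME /data`; the derived image keeps its own files in a separate, hypothetical `/defaults` directory and uses a small entrypoint script to copy them into `/data` every time a container starts.

```shell
#!/bin/sh
# Sketch: working around a VOLUME declared in an upstream image.
# Assumes a local Docker Engine; all image names and paths are hypothetical.
work_dir=$(mktemp -d)
mkdir -p "$work_dir/upstream" "$work_dir/derived"

# Stand-in for an upstream image that declares a volume.
cat > "$work_dir/upstream/Dockerfile" <<'EOF'
FROM alpine:3.19
VOLUME /data
EOF
docker build -t upstream-with-volume "$work_dir/upstream"

# Entrypoint that copies defaults into /data on every container start
# (overwriting files with the same name), then runs the real command.
cat > "$work_dir/derived/entrypoint.sh" <<'EOF'
#!/bin/sh
set -e
mkdir -p /data
cp -r /defaults/. /data/
exec "$@"
EOF
chmod +x "$work_dir/derived/entrypoint.sh"

# The derived image keeps its files outside the inherited volume path.
cat > "$work_dir/derived/Dockerfile" <<'EOF'
FROM upstream-with-volume
RUN mkdir -p /defaults && echo "custom" > /defaults/config.txt
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["ls", "/data"]
EOF
docker build -t derived-image "$work_dir/derived"

# config.txt appears in /data because the entrypoint copied it at startup.
docker run --rm derived-image
```

The trade-off is that the copy runs on every start, so any changes a user makes to those specific files inside the volume are overwritten each time.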
You Can Use Volumes Without Defining Them Inside the Image
All of these issues raise the question: “If I shouldn't define volumes inside the Dockerfile, how can I use volumes in Docker?” The answer is to define them when you run the container, in your docker-compose.yml or with docker run. Not only can you define your volume there, but you can also give it a name, select a volume driver, or map a directory from the host. You can make any folder inside an image a volume when you run the container, regardless of whether the image defines that directory as a volume.
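A few examples of defining volumes at run time instead, assuming a local Docker Engine with the Compose plugin (older installs use the standalone `docker-compose` command). The names `app-data`, `app-cache`, `/var/lib/app`, and `./config` are hypothetical; the tmpfs options are mount options of Docker's built-in `local` volume driver.

```shell
#!/bin/sh
# Sketch: define volumes when the container runs, not in the image.
# Assumes a local Docker Engine plus Compose; all names are hypothetical.
proj_dir=$(mktemp -d)
cd "$proj_dir" || exit 1
mkdir -p config

# A named volume mounted on a directory the image never declared as a volume.
docker volume create app-data
docker run --rm -v app-data:/var/lib/app alpine:3.19 ls /var/lib/app

# A volume created with an explicit driver and options (local driver on tmpfs).
docker volume create --driver local \
  --opt type=tmpfs --opt device=tmpfs --opt o=size=64m app-cache

# A host directory bind-mounted read-only.
docker run --rm -v "$PWD/config:/etc/app:ro" alpine:3.19 ls /etc/app

# The same ideas in a docker-compose.yml. Compose creates its own
# project-prefixed volume for app-data unless it is marked external.
cat > docker-compose.yml <<'EOF'
services:
  app:
    image: alpine:3.19
    command: ["sleep", "3600"]
    volumes:
      - app-data:/var/lib/app
      - ./config:/etc/app:ro
volumes:
  app-data:
EOF
docker compose up -d
# Clean up later with: docker compose down -v
```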
So go ahead and use volumes; they are a great feature of Docker. Just don't define them inside your Dockerfile.