Docker and User Namespaces by Trial and Error

February 22, 2016 - 5 minutes read - 892 words

Included in the recent release of Docker 1.10 is a feature destined to become more important with future releases: support for user namespaces. At the moment, it’s not enabled in a fresh install, and it still feels a little bleeding edge compared to more established Docker features, but it does work and is worth getting to know.

I spent a little time getting familiar; by no means enough to claim expertise, but enough to make it work. Hopefully the fact that it’s new to me will make it easier for me to explain to others, since I hit some obstacles on the way to getting it to work.

I’m assuming that if you’re reading this, you may have seen one of the excellent pages on user namespaces in Docker. Very briefly, the idea is to map the “root” user in the container to be some normal (unprivileged) user on the host system. This allows us to prevent containers from modifying files on the host, even with mapped volumes, which allows closing other security holes that allow containers to improperly obtain privileges on the host.

I started with a fresh install of Ubuntu Wily; however, despite this being the latest, it doesn’t have a very new Docker in the default set of packages. So we need to move on to using Docker’s own repository. In my Ansible playbook, this looks like this:

  - name: docker apt key
    apt_key: keyserver=keyserver.ubuntu.com id=F76221572C52609D
  - name: install docker repo
    apt_repository: repo='deb http://apt.dockerproject.org/repo ubuntu-wily main'
  - name: Install packages
    apt: pkg=docker-engine state=installed update_cache=yes

Note that the package name changed recently from ‘docker.io’ to ‘docker-engine’.

With that done, and the Docker service started, we now have:

root@penguin64:~# docker -v
Docker version 1.10.1, build 9e83765

However, this installation does not have user namespaces enabled. To enable it, we need to pass an argument to the Docker daemon. Here’s the first place where there is a potential for confusion. On Ubuntu, there is a file /etc/default/docker with some content; however, this file is not used now that Docker has switched over to running services with systemd. Instead, the expected way to handle it is to create a “drop-in”. Systemd takes configuration files from /lib/systemd, but it also looks in /etc/systemd for files that override the defaults. This is a nice feature in that it avoids the issue of having a package manager not be able to update a file because it’s been customized.

The convention with systemd is to create an override directory for each service. Since the Docker configuration file lives in /lib/systemd/system/docker.service, this means a directory called /etc/systemd/system/docker.service.d. All *.conf files in this directory will override anything in the default configuration file.

---
layout: post
title: "mkdir -p /etc/systemd/system/docker.service.d"
description: ""
category: articles
tags: []
---
---
layout: post
title: "cat >/etc/systemd/system/docker.service.d/userns.conf <<EOD"
description: ""
category: articles
tags: []
---
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// --userns-remap=default
EOD
---
layout: post
title: ""
description: ""
category: articles
tags: []
---

The first ExecStart= clears out the default value, since systemd supports multiple processes in a single service for some service types. The second replaces the default value with the command we want. Getting that command right was itself a little painful, since docker daemon --help in 1.10 isn’t terribly verbose when it comes to identifying what kind of parameter is expected for --userns-remap. (That documentation issue has been fixed in latest master.)

Of course, there are other options besides default, but the default worked for my purposes; it remaps into the ’nobody’ user on the host.

With this file in place, we need to reload systemd, then docker:

---
layout: post
title: "systemctl daemon-reload"
description: ""
category: articles
tags: []
---
---
layout: post
title: "systemctl restart docker"
description: ""
category: articles
tags: []
---

Here’s where I hit the second obstacle. I tried running a Docker image, only to find out that I had no images. When switching to a separate namespace, Docker creates a directory under /var/lib/docker for the namespace:

vagrant@penguin64:~$ sudo ls -l /var/lib/docker
total 32
drwx------ 9 296608 296608 4096 Feb 23 00:59 296608.296608
drwx------ 2 root   root   4096 Feb 12 21:09 containers
drwx------ 5 root   root   4096 Feb 23 00:42 devicemapper
drwx------ 3 root   root   4096 Feb 12 21:09 image
drwxr-x--- 3 root   root   4096 Feb 12 21:09 network
drwx------ 2 root   root   4096 Feb 23 00:43 tmp
drwx------ 2 root   root   4096 Feb 12 21:09 trust
drwx------ 2 root   root   4096 Feb 12 21:09 volumes

No big deal, just had to pull the image I wanted again.

Finally, we can get down to starting a container and seeing the effect of namespaces:

vagrant@penguin64:~$ docker run -it --rm centos /bin/bash
[root@65fd7566b552 /]# whoami
root

So inside the container, it still thinks of itself as root. But root inside the container is not root on the host system:

vagrant@penguin64:~$ docker run -it --rm -v /opt:/opt centos /bin/bash
[root@690e37988416 /]# ls -ld /opt
drwxr-xr-x 2 65534 65534 4096 Nov  6 21:38 /opt
[root@690e37988416 /]# touch /opt/file1
touch: cannot touch '/opt/file1': Permission denied

And this means that it is no longer possible to use a SUID trick to root the host.

Right now, this feature is limited so that all containers on a host share the same namespace. On the roadmap is supporting per-container namespaces, allowing finer control over what each container can access on the host.