Mar 28, 2019 02:48
Docker containerization is presently all the rage, and I have been using it for some time on Linux. However, for a long while, I never really bothered to find out how Docker containerization is done, even at a high level. Recently, I decided to read more about this topic, and here is a very brief summary of what I learned.
Containerization is not virtualization. A virtual machine lets you run Windows under Linux and vice versa; Docker does not. In Docker, you can only start Linux containers on Linux hosts, and Windows containers on Windows hosts, because the host's operating system kernel is shared with the containers. Although Docker for Windows makes it appear that you can run Linux containers on Windows, it actually achieves this by running these containers inside a Moby Linux virtual machine.
Docker uses several mechanisms available in the kernel to provide isolation. On Linux, it uses namespaces and control groups (cgroups) to achieve this. The namespaces include process ID, network, inter-process communication, mount, and cgroup. On Windows, namespace, resource control, and process isolation technologies are applied in a manner similar to Linux. These are provided as part of Hyper-V, although container creation does not require the creation of virtual machines. The remainder of this post is about Linux namespaces.
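To see which namespaces a process belongs to, Linux exposes one symbolic link per namespace under /proc/<pid>/ns. The following is a minimal Python sketch, assuming a Linux machine with /proc mounted. The helper names parse_ns_link and list_namespaces are my own and not part of Docker or any library.

```python
import os
import re

NS_DIR = "/proc/self/ns"  # Linux-only; assumed to exist where this runs


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(ns_dir=NS_DIR):
    """Return {entry name: inode} for the namespaces of the current process."""
    result = {}
    if not os.path.isdir(ns_dir):
        return result  # not Linux, or /proc is not mounted
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
            result[name] = parse_ns_link(target)[1]
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
    return result


if __name__ == "__main__":
    for name, inode in list_namespaces().items():
        print(f"{name:20} {inode}")
```

Two processes that report the same inode for an entry share that namespace. Running the script on the host and again inside a container should therefore show different inodes for the namespaces that Docker isolates.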
The process ID namespace hides other processes on the host from the container, and lets the container have its own process with ID 1. Note that each process ID in a container is mapped to a different process ID on the host. If the container is started with --pid=host, the process ID namespace is not used, and the container can see all processes running on the host. This has security implications if untrusted processes are run in the container.
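The mapping between container and host process IDs can be observed on kernels 4.1 and later through the NSpid line of /proc/<pid>/status, which lists the ID of a process in each PID namespace it belongs to, outermost first. Below is a small Python sketch that parses that line. The function name parse_nspid and the sample text are made up for illustration; the sample is not real output.

```python
def parse_nspid(status_text):
    """Return the PIDs on the NSpid line of a /proc/<pid>/status dump.

    The list runs from the outermost visible PID namespace to the innermost.
    An empty list means the line is missing (for example, an older kernel).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


# Hypothetical sample: a container's first process as seen from the host.
SAMPLE_STATUS = "Name:\tnginx\nPid:\t12345\nNSpid:\t12345\t1\n"

if __name__ == "__main__":
    print(parse_nspid(SAMPLE_STATUS))  # prints [12345, 1]
```

In this made-up sample, the process is 12345 on the host and 1 inside the container. A process that is not in a nested PID namespace has a single entry on that line.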
Similar overviews can be given for the other namespaces. For example, the network namespace allows each container to see its own eth0 interface, which is distinct from the eth0 interface on the host. Similarly, the mount namespace makes each container appear to have its own set of mount points.
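One way to check what the network namespace exposes is to read /proc/net/dev, which lists only the interfaces of the namespace the reading process is in. The Python sketch below parses that file's format. The function name parse_interfaces is my own, and the sample text is invented rather than captured from a real system.

```python
def parse_interfaces(net_dev_text):
    """Return interface names from text in the /proc/net/dev format."""
    names = []
    # The first two lines are column headers; each later line is 'name: counters'.
    for line in net_dev_text.splitlines()[2:]:
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip())
    return names


# Invented sample in the /proc/net/dev layout (counters shortened).
SAMPLE_NET_DEV = (
    "Inter-|   Receive                 |  Transmit\n"
    " face |bytes packets errs drop    |bytes packets errs drop\n"
    "    lo:  1000      10    0    0     1000      10    0    0\n"
    "  eth0:  2000      20    0    0     3000      30    0    0\n"
)

if __name__ == "__main__":
    print(parse_interfaces(SAMPLE_NET_DEV))  # prints ['lo', 'eth0']
```

On a real system, you would pass in the contents of /proc/net/dev. Doing so on the host and inside a container should give different lists unless the container shares the host's network namespace.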
Finally, control groups, or cgroups, allow resource limits and isolation to be applied to a collection of processes rather than to a single process. This feature readily lends itself to containerization technologies, Docker or otherwise.
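The cgroups that a process belongs to are listed in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. Here is a short Python sketch that parses this format. The names CgroupEntry and parse_cgroup are mine, and the /docker/abc123 path in the sample is a placeholder, not a real container ID.

```python
from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: tuple
    path: str


def parse_cgroup(text):
    """Parse text in the /proc/<pid>/cgroup format into CgroupEntry records."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice, so colons inside the path are preserved.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = tuple(c for c in controllers.split(",") if c)
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


# Placeholder sample: one cgroup v1 line and one cgroup v2 line.
SAMPLE_CGROUP = "12:cpu,cpuacct:/docker/abc123\n0::/docker/abc123\n"

if __name__ == "__main__":
    for entry in parse_cgroup(SAMPLE_CGROUP):
        print(entry.hierarchy_id, entry.controllers, entry.path)
```

A cgroup v2 line always has hierarchy ID 0 and an empty controller list, which is why the second record has no controllers. Every process placed in the same cgroup path shares whatever limits are set on that group.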