Since I started working as a computer vision engineer, I have had the chance to experience how Docker works. All the experience I have accumulated so far is distilled in here. Consider this a 101 Docker tutorial to equip yourself with a solid basic understanding and some hands-on experience.
We have all been there. You learn something that triggers your curiosity. You can’t wait to test it out on a real project. But life happens, and you hide your new shiny knowledge in the back of your skull.
“In theory, theory and practice are the same. In practice, they are not”
These words by Benjamin Brewster have resonated with me for some weeks now. I invite you to follow the steps described here. You can gain in a matter of 10 minutes the practical experience that took me a couple of months to acquire.
Time to get our hands dirty.
Before jumping into the practical side, let’s get the bare minimum to have a clear idea about Docker.
Docker executes processes in isolated environments known as containers. It doesn’t matter whether it runs on your local host or on a server. This means you can install an operating system (OS), libraries and packages in a container. Your container is, therefore, like a mini-computer separated from your real computer.
This is far from formal, but it is useful to know what you can do with a container. The learning curve is fast and encompasses multiple applications in your real-life projects.
Just think about the following situations:
- You are working on Windows, but you need a different OS to develop a package.
- You want to install some libraries in isolation and run some code, but you want to avoid compatibility issues.
- You want to upload some scripts to the cloud, but those scripts rely on a specific OS and specific libraries.
All those situations (and more) can be solved using Docker. Fine, containers are pretty powerful, but how do you create a container in the first place?
Just like every treasure has a map, every container has an image. More specifically, a Docker image: a file comprised of layers with all the necessary information to generate a container. The process of generating a Docker image can start from scratch or from pre-existing Docker images.
When you define a Docker image from scratch, you write a set of instructions that describes how the image will behave. The place where you insert those instructions is a file called Dockerfile. One interesting fact about the Dockerfile: it is a text file with no extension.
The following diagram shows the stages to start a Docker container:
Sometimes, you don’t need to define and customize a Dockerfile. For instance, you may require just a basic image to start your container. That’s where Docker Hub can become a suitable resource for your work. Once you have registered on the platform, you can pull (download) and push (upload) Docker images. In that sense, Docker Hub is somewhat similar to GitHub.
Docker Hub facilitates your process of operating with Docker, as it reduces the number of stages:
Implementing a Docker container will teach you some of the steps I follow when working on machine learning projects. I have decided to use the first diagram to provide a panoramic perspective of Docker.
Let’s recreate a realistic situation with some constraints to think, understand and apply containers.
Your manager or client asks you to try a certain machine learning algorithm, delivered in a script.py file. The algorithm will be part of a software solution. If you approve the algorithm, a production team will deploy the software to serve a certain high-potential market.
Before your excitement goes through the roof, your manager warns you about some conditions:
- The algorithm requires you to work with Ubuntu 16.04 and python version 3.5.2
- The algorithm is based on pandas 0.24.2 and numpy 1.16.4
- After you try the algorithm, your manager wants to try it as well
- If successful, your team will deploy the algorithm in the public cloud and scale it according to demand
- You are going to work on your brand-new laptop that comes with Ubuntu 20.04 and python 3.8.6 installed
Fine, so how do you approach the situation? Well, probably the best way to start is realising that you can’t directly launch script.py on your laptop. This is because your OS and python version are different from the ones required by the algorithm.
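To see why this matters in practice, a small version guard at the top of a script can surface the mismatch early. The snippet below is a hypothetical sketch, not part of the original script.py; the function name and constants are my own:

```python
import sys

# Required runtime for the algorithm (from the manager's constraints)
REQUIRED_PYTHON = (3, 5)

def matches_required(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True when the interpreter's major.minor matches the requirement."""
    return (version_info[0], version_info[1]) == required

if not matches_required():
    # On the Ubuntu 20.04 laptop (python 3.8.6) this branch would fire
    print("Warning: this algorithm expects python %d.%d" % REQUIRED_PYTHON)
```

On your laptop the guard would warn you immediately, confirming that the environment, not the code, is the first problem to solve.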
Several solutions are available when it comes to fixing software compatibility issues.
A popular strategy in python is to use virtual environments. They share the isolation feature of containers, as they are separated from the rest of your device. In a virtual environment, you can work with whatever version of python you wish. Although this may sound good, you can’t use virtual environments to test the machine learning algorithm. This is because you would still be working with Ubuntu 20.04, and we need Ubuntu 16.04.
Another potential solution is a virtual machine (VM). A VM lets you install Ubuntu 16.04 and python 3.5.2 on your laptop. However, it has some drawbacks for this particular situation. Namely, it is not portable enough to be easily transferred to other devices (your manager’s laptop) or deployed to the cloud.
The best solution in this case is probably a Docker container. In case you don’t have Docker installed on your computer, you can follow the instructions in the official page.
Let’s start to write some code.
The first stage is to write our Dockerfile:
FROM ubuntu:16.04
RUN apt-get update &&\
    apt-get install -y python3 &&\
    apt-get install -y python3-pip &&\
    python3 -m pip install -U pip
COPY requirements.txt /opt/app/requirements.txt
WORKDIR /opt/app
RUN pip install -r requirements.txt
The first line pulls the Ubuntu 16.04 OS as the base image for the subsequent instructions. We are basically setting the foundation of our container with the FROM instruction.
Next, up to the fifth line, we execute some commands using the RUN instruction. In the second line we update the package index with apt-get, the command-line tool for handling packages in Ubuntu-based Linux distributions. Right after that we install python 3.5.2, and in the following line we also install pip, the package installer for python. Finally, in the fifth line we upgrade pip to its latest available version.
The sixth line copies requirements.txt, which contains the required libraries of the algorithm, into the path /opt/app/requirements.txt that will be created inside the Docker container.
The seventh line sets the working directory to /opt/app path inside the container. This will be the default path inside the container once we start it.
The last line runs pip install with the -r flag, which reads the requirements.txt file. This triggers the installation of pandas 0.24.2 and numpy 1.16.4 in the Docker container.
Below you can see the content of requirements.txt:
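Given the version constraints from the assignment, a minimal requirements.txt pins both libraries (the == pinning syntax below is my assumption about how the file is written):

```
pandas==0.24.2
numpy==1.16.4
```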
Now, it is all ready to create our Docker Image! Make sure you know the path to your Dockerfile and requirements.txt and then change your terminal directory. In my case, I have this path as mypath/docker but you can choose whatever you want.
mylaptop:~mypath/docker$ docker build .
We build the Docker image using the build command. The dot symbol (.) at the end tells Docker to use the current path as the build context; the build command takes the Dockerfile and requirements.txt from there and builds our image. You can tag your image using the following:
mylaptop:~mypath/docker$ docker build -t coolimage:1.0 .
The -t flag gives the image the repository name coolimage and the tag 1.0
When you run this command, you will see a bunch of steps being executed in your terminal:
You can check if your image was created successfully by looking at the bottom message displayed on the terminal:
This means that our image ID is 8039bb5fb541 and we also tagged it as coolimage:1.0
The only thing left to do is to get our container up and running. There are some good practices that you can apply to have a more versatile container:
- Port mapping: to connect some ports between your container and the host where it is running
- Set your docker run [options]: while there are numerous options (see the official docker run reference), one of the most important choices is between detached mode, where the container runs in the background and exits when its root process ends, and foreground mode, which starts the process in the container and attaches a pseudo-terminal, also known as tty or pts.
- Attach some volumes to the container: this lets you share directories between your local device/server and your container. It is more efficient than just copying folders and files into your container
- Name your container: instead of keeping a long and boring default container ID such as 46d5e125776f, why not call it something like happyhippo
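Port mapping is the only practice from this list that our assignment does not need. Purely as an illustration, a hypothetical run that exposes a port with the -p flag could look like this (the 8888:8888 mapping is just an example, not something the assignment requires):

```
docker run -ti -p 8888:8888 coolimage:1.0
```

The number on the left of the colon is the host port, and the one on the right is the container port it forwards to.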
Now, we type the following command in our terminal:
$ docker run -ti -v /home/victor/tutorials/ml:/container_ml --name happyhippo coolimage:1.0
docker run takes an image (for example coolimage:1.0) and derives the container from it. The -ti flag runs the process in the foreground interactively and attaches a pseudo-terminal where you can see input and output streams.
-v /home/victor/tutorials/ml:/container_ml maps the host directory on the left of the colon (:) to the container directory on the right. All files in the host directory are now visible and accessible from your container in /container_ml. The script.py file is stored in that host directory on the laptop.
happyhippo is a recognisable name for your container, in addition to the container ID 46d5e125776f that is generated automatically.
Congrats! You have just created a Docker container that meets all the conditions set by your manager. For instance, you can check the python version installed in the container:
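Inside the container’s terminal, the check looks like this (the prompt shows the container ID generated earlier, and the version reported comes from the Ubuntu 16.04 python3 package):

```
root@46d5e125776f:/opt/app# python3 --version
Python 3.5.2
```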
You can write some commands in the terminal of the container.
You can see that requirements.txt file was copied in /opt/app which is the work directory of the container. We defined this in our Dockerfile.
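For example, listing the working directory confirms the copy step from the Dockerfile:

```
root@46d5e125776f:/opt/app# ls
requirements.txt
```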
By changing the directory to container_ml/ we can see that the script.py file we mounted in the volume is now inside our container. This means we can execute the algorithm and test it.
root@46d5e125776f:/container_ml# python3 script.py
The diagram below describes the main commands that we have used to solve the assignment from your manager:
I have tried to cover the most common basic operations with Docker. Sometimes you may need extra tricks to make your day-to-day work smoother. Let’s see some examples.
List and remove images and containers
It’s likely that you end up with several images and containers when you work with Docker. That’s why you want to list and sometimes remove images and containers.
To list images and containers:
# list all images (including intermediate images)
docker image ls -a

# list all containers
docker container ls -a

# list running containers
docker container ls
When containers are listed, you will see a STATUS column indicating the current state of each container.
To remove images and containers:
# remove specific image
docker image rm IMAGE_ID

# remove specific container
docker rm CONTAINER_ID

# remove all unused images
docker image prune

# remove all stopped containers
docker container prune
Just bear in mind that containers are built upon images. Therefore, you should remove containers before removing images.
Restart container and save changes
I have faced situations at work where I needed to store changes in my container, so that the next day I could resume my progress where I left it. Imagine for a moment that you have managed to gather some external data to feed our machine learning algorithm. You left the data inside the container via the mounted volume, and tomorrow you will train the algorithm using that data.
You come back to the office with your newly acquired Docker skills and type:
docker container ls -a
The status of the container is exited, but we want the container up and running. To do so, we follow a two-step process: first we start the container and then we attach a terminal to it:
docker start happyhippo
docker attach happyhippo
Now, you have the container up back again along with all the changes you have made so far.
Information about container and images
Containers and images include many different details about their features. There is a lot happening beneath the surface!
# information about container
docker inspect CONTAINER_ID

# information about image
docker inspect IMAGE_ID
You will get a lot of information in JSON format, such as ports, mounted volumes and so on. Sometimes I have used the inspect command to find the IP address of a certain container. To do it, just make sure that the status of the container is up and then use the inspect command. If you scroll down to NetworkSettings you will find the IP address.
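Because the inspect output is plain JSON, you can also grab the IP address programmatically instead of scrolling. Below is a sketch against a trimmed, illustrative fragment of the output (the real document is much larger; the field names follow Docker’s inspect schema):

```python
import json

# Trimmed, illustrative fragment of `docker inspect CONTAINER_ID` output
inspect_output = """
[
  {
    "Id": "46d5e125776f",
    "NetworkSettings": {
      "IPAddress": "172.17.0.2"
    }
  }
]
"""

def container_ip(raw_json):
    """Extract the IP address from docker-inspect style JSON."""
    data = json.loads(raw_json)  # inspect returns a list of objects
    return data[0]["NetworkSettings"]["IPAddress"]

print(container_ip(inspect_output))
```

On the command line, `docker inspect -f '{{.NetworkSettings.IPAddress}}' happyhippo` achieves the same result without parsing the full JSON yourself.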
I only became confident with Docker after practicing the steps shown in the previous diagram several times: first define the Dockerfile, then create an image and finally generate a container. This article is meant to give you a taste of the whole process.
I encourage you not only to follow along, but to start a container yourself (visit Docker Hub for some inspiration if necessary). That exercise will put you one step closer to mastering Docker.
Feel free to check my other article(s)!