Dockerfile

Why Docker?

  • Docker is great for quick and reproducible environments: isolated from your development machine and captured as code.
  • Pack your app up in a Docker image and share it – either via a public registry, or a private one if only a select few people should be able to access it.
  • If your app is containerized, anybody can run it on their machine, no matter which operating system they use – as long as Docker is installed.
  • A dockerized app is self-contained. The way to generate the image is saved as code, and is reproducible.
  • A Docker image is a great build artifact. Everything is packaged up and ready to go, (in the best case) just waiting for the correct configuration values to be passed to it.
  • Docker helps by capturing dependencies and environments as code, and by making it easy to start from a clean slate. When your environments are automated and reproducible by design, it’s harder for things to be forgotten or misconfigured.
  • Docker makes it possible to use popular orchestrators (like Kubernetes or Nomad).
  • Docker makes it possible to limit the amount of RAM and CPU a container can use.
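The last point can be tried out directly with docker run; the flags below are standard Docker CLI options, and the image name is just an example:

```
# Cap the container at 512 MiB of RAM and one CPU core
docker run --memory=512m --cpus=1 my-app:latest
```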

Summary of this article: https://vsupalov.com/docker-solve-problems/

Create the Dockerfile

Tutorial & References:

https://docker-curriculum.com/

https://www.simplilearn.com/tutorials/docker-tutorial/what-is-dockerfile

https://takacsmark.com/dockerfile-tutorial-by-example-dockerfile-best-practices-2018/

If there’s a ready-made official image available, I would opt for using it instead of writing my own. You can find the prebuilt official images here:

https://hub.docker.com/search?image_filter=official&q=

Choose the base image (FROM command)

Choose a base image that has the tooling you’re comfortable with and that supports the changes you’ll be making on top of it.

  • Ubuntu/Debian has apt for a package manager, Alpine uses musl libc and has apk

Regarding the size of the base image, there are different reasons to choose a smaller one. Some advantages are:

  • Smaller in size (build, pull and push are fast)
  • Takes up little disk space compared to a large image
  • The OS itself consumes less memory than a full distribution like CentOS
  • Alpine is considered secure and fast
  • Alpine is an official image on the Docker registry

If your project is deep learning based, using miniconda, you can find the base image with verified publisher here: https://hub.docker.com/u/continuumio

If it is a Python project, you can use the python image directly: https://hub.docker.com/_/python
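A minimal sketch of how the FROM line looks with the official python image (the tag is just an example – pick the version your project needs):

```
# slim variants are Debian-based, so apt is available
FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends gcc \
    && rm -rf /var/lib/apt/lists/*
```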

Setup environment, install dependencies, libraries (RUN command)

Use the RUN command as many times as you need. We prefer to group related commands together, since each RUN instruction creates one layer in the Docker image.

RUN has two forms:

  • RUN <command>: will invoke a shell automatically (/bin/sh -c by default)
    • RUN apt-get update && apt-get install -y gcc
  • RUN ["executable", "param1", "param2"]: won’t invoke a command shell.

Examples:

# Configure the locales and encoding
RUN apt-get update && apt-get install -y locales && rm -rf /var/lib/apt/lists/* \
    && localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8

# install kubectl (a single RUN keeps it in one layer)
RUN wget https://storage.googleapis.com/kubernetes-release/release/v1.15.1/bin/linux/amd64/kubectl \
    && chmod +x ./kubectl \
    && mv ./kubectl /usr/local/bin/

# Configure timezone
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Change owner of the directory
RUN chown appuser:appuser /home/appuser/opt/

Running application as a normal user (USER command)

Don’t run your stuff as root; use the USER instruction to specify the user. Put the USER command after installing and configuring all the shared requirements.

# Using user instead of root
RUN useradd --create-home appuser

# Specify the user in the docker image/containers.
USER appuser

Using mount directory or file (VOLUME command):

You can use the VOLUME instruction in a Dockerfile to tell Docker that the stuff you store in that specific directory should be stored on the host file system, not in the container file system. This implies that stuff stored in the volume will persist and be available also after you destroy the container.

In other words, it is best practice to create a volume for your data files, database files, or any file or directory that your users will change when they use your application.

The data stored in the volume will remain on the host machine even if you stop the container and remove it with docker rm. (The volume will be removed on exit if you start the container with docker run --rm, though.)
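A minimal sketch of the instruction itself (the path is just an example):

```
# Declare /opt/models as a volume, so its contents persist outside the container
VOLUME /opt/models
```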

You can also share these volumes between containers with docker run --volumes-from.

You can inspect your volumes with the docker volume ls and docker volume inspect commands.

It’s more common to declare volumes in the docker-compose file (but the Dockerfile should create the directory, or the module/application should use the path inside the container):

E.g.: the ML application wants to load its deep learning models from /opt/models/ (or a link at that path).

The Dockerfile should have:

# map to the model directory dynamically, so that no need to edit the model files
RUN ln -s /home/appuser/opt/models /opt/models
The docker-compose file (incomplete) then maps the host directory onto it:

image: <registry/imagename:imagetag>
restart: unless-stopped
volumes:
    # mapping volumes
    # left: path on the host, right: path in the docker container/image 
    - /models/dir/on/the/host/machine/:/opt/models/
environment:
    - ENV1=value1
    - ENV2=value2

command: /home/appuser/start.sh

Using environment variables (ENV command)

ENV is used to define environment variables. The interesting thing about ENV is that it does two things:

  1. You can use it to define environment variables that will be available in your container. So when you build an image and start up a container with that image you’ll find that the environment variable is available and is set to the value you specified in the Dockerfile.
  2. You can use the variables that you specify by ENV in the Dockerfile itself. So in subsequent instructions the environment variable will be available.

# Set timezone
ENV TZ Asia/Singapore

# Commit on datetime (eg: Kaldi)
ENV TINI_VERSION=v0.18.0 \
    KALDI_SHA1=882b0a6daba7d7a62d1a1037b1ced987946df2e1

RUN git clone https://github.com/kaldi-asr/kaldi /home/appuser/opt/kaldi && \
    cd /home/appuser/opt/kaldi && \
    git reset --hard $KALDI_SHA1

Adding files, directories to the docker image (COPY or ADD command)

Both ADD and COPY are designed to add directories and files to your Docker image in the form of ADD <src>... <dest> or COPY <src>... <dest>. Most resources, including myself, suggest using COPY, unless you specifically need ADD’s extra features (fetching remote URLs and auto-extracting local tar archives).

COPY utils/ /home/appuser/opt/kaldi/tools/
COPY --chown=appuser:appuser module1/ /home/appuser/opt/module1/
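To keep COPY from pulling build artifacts and secrets into the image, it’s worth adding a .dockerignore file next to the Dockerfile; a minimal sketch (the entries are just common examples):

```
# .dockerignore – excluded from the build context
.git
__pycache__/
*.pyc
.env
```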

Set the working directory (WORKDIR command)

A very convenient way to define the working directory; it will be used with subsequent RUN, CMD, ENTRYPOINT, COPY and ADD instructions. You can specify WORKDIR multiple times in a Dockerfile.

If the directory does not exist, Docker will create it for you.

# mkdir is optional here – WORKDIR creates the directory if needed
RUN mkdir -p /home/appuser/opt
WORKDIR /home/appuser/opt

Specify the listening port (EXPOSE command) – if your application is a webserver.

An important instruction to inform your users about the ports your application is listening on. EXPOSE will not publish the port, you need to use docker run -p... to do that when you start the container.

EXPOSE 8010
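To actually publish the port when starting the container (host port on the left, container port on the right):

```
docker run -p 8010:8010 <registry/imagename:imagetag>
```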

Publish the port via docker-compose file (incomplete file):

image: <registry/imagename:imagetag>
restart: unless-stopped
ports:
    - "8010:8010"
command: /home/appuser/start.sh -p 8010

Start the application (CMD or ENTRYPOINT command)

CMD is the instruction to specify what component is to be run by your image with arguments in the following form: CMD ["executable", "param1", "param2"…].

You can override CMD when you’re starting up your container by specifying your command after the image name like this: $ docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...].

You can only specify one CMD in a Dockerfile (OK, physically you can specify more than one, but only the last one will be used).

It is good practice to specify a CMD even if you are developing a generic container, in which case an interactive shell is a good CMD entry. So you do CMD ["python"] or CMD ["php", "-a"] to give your users something to work with.

So what’s the deal with ENTRYPOINT? When you specify an entry point, your image will work a bit differently. You use ENTRYPOINT as the main executable of your image. In this case whatever you specify in CMD will be added to ENTRYPOINT as parameters.

ENTRYPOINT ["git"]
CMD ["--help"]

This way you can build Docker images that mimic the behavior of the main executable you specify in ENTRYPOINT.
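Assuming an image built from the two lines above (the image name my-git is made up for this example), only the CMD part is replaced by arguments on the command line:

```
# runs: git --help (the default CMD)
docker run my-git

# runs: git status – the arguments replace CMD, ENTRYPOINT stays
docker run my-git status
```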

Docker commands

https://dockerlabs.collabnix.com/docker/cheatsheet/
