Complete Dockerfile Tutorial

What is a Dockerfile?

A Dockerfile is a text file containing instructions to build a Docker image. It's like a recipe that tells Docker how to create your application's container environment step by step.

Basic Structure

Every Dockerfile follows this general pattern:

# Comments start with #
FROM base_image
LABEL maintainer="your-email@example.com"
WORKDIR /app
COPY . .
RUN command_to_run
EXPOSE port_number
CMD ["executable", "param1", "param2"]

Essential Instructions

FROM

Specifies the base image for your Docker image.

# Use official Python runtime as base image
FROM python:3.9-slim

# Use specific version for reproducibility
FROM node:16.14.0-alpine

# Pin the target platform when pulling a multi-architecture image
FROM --platform=linux/amd64 ubuntu:20.04

WORKDIR

Sets the working directory inside the container.

# Set working directory
WORKDIR /app

# All subsequent commands will run from /app
COPY package.json .
RUN npm install

COPY vs ADD

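Both copy files from the build context into the image. ADD can also fetch remote URLs and unpack local tar archives, so prefer COPY unless you need one of those features.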

# COPY - preferred for simple file copying
COPY src/ /app/src/
COPY package*.json ./

# ADD - can download URLs and auto-extract local tar archives
# (a file fetched from a URL is saved as-is, not extracted)
ADD https://example.com/file.tar.gz /tmp/
ADD archive.tar.gz /extracted/

RUN

Executes commands during image build.

# Install packages
RUN apt-get update && apt-get install -y \
    curl \
    vim \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies without keeping pip's download cache
RUN pip install --no-cache-dir -r requirements.txt

# Use shell form
RUN echo "Hello World" > /tmp/hello.txt

# Use exec form (no shell, so no variable expansion or && chaining)
RUN ["python", "-c", "print('Hello from Python')"]

EXPOSE

Documents which ports the container listens on. EXPOSE does not publish a port; you still need -p or -P with docker run.

# Single port
EXPOSE 8080

# Multiple ports
EXPOSE 8080 8443

# UDP port
EXPOSE 53/udp
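
Since EXPOSE only records metadata, a short sketch may help show where publishing actually happens. The image name my-app and port 8080 are reused from elsewhere in this tutorial; the base image and start command are illustrative assumptions.

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY . .

# Metadata only: records that the app is expected to listen on 8080
EXPOSE 8080

CMD ["python", "app.py"]

# Publishing happens at run time, not in the Dockerfile:
#   docker run -p 8080:8080 my-app   # map host port 8080 to container port 8080
#   docker run -P my-app             # map every EXPOSEd port to a random host port
```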

ENV

Sets environment variables.

# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000
ENV DATABASE_URL=postgres://localhost/mydb

# Multiple variables
ENV NODE_ENV=production \
    PORT=3000 \
    DEBUG=false

ARG

Defines build-time variables.

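# An ARG declared before FROM can only be used in FROM lines
ARG VERSION=latest

# Use argument
FROM node:${VERSION}

# An ARG used after FROM must be declared inside that build stage
ARG BUILD_DATE
LABEL build-date="${BUILD_DATE}"

# Build with: docker build --build-arg VERSION=16 --build-arg BUILD_DATE="$(date)" .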
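
ARG values exist only during the build, while ENV values persist into the running container. The sketch below, which uses a hypothetical APP_VERSION argument, shows one common way to carry a build-time value over to run time.

```dockerfile
FROM node:16-alpine

# Build-time only: not set in the running container
# (APP_VERSION is a hypothetical name used for illustration)
ARG APP_VERSION=dev

# Copy the value into an ENV so the application can read it at run time
ENV APP_VERSION=${APP_VERSION}

# Build with: docker build --build-arg APP_VERSION=1.0 -t my-app .
```

Build arguments can show up in the image history, so this pattern is best kept to non-sensitive values such as version strings.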

CMD vs ENTRYPOINT

Both specify which command runs when the container starts.

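# CMD - default command, replaced by any arguments given to docker run
# Exec form (preferred): runs the process directly, so it receives signals
CMD ["python", "app.py"]
# Shell form: runs under /bin/sh -c; only the last CMD in a Dockerfile takes effect
CMD python app.py

# ENTRYPOINT - runs on every start unless --entrypoint is given; docker run args are appended
ENTRYPOINT ["python", "app.py"]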

# Combine both
ENTRYPOINT ["python", "app.py"]
CMD ["--help"]
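
To make the interaction concrete, here is a minimal sketch of how docker run arguments combine with ENTRYPOINT and CMD. It assumes an image tagged my-app and an app.py that accepts a hypothetical --port flag.

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY . .

# Fixed part of the command
ENTRYPOINT ["python", "app.py"]
# Default arguments, used only when docker run passes none
CMD ["--help"]

# docker run my-app               runs: python app.py --help
# docker run my-app --port 9000   runs: python app.py --port 9000
```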

VOLUME

Creates mount points for external volumes.

# Create volume mount points
VOLUME ["/data"]
VOLUME ["/var/log", "/var/db"]

# Single volume
VOLUME /app/uploads

USER

Sets the user (and optionally the group) for subsequent RUN, CMD, and ENTRYPOINT instructions, and for the running container.

# Create user and switch to it
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser

# Use numeric UID:GID (works without an /etc/passwd entry and is easy to verify as non-root)
USER 1000:1000

HEALTHCHECK

Defines how Docker tests whether the container is healthy.

# HTTP health check (curl must be installed in the image)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Custom health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD python health_check.py || exit 1

# Disable health check
HEALTHCHECK NONE

Best Practices

1. Use Multi-Stage Builds

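Reduce final image size by using multiple FROM statements: build tools stay in an earlier stage, and only the needed artifacts are copied into the final image.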

# Build stage (same base variant as the final stage so native modules stay compatible)
FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

# Production stage
FROM node:16-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

2. Optimize Layer Caching

Order instructions from least to most frequently changing.

# Good - dependencies change less frequently
FROM python:3.9-slim
WORKDIR /app

# Copy requirements first
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy source code last
COPY . .
CMD ["python", "app.py"]
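
Layer ordering can be combined with a BuildKit cache mount, so that even when requirements.txt changes, previously downloaded packages are reused. This is a sketch that assumes BuildKit is enabled and that pip runs as root, whose default cache directory is /root/.cache/pip.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9-slim
WORKDIR /app

COPY requirements.txt .
# Keep pip's download cache in a BuildKit cache mount instead of an image layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

COPY . .
CMD ["python", "app.py"]
```

The cache mount lives outside the image, so the --no-cache-dir flag is left out here on purpose.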

3. Minimize Image Size

Use alpine images, remove unnecessary packages, and clean up.

FROM python:3.9-alpine
WORKDIR /app
COPY requirements.txt .

# Install build tools, build the dependencies, then remove the tools in the same layer
RUN apk add --no-cache --virtual .build-deps \
    gcc \
    musl-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps

# Use .dockerignore to exclude unnecessary files

4. Use Specific Tags

Avoid the latest tag so that builds are reproducible.

# Bad
FROM python:latest

# Good
FROM python:3.9.7-slim-buster

5. Run as Non-Root User

Improve security by creating and using a non-root user.

FROM python:3.9-slim

# Create user
RUN groupadd -r appuser && useradd -r -g appuser appuser

# Set up app directory
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy app and change ownership
COPY . .
RUN chown -R appuser:appuser /app

# Switch to non-root user
USER appuser

CMD ["python", "app.py"]

Advanced Techniques

1. Build Arguments for Flexibility

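# Declared before FROM: usable only in FROM lines
ARG PYTHON_VERSION=3.9

FROM python:${PYTHON_VERSION}-slim

# Declared inside the stage: usable in ENV and RUN below
ARG ENVIRONMENT=production
ENV APP_ENV=${ENVIRONMENT}

RUN if [ "$ENVIRONMENT" = "development" ]; then \
        pip install pytest flake8; \
    fi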

2. Conditional Logic

# Place this ARG after FROM so the RUN below can see it
ARG INSTALL_DEV=false

RUN if [ "$INSTALL_DEV" = "true" ]; then \
        apt-get update && apt-get install -y \
        vim \
        curl \
        git; \
    fi

3. Using Build Context Efficiently

# Use .dockerignore to exclude files
# .dockerignore content:
# node_modules
# .git
# *.md
# .env*

FROM node:16-alpine
WORKDIR /app

# Copy only necessary files
COPY package*.json ./
RUN npm ci --only=production

COPY src/ ./src/
COPY public/ ./public/

Multi-Stage Builds

Example: Go Application

# Build stage
FROM golang:1.19-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .

# Production stage
FROM alpine:3.15
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]

Example: React Application

# Build stage
FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=builder /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Security Considerations

1. Use Minimal Base Images

# Use a distroless or alpine image (alternatives; pick one per Dockerfile)
FROM gcr.io/distroless/python3
# or
FROM alpine:3.15

2. Scan for Vulnerabilities

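# Labels do not scan anything themselves, but they let scanner findings be traced to a source revision.
# Run an image scanner (for example Trivy or Docker Scout) against the built image in CI.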
LABEL org.opencontainers.image.source="https://github.com/user/repo"
LABEL org.opencontainers.image.created="2023-01-01T00:00:00Z"
LABEL org.opencontainers.image.revision="abc123"

3. Manage Secrets Properly

# Use build secrets (BuildKit)
# docker build --secret id=mypassword,src=./password.txt .

FROM alpine
RUN --mount=type=secret,id=mypassword \
    PASSWORD=$(cat /run/secrets/mypassword) && \
    echo "Password loaded securely"

Real-World Examples

Python Flask Application

FROM python:3.9-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Create and set work directory
WORKDIR /app

# Install system dependencies (curl is needed by the HEALTHCHECK below)
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        curl \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create non-root user
RUN adduser --disabled-password --gecos '' appuser

# Copy application code
COPY . .

# Change ownership and switch to non-root user
RUN chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5000/health || exit 1

# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]

Node.js Application

FROM node:16-alpine

# Set working directory
WORKDIR /app

# Add a non-root user and put it in its own group
RUN addgroup -g 1001 -S nodejs \
    && adduser -S nextjs -u 1001 -G nodejs

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci --only=production && npm cache clean --force

# Copy source code
COPY . .

# Change ownership
RUN chown -R nextjs:nodejs /app
USER nextjs

# Expose port
EXPOSE 3000

# Health check (node:16-alpine does not ship curl, so use BusyBox wget)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD wget -q --spider http://localhost:3000/api/health || exit 1

# Start application
CMD ["npm", "start"]

Database with Initialization

FROM postgres:13-alpine

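# Environment variables
ENV POSTGRES_DB=myapp
ENV POSTGRES_USER=appuser
# Do not bake the password into the image; supply it when the container starts:
# docker run -e POSTGRES_PASSWORD=<password> ... or use POSTGRES_PASSWORD_FILE with a mounted secret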

# Copy initialization scripts
COPY init-scripts/ /docker-entrypoint-initdb.d/

# Copy custom configuration
COPY postgresql.conf /usr/local/share/postgresql/postgresql.conf.sample

# Expose port
EXPOSE 5432

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB} || exit 1

# Volume for data persistence
VOLUME ["/var/lib/postgresql/data"]

Troubleshooting

Common Issues and Solutions

1. Layer Caching Problems

# Problem: Dependencies reinstall every time
COPY . .
RUN pip install -r requirements.txt

# Solution: Copy requirements first
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

2. Large Image Size

# Problem: each RUN creates a layer, so cleanup in a later RUN does not shrink the image
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# Solution: Chain commands
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

3. Permission Issues

# Problem: Files owned by root
COPY . .
USER appuser

# Solution: Change ownership
COPY . .
RUN chown -R appuser:appuser /app
USER appuser
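
An alternative to a separate chown step is to set ownership during the copy itself. The sketch below assumes the same appuser account created in the earlier examples.

```dockerfile
FROM python:3.9-slim
RUN groupadd -r appuser && useradd -r -g appuser appuser
WORKDIR /app

# Set ownership while copying, avoiding an extra layer that rewrites every file
COPY --chown=appuser:appuser . .

USER appuser
CMD ["python", "app.py"]
```

If the application needs to create new files directly in /app, also check the ownership of the directory itself, since WORKDIR creates it before the copy.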

4. Build Context Too Large

# Use .dockerignore file:
.git
node_modules
*.log
.env*
README.md

Debugging Tips

  1. Use docker build --no-cache to rebuild without cache
  2. Add temporary RUN commands to debug:
    RUN ls -la /app
    RUN whoami
    RUN env
  3. Use multi-stage builds to separate build and runtime environments
  4. Check logs with docker logs container_name
  5. Interactive debugging:
    docker run -it --entrypoint /bin/sh image_name
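
For multi-stage builds, one more approach that may help is building only an intermediate stage and inspecting it. This sketch reuses the React example's stage name builder; the tag my-app:builder is an arbitrary choice.

```dockerfile
# Build stage
FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=builder /app/build /usr/share/nginx/html

# Build only the first stage, then open a shell in it:
#   docker build --target builder -t my-app:builder .
#   docker run -it --rm my-app:builder /bin/sh
```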

Build and Run Commands

Building Images

# Basic build
docker build -t my-app .

# Build with arguments
docker build --build-arg VERSION=1.0 -t my-app .

# Build with specific Dockerfile
docker build -f Dockerfile.dev -t my-app-dev .

# Build with build context from URL
docker build -t my-app https://github.com/user/repo.git

Running Containers

# Basic run
docker run -p 8080:8080 my-app

# Run with environment variables
docker run -e NODE_ENV=production -p 8080:8080 my-app

# Run with volumes
docker run -v /host/path:/container/path my-app

# Run in detached mode
docker run -d --name my-container my-app

Note: This tutorial covers the essential aspects of Dockerfile creation, from basic concepts to advanced techniques. Remember to always test your Dockerfiles thoroughly and follow security best practices for production deployments.