Writing a Dockerfile and Build Cache

So far we've only run images made by others. This article builds an image for our own application using a Dockerfile — a text file describing how to build an image. We also dissect the build cache, which decides whether your build is fast or slow.

What a Dockerfile is

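A Dockerfile is the recipe for building an image: each line is an instruction, and Docker executes them in order from top to bottom. Recall from Article 2 that an image is a stack of layers: instructions that change the filesystem, such as RUN and COPY, each add a new layer, while instructions like EXPOSE and CMD only record metadata.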

Let's package a small Node.js app. Create a project directory with three files.

package.json:

{
  "name": "demo",
  "version": "1.0.0",
  "main": "server.js",
  "scripts": { "start": "node server.js" },
  "dependencies": { "express": "^4.19.2" }
}

server.js:

const express = require("express");
const app = express();
app.get("/", (req, res) => res.send("Xin chao tu container"));
app.listen(3000, () => console.log("Server chay tren cong 3000"));

Dockerfile:

# Base layer: official Node image, alpine variant for lightness
FROM node:20-alpine

# Working directory inside the image
WORKDIR /app

# Copy the dependency manifest file FIRST, on its own
COPY package.json ./

# Install dependencies
RUN npm install --omit=dev

# Copy the rest of the code AFTER
COPY . .

# Port the app listens on (documentation only)
EXPOSE 3000

# Command to run when the container starts
CMD ["npm", "start"]

The core Dockerfile instructions

FROM: the base image to build on. Every Dockerfile starts with FROM.
WORKDIR: set the working directory; later instructions run relative to it.
COPY: copy files from the build context into the image. (ADD does the same and can also download URLs and unpack local tar archives automatically, which tends to cause surprises; prefer COPY.)
RUN: run a command at build time (install packages, compile, and so on). Each RUN creates a layer.
ENV: set an environment variable, available to later build steps and to the running container.
EXPOSE: declare the port the app listens on. It is documentation only and does not publish the port (you still need -p with docker run).
CMD: the default command to run when the container starts (at run time, not at build time).

Distinguishing RUN and CMD: RUN runs while building the image (its result is baked into a layer); CMD runs when starting the container. A common beginner mistake is thinking CMD runs at build time.
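
To make the difference concrete, here is a minimal sketch that is separate from the demo app. The file name /built-at.txt and the alpine:3.20 base are choices made for illustration only. RUN records a timestamp once, during the build, and CMD merely prints that file each time a container starts.

```dockerfile
# Illustrative only; not part of the demo project
FROM alpine:3.20

# RUN executes during `docker build`; the file it writes is baked into a layer
RUN date > /built-at.txt

# CMD executes on `docker run`; it prints the timestamp recorded at build time
CMD ["cat", "/built-at.txt"]
```

If you build this once and run the resulting image several times, every run should print the same timestamp, because date ran only while the image was being built.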

Distinguishing CMD and ENTRYPOINT: both define the command to run when the container starts. The difference: arguments you pass to docker run <image> <args> replace CMD, but are appended to ENTRYPOINT. A common pattern: ENTRYPOINT is the fixed program, CMD is the default arguments that can be overridden. When you're starting out, CMD is enough.
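
As a sketch of that pattern, the demo Dockerfile could be rewritten as follows. This is a variation for illustration, not a replacement for the version used in the rest of the article, and the tag demo:v2 mentioned below is hypothetical.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package.json ./
RUN npm install --omit=dev
COPY . .
EXPOSE 3000

# Fixed program: the container always runs node
ENTRYPOINT ["node"]

# Default argument: replaced by anything passed after the image name
CMD ["server.js"]
```

Assuming the image is built as demo:v2, docker run --rm demo:v2 runs node server.js, while docker run --rm demo:v2 --version runs node --version: the extra argument replaces CMD and is appended to ENTRYPOINT.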

Building the image

From the directory containing the Dockerfile:

docker build -t demo:v1 .

-t demo:v1 gives the image a name:tag. The trailing . is the build context — the directory Docker sends to the daemon to build (Article 1: the build is done by the daemon, so it needs the files sent to it).

Run the image you just built:

docker run --rm -p 3000:3000 demo:v1

Open http://localhost:3000 and you'll see "Xin chao tu container" (Vietnamese for "Hello from the container"). Press Ctrl+C to stop.

Build cache: why the second build is faster

Docker caches each layer. On a rebuild, for each instruction it checks: is this layer already in the cache with unchanged inputs? If so it reuses it (CACHED), without re-running.

Build a second time without changing anything:

docker build -t demo:v1 .

 => CACHED [2/5] WORKDIR /app
 => CACHED [3/5] COPY package.json ./
 => CACHED [4/5] RUN npm install --omit=dev
 => CACHED [5/5] COPY . .

Every layer is CACHED — the build is nearly instant, with no dependency reinstall.

The most important rule: the cache cascades down

Per the Docker docs, when a layer changes it has to be rebuilt, and every layer after it has to be rebuilt as well. The cache is valid only up to the first changed layer; everything from that point down is rebuilt.

This is why instruction order in a Dockerfile matters. Try editing server.js (change the returned content) and rebuilding:

 => CACHED [2/5] WORKDIR /app
 => CACHED [3/5] COPY package.json ./
 => CACHED [4/5] RUN npm install --omit=dev      ← still CACHED!
 =>        [5/5] COPY . .                          ← rebuilds from here

RUN npm install is still cached even though we edited code, because it sits before COPY . . and its input (package.json) is unchanged. Only COPY . . and below rebuild.

   Dockerfile              build after changing server.js
   ─────────────────────────────────────────────────
   FROM node:20-alpine     ✓ CACHED
   WORKDIR /app            ✓ CACHED
   COPY package.json ./    ✓ CACHED   (package.json unchanged)
   RUN npm install         ✓ CACHED   ← NOT reinstalled, big saving
   COPY . .                ✗ rebuild  ← server.js changed, rebuilds from here
   CMD ["npm","start"]     ✗ rebuild

Why split `COPY package.json` before `COPY . .`

If you merge them into a single COPY . . placed before RUN npm install, then editing any code file changes the COPY layer, which invalidates the npm install layer after it. Every code edit would then reinstall all dependencies, which is very slow.
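
For contrast, the merged version would look roughly like this. It is shown only as an anti-pattern; it builds and runs, but it gives up the caching benefit.

```dockerfile
# Anti-pattern, shown only for contrast
FROM node:20-alpine
WORKDIR /app

# Everything is copied in one step, so editing any file changes this layer
COPY . .

# This step now sits after the changed layer, so it reruns on every code edit
RUN npm install --omit=dev

EXPOSE 3000
CMD ["npm", "start"]
```

With this ordering, a one-line change to server.js invalidates COPY . . and therefore the install step that follows it.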

By copying package.json on its own first and then installing, the npm install layer only rebuilds when the dependencies change (i.e. package.json changes), not when the code changes. The general rule from the docs: put the rarely-changing instructions (installation) first, and the often-changing ones (copying code) last.
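
The same idea extends to a lock file. The demo project above has only package.json, so the following is a sketch under the assumption that a package-lock.json has been generated and committed alongside it. It copies both manifests first and installs with npm ci.

```dockerfile
FROM node:20-alpine
WORKDIR /app

# Matches package.json and package-lock.json (the lock file is assumed to exist)
COPY package*.json ./

# npm ci installs exactly what the lock file records and fails
# if the lock file is missing or out of sync with package.json
RUN npm ci --omit=dev

# Application code comes last, so code edits do not touch the install layer
COPY . .

EXPOSE 3000
CMD ["npm", "start"]
```

The install layer is then rebuilt only when one of the two manifest files changes, which is the same rule as before applied to a slightly larger set of inputs.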

.dockerignore: don't ship junk into the build context

When building, Docker sends the build context (the . directory) to the daemon, and COPY . . then copies everything in it into the image. Add a .dockerignore file to exclude what you don't need, which keeps builds fast, keeps the image tidy, and keeps files such as .env out of it:

node_modules
npm-debug.log
.git
.env

Excluding node_modules is especially important with Node: we want dependencies installed inside the image by npm install, not the host machine's node_modules copied in. Host-installed packages may be built for the wrong platform: native modules compiled on macOS, for example, will not load in a Linux image.

🧹 Cleanup

docker rmi demo:v1
docker builder prune     # remove old build cache to reclaim disk

docker builder prune removes only the build cache; images are left untouched. To see how much disk space the build cache is using, run docker system df.

Wrap-up

A Dockerfile describes how to build an image; each RUN/COPY creates a layer. The build cache reuses unchanged layers, but when one layer changes, every layer after it rebuilds — so order from rarely-changing (install dependencies) to often-changing (copy code), and copy package.json first to keep the install layer cached. .dockerignore keeps the build context tidy.

Your image now runs, but any data it writes is lost when the container is removed (recall the writable layer in Article 2). Article 6 solves that: volumes and bind mounts for storing data persistently.
