Writing a Dockerfile and Build Cache
So far we've only run images made by others. This article builds an image for our own application using a Dockerfile — a text file describing how to build an image. We also dissect the build cache, which decides whether your build is fast or slow.
What a Dockerfile is
A Dockerfile is the recipe for building an image: each line is an instruction, Docker executes them in order from top to bottom, and (recall Article 2) the instructions that change the filesystem, such as RUN and COPY, each create a new layer in the image.
Let's package a small Node.js app. Create a project directory with three files.
package.json:
{
"name": "demo",
"version": "1.0.0",
"main": "server.js",
"scripts": { "start": "node server.js" },
"dependencies": { "express": "^4.19.2" }
}
server.js:
const express = require("express");
const app = express();
app.get("/", (req, res) => res.send("Xin chao tu container"));
app.listen(3000, () => console.log("Server chay tren cong 3000"));
Dockerfile:
# Base layer: official Node image, alpine variant for lightness
FROM node:20-alpine
# Working directory inside the image
WORKDIR /app
# Copy the dependency manifest file FIRST, on its own
COPY package.json ./
# Install dependencies
RUN npm install --omit=dev
# Copy the rest of the code AFTER
COPY . .
# Port the app listens on (documentation only)
EXPOSE 3000
# Command to run when the container starts
CMD ["npm", "start"]
The core Dockerfile instructions
- FROM — the base image to build on. Every Dockerfile starts with FROM.
- WORKDIR — set the working directory; later commands run in this directory.
- COPY — copy files from the build machine into the image. (There's a more powerful ADD, but it's surprise-prone; prefer COPY.)
- RUN — run a command at build time (install packages, compile...). Each RUN creates a layer.
- ENV — set an environment variable.
- EXPOSE — declare the port the app uses. It's documentation only and doesn't open the port (you still need -p at run).
- CMD — the default command to run when the container starts (runs at run time, not at build time).
Distinguishing RUN and CMD: RUN runs while building the image (its result is baked into a layer); CMD runs when starting the container. A common beginner mistake is thinking CMD runs at build time.
Distinguishing CMD and ENTRYPOINT: both define the command to run when the container starts. The difference: arguments you pass to docker run <image> <args> replace CMD, but are appended to ENTRYPOINT. A common pattern: ENTRYPOINT is the fixed program, CMD is the default arguments that can be overridden. When you're starting out, CMD is enough.
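The replace-versus-append rule can be sketched as a small function. This is only a toy model, assuming the exec (JSON array) form of both instructions and ignoring the --entrypoint flag; resolveCommand is a hypothetical helper written for this illustration, not part of Docker.

```javascript
// Toy model of how the container's start command is assembled (exec form only).
// resolveCommand is a hypothetical helper for illustration, not a Docker API.
function resolveCommand(entrypoint, cmd, runArgs) {
  // Arguments given to `docker run <image> <args>` replace CMD entirely.
  const tail = runArgs.length > 0 ? runArgs : cmd;
  // Whatever remains (CMD or the run arguments) is appended to ENTRYPOINT.
  return [...entrypoint, ...tail];
}

// Image with only CMD ["npm", "start"] (as in our Dockerfile)
console.log(resolveCommand([], ["npm", "start"], []).join(" "));             // npm start
console.log(resolveCommand([], ["npm", "start"], ["node", "-v"]).join(" ")); // node -v

// Assumed variant: ENTRYPOINT ["node"] plus CMD ["server.js"]
console.log(resolveCommand(["node"], ["server.js"], []).join(" "));            // node server.js
console.log(resolveCommand(["node"], ["server.js"], ["--version"]).join(" ")); // node --version
```

In the last call the fixed program (node) survives and only the default argument is swapped out, which is the pattern described above.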
Building the image
From the directory containing the Dockerfile:
docker build -t demo:v1 .
-t demo:v1 gives the image a name:tag. The trailing . is the build context — the directory Docker sends to the daemon to build (Article 1: the build is done by the daemon, so it needs the files sent to it).
Run the image you just built:
docker run --rm -p 3000:3000 demo:v1
Open http://localhost:3000 and you'll see "Xin chao tu container". Press Ctrl+C to stop.
Build cache: why the second build is faster
Docker caches each layer. On a rebuild, for each instruction it checks: is this layer already in the cache with unchanged inputs? If so it reuses it (CACHED), without re-running.
Build a second time without changing anything:
docker build -t demo:v1 .
=> CACHED [2/5] WORKDIR /app
=> CACHED [3/5] COPY package.json ./
=> CACHED [4/5] RUN npm install --omit=dev
=> CACHED [5/5] COPY . .
Every layer is CACHED — the build is nearly instant, with no dependency reinstall.
The most important rule: the cache cascades down
Per the Docker docs: when a layer changes, "that layer needs to be rebuilt," and "all layers after it must run again." The cache holds only up to the first changed layer; from there down everything rebuilds.
This is why instruction order in a Dockerfile matters. Try editing server.js (change the returned content) and rebuilding:
=> CACHED [2/5] WORKDIR /app
=> CACHED [3/5] COPY package.json ./
=> CACHED [4/5] RUN npm install --omit=dev ← still CACHED!
=> [5/5] COPY . . ← rebuilds from here
RUN npm install is still cached even though we edited code, because it sits before COPY . . and its input (package.json) is unchanged. Only COPY . . and below rebuild.
Dockerfile build after changing server.js
─────────────────────────────────────────────────
FROM node:20-alpine ✓ CACHED
WORKDIR /app ✓ CACHED
COPY package.json ./ ✓ CACHED (package.json unchanged)
RUN npm install ✓ CACHED ← NOT reinstalled, big saving
COPY . . ✗ rebuild ← server.js changed, rebuilds from here
CMD ["npm","start"] ✗ rebuild
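The cascade can also be mimicked in a few lines of JavaScript. This is a toy model, not Docker's real algorithm: the inputs strings are made-up stand-ins for the checksums Docker computes over the files an instruction reads, and planBuild is a hypothetical name.

```javascript
// Toy model of the cascade rule. Not Docker's real implementation:
// `inputs` is a made-up label standing in for a checksum of what a step reads.
function planBuild(previousSteps, currentSteps) {
  let cacheStillValid = true;
  return currentSteps.map((step, i) => {
    const prev = previousSteps[i];
    const sameStep =
      prev !== undefined &&
      prev.instruction === step.instruction &&
      prev.inputs === step.inputs;
    // Once one step misses the cache, every later step is rebuilt too.
    cacheStillValid = cacheStillValid && sameStep;
    return `${cacheStillValid ? "CACHED " : "REBUILD"}  ${step.instruction}`;
  });
}

const firstBuild = [
  { instruction: "FROM node:20-alpine", inputs: "" },
  { instruction: "WORKDIR /app", inputs: "" },
  { instruction: "COPY package.json ./", inputs: "package.json@1" },
  { instruction: "RUN npm install --omit=dev", inputs: "" },
  { instruction: "COPY . .", inputs: "source@1" },
  { instruction: 'CMD ["npm", "start"]', inputs: "" },
];

// Editing server.js changes only what `COPY . .` reads.
const afterEdit = firstBuild.map((step) =>
  step.instruction === "COPY . ." ? { ...step, inputs: "source@2" } : step
);

console.log(planBuild(firstBuild, afterEdit).join("\n"));
```

The first four lines print as CACHED and the last two as REBUILD, matching the diagram: CMD itself did not change, but it sits below the first cache miss.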
Why split COPY package.json before COPY . .
If you merge them into one COPY . . right before RUN npm install, then every time you edit any code file the COPY changes, which drags npm install into rebuilding — reinstalling all dependencies, very slow.
By copying package.json on its own first and then installing, the npm install layer only rebuilds when the dependencies change (i.e. package.json changes), not when the code changes. The general rule from the docs: put the rarely-changing instructions (installation) first, and the often-changing ones (copying code) last.
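To compare the two orderings side by side, here is a rough sketch under simplifying assumptions: each step declares which files it reads ("*" meaning the whole build context), and a change to a file invalidates the first step that reads it plus everything after. The rebuiltSteps helper and the reads field are invented for this illustration.

```javascript
// Toy model: each step lists the files it reads; "*" means the whole context.
// A step with no `reads` (RUN here) rebuilds only when an earlier step does.
function rebuiltSteps(steps, changedFile) {
  const firstHit = steps.findIndex(
    (s) => s.reads.includes("*") || s.reads.includes(changedFile)
  );
  return firstHit === -1 ? [] : steps.slice(firstHit).map((s) => s.name);
}

// Merged ordering: everything is copied before installing.
const merged = [
  { name: "COPY . .", reads: ["*"] },
  { name: "RUN npm install", reads: [] },
];

// Split ordering: the manifest first, the code last.
const split = [
  { name: "COPY package.json ./", reads: ["package.json"] },
  { name: "RUN npm install", reads: [] },
  { name: "COPY . .", reads: ["*"] },
];

console.log(rebuiltSteps(merged, "server.js"));    // includes RUN npm install
console.log(rebuiltSteps(split, "server.js"));     // only COPY . .
console.log(rebuiltSteps(split, "package.json"));  // all three steps
```

With the split ordering, a code edit costs one cheap COPY; npm install reruns only when package.json itself changes.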
.dockerignore: don't ship junk into the build context
When building, Docker sends the entire build context (the . directory) to the daemon. Add a .dockerignore file to exclude what you don't need, keeping builds fast and the image tidy:
node_modules
npm-debug.log
.git
.env
Excluding node_modules is especially important with Node: we want dependencies installed inside the image by npm install, not the host machine's node_modules copied in (which could be built for the wrong platform, e.g. native modules installed on macOS while the image runs Linux).
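As a rough picture of what those four entries do, the sketch below filters a hypothetical file listing. It is deliberately simplified: it matches plain names only, whereas real .dockerignore files also support wildcards and ! exceptions. filterContext and the file list are assumptions made for this example.

```javascript
// Toy filter: handles only plain names like the four entries above.
// Real .dockerignore matching also supports wildcards and `!` exceptions.
function filterContext(files, ignoreEntries) {
  return files.filter(
    (file) =>
      !ignoreEntries.some(
        // Match the entry itself, or anything inside a directory of that name.
        (entry) => file === entry || file.startsWith(entry + "/")
      )
  );
}

const ignoreEntries = ["node_modules", "npm-debug.log", ".git", ".env"];

// Hypothetical project listing, for illustration only.
const projectFiles = [
  "Dockerfile",
  "package.json",
  "server.js",
  ".env",
  ".git/HEAD",
  "node_modules/express/index.js",
];

console.log(filterContext(projectFiles, ignoreEntries));
```

Only Dockerfile, package.json, and server.js remain, so COPY . . never sees the host's node_modules or the secrets in .env.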
🧹 Cleanup
docker rmi demo:v1
docker builder prune # remove old build cache to reclaim disk
docker builder prune cleans the build cache specifically (it is separate from images). To see how much disk space the build cache is using, run docker system df.
Wrap-up
A Dockerfile describes how to build an image; each RUN/COPY creates a layer. The build cache reuses unchanged layers, but when one layer changes, every layer after it rebuilds — so order from rarely-changing (install dependencies) to often-changing (copy code), and copy package.json first to keep the install layer cached. .dockerignore keeps the build context tidy.
Your image now runs, but any data it writes is lost when the container is removed (recall the writable layer in Article 2). Article 6 solves that: volumes and bind mounts for storing data persistently.