Images and the Layer Mechanism: Pull, Tag, Docker Hub

Kai · 4 min read

The docker run command from Article 3 needs an image to start from. This article looks at that image: where it comes from, how its name is read, what layers it is made of (continuing from Article 2), and how to manage images on your machine. The registry we use throughout is Docker Hub, Docker's default registry.

Reading an image name

A full image name has the form:

[registry/][namespace/]repository[:tag]

   docker.io / library /  nginx     : alpine
   └────┬───┘ └───┬───┘  └──┬──┘     └──┬──┘
     registry  namespace   name       tag
    (default)  (default)

When you type nginx:alpine, Docker fills in the defaults to make docker.io/library/nginx:alpine:

  • The default registry is docker.io (Docker Hub).
  • The library namespace is where the official images live (images vetted by Docker and the publishers). A user's or organization's image has their name as the namespace, e.g. bitnami/postgresql.
  • The default tag is latest if you don't write one. nginx = nginx:latest.
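
The defaulting rules above can be sketched as a small function. This is a simplified illustration, not Docker's actual reference parser: it assumes the first path component is a registry only when it contains a dot, a colon, or is localhost, and it skips the validation the real parser performs. The function name normalize_image_name and the localhost:5000/myapp example are made up for this sketch.

```python
def normalize_image_name(name: str) -> str:
    """Expand a short image name to registry/namespace/repository:tag."""
    # Split off a digest ("@sha256:...") first so its colon is not read as a tag.
    remainder, _, digest = name.partition("@")

    # The tag is whatever follows the last ":", but only if that colon comes
    # after the last "/"; otherwise it is a registry port (localhost:5000/app).
    last_slash = remainder.rfind("/")
    last_colon = remainder.rfind(":")
    if last_colon > last_slash:
        path, tag = remainder[:last_colon], remainder[last_colon + 1:]
    else:
        path, tag = remainder, ""

    # Assumption: the first component is a registry only if it looks like a host.
    first, slash, rest = path.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", path

    # On Docker Hub, a name without a namespace belongs to "library".
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository

    if digest:
        suffix = (":" + tag if tag else "") + "@" + digest
    else:
        suffix = ":" + (tag or "latest")
    return f"{registry}/{repository}{suffix}"


# "localhost:5000/myapp" is a hypothetical private-registry name.
for short in ["nginx", "nginx:alpine", "bitnami/postgresql", "localhost:5000/myapp"]:
    print(short, "->", normalize_image_name(short))
```

Checking the slash before the colon is the one subtle step: without it, the port in a registry address would be mistaken for a tag.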

About the latest tag: it does not mean "newest version", and nothing updates it automatically. It is simply the tag Docker assumes when you don't write one, so it points at whatever image was last pushed or tagged under that name. An image myapp:latest could well be old. In real environments, write a specific version tag rather than relying on latest, a point we return to in Articles 5 and 9.

Pull: fetching an image from Docker Hub

docker run pulls automatically if the image isn't present. But you can also pull explicitly:

docker pull python:3.12-alpine
3.12-alpine: Pulling from library/python
d17f077ada11: Already exists
3124cc6c064b: Pull complete
74bec2074998: Pull complete
05b6ee55fad3: Pull complete
Digest: sha256:...

Notice: the image is downloaded layer by layer (recall Article 2 — an image is a series of layers). The line d17f077ada11: Already exists means that layer is already on the machine, so it isn't re-downloaded.

Layers shared across images

This is where the layer mechanism pays off. Pull a related image:

docker pull python:3.13-alpine
d17f077ada11: Already exists
3070388042c6: Pull complete
...

Again d17f077ada11: Already exists. Both python:3.12-alpine and python:3.13-alpine build on the same Alpine base layer, so that layer is stored once on disk and shared. The more related images you use, the more you save.

   python:3.12-alpine        python:3.13-alpine
           │                         │
   ┌───────┴──────┐          ┌───────┴──────┐
   │ py3.12 layer │          │ py3.13 layer │   ← different
   ├──────────────┴──────────┴──────────────┤
   │      Alpine layer (d17f077ada11)       │   ← SHARED, stored once
   └────────────────────────────────────────┘
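
The sharing behaviour can be mimicked with a toy model: a local store that remembers layer IDs, and a pull that only fetches what is missing. This is a sketch of the idea, not how the Docker daemon is implemented. The layer IDs are copied from the pull output in this article; the 3.13 list contains only the two layers shown there, since the rest is elided. Unlike the transcript, the store here starts empty, so the first pull fetches everything.

```python
def pull(image_layers, local_store):
    """Simulate a pull: fetch only the layers the local store lacks."""
    report = []
    for layer_id in image_layers:
        if layer_id in local_store:
            report.append(f"{layer_id}: Already exists")
        else:
            local_store.add(layer_id)  # stands in for the download
            report.append(f"{layer_id}: Pull complete")
    return report


# Layer IDs taken from the article's pull output.
PY312 = ["d17f077ada11", "3124cc6c064b", "74bec2074998", "05b6ee55fad3"]
PY313 = ["d17f077ada11", "3070388042c6"]  # further layers are elided in the article

store = set()  # assumption: an empty machine
first = pull(PY312, store)
second = pull(PY313, store)
print("\n".join(second))
```

The store ends up holding five layer IDs for the six references across both images, because the shared base layer is kept once.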

Viewing images on your machine

docker images
REPOSITORY   TAG           SIZE
python       3.13-alpine   48.7MB
python       3.12-alpine   ...

The SIZE column is each image's full size, including all of its layers. Because a shared layer is counted once for every image that uses it, adding up the SIZE column overstates the real disk usage. To see the actual usage, run docker system df.
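
The gap between the listed sizes and the real disk usage is plain arithmetic, sketched here. The layer names and sizes are invented for illustration and are not measurements of the real python images.

```python
def size_report(images, layer_sizes):
    """Return (sum of per-image sizes, size of the unique layers on disk)."""
    listed = sum(layer_sizes[layer] for layers in images.values() for layer in layers)
    unique_layers = {layer for layers in images.values() for layer in layers}
    on_disk = sum(layer_sizes[layer] for layer in unique_layers)
    return listed, on_disk


# Hypothetical layer names and sizes in MB, chosen only for the arithmetic.
LAYER_SIZES = {"alpine-base": 8, "py312": 40, "py313": 41}
IMAGES = {
    "python:3.12-alpine": ["alpine-base", "py312"],
    "python:3.13-alpine": ["alpine-base", "py313"],
}

listed, on_disk = size_report(IMAGES, LAYER_SIZES)
print(f"sum of SIZE column: {listed} MB, unique layers on disk: {on_disk} MB")
```

The difference between the two numbers is exactly the size of the shared base layer, counted once too often.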

Tag: giving an image another name

docker tag creates a new name pointing to the same image (nothing is copied):

docker tag python:3.12-alpine myapp:v1
docker images --format '{{.Repository}}:{{.Tag}} -> {{.ID}}'
myapp:v1 -> 2fdd31120aa2
python:3.12-alpine -> 2fdd31120aa2

The same IMAGE ID (2fdd31120aa2) corresponds to two names. A tag is just a label pointing to an image; one image can carry many tags. You'll tag your own image as username/repo:tag before pushing it to Docker Hub.
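
One way to picture "a tag is just a label" is a pair of dictionaries: one holding image content by ID, one mapping names to IDs. This is a mental model only. The class ToyImageStore and its methods are hypothetical, and the untag rule is a simplification of what docker rmi does (it ignores containers that still use the image).

```python
class ToyImageStore:
    """Toy model: image content is stored once, names merely point at it."""

    def __init__(self):
        self.images = {}  # image ID -> content
        self.tags = {}    # "repository:tag" -> image ID

    def add(self, name, image_id, content):
        self.images[image_id] = content
        self.tags[name] = image_id

    def tag(self, source, target):
        # Only a new pointer is created; no content is copied.
        self.tags[target] = self.tags[source]

    def untag(self, name):
        image_id = self.tags.pop(name)
        # Simplification: drop the content once no name refers to it.
        if image_id not in self.tags.values():
            del self.images[image_id]

    def names_for(self, image_id):
        return sorted(n for n, i in self.tags.items() if i == image_id)


local = ToyImageStore()
local.add("python:3.12-alpine", "2fdd31120aa2", b"placeholder content")
local.tag("python:3.12-alpine", "myapp:v1")
print(local.names_for("2fdd31120aa2"), len(local.images))
```

In this model, removing myapp:v1 leaves the image in place because python:3.12-alpine still points at it.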

Image ID and digest: two ways to identify

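  • Image ID (2fdd31120aa2...): identifies the image on your machine. In Docker's classic image store it is the SHA-256 hash of the image's config JSON.
  • Digest (sha256:...): identifies the image on the registry. It is the SHA-256 hash of the image manifest, so it is immutable: different content always produces a different digest. Pulling by digest guarantees you get that exact image even if the tag has since been re-pushed: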
docker pull python@sha256:<digest>

In environments that need high trust, people pin by digest instead of tag, because a tag can be repointed to a different image.
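
The difference between a mutable tag and an immutable digest can be shown with a toy registry. This is a sketch under simplifying assumptions: ToyRegistry is hypothetical, the byte strings stand in for real image manifests, and a real registry involves far more than two dictionaries. The one real ingredient is that the digest is a SHA-256 hash of the content.

```python
import hashlib


def digest_of(manifest_bytes):
    """Content digest in the familiar sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


class ToyRegistry:
    def __init__(self):
        self.manifests = {}  # digest -> manifest bytes
        self.tags = {}       # tag -> digest

    def push(self, tag, manifest_bytes):
        digest = digest_of(manifest_bytes)
        self.manifests[digest] = manifest_bytes
        self.tags[tag] = digest  # re-pushing a tag silently repoints it
        return digest

    def pull_by_tag(self, tag):
        return self.manifests[self.tags[tag]]

    def pull_by_digest(self, digest):
        data = self.manifests[digest]
        # The client can verify what it received against the digest it asked for.
        if digest_of(data) != digest:
            raise ValueError("content does not match digest")
        return data


registry = ToyRegistry()
old_digest = registry.push("3.12-alpine", b"placeholder manifest, first push")
registry.push("3.12-alpine", b"placeholder manifest, re-pushed")

print(registry.pull_by_tag("3.12-alpine"))   # follows the tag to the new content
print(registry.pull_by_digest(old_digest))   # still the original content
```

After the second push, the tag and the old digest resolve to different content, which is the reason for pinning by digest.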

Inspecting an image's layers

Continuing from Article 2, two commands to look inside an image:

docker history python:3.12-alpine

This shows which Dockerfile instruction produced each layer and how large that layer is. RUN instructions that install things are usually the heavy ones, while metadata-only instructions such as ENV and CMD show 0B.

docker image inspect python:3.12-alpine

This returns the image's full metadata as JSON: the list of layers (RootFS.Layers), environment variables, the default command (Cmd), the architecture, and more. Pull out one field with --format:

docker image inspect python:3.12-alpine --format '{{.Os}}/{{.Architecture}}'
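
When you want several fields at once, reading the JSON in a script can be easier than stacking --format templates. A minimal sketch, assuming the output is a JSON array with one object per image and the field names mentioned above (Os, Architecture, RootFS.Layers, Config.Cmd). The SAMPLE string is a hand-written, heavily trimmed stand-in with made-up values, not real output for python:3.12-alpine.

```python
import json


def summarize_inspect(inspect_json):
    """Pick a few fields out of `docker image inspect` JSON output."""
    image = json.loads(inspect_json)[0]  # the output is an array, one entry per image
    return {
        "platform": f"{image['Os']}/{image['Architecture']}",
        "layer_count": len(image["RootFS"]["Layers"]),
        "cmd": image["Config"]["Cmd"],
    }


# Hand-written sample with placeholder values, for illustration only.
SAMPLE = """
[{"Os": "linux",
  "Architecture": "amd64",
  "RootFS": {"Type": "layers", "Layers": ["sha256:aaa", "sha256:bbb"]},
  "Config": {"Cmd": ["python3"]}}]
"""

print(summarize_inspect(SAMPLE))
```

To run it against a real image, the JSON could be captured with subprocess.run(["docker", "image", "inspect", "python:3.12-alpine"], capture_output=True, text=True, check=True).stdout and passed to the same function.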

Finding an image on Docker Hub

docker search postgres

This lists matching images on Docker Hub with their star counts. It does not show tags, though, so to see which tags are available and to read an image's documentation, use the Docker Hub website (hub.docker.com).

Tip for choosing an image: prefer official images (the library namespace) and the -alpine/-slim variants for lightness. Avoid unfamiliar images of unknown origin — you're running someone else's code.

🧹 Cleanup

# remove this article's tags/images
docker rmi myapp:v1 python:3.12-alpine python:3.13-alpine

# remove all images not tied to any container (dangling + unused)
docker image prune -a

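docker image prune -a asks for confirmation, then removes every image that no container is using. That makes it very effective for reclaiming disk space, but those images will have to be pulled again the next time you need them. Check the usage: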

docker system df

Wrap-up

An image has a name of the form registry/namespace/repository:tag, defaulting to Docker Hub and the latest tag. An image is made of read-only layers, and identical layers are shared across many images to save disk — clearly visible through the "Already exists" line when pulling. A tag is just a label pointing to an image; the digest is the fixed identity on the registry.

So far we've only used images made by others. In Article 5 we build our own image: write a Dockerfile, understand how the build cache works layer by layer, and package a real application.