Stand Up Six EC2 Machines and Prepare the OS
The previous two articles were theory. From this one on, we get our hands on the infrastructure. Today's goal is small, but it's the foundation for every article that follows: stand up six VMs on AWS, place them in a private network with fixed addresses, then prepare the OS so they're ready to run the Kubernetes components. Nothing Kubernetes is installed yet; this is only the groundwork.
I ran every command below for real on an AWS account in region ap-southeast-1 (Singapore), and the output shown is real output. You can follow along on your own account, changing the region if you like.
💰 Cost
Six instances running on-demand in ap-southeast-1:
lb-0             t3.small  (2 vCPU / 2GB)   ~$0.026/hour
controller-0..2  t3.medium (2 vCPU / 4GB)   ~$0.053/hour × 3
worker-0..1      t3.medium (2 vCPU / 4GB)   ~$0.053/hour × 2
──────────────────────────────────────────────────────────
rounded total                               ~$0.29/hour
Plus a few tens of GB of gp3 EBS (negligible per hour). If you spread the series over several sessions, remember to stop the instances when you take a break (see the Cleanup section at the end). While stopped you pay no compute, only a very small EBS charge.
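To see what a working session costs, the table's arithmetic can be scripted. This is a minimal sketch that assumes the approximate on-demand prices quoted above (check current pricing for your region) and ignores EBS; lab_cost is a hypothetical helper name.

```shell
# Approximate hourly prices copied from the table above (assumptions, not live pricing).
lab_cost() {  # usage: lab_cost <hours>
  awk -v hours="$1" 'BEGIN {
    small  = 0.026   # lb-0, t3.small
    medium = 0.053   # three controllers + two workers, t3.medium
    printf "%.3f\n", (1 * small + 5 * medium) * hours
  }'
}

# Example: lab_cost 1   (one hour)
#          lab_cost 8   (an eight-hour session)
```

lab_cost 1 prints 0.291, which is where the rounded ~$0.29/hour in the table comes from.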
The infrastructure we're about to build
One dedicated VPC 10.0.0.0/16, one public subnet 10.0.1.0/24, and six machines with fixed private IPs so config later is easy to remember:
VPC 10.0.0.0/16
└── subnet 10.0.1.0/24 (public, ap-southeast-1a)
    ├── 10.0.1.10  lb-0          (HAProxy, later)
    ├── 10.0.1.11  controller-0  ┐
    ├── 10.0.1.12  controller-1  │ control plane (etcd + apiserver + ...)
    ├── 10.0.1.13  controller-2  ┘
    ├── 10.0.1.20  worker-0      ┐
    └── 10.0.1.21  worker-1      ┘ where pods run
Fixed IPs matter more than they look: the certificates in Article 4 will embed exactly these IPs in the SAN field, and the etcd config files will point at exactly 10.0.1.11..13. If the IPs jumped on every restart, the whole cluster would break. Within a VPC subnet, an instance keeps the same private IP for its whole lifetime, so we just specify it at creation.
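One constraint when picking fixed addresses: AWS reserves the first four addresses and the last address of every subnet, so in 10.0.1.0/24 only 10.0.1.4 through 10.0.1.254 can be assigned. A small sketch that checks a planned address against that range; ip_ok is a hypothetical helper and is hard-coded to this article's /24.

```shell
# Returns 0 if the address can be assigned in subnet 10.0.1.0/24.
# AWS reserves .0, .1, .2, .3 and .255 in a /24 subnet.
ip_ok() {  # usage: ip_ok <ipv4-address>
  local ip=$1 host
  case $ip in
    10.0.1.*) host=${ip#10.0.1.} ;;     # keep only the last octet
    *) return 1 ;;                      # outside the subnet
  esac
  case $host in
    ''|*[!0-9]*) return 1 ;;            # last octet is not a plain number
  esac
  [ "$host" -ge 4 ] && [ "$host" -le 254 ]
}

# Check the six addresses planned in the diagram above.
for ip in 10.0.1.10 10.0.1.11 10.0.1.12 10.0.1.13 10.0.1.20 10.0.1.21; do
  if ip_ok "$ip"; then echo "$ip ok"; else echo "$ip NOT assignable"; fi
done
```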
Step 1: Prepare the AWS CLI and pick an AMI
This article drives AWS with the aws CLI from your machine. I assume you've already configured credentials (aws configure) and have permission to create VPC and EC2 resources. Set the region up front to keep the commands short:
export AWS_REGION=ap-southeast-1
Find the latest Ubuntu 24.04 LTS (amd64) AMI. Canonical publishes this ID via SSM Parameter Store, so there's no need to hunt for it manually:
aws ssm get-parameter \
--name /aws/service/canonical/ubuntu/server/24.04/stable/current/amd64/hvm/ebs-gp3/ami-id \
--query 'Parameter.Value' --output text
ami-04592e28abc2b9fc9
Pull that ID into a variable to reuse:
export AMI=ami-04592e28abc2b9fc9
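If you'd rather not copy the ID by hand, the lookup can feed the variable directly. A sketch, assuming the same SSM parameter as above; latest_ubuntu_ami is a hypothetical helper that also rejects anything that doesn't look like an AMI ID.

```shell
# Look up the current Ubuntu 24.04 amd64 AMI and refuse any value that
# does not start with "ami-" (for example "None" or an empty string).
latest_ubuntu_ami() {
  local ami
  ami=$(aws ssm get-parameter \
    --name /aws/service/canonical/ubuntu/server/24.04/stable/current/amd64/hvm/ebs-gp3/ami-id \
    --query 'Parameter.Value' --output text) || return 1
  case $ami in
    ami-*) printf '%s\n' "$ami" ;;
    *) echo "unexpected AMI value: $ami" >&2; return 1 ;;
  esac
}

# Usage: AMI=$(latest_ubuntu_ami) && export AMI
```

The ID changes whenever Canonical publishes a new image, so yours will likely differ from the one shown above.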
Step 2 โ Create the VPC, Internet Gateway, subnet, route table
We create a dedicated VPC rather than using the default one. The reason: clean isolation, and an easy wipe at the end of the series, because everything the lab creates lives inside this one VPC. Create the VPC and enable DNS hostnames:
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
--tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=k8s-scratch-vpc},{Key=Project,Value=k8s-from-scratch}]' \
--query 'Vpc.VpcId' --output text)
aws ec2 modify-vpc-attribute --vpc-id $VPC_ID --enable-dns-hostnames '{"Value":true}'
echo $VPC_ID
vpc-0b0f8006d8d9c0c08
Note the tag Project=k8s-from-scratch attached to every resource. At the end of the series we filter by this tag to clean up, so nothing gets missed.
Next, create an Internet Gateway so the machines can reach the Internet (to pull packages and download binaries), then attach it to the VPC:
IGW_ID=$(aws ec2 create-internet-gateway \
--tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Project,Value=k8s-from-scratch}]' \
--query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --vpc-id $VPC_ID --internet-gateway-id $IGW_ID
A subnet 10.0.1.0/24 in AZ ap-southeast-1a, with auto-assign public IP on so any machine created in it gets a public IP we can SSH into:
SUBNET_ID=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.1.0/24 --availability-zone ${AWS_REGION}a \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Project,Value=k8s-from-scratch}]' \
--query 'Subnet.SubnetId' --output text)
aws ec2 modify-subnet-attribute --subnet-id $SUBNET_ID --map-public-ip-on-launch
Finally the route table: add a default route 0.0.0.0/0 pointing at the Internet Gateway, then associate the route table with the subnet:
RTB_ID=$(aws ec2 create-route-table --vpc-id $VPC_ID \
--tag-specifications 'ResourceType=route-table,Tags=[{Key=Project,Value=k8s-from-scratch}]' \
--query 'RouteTable.RouteTableId' --output text)
aws ec2 create-route --route-table-id $RTB_ID --destination-cidr-block 0.0.0.0/0 --gateway-id $IGW_ID
aws ec2 associate-route-table --route-table-id $RTB_ID --subnet-id $SUBNET_ID
This route table has one more role at the pod-networking stage (Article 14): we'll add a route for each worker's pod range here. For now it just needs to allow Internet access.
Step 3: Security Group and key pair
A Security Group is the firewall at the instance level. We need three inbound rules:
- SSH (22) from your IP for administration.
- 6443 from your IP so kubectl can later reach the load balancer.
- All traffic between machines in the same group, so the cluster components (etcd, apiserver, kubelet, pod networking) can talk freely to each other.
SG_ID=$(aws ec2 create-security-group --group-name k8s-scratch-sg \
--description "k8s from scratch lab" --vpc-id $VPC_ID \
--tag-specifications 'ResourceType=security-group,Tags=[{Key=Project,Value=k8s-from-scratch}]' \
--query 'GroupId' --output text)
MYIP=$(curl -s https://checkip.amazonaws.com) # your machine's public IP
# SSH + apiserver from your IP
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
--ip-permissions \
"IpProtocol=tcp,FromPort=22,ToPort=22,IpRanges=[{CidrIp=${MYIP}/32,Description=ssh}]" \
"IpProtocol=tcp,FromPort=6443,ToPort=6443,IpRanges=[{CidrIp=${MYIP}/32,Description=kube-apiserver}]"
# All internal traffic between members of the same SG
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
--ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=${SG_ID},Description=intra-cluster}]"
Here my MYIP is 203.0.113.45 (replaced with an example IP). Yours will differ; if your home network changes IP, remember to update the SSH rule or you'll be locked out.
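When your public IP does change, both rules (22 and 6443) need the new address. A sketch of how that swap might look, assuming SG_ID is still set from the commands above; update_admin_ip is a hypothetical helper. It adds the new rule before removing the old one.

```shell
# Swap the /32 allowed on ports 22 and 6443 when your public IP changes.
# Assumes SG_ID holds the security group created in this step.
update_admin_ip() {  # usage: update_admin_ip <old-ip> <new-ip>
  local old=$1 new=$2 port
  for port in 22 6443; do
    # Allow the new address first, then drop the stale one.
    aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
      --protocol tcp --port "$port" --cidr "${new}/32"
    aws ec2 revoke-security-group-ingress --group-id "$SG_ID" \
      --protocol tcp --port "$port" --cidr "${old}/32"
  done
}

# Usage: update_admin_ip 203.0.113.45 "$(curl -s https://checkip.amazonaws.com)"
```

Rules added this way carry no Description; use the --ip-permissions form shown above if you want to keep one.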
Allowing "all internal traffic" in a lab is acceptable and saves opening each port (etcd 2379/2380, apiserver 6443, kubelet 10250, the pod ports...). In production we'd tighten it port by port, but that's another matter.
Create a key pair for SSH, save the private key locally with tight permissions:
mkdir -p ~/k8s-scratch && cd ~/k8s-scratch
aws ec2 create-key-pair --key-name k8s-scratch \
--query 'KeyMaterial' --output text > k8s-scratch.pem
chmod 600 k8s-scratch.pem
Step 4: Launch six instances
Now create the machines. Each instance is assigned a fixed private IP (--private-ip-address) and a 20GB gp3 root volume. One easily-missed detail: right after creating each machine, we disable source/destination check. EC2 by default blocks an instance from forwarding packets that aren't its own; but in Article 14 the nodes will route pod traffic for each other, so we have to disable this check now.
A small helper to keep it short, then call it six times:
launch() { # usage: launch <name> <private-ip> <instance-type>
  local id
  id=$(aws ec2 run-instances \
    --image-id "$AMI" --instance-type "$3" --key-name k8s-scratch \
    --subnet-id "$SUBNET_ID" --security-group-ids "$SG_ID" \
    --private-ip-address "$2" \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=20,VolumeType=gp3}' \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$1},{Key=Project,Value=k8s-from-scratch}]" \
    --query 'Instances[0].InstanceId' --output text) || return 1  # stop here if the launch failed
  echo "$1 ($2, $3) -> $id"
  # Let the instance forward packets not addressed to itself (needed for pod routing later)
  aws ec2 modify-instance-attribute --instance-id "$id" --no-source-dest-check
}
launch lb-0 10.0.1.10 t3.small
launch controller-0 10.0.1.11 t3.medium
launch controller-1 10.0.1.12 t3.medium
launch controller-2 10.0.1.13 t3.medium
launch worker-0 10.0.1.20 t3.medium
launch worker-1 10.0.1.21 t3.medium
lb-0 (10.0.1.10, t3.small) -> i-01e955f527ff25a57
controller-0 (10.0.1.11, t3.medium) -> i-05d8b7584a933394a
controller-1 (10.0.1.12, t3.medium) -> i-0ee0d05a3f68f2b73
controller-2 (10.0.1.13, t3.medium) -> i-07d62211877ed360b
worker-0 (10.0.1.20, t3.medium) -> i-0f1ab7628507cb9cd
worker-1 (10.0.1.21, t3.medium) -> i-0a33782c408f5bf09
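Since the source/destination check is easy to miss, it's worth confirming it really is off. A sketch using describe-instance-attribute; src_dest_check is a hypothetical helper, and the instance IDs are whatever the launch helper printed for you.

```shell
# Print "<instance-id> <True|False>" for the sourceDestCheck attribute.
# After the launch helper above, every lab instance should report False.
src_dest_check() {  # usage: src_dest_check <instance-id>...
  local id
  for id in "$@"; do
    printf '%s %s\n' "$id" "$(aws ec2 describe-instance-attribute \
      --instance-id "$id" --attribute sourceDestCheck \
      --query 'SourceDestCheck.Value' --output text)"
  done
}

# Usage: src_dest_check i-01e955f527ff25a57 i-05d8b7584a933394a
```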
Wait for all of them to reach running, then look at the IP table:
aws ec2 wait instance-running \
--filters Name=tag:Project,Values=k8s-from-scratch
aws ec2 describe-instances \
--filters Name=tag:Project,Values=k8s-from-scratch Name=instance-state-name,Values=running \
--query 'Reservations[].Instances[].[Tags[?Key==`Name`]|[0].Value, PrivateIpAddress, PublicIpAddress, InstanceType]' \
--output table
--------------------------------------------------------------
| DescribeInstances |
+---------------+------------+------------------+------------+
| controller-1 | 10.0.1.12 | 203.0.113.12 | t3.medium |
| lb-0 | 10.0.1.10 | 47.129.155.41 | t3.small |
| worker-1 | 10.0.1.21 | 203.0.113.21 | t3.medium |
| controller-2 | 10.0.1.13 | 203.0.113.13 | t3.medium |
| worker-0 | 10.0.1.20 | 203.0.113.20 | t3.medium |
| controller-0 | 10.0.1.11 | 203.0.113.11 | t3.medium |
+---------------+------------+------------------+------------+
The private IPs are exactly as we set them; the public IPs are assigned by AWS (yours will differ, and they change on every stop/start, so we only use them for SSH).
A fixed Elastic IP for lb-0
kubectl from your machine will reach the api-server via the public IP of lb-0. EC2's auto-assigned public IP changes on every stop/start, yet this address has to be permanently embedded in the api-server certificate (Article 4). To avoid breaking it later, we assign lb-0 an Elastic IP, a fixed public IP that survives stop/start:
LB_ID=$(aws ec2 describe-instances \
--filters Name=tag:Name,Values=lb-0 Name=instance-state-name,Values=running \
--query 'Reservations[0].Instances[0].InstanceId' --output text)
EIP_ALLOC=$(aws ec2 allocate-address --domain vpc \
--tag-specifications 'ResourceType=elastic-ip,Tags=[{Key=Project,Value=k8s-from-scratch}]' \
--query 'AllocationId' --output text)
aws ec2 associate-address --instance-id $LB_ID --allocation-id $EIP_ALLOC
aws ec2 describe-addresses --allocation-ids $EIP_ALLOC --query 'Addresses[0].PublicIp' --output text
203.0.113.10
Note this Elastic IP (here 203.0.113.10): Article 4 will put it in the SAN of the api-server cert, and it's also the address kubectl points at. Once the Elastic IP is associated, it replaces the auto-assigned public IP of lb-0.
Step 5: Set up SSH for the whole series
The whole series will SSH into these six machines many times. Instead of typing -i key.pem ubuntu@<ip> each time, we write a dedicated SSH config file (leaving your own ~/.ssh/config untouched), filling in the public IPs we got above:
cat > ~/k8s-scratch/ssh_config <<'EOF'
Host *
User ubuntu
IdentityFile ~/k8s-scratch/k8s-scratch.pem
IdentitiesOnly yes
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
LogLevel ERROR
Host lb-0
HostName 203.0.113.10
Host controller-0
HostName 203.0.113.11
Host controller-1
HostName 203.0.113.12
Host controller-2
HostName 203.0.113.13
Host worker-0
HostName 203.0.113.20
Host worker-1
HostName 203.0.113.21
EOF
IdentitiesOnly yes is worth noting: if your machine has an ssh-agent holding several other keys, ssh will offer those keys one by one first and may get refused by the server for too many attempts, before it even reaches our key. This line forces ssh to use only the declared IdentityFile. Try connecting to controller-0:
ssh -F ~/k8s-scratch/ssh_config controller-0 'hostname; lsb_release -ds; uname -r'
ip-10-0-1-11
Ubuntu 24.04.3 LTS
6.14.0-1018-aws
(The hostname is still the default ip-10-0-1-11; we fix it in the next step.)
Step 6: Prepare the OS
There are two groups of OS work. The first group applies to all six machines: set the hostname and a shared /etc/hosts file so the machines can call each other by name. The second group is only for the five Kubernetes nodes (controllers + workers): load kernel modules, tune sysctl, disable swap. The lb-0 machine only runs HAProxy so it doesn't need the second group.
Hostname and /etc/hosts (all six machines)
HOSTS_BLOCK='10.0.1.10 lb-0
10.0.1.11 controller-0
10.0.1.12 controller-1
10.0.1.13 controller-2
10.0.1.20 worker-0
10.0.1.21 worker-1'
for host in lb-0 controller-0 controller-1 controller-2 worker-0 worker-1; do
echo "== $host =="
ssh -F ~/k8s-scratch/ssh_config $host "sudo hostnamectl set-hostname $host && \
printf '%s\n' '$HOSTS_BLOCK' | sudo tee /etc/k8s-hosts >/dev/null && \
( echo '# k8s-scratch'; cat /etc/k8s-hosts ) | sudo tee -a /etc/hosts >/dev/null && \
echo hostname=\$(hostname)"
done
== lb-0 ==
hostname=lb-0
== controller-0 ==
hostname=controller-0
== controller-1 ==
hostname=controller-1
== controller-2 ==
hostname=controller-2
== worker-0 ==
hostname=worker-0
== worker-1 ==
hostname=worker-1
Now from any machine, ping controller-1 or ping worker-0 resolves to the right private IP.
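One caveat with the loop above: tee -a appends, so running it a second time duplicates the entries in /etc/hosts. A sketch of an idempotent variant that rewrites a delimited block instead; update_hosts_block and its begin/end markers are hypothetical, not what the loop above writes, and it does not clean up the single "# k8s-scratch" marker left by the earlier command.

```shell
# Replace (or add) a marked block in a hosts-style file, so reruns do not
# pile up duplicate entries. The begin/end marker lines are hypothetical.
update_hosts_block() {  # usage: update_hosts_block <file> <block-text>
  local file=$1 block=$2 tmp
  tmp=$(mktemp) || return 1
  # Copy everything except a block written by a previous run.
  sed '/^# k8s-scratch begin$/,/^# k8s-scratch end$/d' "$file" > "$tmp"
  {
    echo '# k8s-scratch begin'
    printf '%s\n' "$block"
    echo '# k8s-scratch end'
  } >> "$tmp"
  cat "$tmp" > "$file"   # overwrite in place, keeping the file's owner and mode
  rm -f "$tmp"
}

# Usage on a node (as root): update_hosts_block /etc/hosts "$HOSTS_BLOCK"
```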
Kernel modules, sysctl, swap (the five k8s nodes)
These settings are foundational Kubernetes requirements, and here's the reason for each:
- The overlay module: containerd uses the overlay filesystem to layer container images.
- The br_netfilter module + the bridge-nf-call-iptables sysctl: lets traffic crossing a Linux bridge (the pod network) be seen by iptables, so kube-proxy's rules can apply to pod packets.
- net.ipv4.ip_forward=1: enables packet forwarding between interfaces, which is required for a node to route pod traffic.
- Disable swap: kubelet by default refuses to run while swap is on, because swap muddles memory management and pod QoS.
Run on the five k8s nodes:
for host in controller-0 controller-1 controller-2 worker-0 worker-1; do
echo "== $host =="
ssh -F ~/k8s-scratch/ssh_config $host 'bash -s' <<'PREP'
set -e
# Kernel modules
printf 'overlay\nbr_netfilter\n' | sudo tee /etc/modules-load.d/k8s.conf >/dev/null
sudo modprobe overlay && sudo modprobe br_netfilter
# sysctl
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf >/dev/null
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system >/dev/null
# Disable swap
sudo swapoff -a
sudo sed -i '/\bswap\b/s/^/#/' /etc/fstab 2>/dev/null || true
# Verify
echo "modules: $(lsmod | grep -E '^(overlay|br_netfilter)' | awk '{print $1}' | sort | tr '\n' ' ')"
echo "ip_forward=$(cat /proc/sys/net/ipv4/ip_forward) bridge-nf=$(cat /proc/sys/net/bridge/bridge-nf-call-iptables)"
echo "swap: $(free -h | awk '/Swap/{print $2}')"
PREP
done
== controller-0 ==
modules: br_netfilter overlay
ip_forward=1 bridge-nf=1
swap: 0B
== controller-1 ==
modules: br_netfilter overlay
ip_forward=1 bridge-nf=1
swap: 0B
== controller-2 ==
...
swap: 0B
Putting the modules in /etc/modules-load.d/ and the sysctl settings in /etc/sysctl.d/ ensures they survive a reboot; a manual modprobe or sysctl only lasts until the next reboot. (Ubuntu on EC2 doesn't enable swap by default, so swap: 0B is the expected result; the swapoff step here is to be sure and to get you used to the operation.)
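Because persistence depends on those files being right, a quick check of the sysctl file can catch a typo before the next reboot does. A sketch; missing_sysctl_keys is a hypothetical helper that only inspects the file, not the live kernel values.

```shell
# Print every required key that is absent from (or not set to 1 in) a sysctl file.
# Prints nothing when all three settings are present.
missing_sysctl_keys() {  # usage: missing_sysctl_keys <file>
  local file=$1 key
  for key in net.bridge.bridge-nf-call-iptables \
             net.bridge.bridge-nf-call-ip6tables \
             net.ipv4.ip_forward; do
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$file" \
      || echo "$key"
  done
}

# Usage on a node: missing_sysctl_keys /etc/sysctl.d/k8s.conf
```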
Step 7: Install tools on the workstation
We generate the certificates (Article 4) and run kubectl from our own machine (the workstation), then push the files up to the nodes. We need two things: kubectl at the cluster version, and CloudFlare's cfssl and cfssljson to build the PKI.
Pin kubectl to v1.36.1 to match the cluster (change darwin/arm64 to linux/amd64 if your workstation is Linux):
cd ~/k8s-scratch
curl -sLO https://dl.k8s.io/release/v1.36.1/bin/darwin/arm64/kubectl
chmod +x kubectl
./kubectl version --client
Client Version: v1.36.1
Kustomize Version: v5.8.1
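Optionally, verify the download before trusting it. A sketch that compares a file's SHA-256 digest with an expected value; verify_sha256 is a hypothetical helper, and the usage lines assume the .sha256 file published alongside the binary on dl.k8s.io.

```shell
# Compare a file's SHA-256 digest with an expected hex string.
# Uses sha256sum where available (Linux) and falls back to shasum (macOS).
verify_sha256() {  # usage: verify_sha256 <file> <expected-hex-digest>
  local file=$1 expected=$2 actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  else
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  [ "$actual" = "$expected" ]
}

# Usage, from ~/k8s-scratch:
#   curl -sLO https://dl.k8s.io/release/v1.36.1/bin/darwin/arm64/kubectl.sha256
#   verify_sha256 kubectl "$(cat kubectl.sha256)" && echo "kubectl: OK"
```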
cfssl and cfssljson: on macOS install via Homebrew; on Linux download the binaries from cfssl's GitHub releases:
# macOS
brew install cfssl
# or Linux:
# (writing to /usr/local/bin normally requires root, hence sudo)
# sudo curl -sL -o /usr/local/bin/cfssl https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssl_1.6.5_linux_amd64
# sudo curl -sL -o /usr/local/bin/cfssljson https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssljson_1.6.5_linux_amd64
# sudo chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson
cfssl version
Version: 1.6.5
Runtime: go1.25.1
So the workstation is now fully equipped. In Article 4 we'll sit down and use cfssl to sign all the certificates in Article 2's PKI diagram.
🧹 Cleanup
This series spans many articles, so don't delete the infrastructure mid-way. But to avoid paying for compute while you take breaks between sessions, stop the instances. While stopped you pay no compute, only a very small EBS charge, plus the hourly charge AWS applies to the Elastic IP for as long as it stays allocated:
# Get the instance ids by tag then stop them all
IDS=$(aws ec2 describe-instances \
--filters Name=tag:Project,Values=k8s-from-scratch Name=instance-state-name,Values=running \
--query 'Reservations[].Instances[].InstanceId' --output text)
aws ec2 stop-instances --instance-ids $IDS
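When you come back, run start-instances, then update the public IPs in ssh_config (lb-0 keeps its Elastic IP, and the private IPs stay the same, so cluster config is unaffected). If you're in a new shell, $IDS is no longer set: rebuild it first with the query above, filtering on instance-state-name stopped instead of running.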
aws ec2 start-instances --instance-ids $IDS
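Editing ssh_config by hand after every restart gets tedious. A sketch that regenerates it from describe-instances output instead; render_ssh_config is a hypothetical helper that reproduces the file from Step 5, and the usage lines assume the same Project tag as the rest of the article.

```shell
# Turn "name public-ip" lines on stdin into an ssh_config on stdout.
render_ssh_config() {
  cat <<'EOF'
Host *
  User ubuntu
  IdentityFile ~/k8s-scratch/k8s-scratch.pem
  IdentitiesOnly yes
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  LogLevel ERROR
EOF
  local name ip
  while read -r name ip; do
    # Skip blank lines and instances that have no public IP yet.
    [ -n "$name" ] && [ -n "$ip" ] && [ "$ip" != "None" ] || continue
    printf 'Host %s\n  HostName %s\n' "$name" "$ip"
  done
}

# Usage after start-instances:
#   aws ec2 wait instance-running --filters Name=tag:Project,Values=k8s-from-scratch
#   aws ec2 describe-instances \
#     --filters Name=tag:Project,Values=k8s-from-scratch Name=instance-state-name,Values=running \
#     --query 'Reservations[].Instances[].[Tags[?Key==`Name`]|[0].Value, PublicIpAddress]' \
#     --output text | render_ssh_config > ~/k8s-scratch/ssh_config
```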
The full teardown (instances, VPC, IGW, subnet, SG, key pair) is saved for Article 23, at which point we filter by the tag Project=k8s-from-scratch and remove everything.
Wrap-up
We now have a private network with six fixed-address machines, an OS tuned correctly for Kubernetes, and a fully equipped workstation. The infrastructure so far is "empty": not a single Kubernetes component yet. That's deliberate, because we want to lay each brick consciously.
Article 4 is the first truly "Kubernetes" step: use cfssl to create three CAs and all the certificates for each component (apiserver, each kubelet, controller-manager, scheduler, kube-proxy, the etcd client) with exactly the CN/O and SAN fields discussed in Article 2. This is the part kubeadm keeps hidden, and the most worthwhile part to do by hand.