The Real Bill: How Much Serverless Costs, and Where It Saves

Kai · 5 min read

Every serverless architecture eventually has to answer one question: how much does it cost? The answer differs from that of a traditional server because cost tracks usage rather than uptime. This article doesn't theorize about pricing; it opens the actual bill of the product built across the series, then explains the pricing model and where the design saved money.

Goal

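See the stack's real cost via Cost Explorer, understand the pricing model of each major service, and recognize where the series' design choices cut cost. Viewing the cost in the Cost Explorer console is free; each Cost Explorer API request, such as the CLI call used here, is billed at about $0.01.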

The real bill: near zero

When we ask Cost Explorer for cost by service over the few days spent building and testing the whole series, the result for each serverless service is either zero or too small to matter:

$ aws ce get-cost-and-usage --time-period Start=2026-05-24,End=2026-05-27 \
    --granularity DAILY --metrics UnblendedCost --group-by Type=DIMENSION,Key=SERVICE
...
Amazon Simple Queue Service          0
Amazon Simple Notification Service   0
AmazonCloudWatch                     0
Amazon Elastic Compute Cloud         0
Amazon Simple Storage Service        0.0000473711
...

The services that show in the list (SQS, SNS, CloudWatch...) are all 0. Lambda, DynamoDB, API Gateway, EventBridge, and Step Functions don't appear in the result at all: Cost Explorer returned no line for them in this period, which we read as a zero charge (usage inside the free tier). The only visible charge is a few thousandths of a cent for S3, where SAM stashes the code package on deploy. Building a complete system (API, database, auth, event bus, queue, realtime, state machine, observability) and testing it many times cost an amount not worth mentioning.
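For anyone who would rather script this check than read the table by eye, here is a minimal Python sketch that sums a `get-cost-and-usage` response per service. It assumes the JSON layout the API returns (`ResultsByTime`, `Groups`, `Keys`, `Metrics`). The sample dictionary is hand-written for illustration: the S3 figure is the one shown above, but placing it on a single day is an assumption, and `total_by_service` is a hypothetical helper name.

```python
from collections import defaultdict
from decimal import Decimal

# Hand-written sample shaped like a get-cost-and-usage response.
# The per-day split is an assumption for illustration, not real output.
sample_response = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2026-05-24", "End": "2026-05-25"},
            "Groups": [
                {"Keys": ["Amazon Simple Queue Service"],
                 "Metrics": {"UnblendedCost": {"Amount": "0", "Unit": "USD"}}},
                {"Keys": ["Amazon Simple Storage Service"],
                 "Metrics": {"UnblendedCost": {"Amount": "0.0000473711", "Unit": "USD"}}},
            ],
        },
        {
            "TimePeriod": {"Start": "2026-05-25", "End": "2026-05-26"},
            "Groups": [
                {"Keys": ["Amazon Simple Storage Service"],
                 "Metrics": {"UnblendedCost": {"Amount": "0", "Unit": "USD"}}},
            ],
        },
    ]
}


def total_by_service(response, metric="UnblendedCost"):
    """Sum the chosen metric across all days, keyed by service name."""
    totals = defaultdict(Decimal)
    for day in response["ResultsByTime"]:
        for group in day.get("Groups", []):
            service = group["Keys"][0]
            # Amounts arrive as strings; Decimal avoids float rounding noise.
            totals[service] += Decimal(group["Metrics"][metric]["Amount"])
    return dict(totals)


totals = total_by_service(sample_response)
for service, amount in sorted(totals.items()):
    print(f"{service:<36} {amount}")
```

A service that never appears in any group, such as Lambda in the result above, is simply missing from the returned dictionary.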

The reason comes down to two things. First, serverless doesn't charge while idle: no requests means the compute part of the bill is zero, unlike an EC2 instance or RDS database running 24/7, which is billed even when idle. Second, usage during testing fits comfortably inside the free tier. This is what makes serverless suit both learning and products with erratic load: cost contracts with usage, and when usage is small the bill is small too.

Pricing model per service

Understanding why it's nearly free requires knowing how each service charges. The numbers below are for modeling the pricing only; specific prices change by date and region, so always check the official pricing page before doing serious math.

  • Lambda charges per invocation (around $0.20 per million requests) plus run time times memory (GB-seconds). The always-free tier is fairly generous: one million requests and 400,000 GB-seconds per month. A URL shortener usually doesn't get close to that.
  • API Gateway HTTP API charges per request, and this is where the choice in Article 03 pays off: HTTP API is several times cheaper than REST API per million requests. Choosing HTTP API from the start was a cost decision, not just a feature one.
  • DynamoDB on-demand charges per read/write request unit and for storage, with no charge for provisioned capacity. When no one is using it, all that's left is storage for a few items, which is almost nothing. An empty table sitting overnight produces no meaningful bill.
  • EventBridge charges per million custom events; SQS per million requests (the first million each month free); Step Functions Standard per thousand state transitions (4,000 per month free).
  • Cognito is free up to tens of thousands of monthly active users. CloudWatch charges by log volume ingested, by alarm, and by dashboard; X-Ray is free for a certain amount of traces per month before charging.
  • WebSocket API charges per million messages and per connection-minute, with a free tier during the first year.

The common thread: everything charges by usage, and most have a free threshold. A new product or light load usually stays under that threshold, so the real bill is zero or a few cents.
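To make the "usage minus free threshold" idea concrete, here is a rough Python sketch of the Lambda part of the bill. The request price and free-tier sizes are the figures quoted above; the GB-second price is an assumed x86 list price used only for illustration, and `lambda_monthly_cost` is a hypothetical helper, not an AWS API.

```python
# Illustrative prices (assumptions; check the pricing page for your region).
PRICE_PER_MILLION_REQUESTS = 0.20   # USD, the figure quoted above
PRICE_PER_GB_SECOND = 0.0000166667  # USD, assumed x86 rate
FREE_REQUESTS = 1_000_000           # monthly free tier
FREE_GB_SECONDS = 400_000           # monthly free tier


def lambda_monthly_cost(requests, avg_duration_ms, memory_mb):
    """Estimate monthly Lambda cost in USD after the free tier."""
    # GB-seconds = invocations x duration in seconds x memory in GB.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return round(request_cost + compute_cost, 2)


# 100k requests a month at 100 ms and 128 MB stays inside the free tier.
light = lambda_monthly_cost(100_000, 100, 128)
# 10M requests at the same size only pays for the 9M requests over the threshold.
heavy = lambda_monthly_cost(10_000_000, 100, 128)
print(light)  # 0.0
print(heavy)  # 1.8
```

The same shape (billable usage = usage minus free allowance, then multiply by a unit price) applies to the other services in the list, each with its own unit.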

Where the design saved money

Many decisions scattered across the series weren't just technical but also cost decisions, even when we discussed them for other reasons.

Choosing HTTP API over REST API (Article 03) cuts the per-request cost to a fraction. Choosing arm64 (Article 02) gives cheaper compute than x86 at the same duration. DynamoDB on-demand (Article 04) drops provisioned-capacity cost, suiting erratic load and keeping the idle bill near zero. Pushing metrics with EMF via logs (Article 13) avoids paying for separate PutMetricData API calls. Single-table design (Article 04) keeps the resource count minimal. A sparse GSI (Article 05) indexes only the items that need it, reducing storage and index writes. And TTL auto-deleting markers (Article 10) keeps the table from growing over time.
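To put rough numbers on the first two choices, here is a small Python sketch. Every price in it is an assumed list price for illustration (none of these figures come from the series), the free tier is ignored, and `api_cost` and `savings_percent` are hypothetical helpers.

```python
# Assumed list prices for illustration only; verify against current pricing.
HTTP_API_PER_MILLION = 1.00         # USD per million requests
REST_API_PER_MILLION = 3.50         # USD per million requests
X86_PER_GB_SECOND = 0.0000166667    # USD
ARM64_PER_GB_SECOND = 0.0000133334  # USD


def api_cost(requests, price_per_million):
    """Request cost in USD, ignoring any free tier."""
    return requests / 1_000_000 * price_per_million


def savings_percent(baseline, alternative):
    """How much cheaper the alternative is than the baseline, in percent."""
    return round((baseline - alternative) / baseline * 100, 1)


requests = 5_000_000
rest_cost = api_cost(requests, REST_API_PER_MILLION)
http_cost = api_cost(requests, HTTP_API_PER_MILLION)
api_saving = savings_percent(REST_API_PER_MILLION, HTTP_API_PER_MILLION)
arm_saving = savings_percent(X86_PER_GB_SECOND, ARM64_PER_GB_SECOND)

print(rest_cost, http_cost)    # 17.5 5.0
print(api_saving, arm_saving)  # 71.4 20.0
```

Under these assumed prices the API choice matters more per request than the CPU architecture, but both savings apply to every single invocation, so they compound as traffic grows.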

None of those were done only to save money, but together they explain why a fully featured system still costs almost nothing under light load.

Watch out for what charges while idle

The flip side to remember: a few things in and around a serverless architecture don't follow the pay-per-use model. A CloudWatch dashboard charges a fixed monthly fee (with a small free allowance). A NAT Gateway, which you need if Lambda runs in a private VPC subnet and still has to reach the Internet, charges by the hour regardless of traffic, plus a fee for the data it processes. Provisioned concurrency, if enabled, keeps execution environments warm continuously and so charges even when no one is calling. This product avoids all of them: no VPC, no NAT, no provisioned concurrency, and just one dashboard. When you add those components, the bill changes character, from "per use" to "with a fixed part," and that's when it needs closer watching.
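As a rough illustration of that fixed part, the Python sketch below estimates a monthly floor that is owed even with zero requests. All prices, the 730-hour month, and the free dashboard allowance of three are assumptions for illustration, and `fixed_monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # common billing approximation

# Assumed illustrative prices; they vary by region and change over time.
NAT_GATEWAY_PER_HOUR = 0.045                          # USD, excludes per-GB processing
PROVISIONED_CONCURRENCY_PER_GB_SECOND = 0.0000041667  # USD, assumed x86 rate
DASHBOARD_PER_MONTH = 3.00                            # USD per dashboard past the free ones
FREE_DASHBOARDS = 3                                   # assumed free allowance


def fixed_monthly_cost(nat_gateways=0, provisioned_instances=0,
                       memory_mb=128, dashboards=0):
    """Monthly cost in USD that is owed even with zero requests."""
    nat = nat_gateways * NAT_GATEWAY_PER_HOUR * HOURS_PER_MONTH
    # Provisioned concurrency bills memory x time for every warm instance.
    warm_gb_seconds = (provisioned_instances * (memory_mb / 1024)
                       * HOURS_PER_MONTH * 3600)
    warm = warm_gb_seconds * PROVISIONED_CONCURRENCY_PER_GB_SECOND
    boards = max(0, dashboards - FREE_DASHBOARDS) * DASHBOARD_PER_MONTH
    return round(nat + warm + boards, 2)


# This product's setup: no NAT, no provisioned concurrency, one dashboard.
this_product = fixed_monthly_cost(dashboards=1)
# A hypothetical variant with one NAT Gateway and one warm 512 MB instance.
with_fixed_parts = fixed_monthly_cost(nat_gateways=1, provisioned_instances=1,
                                      memory_mb=512, dashboards=1)
print(this_product)  # 0.0
print(with_fixed_parts)
```

Under these assumptions the second configuration lands in the tens of dollars per month before a single request arrives, which is the "fixed part" worth watching.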

Wrap-up

The real bill for the whole series is just a few thousandths of a cent, because serverless doesn't charge while idle and usage stays inside the free tier. Each service charges by usage with a free threshold, and many design choices scattered across the series (HTTP API, arm64, on-demand, EMF, single-table, sparse index, TTL) together keep cost at the absolute minimum. What needs watching is the handful of components that charge a fixed fee or by the hour, the ones this product deliberately doesn't use.

The cost is low partly because we haven't thrown real load at it yet. The next article does exactly that: load test with k6, push a large volume of requests at the system, then read the metrics and traces back under load to see how it scales and where the bottleneck is, especially with the concurrency limit of 10 we've hit several times.