What Serverless Is and When to Use It

Running a small service on a traditional server means you keep a machine always on: you pay for it 24/7 even when no one is calling, you patch the operating system, you watch for the disk filling up, and you handle scaling yourself when traffic spikes. Most of that effort has nothing to do with the business logic you want to write. Serverless is the way to push all that "keep a machine alive" work onto the provider, leaving you with just code and the events that trigger it.

This series builds a real serverless product end to end on AWS, then operates it to production grade. The first article creates no resources. Instead, it pins down what serverless really is, what it trades away, and when to avoid it, then sketches the product and its architecture so that in later articles, as we type commands, we know which piece fits where.

Serverless does not mean "no servers"

There are still servers running your code. The difference is that you don't see them, don't provision them, and don't pay when they're idle. The AWS documentation defines Lambda in one tidy sentence: "AWS Lambda is a compute service that runs code without the need to manage servers. Your code runs, scaling up and down automatically, with pay-per-use pricing." The three ideas in that sentence are the whole spirit of serverless.

First, no server management. AWS handles the infrastructure: "Lambda runs your code on a high-availability compute infrastructure and manages all the computing resources, including server and operating system maintenance, capacity provisioning, automatic scaling, and logging." You hand over a function, AWS handles where it runs.

Second, automatic scaling. When one request arrives, one copy of the function runs. When a thousand arrive at once, Lambda spins up many copies running in parallel (up to your account's concurrency limit), then reels them back in when the work is done. You don't configure an Auto Scaling Group or set a CPU threshold.
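To get a feel for how many copies "many" is, a rough rule of thumb is that the number of concurrent executions is about the request rate multiplied by the average duration of one run. The sketch below is only that arithmetic; the helper name and the sample numbers are made up for illustration.

```typescript
// Rough estimate: concurrent executions ≈ requests per second × average duration in seconds.
// estimateConcurrency is a hypothetical helper, not part of any AWS SDK.
function estimateConcurrency(requestsPerSecond: number, avgDurationMs: number): number {
  if (requestsPerSecond < 0 || avgDurationMs < 0) {
    throw new RangeError("inputs must be non-negative");
  }
  // Multiply before dividing so whole-number inputs stay exact.
  return Math.ceil((requestsPerSecond * avgDurationMs) / 1000);
}

// Illustrative numbers only: 1,000 requests per second, each taking about 200 ms.
const copiesNeeded = estimateConcurrency(1000, 200);
console.log(copiesNeeded); // 200
```

The same burst handled by a function that takes 2 seconds per request would need ten times as many copies, which is one reason short invocations suit this model.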

Third, pay-per-use. With no invocations, the compute portion of your bill is zero. This is what makes serverless fit workloads with erratic load: it expands itself at peak, and costs nothing when quiet.

The price for that convenience is a few constraints. Each Lambda function run is capped at 15 minutes (the docs describe "Lambda functions run for up to 15 minutes"; AWS has more recently added durable functions that can run up to a year for multi-step workflows, but that's a separate model). A function cannot rely on in-memory state surviving between invocations, so anything that needs to be remembered has to be pushed out to a database or storage. And because the code runs in execution environments that AWS creates on demand, a request sometimes has to wait for a new environment to be initialized before the code runs. That extra latency is called a cold start, and we'll dissect it in its own article.
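A minimal sketch of what this means in Node.js, assuming the usual module layout of a Lambda function: code at module scope runs once per execution environment, while the handler body runs on every invocation. The names below are hypothetical, and the function is left unexported so the snippet runs on its own.

```typescript
// Module scope: this code runs once, when a new execution environment is initialized.
let isColdStart = true;
let invocationsInThisEnvironment = 0;

interface InvocationInfo {
  coldStart: boolean;
  invocationsInThisEnvironment: number;
}

// Hypothetical handler body; in a real Lambda module it would be exported as `handler`.
function handleInvocation(): InvocationInfo {
  const coldStart = isColdStart;
  isColdStart = false; // later invocations in this same environment are warm
  invocationsInThisEnvironment += 1; // starts again from 0 in every new environment
  return { coldStart, invocationsInThisEnvironment };
}

console.log(handleInvocation()); // first call in this environment: coldStart is true
console.log(handleInvocation()); // second call: coldStart is false
```

The counter looks like state, but it lives only as long as one environment does and is not shared between parallel copies, which is why anything that must be remembered belongs in a database or storage.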

Event-driven: the mindset that comes with it

Serverless on AWS is bound up with event-driven architecture. A function doesn't run on its own; it runs when something triggers it: an HTTP request to API Gateway, a new file in S3, a changed record in DynamoDB, an event pushed to EventBridge. The docs list exactly the kind of work Lambda was made for: processing files on upload, reacting to database changes, running scheduled tasks, processing realtime data streams, serving as a backend for web and mobile.

This mindset differs from a traditional web app, where one process runs continuously and decides for itself what to do next. Here, small components sit idle until an event calls them, do exactly one thing, then shut down. Stitch many such pieces together via events and you get a system where nothing runs 24/7 yet still reacts instantly when needed.

   Traditional way                       Event-driven way (serverless)

  ┌────────────────────┐            event   ──▶ ┌─────────┐
  │  process running    │           HTTP    ──▶ │  func A │──▶ (done, off)
  │  continuously 24/7  │           file    ──▶ ┌─────────┐
  │  self-looping       │           DB      ──▶ │  func B │──▶ (done, off)
  │  (pays even when    │           change       ┌─────────┐
  │   idle)             │           timer   ──▶ │  func C │──▶ (done, off)
  └────────────────────┘                        └─────────┘
                                       no event = no run = no cost
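
The right-hand side of the diagram can be sketched in a few lines. The event shapes here are hypothetical and heavily simplified (real AWS payloads carry many more fields), and the `deliver` function merely stands in for the platform that routes each event to the function subscribed to it.

```typescript
// Hypothetical, simplified event shapes; real AWS event payloads are much richer.
interface HttpEvent { kind: "http"; method: string; path: string }
interface FileEvent { kind: "file"; bucket: string; key: string }
interface TimerEvent { kind: "timer"; firedAt: string }
type TriggerEvent = HttpEvent | FileEvent | TimerEvent;

// Each function does exactly one thing for exactly one kind of event.
const funcA = (e: HttpEvent): string => `handled ${e.method} ${e.path}`;
const funcB = (e: FileEvent): string => `processed ${e.bucket}/${e.key}`;
const funcC = (e: TimerEvent): string => `ran scheduled task at ${e.firedAt}`;

// Stand-in for the platform: nothing runs until an event arrives.
function deliver(event: TriggerEvent): string {
  switch (event.kind) {
    case "http":
      return funcA(event);
    case "file":
      return funcB(event);
    case "timer":
      return funcC(event);
  }
}

console.log(deliver({ kind: "http", method: "GET", path: "/aK9x" })); // handled GET /aK9x
```

In the real system there is no `deliver` function to write: each function is attached to its event source in the infrastructure configuration.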

When NOT to use serverless

Serverless is not the default choice for everything. There are workloads where it's more expensive or more awkward than a container running steadily.

High, steady load all day is the first case. If your system takes large, stable traffic 24/7, the per-invocation charges can add up to more than the cost of an hourly-rented machine running at full capacity. Serverless wins on erratic load; flat, high load erases that advantage.
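The break-even point is simple arithmetic: a per-request charge plus a charge for memory multiplied by running time, compared against a flat monthly fee. The sketch below assumes that two-part pricing shape; every price and traffic figure in it is a placeholder chosen for illustration, not a quote from any price list.

```typescript
interface LambdaPricing {
  perMillionRequestsUsd: number; // placeholder value, check the current pricing page
  perGbSecondUsd: number; // placeholder value, check the current pricing page
}

// Monthly cost = request charge + (memory × total running time) charge.
function lambdaMonthlyCostUsd(
  requestsPerMonth: number,
  avgDurationMs: number,
  memoryGb: number,
  pricing: LambdaPricing
): number {
  const requestCost = (requestsPerMonth / 1000000) * pricing.perMillionRequestsUsd;
  const gbSeconds = requestsPerMonth * (avgDurationMs / 1000) * memoryGb;
  return requestCost + gbSeconds * pricing.perGbSecondUsd;
}

function cheaperOption(serverlessUsd: number, fixedServerUsd: number): "serverless" | "fixed server" {
  return serverlessUsd <= fixedServerUsd ? "serverless" : "fixed server";
}

// All numbers below are hypothetical.
const pricing: LambdaPricing = { perMillionRequestsUsd: 0.2, perGbSecondUsd: 0.0000167 };
const fixedServerUsd = 50;
const quietMonth = lambdaMonthlyCostUsd(1000000, 100, 0.5, pricing);
const busyMonth = lambdaMonthlyCostUsd(500000000, 100, 0.5, pricing);

console.log(cheaperOption(quietMonth, fixedServerUsd)); // serverless
console.log(cheaperOption(busyMonth, fixedServerUsd)); // fixed server
```

With these placeholder figures the quiet month costs about a dollar and the busy month several hundred, so the crossover sits somewhere in between; where exactly depends on real prices and on how fully the fixed machine is used.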

A requirement for extremely low and stable latency is the second case. A cold start typically adds tens to hundreds of milliseconds, and sometimes more with a heavy runtime or a large package, to the requests that happen to land when a new environment must be initialized. For most APIs that's acceptable, but a system that needs sub-millisecond latency that stays stable across every request will struggle to live with this characteristic (there are ways to reduce cold starts, but reducing is not eliminating).

Tasks that run very long or need to hold heavy in-memory state between steps are the third case: they fit neither the 15-minute cap nor the stateless model of a single invocation. Finally, there is vendor lock-in: a serverless architecture binds fairly tightly to one provider's services, so moving elsewhere costs more effort than it would for an app packaged in a standard container.

The product in this series, a URL shortening service, lands right in the zone where serverless shines: erratic load (a viral link spikes, otherwise quiet), each request is short, and most of the time the system does nothing.

The product we'll build

A URL shortener sounds simple, but doing it properly touches nearly all the important pieces of serverless. Here's what it will do:

A user logs in, submits a long URL, gets back a short code (e.g. https://sho.rt/aK9x).
Someone opens the short link, the system redirects them to the original URL, and at the same time records a click.
Each click fires an event; another component aggregates those events into metrics (total clicks, clicks over time).
The link owner opens a dashboard and sees the click count update in realtime every time someone clicks, with no page reload.
Each person only sees and manages their own links (multi-tenant).
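
This article does not say how short codes like aK9x are produced. One common approach, assumed here purely for illustration, is to encode a numeric ID in base62 (digits plus upper- and lower-case letters); the function names and the alphabet order are this sketch's own choices.

```typescript
// 62 URL-safe characters: indices 0-9 are digits, 10-35 are A-Z, 36-61 are a-z.
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const BASE = ALPHABET.length;

// Hypothetical helper: turn a non-negative integer ID into a short code.
function toBase62(id: number): string {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new RangeError("id must be a non-negative safe integer");
  }
  if (id === 0) return ALPHABET.charAt(0);
  let code = "";
  let n = id;
  while (n > 0) {
    code = ALPHABET.charAt(n % BASE) + code; // least significant digit first
    n = Math.floor(n / BASE);
  }
  return code;
}

// Inverse of toBase62, handy for checking round trips.
function fromBase62(code: string): number {
  let n = 0;
  for (let i = 0; i < code.length; i++) {
    const digit = ALPHABET.indexOf(code.charAt(i));
    if (digit < 0) throw new RangeError(`invalid character: ${code.charAt(i)}`);
    n = n * BASE + digit;
  }
  return n;
}

console.log(toBase62(62)); // 10
console.log(toBase62(fromBase62("aK9x"))); // aK9x
```

Four base62 characters cover a little under 15 million codes. A sequential ID also makes codes guessable, so a real design might prefer random codes with a uniqueness check.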

Within that frame, we'll touch: storage with a compact DynamoDB table design, user authentication, recording events asynchronously so they're not double-counted and not lost, pushing data to the browser in realtime, orchestrating a multi-step process, then observing and optimizing the whole thing under real load.
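
"Not double-counted" deserves a quick illustration, because asynchronous delivery can hand the same event to a consumer more than once. A minimal sketch of the idea, assuming each click event carries a unique ID: remember which IDs have been applied and ignore repeats. The in-memory Set and Map stand in for a database, and every name here is hypothetical.

```typescript
interface ClickEvent {
  eventId: string; // assumed to be unique per click
  code: string; // the short code that was clicked
}

class ClickAggregator {
  // In-memory stand-ins; a real aggregator would keep both in durable storage.
  private readonly seenEventIds = new Set<string>();
  private readonly totals = new Map<string, number>();

  // Returns true if the event was counted, false if it was a duplicate.
  apply(event: ClickEvent): boolean {
    if (this.seenEventIds.has(event.eventId)) return false;
    this.seenEventIds.add(event.eventId);
    this.totals.set(event.code, this.totalFor(event.code) + 1);
    return true;
  }

  totalFor(code: string): number {
    return this.totals.get(code) ?? 0;
  }
}

const aggregator = new ClickAggregator();
const click: ClickEvent = { eventId: "evt-1", code: "aK9x" };
aggregator.apply(click);
aggregator.apply(click); // redelivery of the same event is ignored
console.log(aggregator.totalFor("aK9x")); // 1
```

Applying the same event twice leaves the total unchanged, which is what makes retries safe.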

The overall architecture

This is the picture we're aiming for at the end of the series. Don't try to understand all of it now; each article builds and dissects one piece, and we return to this diagram many times.

                    ┌──────────── Cognito (login, JWT) ───────────────┐
                    │                                                 │ auth
   Browser ────HTTPS──▶ API Gateway (HTTP API)                        │
        │           │        │                                       │
        │  POST /links ──────┼──▶ Lambda create ──▶ DynamoDB (single-table)
        │  GET /{code} ──────┼──▶ Lambda resolve ─┬─▶ 301 redirect
        │           │        │                    └─▶ EventBridge (click event)
        │           │        │                            │
        │           │        │           ┌──── rule ──────┤
        │           │        │           ▼                ▼
        │           │        │   Lambda aggregator    Step Functions
        │           │        │   (idempotent + DLQ)   (link moderation)
        │           │        │        │
        │           │        │        ▼
        │           │        │   DynamoDB (counter)
        │           │        │        │
        └───WebSocket────────┴────────┴──▶ push realtime click count to dashboard

   Observability: Lambda Powertools (log/trace/metric) + X-Ray + CloudWatch
   Operations: SAM (infra) · CI/CD canary + rollback · IAM least-privilege · WAF

Apart from the browser, each box is an AWS service, and each arrow is an event or a call. No box is a server you have to keep alive. When no one's using it, almost everything sits at rest and the compute bill goes to zero.
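
As a taste of one path through the diagram, here is a rough sketch of the GET /{code} flow: look up the code, emit a click event, and answer with a 301 redirect. The Map stands in for DynamoDB and the `publish` callback for EventBridge; the types and names are hypothetical and are not the code the series will build.

```typescript
interface HttpResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

interface ClickRecorded {
  type: "click";
  code: string;
  at: string; // ISO 8601 timestamp
}

// Hypothetical resolve step: `links` stands in for the table, `publish` for the event bus.
function resolve(
  code: string,
  links: Map<string, string>,
  publish: (event: ClickRecorded) => void,
  now: Date = new Date()
): HttpResponse {
  const target = links.get(code);
  if (target === undefined) {
    return { statusCode: 404, headers: {}, body: "Not found" };
  }
  // Emit the click event; counting happens elsewhere, asynchronously.
  publish({ type: "click", code, at: now.toISOString() });
  return { statusCode: 301, headers: { Location: target }, body: "" };
}

const links = new Map<string, string>([["aK9x", "https://example.com/a/very/long/url"]]);
const published: ClickRecorded[] = [];
const response = resolve("aK9x", links, (e) => published.push(e));
console.log(response.statusCode, response.headers.Location, published.length);
```

The function that answers the visitor never touches the counter, which keeps the redirect fast and leaves the aggregation to the event-driven half of the diagram.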

Tools and conventions

All infrastructure in the series is built with AWS SAM (Serverless Application Model), a way to declare serverless resources more compactly than raw CloudFormation. Function code is written in Node.js + TypeScript. Every command runs for real on an AWS account in region ap-southeast-1 (Singapore), and every output in the articles is real output. Account IDs and personal resource names are masked with dummy values (e.g. 111122223333). The hands-on code lives at github.com/nghiadaulau/serverless-url-shortener-aws.

A note on cost before we start: because the product is fully serverless, most of the time it incurs no charge. Articles that do create resources will state the estimated cost, and the final article cleans everything up with one command. The estimate for the whole series is in the range of a few dollars, most of it in the load-testing article.

Roadmap

The series runs in six parts. Part one lays the foundation: the SAM environment, and understanding how Lambda runs internally. Part two builds the product core: the API, and a DynamoDB single-table design for the URL shortener. Part three adds login and splits data per user. Part four is the event-driven part: emitting and consuming click events, pushing realtime updates, orchestrating a multi-step process. Part five makes it production-grade: observability with X-Ray, measuring and cutting cold starts, tightening security. Part six is operations: CI/CD, dissecting the bill, load testing, then cleanup.

The next article builds the working environment: install the SAM CLI, create the project skeleton, deploy a first Lambda function to AWS and call it, and equally importantly, delete it cleanly with one command.