How Lambda Runs Your Code Internally

Kai · 8 min read

The REPORT line in the previous article had a field we glossed over: Init Duration. It's a window into how Lambda actually runs code, and understanding it explains most of what's puzzling about both the performance and cost of serverless. This article goes down to that layer: how Lambda's execution environment lives and dies, why cold start exists, and why a function with more memory can be both faster and cheaper.

Goal

Understand the three phases of an execution environment's lifecycle (Init, Invoke, Shutdown), see that static code runs once while the handler runs per request, measure a real cold start, and measure the relationship between memory and CPU by running the same computation at two memory levels. The measurements run on a real function that is deleted immediately afterwards, so the cost is negligible.

The execution environment: where code actually runs

When an event arrives, Lambda doesn't run your code directly on a shared machine. It sets up an execution environment, an isolated and secure environment holding the language runtime, your code, and any accompanying extensions. The docs describe it: "Lambda invokes your function in an execution environment, which provides a secure and isolated runtime environment. The execution environment manages the resources required to run your function."

An execution environment handles one request at a time. If ten requests arrive at once, Lambda sets up as many as ten environments running in parallel. Each environment has a three-phase lifecycle, and most of the performance behavior lives there.
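The "one request per environment" rule can be pictured with a toy model. This is only a sketch for intuition, not how Lambda's scheduler is implemented; EnvironmentPool, acquire, and release are hypothetical names introduced here.

```typescript
// Toy model only: this is NOT how Lambda's scheduler is implemented.
interface ExecEnvironment {
  id: number;
  busy: boolean;
}

class EnvironmentPool {
  private environments: ExecEnvironment[] = [];
  coldStarts = 0;

  // Give the request an idle (warm) environment, or create a new one (cold start).
  acquire(): ExecEnvironment {
    for (const env of this.environments) {
      if (!env.busy) {
        env.busy = true;
        return env;
      }
    }
    const created: ExecEnvironment = { id: this.environments.length + 1, busy: true };
    this.environments.push(created);
    this.coldStarts++;
    return created;
  }

  // Called when the handler returns: the environment can serve another request.
  release(env: ExecEnvironment): void {
    env.busy = false;
  }

  get size(): number {
    return this.environments.length;
  }
}

const pool = new EnvironmentPool();

// Ten requests arrive before any of them finishes: ten environments are needed.
const inFlight: ExecEnvironment[] = [];
for (let i = 0; i < 10; i++) inFlight.push(pool.acquire());
console.log(pool.size, pool.coldStarts); // 10 10

// All ten finish; an eleventh request reuses a warm environment.
for (const env of inFlight) pool.release(env);
pool.acquire();
console.log(pool.size, pool.coldStarts); // 10 10
```

In this model a cold start happens only when no idle environment exists, which is the behavior the rest of the article measures.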

Three phases: Init, Invoke, Shutdown

The docs split the lifecycle into three phases. Init is when the environment is set up: Lambda starts the extensions, bootstraps the runtime, then runs the function's static code (the code outside the handler, which runs when the module is loaded). The Init phase is capped at 10 seconds; if it doesn't finish in that window, Lambda retries the Init phase at the time of the first invocation, this time bounded by the function's configured timeout.

Invoke is when your handler runs to process an event. An environment can go through the Invoke phase many times, once per request.

Shutdown happens when Lambda decides to tear down the environment (usually after a period with no requests). Lambda sends a shutdown signal to the runtime and extensions so they can clean up, then destroys the environment.

The key point is between the Invokes: the docs say "Lambda freezes the execution environment when the runtime and each extension have completed and there are no pending events". The environment isn't destroyed right after a request; it's frozen and can be thawed to serve the next request. When that happens the Init phase doesn't run again, the static code doesn't run again, and the request goes straight into the handler.

   Cold start (new environment)         Warm (reuse a frozen environment)

   ┌──── Init phase ────┐
   │ extension init     │
   │ runtime init       │   runs once
   │ FUNCTION init      │   (static code)
   └────────┬───────────┘
            ▼
   ┌── Invoke ──┐  ┌── Invoke ──┐  ┌── Invoke ──┐   ...
   │  handler   │  │  handler   │  │  handler   │
   └────────────┘  └────────────┘  └────────────┘
        ▲ cold        ▲ warm          ▲ warm
   (has Init Duration) (no Init)     (no Init)
                                          │ no more requests, a while later
                                          ▼
                                   ┌── Shutdown ──┐
                                   └──────────────┘
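The cold/warm distinction in the diagram can also be observed from inside the function. A minimal sketch, assuming a module-level flag is acceptable for this purpose; isColdStart, invocationCount, and recordInvocation are hypothetical names, not part of any Lambda API.

```typescript
// Module scope: evaluated once per execution environment, during the Init phase.
let isColdStart = true;
let invocationCount = 0;

interface InvokeInfo {
  coldStart: boolean;
  invocation: number;
}

// Records one invocation and reports whether it was the first in this environment.
function recordInvocation(): InvokeInfo {
  invocationCount++;
  const info: InvokeInfo = { coldStart: isColdStart, invocation: invocationCount };
  isColdStart = false; // every later invoke in this environment is warm
  return info;
}

export const handler = async (): Promise<InvokeInfo> => {
  const info = recordInvocation();
  console.log(JSON.stringify(info));
  return info;
};
```

Because the flag lives at module level, it survives the freeze between invokes and resets only when Lambda builds a new environment.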

Cold start, seen in real numbers

Cold start is when a request lands at the moment Lambda has to set up a new environment: that request has to wait for the Init phase to finish before the handler runs. To see it clearly, we deploy a function whose static code deliberately does a bit of work, then call it three times in a row.

The static code (outside the handler) prints a log line and runs a small loop, while the handler just prints a line then returns:

// Static code: runs ONCE when the environment is initialized (Init phase),
// does NOT run again on warm invokes.
console.log("STATIC INIT: module is being loaded");
let acc = 0;
for (let i = 0; i < 8_000_000; i++) acc += i;
const loadedAt = new Date().toISOString();

export const handler = async () => {
  console.log("HANDLER: processing an invoke");
  return { ok: true, loadedAt, acc };
};

Call it three times with aws lambda invoke --log-type Tail (this flag returns the tail of the run's logs, including the REPORT line):

=== INVOKE #1 (cold, new environment) ===
STATIC INIT: module is being loaded
HANDLER: processing an invoke
REPORT Duration: 17.80 ms  Billed Duration: 205 ms  Memory Size: 128 MB  Max Memory Used: 82 MB  Init Duration: 187.16 ms

=== INVOKE #2 (warm, reused environment) ===
HANDLER: processing an invoke
REPORT Duration: 1.84 ms  Billed Duration: 2 ms  Memory Size: 128 MB  Max Memory Used: 82 MB

=== INVOKE #3 (warm again) ===
HANDLER: processing an invoke
REPORT Duration: 2.62 ms  Billed Duration: 3 ms  Memory Size: 128 MB  Max Memory Used: 82 MB

Three things to read here, all matching the theory above. First, the cold run prints the STATIC INIT line and the second and third runs don't: static code runs only once, when the environment is set up, and the environment is then reused. Second, the cold run's REPORT has Init Duration: 187.16 ms and the next two don't have that field at all: that is the cold start cost, the time spent setting up the environment. Third, the handler time itself differs sharply: 17.80 ms on the cold run versus 1.84 ms and 2.62 ms on the warm ones. Duration doesn't include the Init phase, so this gap is a separate first-invoke overhead that the warm runs no longer pay.

Note on the cold line: Billed Duration is 205 ms while Duration is only 17.80 ms. The difference is the Init Duration: 17.80 + 187.16 = 204.96 ms, rounded up to 205 ms. In other words, on this cold run the initialization time is billed too. That's one more reason to care about cold start beyond latency.
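The same check can be run against all three REPORT lines. A minimal sketch, assuming fields are separated by tabs or runs of spaces as in the excerpts above; parseReport and expectedBilledMs are hypothetical helpers, and the formula is inferred from these three runs rather than quoted from the docs.

```typescript
interface ReportFields {
  durationMs: number;
  billedMs: number;
  initMs: number | null; // null on warm invokes, where the field is absent
}

// Split a REPORT line into "Label: value unit" fields and keep the numeric values.
function parseReport(line: string): ReportFields {
  const fields: Record<string, number> = {};
  const parts = line.replace(/^REPORT\s+/, "").split(/\t+|\s{2,}/);
  for (const part of parts) {
    const m = /^(.+?): ([0-9.]+) (?:ms|MB)$/.exec(part.trim());
    if (m) fields[m[1]] = parseFloat(m[2]);
  }
  return {
    durationMs: fields["Duration"],
    billedMs: fields["Billed Duration"],
    initMs: "Init Duration" in fields ? fields["Init Duration"] : null,
  };
}

// Assumption drawn from the runs above: billed = Duration + Init, rounded up to 1 ms.
function expectedBilledMs(r: ReportFields): number {
  return Math.ceil(r.durationMs + (r.initMs === null ? 0 : r.initMs));
}

const cold = parseReport(
  "REPORT Duration: 17.80 ms  Billed Duration: 205 ms  Memory Size: 128 MB  Max Memory Used: 82 MB  Init Duration: 187.16 ms"
);
const warm = parseReport(
  "REPORT Duration: 1.84 ms  Billed Duration: 2 ms  Memory Size: 128 MB  Max Memory Used: 82 MB"
);
console.log(expectedBilledMs(cold), cold.billedMs); // 205 205
console.log(expectedBilledMs(warm), warm.billedMs); // 2 2
```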

Practical consequence: where to put the heavy work

Because static code runs once and is then reused across many warm invokes, this is the place to put expensive initialization shared by many requests: creating an AWS SDK client, opening a connection, reading configuration, loading a library. Putting them outside the handler means you pay the price once per environment, not once per request. The docs call this optimizing static initialization. We'll apply this principle from the API article onward: the DynamoDB client is created at module level, not inside the handler.
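The pattern looks roughly like this. It is a sketch under stated assumptions: createExpensiveClient and FakeClient are hypothetical stand-ins for a real SDK client, and the counter exists only to make the "once per environment" effect visible.

```typescript
// Hypothetical stand-in for an expensive client (an SDK client, a DB connection).
interface FakeClient {
  get(key: string): string;
}

let clientsCreated = 0;

function createExpensiveClient(): FakeClient {
  clientsCreated++; // count constructions so the effect is visible
  return { get: (key: string) => `value-for-${key}` };
}

// Module scope: constructed once per environment, during the Init phase.
const client = createExpensiveClient();

// Per-request work only; the client above is reused on every warm invoke.
function handleRequest(event: { key: string }): string {
  return client.get(event.key);
}

export const handler = async (event: { key: string }): Promise<string> => {
  return handleRequest(event);
};

handleRequest({ key: "a" });
handleRequest({ key: "b" });
handleRequest({ key: "c" });
console.log(clientsCreated); // 1
```

Moving the createExpensiveClient() call inside handleRequest would make the counter grow with every request, which is the cost this section is about avoiding.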

Conversely, don't put things that depend on a specific request at the static level, because they'll get "stuck" from the first cold run and shared across every later warm request, producing hard-to-find bugs.
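A small illustration of that failure mode, using hypothetical names (GreetEvent, leakyGreeting, safeGreeting) and a made-up event shape:

```typescript
interface GreetEvent {
  userId: string;
}

// BUG: a request-specific value cached at module scope. It is set by whichever
// request arrives first and is then reused by every later warm invoke.
let cachedUserId: string | undefined;

function leakyGreeting(event: GreetEvent): string {
  if (cachedUserId === undefined) cachedUserId = event.userId;
  return `hello ${cachedUserId}`;
}

// Fix: derive request-specific values inside the handler, on every invocation.
function safeGreeting(event: GreetEvent): string {
  return `hello ${event.userId}`;
}

console.log(leakyGreeting({ userId: "alice" })); // hello alice
console.log(leakyGreeting({ userId: "bob" }));   // hello alice (stale value from the first request)
console.log(safeGreeting({ userId: "bob" }));    // hello bob
```

The bug stays invisible while every test hits a fresh environment, and shows up only once warm environments start serving different users.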

Memory and CPU: one lever, not two

Lambda's memory configuration isn't just memory. The docs say it plainly: "Lambda allocates CPU power in proportion to the amount of memory configured... At 1,769 MB, a function has the equivalent of one vCPU." Raising memory raises CPU proportionally too. The default 128 MB is the lowest level, and the docs recommend using 128 MB only for simple functions like event forwarders.

To see this in numbers, we put a CPU-heavy computation in the handler (50 million square roots), then measure at two memory levels. At 128 MB:

CPU work done in 2594 ms
REPORT Duration: 2655.47 ms  Billed Duration: 2799 ms  Memory Size: 128 MB  Max Memory Used: 82 MB

Raise the function to 1769 MB (the level equivalent to one full vCPU) then run the exact same computation:

CPU work done in 88 ms
REPORT Duration: 89.70 ms  Billed Duration: 90 ms  Memory Size: 1769 MB  Max Memory Used: 84 MB

Same work, from 2594 ms down to 88 ms, about 29 times faster. Notably, Max Memory Used on both runs is around 82–84 MB; this function isn't short on memory, it's short on CPU. What we buy by raising memory here is entirely compute power.
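The handler behind these numbers isn't shown in the article. A minimal sketch of what such a CPU-bound handler might look like, assuming a plain loop over Math.sqrt and Date.now() for timing; cpuWork and the log wording are illustrative, not the measured code.

```typescript
// CPU-bound work: sum `iterations` square roots and time the loop.
function cpuWork(iterations: number): { elapsedMs: number; sum: number } {
  const start = Date.now();
  let sum = 0;
  for (let i = 0; i < iterations; i++) sum += Math.sqrt(i);
  return { elapsedMs: Date.now() - start, sum };
}

export const handler = async () => {
  // 50 million square roots, the workload size named in the text above.
  const { elapsedMs, sum } = cpuWork(50_000_000);
  console.log(`CPU work done in ${elapsedMs} ms`);
  // Returning the sum keeps the loop from being optimized away as dead code.
  return { elapsedMs, sum };
};
```

The loop allocates almost nothing, which fits the flat Max Memory Used in both runs: only the CPU share changes between the two memory settings.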

The counterintuitive part is cost. Lambda bills by memory times time, so do the multiplication:

128 MB:  128 × 2799 ms = 358,272 (MB·ms)
1769 MB: 1769 × 90 ms  = 159,210 (MB·ms)

The 1769 MB level isn't just 29 times faster but also about 2.25 times cheaper for the same work. For a CPU-bound function, keeping memory low is both slower and more expensive. This is why article 15 will use a measuring tool (AWS Lambda Power Tuning, which the docs introduce) to find the optimal memory level instead of guessing. For a function that only forwards events, like most functions in this series, 128 MB is reasonable; for a compute-heavy function, that level is a cost trap.
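The comparison is easy to reproduce for any pair of REPORT lines. A small sketch using the two billed durations measured above; Run, costUnits, and toGbSeconds are hypothetical helper names, and the per-request charge is left out because it is identical for both runs.

```typescript
interface Run {
  memoryMb: number;
  billedMs: number;
}

// Compute charge is proportional to configured memory × billed time.
function costUnits(run: Run): number {
  return run.memoryMb * run.billedMs; // MB·ms
}

// The same quantity in GB-seconds (1 GB = 1024 MB, 1 s = 1000 ms).
function toGbSeconds(run: Run): number {
  return costUnits(run) / 1024 / 1000;
}

const lowMemory: Run = { memoryMb: 128, billedMs: 2799 };
const highMemory: Run = { memoryMb: 1769, billedMs: 90 };

const costRatio = costUnits(lowMemory) / costUnits(highMemory);
console.log(costRatio.toFixed(2)); // 2.25
console.log(toGbSeconds(lowMemory) > toGbSeconds(highMemory)); // true
```

Multiplying the GB-seconds by the per-GB-second price for your region and architecture gives the actual charge; the ratio between the two runs stays the same.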

arm64: why we choose Graviton

The template in the previous article had the line Architectures: [arm64]. Lambda lets you run on AWS's Graviton (arm64) CPU alongside x86. For most Node workloads, arm64 is cheaper per unit of time and usually as fast as or faster than x86. Our code is TypeScript compiled to JavaScript, and the Node runtime is available for both architectures, so arm64 is close to a free default: nothing in the code has to change. When you do need x86, it's usually because of a native library only built for x86, which we won't run into in this series.

🧹 Cleanup

The function used for measuring in this article sits in its own stack (coldstart-demo) and was deleted right after the measurements:

$ sam delete --stack-name coldstart-demo --no-prompts --region ap-southeast-1
Deleted successfully

No resources from this article were kept.

Wrap-up

Lambda runs code in an execution environment with three phases. Static code runs once in the Init phase and is reused across warm invokes, so cold start is the price paid for setting up a new environment, measurable via Init Duration. Memory and CPU are one shared lever: for heavy work, raising memory can be both faster and cheaper. And arm64 is a reasonable default choice for Node code.

The foundation part ends here. The next article steps into the product core: building a real API with API Gateway. We'll compare HTTP API with REST API and choose the right one, define routes for creating and resolving links, then handle CORS properly so the dashboard article later doesn't break.