Observability: Lambda Powertools and X-Ray Tracing
The product works end to end, but when something breaks in production, "it works" isn't enough. You need to see where a request went, where it was slow, and what its logs say. Part V of the series is about operating like the real thing, and the first brick is observability. This article wires in tooling so every request leaves a readable trail.
Goal
Wire Lambda Powertools into the handler for structured JSON logs, X-Ray tracing, and custom metrics; enable X-Ray on the functions; then read a real trace to see the request's path through DynamoDB and EventBridge. X-Ray and CloudWatch costs for a small test volume are negligible.
Why ordinary logs aren't enough
A line like console.log("resolved " + code) is readable to the eye but hard for a machine to filter. With thousands of lines, you want to filter by link code, by request id, or by whether it was a cold start, and to link a log line to the trace of that exact request. That needs structured logs: each line a JSON object with consistent fields. Lambda Powertools is AWS's official library suite, and it covers exactly these three observability jobs: Logger (structured logs), Tracer (X-Ray), and Metrics (custom metrics emitted as EMF).
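To make the contrast concrete, here is a minimal sketch in plain TypeScript with no Powertools involved. The helper name structuredLine and the second link code are illustrative assumptions, not part of the library or the project. The only point is that a JSON line can be filtered by field, while a concatenated string has to be pattern-matched.

```typescript
// Hypothetical helper for illustration only; Powertools Logger does this (and more) for you.
type LogFields = Record<string, string | number | boolean>;

function structuredLine(level: string, message: string, fields: LogFields): string {
  // One JSON object per line, always with the same top-level keys.
  return JSON.stringify({ level, message, ...fields });
}

// Free-form: the link code is buried inside the message text.
const freeForm = "resolved " + "oi90ICl";

// Structured: the link code and the cold-start flag are their own fields.
// "zz11abc" is a made-up second code.
const lines = [
  structuredLine("INFO", "link resolved", { code: "oi90ICl", cold_start: true }),
  structuredLine("INFO", "link resolved", { code: "zz11abc", cold_start: false }),
];

// Filtering becomes a field comparison instead of a pattern match over prose.
const coldStarts = lines
  .map((line) => JSON.parse(line) as LogFields)
  .filter((entry) => entry.cold_start === true);

console.log(freeForm);
console.log(coldStarts.map((entry) => entry.code));
```

The same idea is what log query tools rely on: once every line has a cold_start or code field, a filter is a comparison on that field.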
Wiring in Powertools
Create three instances in one file and share them across every handler:
import { Logger } from "@aws-lambda-powertools/logger";
import { Tracer } from "@aws-lambda-powertools/tracer";
import { Metrics } from "@aws-lambda-powertools/metrics";
export const logger = new Logger({ serviceName: "url-shortener" });
export const tracer = new Tracer({ serviceName: "url-shortener" });
export const metrics = new Metrics({ namespace: "UrlShortener", serviceName: "url-shortener" });
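To make AWS SDK calls show up as subsegments in the trace, wrap the client with tracer.captureAWSv3Client. Since the DynamoDB client lives in a shared file, you only need to wrap it once; the EventBridge client needs the same wrapping for its PutEvents call to appear in the trace:
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { tracer } from "./powertools.js";
// Wrap once; every command sent through this client becomes an X-Ray subsegment.
const client = tracer.captureAWSv3Client(new DynamoDBClient({}));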
The resolve handler uses middy to compose the three Powertools middleware, the recommended standard approach:
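import middy from "@middy/core";
import { MetricUnit } from "@aws-lambda-powertools/metrics";
// Powertools v2 import paths; in v1 the middleware is exported from each package root.
import { captureLambdaHandler } from "@aws-lambda-powertools/tracer/middleware";
import { injectLambdaContext } from "@aws-lambda-powertools/logger/middleware";
import { logMetrics } from "@aws-lambda-powertools/metrics/middleware";
import { logger, tracer, metrics } from "./powertools.js"; // path assumed; adjust to the shared file

const lambdaHandler = async (event) => {
  // ...GetItem, PutEvents as before...
  logger.info("link resolved", { code });
  metrics.addMetric("LinkResolved", MetricUnit.Count, 1);
  return { statusCode: 301, headers: { location: target }, body: "" };
};

export const handler = middy(lambdaHandler)
  .use(captureLambdaHandler(tracer)) // create a subsegment for the handler in X-Ray
  .use(injectLambdaContext(logger)) // attach request id, cold_start... to every log
  .use(logMetrics(metrics)); // push metrics when the handler finishes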
Enabling X-Ray on every function is just one line in the template:
Globals:
  Function:
    Tracing: Active
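When Tracing: Active is enabled and SAM generates the function's execution role, SAM attaches the X-Ray write permission for you, so no hand-written IAM is needed. If you supply your own Role, add that permission yourself.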
Structured logs, read for real
After deploying and opening a link a few times, the resolver's log line is no longer free-form text but a JSON object full of context:
{
  "level": "INFO",
  "message": "link resolved",
  "timestamp": "2026-05-25T17:27:37.027Z",
  "service": "url-shortener",
  "cold_start": false,
  "function_name": "url-shortener-ResolveLinkFunction-rc7lUf4kG5u5",
  "function_request_id": "80bd3dd5-692f-487b-b8ba-2301c997e8d3",
  "function_memory_size": "128",
  "xray_trace_id": "1-6a148688-3996dca03cc76b0d1cfb676b",
  "code": "oi90ICl"
}
Each field here is something filterable. cold_start tells you whether this request landed in a new environment. function_request_id ties together every log from the same run. And xray_trace_id links this log line to the corresponding trace in X-Ray, so from an error log you can jump straight to seeing where that request went. The code field is one we added ourselves, to filter per link.
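As a rough illustration of jumping from log lines to traces, the sketch below scans raw log lines for one link code and collects the matching xray_trace_id values. The function names parseLine and traceIdsForCode are hypothetical, the field names follow the log line above, and the second sample entry is made up. In practice a log query tool would likely do this filtering for you.

```typescript
// Subset of the fields from the resolver's log line that this sketch relies on.
interface ResolverLog {
  level: string;
  message: string;
  function_request_id: string;
  xray_trace_id: string;
  code?: string;
}

// Parse one line, returning null for anything that is not a JSON object
// (for example a plain-text START/END/REPORT line).
function parseLine(line: string): ResolverLog | null {
  try {
    const parsed: unknown = JSON.parse(line);
    return typeof parsed === "object" && parsed !== null ? (parsed as ResolverLog) : null;
  } catch {
    return null;
  }
}

// Collect the distinct trace ids of every log line about one link code.
function traceIdsForCode(lines: string[], code: string): string[] {
  const ids = new Set<string>();
  for (const line of lines) {
    const entry = parseLine(line);
    if (entry !== null && entry.code === code) ids.add(entry.xray_trace_id);
  }
  return Array.from(ids);
}

const sample = [
  "START RequestId: 80bd3dd5-692f-487b-b8ba-2301c997e8d3",
  JSON.stringify({
    level: "INFO",
    message: "link resolved",
    function_request_id: "80bd3dd5-692f-487b-b8ba-2301c997e8d3",
    xray_trace_id: "1-6a148688-3996dca03cc76b0d1cfb676b",
    code: "oi90ICl",
  }),
  // Made-up entry for a different link, to show it is filtered out.
  JSON.stringify({
    level: "INFO",
    message: "link resolved",
    function_request_id: "00000000-0000-0000-0000-000000000000",
    xray_trace_id: "1-00000000-000000000000000000000000",
    code: "other",
  }),
];

const found = traceIdsForCode(sample, "oi90ICl");
console.log(found);
```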
Trace: where a request goes
X-Ray records the path a request takes through the services. Take a resolve trace and print the segment tree to show the structure:
- url-shortener-ResolveLinkFunction (function segment)
  - Overhead
  - ## resolve-link.handler (subsegment created by captureLambdaHandler)
    - Events (EventBridge PutEvents call)
    - DynamoDB (DynamoDB GetItem call)
- DynamoDB (downstream service node)
- Events (downstream service node)
This is the service map as a tree. The resolve function calls two services, DynamoDB to look up the link and EventBridge to emit the event, and both show up as separate subsegments thanks to captureAWSv3Client. The ## resolve-link.handler subsegment is the handler body, created by the tracer middleware. The two outermost nodes (DynamoDB, Events) are downstream services that X-Ray draws as nodes in the service map on the console.
The real value is that each subsegment has its own duration. When a request is slow, the trace shows where most of the time goes: reading DynamoDB, emitting the event, or your own code. Without a trace, "the API is slow" is just a guess; with a trace, you know exactly which step to fix.
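To show how per-subsegment durations answer "where did the time go", here is a small sketch that walks a trace tree and computes each node's total and self time. The start_time, end_time, and subsegments field names follow the X-Ray segment document format (times in epoch seconds); the timings themselves are invented for the example, and the self-time calculation assumes child subsegments run one after another rather than in parallel.

```typescript
// Subset of the X-Ray segment document fields used here.
interface TraceNode {
  name: string;
  start_time: number; // epoch seconds
  end_time: number; // epoch seconds
  subsegments?: TraceNode[];
}

interface Timing {
  name: string;
  totalMs: number; // wall-clock duration of the node
  selfMs: number; // duration minus time spent in direct children
}

// Depth-first walk; assumes children do not overlap in time.
function collectTimings(node: TraceNode, out: Timing[] = []): Timing[] {
  const children = node.subsegments ?? [];
  const totalMs = (node.end_time - node.start_time) * 1000;
  let childMs = 0;
  for (const child of children) {
    childMs += (child.end_time - child.start_time) * 1000;
    collectTimings(child, out);
  }
  out.push({
    name: node.name,
    totalMs: Math.round(totalMs),
    selfMs: Math.round(totalMs - childMs),
  });
  return out;
}

// Invented timings: a 60 ms handler with a 20 ms GetItem and a 30 ms PutEvents.
const handlerNode: TraceNode = {
  name: "## resolve-link.handler",
  start_time: 100.0,
  end_time: 100.06,
  subsegments: [
    { name: "DynamoDB", start_time: 100.005, end_time: 100.025 },
    { name: "Events", start_time: 100.027, end_time: 100.057 },
  ],
};

const all = collectTimings(handlerNode);
const slowest = all.slice().sort((a, b) => b.selfMs - a.selfMs)[0];
console.log(all);
console.log("most self time:", slowest.name);
```

With these made-up numbers the event emission dominates, which is the kind of conclusion a real trace lets you draw without guessing.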
Custom metrics
metrics.addMetric("LinkResolved", MetricUnit.Count, 1) records a metric on every successful resolve. The logMetrics middleware flushes it when the handler finishes, in EMF (Embedded Metric Format), which CloudWatch extracts automatically from the logs. From there CloudWatch can chart resolve counts over time, and the next article will set alarms on metrics like this. Unlike calling the CloudWatch API yourself, EMF is just logging in the right format, so it adds no extra network call to the request.
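For a sense of what "logging in the right format" means, the sketch below hand-builds an EMF document for the LinkResolved metric. The emfLine function is a hypothetical stand-in; in the article's code Powertools Metrics produces the line, and its actual output may carry additional fields. The layout (an _aws block describing the namespace, dimensions, and metric names, with the values as top-level keys) follows the Embedded Metric Format specification, and the service dimension is an assumption based on the serviceName setting.

```typescript
// Hand-built EMF document for illustration; not the Powertools implementation.
function emfLine(
  namespace: string,
  service: string,
  metricName: string,
  value: number,
  timestampMs: number,
): string {
  return JSON.stringify({
    _aws: {
      Timestamp: timestampMs, // epoch milliseconds
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [["service"]], // names of top-level keys used as dimensions
          Metrics: [{ Name: metricName, Unit: "Count" }],
        },
      ],
    },
    service, // dimension value
    [metricName]: value, // metric value
  });
}

const emf = emfLine(
  "UrlShortener",
  "url-shortener",
  "LinkResolved",
  1,
  Date.UTC(2026, 4, 25, 17, 27, 37), // month index 4 is May
);

// Writing the line to stdout is all the function has to do.
console.log(emf);
```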
🧹 Cleanup
aws dynamodb scan --table-name url-shortener --query 'Items[].{PK:PK.S,SK:SK.S}' --output text | \
  while read -r pk sk; do aws dynamodb delete-item --table-name url-shortener \
    --key "{\"PK\":{\"S\":\"$pk\"},\"SK\":{\"S\":\"$sk\"}}"; done
aws cognito-idp admin-delete-user --user-pool-id "$POOL" --username obs@example.com
Keep the stack for the next article.
Wrap-up
The system is now observable. Powertools Logger gives structured JSON logs carrying request id, cold start, and trace id; Tracer builds an X-Ray trace with each service call as a subsegment with its own duration; Metrics emits numbers via EMF without an extra API call on the request path. From a log line you can jump to a trace, and from a trace you can see where a request is slow.
With logs, traces, and metrics in place, the next step is to use them to know when the system has a problem without sitting and watching. The next article builds a dashboard and alarms on CloudWatch, defines a few SLOs for the API, and sets alerts so that when the error rate or latency crosses a threshold, someone gets paged, instead of waiting for users to complain.