Load Testing With k6: Finding the Bottleneck Under Real Load

The cost in the previous article was low partly because we hadn't thrown real load at the system. This article does exactly that: use k6 to push a large volume of requests at the link-open path, then read both the client side (k6) and the server side (CloudWatch) to see how the system scales and where the bottleneck is. The result puts hard numbers on a limit that has surfaced again and again throughout the series.

Goal

Use k6 to run a load test that ramps up against the link-open path, read the results (request count, latency, error rate), then cross-check against CloudWatch metrics to identify the bottleneck and how the system handles excess load. The test fires many requests but they're all short and cheap; cost is negligible.

k6 and the scenario

k6 is a load-testing tool whose scenarios are written in JavaScript. The scenario below ramps from 0 to 40 concurrent virtual users (VUs) over 15 seconds, holds at 40 for 30 seconds, then drops back to 0 over 5 seconds. Each VU repeatedly calls GET /{code} without following redirects, so we measure the API's own response rather than the real destination:

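import http from 'k6/http';
import { check } from 'k6';
import { Counter } from 'k6/metrics';
// Custom counters: the end-of-test summary shows how many responses were 301 and 503
const status301 = new Counter('status_301');
const status503 = new Counter('status_503');
export const options = {
  stages: [
    { duration: '15s', target: 40 }, // ramp up to 40 VUs
    { duration: '30s', target: 40 }, // hold at 40 VUs
    { duration: '5s',  target: 0 },  // ramp down to 0
  ],
};
export default function () {
  // redirects: 0 records the API's own 301 instead of following it to the destination
  const res = http.get(__ENV.TARGET, { redirects: 0 });
  if (res.status === 301) status301.add(1);
  else if (res.status === 503) status503.add(1);
  // A single check: two checks for 301 and 503 would each "fail" on the other's responses
  check(res, { 'status is 301': (r) => r.status === 301 });
}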

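The link-open path is public, so load testing it needs no token. The default function has no sleep(), so each VU fires its next request as soon as the previous one returns; 40 VUs doing this produce a very large volume of requests. The target URL comes in through an environment variable, for example k6 run -e TARGET="$API/$CODE" followed by the script file.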
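k6 stages are easy to misread as "run N VUs for this long"; each stage actually ramps linearly from the previous stage's target to its own. A small plain-JavaScript model (runs in Node, not inside k6) of the schedule above, assuming the run starts at 0 VUs as described; k6 itself works in whole VUs, so treat fractional values as approximate:

```javascript
// The stage list from the scenario above, with durations in seconds.
const stages = [
  { duration: 15, target: 40 },
  { duration: 30, target: 40 },
  { duration: 5, target: 0 },
];

// Target VU count at time t (seconds). Each stage moves linearly from the
// previous stage's target (0 for the first stage) to its own target.
function vusAt(stages, t) {
  let from = 0;
  let elapsed = 0;
  for (const s of stages) {
    if (t <= elapsed + s.duration) {
      const frac = (t - elapsed) / s.duration;
      return from + (s.target - from) * frac;
    }
    elapsed += s.duration;
    from = s.target;
  }
  return from; // past the last stage
}

for (const t of [0, 7.5, 15, 30, 45, 47.5, 50]) {
  console.log(`t=${t}s -> ${vusAt(stages, t)} VUs`);
}
```

The hold stage is just a ramp whose start and end are both 40, so the same rule covers all three stages.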

Results from the client side

The run lasts about 50 seconds. In that time k6 fired 27,741 requests at ~554 requests per second, with p95 latency of 146 ms and http_req_failed at 99.66%. Counting responses by status code with custom counters in the scenario:

Total requests: 27741  | 554 req/s | p95 146 ms | failed 99.66%
  301 (success)   : 304    (1.1%)
  503 (overload)  :  25    (0.1%)
  remainder       : 27412  (98.8%)
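The breakdown and the throughput can be cross-checked with a little arithmetic. The sketch below is plain JavaScript for Node (not a k6 script); it reuses the counts above and estimates the mean iteration time with the closed-loop relation requests ≈ VU-seconds / mean iteration time, assuming k6's linear ramps and no sleep() between requests. The result is an estimate, not a measured value:

```javascript
// Figures reported by k6 in this run.
const total = 27741;
const ok301 = 304;
const shed503 = 25;
const remainder = total - ok301 - shed503; // everything else

const pct = (n) => ((n / total) * 100).toFixed(1);

// VU-seconds under the stage profile (area of each linear stage):
// ramp 0->40 over 15 s, hold 40 for 30 s, ramp 40->0 over 5 s.
const vuSeconds = ((0 + 40) / 2) * 15 + 40 * 30 + ((40 + 0) / 2) * 5;

// Closed loop with no think time: each VU starts its next request as soon as
// the previous one returns, so mean iteration time ≈ VU-seconds / requests.
const meanIterationMs = (vuSeconds / total) * 1000;

console.log(`remainder ${remainder}: 301 ${pct(ok301)}%, 503 ${pct(shed503)}%, rest ${pct(remainder)}%`);
console.log(`mean iteration ≈ ${meanIterationMs.toFixed(1)} ms over ${vuSeconds} VU-seconds`);
```

This prints a remainder of 27412 with shares of 1.1%, 0.1%, and 98.8%, and a mean iteration of about 57.7 ms. That sits well under the 146 ms p95, which fits a run where most responses are near-instant rejections.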

What status code makes up that 98.8% "remainder"? A shorter curl burst that records each exact code shows:

$ seq 1 400 | xargs -P80 -I{} curl -s -o /dev/null -w "%{http_code}\n" "$API/$CODE" \
    | sort | uniq -c | sort -rn
 305 429
  76 503
  19 301

Most of it is 429, not 503. 429 is API Gateway's Too Many Requests: the throttle set in Article 16 (rate 5/sec, burst 2) sheds most requests right at the gateway. Of the few that get past the rate limit, some then hit the Lambda concurrency limit and receive 503; the remaining ~1% actually run and return 301. Overall latency stays low because 429 and 503 come back almost instantly, and the successful requests are fast too. The system doesn't slow down under load; it rejects most of the load.
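To see why almost everything becomes 429, here is an idealized token bucket with the Article 16 settings (rate 5/sec, burst 2) fed evenly spaced arrivals at ~554 req/s for 50 seconds. It's a simplified model, not API Gateway's actual implementation, and the evenly spaced arrivals are an assumption; real traffic from 40 VUs is burstier:

```javascript
// Idealized token bucket: holds at most `burst` tokens, refills continuously
// at `rate` tokens per second, and each admitted request spends one token.
function simulateTokenBucket({ rate, burst, arrivalsPerSec, seconds }) {
  const n = Math.round(arrivalsPerSec * seconds); // evenly spaced arrivals
  const gap = 1 / arrivalsPerSec;                 // seconds between arrivals
  let tokens = burst;                             // bucket starts full
  let allowed = 0;
  for (let i = 0; i < n; i++) {
    if (i > 0) tokens = Math.min(burst, tokens + rate * gap); // refill since last arrival
    if (tokens >= 1) {
      tokens -= 1;
      allowed++; // passed on toward Lambda
    }
    // otherwise the gateway answers 429 immediately
  }
  return { total: n, allowed, rejected: n - allowed };
}

// Article 16 settings against roughly the k6 load from this article.
const r = simulateTokenBucket({ rate: 5, burst: 2, arrivalsPerSec: 554, seconds: 50 });
console.log(r, `${((r.allowed / r.total) * 100).toFixed(1)}% allowed`);
```

Only about 5 × 50 + 2 ≈ 250 requests get through, under 1% of the load. k6 saw 304 + 25 = 329 requests past the gateway, the same order of magnitude; the real gateway won't match an idealized bucket exactly.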

Cross-checking the server side

Without the server side, a load test gives only half the picture. Asking CloudWatch about the resolve function over the test window:

Invocations (sum):           403
Throttles (sum):             242
ConcurrentExecutions (max):   10

This is where everything lines up. ConcurrentExecutions peaks at exactly 10, no more. That's the account's concurrency limit, the same number seen in Articles 06, 14, and 15. The function can run only ten copies at once, so even though k6 fires ~554 requests per second, only 403 invocations actually run over the whole test window (Invocations); the overwhelming majority is shed before that. The shedding happens at two layers: API Gateway rejects most requests right at the gateway with 429 (over the rate throttle), and those that slip through are rejected by Lambda when all ten slots are busy, which CloudWatch records as Throttles (242 times) and the client sees as 503.

   k6: ~554 req/s, 27,741 requests
        │
        ▼
   API Gateway (throttle 5 req/s, burst 2)
        ├── ~99% over rate ──▶ 429  (shed right at the gateway)
        └── slips through ──▶ Lambda (max 10 concurrent)
                          ├── no slots ──▶ 503
                          └── ~1% ──▶ 301   (ConcurrentExecutions max = 10)

The bottleneck is quota and configuration, not code

The most important lesson: the bottleneck isn't in the code, DynamoDB, or the design, but in two configuration/quota latches. The first is the API Gateway throttle set very low in Article 16 (5 req/sec), so at ~554 req/sec it sheds almost everything with 429. The second is the account's Lambda concurrency limit of 10 (the usual default is 1,000 per Region). The way to raise capacity isn't optimizing code but loosening those two latches: raise the rate throttle to match real demand, and request a concurrency quota increase through Service Quotas. After loosening them, the same test should show a much higher success rate.
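How much concurrency to request is simple arithmetic: AWS's rule of thumb for Lambda is concurrency ≈ average requests per second × average duration in seconds. A sketch, where the 50 ms average duration is a hypothetical placeholder (not measured here); substitute the resolve function's real average Duration from CloudWatch:

```javascript
// Slots needed to serve a given request rate at a given average duration.
function requiredConcurrency(requestsPerSec, avgDurationSec) {
  return Math.ceil(requestsPerSec * avgDurationSec);
}

// The inverse: the request rate a given number of slots can sustain.
function maxRequestsPerSec(concurrency, avgDurationSec) {
  return concurrency / avgDurationSec;
}

const avgDuration = 0.05; // HYPOTHETICAL 50 ms average invocation

console.log(requiredConcurrency(554, avgDuration)); // 28 slots for the whole k6 load
console.log(maxRequestsPerSec(10, avgDuration));    // 200 req/s ceiling with 10 slots
```

The two latches have to move together: a larger concurrency quota changes nothing while the gateway still admits only about 5 req/s.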

What's notable is how the system fails. It doesn't collapse, doesn't hang, doesn't leave requests waiting forever until they time out. Excess load is rejected fast with 429 and 503, while the part that gets served stays fast. This is graceful degradation: under load beyond capacity, the system serves as much as it can and rejects the rest decisively, instead of trying to hold everything and toppling over. The two blocking layers, the API Gateway throttle (Article 16) and the Lambda concurrency limit, act as fuses, exactly as intended.
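The key design choice is rejecting immediately instead of queueing. A conceptual sketch of a limiter that sheds load this way; the class and function names are hypothetical, and this illustrates the idea rather than how Lambda enforces its limit internally:

```javascript
// At most `limit` requests in flight; extras are rejected right away
// instead of waiting in a queue.
class ConcurrencyLimiter {
  constructor(limit) {
    this.limit = limit;
    this.inFlight = 0;
  }
  tryAcquire() {
    if (this.inFlight >= this.limit) return false; // full: shed now
    this.inFlight++;
    return true;
  }
  release() {
    if (this.inFlight > 0) this.inFlight--;
  }
}

// Runs `work` if a slot is free, otherwise answers 503 without waiting.
// The slot is freed whether the work succeeds or throws.
async function handle(limiter, work) {
  if (!limiter.tryAcquire()) return { status: 503 };
  try {
    return await work();
  } finally {
    limiter.release();
  }
}

// Demo: 12 simultaneous requests against 10 slots, each taking 20 ms.
const demo = new ConcurrencyLimiter(10);
const slowRedirect = () =>
  new Promise((resolve) => setTimeout(() => resolve({ status: 301 }), 20));
Promise.all(Array.from({ length: 12 }, () => handle(demo, slowRedirect))).then((results) => {
  console.log(results.map((res) => res.status).join(" ")); // ten 301s, then two 503s
});
```

The two extra requests get their 503 at once rather than after 20 ms. A queueing limiter would hold them until a slot frees; under sustained overload that queue, and the latency of everything in it, keeps growing.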

What to read after a load test

A good load test doesn't just give a single "how much can it take" number, it points at the next bottleneck. Here the two clear bottlenecks are the API Gateway rate throttle, then the Lambda concurrency. If you loosen both, the next bottleneck might move to DynamoDB (if one link is so hot it creates a hot partition), or to the event-emission rate itself. X-Ray traces (Article 13) and CloudWatch metrics (Article 14) are the tools to find that new bottleneck. The process repeats: throw load, find the bottleneck, loosen it, throw load again.

🧹 Cleanup

# Delete every item in the table: scan out each PK/SK pair, then delete it by key
aws dynamodb scan --table-name url-shortener --query 'Items[].{PK:PK.S,SK:SK.S}' --output text | \
  while read -r pk sk; do aws dynamodb delete-item --table-name url-shortener \
    --key "{\"PK\":{\"S\":\"$pk\"},\"SK\":{\"S\":\"$sk\"}}"; done
# Remove the load-test user from the Cognito user pool
aws cognito-idp admin-delete-user --user-pool-id "$POOL" --username lt@example.com

Wrap-up

The k6 load test shows the system handles the load that the rate throttle and concurrency limit allow, sheds the rest with 429 then 503, and keeps latency low for the part it serves. Cross-checking k6 against CloudWatch points straight at the bottleneck being the throttle configuration and concurrency quota rather than code, and shows the system degrades gracefully instead of collapsing. The fix is to loosen those two latches, then measure again to find the next bottleneck.

The series is almost at its end. The product is feature-complete, observable, secured, has CI/CD, knows its cost, and is measured under load. The last article is the capstone: review the whole thing against Well-Architected, leave the system running for you to verify, and discuss cleanup along with where to extend next.