Security: IAM Least-Privilege, Throttling, and WAF

Kai

The product works and is observable, but many functions still hold permissions wider than what they do: Article 06 gave each function the whole DynamoDB read-write set for convenience. Before we open the product to real users, that needs tightening. This article shrinks permissions to exactly what each function needs, sets throttling against abuse, and discusses where to store secrets and how to attach WAF.

Goal

Shrink each function's IAM to exactly the actions it needs (least-privilege), set throttling at API Gateway, and discuss Parameter Store for secrets and WAF for the HTTP API. We verify that tightening permissions doesn't break functionality, and watch the load-shedding behavior when the API is flooded. No notable cost is incurred.

Least-privilege: grant the exact actions each function needs

The DynamoDBCrudPolicy we used for convenience grants a whole set of actions: read, write, delete, query, scan, batch. But each function only does a small part of that. The create-link function only needs PutItem. The open-link function only needs GetItem. The list function only needs Query. The delete function only needs DeleteItem. The least-privilege principle says: grant exactly those actions, on exactly those resources, no more. If a function is compromised, the attacker gets only that function's narrow slice of permissions.

Replace the broad policy with a Statement that lists specific actions. The create-link function:

Policies:
  - Statement:
      - Effect: Allow
        Action: dynamodb:PutItem
        Resource: !GetAtt Table.Arn

The open-link function only reads:

Policies:
  - Statement:
      - { Effect: Allow, Action: dynamodb:GetItem, Resource: !GetAtt Table.Arn }
  - EventBridgePutEventsPolicy: { EventBusName: !Ref EventBus }

The list function only queries, and its policy must include the index ARN too, because IAM treats the GSI as a separate resource with its own ARN:

Policies:
  - Statement:
      - Effect: Allow
        Action: dynamodb:Query
        Resource:
          - !GetAtt Table.Arn
          - !Sub "${Table.Arn}/index/GSI1"

The delete function only needs DeleteItem. The aggregator is more complex because it updates counters, writes markers, queries connections, and reads, so it needs five actions (UpdateItem, PutItem, GetItem, DeleteItem, Query) on the table and index. That is still a finite, explicit list rather than the whole set. The two WebSocket functions each need a single action: connect writes (PutItem), disconnect deletes (DeleteItem). And the moderation state machine only needs UpdateItem.
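The two policies described in prose above can be written in the same style as the earlier ones. This is a minimal sketch: it assumes the aggregator uses the same GSI1 index as the list function, and the two fragments belong to two different functions, so they are separated here with a YAML document marker.

```yaml
# Delete function: a single action on the table.
Policies:
  - Statement:
      - Effect: Allow
        Action: dynamodb:DeleteItem
        Resource: !GetAtt Table.Arn
---
# Aggregator: the five named actions, on the table and the index.
# Assumption: the index it queries is GSI1, as in the list function.
Policies:
  - Statement:
      - Effect: Allow
        Action:
          - dynamodb:UpdateItem
          - dynamodb:PutItem
          - dynamodb:GetItem
          - dynamodb:DeleteItem
          - dynamodb:Query
        Resource:
          - !GetAtt Table.Arn
          - !Sub "${Table.Arn}/index/GSI1"
```

Listing the actions one per line keeps the policy easy to review: anything not on the list, such as Scan or BatchWriteItem, is denied by default.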

Verify: tightening permissions must not break the work

The trap of least-privilege is tightening too hard and breaking functionality. So after shrinking permissions and deploying, run the same operations again to confirm they still work:

=== FUNCTIONAL with IAM tightened ===
  create  -> code=86Fgqtq
  resolve -> 301
  list    -> {"links":[{"code":"86Fgqtq","target":"https://aws.amazon.com...
  delete  -> {"deleted":"86Fgqtq"}

All four operations run correctly with the narrowed permissions, meaning each function can still do its job with exactly its minimal slice. This is the kind of change worth verifying with a real test, because a missing permission only surfaces at runtime — the template won't flag it.

Throttling: against flooding

A public API needs a ceiling so one client (or one attacker) can't flood hard enough to take it down or run up the bill. The HTTP API sets throttling right at the API layer via route settings:

HttpApi:
  Type: AWS::Serverless::HttpApi
  Properties:
    DefaultRouteSettings:
      ThrottlingBurstLimit: 2
      ThrottlingRateLimit: 5

Checking the stage shows the throttle is applied:

$ aws apigatewayv2 get-stage --api-id "$APIID" --stage-name '$default' \
    --query 'DefaultRouteSettings'
{ "ThrottlingBurstLimit": 2, "ThrottlingRateLimit": 5.0 }

One point about real behavior is worth stating plainly. API Gateway applies throttling on a best-effort basis, so with very small limits and bursty traffic it doesn't cut off at an exact request count. When 30 nearly simultaneous requests are fired at the open-link function, the observed result is:

status distribution:  10 301   20 503

Ten requests pass, twenty are rejected with 503. Notably, this 503 comes from a different protection layer: the account's Lambda concurrency limit is 10 (seen in Article 06), so only ten environments run at once and the rest are shed. That is, the system has two layers against flooding: API Gateway throttling (returns 429 when the rate is exceeded) and the Lambda concurrency limit (returns 503 when the number of environments is exceeded). On this low-concurrency account, the 503 layer hits first. The operationally important thing is that excess load is shed rather than dragging the backend down; to see a pure 429 from API Gateway, you'd raise the concurrency quota and exceed the rate more decisively.

Secrets: don't keep them in code or plain environment variables

The URL shortener has no secret keys yet, but when one is added (say, the API key for a real link-scanning service from Article 12), don't embed it in code or put it directly in an environment variable as plaintext. The two right places to store it are AWS Systems Manager Parameter Store (SecureString type, encrypted with KMS) for configuration and simple secrets, and AWS Secrets Manager for secrets that need automatic rotation. The function reads the value at init time (at the module level, per the principle from Article 02), and IAM only grants read access to that specific parameter. This keeps secrets out of source code, out of logs, and away from people who only have permission to view the function configuration.
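As a minimal sketch of the IAM side, assume a hypothetical SecureString parameter named /url-shortener/scanner-api-key and a hypothetical environment variable SCANNER_KEY_PARAM. The function receives only the parameter's name and is granted read access to that one parameter. If the parameter were encrypted with a customer-managed KMS key rather than the default aws/ssm key, the function would likely also need kms:Decrypt on that key.

```yaml
# Fragment of a function's Properties. The parameter name and the variable
# name are hypothetical; the secret value never appears in the template.
Environment:
  Variables:
    SCANNER_KEY_PARAM: /url-shortener/scanner-api-key   # name only, not the value
Policies:
  - Statement:
      - Effect: Allow
        Action: ssm:GetParameter
        # Scoped to this one parameter, not to parameter/*.
        Resource: !Sub "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/url-shortener/scanner-api-key"
```

Passing the name rather than the value means someone who can only view the function configuration sees where the secret lives, not the secret itself.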

WAF: an HTTP API has to go through CloudFront

A Web Application Firewall filters common attack patterns (SQL injection, scanning, bad IPs) before requests reach the application. But as noted in Article 03, AWS WAF cannot be attached directly to an HTTP API; of the entry points relevant here, it attaches to a REST API or a CloudFront distribution. The standard approach for an HTTP API is to put a CloudFront distribution in front, attach the WAF web ACL to CloudFront, then have CloudFront forward to the HTTP API.

   client ─▶ CloudFront (attach WAF web ACL) ─▶ HTTP API ─▶ Lambda
                 │ block SQLi, rate-based rule, IP block...

This series doesn't build CloudFront (a distribution is heavy and slow to deploy, and we're keeping the infrastructure lean for acceptance), but this is the point you must know when taking an HTTP API to production with a WAF requirement: the architecture must have CloudFront in the middle, not a direct attachment. If you must attach WAF directly to API Gateway, that's a reason to choose REST API from the start instead of HTTP API.
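For readers who do need this architecture, the following is an untested sketch of what it could look like in the template, not something this series deploys. The logical IDs EdgeWebAcl and EdgeDistribution and the rate limit are hypothetical. The two policy IDs are CloudFront's managed CachingDisabled and AllViewerExceptHostHeader policies and should be checked against the current CloudFront documentation. A web ACL with Scope: CLOUDFRONT has to be created in us-east-1, so a stack in another region would need it in a separate stack.

```yaml
# Sketch only: client -> CloudFront (WAF web ACL) -> HTTP API -> Lambda.
Resources:
  EdgeWebAcl:
    Type: AWS::WAFv2::WebACL
    Properties:
      Scope: CLOUDFRONT            # must be deployed in us-east-1
      DefaultAction:
        Allow: {}
      VisibilityConfig:
        SampledRequestsEnabled: true
        CloudWatchMetricsEnabled: true
        MetricName: EdgeWebAcl
      Rules:
        - Name: RateLimitPerIp
          Priority: 0
          Action:
            Block: {}
          Statement:
            RateBasedStatement:
              Limit: 1000          # assumed value, counted per source IP
              AggregateKeyType: IP
          VisibilityConfig:
            SampledRequestsEnabled: true
            CloudWatchMetricsEnabled: true
            MetricName: RateLimitPerIp
        - Name: SqlInjection
          Priority: 1
          OverrideAction:
            None: {}               # keep the managed rule group's own actions
          Statement:
            ManagedRuleGroupStatement:
              VendorName: AWS
              Name: AWSManagedRulesSQLiRuleSet
          VisibilityConfig:
            SampledRequestsEnabled: true
            CloudWatchMetricsEnabled: true
            MetricName: SqlInjection

  EdgeDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        WebACLId: !GetAtt EdgeWebAcl.Arn
        Origins:
          - Id: HttpApiOrigin
            DomainName: !Sub "${HttpApi}.execute-api.${AWS::Region}.amazonaws.com"
            CustomOriginConfig:
              OriginProtocolPolicy: https-only
        DefaultCacheBehavior:
          TargetOriginId: HttpApiOrigin
          ViewerProtocolPolicy: redirect-to-https
          AllowedMethods: [GET, HEAD, OPTIONS, PUT, PATCH, POST, DELETE]
          CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad          # CachingDisabled
          OriginRequestPolicyId: b689b0a8-53d0-40ab-baf2-68738e2966ac  # AllViewerExceptHostHeader
```

Caching is disabled here so that CloudFront acts purely as the attachment point for WAF and every request still reaches the HTTP API.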

🧹 Cleanup

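The verification operations left a demo link and a test user, which the commands below delete; throttling and IAM are stack configuration, so they stay. Note that the loop deletes every item the scan returns, so it empties the table: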

aws dynamodb scan --table-name url-shortener --query 'Items[].{PK:PK.S,SK:SK.S}' --output text | \
  while read pk sk; do aws dynamodb delete-item --table-name url-shortener \
    --key "{\"PK\":{\"S\":\"$pk\"},\"SK\":{\"S\":\"$sk\"}}"; done
aws cognito-idp admin-delete-user --user-pool-id "$POOL" --username sec@example.com

Wrap-up

The product is much tighter now. Each function holds exactly the IAM actions it needs, and the test confirmed that tightening permissions didn't break functionality. API Gateway has throttling against flooding, and together with the Lambda concurrency limit, excess load is shed instead of taking the backend down. Secrets belong in Parameter Store or Secrets Manager, not in code. And WAF for an HTTP API has to go through CloudFront, an architectural constraint to know in advance.

Part V closes here: the product is observable, optimizable, and security-tightened. Part VI is lifecycle operations: the next article builds CI/CD so each change goes through automated, controlled build and deploy, with canary and rollback, replacing the manual sam deploy we've typed all series.