Global Secondary Index and Sparse Index: Opening a New Query Path

The table from the previous article opens a link very fast because we know the code exactly. But the management page needs to answer a different question: given an ownerId, list every link that person created. With the current key, this can't be answered, because the partition key is LINK#<code> and we don't know a person's codes up front. This article adds a global secondary index to open exactly that path, and along the way meets a tidy technique: the sparse index.

Goal

Understand why some access patterns need a secondary index, add a global secondary index that inverts the key to query by user, and see that a GSI is sparse by default, meaning only items that have the full index key show up. Create a real GSI, write data and run a real query. We still use the on-demand table from the previous article, so the cost is negligible.

Why you can't just use the primary key

DynamoDB only queries efficiently by partition key (with additional filtering by sort key). To fetch links by ownerId, that attribute isn't part of the key, so the only way against the base table is a Scan, reading the whole table and then filtering. The documentation warns about exactly this: when you need to fetch data by a non-key attribute, "it would need to use a Scan operation. As more items are added to the table, scans of all the data would become slow and inefficient."

This is where a global secondary index solves the problem. The documentation describes it as "a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table", and crucially: "The index key does not need to have any of the key attributes from the table." In other words, a GSI is a different view of the same data, sorted by a key we choose, to open a query path the base table doesn't have.

Invert the key by user

We add an index named GSI1 with two key attributes given the neutral names GSI1PK and GSI1SK. On the link item (the item with SK = META), we additionally set:

  GSI1PK = USER#<ownerId>
  GSI1SK = LINK#<createdAt>

Now, seen from GSI1, all of user-001's links share GSI1PK = USER#user-001, and are sorted by GSI1SK, i.e. by creation time. Querying the index by GSI1PK returns exactly that person's list of links, sorted by date.

   Base table (PK/SK)                  Seen through GSI1 (GSI1PK/GSI1SK)
   ─────────────────────────          ──────────────────────────────────
   LINK#aK9xQ2z / META    ─────┐       USER#user-001 / LINK#2026-05-24...
   LINK#Zb7Kp1m / META    ─────┼────▶  USER#user-001 / LINK#2026-05-25...
   LINK#aK9xQ2z / STAT#... ────┘ (STAT has no GSI1PK -> not in index)

Declaring the GSI in SAM

Add the index's two key attributes to AttributeDefinitions, then declare GlobalSecondaryIndexes:

AttributeDefinitions:
  - { AttributeName: PK, AttributeType: S }
  - { AttributeName: SK, AttributeType: S }
  - { AttributeName: GSI1PK, AttributeType: S }
  - { AttributeName: GSI1SK, AttributeType: S }
KeySchema:
  - { AttributeName: PK, KeyType: HASH }
  - { AttributeName: SK, KeyType: RANGE }
GlobalSecondaryIndexes:
  - IndexName: GSI1
    KeySchema:
      - { AttributeName: GSI1PK, KeyType: HASH }
      - { AttributeName: GSI1SK, KeyType: RANGE }
    Projection:
      ProjectionType: ALL

ProjectionType: ALL puts every attribute of the base item into the index, so a GSI query returns full data without reading back to the table. The trade-off is extra storage and write cost for the index; for a small table that's negligible, but at larger data sizes consider KEYS_ONLY or INCLUDE to project only the attributes you need. When sam deploy applies this change to an existing table, DynamoDB builds the index in the background and takes a while to move from CREATING to ACTIVE before you can query it.

Sparse index: a GSI is sparse by default

Note in the diagram above: the STAT#... item has no GSI1PK. That's not an oversight but a deliberate choice. The documentation states the rule: "DynamoDB writes a corresponding index entry only if the index key attributes are present in the item... If either key attribute is missing from an item, that item does not appear in the index." An index where only a subset of the base table's items appear is called a sparse index, and the documentation says plainly "Global secondary indexes are sparse by default."

The consequence fits us perfectly: because only the link item (SK = META) is assigned a GSI1PK, while the stat item is not, GSI1 automatically holds only links. We don't need to filter out stats when querying the link list, they were never in the index. Assigning the GSI key only to link items is a way of pre-classifying right in the data layer.

Real query

After GSI1 turns ACTIVE, write two links for user-001 (with GSI1PK/GSI1SK) and one stat record (without), then check. The base table has three items:

$ aws dynamodb scan --table-name url-shortener --select COUNT --query Count
3

Query GSI1 by user, with --no-scan-index-forward to get the newest first:

$ aws dynamodb query --table-name url-shortener --index-name GSI1 \
    --key-condition-expression "GSI1PK = :u" \
    --expression-attribute-values '{":u":{"S":"USER#user-001"}}' \
    --no-scan-index-forward \
    --query 'Items[].{code:PK.S,createdAt:createdAt.S,target:target.S}' --output table
-----------------------------------------------------------------------------------
|     code     |       createdAt        |                 target                  |
+--------------+------------------------+-----------------------------------------+
|  LINK#Zb7Kp1m|  2026-05-25T14:30:00Z  |  https://aws.amazon.com/lambda/         |
|  LINK#aK9xQ2z|  2026-05-24T09:00:00Z  |  https://docs.aws.amazon.com/dynamodb/  |
-----------------------------------------------------------------------------------

Two links come back, sorted by creation time descending, exactly the order the dashboard needs. And proof of sparseness: although the table has three items, the GSI1 query sees only two:

$ aws dynamodb query --table-name url-shortener --index-name GSI1 \
    --key-condition-expression "GSI1PK = :u" \
    --expression-attribute-values '{":u":{"S":"USER#user-001"}}' --select COUNT --query Count
2

The STAT#... record isn't present in the index because it has no GSI1PK. We've just opened a new query path (by user) and pre-filtered the irrelevant item type, both just by how we set the keys.

🧹 Cleanup

Delete the demo items, keep the table and index for later articles:

for k in 'LINK#aK9xQ2z|META' 'LINK#Zb7Kp1m|META' 'LINK#aK9xQ2z|STAT#2026-05-25'; do
  pk="${k%%|*}"; sk="${k##*|}"
  aws dynamodb delete-item --table-name url-shortener \
    --key "{\"PK\":{\"S\":\"$pk\"},\"SK\":{\"S\":\"$sk\"}}"
done

Wrap-up

A global secondary index gives us a different view of the same data, sorted by a key we choose, to open a query path the base table doesn't have. Inverting the key by USER#<ownerId> lets us list links by user, and because a GSI is sparse by default, only items with the index key appear, so the index filters out the stats by itself. Our URL shortener now has all three paths: open a link by code, fetch a link's item collection, and list links by user.

The data design is now far enough along to wire into code. The next article connects DynamoDB to the two handlers from Article 03: creating a link will write to the real table, opening a link will do a real lookup. Along the way we solve two problems people often get wrong: how to make link creation idempotent so a repeated request doesn't produce two records, and how to count clicks safely when many opens happen at once, using an atomic counter.