The Data Foundry

Built by Data with Pranjal

AWS Data Platform Lab

Choose AWS services with production judgment.

Work through storage, security, compute, streaming, governance, and observability incidents. Every lab asks why a service or design fits, not merely what it is.

Labs: 17
Free: 3
Completed: 0

Beginner | S3 + Athena | 18 min | Free

The Athena Bill That Tripled Overnight

Business context

A new clickstream ingestion job goes live. Dashboard queries still work, but daily Athena spend triples.

Production problem

The new pipeline writes small gzip JSON files and no longer partitions by event date.

Interactive system map

The Athena Bill That Tripled Overnight production path

Trace how storage layout and file format affect an analytical query.

1. Source data: produces the events or records entering this design.

Storage metrics

before: Parquet, 256 MB avg file, partitioned by event_date
after: JSON.gz, 1.8 MB avg file, prefix by ingestion_id
Athena bytes scanned: +340%
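
To make the storage metrics concrete, here is a minimal sketch that turns them into two numbers: roughly how many S3 objects a query engine must open at each average file size, and the scan multiplier implied by a +340% increase. The 1 TiB dataset size is a hypothetical figure chosen only for illustration, and the sketch assumes the same byte volume in both layouts, which will not hold exactly because Parquet and gzip JSON compress differently.

```python
import math

MB = 1024 ** 2


def object_count(dataset_bytes: int, avg_file_bytes: float) -> int:
    """Approximate number of S3 objects needed to hold a dataset."""
    return math.ceil(dataset_bytes / avg_file_bytes)


def scan_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier (+100% -> 2.0)."""
    return 1 + percent_increase / 100


# Hypothetical dataset size, used only to compare the two layouts.
dataset_bytes = 1024 ** 4  # 1 TiB

before_files = object_count(dataset_bytes, 256 * MB)  # Parquet layout
after_files = object_count(dataset_bytes, 1.8 * MB)   # JSON.gz layout

print(f"objects before: {before_files}")
print(f"objects after:  {after_files}")
print(f"object count ratio: {after_files / before_files:.0f}x")
print(f"bytes scanned multiplier: {scan_multiplier(340):.1f}x")
```

Under these assumptions the small-file layout needs on the order of a hundred times more objects for the same data, in addition to the larger scan volume caused by losing columnar storage and event_date partition pruning.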

Your task

Diagnose first, then write the production response.

Identify the storage-layout regression and propose a cost-safe repair.
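
One possible shape for the repair, sketched under stated assumptions, is an Athena CTAS statement that rewrites the raw data as Parquet partitioned by event_date. The table names, column names, and S3 location below are hypothetical, and the sketch assumes the raw table already exposes an event_date column. The helper only builds the SQL string; it does not submit the query.

```python
def build_ctas(
    source_table: str,
    target_table: str,
    target_location: str,
    columns: list[str],
    partition_column: str = "event_date",
) -> str:
    """Build an Athena CTAS statement that writes partitioned Parquet.

    Athena expects partition columns to come last in the SELECT list,
    so the partition column is appended after the regular columns.
    """
    select_list = ", ".join([*columns, partition_column])
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{target_location}',\n"
        f"  partitioned_by = ARRAY['{partition_column}']\n"
        ")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# All identifiers below are hypothetical placeholders.
sql = build_ctas(
    source_table="clickstream_raw",
    target_table="clickstream_parquet",
    target_location="s3://example-bucket/clickstream_parquet/",
    columns=["user_id", "page", "event_time"],
)
print(sql)
```

The rewrite itself scans the raw data once, so a cost-safe plan would likely run it over a bounded date range first and point dashboards at the new table only after comparing bytes scanned.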

Most likely diagnosis or next action
