The Data Foundry

Built by Data with Pranjal

Premium locked, PySpark, Intermediate

Too Many Small Files from Hourly Writes

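Each hourly job writes many tiny files into the same date partition, so the partition accumulates more small files with every run. When the data is read back, metadata overhead (listing and opening each file) dominates scan time.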
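One common remedy for this pattern is a periodic compaction pass that rewrites a date partition into fewer, larger files. The sketch below is illustrative only and is not taken from the exercise: it assumes Parquet data, an already-created SparkSession passed in as `spark`, a 128 MB target file size, and hypothetical source and destination paths. The helper names `target_file_count` and `compact_partition` are invented for this example.

```python
import math

# Assumed target size per output file (128 MB); tune for your storage layer.
TARGET_FILE_BYTES = 128 * 1024 * 1024


def target_file_count(total_bytes: int,
                      target_file_bytes: int = TARGET_FILE_BYTES) -> int:
    """Return how many output files keep each file near the target size."""
    if total_bytes <= 0:
        return 1
    return max(1, math.ceil(total_bytes / target_file_bytes))


def compact_partition(spark, src_path: str, dst_path: str,
                      total_bytes: int) -> int:
    """Rewrite one date partition into fewer, larger Parquet files.

    `spark` is an existing SparkSession. `src_path` and `dst_path` are
    hypothetical locations; writing to a separate destination avoids
    overwriting a path that is still being read.
    """
    num_files = target_file_count(total_bytes)
    (
        spark.read.parquet(src_path)
        # coalesce only reduces the partition count and avoids a full shuffle.
        .coalesce(num_files)
        .write.mode("overwrite")
        .parquet(dst_path)
    )
    return num_files
```

For example, a partition holding about 300 MB would be rewritten into 3 files under these assumptions. If the compacted files come out very uneven in size, `repartition` could be swapped in for `coalesce` at the cost of a shuffle.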

Practice type: Log / Error Analysis

Estimated time: 17 min

Skills: PySpark, Small Files, Compaction
