The Data Foundry

Built by Data with Pranjal

Practice hub

Choose the next skill, not the next random card.

Search SQL, Python, PySpark, production scenarios, and system design practice. Start with a guided section, then narrow the library only when you need it.

Practice items: 496
Free starters: 116
Modes: 5

Recommended for you

Start with a focused practice track.

Open a browser lab, production scenario, or architecture exercise.

Guided selection

Free labs to start

SQL | Broken SQL Fix | Beginner | Free

Wrong GROUP BY Grain Causing Customer Revenue Inflation

Build a customer-level revenue result with exactly one row per customer. Include only completed orders, return customer_id, customer_name, and completed_revenue, and make sure duplicate status rows cannot inflate the dashboard.

SQL | Grain | Revenue | Data Quality
18 min | Not started

SQL | Broken SQL Fix | Beginner | Free

LEFT JOIN Turned Into INNER JOIN by WHERE Filter

Return every active customer. For customers who clicked campaign SPRING_26, show their latest click timestamp. For customers with no click, keep the customer row and return NULL for last_click_at.

SQL | Joins | NULLs | Retention
16 min | Not started

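
The failure mode named in this lab's title can be reproduced in a few lines. The sketch below is not the lab's reference solution. The tables `customers` and `clicks`, their columns, and the sample rows are all hypothetical, and SQLite (through Python's `sqlite3` module) stands in for whatever engine the browser lab uses. It shows that a campaign filter placed in WHERE drops customers with no matching click, while the same filter placed in the ON clause keeps them.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, is_active INTEGER);
    CREATE TABLE clicks (customer_id INTEGER, campaign TEXT, clicked_at TEXT);
    INSERT INTO customers VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO clicks VALUES
        (1, 'SPRING_26', '2026-03-01 10:00:00'),
        (1, 'SPRING_26', '2026-03-05 09:30:00'),
        (2, 'WINTER_25', '2025-12-01 08:00:00');
""")

# Filtering the right-hand table in WHERE rejects the NULL-extended rows,
# so active customer 2 (no SPRING_26 click) disappears from the result.
broken = conn.execute("""
    SELECT c.customer_id, MAX(k.clicked_at) AS last_click_at
    FROM customers AS c
    LEFT JOIN clicks AS k ON k.customer_id = c.customer_id
    WHERE c.is_active = 1 AND k.campaign = 'SPRING_26'
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Moving the campaign condition into ON keeps every active customer and
# leaves last_click_at as NULL when there is no matching click.
fixed = conn.execute("""
    SELECT c.customer_id, MAX(k.clicked_at) AS last_click_at
    FROM customers AS c
    LEFT JOIN clicks AS k
        ON k.customer_id = c.customer_id AND k.campaign = 'SPRING_26'
    WHERE c.is_active = 1
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(broken)  # [(1, '2026-03-05 09:30:00')]
print(fixed)   # [(1, '2026-03-05 09:30:00'), (2, None)]
```

The filter on `customers` stays in WHERE because it restricts the preserved side of the join. Only conditions on the optional side need to move into ON.
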
SQL | Output Mismatch Debugging | Intermediate | Free

Duplicate Revenue from Joining Orders to Multiple Payments and Refunds

Return one row per order with paid_amount, refunded_amount, and net_revenue. Aggregate each child table to order_id before joining so payment and refund rows cannot multiply each other.

SQL | Join Explosion | Revenue | Output Mismatch
22 min | Not started
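
The instruction to aggregate each child table before joining is easier to see with numbers. The following is a minimal sketch, not the lab's model query. The tables `orders`, `payments`, and `refunds` and their rows are hypothetical, and SQLite is used only because it runs anywhere Python does. Order 1 has two payments and two refunds, so joining both child tables directly produces four rows for it and doubles both sums.

```python
import sqlite3

# Hypothetical tables and amounts, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE payments (order_id INTEGER, amount INTEGER);
    CREATE TABLE refunds (order_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO payments VALUES (1, 60), (1, 40), (2, 50);
    INSERT INTO refunds VALUES (1, 10), (1, 5);
""")

# Joining both child tables at row level pairs every payment with every
# refund of the same order, so each amount is counted more than once.
naive = conn.execute("""
    SELECT o.order_id,
           COALESCE(SUM(p.amount), 0) AS paid_amount,
           COALESCE(SUM(r.amount), 0) AS refunded_amount
    FROM orders AS o
    LEFT JOIN payments AS p ON p.order_id = o.order_id
    LEFT JOIN refunds AS r ON r.order_id = o.order_id
    GROUP BY o.order_id
    ORDER BY o.order_id
""").fetchall()

# Collapsing each child table to one row per order_id first means the
# joins can no longer multiply rows.
fixed = conn.execute("""
    WITH paid AS (
        SELECT order_id, SUM(amount) AS paid_amount
        FROM payments GROUP BY order_id
    ),
    refunded AS (
        SELECT order_id, SUM(amount) AS refunded_amount
        FROM refunds GROUP BY order_id
    )
    SELECT o.order_id,
           COALESCE(p.paid_amount, 0) AS paid_amount,
           COALESCE(r.refunded_amount, 0) AS refunded_amount,
           COALESCE(p.paid_amount, 0) - COALESCE(r.refunded_amount, 0) AS net_revenue
    FROM orders AS o
    LEFT JOIN paid AS p ON p.order_id = o.order_id
    LEFT JOIN refunded AS r ON r.order_id = o.order_id
    ORDER BY o.order_id
""").fetchall()

print(naive)  # [(1, 200, 30), (2, 50, 0)]
print(fixed)  # [(1, 100, 15, 85), (2, 50, 0, 50)]
```

Amounts are stored as integers here to keep the arithmetic exact. A real schema would more likely use a decimal type.
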

Guided selection

Popular interview labs

PySpark | Broken PySpark Fix | Intermediate | Free

The Executor Graveyard

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Repeated executor deaths after a wide join
24 min | Not started

PySpark | Broken PySpark Fix | Intermediate | Free

The AQE Surprise

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Adaptive Query Execution changes join or partition strategy in unexpected ways
24 min | Not started

PySpark | Broken PySpark Fix | Intermediate | Free

The Window Function Blowup

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Large window operations create spill, skew, or sort pressure
24 min | Not started

Guided selection

Production debugging labs

SQL | Output Mismatch Debugging | Intermediate | Free

Duplicate Revenue from Joining Orders to Multiple Payments and Refunds

Return one row per order with paid_amount, refunded_amount, and net_revenue. Aggregate each child table to order_id before joining so payment and refund rows cannot multiply each other.

SQL | Join Explosion | Revenue | Output Mismatch
22 min | Not started

PySpark | Log / Error Analysis | Intermediate | Premium

Too Many Small Files from Hourly Writes

Diagnose the performance issue from logs.

PySpark | Small Files | Compaction | Lakehouse
17 min | Not started

Airflow | Log / Error Analysis | Intermediate | Premium

DAG Green but Dashboard Wrong

Find why green status is misleading.

Airflow | Monitoring | Data Quality | Incident
18 min | Not started

Library results

496 matching practice items

The first 24 are shown to keep this page useful instead of overwhelming.

PySpark | Broken PySpark Fix | Beginner | Free

The Null Key Funnel

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Null or default join keys collapse data into a pathological partition
24 min | Not started

SQL | Windows | Beginner | Free

SQL 3: Top 3 Salaries per Department

Return exactly the requested result set. The browser will compare your output with the model query. After you submit, the browser runs your SQL against the visible sample plus an additional edge-case dataset.

SQL | Window Functions
12 min | Not started

SQL | Windows | Beginner | Free

SQL 4: Latest Order per Customer

Return exactly the requested result set. The browser will compare your output with the model query. After you submit, the browser runs your SQL against the visible sample plus an additional edge-case dataset.

SQL | Window Functions
12 min | Not started

SQL | Windows | Beginner | Free

SQL 6: Running Total by Date

Return exactly the requested result set. The browser will compare your output with the model query. After you submit, the browser runs your SQL against the visible sample plus an additional edge-case dataset.

SQL | Window Functions
12 min | Not started

SQL | Windows | Beginner | Free

SQL 7: 3-Day Moving Average

Return exactly the requested result set. The browser will compare your output with the model query. After you submit, the browser runs your SQL against the visible sample plus an additional edge-case dataset.

SQL | Window Functions
12 min | Not started

SQL | Joins | Beginner | Free

SQL 11: Customers with No Orders

Return exactly the requested result set. The browser will compare your output with the model query. After you submit, the browser runs your SQL against the visible sample plus an additional edge-case dataset.

SQL | Anti Join | NULL Handling
12 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

CRM Profile Backfill Coverage: Profile Enrichment Join 001

Write a LEFT JOIN query returning customer_id, customer_name, and city. Your query will be tested with customers that have no profile and with orphan profile rows.

SQL | Joins | LEFT JOIN | Customer Analytics
12 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Missing Profile Preservation Check: Profile Enrichment Join 173

Write a LEFT JOIN query returning customer_id, customer_name, and city. Your query will be tested with customers that have no profile and with orphan profile rows.

SQL | Joins | LEFT JOIN | Customer Analytics
12 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Customer 360 Left Join Audit: Profile Enrichment Join 174

Write a LEFT JOIN query returning customer_id, customer_name, and city. Your query will be tested with customers that have no profile and with orphan profile rows.

SQL | Joins | LEFT JOIN | Customer Analytics
12 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Completed-Order Coverage Gap: First Activity Retention 008

Write a query that returns customer_id and customer_name for customers with no completed orders. Your query will be tested with NULL customer IDs in the fact table and customers with only cancelled orders.

SQL | Anti Join | NULL Safety | Customer Analytics
14 min | Not started

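
The NULL customer IDs mentioned in these anti-join prompts matter because of how `NOT IN` treats NULL. The sketch below is one possible illustration, not the lab's model query. The tables `customers` and `orders`, the customer names, and the rows are hypothetical, and SQLite is used as a stand-in engine. A single NULL in the subquery makes `NOT IN` return no rows at all, while `NOT EXISTS` returns the customers with no completed orders, including the one whose only order was cancelled.

```python
import sqlite3

# Hypothetical tables and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES
        (10, 1, 'completed'),
        (11, 2, 'cancelled'),
        (12, NULL, 'completed');
""")

# The subquery yields 1 and NULL. "x NOT IN (1, NULL)" evaluates to NULL
# for any x other than 1, so no customer passes the filter.
not_in = conn.execute("""
    SELECT customer_id, customer_name
    FROM customers
    WHERE customer_id NOT IN (
        SELECT customer_id FROM orders WHERE status = 'completed'
    )
    ORDER BY customer_id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL
# customer_id in orders never matches and cannot poison the result.
not_exists = conn.execute("""
    SELECT c.customer_id, c.customer_name
    FROM customers AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders AS o
        WHERE o.customer_id = c.customer_id AND o.status = 'completed'
    )
    ORDER BY c.customer_id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2, 'Ben'), (3, 'Chen')]
```
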
SQL | SQL Coverage Pack | Beginner | Free

No-Purchase Customer Segment: First Activity Retention 130

Write a query that returns customer_id and customer_name for customers with no completed orders. Your query will be tested with NULL customer IDs in the fact table and customers with only cancelled orders.

SQL | Anti Join | NULL Safety | Customer Analytics
14 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Order Absence Reconciliation: First Activity Retention 131

Write a query that returns customer_id and customer_name for customers with no completed orders. Your query will be tested with NULL customer IDs in the fact table and customers with only cancelled orders.

SQL | Anti Join | NULL Safety | Customer Analytics
14 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

CRM Reactivation Candidate Pull: First Activity Retention 168

Write a query that returns customer_id and customer_name for customers with no completed orders. Your query will be tested with NULL customer IDs in the fact table and customers with only cancelled orders.

SQL | Anti Join | NULL Safety | Customer Analytics
14 min | Not started

SQL | Broken SQL Fix | Beginner | Free

LEFT JOIN Turned Into INNER JOIN by WHERE Filter

Return every active customer. For customers who clicked campaign SPRING_26, show their latest click timestamp. For customers with no click, keep the customer row and return NULL for last_click_at.

SQL | Joins | NULLs | Retention
16 min | Not started

PySpark | Broken PySpark Fix | Beginner | Free

The Small Files Avalanche

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Spark write path creates too many tiny files
24 min | Not started

PySpark | Broken PySpark Fix | Beginner | Free

The Silent UDF Tax

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Python UDF makes a pipeline unexpectedly slow
24 min | Not started

PySpark | Broken PySpark Fix | Beginner | Free

The Cache Everything Trap

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Over-caching makes the cluster slower
24 min | Not started

PySpark | Broken PySpark Fix | Beginner | Free

The Union of Doom

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Schema mismatch during union creates silent data corruption risk
24 min | Not started

PySpark | Broken PySpark Fix | Beginner | Free

The Driver Memory Trap

Fix the PySpark code so the pipeline is correct, scalable, and safe to rerun.

PySpark | Broken PySpark | Spark Performance and Debugging | Work is accidentally pulled back to the driver and causes instability
24 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Regional Enablement Filter: Reference Dimension Filter 029

Return country_name, population, and area_sq_km where population is at least 25,000,000 or area is at least 1,000,000. Your query will be tested with rows that qualify by only one threshold.

SQL | Filtering | Reference Data | Reporting
10 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Country Threshold Logic Fix: Reference Dimension Filter 033

Return country_name, population, and area_sq_km where population is at least 25,000,000 or area is at least 1,000,000. Your query will be tested with rows that qualify by only one threshold.

SQL | Filtering | Reference Data | Reporting
10 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

OR Rule Reporting Drill: Reference Dimension Filter 040

Return country_name, population, and area_sq_km where population is at least 25,000,000 or area is at least 1,000,000. Your query will be tested with rows that qualify by only one threshold.

SQL | Filtering | Reference Data | Reporting
10 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Reference Dimension Filter Audit: Reference Dimension Filter 044

Return country_name, population, and area_sq_km where population is at least 25,000,000 or area is at least 1,000,000. Your query will be tested with rows that qualify by only one threshold.

SQL | Filtering | Reference Data | Reporting
10 min | Not started

SQL | SQL Coverage Pack | Beginner | Free

Dimension Qualification Extract: Reference Dimension Filter 060

Return country_name, population, and area_sq_km where population is at least 25,000,000 or area is at least 1,000,000. Your query will be tested with rows that qualify by only one threshold.

SQL | Filtering | Reference Data | Reporting
10 min | Not started