Scale Before You Fail: Testing Partitioning & Performance on Your Laptop
The most expensive bugs aren't logic errors; they are scale errors. Learn how to stress-test your database locally.
The "Localhost" Illusion
We've all been there: you write a query and it runs in 2ms on your local machine against its 50 rows of seed data. You deploy it, and three months later it takes down the production DB because the table grew to 10 million rows and you forgot a composite index.
Traditionally, developers couldn't test scale locally because:
- Downloading production dumps is a security nightmare (PII risks).
- Anonymizing 100GB of data takes forever.
- Hand-writing `INSERT INTO` scripts for millions of rows is impractical.
The Aphelion Advantage
Aphelion generates massive datasets entirely algorithmically. You can spin up a 10M-row database on your laptop in minutes, consuming zero network bandwidth and putting zero customer data at risk.
Example 1: Testing Partitioning Strategies
PostgreSQL offers powerful partitioning (declarative partitioning), but setting it up correctly is tricky. Are you partitioning by `RANGE` (date) or `LIST` (region)? Is your partition key included in your primary key?
Let's say you want to test if partitioning your audit logs by month improves query speed.
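A minimal partitioned schema for this experiment might look like the following sketch (column names are illustrative; note that PostgreSQL requires the partition key to be part of the primary key):

```sql
-- Parent table, partitioned by month on created_at
CREATE TABLE audit_logs (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    action     text,
    payload    jsonb,
    PRIMARY KEY (id, created_at)   -- partition key must be part of the PK
) PARTITION BY RANGE (created_at);

-- One child partition per month
CREATE TABLE audit_logs_2025_01 PARTITION OF audit_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE audit_logs_2025_02 PARTITION OF audit_logs
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- ...and so on for the remaining months.
```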
Step 1: Generate the Volume
Using Aphelion, you generate 5 million audit log entries spanning the last year.
# Generate 5,000,000 rows for the 'audit_logs' table
aphelion generate audit_schema.json --table audit_logs --rows 5000000
Step 2: Verify Partition Pruning
Now you can run `EXPLAIN ANALYZE` on your local machine to see if Postgres is actually pruning partitions (skipping the partitions it doesn't need to read).
EXPLAIN ANALYZE SELECT * FROM audit_logs
WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01';
-- Expected Output: Partition pruning in action
Append  (cost=0.00..14.33 rows=5 width=128)
  ->  Seq Scan on audit_logs_2025_01  (cost=0.00..14.33 rows=5 width=128)
        Filter: ((created_at >= '2025-01-01') AND (created_at < '2025-02-01'))
-- If you see scans on audit_logs_2025_02 or audit_logs_2025_03, your partitioning strategy failed!
(Note the half-open range: `BETWEEN '2025-01-01' AND '2025-01-31'` on a timestamp column would silently exclude every row after midnight on January 31.)
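As a sanity check, you can switch pruning off via the `enable_partition_pruning` setting and confirm that every monthly partition then shows up in the plan:

```sql
-- Sanity check: with pruning disabled, every partition should be scanned
SET enable_partition_pruning = off;
EXPLAIN SELECT * FROM audit_logs
WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01';
-- Now every audit_logs_YYYY_MM child should appear under the Append node.
RESET enable_partition_pruning;
```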
Without massive data: You would see a fast scan regardless of pruning success, because inspecting 50 rows is instantaneous. With 5M rows, a failure to prune causes a noticeable 2-3 second delay locally, alerting you immediately.
Example 2: Query Performance & Indexing
N+1 query problems and missing indexes are the silent killers of SaaS applications. They often only surface when a specific customer's data grows beyond a tipping point.
The Scenario
You have an e-commerce app. You want to display a user's order history with total amounts. Simple enough?
The Test
Generate a "Whale" user scenario: 1 user with 10,000 orders, and each order has 50 line items. (Aphelion allows you to configure these distributions easily).
# Use distribution configuration to force heavy outliers
aphelion generate ecommerce.json --config heavy_users.json
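If you want to double-check the same data shape without Aphelion, plain PostgreSQL `generate_series` can produce an equivalent whale. This is a rough sketch; the table and column names are assumptions about your schema:

```sql
-- One user (id 42) with 10,000 orders
INSERT INTO orders (id, user_id)
SELECT g, 42 FROM generate_series(1, 10000) AS g;

-- 50 line items per order: cross join each order with generate_series(1, 50)
INSERT INTO line_items (order_id, price, quantity)
SELECT o.id, round((random() * 100)::numeric, 2), 1 + (random() * 4)::int
FROM orders o
CROSS JOIN generate_series(1, 50);
```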
Now run your dashboard query:
SELECT o.id, SUM(li.price * li.quantity)
FROM orders o
JOIN line_items li ON o.id = li.order_id
WHERE o.user_id = ?
GROUP BY o.id;
If you forgot an index on `line_items.order_id`, this query might take 400ms locally on such a dataset. That's sluggish. In production under load, it could spike to seconds.
Add the index:
CREATE INDEX idx_line_items_order_id ON line_items(order_id);
Re-run the query. It drops to 15ms.
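Timing alone can be noisy, so it's worth confirming the planner actually picked up the new index. Compare the plans before and after (the `42` here is a placeholder user id):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, SUM(li.price * li.quantity)
FROM orders o
JOIN line_items li ON o.id = li.order_id
WHERE o.user_id = 42
GROUP BY o.id;
-- Before: a Seq Scan on line_items feeding a hash join.
-- After: an Index Scan (or Bitmap Index Scan) using idx_line_items_order_id.
```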
You just prevented a production incident from the comfort of your coffee shop.
Summary
Partitioning Tests
Use massive date-ranged datasets to ensure your Postgres `PARTITION BY RANGE` setup is actually pruning scans.
Index Verification
Use "Whale" data distributions (users with huge histories) to stress-test joins and aggregates, ensuring indexes are used.
Ready to Stress Test?
Generate the scale you need to feel confident in your architecture. Download Aphelion and start simulating millions of rows today.