How to Seed PostgreSQL Databases in 2025: Complete Guide
From manual SQL scripts to automated tools—learn the best practices for seeding PostgreSQL databases with foreign keys, constraints, and realistic data.
Why Database Seeding Matters
Every PostgreSQL application needs test data. Whether you're:
- Developing locally and need realistic data
- Running integration tests in CI/CD
- Setting up staging environments
- Demoing your application to stakeholders
You need a reliable way to populate your database with valid, constraint-safe data.
This guide covers 5 approaches to database seeding, from manual SQL scripts to fully automated tools, with real code examples and performance comparisons.
Table of Contents
- 1. Manual SQL Scripts (The Old Way)
- 2. ORM Seeders (Prisma, TypeORM, Sequelize)
- 3. Faker.js + Custom Scripts
- 4. Database Cloning Tools
- 5. Automated Synthetic Data Generators
- 6. Best Practices & Performance Tips
- 7. Choosing the Right Approach
Method 1: Manual SQL Scripts
The traditional approach: write SQL INSERT statements by hand.
Example: seed.sql
-- seed.sql
BEGIN;
-- Insert users first (no dependencies)
INSERT INTO users (id, name, email, created_at) VALUES
(1, 'Alice Johnson', 'alice@example.com', NOW()),
(2, 'Bob Smith', 'bob@example.com', NOW()),
(3, 'Carol White', 'carol@example.com', NOW());
-- Insert orders (depends on users)
INSERT INTO orders (id, user_id, total, status, created_at) VALUES
(1, 1, 99.99, 'completed', NOW()),
(2, 1, 149.50, 'pending', NOW()),
(3, 2, 75.00, 'shipped', NOW());
COMMIT;
Pros
- ✓ Simple and transparent
- ✓ Version controlled
- ✓ Fast execution
- ✓ No dependencies
Cons
- ✗ Not scalable (hard to maintain 1000+ rows)
- ✗ Data is static (same every time)
- ✗ Manual FK tracking required
- ✗ Breaks when schema changes
- ✗ No realistic data (all hardcoded)
When to Use
Manual SQL scripts work well for small, static datasets (< 100 rows) that rarely change. Perfect for initial reference data or minimal test fixtures.
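One practical refinement: a plain INSERT script fails the second time you run it, once the rows already exist. Adding ON CONFLICT DO NOTHING (sketched below against the users table from the example above) makes the script safe to re-run:
-- Re-runnable seed: rows that already exist are skipped
INSERT INTO users (id, name, email, created_at) VALUES
(1, 'Alice Johnson', 'alice@example.com', NOW()),
(2, 'Bob Smith', 'bob@example.com', NOW())
ON CONFLICT (id) DO NOTHING;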
Method 2: ORM Seeders (Prisma, TypeORM, Sequelize)
Most ORMs provide seeding functionality. Here's how they compare:
Prisma Example
// prisma/seed.ts
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
async function main() {
// Create users
const alice = await prisma.user.create({
data: {
name: 'Alice Johnson',
email: 'alice@example.com',
orders: {
create: [
{ total: 99.99, status: 'completed' },
{ total: 149.50, status: 'pending' }
]
}
}
});
const bob = await prisma.user.create({
data: {
name: 'Bob Smith',
email: 'bob@example.com',
orders: {
create: [
{ total: 75.00, status: 'shipped' }
]
}
}
});
console.log({ alice, bob });
}
main()
.catch((e) => {
console.error(e);
process.exit(1); // non-zero exit so CI fails on seeding errors
})
.finally(async () => await prisma.$disconnect());
TypeORM Example
// src/database/seeds/user.seed.ts
import { Factory, Seeder } from 'typeorm-seeding';
import { User } from '../entities/User';
import { Order } from '../entities/Order';
export default class CreateUsers implements Seeder {
public async run(factory: Factory): Promise<void> {
await factory(User)()
.map(async (user) => {
user.orders = await factory(Order)().createMany(3);
return user;
})
.createMany(10);
}
}
Pros
- ✓ Type-safe (TypeScript)
- ✓ Handles FK relationships automatically
- ✓ Integrates with your ORM
- ✓ Can use factories for randomization
Cons
- ✗ Slow for large datasets (N+1 queries)
- ✗ Still requires manual data definition
- ✗ Limited to your ORM's capabilities
- ✗ Doesn't work without an ORM
When to Use
ORM seeders are great if you're already using Prisma/TypeORM and need type-safe seeding for small to medium datasets (< 10,000 rows).
Method 3: Faker.js + Custom Scripts
Use Faker.js to generate realistic fake data, but you handle the database logic.
Example
// seed.js
const { faker } = require('@faker-js/faker');
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
async function seed() {
// Insert 1000 users
const userIds = [];
for (let i = 0; i < 1000; i++) {
const result = await pool.query(
'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id',
[faker.person.fullName(), faker.internet.email()]
);
userIds.push(result.rows[0].id);
}
// Insert 3000 orders (3 per user on average)
for (const userId of userIds) {
const orderCount = faker.number.int({ min: 1, max: 5 });
for (let i = 0; i < orderCount; i++) {
await pool.query(
'INSERT INTO orders (user_id, total, status) VALUES ($1, $2, $3)',
[
userId,
faker.number.float({ min: 10, max: 1000, fractionDigits: 2 }),
faker.helpers.arrayElement(['pending', 'shipped', 'completed'])
]
);
}
}
console.log('Seeded 1000 users and ~3000 orders');
await pool.end(); // close connections so the script exits cleanly
}
seed().catch(console.error);
Pros
- ✓ Realistic fake data
- ✓ Flexible and customizable
- ✓ Works with any database
- ✓ Large ecosystem of generators
Cons
- ✗ Manual FK tracking (userIds array)
- ✗ Slow (one INSERT per row)
- ✗ No automatic constraint handling
- ✗ Breaks on circular dependencies
- ✗ 100+ lines of boilerplate for complex schemas
When to Use
Faker.js is excellent for simple schemas (< 5 tables) or when you need very specific fake data formats. Not recommended for complex relational databases.
Read more: Faker.js vs Aphelion: When to Use Each
Method 4: Database Cloning Tools
Clone your production database structure and generate synthetic data automatically.
Example: Using Aphelion
# Clone production schema, generate test data
aphelion clone postgresql://localhost/production \
test_db --rows 10000 --seed 42
# Output:
# 🔍 Introspecting schema...
# ✓ Found 23 tables
# ✓ Detected 47 foreign keys
# ✓ Resolved 3 circular dependencies
#
# 📊 Generating data...
# ✓ users (10,000 rows)
# ✓ orders (34,567 rows)
# ✓ products (5,000 rows)
# ... (20 more tables)
#
# ✅ Generated 156,789 rows in 52 seconds
# All constraints satisfied. Zero errors.
Pros
- ✓ Fully automated (no manual scripting)
- ✓ Handles all constraints automatically
- ✓ Scales to millions of rows
- ✓ Deterministic (same seed = same data)
- ✓ Fast (bulk inserts)
- ✓ Works with complex schemas
Cons
- ✗ Less control over specific data values
- ✗ Requires external tool
- ✗ May need customization for domain-specific data
When to Use
Database cloning tools are ideal for complex schemas (10+ tables), CI/CD pipelines, and when you need production-like data volumes without manual scripting.
Method 5: Automated Synthetic Data Generators
Enterprise-grade tools that generate synthetic data statistically similar to production.
Options
- Aphelion - PostgreSQL-native, $49/year, constraint-safe
- Tonic.ai - Multi-database, $20k+/year, ML-based
- MOSTLY AI - Enterprise focus, $3k+/year
- Gretel.ai - Cloud-based, API-first
Comparison
| Tool | Price | Best For |
|---|---|---|
| Aphelion | $0-$49/year | PostgreSQL specialists, startups |
| Tonic.ai | $20k+/year | Multi-database enterprises |
| MOSTLY AI | $3k+/year | ML-based similarity |
Best Practices for PostgreSQL Seeding
1. Use Transactions
Always wrap your seed scripts in transactions to ensure atomicity:
BEGIN;
-- Your INSERT statements
COMMIT;
-- Or ROLLBACK on error
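To see why this matters, here's a minimal sketch (reusing the users table from Method 1): if any statement inside the transaction fails, PostgreSQL aborts the whole transaction and nothing is persisted, so you never end up with half-seeded tables:
BEGIN;
INSERT INTO users (id, name, email) VALUES (10, 'Dana Lee', 'dana@example.com');
-- This duplicate primary key aborts the transaction...
INSERT INTO users (id, name, email) VALUES (10, 'Dana Again', 'dana@example.com');
-- ...so COMMIT reports the error and rolls back; the first row is gone too.
COMMIT;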
2. Disable Triggers Temporarily
For faster seeding, disable triggers during bulk inserts:
ALTER TABLE users DISABLE TRIGGER ALL;
-- Insert data
ALTER TABLE users ENABLE TRIGGER ALL;
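An alternative that avoids touching every table individually: session_replication_role suppresses ordinary triggers, including the internal triggers PostgreSQL uses to enforce foreign keys, for the current session only. It requires superuser privileges and skips FK checking, so use it only when your seed data is already known to be valid:
SET session_replication_role = replica; -- user triggers and FK enforcement off
-- bulk inserts here
SET session_replication_role = DEFAULT; -- back to normal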
3. Use COPY for Bulk Inserts
COPY is 10-100x faster than INSERT for large datasets:
COPY users (name, email, created_at) FROM STDIN WITH CSV;
Alice Johnson,alice@example.com,2025-01-01
Bob Smith,bob@example.com,2025-01-01
\.
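COPY ... FROM STDIN reads data arriving over the connection. If your CSV file lives on the client machine rather than the server, psql's \copy meta-command streams it through the same fast path (a sketch, assuming a local seed_users.csv):
\copy users (name, email, created_at) FROM 'seed_users.csv' WITH (FORMAT csv)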
4. Set Explicit IDs for Reproducibility
Use deterministic IDs for consistent test data:
INSERT INTO users (id, name, email) VALUES
(1, 'Alice', 'alice@example.com'),
(2, 'Bob', 'bob@example.com');
-- Reset sequence
SELECT setval('users_id_seq', (SELECT MAX(id) FROM users));
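One caveat: the setval trick applies to serial columns. If id is declared GENERATED ALWAYS AS IDENTITY, PostgreSQL rejects explicit values unless you opt in with OVERRIDING SYSTEM VALUE, as in this sketch:
-- Required when id is GENERATED ALWAYS AS IDENTITY
INSERT INTO users (id, name, email) OVERRIDING SYSTEM VALUE VALUES
(1, 'Alice', 'alice@example.com'),
(2, 'Bob', 'bob@example.com');
-- Realign the identity sequence afterwards
SELECT setval(pg_get_serial_sequence('users', 'id'), (SELECT MAX(id) FROM users));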
5. Validate Constraints After Seeding
Verify all constraints are satisfied:
-- Check for FK violations
SELECT * FROM orders WHERE user_id NOT IN (SELECT id FROM users);
-- Check for NULL violations
SELECT * FROM users WHERE email IS NULL;
-- Check for unique violations
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;
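A note on the FK check above: NOT IN returns no rows at all if the subquery produces a single NULL, which can silently hide violations when the referencing column is nullable. An anti-join is the NULL-safe pattern:
-- NULL-safe orphan check: orders whose user_id matches no user
SELECT o.*
FROM orders o
LEFT JOIN users u ON u.id = o.user_id
WHERE o.user_id IS NOT NULL AND u.id IS NULL;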
Performance Tips
Benchmark: 100,000 Rows
| Method | Time | Notes |
|---|---|---|
| Individual INSERTs | ~10 minutes | Slow, not recommended |
| Batch INSERTs (1000/batch) | ~2 minutes | Better, but still slow |
| COPY command | ~15 seconds | Fast, recommended |
| Aphelion (automated) | ~20 seconds | Fast + automatic FKs |
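If you want COPY-class speed without leaving SQL, you can also generate rows server-side with generate_series, which avoids client round trips entirely. A minimal sketch (table, column names, and row count are illustrative):
-- 100,000 users in a single statement, generated inside the server
INSERT INTO users (name, email, created_at)
SELECT 'User ' || g,
'user' || g || '@example.com',
NOW()
FROM generate_series(1, 100000) AS g;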
Optimization Checklist
- ✓ Use COPY instead of INSERT for bulk data
- ✓ Disable indexes during seeding, rebuild after
- ✓ Disable triggers temporarily
- ✓ Use UNLOGGED tables for temporary data
- ✓ Increase maintenance_work_mem for faster index creation
- ✓ Use parallel workers for large datasets
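Several of these can be applied per session rather than cluster-wide. A sketch of a typical pre-seed setup (the values are illustrative, not tuned recommendations):
SET maintenance_work_mem = '1GB'; -- faster index rebuilds after seeding
SET synchronous_commit = off; -- skip per-commit WAL flushes; fine for disposable data
-- UNLOGGED tables skip WAL entirely; handy as staging for large loads
CREATE UNLOGGED TABLE staging_users (LIKE users INCLUDING DEFAULTS);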
Choosing the Right Approach
Decision Matrix
- Small, static data (< 100 rows)? → Use Manual SQL scripts
- Using an ORM, small dataset (< 10k rows)? → Use ORM seeders
- Simple schema, need realistic data? → Use Faker.js + custom scripts
- Complex schema, many FKs, large dataset? → Use Aphelion or database cloning tools
- Enterprise, multi-database, huge budget? → Use Tonic.ai or MOSTLY AI
Conclusion
Database seeding has evolved significantly. In 2025, you have options ranging from manual SQL scripts to fully automated synthetic data generators.
Our recommendation:
- For simple projects: Manual SQL or ORM seeders
- For complex PostgreSQL schemas: Aphelion (automated, $49/year)
- For multi-database enterprises: Tonic.ai ($20k+/year)
The key is choosing a method that scales with your schema complexity and doesn't require constant maintenance.
Try Automated PostgreSQL Seeding
Generate constraint-safe test data in seconds. No manual scripting required.
Free forever for local development • 1,000 rows per table
Tags: #PostgreSQL #DatabaseSeeding #TestData #BestPractices #DevOps
Related: Faker.js vs Aphelion • Aphelion vs Tonic.ai