Pillar 1 — The Simulation Engine
<Aphelion />

Enterprise Synthetic Data at 1/400th the Cost

$49/year vs. $20,000+ enterprise platforms

Aphelion CLI Demo

100% FK Integrity

A topological dependency graph orders inserts so that every row respects referential integrity, including across circular references.

Rust Speed

10,000+ rows/second. Fills Magento's 401-table schema in seconds, not hours.

80+ Exotic Types

PostGIS, ltree, hstore, JSONB, spatial — all handled automatically without manual configuration.

What is Aphelion?

Aphelion is a high-performance, Rust-native synthetic data generator designed specifically for PostgreSQL and MySQL databases. Unlike other tools that require manual configuration, Aphelion automatically introspects your database schema, resolves complex foreign key dependencies (including circular references), and generates realistic, constraint-safe test data in seconds.

Why use Aphelion?

  • Speed: Written in Rust, it generates 10,000+ rows/second.
  • Accuracy: Guarantees 100% foreign key integrity.
  • Coverage: Supports 80+ exotic types (PostGIS, ltree, JSON).
  • Compliance: Generates HIPAA- and GDPR-safe synthetic data locally.

How it works

  1. Introspect: Connects to your DB to learn the schema.
  2. Plan: Builds a dependency graph to strictly order inserts.
  3. Generate: Creates and inserts data in parallel.
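The Plan step can be illustrated with a small sketch. This is not Aphelion's implementation (the engine is written in Rust and its internals are not shown here); it is a minimal Python illustration of the general technique, assuming a hypothetical schema map and assuming that a cycle is broken by deferring one nullable foreign key. The names `SCHEMA` and `plan_inserts` are invented for this example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical schema: table -> [(parent_table, fk_is_nullable), ...]
SCHEMA = {
    "users": [],
    "products": [],
    "orders": [("users", False)],
    "items": [("orders", False), ("products", False)],
    # employees and departments reference each other (a circular dependency).
    "employees": [("departments", False)],
    "departments": [("employees", True)],
}


def plan_inserts(schema):
    """Return (insert_order, deferred_fks).

    deferred_fks lists (child, parent) edges dropped to break a cycle; those
    columns would be inserted as NULL and filled in by a later UPDATE.
    """
    deps = {table: {parent for parent, _ in fks} for table, fks in schema.items()}
    nullable = {
        (table, parent)
        for table, fks in schema.items()
        for parent, is_nullable in fks
        if is_nullable
    }
    deferred = []
    while True:
        try:
            order = list(TopologicalSorter(deps).static_order())
            return order, deferred
        except CycleError as err:
            cycle = err.args[1]  # [n0, n1, ..., n0]; each node precedes the next
            for parent, child in zip(cycle, cycle[1:]):
                if (child, parent) in nullable and parent in deps[child]:
                    deps[child].discard(parent)  # break the cycle at this edge
                    deferred.append((child, parent))
                    break
            else:
                raise ValueError(f"no nullable FK to defer in cycle: {cycle}")


order, deferred = plan_inserts(SCHEMA)
print(order)     # parents always appear before their children
print(deferred)  # FK edges to patch after the main inserts
```

Deferring a nullable foreign key is only one way to handle a cycle; deferrable constraints are another, and which approach the tool uses is not stated on this page.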

Comprehensive Database Coverage

100% exotic type support for both PostgreSQL and MySQL/MariaDB

PostgreSQL

52 Types

Exotic Types Supported:

  • PostGIS: geometry, geography
  • Hierarchical: ltree paths
  • Key-Value: hstore
  • Network: inet, cidr, macaddr, macaddr8
  • Ranges: int4range, tsrange, daterange (6 types)
  • Geometric: point, line, polygon, circle (7 types)
  • Full-Text: tsvector, tsquery
  • Advanced: arrays, JSONB, UUID, XML, money

Perfect For:

  • ✓ Complex hierarchical data
  • ✓ Geospatial applications
  • ✓ Full-text search
  • ✓ Advanced analytics

MySQL/MariaDB

28 Types

Exotic Types Supported:

  • JSON: Native + MariaDB (via json_valid)
  • UUIDs: Auto-detect CHAR(36) & BINARY(16)
  • Network: IP/MAC heuristics (polyfill)
  • Spatial: POINT, LINESTRING, POLYGON (8 types)
  • Coded Values: ENUM, SET
  • Binary: BIT, BINARY, VARBINARY, BLOB
  • Scale: PARTITION BY RANGE, GENERATED columns
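As a rough illustration of the UUID bullet above, the sketch below shows how one UUID value can be written in either storage form. It is a Python sketch under stated assumptions, not Aphelion's detection logic: the helper names `seeded_uuid` and `encode_for_column` are hypothetical, and the rule of matching the declared column type exactly is an assumption.

```python
import random
import uuid


def seeded_uuid(rng: random.Random) -> uuid.UUID:
    """Build a version-4 UUID from a seeded RNG so runs are repeatable."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def encode_for_column(value: uuid.UUID, column_type: str):
    """Pick the storage form from the declared column type (assumed rule)."""
    normalized = column_type.strip().upper().replace(" ", "")
    if normalized == "CHAR(36)":
        return str(value)   # 36-character hyphenated text
    if normalized == "BINARY(16)":
        return value.bytes  # 16 raw bytes
    raise ValueError(f"not a recognised UUID column type: {column_type}")


rng = random.Random(42)
value = seeded_uuid(rng)
print(encode_for_column(value, "CHAR(36)"))
print(encode_for_column(value, "BINARY(16)").hex())
```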

Perfect For:

  • ✓ E-commerce (Magento, WooCommerce)
  • ✓ WordPress/Drupal schemas
  • ✓ Legacy apps with UUIDs in CHAR(36)
  • ✓ MariaDB with JSON constraints

84 Production-Ready Tables Across 6 Industries

Available for both PostgreSQL and MySQL/MariaDB

  • 🏥 Healthcare (15 tables)
  • 💰 Finance (16 tables)
  • 🛒 E-commerce (17 tables)
  • 🏢 Insurance (11 tables)
  • 📱 Telecom (13 tables)
  • ⚖️ Legal (12 tables)

Compliance-Ready Data Generation

  • HIPAA-Ready: Synthetic PHI Generation
  • PCI-DSS Safe: Fake Payment Data
  • GDPR-Ready: No Real PII
  • Privacy-First: Runs Locally

Aphelion generates 100% synthetic data to help you maintain compliance. No real patient data, financial records, or personal information is used or required.

Why Developers, Startups & Enterprises Love Us

Built by engineers tired of SQL seeds. Perfect for MVP velocity and Enterprise scale.

Deploy with Confidence

Never break staging again. Our topological dependency graph ensures 100% referential integrity for every insert.

Scale Before You Fail

Simulate massive datasets on your laptop. Test partitioning strategies and query performance against production-scale volume.

Pass Audits Instantly

Compliance comes standard. HIPAA-ready patient records and PCI-safe financial transactions generated without real PII.

Privacy by Default

All data is generated locally in your infrastructure. We never see your schema, we never see your data. Zero external API calls.

Catch Edge Cases

Deterministic seeding guarantees reproducible bugs. Need a user with exactly 3 failed payments? Script it once, reuse forever.

Zero Tech Debt

Stop maintaining fragile SQL scripts. Aphelion auto-introspects schema changes, so your seed data never rots.

Deep Technical Capabilities

  • Generated Columns: Auto-computed values based on expressions.
  • Partitioning: Intelligent data distribution for ranged partitions.
  • Ltree / HierarchyID: Recursive tree generation (5-11 levels deep).
  • Spatial Types: PostGIS geometry/geography, MySQL spatial.
  • Weighted Distributions: Realistic demographic & geographic spread.
  • Composite Keys: Correctly handles multi-column uniqueness.
  • Domains & Enums: Respects custom types and constraints.
  • Circular Dependencies: Automatically resolves FK cycles.
  • Array Types: Generates realistic array distributions.
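To make the ltree item above concrete, here is a minimal Python sketch of recursive path generation. It is an illustration only, not the tool's generator: the function names, the `root` and `n<i>` labels, and the branching limit are assumptions, and only the 5 to 11 level depth range comes from the list above.

```python
import random


def grow_tree(rng, path, levels_left, max_children, out):
    """Append `path`, then recursively append its descendants, to `out`."""
    out.append(path)
    if levels_left == 0:
        return
    for i in range(rng.randint(1, max_children)):
        grow_tree(rng, f"{path}.n{i}", levels_left - 1, max_children, out)


def generate_ltree_paths(seed, min_depth=5, max_depth=11, max_children=2):
    """Return dot-separated ltree-style paths, parents before children."""
    rng = random.Random(seed)
    depth = rng.randint(min_depth, max_depth)  # labels on the deepest path
    paths = []
    grow_tree(rng, "root", depth - 1, max_children, paths)
    return paths


paths = generate_ltree_paths(seed=7)
print(len(paths), "paths; deepest has", max(p.count(".") + 1 for p in paths), "labels")
```

Because the list is produced in pre-order, inserting rows in list order means a parent path always exists before any of its children.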

Built for Your Entire Team

Different goals, one source of truth.

For QA Teams

  • Spin up realistic test envs in under 10 minutes
  • Eliminate PII from test data while preserving edge cases
  • Reproducible bugs with deterministic seeds

For Compliance

  • HIPAA & PCI-DSS safe by design (no real data used)
  • Generate audit-ready datasets for penetration testing
  • Zero risk of data leaks in staging/dev

For Data Science

  • Version entire synthetic corpora with code
  • Benchmark drift detection against stable baselines

Industry-Specific Solutions

Pre-built generators for healthcare, finance, e-commerce, and more.

Works With Your Existing Schema

No configuration needed to start. We introspect your database, detect types, and map them to realistic generators automatically.

  • Smart Type Detection: Maps `user_email` to `internet.email` automatically.
  • Zero Config Start: Just point it at your DB URL and go.
  • JSON Export: Export the layout to JSON for fine-tuning.

~ aphelion introspect postgres://localhost/myapp
> Connected to database 'myapp'
> Found 14 tables
> Detected 3 circular dependencies
> Generating schema map... Done
~ aphelion generate --rows 1000 --seed 42
> Generating data plan
> Phase 1: Base tables (users, products)...
> Phase 2: Dependent tables (orders, items)...
> Phase 3: Resolving circular refs...
> Successfully generated 14,000 rows in 1.2s
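
The column-name mapping described in this section (`user_email` to `internet.email`) can be sketched as a small rule table. This is a hedged Python illustration, not Aphelion's actual rule set: apart from `internet.email`, every generator key below is hypothetical, as are the names `NAME_RULES`, `TYPE_FALLBACKS`, and `pick_generator`.

```python
import re

# Name-based rules, checked in order. Only "internet.email" appears in the
# page above; the other generator keys are made up for illustration.
NAME_RULES = [
    (re.compile(r"(^|_)email($|_)"), "internet.email"),
    (re.compile(r"(^|_)first_name($|_)"), "person.first_name"),
    (re.compile(r"(^|_)phone($|_)"), "phone.number"),
]

# Fallback by SQL type when no name rule matches (also hypothetical).
TYPE_FALLBACKS = {
    "integer": "number.int",
    "boolean": "datatype.boolean",
}


def pick_generator(column_name: str, sql_type: str) -> str:
    """Choose a generator key from the column name, then from its SQL type."""
    name = column_name.lower()
    for pattern, generator in NAME_RULES:
        if pattern.search(name):
            return generator
    return TYPE_FALLBACKS.get(sql_type.lower(), "text.word")


for column, sql_type in [("user_email", "varchar"), ("qty", "integer"), ("notes", "text")]:
    print(column, "->", pick_generator(column, sql_type))
```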

See Aphelion in Action

Real workflows for real teams.

Workflow: Seed Healthcare Sandbox
# 1. Initialize with Healthcare template
$ aphelion init --template healthcare-fhir
# 2. Generate 50k patients with history
$ aphelion generate --rows 50000 --seed 2024
> Generating patients... Done
> Generating encounters... Done
> 0 FK Violations. 0 PII Leaks.

Workflow: Inject Fraud Patterns
# 1. Introspect payment schema
$ aphelion introspect postgres://prod-replica/payments
# 2. Generate data with fraud signals
$ aphelion generate --scenario "velocity_attack"
> Modeling transactions...
> Injecting signals (2%)...
> Dataset ready for model training.

Perfect For

  • Database Seeding & Cloning: Fill a complex Postgres schema with 10M+ rows that respect FKs and constraints.
  • Integration Testing: Deterministic data for CI/CD pipelines. Seed 42 always produces the exact same dataset.
  • Regulated Industries: Specialized Healthcare (HIPAA), Finance (PCI), and Telecom schemas.

Not Designed For

  • ML Model Training: We generate structured data, not statistical duplicates for ML research (use CausalFoundry for ML).
  • Unstructured Media: We don't generate synthetic images, video, or long-form text or audio.
  • SaaS Hosting: Aphelion is a local CLI tool. We don't host or see your data.

Why Aphelion is Different

We fill the gap between hacking together scripts and expensive enterprise platforms.

Vs. Scripts & Libraries

Feature | Aphelion | Faker.js / Seeds | Custom SQL Scripts
Relational Integrity | Automated
Circular Deps | Handled | ⚠️ Hard
Maintenance | Zero | High | High
Vs. Enterprise Platforms

Feature | Aphelion | Enterprise AI Platforms (Gretel, MOSTLY AI, Tonic)
Primary Focus | Relational Structure: Perfect DB seeding & Foreign Keys | Statistical Similarity: ML Model Training & Privacy
Developer Experience | CLI Native: Runs locally, works in CI | Web UI / SaaS: Upload data to cloud
Postgres Depth | Native Support: ltree, jsonb, ranges | Generic SQL: Often treats everything as tables
Price | Free / $49 per year | $20k+ per year

Simple, Transparent Pricing

Start free on your local machine. Scale when your team grows.

Developer (CLI)

$0/forever
  • Unlimited tables & databases
  • 1,000 Rows per table
  • All industry templates

Team (CI/CD)

$49/year
~0.2% of the cost of enterprise tools
  • 1.5 Million Rows
  • Auto-Approve CI Mode
  • Priority Email Support

Secure payment via Stripe

Frequently Asked Questions

Everything you need to know about Aphelion

How is Aphelion different from Faker.js?

Faker.js generates random data but doesn't understand database constraints. Aphelion ensures zero FK violations, handles circular dependencies, and generates realistic codes (ICD-10, LOINC) that Faker doesn't support.

Is the data truly realistic?

Yes! We use weighted distributions and industry generators. Healthcare schemas support ICD-10 codes, LOINC tests, etc. It mirrors production without compliance risk.
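
As a sketch of what a weighted distribution means in practice, the Python example below draws values in proportion to assigned weights. The region names and weights are illustrative placeholders rather than real demographic figures or Aphelion's built-in data, and `sample_regions` is a hypothetical helper.

```python
import random
from collections import Counter

# Illustrative weights only; not real demographic figures.
REGION_WEIGHTS = {"north": 50, "south": 30, "east": 15, "west": 5}


def sample_regions(seed, count):
    """Draw `count` regions so that frequencies follow REGION_WEIGHTS."""
    rng = random.Random(seed)
    return rng.choices(
        list(REGION_WEIGHTS),
        weights=list(REGION_WEIGHTS.values()),
        k=count,
    )


counts = Counter(sample_regions(seed=42, count=10_000))
print(counts.most_common())  # heavier-weighted regions appear more often
```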

Can I use this in production?

No. Aphelion is for testing, development, and staging environments only. It's designed to replace production data in non-production environments to maintain HIPAA/PCI-DSS compliance.

What databases are supported?

Currently, Aphelion supports PostgreSQL, MySQL, MariaDB, and SQLite. Support for SQL Server and Oracle is on the roadmap.

How does deterministic generation work?

Use the --seed flag to generate identical data every time. Perfect for CI/CD pipelines and debugging.
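
The idea behind seeded generation can be shown with a short Python sketch. It does not reproduce Aphelion's generator; the row shape and the `generate_rows` helper are assumptions made for illustration. The point is that a private random number generator initialised from the seed yields the same sequence on every run.

```python
import random


def generate_rows(seed, count):
    """Generate `count` rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private RNG, independent of global state
    return [
        {
            "id": i + 1,
            "failed_payments": rng.randint(0, 5),
            "balance_cents": rng.randrange(0, 100_000),
        }
        for i in range(count)
    ]


first = generate_rows(seed=42, count=1000)
second = generate_rows(seed=42, count=1000)
print(first == second)  # identical because both runs start from the same seed
```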

Which features are in the Rust version?

  • ✅ All 52 PostgreSQL & 28 MySQL/MariaDB types
  • ✅ Constraint-safe generation (FK, unique, check)
  • ✅ Industry-specific generators (ICD-10, LOINC)
  • ✅ Weighted distributions

Do I need to write code?

No coding required! Aphelion introspects your database schema automatically. Point it at your database, and it generates a configuration for you.

What's included in the Team tier?

Team ($49/year) includes: 1.5 million rows, CI/CD auto-approve mode, and priority email support.

How fast is data generation?

Aphelion generates ~10,000 rows/second on modern hardware. 100K rows takes ~10 seconds.

Latest from the Aphelion Blog

Deep dives into high-performance data generation, PostgreSQL optimizations, and testing strategies.