Beyond Customers & Orders: Generating Complex Scientific Datasets with Aphelion
Moving past e-commerce demos into the world of hierarchies, taxonomies, and exotic data types.
Most synthetic data stories stop at “customers and orders.” That’s useful—but it’s also the easiest possible case. The real test of a synthetic data platform is whether it can survive outside SaaS dashboards and e‑commerce schemas and step into worlds where the data is dense, weird, and scientific.
This is where Algomimic Aphelion is built to operate.
In this post, we’ll look at how Aphelion can generate complex datasets for domains like bioinformatics—where schemas look more like Rfam than a CRM—and why support for hierarchies, taxonomies, and exotic data types is the difference between a toy and a serious tool.
From flat tables to scientific knowledge graphs
Typical demo datasets look like:
- customers
- orders
- transactions
These are flat and intuitive: a few foreign keys, timestamps, and enums. You can fake them with random values and basic constraints and still get something “real enough” for a UI demo.
Scientific and research domains are nothing like that. Take a bioinformatics‑style dataset inspired by resources like Rfam:
- RNA families and subfamilies
- Sequence alignments and covariance models
- Species and taxonomic lineage
- Experimental annotations and curation history
Suddenly, you’re dealing with deep hierarchies, tight referential integrity across dozens of tables, and exotic data types that don’t behave like simple strings. Unlike standard Faker wrappers, Aphelion is built to navigate these depths.
Modeling scientific taxonomies: more than just parent_id
Think about taxonomic trees of species or ontologies like SNOMED. These aren’t just labels; they’re structures. Aphelion respects hierarchical relationships (parent/child, ancestor/descendant) and generates plausible lineages that follow domain rules (no circular ancestry, valid ranks).
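To make the two domain rules named above concrete, here is a minimal Python sketch of a checker for rows shaped as (node_id, parent_id, rank). It is an illustration only, not Aphelion's implementation: the function name `lineage_errors`, the row shape, and the assumption that a child's rank must sit strictly below its parent's are all hypothetical choices made for this example.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
RANK_INDEX = {name: i for i, name in enumerate(RANKS)}


def lineage_errors(nodes):
    """Return rule violations for rows shaped (node_id, parent_id, rank)."""
    by_id = {node_id: (parent_id, rank) for node_id, parent_id, rank in nodes}
    errors = []
    for node_id, (parent_id, rank) in by_id.items():
        if rank not in RANK_INDEX:
            errors.append(f"{node_id}: unknown rank {rank!r}")
            continue
        if parent_id is None:
            continue  # root nodes have no ancestry to check
        if parent_id not in by_id:
            errors.append(f"{node_id}: missing parent {parent_id}")
            continue
        # Assumed rule: a child must sit strictly below its parent's rank.
        parent_rank = by_id[parent_id][1]
        if parent_rank in RANK_INDEX and RANK_INDEX[parent_rank] >= RANK_INDEX[rank]:
            errors.append(f"{node_id}: rank {rank} is not below parent rank {parent_rank}")
        # Walk up the ancestor chain; revisiting a node means circular ancestry.
        seen = {node_id}
        current = parent_id
        while current is not None and current in by_id:
            if current in seen:
                errors.append(f"{node_id}: circular ancestry")
                break
            seen.add(current)
            current = by_id[current][0]
    return errors
```

A check like this can be run against any generated taxonomy table before loading it, whichever tool produced the rows.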
Technical Example: aphelion.yaml
Generating a valid taxonomic path using the ltree type:
tables:
  taxonomy_node:
    rows: 5000
    columns:
      path:
        type: ltree
        # Generates a valid hierarchical path kingdom.phylum.class...
        generator: hierarchy_path
        params:
          depth: 7
          max_branching: 5
      rank:
        type: varchar
        generator: enum
        params:
          values: ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
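For intuition about what a generator with `depth` and `max_branching` parameters might produce, the following Python sketch grows a tree breadth-first and emits dot-separated paths. It is not how Aphelion's `hierarchy_path` generator works internally (the source does not describe that). The function name `generate_taxonomy`, the placeholder labels `n0`, `n1`, ..., and the choice to derive each rank from the path's depth are assumptions made for illustration.

```python
import random
from collections import deque

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def generate_taxonomy(rows, depth=7, max_branching=5, seed=0):
    """Return up to `rows` (path, rank) pairs forming one rooted tree."""
    rng = random.Random(seed)
    depth = min(depth, len(RANKS))      # never go deeper than the rank list
    nodes = [("n0", RANKS[0])]          # single root at the top rank
    frontier = deque([("n0", 1)])       # (path, number of labels in path)
    next_id = 1
    while frontier and len(nodes) < rows:
        parent_path, level = frontier.popleft()
        if level >= depth:
            continue                    # deepest rank reached: no children
        for _ in range(rng.randint(1, max_branching)):
            if len(nodes) >= rows:
                break
            path = f"{parent_path}.n{next_id}"
            next_id += 1
            # The rank is taken from the depth so it always agrees with the path.
            nodes.append((path, RANKS[level]))
            frontier.append((path, level + 1))
    return nodes


if __name__ == "__main__":
    for path, rank in generate_taxonomy(10, seed=42):
        print(rank, path)
```

Because the tree may run out of nodes to expand before reaching `rows`, the function returns at most that many rows rather than exactly that many. Deriving the rank from the depth, instead of drawing it independently, keeps the two columns consistent with each other.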
Handling 80+ exotic types: ltree, spatial, and beyond
Scientific databases use advanced types to encode reality compactly. Aphelion handles:
- Hierarchical types (ltree) for representing taxonomic paths.
- Spatial types for coordinates or regions of interest.
- Arrays for multi‑valued annotations and accessions.
This matters when testing GIS search indexing, ORM behavior with complex types, or large-scale analytics pipelines: test data that lacks these types leaves the code paths that handle them unexercised. For healthcare IT teams, that can be the difference between a failing migration and a successful release.
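As a rough illustration of why these types need more care than plain strings, the Python sketch below formats one value of each kind as a text literal. It assumes a PostgreSQL target (where ltree is an extension type) and well-known text (WKT) for points. The helper names are hypothetical and are not part of Aphelion, and the label pattern is deliberately conservative.

```python
import re

# Conservative label pattern: letters, digits, and underscores only.
LTREE_LABEL = re.compile(r"[A-Za-z0-9_]+")


def ltree_literal(labels):
    """Join labels into a dot-separated path, rejecting unsafe characters."""
    for label in labels:
        if not LTREE_LABEL.fullmatch(label):
            raise ValueError(f"invalid ltree label: {label!r}")
    return ".".join(labels)


def point_wkt(lon, lat):
    """Well-known text for a 2D point, x (longitude) first, then y (latitude)."""
    return f"POINT({lon} {lat})"


def array_literal(values):
    """PostgreSQL text-array literal with each element quoted and escaped."""
    quoted = ['"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"' for v in values]
    return "{" + ",".join(quoted) + "}"


if __name__ == "__main__":
    print(ltree_literal(["Bacteria", "Proteobacteria"]))
    print(point_wkt(-122.4, 37.8))
    print(array_literal(["RF00001", "RF00002"]))
```

Each type has its own grammar: a label containing a space is rejected for the path, while the same string is fine inside an array element as long as it is quoted.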
Conclusion
Aphelion’s ability to model the structure of your domain, rather than just fill tables, is what sets it apart from demo-grade generators. Whether you're in bioinformatics, 5G network topology, or geospatial analytics, Aphelion generates data that behaves like the real thing.
Ready for Complex Data?
Stop using toy data for scientific schemas. Generate production-grade synthetic data today.