Beyond Customers & Orders: Generating Complex Scientific Datasets with Aphelion

Most synthetic data stories stop at “customers and orders.” That’s useful—but it’s also the easiest possible case. The real test of a synthetic data platform is whether it can survive outside SaaS dashboards and e‑commerce schemas and step into worlds where the data is dense, weird, and scientific.

This is where Algomimic Aphelion is built to operate.

In this post, we’ll look at how Aphelion can generate complex datasets for domains like bioinformatics—where schemas look more like Rfam than a CRM—and why support for hierarchies, taxonomies, and exotic data types is the difference between a toy and a serious tool.

From flat tables to scientific knowledge graphs

Typical demo datasets look like:

customers
orders
transactions

These are flat and intuitive: a few foreign keys, timestamps, and enums. You can fake them with random values and basic constraints and still get something “real enough” for a UI demo.
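To make that concrete, here is a minimal Python sketch of how little it takes to fake this kind of schema. The table names follow the list above; the column names, status values, and the `fake_flat_dataset` helper are assumptions invented for illustration and are not part of Aphelion.

```python
import random
from datetime import datetime, timedelta


def fake_flat_dataset(n_customers=5, n_orders=20, seed=0):
    """Generate customers and orders linked by a single foreign key."""
    rng = random.Random(seed)
    customers = [
        {"id": i, "name": f"customer_{i}"} for i in range(1, n_customers + 1)
    ]
    start = datetime(2024, 1, 1)
    orders = [
        {
            "id": j,
            # Foreign key: any existing customer id is acceptable
            "customer_id": rng.choice(customers)["id"],
            # Enum column: a uniform pick from a fixed set of values
            "status": rng.choice(["pending", "shipped", "cancelled"]),
            # Timestamp: a random minute within roughly one year of `start`
            "created_at": start + timedelta(minutes=rng.randrange(525_600)),
        }
        for j in range(1, n_orders + 1)
    ]
    return customers, orders


customers, orders = fake_flat_dataset()
```

The only constraint enforced here is that every `customer_id` points at a real customer. That is about as far as "random values and basic constraints" goes, and it is enough for a UI demo.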

Scientific and research domains are nothing like that. Take a bioinformatics‑style dataset inspired by resources like Rfam:

RNA families and subfamilies
Sequence alignments and covariance models
Species and taxonomic lineage
Experimental annotations and curation history

Suddenly, you’re dealing with deep hierarchies, tight referential integrity across dozens of tables, and exotic data types that don’t behave like simple strings. Unlike standard Faker wrappers, Aphelion is built to navigate these depths.

Modeling scientific taxonomies: more than just parent_id

Think about taxonomic trees of species or ontologies like SNOMED. These aren’t just labels; they’re structures. Aphelion respects hierarchical relationships (parent/child, ancestor/descendant) and generates plausible lineages that follow domain rules (no circular ancestry, valid ranks).
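The two domain rules named above (no circular ancestry, valid ranks) can be checked mechanically. The following is a rough Python sketch of such a check, not Aphelion's implementation. It assumes a hypothetical in-memory layout where each node maps to a `(parent_id, rank)` pair, and it treats a rank as valid when the parent's rank sits strictly higher in the usual seven-rank ordering.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
RANK_INDEX = {rank: i for i, rank in enumerate(RANKS)}


def validate_taxonomy(nodes):
    """Return a list of rule violations.

    `nodes` maps node_id -> (parent_id or None, rank). This layout is an
    assumption for illustration only.
    """
    errors = []
    for node_id, (parent_id, rank) in nodes.items():
        if rank not in RANK_INDEX:
            errors.append(f"{node_id}: unknown rank {rank!r}")
            continue
        if parent_id is not None:
            if parent_id not in nodes:
                errors.append(f"{node_id}: missing parent {parent_id!r}")
                continue
            parent_rank = nodes[parent_id][1]
            # A parent must sit strictly higher in the rank ordering
            if parent_rank in RANK_INDEX and RANK_INDEX[parent_rank] >= RANK_INDEX[rank]:
                errors.append(f"{node_id}: rank {rank!r} not below parent rank {parent_rank!r}")
        # Walk up the ancestor chain; revisiting a node means a cycle
        seen = {node_id}
        current = parent_id
        while current is not None and current in nodes:
            if current in seen:
                errors.append(f"{node_id}: circular ancestry")
                break
            seen.add(current)
            current = nodes[current][0]
    return errors


good = {
    "animalia": (None, "kingdom"),
    "chordata": ("animalia", "phylum"),
    "mammalia": ("chordata", "class"),
}
bad = {"a": ("b", "genus"), "b": ("a", "species")}
print(validate_taxonomy(good))
print(validate_taxonomy(bad))
```

A check like this is useful on generated data as well as real data: if a generator claims to respect hierarchy rules, the output should pass with an empty error list.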

Technical Example: aphelion.yaml

Generating a valid taxonomic path using the ltree type:

tables:
  taxonomy_node:
    rows: 5000
    columns:
      path:
        type: ltree
        # Generates a valid hierarchical path, e.g. kingdom.phylum.class...
        generator: hierarchy_path
        params:
          depth: 7
          max_branching: 5
      rank:
        type: varchar
        generator: enum
        params:
          values: ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
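To give a feel for what a generator driven by `depth` and `max_branching` might produce, here is a small Python sketch. It is an illustration under assumptions, not Aphelion's actual `hierarchy_path` algorithm: the `generate_paths` function, the label naming scheme, and the decision to derive `rank` from path depth are all hypothetical. Deriving the rank from depth is one way to keep the two columns consistent, since a rank drawn independently from an enum would not by itself match the depth of the path.

```python
import random

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def generate_paths(rows=5000, depth=7, max_branching=5, seed=0):
    """Grow a tree level by level and return (ltree_path, rank) pairs.

    Every node gets between 1 and `max_branching` children until `depth`
    levels exist or `rows` nodes have been emitted.
    """
    if depth > len(RANKS):
        raise ValueError("depth cannot exceed the number of known ranks")
    rng = random.Random(seed)
    out = []
    frontier = [()]  # tuples of labels; () stands for a virtual root
    for level in range(depth):
        rank = RANKS[level]
        next_frontier = []
        for parent in frontier:
            for _ in range(rng.randint(1, max_branching)):
                if len(out) >= rows:
                    return out
                # Labels use only letters, digits, and underscores,
                # which keeps them valid as ltree labels
                label = f"{rank}_{len(out)}"
                node = parent + (label,)
                out.append((".".join(node), rank))
                next_frontier.append(node)
        frontier = next_frontier
    return out


for path, rank in generate_paths(rows=5, seed=1):
    print(rank, path)
```

Because children are only ever attached to nodes that already exist, every path's parent is guaranteed to be present in the output, and cycles cannot occur by construction.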

Handling 80+ exotic types: ltree, spatial, and beyond

Scientific databases use advanced types to encode reality compactly. Aphelion handles:

Hierarchical types (ltree) for representing taxonomic paths.
Spatial types for coordinates or regions of interest.
Arrays for multi‑valued annotations and accessions.

This matters when testing GIS search indexing, ORM behavior with complex types, or large-scale analytics pipelines. For healthcare IT teams, test data that actually exercises these types can be the difference between a failing migration and a successful release.
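One way to exercise these types in a test suite is to run simple structural checks over generated values before loading them. The Python sketch below is an assumption-laden illustration, not part of Aphelion: the helper names are hypothetical, the ltree rule is a simplified one (non-empty labels of letters, digits, and underscores; newer PostgreSQL versions accept more), the spatial check assumes plain longitude/latitude in degrees, and the accession pattern assumes Rfam-style identifiers such as `RF00001`.

```python
import re

LTREE_LABEL = re.compile(r"[A-Za-z0-9_]+")
RFAM_ACCESSION = re.compile(r"RF\d{5}")


def is_valid_ltree(path: str) -> bool:
    """Simplified ltree check: dot-separated, non-empty alphanumeric labels."""
    return all(LTREE_LABEL.fullmatch(label) for label in path.split("."))


def is_ancestor(ancestor: str, descendant: str) -> bool:
    """Label-wise prefix test, mirroring ltree's `@>` (equal paths count)."""
    a, d = ancestor.split("."), descendant.split(".")
    return d[: len(a)] == a


def is_valid_point(lon: float, lat: float) -> bool:
    """Longitude/latitude bounds in degrees."""
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


def all_accessions_valid(accessions: list) -> bool:
    """Every element of a multi-valued accession column matches the pattern."""
    return all(RFAM_ACCESSION.fullmatch(acc) for acc in accessions)


print(is_valid_ltree("animalia.chordata.mammalia"))
print(is_ancestor("animalia.chordata", "animalia.chordata.mammalia"))
print(is_valid_point(-122.4, 37.8))
print(all_accessions_valid(["RF00001", "RF00005"]))
```

Note that the ancestor test compares whole labels rather than raw string prefixes, so `animalia.chord` is not treated as an ancestor of `animalia.chordata`.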

Conclusion

Aphelion’s ability to model the structure of your domain, not just fill tables, is what lets it work well beyond customers and orders. Whether you’re in bioinformatics, 5G network topology, or geospatial analytics, Aphelion generates data that behaves like the real thing.

Ready for Complex Data?

Stop using toy data for scientific schemas. Generate production-grade synthetic data today.
