Architecture

Synthetic Data generates realistic synthetic data while preserving the relationships between database objects. Its architecture is organized around applications, connectors, datasets, generator algorithms, and jobs, which work together to configure and run synthetic data generation.

Core Components

The following components form the foundation of the Synthetic Data architecture.

Applications

An application represents a business application or database environment for which synthetic data will be generated.

Applications:

Store reference and target connectors.
Maintain discovered schema metadata, including all tables, columns, and relationships in a single database schema.
Provide a foundation for creating datasets.

Each application acts as a central repository for discovery results and generation configurations.

Connectors

Connectors define how Synthetic Data connects to a database.

Synthetic Data supports two connector roles:

Reference connectors – Used to discover schema metadata and analyze source structures. Reference connectors provide read access to the source environment to support data discovery and generation configuration.
Target connectors – Used as destinations for generated synthetic data. Marking a connector as a target connector allows Synthetic Data to write generated records to the database and, when configured, delete existing records before data generation.

A connector can serve as a reference connector, a target connector, or both.
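To make the relationship between applications and connector roles concrete, the following is a minimal sketch, assuming a simple in-memory model. The `Connector` and `Application` classes and their fields are hypothetical illustrations, not the Synthetic Data API; the sketch also assumes that every connector must play at least one role.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only: Connector, Application, and their
# fields are not part of any documented Synthetic Data API.
@dataclass(frozen=True)
class Connector:
    name: str
    is_reference: bool = False  # read access for discovery and configuration
    is_target: bool = False     # destination for generated records

    def __post_init__(self) -> None:
        # Assumption: every connector plays at least one role.
        if not (self.is_reference or self.is_target):
            raise ValueError(f"connector {self.name!r} has no role")


@dataclass
class Application:
    name: str
    connectors: list[Connector] = field(default_factory=list)
    # Discovered metadata (tables, columns, relationships) would also live here.
    schema_metadata: dict = field(default_factory=dict)

    def reference_connectors(self) -> list[Connector]:
        return [c for c in self.connectors if c.is_reference]

    def target_connectors(self) -> list[Connector]:
        return [c for c in self.connectors if c.is_target]


app = Application("orders_app", [
    Connector("prod_replica", is_reference=True),
    Connector("qa_db", is_target=True),
    Connector("dev_db", is_reference=True, is_target=True),  # both roles
])
print([c.name for c in app.reference_connectors()])  # ['prod_replica', 'dev_db']
print([c.name for c in app.target_connectors()])     # ['qa_db', 'dev_db']
```

Modeling the roles as two independent flags, rather than a single type field, mirrors the statement that one connector can serve as a reference connector, a target connector, or both.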

Datasets

Datasets define the scope of data to be generated. A dataset is created from an application and contains a subset of the application's schema objects that are relevant to a testing scenario.

Datasets can include the following:

Selected tables from the application schema
Relationships between the selected tables
Generation configurations that control how synthetic data is produced
Data volume settings that determine the amount of data to generate

Multiple datasets can be created from the same application to support different testing requirements.
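A dataset can be pictured as a filtered view of its application's metadata. The sketch below is a hypothetical model (the `Relationship`, `Dataset`, and `create_dataset` names are illustrative, not the product API). It assumes a dataset keeps only the relationships whose parent and child tables are both selected, and stores data volume settings as per-table row counts.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model for illustration only; names are not the product API.
@dataclass(frozen=True)
class Relationship:
    parent: str  # referenced table
    child: str   # table that holds the foreign key


@dataclass
class Dataset:
    name: str
    tables: set[str]
    relationships: list[Relationship]
    row_counts: dict[str, int] = field(default_factory=dict)  # data volume settings


def create_dataset(name: str, app_tables: set[str], app_relationships: list[Relationship],
                   selected: list[str], row_counts: Optional[dict[str, int]] = None) -> Dataset:
    """Build a dataset from a subset of an application's discovered tables."""
    chosen = set(selected)
    unknown = chosen - app_tables
    if unknown:
        raise ValueError(f"tables not in application: {sorted(unknown)}")
    # Keep only relationships whose parent and child are both selected.
    rels = [r for r in app_relationships if r.parent in chosen and r.child in chosen]
    return Dataset(name, chosen, rels, dict(row_counts or {}))


app_tables = {"customers", "orders", "order_items", "audit_log"}
app_rels = [Relationship("customers", "orders"), Relationship("orders", "order_items")]
checkout = create_dataset("checkout_tests", app_tables, app_rels,
                          ["customers", "orders"], {"customers": 100, "orders": 500})
print(checkout.relationships)  # [Relationship(parent='customers', child='orders')]
```

This sketch silently drops relationships that point outside the selection; a real tool might instead warn about them or pull in the missing parent tables.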

Generator Algorithms

Generator algorithms define how Synthetic Data creates values for columns. Synthetic Data supports multiple generator frameworks, each representing a different generation approach, such as drawing values from a seed list or producing values that match a pattern.

A generator instance is a configured implementation of a generator framework that defines the specific behavior used during data generation. For example, a seed-list generator framework might have one instance configured with a list of common first names and another instance configured with a list of city names.
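The framework/instance distinction maps naturally onto a class and its configured objects. In this minimal sketch, `SeedListGenerator` is a hypothetical stand-in for a seed-list generator framework, and each object built from it plays the role of a generator instance; the uniform, with-replacement sampling is an assumption, not documented product behavior.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch: SeedListGenerator stands in for a seed-list generator
# framework; each configured object below is one generator instance.
@dataclass(frozen=True)
class SeedListGenerator:
    name: str
    seeds: tuple[str, ...]

    def generate(self, rng: random.Random, n: int) -> list[str]:
        # Draw n values uniformly, with replacement, from the seed list.
        return [rng.choice(self.seeds) for _ in range(n)]


first_names = SeedListGenerator("first_names", ("Ana", "Ben", "Chloe", "Dev"))
city_names = SeedListGenerator("city_names", ("Lisbon", "Osaka", "Toronto"))

rng = random.Random(42)  # fixed seed so repeated runs produce the same values
print(first_names.generate(rng, 3))
print(city_names.generate(rng, 2))
```

Passing the random generator in explicitly keeps generation repeatable, which is useful when the same dataset is regenerated for regression tests.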

Synthetic Data includes a set of globally available generator instances that can be used across datasets. Additional generator instances can be created and configured to support organization-specific requirements.

Within a dataset, generator instances can be assigned to individual columns to control how synthetic values are generated. Synthetic Data automatically assigns default generator instances based on discovered metadata and profiling results, but users can review and modify these assignments as needed.
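The sketch below illustrates the idea of default assignment followed by user review. The name and type rules, and every generator instance name, are hypothetical; the product's actual assignment logic is not described here, and the profiling results it also uses are omitted for brevity.

```python
import re
from typing import Optional

# Hypothetical heuristics: the product's actual assignment logic (which also
# uses profiling results) is not documented here. Instance names are made up.
NAME_RULES = [
    (re.compile(r"first_?name", re.IGNORECASE), "first_names"),
    (re.compile(r"city", re.IGNORECASE), "city_names"),
    (re.compile(r"e_?mail", re.IGNORECASE), "email_pattern"),
]
INTEGER_TYPES = {"INT", "INTEGER", "BIGINT", "SMALLINT"}


def default_generator(column: str, data_type: str) -> str:
    """Pick a default generator instance from column metadata."""
    for pattern, instance in NAME_RULES:
        if pattern.search(column):
            return instance
    # No name rule matched: fall back on the declared data type.
    return "random_integer" if data_type.upper() in INTEGER_TYPES else "random_string"


def assign_generators(columns: dict[str, str],
                      overrides: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Assign defaults to every column, then apply the user's reviewed changes."""
    assigned = {col: default_generator(col, typ) for col, typ in columns.items()}
    assigned.update(overrides or {})  # user modifications take precedence
    return assigned


columns = {"first_name": "VARCHAR", "home_city": "VARCHAR", "age": "INTEGER", "notes": "TEXT"}
print(assign_generators(columns, overrides={"notes": "lorem_text"}))
# {'first_name': 'first_names', 'home_city': 'city_names', 'age': 'random_integer', 'notes': 'lorem_text'}
```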

Generator algorithms help ensure that generated data is realistic, consistent, and appropriate for each column.

Jobs

A job generates synthetic data by executing a dataset against one or more target databases.

A job references the following:

The dataset to generate
One or more mappings between reference connectors and target connectors

A job defines the following:

Execution parameters that control how the job runs
Optional overrides for data volume settings defined in the dataset

Jobs enable you to reuse the same dataset with different target environments and execution configurations.
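Reusing one dataset across jobs can be sketched as follows. The `Job` class, its fields, and the `parallelism` execution parameter are hypothetical; the sketch assumes a job-level volume override replaces the dataset's setting for the overridden table only, and that overrides may not name tables outside the dataset.

```python
from dataclasses import dataclass, field

# Hypothetical job model for illustration; field names are not the product API.
@dataclass
class Job:
    dataset_name: str
    dataset_row_counts: dict[str, int]           # volumes defined in the dataset
    connector_mappings: list[tuple[str, str]]    # (reference connector, target connector)
    volume_overrides: dict[str, int] = field(default_factory=dict)
    parallelism: int = 1                         # stand-in for an execution parameter

    def __post_init__(self) -> None:
        # Assumption: overrides may only adjust tables the dataset already contains.
        unknown = set(self.volume_overrides) - set(self.dataset_row_counts)
        if unknown:
            raise ValueError(f"override for tables not in dataset: {sorted(unknown)}")

    def effective_row_counts(self) -> dict[str, int]:
        # Job-level overrides replace the dataset's value for those tables only.
        return {**self.dataset_row_counts, **self.volume_overrides}


checkout_volumes = {"customers": 100, "orders": 500}
qa_job = Job("checkout_tests", checkout_volumes, [("prod_replica", "qa_db")])
perf_job = Job("checkout_tests", checkout_volumes, [("prod_replica", "perf_db")],
               volume_overrides={"orders": 50_000}, parallelism=8)
print(qa_job.effective_row_counts())    # {'customers': 100, 'orders': 500}
print(perf_job.effective_row_counts())  # {'customers': 100, 'orders': 50000}
```

Both jobs share the same dataset definition; only the target mapping, volumes, and execution settings differ.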

Data generation workflow

Synthetic Data follows a four-step workflow to generate synthetic data.

1. Create an application

You create an application and associate one or more reference and target connectors with it.

2. Discover metadata

Synthetic Data analyzes the schemas available through the reference connectors and collects information about:

Tables
Columns
Relationships
Constraints

During discovery, Synthetic Data also evaluates the discovered metadata and profiling information to assign default generator instances to columns.

The discovered metadata and generator assignments are stored within the application.
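To show the kind of metadata discovery collects, the sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` as a stand-in for a reference connector. This is an assumption for illustration: how Synthetic Data performs discovery against its supported databases is not described here, and profiling is not shown.

```python
import sqlite3

def discover(conn: sqlite3.Connection) -> dict:
    """Collect tables, columns, and key constraints from a SQLite database."""
    metadata = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        metadata[table] = {
            "columns": {c[1]: c[2] for c in cols},
            "not_null": [c[1] for c in cols if c[3]],
            "primary_key": [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5]],
            # (local column, referenced table, referenced column)
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
        }
    return metadata


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL
    );
""")
meta = discover(conn)
print(sorted(meta))                    # ['customers', 'orders']
print(meta["orders"]["foreign_keys"])  # [('customer_id', 'customers', 'id')]
```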

3. Configure data generation

You create a dataset from the application and select the tables required for a testing scenario.

You can review and adjust generation rules, generator assignments, relationships, and data volume settings as needed.

4. Generate synthetic data

You create and run jobs based on a dataset.

During execution, Synthetic Data:

Preserves referential integrity
Generates realistic synthetic values
Maintains relationships between related tables
Writes generated data to each target database mapped in the job
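One common way to preserve referential integrity is to order tables by their foreign key dependencies, generate parent tables first, and draw child foreign key values only from keys already generated for the parent. The sketch below uses a topological sort (Python's `graphlib.TopologicalSorter`) for this; it is a swapped-in illustration, not necessarily the product's algorithm. It assumes every table has an integer `id` key and that foreign key columns are named `<parent>_id`. Reversing the same order gives a safe sequence for deleting existing records from a target before generation.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical sketch, not the product's algorithm. Assumes every table has an
# integer "id" key and each foreign key column is named "<parent>_id".

def generation_order(tables: set[str], relationships: list[tuple[str, str]]) -> list[str]:
    """Order tables so each referenced (parent) table comes before its children."""
    graph = {t: set() for t in tables}
    for parent, child in relationships:
        graph[child].add(parent)  # child depends on parent
    # Raises graphlib.CycleError for circular or self-referencing foreign keys.
    return list(TopologicalSorter(graph).static_order())


def delete_order(tables: set[str], relationships: list[tuple[str, str]]) -> list[str]:
    """Children first, so clearing a target never violates a foreign key."""
    return generation_order(tables, relationships)[::-1]


def generate(tables, relationships, row_counts, seed=0):
    rng = random.Random(seed)
    parents_of: dict[str, list[str]] = {}
    for parent, child in relationships:
        parents_of.setdefault(child, []).append(parent)
    rows: dict[str, list[dict]] = {}
    for table in generation_order(tables, relationships):
        rows[table] = []
        for pk in range(1, row_counts[table] + 1):
            row = {"id": pk}
            for parent in parents_of.get(table, []):
                # Only reference keys that were already generated for the parent.
                row[f"{parent}_id"] = rng.choice(rows[parent])["id"]
            rows[table].append(row)
    return rows


tables = {"customers", "products", "orders", "order_items"}
rels = [("customers", "orders"), ("orders", "order_items"), ("products", "order_items")]
data = generate(tables, rels, {"customers": 3, "products": 5, "orders": 10, "order_items": 25})
customer_ids = {r["id"] for r in data["customers"]}
print(all(o["customers_id"] in customer_ids for o in data["orders"]))  # True
```

This sketch does not handle self-referencing or circular foreign keys, and a parent table configured with zero rows would make `rng.choice` fail; a production generator needs explicit strategies for both cases.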