Architecture
Synthetic Data generates realistic synthetic data while preserving relationships between database objects. The architecture is organized around applications, connectors, datasets, generator algorithms, and jobs, which work together to configure and run synthetic data generation.
Core Components
The following components form the foundation of the Synthetic Data architecture.
Applications
An application represents a business application or database environment for which synthetic data will be generated.
Applications:
- Store reference and target connectors.
- Maintain discovered schema information.
- Store complete metadata about all tables, columns, and relationships present in a single database schema.
- Provide a foundation for creating datasets.
Each application acts as a central repository for discovery results and generation configurations.
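As an illustration only, the role of an application as a repository can be sketched as a small data model. The class and field names below (Application, Table, Relationship, store_discovery) are hypothetical and are not part of the Synthetic Data product; the sketch only shows how connectors and discovered metadata might be held together in one object.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: list[str]


@dataclass
class Relationship:
    # A foreign-key style link, written as "table.column" on both sides.
    child: str
    parent: str


@dataclass
class Application:
    # Hypothetical model: one application per database schema.
    name: str
    reference_connectors: list[str] = field(default_factory=list)
    target_connectors: list[str] = field(default_factory=list)
    tables: dict[str, Table] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def store_discovery(self, tables, relationships):
        """Replace stored metadata with the latest discovery results."""
        self.tables = {t.name: t for t in tables}
        self.relationships = list(relationships)


# Connector and table names here are made up for the example.
app = Application(
    "crm",
    reference_connectors=["crm_prod_ro"],
    target_connectors=["crm_test"],
)
app.store_discovery(
    [Table("customers", ["id", "name"]), Table("orders", ["id", "customer_id"])],
    [Relationship(child="orders.customer_id", parent="customers.id")],
)
```

Keeping the metadata on the application object is what allows several datasets to be derived from it without repeating discovery.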
Connectors
Connectors define how Synthetic Data connects to a database.
Two connector types are supported:
- Reference connectors – Used to discover schema metadata and analyze source structures. Reference connectors provide read access to the source environment to support data discovery and generation configuration.
- Target connectors – Used as destinations for generated synthetic data. Marking a connector as a target connector allows Synthetic Data to write generated records to the database and, when configured, delete existing records before data generation.
A connector can serve as a reference connector, a target connector, or both.
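Because the two roles are not mutually exclusive, one way to picture them is as combinable flags. The sketch below is an assumption made for illustration: ConnectorRole, Connector, and the clear_before_write field are hypothetical names, not the product's configuration model.

```python
from dataclasses import dataclass
from enum import Flag, auto


class ConnectorRole(Flag):
    REFERENCE = auto()  # read access for discovery
    TARGET = auto()     # write access for generated data


@dataclass(frozen=True)
class Connector:
    name: str
    roles: ConnectorRole
    # Hypothetical switch for "delete existing records before generation".
    clear_before_write: bool = False

    def can_discover(self) -> bool:
        return ConnectorRole.REFERENCE in self.roles

    def can_write(self) -> bool:
        return ConnectorRole.TARGET in self.roles

    def may_delete_existing(self) -> bool:
        # Deletion only makes sense on a connector that is a target.
        return self.can_write() and self.clear_before_write


prod_ro = Connector("crm_prod_ro", ConnectorRole.REFERENCE)
staging = Connector(
    "crm_staging",
    ConnectorRole.REFERENCE | ConnectorRole.TARGET,
    clear_before_write=True,
)
```

In this sketch, a connector that is only a reference connector can never be written to or cleared, regardless of how the deletion switch is set.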
Datasets
Datasets define the scope of data to be generated. A dataset is created from an application and contains a subset of the application's schema objects that are relevant to a testing scenario.
Datasets can include the following:
- Selected tables from the application schema
- Relationships between the selected tables
- Generation configurations that control how synthetic data is produced
- Data volume settings that determine the amount of data to generate
Multiple datasets can be created from the same application to support different testing requirements.
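The idea of a dataset as a subset of an application's schema can be sketched as follows. This is a minimal illustration under stated assumptions: the Dataset class, the create_dataset function, and the default of 100 rows per table are all hypothetical, and relationships are simplified to (child table, parent table) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    tables: set[str]
    relationships: list[tuple[str, str]]  # (child_table, parent_table)
    row_counts: dict[str, int] = field(default_factory=dict)


def create_dataset(name, app_tables, app_relationships, selected,
                   row_counts=None, default_rows=100):
    """Build a dataset from a subset of an application's tables."""
    chosen = set(selected)
    unknown = chosen - set(app_tables)
    if unknown:
        raise ValueError(f"tables not in application: {sorted(unknown)}")
    # Keep only relationships whose two ends are both in the selection.
    kept = [(c, p) for c, p in app_relationships if c in chosen and p in chosen]
    counts = {t: (row_counts or {}).get(t, default_rows) for t in chosen}
    return Dataset(name, chosen, kept, counts)


app_tables = {"customers", "orders", "invoices"}
app_relationships = [("orders", "customers"), ("invoices", "orders")]

order_tests = create_dataset(
    "order_tests", app_tables, app_relationships,
    selected=["customers", "orders"],
    row_counts={"customers": 50},
)
```

Here the relationship from invoices to orders is dropped because invoices is outside the selection, while the application itself is left unchanged and can back further datasets.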
Generator Algorithms
Generator algorithms define how Synthetic Data creates values for columns. Synthetic Data supports multiple generator frameworks, each representing a different approach to synthetic data generation, such as generating values from seed lists, patterns, or other generation strategies.
A generator instance is a configured implementation of a generator framework that defines the specific behavior used during data generation. For example, a seed-list generator framework might have one instance configured with a list of common first names and another instance configured with a list of city names.
Synthetic Data includes a set of globally available generator instances that can be used across datasets. Additional generator instances can be created and configured to support organization-specific requirements.
Within a dataset, generator instances can be assigned to individual columns to control how synthetic values are generated. Synthetic Data automatically assigns default generator instances based on discovered metadata and profiling results, but users can review and modify these assignments as needed.
Generator algorithms help ensure that generated data is realistic, consistent, and appropriate for each column.
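The distinction between a generator framework and a generator instance can be illustrated with the seed-list example described above. The sketch below is not the product's API: SeedListFramework, GeneratorInstance, and the column keys are hypothetical names chosen for the example.

```python
import random
from dataclasses import dataclass


class SeedListFramework:
    """Framework: the general strategy of picking values from a list."""

    def generate(self, config, rng):
        return rng.choice(config["values"])


@dataclass
class GeneratorInstance:
    """Instance: a framework plus the configuration that specializes it."""
    name: str
    framework: SeedListFramework
    config: dict

    def next_value(self, rng):
        return self.framework.generate(self.config, rng)


framework = SeedListFramework()
first_names = GeneratorInstance(
    "first_names", framework, {"values": ["Ana", "Ben", "Chen"]}
)
cities = GeneratorInstance(
    "cities", framework, {"values": ["Lagos", "Lima", "Oslo"]}
)

# Within a dataset, instances are assigned to individual columns.
column_generators = {
    "customers.first_name": first_names,
    "customers.city": cities,
}

rng = random.Random(7)  # fixed seed so repeated runs give the same row
row = {col: gen.next_value(rng) for col, gen in column_generators.items()}
```

Both instances share one framework object and differ only in configuration, which mirrors the way a single framework can back many instances.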
Jobs
A job generates synthetic data by executing a dataset against one or more target databases.
A job references the following:
- The dataset to generate
- One or more reference connector and target connector mappings
A job defines the following:
- Execution parameters that control how the job runs
- Optional overrides for data volume settings defined in the dataset
Jobs enable you to reuse the same dataset with different target environments and execution configurations.
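The way a job reuses a dataset while overriding its data volume settings can be sketched as a simple merge. The Job class, its field names, and the interpretation of a connector mapping as a (reference connector, target connector) pair are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    dataset_name: str
    # Assumed shape: (reference connector, target connector) pairs.
    connector_mappings: list[tuple[str, str]]
    parameters: dict = field(default_factory=dict)
    row_count_overrides: dict[str, int] = field(default_factory=dict)

    def effective_row_counts(self, dataset_row_counts):
        """Dataset volumes with this job's overrides applied on top."""
        return {**dataset_row_counts, **self.row_count_overrides}


# Volume settings as defined in the dataset.
dataset_row_counts = {"customers": 100, "orders": 500}

# Two jobs reuse the same dataset against different targets.
smoke_job = Job(
    "order_tests",
    [("crm_prod_ro", "crm_dev")],
    parameters={"batch_size": 1000},
    row_count_overrides={"orders": 50},
)
load_job = Job(
    "order_tests",
    [("crm_prod_ro", "crm_perf")],
    row_count_overrides={"customers": 100_000, "orders": 500_000},
)
```

The merge returns a new dictionary, so the volumes defined in the dataset stay untouched and each job sees only its own overrides.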
Data Generation Workflow
Synthetic Data follows a four-step workflow to generate synthetic data.
1. Create an application
You create an application and associate one or more reference and target connectors with it.
2. Discover metadata
Synthetic Data analyzes the schemas available through the reference connectors and collects information about:
- Tables
- Columns
- Relationships
- Constraints
During discovery, Synthetic Data also evaluates the discovered metadata and profiling information to assign default generator instances to columns.
The discovered metadata and generator assignments are stored within the application.
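How default assignment works internally is not described here. Purely as an illustration, a rule-based approach that looks only at column names and data types might resemble the sketch below; the rules, the generator instance names, and the assign_default function are all hypothetical, and the sketch ignores profiling information entirely.

```python
import re

# Hypothetical rules: column-name pattern -> generator instance name.
NAME_RULES = [
    (re.compile(r"(^|_)first_name$"), "first_names"),
    (re.compile(r"(^|_)city$"), "cities"),
    (re.compile(r"(^|_)email$"), "email_pattern"),
]

# Hypothetical fallbacks used when no name rule matches.
TYPE_FALLBACKS = {
    "integer": "random_integer",
    "varchar": "random_text",
    "date": "random_date",
}


def assign_default(column_name, data_type):
    """Pick a default generator instance name for one column."""
    lowered = column_name.lower()
    for pattern, instance in NAME_RULES:
        if pattern.search(lowered):
            return instance
    return TYPE_FALLBACKS.get(data_type.lower(), "random_text")


defaults = {
    col: assign_default(col, dtype)
    for col, dtype in [
        ("FIRST_NAME", "varchar"),
        ("billing_city", "varchar"),
        ("capacity", "varchar"),
        ("quantity", "integer"),
    ]
}
```

The capacity column shows why the patterns anchor on a word boundary: it ends in "city" but falls through to the type-based fallback. Whatever the real rules are, the resulting assignments are defaults that can be reviewed and changed in the dataset.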
3. Configure data generation
You create a dataset from the application and select the tables required for a testing scenario.
Generation rules, generator assignments, relationships, and data volume settings can be reviewed and adjusted as needed.
4. Generate synthetic data
You create and run jobs based on a dataset.
During execution, Synthetic Data:
- Preserves referential integrity
- Generates realistic synthetic values
- Maintains relationships between related tables
- Writes generated data to the selected target database
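One common way to preserve referential integrity during generation is to populate parent tables before child tables and draw each foreign key from keys that already exist. The sketch below shows that technique; it is not a description of how Synthetic Data is implemented. The generate function and its input shapes are hypothetical, every table is assumed to have an integer id column, and self-referencing or cyclic relationships are not handled.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+


def generate(row_counts, foreign_keys, seed=0):
    """Generate rows so that every foreign key points at an existing parent.

    row_counts:   {table: number_of_rows}; parents need at least one row
    foreign_keys: {child_table: {fk_column: parent_table}}
    """
    rng = random.Random(seed)
    # Map each table to the set of parent tables it depends on.
    deps = {t: set(foreign_keys.get(t, {}).values()) for t in row_counts}
    data = {}
    # static_order() yields parents before the tables that reference them.
    for table in TopologicalSorter(deps).static_order():
        rows = []
        for i in range(1, row_counts[table] + 1):
            row = {"id": i}
            for column, parent in foreign_keys.get(table, {}).items():
                # Pick the key of a parent row that was already generated.
                row[column] = rng.choice(data[parent])["id"]
            rows.append(row)
        data[table] = rows
    return data


data = generate(
    row_counts={"orders": 5, "customers": 2},
    foreign_keys={"orders": {"customer_id": "customers"}},
)
```

Even though orders is listed first in the input, customers is generated first, so every customer_id in orders refers to a customer row that exists.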