Synthetic Data
Synthetic Data enables organizations to generate realistic, statistically representative datasets for non-production environments without exposing sensitive or regulated production data. Unlike masking, which transforms existing production data, Synthetic Data creates entirely new datasets based on schema structure and generation rules.
Synthetic Data helps development and testing teams create structurally valid, referentially consistent datasets for scenarios that production data cannot support, such as testing new schemas, edge cases, or early-stage applications. By automating data generation workflows, Delphix Synthetic Data accelerates development and testing cycles while supporting data privacy and compliance requirements.
How Synthetic Data works
Data Control Tower (DCT) acts as the control plane for all Synthetic Data configuration and execution. Applications, datasets, generators, and jobs are created and managed entirely within DCT, with no separate infrastructure required for setup, execution, or monitoring.
Synthetic Data connects to reference source databases using read-only access and queries only schema metadata. No production data values are read or extracted. Data generation is driven entirely by database structure and generation rules rather than existing records.
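The metadata-only approach described above can be illustrated with a minimal sketch. It is not the product's implementation: it assumes a SQLite database as a stand-in for the reference source, and the helper name read_schema_metadata is hypothetical. The point is that table and column definitions come from the database catalog, and no row values are selected.

```python
import sqlite3


def read_schema_metadata(conn):
    """Return {table: [(column, declared_type, is_primary_key), ...]}.

    Hypothetical helper: only catalog objects are queried, never table rows.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(c[1], c[2], bool(c[5])) for c in columns]
    return schema


# Stand-in reference database holding one sensitive row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO customer (email) VALUES ('real.person@example.com')")

metadata = read_schema_metadata(conn)
print(metadata)
```

The result describes the structure of the customer table, while the stored email address never appears in it.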
Workflow
To generate synthetic data for the first time, complete the following high-level steps.
Configure settings
Configure global Synthetic Data settings that apply across applications, datasets, generators, and jobs. For information about available settings, see the Configure Synthetic Data settings topic.
Connect your data
Set up the applications and connectors required for generation.
- Create an application to represent the source system.
- Configure reference connectors to discover schema metadata.
- Configure target connectors to define where synthetic data is written.
- Run discovery to import schema metadata from the reference environment.
Define what to generate
Create a dataset to specify the scope of synthetic data generation.
- Select the tables and columns to include.
- Use AI-assisted discovery to identify relevant tables based on a testing scenario.
- Review and adjust generation settings as needed.
Configure generators
Generators define how individual column values are produced.
- Built-in generators are applied automatically through classification rules.
- Create custom generators if the built-in options do not meet your requirements.
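To make the relationship between classification rules, built-in generators, and custom generators more concrete, here is a conceptual sketch. It does not reflect the product's rule engine or API: the rule patterns, the generator functions, and the names pick_generator and generate_rows are all hypothetical. It assumes that a rule matches on the column name and that a custom generator, when supplied, takes precedence over the built-in choice.

```python
import random
import re


# Hypothetical built-in generators: each takes a random.Random and returns one value.
def gen_email(rng):
    return f"user{rng.randrange(10**6)}@example.com"


def gen_phone(rng):
    return f"555-{rng.randrange(10**4):04d}"


def gen_text(rng):
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))


# Hypothetical classification rules: the first pattern matching the column name wins.
CLASSIFICATION_RULES = [
    (re.compile(r"e?mail", re.I), gen_email),
    (re.compile(r"phone|mobile", re.I), gen_phone),
]


def pick_generator(column_name, custom=None):
    """Choose a generator: custom override first, then rules, then a text fallback."""
    custom = custom or {}
    if column_name in custom:
        return custom[column_name]
    for pattern, generator in CLASSIFICATION_RULES:
        if pattern.search(column_name):
            return generator
    return gen_text


def generate_rows(columns, count, custom=None, seed=0):
    """Produce `count` rows, one generated value per column."""
    rng = random.Random(seed)
    generators = {name: pick_generator(name, custom) for name in columns}
    return [{name: gen(rng) for name, gen in generators.items()} for _ in range(count)]


# A custom generator for a column that no built-in rule covers.
def gen_sku(rng):
    return f"SKU-{rng.randrange(10**5):05d}"


rows = generate_rows(["email", "phone_number", "sku"], 3, custom={"sku": gen_sku})
for row in rows:
    print(row)
```

In this sketch the email and phone_number columns are handled by rules, and the sku column uses the custom generator because no rule matches it.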
Generate synthetic data
Create and run a job to produce synthetic output.
- Create a job to map a dataset to a target connector and define data volumes.
- Run the job to write synthetic data to the target environment.
Limitations
The following limitations apply to the current Early Access release of Synthetic Data.
- Column-level and table-level CHECK and UNIQUE constraints are not enforced during generation. Generated values may fall outside the conditions defined by a check constraint, so tables relying on them may reject the inserted data.
- Primary keys and foreign keys made up of more than one column (composite keys) are not automatically detected or handled.
- Foreign keys where a table references its own primary key (e.g. employee.manager_id → employee.id) are not supported.
- Circular foreign key relationships between multiple tables (e.g. Table A references Table B, which references Table C, which in turn references Table A) are not supported. When a cycle is detected in the relationship graph, generation fails. To resolve this, remove one of the relationships in the cycle from the dataset configuration.
- References and Functional Generator rules that point to a column in a different schema are not supported. Both the source and target column of a Reference or functional/derived rule must reside in the same schema.
- Incremental generation is supported for all table types. However, PK continuity (guaranteed no-collision appending) is only available for single-column numeric PKs. For non-numeric or composite PKs, uniqueness is not guaranteed.
- Generators do not always honor a column’s defined data type, precision and scale. Generated values may exceed the column’s precision/scale (or not match a defined length), so values can be rejected on insert.
- Existing connectors cannot be re-synced in place. To pick up schema changes, create a new connector rather than resyncing the existing one.
- Only one generation job runs at a time. Additional jobs submitted while one is running are queued and execute sequentially.
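Because self-referencing and circular foreign keys cause generation to fail, it can help to check a dataset's relationships before creating a job. The following is a minimal sketch of such a pre-flight check, using a depth-first search over a table-to-referenced-tables mapping. It is an illustration only: how the product detects cycles internally is not documented here, and the function name find_fk_cycle and the input format are assumptions.

```python
def find_fk_cycle(foreign_keys):
    """Return a list of tables forming a foreign key cycle, or None.

    foreign_keys maps each table to the tables it references (hypothetical
    input format). A self-reference is reported as a two-element cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    tables = set(foreign_keys)
    for parents in foreign_keys.values():
        tables.update(parents)
    state = {table: WHITE for table in tables}
    path = []

    def visit(table):
        state[table] = GRAY
        path.append(table)
        for parent in sorted(foreign_keys.get(table, ())):
            if state[parent] == GRAY:
                # The referenced table is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state[parent] == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[table] = BLACK
        return None

    for table in sorted(tables):
        if state[table] == WHITE:
            cycle = visit(table)
            if cycle:
                return cycle
    return None


circular = find_fk_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})
self_ref = find_fk_cycle({"employee": ["employee"]})
acyclic = find_fk_cycle({"order": ["customer"], "order_item": ["order", "product"]})
print(circular, self_ref, acyclic)
```

The returned list names the tables in the cycle, which indicates which relationships are candidates for removal from the dataset configuration.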