Delphix AI Services

Introduction

Delphix AI Services is a capability available in Data Control Tower (DCT) with an Enterprise license. It lets users generate synthetic seed lists (also called secure lookup lists), which are used to create the secure lookup algorithms that mask sensitive data in Continuous Compliance Engine workflows. Because the Delphix-provided language model runs within your DCT environment, you can create secure lookup lists there, which reduces exposure risk and helps maintain strict data governance.

This feature is part of Delphix’s broader initiative to integrate AI into its platform. While its initial use case focuses on synthetic lookup generation, AI Services is designed as a foundation for more advanced AI capabilities in future DCT releases.

Supported models

Only Delphix-distributed models are supported with AI Services. These models are prepackaged and pre-quantized for efficient CPU-based processing. No models are shipped with DCT out of the box; download the certified model from the Delphix Download Center. Uploading a model replaces any existing model.

Delphix recommends using the q6_K or q8_0 quantization variants of the Meta Llama 3.2 3B model, which offer the best balance of quality and performance. Custom, modified, or third-party models are not certified for use with Delphix AI Services.

DCT system requirements

To use AI Services effectively, the DCT deployment must meet the following requirements:

  • The DCT deployment must have at least 8 vCPUs; 16 vCPUs are recommended.

    • For Kubernetes DCT deployments, at least 4 vCPUs should be dedicated to the AI Service.

  • The DCT deployment must have at least 32 GB of RAM; 64 GB is strongly recommended for generating larger lists or for environments with heavier workloads.

  • GPU acceleration is not supported. The models are optimized to run on CPU hardware.

  • The model file size is approximately 3.2 GB, and installation temporarily requires up to 12.8 GB of disk space.

Failing to meet these requirements may result in slow performance or job failures.
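The thresholds above can be turned into a quick pre-flight check. The following is a minimal sketch, not a Delphix-provided tool: the names `check_requirements` and `host_figures` are hypothetical, and the host readings rely on `os.sysconf`, which is available on Linux and other Unix systems. Inside a container, `os.cpu_count()` may report the node's CPUs rather than the pod's limit, so treat the result as a rough guide.

```python
import os
import shutil

# Thresholds copied from the requirements listed above.
MIN_VCPUS, RECOMMENDED_VCPUS = 8, 16
MIN_RAM_GB, RECOMMENDED_RAM_GB = 32, 64
INSTALL_DISK_GB = 12.8  # temporary space needed while installing the model


def check_requirements(vcpus, ram_gb, free_disk_gb):
    """Return (errors, warnings) for the given host figures."""
    errors, warnings = [], []
    if vcpus < MIN_VCPUS:
        errors.append(f"{vcpus} vCPUs found, at least {MIN_VCPUS} required")
    elif vcpus < RECOMMENDED_VCPUS:
        warnings.append(f"{vcpus} vCPUs found, {RECOMMENDED_VCPUS} recommended")
    if ram_gb < MIN_RAM_GB:
        errors.append(f"{ram_gb:.1f} GB RAM found, at least {MIN_RAM_GB} required")
    elif ram_gb < RECOMMENDED_RAM_GB:
        warnings.append(f"{ram_gb:.1f} GB RAM found, {RECOMMENDED_RAM_GB} recommended")
    if free_disk_gb < INSTALL_DISK_GB:
        errors.append(
            f"{free_disk_gb:.1f} GB free disk, {INSTALL_DISK_GB} needed for install"
        )
    return errors, warnings


def host_figures(path="/"):
    """Read vCPUs, RAM, and free disk from the local host (Unix only).

    Reported RAM is usually slightly below the nominal size of the machine.
    """
    vcpus = os.cpu_count() or 0
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    free_disk_gb = shutil.disk_usage(path).free / 1e9
    return vcpus, ram_gb, free_disk_gb
```

Calling `check_requirements(*host_figures())` on the DCT host returns two lists of messages. An empty error list means the minimums are met, and any warnings point at the recommended sizing.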

Performance impact

Generating synthetic data is a memory- and CPU-intensive operation. You may notice temporary increases in CPU and RAM usage while large lists are being created. For best results:

  • Schedule generation tasks during off-peak hours to avoid disruption.

  • Monitor resource consumption after enabling the feature to assess impact.

Because AI operations are very CPU-intensive, they can degrade the performance of other CPU-intensive operations on the engine.
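One simple way to assess the impact is to sample the system load before and during a generation task and compare the two. The sketch below is an illustration only, using hypothetical helper names and the Unix-only `os.getloadavg()`; a dedicated monitoring stack would serve the same purpose.

```python
import os
import time


def load_per_vcpu(load_avg, vcpus):
    """Normalize a load average by the vCPU count (about 1.0 means fully busy)."""
    return load_avg / max(vcpus, 1)


def summarize(samples):
    """Return the (mean, peak) of a list of normalized load samples."""
    return sum(samples) / len(samples), max(samples)


def sample_load(count=5, interval_s=60):
    """Collect 1-minute load averages from the local host (Unix only)."""
    vcpus = os.cpu_count() or 1
    samples = []
    for _ in range(count):
        samples.append(load_per_vcpu(os.getloadavg()[0], vcpus))
        time.sleep(interval_s)
    return samples
```

Running `summarize(sample_load())` once while the system is idle and once during list generation gives a rough before-and-after comparison, which can help in choosing an off-peak window.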

Data ownership and security

Delphix AI Services is designed with customer data security in mind. All generated data belongs to you, the customer. Delphix does not store, access, or transmit any generated or user data. The feature operates entirely within your environment.

Additionally, Delphix does not use any live data or metadata from your databases, file systems, or other repositories to generate synthetic lists. Results are generated purely from prompt inputs by the end user.

Handle and store the model files according to your organization's internal security practices.

Model attribution: Built with Llama

Delphix AI Services is powered by the Meta Llama 3 language model. Include the attribution "Built with Meta Llama 3" in compliance and audit documentation where applicable. For additional licensing terms, refer to the Delphix license agreement.

Limitations

While powerful, the feature has several known limitations:

  • Prompt customization is limited. This is not a general-purpose conversational AI engine.

  • Generated lists must be manually downloaded and imported into the Continuous Compliance Engine. There is no automatic integration in this release.

  • Only Delphix-provided models are supported. Modified or community models are not certified for use with Delphix AI Services.

  • Generating very large lists or lists of fully unique values may not be feasible, depending on prompt complexity and data class.
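Because lists are imported manually and full uniqueness is not guaranteed, it can be worth checking a downloaded list before importing it. The sketch below assumes the list is a plain-text file with one value per line; that format, the file name, and the function name `clean_lookup_values` are assumptions for illustration, not part of the product.

```python
def clean_lookup_values(lines):
    """Strip whitespace, drop blank lines, and remove duplicates, keeping order."""
    seen = set()
    unique = []
    for line in lines:
        value = line.strip()
        if value and value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


def clean_file(src_path, dst_path):
    """Write the cleaned values to dst_path and return (kept, dropped) counts."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.readlines()
    values = clean_lookup_values(lines)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(values) + "\n")
    return len(values), len(lines) - len(values)
```

For example, `clean_file("generated_list.txt", "lookup_list.txt")` reports how many values were kept and how many blank or duplicate lines were dropped, which indicates whether the generated list is large and varied enough for the intended algorithm.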

Contact and support

For support, visit the Delphix Support Portal. When submitting a request, include logs, the DCT version, and any error messages to expedite troubleshooting.