Configuring AWS S3 bucket as staging area

The Hyperscale Compliance Orchestrator requires a staging area to read the source file(s) from and write the target file(s) to. In a Kubernetes deployment of Hyperscale Compliance, AWS S3 buckets that are accessible to both the Hyperscale Compliance Orchestrator and the Continuous Compliance engine can serve as the staging area. This feature is supported on the MicroK8s and AWS EKS distributions.

Refer to the Staging area support matrix for certified connector and staging area configurations.

Prerequisites:

AWS provides the Mountpoint for Amazon S3 Container Storage Interface (CSI) driver, which lets a Kubernetes application access S3 objects through a file system interface. To configure the mount point through this CSI driver, a Kubernetes cluster administrator or AWS account administrator must perform the following tasks:

  1. Create an S3 bucket

  2. Create an IAM policy

  3. Create an IAM role

  4. Install the Mountpoint for Amazon S3 CSI driver (version v1.11.0 or later)
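
The bucket and IAM policy tasks can be sketched as follows. This is a minimal sketch under stated assumptions, not a Delphix-provided procedure: the bucket name, region, and file name are hypothetical placeholders, and the permission set is the read/write/delete set described in the Mountpoint for Amazon S3 documentation, which is assumed to be adequate for a staging area. The script only writes the policy document; the AWS CLI calls are shown as comments.

```shell
# Hypothetical placeholders; replace with your own values.
BUCKET_NAME="${BUCKET_NAME:-my-hyperscale-staging}"
BUCKET_REGION="${BUCKET_REGION:-us-east-1}"
POLICY_FILE="${POLICY_FILE:-s3-staging-policy.json}"

# Write an IAM policy document that allows listing the bucket and
# reading, writing, and deleting the objects inside it.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject"
      ],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}/*"]
    }
  ]
}
EOF

# With the AWS CLI configured, the bucket and policy could then be created:
#   aws s3api create-bucket --bucket "$BUCKET_NAME" --region "$BUCKET_REGION"
#   (outside us-east-1, also pass
#    --create-bucket-configuration LocationConstraint="$BUCKET_REGION")
#   aws iam create-policy --policy-name <policy-name> \
#     --policy-document "file://$POLICY_FILE"
```

The resulting policy is then attached to the IAM role created in the third task.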

     

Troubleshooting tip

  • If installing the S3 CSI driver on microk8s, you may encounter the following error:

Could not check if "/var/snap/microk8s/common/var/lib/kubelet/pods/4a5d6dfb-646a-462d-a223-d510c059f8b7/volumes/kubernetes.io~csi/s3-pv/mount" is a mount point, Failed to read /host/proc/mounts: open /host/proc/mounts: invalid argument

This error occurs because MicroK8s uses /var/snap/microk8s/common/var/lib/kubelet as the kubelet path, which differs from the path used by other standard distributions. Pass this path to the driver when installing it with Helm.

Install the Helm chart with the following command:

helm upgrade --install aws-mountpoint-s3-csi-driver --namespace kube-system aws-mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver --set node.kubeletPath=/var/snap/microk8s/common/var/lib/kubelet
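
After the installation, it can help to confirm that the driver pods are running. The helper below is a small sketch; the function name is hypothetical, and the label selector is the one used in the driver's own installation guide, which is assumed to match the chart version you installed.

```shell
# List the Mountpoint for Amazon S3 CSI driver pods in kube-system.
# The label selector is an assumption taken from the driver's install guide.
check_s3_csi_driver() {
  kubectl get pods -n kube-system \
    -l app.kubernetes.io/name=aws-mountpoint-s3-csi-driver
}

# Usage: check_s3_csi_driver
```

Each node that will host Hyperscale pods should show a driver pod in the Running state.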

 

  • If you encounter the following error when deploying Hyperscale with an S3 bucket staging area, verify that the EC2 instance metadata hop limit is set to 2 or higher.

2026-03-13T13:44:04.401924Z  INFO ThreadId(01) mountpoint_s3::run: mount-s3 1.21.0
2026-03-13T13:44:06.456114Z  WARN ThreadId(01) mountpoint_s3::cli: failed to detect network throughput. Using 10 gbps as throughput. Use --maximum-throughput-gbps CLI flag to configure a target throughput appropriate for the instance. Detection failed due to: failed to get instance type: IMDS query failed: Unknown CRT error
2026-03-13T13:44:06.456140Z  INFO ThreadId(01) mountpoint_s3::cli: target network throughput 10 Gbps
2026-03-13T13:44:08.822965Z  WARN ThreadId(05) list_objects{id=0 bucket="snowflakepoc-test-use1" continued=false delimiter="" max_keys="0" prefix=""}: mountpoint_s3_client::s3_crt_client: duration=2.349268751s request_id=<unknown> error=ClientError(NoSigningCredentials) meta request failed
Error: Failed to create S3 client
 
Caused by:
    0: initial ListObjectsV2 failed for bucket snowflakepoc-test-use1 in region us-east-1
    1: Client error
    2: No signing credentials available, see CRT debug logs
F0313 13:44:08.824833       1 main.go:55] Failed to run Mountpoint: exit status 1
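
The "IMDS query failed" and "No signing credentials available" messages above are consistent with a pod that cannot reach the EC2 Instance Metadata Service, which typically happens when the hop limit is 1, because a containerized pod sits one network hop further away than the host. The following sketch shows how the hop limit could be checked and raised with the AWS CLI; the function names and the example instance ID are hypothetical, and the commands assume the CLI is configured with permission to modify the instance.

```shell
# Raise the IMDS hop limit to 2 on one EC2 instance (a node hosting the pods).
# $1 is the EC2 instance ID, for example a hypothetical i-0123456789abcdef0.
set_imds_hop_limit() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-put-response-hop-limit 2 \
    --http-endpoint enabled
}

# Print the current hop limit so the change can be confirmed.
show_imds_hop_limit() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[].Instances[].MetadataOptions.HttpPutResponseHopLimit' \
    --output text
}

# Usage:
#   set_imds_hop_limit i-0123456789abcdef0
#   show_imds_hop_limit i-0123456789abcdef0
```

For node groups managed through a launch template, the same setting would normally be made in the template so that replacement nodes inherit it.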


Setup

Once the prerequisites are met, create a persistent volume (PV) in either of the following ways:

  • The cluster administrator creates a PV using the template below and shares the PV name, which is then set as the stagePvName parameter in values.yaml.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: <pv-name>
    spec:
      capacity:
        storage: 5Gi # ignored, required
      accessModes:
        - ReadWriteMany
      mountOptions:
        - uid=65436
        - gid=50
        - allow-other
        - allow-delete
        - region <bucket-region>
      csi:
        driver: s3.csi.aws.com  #required
        volumeHandle: s3-csi-driver-volume
        volumeAttributes:
          bucketName: <bucket-name>


  • The Hyperscale Compliance Orchestrator creates the PV automatically during deployment, using the AWS S3 properties defined in values.yaml.


Specify the following properties in values.yaml, even if the PV is created by the cluster administrator:

  • stagingStorageType

  • authMechanism

  • awsBucketName

  • awsBucketRegion

  • awsBucketPrefix

  • awsBucketDelimiter

  • awsAccessKey

  • awsSecretKey

  • stagingAwsS3SecretName

For more information on these parameters, see Commonly Used Properties.

 

  • This feature is compatible only with Continuous Compliance engines v20.0.0 and above.

  • The stagingStorageType must be set to AWS_S3.

  • The following two authentication mechanisms to connect to S3 bucket from the Continuous Compliance engine are supported:

    • Instance profile based authentication (AWS_ROLE)

    • Access and secret key based authentication (AWS_SECRET)

  • If authMechanism is set to AWS_ROLE, an IAM instance profile that grants access to the S3 bucket must be attached to the EC2 instances of the Hyperscale Compliance engine (or to the EKS cluster, in the case of an EKS deployment) and of the Continuous Compliance engines. For further details, refer to this AWS Knowledge Center article.

  • If authMechanism is set to AWS_SECRET, provide either of the following:

    • stagingAwsS3SecretName: Name of the Kubernetes secret that holds the awsAccessKey and awsSecretKey values. The cluster administrator can create it from the command line as follows:

      kubectl create secret generic <secret-name> --from-literal=accessKeyId=<bucketAccessKeyId> --from-literal=secretAccessKey=<bucketSecretAccessKey>
    • awsAccessKey and awsSecretKey: The Hyperscale Compliance Orchestrator uses these values to create a generic secret automatically.
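
Taken together, the properties above might look like the fragment below for secret-based authentication. This is a sketch only: the override file name is hypothetical, the angle-bracket values are placeholders, the "/" delimiter is an illustrative guess, and the properties are assumed to be top-level keys, so check the values.yaml shipped with your release for the actual structure.

```shell
# Hypothetical override file; the keys are the property names listed above,
# assumed here to sit at the top level of values.yaml.
OVERRIDE_FILE="${OVERRIDE_FILE:-s3-staging-values.yaml}"

cat > "$OVERRIDE_FILE" <<'EOF'
stagingStorageType: AWS_S3
authMechanism: AWS_SECRET
awsBucketName: <bucket-name>
awsBucketRegion: <bucket-region>
awsBucketPrefix: <bucket-prefix>
awsBucketDelimiter: "/"
# Name of a pre-created Kubernetes secret. Alternatively, set awsAccessKey
# and awsSecretKey and leave this property out.
stagingAwsS3SecretName: <secret-name>
EOF

# Sanity check: authMechanism must be one of the two supported values.
if grep -Eq '^authMechanism: (AWS_ROLE|AWS_SECRET)$' "$OVERRIDE_FILE"; then
  echo "authMechanism OK"
else
  echo "authMechanism must be AWS_ROLE or AWS_SECRET" >&2
fi
```

With AWS_ROLE, the secret and key properties would be left out, since credentials come from the instance profile.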

 

Using AWS S3 Bucket as Staging Area (Optional for File Connector)

To use an AWS S3 bucket as the staging area, the File Connector must be configured with the PySpark writer. AWS S3-based staging is not supported with any other writer.

To enable AWS S3 as the staging area, set the following parameter in the values-file-connector.yaml file:

useS3URIStagingArea: true

When this option is enabled, the File Connector uses the S3 URI–based staging mechanism instead of CSI-based staging for file operations.

Switching between CSI-based staging and S3 URI–based staging is controlled through the useS3URIStagingArea configuration parameter.