Configuring AWS S3 bucket as staging area
The Hyperscale Compliance Orchestrator requires a staging area to read from the source files and write to the target files. In a Kubernetes deployment of Hyperscale Compliance, an AWS S3 bucket that is accessible to both the Hyperscale Compliance Orchestrator and the Continuous Compliance engine can serve as the staging area. This feature is supported on the MicroK8s and AWS EKS distributions.
Refer to the Staging area support matrix for certified connector and staging area configurations.
Prerequisites:
AWS provides the Mountpoint for Amazon S3 Container Storage Interface (CSI) driver, which lets a Kubernetes application access S3 objects through a file system interface. To configure the mount point through this CSI driver, a Kubernetes cluster administrator or AWS account administrator must perform the following tasks:
- Create the S3 bucket.
- Install the Mountpoint for Amazon S3 CSI driver (version v1.11.0 or later).
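As a rough sketch of the driver installation task, the Helm commands below add the chart repository, install the driver into kube-system, and list the driver pods. The repository URL and the pod label selector come from the driver project's public installation instructions and are assumptions as far as this guide is concerned; the function name install_s3_csi_driver is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: installs the Mountpoint for Amazon S3 CSI driver with Helm.
# Assumes helm and kubectl are already configured for the target cluster.
# The repository URL and label selector are assumptions, not taken from this guide.
install_s3_csi_driver() {
  helm repo add aws-mountpoint-s3-csi-driver \
    https://awslabs.github.io/mountpoint-s3-csi-driver &&
  helm repo update &&
  helm upgrade --install aws-mountpoint-s3-csi-driver \
    --namespace kube-system \
    aws-mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver &&
  kubectl get pods -n kube-system \
    -l app.kubernetes.io/name=aws-mountpoint-s3-csi-driver
}
```

Calling install_s3_csi_driver runs the four steps in order and stops at the first failure. On MicroK8s, the install command also needs the kubelet path override described in the troubleshooting tip.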
Troubleshooting tip
- If you install the S3 CSI driver on MicroK8s, you may encounter the following error:
Could not check if "/var/snap/microk8s/common/var/lib/kubelet/pods/4a5d6dfb-646a-462d-a223-d510c059f8b7/volumes/kubernetes.io~csi/s3-pv/mount" is a mount point, Failed to read /host/proc/mounts: open /host/proc/mounts: invalid argument
This happens because MicroK8s uses /var/snap/microk8s/common/var/lib/kubelet as the kubelet path, which differs from other standard distributions. Pass this path to the driver when installing it with Helm:
helm upgrade --install aws-mountpoint-s3-csi-driver --namespace kube-system aws-mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver --set node.kubeletPath=/var/snap/microk8s/common/var/lib/kubelet
- If you encounter the following error when deploying Hyperscale with an S3 bucket staging area, verify that the EC2 instance metadata hop limit is set to 2 or higher.
2026-03-13T13:44:04.401924Z INFO ThreadId(01) mountpoint_s3::run: mount-s3 1.21.0
2026-03-13T13:44:06.456114Z WARN ThreadId(01) mountpoint_s3::cli: failed to detect network throughput. Using 10 gbps as throughput. Use --maximum-throughput-gbps CLI flag to configure a target throughput appropriate for the instance. Detection failed due to: failed to get instance type: IMDS query failed: Unknown CRT error
2026-03-13T13:44:06.456140Z INFO ThreadId(01) mountpoint_s3::cli: target network throughput 10 Gbps
2026-03-13T13:44:08.822965Z WARN ThreadId(05) list_objects{id=0 bucket="snowflakepoc-test-use1" continued=false delimiter="" max_keys="0" prefix=""}: mountpoint_s3_client::s3_crt_client: duration=2.349268751s request_id=<unknown> error=ClientError(NoSigningCredentials) meta request failed
Error: Failed to create S3 client
Caused by:
0: initial ListObjectsV2 failed for bucket snowflakepoc-test-use1 in region us-east-1
1: Client error
2: No signing credentials available, see CRT debug logs
F0313 13:44:08.824833 1 main.go:55] Failed to run Mountpoint: exit status 1
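The hop limit matters because a pod reaches the instance metadata service through one extra network hop, so a limit of 1 prevents the driver from obtaining instance credentials. The following is a minimal sketch of checking and raising the limit with the AWS CLI. The function names are hypothetical, and the sketch assumes the AWS CLI is installed with permission to describe and modify the instance.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the EC2 instance metadata hop limit.

# Succeeds when the given hop limit value is 2 or higher.
hop_limit_ok() {
  [ "$1" -ge 2 ] 2>/dev/null
}

# Prints the current hop limit of the instance whose ID is passed as $1.
get_hop_limit() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].MetadataOptions.HttpPutResponseHopLimit' \
    --output text
}

# Sets the hop limit of instance $1 to 2.
set_hop_limit() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-put-response-hop-limit 2 \
    --http-endpoint enabled
}

# Raises the hop limit of instance $1 only when it is currently below 2.
ensure_hop_limit() {
  local current
  current=$(get_hop_limit "$1") || return 1
  if hop_limit_ok "$current"; then
    echo "hop limit is $current; no change needed"
  else
    set_hop_limit "$1"
  fi
}
```

Run ensure_hop_limit with the ID of each EC2 instance that hosts the driver pods.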
Setup
Once the prerequisites are met, create a persistent volume (PV) in either of the following ways:
- The cluster administrator creates a PV using the template below and shares the PV name, which is then set as the stagePvName parameter in values.yaml.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <pv-name>
spec:
  capacity:
    storage: 5Gi # ignored, required
  accessModes:
    - ReadWriteMany
  mountOptions:
    - uid=65436
    - gid=50
    - allow-other
    - allow-delete
    - region <bucket-region>
  csi:
    driver: s3.csi.aws.com # required
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: <bucket-name>
- The Hyperscale Orchestrator creates the PV automatically during deployment, using the AWS S3 properties defined in values.yaml.
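For the first option, where the cluster administrator creates the PV, the manifest could be applied and checked with kubectl as sketched here. The function name create_staging_pv and the example file name s3-pv.yaml are hypothetical. The PV name passed in is the one later set as stagePvName in values.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies a PV manifest and then shows the resulting PV.
#   $1 = path to the manifest built from the PV template (for example s3-pv.yaml)
#   $2 = PV name used in that manifest (later set as stagePvName)
create_staging_pv() {
  kubectl apply -f "$1" &&
  kubectl get pv "$2"
}
```

Pass the manifest path as the first argument and the PV name as the second. The second command only runs if the manifest was applied successfully.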
The following properties must be specified in values.yaml, even if the PV is created by the cluster administrator:
- stagingStorageType
- authMechanism
- awsBucketName
- awsBucketRegion
- awsBucketPrefix
- awsBucketDelimiter
- awsAccessKey
- awsSecretKey
- stagingAwsS3SecretName
For more information on these parameters, see Commonly Used Properties.
- This feature is compatible only with Continuous Compliance engines v20.0.0 and above.
- The stagingStorageType must be set to AWS_S3.
- Two authentication mechanisms are supported for connecting to the S3 bucket from the Continuous Compliance engine:
  - Instance profile based authentication (AWS_ROLE)
  - Access and secret key based authentication (AWS_SECRET)
- If authMechanism is set to AWS_ROLE, an IAM instance profile that grants access to the S3 bucket must be attached to the EC2 instances of the Hyperscale Compliance engine (or to the EKS cluster, in the case of an EKS deployment) and of the Continuous Compliance engines. For further details, refer to this AWS Knowledge Center article.
- If authMechanism is set to AWS_SECRET, provide one of the following:
  - stagingAwsS3SecretName: the name of the Kubernetes secret that holds the access key and secret key. The cluster administrator can create it from the command line as follows:
    kubectl create secret generic <secret-name> --from-literal=accessKeyId=<bucketAccessKeyId> --from-literal=secretAccessKey=<bucketSecretAccessKey>
  - awsAccessKey and awsSecretKey: the Hyperscale Orchestrator uses these values to create a generic secret automatically.
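When the secret is created manually, it can help to confirm that it holds both expected keys before deploying. The sketch below assumes the key names accessKeyId and secretAccessKey from the kubectl create secret command above. The function name and the namespace argument are hypothetical, since this section does not state which namespace the secret must live in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the secret has non-empty values for
# both keys used by the `kubectl create secret` command shown above.
#   $1 = secret name
#   $2 = namespace (assumption: the namespace of the Hyperscale deployment)
staging_secret_ok() {
  local key value
  for key in accessKeyId secretAccessKey; do
    # Reads the base64-encoded value; it is checked for presence, never printed.
    value=$(kubectl get secret "$1" --namespace "$2" \
      -o jsonpath="{.data.$key}") || return 1
    if [ -z "$value" ]; then
      echo "secret $1 is missing key $key" >&2
      return 1
    fi
  done
  echo "secret $1 contains accessKeyId and secretAccessKey"
}
```

The function returns a non-zero status when the secret does not exist or when either key is absent, so it can gate a deployment script.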
Using AWS S3 Bucket as Staging Area (Optional for File Connector)
To use an AWS S3 bucket as the staging area, the File Connector must be configured with the PySpark writer. No other writer supports AWS S3-based staging.
To enable AWS S3 as the staging area, set the following parameter in the values-file-connector.yaml file:
useS3URIStagingArea: true
When this option is enabled, the File Connector uses the S3 URI–based staging mechanism instead of CSI-based staging for file operations.
Switching between CSI-based staging and S3 URI–based staging is controlled through the useS3URIStagingArea configuration parameter.