clusters
Creates, updates, deletes, or gets a cluster resource, or lists clusters in a region.
Overview
| Name | clusters |
| Type | Resource |
| Description | Resource Type definition for AWS::SageMaker::Cluster |
| Id | awscc.sagemaker.clusters |
Fields
| Name | Datatype | Description |
|---|---|---|
| cluster_arn | string | The Amazon Resource Name (ARN) of the HyperPod Cluster. |
| vpc_config | object | Specifies an Amazon Virtual Private Cloud (VPC) that your SageMaker jobs, hosted models, and compute resources have access to. You can control access to and from your resources by configuring a VPC. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/infrastructure-give-access.html |
| node_recovery | string | If node auto-recovery is set to true, faulty nodes will be replaced or rebooted when a failure is detected. If set to false, nodes will be labelled when a fault is detected. |
| instance_groups | array | The instance groups of the SageMaker HyperPod cluster. |
| restricted_instance_groups | array | The restricted instance groups of the SageMaker HyperPod cluster. |
| orchestrator | object | Specifies parameters specific to the orchestrator, for example the EKS cluster to use. |
| cluster_role | string | The cluster role for the autoscaler to assume. |
| node_provisioning_mode | string | Determines the scaling strategy for the SageMaker HyperPod cluster. When set to 'Continuous', enables continuous scaling, which dynamically manages node provisioning. If the parameter is omitted, the cluster uses the standard scaling approach from the previous release. |
| creation_time | string | The time at which the HyperPod cluster was created. |
| cluster_name | string | The name of the HyperPod Cluster. |
| failure_message | string | The failure message of the HyperPod Cluster. |
| auto_scaling | object | Configuration for cluster auto-scaling. |
| cluster_status | string | The status of the HyperPod Cluster. |
| tags | array | Custom tags for managing the SageMaker HyperPod cluster as an AWS resource. You can add tags to your cluster in the same way you add them in other AWS services that support tagging. |
| region | string | AWS region. |
For more information, see AWS::SageMaker::Cluster.
Methods
| Name | Accessible by | Required Params |
|---|---|---|
| create_resource | INSERT | region |
| delete_resource | DELETE | data__Identifier, region |
| update_resource | UPDATE | data__Identifier, data__PatchDocument, region |
| list_resources | SELECT | region |
| get_resource | SELECT | data__Identifier, region |
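Both SELECT methods target the same resource. Going by the required parameters in the table above, a query that supplies only region should resolve to list_resources, while one that also supplies data__Identifier resolves to get_resource. A minimal sketch of the list form follows; it assumes the listing returns at least the cluster_arn identifier, and other columns may not be populated by a list call.

```sql
-- List clusters in one region.
-- No data__Identifier is given, so list_resources is expected to apply.
SELECT
  region,
  cluster_arn
FROM awscc.sagemaker.clusters
WHERE region = 'us-east-1';
```

The returned cluster_arn values can then be used as the data__Identifier in a get, update, or delete query.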
SELECT examples
Gets all properties from an individual cluster.
SELECT
region,
cluster_arn,
vpc_config,
node_recovery,
instance_groups,
restricted_instance_groups,
orchestrator,
cluster_role,
node_provisioning_mode,
creation_time,
cluster_name,
failure_message,
auto_scaling,
cluster_status,
tags
FROM awscc.sagemaker.clusters
WHERE region = 'us-east-1' AND data__Identifier = '<ClusterArn>';
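Fields typed as object or array, such as orchestrator and vpc_config, come back as JSON. The sketch below assumes that the json_extract function is available in your StackQL backend and that the JSON keys match the property names used in the manifest in the INSERT example (Eks.ClusterArn, Subnets). Treat the paths as illustrative rather than confirmed.

```sql
-- Pull nested values out of JSON-typed columns for a single cluster.
SELECT
  cluster_name,
  cluster_status,
  json_extract(orchestrator, '$.Eks.ClusterArn') AS eks_cluster_arn,
  json_extract(vpc_config, '$.Subnets') AS subnets
FROM awscc.sagemaker.clusters
WHERE region = 'us-east-1' AND data__Identifier = '<ClusterArn>';
```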
INSERT example
Use the following StackQL query and manifest file to create a new cluster resource, using stack-deploy.
- Required Properties
- All Properties
- Manifest
/*+ create */
INSERT INTO awscc.sagemaker.clusters (
region
)
SELECT
'{{ region }}';
/*+ create */
INSERT INTO awscc.sagemaker.clusters (
VpcConfig,
NodeRecovery,
InstanceGroups,
RestrictedInstanceGroups,
Orchestrator,
ClusterRole,
NodeProvisioningMode,
ClusterName,
AutoScaling,
Tags,
region
)
SELECT
'{{ VpcConfig }}',
'{{ NodeRecovery }}',
'{{ InstanceGroups }}',
'{{ RestrictedInstanceGroups }}',
'{{ Orchestrator }}',
'{{ ClusterRole }}',
'{{ NodeProvisioningMode }}',
'{{ ClusterName }}',
'{{ AutoScaling }}',
'{{ Tags }}',
'{{ region }}';
version: 1
name: stack name
description: stack description
providers:
  - aws
globals:
  - name: region
    value: '{{ vars.AWS_REGION }}'
resources:
  - name: cluster
    props:
      - name: VpcConfig
        value:
          SecurityGroupIds:
            - '{{ SecurityGroupIds[0] }}'
          Subnets:
            - '{{ Subnets[0] }}'
      - name: NodeRecovery
        value: '{{ NodeRecovery }}'
      - name: InstanceGroups
        value:
          - InstanceGroupName: '{{ InstanceGroupName }}'
            InstanceStorageConfigs:
              - {}
            LifeCycleConfig:
              SourceS3Uri: '{{ SourceS3Uri }}'
              OnCreate: '{{ OnCreate }}'
            TrainingPlanArn: '{{ TrainingPlanArn }}'
            ThreadsPerCore: '{{ ThreadsPerCore }}'
            OverrideVpcConfig: null
            InstanceCount: '{{ InstanceCount }}'
            OnStartDeepHealthChecks:
              - '{{ OnStartDeepHealthChecks[0] }}'
            ImageId: '{{ ImageId }}'
            CurrentCount: '{{ CurrentCount }}'
            ScheduledUpdateConfig:
              ScheduleExpression: '{{ ScheduleExpression }}'
              DeploymentConfig:
                AutoRollbackConfiguration:
                  Alarms:
                    - AlarmName: '{{ AlarmName }}'
                BlueGreenUpdatePolicy:
                  MaximumExecutionTimeoutInSeconds: '{{ MaximumExecutionTimeoutInSeconds }}'
                  TerminationWaitInSeconds: '{{ TerminationWaitInSeconds }}'
                  TrafficRoutingConfiguration:
                    CanarySize:
                      Type: '{{ Type }}'
                      Value: '{{ Value }}'
                    LinearStepSize: null
                    Type: '{{ Type }}'
                    WaitIntervalInSeconds: '{{ WaitIntervalInSeconds }}'
                RollingUpdatePolicy:
                  MaximumBatchSize: null
                  MaximumExecutionTimeoutInSeconds: '{{ MaximumExecutionTimeoutInSeconds }}'
                  RollbackMaximumBatchSize: null
                  WaitIntervalInSeconds: '{{ WaitIntervalInSeconds }}'
            InstanceType: '{{ InstanceType }}'
            ExecutionRole: '{{ ExecutionRole }}'
      - name: RestrictedInstanceGroups
        value:
          - OverrideVpcConfig: null
            InstanceCount: '{{ InstanceCount }}'
            OnStartDeepHealthChecks: null
            EnvironmentConfig:
              FSxLustreConfig:
                SizeInGiB: '{{ SizeInGiB }}'
                PerUnitStorageThroughput: '{{ PerUnitStorageThroughput }}'
            InstanceGroupName: null
            InstanceStorageConfigs: null
            CurrentCount: '{{ CurrentCount }}'
            TrainingPlanArn: '{{ TrainingPlanArn }}'
            InstanceType: null
            ThreadsPerCore: '{{ ThreadsPerCore }}'
            ExecutionRole: null
      - name: Orchestrator
        value:
          Eks:
            ClusterArn: '{{ ClusterArn }}'
      - name: ClusterRole
        value: '{{ ClusterRole }}'
      - name: NodeProvisioningMode
        value: '{{ NodeProvisioningMode }}'
      - name: ClusterName
        value: '{{ ClusterName }}'
      - name: AutoScaling
        value:
          Mode: '{{ Mode }}'
          AutoScalerType: '{{ AutoScalerType }}'
      - name: Tags
        value:
          - Value: '{{ Value }}'
            Key: '{{ Key }}'
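UPDATE example
The methods table lists update_resource, which takes data__Identifier, data__PatchDocument, and region. The following is a hedged sketch rather than a confirmed recipe. It assumes that data__PatchDocument accepts a JSON Patch (RFC 6902) array wrapped in string(), that NodeRecovery can be changed in place, and that Automatic is a valid value for it. None of these points are stated on this page, so verify them against the provider before use.

```sql
/*+ update */
-- Assumed patch: set NodeRecovery on an existing cluster.
-- The "add" operation sets the member whether or not it already exists.
UPDATE awscc.sagemaker.clusters
SET data__PatchDocument = string('[{"op": "add", "path": "/NodeRecovery", "value": "Automatic"}]')
WHERE region = 'us-east-1'
AND data__Identifier = '<ClusterArn>';
```

Each entry in the patch array targets one top-level property by its CloudFormation name (for example /Tags or /NodeRecovery), so several changes can be sent in one statement by adding more entries.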
DELETE example
/*+ delete */
DELETE FROM awscc.sagemaker.clusters
WHERE data__Identifier = '<ClusterArn>'
AND region = 'us-east-1';
Permissions
To operate on the clusters resource, the following permissions are required:
Read
sagemaker:DescribeCluster,
sagemaker:ListTags
Create
sagemaker:CreateCluster,
sagemaker:DescribeCluster,
sagemaker:UpdateClusterSoftware,
sagemaker:AddTags,
sagemaker:ListTags,
sagemaker:BatchAddClusterNodes,
sagemaker:BatchDeleteClusterNodes,
eks:DescribeAccessEntry,
eks:DescribeCluster,
eks:CreateAccessEntry,
eks:DeleteAccessEntry,
eks:AssociateAccessPolicy,
iam:CreateServiceLinkedRole,
iam:PassRole,
kms:DescribeKey,
kms:CreateGrant,
ec2:DescribeImages,
ec2:DescribeSnapshots,
ec2:ModifyImageAttribute,
ec2:ModifySnapshotAttribute
Update
sagemaker:UpdateCluster,
sagemaker:UpdateClusterSoftware,
sagemaker:DescribeCluster,
sagemaker:ListTags,
sagemaker:AddTags,
sagemaker:DeleteTags,
sagemaker:BatchAddClusterNodes,
sagemaker:BatchDeleteClusterNodes,
eks:DescribeAccessEntry,
eks:DescribeCluster,
eks:CreateAccessEntry,
eks:DeleteAccessEntry,
eks:AssociateAccessPolicy,
iam:PassRole,
kms:DescribeKey,
kms:CreateGrant,
ec2:DescribeImages,
ec2:DescribeSnapshots,
ec2:ModifyImageAttribute,
ec2:ModifySnapshotAttribute
List
sagemaker:ListClusters
Delete
sagemaker:DeleteCluster,
sagemaker:DescribeCluster,
eks:DescribeAccessEntry,
eks:DeleteAccessEntry