# clusters

Creates, updates, deletes, or gets a cluster resource, or lists clusters in a region.
## Overview

| Property | Value |
|---|---|
| Name | clusters |
| Type | Resource |
| Description | Resource Type definition for AWS::SageMaker::Cluster |
| Id | awscc.sagemaker.clusters |
## Fields

- get (all properties)
- list (identifiers only)

The following fields are returned by `get`:

| Name | Datatype | Description |
|---|---|---|
| cluster_arn | string | The Amazon Resource Name (ARN) of the HyperPod cluster. |
| vpc_config | object | Specifies an Amazon Virtual Private Cloud (VPC) that your SageMaker jobs, hosted models, and compute resources have access to. You can control access to and from your resources by configuring a VPC. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/infrastructure-give-access.html. |
| node_recovery | string | If node auto-recovery is set to true, faulty nodes are replaced or rebooted when a failure is detected. If set to false, nodes are only labeled when a fault is detected. |
| instance_groups | array | The instance groups of the SageMaker HyperPod cluster. |
| restricted_instance_groups | array | The restricted instance groups of the SageMaker HyperPod cluster. |
| orchestrator | object | Specifies parameters specific to the orchestrator, for example the EKS cluster. |
| cluster_role | string | The cluster role for the autoscaler to assume. |
| node_provisioning_mode | string | Determines the scaling strategy for the SageMaker HyperPod cluster. When set to 'Continuous', enables continuous scaling, which dynamically manages node provisioning. If the parameter is omitted, the standard scaling approach from previous releases is used. |
| creation_time | string | The time at which the HyperPod cluster was created. |
| cluster_name | string | The name of the HyperPod cluster. |
| failure_message | string | The failure message of the HyperPod cluster. |
| auto_scaling | object | Configuration for cluster auto-scaling. |
| cluster_status | string | The status of the HyperPod cluster. |
| tags | array | Custom tags for managing the SageMaker HyperPod cluster as an AWS resource. You can add tags to your cluster in the same way you add them in other AWS services that support tagging. |
| region | string | AWS region. |
region | string | AWS region. |
The following fields are returned by `list`:

| Name | Datatype | Description |
|---|---|---|
| cluster_arn | string | The Amazon Resource Name (ARN) of the HyperPod cluster. |
| region | string | AWS region. |
For more information, see AWS::SageMaker::Cluster.
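The `Identifier` used in the `WHERE` clauses below is the `cluster_arn`. As a rough sketch (the exact ARN layout is an assumption based on the general SageMaker ARN convention, not something this page defines), the region and cluster id can be pulled out of it like this:

```python
def parse_cluster_arn(arn: str) -> dict:
    """Split a SageMaker HyperPod cluster ARN into its components.

    Assumed format (illustrative only):
    arn:aws:sagemaker:<region>:<account-id>:cluster/<cluster-id>
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn}")
    resource_type, _, resource_id = parts[5].partition("/")
    return {
        "region": parts[3],
        "account_id": parts[4],
        "resource_type": resource_type,
        "resource_id": resource_id,
    }

# Hypothetical example ARN, for illustration only
info = parse_cluster_arn(
    "arn:aws:sagemaker:us-east-1:123456789012:cluster/abc123example"
)
print(info["region"])       # us-east-1
print(info["resource_id"])  # abc123example
```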
## Methods
| Name | Resource | Accessible by | Required Params |
|---|---|---|---|
create_resource | clusters | INSERT | , region |
delete_resource | clusters | DELETE | Identifier, region |
update_resource | clusters | UPDATE | Identifier, PatchDocument, region |
list_resources | clusters_list_only | SELECT | region |
get_resource | clusters | SELECT | Identifier, region |
## SELECT examples

- get (all properties)
- list (identifiers only)
Gets all properties from an individual cluster.
```sql
SELECT
region,
cluster_arn,
vpc_config,
node_recovery,
instance_groups,
restricted_instance_groups,
orchestrator,
cluster_role,
node_provisioning_mode,
creation_time,
cluster_name,
failure_message,
auto_scaling,
cluster_status,
tags
FROM awscc.sagemaker.clusters
WHERE
region = 'us-east-1' AND
Identifier = '{{ cluster_arn }}';
```
Lists all clusters in a region.

```sql
SELECT
region,
cluster_arn
FROM awscc.sagemaker.clusters_list_only
WHERE
region = 'us-east-1';
```
## INSERT example

Use the following StackQL query and manifest file to create a new cluster resource, using stack-deploy.

- Required Properties
- All Properties
- Manifest
```sql
/*+ create */
INSERT INTO awscc.sagemaker.clusters (
,
region
)
SELECT
'{{ }}',
'{{ region }}';
```
```sql
/*+ create */
INSERT INTO awscc.sagemaker.clusters (
VpcConfig,
NodeRecovery,
InstanceGroups,
RestrictedInstanceGroups,
Orchestrator,
ClusterRole,
NodeProvisioningMode,
ClusterName,
AutoScaling,
Tags,
region
)
SELECT
'{{ vpc_config }}',
'{{ node_recovery }}',
'{{ instance_groups }}',
'{{ restricted_instance_groups }}',
'{{ orchestrator }}',
'{{ cluster_role }}',
'{{ node_provisioning_mode }}',
'{{ cluster_name }}',
'{{ auto_scaling }}',
'{{ tags }}',
'{{ region }}';
```
```yaml
version: 1
name: stack name
description: stack description
providers:
  - aws
globals:
  - name: region
    value: '{{ vars.AWS_REGION }}'
resources:
  - name: cluster
    props:
      - name: vpc_config
        value:
          security_group_ids:
            - '{{ security_group_ids[0] }}'
          subnets:
            - '{{ subnets[0] }}'
      - name: node_recovery
        value: '{{ node_recovery }}'
      - name: instance_groups
        value:
          - instance_group_name: '{{ instance_group_name }}'
            instance_storage_configs:
              - {}
            life_cycle_config:
              source_s3_uri: '{{ source_s3_uri }}'
              on_create: '{{ on_create }}'
            training_plan_arn: '{{ training_plan_arn }}'
            threads_per_core: '{{ threads_per_core }}'
            override_vpc_config: null
            instance_count: '{{ instance_count }}'
            on_start_deep_health_checks:
              - '{{ on_start_deep_health_checks[0] }}'
            image_id: '{{ image_id }}'
            current_count: '{{ current_count }}'
            scheduled_update_config:
              schedule_expression: '{{ schedule_expression }}'
              deployment_config:
                auto_rollback_configuration:
                  alarms:
                    - alarm_name: '{{ alarm_name }}'
                blue_green_update_policy:
                  maximum_execution_timeout_in_seconds: '{{ maximum_execution_timeout_in_seconds }}'
                  termination_wait_in_seconds: '{{ termination_wait_in_seconds }}'
                  traffic_routing_configuration:
                    canary_size:
                      type: '{{ type }}'
                      value: '{{ value }}'
                    linear_step_size: null
                    type: '{{ type }}'
                    wait_interval_in_seconds: '{{ wait_interval_in_seconds }}'
                rolling_update_policy:
                  maximum_batch_size: null
                  maximum_execution_timeout_in_seconds: '{{ maximum_execution_timeout_in_seconds }}'
                  rollback_maximum_batch_size: null
                  wait_interval_in_seconds: '{{ wait_interval_in_seconds }}'
            instance_type: '{{ instance_type }}'
            execution_role: '{{ execution_role }}'
      - name: restricted_instance_groups
        value:
          - override_vpc_config: null
            instance_count: '{{ instance_count }}'
            on_start_deep_health_checks: null
            environment_config:
              f_sx_lustre_config:
                size_in_gi_b: '{{ size_in_gi_b }}'
                per_unit_storage_throughput: '{{ per_unit_storage_throughput }}'
            instance_group_name: null
            instance_storage_configs: null
            current_count: '{{ current_count }}'
            training_plan_arn: '{{ training_plan_arn }}'
            instance_type: null
            threads_per_core: '{{ threads_per_core }}'
            execution_role: null
      - name: orchestrator
        value:
          eks:
            cluster_arn: '{{ cluster_arn }}'
      - name: cluster_role
        value: '{{ cluster_role }}'
      - name: node_provisioning_mode
        value: '{{ node_provisioning_mode }}'
      - name: cluster_name
        value: '{{ cluster_name }}'
      - name: auto_scaling
        value:
          mode: '{{ mode }}'
          auto_scaler_type: '{{ auto_scaler_type }}'
      - name: tags
        value:
          - value: '{{ value }}'
            key: '{{ key }}'
```
## UPDATE example

Use the following StackQL query and manifest file to update a cluster resource, using stack-deploy.
```sql
/*+ update */
UPDATE awscc.sagemaker.clusters
SET PatchDocument = string('{{ {
"NodeRecovery": node_recovery,
"ClusterRole": cluster_role,
"NodeProvisioningMode": node_provisioning_mode,
"AutoScaling": auto_scaling,
"Tags": tags
} | generate_patch_document }}')
WHERE
region = '{{ region }}' AND
Identifier = '{{ cluster_arn }}';
```
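Cloud Control-style updates apply an RFC 6902 JSON Patch document to the resource's desired state. As a rough sketch of the shape the `PatchDocument` above takes (this helper is illustrative only; StackQL's own `generate_patch_document` filter may behave differently):

```python
import json

def generate_patch_document(props: dict) -> str:
    """Illustrative sketch: turn a dict of updated properties into an
    RFC 6902 JSON Patch document of 'replace' operations."""
    patch = [
        {"op": "replace", "path": f"/{name}", "value": value}
        for name, value in props.items()
    ]
    return json.dumps(patch)

# Hypothetical property values, for illustration only
doc = generate_patch_document({
    "NodeRecovery": "Automatic",
    "NodeProvisioningMode": "Continuous",
})
print(doc)
```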
## DELETE example

```sql
/*+ delete */
DELETE FROM awscc.sagemaker.clusters
WHERE
Identifier = '{{ cluster_arn }}' AND
region = 'us-east-1';
```
## Permissions

To operate on the clusters resource, the following permissions are required:

### Read

```
sagemaker:DescribeCluster,
sagemaker:ListTags
```

### Create

```
sagemaker:CreateCluster,
sagemaker:DescribeCluster,
sagemaker:UpdateClusterSoftware,
sagemaker:AddTags,
sagemaker:ListTags,
sagemaker:BatchAddClusterNodes,
sagemaker:BatchDeleteClusterNodes,
eks:DescribeAccessEntry,
eks:DescribeCluster,
eks:CreateAccessEntry,
eks:DeleteAccessEntry,
eks:AssociateAccessPolicy,
iam:CreateServiceLinkedRole,
iam:PassRole,
kms:DescribeKey,
kms:CreateGrant,
ec2:DescribeImages,
ec2:DescribeSnapshots,
ec2:ModifyImageAttribute,
ec2:ModifySnapshotAttribute
```

### Update

```
sagemaker:UpdateCluster,
sagemaker:UpdateClusterSoftware,
sagemaker:DescribeCluster,
sagemaker:ListTags,
sagemaker:AddTags,
sagemaker:DeleteTags,
sagemaker:BatchAddClusterNodes,
sagemaker:BatchDeleteClusterNodes,
eks:DescribeAccessEntry,
eks:DescribeCluster,
eks:CreateAccessEntry,
eks:DeleteAccessEntry,
eks:AssociateAccessPolicy,
iam:PassRole,
kms:DescribeKey,
kms:CreateGrant,
ec2:DescribeImages,
ec2:DescribeSnapshots,
ec2:ModifyImageAttribute,
ec2:ModifySnapshotAttribute
```

### List

```
sagemaker:ListClusters
```

### Delete

```
sagemaker:DeleteCluster,
sagemaker:DescribeCluster,
eks:DescribeAccessEntry,
eks:DeleteAccessEntry
```
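The read-side actions above can be granted with a standard IAM policy document. A minimal sketch (the broad `Resource: "*"` is for illustration only; in practice you would scope it to specific cluster ARNs):

```python
import json

# IAM policy granting only the Read permissions listed above.
read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeCluster",
                "sagemaker:ListTags",
            ],
            # Illustrative only; scope to cluster ARNs in real policies.
            "Resource": "*",
        }
    ],
}

print(json.dumps(read_policy, indent=2))
```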