crawlers
Creates, updates, deletes, or gets a crawler resource, or lists crawlers in a region.
Overview
| Property | Value |
|---|---|
| Name | crawlers |
| Type | Resource |
| Description | Resource type definition for AWS::Glue::Crawler |
| Id | awscc.glue.crawlers |
Fields
Properties returned by the get method (all properties):
| Name | Datatype | Description |
|---|---|---|
| classifiers | array | A list of UTF-8 strings that specify the names of custom classifiers that are associated with the crawler. |
| description | string | A description of the crawler. |
| schema_change_policy | object | The policy that specifies update and delete behaviors for the crawler. The policy tells the crawler what to do when it detects a change in a table that already exists in the customer's database at the time of the crawl. The SchemaChangePolicy does not affect whether or how new tables and partitions are added: new tables and partitions are always created, regardless of the crawler's SchemaChangePolicy. The SchemaChangePolicy consists of two components, UpdateBehavior and DeleteBehavior. |
| configuration | string | Crawler configuration information. This versioned JSON string lets users specify aspects of a crawler's behavior. |
| recrawl_policy | object | When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in AWS Glue in the AWS Glue Developer Guide. |
| database_name | string | The name of the database in which the crawler's output is stored. |
| targets | object | Specifies the data stores to crawl. |
| crawler_security_configuration | string | The name of the SecurityConfiguration structure to be used by this crawler. |
| name | string | The name of the crawler. |
| role | string | The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data. |
| lake_formation_configuration | object | Specifies AWS Lake Formation configuration settings for the crawler. |
| schedule | object | A scheduling object that uses a cron statement to schedule an event. |
| table_prefix | string | The prefix added to the names of tables that are created. |
| tags | object | The tags to use with this crawler. |
| region | string | AWS Region. |
Properties returned by the list method (identifiers only):

| Name | Datatype | Description |
|---|---|---|
| name | string | The name of the crawler. |
| region | string | AWS Region. |
For more information, see the AWS::Glue::Crawler resource type in the AWS CloudFormation User Guide.
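The configuration property is a versioned JSON string rather than an object, and schema_change_policy pairs an UpdateBehavior with a DeleteBehavior. The Python sketch below builds both. The helper names are hypothetical. The allowed behavior values (UPDATE_IN_DATABASE or LOG for updates; DELETE_FROM_DATABASE, DEPRECATE_IN_DATABASE, or LOG for deletes) and the configuration keys (Version, CrawlerOutput, Grouping) come from the AWS Glue API.

```python
import json

# Allowed values for the two SchemaChangePolicy components (AWS Glue API).
UPDATE_BEHAVIORS = {"UPDATE_IN_DATABASE", "LOG"}
DELETE_BEHAVIORS = {"DELETE_FROM_DATABASE", "DEPRECATE_IN_DATABASE", "LOG"}


def schema_change_policy(update_behavior: str, delete_behavior: str) -> dict:
    """Build a SchemaChangePolicy object, rejecting unknown behaviors early."""
    if update_behavior not in UPDATE_BEHAVIORS:
        raise ValueError(f"unknown UpdateBehavior: {update_behavior}")
    if delete_behavior not in DELETE_BEHAVIORS:
        raise ValueError(f"unknown DeleteBehavior: {delete_behavior}")
    return {"UpdateBehavior": update_behavior, "DeleteBehavior": delete_behavior}


def crawler_configuration(inherit_partition_schema: bool = True,
                          combine_compatible_schemas: bool = False) -> str:
    """Serialize a versioned crawler Configuration as a JSON string."""
    config: dict = {"Version": 1.0}
    if inherit_partition_schema:
        # Partitions take their schema from the parent table.
        config["CrawlerOutput"] = {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        }
    if combine_compatible_schemas:
        # Group compatible S3 data into a single table.
        config["Grouping"] = {"TableGroupingPolicy": "CombineCompatibleSchemas"}
    return json.dumps(config)
```

Because configuration is typed as a string, you can pass the output of crawler_configuration() directly as the '{{ configuration }}' value. In contrast, schema_change_policy is an object.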
Methods
| Name | Resource | Accessible by | Required Params |
|---|---|---|---|
| create_resource | crawlers | INSERT | Role, Targets, region |
| delete_resource | crawlers | DELETE | Identifier, region |
| update_resource | crawlers | UPDATE | Identifier, PatchDocument, region |
| list_resources | crawlers_list_only | SELECT | region |
| get_resource | crawlers | SELECT | Identifier, region |
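You can turn the Required Params column into a pre-flight check before you issue a query. The sketch below mirrors the table. The helper missing_params is hypothetical and is not part of StackQL.

```python
from typing import Dict, List, Set

# Required parameters per method, transcribed from the Methods table.
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "create_resource": {"Role", "Targets", "region"},
    "delete_resource": {"Identifier", "region"},
    "update_resource": {"Identifier", "PatchDocument", "region"},
    "list_resources": {"region"},
    "get_resource": {"Identifier", "region"},
}


def missing_params(method: str, params: dict) -> List[str]:
    """Return the required parameters that are absent or empty, sorted by name."""
    required = REQUIRED_PARAMS[method]
    return sorted(p for p in required if params.get(p) in (None, ""))
```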
SELECT examples
Gets all properties from an individual crawler:

```sql
SELECT
region,
classifiers,
description,
schema_change_policy,
configuration,
recrawl_policy,
database_name,
targets,
crawler_security_configuration,
name,
role,
lake_formation_configuration,
schedule,
table_prefix,
tags
FROM awscc.glue.crawlers
WHERE
region = 'us-east-1' AND
Identifier = '{{ name }}';
```
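Object-typed columns such as targets and schema_change_policy hold nested documents. The sketch below assumes that these columns reach client code as JSON text with the CloudFormation property names (S3Targets, Path), and it extracts the crawled S3 paths. The row and the bucket name are hypothetical.

```python
import json
from typing import List

# A hypothetical row from the get query; object columns are assumed to be JSON text.
row = {
    "name": "sales-crawler",
    "targets": '{"S3Targets": [{"Path": "s3://example-bucket/sales/"}]}',
    "schema_change_policy": '{"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"}',
}


def s3_paths(targets_json: str) -> List[str]:
    """Extract every S3 path from a Targets JSON document (empty input -> [])."""
    targets = json.loads(targets_json) if targets_json else {}
    return [t["Path"] for t in targets.get("S3Targets", []) if "Path" in t]
```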
Lists all crawlers in a region:

```sql
SELECT
region,
name
FROM awscc.glue.crawlers_list_only
WHERE
region = 'us-east-1';
```
INSERT example
Use the following StackQL queries and manifest file to create a new crawler resource using stackql-deploy.
Required properties only:

```sql
/*+ create */
INSERT INTO awscc.glue.crawlers (
Targets,
Role,
region
)
SELECT
'{{ targets }}',
'{{ role }}',
'{{ region }}';
```
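Targets is one of the two required properties, and it is an object. The sketch below serializes a Targets value for '{{ targets }}' that crawls S3 prefixes. It assumes the value is forwarded as CloudFormation-shaped JSON that uses the PascalCase keys of the AWS::Glue::Crawler schema (S3Targets, Path, Exclusions). The helper and the example paths are hypothetical.

```python
import json
from typing import List, Optional


def s3_targets(paths: List[str], exclusions: Optional[List[str]] = None) -> str:
    """Serialize a Targets object that crawls one or more S3 prefixes."""
    entries = []
    for path in paths:
        if not path.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {path}")
        entry = {"Path": path}
        if exclusions:
            # Glob patterns for objects the crawler should skip.
            entry["Exclusions"] = list(exclusions)
        entries.append(entry)
    return json.dumps({"S3Targets": entries})
```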
All properties:

```sql
/*+ create */
INSERT INTO awscc.glue.crawlers (
Classifiers,
Description,
SchemaChangePolicy,
Configuration,
RecrawlPolicy,
DatabaseName,
Targets,
CrawlerSecurityConfiguration,
Name,
Role,
LakeFormationConfiguration,
Schedule,
TablePrefix,
Tags,
region
)
SELECT
'{{ classifiers }}',
'{{ description }}',
'{{ schema_change_policy }}',
'{{ configuration }}',
'{{ recrawl_policy }}',
'{{ database_name }}',
'{{ targets }}',
'{{ crawler_security_configuration }}',
'{{ name }}',
'{{ role }}',
'{{ lake_formation_configuration }}',
'{{ schedule }}',
'{{ table_prefix }}',
'{{ tags }}',
'{{ region }}';
```
Manifest:

```yaml
version: 1
name: stack name
description: stack description
providers:
  - aws
globals:
  - name: region
    value: '{{ vars.AWS_REGION }}'
resources:
  - name: crawler
    props:
      - name: classifiers
        value:
          - '{{ classifiers[0] }}'
      - name: description
        value: '{{ description }}'
      - name: schema_change_policy
        value:
          update_behavior: '{{ update_behavior }}'
          delete_behavior: '{{ delete_behavior }}'
      - name: configuration
        value: '{{ configuration }}'
      - name: recrawl_policy
        value:
          recrawl_behavior: '{{ recrawl_behavior }}'
      - name: database_name
        value: '{{ database_name }}'
      - name: targets
        value:
          s3_targets:
            - connection_name: '{{ connection_name }}'
              path: '{{ path }}'
              sample_size: '{{ sample_size }}'
              exclusions:
                - '{{ exclusions[0] }}'
              dlq_event_queue_arn: '{{ dlq_event_queue_arn }}'
              event_queue_arn: '{{ event_queue_arn }}'
          catalog_targets:
            - connection_name: '{{ connection_name }}'
              database_name: '{{ database_name }}'
              dlq_event_queue_arn: '{{ dlq_event_queue_arn }}'
              tables:
                - '{{ tables[0] }}'
              event_queue_arn: '{{ event_queue_arn }}'
          delta_targets:
            - connection_name: '{{ connection_name }}'
              create_native_delta_table: '{{ create_native_delta_table }}'
              write_manifest: '{{ write_manifest }}'
              delta_tables:
                - '{{ delta_tables[0] }}'
          mongo_db_targets:
            - connection_name: '{{ connection_name }}'
              path: '{{ path }}'
          jdbc_targets:
            - connection_name: '{{ connection_name }}'
              path: '{{ path }}'
              exclusions:
                - '{{ exclusions[0] }}'
              enable_additional_metadata:
                - '{{ enable_additional_metadata[0] }}'
          dynamo_db_targets:
            - path: '{{ path }}'
              scan_all: '{{ scan_all }}'
              scan_rate: null
          iceberg_targets:
            - connection_name: '{{ connection_name }}'
              paths:
                - '{{ paths[0] }}'
              exclusions:
                - '{{ exclusions[0] }}'
              maximum_traversal_depth: '{{ maximum_traversal_depth }}'
          hudi_targets:
            - connection_name: '{{ connection_name }}'
              paths:
                - '{{ paths[0] }}'
              exclusions:
                - '{{ exclusions[0] }}'
              maximum_traversal_depth: '{{ maximum_traversal_depth }}'
      - name: crawler_security_configuration
        value: '{{ crawler_security_configuration }}'
      - name: name
        value: '{{ name }}'
      - name: role
        value: '{{ role }}'
      - name: lake_formation_configuration
        value:
          use_lake_formation_credentials: '{{ use_lake_formation_credentials }}'
          account_id: '{{ account_id }}'
      - name: schedule
        value:
          schedule_expression: '{{ schedule_expression }}'
      - name: table_prefix
        value: '{{ table_prefix }}'
      - name: tags
        value: {}
```
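The schedule_expression value uses AWS cron syntax, which has six fields (minutes, hours, day-of-month, month, day-of-week, year), for example cron(15 12 * * ? *). AWS cron does not allow day-of-month and day-of-week to be set together, so exactly one of them must be ?. The sketch below builds and checks such an expression. The helper glue_cron is hypothetical, and it validates only the ? rule, not the individual fields.

```python
def glue_cron(minutes: str, hours: str, day_of_month: str = "*",
              month: str = "*", day_of_week: str = "?", year: str = "*") -> str:
    """Build a six-field cron() expression for a crawler schedule.

    Exactly one of day_of_month / day_of_week must be '?'; other field
    contents are passed through unchecked.
    """
    if (day_of_month == "?") == (day_of_week == "?"):
        raise ValueError("exactly one of day_of_month/day_of_week must be '?'")
    fields = [minutes, hours, day_of_month, month, day_of_week, year]
    return "cron(" + " ".join(fields) + ")"
```

For example, glue_cron("0", "3", day_of_month="?", day_of_week="MON") yields a weekly Monday 03:00 UTC schedule.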
UPDATE example
Use the following StackQL query, together with the manifest file from the INSERT example, to update a crawler resource using stackql-deploy.

```sql
/*+ update */
UPDATE awscc.glue.crawlers
SET PatchDocument = string('{{ {
"Classifiers": classifiers,
"Description": description,
"SchemaChangePolicy": schema_change_policy,
"Configuration": configuration,
"RecrawlPolicy": recrawl_policy,
"DatabaseName": database_name,
"Targets": targets,
"CrawlerSecurityConfiguration": crawler_security_configuration,
"Role": role,
"LakeFormationConfiguration": lake_formation_configuration,
"Schedule": schedule,
"TablePrefix": table_prefix,
"Tags": tags
} | generate_patch_document }}')
WHERE
region = '{{ region }}' AND
Identifier = '{{ name }}';
```
DELETE example
```sql
/*+ delete */
DELETE FROM awscc.glue.crawlers
WHERE
Identifier = '{{ name }}' AND
region = 'us-east-1';
```
Permissions
To operate on the crawlers resource, the following permissions are required:
- Create: glue:CreateCrawler, glue:GetCrawler, glue:TagResource, iam:PassRole
- Read: glue:GetCrawler, glue:GetTags, iam:PassRole
- Update: glue:UpdateCrawler, glue:UntagResource, glue:TagResource, iam:PassRole
- Delete: glue:DeleteCrawler, glue:GetCrawler, glue:StopCrawler, iam:PassRole
- List: glue:ListCrawlers, iam:PassRole
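The sketch below combines the per-operation actions into a single IAM policy document for a caller that needs several operations. The helper policy_for is hypothetical. It uses "Resource": "*" only for brevity; in practice, scope the resources, especially for iam:PassRole.

```python
from typing import Dict, List

# Actions per operation, transcribed from the permissions list.
PERMISSIONS: Dict[str, List[str]] = {
    "create": ["glue:CreateCrawler", "glue:GetCrawler", "glue:TagResource", "iam:PassRole"],
    "read": ["glue:GetCrawler", "glue:GetTags", "iam:PassRole"],
    "update": ["glue:UpdateCrawler", "glue:UntagResource", "glue:TagResource", "iam:PassRole"],
    "delete": ["glue:DeleteCrawler", "glue:GetCrawler", "glue:StopCrawler", "iam:PassRole"],
    "list": ["glue:ListCrawlers", "iam:PassRole"],
}


def policy_for(*operations: str) -> dict:
    """Return an IAM policy document allowing the union of the given operations."""
    actions = sorted({a for op in operations for a in PERMISSIONS[op]})
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }
```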