crawlers

Creates, updates, deletes, or gets a crawler resource, or lists crawlers in a region.

Overview

Name: crawlers
Type: Resource
Description: Resource Type definition for AWS::Glue::Crawler
Id: awscc.glue.crawlers

Fields

| Name | Datatype | Description |
|---|---|---|
| classifiers | array | A list of UTF-8 strings that specify the names of custom classifiers that are associated with the crawler. |
| description | string | A description of the crawler. |
| schema_change_policy | object | The policy that specifies update and delete behaviors for the crawler. The policy tells the crawler what to do in the event that it detects a change in a table that already exists in the customer's database at the time of the crawl. The SchemaChangePolicy does not affect whether or how new tables and partitions are added. New tables and partitions are always created regardless of the SchemaChangePolicy on a crawler. The SchemaChangePolicy consists of two components, UpdateBehavior and DeleteBehavior. |
| configuration | string | Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. |
| recrawl_policy | object | When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in AWS Glue in the developer guide. |
| database_name | string | The name of the database in which the crawler's output is stored. |
| targets | object | Specifies data stores to crawl. |
| crawler_security_configuration | string | The name of the SecurityConfiguration structure to be used by this crawler. |
| name | string | The name of the crawler. |
| role | string | The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data. |
| lake_formation_configuration | object | Specifies AWS Lake Formation configuration settings for the crawler. |
| schedule | object | A scheduling object using a cron statement to schedule an event. |
| table_prefix | string | The prefix added to the names of tables that are created. |
| tags | object | The tags to use with this crawler. |
| region | string | AWS region. |

For more information, see AWS::Glue::Crawler.

Methods

| Name | Resource | Accessible by | Required Params |
|---|---|---|---|
| create_resource | crawlers | INSERT | Role, Targets, region |
| delete_resource | crawlers | DELETE | Identifier, region |
| update_resource | crawlers | UPDATE | Identifier, PatchDocument, region |
| list_resources | crawlers_list_only | SELECT | region |
| get_resource | crawlers | SELECT | Identifier, region |

SELECT examples

Gets all properties from an individual crawler.

SELECT
region,
classifiers,
description,
schema_change_policy,
configuration,
recrawl_policy,
database_name,
targets,
crawler_security_configuration,
name,
role,
lake_formation_configuration,
schedule,
table_prefix,
tags
FROM awscc.glue.crawlers
WHERE
region = 'us-east-1' AND
Identifier = '{{ name }}';
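
The Methods section also lists a list_resources operation, reached through the crawlers_list_only resource with only region as a required parameter. A minimal sketch of listing the crawlers in a region might look like the following. The selected columns are an assumption (that crawlers_list_only exposes the identifier column name alongside region), so check the resource schema before relying on them.

```sql
-- Sketch: list crawlers in one region via the list_resources method.
-- Assumption: crawlers_list_only exposes the identifier column `name`.
SELECT
region,
name
FROM awscc.glue.crawlers_list_only
WHERE
region = 'us-east-1';
```

Each returned name can then be passed as the Identifier in the single-crawler query above.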

INSERT example

Use the following StackQL query and manifest file to create a new crawler resource, using stack-deploy.

/*+ create */
INSERT INTO awscc.glue.crawlers (
Targets,
Role,
region
)
SELECT
'{{ targets }}',
'{{ role }}',
'{{ region }}';
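
The query above sets only the required parameters (Role, Targets, region). As a hedged sketch, the optional properties could be supplied the same way. The column names below are assumed to follow the CamelCase property names used in the UPDATE example's patch document, plus Name, and the template variables are assumed to be defined in your manifest.

```sql
/*+ create */
-- Sketch: create a crawler with optional properties set.
-- Assumption: optional columns use the same CamelCase names as the
-- Cloud Control properties; every {{ ... }} value comes from the manifest.
INSERT INTO awscc.glue.crawlers (
Classifiers,
Description,
SchemaChangePolicy,
Configuration,
RecrawlPolicy,
DatabaseName,
Targets,
CrawlerSecurityConfiguration,
Name,
Role,
LakeFormationConfiguration,
Schedule,
TablePrefix,
Tags,
region
)
SELECT
'{{ classifiers }}',
'{{ description }}',
'{{ schema_change_policy }}',
'{{ configuration }}',
'{{ recrawl_policy }}',
'{{ database_name }}',
'{{ targets }}',
'{{ crawler_security_configuration }}',
'{{ name }}',
'{{ role }}',
'{{ lake_formation_configuration }}',
'{{ schedule }}',
'{{ table_prefix }}',
'{{ tags }}',
'{{ region }}';
```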

UPDATE example

Use the following StackQL query and manifest file to update a crawler resource, using stack-deploy.

/*+ update */
UPDATE awscc.glue.crawlers
SET PatchDocument = string('{{ {
"Classifiers": classifiers,
"Description": description,
"SchemaChangePolicy": schema_change_policy,
"Configuration": configuration,
"RecrawlPolicy": recrawl_policy,
"DatabaseName": database_name,
"Targets": targets,
"CrawlerSecurityConfiguration": crawler_security_configuration,
"Role": role,
"LakeFormationConfiguration": lake_formation_configuration,
"Schedule": schedule,
"TablePrefix": table_prefix,
"Tags": tags
} | generate_patch_document }}')
WHERE
region = '{{ region }}' AND
Identifier = '{{ name }}';
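
To make the shape of PatchDocument more concrete, here is a sketch of the same statement with a hand-written patch instead of the template. It assumes the rendered value is an RFC 6902 JSON Patch array, which is the format the AWS Cloud Control API accepts for updates. The crawler name my-crawler and the description text are hypothetical, and the exact operations that generate_patch_document emits may differ.

```sql
/*+ update */
-- Sketch: change only the crawler's description.
-- Assumptions: PatchDocument accepts a literal RFC 6902 JSON Patch string;
-- 'my-crawler' is a hypothetical crawler name.
UPDATE awscc.glue.crawlers
SET PatchDocument = string('[
{"op": "replace", "path": "/Description", "value": "Nightly crawl of the sales data"}
]')
WHERE
region = 'us-east-1' AND
Identifier = 'my-crawler';
```

Patching a single property this way leaves the other crawler settings untouched, which is usually safer than resending every field.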

DELETE example

/*+ delete */
DELETE FROM awscc.glue.crawlers
WHERE
Identifier = '{{ name }}' AND
region = 'us-east-1';

Permissions

To operate on the crawlers resource, the following permissions are required:

glue:CreateCrawler,
glue:GetCrawler,
glue:TagResource,
iam:PassRole