crawlers

Creates, updates, deletes, or gets a crawler resource, or lists crawlers in a region.

Overview

Name: crawlers
Type: Resource
Description: Resource Type definition for AWS::Glue::Crawler
Id: awscc.glue.crawlers

Fields

| Name | Datatype | Description |
|---|---|---|
| classifiers | array | A list of UTF-8 strings that specify the names of custom classifiers that are associated with the crawler. |
| description | string | A description of the crawler. |
| schema_change_policy | object | The policy that specifies update and delete behaviors for the crawler. The policy tells the crawler what to do in the event that it detects a change in a table that already exists in the customer's database at the time of the crawl. The SchemaChangePolicy does not affect whether or how new tables and partitions are added. New tables and partitions are always created regardless of the SchemaChangePolicy on a crawler. The SchemaChangePolicy consists of two components, UpdateBehavior and DeleteBehavior. |
| configuration | string | Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. |
| recrawl_policy | object | When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in AWS Glue in the developer guide. |
| database_name | string | The name of the database in which the crawler's output is stored. |
| targets | object | Specifies data stores to crawl. |
| crawler_security_configuration | string | The name of the SecurityConfiguration structure to be used by this crawler. |
| name | string | The name of the crawler. |
| role | string | The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data. |
| lake_formation_configuration | object | Specifies AWS Lake Formation configuration settings for the crawler. |
| schedule | object | A scheduling object using a cron statement to schedule an event. |
| table_prefix | string | The prefix added to the names of tables that are created. |
| tags | object | The tags to use with this crawler. |
| region | string | AWS region. |

For more information, see AWS::Glue::Crawler.
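
Several of the fields above are objects or JSON strings rather than scalars. As a rough sketch of how such values might be assembled before being passed to a query, the Python below builds example `targets`, `schema_change_policy`, `recrawl_policy`, `schedule`, and `configuration` values. The key names follow the AWS::Glue::Crawler schema; the bucket path, cron expression, chosen behaviors, and the `as_template_value` helper are illustrative assumptions, not values taken from this page.

```python
import json

# Illustrative values only: the S3 path and cron expression are placeholders.
targets = {"S3Targets": [{"Path": "s3://example-bucket/raw/"}]}

# The two components named in the schema_change_policy description.
schema_change_policy = {
    "UpdateBehavior": "UPDATE_IN_DATABASE",
    "DeleteBehavior": "LOG",
}

# Crawl the whole dataset on every run (the non-incremental setting).
recrawl_policy = {"RecrawlBehavior": "CRAWL_EVERYTHING"}

schedule = {"ScheduleExpression": "cron(15 12 * * ? *)"}

# `configuration` is a string field, so the versioned JSON is serialized first.
configuration = json.dumps(
    {
        "Version": 1.0,
        "CrawlerOutput": {"Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}},
    }
)


def as_template_value(value):
    """Serialize an object field to compact JSON (hypothetical helper)."""
    return json.dumps(value, separators=(",", ":"))


print(as_template_value(targets))
print(configuration)
```

The object fields stay as dictionaries until they are serialized, while `configuration` is already a string because the field itself is typed as a string.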

Methods

| Name | Resource | Accessible by | Required Params |
|---|---|---|---|
| create_resource | crawlers | INSERT | Role, Targets, region |
| delete_resource | crawlers | DELETE | Identifier, region |
| update_resource | crawlers | UPDATE | Identifier, PatchDocument, region |
| list_resources | crawlers_list_only | SELECT | region |
| get_resource | crawlers | SELECT | Identifier, region |

SELECT examples

Gets all properties from an individual crawler.

SELECT
  region,
  classifiers,
  description,
  schema_change_policy,
  configuration,
  recrawl_policy,
  database_name,
  targets,
  crawler_security_configuration,
  name,
  role,
  lake_formation_configuration,
  schedule,
  table_prefix,
  tags
FROM awscc.glue.crawlers
WHERE
  region = '{{ region }}' AND
  Identifier = '{{ name }}';

INSERT example

Use the following StackQL query, together with a manifest file that supplies the template variables, to create a new crawler resource using stackql-deploy.

/*+ create */
INSERT INTO awscc.glue.crawlers (
  Targets,
  Role,
  region
)
SELECT
  '{{ targets }}',
  '{{ role }}',
  '{{ region }}'
RETURNING
  ErrorCode,
  EventTime,
  Identifier,
  Operation,
  OperationStatus,
  RequestToken,
  ResourceModel,
  RetryAfter,
  StatusMessage,
  TypeName
;

UPDATE example

Use the following StackQL query, together with a manifest file that supplies the template variables, to update a crawler resource using stackql-deploy.

/*+ update */
UPDATE awscc.glue.crawlers
SET PatchDocument = string('{{ {
    "Classifiers": classifiers,
    "Description": description,
    "SchemaChangePolicy": schema_change_policy,
    "Configuration": configuration,
    "RecrawlPolicy": recrawl_policy,
    "DatabaseName": database_name,
    "Targets": targets,
    "CrawlerSecurityConfiguration": crawler_security_configuration,
    "Role": role,
    "LakeFormationConfiguration": lake_formation_configuration,
    "Schedule": schedule,
    "TablePrefix": table_prefix,
    "Tags": tags
} | generate_patch_document }}')
WHERE
  region = '{{ region }}' AND
  Identifier = '{{ name }}'
RETURNING
  ErrorCode,
  EventTime,
  Identifier,
  Operation,
  OperationStatus,
  RequestToken,
  ResourceModel,
  RetryAfter,
  StatusMessage,
  TypeName
;
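
The `PatchDocument` parameter carries a JSON Patch (RFC 6902) document, which is the update format the Cloud Control API accepts. The sketch below shows one plausible shape for what the `generate_patch_document` filter produces, assuming it emits one `add` operation per supplied property. `build_patch_document` is a hypothetical stand-in written for illustration, not the filter's actual source, and skipping unset properties is likewise an assumption.

```python
import json


def build_patch_document(properties):
    """Return a JSON Patch string with one 'add' op per supplied property.

    Hypothetical stand-in for the generate_patch_document template filter.
    """
    operations = [
        {"op": "add", "path": f"/{key}", "value": value}
        for key, value in properties.items()
        if value is not None  # assumption: unset properties are left out
    ]
    return json.dumps(operations)


patch = build_patch_document(
    {
        "Description": "Nightly crawl of raw data",
        "TablePrefix": "raw_",
        "Schedule": None,  # not supplied, so no operation is emitted
    }
)
print(patch)
```

Each path is the property name from the resource schema prefixed with `/`, so only the properties named in the patch are changed.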

DELETE example

/*+ delete */
DELETE FROM awscc.glue.crawlers
WHERE
  Identifier = '{{ name }}' AND
  region = '{{ region }}'
RETURNING
  ErrorCode,
  EventTime,
  Identifier,
  Operation,
  OperationStatus,
  RequestToken,
  ResourceModel,
  RetryAfter,
  StatusMessage,
  TypeName
;

Additional Parameters

Mutable resources in the Cloud Control provider support additional optional parameters which can be supplied with INSERT, UPDATE, or DELETE operations. These include:

| Parameter | Description |
|---|---|
| ClientToken | A unique identifier to ensure the idempotency of the resource request. This allows the provider to accurately distinguish between retries and new requests. A client token is valid for 36 hours once used. After that, a resource request with the same client token is treated as a new request. If you do not specify a client token, one is generated for inclusion in the request. |
| RoleArn | The ARN of the IAM role used to perform this resource operation. The role specified must have the permissions required for this operation. If you do not specify a role, a temporary session is created using your AWS user credentials. |
| TypeVersionId | For private resource types, the type version to use in this resource operation. If you do not specify a resource version, the default version is used. |
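
To make the `ClientToken` behavior concrete, here is a minimal Python sketch that generates a token and checks whether a retry still falls inside the 36-hour window described above. Using a UUID as the token and the names `new_client_token` and `is_token_valid` are assumptions made for illustration; the provider only requires that the value be unique per request.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Validity window stated in the ClientToken description.
TOKEN_VALIDITY = timedelta(hours=36)


def new_client_token():
    """Generate a random token to reuse across retries of one request."""
    return str(uuid.uuid4())


def is_token_valid(first_used_at, now=None):
    """Return True while a retry with the same token still counts as a retry."""
    now = now or datetime.now(timezone.utc)
    return now - first_used_at < TOKEN_VALIDITY


token = new_client_token()
first_used = datetime.now(timezone.utc)
print(token, is_token_valid(first_used))
```

Reusing the same token for every retry of a request is what lets the provider recognize the retries as one operation; once the window has passed, the same token starts a new request.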

Permissions

To operate on the crawlers resource, the following permissions are required:

glue:CreateCrawler,
glue:GetCrawler,
glue:TagResource,
iam:PassRole
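
As an illustration of how the actions listed above could be granted, the following Python sketch assembles an IAM policy document containing them. The policy layout is the standard IAM JSON format; the `"Resource": "*"` value is a placeholder assumption and would normally be narrowed to specific crawler and role ARNs.

```python
import json

# Actions copied from the permissions list above.
REQUIRED_ACTIONS = [
    "glue:CreateCrawler",
    "glue:GetCrawler",
    "glue:TagResource",
    "iam:PassRole",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": REQUIRED_ACTIONS,
            "Resource": "*",  # placeholder assumption, not a recommendation
        }
    ],
}

print(json.dumps(policy, indent=2))
```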