Shows how to use the AWS SDK for .NET to work with AWS Glue.
AWS Glue is a scalable, serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
- Running this code might result in charges to your AWS account. For more details, see AWS Pricing and Free Tier.
- Running the tests might result in charges to your AWS account.
- We recommend that you grant your code least privilege. At most, grant only the minimum permissions required to perform the task. For more information, see Grant least privilege.
- This code is not tested in every AWS Region. For more information, see AWS Regional Services.
For prerequisites, see the README in the dotnetv3 folder.
Get started:
- Hello AWS Glue (ListJobs)
Basics: code examples that show you how to perform the essential operations within a service.
Actions: code excerpts that show you how to call individual service functions.
- CreateCrawler
- CreateJob
- DeleteCrawler
- DeleteDatabase
- DeleteJob
- DeleteTable
- GetCrawler
- GetDatabase
- GetJobRun
- GetJobRuns
- GetTables
- ListJobs
- StartCrawler
- StartJobRun
For general instructions to run the examples, see the README in the dotnetv3 folder.
Some projects might include a settings.json file. Before compiling the project, you can change the values in this file to match your own account and resources. Alternatively, add a settings.local.json file with your local settings; it is loaded automatically when the application runs.
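The settings load order can be sketched with Microsoft.Extensions.Configuration. This is a minimal sketch, not the project's own code: it assumes the Microsoft.Extensions.Configuration.Json NuGet package and flat top-level keys such as BucketName, and SettingsLoader is a hypothetical name. Because configuration sources added later take precedence, a key defined in settings.local.json overrides the same key in settings.json.

```csharp
using System;
using System.IO;
using Microsoft.Extensions.Configuration;

// Hypothetical helper (not the example's own code) that loads settings.json
// and then applies optional overrides from settings.local.json.
public static class SettingsLoader
{
    public static IConfiguration Load(string basePath)
    {
        return new ConfigurationBuilder()
            .SetBasePath(basePath)
            // Required base settings.
            .AddJsonFile("settings.json", optional: false, reloadOnChange: false)
            // Optional local overrides; sources added later win for the same key.
            .AddJsonFile("settings.local.json", optional: true, reloadOnChange: false)
            .Build();
    }
}
```

For example, SettingsLoader.Load(Directory.GetCurrentDirectory())["BucketName"] returns the value from settings.local.json when that file defines BucketName, and the settings.json value otherwise.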
After the example compiles, you can run it from the command line. To do so, navigate to the folder that contains the .csproj file and run the following command:
```shell
dotnet run
```
Alternatively, you can run the example from within your IDE.
Hello AWS Glue: this example shows you how to get started using AWS Glue by listing your jobs with ListJobs.
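A Hello example built on ListJobs might look roughly like the following sketch. It assumes the AWSSDK.Glue NuGet package and the SDK's default credential and Region resolution; HelloGlue, ListAllJobNamesAsync, and PrintJobsAsync are hypothetical names. Passing the list call in as a delegate keeps the NextToken pagination loop testable without AWS access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.Glue;
using Amazon.Glue.Model;

// Hypothetical Hello AWS Glue helpers (not the example's own code).
public static class HelloGlue
{
    // Collects every job name by following NextToken until the last page.
    public static async Task<List<string>> ListAllJobNamesAsync(
        Func<ListJobsRequest, Task<ListJobsResponse>> listJobs)
    {
        var names = new List<string>();
        var request = new ListJobsRequest { MaxResults = 25 };
        do
        {
            ListJobsResponse response = await listJobs(request);
            // Newer SDK versions may return null instead of an empty list.
            if (response.JobNames != null)
            {
                names.AddRange(response.JobNames);
            }
            request.NextToken = response.NextToken;
        }
        while (!string.IsNullOrEmpty(request.NextToken));
        return names;
    }

    // Prints the job names visible to the client's credentials and Region.
    public static async Task PrintJobsAsync(IAmazonGlue client)
    {
        List<string> jobNames = await ListAllJobNamesAsync(r => client.ListJobsAsync(r));
        Console.WriteLine($"Found {jobNames.Count} job(s).");
        foreach (string name in jobNames)
        {
            Console.WriteLine($"  {name}");
        }
    }
}
```

In an application, you would create a client with new AmazonGlueClient() and pass it to HelloGlue.PrintJobsAsync.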
Basics scenario: this example shows you how to do the following:
- Create a crawler that crawls a public Amazon S3 bucket and generates a database of CSV-formatted metadata.
- List information about databases and tables in your AWS Glue Data Catalog.
- Create a job to extract CSV data from the S3 bucket, transform the data, and load JSON-formatted output into another S3 bucket.
- List information about job runs, view transformed data, and clean up resources.
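The first scenario step (create a crawler and run it) can be sketched as follows. This is an illustration under assumptions, not the example's actual code: it assumes the AWSSDK.Glue package, maps the settings.json keys CrawlerName, RoleName, DbName, SourceData, and Cron onto a CreateCrawlerRequest, and treats Cron as the crawler schedule (cron(15 12 * * ? *) runs every day at 12:15 UTC). CrawlerSteps and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.Glue;
using Amazon.Glue.Model;

// Hypothetical helpers for the crawler step of the scenario (not the example's own code).
public static class CrawlerSteps
{
    // Maps settings.json values onto a CreateCrawlerRequest.
    // Using Cron as the crawler schedule is an assumption of this sketch.
    public static CreateCrawlerRequest BuildCreateCrawlerRequest(
        string crawlerName, string roleName, string dbName, string sourceData, string cron)
    {
        return new CreateCrawlerRequest
        {
            Name = crawlerName,
            Role = roleName,          // AWS Glue accepts a role name or ARN here.
            DatabaseName = dbName,    // Database that receives the generated table metadata.
            Schedule = cron,
            Targets = new CrawlerTargets
            {
                S3Targets = new List<S3Target> { new S3Target { Path = sourceData } },
            },
        };
    }

    // Waits before each check, then returns once the crawler reports READY again.
    // getState is a hypothetical seam; with a real client it wraps GetCrawlerAsync.
    public static async Task WaitUntilReadyAsync(
        Func<Task<CrawlerState>> getState, TimeSpan pollInterval, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            await Task.Delay(pollInterval);
            CrawlerState state = await getState();
            if (state?.Value == CrawlerState.READY.Value)
            {
                return;
            }
        }
        throw new TimeoutException("The crawler did not return to READY in time.");
    }

    // Creates the crawler, starts it, and waits for the crawl to finish.
    public static async Task CreateAndRunAsync(IAmazonGlue client, CreateCrawlerRequest request)
    {
        await client.CreateCrawlerAsync(request);
        await client.StartCrawlerAsync(new StartCrawlerRequest { Name = request.Name });
        await WaitUntilReadyAsync(
            async () => (await client.GetCrawlerAsync(
                new GetCrawlerRequest { Name = request.Name })).Crawler.State,
            TimeSpan.FromSeconds(10),
            maxAttempts: 60);
    }
}
```

Delaying before each check gives the crawler time to leave the READY state after StartCrawler; the interval and attempt limit are arbitrary choices for this sketch.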
This scenario requires the following scaffold resources:
- An S3 bucket that can contain the Python ETL job script and receive output data.
- An AWS Identity and Access Management (IAM) role that can be assumed by AWS Glue. The role must grant read-write access to the S3 bucket and standard rights needed by AWS Glue.
You can deploy and destroy these resources by using the AWS Cloud Development Kit (AWS CDK). To do this, run cdk deploy or cdk destroy in the /resources/cdk/glue_role_bucket folder.
When the AWS CDK script reports the names of the S3 bucket and IAM role that it created, open the settings.json file and fill in the BucketName, RoleName, and ScriptURL values.
Also copy the Python script flight_etl_job_script.py from /aws-doc-sdk-examples/python/example_code/glue/flight_etl_job_script.py to the S3 bucket.
Example:
```json
{
  "BucketName": "bucket-name-from-cdk-script",
  "CrawlerName": "any-name-for-crawler",
  "RoleName": "role-name-from-cdk-script",
  "SourceData": "s3://crawler-public-us-east-1/flight/2016/csv",
  "DbName": "example-flights-db",
  "Cron": "cron(15 12 * * ? *)",
  "ScriptURL": "s3://bucket-name-from-cdk-script/flight_etl_job_script.py",
  "JobName": "glue-mvp-job"
}
```
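Creating and running the ETL job from the JobName, RoleName, and ScriptURL settings could look like the following sketch. It assumes the AWSSDK.Glue package; the GlueVersion value is an assumption, and the job arguments that flight_etl_job_script.py expects are not listed here, so the sketch passes them through unchanged. JobSteps and its methods are hypothetical names, and the set of states treated as finished is this sketch's choice.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.Glue;
using Amazon.Glue.Model;

// Hypothetical helpers for the job step of the scenario (not the example's own code).
public static class JobSteps
{
    // "glueetl" selects a Spark ETL job that runs the Python script at ScriptURL.
    // GlueVersion "4.0" is an assumed value; use a version your account supports.
    public static CreateJobRequest BuildCreateJobRequest(string jobName, string roleName, string scriptUrl)
    {
        return new CreateJobRequest
        {
            Name = jobName,
            Role = roleName,
            GlueVersion = "4.0",
            Command = new JobCommand
            {
                Name = "glueetl",
                ScriptLocation = scriptUrl,
                PythonVersion = "3",
            },
        };
    }

    // Job run states that this sketch treats as finished.
    private static readonly HashSet<string> FinishedStates = new HashSet<string>
    {
        "SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR",
    };

    public static bool IsFinished(JobRunState state) =>
        FinishedStates.Contains(state?.Value ?? string.Empty);

    // Starts a run and polls GetJobRun until it reaches a finished state.
    public static async Task<JobRunState> RunAndWaitAsync(
        IAmazonGlue client, string jobName, Dictionary<string, string> arguments,
        TimeSpan pollInterval, int maxAttempts)
    {
        StartJobRunResponse start = await client.StartJobRunAsync(
            new StartJobRunRequest { JobName = jobName, Arguments = arguments });
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            await Task.Delay(pollInterval);
            GetJobRunResponse run = await client.GetJobRunAsync(
                new GetJobRunRequest { JobName = jobName, RunId = start.JobRunId });
            if (IsFinished(run.JobRun.JobRunState))
            {
                return run.JobRun.JobRunState;
            }
        }
        throw new TimeoutException($"Job run {start.JobRunId} did not finish in time.");
    }
}
```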
⚠ Running tests might result in charges to your AWS account.
To find instructions for running these tests, see the README in the dotnetv3 folder.
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: Apache-2.0