diff --git a/tests/system/providers/amazon/CONTRIBUTING.md b/tests/system/providers/amazon/CONTRIBUTING.md new file mode 100644 index 0000000000000..ee4c713ec6ca6 --- /dev/null +++ b/tests/system/providers/amazon/CONTRIBUTING.md @@ -0,0 +1,199 @@ + + +# Goals + +This document aims to develop and maintain consistency across the system tests in +the Amazon provider package and sets forth guidelines to use when possible. With +the expectation that there will always be some edge cases, please refer to this +document as a guide when contributing system tests for the Amazon provider package. + +Contributions are welcome and are greatly appreciated! Every bit helps, and credit +will always be given. For example, I stole that line from @potiuk! + +# Scope + +This guide is meant to be used in addition to the [community system test design +guide](/tests/system/README.md) and only applies to system tests within the Amazon +provider package, though other providers are welcome to adopt them as well. In any +cases of conflicting information or confusion, this document should take precedence +within the Amazon provider package. + +# Tenants + +* Keep it Self-Contained +* Keep it DAMP +* Use the Helpers + +## Keep it Self-Contained + +For system tests, we would prefer to see each test be a fully self-contained unit. +This means that any infrastructure which is needed, such as an S3 bucket, must be +created and destroyed in the test's setup and teardown steps. Some exceptions are +outlined below in the `SystemTestContextBuilder()` section. Please document all +prerequisites for a given system test in your code for easy replication. + +## Keep it DAMP + +Whenever possible, avoid adding new helper methods to reuse configuration or setup +code. This means there will be duplicate code across tests, which violates the DRY +(Don't Repeat Yourself) programming principle, but ensures that any given test can +be modified easily without having to consider what those changes may mean for every +other test. + +Similarly, whenever possible, a variable should be defined in the module instead of +checking for it in external locations. For example, if you create and break down an +S3 bucket, the bucket name should be defined in the file rather than setting it with +an environment variable. + +## Use the Helpers + +Some helper methods for system tests have been provided in the [utils module](aws/utils/__init__.py). + +### SystemTestContextBuilder() + +`SystemTestContextBuilder()` should be used as the first task in every Amazon system +test. When used without any stub methods, it will generate and set the test's `ENV_ID`. +It will check for an environment variable defining the system test environment ID. If +one is found, it will validate that the ID meets system test requirements. Otherwise, +it will generate one and export a value. For more information, see `ENV_ID` below. + +Any variables which need to be fetched from an external source, such as an environment +variable or SSM, should be added with the `.add_variable()` stub method. The method +will check a number of sources to find a value before falling back to the default. +Use this to provide values for things which must be generated before the test can run. +Whenever possible, it is preferred to use default values when creating these resources +and document the settings in the code for easy reproduction. For example, a comment +such as "This test requires the ARN of a default EC2 instance named `system_test_instance`." +is preferred over creating an EC2 instance using custom settings to try to improve +performance or reduce expense. + +Some examples of when this should be used are: + +1. A resource which needs special or elevated privileges to create. For example, an IAM Role. +2. A resource which is very time-consuming to create and would slow down tests. For example, an RDS DB. +3. A default resource which is shared across many tests. For example, the default VPC. + +### set_env_id() [*DEPRECATED*] + +**NOTE:** `set_env_id()` is deprecated as this functionality is now handled by the +`SystemTestContextBuilder()`. Early AIP-47 system tests used this method so the +below description will remain here until those tests can be updated. + +`set_env_id()` should ~~always~~ no longer be used to set the value for `ENV_ID`. It +will check for an environment variable defining the system test environment ID. If one +is not found, it will generate one and export it. If one is found, it will validate that +the ID meets system requirements. For more information, see `ENV_ID` below. + +### fetch_variable() [*DEPRECATED*] + +**NOTE:** Calling `fetch_variable()` directly is deprecated. Instead, please use the +`.add_variable()` stub method of `SystemTestContextBuilder()`. Early AIP-47 system +tests used this method so the below description will remain here until those tests can +be updated. + +`fetch_variable()` accepts the name of a variable and an optional default value, and +will check a number of sources to find a value for it before falling back to the default. +Use this for any constant whose value is being sourced from outside the module. This is +used to pass values in for things which must be generated before the test can run. + +Some examples of when this should be used are: + +1. A resource which needs special or excessively elevated privileges to create. For example, an IAM Role. +2. A resource which would be very time expensive to create and would slow down tests For example, an RDS DB. +3. A default resource which is shared across many tests. For example, the default VPC. + +# Conventions + +## Location and Naming + +All system tests for the Amazon provider package should be in `tests/system/providers/amazon/aws/`. +If there is only one system test for a given service, the module name should be `example_{service}`. +For example, `example_athena.py`. If more than one module is required, the names should be +descriptive. For example, `example_redshift_cluster.py` and `example_redshift_sql.py`. + +## Environment ID + +`ENV_ID` should be set via the the `SystemTestContextBuilder` and not manually. This +value should be used as part of any value which is test-specific such as the name of +an S3 bucket being created. For example, `BUCKET_NAME = f'{ENV_ID}-test-bucket'`. + +If you choose to define an ENV_ID in your local environment rather than let the helper +generate one, the value must be lower-case alphanumeric with no special characters and +start with a letter. This ensures maximum compatibility across Amazon services since +different services and resources have different naming requirements. + +## DAG ID + +`DAG_ID` should be the first constant in the module and will be used only when declaring +the DAG object. The value should be the same as the module name. For example, the module +`example_athena.py` should have a DAG_ID value of `example_athena`. We declare the `DAG_ID` +as a constant as defined in AIP-47, but due to different services using different naming +conventions, the underscore in the `DAG_ID` is not always safe to use. Therefore, for the +sake of consistency, `ENV_ID` will be used when building asset names instead. + +## IAM Role ARNs + +When required, an IAM Role ARN should be imported using the `add_variable()` stub with +the key `ROLE_ARN`. `add_variable()` allows the test runner to store the values in the +environment variables or in a secret manager and standardizing the key name makes using +a secret manager easier. + +## Task Dependencies + +The task flow definition at the end of the DAG should use the chain() method rather than +the `>>` notation. It should also include comment lines to denote the Setup, Body, and +Teardown stages of the test if such stages exist. For example: + +``` +chain( + # TEST SETUP + test_context, + create_database, + create_table, + # TEST BODY + copy_selected_data, + # TEST TEARDOWN + delete_database, +) +``` + +## Cleanup + +If your test uses any cleanup tasks, they should include `trigger_rule=TriggerRule.ALL_DONE` +to ensure they run even if some test steps fail. This in turn requires that the `watcher()` +method is added as outlined in [AIP-47](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests). +For example: + +``` +chain( + # TEST SETUP + task0, + # TEST BODY + task1, + # TEST TEARDOWN + task2, # task2 has trigger rule "all done" defined +) + +from tests.system.utils.watcher import watcher + +# This test needs watcher in order to properly mark success/failure +# when "tearDown" task with trigger rule is part of the DAG +list(dag.tasks) >> watcher +``` diff --git a/tests/system/providers/amazon/README.md b/tests/system/providers/amazon/README.md index d87b5828d4783..54106e0d36693 100644 --- a/tests/system/providers/amazon/README.md +++ b/tests/system/providers/amazon/README.md @@ -43,7 +43,7 @@ tests/system/providers/amazon/aws ## Initial configuration -Each test requires some environment variables. Check how to set them up on your +Each test may require some environment variables. Check how to set them up on your operating system, but on UNIX-based operating systems `export NAME_OF_ENV_VAR=value` should work. To confirm that it is set up correctly, run `echo $NAME_OF_ENV_VAR` which will display its value. @@ -57,21 +57,28 @@ NAME_OF_ENV_VAR=value pytest --system amazon tests/system/providers/amazon/aws/e ### Required environment variables -- `SYSTEM_TESTS_ENV_ID` - AWS System Tests will generate and export this value if one does not exist. +- `SYSTEM_TESTS_ENV_ID` - AWS System Tests use `SystemTestContextBuilder` to generate +and export this value if one does not exist. - An environment ID is a unique value across different executions of system tests. This - is needed because the CI environment may run the tests on various versions of Airflow - in parallel. If this is the case, the value of this variable ensures that resources - that are created during the tests will not interfere with each other. +An environment ID is a unique value across multiple/different executions of a system +test. It is required because the CI environment may run the tests on various versions +of Airflow in parallel. If this is the case, the value of this variable ensures that +resources that are created during the test execution do not interfere with each other. - The value is used as part of the name for resources which have different requirements. - For example: an S3 bucket name can not use underscores, but an Athena table name can not - use hyphens. In order to minimize conflicts, this variable should be a randomized value - using only lowercase letters A-Z and digits 0-9, and start with a letter. +The value is used as part of the name for resources which have different requirements. +For example, an S3 bucket name can not use underscores whereas an Athena table name can +not use hyphens. In order to minimize conflicts, this variable should be a randomized +value using only lowercase letters A-Z and digits 0-9, and start with a letter. ## Settings for specific tests Amazon system test files are designed to be as self-contained as possible. They will contain any sample data and configuration values which are required, and they will create and tear -down any required infrastructure. Some tests will require an IAM Role ARN, and the requirements -for those Roles should be documented inside the docstring of the test file. +down any required infrastructure. Some tests will require an IAM Role ARN or other external +variables that can not be created by the test itself, and the requirements for those values +should be documented inside the docstring of the test file. + +### Contributing + +Please see [CONTRIBUTING.md](CONTRIBUTING.md) for community guidelines and best practices for +writing AWS System Tests for Apache Airflow.