This repo provides a Harlequin adapter for Athena.
harlequin-athena depends on harlequin, so installing this package will also install Harlequin.
To install this adapter into an activated virtual environment:
pip install harlequin-athena
poetry add harlequin-athena
If you do not already have Harlequin installed:
pip install harlequin-athena
If you would like to add the Athena adapter to an existing Harlequin installation:
pipx inject harlequin harlequin-athena
Alternatively, you can install Harlequin with the athena extra:
pip install harlequin[athena]
poetry add harlequin[athena]
pipx install harlequin[athena]
This adapter uses PyAthena as the client for Athena and PyArrow as the results cursor.
Important
There is currently an issue when a query limit is set and the query returns duplicated columns: the duplicated columns are not created properly, and only the last added column shows in the result set.
The same issue occurs when running fetchall(); however, that case has been fixed by using ArrowCursor.as_arrow() instead.
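The symptom described above (only the last of several same-named columns surviving) is what generally happens whenever a row is keyed by column name. The sketch below is plain Python for illustration only; it does not use Athena or PyAthena and is not a diagnosis of where the issue occurs in the adapter. The column names and values are hypothetical.

```python
# Illustrative only: plain Python, no Athena or PyAthena involved.
# Hypothetical result of something like: SELECT a.id, a.name, b.id FROM ...
columns = ["id", "name", "id"]
row = (1, "alice", 42)

# Keying the row by column name keeps only the last value for a repeated name.
as_mapping = dict(zip(columns, row))
print(as_mapping)  # {'id': 42, 'name': 'alice'}

# Aliasing in SQL (for example a.id AS a_id, b.id AS b_id) gives unique names,
# so no column is lost when rows are keyed by name.
aliased_columns = ["a_id", "name", "b_id"]
as_mapping_aliased = dict(zip(aliased_columns, row))
print(as_mapping_aliased)
```

If a query hits this issue, giving each duplicated column a distinct alias in the SELECT list is a simple way to sidestep it.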
Config options are:
- workgroup: Athena workgroup to run queries on. If this is specified and the workgroup is set to override client settings, then s3_staging_dir is ignored. This is the preferred option.
- s3_staging_dir: The S3 bucket where Athena stores query results and other metadata.
- result_reuse_enable: Whether to cache results or not. The query string must match exactly for this to work. Enabled by default.
- result_reuse_minutes: How long to cache results for, in minutes. 60 minutes by default.
- unload: Whether to use the unload option in PyAthena. This converts SELECT statements to UNLOAD statements, which persist the results to s3_staging_dir as Snappy-compressed Parquet files. It speeds up queries at the cost of higher overall AWS costs.
- region_name: AWS region name.
- aws_access_key_id: AWS access key ID. This is not recommended; use environment variables or a credentials file instead.
- aws_secret_access_key: AWS secret access key. This is not recommended; use environment variables or a credentials file instead.
- aws_session_token: AWS session token when using assumed roles. This is not recommended; use environment variables or a credentials file instead.
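To make the relationship between these options and PyAthena more concrete, here is a minimal sketch of how they might be passed along. It is not the adapter's actual code. It assumes that PyAthena's connect() accepts work_group, s3_staging_dir, region_name, and cursor_kwargs, and that execute() accepts result_reuse_enable and result_reuse_minutes; the helper names build_pyathena_kwargs and run_query are hypothetical. Credential options are left out on purpose so that boto3 resolves them itself.

```python
from typing import Any, Dict, Tuple


def build_pyathena_kwargs(options: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split adapter-style options into connect() and execute() keyword arguments."""
    connect_kwargs: Dict[str, Any] = {}
    # The adapter option is "workgroup"; PyAthena's keyword is assumed to be "work_group".
    if options.get("workgroup"):
        connect_kwargs["work_group"] = options["workgroup"]
    for key in ("s3_staging_dir", "region_name"):
        if options.get(key):
            connect_kwargs[key] = options[key]
    if options.get("unload"):
        connect_kwargs["cursor_kwargs"] = {"unload": True}

    # Defaults follow the option descriptions: reuse enabled, 60 minutes.
    execute_kwargs = {
        "result_reuse_enable": options.get("result_reuse_enable", True),
        "result_reuse_minutes": options.get("result_reuse_minutes", 60),
    }
    return connect_kwargs, execute_kwargs


def run_query(sql: str, options: Dict[str, Any]):
    """Hypothetical helper: run a query and return the result as an Arrow table."""
    # Imported lazily so the helper above works without PyAthena installed.
    from pyathena import connect
    from pyathena.arrow.cursor import ArrowCursor

    connect_kwargs, execute_kwargs = build_pyathena_kwargs(options)
    cursor = connect(cursor_class=ArrowCursor, **connect_kwargs).cursor()
    return cursor.execute(sql, **execute_kwargs).as_arrow()
```

Calling build_pyathena_kwargs({"workgroup": "my-workgroup", "region_name": "eu-central-1"}) returns the connection arguments alongside the default result-reuse settings, without contacting AWS.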
While you can pass credentials as options when running Harlequin with this adapter, you should instead set up credentials in a way that lets boto3's credentials provider chain find them. See the AWS docs.
An AWS account is needed to run the tests. The account should have the following AWS resources:
- An S3 bucket to store Athena query results
- Optional: an Athena workgroup.
The tests depend on the following environment variables:
export AWS_ATHENA_WORKGROUP=my-workgroup
export AWS_ATHENA_S3_STAGING_DIR=s3://your_s3_bucket/path
export AWS_DEFAULT_REGION=eu-central-1
Credentials for AWS should also be set as environment variables:
export AWS_ACCESS_KEY_ID="<your_access_key>"
export AWS_SECRET_ACCESS_KEY="<your_secret_access_key>"
# optionally you can also set a session token
# export AWS_SESSION_TOKEN="<your_token>"
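Before running the tests, it can help to confirm that the variables above are actually set. The following is a minimal sketch of such a preflight check; it is not part of this repo. It assumes that AWS_ATHENA_WORKGROUP and AWS_SESSION_TOKEN are optional (the workgroup is listed as an optional resource), and the function name missing_env_vars is hypothetical.

```python
import os

# Variables listed in this README; the required/optional split is an assumption.
REQUIRED = (
    "AWS_ATHENA_S3_STAGING_DIR",
    "AWS_DEFAULT_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)
OPTIONAL = ("AWS_ATHENA_WORKGROUP", "AWS_SESSION_TOKEN")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Run as a script, it exits with a message naming any missing variables, so a misconfigured shell is caught before any query is sent to Athena.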